Protein–ligand docking

Protein–ligand docking is a molecular modelling technique whose goal is to predict the position and orientation of a ligand (a small molecule) when it is bound to a protein receptor or enzyme. Pharmaceutical research employs docking techniques for a variety of purposes, most notably the virtual screening of large databases of available chemicals in order to select likely drug candidates. Computational prediction of protein structure has developed rapidly with programs such as AlphaFold, and the demand for corresponding protein–ligand docking predictions is driving the implementation of software that can find accurate models. Once protein folding can be predicted accurately, along with how ligands of various structures will bind to the protein, drug development can progress at a much faster rate.

History
Computer-aided drug design (CADD) was introduced in the 1980s in order to screen for novel drugs. The underlying premise is that by parsing an extremely large data set for chemical compounds that may be viable for a given pharmaceutical, researchers can minimize the number of novel compounds that must be tested experimentally. The ability to accurately predict target binding sites is a newer phenomenon, however, and goes beyond simply parsing a data set of chemical compounds: owing to increasing computational capability, it is now possible to inspect the actual geometries of the protein–ligand binding site in silico. Hardware advancements have made these structure-oriented methods of drug discovery the next frontier of 21st-century biopharma. To train new algorithms to capture the geometry of protein–ligand binding accurately, experimentally determined datasets gathered with techniques such as X-ray crystallography or NMR spectroscopy can be used.

Available software
Several protein–ligand docking software applications that calculate the site, geometry and energy of small molecules or peptides interacting with proteins are available, such as AutoDock and AutoDock Vina, rDock, FlexAID, Molecular Operating Environment, and Glide. Peptides are a highly flexible type of ligand whose bound structure has proven difficult for docking programs to predict. DockThor handles ligands with up to 40 rotatable bonds to help model these complex physicochemical interactions at the target site. Root-mean-square deviation (RMSD) is the standard measure for evaluating how well docking software reproduces the binding mode of a protein–ligand structure. Specifically, it is the root-mean-square deviation between the software-predicted docking pose of the ligand and the experimental binding mode. The RMSD is computed for every computer-generated pose of the ligand bound to the protein. The pose that the program ranks highest is not always the one closest to the actual physical pose. To evaluate how well a docking algorithm performs, the RMSD of each ranked candidate must therefore be examined to determine whether a pose matching the experimental one was generated but not selected.
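
The RMSD evaluation described above can be illustrated with a minimal Python sketch. It is not taken from any of the programs listed; the function names are hypothetical, the poses are assumed to list the same heavy atoms in the same order and in the same coordinate frame (no superposition or symmetry correction is performed), and the 2.0 Å success cutoff is an assumed, commonly quoted convention rather than a value stated in this article.

```python
import math

def pose_rmsd(predicted, reference):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Assumes identical atom ordering and a shared coordinate frame.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))

def first_near_native_rank(ranked_poses, reference, cutoff=2.0):
    """1-based rank of the first pose within `cutoff` of the reference, or None.

    `cutoff` (in the same units as the coordinates) is an assumed threshold.
    """
    for rank, pose in enumerate(ranked_poses, start=1):
        if pose_rmsd(pose, reference) <= cutoff:
            return rank
    return None

# Hypothetical two-atom ligand: the top-ranked pose is far from the
# experimental one, while the second-ranked pose is close to it.
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
far_pose = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
near_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rank = first_near_native_rank([far_pose, near_pose], experimental)
```

A rank greater than 1 corresponds to the case described above, in which a near-experimental pose was generated but not selected as the top candidate.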

Protein flexibility
Computational capacity has increased dramatically over the last two decades, making possible the use of more sophisticated and computationally intensive methods in computer-assisted drug design. However, dealing with receptor flexibility in docking methodologies is still a thorny issue. The main reason for this difficulty is the large number of degrees of freedom that have to be considered in this kind of calculation. In most cases, though, neglecting receptor flexibility leads to poor binding pose predictions in real-world settings. Using coarse-grained protein models to overcome this problem seems to be a promising approach. Coarse-grained models are often used for protein–peptide docking, which frequently involves large-scale conformational transitions of the protein receptor.

AutoDock is one of the computational tools frequently used to model the interactions between proteins and ligands during the drug discovery process. Although classical pose-search algorithms often assume that the receptor protein is rigid while the ligand is moderately flexible, newer approaches also model limited receptor flexibility. AutoDockFR is a newer program that simulates this partial flexibility by letting side chains of the receptor protein adopt various poses within their conformational space. This allows the algorithm to explore a vastly larger space of energetically relevant poses for each ligand tested.

In order to reduce the complexity of the search space for prediction algorithms, various hypotheses have been tested. One such hypothesis is that side-chain conformational changes that involve more atoms and rotations of greater magnitude are less likely to occur than smaller rotations, because of the energy barriers that arise. The steric hindrance and rotational energy cost introduced by these larger changes make it less likely that they appear in the actual protein–ligand pose. Findings such as these make it easier for scientists to develop heuristics that lower the complexity of the search space and improve the algorithms.

Implementations
The original method of testing molecular models of binding sites was introduced in the 1980s. The receptor was approximated roughly by spheres that filled its surface clefts, and the ligand was approximated by another set of spheres that occupied its volume. A search was then carried out to maximize the steric overlap between the ligand spheres and the receptor spheres.
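
The idea of searching for maximal sphere overlap can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not a reconstruction of the original 1980s program: spheres are given as hypothetical (center, radius) pairs, the score simply counts overlapping receptor–ligand sphere pairs, and the search only tries a supplied list of rigid translations, with no rotations.

```python
import math
from itertools import product

def overlap_score(receptor_spheres, ligand_spheres, shift=(0.0, 0.0, 0.0)):
    """Count receptor/ligand sphere pairs that overlap after translating
    every ligand sphere by `shift`. Each sphere is ((x, y, z), radius)."""
    score = 0
    for (rec_center, rec_radius), (lig_center, lig_radius) in product(
        receptor_spheres, ligand_spheres
    ):
        moved = tuple(c + s for c, s in zip(lig_center, shift))
        # Two spheres overlap when their centers are closer than the
        # sum of their radii.
        if math.dist(rec_center, moved) < rec_radius + lig_radius:
            score += 1
    return score

def best_translation(receptor_spheres, ligand_spheres, candidate_shifts):
    """Return the candidate translation with the highest overlap score."""
    return max(
        candidate_shifts,
        key=lambda shift: overlap_score(receptor_spheres, ligand_spheres, shift),
    )

# Hypothetical data: one receptor cleft sphere at the origin and one
# ligand sphere that starts 5 units away along x.
receptor = [((0.0, 0.0, 0.0), 1.0)]
ligand = [((5.0, 0.0, 0.0), 1.0)]
shifts = [(0.0, 0.0, 0.0), (-5.0, 0.0, 0.0)]
best = best_translation(receptor, ligand, shifts)
```

A realistic search would also sample rotations and would penalize clashes with receptor atoms. The count-based score is used here only to keep the example short.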

However, newer approaches to evaluating protein–ligand docking rely on supervised molecular dynamics. Essentially, the simulation is run as a sequence of short time windows during which the distance between the centers of mass of the ligand and the protein binding site is computed. The distance values are sampled at regular intervals and then fitted with a linear regression. When the slope is negative, the ligand is approaching the binding site; when it is positive, the ligand is moving away. When the ligand is departing from the binding site, that branch of the tree of possibilities is pruned immediately to avoid unnecessary computation. The advantage of this method is speed without the introduction of any energetic bias that could keep the model from accurately reproducing experimental results.
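
The slope test for a single time window can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of any particular supervised molecular dynamics package: the function names are hypothetical, the distances are assumed to be precomputed ligand-to-binding-site center-of-mass distances, and the fit is an ordinary least-squares line.

```python
from statistics import mean

def regression_slope(times, distances):
    """Least-squares slope of distance versus time for one window."""
    if len(times) != len(distances) or len(times) < 2:
        raise ValueError("need at least two matching (time, distance) samples")
    t_bar = mean(times)
    d_bar = mean(distances)
    numerator = sum((t - t_bar) * (d - d_bar) for t, d in zip(times, distances))
    denominator = sum((t - t_bar) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("time points must not all be identical")
    return numerator / denominator

def keep_window(times, distances):
    """True if the ligand is approaching the site (negative slope), so the
    window is kept; False means this branch would be pruned."""
    return regression_slope(times, distances) < 0

# Hypothetical samples (arbitrary units) from two windows.
times = [0.0, 1.0, 2.0, 3.0]
approaching = [12.0, 11.0, 10.5, 9.0]   # distance shrinking
departing = [9.0, 9.5, 10.5, 12.0]      # distance growing
decisions = (keep_window(times, approaching), keep_window(times, departing))
```

Only the sign of the fitted slope is used in this sketch, which is consistent with the point above that the supervision adds no energetic bias to the simulation itself.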