Searching the conformational space for docking

In molecular modelling, docking is a method that predicts the preferred orientation of one molecule relative to another when the two are bound together in a stable complex. In the case of protein docking, the search space consists of all possible orientations of the protein with respect to the ligand. Flexible docking additionally considers all possible conformations of the protein paired with all possible conformations of the ligand.

With present computing resources, it is impossible to exhaustively explore these search spaces; instead, there are many strategies which attempt to sample the search space with optimal efficiency. Most docking programs in use account for a flexible ligand, and several attempt to model a flexible protein receptor. Each "snapshot" of the pair is referred to as a pose.

Molecular dynamics (MD) simulations
In this approach, proteins are typically held rigid, and the ligand is allowed to freely explore its conformational space. The generated conformations are then docked successively into the protein, and an MD simulation consisting of a simulated annealing protocol is performed. This is usually supplemented with short MD energy minimization steps, and the energies determined from the MD runs are used to rank the resulting poses. Although this is a computationally expensive method (involving potentially hundreds of MD runs), it has some advantages: for example, no specialized energy or scoring functions are required. MD force fields can typically be used to find poses that are reasonable and can be compared with experimental structures.
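The workflow above (several starting conformations, an annealing run for each, then ranking by energy) can be illustrated with a minimal sketch. It makes strong simplifying assumptions: the pose is a single number, `toy_energy` is a hypothetical one-dimensional function rather than a force field, and the annealing is done with Metropolis Monte Carlo moves instead of molecular dynamics. All names and parameter values are illustrative only.

```python
import math
import random


def toy_energy(x):
    # Hypothetical 1-D "energy" with several local minima; NOT a real force field.
    return 0.1 * x * x + math.sin(3.0 * x)


def anneal(x0, rng, steps=2000, t_start=2.0, t_end=0.01, step_size=0.5):
    """Metropolis Monte Carlo annealing (a stand-in for MD-based annealing)."""
    x, e = x0, toy_energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        cand = x + rng.uniform(-step_size, step_size)
        e_cand = toy_energy(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def rank_starts(starts, seed=0):
    """Anneal from every starting value and rank the results by energy."""
    rng = random.Random(seed)
    results = [anneal(x0, rng) for x0 in starts]
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    for pose, energy in rank_starts([-4.0, -1.0, 2.0, 5.0]):
        print(f"pose={pose:+.3f}  energy={energy:+.3f}")
```

Each starting value plays the role of one pre-generated ligand conformation. In a real protocol every `anneal` call would be a full MD run, which is why the cost grows quickly with the number of conformations.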

The Distance Constrained Essential Dynamics method (DCED) has been used to generate multiple structures for docking, called eigenstructures. Although this approach avoids most of the costly MD calculations, it can capture the essential motions involved in a flexible receptor, representing a form of coarse-grained dynamics.

Shape-complementarity methods
Shape-complementarity methods, the most common technique in docking programs, focus on the match between the receptor and the ligand in order to find an optimal pose. Programs include DOCK, FRED, GLIDE, SURFLEX, eHiTS and many more. Most methods describe the molecules in terms of a finite number of descriptors that include structural complementarity and binding complementarity. Structural complementarity is mostly a geometric description of the molecules, including solvent-accessible surface area, overall shape and geometric constraints between atoms in the protein and ligand. Binding complementarity takes into account features such as hydrogen bonding interactions, hydrophobic contacts and van der Waals interactions to describe how well a particular ligand will bind to the protein. Both kinds of descriptors are conveniently represented in the form of structural templates, which are then used to quickly match potential compounds (either from a database or from user-supplied inputs) that will bind well at the active site of the protein. Compared to all-atom molecular dynamics approaches, these methods are very efficient at finding optimal binding poses for the protein and ligand.
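As a rough illustration of the geometric side of this idea, the sketch below scores how well a small ligand shape fits against a receptor shape on a two-dimensional grid. It is a toy under stated assumptions, not the algorithm of any program named above: molecules are sets of grid cells, the score counts edge contacts between ligand and receptor cells, overlaps are penalized with an arbitrary `clash_penalty`, and only translations are searched.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def shape_score(receptor, ligand, shift, clash_penalty=10):
    """Contacts between ligand and receptor cells, minus a penalty per overlap."""
    sx, sy = shift
    score = 0
    for (x, y) in ligand:
        cx, cy = x + sx, y + sy
        if (cx, cy) in receptor:
            # Ligand cell sits inside the receptor: steric clash.
            score -= clash_penalty
        else:
            # Reward every receptor cell that touches this ligand cell by an edge.
            score += sum((cx + dx, cy + dy) in receptor for dx, dy in NEIGHBOURS)
    return score


def best_translation(receptor, ligand, search=range(-6, 7)):
    """Exhaustively try every translation in the search window."""
    shifts = [(sx, sy) for sx in search for sy in search]
    return max(shifts, key=lambda s: shape_score(receptor, ligand, s))


# Hypothetical receptor: a 5 x 4 block with a pocket one cell wide and two deep.
receptor = {(x, y) for x in range(5) for y in range(4)} - {(2, 2), (2, 3)}
# Hypothetical ligand: two cells stacked vertically.
ligand = {(0, 0), (0, 1)}

if __name__ == "__main__":
    shift = best_translation(receptor, ligand)
    print(shift, shape_score(receptor, ligand, shift))
```

The pocket wins because a ligand cell inside it touches receptor cells on several sides, whereas a cell lying against a flat outer face touches only one. Descriptors for hydrogen bonding or hydrophobic contacts would enter as additional terms in the score.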

Genetic algorithms
Two of the most widely used docking programs belong to this class: GOLD and AutoDock. Genetic algorithms allow the exploration of a large conformational space (which in this case is spanned jointly by the protein and ligand) by representing each spatial arrangement of the pair as a “gene” with a particular energy. The entire genome thus represents the complete energy landscape which is to be explored. The evolution of the genome is simulated by cross-over techniques similar to biological evolution, where random pairs of individuals (conformations) are “mated” with the possibility of a random mutation in the offspring. These methods have proven very useful in sampling the vast state-space while maintaining closeness to the actual process involved.
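The mating and mutation cycle described above can be sketched in a few lines. This is an illustrative toy, not the scheme used by GOLD or AutoDock: each individual is a hypothetical three-number pose, `toy_energy` is a made-up function standing in for a docking score, and the tournament selection, uniform cross-over, elitism and parameter values are all assumptions chosen for brevity.

```python
import random

TARGET = (1.0, -2.0, 0.5)  # hypothetical optimum of the toy energy


def toy_energy(pose):
    # Stand-in for a docking score: squared distance from TARGET (minimum is 0).
    return sum((p - t) ** 2 for p, t in zip(pose, TARGET))


def evolve(pop_size=40, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.uniform(-5, 5) for _ in range(3))
                  for _ in range(pop_size)]
    initial_best = min(map(toy_energy, population))
    for _ in range(generations):
        population.sort(key=toy_energy)
        next_gen = population[:2]  # elitism: carry the two best over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            a = min(rng.sample(population, 3), key=toy_energy)
            b = min(rng.sample(population, 3), key=toy_energy)
            # Uniform cross-over: each coordinate comes from one parent.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            # Occasional random mutation of a single coordinate.
            if rng.random() < mutation_rate:
                i = rng.randrange(3)
                child = child[:i] + (child[i] + rng.gauss(0, 0.3),) + child[i + 1:]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=toy_energy)
    return best, toy_energy(best), initial_best


if __name__ == "__main__":
    best, energy, start = evolve()
    print(f"initial best energy {start:.4f}, final best energy {energy:.4f}")
```

Because the two best individuals are always carried over, the best energy in the population can never get worse from one generation to the next. In a docking setting the pose would instead encode translation, orientation and torsion angles.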

Although genetic algorithms are quite successful in sampling the large conformational space, many docking programs require the protein to remain fixed, allowing only the ligand to flex and adjust to the active site of the protein. Genetic algorithms also require multiple runs to obtain reliable answers regarding ligands that may bind to the protein. A genetic algorithm typically takes longer to arrive at a proper pose, so these methods may not be as efficient as shape-complementarity approaches in screening large databases of compounds. Recent improvements, including grid-based evaluation of energies, restricting the exploration of conformational changes to local areas (active sites) of interest, and improved tabling methods, have significantly enhanced the performance of genetic algorithms and made them suitable for virtual screening applications.
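Grid-based energy evaluation can be illustrated as follows: the interaction energy of a probe atom with the fixed receptor is computed once at every point of a grid covering the region of interest, and each ligand atom is later scored by a table lookup instead of a fresh sum over receptor atoms. The sketch assumes a Lennard-Jones 12-6 pair term, a single atom type, a cubic box and nearest-grid-point lookup. Those choices and all names are illustrative, and real programs generally interpolate between grid points.

```python
import math


def pair_energy(r, eps=1.0, sigma=1.0):
    # Lennard-Jones 12-6 term; the distance is clamped to avoid huge values.
    r = max(r, 0.5)
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)


def direct_energy(point, receptor_atoms):
    """Slow path: sum over all receptor atoms for one probe position."""
    return sum(pair_energy(math.dist(point, a)) for a in receptor_atoms)


def build_grid(receptor_atoms, origin, spacing, n):
    """Precompute the probe energy at every point of an n x n x n grid."""
    grid = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = (origin[0] + i * spacing,
                     origin[1] + j * spacing,
                     origin[2] + k * spacing)
                grid[(i, j, k)] = direct_energy(p, receptor_atoms)
    return grid


def grid_energy(ligand_atoms, grid, origin, spacing, n):
    """Fast path: look up each ligand atom at its nearest grid point."""
    total = 0.0
    for atom in ligand_atoms:
        idx = tuple(min(n - 1, max(0, round((c - o) / spacing)))
                    for c, o in zip(atom, origin))
        total += grid[idx]
    return total


if __name__ == "__main__":
    receptor_atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]  # hypothetical coordinates
    origin, spacing, n = (-1.0, -1.0, -1.0), 0.5, 9
    grid = build_grid(receptor_atoms, origin, spacing, n)
    print(grid_energy([(1.5, 1.0, 0.0)], grid, origin, spacing, n))
```

The grid is built once per receptor, so its cost is shared across every pose and every ligand evaluated afterwards. Limiting the box to the active site keeps the table small.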