User:Zargulon/Topics in protein-protein docking

Protein-protein docking using the surface method
Surface methods use a representation of the molecular surface of the proteins to generate configurations in which the proteins make tangential contact somewhere. The aim is to exclude a large proportion of the configurations with severe interpenetration. A set of points evenly distributed on the molecular surfaces is calculated, along with a vector through each point that is normal (perpendicular) to the surface at that point. Configurations of the complex are then created by moving one protein so that one of its surface points is exactly superposed on a surface point of the other, and then rotating about the common point until the two normal vectors are antiparallel (pointing exactly against each other). Further configurations are generated by rotating around the common normal.
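The superposition and alignment steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: protein A is held fixed, protein B is moved, and the function names (`dock_pose`, `align_rotation`, `rotate`) and the use of Rodrigues' rotation formula are choices made for this sketch, not taken from any particular docking program.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(u):
    length = math.sqrt(dot(u, u))
    return tuple(x / length for x in u)

def rotate(v, axis, angle):
    """Rotate vector v about axis by angle (Rodrigues' formula)."""
    k = normalize(axis)
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(k, v)
    kdv = dot(k, v)
    return tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) for i in range(3))

def align_rotation(src, dst):
    """Return (axis, angle) of a rotation taking direction src onto direction dst."""
    src, dst = normalize(src), normalize(dst)
    axis = cross(src, dst)
    sin_a = math.sqrt(dot(axis, axis))
    cos_a = dot(src, dst)
    if sin_a < 1e-12:
        # src and dst are parallel or antiparallel: any perpendicular axis will do.
        helper = (1.0, 0.0, 0.0) if abs(src[0]) < 0.9 else (0.0, 1.0, 0.0)
        return normalize(cross(src, helper)), (0.0 if cos_a > 0 else math.pi)
    return normalize(axis), math.atan2(sin_a, cos_a)

def dock_pose(coords_b, p_a, n_a, p_b, n_b, spin):
    """Move protein B so that surface point p_b lands on p_a with normals antiparallel.

    coords_b : coordinates of protein B (list of 3-tuples)
    p_a, n_a : surface point and normal on the fixed protein A
    p_b, n_b : surface point and normal on the moving protein B
    spin     : extra rotation angle (radians) about the common normal
    """
    target = tuple(-x for x in n_a)            # n_b must end up pointing against n_a
    axis, angle = align_rotation(n_b, target)
    posed = []
    for x in coords_b:
        v = tuple(x[i] - p_b[i] for i in range(3))   # put the contact point at the origin
        v = rotate(v, axis, angle)                   # make the normals antiparallel
        v = rotate(v, n_a, spin)                     # spin about the common normal
        posed.append(tuple(v[i] + p_a[i] for i in range(3)))
    return posed
```

Calling `dock_pose` for every pair of surface points, and for a series of `spin` values between 0 and 2π, enumerates the configurations; the spacing of the spin angles is a free parameter of the search.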

Binding-site information can be applied by restricting attention to surface points that lie within or near features of interest, such as putative binding sites or highly conserved regions. Among the methods used to generate the surface points, marching cubes has the advantage that it covers the surface with triangles, which allows patches to be defined on the surface of the protein and the areas of those patches to be calculated. Using marching cubes to create a high-resolution surface for a 200-residue globular protein takes about 5 minutes and generates about 100,000 triangles, each about 0.1 square Ångströms in area. Constructing the surface of a non-globular or very large protein can take several hours.

The resolution of the search is controlled separately from the resolution of the marching cubes grid. A subset of triangles, typically about 1 in 1000, is chosen to be active for generating configurations. The active polygons are chosen from among the polygons remaining after any binding site has been imposed. The active polygons are selected to be maximally distant from one another, and to be surrounded by roughly circular patches of inactive polygons of roughly equal area.
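One simple way to pick well-separated active triangles is greedy farthest-point sampling on the triangle centroids, sketched below. This is an assumption made for illustration: the text does not say which selection procedure is used, straight-line distance stands in for distance measured along the surface, and the names `farthest_point_sample` and `assign_patches` are hypothetical.

```python
import math

def farthest_point_sample(centroids, n_active, start=0):
    """Greedily choose n_active triangle indices that are far apart.

    centroids : list of 3-tuples, one per triangle left after any
                binding-site restriction has been applied
    start     : index of the first active triangle (arbitrary choice)
    """
    chosen = [start]
    # min_dist[i] is the distance from triangle i to its nearest active triangle.
    min_dist = [math.dist(c, centroids[start]) for c in centroids]
    while len(chosen) < min(n_active, len(centroids)):
        nxt = max(range(len(centroids)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, c in enumerate(centroids):
            d = math.dist(c, centroids[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen

def assign_patches(centroids, active):
    """Attach every triangle to its nearest active triangle, giving one patch per active triangle."""
    return [min(active, key=lambda a: math.dist(c, centroids[a])) for c in centroids]
```

With about 100,000 triangles and a ratio of 1 in 1000, `n_active` would be about 100. The greedy choice tends to give patches of similar size on a fairly uniform surface, but it does not guarantee equal areas.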

Representing the proteins with a triangulation of their molecular surface allows access to a new, geometrically-oriented, class of scoring functions, including the convex hull volume of the configuration, the volume of intersections of the proteins, and the buried surface area. Standard scoring functions based on interatomic distances or inter-residue distances are still available.

Flexibility may be introduced either during scoring or as a refinement step for selecting the best from a subset of top-scoring configurations.

Automatic hybrid score optimization
Statements about the quality of a docking method M are typically of the form "In x% of protein-protein complexes, the set of the best N guesses of the complex structure contains at least one guess with RMSD less than r Ångströms". The best docking method that could theoretically exist would have x = 100, N = 1 and r = 0.
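For a benchmark in which every guess has already been compared with the known structure, x can be computed directly from N and r. The sketch below assumes the RMSD values are stored per complex in rank order, best-scored guess first; the function name `success_rate` and the data layout are illustrative.

```python
def success_rate(ranked_rmsds, n, r):
    """Percentage of complexes with at least one of the top n guesses below RMSD r.

    ranked_rmsds : dict mapping a complex identifier to a list of RMSD values
                   (in Ångströms), ordered by the method's own ranking
    """
    hits = sum(
        1
        for rmsds in ranked_rmsds.values()
        if any(value < r for value in rmsds[:n])
    )
    return 100.0 * hits / len(ranked_rmsds)


# Made-up RMSD values for four hypothetical complexes, for illustration only.
example = {
    "complex_1": [12.0, 3.0, 8.0],
    "complex_2": [1.5, 20.0],
    "complex_3": [15.0, 14.0, 2.0],
    "complex_4": [30.0],
}
```

On this toy data, `success_rate(example, 1, 5.0)` is 25.0 and `success_rate(example, 3, 5.0)` is 75.0, which shows how strongly the reported x depends on the choice of N.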

Rotamer transfer
For complexes of known structure in the benchmark, the rotameric states of the sidechains as they lie in the complexed state may be transferred, torsion angle by torsion angle, onto the backbones of the proteins in their free state. The degree of improvement in docking success after imposing these sidechain hints reflects how important each sidechain torsion angle is for docking, and therefore how much time it is worth spending on sidechain optimization in order to make a significant difference.

Normal modes
If the decision is taken to permit flexing of the backbone, one way to restrict the number of configurations that need to be analysed is to allow only deformations that can be described by large-amplitude normal modes. Normal modes are oscillating collective motions. For a free protein with 2000 atoms there are about 6000 normal modes (3N − 6 for N atoms), and they generally all have different oscillation rates. The slowest oscillating modes have the largest amplitudes, and it is hypothesized that they therefore describe the conformational changes that can occur on docking.

Rigid refinement
For scores which have continuous spatial dependence, such as Lennard-Jones or electrostatic energy, it is possible to calculate a gradient with respect to the relative positions of the proteins. A relative motion of the proteins along the gradient with an appropriate step size may be iterated to improve the configuration. Calculation speed determines how broadly rigid refinement may be applied.
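A minimal sketch of such an iteration is given below for the Lennard-Jones case, treating the score as an energy to be minimized, so each step is taken down the gradient. Several simplifications are assumptions of this sketch: only the translation of protein B is refined (rotation is omitted), every atom pair shares the same `epsilon` and `sigma`, the step size is fixed, and the function names are illustrative.

```python
def lj_energy(coords_a, coords_b, shift, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy between protein A and protein B translated by shift."""
    energy = 0.0
    for a in coords_a:
        for b in coords_b:
            r2 = sum((b[i] + shift[i] - a[i]) ** 2 for i in range(3))
            s6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy

def lj_gradient(coords_a, coords_b, shift, epsilon=1.0, sigma=1.0):
    """Gradient of lj_energy with respect to the three components of shift."""
    grad = [0.0, 0.0, 0.0]
    for a in coords_a:
        for b in coords_b:
            d = [b[i] + shift[i] - a[i] for i in range(3)]
            r2 = sum(x * x for x in d)
            s6 = (sigma * sigma / r2) ** 3
            # (dE/dr) / r for E = 4*eps*((sigma/r)^12 - (sigma/r)^6)
            coef = 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r2
            for i in range(3):
                grad[i] += coef * d[i]
    return grad

def refine_translation(coords_a, coords_b, shift=(0.0, 0.0, 0.0), step=0.01, iters=200):
    """Fixed-step gradient descent on the translation of protein B."""
    shift = list(shift)
    for _ in range(iters):
        grad = lj_gradient(coords_a, coords_b, shift)
        shift = [shift[i] - step * grad[i] for i in range(3)]
    return shift
```

For a single pair of atoms starting 1.5σ apart, the iteration settles near the Lennard-Jones minimum at a separation of 2^(1/6) σ. The double loop over atom pairs is what makes calculation speed the limiting factor; a distance cutoff or neighbour grid would be the usual remedy.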

Sidechain torsional refinement
Since each sidechain consists of a small number of atoms, a refinement step may include key sidechains near the interaction region without impinging much on calculation speed.

Multicopy
The multicopy method for sidechain refinement models each sidechain as a weighted ensemble of selected rotamers. The weights are iteratively updated until convergence or timeout.
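The text does not specify the update rule. One common choice, assumed here purely for illustration, is a mean-field update in which each rotamer's weight is the Boltzmann factor of its average energy against the current ensembles of all other sidechains. The function name `multicopy_weights`, the energy inputs, and the default `kT` of 0.6 (roughly room temperature in kcal/mol) are all assumptions of this sketch.

```python
import math

def multicopy_weights(self_e, pair_e, kT=0.6, max_iter=100, tol=1e-6):
    """Iterate rotamer weights to self-consistency (assumed mean-field update).

    self_e : self_e[i][a] is the energy of rotamer a of sidechain i
             against the fixed part of the system
    pair_e : function pair_e(i, a, j, b) giving the interaction energy of
             rotamer a of sidechain i with rotamer b of sidechain j
    """
    n = len(self_e)
    # Start from uniform weights over the selected rotamers of each sidechain.
    weights = [[1.0 / len(e)] * len(e) for e in self_e]
    for _ in range(max_iter):                  # max_iter plays the role of the timeout
        max_change = 0.0
        for i in range(n):
            mean_e = []
            for a in range(len(self_e[i])):
                energy = self_e[i][a]
                for j in range(n):
                    if j != i:
                        energy += sum(weights[j][b] * pair_e(i, a, j, b)
                                      for b in range(len(self_e[j])))
                mean_e.append(energy)
            e_min = min(mean_e)                # subtract the minimum for numerical stability
            boltz = [math.exp(-(e - e_min) / kT) for e in mean_e]
            total = sum(boltz)
            new = [x / total for x in boltz]
            max_change = max(max_change,
                             max(abs(x - y) for x, y in zip(new, weights[i])))
            weights[i] = new
        if max_change < tol:                   # converged
            break
    return weights
```

In a two-sidechain toy case where the two rotamers numbered 0 clash and the second sidechain strongly prefers its rotamer 0, the weights shift so that the first sidechain is almost entirely in its rotamer 1.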

Sidechain torsional refinement may also be carried out by a gradient method.

Atomic parameters for heterogens
Evaluating electrostatic and steric energy requires estimates of atomic charge and atomic size, which in general vary not just between atomic species but between different atoms of the same species in different molecular contexts. Calculations for all the atoms in a protein have been made by chemists, but proteins often contain small non-peptide molecules, called heterogens. A particularly common example is the glycoproteins, which have sugar groups attached to various amino acids (almost exclusively asparagine in eukaryotes); the configuration of these sugars is responsible for determining blood group, among other important functions. An automatic method of inferring satisfactory estimates for heterogen atom parameters from known peptide atom parameters would be useful. One possible approach is to consider atoms nearby in the bonding network.