Racemic crystallography

Racemic crystallography is a technique used in structural biology in which crystals of a protein are grown from an equimolar mixture of the L-protein molecule of natural chirality and its D-protein mirror image. L-protein molecules consist of 'left-handed' L-amino acids and the achiral amino acid glycine, whereas the mirror-image D-protein molecules consist of 'right-handed' D-amino acids and glycine. Typically, both the L-protein and the D-protein are prepared by total chemical synthesis.

Manufacturing
Native chemical ligation of unprotected peptide segments is used to prepare the protein's polypeptide chain. In native chemical ligation, a peptide C-terminal thioester reacts with a second peptide that has a cysteine residue at its N-terminus, to give a product with a peptide bond at the ligation site. Multiple unprotected peptide segments can be linked in this way to give the full-length polypeptide chain, which is then folded to give the target protein molecule. Once the chemical synthesis of an L-protein has been achieved, the D-protein enantiomer can be made by the same route using synthetic peptide building blocks made from D-amino acids and glycine. Long polypeptide chains are prepared most effectively by convergent synthesis using peptide hydrazides. The hydrazide is stable under native chemical ligation reaction conditions and can be converted in situ to a reactive peptide thioester for the next native chemical ligation reaction.
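The ligation rule and the relationship between the two enantiomers can be sketched as a toy model in Python. This is only an illustration of the bookkeeping, not a chemical simulation: the one-letter string representation, the upper-case/lower-case convention for L- and D-residues, and the function names ligate, assemble and mirror_image are all assumptions made for this example.

```python
# Toy model (illustrative convention, not from the source):
# a peptide is a string of one-letter residue codes, upper case for
# L-amino acids and lower case for D-amino acids. Glycine is achiral
# and is always written as "G".

def ligate(thioester_segment, cys_segment):
    """Join two segments; the second must start with an N-terminal cysteine."""
    if not cys_segment or cys_segment[0] not in ("C", "c"):
        raise ValueError("second segment needs an N-terminal cysteine")
    # The product has an ordinary peptide bond at the ligation site,
    # so in this model it is simply the concatenated sequence.
    return thioester_segment + cys_segment


def assemble(segments):
    """Link several segments one after another into the full-length chain."""
    chain = segments[0]
    for segment in segments[1:]:
        chain = ligate(chain, segment)
    return chain


def mirror_image(sequence):
    """Swap L- and D-residues; glycine is left unchanged."""
    return "".join(r if r.upper() == "G" else r.swapcase() for r in sequence)


l_segments = ["MKG", "CAL", "CTW"]          # hypothetical segments
d_segments = [mirror_image(s) for s in l_segments]

l_chain = assemble(l_segments)
d_chain = assemble(d_segments)
print(l_chain, d_chain)
```

In this model the D-chain assembled from mirror-image segments is the same as the mirror image of the assembled L-chain, which mirrors the statement that the D-protein is made by the same route from D-amino acid building blocks.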

Theory
There are just 230 different ways of arranging objects in regular three-dimensional arrays. In molecular crystallography, these arrangements are called 'space groups'. However, only 65 of these arrangements are accessible to chiral objects or chiral molecules. The remaining 165 space groups contain symmetry operations that invert handedness, such as a center of symmetry or a mirror plane, and are thus not accessible to natural globular proteins, which are chiral molecules. Wukovitz and Yeates developed a mathematical theory to explain the preference of globular proteins to crystallize in certain space groups. They suggested that the preferred space group is determined by the number of degrees of freedom (D), or dimensionality, which measures the ease with which a given symmetry can be formed. They analyzed the number of degrees of freedom for both chiral and achiral space groups and found that the space group P1(bar), with D=8, is theoretically the most dominant space group. Since this achiral space group has more degrees of freedom than the chiral space groups, they predicted that racemic mixtures of protein enantiomers would crystallize more readily than the natural L-proteins alone, by forming achiral {L-protein plus D-protein} pairs. While space group P1(bar) is the most preferred, P21/c and C2/c are also highly preferred, whereas the other achiral space groups are expected to appear less frequently. Hence, P1(bar), P21/c, and C2/c are considered the common centrosymmetric space groups for racemic mixtures.
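Why a single enantiomer cannot occupy a space group containing a center of symmetry or a mirror plane can be illustrated with a short Python sketch. It assumes a "molecule" made of four points whose handedness is taken as the sign of the signed volume of the tetrahedron they span; the point coordinates and the helper names det3, apply and handedness are hypothetical choices for this example. Proper rotations (determinant +1) keep the handedness, while inversion and mirror operations (determinant -1) turn the object into its mirror image, so such operations can only relate an L-molecule to a D-molecule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three row tuples."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def apply(m, p):
    """Apply the 3x3 matrix m to the point p."""
    return tuple(sum(m[r][k] * p[k] for k in range(3)) for r in range(3))


def handedness(points):
    """Return +1 or -1: sign of the signed volume spanned by four points."""
    p0, p1, p2, p3 = points
    edges = [tuple(q[k] - p0[k] for k in range(3)) for q in (p1, p2, p3)]
    volume = det3(edges)
    if volume == 0:
        raise ValueError("points are coplanar, handedness is undefined")
    return 1 if volume > 0 else -1


TWOFOLD_Z = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))   # proper rotation, det = +1
INVERSION = ((-1, 0, 0), (0, -1, 0), (0, 0, -1))  # center of symmetry, det = -1
MIRROR_XY = ((1, 0, 0), (0, 1, 0), (0, 0, -1))    # mirror plane, det = -1

# Hypothetical chiral "molecule": four non-coplanar points.
molecule = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]

for name, op in [("twofold", TWOFOLD_Z), ("inversion", INVERSION), ("mirror", MIRROR_XY)]:
    image = [apply(op, p) for p in molecule]
    print(name, det3(op), handedness(molecule), handedness(image))
```

Only translational parts of the symmetry operations are ignored here; they shift the molecule but do not affect its handedness.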

History
In 1989, Alan Mackay suggested that if chemical synthesis could be used to make L-protein and D-protein enantiomers, it would enable the use of racemic mixtures to crystallize proteins in centrosymmetric space groups. He pointed out that in the X-ray diffraction data obtained from a centrosymmetric crystal the imaginary parts of the structure factors cancel, so that each phase is restricted to one of two values that differ by 180 degrees (0 or 180 degrees). This would facilitate solving the phase problem in protein structure determination by X-ray crystallography.
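The phase restriction in a centrosymmetric crystal can be checked numerically. The following Python sketch is a minimal illustration under stated assumptions: atoms are given as fractional coordinates with real, constant scattering factors (no anomalous scattering and no resolution dependence), the inversion center is placed at the origin of the unit cell, and the function names structure_factor, random_molecule and add_inverted_copy are invented for this example. For every atom at position r the racemic crystal also contains an atom at -r, so the sine terms of the structure factor cancel in pairs and only a real number remains.

```python
import cmath
import math
import random


def structure_factor(atoms, hkl):
    """F(hkl) = sum over atoms of f * exp(2*pi*i*(h*x + k*y + l*z))."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


def random_molecule(n_atoms, seed=0):
    """Hypothetical 'L-molecule': random fractional coordinates and weights."""
    rng = random.Random(seed)
    return [
        (rng.uniform(1.0, 8.0), (rng.random(), rng.random(), rng.random()))
        for _ in range(n_atoms)
    ]


def add_inverted_copy(atoms):
    """Add the mirror-image molecule related by an inversion center at the origin."""
    return atoms + [(f, (-x, -y, -z)) for f, (x, y, z) in atoms]


l_only = random_molecule(20)
racemate = add_inverted_copy(l_only)

for hkl in [(1, 0, 0), (2, 1, 3), (0, 4, -1)]:
    f_chiral = structure_factor(l_only, hkl)
    f_racemic = structure_factor(racemate, hkl)
    # For the racemate the imaginary part vanishes up to rounding error,
    # so the phase can only be 0 or 180 degrees.
    print(hkl, math.degrees(cmath.phase(f_chiral)), f_racemic.real, f_racemic.imag)
```

For the single enantiomer the phase can take any value between -180 and 180 degrees, whereas for the racemate only the sign of the real structure factor remains to be determined.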

In 1993, Laura Zawadzke and Jeremy Berg synthesized the small protein rubredoxin (45 amino acids) in racemic form. Their aim was to make structure determination easier and more robust by using diffraction data from a centrosymmetric crystal, which can only be grown from a racemic mixture. Because the racemic protein pairs form a center of symmetry, the phasing step in the analysis of the diffraction data is simplified. As mentioned above, in 1995 Stephanie Wukovitz and Todd Yeates developed a mathematical theory to explain why protein molecules tend to crystallize more frequently in certain space groups than in others; they predicted that the most favored space group for a racemic protein mixture would be P1(bar), and that globular proteins would crystallize more easily as racemates.

Notable applications
With the development of native chemical ligation in 1994, total chemical synthesis of pairs of D-protein and L-protein enantiomers became feasible. In the first practical application to solving an unknown structure, racemic and quasi-racemic X-ray crystallography were used to determine the structure of snow flea antifreeze protein. In the course of that work it was observed that racemic and even quasi-racemic protein mixtures dramatically facilitated the formation of diffraction-quality, centrosymmetric crystals. Quasi-racemates are formed by pairs of protein molecules that are not true enantiomers but are sufficiently close to being mirror images of each other to form ordered pseudo-centrosymmetric arrays.

Subsequently, pairs of racemic and quasi-racemic protein molecules prepared by total chemical synthesis have been shown to dramatically increase the rate of success in forming diffraction-quality crystals from a wide range of globular protein molecules.

Rv1738, a protein of Mycobacterium tuberculosis, is the most up-regulated gene product when M. tuberculosis enters persistent dormancy. Preparations of recombinantly expressed Rv1738 L-protein resisted extensive attempts to form crystals. A racemic mixture of the chemically synthesized D-protein and L-protein forms of Rv1738 gave crystals in the centrosymmetric space group C2/c. The structure, containing L-protein and D-protein dimers, revealed structural similarity to 'hibernation-promoting factors' that can bind to ribosomes and suppress translation.

Ubiquitin has also been crystallized successfully by racemic crystallography. Crystallization of either D-ubiquitin or L-ubiquitin alone is difficult, whereas a racemic mixture of D-ubiquitin and L-ubiquitin crystallized readily: diffraction-quality crystals were obtained overnight in almost half of the conditions tested in a standard commercial crystallization screen.

Crystallization of racemates of disulfide-containing microprotein molecules was used to determine the structures of the trypsin inhibitor SFTI-1 (14 amino acids, 1 disulfide), the conotoxin cVc1.1 (22 amino acids, 2 disulfides) and the cyclotide kB1 (29 amino acids, 3 disulfides). X-ray diffraction showed that the racemates crystallized in the centrosymmetric space groups P3(bar), Pbca and P1(bar).

Achiral 'peptoid' chains were also found to fold as racemic pairs and to crystallize in highly preferred centrosymmetric space groups.

Racemic crystallography was also used to obtain a high-resolution crystal structure of a heterochiral complex between a D-protein and vascular endothelial growth factor A (VEGF-A). The mirror-image D-protein form of VEGF-A was used in phage display to identify a 56-residue L-protein binder with nanomolar affinity; the chemically synthesized D-protein binder had the same affinity for the L-protein form of VEGF-A. A mixture of chemically synthesized proteins consisting of D-VEGF-A, L-VEGF-A, and two equivalents each of the D-protein binder and the L-protein binder gave racemic crystals in the centrosymmetric space group P21/n. The structure of this 71 kDa heterochiral protein complex was solved at a resolution of 1.6 Å.