User:Dcbennett2/sandbox/eng

Genome engineering refers to the strategies and techniques developed in recent years for the targeted, specific modification of the genetic information – or genome – of living organisms.

Genome engineering is a very active field of research because of its wide range of possible applications. In human health, these include the correction of a gene carrying a harmful mutation, the production of therapeutic proteins, and the elimination of persistent viral sequences. In agricultural biotechnology, they include the development of new generations of genetically modified plants. Genome engineering also supports the development of research tools, for example to explore the function of a gene.

Early technologies developed to insert a gene into a living cell, such as transgenesis, are limited by the random nature of the insertion of the new sequence into the genome. The new gene is positioned blindly and may inactivate or disturb the functioning of other genes, or even cause severe unwanted effects such as oncogenic transformation. Furthermore, these technologies offer little reproducibility, as there is no guarantee that the new sequence will be inserted at the same place in two different cells.

The major advantage of genome engineering, which draws on more recent knowledge and technology, is that it enables a specific site of the DNA to be modified. This increases the precision of the correction or insertion, reduces cell toxicity, and offers greatly improved reproducibility. Genome engineering and synthetic genomics (designing artificial genomes) are currently among the most promising technologies in terms of applied biological research and industrial innovation.

General principles
Early approaches to genome engineering involved modifying genetic sequences using only homologous recombination. This natural DNA maintenance mechanism repairs a damaged DNA strand by using a homologous sequence located on another strand as a template. It is possible to induce homologous recombination between a cellular DNA strand and an exogenous DNA strand (inserted in the cell by researchers), using a vector such as the modified genome of a retrovirus. The recombination process is flexible enough to allow a certain level of change (i.e., the addition, deletion or modification of a portion of DNA) to be introduced into the targeted homologous region.

In the 1980s, Mario R. Capecchi and Oliver Smithies developed homologous recombination of DNA as a “gene targeting” tool for the inactivation or modification of specific genes. Working with Martin J. Evans, they developed a process for modifying the mouse genome by altering the DNA of mouse embryonic stem cells in culture and injecting these modified stem cells into mouse embryos. Genetically modified mice generated using this method make useful laboratory models for studying human diseases, and the tool is now commonly used in medical research. The three researchers were awarded the 2007 Nobel Prize in Physiology or Medicine for their work.

Modifying genomes using only homologous recombination remained a long and random process until additional developments increased the rate of homologous recombination in somatic cell types. These developments include two mechanistically distinct methods of triggering the cell's inherent DNA repair mechanisms, which are required to insert a foreign gene sequence into a live cell. The first method uses site-directed endonucleases (restriction enzymes), which include specific technologies such as zinc finger nucleases (ZFNs) and meganucleases. Site-directed endonucleases achieve gene modification by causing double-stranded DNA (dsDNA) breaks, which trigger the cell's natural DNA repair mechanisms: predominantly non-homologous end joining (NHEJ), together with a low frequency of homologous recombination (HR). The second method is recombinant adeno-associated virus (rAAV) mediated genome engineering, which induces high frequencies of homologous recombination alone, thus removing the need for dsDNA breaks.

Three primary approaches are used in genome engineering:


 * Insertion involves introducing a gene into a chromosome to obtain a new function (for example to obtain a better drought-resistant plant) or to compensate for a defective gene, particularly by making it possible to manufacture a functional protein if the protein produced by the patient is defective (such as factor VIII in hemophilia A).
 * Inactivation, or “knock-out”, is today mainly used in fundamental research to shed light on the function of a gene by observing the anomalies that occur as a result of its inactivation. It can also have other applications, for example to remove a persistent viral sequence from infected cells, or in agriculture to eliminate the irritant or allergenic properties of a plant.
 * Correction aims to remove and replace a defective gene sequence with a functional sequence. This correction can be performed on a very short sequence, sometimes just a few nucleotides, such as in the case of drepanocytosis (sickle cell anemia). In plants, this manipulation can also help improve the properties of a species without the addition of foreign DNA.

Multiplex Automated Genomic Engineering (MAGE)
Until recently, the methods available for studying genomic diversity and its associated phenotypes were slow, expensive, and inefficient. Researchers had to manipulate individual genes and tweak the genome one small section at a time, observe the phenotype, and start the process over with a different single-gene manipulation. Researchers at the Wyss Institute at Harvard University designed MAGE, a powerful technology that improves the process of in vivo genome editing. This technique allows quick and efficient manipulation of a genome, all in a machine small enough to fit on top of a small kitchen table. The mutations it introduces combine with the variation that naturally occurs during cell division, creating billions of cellular variants.

Chemically combined, synthetic single-stranded DNA (ssDNA) and a pool of oligonucleotides are introduced at targeted areas of the cell, creating genetic modifications. The cyclical process involves transformation of bacterial cells with ssDNA (by electroporation) followed by outgrowth, during which bacteriophage homologous recombination proteins mediate the annealing of ssDNAs to their genomic targets. Cells edited at selectable phenotypic markers can be screened and identified by plating on differential media. Each cycle takes approximately 2.5 hours, with additional time required to grow isogenic cultures and characterize mutations. By iteratively introducing libraries of mutagenic ssDNAs targeting multiple sites, MAGE can generate combinatorial genetic diversity in a cell population. The entire process can generate up to 50 genome edits simultaneously, ranging from single base pairs to whole gene networks or genomes, with results in a matter of days.
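The cyclical nature of MAGE can be illustrated with a small calculation. The sketch below is not taken from the MAGE literature: it assumes that each cycle edits a given site in a fixed fraction of the still-unedited cells, that sites are edited independently, and that an edit, once made, is kept. The per-cycle efficiency is a hypothetical parameter, and all function names are illustrative; only the 2.5-hour cycle time comes from the text.

```python
CYCLE_HOURS = 2.5  # approximate cycle time quoted in the text


def cycles_in(hours: float) -> int:
    """Number of complete MAGE cycles that fit into the given time."""
    return int(hours // CYCLE_HOURS)


def fraction_edited(per_cycle_efficiency: float, cycles: int) -> float:
    """Fraction of cells carrying the edit at ONE site after `cycles` rounds.

    Assumption (hypothetical model): every cycle converts the same
    fraction of the cells that are still unedited at that site.
    """
    return 1.0 - (1.0 - per_cycle_efficiency) ** cycles


def possible_genotypes(sites: int, variants_per_site: int) -> int:
    """Upper bound on distinct genotypes in the population.

    Each site is either left unedited or carries one of the variants,
    so there are (variants_per_site + 1) choices per site.
    """
    return (variants_per_site + 1) ** sites


if __name__ == "__main__":
    n = cycles_in(24)  # cycles that fit into one day
    print("cycles per day:", n)
    # 0.1 is an arbitrary illustrative efficiency, not a measured value.
    print("fraction edited at one site:", fraction_edited(0.1, n))
    print("genotypes for 10 sites, 1 variant each:", possible_genotypes(10, 1))
```

The last function shows why iterating over many sites produces combinatorial diversity: the number of possible genotypes grows exponentially with the number of targeted sites.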

MAGE experiments can be divided into three classes, characterized by varying degrees of scale and complexity: (i) many target sites, single genetic mutations; (ii) single target site, many genetic mutations; and (iii) many target sites, many genetic mutations. An example of class three was reported in 2009, when Church and colleagues programmed Escherichia coli to produce five times the normal amount of lycopene, an antioxidant normally found in tomatoes and linked to anti-cancer properties. They applied MAGE to optimize the 1-deoxy-D-xylulose-5-phosphate (DXP) metabolic pathway in Escherichia coli to overproduce the isoprenoid lycopene. It took them about 3 days and just over $1,000 in materials. The ease, speed, and cost efficiency with which MAGE can alter genomes could transform how the bioengineering, bioenergy, biomedical engineering, synthetic biology, pharmaceutical, agricultural, and chemical industries approach the manufacturing and production of important compounds.

Transfection by causing dsDNA breaks
Researchers wishing to efficiently eliminate a gene to study the resulting loss of function are increasingly opting for a “molecular scissors” approach. These techniques take advantage of natural or modified enzymes that can cut double-stranded DNA at the specific sequence to be modified, thereby triggering the NHEJ and HR processes at the required location.

The restriction enzymes that are commonly used in molecular biology to cut DNA typically recognize sequences of 4 to 8 nucleotides. These sequences, which are very short and generally palindromic, often occur at many sites in a genome (the diploid human genome comprises about 6.4 billion base pairs). Traditional restriction enzymes are therefore likely to cut a given DNA molecule several times. In their efforts to find a "genome surgery" approach offering a higher degree of accuracy and safety, scientists therefore turned to more precise tools.

More targeted genome engineering can be performed by using enzymes that are able to recognize and interact with DNA sequences that are sufficiently long so as to occur only once, with high probability, in any given genome. The DNA modification therefore takes place precisely at the site of the target sequence. With recognition sites of over 12 base pairs, meganucleases and zinc finger nucleases offer this degree of precision.
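The link between recognition-site length and uniqueness can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: it assumes a random genome in which the four bases are equally frequent and independent, and it considers a single strand. Real genomes are repetitive and base-biased, so actual counts will differ. The function names are hypothetical, and the genome size is the figure quoted above.

```python
GENOME_SIZE = 6_400_000_000  # bases, figure quoted in the text


def expected_occurrences(site_length: int, genome_size: int) -> float:
    """Expected number of exact matches of one given site.

    Assumption: each position matches a given base with probability 1/4,
    independently of its neighbours, so a site of length n matches at any
    start position with probability (1/4) ** n.
    """
    return genome_size / 4 ** site_length


def min_unique_length(genome_size: int) -> int:
    """Shortest site length whose expected number of matches is below 1."""
    n = 1
    while expected_occurrences(n, genome_size) >= 1:
        n += 1
    return n


if __name__ == "__main__":
    for n in (4, 6, 8, 12, 18, 24):
        print(f"{n:2d} bp site: ~{expected_occurrences(n, GENOME_SIZE):.3g} expected matches")
    print("shortest length expected to be unique:", min_unique_length(GENOME_SIZE))
```

Under this simplified model, a 6 bp site is expected more than a million times in a genome of this size, whereas a site becomes unique on average only at around 17 base pairs, which is why long recognition sites are needed for targeted cutting.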

Once the DNA has been cut, natural DNA repair mechanisms and homologous recombination enable the incorporation of a modified sequence or a new gene.

The success of these different stages (recognition, cleavage and recombination) depends on various factors, including the efficacy of the vector that introduces the enzyme into the cell, the enzyme cleavage activity, the cell’s capacity for homologous recombination and probably the state of the chromatin at the given locus.

Meganuclease-based engineering
Meganucleases, discovered in the late 1980s, are enzymes in the endonuclease family that are characterized by their ability to recognize and cut long DNA sequences (from 12 to 40 base pairs). The most widespread and best-known meganucleases are the proteins in the LAGLIDADG family, which owe their name to a conserved amino acid sequence.

These enzymes were identified in the 1990s as promising tools for genome engineering. However, even though they occur in nature and each one exhibits slight variations in its DNA recognition site, there is virtually no chance of finding the exact meganuclease required to act on a specific DNA sequence. Each new genome engineering target therefore requires an initial protein engineering stage to produce a custom meganuclease.

Two methods exist for creating custom meganucleases:
 * Mutagenesis involves generating collections of variants from a meganuclease with properties similar to those of the desired enzyme, then selecting among these variants using high-throughput screening. This procedure can be optimized by adopting what are known as “semi-rational” methods, in which structural data are analysed computationally in order to focus the mutagenesis on the part of the enzyme that interacts with DNA and triggers the cleavage.
 * Combinatorial assembly is a method whereby protein subunits from different enzymes can be associated or fused.

These two approaches can be combined. Scientists from the biotechnology company Cellectis have identified the areas responsible for DNA cleavage and the areas that interact with specific DNA sites in the structure of several meganucleases. By modifying these recognition sites, they have been able to generate variants that interact with DNA sequences that differ from those of the initial meganucleases while retaining their ability to cut DNA and their high degree of specificity.

A large bank containing several tens of thousands of protein units has been created. These units can be combined to obtain chimeric meganucleases that recognize the target site, thereby providing research and development tools that meet a wide range of needs (fundamental research, health, agriculture, industry, energy, etc.).

This technique has enabled the development of several meganucleases specific for sequences in the genomes of viruses, plants, etc., and the industrial-scale production of two meganucleases able to cleave the human XPC gene; mutations in this gene result in Xeroderma pigmentosum, a severe monogenic disorder that predisposes the patients to skin cancer and burns whenever their skin is exposed to UV rays.

Another approach involves using computer models to try to predict as accurately as possible the activity of the modified meganucleases and the specificity of the recognized nucleic sequence. The Northwest Genome Engineering Consortium, a US consortium funded by the National Institutes of Health, has adopted this approach with the aim of treating leukemia by modifying hematopoietic stem cells. The model’s prediction has been verified and guided by means of directed mutagenesis and in vitro biochemical analysis.

A third approach has been taken by the American biotechnology company Precision Biosciences, Inc. The company, funded by the National Institutes of Health and the National Institute of Standards and Technology, has developed a fully rational design process called the Directed Nuclease Editor (DNE) which is capable of creating highly specific engineered meganucleases that successfully target and modify a user-defined location in a genome.

Zinc finger nuclease-based engineering
Zinc finger motifs occur in several transcription factors. The zinc ion, found in 8% of all human proteins, plays an important role in the organization of their three-dimensional structure. In transcription factors, it is most often located at the protein-DNA interaction sites, where it stabilizes the motif. The C-terminal part of each finger is responsible for the specific recognition of the DNA sequence.

The recognized sequences are short, made up of around 3 base pairs, but by combining 6 to 8 zinc fingers whose recognition sites have been characterized, it is possible to obtain specific proteins for sequences of around 20 base pairs. It is therefore possible to control the expression of a specific gene. It has been demonstrated that this strategy can be used to promote a process of angiogenesis in animals. It is also possible to fuse a protein constructed in this way with the catalytic domain of an endonuclease in order to induce a targeted DNA break, and therefore to use these proteins as genome engineering tools.

The method generally adopted for this involves fusing each of two proteins, each containing 3 to 6 specifically chosen zinc fingers, to the catalytic domain of the FokI endonuclease. The two proteins recognize two DNA sequences that are a few nucleotides apart. When the two zinc finger proteins bind to their respective sequences, the two FokI domains attached to them are brought close together. This allows them to dimerize and then cut the DNA molecule.
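The geometry of a zinc finger nuclease pair can be sketched in code. The example below is a simplified illustration, not a design tool: it assumes that each finger reads exactly 3 base pairs, that both proteins carry the same number of fingers, and that the second protein binds the opposite strand so that the two FokI domains face each other across the spacer. The function names and the example sequence are hypothetical, and the spacer length is a free parameter because the text only says "a few nucleotides".

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def zfn_half_sites(target: str, fingers: int, spacer: int):
    """Split a top-strand target into (left site, spacer, right site).

    Assumptions: each finger reads 3 bp, both proteins have `fingers`
    fingers, and the right-hand protein binds the opposite strand, so its
    site is reported as the reverse complement of the top strand.
    """
    half = 3 * fingers
    if len(target) != 2 * half + spacer:
        raise ValueError("target length must equal 2 * 3 * fingers + spacer")
    left = target[:half]
    gap = target[half:half + spacer]
    right = reverse_complement(target[half + spacer:])
    return left, gap, right


def triplets(site: str):
    """Split a binding site into the 3 bp units read by individual fingers."""
    return [site[i:i + 3] for i in range(0, len(site), 3)]


if __name__ == "__main__":
    # Made-up 23 bp target: 9 bp + 5 bp spacer + 9 bp (3 fingers per protein).
    left, gap, right = zfn_half_sites("AAACCCGGGTTTTTACGTACGTA", fingers=3, spacer=5)
    print("left protein reads :", triplets(left))
    print("spacer (cut region):", gap)
    print("right protein reads:", triplets(right))
```

With 3 to 6 fingers per protein, this layout gives half-sites of 9 to 18 base pairs and a composite recognition site of 18 to 36 base pairs, not counting the spacer.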

Several approaches are used to design specific zinc finger nucleases for the chosen sequences. The most widespread involves combining zinc finger units with known specificities (modular assembly). Various selection techniques, using bacteria, yeast or mammalian cells, have been developed to identify the combinations that offer the best specificity and the best cell tolerance. Although the direct genome-wide characterization of zinc finger nuclease activity has not been reported, an assay that measures the total number of double-strand DNA breaks in cells found that only one to two such breaks occur above background in cells treated with zinc finger nucleases with a 24 bp composite recognition site and obligate heterodimer FokI nuclease domains.

Zinc finger nucleases are research and development tools that have already been used to modify a range of genomes, in particular by the laboratories in the Zinc Finger Consortium. The US company Sangamo BioSciences uses zinc finger nucleases to carry out research into the genetic engineering of stem cells and the modification of immune cells for therapeutic purposes. Modified T lymphocytes are currently undergoing phase I clinical trials to treat a type of brain tumor (glioblastoma) and in the fight against AIDS.

TALEN
Transcription activator-like effector nucleases (TALENs) are artificial restriction enzymes generated by fusing a specific DNA-binding domain to a non-specific DNA-cleaving domain. The DNA-binding domains come from TAL effectors, DNA-binding proteins that are secreted by certain bacteria that infect plants. TAL effectors contain multiple repeats of a well-studied modular peptide domain, within which a variable pair of residues (known as the repeat variable diresidue) determines DNA-binding specificity. By modifying the amino acid sequence of these DNA-binding domains, TAL effectors can be engineered to bind any desired DNA sequence (including methylated sequences) and refined to prevent cleavage at undesired sites with significant sequence similarity. TALEN technology can be used in a similar way to designed zinc finger nucleases but has greater precision.
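The modular nature of TAL effector repeats can be shown with a short sketch. The base-to-diresidue mapping used here (NI for A, HD for C, NN for G, NG for T) is the commonly cited repeat variable diresidue code; it is not stated in this article, and it is a simplification, since NN, for instance, is also reported to bind A. The function name is hypothetical, and the sketch ignores practical design constraints.

```python
# Commonly cited repeat variable diresidue (RVD) for each target base.
# Simplified: real RVDs are not perfectly specific (assumption noted above).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvd_array(target: str) -> list:
    """Return one RVD per base of the target, in 5'->3' order.

    Each TAL effector repeat contacts one base, so a target of n bases
    needs an array of n repeats.
    """
    rvds = []
    for base in target.upper():
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        rvds.append(RVD_FOR_BASE[base])
    return rvds


if __name__ == "__main__":
    # Made-up target sequence, for illustration only.
    print(" ".join(rvd_array("TACGGA")))
```

Because each repeat reads a single base, retargeting a TAL effector amounts to choosing a new sequence of repeats, in contrast to zinc fingers, where each unit reads about three bases.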

TALEN technology has shown great promise in gene editing and genome engineering. For example, it has been used to generate targeted mutations in staple food crops to prolong storage or to improve the nutrition of the resulting plants. Recent applications include the generation of engineered T-cells with chimeric antigen receptors targeting antigens of interest. This approach can be used to make T-cells that are both effective in killing tumor cells and resistant to lymphodepleting drug regimens.

CRISPRs
CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats) are genetic elements that bacteria use as a kind of acquired immunity to protect against viruses. They consist of short sequences that originate from viral genomes and have been incorporated into the bacterial genome. Cas (CRISPR associated) proteins process these sequences and cut matching viral DNA sequences. By introducing plasmids containing Cas genes and specifically constructed CRISPRs into eukaryotic cells, the eukaryotic genome can be cut at any desired position.

rAAV-based engineering
rAAV-mediated genome engineering builds on Capecchi and Smithies’ Nobel Prize–winning discovery that homologous recombination, a natural DNA repair mechanism, can be harnessed to perform precise genome alterations in mice. rAAV improves the efficiency of this approach to permit gene editing in any pre-established and differentiated human cell line; such cell lines, in contrast to mouse ES cells, have low rates of homologous recombination. Genome editing in human and other mammalian somatic cell types using homologous recombination is now achievable using recombinant adeno-associated virus (rAAV) vectors. These single-stranded DNA viral vectors have high transduction rates and the unique property of stimulating endogenous HR without causing double-strand DNA breaks in the genome, which are typical of site-directed endonuclease-mediated genome editing methods.

Users can design an rAAV vector to any target genomic locus and perform either gross or subtle endogenous gene alterations in mammalian somatic cell types. These include gene knock-outs for functional genomics, or the ‘knock-in’ of protein tag insertions to track translocation events at physiological levels in live cells. Most importantly, rAAV targets a single allele at a time and does not result in any off-target genomic alterations. Because of this, it can routinely and accurately model genetic diseases caused by subtle SNPs or point mutations, which are increasingly the targets of novel drug discovery programs.

The use of rAAV-based engineering has been documented in over 400 peer-reviewed publications. Researchers have employed rAAV-based genome editing to engineer human cell lines for use as disease models. These isogenic human disease models are precisely matched pairs of cell lines: one harbours a cancer-associated mutation in an endogenous gene, just as it occurs in real patients, while the other is a genetically identical cell line carrying a normal version of that gene. These isogenic disease models provide a definitive means to understand disease biology.