Infologs

Infologs are independently designed synthetic genes derived from one or a few genes where substitutions are systematically incorporated to maximize information. Infologs are designed for perfect diversity distribution to maximize search efficiency.

Typical protein engineering methods rely on screening a high number (106-1012 or more) of gene variants to identify individuals with improved activity using a surrogate high throughput screen (HTP) to identify initial hits. Unfortunately, results are defined by what is screened for, thus the “hit” from the HTP screen often has very little real activity in a lower throughput assay more indicative of the improved functionality for which the protein is being developed. By adapting the standard algorithms for engineering complex systems to work with biological systems, the resulting process enables researchers to deconvolute how substitutions within a protein sequence modify its function. Combining these algorithms with an integrated query and ranking mechanism allows the identification of appropriate sequence substitutions. Infologs refers to the set of designed genes, singular use Infolog describes an individual variant.



Ancestry
Homology between protein or DNA sequences is defined in terms of shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs).

Homologs are similar genes and/or proteins which are related by ancestry.

Orthologs are the 'same' gene, but from different organisms. Homologous sequences are orthologous if they were separated by a speciation event: when a species diverges into two separate species, the copies of a single gene in the two resulting species are said to be orthologous. Orthologs, or orthologous genes, are genes in different species that originated by vertical descent from a single gene of the last common ancestor. The term "ortholog" was coined in 1970 by Walter Fitch. Paralogs are related genes originating from one gene that through duplication ended up as two genes that over time has evolved for two separate functions (or, according to a recent Science paper, a promiscuous starting gene that duplicated and each copy evolved towards different functions). Paralogs typically have the same or similar function, but sometimes do not: due to lack of the original selective pressure upon one copy of the duplicated gene, this copy is free to mutate and acquire new functions. Paralogs usually occur from within the same species. Xenologs are homologs resulting from horizontal gene transfer between two organisms. Xenologs can have different functions, if the new environment is vastly different for the horizontally moving gene. In general, though, xenologs typically have similar function in both organisms.

Infologs are similar genes and/or proteins which are related by synthetic ancestry to approach perfect diversity distribution.

Features

 * Optimize directly for function in the final application
 * Does not require high-throughput (HTP) screens
 * Screen small numbers of variants (50-200) directly for the desired function
 * Decreased false positives: variants identified by HTP screens that do not retain activity in 'real' assay
 * Decreased loss of potential positive hits due to screening error or poor correlation between HTP screen and 'real' assay
 * No biodiversity collections required, everything is synthesized as needed
 * Sequence-function relationships provide the basis for strong composition-of-matter patent claims.

Case study
Transforming Protein engineering with Infologs:

Using independently designed synthetic genes where substitutions are systematically incorporated (Infologs) leads to uniform sampling, systematic variance and unrestricted information rich results. Wheat Glutathione S-transferases (GST) with the ability to detoxify a panel of common herbicides was designed using this patented bioengineering method. The relative functional contribution of 60 amino acid substitutions against 14 herbicides was quantified using only 96 Infologs and dramatically improved by a small set (16) of 2nd generation Infologs. In addition, highly predictable GST sequence-function models against two commercially relevant herbicides were created with quantification of relative functional contribution of 60 amino acid substitutions in two dimensions.

Rational design of proteins
In rational protein design, the scientist uses detailed knowledge of the structure and function of the protein to make desired changes. This generally has the advantage of being technically easy and inexpensive, since site-directed mutagenesis techniques are well-developed. However, its major drawback is that detailed structural knowledge of a protein is often unavailable, and even when it is available, it can be extremely difficult to predict the effects of various mutations.

Computational protein design algorithms seek to identify novel amino acid sequences that are low in energy when folded to the pre-specified target structure. While the sequence-conformation space that needs to be searched is large, the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones.