TALE-likes

Transcription Activator-Like Effector-Likes (TALE-likes) are a group of bacterial DNA binding proteins named for the first and still best-studied group, the TALEs of Xanthomonas bacteria. TALEs are important factors in the plant diseases caused by Xanthomonas bacteria, but are known primarily for their role in biotechnology as programmable DNA binding proteins, particularly in the context of TALE nucleases. TALE-likes have additionally been found in many strains of the Ralstonia solanacearum bacterial species complex, in Paraburkholderia rhizoxinica strain HKI 454, and in two unknown marine bacteria. Whether or not all these proteins form a single phylogenetic grouping is as yet unclear.

The unifying feature of the TALE-likes is their tandem arrays of DNA binding repeats. These repeats are, with few exceptions, 33-35 amino acids in length and are composed of two alpha-helices on either side of a flexible loop that contains the DNA base binding residues. Neighbouring repeats are joined by flexible linker loops. Evidence for this common structure comes in part from solved crystal structures of TALEs and a Burkholderia TALE-like (BAT), but also from the conservation of the code that all TALE-likes use to recognise DNA sequences. In fact, TALE, RipTAL, and BAT repeats can be mixed and matched to generate functional DNA-binding proteins with varying affinity.

TALEs
TALEs are the first identified, best-studied and largest group within the TALE-likes. TALEs are found throughout the bacterial genus Xanthomonas, which comprises mostly plant pathogens. All TALEs studied so far have been shown to be secreted into host plant cells by the Type III secretion system. Once inside the host cell they translocate to the nucleus, bind specific DNA sequences within host promoters and turn on downstream genes. Every part of this process is thought to be conserved across all TALEs. The single meaningful difference between individual TALEs, based on current understanding, is the specific DNA sequence that each TALE binds. TALEs from even closely related strains differ in the composition of the repeats that make up their DNA binding domain, and repeat composition determines DNA binding preference. In particular, position 13 of each repeat confers that repeat's DNA base preference. During early research it was noted that almost all the differences between repeats of a single TALE repeat array are found in positions 12 and 13, and this finding led to the hypothesis that these residues determine base preference. Repeat positions 12 and 13, referred to jointly as the Repeat Variable Diresidue (RVD), are still commonly said to confer base specificity despite clear evidence that position 13 is the base-determining residue. In addition to the repeat domain, TALEs possess a number of conserved features in the domains flanking the repeats. These include domains for type III secretion, nuclear localization and transcriptional activation. Together these allow TALEs to carry out their biological role as effector proteins secreted into host plant cells to activate expression of specific host genes.
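The relationship between repeat positions 12 and 13 and the bound DNA sequence can be illustrated with a minimal sketch. The RVD-to-base table below lists preferences commonly cited in the wider TALE literature and is an assumption here, since the text above does not enumerate them. The repeat template, the function names and the example array are hypothetical and chosen only for illustration.

```python
# Commonly cited RVD-to-base preferences (assumption: taken from the wider
# TALE literature, not from the text above).
RVD_TO_BASE = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "HG": "T",
    "NN": "R",  # G or A, written with the IUPAC ambiguity code R
    "NK": "G",
}

def extract_rvd(repeat: str) -> str:
    """Return residues 12 and 13 (1-based) of a single repeat sequence."""
    if len(repeat) < 13:
        raise ValueError("repeat too short to contain positions 12 and 13")
    return repeat[11:13]

def predict_target(repeats, zero_base="T"):
    """Predict the target sequence of a repeat array.

    zero_base is the base expected at position zero; "T" reflects the
    T-zero preference described for TALEs. Unknown RVDs map to "N".
    """
    bases = [RVD_TO_BASE.get(extract_rvd(r), "N") for r in repeats]
    return zero_base + "".join(bases)

# Hypothetical 34-residue repeats that differ only at positions 12 and 13.
TEMPLATE = "LTPEQVVAIAS{}GGKQALETVQRLLPVLCQAHG"
example_repeats = [TEMPLATE.format(rvd) for rvd in ("HD", "NI", "NG", "NN")]

print(predict_target(example_repeats))  # TCATR
```

The lookup is keyed on the full RVD for readability, although, as noted above, the evidence points to position 13 as the base-determining residue.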

Diversity and evolution
Whilst the RVD positions are commonly the only variable positions within a single TALE repeat array, there are more differences when comparing repeat arrays of different TALEs. The diversity of TALEs across the Xanthomonas genus is considerable, but a particularly striking finding is that the evolutionary history one arrives at by comparing repeat compositions differs from that found when comparing non-repeat sequences. Repeat arrays of TALEs are thought to evolve rapidly, with a number of recombinatorial processes suggested to shape repeat array evolution. Recombination of TALE repeat arrays has been demonstrated in a forced-selection experiment. This evolutionary dynamism is thought to be made possible by the very high sequence identity of TALE repeats, which is a unique feature of TALEs as opposed to other TALE-likes.
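The observation that variation within a single TALE repeat array is largely confined to the RVD positions can be checked with a short column-wise comparison. The sketch below assumes equal-length, already aligned repeats; the repeat template and the example array are hypothetical, and the helper names are introduced only for illustration.

```python
def variable_positions(repeats):
    """Return the 1-based positions at which aligned repeats differ."""
    if len({len(r) for r in repeats}) != 1:
        raise ValueError("this sketch only handles equal-length repeats")
    return [
        index + 1
        for index, column in enumerate(zip(*repeats))
        if len(set(column)) > 1
    ]

def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues between two equal-length repeats."""
    if len(a) != len(b):
        raise ValueError("repeats must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Hypothetical TALE-style array: identical repeats apart from the RVD.
TEMPLATE = "LTPEQVVAIAS{}GGKQALETVQRLLPVLCQAHG"
array = [TEMPLATE.format(rvd) for rvd in ("HD", "NI", "NG")]

print(variable_positions(array))                      # [12, 13]
print(round(percent_identity(array[0], array[1]), 1))  # 94.1
```

Applied to repeats from different TALEs, or from other TALE-likes, the same comparison would be expected to report additional variable positions.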

T-zero
Another unique feature of TALEs is a set of four repeat structures at the N-terminal flank of the core repeat array. These structures, termed non-canonical or degenerate repeats, have been shown to be vital for DNA binding, though all but one make no contact with DNA bases and thus make no contribution to sequence preference. The one exception is repeat -1, which confers a fixed T-zero preference on all TALEs. This means that the target sequences of TALEs are always preceded by a thymine base. This is thought to be common to all TALEs, with the possible exception of TalC from Xanthomonas oryzae pv. oryzae strain AXO1947.

RipTALs

Discovery and molecular properties
It was noted in the 2002 publication of the genome of the reference strain Ralstonia solanacearum GMI1000 that its genome encodes a protein similar to Xanthomonas TALEs. Based on the similar domain structure and repeat sequences, it was presumed that this gene and its homologs in other Ralstonia strains would encode proteins with the same molecular properties as TALEs, including sequence-specific DNA binding. In 2013 this was confirmed by two studies. These genes and the proteins they encode are referred to as RipTALs (Ralstonia injected protein TALE-like), in line with the standard nomenclature of Ralstonia effectors. Whilst the DNA binding code of the core repeats is conserved with TALEs, RipTALs do not share the T-zero preference; instead they have a strict G-zero requirement. In addition, repeats within a single RipTAL repeat array have multiple sequence differences beyond the RVD positions, unlike the near-identical repeats of TALEs.

RipTALs have been found in all four phylotypes of R. solanacearum, which suggests that they are an ancestral feature of this clade. Despite differences in the flanking domains, the sequences that their RVDs target are highly similar.

Biological role
Several lines of evidence support the idea that RipTALs function as effector proteins, promoting bacterial growth or disease by manipulating the expression of plant genes. They are secreted into plant cells by the Type III secretion system, which is the main delivery system for effector proteins. They localize to the cell nucleus and are able to function as sequence-specific transcription factors in plant cells. In addition, a strain lacking its RipTAL was shown to grow more slowly inside eggplant leaf tissue than the wild type. Furthermore, a study of DNA polymorphisms in ripTAL repeat domain sequences and of host plants found a statistically significant association between host plant and repeat domain variant. This is expected if the RipTALs of different strains are adapted to target genes in specific host plants. Despite this, no target genes have been identified for any RipTAL.

Burkholderia TALE-likes

Discovery
The publication of the genome of bacterial strain Paraburkholderia rhizoxinica HKI 454 in 2011 led to the discovery of a set of TALE-like genes that differed considerably in nature from the TALEs and RipTALs. The proteins encoded by these genes were studied for their DNA binding properties by two groups independently and named Bats (Burkholderia TALE-likes) or BurrH. This research showed that the repeat units of the Burkholderia TALE-likes bind DNA with the same code as TALEs, governed by position 13 of each repeat. There are, however, a number of differences.

Biological role
Burkholderia TALE-likes are composed almost entirely of repeats, lacking the large non-repetitive domains found flanking the repeats in TALEs and RipTALs. Those domains are key to the functions of TALEs and RipTALs, allowing them to enter the plant nucleus and turn on gene expression. It is therefore currently unclear what the biological roles of Burkholderia TALE-likes are. What is clear is that they are not effector proteins secreted into plant cells to act as transcription factors, which is the biological role of TALEs and RipTALs. It is not unexpected that their biological roles may differ from those of TALEs and RipTALs, since the lifestyle of the bacterium they derive from is very unlike that of TALE- and RipTAL-bearing bacteria. P. rhizoxinica is an endosymbiont living inside a fungus, Rhizopus microsporus, which is a plant pathogen. The same fungus is also an opportunistic human pathogen in immunocompromised patients, but whereas P. rhizoxinica is necessary for pathogenicity on plant hosts, it is irrelevant to human infection. It is unclear whether the Burkholderia TALE-likes are ever secreted into the fungus, let alone into host plants.

Uses in biotechnology
As noted in the publications on Burkholderia TALE-likes, there may be some advantages to using these proteins, rather than TALEs, as a scaffold for programmable DNA-binding proteins that function as transcription factors or designer nucleases. They have been fused to a FokI nuclease domain, analogous to TALENs. Advantages include a shorter repeat size, a more compact domain structure (no large non-repeat domains), and greater repeat sequence diversity, which enables the use of PCR on the genes encoding them and makes them less vulnerable to recombinatorial repeat loss. In addition, Burkholderia TALE-likes have no T-zero requirement, which relaxes the constraints on DNA target selection. However, few uses of Burkholderia TALE-likes as programmable DNA binding proteins have been published outside of the original characterization publications.
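The effect of the position-zero requirement on target selection can be sketched as a simple scan over a DNA sequence. The rules used here are those stated in this article: T-zero for TALEs, G-zero for RipTALs, and no position-zero requirement for Burkholderia TALE-likes. The function name, the dictionary and the example sequence are hypothetical, and the sketch considers only one strand and ignores every other constraint on site choice.

```python
# Position-zero rule per family, as described in the article text.
ZERO_BASE = {"TALE": "T", "RipTAL": "G", "Bat": None}

def candidate_sites(dna: str, n_repeats: int, family: str):
    """List (start, site) pairs where an n_repeats-long site could be placed.

    start is the 0-based index of the first base bound by the core repeats.
    For families with a position-zero rule, the base immediately upstream
    of the site must match that rule.
    """
    zero = ZERO_BASE[family]
    dna = dna.upper()
    sites = []
    for start in range(len(dna) - n_repeats + 1):
        site = dna[start:start + n_repeats]
        if zero is None:
            sites.append((start, site))
        elif start >= 1 and dna[start - 1] == zero:
            sites.append((start, site))
    return sites

example = "TACGGATTCA"  # hypothetical sequence for illustration
for family in ("TALE", "RipTAL", "Bat"):
    print(family, candidate_sites(example, 4, family))
```

On this short example the unconstrained scaffold admits every window, whereas the T-zero and G-zero rules each leave only the windows preceded by the required base.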

Marine Organism TALE-likes (MOrTLs)

Discovery
In 2007 the results of a metagenomic sweep of the world's oceans by the J. Craig Venter Institute were made publicly available. The 2014 paper on Burkholderia TALE-likes was also the first to report that two entries from that database resembled TALE-likes, based on sequence similarity. These were further characterized and assessed for their DNA-binding potential in 2015. The repeat units encoded by these sequences were found to mediate DNA binding with base preferences matching the TALE code, and were judged likely to form structures nearly identical to Bat1 repeats based on molecular dynamics simulations. The proteins encoded by these DNA sequences were therefore designated Marine Organism TALE-likes (MOrTLs) 1 and 2. Similar sequences have also been found in metagenomes.

Evolutionary relationship to other TALE-likes
Whilst the repeats of MOrTL1 and MOrTL2 both conform structurally and functionally to the TALE-like norm, they differ considerably at the sequence level both from all other TALE-likes and from one another. It is not known whether they are truly homologous to the other TALE-likes and thus constitute, together with the TALEs, RipTALs and Bats, a true protein family. Alternatively, they may have evolved independently. It is particularly difficult to judge the relationship to the other TALE-likes because almost nothing is known of the organisms that MOrTL1 and MOrTL2 come from. It is known only that they were found in two separate sea-water samples from the Gulf of Mexico and are likely to be bacteria, based on size exclusion before DNA sequencing.

Legal status
A patent covering the use of BATs and marine TALE-likes in protein engineering was filed in July 2012. It was pending in all jurisdictions at the time of writing.