Helitron (biology)

Helitrons are one of the three groups of eukaryotic class 2 transposable elements (TEs) described so far. They are eukaryotic rolling-circle transposable elements, hypothesized to transpose by a rolling-circle replication mechanism via a single-stranded DNA intermediate. They were first discovered in plants (Arabidopsis thaliana and Oryza sativa) and in the nematode Caenorhabditis elegans, and have since been identified in a diverse range of species, from protists to mammals. Helitrons make up a substantial fraction of many genomes, in which non-autonomous elements frequently outnumber their putative autonomous partners. Helitrons seem to have a major role in the evolution of host genomes. They frequently capture diverse host genes, some of which can evolve into novel host genes or become essential for Helitron transposition.

History
Helitrons were the first group of TEs to be discovered by computational analysis of whole genome sequences. The first Helitrons described were called Aie, AthE1, Atrep and Basho, which are non-autonomous Helitrons found in the genome of Arabidopsis thaliana, a small flowering plant. Despite these discoveries, the classification of Helitrons remained unknown until 2001, when protein-coding elements predicted to be the autonomous partners were discovered. Kapitonov and Jurka investigated the coding capacity of Helitrons in A. thaliana, Oryza sativa, and Caenorhabditis elegans using in silico studies of the repetitive DNA of these organisms, computational analysis and Monte Carlo simulation. They described the structure and coding potential of canonical Helitrons and proposed the rolling-circle mechanism of transposition, as well as the possibility that some of the encoded genes, captured from the host, are now used for replication. Their survey of these genomes showed that Helitron activity could account for a significant fraction (~2%) of the plant and invertebrate genomes in which they were found, but the extent of their distribution elsewhere was not clear.

In 2003, a group of investigators studied the structure of Helitron-related proteins and their coding domains by searching for Helitron-like elements in vertebrates, specifically the zebrafish Danio rerio and a pufferfish, Sphoeroides nephelus. The Rep/Helicase proteins were predicted to be 500 to 700 amino acids longer because of a C-terminal fusion of a domain with homology to apurinic-apyrimidinic (AP) endonuclease. Previous phylogenetic studies had shown that this AP endonuclease is nested within the Chicken Repeat 1 (CR1) clade of non-long terminal repeat (non-LTR) retrotransposons. This relationship suggested that the AP endonuclease originated from a retrotransposon insertion either near or within a Helitron. These investigators were not able to identify the ends of the Rep/Helicase/Endonuclease unit of Helitrons.

In recent years, Helitrons have been identified in all eukaryotic kingdoms, but their genomic copy numbers are highly variable, even among closely related species. They make up 1–5% of the genomic DNA in different fruit flies, 0–3% in mammals, and >0.5% in the frog. In most mammals the presence of Helitrons is negligible and limited to remnants of old transposons, with the exception of bat genomes, which are populated by numerous young elements. However, for many years after the description of autonomous Helitrons, no mechanistic studies were published, so the rolling-circle mechanism of transposition remained a well-supported but untested hypothesis.

Structure
Helitrons are structurally asymmetric and are the only class of eukaryotic DNA transposons that do not generate target-site duplications during transposition. Canonical Helitrons typically begin with a 5′ T(C/T) and terminate with the nucleotides CTRR (most frequently CTAG, although variation has occasionally been noted), but they do not contain terminal inverted repeats. In addition, they frequently have a short palindromic sequence (16 to 20 nucleotides) that can form a hairpin about 11 bp from the 3′ end. They integrate between the A and T of a host AT dinucleotide. Some families of Helitrons also carry tandem repeats, such as microsatellites and minisatellites, which are generally highly mutable sequences.
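
The terminal hallmarks described above can be checked mechanically. The following is a minimal Python sketch, not a published Helitron finder: the function names, the 40-bp search window, the 6-bp minimum stem, and the 8-bp maximum loop are illustrative assumptions rather than values taken from any tool.

```python
import re

# Complement table for DNA. R in "CTRR" is the IUPAC code for a purine (A or G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_canonical_termini(seq):
    """True if seq begins with T(C/T) and ends with CTRR."""
    seq = seq.upper()
    return bool(re.match(r"T[CT]", seq)) and bool(re.search(r"CT[AG][AG]$", seq))


def find_3prime_hairpin(seq, window=40, min_stem=6, max_loop=8):
    """Search the last `window` bases for an inverted repeat that could fold
    into a hairpin. Returns (stem_start, stem, loop_length) for the first hit,
    with stem_start relative to the full sequence, or None if nothing is found.
    window, min_stem and max_loop are hypothetical defaults."""
    seq = seq.upper()
    tail = seq[-window:]
    offset = len(seq) - len(tail)
    for i in range(len(tail) - 2 * min_stem + 1):
        left = tail[i:i + min_stem]
        right = reverse_complement(left)
        for loop in range(max_loop + 1):
            j = i + min_stem + loop
            if tail[j:j + min_stem] == right:
                return offset + i, left, loop
    return None
```

For a made-up element such as "TC" + "A" * 10 + "GCGCGT" + "AAAA" + "ACGCGC" + "T" * 7 + "CTAG", has_canonical_termini returns True and find_3prime_hairpin reports an inverted repeat near the 3′ end. Because these signals are so short, a check of this kind would match many non-Helitron sequences by chance and is only a first filter.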

Most Helitrons are non-autonomous elements. They share common termini and other structural hallmarks with autonomous Helitrons, but they do not encode the complete set of proteins encoded by the autonomous elements. The main enzymatic hallmarks of Helitrons are the rolling-circle (RC) replication initiator (Rep) and DNA helicase (Hel) domains, which are present in a protein of 1000–3000 amino acids (aa) (Rep/Hel) encoded by all autonomous Helitron elements. The Rep/Helicase protein includes zinc finger motifs, the Rep domain (~100 aa, with HUH endonuclease activity), and an eight-domain PiF1 family helicase (SuperFamily1), all of which are universally conserved in Helitrons. The zinc finger-like motifs have been associated with DNA binding. The Rep domain is involved in the breaking and joining of single-stranded DNA and is characterized by the presence of both the HUH motif (two histidine residues separated by a hydrophobic residue) and the Y motif (one or two tyrosine residues separated by several amino acids). The ~400-aa Hel domain is classified as a 5′ to 3′ DNA helicase of the PiF1 family; in many other rolling-circle entities, this unwinding activity is host-encoded. Plant Helitrons also encode an open reading frame with homology to single-stranded DNA-binding proteins (RPA). Typically, the RPA proteins in Helitrons are 150–500 aa long and are encoded by several exons. In all Helitrons, the Rep domain precedes the Hel domain.
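
The HUH motif as defined above (two histidines separated by one hydrophobic residue) lends itself to a simple pattern scan. The sketch below is illustrative only: the set of residues counted as hydrophobic and the function name find_huh_motifs are assumptions, and a match in a real protein would not by itself indicate a functional Rep domain.

```python
import re

# Assumption: residues treated as "hydrophobic" for the U position.
HYDROPHOBIC = "AVLIMFWY"

# A lookahead is used so that overlapping motifs (e.g. HVHIH) are all reported.
_HUH = re.compile(rf"(?=(H[{HYDROPHOBIC}]H))")


def find_huh_motifs(protein):
    """Return (0-based position, motif) for every H-U-H tripeptide found."""
    return [(m.start(), m.group(1)) for m in _HUH.finditer(protein.upper())]
```

On the made-up peptide "MKHVHAAHLHG" this reports HVH at position 2 and HLH at position 7.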

The three-dimensional structure of the Helitron transposase covalently bound to the left transposon end has recently been determined by cryo-EM.

Mechanisms of rolling-circle transposition
Helitrons are proposed to transpose by a mechanism similar to rolling-circle replication via a single-stranded DNA intermediate. Two models have been proposed for the transposition mechanism: the concerted and the sequential. In the concerted model, donor strand cleavage and ligation occur simultaneously, while in the sequential model they occur in a stepwise fashion. The concerted model does not require a circular intermediate, although one could arise if a step fails or is bypassed during transposition. The sequential model differs in that a circular intermediate is a required step of transposition. Because circular intermediates were not known for Helitrons until very recently, the concerted model was adapted to explain transposition.

In either case, experiments with reconstituted Helraiser transposons showed that the donor site must be double-stranded and that single-stranded donors will not suffice.

The concerted model
The Helitron can be either autonomous or non-autonomous. One transposase molecule cleaves at the donor site (through the first tyrosine (Y1) residue of the Rep protein) and at the target site (through the second tyrosine (Y2) residue) and binds to the resulting 5' ends. The free 3' OH in the target DNA attacks the DNA–Y1 bond and forms a bond with the donor strand, resulting in strand transfer. Replication at the cleaved donor site initiates at the free 3' OH, where the donor strand serves as a primer for DNA synthesis by a host DNA polymerase, and proceeds to displace one strand of the Helitron. If the palindrome and 3' end of the element are recognized correctly, cleavage occurs after the CTRR sequence and the displaced Helitron strand is transferred to the target site, where DNA replication resolves the heteroduplex.

The sequential model
In 2016, one of the first mechanistic studies of Helitron transposition was published, shedding light on the different steps of transposition. Based on a consensus sequence, it reconstructed the likely ancestor of the Helibat family of Helitrons present in the genome of the little brown bat (Myotis lucifugus); bats are the only group of mammals with a substantial number of Helitrons in their genomes. This active transposon was inserted into a plasmid acting as the Helitron donor. An antibiotic resistance gene was included between the two terminal sequences of the Helitron to enable isolation of the cells in which transposition occurred.

During transposition of the Helitron, a circular intermediate is formed, which was isolated from cells transfected with the plasmid. It is formed by the joining of the terminal ends and suggests a rolling-circle model of transposition in which cleavage of the donor and target strands does not occur at the same time, since a single-stranded circular DNA is first formed from one of the strands of the Helitron.

This model is supported by the fact that deletion of one of the two tyrosines (Y727) of the Rep domain thought to be involved in strand cleavage does not actually affect the efficiency of Helitron transposition. Only one of the tyrosines would be required to carry out a two-step process: 1) cleavage of the donor DNA and 2) integration into the target site.

Mechanisms of gene capture
The presence of contiguous exons and introns within the host DNA carried by Helitrons suggested a DNA-based mechanism of acquisition. Helitron gene capture was proposed to occur in a stepwise or sequential manner, i.e., one gene is captured during one transposition event and a second gene is captured during a subsequent transposition event. Stepwise capture would result in Helitrons that contain gene fragments from different locations. The sequential capture model may explain Helitrons carrying multiple gene fragments observed in other organisms. Three major models have been proposed to explain the mechanism of gene capture at the DNA level in Helitrons.

End bypass model
Also known as the "transduction" or "read-through" model 1 (RTM1). Transposition initiates at the 5′ end, and gene capture occurs if the 3′ termination signal is missed. A cryptic downstream palindrome could furnish a new terminator if the normal terminator were bypassed, and all intervening sequence would be captured. In this regard, Helitrons can be viewed as exon-shuffling machines. As a random sequence provides the novel termination signal, this model does not require a high density of Helitrons in the genome.

Indeed, in one-ended-type fusions, the inserted fragment of donor DNA is flanked at one end (the constant end) by IRR and at the other end (the variable end) by the CTTG or GTTC sequence present in the donor, in a way that usually results in multiple tandem insertions of the donor plasmid or capture of flanking sequence in the target site. Such a failure to recognize the termination signal for Helitron transposition may result in the DNA flanking the 3' end of the Helitron being transferred along with the Helitron to the target site (gene capture). This may be how Helitrons have acquired additional coding sequences. Further experiments are nevertheless necessary to verify this mechanism of transposition.

Chimeric transposition model
Also known as the "read-through" model 2 (RTM2). In this model, transposition initiates at the 5′ end of a Helitron and, if the 3′ end of that Helitron is missing, terminates at the next 3′ end of a Helitron in the correct orientation. The result is that all intervening sequence is captured.
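
Both read-through models share the same logic: transposition starts at a 5′ end and runs until the first termination signal that is actually recognized, so anything between the element's own 3′ end and that signal is carried along. They differ only in what supplies the later terminator (a cryptic palindrome in RTM1, the 3′ end of another Helitron in RTM2). The toy sketch below illustrates that shared logic with made-up coordinates; the function name and the idea of passing a set of "recognized" terminators are assumptions for illustration, not a model of the enzyme.

```python
def mobilized_segment(start, terminators, recognized):
    """Toy model: transposition starts at `start` and stops at the first
    downstream terminator that is recognized.

    start       -- coordinate of the Helitron 5' end
    terminators -- coordinates of candidate 3' ends (the element's own end,
                   cryptic palindromes, 3' ends of other Helitrons)
    recognized  -- subset of `terminators` that is actually recognized
    Returns (start, end), or None if no downstream terminator is recognized.
    """
    for end in sorted(terminators):
        if end > start and end in recognized:
            return start, end
    return None


# Hypothetical coordinates: a Helitron spanning 100..500 and a second
# candidate terminator at 900.
normal = mobilized_segment(100, [500, 900], recognized={500, 900})
bypass = mobilized_segment(100, [500, 900], recognized={900})
captured_bp = bypass[1] - normal[1]  # flanking sequence carried along
```

In this example the normal event mobilizes 100..500, whereas bypassing the terminator at 500 mobilizes 100..900 and carries 400 bp of flanking sequence with it.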

Filler DNA (FDNA) model
In this model, portions of genes or non-coding regions can accidentally serve as templates during the repair of double-strand breaks (DSBs) occurring in Helitrons during their transposition. Low-fidelity repair of DSBs by non-homologous end joining is more frequent in plants and mammals than repair through homologous recombination, and is often accompanied by insertions of 100–4000 bp of “filler DNA” copied from diverse genomic or extra-chromosomal DNA regions into the DSB. This model predicts that 2 to 8 bp regions of microhomology exist between the regions that flank the DSB in the Helitron and those that flank the original host sequence captured by the Helitron.
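
The microhomology prediction of the filler DNA model can be tested on sequence data by looking for short identical stretches shared by the two sets of flanks. The following is a minimal sketch under simple assumptions: exact matches only, lengths of 2 to 8 bp as stated above, and a hypothetical function name. Very short matches occur frequently by chance, so hits of this kind are suggestive rather than conclusive.

```python
def shared_microhomologies(flank_a, flank_b, min_len=2, max_len=8):
    """Return every substring of length min_len..max_len that occurs in both
    flanks, longest first (ties broken alphabetically)."""
    a, b = flank_a.upper(), flank_b.upper()
    hits = set()
    for k in range(min_len, max_len + 1):
        # All k-mers present in the second flank.
        kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
        for i in range(len(a) - k + 1):
            if a[i:i + k] in kmers_b:
                hits.add(a[i:i + k])
    return sorted(hits, key=lambda s: (-len(s), s))
```

For the made-up flanks "AAGTCGA" and "CCGTCGT", the longest shared stretch returned is "GTCG".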

Others
Other gene capture models have also been proposed for Helitrons: a site-specific recombination model, based on the features shared between Helitrons and integrons; and transposable element capture, based on the integration of TEs via transposition into other TEs, also called TE nesting. Despite all these proposed models, there are too few examples to attribute gene capture to a single mechanism. Further research is needed to understand the molecular mechanism behind gene capture and how it favors the survival of Helitrons.

Evidence supporting the "read-through" models seems to lie in the relative lack of importance of the 3' right terminal sequence (RTS) compared with the 5' left terminal sequence (LTS): deletion of the LTS leads to a severe reduction in the efficiency of Helitron transposition, whereas complete deletion of the RTS still allows significant transposition, albeit with a reduced number of copies. The RTS indicates to the Rep-Hel protein the end of the Helitron and thus the end of transposition. All of this information lies in the hairpin structure formed by the palindromic DNA sequence at the 3' end. Such a small structure is likely to be modified over time, allowing the end of the Helitron to be bypassed during transposition and neighbouring gene sequence to be captured.

Impact on gene expression
Helitrons, like all other TEs, are potential insertional mutagens. An insertion within the promoter region of a gene can abolish measurable transcripts and produce an observable phenotype. In some cases, a Helitron insertion has been seen to provide regulatory motifs necessary for transcription initiation. Investigators presented evidence that Helitrons have contributed putative promoters, exons, splice sites, polyadenylation sites, and microRNA binding sites to transcripts otherwise conserved across mammals. Helitrons can drive expression and provide de novo regulatory elements such as CAAT-box, GC-box, octamer motif, and TATA box sites. Helitrons can also alter the length and sequence of both the 5′ UTRs and 3′ UTRs of coding transcripts. Another way Helitrons can control gene expression is by contributing to novel splice variants, either by promoting alternative splicing or by providing cryptic splice sites. A number of spontaneous mutations have been reported in plants that are caused by intronic Helitron insertions and result in the generation of chimeric transcript species.

Genome-wide identification
The atypical structure, lack of target site modification, and sequence heterogeneity of Helitrons have made their automated identification difficult. Two approaches have been applied to find canonical Helitrons in genome-wide analyses. The first is de novo repeat identification, which can be used to build consensus libraries of all repeated sequences. De novo repeat finding will only identify Helitrons that are present in multiple, relatively homogeneous copies in the genome, so low-copy and older Helitrons will tend to be fragmented and have poorly defined ends. These approaches are limited by the quality of the genome assembly and the homogeneity of the repeats. The second approach is structure-based: it relies on the structural features of canonical Helitrons and uses programs such as Helitronfinder, HelSearch, Helraizer, and HelitronScanner. As these programs are trained on known Helitron elements, they may not be efficient at identifying divergent families, and they generate many false positives. This approach does not create consensus sequences of the candidate Helitrons, resulting in large data sets.

The sensitivity of the structure-based approach (correctly identified/(correctly identified + false negatives)) is 93%, and the specificity (correctly identified/(correctly identified + false positives)) is 99%. There are several reasons why other techniques for Helitron discovery have been less sensitive and/or more error-prone. A Rep/helicase protein-based search yields a large number of false negatives, because the majority of Helitrons are non-autonomous elements. A similarity-based search will not identify any new families and will thus work poorly in newly studied genomes. A repeat-based search requires extensive manual curation to identify Helitron families, an overwhelming task in large genomes with substantial DNA repetition. On the basis of its overall sensitivity and specificity, the structure-based approach is quite successful and especially useful for identifying Helitron elements in a newly characterized genome. However, because at least two copies are needed to make an alignment, single-copy Helitrons will be missed.
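
The two ratios quoted above can be written out directly. The sketch below follows the definitions given in the text; the function names and the counts in the example are hypothetical and chosen only to reproduce the stated percentages. Note that the second ratio, as defined here, is the quantity often called precision or positive predictive value in other contexts.

```python
def sensitivity(correct, false_negatives):
    """correctly identified / (correctly identified + false negatives)"""
    return correct / (correct + false_negatives)


def specificity_as_defined(correct, false_positives):
    """correctly identified / (correctly identified + false positives),
    following the definition used in the text above."""
    return correct / (correct + false_positives)


# Hypothetical counts, not taken from any study.
example_sensitivity = sensitivity(correct=93, false_negatives=7)
example_specificity = specificity_as_defined(correct=99, false_positives=1)
```

With these made-up counts, 93 correct calls and 7 missed elements give a sensitivity of 93%, and 99 correct calls against 1 false positive give a specificity (in the sense used here) of 99%.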

Vertical inheritance and horizontal transfer
Inheritance: Genome-wide analyses showed that the bulk of Helitrons tend to be quite recent. The young age of Helitron families is of course biased by the genomes that have been examined carefully, which are predominantly those of plants and insects, where the unconstrained DNA half-life (the average time in which half of the DNA not conserved for function is lost) is short. In contrast to other DNA transposons, Helitrons from some species have been reported to exhibit long-term activity, probably due to the mechanism of transposition or the inability of the host to recognize Helitrons because of either sequence heterogeneity or host gene capture. In contrast to the relatively short unconstrained DNA half-life (2.5–14 my) of plant and insect genomes, the mammalian DNA half-life is estimated to be much longer (884 my); together with the minimal requirements of Helitron transposition, this slow rate of decay in mammals has produced a pattern of vertical persistence.
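
To give a sense of scale for these half-lives, the sketch below assumes that unconstrained DNA is lost by simple exponential decay, which is a simplification introduced here for illustration. The half-life values are those quoted above; the 50 my time span and the function name are arbitrary choices.

```python
def fraction_remaining(elapsed_my, half_life_my):
    """Fraction of unconstrained DNA expected to remain after `elapsed_my`
    million years, assuming simple exponential decay."""
    return 0.5 ** (elapsed_my / half_life_my)


# Half-lives (in my) quoted in the text; 50 my is an arbitrary time span.
for half_life in (2.5, 14, 884):
    print(half_life, fraction_remaining(50, half_life))
```

Under this assumption, after 50 my roughly 8% of unconstrained DNA would remain with a 14 my half-life, compared with roughly 96% with an 884 my half-life, which illustrates why old Helitron copies are far more likely to remain recognizable in mammalian genomes.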

Horizontal Transfer: The impact of horizontal transfer (HT) of transposable elements may be significant due to their mutagenic potential, inherent mobility, and abundance. Researchers found evidence for the repeated HT of four different families of Helitrons in an unprecedented array of organisms, including mammals, reptiles, fish, invertebrates, and insect viruses. The Helitrons present in these species have a patchy distribution and are closely related (80–98% sequence identity), despite the deep divergence times among hosts. In contrast to genes, Helitrons that have been horizontally transferred into new host genomes can amplify, in some cases reaching up to several hundred copies and representing a substantial fraction of the genome. As Helitrons are known to frequently capture and amplify gene fragments, HT of this unique group of DNA transposons could lead to horizontal gene transfer and cause dramatic shifts in the trajectory of genome evolution.

Evolutionary implication
Two different scenarios describe the most likely fate of a host gene captured by Helitrons: 1. The captured gene would be destroyed by multiple mutations if it did not provide any selective advantage to the transposon. 2. It would be kept as a gene related to the original host gene if its capture is beneficial for the transposon and tolerated by the host. Helitrons, like most other mobile elements in the A. thaliana and C. elegans genomes, are present in multiple highly diverged families. Considering the young age of these families and the extent of protein conservation, it is highly unlikely that the observed divergence results from mutations accumulated by transposons after integration into the host genome, suggesting that Helitrons work as a powerful tool of evolution. They have recruited host genes, modified them to an extent that is unreachable by the Mendelian process, and multiplied them in the host genomes.

Future
Although it is generally accepted that Helitrons are RC transposons, and numerous investigations have demonstrated the role of Helitron transposition in gene duplication and in shaping genetic architecture, neither the various mechanisms by which this occurs nor their frequency is well understood. At this point, it is even unclear whether the 3' terminus of a Helitron initiates or terminates replicative transposition. An important step towards investigating this mechanism would be the isolation of autonomous Helitrons that are active in vitro and in vivo. This could be done by computational identification of complete young Helitrons. In the near future, detailed computer-assisted sequence studies should allow investigators to understand the evolutionary history of Helitrons, together with their mechanism of gene capture and their overall significance for gene evolution.