LTR retrotransposon

LTR retrotransposons are class I transposable elements (TEs) characterized by long terminal repeats (LTRs) that directly flank an internal coding region. As retrotransposons, they mobilize through reverse transcription of their mRNA and integration of the newly created cDNA into another genomic location. They share this mechanism of retrotransposition with retroviruses, but differ in that horizontal transfer is much rarer in LTR retrotransposons than vertical transfer, in which active TE insertions are passed to the progeny. LTR retrotransposons that form virus-like particles are classified under Ortervirales.

Their size ranges from a few hundred base pairs to 30 kb; the largest reported to date are members of the Burro retrotransposon family in Schmidtea mediterranea.

In plant genomes, LTR retrotransposons are the major repetitive sequence class, constituting, for example, more than 75% of the maize genome. LTR retrotransposons make up about 8% of the human genome and approximately 10% of the mouse genome.

Structure and propagation
LTR retrotransposons have direct long terminal repeats that range from ~100 bp to over 5 kb in size. LTR retrotransposons are further sub-classified into the Ty1-copia-like (Pseudoviridae), Ty3-like (Metaviridae, formerly referred to as Gypsy-like, a name that is being considered for retirement), and BEL-Pao-like (Belpaoviridae) groups based on both their degree of sequence similarity and the order of encoded gene products. Ty1-copia and Ty3-Metaviridae groups of retrotransposons are commonly found in high copy number (up to a few million copies per haploid nucleus) in animal, fungal, protist, and plant genomes. BEL-Pao-like elements have so far only been found in animals.

All functional LTR retrotransposons encode a minimum of two genes, gag and pol, that are sufficient for their replication. Gag encodes a polyprotein with a capsid and a nucleocapsid domain. Gag proteins form virus-like particles in the cytoplasm, inside which reverse transcription occurs. The pol gene produces three proteins: a protease (PR), a reverse transcriptase that carries a reverse-transcriptase (RT) domain and an RNase H domain, and an integrase (IN).

Typically, LTR retrotransposon mRNAs are produced by the host RNA pol II acting on a promoter located in their 5’ LTR. The gag and pol genes are encoded in the same mRNA. Depending on the host species, two different strategies can be used to express the two polyproteins: a fusion into a single open reading frame (ORF) that is then cleaved, or the introduction of a frameshift between the two ORFs. In the latter case, occasional ribosomal frameshifting allows the production of both proteins, while ensuring that much more Gag protein is produced to form virus-like particles.
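The effect of occasional frameshifting on protein ratios can be illustrated with a small calculation. This is a minimal sketch under a simplifying assumption: every translation event yields either Gag alone or a Gag-Pol fusion, and the frameshift probability is a single fixed number. The function name and the value 0.05 are hypothetical and chosen only for illustration; they are not measured values for any particular element.

```python
def gag_to_gagpol_ratio(frameshift_probability: float) -> float:
    """Expected number of Gag-only products per Gag-Pol product.

    Assumes each ribosome either terminates after gag (no frameshift)
    or shifts frame and continues into pol (frameshift).
    """
    if not 0.0 < frameshift_probability < 1.0:
        raise ValueError("frameshift_probability must be between 0 and 1")
    return (1.0 - frameshift_probability) / frameshift_probability


# Hypothetical frameshift probability, for illustration only.
p = 0.05
print(round(gag_to_gagpol_ratio(p), 1))
```

With a frameshift probability of 0.05, about 19 Gag-only products are expected for each Gag-Pol product, which shows how a rare frameshift skews production toward the structural protein.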

Reverse transcription usually initiates at a short sequence located immediately downstream of the 5’ LTR and termed the primer binding site (PBS). Specific host tRNAs bind to the PBS and act as primers for reverse transcription, which occurs in a complex, multi-step process, ultimately producing a double-stranded cDNA molecule. The cDNA is finally integrated into a new location, creating short target site duplications (TSDs) and adding a new copy to the host genome.
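The structural signature described above, two direct terminal repeats flanked by a short target site duplication, can be checked on a sequence with a few lines of code. The following is a minimal sketch, not a real annotation tool: it assumes exact matches only (real LTRs and TSDs accumulate mutations), the function names are hypothetical, and the toy sequences are invented for illustration.

```python
def terminal_repeat_length(element: str) -> int:
    """Length of the longest exact direct repeat shared by both ends of `element`."""
    seq = element.upper()
    # The two repeats cannot overlap, so at most half the element each.
    for k in range(len(seq) // 2, 0, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def target_site_duplication(genome: str, start: int, end: int, max_len: int = 10) -> str:
    """Return the longest exact duplication flanking genome[start:end], or ''."""
    seq = genome.upper()
    # Do not read past either end of the genome sequence.
    limit = min(max_len, start, len(seq) - end)
    for k in range(limit, 0, -1):
        if seq[start - k:start] == seq[end:end + k]:
            return seq[end:end + k]
    return ""


# Invented toy data: a 10 bp "LTR" on both sides of a short internal region,
# inserted into a genome with a 5 bp duplicated target site.
ltr = "TGTTGGAACA"
element = ltr + "CCCGGGAAATTT" + ltr
genome = "AAAC" + "GATTC" + element + "GATTC" + "TTGA"
start = genome.index(element)
end = start + len(element)

print(terminal_repeat_length(element))
print(target_site_duplication(genome, start, end))
```

On the toy data this reports a 10 bp terminal repeat and the duplication GATTC. Exact matching keeps the sketch short; a practical search would tolerate mismatches between the two LTRs.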

Ty1-copia retrotransposons
Ty1-copia retrotransposons are abundant in species ranging from single-cell algae to bryophytes, gymnosperms, and angiosperms. They encode four protein domains in the following order: protease, integrase, reverse transcriptase, and ribonuclease H.

At least two classification systems exist for the subdivision of Ty1-copia retrotransposons into five lineages: Sireviruses/Maximus, Oryco/Ivana, Retrofit/Ale, TORK (subdivided into Angela/Sto, TAR/Fourf, GMR/Tork), and Bianca.

Sireviruses/Maximus retrotransposons contain an additional putative envelope gene. This lineage is named for the founder element SIRE1 in the Glycine max genome, and was later described in many species such as Zea mays, Arabidopsis thaliana, Beta vulgaris, and Pinus pinaster. Plant Sireviruses of many sequenced plant genomes are summarized at the MASIVEdb Sirevirus database.

Ty3-retrotransposons (formerly gypsy)
Ty3-retrotransposons are widely distributed in the plant kingdom, including both gymnosperms and angiosperms. They encode at least four protein domains in the order: protease, reverse transcriptase, ribonuclease H, and integrase. Based on structure, presence/absence of specific protein domains, and conserved protein sequence motifs, they can be subdivided into several lineages:

Errantiviruses contain an additional defective envelope ORF with similarities to the retroviral envelope gene. First described as Athila elements in Arabidopsis thaliana, they were later identified in many species, such as Glycine max and Beta vulgaris.

Chromoviruses contain an additional chromodomain (chromatin organization modifier domain) at the C-terminus of their integrase protein. They are widespread in plants and fungi, probably retaining protein domains during evolution of these two kingdoms. It is thought that the chromodomain directs retrotransposon integration to specific target sites. According to sequence and structure of the chromodomain, chromoviruses are subdivided into the four clades CRM, Tekay, Reina and Galadriel. Chromoviruses from each clade show distinctive integration patterns, e.g. into centromeres or into the rRNA genes.

Ogre elements are gigantic Ty3-retrotransposons reaching lengths of up to 25 kb. Ogre elements were first described in Pisum sativum.

Metaviruses describe conventional Ty3-gypsy retrotransposons that do not contain additional domains or ORFs.

The Sushi family of Ty3 long terminal repeat retrotransposons was first identified in teleost fish, and Sushi-like neogenes were subsequently identified in mammals. Mammalian retrotransposon-derived transcripts (MARTs) cannot transpose but have retained open reading frames, demonstrate high levels of evolutionary conservation and are subject to selective pressures, which suggests some have become neofunctionalized genes with new cellular functions. Retrotransposon gag-like-3 (RTL3/ZCCHC5/MART3) is one of eleven Sushi-like neogenes identified in the human genome.
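The domain orders given above for Ty1-copia elements (protease, integrase, reverse transcriptase, ribonuclease H) and Ty3 elements (protease, reverse transcriptase, ribonuclease H, integrase) lend themselves to a simple rule-based check. The sketch below is illustrative only: the domain labels (PR, IN, RT, RH, CHD) and the function name are hypothetical conventions, the input is assumed to be an already annotated list of domains in 5’ to 3’ order, and no attempt is made to recognize BEL-Pao elements, whose domain order is not given here.

```python
# Domain orders as stated in the text, written 5' to 3'.
TY1_COPIA_ORDER = ("PR", "IN", "RT", "RH")
TY3_ORDER = ("PR", "RT", "RH", "IN")
CORE_DOMAINS = {"PR", "IN", "RT", "RH"}


def classify_by_domain_order(domains):
    """Classify an element from its annotated domains, listed 5' to 3'.

    Extra domains (for example a chromodomain or an envelope-like ORF)
    are ignored; only the order of the four core domains is compared.
    """
    core = tuple(d for d in domains if d in CORE_DOMAINS)
    if core == TY1_COPIA_ORDER:
        return "Ty1-copia"
    if core == TY3_ORDER:
        return "Ty3"
    return "unclassified"


print(classify_by_domain_order(["PR", "IN", "RT", "RH"]))
print(classify_by_domain_order(["PR", "RT", "RH", "IN", "CHD"]))
print(classify_by_domain_order(["RT", "PR"]))
```

The second call carries an extra chromodomain label after the integrase, as in chromoviruses, and is still assigned to Ty3 because only the core domain order is compared. Incomplete or rearranged elements fall through to "unclassified".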

BEL/pao family
The BEL/pao family is found in animals.

Endogenous retroviruses (ERV)
Although retroviruses are often classified separately, they share many features with LTR retrotransposons. A major difference from Ty1-copia and Ty3-gypsy retrotransposons is that retroviruses have an envelope protein (ENV). A retrovirus can be transformed into an LTR retrotransposon through inactivation or deletion of the domains that enable extracellular mobility. If such a retrovirus infects germ line cells and inserts itself into their genome, it may be transmitted vertically and become an endogenous retrovirus (ERV).

Terminal repeat retrotransposons in miniature (TRIMs)
Some LTR retrotransposons lack all of their coding domains. Due to their short size, they are referred to as terminal repeat retrotransposons in miniature (TRIMs). Nevertheless, TRIMs may still be able to retrotranspose, as they may rely on the coding domains of autonomous Ty1-copia or Ty3-gypsy retrotransposons. Among the TRIMs, the Cassandra family plays an exceptional role, as the family is unusually widespread among higher plants. In contrast to all other characterized TRIMs, Cassandra elements harbor a 5S rRNA promoter in their LTR sequence. Due to their short overall length and the relatively high contribution of the flanking LTRs, TRIMs are prone to rearrangements by recombination.