EamA

EamA (named after the O-acetyl-serine/cysteine export gene in E. coli) is a protein domain found in a wide range of proteins, including the Erwinia chrysanthemi PecM protein, which is involved in pectinase, cellulase and blue pigment regulation; the Salmonella typhimurium PagO protein (function unknown); and some members of the solute carrier family 35 (SLC35) nucleoside-sugar transporters. Many members of this family have no known function and are predicted to be integral membrane proteins, and many contain two copies of the domain.

Domain
EamA was previously called DUF6 (domain of unknown function 6) and was one of the first DUF families to appear in Pfam. Maximum likelihood phylogenetic analysis indicates that this family contains four stable sub-families with high bootstrap values: SLC35C/E, SLC35F, SLC35G (acyl-malonyl condensing enzyme-like, AMAC), and purine permeases.

The EamA HMM domain organization shows the two-domain structure of EamA. However, for the entries UAA, Nuc_sug_transp, and DUF914, which may have derived from EamA, a single HMM covers the whole duplicated structure.

Function
AMAC (acyl-malonyl condensing enzyme) is interchangeable with, but more general than, the biochemical term FAE 3-ketoacyl-CoA synthase 1, which refers only to synthase 1. However, the transmembrane structure indicates that AMACs are transporters, not enzymes. Hence the TMEM20, TMEM22, AMAC1 and AMAC-like (AMAC1L1, AMAC1L2, AMAC1L3) sequences have been renamed SLC35G (SLC35G1 – 6) in RefSeq for human and mouse. Furthermore, EamA is the only drug/metabolite transporter family to cross the prokaryote/eukaryote border, even though none of the original families crossed it. The highly diverse EamA Pfam family was created by iterative expansion of the original dataset.

Evolution
The likely evolutionary order of the human 5 + 5 TM nucleotide sugar transporters has been inferred by training HMMs on each half of the proteins in these families: EamA, TPT, DUF914, UAA, and NST. The first method was multidimensional scaling in IBM SPSS, with a matrix of pairwise similarity measures from HMM-HMM comparisons as input. The output was a graph showing a clear bipartition between DMT-1 and DMT-2 domains, with EamA-1 and EamA-2 in the middle. This result could be interpreted as indicating that EamA duplicated and that the other families represent “diverged” copies of EamA.
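As an illustration of how a pairwise matrix can be turned into a two-dimensional map, the following is a minimal sketch of classical (Torgerson) multidimensional scaling in plain Python. It is not the SPSS procedure used in the analysis, whose exact settings are not given here. The labels and distances are hypothetical toy values, the function names are made up for this sketch, and it assumes that similarities have already been converted to distances.

```python
import math


def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]  # fixed, deterministic start vector
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # matrix is numerically zero in this direction
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue with its sign.
    mv = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * m for v, m in zip(vec, mv)), vec


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a distance matrix."""
    n = len(dist)
    b = double_center(dist)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        value, vec = top_eigenpair(b)
        if value <= 1e-9:  # no further positive eigenvalue to use
            break
        scale = math.sqrt(value)
        for i in range(n):
            coords[i][d] = scale * vec[i]
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Hypothetical toy input: three items with made-up pairwise distances.
labels = ["dom_a", "dom_b", "dom_c"]
distances = [[0.0, 3.0, 4.0],
             [3.0, 0.0, 5.0],
             [4.0, 5.0, 0.0]]
for label, point in zip(labels, classical_mds(distances)):
    print(label, [round(x, 3) for x in point])
```

Classical scaling reproduces the input distances exactly only when they are Euclidean; scores derived from HMM-HMM comparisons need not be, so a map of this kind is best read qualitatively.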

The distance (100-p) between domain halves was also measured, and the families were sorted by this distance: EamA (smallest distance between domain halves), TPT, DUF914, UAA, and NST (largest distance between domain halves). Perhaps surprisingly, this order also replicated the distance to EamA, so that NST had the highest “distance” to EamA, UAA the second highest, and so on. The possibility that EamA (previously DUF6) may be an “artifact” that has formed a “multipotent” HMM through iterative expansion of a diverse seed dataset should be considered.
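The ranking described above can be sketched as follows. The scores are placeholders invented for illustration, not measured values; they are chosen only so that the resulting order matches the one stated in the text. The sketch also assumes that p is a similarity score on a 0 to 100 scale, which the text does not define.

```python
def rank_by_distance(scores):
    """Return names sorted by ascending distance, defined as 100 - p."""
    return sorted(scores, key=lambda name: 100.0 - scores[name])


# PLACEHOLDER similarity scores p between the two halves of each family.
half_vs_half = {"NST": 30.0, "EamA": 90.0, "UAA": 45.0,
                "TPT": 75.0, "DUF914": 60.0}

# PLACEHOLDER similarity scores p between each family and EamA.
family_vs_eama = {"UAA": 40.0, "NST": 25.0, "TPT": 70.0, "DUF914": 55.0}

half_order = rank_by_distance(half_vs_half)
eama_order = rank_by_distance(family_vs_eama)

# Compare the two rankings, leaving EamA itself out of the first one.
same_order = [name for name in half_order if name != "EamA"] == eama_order

print(half_order)
print(eama_order)
print(same_order)
```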

During replication of circular bacterial genomes, multiple proteins are involved in synthesizing the leading strand and the Okazaki fragments on the lagging strand. If a sequence contains an inverted repeat (a palindrome) longer than 10 bp with a spacer/insert shorter than 75-150 base pairs, it could be accessible to SbcCD, a protein that inhibits the propagation of replicons containing long palindromic DNA sequences. Watson-Crick base pairing within the palindrome and a break in the sequence may then occur, creating an opportunity for priming DNA synthesis in the opposite direction. This may be followed by spontaneous strand switching and continuation of normal replication. This phenomenon is referred to as Tandem Inversion Duplication (TID). The third (inverted) copy, which would lie in the middle, may subsequently have been degraded, possibly through strand slippage deletion (illegitimate recombination). The presence of two palindromes in the regional duplication may increase the probability of degradation.

A concrete bioinformatic example is DUF606, a protein family known to exist as both paired and fused copies in bacterial genomes. A DUF606 protein (accession ACL39356.1) from Arthrobacter chlorophenolicus A6 has a 5+5 TM structure and matches the DUF606 HMM in Pfam twice, and thus appears to be duplicated. The genomic sequence of this protein (1530600 – 1531700) contains a palindrome ( and ) in the middle, between the domain halves, although the palindrome may be too short, and its spacer too long, to initiate a new TID.
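A scan for inverted repeats of the kind described above (arms longer than 10 bp, with a bounded spacer) could look like the following minimal sketch. The function names, the default thresholds and the toy sequence are assumptions made for illustration. This is not the method used in the original analysis, and the sequence is not the Arthrobacter region.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (assumes only A, C, G, T)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, min_arm=11, max_spacer=150):
    """Return (left_start, spacer_length) for every seed inverted repeat.

    A seed is a window of exactly `min_arm` bases whose reverse complement
    occurs downstream with at most `max_spacer` bases in between.
    Longer arms show up as several overlapping seeds.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n - 2 * min_arm + 1):
        target = reverse_complement(seq[start:start + min_arm])
        first_right = start + min_arm  # spacer length 0
        last_right = min(first_right + max_spacer, n - min_arm)
        for right in range(first_right, last_right + 1):
            if seq[right:right + min_arm] == target:
                hits.append((start, right - first_right))
    return hits


# Toy sequence (made up): an 11-base arm, a 5-base spacer, and the
# reverse complement of the arm, flanked by two bases on each side.
arm = "ACGTTGCAAGG"
toy = "TT" + arm + "AAAAA" + reverse_complement(arm) + "TT"
print(find_inverted_repeats(toy))
```

The defaults reflect one reading of the thresholds in the text: `min_arm=11` for "longer than 10 bp", and `max_spacer=150` as the upper end of the 75-150 bp range. A stricter scan would lower `max_spacer` to 75.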