User:Ake.vastermark/sandbox

EamA Pfam wikipedia entry

EamA (named after the O-acetyl-serine/cysteine export gene in E. coli) was long known as DUF6 (domain of unknown function 6) and was one of the first DUF families to appear in Pfam (Bateman, Coggill, Finn 2010). Maximum likelihood phylogenetic analysis indicates that the family contains four stable subfamilies with high bootstrap support: SLC35C/E, SLC35F, AMAC, and purine permeases (Vastermark et al. 2011).

AMAC (acyl-malonyl condensing enzyme) is an interchangeable but more general biochemical term than FAE 3-ketoacyl-CoA synthase 1, which refers only to synthase 1. However, the transmembrane structure indicates that AMACs are transporters, not enzymes. According to HGNC, the TMEM20, TMEM22, AMAC1 and AMAC-like (AMAC1L1, AMAC1L2, AMAC1L3) sequences have been renamed SLC35G (SLC35G1 – 6) in RefSeq for human and mouse.

Furthermore, EamA is the only drug/metabolite transporter (DMT) family that spans the prokaryote/eukaryote border, even though none of the originally described families did so (Jack, Yang, Saier 2001). The highly diverse EamA Pfam family was created by iterative expansion of the original dataset.

A recent study (Vastermark et al. 2011) identified the likely evolutionary order of the human 5 + 5 TM nucleotide sugar transporters. HMMs were trained on each half of these proteins, for the families EamA, TPT, DUF914, UAA, and NST. The first method was multidimensional scaling in IBM SPSS, with a matrix of pairwise similarity measures from HMM-HMM comparisons as input. The output was a plot showing a clear bipartition between DMT-1 and DMT-2 domains, with EamA-1 and EamA-2 clearly in the middle. One interpretation of this result is that EamA duplicated and that the other families represent “diverged” copies of EamA.
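The scaling step can be illustrated with a minimal sketch. The study used IBM SPSS; the code below instead uses classical (Torgerson) multidimensional scaling in NumPy, which is a different algorithm chosen only because it is short. The similarity values, the conversion of similarity to distance as 100 minus similarity, and the function name classical_mds are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: place points so that their Euclidean
    distances approximate the entries of the distance matrix `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy HMM-HMM similarity matrix (placeholder numbers, NOT from the study).
labels = ["EamA-1", "EamA-2", "NST-1", "NST-2"]
similarity = np.array([
    [100.0,  80.0,  60.0,  30.0],
    [ 80.0, 100.0,  30.0,  60.0],
    [ 60.0,  30.0, 100.0,  20.0],
    [ 30.0,  60.0,  20.0, 100.0],
])
distance = 100.0 - similarity  # assumed conversion from similarity to distance
coords = classical_mds(distance)
for label, (x, y) in zip(labels, coords):
    print(f"{label}: {x:.2f}, {y:.2f}")
```

Plotting the two coordinate columns against each other gives the kind of map in which a partition between first and second domain halves would be visible.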

Vastermark et al. (2011) also tabulated the distance (100-p) between domain halves and ordered the families accordingly: EamA (smallest distance between domain halves), TPT, DUF914, UAA, and NST (largest distance between domain halves). Perhaps surprisingly, the same order held for the distance to EamA: NST had the largest “distance” to EamA, UAA the second largest, and so on. The possibility that EamA (previously DUF6) is an “artifact” that has formed a "multipotent" HMM through iterative expansion of diverse seed data should be considered.
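The ordering by 100-p and the comparison of the two rankings can be sketched as follows. The function names, the placeholder family names and all scores are hypothetical. The sketch assumes only that p is a score on a 0 to 100 scale, and it does not reproduce the published table.

```python
def rank_by_distance(p_scores):
    """Return family names sorted by distance 100 - p, smallest first."""
    return sorted(p_scores, key=lambda family: 100.0 - p_scores[family])

def same_order(half_scores, reference_scores):
    """True if both score sets rank the same families in the same order."""
    return rank_by_distance(half_scores) == rank_by_distance(reference_scores)

# Placeholder families and scores (NOT values from the study).
half_p = {"famA": 90.0, "famB": 70.0, "famC": 40.0}
to_reference_p = {"famA": 85.0, "famB": 60.0, "famC": 20.0}

print(rank_by_distance(half_p))
print(same_order(half_p, to_reference_p))
```

Both dictionaries must contain the same families for the comparison to be meaningful.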

During replication of circular bacterial genomes, multiple proteins are involved in synthesizing the leading strand and the Okazaki fragments on the lagging strand. If a sequence contains an inverted repeat (a palindrome) longer than 10 bp with a spacer/insert shorter than 75-150 bp, the sequence could be accessible to SbcCD (Leach, Lloyd, Coulson 1992), a protein that inhibits the propagation of replicons containing long palindromic DNA sequences. The palindrome may form Watson-Crick base pairs with itself and a break in the sequence may occur, creating an opportunity to prime DNA synthesis in the opposite direction. This may be followed by spontaneous strand switching and continuation of normal replication. This phenomenon is referred to as Tandem Inversion Duplication (TID) (Kugelberg et al. 2010). The third (inverted) copy, which would lie in the middle, may subsequently have been degraded, possibly through strand-slippage deletion (illegitimate recombination). The presence of two palindromes in the regional duplication may increase the probability of degradation.
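The sequence criteria above (arm length and spacer length) can be turned into a simple scan. This is a minimal sketch that finds only exact inverted repeats of a fixed arm length. The function name find_inverted_repeats and the defaults arm_len=10 and max_spacer=150 are assumptions chosen to mirror the thresholds quoted in the text, not parameters from any published tool.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, arm_len=10, max_spacer=150):
    """Return (left_start, right_start, spacer) for every arm of length
    arm_len whose exact reverse complement starts at most max_spacer
    bases after the arm ends. Positions are 0-based."""
    seq = seq.lower()
    hits = []
    for i in range(len(seq) - 2 * arm_len + 1):
        target = reverse_complement(seq[i:i + arm_len])
        first = i + arm_len                      # earliest start of right arm
        last = min(first + max_spacer, len(seq) - arm_len)
        for j in range(first, last + 1):
            if seq[j:j + arm_len] == target:
                hits.append((i, j, j - first))
    return hits

# Synthetic test sequence: an arm, a 5-base spacer, and its reverse complement.
arm = "cgtggcggcg"
example = arm + "aaaaa" + reverse_complement(arm)
print(find_inverted_repeats(example))
```

A real analysis would also need to tolerate mismatches and variable arm lengths, which this sketch deliberately leaves out.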

A concrete bioinformatic example is DUF606, a family known to exist as both paired and fused copies in bacterial genomes (Lolkema, Dobrowolski, Slotboom 2008). A DUF606 protein (accession ACL39356.1) from Arthrobacter chlorophenolicus A6 has a 5+5 TM structure and matches the DUF606 HMM in Pfam twice, and thus appears to be duplicated. Interestingly, the genomic sequence encoding this protein (1530600 – 1531700) contains a palindrome (cgtggcggcg and gcaccgccgc) between the domain halves, although it may be too short and have too long a spacer to initiate a new TID.
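The relationship between the two quoted 10-mers can be checked directly. The sketch below tests two readings, base-by-base complement and reverse complement, because the text does not state in which direction the second 10-mer is written. The variable and function names are illustrative only.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def complement(seq):
    """Base-by-base complement, read in the same direction."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq):
    """Complement read in the opposite direction."""
    return complement(seq)[::-1]

left = "cgtggcggcg"
right = "gcaccgccgc"

is_complement = complement(left) == right
is_reverse_complement = reverse_complement(left) == right
print(is_complement, is_reverse_complement)
```

If the second 10-mer is the base-by-base complement, the two can pair as the arms of an inverted repeat when read in opposite directions.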

The Pfam domain organization for EamA shows its two-domain structure. However, in the entries for UAA, Nuc_sug_transp, and DUF914, which are likely derived from EamA, a single HMM covers the whole duplicated structure.

References

Bateman, A, P Coggill, RD Finn. 2010. DUFs: families in search of function. Acta Crystallogr Sect F Struct Biol Cryst Commun 66:1148-1152.

Jack, DL, NM Yang, MH Saier, Jr. 2001. The drug/metabolite transporter superfamily. Eur J Biochem 268:3620-3639.

Kugelberg, E, E Kofoid, DI Andersson, Y Lu, J Mellor, FP Roth, JR Roth. 2010. The tandem inversion duplication in Salmonella enterica: selection drives unstable precursors to final mutation types. Genetics 185:65-80.

Leach, DR, RG Lloyd, AF Coulson. 1992. The SbcCD protein of Escherichia coli is related to two putative nucleases in the UvrA superfamily of nucleotide-binding proteins. Genetica 87:95-100.

Lolkema, JS, A Dobrowolski, DJ Slotboom. 2008. Evolution of antiparallel two-domain membrane proteins: tracing multiple gene duplication events in the DUF606 family. J Mol Biol 378:596-606.

Vastermark, A, MS Almen, MW Simmen, R Fredriksson, HB Schioth. 2011. Functional specialization in nucleotide sugar transporters occurred through differentiation of the gene cluster EamA (DUF6) before the radiation of Viridiplantae. BMC Evol Biol 11:123.