Αr45 RNA

αr45 is a family of bacterial small non-coding RNAs with representatives in a broad group of α-proteobacteria from the order Hyphomicrobiales. The first member of this family (Smr45C) was found in a Sinorhizobium meliloti 1021 locus located in the chromosome (C). Further homology and structure conservation analysis identified homologs in several nitrogen-fixing symbiotic rhizobia (i.e. S. medicae, S. fredii, Rhizobium leguminosarum bv. viciae, R. leguminosarum bv. trifolii, R. etli, and several Mesorhizobium species), in plant pathogens belonging to Agrobacterium species (i.e. A. tumefaciens, A. vitis, A. radiobacter, and Agrobacterium H13), in a broad spectrum of Brucella species (B. ovis, B. canis, B. abortus, B. microtis, and several biovars of B. melitensis), in Bartonella species (i.e. B. henselae, B. clarridgeiae, B. tribocorum, B. quintana, B. bacilliformis, B. grahamii), in several members of the Xanthobacteraceae family (i.e. Azorhizobium caulinodans, Starkeya novella, Xanthobacter autotrophicus), and in some representatives of the Beijerinckiaceae family (i.e. Methylocella silvestris, Beijerinckia indica subsp. indica). αr45C RNA species are 147-153 nt long (Table 1) and share a well-defined common secondary structure (Figure 1). All of the αr45 transcripts can be catalogued as trans-acting sRNAs expressed from well-defined promoter regions of independent transcription units within intergenic regions (IGRs) of the α-proteobacterial genomes (Figure 5).

Discovery and Structure
Smr45C sRNA was described by del Val et al. as a result of a computational comparative genomic approach applied to the intergenic regions (IGRs) of the reference S. meliloti 1021 strain (http://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi). Northern hybridization experiments confirmed that the predicted smr45C locus expressed a single transcript 130-179 nt long, which accumulated differentially in free-living and endosymbiotic bacteria. TAP-based 5’-RACE experiments mapped the transcription start site (TSS) of the full-length Smr45C transcript to position 3,105,445 in the S. meliloti 1021 genome (http://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi), whereas the 3’ end was initially assumed to be located at position 3,105,265, matching the last residue of a short stretch of Us (Figure 2) in a putative, but low-rated, Rho-independent terminator. Recent deep sequencing-based characterization of the small RNA fraction (50-350 nt) of S. meliloti 2011 further confirmed the expression of Smr45C (there referred to as SmelC706) and mapped the full-length transcript to the same 5’ end and to the 3’ end position 3,105,298.
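As a quick arithmetic check on the coordinates quoted above, the following minimal Python sketch computes the transcript length implied by each proposed 3’ end. It assumes 1-based, inclusive genome coordinates; the function and variable names are illustrative and do not come from the original studies.

```python
def transcript_length(tss: int, end3: int) -> int:
    """Length in nt of a transcript spanning tss..end3 (1-based, inclusive).

    Only the absolute distance matters, so the strand does not need to be known.
    """
    return abs(tss - end3) + 1


TSS = 3_105_445  # 5' end mapped by 5'-RACE (from the text)

# Two 3' ends reported in the text; labels are illustrative.
ENDS = {
    "terminator-based 3' end": 3_105_265,
    "deep-sequencing 3' end": 3_105_298,
}

lengths = {label: transcript_length(TSS, pos) for label, pos in ENDS.items()}
for label, nt in lengths.items():
    print(f"{label}: {nt} nt")
```

Under these assumptions the deep-sequencing 3’ end implies a 148 nt transcript, while the initially assumed terminator-based end implies 181 nt.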



The nucleotide sequence of Smr45C was initially used as a query to search the Rfam database (version 10.0; http://www.sanger.ac.uk/Software/Rfam). This homology search rendered no matches to known bacterial sRNAs in this database. Smr45C was next BLASTed with default parameters against all the bacterial genomes available at the time (1,615 sequences as of 20 April 2011; https://www.ncbi.nlm.nih.gov). The regions exhibiting significant homology to the query sequence (78-89% similarity) were extracted to create a Covariance Model (CM) from a seed alignment using Infernal (version 1.0) (Figure 2). This CM was used in a further search for new members of the αr45 family in the existing bacterial genomic databases.

The results were manually inspected to deduce a consensus secondary structure for the family (Figure 1 and Figure 2). The consensus structure was also independently predicted with the program locARNATE, which gave very similar results. Manual inspection of the sequences found with the CM using Infernal identified 39 close homologs, all of them present as single chromosomal copies in the α-proteobacterial genomes. The species encoding these homologs of Smr45C were: S. medicae and S. fredii; two R. leguminosarum bv. trifolii strains (WSM304 and WSM35); two R. etli strains (CFN 42 and CIAT 652); the reference R. leguminosarum bv. viciae 3841 strain; the Agrobacterium species A. vitis, A. tumefaciens, A. radiobacter and A. H13; Brucella species (B. ovis, B. canis, B. abortus, B. microtis, and several biovars of B. melitensis); Brucella anthropi; the Mesorhizobium species M. loti, M. ciceri and M. BNC.; and Bartonella species (i.e. B. henselae, B. clarridgeiae, B. tribocorum, B. quintana, B. bacilliformis, B. grahamii). All these sequences showed significant Infernal E-values (8.93E-40 – 6.12E-36) and bit-scores. The remaining sequences found with the model showed higher E-values (between 3.28E-06 and 4.56E-04) and lower bit-scores, and are encoded by several members of the Xanthobacteraceae family (i.e. A. caulinodans, Sa. novella, X. autotrophicus), Me. silvestris, and Be. indica subsp. indica.
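The separation of hits into close and more distant homologs can be illustrated with a small Python sketch. It assumes the search output has already been parsed into (name, E-value) pairs. The hit names are placeholders, the E-values are simply the range bounds quoted above, and the cutoff of 1e-30 is an arbitrary illustrative value lying between the two reported ranges, not a threshold stated in the original analysis.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str      # placeholder label, not a real accession
    evalue: float  # Infernal E-value


# Illustrative cutoff chosen to fall between the two reported ranges
# (8.93E-40 to 6.12E-36 versus 3.28E-06 to 4.56E-04); this is an assumption.
CLOSE_HOMOLOG_CUTOFF = 1e-30


def split_hits(hits, cutoff=CLOSE_HOMOLOG_CUTOFF):
    """Partition hits into (close, distant) lists by E-value."""
    close = [h for h in hits if h.evalue <= cutoff]
    distant = [h for h in hits if h.evalue > cutoff]
    return close, distant


# Hypothetical hits carrying the boundary E-values quoted in the text.
hits = [
    Hit("hit_A", 8.93e-40),
    Hit("hit_B", 6.12e-36),
    Hit("hit_C", 3.28e-06),
    Hit("hit_D", 4.56e-04),
]

close, distant = split_hits(hits)
print("close:", [h.name for h in close])
print("distant:", [h.name for h in distant])
```

In practice bit-scores would be inspected alongside E-values, as in the manual curation described above.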





Expression information
The expression of Smr45C in S. meliloti 1021 was assessed under different biological conditions, i.e. bacterial growth in TY, minimal medium (MM) and luteolin-MM broth, and in endosymbiotic bacteria (i.e. mature symbiotic alfalfa nodules). The expression of Smr45C in free-living bacteria was found to be growth-dependent, the gene being strongly down-regulated when bacteria entered the stationary phase. Luteolin moderately stimulated the expression of Smr45C (2-fold), but its expression was not detectable in endosymbiotic bacteria.

Recent co-immunoprecipitation experiments corroborate that Smr45C binds the bacterial protein Hfq for efficient target binding.

Promoter Analysis
All the promoter regions of the αr45 family members examined so far are highly conserved in a sequence stretch extending up to 80 bp upstream of the transcription start site of the sRNA. All closest homolog loci have recognizable σ70-dependent promoters showing a -35/-10 consensus motif CTTAGAC-n17-CTATAT, which has previously been shown to be widely conserved among several other genera in the α-subgroup of proteobacteria. To identify binding sites for other known transcription factors, we used the FASTA sequences provided by RegPredict (http://regpredict.lbl.gov/regpredict/help.html) and the position-specific weight matrices (PSWMs) provided by RegulonDB (http://regulondb.ccg.unam.mx). We built a PSWM for each transcription factor from the RegPredict sequences using the Consensus/Patser program, choosing the best final matrix for motif lengths between 14 and 30 bp; a threshold average E-value < 10E-10 was established for each matrix (see "Thresholded consensus" in http://gps-tools2.its.yale.edu). Moreover, we searched for conserved unknown motifs using MEME (http://meme.sdsc.edu/meme4_6_1/intro.html) and applied relaxed regular expressions (i.e. pattern matching) to the promoters of all Smr45C homologs.
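The pattern-matching step can be sketched in Python using the -35/-10 consensus quoted above. This is a minimal illustration, not the procedure actually used in the study: it matches only exact occurrences of CTTAGAC-n17-CTATAT on the given strand, and the test sequence is synthetic rather than a real promoter region.

```python
import re

# -35 box, a 17-nt spacer of any bases, then the -10 box (consensus from the text).
PROMOTER_RE = re.compile(r"CTTAGAC[ACGT]{17}CTATAT")


def find_promoters(seq: str):
    """Return 0-based start positions of exact consensus matches in seq."""
    return [m.start() for m in PROMOTER_RE.finditer(seq.upper())]


# Synthetic example sequence (not a real genomic region).
spacer = "ACGTACGTACGTACGTA"  # 17 nt
toy = "GGG" + "CTTAGAC" + spacer + "CTATAT" + "GGG"

positions = find_promoters(toy)
print(positions)
```

A "relaxed" expression in the sense of the text would tolerate mismatches or variable spacer lengths, for example by replacing fixed bases with character classes or `{17}` with a range such as `{16,18}`.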

This study predicts differences in the regulation of the expression of the αr45 representatives in the different α-proteobacterial species. The Sinorhizobium, Rhizobium, and Agrobacterium groups presented a very well conserved motif that matches the consensus sequence recognized by the maltose repressor MalI. Furthermore, the promoters of the αr45 members of the Sinorhizobium group presented an additional conserved region between positions -60 and -85 (boxed in orange in Figure 4), with significant similarity to the matrix SMb21598_Rhizobiales from RegPredict. This binding site corresponds to a transcriptional regulator of the LacI family. The Rhizobium group also presented a well-conserved motif in this region, for which no significant similarity could be found (marked in green in Figure 4). This analysis also revealed an extended conserved sequence stretch among the promoters of the Brucella and Bartonella αr45 sRNA loci, but no known transcription factor binding sites were recognizable in these motifs.



Genomic Context
All members of the αr45 family are trans-encoded sRNAs transcribed from independent promoters in chromosomal IGRs. Most of the neighboring genes of the seed alignment's members were not annotated and were therefore manually curated. The genomic regions of the αr45 sRNAs from Sinorhizobium, Rhizobium, A. vitis and A. radiobacter exhibited a high degree of conservation, including the upstream and downstream genes, which have been predicted to code for a LysR family transcriptional regulator and an ornithine decarboxylase, respectively (upstream and downstream refer here to the direction of transcription). Partial synteny of the αr45 genomic regions was observed in the Mesorhizobium and Brucella species, where an amidase-coding gene was found instead of the LysR family transcriptional regulator gene. In Bartonella species the αr45 upstream gene was always found to code for a protein containing a rhodanese domain. In the genomic regions of the αr45 homologs in more distantly related α-proteobacteria (e.g. Starkeya, Methylocella or Xanthobacter species), synteny was restricted to the downstream ornithine decarboxylase gene.