RNA-binding protein database

The RNA-binding Proteins Database (RBPDB) is a biological database of RNA-binding protein specificities that includes experimental observations of RNA-binding sites. The experimental results, both in vitro and in vivo, are taken from the primary literature. The database covers four metazoan species: Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The RNA-binding domains it includes are the RNA recognition motif, the K homology domain, the CCCH zinc finger, and others. The latest RBPDB release (v1.3, September 2012) includes 1,171 RNA-binding proteins.

Background information about RNA-binding proteins
Transcription and translation are organized differently in prokaryotes and eukaryotes. Unlike in prokaryotes, the two processes are spatially separated in eukaryotes: transcription takes place in the nucleus and translation in the cytoplasm. Eukaryotes therefore process pre-mRNA through post-transcriptional modification, which includes splicing, editing and polyadenylation. RNA-binding proteins (RBPs) play a critical role in this processing. RBPs bind RNA with differing specificities and affinities. Each RBP contains at least one RNA-binding domain, and most contain several. The well-characterized domains are the RNA-binding domain (RBD, also known as the RNP domain or RNA recognition motif, RRM), the K-homology (KH) domain (type I and type II), the RGG (Arg-Gly-Gly) box, the Sm domain, the DEAD/DEAH box, the zinc finger (ZnF, mostly C-x8-C-x5-C-x3-H), the double-stranded RNA-binding domain (dsRBD), the cold-shock domain, the Pumilio/FBF (PUF or Pum-HD) domain, and the Piwi/Argonaute/Zwille (PAZ) domain.

RBPs are built from multiple binding domains, and these domains are made up of a few basic modular units. With several motifs, an RBP can recognize a much longer stretch of nucleic acid than a single motif can. RBPs bind RNA through weak interactions, and the additional motifs greatly enlarge the surface available for these interactions. As a result, a multi-domain RBP can bind RNA with higher specificity and affinity than a single domain. RBPDB has three main domain categories: the RNA recognition motif (RRM), the K-homology domain (KH domain) and zinc fingers.

RNA-binding protein domains
Lunde and colleagues describe the different types of RNA-binding protein motifs and their specific functions.

RNA recognition motif (RRM)
The RNA recognition motif (RRM) contains about 80–90 amino acids that form a four-stranded anti-parallel β-sheet with two helices (βαββαβ topology). The β-sheet plays a critical role in RNA recognition. Usually, three conserved residues on the β-sheet are central to this recognition: an Arg or Lys residue forms a salt bridge to the phosphodiester backbone, and two aromatic residues make stacking interactions with the nucleobases. Each of the four β-strands recognizes one nucleotide. However, with exposed loops and additional secondary structure, an RRM can recognize up to 8 nucleotides.

K-homology domain (KH domain)
The K-homology domain (KH domain) was first identified in human heterogeneous nuclear ribonucleoprotein (hnRNP) K, which is why binding domains of this family are called K-homology domains. The domain binds both ssDNA and ssRNA and is found in eukaryotes, eubacteria and archaea. It contains about 70 amino acids, and its signature sequence is (I/L/V)IGXXGXX(I/L/V). All KH domains contain a three-stranded β-sheet and three α-helices. There are two subfamilies: the type I KH domain (βααββα topology) and the type II KH domain (αββααβ topology). In both, the GXXG loop, the flanking helices, the β-strand and the variable loop between β2 and β3 (type I) or between α2 and β2 (type II) play an important role in recognizing RNA.
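
The signature sequence (I/L/V)IGXXGXX(I/L/V) can be read as a simple pattern over a protein sequence, with X standing for any residue. The following Python sketch is for illustration only: the function name find_kh_signatures and the example sequence are hypothetical, and a match to the signature suggests, but does not establish, the presence of a KH domain.

```python
import re

# Signature from the text: (I/L/V) I G X X G X X (I/L/V), X = any residue.
# The lookahead lets overlapping matches be reported as well.
KH_SIGNATURE = re.compile(r"(?=([ILV]IG[A-Z]{2}G[A-Z]{2}[ILV]))")


def find_kh_signatures(protein_seq):
    """Return (0-based start, matched segment) for each signature match."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in KH_SIGNATURE.finditer(seq)]


# Invented sequence, used only to show the call.
example = "MKTAVIGKGGKNIAAE"
print(find_kh_signatures(example))
```

A regular expression is used here because the signature has fixed length and fixed positions; profile-based methods would be needed to detect divergent KH domains.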

Zinc fingers
Zinc fingers are domains that contain zinc-coordinating residues. There are three main types: Cys2His2 (CCHH), CCCH and CCHC. Generally, several repeats of the domain work together in a protein. When a CCHH zinc finger binds DNA, residues in its recognition α-helix form hydrogen bonds to Watson–Crick base pairs in the major groove. When it binds RNA, the same residues may still be used for recognition. Zinc fingers may distinguish the two types of nucleic acid through distinct structural arrangements of the domain. CCCH and CCHC zinc fingers bind AU-rich RNA elements. Unlike in CCHH zinc fingers, the shape of the protein is the primary determinant of specificity.

Sequence preference of RNA-binding proteins
Ray, Kazan and colleagues address the question of the sequence preference of RBPs. In their assay, a single RBP is incubated with a vast molar excess of a complex pool of RNAs. The protein is recovered by affinity selection, and the associated RNAs are interrogated by microarray and computational analyses. Their results show that RNA-binding proteins have sequence preferences and that identical or closely related RBPs bind similar RNA sequences.

Use
As of release v1.3, the RNA-binding protein database (RBPDB) contains 1,171 RNA-binding proteins from Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. Proteins can be searched by domain or by species. Either search leads to a detailed list of proteins that includes gene symbol, annotation ID, synonyms, gene description, species, RNA-binding domain, number of experiments and homologs. The link on the number of experiments leads to the research articles related to the protein. Users can also search for experiments related to a specific RNA-binding sequence. In addition, the site can help users predict the binding sites in a sequence.
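
Binding-site prediction of this kind is commonly done by sliding a position weight matrix (PWM) along the query sequence and reporting windows that score above a threshold. The Python sketch below illustrates that general idea only; it is not RBPDB's implementation, and the matrix values, the threshold and the names log_odds and scan are invented for the example.

```python
import math

# Hypothetical position frequency matrix for a 4-nt motif with consensus UGUA.
# The probabilities are invented for illustration, not taken from RBPDB.
PFM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05},
]


def log_odds(pfm, background=0.25):
    """Convert base frequencies to log2 odds against a uniform background."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in pfm]


def scan(rna, pwm, threshold):
    """Return (0-based start, window, score) for windows scoring >= threshold."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    width = len(pwm)
    hits = []
    for i in range(len(rna) - width + 1):
        window = rna[i:i + width]
        if any(base not in "ACGU" for base in window):
            continue  # skip windows with ambiguous bases
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


pwm = log_odds(PFM)
for start, window, score in scan("AAUGUACC", pwm, threshold=5.0):
    print(start, window, round(score, 2))
```

The threshold trades sensitivity for specificity: a lower value reports more candidate sites but also more chance matches.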