E. coli Metabolome Database

The E. coli Metabolome Database (ECMDB) is a freely accessible, online database of small molecule metabolites found in or produced by Escherichia coli (E. coli strain K12, MG1655). Escherichia coli is perhaps the best-studied bacterium on earth and has served as the "model microbe" in microbiology research for more than 60 years. The ECMDB is essentially an E. coli "omics" encyclopedia containing detailed data on the genome, proteome and metabolome of E. coli. ECMDB is part of a suite of organism-specific metabolomics databases that includes DrugBank, HMDB, YMDB and SMPDB. As a metabolomics resource, the ECMDB is designed to facilitate research in the areas of gut/microbiome metabolomics and environmental metabolomics.
The ECMDB contains two kinds of data: 1) chemical data and 2) molecular biology and/or biochemical data. The chemical data includes more than 2700 metabolite structures with detailed metabolite descriptions, along with nearly 5000 NMR, GC-MS and LC-MS spectra corresponding to these metabolites. The biochemical data includes nearly 1600 protein (and DNA) sequences and more than 3100 biochemical reactions that are linked to these metabolite entries. Each metabolite entry in the ECMDB contains more than 80 data fields, with approximately 65% of the information devoted to chemical data and the other 35% devoted to enzymatic or biochemical data. Many data fields are hyperlinked to other databases (KEGG, PubChem, MetaCyc, ChEBI, PDB, UniProt, and GenBank). The ECMDB also has a variety of structure and pathway viewing applets, and it offers a number of text, sequence, spectral, chemical structure and relational query searches. These are described in more detail below.

Accessing the database
The ECMDB's content may be explored or searched using a variety of database-specific tools. The text search box (located at the top of every ECMDB page) allows users to conduct a general text search of the database's textual data, including names, synonyms, numbers and identifiers. The ECMDB employs a software tool called "Elastic Search" that tolerates misspellings and supports fuzzy text matching. Using the pull-down box located on the right side of the text search box, users may select either metabolites or proteins in the "search for" field. In this way it is possible to restrict the search so that it returns only results associated with E. coli metabolites or with E. coli proteins.
The ECMDB has 7 selectable tabs located at the top of every page: 1) Home; 2) Browse; 3) Search; 4) About; 5) Help; 6) Downloads and 7) Contact Us. The ECMDB's browser (accessed via the Browse tab) can be used to browse through the database and to re-sort its contents. Six different browse options are available: 1) Metabolite Browse (Fig. 1); 2) Protein Browse; 3) Reaction Browse (Fig. 2); 4) Pathway Browse (Fig. 3); 5) Class Browse; and 6) Concentration Browse. Selecting a specific Browse option displays the ECMDB's content in a synoptic tabular format, with the ECMDB identifiers, names and other data shown in re-sortable tables. Clicking on an ECMDB MetaboCard or ProteinCard button will bring up the full data content for the corresponding metabolite (Fig. 4) or the corresponding protein.
The ECMDB also offers a number of Search options listed under the Search link. These include: 1) Chem Query; 2) Text Query; 3) Sequence Search; 4) Data Extractor; and 4 other MS or NMR spectral search tools. The Chem Query option allows users to sketch or to type (via a SMILES string) a chemical compound and to search the ECMDB for metabolites similar or identical to the query compound.
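The fuzzy text matching described above can be illustrated with a minimal Python sketch. It does not use Elastic Search or any ECMDB interface; it substitutes the standard-library difflib module, and the small name list and the fuzzy_search helper are hypothetical stand-ins for the database's name and synonym index.

```python
import difflib

# Hypothetical mini-index of metabolite names; a stand-in for the
# database's real name/synonym index, not data pulled from ECMDB.
NAMES = ["D-Glucose", "Pyruvic acid", "L-Alanine", "Citric acid", "Succinic acid"]


def fuzzy_search(query, names=NAMES, limit=3, cutoff=0.6):
    """Return up to `limit` names that approximately match `query`.

    Matching is case-insensitive and ranked by difflib's similarity
    ratio, best match first. `cutoff` is the minimum ratio (0..1).
    """
    # Map lowercase forms back to the original spelling.
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


if __name__ == "__main__":
    # A misspelled query still ranks the intended name first.
    print(fuzzy_search("piruvic acid"))
```

Here a one-letter misspelling ("piruvic acid") still returns "Pyruvic acid" as the top hit, which is the behaviour a misspelling-tolerant text search is meant to provide. A production search engine would use an inverted index and edit-distance queries rather than comparing the query against every name.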
The Sequence Search can be used to perform BLAST (protein) sequence searches against all the protein sequences contained in ECMDB. Single and multiple sequence (i.e. whole proteome) BLAST queries are supported through this search tool. It is also possible to perform detailed spectral searches of ECMDB's reference compound NMR and MS spectral data through the ECMDB's MS, MS/MS, GC/MS and NMR Spectra Search links. These tools are intended to support the identification and characterization of bacterial (mainly E. coli) metabolites using NMR spectroscopy, GC-MS and LC-MS. The ECMDB also contains a large number of statistical tables, with detailed information not only about its content but also about E. coli in general. In particular, under the "About" tab, a section called "E. coli numbers and stats" contains hundreds of interesting factoids about E. coli and E. coli physiology. Many components of the ECMDB are fully downloadable, including most of the textual data, chemical structures and sequence data. These may be retrieved by clicking on the Download button, scrolling through the different files and selecting the appropriate hyperlinks.
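As an illustration of how a mass-based MS search of the kind mentioned above can work in principle, the following Python sketch matches an observed m/z value against reference monoisotopic masses within a ppm tolerance. It is an assumption-laden example, not ECMDB's implementation: the reference table, the match_mz helper, the default 10 ppm window and the choice of a protonated [M+H]+ adduct are all hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass shift assumed for an [M+H]+ ion

# Monoisotopic masses (Da) of a few common metabolites. This small
# table is a hypothetical stand-in for a reference spectral library.
REFERENCE_MASSES = {
    "D-Glucose": 180.06339,
    "Pyruvic acid": 88.01604,
    "L-Alanine": 89.04768,
    "Citric acid": 192.02700,
    "Succinic acid": 118.02661,
}


def match_mz(mz, tolerance_ppm=10.0, adduct_shift=PROTON_MASS):
    """Return (name, error_ppm) pairs whose mass matches the observed m/z.

    The adduct shift is subtracted to estimate the neutral mass, and a
    candidate is kept when its relative mass error is within
    `tolerance_ppm`. Results are sorted by absolute error, best first.
    """
    neutral_mass = mz - adduct_shift
    hits = []
    for name, mass in REFERENCE_MASSES.items():
        error_ppm = (neutral_mass - mass) / mass * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, round(error_ppm, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # An ion observed near m/z 181.0707 is consistent with protonated glucose.
    print(match_mz(181.0707))
```

Mass matching alone cannot distinguish isomers that share a molecular formula, which is one reason MS/MS and NMR spectral comparisons are offered alongside simple mass searches.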

Scope and access
All data in ECMDB is non-proprietary or is derived from a non-proprietary source. It is freely accessible and available to anyone. In addition, nearly every data item is fully traceable and explicitly referenced to the original source. ECMDB data is available through a public web interface and downloads.