Rat Genome Database

The Rat Genome Database (RGD) is a database of rat genomics, genetics, physiology and functional data, as well as data for comparative genomics between rat, human and mouse. RGD is responsible for attaching biological information to the rat genome via structured vocabulary, or ontology, annotations assigned to genes and quantitative trait loci (QTL), and for consolidating rat strain data and making it available to the research community. RGD is also developing a suite of tools for mining and analyzing genomic, physiologic and functional data for the rat, and comparative data for rat, mouse, human, and five other species.

RGD began as a collaborative effort between research institutions involved in rat genetic and genomic research. Its goal, as stated in the National Institutes of Health’s Request for Grant Application: HL-99-013, is the establishment of a Rat Genome Database to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and make this data widely available to the scientific community. A secondary, but critical goal is to provide curation of mapped positions for quantitative trait loci, known mutations and other phenotypic data.

The rat continues to be extensively used by researchers as a model organism for investigating pharmacology, toxicology, general physiology and the biology and pathophysiology of disease. In recent years, there has been a rapid increase in rat genetic and genomic data. RGD has also become a central source of information on the rat for researchers and now covers not just genetics and genomics, but physiology and molecular biology as well. Tools and data pages, curated by RGD staff, are available for all of these fields.

Data
RGD’s data consists of manual annotations from RGD researchers as well as annotations imported from a variety of sources. RGD also exports its own annotations to share with others.

RGD's Data page lists eight types of data stored in the database: Genes, QTLs, Markers, Maps, Strains, Ontologies, Sequences and References. Of these, six are actively used and regularly updated. The RGD Maps datatype refers to legacy genetic and radiation hybrid maps. This data has been largely supplanted by the rat whole genome sequence. The Sequences data type is not a full list of genomic, transcript or protein sequences, but rather mostly contains PCR primer sequences which define simple sequence length polymorphism (SSLP) and expressed sequence tag (EST) Markers. Such sequences are useful primarily for researchers still using these markers for genotyping their animals and for distinguishing between markers of the same name. The major data types in RGD are as follows:


 * Genes: Initial gene records are imported and updated from the National Center for Biotechnology Information's (NCBI's) Gene database on a weekly basis. Data imported during this process includes the Gene ID, Genbank/RefSeq nucleotide and protein sequence identifiers, HomoloGene group IDs and Ensembl Gene, Transcript and Protein IDs. Additional protein-related data is imported from the UniProtKB database. RGD curators review the literature and manually curate Gene Ontology (GO), diseases, phenotypes and pathways for rat genes, diseases and pathways for mouse genes, and diseases, phenotypes and pathways for human genes.  In addition, the site imports GO annotations for mouse and human genes from the GO Consortium, rat electronic annotations from UniProt and mouse phenotype annotations from the Mouse Genome Database/Mouse Genome Informatics (MGD/MGI).
 * QTLs: RGD's staff manually curates data for rat and human QTLs from the literature where such publications exist or from records directly submitted by researchers.  Mouse QTL records, including Mammalian Phenotype (MP) ontology assignments, are imported directly from MGI. For rat and human QTLs, curation includes assigning MP, HP, and disease ontology annotations.  QTL positions are automatically assigned based on the genomic positions of peak and/or flanking markers or single nucleotide polymorphisms (SNPs). QTL records link to information about related strains, candidate genes, associated markers and related QTLs.
 * Strains: Like QTL records, RGD strain records are either manually curated from the literature or submitted by researchers. Strain records include information about the official symbol of the strain, origin and availability of the strain, associated phenotypes, whether the strain is a model for a human disease, and any information that is available about breeding, behavior, husbandry, etc. Strain records link to information about related genes, alleles, and QTLs, associated strains (e.g. parental strains or substrains) and, where available, strain-specific damaging nucleotide variants. For congenic and mutant strains, genomic positions are assigned for the introgressed region (congenic strains) or the location of the mutated sequence (mutant strains).
 * Markers: Because genetic markers such as SSLPs and ESTs have been, and continue to be, used for QTLs and strains, RGD stores marker data for rat, human and mouse. Marker data includes the sequences of the associated forward and reverse PCR primers, genomic positions and links to NCBI's Probe database.  Marker records link to associated QTL, strain and gene records.
 * Cell lines:  RGD stores cell line records based on imports from Cellosaurus. Although the largest numbers of these are human and mouse cell lines, records are also available for rat, bonobo, dog, squirrel, pig, green monkey and naked mole-rat.
 * Ontologies: In order to make RGD's data both human readable and available for computational analysis and retrieval, RGD relies on the use of multiple ontologies.  As of July 2021, RGD used 19 different ontologies to express the various types of data applicable to RGD's diverse datatypes. Ontology annotations are assigned manually by curators  or are imported from external sources through the use of automated pipelines. Six of the ontologies in use at RGD were created or co-created at RGD and seven are under development by RGD staff members or collaborators, these being ontologies for Pathway (PW), Rat Strains (RS), Vertebrate Traits (VT), Disease (RDO), Clinical Measurements (CMO), Measurement Methods (MMO) and Experimental Conditions (XCO). Ontologies which are imported from outside sources are updated weekly.
 * References: RGD references are scientific publications and resources that have been used for curation of information into the database, and are sources for data objects such as QTLs and strains. For references accessed via NCBI's PubMed, imported data includes the title, authors, citation and PubMed ID, and an RGD ID is generated. In some cases, references are generated as internal records, such as bulk uploads from automated pipelines or personal communications with data sources. These additional references give RGD users an identification of the source of particular pieces and types of data for which PubMed records are not available. Both types of reference records provide links to all of the data curated from that article or source, including genes, QTLs, strains, disease and other ontology annotations. The resources curated for information can be retrieved from the database using the reference search page or links on an object page. Uncurated references are also available, which are known to contain relevant data but have not yet been manually reviewed. These are found as PubMed links listed in the ‘References – uncurated’ section of an object report (e.g. a gene report).
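The QTLs entry above says that QTL positions are assigned automatically from the genomic positions of peak and/or flanking markers. The exact rule RGD applies is not given here, so the following is only a minimal sketch under stated assumptions: the region is taken as the span covering all supplied markers, and the `MarkerPosition` class, the `assign_qtl_region` function and the `peak_only_padding` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MarkerPosition:
    """Genomic position of one marker or SNP (hypothetical record layout)."""
    symbol: str
    chromosome: str
    start: int
    stop: int


def assign_qtl_region(markers: Iterable[MarkerPosition],
                      peak_only_padding: int = 0) -> Tuple[str, int, int]:
    """Derive a QTL region from its peak and/or flanking markers.

    Assumption: the region is the smallest interval covering every marker.
    If only one marker (e.g. the peak) is known, the interval is widened by
    `peak_only_padding` bases on each side; this padding is illustrative only.
    """
    markers = list(markers)
    if not markers:
        raise ValueError("at least one positioned marker is required")

    chromosomes = {m.chromosome for m in markers}
    if len(chromosomes) != 1:
        raise ValueError("markers must lie on a single chromosome")

    start = min(m.start for m in markers)
    stop = max(m.stop for m in markers)

    if len(markers) == 1:
        start = max(1, start - peak_only_padding)
        stop = stop + peak_only_padding

    return chromosomes.pop(), start, stop


# Example with made-up marker names and coordinates.
flank_left = MarkerPosition("MarkerA", "1", 1000, 1200)
peak = MarkerPosition("MarkerB", "1", 3000, 3100)
flank_right = MarkerPosition("MarkerC", "1", 5000, 5150)

region = assign_qtl_region([flank_left, peak, flank_right])
peak_only = assign_qtl_region([peak], peak_only_padding=500)
```

Rejecting markers on different chromosomes is a design choice of this sketch; a production pipeline might instead flag such records for curator review.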

Genome tools
RGD's Genome tools include both software tools developed at RGD and tools from third party sources.

Genome tools developed at RGD
RGD develops web-based tools designed to use the data stored in the RGD database for analyses in rat and across species. These include:
 * OntoMate: OntoMate is an ontology-driven, concept-based literature search engine that has been developed by RGD as an alternative for the basic PubMed search engine in the gene curation workflow. Converting data from free text in the scientific literature to a structured searchable format is one of the main tasks of all model organism databases. OntoMate tags abstracts with gene names, gene mutations, organism names, disease, and other terms from the ontologies/vocabularies used at RGD. All terms/ entities tagged to an abstract are listed with the abstract in the search results. OntoMate also provides user-activated filters for species, date and other parameters relevant to the literature search, which has streamlined the process compared to using PubMed. Besides its usefulness for RGD internal curation processes, the tool is available to all RGD users.
 * Gene Annotator: The Gene Annotator or GA tool takes as input a list of gene symbols, RGD IDs, GenBank accession numbers, Ensembl identifiers, or a chromosomal region and retrieves gene orthologs, external database identifiers and ontology annotations for the corresponding genes in RGD. The data can be downloaded into an Excel spreadsheet or analyzed in the tool. The Annotation Distribution function displays a list of terms in each of seven categories with the percentage of genes from the input list with annotations to each term. The Comparison Heat Map function allows comparisons of annotations for genes in the input list across two ontologies or across two branches of the same ontology.
 * Variant Visualizer: Variant Visualizer (VV) is a viewing and analysis tool for rat strain-specific sequence polymorphisms. VV takes as input a list of gene symbols or a genomic region as defined by chromosome, start and stop positions or by two gene or marker symbols. The user must also select their strains of interest from a list of strains for which whole genome sequences exist and can set parameters for the variants in the result set. Output is a heatmap-type display of variants. Additional information for individual variants can be viewed in a detail pane display.
 * Multi-Ontology Enrichment Tool (MOET): MOET is a web-based ontology analysis tool used to identify terms from any or all of the ontologies used by RGD for gene curation (Disease, Pathway, Phenotype, GO, ChEBI) that are over-represented in the annotations for a user's list of genes, or for their orthologs in other species. It outputs a downloadable graph and a list of terms that are statistically over-represented in the user’s list of genes, calculated using the hypergeometric distribution. MOET also displays the corresponding Bonferroni correction and odds ratio on the results page.
 * Gene Ortholog Location Finder (GOLF): GOLF is used to compare genes or positions within regions of interest across RGD species or assemblies. Results are displayed with the corresponding genes/positions in both species or on both assemblies in a side by side tabular view. Inputs and outputs to GOLF can be exported to other RGD tools for analysis or downloaded using the links on the GOLF results page.
 * InterViewer: InterViewer is a protein-protein interaction viewer that displays the types of interactions and links to the associated genes for the user’s input.
 * PhenoMiner: PhenoMiner combines phenotypic data from different rat strains, so researchers can use filters to find the quantitative phenotypic data they are looking for.
 * OLGA - Object List Generator & Analyzer: OLGA is a search engine designed to allow users to run multiple queries, generate a list of objects from each query and flexibly combine the results using Boolean specifications. OLGA takes as input either a list of object symbols or search parameters based on ontology annotations or position. The final list of genes, QTLs or strains can be downloaded or submitted to the GA Tool, the Variant Visualizer, the Genome Viewer or other RGD tools.
 * Genome Viewer: The Genome Viewer (GViewer) tool provides users with complete genome views of genes, QTLs and mapped strains annotated to a function, biological process, cellular component, phenotype, disease, pathway, or chemical interaction. GViewer allows Boolean searches across multiple ontologies. Output is displayed against a karyotype of the rat genome.
 * Overgo Probe Designer: Overgo probes are pairs of partially overlapping 22mer oligonucleotides derived from repeat-masked genomic sequence and used as high specific activity probes for genome mapping. The Overgo Probe Designer tool takes as input a nucleotide sequence and outputs a list of optimized probe sequences containing the requisite 8 nucleotide overlap on their 3' ends.
 * ACP Haplotyper: The ACP Haplotyper creates a visual haplotype that can be used to identify conserved and non-conserved chromosomal regions between any of the 48 rat strains characterized as part of the ACP project. For the selected chromosome and between the selected strains, the tool compares the allele size data for microsatellite markers on the selected genetic or RH map.
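The MOET entry above describes an over-representation test based on the hypergeometric distribution, reported together with a Bonferroni correction and an odds ratio. The sketch below shows one common way to compute these three quantities; it is not MOET's actual code. The function names, the choice of background gene set and the toy gene identifiers are all assumptions made for illustration.

```python
from math import comb
from typing import Dict, Set


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k annotated genes when n genes
    are drawn from a background of N genes, K of which carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def odds_ratio(k: int, n: int, K: int, N: int) -> float:
    """Odds ratio from the 2x2 table (in list / not in list) x (annotated / not)."""
    a = k              # in list, annotated
    b = n - k          # in list, not annotated
    c = K - k          # not in list, annotated
    d = N - K - n + k  # not in list, not annotated
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def enrich(gene_list: Set[str],
           annotations: Dict[str, Set[str]],
           background: Set[str]) -> Dict[str, dict]:
    """Test every term; Bonferroni multiplies each p-value by the number of
    terms tested and caps the result at 1."""
    genes = gene_list & background
    n, N = len(genes), len(background)
    results = {}
    for term, annotated in annotations.items():
        annotated = annotated & background
        K = len(annotated)
        k = len(genes & annotated)
        p = hypergeom_pvalue(k, n, K, N)
        results[term] = {
            "p": p,
            "bonferroni": min(1.0, p * len(annotations)),
            "odds_ratio": odds_ratio(k, n, K, N),
        }
    return results


# Toy data: 20 background genes, two hypothetical terms.
background = {f"g{i}" for i in range(20)}
annotations = {
    "T1": {f"g{i}" for i in range(5)},       # g0..g4
    "T2": {f"g{i}" for i in range(10, 20)},  # g10..g19
}
results = enrich({"g0", "g1", "g2", "g10"}, annotations, background)
```

In this toy run, three of the four input genes carry term T1, which only five of the twenty background genes carry, so T1 receives a small p-value, while T2 does not.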

Third party genome tools adapted for use with RGD data
RGD offers several third party software tools that have been adapted for use on the website utilizing data stored in the RGD database. These include:
 * JBrowse: JBrowse is a free, interactive, and database-specific data analysis tool. The software was created and is currently maintained by the Generic Model Organism Database project. Genetic and phenotypic data types, including fundamental datasets and gene-chemical interaction data, and their relationship to the genomic sequence can be accessed through JBrowse.
 * RatMine: RatMine is a rat-centric version of the InterMine software.  It enables users to mine and analyze rat data from diverse databases including RGD, NCBI, UniProtKB and Ensembl in a single location using a consistent format. The InterMine platform has been adapted for multiple species in other databases and is designed to be interoperable between instances so that users can query across species from the RatMine interface.

Phenotypes and Models Portal
RGD's Phenotypes and Models portal focuses on strains, phenotypes and the rat as a model organism for physiology and disease.
 * Genetic Models: This section lists all genomically modified rats (mutant and transgenic strains) in a table for quick access by affected gene, background strain and other available information. It also contains GERRC strains, the genome-modified rats created through the Gene Editing Rat Resource Center.
 * Autism Models: Laboratory rats are the animal of choice in neurobiology. The Medical College of Wisconsin has been working with the Simons Foundation Autism Research Initiative (SFARI) to generate and distribute engineered rat models of autism.
 * PhenoMiner (Quantitative Models): PhenoMiner is a database and web application for finding and analyzing quantitative rat phenotype data.  Data is annotated to ontologies for rat strain, clinical measurement, measurement method, and experimental condition.  Experiments are categorized by the trait or disease assessed by the measurement.  The use of standardized vocabularies and data formats allows comparison of values across experiments for the same measurement.  The PhenoMiner results page includes a graph of the measurement values and a downloadable table of the values with their accompanying metadata.  A link is provided to give users the opportunity to submit their own data to the database.
 * Expected Ranges (Quantitative Models): Expected Ranges is a statistical meta-analysis database where quantitative phenotype values from PhenoMiner are used to calculate the “expected range” of a measured phenotype for a strain group across different studies. These expected ranges can be stratified by sex, age and experimental conditions if there are enough data points.
 * Phenotypes: The Phenotypes section contains a large body of data from the PhysGen Program for Genomic Applications project, an NHLBI-funded project to "develop consomic and knockout rat strains, phenotypically characterize these strains, and provide these resources to the scientific community."  Data categories include measurements of cardiovascular, renal and respiratory function, blood chemistry, body morphology and behavior.  Links are also provided to protocols for phenotyping rats and to similar high-throughput phenotyping data at the National BioResource Project for the Rat in Japan (NBRP-Rat).
 * Phenotypic Models and Genomic Resources for Additional Species: In addition to rat, mouse, and human data, RGD provides integrated access to genomic, and in some cases phenotypic, information for additional mammalian species. These other species are important research models for disease, physiology and phenotypes.
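The Expected Ranges entry above describes pooling quantitative phenotype values across studies for a strain group, with stratification only where there are enough data points. The statistical method RGD uses is not described here, so the sketch below substitutes a deliberately simple stand-in: values are grouped by strain group and sex, and the range is the minimum and maximum of the pooled values. The record layout, the `expected_ranges` function and the `min_records` threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def expected_ranges(records: Iterable[dict],
                    min_records: int = 3) -> Dict[Tuple[str, str], dict]:
    """Group measurements by (strain group, sex) and summarize each group.

    Strata with fewer than `min_records` values are skipped, mirroring the
    "enough data points" condition; the threshold itself is an assumption.
    """
    groups = defaultdict(list)
    for record in records:
        groups[(record["strain_group"], record["sex"])].append(record["value"])

    summary = {}
    for key, values in groups.items():
        if len(values) < min_records:
            continue  # too few data points to report a range for this stratum
        summary[key] = {
            "n": len(values),
            "low": min(values),
            "high": max(values),
            "mean": mean(values),
        }
    return summary


# Made-up study-level values for one hypothetical measurement.
records = [
    {"strain_group": "StrainA", "sex": "male", "value": 100.0},
    {"strain_group": "StrainA", "sex": "male", "value": 110.0},
    {"strain_group": "StrainA", "sex": "male", "value": 120.0},
    {"strain_group": "StrainA", "sex": "female", "value": 95.0},
    {"strain_group": "StrainA", "sex": "female", "value": 105.0},
]
ranges = expected_ranges(records)
```

A real meta-analysis would typically weight studies by sample size and variance; the min/max summary here only illustrates the grouping and thresholding steps.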

Disease Portals
Disease Portals consolidate the data in RGD for a specific disease category and present it in a single group of pages. Genes, QTLs and strains annotated to any disease in the category are listed, with genome-wide views of their locations in rat, human and mouse (see Genome Viewer in Genome tools developed at RGD). Additional sections of the portal display data for phenotypes, biological processes and pathways related to the disease category. Pages are also supplied to give users access to information about rat strains used as models for one or more diseases in the category, tools that could be used to analyze the data and additional resources related to the disease category. Further, access to the RGD's Multi-Ontology Enrichment Tool (MOET) is available at the bottom of the individual disease portals.

As of May 2021, RGD has fifteen disease portals:
 * Aging & Age-Related Disease
 * Cancer
 * Cardiovascular Disease
 * COVID-19
 * Developmental Disease
 * Diabetes
 * Hematologic Disease
 * Immune and Inflammatory Disease
 * Infectious Disease
 * Liver Disease
 * Neurological Disease
 * Obesity and Metabolic Syndrome
 * Renal Disease
 * Respiratory Disease
 * Sensory Organ Disease

Pathways
RGD's Pathway resources include a Pathway Ontology of pathway terms (developed and maintained at RGD, encompassing not only metabolic pathways but also disease, drug, regulatory and signaling pathways), as well as interactive diagrams of the components and interactions of selected pathways. Included on the diagram pages are a description, lists of pathway gene members and additional elements, tables of disease, pathway and phenotype annotations made to pathway member genes, associated references and an ontology path diagram. Pathway Suites and Suite Networks, i.e. groupings of related pathways which all contribute to a larger process such as glucose homeostasis or gene expression regulation are presented, as well as Physiological Pathway diagrams which display networks of organs, tissues, cells and molecular pathways at the whole animal or systems level.

Knockouts
Until recently, direct, specific genomic manipulations in the rat were not possible. However, with the rise of technologies such as zinc finger nuclease- and CRISPR-based mutagenesis techniques, that is no longer the case. Groups producing rat gene knockouts and other types of genetically modified rats include the Human and Molecular Genetics Center at the Medical College of Wisconsin (MCW). RGD links to information about the rat strains produced in these studies via pages about the PhysGen Knockout (PhysGenKO) project and the MCW Gene Editing Rat Resource Center (GERRC), accessed from RGD page headers. Funding for both the PhysGenKO project and the GERRC came from the National Heart, Lung, and Blood Institute (NHLBI). The stated goal of both projects is to produce rats with alterations in one or more specific genes related to the mission of the NHLBI. Genes were nominated by rat researchers, and nominations were adjudicated by an External Advisory Board. In the case of the PhysGenKO project, many of the rats produced by the group were phenotyped using a standardized high-throughput phenotyping protocol, and the data is available in RGD's PhenoMiner tool.

Community outreach and education
RGD reaches out to the rat research community in a variety of ways including an email forum, a news page, a Facebook page, a Twitter account, and regular attendance and presentations at scientific meetings and conferences. Additional educational activities include the production of tutorial videos, both outlining how to use RGD tools and data, and on more general topics such as biomedical ontologies and biological (i.e. gene, QTL and strain) nomenclature. These videos can be viewed on several online video hosting sites including YouTube.

Funding
RGD is funded by grant R01HL64541 from the National Heart, Lung, and Blood Institute (NHLBI) on behalf of the National Institutes of Health (NIH). The principal investigator of the grant is Anne E. Kwitek, who succeeded Mary E. Shimoyama in this leadership position in March 2020. Melinda R. Dwinell is Co-Investigator.

New Genome Assembly
The new rat genome assembly, mRatBN7.2, was generated by the Darwin Tree of Life Project at the Wellcome Sanger Institute and has been accepted into the Genome Reference Consortium. mRatBN7.2 was derived from a male BN/NHsdMcwi rat that is a direct descendant of the female BN rat previously sequenced. The new BN rat reference genome was created using a variety of technologies including PacBio long reads, 10X linked reads, Bionano maps and Arima Hi-C. Its contiguity is similar to that of the human and mouse reference assemblies. It is available at NCBI’s GenBank and at RefSeq, and it will be made the primary assembly at RGD in the near future.