Wikipedia:WikiProject Molecular Biology/Style guide (gene and protein articles)

This is a guideline for the structure of gene and protein articles on Wikipedia. It contains the article naming conventions and the generally recommended outline of an article, as well as useful information for bringing an article to good article or featured article status.

General considerations
The scope of a gene/protein article is the human gene/protein (including all splice variants derived from that gene) as well as orthologs (as listed in HomoloGene) that exist in other species. If there are paralogs in humans (and by extension other species), then a gene family article (see, for example, dopamine receptor) would be appropriate in addition to the gene-specific articles.

In general, do not hype a study by listing the names, credentials, institutions, or other "qualifications" of its authors. Wikipedia is not a press release. Article prose should focus on what a cited study says about the structure, function, clinical significance, etc. of the gene or protein, not what the gene or protein says about a particular study or the research group that conducted it. Particularly notable contributions, along with who made them, should however be mentioned in the discovery/history section.

Article name
If it is relatively short, the recommended UniProt protein name should be used as the article name. If the protein name is verbose, use either a widely used protein acronym or the official HUGO gene symbol, followed by "(gene)" if necessary to disambiguate. UniProt names generally follow the IUBMB recommendations.

If the article is about a viral protein, it is recommended to include the taxon in the title, as "nonstructural protein 2" and "viral protease" can mean many things. A parenthesized term added to disambiguate common symbols does not constitute unnecessary disambiguation, even when the article is the only one with such a name.
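
The naming rule above can be sketched as a small decision function. This is only an illustration: the guideline does not define "relatively short", so the `max_words` cutoff is a hypothetical choice, and appending "(gene)" only to the HUGO symbol is one reading of the rule. The function and parameter names are invented for this sketch.

```python
def choose_article_title(uniprot_name, hugo_symbol, acronym=None,
                         needs_disambiguation=False, max_words=4):
    """Pick an article title following the naming rule sketched above.

    max_words is a hypothetical stand-in for "relatively short";
    the guideline itself gives no number.
    """
    # Prefer the recommended UniProt name when it is short enough.
    if len(uniprot_name.split()) <= max_words:
        return uniprot_name
    # Otherwise fall back to a widely used acronym, if one was supplied.
    if acronym:
        return acronym
    # Last resort: the official HUGO gene symbol, disambiguated if needed.
    title = hugo_symbol
    if needs_disambiguation:
        title += " (gene)"
    return title


long_name = "Peroxisome proliferator-activated receptor gamma coactivator 1-alpha"
print(choose_article_title("Protein C", "PROC"))
print(choose_article_title(long_name, "PPARGC1A", needs_disambiguation=True))
```

Whether a name counts as verbose, and whether an acronym is "widely used", remain editorial judgments that a function like this cannot make.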

Gene nomenclature
Gene abbreviations follow the HUGO Gene Nomenclature Committee and are written in italics (the full names are also written in italics). It is recommended that abbreviations be used instead of the full name. Human gene names are written in capitals, for example ALDOA, INS, etc. For orthologs of human genes in other species, only the initial letter is capitalised, for example mouse Aldoa, bovine Ins, etc.

The following usages of gene symbols are recommended:
 * "the ALDOA gene is regulated...",
 * "the rat gene for Aldoa is regulated..." or
 * "ALDOA is regulated...",

while the following is not recommended:
 * "the gene ALDOA is regulated" since it is redundant.
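
As a rough illustration of the capitalisation and italics conventions in this section, the helper below formats a symbol as wikitext. It applies only the rule as stated here (capitals for human, initial capital for other species); the function name is hypothetical, and individual species nomenclature bodies may have conventions this sketch does not cover.

```python
def format_gene_symbol(symbol, species="human"):
    """Return a gene symbol in wikitext italics with species-appropriate case."""
    if species == "human":
        # Human gene symbols are written in capitals.
        cased = symbol.upper()
    else:
        # Orthologs in other species: only the initial letter is capitalised.
        cased = symbol[:1].upper() + symbol[1:].lower()
    # Two apostrophes on each side produce italics in wikitext.
    return f"''{cased}''"


print(format_gene_symbol("ALDOA"))           # human form
print(format_gene_symbol("ALDOA", "mouse"))  # ortholog form
```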

Images and diagrams
Where possible, diagrams should keep to a standard format. If the diagram guide does not give sufficient guidance on the style for the images in an article, consider suggesting expansions to the standardised formatting.

Infoboxes
One or more of the following infoboxes as appropriate should be included at the top of each article:

If there is only one human paralog assigned to a given EC number (the ExPASy database maintains EC number to protein mappings), then in addition to a protein infobox, it may be appropriate to also add the corresponding enzyme infobox. Likewise, if there is only one human paralog that has been assigned to a Pfam family, then including a protein family infobox may also be appropriate.
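
The one-paralog rule above might be expressed as follows. This is a sketch under stated assumptions: the paralog counts are supplied by the caller (looking them up in ExPASy or Pfam is out of scope), the returned strings are descriptive labels rather than actual template names, and the function name is invented for illustration.

```python
def select_infoboxes(human_paralogs_for_ec=0, human_paralogs_for_pfam=0):
    """Suggest infobox kinds for an article about a single human gene/protein.

    The counts are the number of human paralogs assigned to the article's
    EC number and Pfam family; 0 means no such assignment.
    """
    boxes = ["protein"]
    # An enzyme infobox fits only when the EC number maps to one human paralog.
    if human_paralogs_for_ec == 1:
        boxes.append("enzyme")
    # The same reasoning applies to the Pfam family.
    if human_paralogs_for_pfam == 1:
        boxes.append("protein family")
    return boxes


print(select_infoboxes(human_paralogs_for_ec=1, human_paralogs_for_pfam=3))
```

The guideline says only that the extra infoboxes "may be appropriate", so the result is a suggestion, not a requirement.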

In some cases a large number of infoboxes may apply to an article. You may put the less useful ones in a section at the end, laid side by side in a table. Whether that table should be collapsed or scrolled horizontally is doubtful, as MOS:COLLAPSE may or may not apply depending on how "extraneous" the boxes are.

Sections

 * 1) Lead
 * The lead section is defined as "the section before the first headline. The table of contents, if displayed, appears between the lead section and the first headline."
 * The first sentence of the lead should define the scope of the article. For genes/proteins for which a human ortholog exists, "<recommended UniProt name> is a protein that in humans is encoded by the <approved HUGO gene symbol> gene." would be appropriate.
 * 2) Gene
 * Specific information about the gene (on which human chromosome it is located, regulation, etc.). Much of this basic information may already be contained in the infobox and should not be unnecessarily repeated in this section unless especially notable.
 * 3) Protein
 * Specific information about the protein (splice variants, post-translational modifications, etc.). Again, much of this basic information may already be contained in the infobox and should not be unnecessarily repeated unless especially notable.
 * 4) Species, tissue, and subcellular distribution
 * Optional section that concisely describes in which species the gene is expressed (e.g., wide species distribution, bacteria, fungi, vertebrates, mammals, etc.), in which tissues the protein is expressed, and in which subcellular compartments or organelles the protein is found (excreted, cytoplasm, nucleus, mitochondria, cell membrane).
 * 5) Function
 * Describe the function of the encoded protein.
 * 6) Interactions
 * Optional section that lists proteins known to interact with the protein that is the subject of the article.
 * 7) Clinical significance
 * List diseases or conditions that result from a mutation in the gene or from a deficiency or excess of the expressed protein.
 * 8) History/Discovery
 * In general, it is not appropriate to mention the research group or institution that conducted a study directly in the text of the article. However, it is appropriate to list in this section the names of those who made key discoveries concerning the gene or protein (e.g., the scientist or group that originally cloned the gene, determined its function, linked it to a disease, won a major award for the discovery, etc.).

Examples of articles organized this way include Protein C, Gonadotropin-releasing hormone, and Rubisco.
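
One way to check an article against the recommended outline is to compare its headings with the order listed above. The sketch below assumes headings match the outline names exactly (real articles often vary the wording) and ignores any heading it does not recognise; the lead is omitted because it has no heading. All names are hypothetical.

```python
RECOMMENDED_ORDER = [
    "Gene",
    "Protein",
    "Species, tissue, and subcellular distribution",
    "Function",
    "Interactions",
    "Clinical significance",
    "History/Discovery",
]


def out_of_order_sections(headings):
    """Return recognised headings placed after a heading that should follow them."""
    rank = {name: i for i, name in enumerate(RECOMMENDED_ORDER)}
    problems = []
    highest_seen = -1
    for heading in headings:
        if heading not in rank:
            continue  # unrecognised headings are not judged
        if rank[heading] < highest_seen:
            problems.append(heading)
        else:
            highest_seen = rank[heading]
    return problems


print(out_of_order_sections(["Function", "Gene", "Clinical significance"]))
```

Since the outline is a general recommendation and several sections are optional, a non-empty result is a prompt for review rather than an error.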

Wikidata item
The Wikipedia article should be linked to the Wikidata item for the entity first mentioned in the first sentence of the lead section, which should be written as defined in WP:MCBMOSSECTIONS. Suppose that the first sentence is "Steroid 21-hydroxylase is a protein that in humans is encoded by the CYP21A2 gene." In this case, the Wikipedia article should be linked to the Wikidata item for the steroid 21-hydroxylase protein rather than the gene.
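
The first-sentence pattern quoted above can be generated mechanically. In this minimal sketch the gene symbol is italicised per the gene nomenclature section; bolding the protein name follows the usual Wikipedia lead convention and is an assumption here, as is the function name.

```python
def lead_sentence(protein_name, gene_symbol):
    """Build the opening sentence of a gene/protein article as wikitext."""
    # ''' marks bold (assumed lead convention); '' marks italics for the gene symbol.
    return (f"'''{protein_name}''' is a protein that in humans "
            f"is encoded by the ''{gene_symbol}'' gene.")


print(lead_sentence("Steroid 21-hydroxylase", "CYP21A2"))
```

Because the protein is the first entity named, a sentence built this way points to the protein's Wikidata item, consistent with the rule above.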

Citing sources
For guidance on choosing and using reliable sources, see Identifying reliable sources (natural sciences) and Wikipedia:Reliable sources.

For general guidance on citing sources see Citing sources, Footnotes and Wikipedia:Guide to layout.

MCB articles should be relatively dense with inline citations, for example footnotes (see H:FOOT). Some editors prefer to expand the abbreviated journal name; others prefer concise standard abbreviations.

Abstracts of most MCB-related journals are freely available at PubMed, which includes a means of searching the MEDLINE database. The easiest way to populate the journal and book citation templates is to use Diberri's template-filling web site or the Universal reference formatter. Search PubMed for your journal article and enter the PMID (PubMed Identifier) into Diberri's template filler or the Universal reference formatter. If you use Internet Explorer or Mozilla Firefox (2.0+), then Wouterstomp's bookmarklet can automate this step from the PubMed abstract page. Take care to check that all the fields are correctly populated, since the tool's output is not always accurate. For books, enter the ISBN into Diberri's tool. The same source can be cited multiple times by giving the inline reference a unique name and reusing that name. Diberri's tool can format a reference with the PMID or ISBN as the name.
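
To illustrate how a uniquely named reference allows repeated citation, the sketch below emits a full `<ref>` tag for the first use and a self-closing tag for later uses. The `pmid` prefix in the name is an assumed format, the PMID shown is a placeholder rather than a real record, and both function names are hypothetical.

```python
def ref_name_from_pmid(pmid):
    """Derive a reference name from a PMID (the exact name format is assumed)."""
    return f"pmid{pmid}"


def named_ref(name, citation=None):
    """Return a full named reference, or a reuse tag when no citation is given."""
    if citation is None:
        # Later uses: a self-closing tag that points back to the named reference.
        return f'<ref name="{name}" />'
    # First use: the tag carries the full citation text.
    return f'<ref name="{name}">{citation}</ref>'


name = ref_name_from_pmid("0000000")  # placeholder PMID, not a real article
print(named_ref(name, "{{cite journal |pmid=0000000}}"))
print(named_ref(name))
```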

In addition to the standard citation text, it is useful to supply hyperlinks. If the journal abstract is available on PubMed, add a link using its PMID. If the article has a digital object identifier (DOI), use the doi template. If and only if the article's full text is freely available online, supply a uniform resource locator (URL) to this text by hyperlinking the article title in the citation. If the full text is freely available both on the journal's website and on PubMed Central, prefer linking the former, as PubMed Central's copy is often a pre-publication draft. When the source text is available in both HTML and PDF, the former should be preferred, as it is compatible with a larger range of browsers. If citation templates are used, these links can be supplied via the pmid, doi, url and pmc parameters. Do not add a "Retrieved on" date for convenience links to online editions of paper journals (however, "Retrieved on" dates are needed for other web sources).
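
The linking rules above can be captured in a small builder for a cite journal template. This is a minimal sketch, not a replacement for the citation tools mentioned earlier: the function name is hypothetical, only a handful of template parameters are covered, and the bibliographic values in the usage line are placeholders, not a real article. The `url` parameter is filled only when the caller passes a freely available full-text link, and no retrieval date is emitted.

```python
def cite_journal(title, journal, date, pmid=None, pmc=None, doi=None,
                 free_full_text_url=None):
    """Assemble a {{cite journal}} template string from the supplied fields."""
    fields = [
        ("title", title),
        ("journal", journal),
        ("date", date),
        ("pmid", pmid),
        ("pmc", pmc),
        ("doi", doi),
        # Pass a URL only if the full text is freely available online.
        ("url", free_full_text_url),
    ]
    # Skip any identifier that was not supplied.
    parts = [f"|{key}={value}" for key, value in fields if value is not None]
    return "{{cite journal " + " ".join(parts) + "}}"


# Placeholder values for illustration only.
print(cite_journal("Example title", "Example Journal", "2000", pmid="0000000"))
```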

For example:

A citation using cite journal:

Or the alternative vcite journal:

Navigation box
Articles about related proteins may be cross-linked by including one or more navigation boxes as appropriate. Examples include:


 * Cytoskeletal Proteins
 * Ion channels
 * Transcription factors
 * Transmembrane receptors

Categories
Every Wikipedia article should be added to at least one category. Categories or subcategories that may be appropriate for gene and protein articles include:
 * Category:Proteins
 * Category:Enzymes by function
 * Category:Genes by human chromosome