User:ProteinBoxBot

Overview
The Gene Wiki project was initiated around 2007 as an effort to systematically improve the state of gene articles in Wikipedia. The primary mechanism for that project was data-driven bot edits based on databases commonly used in biomedical research (e.g., NCBI Entrez, UniProt, PDB). More information on this initiative can be found at Portal:Gene Wiki. Some historical context is provided in the ProteinBoxBot and ProteinBoxBot 2 sections below.

We are currently extending this approach to other biomedical concepts. More details are in the ProteinBoxBot 3 section below.

ProteinBoxBot 3 (circa 2013)
In this phase of ProteinBoxBot, we will expand the scope beyond human genes and proteins. The plan is in its infancy and under active discussion; its scope has been discussed with WikiProject Pharmacology and WikiProject Medicine. To keep the discussion among all interested parties in one place, please consult the project page at User:ProteinBoxBot/Phase 3.

ProteinBoxBot 2 (circa 2011)
A new version of the bot was created to maintain the Protein Boxes on Gene Wiki pages. It queries the MyGene.info gene annotation service (which itself compiles information from public-domain databases) and compares the results to the values in the existing article infoboxes. If the article information is out of date, incorrect, or missing, the infobox is updated with the correct values. The bot then scans the Commons for protein structure images corresponding to the protein box subject. If it finds one, the image is added; if it finds none and sufficient information is available, the bot generates, uploads, and links a new image. This bot does not create pages or edit full-text articles (for the time being). The source code for the bot is available at the GeneWiki code repository. This version is being designed and maintained by Pleiotrope in conjunction with AndrewGNF.
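The comparison step described above can be sketched roughly as follows. This is a minimal illustration and not the bot's actual code: the function name `diff_infobox`, the field names, and the sample values are all assumptions, and the real bot parses infobox templates and MyGene.info responses rather than plain dictionaries.

```python
def diff_infobox(current, reference):
    """Return the infobox fields that should be replaced.

    `current` stands in for values parsed from the article infobox and
    `reference` for values from the annotation service (both hypothetical
    plain dicts in this sketch).
    """
    updates = {}
    for field, ref_value in reference.items():
        if ref_value in (None, ""):
            # No reference value is available, so leave the article as-is.
            continue
        cur_value = current.get(field)
        # A missing, empty, or differing value is treated as needing an update.
        if cur_value is None or str(cur_value).strip() != str(ref_value).strip():
            updates[field] = ref_value
    return updates


# Illustrative data only; field names are assumptions.
infobox = {"Symbol": "MMP9", "Chromosome": "20", "Name": ""}
annotation = {
    "Symbol": "MMP9",
    "Chromosome": "20",
    "Name": "matrix metallopeptidase 9",
    "PDB": None,
}
print(diff_infobox(infobox, annotation))
```

Only fields with a usable reference value are considered, so an empty annotation never overwrites content already in the article.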

ProteinBoxBot (circa 2008)
This bot has created or amended ~9,000 pages corresponding to mammalian genes. Each new page was seeded with content from databases in the public domain. This content included information about the gene's symbol, description, function, genomic location, structure, and identifiers. Pages were created for genes that had no existing Wikipedia page under their symbol, aliases, or title (e.g., MMP9). Genes that did have such conflicts in the Wikipedia namespace were flagged for manual integration (e.g., Apolipoprotein_E). This bot is currently being designed and developed by AndrewGNF and JonSDSUGrad, with substantial input from the WP:MCB community.
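The create-or-flag decision described above might look something like the sketch below. It is an assumption-laden illustration, not the original bot's logic: the function `classify_gene`, the title normalization (underscores treated as spaces, case ignored), and the sample data are all hypothetical.

```python
def normalize(title):
    # Simplification: treat underscores as spaces and ignore case.
    return title.replace("_", " ").strip().casefold()


def classify_gene(symbol, aliases, title, existing_titles):
    """Decide whether a gene page can be created or needs manual review."""
    existing = {normalize(t) for t in existing_titles}
    candidates = [symbol, title, *aliases]
    conflicts = [c for c in candidates if normalize(c) in existing]
    if conflicts:
        return "flag_for_manual_integration", conflicts
    return "create", []


# Illustrative set of page titles that already exist.
existing_pages = {"Apolipoprotein E", "Insulin"}

print(classify_gene("MMP9", ["GELB"], "Matrix metallopeptidase 9", existing_pages))
print(classify_gene("APOE", ["AD2"], "Apolipoprotein_E", existing_pages))
```

Returning the list of conflicting names, rather than a bare flag, gives a human editor a starting point for the manual integration.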

The team

 * Sebotic
 * Emitraka
 * I9606
 * Andrew Su
 * plus some Wikidata-only accounts here

Technical links

 * Requests for page creation: the requests page
 * Archived bot approval page: Bots/Requests for approval/ProteinBoxBot
 * Bot Log File Index: User:ProteinBoxBot/PBB_Log_Index
 * Bot Page Directory: User:ProteinBoxBot/Protein_Directory
 * Templates used primarily by ProteinBoxBot:
   * GNF GO
   * GNF Ortholog box
   * GNF_Protein_box
   * PBB
   * PBB Controls
   * PBB Further reading
   * PBB Image citation
   * PBB Summary
 * Ideas for the future: User:ProteinBoxBot/Ideas

Disabling bot edits on specific gene pages
Add the nobots tag ({{nobots}}) to a page or template to prevent the bot from editing it.
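For illustration, a bot can honor this tag by checking the page wikitext before saving an edit. The sketch below is a simplified assumption of how such a check might be written, not the code ProteinBoxBot uses: `bot_may_edit` is a hypothetical helper, and it handles only `{{nobots}}` and the `deny` parameter of `{{bots}}`, ignoring `allow` and other options.

```python
import re


def bot_may_edit(wikitext, bot_name="ProteinBoxBot"):
    """Return False if the wikitext opts out of edits by this bot."""
    # {{nobots}} excludes every compliant bot.
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.IGNORECASE):
        return False
    # {{bots|deny=...}} excludes the listed bots, or all of them.
    match = re.search(
        r"\{\{\s*bots\s*\|\s*deny\s*=\s*([^}|]*)", wikitext, re.IGNORECASE
    )
    if match:
        denied = {name.strip().lower() for name in match.group(1).split(",")}
        if "all" in denied or bot_name.lower() in denied:
            return False
    return True


print(bot_may_edit("{{nobots}}\nSome article text."))
print(bot_may_edit("Some article text with no exclusion tag."))
```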

How to create a new gene page

 * 1) Go to biogps.org, then search for your gene of interest
 * 2) View the "Gene Wiki" window. If a corresponding Wikipedia page already exists, it will be loaded automatically. If not, a page creation interface will be shown (it may take up to one minute to appear).
 * 3) Create the page after choosing the page name (follow the prompts).
 * 4) Confirm creation. It may take some time for BioGPS to register the update, but you should be able to see it by viewing ProteinBoxBot's user contributions.
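As a convenience for the last step, the bot's recent contributions can also be listed through the MediaWiki API's `list=usercontribs` query. The sketch below only builds the request URL and makes no network call. The helper name `contributions_url` is hypothetical, and the English Wikipedia endpoint is an assumption about where the bot's edits are recorded.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia MediaWiki API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def contributions_url(user="ProteinBoxBot", limit=10):
    """Build a MediaWiki API URL listing a user's most recent edits."""
    params = {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        "uclimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = contributions_url()
print(url)
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, returns the recent edits as JSON.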

Source code

 * (v3) https://bitbucket.org/eclarke/pygenewiki
 * (v2) https://code.google.com/archive/p/genewiki/source
 * (v1) https://code.google.com/archive/p/protein-box-bot/source/default/source