User:ProteinBoxBot/Specs

ProteinBoxBot specs

 * Content has already been assembled as part of a non-WP project. Data will be provided to ProteinBoxBot as an XML or CSV file.  Images will be provided in a zip file or local directory.
 * For each mammalian gene with significant available annotation, a new gene page will be created that corresponds to the HUGO-approved symbol.
 * If a page with that name already exists:
   * If the page contains a Protein infobox or GNF_Protein_box, the bot will overwrite the previous infobox but leave the surrounding content intact.
   * If not, the gene will be flagged for manual review: the bot will log an entry and proceed to the next gene.
 * An image (when available from RCSB and usable as public domain) will be uploaded.
 * A protein infobox will be created and populated with relevant data. (Manually-created example: ITK (gene))
 * A redirect will be created from the full gene name. (For example: IL2-inducible T-cell kinase)
   * If a page with the full gene name already exists, the gene will be flagged for manual review.
 * A free-text summary from the NCBI page will be included, with wikilinks added where appropriate.
 * A references section will be created based on gene2pubmed and/or GeneRIFs.
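
The page-creation rules above can be sketched as a small planning function. This is only an illustration, not the bot's actual code: `plan_gene`, `has_infobox`, the action names, and the use of a plain dict in place of the wiki are all hypothetical, and detecting an infobox by substring match on the template name is an assumption.

```python
from typing import Dict, List, Tuple

# Assumption: an infobox is recognised by its opening template name.
INFOBOX_MARKERS = ("{{protein", "{{gnf_protein_box")


def has_infobox(wikitext: str) -> bool:
    """Return True if the page text appears to contain a protein infobox."""
    normalized = wikitext.lower().replace(" ", "_")
    return any(marker in normalized for marker in INFOBOX_MARKERS)


def plan_gene(symbol: str, full_name: str,
              existing_pages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the (action, page title) steps for one gene.

    existing_pages maps page titles to wikitext and stands in for the wiki.
    """
    steps: List[Tuple[str, str]] = []
    page_text = existing_pages.get(symbol)
    if page_text is None:
        steps.append(("create_page", symbol))
    elif has_infobox(page_text):
        # Replace only the infobox; surrounding content is left intact.
        steps.append(("overwrite_infobox", symbol))
    else:
        # Unrelated page with the same name: log it and go to the next gene.
        steps.append(("flag_for_review", symbol))
        return steps

    # Redirect from the full gene name, unless that title is already taken.
    if full_name in existing_pages:
        steps.append(("flag_for_review", full_name))
    else:
        steps.append(("create_redirect", full_name))
    return steps


if __name__ == "__main__":
    pages = {"ITK": "Intro text {{GNF_Protein_box | Name = ITK}} more text"}
    print(plan_gene("ITK", "IL2-inducible T-cell kinase", pages))
```

Returning a plan instead of editing directly keeps the decision logic testable without touching the wiki; the upload, infobox, summary, and reference steps would hang off the `create_page` and `overwrite_infobox` actions.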


 * In the trial phase, only 10 gene pages will be created. If needed to better define how much information makes a useful stub, a secondary trial period covering ~100 genes will be proposed.
 * Bot will check User_talk:ProteinBoxBot and stop if there are any new messages.
 * Bot will cap edits at 10 per minute.
 * New protein infoboxes will contain a notice that changes can/will be overwritten by further bot updates.
 * If the bot encounters an agreed-upon flag (e.g., " "), the entry will be logged and skipped.
 * Bot will maintain log of all edits and edit times.
 * Add all modified pages to ProteinBoxBot's watchlist to track further page edits.
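
The safeguards above (edit cap, stop on new talk-page messages, skip flag, edit log) could be combined roughly as follows. This is a minimal sketch under stated assumptions: `EditThrottle`, `run_bot`, and the callables passed in (`make_edit`, `has_new_talk_messages`, `is_flagged`) are hypothetical names, and the actual wiki calls are left to the caller.

```python
import time
from collections import deque


class EditThrottle:
    """Sliding-window limiter: at most max_edits in any window of seconds."""

    def __init__(self, max_edits=10, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_edits = max_edits
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # times of the most recent edits

    def wait_for_slot(self):
        """Block until one more edit is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget edits that have left the window.
            while self.stamps and now - self.stamps[0] >= self.window:
                self.stamps.popleft()
            if len(self.stamps) < self.max_edits:
                break
            # Sleep until the oldest recorded edit leaves the window.
            self.sleep(self.window - (now - self.stamps[0]))
        self.stamps.append(now)


def run_bot(genes, make_edit, has_new_talk_messages, is_flagged,
            throttle, log, now=time.time):
    """Process genes in order, appending (time, gene, outcome) to log."""
    for gene in genes:
        if has_new_talk_messages():
            log.append((now(), gene, "stopped (new talk message)"))
            break
        if is_flagged(gene):
            log.append((now(), gene, "skipped (flag present)"))
            continue
        throttle.wait_for_slot()
        make_edit(gene)
        log.append((now(), gene, "edited"))


if __name__ == "__main__":
    log = []
    run_bot(["ITK"], make_edit=print,
            has_new_talk_messages=lambda: False,
            is_flagged=lambda gene: False,
            throttle=EditThrottle(), log=log)
    print(log)
```

The clock and sleep functions are injectable so the cap can be checked without waiting a real minute. The talk page is checked before every gene, so a new message halts the run before the next edit.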