User:ProteinBoxBot/Ideas

 NOTE: This page is effectively read-only, except by the bot organizers. Please post any ideas and suggestions on the discussion page.

Development plan and future ideas
NOTE: The items below are thoughts for the future and are not included in the initial proposed specs.

See also: User:ProteinBoxBot/Project_proposals

Next up for implementation

 * per discussion on Commons, add PDB infobox to all PDB images (Example)
 * Run bot update
 * needs to work with new data Web services
 * remove PBB_Summary and PBB_Controls from main namespace
 * blocking edits to PBB templates can use the generic nobots
 * pilot project for SWL
   * find some well-known facts
   * encode them in Gene Wiki article using SWL
   * figure out synchronization with wikidraft.org/SMW, converting SWLs to real semantic links
   * OUTPUT: demonstrate real inline queries on wikidraft.org
   * OUTPUT: export from SMW to RDF
 * pilot collaboration with MODs (specifically ZFIN)
   * scan through all Gene Wiki pages for inline citations
   * retrieve MeSH terms and identify matching species (human, mouse, zebrafish, fly, rat, yeast)
   * generate four-column output file:
     * WP article name
     * cited PubMed ID
     * matching organisms by MeSH
     * sentence(s) referencing the publication
   * Notes
     * is there a MeSH-to-taxonomy mapping? or do free-text matching?
     * for pubs that reference multiple species, one line per species
     * for articles that reference a pub multiple times, concatenate sentences
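
The four-column output described above could be assembled roughly as follows. This is only a sketch under stated assumptions, not the bot's actual code: the `Citation` record, the `SPECIES_MESH` table, and the `build_rows` and `write_tsv` functions are hypothetical names, the MeSH terms are assumed to have been retrieved already, and species are matched by looking up MeSH descriptor names in a small hand-written table (one of the two options raised in the notes).

```python
import csv
import io
from collections import OrderedDict
from dataclasses import dataclass

# Assumed lookup from MeSH descriptor names to the six species of interest.
SPECIES_MESH = {
    "Humans": "human",
    "Mice": "mouse",
    "Zebrafish": "zebrafish",
    "Drosophila melanogaster": "fly",
    "Rats": "rat",
    "Saccharomyces cerevisiae": "yeast",
}


@dataclass
class Citation:
    """One inline citation found in a Gene Wiki article (hypothetical record)."""
    article: str      # WP article name
    pmid: str         # cited PubMed ID
    mesh_terms: list  # MeSH descriptor names for that publication
    sentence: str     # sentence containing the citation


def build_rows(citations):
    """Return (article, pmid, organism, sentences) tuples.

    Sentences citing the same publication in the same article are
    concatenated; a publication matching several species yields one row
    per species; publications matching no species yield no rows.
    """
    grouped = OrderedDict()
    for c in citations:
        entry = grouped.setdefault((c.article, c.pmid),
                                   {"species": [], "sentences": []})
        for term in c.mesh_terms:
            species = SPECIES_MESH.get(term)
            if species and species not in entry["species"]:
                entry["species"].append(species)
        if c.sentence not in entry["sentences"]:
            entry["sentences"].append(c.sentence)

    rows = []
    for (article, pmid), entry in grouped.items():
        joined = " ".join(entry["sentences"])
        for species in entry["species"]:
            rows.append((article, pmid, species, joined))
    return rows


def write_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["article", "pmid", "organism", "sentences"])
    writer.writerows(rows)
    return buf.getvalue()
```

A free-text fallback (for example, searching abstracts for species names) could be added to `build_rows` if descriptor matching turns out to miss too many publications.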


 * redesign infobox to better handle linking to MODs (MGD, RGD, ZFIN, FlyBase, WormBase, etc.)

Add additional links

 * GeneCards
 * nextbio.com?
 * wikiprofessional
 * wikigenes
 * WikiPathways.org
 * KEGG (also add wikilinks to other gene pages in the same KEGG pathways)
 * HPRD
 * link to Bioinformatic Harvester? -- would need community consensus...
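
The KEGG item above mentions wikilinks to other gene pages in the same pathway. A rough sketch of how such links might be derived is shown here, assuming a pathway-to-genes mapping has already been obtained from KEGG. The names `pathway_wikilinks` and `pathway_members` are hypothetical, and article titles are assumed to equal gene symbols, which will not always hold.

```python
def pathway_wikilinks(gene, pathway_members):
    """Return {pathway_id: [wikilinks to the other genes in that pathway]}.

    pathway_members maps a pathway identifier to the gene symbols it
    contains (assumed to be prepared elsewhere from KEGG data).
    """
    links = {}
    for pathway_id, genes in pathway_members.items():
        if gene not in genes:
            continue
        # Link every other member of the pathway, in a stable order.
        others = sorted(set(genes) - {gene})
        links[pathway_id] = ["[[%s]]" % symbol for symbol in others]
    return links


# Placeholder data; pathway IDs and memberships are illustrative only.
example = {
    "pathway_A": ["ITK", "GENE_B", "GENE_C"],
    "pathway_B": ["GENE_D"],
}
print(pathway_wikilinks("ITK", example))
```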

Add/improve stub data (gene-specific)

 * change format of the references section to make it small-screen friendly
 * Add GeneRIFs and references from UniProt
 * import and display EC number
 * import and display protein domain information (through UniProt/Pfam/COGs). See previous discussion.
 * UniProt fields: PFAM, "Protein name", "Synonyms", FUNCTION, DOMAIN, SUBCELLULAR LOCATION, CATALYTIC ACTIVITY, COFACTOR, SUBUNIT, and WEB RESOURCE
 * Need to fix the db links for genome locations: default for mouse has gone to mm9 User_talk:ProteinBoxBot (need to either change default in template, or need to do a second pass run on all infoboxes to add parameter)
 * Load PPI from Entrez Gene User_talk:ProteinBoxBot/Archives/Archive1
 * Add a note in infobox showing last-updated date
 * for GO section, add small note of evidence code and a link to PubMed reference, if available.
 * add image maps to thumbnail expression images so that tissues can be identified
 * add a banner from gene talk pages to portal page

Add/improve stub data (structure)

 * add reference to GO section of infobox linking Entrez Gene
 * Add a legend to the protein infobox, especially to explain what the expression profiles mean and how they were generated. See User_talk:ProteinBoxBot/Archives/Archive1

Technical bot stuff

 * add MCB template to talk page
 * Create more precise PDB caption by using the PDB "title"
 * Change PDB image name to correspond to the PDB ID, not the gene Symbol
 * change images to upload to Wikimedia Commons
 * Mechanism for users to interrupt actions of bot
 * move expression image captions from image to text (Preparing images for upload)
 * add template categories to PBB templates
 * SVG instead of PNG for thumbnail expression images
 * tag review articles in "Further reading" section with REVIEW (see User_talk:ProteinBoxBot/Archives/Archive1)
 * en dashes instead of hyphens in references
 * change PDB image link (which currently references only www.pdb.org) to a structure-specific page. Also reference the license agreement (http://www.pdb.org/robohelp_f/site_navigation/citing_the_pdb.htm) (This item may become obsolete with change to EBI images in wikicommons...)
 * test out using flare to visualize usage/editing data
 * fix duplicate images
 * only show 2-3 refs per protein interaction, biasing toward review articles (as discussed here)
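
For the "mechanism for users to interrupt actions of bot" item, one option is to honor the generic nobots/bots exclusion templates before each edit. The sketch below is a simplified check, not a full parser: `bot_may_edit` is a hypothetical name, and it assumes only the common forms `{{nobots}}`, `{{bots|deny=...}}`, and `{{bots|allow=...}}`, ignoring other parameters, HTML comments, and nested templates.

```python
import re


def bot_may_edit(wikitext, bot_name):
    """Return False if the page text asks this bot (or all bots) to stay away."""
    # A bare {{nobots}} excludes every bot.
    if re.search(r"\{\{\s*nobots\s*\}\}", wikitext, re.IGNORECASE):
        return False

    match = re.search(
        r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}]*)\}\}",
        wikitext,
        re.IGNORECASE,
    )
    if not match:
        # No exclusion template found: editing is permitted.
        return True

    mode = match.group(1).lower()
    names = {n.strip().lower() for n in match.group(2).split(",") if n.strip()}
    listed = bot_name.lower() in names or "all" in names

    if mode == "deny":
        return not listed
    # mode == "allow": only listed bots may edit ("none" lists nobody).
    return listed
```

The bot would call this on the current page text immediately before saving, so that a user can stop further edits simply by adding the template.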

Parallel efforts

 * upload all PDB to Flickr? allows browsing of entire SCOP sub-trees. maybe geotag by location?
 * create a WP category for every GO category? (Piggy back with Enzyme class effort?)
 * expand to create pages for each disease using Infobox_Disease
 * second bot to wikilink common biology concepts, specifically on pages with PBB_Controls
 * change Gene templates to internal wikilinks
 * systematic creation of articles around protein domains (e.g., SMART database)
 * Mass autogeneration of high-quality PDB images

Other

 * look into HSPA1A and HSPA1B
 * automated way to create this table
 * create a Mac dashboard widget for the Gene Wiki?
 * charting library to combine bar chart with background histogram... (not really Gene Wiki related...)

Completed tasks

 * Upload snapshots of all PDB images -- create a gallery? Done!
 * get structure image from RCSB Done!
 * not sure yet how to get links from genes to PDB entries
 * RCSB public domain license is here or here.
 * modify orthologs box to automatically adjust rows and columns based on data Done! (I think)...
 * possibly add a comment to the protein box area saying that changes (to the protein box only) will be overwritten by the next bot update; this may save us from having to worry about manual edits -- AND/OR -- allow users to manually enter a comment in the protein box to prevent the bot from overwriting Done! through the PBB_Controls template.
 * use "Category: Human proteins" instead of simply "Proteins" Done!
 * add "Category: Gene from chromosome N" Done!
 * change spacing pattern (e.g., ) Fixed when infoboxes moved to template pages

Obsolete tasks

 * second bot to create redirects from gene aliases Removed! better for a human to do
 * add a comment to make it clear where people can/should edit... Removed! better constrain areas for PBB edits
 * changing redirects so that primary title is HGNC name
 * maybe just flag these for manual inspection Removed! A human should handle anything with regards to page moves.
 * adding links to page (e.g., "ITK") from alternate symbols (e.g., EMT; LYK; PSCTK2; MGC126257; MGC126258) and full gene name (e.g., IL2-inducible T-cell kinase)
 * is redirecting from alternate symbols really a good idea? How would one list ITK on the EMT disambiguation page? Removed! Better that a human does this.
 * add a "update_PDB_image" tag in PBB_controls so that people can turn off automated edits for that part of the infobox specifically -- or, don't make any change to existing PDB image, only add if an image didn't previously exist Removed! Already default behavior