Wikipedia:WikiProject Molecular Biology/Genetics/Gene Wiki/Project proposals

Introduction
The ProteinBoxBot (PBB) has a goal of creating and/or enhancing ~10,000 Wikipedia pages corresponding to human genes. PBB harvests data from various public databases and formats them for appropriate visualization in Wikipedia. We hope (and preliminary data suggest) that this structured data from the public domain will then seed contributions of unstructured knowledge from the broader biological community. Although we offer several specific project ideas below, any proposals from students which broadly fit this mission will be considered!

Background
The first version of PBB was created by JonSDSUGrad in Java, in part using the Java Wiki Bot Framework. As of February 2008, PBB has been used to amend ~650 existing gene pages and create ~8000 new gene pages. (All PBB pages are listed here.) This project has been supported by the Molecular and Cellular Biology Wikiproject. By all accounts, Version 1 of PBB has been quite successful. Over 65% of PBB pages show up on the first page of a Google search (see figure), and edits to newly created PBB pages account for approximately half of all gene page edits. The source code of PBB has been released at Google Code.

Despite the success of Version 1, many enhancements to the PBB code and the resulting Wikipedia pages are needed. The ideas below represent some possible projects that will further improve the utility of the gene page stubs and attract more editors to these pages. The expertise required for the projects listed below ranges from that of a beginning undergraduate in computer science, biology, or bioinformatics up to that of a master's student.

Questions?
Questions about PBB or any of the projects below? Post on the talk page! This is intrinsically a wiki-based project (not just for the project's output, but for coordination, brainstorming, and discussion), so you might as well learn now! And although there will be one official mentor for projects, everything we do is in the context of the Wikipedia community, which effectively serves as an unlimited supply of mentors.

Automatically turn off summary updates if wikilinking is detected in PBB_Summary
Description: PBB uploads a text summary in the PBB Summary template. Often, editors add wikilinks in that text without turning off the automatic summary update in PBB Controls. This enhancement would automatically detect wikilinking and turn off the summary update.

Difficulty: Low

Skills required: Java familiarity
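The detection step described above can be sketched roughly as follows. This is only an illustration, not PBB's actual code: the class and method names are hypothetical, the `update_summary = yes` field in PBB Controls is an assumption about how the control is stored, and fetching and saving the page wikitext is left out entirely.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the released PBB source.
final class SummaryUpdateGuard {
    // Matches [[Target]] and [[Target|label]]. Note that this also matches
    // image and category links, which may or may not be what is wanted.
    private static final Pattern WIKILINK =
            Pattern.compile("\\[\\[[^\\[\\]]+\\]\\]");

    // ASSUMPTION: PBB Controls stores the switch as "update_summary = yes".
    private static final Pattern UPDATE_SUMMARY =
            Pattern.compile("(update_summary\\s*=\\s*)yes");

    // True if the summary text contains at least one wikilink.
    static boolean containsWikilink(String summaryText) {
        return WIKILINK.matcher(summaryText).find();
    }

    // Returns the controls wikitext with the summary update switched off.
    static String disableSummaryUpdate(String controlsWikitext) {
        return UPDATE_SUMMARY.matcher(controlsWikitext).replaceFirst("$1no");
    }

    // Switches the update off only when an editor has added wikilinks.
    static String guard(String summaryText, String controlsWikitext) {
        return containsWikilink(summaryText)
                ? disableSummaryUpdate(controlsWikitext)
                : controlsWikitext;
    }
}
```

A bot run would call `SummaryUpdateGuard.guard(summaryText, controlsWikitext)` before deciding whether to overwrite the summary, and write the returned controls text back to the page if it changed.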

Automate uploads of PDB images
Description: PBB uploads two-dimensional protein images for all gene pages which have a protein structure available (e.g., Image:PBB_Protein_CDK2_image.jpg). This new module of PBB will periodically upload new images based on the current release of the PDB. Images should be categorized by protein family classification (SCOP).

Difficulty: Medium/low

Skills required: Familiarity with programmatic interfaces with wikis is useful, but not required.

Add protein domain information to gene pages
Description: See discussion here.

Difficulty: Medium

Skills required: Java, Basic understanding of protein structure

Systematic classification of gene pages by protein family
Description: Genes and proteins are also classified by protein family. For example, this page shows all the protein domains that are found in the gene BTK. (Protein domain IDs begin with "IPR"...) This new module of PBB would parse these data from database dumps, create the appropriate categories in Wikipedia, and assign all relevant genes to each category.

Difficulty: Medium/hard

Skills required: Familiarity with molecular biology and/or programmatic interfaces with wikis is useful, but not required.
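As a rough sketch of the parsing and grouping step, the code below collects the genes that share each protein domain ID and builds a category tag for the gene page. Several things here are assumptions rather than facts about PBB or the database dumps: the input is taken to be tab-separated lines of gene symbol and domain ID, the IDs are taken to be "IPR" followed by six digits, and the category name is purely hypothetical. The class and method names are also invented for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the released PBB source.
final class ProteinFamilyCategories {
    // ASSUMPTION: domain IDs look like "IPR" followed by six digits.
    private static final Pattern IPR_ID = Pattern.compile("IPR\\d{6}");

    // ASSUMPTION: each dump line is "<gene symbol>\t<domain ID>[\t...]".
    // Lines that do not fit this shape are skipped rather than guessed at.
    static Map<String, SortedSet<String>> genesByDomain(List<String> lines) {
        Map<String, SortedSet<String>> result = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) {
                continue;
            }
            String gene = fields[0].trim();
            String domainId = fields[1].trim();
            if (gene.isEmpty() || !IPR_ID.matcher(domainId).matches()) {
                continue;
            }
            result.computeIfAbsent(domainId, k -> new TreeSet<>()).add(gene);
        }
        return result;
    }

    // Hypothetical category naming scheme; the real names would need
    // to be agreed on with the Wikipedia community first.
    static String categoryTag(String domainId) {
        return "[[Category:Proteins with InterPro domain " + domainId + "]]";
    }
}
```

Grouping by domain ID first means each category only has to be created once, after which `categoryTag` can be appended to every gene page in that group.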

Add Wikiproject MCB to gene talk pages
Description: For all PBB runs, check that the Wikiproject MCB banner appears on the gene's talk page. If it does not, add it (creating the talk page if necessary).

Difficulty: Medium

Skills required: Java
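One possible shape for this check is sketched below. It works on the talk page wikitext as a plain string and assumes the banner is the `{{Wikiproject MCB}}` template, with or without parameters. The class and method names are hypothetical, and reading and saving the talk page through the bot framework is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of the released PBB source.
final class TalkPageBanner {
    // Accepts "Wikiproject MCB" or "Wikiproject_MCB" in any letter case,
    // with optional whitespace after the opening braces.
    private static final Pattern MCB_BANNER = Pattern.compile(
            "\\{\\{\\s*Wikiproject[ _]MCB\\b", Pattern.CASE_INSENSITIVE);

    // ASSUMPTION: a bare banner with no parameters is an acceptable default.
    static final String DEFAULT_BANNER = "{{Wikiproject MCB}}";

    // True if the talk page already carries the banner.
    static boolean hasBanner(String talkWikitext) {
        return talkWikitext != null && MCB_BANNER.matcher(talkWikitext).find();
    }

    // Returns the wikitext unchanged if the banner is present; otherwise
    // puts the banner at the top. A null input stands for a missing page.
    static String ensureBanner(String talkWikitext) {
        if (hasBanner(talkWikitext)) {
            return talkWikitext;
        }
        String existing = (talkWikitext == null) ? "" : talkWikitext;
        return DEFAULT_BANNER + "\n" + existing;
    }
}
```

Because `ensureBanner` returns its input untouched when the banner is already there, the bot can compare the result with the original text and skip the save when nothing changed.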

Various technical fixes
Description: Implement various technical fixes, including:
 * Change uploaded PDB image name to use the PDB ID instead of the gene symbol, e.g., Image:PBB Protein 2CPC image.jpg instead of Image:PBB Protein OBSL1 image.jpg (made obsolete by the bigger PDB project above)
 * change spacing pattern in output (e.g., )
 * change expression images to upload to Wikimedia Commons
 * tag review articles in "Further reading" section with REVIEW (see User_talk:ProteinBoxBot/Archives/Archive1)
 * remove pubmed IDs for large-scale cloning papers, e.g.,

Difficulty: Low

Skills required: Java
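To illustrate the first fix in the list, here is a minimal sketch of building the image name from a PDB ID, following the pattern of the example name given above. The class and method names are hypothetical, and the validation assumes the classic four-character PDB ID format (a digit followed by three letters or digits), which may be stricter or looser than what PBB needs.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper; not part of the released PBB source.
final class PdbImageName {
    // ASSUMPTION: classic PDB IDs are one digit plus three alphanumerics.
    private static final Pattern PDB_ID = Pattern.compile("[1-9][A-Za-z0-9]{3}");

    // Builds the image page name from a PDB ID, e.g. "2cpc" or "2CPC".
    // Rejects anything that does not look like a PDB ID, such as a
    // gene symbol passed in by mistake.
    static String forPdbId(String pdbId) {
        String id = pdbId.trim();
        if (!PDB_ID.matcher(id).matches()) {
            throw new IllegalArgumentException("Not a 4-character PDB ID: " + pdbId);
        }
        return "Image:PBB Protein " + id.toUpperCase(Locale.ROOT) + " image.jpg";
    }
}
```

Rejecting non-matching input up front keeps gene symbols such as OBSL1 from slipping into the new naming scheme unnoticed.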