User talk:ProteinBoxBot/Phase 3

GWAS for gene-disease links
Offline, I've been having some discussions with collaborators about how to get gene-disease links. For Wikipedia, we of course want to restrict to the highest confidence links, ideally with specific references to the scientific literature. We are clearly looking at some of the canonical databases in the field (OMIM, PharmGKB, Drugbank), though license restrictions may come into play. We've also been discussing how GWAS hits should be incorporated as a source for gene-disease links. If anyone has feedback/input on this issue, we'd love to hear it here... Cheers, Andrew Su (talk) 21:58, 20 May 2013 (UTC)
 * Many diseases however have dozens if not hundreds of links to genes. Doc James  (talk · contribs · email) (if I write on your page reply on mine) 07:34, 21 May 2013 (UTC)
 * True... I was thinking of two "solutions".  First, we'd only restrict to the highest confidence gene-disease links, which should decrease the number of links for any given disease.  (Exactly how to determine "highest confidence" is still up for debate, and it also plays into whether we include GWAS.)  Second, I was also thinking that in cases where the number of genes is high, we could hide all but a few behind a show/hide javascript control.  Sort of like how we have the Gene Ontology terms in GNF Protein box (e.g., ITK (gene)).  Thoughts?  Cheers, Andrew Su (talk) 15:44, 21 May 2013 (UTC)
 * There are also links to human diseases in Uniprot. Nice to see you and others around... My very best wishes (talk) 01:13, 22 May 2013 (UTC)
 * Great point, thank you! I added it to the list of potential sources...  Cheers, Andrew Su (talk) 05:09, 22 May 2013 (UTC)
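
To make the two "solutions" discussed above concrete, here is a minimal sketch in Python of filtering links by confidence and then splitting them into an always-visible group and a group hidden behind a show/hide control. Everything in it is an assumption for illustration only: the `GeneDiseaseLink` record, the 0-to-1 `confidence` score, the threshold values, and the placeholder gene names are hypothetical, and how "highest confidence" should actually be determined is still the open question in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneDiseaseLink:
    """Hypothetical record for one gene-disease link."""
    gene: str
    source: str        # e.g. "OMIM", "UniProt", "GWAS"
    confidence: float  # assumed score in [0, 1]; the scoring scheme is undecided


def split_for_infobox(links, min_confidence=0.8, max_visible=3):
    """Return (visible, hidden) lists of links for an infobox.

    Step 1: drop links below min_confidence.
    Step 2: sort the rest from most to least confident.
    Step 3: show the first max_visible links; the remainder would sit
            behind a show/hide control.
    """
    kept = sorted(
        (link for link in links if link.confidence >= min_confidence),
        key=lambda link: link.confidence,
        reverse=True,
    )
    return kept[:max_visible], kept[max_visible:]


# Placeholder data, not real gene-disease associations.
example_links = [
    GeneDiseaseLink("GENE_A", "OMIM", 0.95),
    GeneDiseaseLink("GENE_B", "GWAS", 0.40),
    GeneDiseaseLink("GENE_C", "UniProt", 0.90),
    GeneDiseaseLink("GENE_D", "OMIM", 0.85),
    GeneDiseaseLink("GENE_E", "OMIM", 0.99),
]

visible, hidden = split_for_infobox(example_links, min_confidence=0.8, max_visible=2)
```

With these placeholder values, the low-scoring GWAS link is dropped entirely, the two top-scoring links stay visible, and the other two would be collapsed. The same split could presumably be reused for disease-drug links if a comparable score were available.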

Examples
Since it is proposed that the disease infobox have links to relevant drugs and, as far as I know, there are no known drugs for Sly syndrome, I suggest we change the example in the disease infobox from Sly syndrome to melanoma. That way, the examples for the drug and disease infoboxes would be interrelated. Boghog (talk) 23:25, 20 May 2013 (UTC)
 * Originally I was thinking of prototyping on the "live" version of a disease (like we did for ITK (gene) way back when) in which case obscure diseases are better, but using sandboxes is a better system. But regardless, I'd like to have several examples actually.  For example, the infobox will look different for Sly syndrome versus melanoma, so we want to make sure things look good for both.  In addition, the question was raised offline whether there should be differences between Mendelian versus complex diseases.  So in any case, feel free to add more examples.  Cheers, Andrew Su (talk) 00:09, 21 May 2013 (UTC)
 * Just added the melanoma box too... Cheers, Andrew Su (talk) 00:44, 21 May 2013 (UTC)
 * Thanks for adding the melanoma disease box and I agree that we will need to include several examples in a sandbox. Boghog (talk) 19:32, 21 May 2013 (UTC)
 * Additionally, many diseases are treated with potentially dozens of different drugs. Take depression for example. For info boxes you need content with one or two options. Maybe adding number of cases globally or frequency, but that info is already in the epidemiology section and the lead. Doc James  (talk · contribs · email) (if I write on your page reply on mine) 07:36, 21 May 2013 (UTC)
 * I'm thinking we could apply the same basic strategy as described for the gene-disease links? i.e., a combination of restricting the list of links (on-label, FDA-approved uses only?) and restricting visibility (show/hide template).  Thoughts?  Cheers, Andrew Su (talk) 15:46, 21 May 2013 (UTC)

drugbox + chembox --> Chembox_Drug?
Anyone know if there have been prior discussions of merging drugbox and chembox? I ask because another one of our disease test cases is going to be Coenzyme Q10 deficiency. One of the drugs used to treat that disease is Ubidecarenone, which is simply another name for Coenzyme Q10. That page already has a chembox, and adding a new drugbox would seem redundant in many cases (for example in the identifiers, the chemical properties, and the InChI). It might seem reasonable that drugbox be merged into chembox by adding a bunch of optional parameters. Of course, I understand that would not be a trivial undertaking.

Now also finding the past discussions at Wikipedia_talk:Chemical_infobox, I wonder if this would be the time to roll out Chembox_Drug (and Chembox_Drug/sandbox)? Cheers, Andrew Su (talk) 18:16, 21 May 2013 (UTC)


 * It was two years ago when we last discussed this. As I recall, the thought was to first update the chemical infobox using the infobox system and then merge the drugbox into the chembox.  I was completely overwhelmed by the complexity of the chembox and therefore I didn't get very far.  Please note that the Chembox_Drug is just something I was playing with and that template code would need a complete rewrite.  This rewrite would be a major undertaking which I simply do not have time for at the moment.  I would however consider resuming work on this template during the summer. Boghog (talk) 19:30, 21 May 2013 (UTC)


 * Hi Boghog, yes, I've seen your fingerprints all over those templates and discussions. Impressive amount of work!  You've probably noticed that I went ahead and played around with the Chembox Drug templates.  You can see the comparison to Drugbox/sandbox2 at User:ProteinBoxBot/Phase_3.  As you can see, most of the existing data translates perfectly to what you did two years ago!  And, this modular system sets us up perfectly for much more extensibility, ultimately toward integrating drugbox and chembox (as you designed it).  Anyway, I certainly don't want to put pressure on you to do anything at the moment.  I'm fine prototyping on Drugbox/sandbox2.  But Chembox Drug seems to me to be great infrastructure to build on moving forward!  Cheers, Andrew Su (talk) 20:36, 21 May 2013 (UTC)


 * Actually, let me just put the templates comparison below here. The Phase 3 examples section should really just focus on what we want to show, not how we show it.  So we can continue prototyping on drugbox there... Cheers, Andrew Su (talk) 20:36, 21 May 2013 (UTC)

GSOC project, Convert Gene Wiki Bot to write to Wikidata
User:edsu sent me a link to the GSOC project, Convert Gene Wiki Bot to write to Wikidata. I've been meaning to read it closely to figure out if there's any overlap between this project and the work being done here, but haven't been able to find the time. Anyway, someone here might find it interesting, if you don't already know about it. Klortho (talk) 02:33, 5 June 2013 (UTC)
 * Yes, we are one and the same group, and these are related projects! More details can be found at  and .  We'll do project mentoring and some technical planning via our mailing list, but all the design discussions will be here and at de:Wikidata_talk:Molecular_biology_task_force.  Cheers, Andrew Su (talk) 04:10, 5 June 2013 (UTC)
 * Ah, that makes sense, then. Klortho (talk) 10:47, 5 June 2013 (UTC)

Data sources
FYI all, we're in the process of organizing all the data sources that we will use for this Phase 3 initiative. If anyone would like to help out, feel free to contribute here. Cheers, Andrew Su (talk) 20:49, 14 June 2013 (UTC)
 * PubChem now provides a lot of data in RDF -- not sure if you knew about that yet or not (I see you're linking to their FTP site). E.g.  https://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID22444 or https://pubchem.ncbi.nlm.nih.gov/rest/rdf/substance/SID2244.  It is still in beta, in my understanding.  Klortho (talk) 04:23, 15 June 2013 (UTC)
 * There are mappings from Malacards to other sources at User:Noa.rappaport/Malacard mappings. It's my intention to populate and check wikidata from this information over the next month or so. Josh Parris 12:17, 19 December 2013 (UTC)
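
For anyone wanting to script against the PubChem RDF addresses mentioned above, a minimal sketch in Python follows. It only builds URLs in the pattern shown in the two example links (`.../rest/rdf/compound/CID…` and `.../rest/rdf/substance/SID…`); the function name `pubchem_rdf_url` and the prefix table are hypothetical helpers, not part of any PubChem client library, and no assumption is made here about the response format or about whether the beta service still behaves this way.

```python
# Base address taken from the example links in this thread.
PUBCHEM_RDF_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/rdf"

# Identifier prefixes as they appear in the example links (assumed complete
# only for these two domains).
_ID_PREFIXES = {"compound": "CID", "substance": "SID"}


def pubchem_rdf_url(domain: str, number: int) -> str:
    """Build a PubChem RDF URL for a compound or substance record.

    Hypothetical helper: it mirrors the URL pattern quoted in the
    discussion and performs no network access.
    """
    if domain not in _ID_PREFIXES:
        raise ValueError(f"unsupported domain: {domain!r}")
    if number <= 0:
        raise ValueError("record number must be a positive integer")
    return f"{PUBCHEM_RDF_BASE}/{domain}/{_ID_PREFIXES[domain]}{number}"


compound_url = pubchem_rdf_url("compound", 22444)
substance_url = pubchem_rdf_url("substance", 2244)
```

Fetching and parsing the RDF is left out on purpose, since that depends on which serialization the service returns and on which RDF library the bot would use.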

Bot is inserting bad statements into Wikidata
See this talk page post. For an example, see this edit. Klortho (talk) 03:56, 30 March 2014 (UTC)

can a request be made for a new article?
Hello. I'm a newbie. I asked a question about this at the help desk, which led me to here. ProteinBoxBot created a page FCHO2 in 2008, but did not create FCHO1 at that time. Can either of you instruct the bot to create a page for FCHO1? Thank you. JeanOhm (talk) 17:44, 13 May 2017 (UTC)
 * Thanks for the note, and welcome to the world of wikipedia and wikidata editing! I was just typing out the instructions on how you can create gene templates automatically, but then I see that our tool is currently down.  (whoops...)  I created a ticket to get that fixed, and we'll let you know when it's back up.  Best, Andrew Su (talk) 20:07, 13 May 2017 (UTC)
 * I created the FCHO1 page and we are troubleshooting the page creation tool. Julialturner (talk) 21:21, 13 May 2017 (UTC)
 * Thank you both! I'd like to explain my interest and make another request. While writing Vesicular transport adaptor protein I learned about the muniscin protein family. That family was first named in this paper, which identified FCHO1, FCHO2, SGIP1 and Syp1 as the family members. As you have probably already guessed, I wonder if you could please use the bot to create the Syp1 page, and then I'll write a page about the muniscin family. Thank you again. JeanOhm (talk) 01:34, 14 May 2017 (UTC)
 * I believe Syp1 is a rat gene, but the ortholog is already created Synaptophysin Julialturner (talk) 03:44, 14 May 2017 (UTC)
 * Oops! Sorry. JeanOhm (talk) 04:04, 14 May 2017 (UTC)

Just FYI, we fixed the tool to create new gene pages on wikipedia. I put the instructions at User:ProteinBoxBot. Please reply if you hit any problems. Best, Andrew Su (talk) 19:31, 16 May 2017 (UTC)
 * Thanks! I just used a bot!!!!!!!! The page is Golgin a7. It looks fine, IMHO, but since it is my first time, you might want to check on it. JeanOhm (talk) 02:51, 24 May 2017 (UTC)
 * looks great to me! Natural next step is to flesh out the article itself, since I'm sure our bot is only scratching the surface on what is really known!  Best, Andrew Su (talk) 05:47, 24 May 2017 (UTC)

Julia, after further digging I learned that Syp1 is not Synaptophysin, but it isn't in biogps.org, so I couldn't use the bot to make a page for it. If either of you want to see how I used FCHO2, see Muniscins. both. JeanOhm (talk) 19:46, 26 May 2017 (UTC)
 * Muniscins looks great! If you wanted to update FCHO2 to add a couple sentences (and link back to the article on muniscins), I think that would be even better!  On Syp1, if you can point me to a gene or protein identifier (from NCBI Gene, Ensembl, UniProt, etc.), then we can make sure the right page gets created.  Best, Andrew Su (talk) 21:07, 26 May 2017 (UTC)
 * Thanks. Yes, I intend to do more with FCHO2, but muniscins hasn't even been reviewed yet. The best link I can give for Syp1 is https://www.ncbi.nlm.nih.gov/protein/KZV12879.1 JeanOhm (talk) 00:11, 27 May 2017 (UTC)
 * okay, makes sense now. that protein identifier you found is for a yeast protein, and it doesn't look like it has an obvious human ortholog (that I can find anyway).  The ProteinBoxBot is really focused on creating pages for human genes and proteins only...  Best, Andrew Su (talk) 00:21, 27 May 2017 (UTC)
 * Yes, some of my personality defects have come from smelling way too many yeast cultures. Here's another gene you might be interested in tackling. CUX1 is a human gene that encodes a homeodomain protein CUTL1, OMIM https://www.omim.org/entry/116896  In addition, there is a Cutl1 Alternately Spliced Product called CASP which has an entirely different function and location. It is involved in the Golgi matrix, and I am working on a gmatrix article and a major upgrade to the golgi article itself. Neither OMIM nor biogps has a separate entry for CASP. CASP is well established. Original paper, which also indicates that yeast has Coy1, which also is not in wp. It is still going strong, appearing in "Integrated self-organization of transitional ER and early Golgi compartments" by Ben Glick, a major contributor to the golgi field, doi: 10.1002/bies.201300131 which I bet you have access to. Are you able to make a biogps entry for CASP, in order to use the bot to make a wp article? Thanks, JeanOhm (talk) 00:57, 27 May 2017 (UTC)

Modifying bot summary text?
Hi. I was reading this article (https://www.sciencedaily.com/releases/2015/03/150313094551.htm), and I thought "Hmm, maybe I'll read about DLL4 on Wikipedia", so I did. Then I thought, "I wonder what a homolog is..." and I found this: https://en.wikipedia.org/wiki/Sequence_homology. "Cool", I thought, "maybe I'll add a link to this page from DLL4." Then I found the text to be from the bot's summary. So I thought "Maybe I'll try to add the link to the bot", but I couldn't figure out where the summary text originates. It seems like it would be nice to have that link in the summary text of the homologies, if I'm understanding what's going on. I kinda got stuck. Cheers, Jrichardliston (talk) 00:34, 9 April 2018 (UTC)
 * See this diff -- you can just remove the surrounding template. Sorry, those templates are obsolete now, so we should probably try to get a bot to clean them out... Best, Andrew Su (talk) 00:45, 9 April 2018 (UTC)
 * Cool, thx. Jrichardliston (talk) 00:55, 9 April 2018 (UTC)