Wikipedia:Featured article candidates/Rosetta@home


 * The following is an archived discussion of a featured article nomination. Please do not modify it. Subsequent comments should be made on the article's talk page or in Wikipedia talk:Featured article candidates. No further edits should be made to this page.

The article was promoted by User:SandyGeorgia 00:26, 11 October 2008.

Rosetta@home

 * Nominator(s): Emw2012 (talk)

I'm nominating this article because I think it fulfills the featured article criteria. I've been working regularly on the article for the better part of three months, and now feel that it does justice to an interesting and important example of distributed computing being used for protein structure prediction. During my time working on the article, it has become listed as a good article and undergone two peer reviews (one before and one after GAN). David Baker, the head scientist on the Rosetta team, has read the article and called it an "outstanding job"; I've incorporated his emailed suggestions. Thanks in advance for comments and suggestions. Emw2012 (talk) 05:39, 28 September 2008 (UTC)
 * Comment by jimfbleak No time to read properly yet, but I notice that all of the images have forced thumb sizes which override user settings. Can these settings please be removed? Also, MoS suggests that images should all be right-aligned or alternate - why is one image left aligned? jimfbleak (talk) 06:03, 28 September 2008 (UTC)
 * Alignment has been fixed and forced sizes have been removed for all images, but I think the image in the Project significance section (especially) and the image in the Volunteer contributions section (currently 180px × 122px and 180px × 83px) are now too small to convey their intended information.  According to WP:MoS, there are exceptions to the policy on forced sizes: "Images in which a small region is relevant, but cropping to that region would reduce the coherence of the image" (e.g. the detailed screensaver in 'Project significance') and "Detailed maps, diagrams or charts" (e.g. the bar chart in 'Volunteer contributions').  Considering that, I'd like to restore the previous sizing for those images (300px × 203px and 450px × 207px, respectively) or something very slightly smaller.  Please let me know what you think. Emw2012 (talk) 08:41, 28 September 2008 (UTC)
 * My default is 180 px, and I expect to have to click on images if I want more detail - that's the whole point of thumbs. I'm not going to oppose just on this issue, and for time reasons I'm unlikely to be able to do a full review, so probably won't support either unless it's still here in two weeks time. Really just wanted to raise the issue (if it hadn't been FAC I would have just removed the forced image sizes). jimfbleak (talk) 16:34, 28 September 2008 (UTC)
 * Unless anyone objects, I'll keep off forced image sizes per your suggestion. Emw2012 (talk) 22:17, 28 September 2008 (UTC)

Image question - What efforts have been made to get the publishers to release the non-free screenshots on a GFDL licence? Fasach Nua (talk) 13:32, 28 September 2008 (UTC)
 * Originally, all of the images were non-free. I emailed the creator of the Rosetta@home logo; he said something to the effect of "it would be fine to use the image on Wikipedia", but did not respond when I asked him to fill out the standard free license release form.  Considering that, I haven't made an effort to get the screensaver freely licensed by the Baker lab.  I will email them again later today.  The next two images, superpositions of solved and predicted protein structures, were both made by me in PyMOL after a fairly long search for the atomic coordinates of the predicted structures.  The bar chart in 'Volunteer contributions' took a while to get appropriately licensed, but now all images on http://boincstats.com are under a free CC license. Emw2012 (talk) 15:40, 28 September 2008 (UTC)
 * For the Rosetta@home screensaver image, would a free license apply to only that particular screenshot of the screensaver, or all screenshots of that type of Rosetta@home screensaver? Emw2012 (talk) 16:24, 29 September 2008 (UTC)
 * Also, if an image (e.g. the Rosetta@home logo) were not under a free license, would it not be shown alongside the lead if the article were to be made 'Today's featured article' some time in the future? Emw2012 (talk) 02:06, 1 October 2008 (UTC)

Comments
 * What makes the following reliable sources?
 * http://boincstats.com/
 * The site converts XML data exported by distributed computing projects on the BOINC platform into various charts and tables (see http://boincstats.com/page/faq.php#9). It is included in a list of sources for "More detailed statistics for Rosetta@home" on an official project page here: http://boinc.bakerlab.org/rosetta/stats.php, and is almost certainly the most widely used of those sites. Emw2012 (talk) 15:40, 28 September 2008 (UTC)


 * http://www.mrw.interscience.wiley.com/suppmat/0887-3585/suppmat/prot.21636.html gives me a "forbidden" message.
 * I saw that on the link checker as well, but somehow could still access the site. I'm not sure what's going on there -- perhaps I should remove the link and only include the other reference information? Emw2012 (talk) 15:40, 28 September 2008 (UTC)
 * I had no problem accessing the site. Perhaps link checker is incorrect in this case. Mattisse (Talk) 15:58, 28 September 2008 (UTC)
 * I don't blindly trust the link checker; I always try to click through to the article itself. In this case, I'm still getting a 'forbidden' notice. Perhaps you both are on an academic network? It's a Wiley Science reference, it appears, so by chance is this a scientific journal accessed through a database? Ealdgyth - Talk 16:05, 28 September 2008 (UTC)
 * I am not on an academic network. Just an ordinary, commercial IP. Mattisse (Talk) 16:19, 28 September 2008 (UTC)
 * I too had no problems with this. Graham Colm Talk 16:28, 28 September 2008 (UTC)


 * http://boinc.bakerlab.org/rosetta/forum_thread.php?id=669&nowrap=true#10910 is a forum thread, but it's their own forums and their own maintained FAQ. Borderline, but probably okay.
 * http://boinc.bakerlab.org/rosetta/forum_thread.php?id=3934&nowrap=true#51199 is likewise from the forums, it needs a publisher
 * Done. Emw2012 (talk) 15:40, 28 September 2008 (UTC)


 * Basically, the "Rosetta@home forum" links need to be investigated by other reviewers to make sure that they are legitimate uses of the forums, and they ALL need to give a publisher outside the link title.
 * Current ref 27 (David Baker ... Publications on R@H's Alzheimer's ..) needs a last access date.
 * Done. Emw2012 (talk) 15:40, 28 September 2008 (UTC)


 * Current ref 35 (Liu Y et al.) needs a last access date.
 * Done. Emw2012 (talk) 15:40, 28 September 2008 (UTC)


 * Current ref 36 (David Baker's Rosetta@home journal archives message 40756) has the author and the publisher run into the link title. They need to be broken out from the link.
 * "David Baker's Rosetta@home journal archives" is the actual title of that page, but I've added proper author (David Baker) and publisher (University of Washington) information. Emw2012 (talk) 15:40, 28 September 2008 (UTC)


 * Likewise all the forum posts need to be audited against the content to make sure that they are allowed usages under WP:V and WP:RS.
 * All cited forum posts are authored by either project scientists (e.g. principal investigator David Baker; project scientists are listed as such under their username in each post) or, in one case, a moderator of the forum (moderators in this forum are liaisons between project scientists and project volunteers). I'm aware of the WP policy against using forum posts as references, but consider this particular kind of forum posting both verifiable and reliable considering that they are made by project scientists or forum moderators appointed and endorsed by project scientists.  I have only used these forum posts in cases where they provided information that is otherwise unavailable, for example in the project website, the scientific literature, or other sources.  Emw2012 (talk) 15:40, 28 September 2008 (UTC)
 * I forgot to mention one forum post that was by a regular user, current reference #61 ("Foldit forums: How many users does Foldit have? Etc. (message 2)". Retrieved on 2008-09-27). Considering it simply explains how to estimate the number of Foldit users by multiplying the number of users on each page of the list of all users by the number of pages in that list (i.e., 50 users/page * 1189 pages = 59,450 users), I think it is verifiable. Also, since the author is pseudonymous and the site's publisher is uncertain, I've omitted values for those attributes of the cite template.  Emw2012 (talk) 22:17, 28 September 2008 (UTC)
 * Otherwise sources look okay, links check out with the link checker tool. Ealdgyth - Talk 14:12, 28 September 2008 (UTC)


 * Comment was looking for a direct mention/link of Levinthal's paradox in the article but failed to spot one. Shyamal (talk) 15:59, 28 September 2008 (UTC)
 * Great point -- I've incorporated a sentence on how it relates to Rosetta@home in the Project significance section. Emw2012 (talk) 22:17, 28 September 2008 (UTC)


 * Support - I peer-reviewed this article a week or two ago and all my major concerns were addressed. My only remaining minor quibble is that the disease-related research section lacks flow. I suggest experimenting with deleting the sub-headings. Graham Colm Talk 16:55, 28 September 2008 (UTC)


 * Note, many citations are incorrectly formatted. Please assure that all citations list a publisher so they can be checked for reliability.  Sandy Georgia  (Talk) 23:50, 28 September 2008 (UTC)
 * I could've sworn I added publisher information for all forum references in a recent edit, but guess I hadn't. That's now done.  In keeping with practice in scientific publications and what I see in other featured articles in the sciences, I have omitted listing publishers for journal citations.  I have also left references to http://boincstats.com without a publisher, since no such information seems to be available (it is a website made by a single man, who I have listed as the author; rationale for reliability is in a previous comment). If they should be there, please let me know, along with anything else I should add. Emw2012 (talk) 04:03, 29 September 2008 (UTC)


 * Comments and almost a Support - this is quite a comprehensive and well written article but I put myself in the shoes of an enquiring reader and found a few things that could be answered:
 * The computing section does not make it clear how it distributes work on, say, a single protein to different user machines to ensure that they don't actually search the same conformation space. Perhaps this is part of the BOINC platform, but it seems to me more domain specific and worth explaining.
 * I've added an explanation of that in the last few sentences of the 'Computing platform' section. Emw2012 (talk) 02:06, 1 October 2008 (UTC)


 * To minimize power consumption or heat production from a computer running at sustained capacity, the maximum percentage of CPU resources that Rosetta@home is allowed to use can be specified through a user's account preferences. The times of day during which Rosetta@home is allowed to do work can also be adjusted, along with many other preferences, through a user's account settings. - I would suggest that the primary motivation is more likely to allow other background processes to execute rather than to prevent heating of the CPU.
 * Since Rosetta@home is run as a lowest-priority task, it throttles back whenever background processes (e.g., ripping/burning media files, virus scanning, etc.) request resources that Rosetta would otherwise be using -- see the sentence preceding your quotation. In light of that, the most important things would be power consumption and heat production, no?  Emw2012 (talk) 02:06, 1 October 2008 (UTC)


 * Protein 3D structures are currently determined experimentally through X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The process is slow (it can take weeks or even months to figure out how to crystallize a protein for the first time) and comes at high cost (around $100,000 USD per protein). - my understanding is that there are proteins such as GPCRs which cannot be crystallized in their "real-life" conformations.
 * While a few GPCR proteins have been crystallized, you're right that GPCRs (and membrane proteins in general) are especially difficult to solve in terms of structure. I've added information on what Rosetta@home is doing on this front in the last few sentences of the second paragraph of the 'Project significance' section. Emw2012 (talk) 02:06, 1 October 2008 (UTC)


 * In other words, Folding@home's strength is protein folding, while Rosetta@home's strength is protein design and prediction of structure and docking. - I find it hard to see this contrast. Is it that it reverse engineers the structure of a "locking protein" from a target's active site?
 * Folding@home is interested in modeling (via molecular dynamics) the trajectories of the backbone and residues as the protein folds to its native state. Although better understanding of those trajectories could possibly help structure prediction, Rosetta@home is much less interested in that, and instead focuses on the position of all parts of the protein in its native state.  Rosetta's methodology for protein docking prediction is described in the third paragraph of the 'RosettaDock' section.  Let me know if and perhaps how I can further clarify this. Emw2012 (talk) 02:06, 1 October 2008 (UTC)
 * Ok, now I understand. I have some minor concerns about the compliance to WP:RS but I hope this stimulates the project team to publish a proper description report and help replace the citations to the discussion forum. Shyamal (talk) 03:53, 1 October 2008 (UTC)

Good luck. Shyamal (talk) 13:37, 29 September 2008 (UTC)
 * Comments - I am waiting for these issues that are above my technical expertise to be settled so I can register my support. I have been following this article since its GAN days and find it fascinating. I have always wondered what Rosetta@home was and this is a wonderful (and for me, satisfying) explanation. (I did some minor copy editing a while ago.) Mattisse (Talk) 14:36, 29 September 2008 (UTC)
 * Comment. I'm close to support.  However, I reviewed the section on Alzheimer's disease, and it was kind of inaccurate.  What the project was doing was quite accurate, but it was less so on describing the biochemical nature of AD.  I corrected it, but I wonder if the same thing is wrong in the other sections on what is being done with the work here.  Also, and I consider this important, how much bandwidth does this project use?  With some ISPs limiting the amount of bandwidth that can be used per month, will this project be hurt?  It may not be germane to the article, but if I were seeking out information, I'd wonder.   Orange Marlin  Talk• Contributions 18:21, 29 September 2008 (UTC)
 * There isn't much information out there on how much bandwidth Rosetta@home uses per day (or per workunit). I've initiated a conversation at the Rosetta@home forums here: Daily bandwidth usage for Rosetta@home.  Unfortunately neither a project scientist nor moderator has dropped in, so there may be reliability issues.  And though possible, it would probably be difficult to verify.  Let me know what you think about including information from that Rosetta@home user regarding bandwidth usage. Emw2012 (talk) 02:06, 1 October 2008 (UTC)
 * It doesn't appear to be much, but I don't think a forum passes the WP:RS test. It appears that it takes 1-2 GB per month, which if you're limited to 200 GB, is kind of significant.  I wish there was something more reliable as a source.  Orange Marlin  Talk• Contributions 21:07, 3 October 2008 (UTC)
 * I agree that there may be reliability (and verifiability) issues in forum posts by users who are neither project scientists nor moderators. Given the criteria at WP:SELFPUB, however, there may be a case for including the post in question.
 * Also, I'm not sure how you got to a bandwidth usage of 1-2 GB per month, considering that 1024 MB was the maximum requirement for the most bandwidth-hungry computer being measured (which had eight CPU cores, making it an outlier). The remaining computers being measured (all single core, 2.8-3.0 GHz CPUs) used around 250 MB per month on average, i.e. one 800th of a 200 GB-per-month capacity.
 * I agree that this is somewhat important information, but a well-vetted source seems simply unavailable. There is other equally important information that is unavailable for this and similar projects: how much extra power per hour is consumed by running the project, how much heat, how much RAM does an average workunit use, etc.?  Because of a lack of reliable information, these questions may be beyond our current scope.
 * Finally, I want to reiterate that all but one other forum post referenced are written by project scientists and moderators, not miscellaneous users. So I think other forum references used hold significantly more weight. Excluding all forum references would seriously deprive the article of non-controversial and in my opinion acceptably-sourced information. Emw2012 (talk) 16:43, 4 October 2008 (UTC)
 * Sorry, bad math on my part. Dammit, I'm a doctor, not a computer scientist. (Can't use that Star Trek reference enough.)  250 MB is less than .1% of some of the limits I've read, unless you're using cellular access to the net.  Not really worthy of adding to the article. Orange Marlin  Talk• Contributions 21:09, 9 October 2008 (UTC)
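
The bandwidth arithmetic in this thread can be checked with a short sketch. It uses only the figures quoted above (roughly 250 MB per month for the single-core machines, 1024 MB per month for the eight-core outlier, and a 200 GB monthly cap). Decimal units (1 GB = 1000 MB) and the helper name `share_of_cap` are assumptions made for illustration; they do not come from the forum post or the project.

```python
# Figures are those quoted in the discussion; decimal units are an assumption.
MB_PER_GB = 1000


def share_of_cap(usage_mb: float, cap_gb: float) -> float:
    """Return monthly usage as a fraction of a monthly bandwidth cap."""
    return usage_mb / (cap_gb * MB_PER_GB)


typical = share_of_cap(250, 200)   # single-core machines, about 250 MB/month
outlier = share_of_cap(1024, 200)  # eight-core outlier, 1024 MB/month

print(f"typical: 1/{1 / typical:.0f} of the cap ({typical:.3%})")
print(f"outlier: {outlier:.3%} of the cap")
```

Under these assumptions the typical machine uses one 800th of a 200 GB cap, matching the figure given in the thread, and even the outlier stays well under 1% of it.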


 * Note, this FAC has been up for almost a week. It is not clear to me that reviewers have checked individually sourced statements as suggested by Ealdgyth, I am uncertain if images are cleared, and it is not apparent that any reviewer has checked Scholarly sources for any coverage of any criticism, controversy or weaknesses per the information in the nominator blurb:  " David Baker, the head scientist on the Rosetta team, has read the article and called it an "outstanding job"; I've incorporated his emailed suggestions."  Sandy Georgia  (Talk) 20:16, 4 October 2008 (UTC)
 * Regarding images, a recent comment on Fasach Nua's user page seems to imply that s/he thinks the article fulfills criterion 3.  I've asked for help vetting sources at Wikiproject MCB's talk page. Emw2012 (talk) 15:46, 7 October 2008 (UTC)
 * There are still missing publishers and incomplete information about sources, and no response if any potential criticism has been adequately researched and covered, considering the Baker endorsement. Sandy Georgia  (Talk) 21:13, 8 October 2008 (UTC)
 * In my previous response to your concern over lack of publisher information, I said: "In keeping with practice in scientific publications and what I see in other featured articles in the sciences, I have omitted listing publishers for journal citations. I have also left references to http://boincstats.com without a publisher, since no such information seems to be available (it is a website made by a single man, who I have listed as the author; rationale for reliability is in a previous comment). If they should be there, please let me know, along with anything else I should add. Emw2012 (talk) 04:03, 29 September 2008 (UTC)".  I just reviewed every reference again, and, among all the references needing a publisher to my understanding, found one without a publisher; there was also one without an author and one without an accessdate.  Since several websites do not list an author, I have omitted that attribute from the corresponding references, listing only publishers.  Let me know whether you think the information on references is now satisfactory; if it isn't then please let me know which references to add to and what you'd like me to add.  I will search around for any potential criticism and incorporate any findings before the end of Friday. Thanks again, Emw2012 (talk) 22:41, 8 October 2008 (UTC)
 * Here is a sample of the work needed, from two sections only; it should be apparent to reviewers when a statement is sourced to an internet forum or a self-published source, so they can evaluate the statements for reliability. Boincstats.com as publisher was missing on several in those sections only, forum sources weren't identified, and there were other misc citation items like missing accessdates.  Please complete this work throughout.  Sandy Georgia  (Talk) 23:10, 8 October 2008 (UTC)


 * Weak Support - I couldn't really make up my mind. My problems with the article center around the "Disease-related research" section. My problem with this section is with the tiny paragraphs and each given a subsection while not being that large. I have a similar problem with "Comparison to similar distributed computing projects". I would recommend removing the subheadings, having it in one large section, and finding a way to merge the paragraphs in a more fluid way. Sorry if I couldn't be more of a help with a better review. It was an interesting article and I didn't really see anything that didn't appeal to me besides the above. Ottava Rima (talk)
 * In light of there now being two experienced editors who have suggested removing subheadings in the 'Disease-related research' section, I'll take care of that soon. I will expand the subsections in 'Comparison to similar computing projects' to at least two paragraphs each; I think they can be filled without simply adding fluff. Emw2012 (talk) 02:50, 8 October 2008 (UTC)


 * Support - If the "new rules" are that a decision to promote has to be made within a week, then I will register my support now, having no problem accessing the source cited as "forbidden" above by Ealdgyth and accepting Ealdgyth's and Shyamal's evaluation of the sources as RS, as well as my own evaluation of the matter.  Also, per GrahamColm's support.  Was waiting for Orangemarlin's response, but with the week deadline I will not wait longer.  Mattisse (Talk) 00:16, 8 October 2008 (UTC)
 * There is no such "rule". Sandy Georgia  (Talk) 21:05, 8 October 2008 (UTC)


 * Support - Thorough and well written. For the references that cite the forum, you can use the Template:Cite web with publisher="boinc.bakerlab.org". Forums can be reliable sources, depending on who's contributing and who's moderating.  There is a general rule not to trust user generated content; however, there are exceptions if the user is an expert writing in their field, especially if there is editorial oversight to ensure quality and accuracy. Jehochman Talk 21:00, 9 October 2008 (UTC)
 * New comment-I just cleaned up the citations in the medical section. I checked them to make sure they verified the statements, which they did.  But the citations were kind of hard to use, lacking PMCID and PMID in almost every case.  Not being a computer person, the rest of the article is not very clear to me, but someone might want to do a citation clean-up.   Orange Marlin  Talk• Contributions 21:13, 9 October 2008 (UTC)
 * I was hoping you would also clean up PMIDs from the top of the article, so I won't have to do that work. (Pointing to PMIDs is preferable to pointing to journal abstracts or journal free full text, as the journals sometimes take down abstracts or free full text.  Also, it makes the citation method consistent with other bio/med articles, using Diberri's PMID template filler, and avoiding subscription only URLs.  We should, however, link to the journal URL when it provides free full text not provided at PubMed Central.  See Wikipedia Signpost/2008-06-30/Dispatches.)  Sandy Georgia  (Talk) 21:19, 9 October 2008 (UTC)
 * This is turning out to take up quite a bit of my time. I noticed that citations were messy even outside of the section I reviewed.  For example, there are a lot of citations that use "et al" using just the main author and not italicizing the et al.  At this point, this article should not be promoted to FA until the citations are cleaned up.  I'll work on them, but usually with articles I read the abstract or source to see if it confirms the statement.  This may take me a long time.  I should have looked more carefully. Orange Marlin  Talk• Contributions 21:31, 9 October 2008 (UTC)
 * Thanks for working on those, Orange; I was chipping away at a few of them myself, but it is time consuming. Sandy Georgia  (Talk) 21:50, 9 October 2008 (UTC)
 * The above discussion is preserved as an archive. Please do not modify it. No further edits should be made to this page.