User talk:EpochFail/ORES audit

Recusing myself
Hey folks. I think that, as the developer of ORES, I ought to recuse myself from assessing these articles. I realize that's awfully convenient of me because it's a lot of work. I'm worried that if I make my own assessments, I might somehow add a bias to the results because of my experience with ORES. --EpochFail (talk • contribs) 21:54, 6 April 2017 (UTC)

Number of assessments
I don't know if the intention is to get one consensual assessment for each article, or a set of personal assessments for each article. The latter will be less work as no negotiation will be required. Also, are we expected to motivate our assessments in any way? For example, without a reasonable number of inline citations, I will give nothing higher than C no matter how good it is otherwise. • • • Peter (Southwood) (talk): 05:55, 7 April 2017 (UTC)
 * Hi Peter. I'm really only looking for one assessment per article.  I imagine that if there are some higher quality assessments (GA or FA), we'll want to double-check those and maybe have a discussion.  I'm striving for a level of rigor similar to what regular old WikiProject assessments get. --EpochFail  (talk • contribs) 13:07, 7 April 2017 (UTC)

Absolute scale for assessment?
Currently assessments happen in terms of a project. For example, the American Health Care Act might rank high importance in WikiProject United States, but low importance in WikiProject Medicine. This scheme here seems to be requesting importance rankings outside the context of WikiProjects. Is that what you are seeking?  Blue Rasberry  (talk)  12:36, 7 April 2017 (UTC)
 * Bluerasberry, I'm not looking for any indication of importance here -- just the quality class. Though I am working on that in a related project (m:Research:Automated classification of article importance).  Nettrom is digging into the complexity of WPMED right now. --EpochFail  (talk • contribs) 13:04, 7 April 2017 (UTC)
 * Sorry, yes, I got confused about the two projects, but the same question applies. I meant to provide an example: an article like "banana" could be high quality in terms of its information about food but low quality in terms of its information about botany. However, after checking a number of articles which I thought would have noted differences, it seems to be the case that articles tend to get consistent quality ratings that might not be related to the WikiProject. For GA and FA articles this seems to be uniformly the case. I will have a go at ranking as presented here.  Blue Rasberry   (talk)  14:23, 7 April 2017 (UTC)
 * Thanks Bluerasberry. I see what you're saying.  When we built up the wp10 model for ORES, Nettrom and I made similar observations about article quality assessments.  WikiProjects tend to agree more than they disagree.  I'd be interested in discussing any examples where you think different assessments are warranted.  We should adapt our dataset to reality rather than the other way around. --EpochFail  (talk • contribs) 14:37, 7 April 2017 (UTC)
 * Project-specific differences in class are unusual (although a few have extra classes or don't use all of them). The practice of rating the same article as a stub for its botanical content and B for its food content is discouraged:  a long article is never a stub just because one area is insufficiently developed.  So, at the most, you'll get a difference between a Start- and C-class or between a C- and B-class rating.  WhatamIdoing (talk) 18:18, 7 April 2017 (UTC)
 * Remember that many people will see that WikiProject X classed it as C-class, so when I assess it for WikiProject Y I glance at the article, see that "C-Class looks about right", and do the same. I've had conversations that indicate that this was very common for articles getting their first assessments.  I suspect that if (in this scenario) I hadn't been able to see the WikiProject X assessment I might have rated it as Start or B.  In other words, it's not true to say that people are incredibly consistent in making assessments! Walkerma (talk) 04:02, 17 April 2017 (UTC)

Example of correct evaluation
Is this an example of an evaluation which is reported correctly in this ORES audit?  Blue Rasberry  (talk)  12:36, 7 April 2017 (UTC)
 * Bluerasberry, Yup! That looks great.  Nothing fancy.  Thank you! --EpochFail  (talk • contribs) 13:02, 7 April 2017 (UTC)

Worst articles
Okay, after doing a few of these I see the pattern that they are all considered horrible articles. Many of these articles are at high risk of deletion, and would be rejected as new articles in the WP:AFC evaluation system, and the usual response to anyone submitting content like this would be some variation of "this is wrong". I ranked a set of them as "start" for having no citations.

These articles are off the usual scale as being content which the Wikimedia community typically does not want. It seems obvious to me that the scale is not set up to evaluate content like this in a way that communicates what the grades are supposed to report.  Blue Rasberry  (talk)  14:33, 7 April 2017 (UTC)


 * Thanks Bluerasberry, that's actually a useful conclusion IMO. We use ORES to look at trends in article quality historically and this is a good limitation to be aware of.  Wikipedia has changed, so looking back is a bit weird.  I think it's perfect that you're applying today's standards though.  I believe that ORES is applying today's standards as well.  What other patterns are you seeing in these old articles that make them hard to evaluate with our current criteria?  --EpochFail  (talk • contribs) 14:40, 7 April 2017 (UTC)


 * I'm just coming to this now - I hope to do some assessments over the coming days. Thanks for your work on this!  FYI: I was the originator of this assessment scheme (after changes suggested by WP:CHEM), which I brought to WP:1 when I got involved there.  I wanted to point out that the standards back in 2006 were very different.  I think it was still optional then to have inline references at all - I recall telling someone that "their" featured article would get demoted if they didn't have any inline references and they were quite upset about it.  An article that would make FA then might only make C-class today for that reason, and even a typical B-class article with inline references from 2006 might only be a C-class or even a Start by today's standards.  I suspect that standards since around 2012 have been much more stable on EN:WP.  So you should make it clear whether you want the assessments based on the 2006 standards or the current ones.
 * Also, you should be aware that we didn't introduce C-class until later (I think it was 2010), and I think GA didn't come along till around 2006. This is an interesting idea, so I'll try to make my contribution over the next few days, using whichever you prefer - either the 2006 standards or the 2017 ones.  Thanks! Walkerma (talk) 03:55, 17 April 2017 (UTC)


 * Hi Walkerma! Thanks for helping out.  :)  I'd like you to do your best to apply today's standards when assessing these articles.  It's a bit awkward of course, but maybe it's actually easier to do that because we'd all need to do a lot of digging into histories to figure out how to faithfully apply old assessments.  I'll go add a note about this to the top of the work page now.  --EpochFail  (talk • contribs) 15:57, 17 April 2017 (UTC)

All done -- working on analysis
Thanks for your help folks. I've just turned a snapshot of this page into a dataset. I'll post back here with an analysis of how your assessments compare to ORES shortly. --EpochFail (talk • contribs) 18:53, 1 May 2017 (UTC)