User:Vir/sandbox/quality levels

The problem: wide variability in quality
There is wide variability in quality on the Good article list. This is arguably a current, extensive and avoidable problem with the GA list. On inspection, quite a number of listed articles fail two or more GA review criteria -- they are not "good".

A common problem is that many GA-approved articles lack inline references. A number of the articles I've reviewed (mostly in the society and various social/social science subcategories) also seem to be not comprehensive and/or not NPOV. A thorough evaluation of how well a large group of good articles adheres to each GA criterion would be useful at this point.

Solutions
Decrease the wide quality variability on the GA list through one of these three strategies (or another strategy yet to be defined):

1. Tighten application of GA criteria: make the application of GA criteria more stringent and reduce greatly the number of GA articles.
 * 1a. Gradual tightening: Grappler has argued that application of criteria will gradually tighten over time, perhaps several years.
 * 1b. Revise soon: Vir has argued: Why wait and allow a problem to persist, even grow? Why loose evaluations of articles down the road? Fix the problem now by adjusting the process.

2. Create a new review project (above or below GA): If GA criteria are loosely applied (not fully applied) or applied tightly, this might make space for another review project. These could be:
 * 2a. a review stage above GA, if GA criteria are loosely applied (not fully applied).
 * 2b. a review stage below GA, if GA criteria are applied tightly.

3. Create quality levels: Have a set of quality levels (perhaps only two) evaluated by a grading scheme. This need not add bureaucracy in terms of lots of extra reviews. GA could still have only two reviews (the nomination and the approval), per review step. But, it could involve more round(s) of review.

Article review criteria
These are various Wikipedia article review or assessment schemes:


 * Article Assessment: Assessing an article
 * Good article criteria: What is a good article?
 * Featured article criteria: What is a featured article?

Article quality levels
This is distinct from the grading scheme issue below. What will the ranks of articles be called? Here is one option:
 * Version 1.0 quality levels: Article progress grading scheme


 * Revised 1.0 quality levels (GA-class added by Titoxd, C-class suggested by Vir):

A five domain grading outline
This is a revised version of Stevage's outline on the GA talk page. This could be used to refine the evaluation of good article nominees:
 * Comprehensiveness: rated A, B, C, D, F (A is more comprehensive than Britannica, F is barely a stub)
 * Verifiability: A, B, C, D, F (A is at the level of academic papers, F has no sources and is a suspected hoax)
 * Writing: A, B, C, D, F (A is brilliant prose, F is barely English)
 * Structure/approachability: A, B, C, D, F (A being a brilliantly structured article useful to a layman, F consists of random bits of trivia or is helpful only to insiders)
 * Neutral Point of View: rated A, B, C, D, F (A is all notable views equally represented, F is a very biased push of one POV).
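
As a rough illustration of how such letter grades could feed a rating decision, the sketch below encodes the thresholds Stevage suggests in the Discussion section (a featured article is at least B in every area with at least two A's; a good article is at least C in every area). The function names, the short domain keys and the sample grades are hypothetical and chosen only for this example; nothing here is an agreed GA or FA rule.

```python
# Hypothetical sketch of threshold rules over the five-domain A-F outline.
# Thresholds follow Stevage's suggestion quoted in the Discussion section;
# all names and sample grades below are made up for illustration.

GRADE_ORDER = "FDCBA"  # worst to best, so a higher index means a better grade


def at_least(grade, minimum):
    """Return True if `grade` is as good as or better than `minimum`."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(minimum)


def meets_good(grades):
    """Good article: minimum C in every domain."""
    return all(at_least(g, "C") for g in grades.values())


def meets_featured(grades):
    """Featured article: minimum B in every domain, with at least two A's."""
    values = list(grades.values())
    return all(at_least(g, "B") for g in values) and values.count("A") >= 2


sample = {
    "comprehensiveness": "B",
    "verifiability": "C",
    "writing": "A",
    "structure": "B",
    "npov": "B",
}
print(meets_good(sample))      # True: no domain is below C
print(meets_featured(sample))  # False: verifiability is below B
```

Keeping the thresholds in small separate functions makes it easy to try other cut-offs, such as excluding any article with an F.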

Five domain math-based outlines
These avoid confusion with the A-Class and B-Class WP 1.0 project ranks.


 * Point system: This is the same as the above outline but grades are assigned as points from 1-5 (1,2,3,4,5) or 0-4 (0,1,2,3,4). This allows easy summation of domain ranks for a total article score.


 * Percentage system: This is the same as the above outline but grades are assigned as percentages from 1-100% in grading bands (100-90, 89-80, etc.). This allows for averaging of domain ranks for a total article score and for finer distinctions in quality.
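
The arithmetic behind the two systems above can be sketched as follows. The letter-to-point mapping (A=4 down to F=0), the short domain keys and the sample scores are assumptions made only for this illustration; the outlines themselves do not fix them.

```python
# Hypothetical sketch of the point and percentage systems.
# The mapping A=4 ... F=0 and all sample scores are assumed for illustration.

DOMAINS = ("comprehensiveness", "verifiability", "writing", "structure", "npov")

LETTER_TO_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def total_points(grades):
    """Point system: sum the per-domain points for a total article score."""
    return sum(LETTER_TO_POINTS[grades[d]] for d in DOMAINS)


def average_percentage(percentages):
    """Percentage system: average the per-domain percentage scores."""
    return sum(percentages[d] for d in DOMAINS) / len(DOMAINS)


example_letters = {
    "comprehensiveness": "B",
    "verifiability": "C",
    "writing": "A",
    "structure": "B",
    "npov": "B",
}
example_percent = {
    "comprehensiveness": 85,
    "verifiability": 72,
    "writing": 93,
    "structure": 88,
    "npov": 82,
}

print(total_points(example_letters))        # 3 + 2 + 4 + 3 + 3 = 15 (out of 20)
print(average_percentage(example_percent))  # 420 / 5 = 84.0
```

A total or average hides which domain is weak, so a reviewer would probably still want to report the per-domain scores alongside it.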

A five domain pass-fail outline
This is a revised version of Stevage's outline on the GA talk page. This could be used to refine the evaluation of good article nominees:
 * Comprehensiveness: rated pass-fail (pass is comprehensive)
 * Verifiability: rated pass-fail (pass is good number of inline references)
 * Writing: rated pass-fail (pass is well written -- to be defined)
 * Structure/approachability: rated pass-fail (pass is well structured article useful to a layperson)
 * Neutral Point of View: rated pass-fail (pass is a number of notable views equally represented).
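
A minimal sketch of how pass-fail results might be turned into a class label, using the example threshold Vir floats in the Discussion section (failing two or more criteria means less than B-class, perhaps called C-class). The function name, the `max_failures` parameter and the sample review are hypothetical; the threshold is one suggestion from the discussion, not an adopted rule.

```python
# Hypothetical sketch of the five-domain pass-fail outline.
# The "two or more failures = C-class" threshold is Vir's example from the
# Discussion section; names and the sample review are made up.

DOMAINS = ("comprehensiveness", "verifiability", "writing", "structure", "npov")


def classify(results, max_failures=1):
    """Return (label, failed_domains) for a dict of domain -> passed (bool)."""
    failed = [d for d in DOMAINS if not results[d]]
    label = "B-class" if len(failed) <= max_failures else "C-class"
    return label, failed


review = {
    "comprehensiveness": True,
    "verifiability": False,  # e.g. no inline references
    "writing": True,
    "structure": True,
    "npov": False,
}
print(classify(review))  # ('C-class', ['verifiability', 'npov'])
```

Returning the list of failed domains along with the label tells editors what to fix next, which fits the aim of encouraging improvement rather than only ranking.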

Discussion
Please feel free to add more points.

The following points are quotes selected from the most recent long discussion strings about these issues on the GA talk page, organized topically, in order of comment:


 * TheGrappler: "I'm personally of the opinion that there is roughly the right amount of process at the moment, but less would be better than more. ... Inline refs aren't about to happen soon: a very large proportion of FAs still lack them (Harvard referencing is also acceptable, but even rarer). But as more and more articles switch to inline referencing it is inevitable that those that lack it will be, relatively speaking, "less good", and we'll have to deal with it then. Still, that could be several years away so I don't think we need to worry about it now, unless we want to lay some structural groundwork well in advance."


 * Vir: A one-time review process iteration with higher standards=more information and quality: This is low cost and voluntary; it is high yield. Rather than rely on a slow gradual sea change of the average body of articles to seep forward, I think it is a fine option for an editorial community to take the already growing upper quarter or eighth or even sixteenth of distinct quality articles and recognize that quality. We are talking now of a distinction between often no inline refs -- not a GP criteria -- and inline refs, an actually stated criterion here (though not mandatory). This is a simple distinction and would serve generating more quality articles sooner (and enable yet more production of quality by making detailed reviews easier).


 * Titoxd: "Why not adjust/expand 1.0's grading scheme for this? A-Class articles are the ones that have at least an opportunity of passing FAC, similar to the higher-category GA classifications above"


 * Maurreen: "But if the goal is essentially to be able designate more levels of quality, I support that at least in principle. ... But the name, whatever that level is called, should be clear and straightforward. It shouldn't require any deciphering."


 * Vir: "I think the points by Titoxd (end of comment just above) and Maurreen here are helpful. They both mention a letter-based grading scheme option (A and B, at least) to distinguish quality. That sounds simplest. Label meaning: It doesn't matter what categories are labeled as long as the meaning is clearly associated with the review criteria. One situation in which this is not so clear seems to be the current GA situation. Here, the label "good" refers to some articles that, by existing evaluation criteria, are faulty in one or more ways, hence, are arguably not "good" articles (depending on how "good" is interpreted). This is more of a label issue than a review issue." ... [Further down the discussion string] A key issue: Different levels of quality articles are passing GA review. Based on looking at many GA articles, I'd say that GA approved articles range in quality very approximately (using the scheme above outlined by Stevage) from what might be called C average articles to A average articles. Various ways to address this issue have been raised.


 * TheGrappler: "One problem here is that pages can change rapidly in quality. Attempting to fine-tune the process may be hard - FAs vary substantially in quality, both in terms of comprehensiveness (which may roughly be gauged by length) and quality of referencing (very many FAs lack inline citations, very many are based on only a few references). It is only to be expected that GAs also vary largely in quality - it was meant to be a measure of "minimal good quality" (any article should aspire to be well-written, referenced, and appropriately illustrated) but obviously that would include at the higher end those articles being finetuned for FA status, creating a substantial spread. I am wary of the possibility of labelling a logical consequence of having a minimal quality mark as a "problem". Accurately locating the quality of an article along a scale is substantially subjective (as the experience of 1.0 suggests) and to locate it accurately or "objectively" would take a more complex process than the current one-person-review GA has. So a balance would have to be struck between accuracy and ease of process. A second obstacle is the fact that article quality is very dynamic and every article is a work-in-progress (sometimes very slow progress!) so the metadata that is collected may quickly become outdated. Even the FA metadata (and much effort has been expended to produce it, in terms of long debates on FAC) has a shelf-life of 12-18 months, judging from the the FA Review and FA Removal Candidates. The finer the quality categories by which articles are graded, the faster that metadata may become redundant. A large scale effort to establish accurate, precise article gradings is without doubt doomed to failure. What needs to be compromised, and this needs to be considered with care, is how fine the article gradings should be, how regularly that metadata would be reviewed and how much effort should be expended in checking that the grading is accurate (i.e. 
how rigorous a process is required?). I think everybody here believes that the previous WP system of "cleanup", "stub", "featured", "everything else" leaves too big a gap in the middle. Accusations that GA lets in articles of too large a variety of quality need to be judged against the previous situation i.e. no special status being given to any non-stub, non-cleanup article, whatever its quality. If a further breakdown of GA status is deemed desirable, it would be a good idea to integrate it very closely with WP1.0, which is something that GA hasn't done terribly well itself. The forthcoming "article validation feature" should really be considered as well. I'm not necessarily "for" or "against" anything (although I am highly dubious that article quality can be established "scientifically", which makes me wary of finer gradings, and think that it's probably more important that gradings have consensus established than that they should be precise) but would like to see a more concrete proposal that addresses the article validation feature, process and reliability issues, WP1.0 integration, shelf-life of the metadata and whether or not there is sufficient interest and enthusiasm to maintain it (GA seems to have developed a large core of supporters who have maintained and extended it; a more complex and technical project might not). The current GA system certainly has flaws but it also has big advantages, especially its simplicity (this seems, judging by talk page comments from casual users not deeply involved in the project, to actually be its major selling point). 
It is hard to know how to weigh the advantages and disadvantages except in comparison to specific review systems: for instance, it provides less information about an article than WP:AA does (a place where articles are rated separately on things like writing quality, references, images) but it is simpler and the data lasts longer (many AA reviews will now be completely outdated, and are unlikely to be revised for well over a year). It has less effective quality control that WP:FAC but it is simpler and easier as a system and gives recognition to more than the very best articles. It seems to be slightly less subjective than the WP 1.0 ratings, but provides less specific information about quality (although the simpler information may actually age better). It would be nice to see a couple of specific proposals and what people think about it; I would urge that any changes be made with wide consultation, especially with "casual users" of the GA system, who are effectively its most important users (rather than the core of people who maintain it). TheGrappler 19:01, 20 April 2006 (UTC)"


 * Nifboy: "To me, the point is the amount of meta-Wiki effort that can be exerted towards projects like GA is rather limited. The reason GA actually works right now (unlike WP:AA, which has been stuck on the same topic for over a month now) is that it's a very low-effort, high-throughput process. Trying to produce and maintain a better rating scheme requires much more effort; more than is really "out there" for use. Nifboy 20:32, 20 April 2006 (UTC)"


 * Vir: "Nifboy, Maybe, maybe not. Depends on the skill (and work invested) in refining and implementing a widely useable review process. Because of widely varying quality in GA approvals, the current GA process really seems not to be specified enough. How to better specify the GA process and keep things doable? ... --Vir 21:15, 20 April 2006 (UTC)"


 * Titoxd: "Having a wide swath of quality in GA can be a problem, but it can also be looked on the good side: a GA should meet at least some basic criteria, even if it doesn't meet all the criteria for FA. Basically, GA could be used as a "minimum guaranteed quality" mark, and plugging it into the Assessment scale for WP:1.0, we would have something like this: [See altered 1.0 table above]. The benefit of this is that as a baseline grade, there is a guarantee of at least some quality for the article. Also, if GA-Class articles are being reviewed, and one of them looks more like a B-Class article than a GA, it gives a few criteria for delisting. Any comments? Tito xd (?!? - help us) 23:32, 20 April 2006 (UTC)"


 * Vir: I think the implication of what you are suggesting is option 1, defined just above the table: Tighten application of GA criteria. (This is because some current GA-class articles are B-class or even "C-class".) If so, how can GA editors easily make a refined evaluation between B-class and Good articles? I think it would be easier to just call Good articles "B-class" and perhaps combine or change the criteria a bit. Regardless... A separate question from final ranking levels is evaluation process (or assigning ranking amongst levels): would evaluating amongst classes be based on grading scale criteria? Or, would another simpler evaluation scheme be better, such as a simple pass/fail rating on each of 5 criteria mentioned above (with, for example, failing 2 or more criteria = less than B-class, perhaps called C-class)? For an example of a pass-fail rating system, see: A five part pass-fail outline, which could be applied to differentiate B-class and "C-class" articles. --Vir 00:48, 21 April 2006 (UTC)


 * Titoxd: In reply to the comments below the table: essentially, the difference between B-Class and GAs would be NPOV and referencing, in a pass-or-fail scale, the way I personally use the 1.0 criteria for WikiProject Tropical cyclones. Start-Class is essentially equivalent to C-Class, while Stub-class (D-Class) and F-Class would be unusable for our purposes, so there wouldn't be any need to differentiate among them. I personally think that the 1.0 scale is ok as-is, but there have been several inquiries as to where GAs fall, and due to the uncertainty you mention, they can only be given the least common denominator of GA-Class, and then iff they meet heavier requirements, closer to WP:WIAFA, they can be called A-Class. Confirmation of A-Class would be passing at WP:FAC. Tito xd (?!? - help us) 05:45, 21 April 2006 (UTC)


 * Walkerma: I'd say that I consider A-Class to be ready to submit for peer review before FAC - since our assessments are generally not expert assessments, the peer review/FAC processes would probably find some minor gaps & problems before it became FA. Walkerma 06:23, 21 April 2006 (UTC)


 * Nifboy: To me, the line between FAC and peer review is increasingly blurry; FAC just has votes attached to the comments and actually tell you when they like the article. Nifboy 06:43, 21 April 2006 (UTC)

Discussion
Please feel free to add more points.

The following points are quotes from the most recent discussions about these issues on the GA talk page, organized topically:


 * TheGrappler: "I think GA is basically coming up to a transition point - because it is still only proposed policy it isn't on the "article roadmap" yet. Hopefully, one day it will go GA (optional) --> Peer review (currently overloaded, overworked, and with lots of articles that make simple slips like not being referenced, failing WP:FICT, having wrong image tags etc that actually GA would be better at weeding out) --> FAC --> FA. However, at the moment, the bulk of the GA work is going over and reviewing the existing corpus of WP - most GA nominations are by GA project people, because (being unofficial and relatively new) we're not on most people's horizon yet. So, since our current task is more "trying to establish what good stuff is floating about already" than "providing a review process that editors can look up to/try to meet the standards for" GA probably doesn't have a great effect on pushing up standards. When we switch more completely into the second role, GA may well have a crucial role in driving up standards, particularly the standards of "bulk" articles (ones that are not likely to get up to FA, either due to choice of topic or due to lack of editors' time). I actually agree that we ought to switch over to inline references as soon as it seems feasible. But I also feel this is too soon - it's still new, too few articles currently have them, and moreover, having any references at all is still dangerously rare. One of the big problems with waiting too long on this is that, due to editor turnover, some things may become impossible to cite (new editors who have "adopted" an article left by a departee may be using different references to the original, and won't be able to check which bits of the text they adopted came from where) and another is that we'd have to do an absolutely enormous purge of the current GA list... TheGrappler 20:33, 14 April 2006 (UTC)"


 * Vir: Grappler, what you say about timing sounds reasonable. What you say about article flow sounds reasonable. Given the early stage in terms of review of most GA articles, “Good” seems to be a misnomer or misleading in several ways. Not all of these articles are "good" in quality, nor are they "good" in terms of being far along in their process to becoming finished articles. The four steps of the review process might be well to define on the top of the GA page (understanding that while some GA articles may get good reviews early, many may not). Whenever a switch to requiring inline referencing is made, I think it is ok for early good articles to be grandfathered for a longish bit of time (a year or two). This may be a reason to consider renaming this article collection to “(full) Draft not reviewed” or whatever gets at the meaning. --Vir 21:20, 14 April 2006 (UTC)


 * TheGrappler: "I actually agree that "good article" is a misnomer. I reckon only our better FAs are "good" when compared to commercial encyclopedic content, for instance. However, I can't think of anything better to call them! I noticed that the equivalent of "GA" in other languages is often rendered "articles worth reading"/"worth a look" rather than "excellent articles" (interestingly - if you look at the vote about what to call FAs when the original "Brilliant Prose" scheme was changed, "excellent articles" is what FAs almost got called). However, I can't think of a snappy English phrase for "worth-a-read articles" that doesn't sound (frankly) demeaning, so "good" may be the best we can do. ... Trying to split hairs between "good but actually not that good", "good good" and "even better good but not quite featured" doesn't sound like a great idea (although I must say it is pretty well thought-out and it would solve the "grandfathering" problem - I'd recommend putting something like that in place for a WikiProject with a known number of articles - perhaps several hundred - to be covered, and where the project wants to monitor all their progressions; extending it up to cover all of WP, I doubt it would scale well either in terms of number of articles or, perhaps more critically, number of editors). Interesting idea, but I think it would probably be better to focus on something more specific and simple, like trying to bring up all "good" articles to a state where they have inline references, than to quibble too much about what type of good article it is. The reviewing and nominating system is getting a little flooded as it is! TheGrappler 01:38, 15 April 2006 (UTC)"


 * Vir: "I'd like to emphasize the reasons that I suggested the scheme: The way you characterize this, it sounds like this might not be a workable idea (though I'm not sure). The point of suggesting subdividing "good" was to distinguish between articles with inline refs and a peer review, and those without both, however that is done. The point of emphasizing this division was two-fold: First, to invite and promote more effectively for all good article production to move to including inline references and get peer reviews (or better), by explicitly stating the need for such as part of developing articles. Stating a standard and summarizing the next step on article tags would make a difference. Reminders and stages of labeling, based on needed processes, can make a difference. Second, as you were discussing above, GA might move to the standard that inline refs are required. I was looking for a way possibly to deal with the large mass of articles that are going to be grandfathered in but not meet future "good" standards. A two level rating would seem to do that. Alternately, one could simply note somewhere that review criteria for "good" articles were upgraded on a certain date. ... --Vir 03:21, 15 April 2006 (UTC)"


 * Nifboy: "I don't think adding more steps to the process will get GAs written any faster. If anything you'll have more people go "fuck the process" and do things their way; that's why WP:FF has ten GAs and zero featured articles (one featured list). Nifboy 04:04, 15 April 2006 (UTC)"


 * TheGrappler: "Like I said, I really like the way your proposal deals with grandfathering. If GA moves to requiring inline refs, then we might need something a bit like this, although I think just putting a "*" on the list by articles that lack inline refs is all the complication we need rather than a new class and review system. Otherwise, like Nifboy says, I doubt it will achieve the primary aim of the project, which is to encourage people to produce better quality articles. If the system is hard to get to grips with then people just won't follow it - and people can find relatively simple things hard to get to grips with! TheGrappler 04:48, 15 April 2006 (UTC)"


 * Vir: "Nifboy and Grappler, I just don't know if an extra GA step would be too complex. Going at this from another direction: If not a very common process step (that is easier by virtue of being part of an existing process, like GA, rather than an extra work group or sphere of work groups), what is the best way you think for the most WP articles (the widest collection of WP articles) to get inline references and reviews by knowledgeable-laypersons/professionals/specialists, sooner rather than later? Is high quality impossible here in 5 years for 10,000s of articles? Is really high quality decades away? (I hope not, since this could be the planet's general free knowledge base.) Considering various options to meet this goal is worth the effort. --Vir 05:49, 15 April 2006 (UTC)"


 * Llywrch: "With all due respect, aren't we getting "process-bound"? I haven't been as active in this project in the last couple of months as I was before, but in the description of this proposed standard, it still mentions that this is intended to be as rigid or formal as the FA process. And yet I see that this project is creeping slowly towards being a "Feature article" light -- which many of its critics have accused this project of being. ... Is the need for inline references really necessary? A good article ought to be clearly acknowledgable as "good", rather than conform to some set of rules which could always be gamed. -- llywrch 01:32, 17 April 2006 (UTC)"


 * TheGrappler: Multi-staging looks excessively complicated and probably doesn't have a WP:SNOWBALL's chance of getting consensus. When "brilliant prose" changed to "featured articles" there was simply a mass-re-review of potentially dodgy articles; something similar could happen here. It's not a terrible thing that standards here are getting higher, and more exact criteria help for consistency (FA criteria could probably be reduced to a 15 word sentence and consistency would still be achieved because of the nature of FAC; with GA something more akin to a checklist is probably needed, although it is very true that it risks gameability). WP:WIAGA has grown recently and could probably do with a trimming to make it simpler. WP:GA isn't really here to enforce quality, it's here to identify and encourage it - and to do that we need to avoid complicated process and develop useful criteria (easy to understand, relatively easy to check for either a writer or reviewer, and which actually describe an article of quality if not distinction). TheGrappler 04:48, 17 April 2006 (UTC)


 * Vir: A helpful repetition of the same process is not complex. [A proposed] second iteration of existing WP:WIAGA criteria, applied with more rigor, consistently... isn't irrelevant or extra. It allows [growing] the largest possible pool of good articles (both "ok" ones and really "good" ones). It simply repeats a process that exists here, one time. For those who care to do it, fine. Those who don't want to wouldn't have to. ... Improvement through iteration of an existing standard at higher level is not a problem -- it is a polishing system, helpful. ...


 * Maurreen: "Instead of subdividing GA, it would probably be simpler to set up some other page and process to handle the additional level."


 * Stevage: "Once a decent review process is set up that returns valuable information, then the process of rating articles becomes trivial. The problem is that the current review process for good articles is set up to categorise each facet of the article into either "good" or "not good". Imagine this for a review process: Given such a review, it is easy to say that a featured article must be minimum B in every area with at least two A's, or that a good article must be minimum C in every area. You could even say that articles with any Fs are excluded from dumps for mirrors, define additional levels of "acceptable" articles or whatever. I do feel that setting up parallel, but not identical, processes for featured articles and good articles involving similar, but not identical, reviewing processes is redundant in a bad way. Stevage 11:18, 20 April 2006 (UTC)


 * Vir: "Duplicate processes: The English FAC and GAN evaluation systems are quite different. FAC involves many votes and much discussion per article evaluation and GAN involves only 2 evaluations (the nomination and the approval). I brought up the German/Swedish FAC case, which judges good and featured status at same time (presumably), as a rough parallel to the case of judging "Good" and "OK" articles (or A or B average scores and B or C average scores, excluding any with Fs) in the same evaluation system."


 * Davodd: "I oppose making this project have any more steps/hurdles/hoops/bureaucracy than it currently has. It is simple, elegant and doesn't need fixed, per se, as the first step an article takes on its move to be WP:FA and WP ver 1.0.
 * That said, I think the leap from stub to WP:GA is not comparable to that of WP:GA to WP:FA. We need ANOTHER project that focuses on articles in the WP:GA->WP:FA space. Something akin to Good (have) -> Better/Excellent (need) -> Peer Review (have) -> Featured (have) -> WP Ver. 1.0. We need the WP:Better or WP:Excellent project; the closest we have to that is the WP:Peer review, which is not really right either. - Davodd 21:27, 20 April 2006 (UTC)"


 * Nifboy: "I highly doubt another reviewing/analyzing/rewarding project would have the support necessary to keep it running. It just adds more steps/hurdles/hoops/bureaucracy to the process of assessing articles' quality, even if each step itself doesn't have any added complexity. Secondly, I think the space between GA and FA doesn't allow for an intermediate "step": The random variation in articles' quality is too big. Nifboy 21:40, 20 April 2006 (UTC)"


 * Davodd: "It makes no sense to me (aside from naming scheme) as to what the effective difference is between splitting WP:GA into two (or more) separate projects or into two (or more) separate grading schemes. Aside from adding the bureaucracy that this project supposedly is trying to avoid, adding more steps/hurdles/complicated rules is - basically - shoehorning a second or third project into the current one under the guise of not creating a second project. If your supposition is true that there will not be enough interest to support such an endeavor, in essence by adding a second (or third) sub project to WP:GA we may be threatening to break a working project simply because an extra step is needed after this project. Feels like rule creep to me. - Davodd 22:01, 20 April 2006 (UTC)"


 * Maurreen: Current summary, as I see it:
 * I don't see any consensus that GA has a problem or any need to change, and I doubt such a consensus would develop anytime soon.
 * If it is important enough to anyone to designate an additional level, it would probably be simplest all around for that to be separate from GA. (I had thought myself of doing this before, but it isn't important enough to me, at least for now.) Maurreen 04:30, 21 April 2006 (UTC)


 * Vir: You're probably right about consensus not developing soon about changing the GA process. I think it is worth keeping track of options for dealing with an underlying problem in GA -- the wide variability in quality of articles (some of which are barely "ok"). A separate project might be the thing.
 * One possibly agreeable change: this project and the English WP might be better served by changing the project's name to something like "Adequate" or "Basic" articles, or other options which you mentioned previously. --Vir 05:22, 21 April 2006 (UTC)