Wikipedia talk:Ambassadors/Research/Article quality

DYK
I like the idea of offering a scale by which students can assess whether the article might be a good candidate for GA/FA. Perhaps you could establish minimums for good DYK candidates as well? Rob SchnautZ (WMF) (talk • contribs) 18:24, 14 May 2012 (UTC)

What is the point of this?
Who uses this data and how?  Blue Rasberry   (talk)   18:01, 17 May 2012 (UTC)
 * Great question. I'll definitely add some more introductory context to this as soon as I have the chance to draft it! -- LiAnna Davis (WMF) (talk) 23:04, 17 May 2012 (UTC)
 * The main use, in my eyes, is to be able to provide a numerical assessment of the impact of the program. There is some controversy about the impact of the education programs and it's hard to assess the value of student contributions from anecdotal evidence; pages like this will help the community determine whether students are a net benefit to the encyclopedia. Mike Christie (talk - contribs -  library) 10:19, 18 May 2012 (UTC)
 * Thanks, both of you. LiAnna, the text you added says "we" are doing things. Who is "we"? Can you add a section listing the coordinators of this project? Thanks.  Blue Rasberry    (talk)   12:41, 19 May 2012 (UTC)
 * I'm not sure quite what such a section would say at the moment, since the coordination is in flux. Coordination of education program work predates WMF involvement, but the big push, as I understand it, came with the public policy initiative, which was grant-funded and so had a large WMF component.  For subsequent Education Program work there has been a mix of WMF and community leadership; there is or was a steering committee for the ambassador program, for example.  The WMF has decided it can't continue to play such a large role in running a program like this, and there is a working group that will meet after Wikimania to come up with plans for an organizational structure for the EP.  It can't be coordinated solely off-wiki because the impact on the encyclopedia is going to be enormous; it can't be resourced solely on-wiki because there's a lot of off-wiki activity needed -- most of the involved academics are not Wikipedians, for example.


 * So one answer to your question would be to say: LiAnna and I decided this page was needed, so she created it. In the case of the PPI there was official status for the metrics -- they were required (if I recall correctly) by the terms of the grant.  Here, LiAnna (after discussion with me and others) felt like this was a necessary part of measuring the EP's impact, so she built it.  I see this as part of the transition from off-wiki coordination of the EP to more traditional WP processes; you could regard this as a nascent Wikiproject, for example.


 * Does that answer your question? Mike Christie (talk - contribs - library) 23:19, 19 May 2012 (UTC)
 * It is enough for this page. I want to schedule another appointment to talk to you. I just emailed you about this.  Blue Rasberry    (talk)   13:45, 20 May 2012 (UTC)

participation
Does one have to be an "ambassador" to review here? Riggr Mortis (talk) 22:54, 17 May 2012 (UTC)
 * Not at all; just someone who understands Wikipedia article quality enough to rate articles! As mentioned above, I'll get more context for this up soon. -- LiAnna Davis (WMF) (talk) 23:04, 17 May 2012 (UTC)
 * Thanks for that quick reply. A second question: the reviews are based on a certain version of the article. Does that mean that it would not be a "conflict of interest", for lack of a better term, for me to make changes to the article (formatting, editing, etc.) in the course of the review? I'm not one for idly reviewing in informal processes such as this. Riggr Mortis (talk) 23:16, 17 May 2012 (UTC)
 * Not a conflict of interest at all — in fact, the current versions of some articles are very different from the versions I've pulled for review. The most important thing is to review the version I linked to so everyone is reviewing the same version of the article; you're welcome and encouraged to make any additional edits to the current version of articles as you see fit.-- LiAnna Davis (WMF) (talk) 23:32, 17 May 2012 (UTC)
 * Hi, Riggr; it would be great to have you review some of these -- I think the Education Program is going to have a big impact on the encyclopedia and understanding that impact by measuring the improvement (or otherwise) in article quality is a key part of the program. As LiAnna said, there's no reason not to improve every article as you go.  I don't do that myself because I'm keen to get everything reviewed, but if people want to approach it that way it would be a big help. Mike Christie (talk - contribs -  library) 09:53, 18 May 2012 (UTC)

Aircorn's changes to the illustration section
I'm not sure I agree with this change. For example, if an article has no images of a living person, and there are none on commons and no public domain images available, then I wouldn't score an article 2/2 on illustrations. WP:WIAFA just says "it has images and other media when appropriate", which I think is the most succinct way to say it. If a user can't find an image, I don't think that justifies scoring the article more highly. I'd like to revert the change -- any objections? Mike Christie (talk - contribs - library) 09:58, 18 May 2012 (UTC)
 * I might have read it wrong, but it looked like an article with no images would score a zero. This would rule it out of the potential GA ranking. If you can reword it so it does not look like images are required for GAs, that would be better in my opinion. Why not have an N/A score? AIR corn  (talk) 10:06, 18 May 2012 (UTC)
 * Good point about the GA; hadn't thought of that. I think that list, which correlates these scores to the existing A/B/C and FA/GA ranks, is not really very scientific -- if I remember rightly it was put together when the first version of this assessment was built for WP:PPI and has never really been validated.  I'll change that line to note that images aren't required for GA, and revert your change, and I think that will cover it. Mike Christie (talk - contribs -  library) 10:13, 18 May 2012 (UTC)

Data Analyst Introduction
My name is Luis, and I'll be conducting the data analysis for this project. I've met some of the reviewers and am excited to be a part of this. I have been working with the data for just a bit and have an analysis plan in mind, but am always looking for new and different perspectives. We want this process to be as transparent and thorough as possible; to that end, we will be publishing a desensitized version of the data as soon as it is complete and ready. If you are curious to play with the data and have some experience in data analysis, please feel free to. I am very much looking forward to discussing and working on this project with you all. Thank you.

Luis Campos Lfcampos (talk) 23:50, 27 June 2012 (UTC)


 * Hi Luis, is the desensitized data available? John Vandenberg (chat) 01:03, 3 October 2012 (UTC)
 * I've added the raw data to the results talk page here. The related research that includes sensitive data (from survey research) has not been completed yet.--Sage Ross (WMF) (talk) 13:52, 3 October 2012 (UTC)
 * Thanks! John Vandenberg (chat) 01:11, 4 October 2012 (UTC)

Control group
Has this article quality rubric been used against a control group? If not, one topic area which would be good to push through this quality rubric is the pages created or modified during the 2012 Olympics relevant to that topic, which brings in an awful lot of new articles and new editors in a short period of time. John Vandenberg (chat) 01:10, 3 October 2012 (UTC)
 * I disagree. The work of the education programme participants is overwhelmingly in areas that don't get much love; this is deliberate, and a Good Thing. Olympics articles get far more attention from both new and experienced editors during the period where the Olympics are headline news: comparing the two is not comparing apples to apples in terms of who is participating, how many people are participating, and the relative experience at editing of the participants. Ironholds (talk) 08:12, 3 October 2012 (UTC)
 * While I agree with Ironholds that it's not apples-to-apples enough to be a real control group to test against, I think using the same rubric on (random selections from) other discrete sets of articles would actually be really cool. It's nice to be able to say, "during the Olympics, new articles in WikiProject Olympics had this distribution of quality, and pre-existing articles about competing athletes improved by that amount." If we had a bunch of similar data sets for all sorts of things, we'd start to get a better picture of how all kinds of things affect article quality: article drives, collaborations of the [time], the WikiCup, other forms of outreach, and so on. (I think the way we've done the analysis so far, with article quality as simply a sum of the different sub-ratings, could be improved upon. Obviously, points in formatting and illustrations and readability aren't worth much if it's an uncited, non-comprehensive POV screed. So some formula that takes into account the relationship between those scores and the overall quality of an article would be good. But that could be done retroactively, since we've got all the sub-scores.)
 * John, if you're interested in organizing an assessment of (say) 2012 Olympics articles, go for it! I'm in the process of learning R, and would be happy to do what I can to help with analysis.--Sage Ross (WMF) (talk) 14:12, 3 October 2012 (UTC)
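The weighting idea Sage describes above could be sketched as follows. This is only a hypothetical illustration, not the project's actual formula: the split into "core" sub-scores (sourcing, comprehensiveness, neutrality) and "presentation" sub-scores (formatting, illustrations, readability), and the 0-2 scale per item, are assumptions for the sake of the example.

```python
def weighted_quality(core_scores, presentation_scores, max_per_item=2):
    """Combine rubric sub-ratings so that presentation points only
    count in proportion to how strong the core content is.

    Hypothetical sketch: core_scores and presentation_scores are
    lists of per-criterion ratings on a 0..max_per_item scale."""
    core = sum(core_scores)
    core_max = max_per_item * len(core_scores)
    # Fraction of core quality achieved, in [0, 1].
    core_fraction = core / core_max if core_max else 0.0
    # Presentation points are discounted by the core fraction,
    # so polish counts for little when the substance is weak.
    presentation = sum(presentation_scores) * core_fraction
    return core + presentation

# An uncited, non-comprehensive POV screed with perfect formatting
# still scores low overall:
screed = weighted_quality([0, 1, 0], [2, 2, 2])  # 1 + 6*(1/6) = 2.0
solid = weighted_quality([2, 2, 2], [2, 2, 2])   # 6 + 6*1   = 12.0
```

Because the raw sub-scores were all recorded, a reweighting like this could indeed be applied retroactively to the existing data.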


 * I agree that the Olympics isn't a good comparison for the Education program for the reason you give (its high profile); however I believe, but don't know for sure, that the Olympics saw involvement by a lot of newbies in a short period of time, in which case the Olympics would be a useful way to road-test the rubric on a completely different type of articles and editing patterns.
 * For an under-loved topic which might be more comparable to the Education program, the HOPAU project and the Paralympics would be a better comparison; however, it is at a much smaller scale, and that would be tooting the horn of WMAU. ;-)
 * I am all ears for a better suggestion for a set of articles to push through this rubric. Doing an assessment of another cluster will be expensive in terms of volunteer time, so we want to find the best cluster to work on. iirc, there was recently an editor drive on WP:VITAL articles, led by Casliber? That might be good. I think the WikiCup would be too far at the other end of the spectrum and completely useless for comparison to any outreach effort or Education program -- it would consist of very high quality contributions by long-standing members of the community. Another approach would be to pick a month and assess all new articles created by newbies and old articles that increased in size by (say) 50%. Another idea is to make use of the article feedback data and select all articles which were given negative feedback in January 2012 and see whether they have improved. John Vandenberg (chat) 01:37, 4 October 2012 (UTC)

Where did this project go to from here?
I came to this page as I was looking for tools to calculate readability scores of Wikipedia articles automatically. Does anyone know if such tools exist inside Wikipedia? I found the proposed metric to assess the quality of articles really interesting. I take it this project from 2012 is now inactive. Are there other, more recent projects that looked into this further? Does anyone know, e.g. John Vandenberg, User:Bluerasberry? EMsmile (talk) 08:12, 14 August 2017 (UTC)
 * I do not think that the effort described here progressed beyond 2012.
 * For "Why Medical Schools Should Embrace Wikipedia" the team collaborated with contentrules.com, a commercial company using their own proprietary software to calculate readability scores for student contributions to Wikipedia. They provided free use of their software for this project, but the software is not freely available outside some negotiated collaboration.
 * I came to believe that calculating readability scores was a technical field with lots of variation in testing methods. I never saw a quick and easy answer for evaluating Wikipedia articles in this way.  Blue Rasberry   (talk)  11:35, 14 August 2017 (UTC)
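As an illustration of why readability scoring has "lots of variation in testing methods": even the classic Flesch reading-ease formula (206.835 − 1.015 × words/sentence − 84.6 × syllables/word) depends on how you count syllables, and tools differ mainly in that step. Here is a minimal sketch; the regex-based syllable counter is a crude heuristic of my own, not any tool's actual method, so its numbers are rough estimates.

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic. Real readability tools use
    pronunciation dictionaries, which is one source of the
    variation between testing methods."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a typically-silent trailing "e" (but keep "-le" endings).
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Standard Flesch reading-ease score: higher = easier to read."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)
```

Short, plain sentences score high (around 100+), while dense jargon scores far lower, but two implementations can disagree noticeably on the same article because of the syllable-counting step alone.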
 * Thanks, User:Bluerasberry. I find the metrics that they developed in this project pretty good (their overall score doesn't just look at readability but also other aspects, like sourcing, illustrations etc.). Do you know if other WikiProjects have taken this any further or developed other numerical judging criteria? I have in the past assigned the C, B, start, stub levels pretty much on gut feeling, but are the levels A, GA and FA based on a scoring system like the one that was proposed in this project? I mean this part here. I've seen the GA and FA criteria, but they are not converted into numbers and are just qualitative statements. EMsmile (talk) 13:16, 14 August 2017 (UTC)
 * Yes at mw:ORES they have magic machine learning software that somehow makes very good judgements about lots of things, including article quality ratings and descriptions. There are some things public about this and some things private. Nothing is widely used by anyone. It takes an information scientist to set up, but it has lists of article ratings somewhere. There might be 5 people who understand this so it is not accessible. However, I think that at the current rate of development it might become the standard for Wikipedia in 1-2 years and perhaps all articles will have deep reports available about them in the future.  Blue Rasberry   (talk)  15:53, 14 August 2017 (UTC)
 * thanks for the information, I will keep my eyes open to find out more. EMsmile (talk) 20:34, 27 August 2017 (UTC)