Wikipedia:Wikipedia Signpost/2012-09-10/News and notes

Readability of Simple English and English Wikipedias called into question
In its September issue, the peer-reviewed journal First Monday published "The readability of Wikipedia", a study reporting that the English Wikipedia is struggling to meet Flesch reading ease test criteria, while the Simple English Wikipedia has "lost its focus".

The statistical method developed by Flesch (1948) focuses on two core components of the concept of readability: word length and sentence length. The test is widely used in the US, with areas of application ranging from Pentagon files to life insurance policies. The concept has been adapted for other languages, including a German version by Toni Amstad (1978). The Flesch test uses the following formula to indicate the readability of a given text:



$$ 206.835 - 1.015 \left( \frac{\mbox{total words}}{\mbox{total sentences}} \right) - 84.6 \left( \frac{\mbox{total syllables}}{\mbox{total words}} \right) $$

Higher scores indicate material that is easier to read, and lower scores that it is more difficult. While in theory the results can vary widely due to the artificial construction of very complex or simple sentences, in practice natural English typically results in a score between 0 and 100, which can be interpreted as shown in the table.
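As a rough illustration of how the formula above behaves, here is a minimal Python sketch. It is not the tool used in the paper: the sentence splitter and the vowel-group syllable counter are simplifying assumptions, and the function names (`count_syllables`, `flesch_reading_ease`, `describe`) are hypothetical. The labels in `BANDS` follow the conventional interpretation bands of Flesch's scale, which match the labels quoted in this report.

```python
import re

# Conventional Flesch bands: (lower bound, label). Scores below 30 fall through.
BANDS = [
    (90, "very easy"),
    (80, "easy"),
    (70, "fairly easy"),
    (60, "standard"),
    (50, "fairly difficult"),
    (30, "difficult"),
]


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a heuristic assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Apply the Flesch reading ease formula to a piece of English text."""
    # Naive tokenization: sentences end at ., ! or ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def describe(score: float) -> str:
    """Map a score to its conventional Flesch label."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "very difficult"


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    score = flesch_reading_ease(sample)
    print(round(score, 3), describe(score))
```

Short sentences built from one-syllable words can score above 100, which is why the article notes that the 0 to 100 range holds only for natural English in practice.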

The authors assume that the English Wikipedia should score around 60–70 on average ("standard"), and Simple English, which explicitly aims at audiences with less advanced literacy skills, around 80 ("easy"). An older study, Besten and Dalle (2008), had found on the basis of the same test method that the overall readability of Simple had decreased from around 80 in 2003 to just above 70 in 2006.

The 2012 study examined two 2010 database dumps, sampled from the English Wikipedia and Simple. The researchers filtered out lists, redirects, and disambiguation pages, and removed components such as tables, headings, and images. After this step, the study covered 88% of the English and 85% of Simple Wikipedia's articles in the database dump. In a second step, the methodology excluded short articles with fewer than six sentences, because readability scores for such short texts are likely to fluctuate widely.
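The second filtering step, dropping articles with fewer than six sentences, can be sketched as follows. This is an illustrative assumption about how such a filter might look, not the authors' code: the naive sentence splitter and the names `count_sentences`, `is_long_enough`, and `filter_articles` are hypothetical.

```python
import re

MIN_SENTENCES = 6  # threshold described in the study's methodology


def count_sentences(text: str) -> int:
    """Count sentences with a naive split on ., ! and ? (an assumption)."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def is_long_enough(text: str) -> bool:
    """Keep only texts with at least MIN_SENTENCES sentences."""
    return count_sentences(text) >= MIN_SENTENCES


def filter_articles(articles: dict[str, str]) -> dict[str, str]:
    """Return the subset of {title: body} that passes the length filter."""
    return {title: body for title, body in articles.items() if is_long_enough(body)}


if __name__ == "__main__":
    sample = {
        "Stub": "One. Two. Three.",
        "Longer": "One. Two. Three. Four. Five. Six.",
    }
    print(sorted(filter_articles(sample)))
```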

The analysis found that English Wikipedia articles scored 51 on average ("fairly difficult"), with more than 70% of all articles scoring below the set goal of 60 ("standard"). Simple scored 62 on average ("standard"), with 95% of all entries below the set goal of 80 ("easy"). In addition, around 9,600 articles could be compared directly between the two Wikipedia versions; Simple scored 61 on these, while the corresponding English Wikipedia articles scored 49.

The paper argues that Simple, created to address the English Wikipedia's readability problems for some audiences, has run into difficulties. The average reading ease of Simple, while still above that of the English Wikipedia, has declined to 62, compared with the figures reported by Besten and Dalle in 2008 (2003: 80; 2006: just above 70). Based on the outlined methodology, the authors conclude that Simple has "lost its focus … this version now seems suitable for the average reader, instead of aiming at those with limited language abilities."

The English Wikipedia findings indicate that the results of another study in 2010, focusing on the readability of English Wikipedia entries on cancer (Signpost coverage), cannot be fully generalized. The paper in 2010 found that articles in the targeted topic area scored about 30 on average.

However, both studies show that the English Wikipedia potentially excludes major segments of the English-speaking world, including (for example) large parts of the US public. According to a major study on literacy in the US in 2002, 21–23% (extrapolated: more than 40 million people) "demonstrated skills in the lowest level of prose, document, and quantitative proficiencies".

The authors of the study on readability of Wikipedia have set up a demo site where users can calculate the readability of English and Simple English Wikipedia pages based on the automatic measure they deployed in the paper.

Brief notes

 * WMF RfC: The RfC on whether to establish a legal fees assistance program, covering volunteer role-specific risks that go beyond the contributor defense policy already in place, is in full swing.
 * Audit committee call for volunteers: The WMF audit committee, overseeing financial and audit issues on behalf of the WMF board, has published a call for volunteers to serve on the 2012–13 committee.
 * English Wikipedia report
   * Arbitration Committee: the committee took several actions this week, including a desysop and passing two motions regarding prior cases (Falun Gong 2, The Troubles). Four clarification and amendment requests remain open.
   * Main-page redesign competition: 23 proposals have been lodged, and discussion about the issue continues on the competition talk page. Editors are welcome to submit their own proposals until 30 September.
   * Pending Changes update: The Request for Comment on Pending Changes Level Two continues. Currently, there is an 11–2 vote on the following proposition: "... if there's no clear consensus for one particular point of view after a week (which seems likely at the moment), is there any objection to trying to come up with a third option in a 'committee' that anyone could join?" That committee page is at WP:PC2012/Committee.