Wikipedia:Wikipedia Signpost/2012-09-24/Recent research

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

"The rise and decline" of the English Wikipedia
A paper to appear in a special issue of American Behavioral Scientist (summarized in the research index) sheds new light on the English Wikipedia's declining editor growth and retention trends. The paper describes how "several changes that the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have led to a more restrictive environment for newcomers". The number of active Wikipedia editors has been declining since 2007, and research examining data up to September 2009 has shown that the root of the problem is the declining retention of new editors. The authors show this decline is mainly due to a decline among desirable, good-faith newcomers, and point to three factors contributing to the increasingly "restrictive environment" they face.



First, Wikipedia is increasingly likely to reject desirable newcomers' contributions, whether through reverts or deletions. Second, it is increasingly likely to greet them with impersonal messages; the authors cite a study showing that by mid-2008 over half of new users received their first message in a depersonalized format, usually as a warning from a bot or from an editor using a semi-automated tool. They show a correlation between the growing use of various depersonalized tools for dealing with newcomers and the dropping retention of newcomers. The authors speculate that unwanted but good-faith contributions were likely handled differently in the early years of the project – unwanted changes were fixed and non-notable articles were merged. Startlingly, the authors find that a significant number of first-time editors who inquire about their reverted edit on the article's talk page are simply ignored by the Wikipedians who reverted them. Specifically, editors who use vandal-fighting tools like Huggle or Twinkle are increasingly less likely to follow the Bold, revert, discuss cycle and respond to discussions about their reverts.

As a third factor, the authors note that the majority of Wikipedia rules were created before 2007 and have not changed much since; new editors thus face an environment in which they have little influence over the rules that govern their behavior and, more importantly, over how others should behave toward them. The authors note that this violates Ostrom's third principle for stable local common-pool resource management, by effectively excluding a group that is very vulnerable to certain rules from being able to influence them.

The authors recognize that automated tools and extensive rules are needed to deal with vandalism and manage a complex project, but they caution that the customs and procedures that have evolved are not sustainable in the long term. They suggest Wikipedia editors could copy the strategy of distributed, automated tools that have proven so effective at dealing with vandalism (e.g. Huggle and User:ClueBot NG) to build tools that aid in identifying and supporting desirable newcomers (a task at which Wikipedia increasingly fails). Further, they recommend that newcomers be given a voice, even if indirectly via mentors, in how rules are created and applied.

Overall, the authors present a series of very compelling arguments, and the only complaint this reviewer has is that (even though three of the four authors were among the Wikimedia Foundation's visiting researchers for the Summer of Research 2011) they do not discuss the fact that the Foundation and the wider community have recognized similar issues and have engaged in debates, studies and pilot programs aimed at remedying them (see for example the WMF Editor Trends Study).

Literature reviews of Wikipedia's inputs, processes, and outputs
Nicolas Jullien's "What we know about Wikipedia. A review of the literature analyzing the project(s)" is an attempt at a "comprehensive" literature review of academic research on Wikipedia. Jullien works to distinguish his literature review from previous attempts like those of Okoli and collaborators (cf. earlier coverage: "A systematic review of the Wikipedia literature") and of Park, which tend to split the literature into three main themes: (1) editors' motivations to contribute and the relationship between motivation and contribution quality, (2) editorial processes and organization and their relationship to quality, and (3) the quality and reliability of the production.

Jullien builds on this basic framework by Carillo and Okoli, but distinguishes his work from theirs in several ways. First, Jullien holds that previous work has focused too little on outputs, which his analysis emphasizes more. Second, and crucially, Jullien's review is not limited to material published in journals and, as a result, is more representative of fields like computer science, HCI, and CSCW, which publish many of their most influential articles in conference proceedings. Jullien does not consider articles on how Wikipedia is used, questions of tools and their improvement, or studies that only use Wikipedia as a database (e.g., to test an algorithm). Other than this, the study is not limited to any particular field. It covers articles published in English, French and Spanish before December 2011, mostly based on searches in Web of Science and Scopus (sharing the search query used in the latter). The review is structured around inputs, processes, and outputs.

In terms of inputs, Jullien considers broad cultural factors in the wider environment and the question of why people choose to participate in or join Wikipedia. In terms of process, he considers the activities and roles of contributors; the social (e.g., network) structure of both the projects and the individuals who participate; the role of teams and the organization of people within them; the processes around editing, creation, deletion, and promotion of articles, with a particular focus on conflict; and questions of management and leadership. In terms of outputs, the paper divides publications into studies of process, Wikipedia user experience, the external evaluation of Wikipedia articles, and questions of Wikipedia's coverage.

A second recent preprint by Taha Yasseri and János Kertész likewise gives an overview of vast areas of recent research about Wikipedia. Subtitled "Sociophysical studies of Wikipedia" and citing 114 references, it compares some of the authors' own results on e.g. editing patterns (covered in several past issues of this research report, e.g.: "Dynamics of edit wars") with existing literature. The review focuses on quantitative data-driven analyses of Wikipedia production, reproduces and reports a series of previous analyses, and extends some of the earlier findings.

After a detailed description of how Wikipedia works, the authors walk through a series of types of quantitative analyses of patterns of editing to Wikipedia. They use "blocking" of edits to characterize good and "bad" editors and describe different editing patterns between these groups. The authors show that editors, in general, tend to edit in a "bursty" pattern with long periods of breaks and that editing tends to follow daily and weekly patterns that vary by culture. They also walk through several approaches for classifying edits by type, and discuss the characterization of linguistic features with an emphasis on readability.

Much of their article is focused on the issue of conflicts and edit warring. The authors pay particular attention both to the identification of conflicts and of controversial articles and topics and to characterizing the nature of edit warring itself. The paper ends with the description of an agent-based model of edit warring and conflict.

WikiSym 2012: overview report
The International Symposium on Wikis and Open Collaboration – "WikiSym 2012" – was held August 27–29 in Linz, Austria. The three-day conference featured research papers, posters and demonstrations, and open space discussion sessions. About 80 researchers and wiki experts from around the world attended.

WikiSym is an academic conference, now in its eighth year, that seeks to highlight research on wikis and open collaboration systems. This year’s WikiSym had a strong focus on Wikipedia research, with studies that ranged from analyzing breaking news articles on Wikipedia to looking at the behavior of Wikipedia editors and how long they stay active. In all, 17 papers focused on Wikipedia or MediaWiki, and the two keynotes also focused on Wikipedia research.

The first keynote session was given by Jimmy Wales, who discussed challenges for Wikipedia and potential research questions that matter to the Wikimedia community; Wales focused particularly on questions around diversity of the editing body, how to grow small language communities, and how to retain editors. The closing keynote was given by Brent Hecht, a researcher from Northwestern University, who spoke on techniques for making multilingual comparisons of content across Wikipedia versions, which in turn allows researchers to identify the potential cultural biases of various Wikipedia editions. Hecht found, for instance, that (looking at interwiki links across 25 languages) the majority of Wikipedia article topics appear in only one language; that the overlap between major language editions is relatively small; and that the depth of geographical representation varies widely by language, with a bias towards representing the country or place where that edition's language is prominent. Hecht also compared articles on the same topic across Wikipedias to see the degree of similarity between them. Hecht described his work as "hyperlingual", developing techniques to gain a broader perspective on Wikipedia by looking across language editions. His content comparison tool can be seen at the Omnipedia site, and the WikAPIdia API software he developed can be downloaded here. (See also earlier coverage about Omnipedia: "Navigating conceptual maps of Wikipedia language editions")

In addition to the presented papers, some of which are profiled below, WikiSym has a strong tradition of hosting open space sessions in parallel with the main presentations, so that attendees can discuss topics of interest. This year’s open space topics included helping new wiki users; non-text content in wikis (including videos, images, annotations, slideshows and slidecasting); the future of WikiSym; Wikipedia bots; surveying Wikipedia editors; and realtime wiki synchronization and multilingual synchronization feedback. The conference closed with a panel session entitled "What Aren't We Measuring?", where panelists discussed and debated various methods for quantifying wiki-work (by studying editors, edits, and other metrics).

This year's WikiSym was hosted at the Ars Electronica Center in Linz, a "museum of the future" that hosts the Ars Electronica festival every year. The colorful, dramatic Ars Electronica building is in the heart of Linz, so outside of sessions conference attendees enjoyed exploring and socializing in the city center. The conference dinner was held at the Pöstlingberg Schlössl, which is accessed by one of the steepest mountain trams in the world.

WikiSym 2012 papers and poster and demonstration abstracts may be downloaded from the conference website. Next year’s WikiSym is planned for Hong Kong, just before Wikimania 2013. Updates on the schedule and important dates can be found on the WikiSym blog.

On the "Ethnography Matters" blog, participant Heather Ford looked back at the conference, stating that "WikiSym is dominated by big data quantitative analyses of English Wikipedia", asking "where does ethnography belong?" and counting 82% of the Wikipedia-related papers as examining the English Wikipedia, with only 18% examining other language editions. A panel at WikiSym 2011 had called to broaden research to other languages (see last year's coverage: "Wiki research beyond the English Wikipedia at WikiSym").

WikiSym 2012 papers
The conference papers and posters included the following (apart from several that have been covered in earlier issues of this report):


 * The dynamics of referencing in Wikipedia: This paper contributes to the debates on Wikipedia's reliability. The authors find that the density of references is correlated with article length (the longer the article, the more references it has per given amount of text). They also find that references attract more references (suggesting a snowball mechanism at work) and that the majority of references are added in short periods of time by more experienced editors, who are also adding substantial content. The authors thus conclude that referencing is primarily done by a small number of experienced editors, who prefer to work on longer articles and who drastically raise an article's quality both by adding more content and by adding more references.
 * Etiquette in Wikipedia: Weening [sic] New Editors into Productive Ones: The authors of this paper experimented with alternative warning messages, introducing a set of shorter and more personalized warnings into those delivered by Huggle between November 8 and December 9, 2011. Unfortunately, the authors are rather unclear on how exactly the Huggle tool was influenced, and whether the community was consulted. While the community and Huggle developers were in fact aware of, discussed and approved this experiment – here or here – the paper's failure to clarify this can lead to some confusion with regard to research ethics, since a casual reader may assume the researchers hijacked Huggle without consulting the community. The wording changes were in good faith (making the messages more personalized, friendly and short), and the authors conclude that the new messages they tested were more effective at positively influencing new editors who received Level 1 warnings.
 * WikiTrust algorithm applied to MediaWiki programmers: A paper titled "Towards Content-driven Reputation for Collaborative Code Repositories" reports on an experimental application of the well-known WikiTrust algorithm to the collaboration of programmers on a code repository, namely MediaWiki's own SVN codebase (from 2011, before it was switched to Git). In that model, contributors lose reputation when their contributions are reverted or deleted. According to the abstract, "Analysis is particularly attentive to reputation loss events and attempts to establish ground truth using commit comments and bug tracking. A proof-of-concept evaluation suggests the technique is promising (about two-thirds of reputation loss is justified) with false positives identifying areas for future refinement." An example of such false positives is "The “not now” trap: Frequently a change is reverted with a 'not now' justification, e.g., needing to hold for more testing. When that testing is done the changes are likely to be re-committed in much the same form, punishing the benign reverting editor."
 * "Deletion Discussions in Wikipedia: Decision Factors and Outcomes" found among other things that "69.5% of discussions and 91% of comments are well-represented by just four factors: Notability, Sources, Maintenance and Bias. The best way to avoid deletion is for readers to understand these criteria." One of the authors also co-presented a demo showing mock-ups of possible "alternative interfaces for deletion discussions in Wikipedia", which would highlight the prevalence of each type of argument (e.g. notability, sourcing...) in a deletion discussion more clearly.
 * "Classifying Wikipedia articles using network motif counts and ratios": Similar to an earlier paper by the same authors (earlier coverage: "Collaboration pattern analysis: Editor experience more important than 'many eyes'"), this paper examined the collaboration network of Wikipedia articles and editors using network motifs – small graphs which occur particularly frequently as sub-graphs of networks of a certain kind, and can in some sense be regarded as their building blocks. This was then related to the quality ratings of articles: "Pages with good quality scores [e.g. featured articles] have characteristic motif profiles, but pages with good user ratings [from the [[mw:Article feedback|Article Feedback tool]]] don’t. This suggests that a good quality score is evidence that a collaborative curation process has been pursued. However, not all pages with high quality scores get good user ratings and some pages with low quality scores are trusted by users. Perhaps the Wikipedia quality scale is a low error scale rather than a quality scale?"
 * "'Writing up rather than writing down': becoming Wikipedia Literate" applied "the work of literacy practitioner and theorist Richard Darville" to communication among Wikipedians, e.g. new users and experienced users who deleted some of their contributions. "Using a series of examples drawn from interviews with new editors and qualitative studies of controversies in Wikipedia, we identify and outline several different literacy asymmetries."
 * "How long do wikipedia editors keep active?" found that on the English Wikipedia, "although the survival function of occasional editors roughly follows a lognormal distribution, the survival function of customary editors can be better described by a Weibull distribution (with the median lifetime of about 53 days). Furthermore, for customary editors, there are two critical phases (0–2 weeks and 8–20 weeks) when the hazard rate of becoming inactive increases".
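
The Weibull survival model reported for "customary" editors can be illustrated with a short sketch. Only the ~53-day median lifetime comes from the paper; the shape parameter below is an assumption chosen purely for illustration, and the helper names are this reviewer's own.

```python
import math

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)^shape): chance an editor is still active at day t."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """h(t) = (shape/scale) * (t/scale)^(shape-1): instantaneous dropout rate."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_scale_from_median(median, shape):
    """Invert S(median) = 0.5 to recover the scale parameter."""
    return median / (math.log(2) ** (1.0 / shape))

shape = 0.6  # assumed value; shape < 1 means dropout risk falls with tenure
scale = weibull_scale_from_median(53.0, shape)  # median lifetime of ~53 days
```

Note that a single Weibull distribution has a monotone hazard, so by itself it cannot capture the two critical phases of increased inactivity risk (0–2 and 8–20 weeks) that the paper identifies; it only matches the overall survival curve.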

"First Monday" on rhetoric, readability and teaching
First Monday, the veteran open access journal about Internet topics, featured three Wikipedia-themed papers in its September issue:


 * AfD rhetoric examined: "The pentad of cruft: A taxonomy of rhetoric used by Wikipedia editors based on the dramatism of Kenneth Burke" is an essay "describing a method for classifying arguments made by Wikipedia editors based on the theory of "dramatism", developed by the literary theorist Kenneth Burke, and demonstrating how this method can be applied to a small sample of arguments drawn from Wikipedia’s 'Article for Deletion' (AfD) process."
 * "Readability of Wikipedia" applied the standard Flesch Reading Ease test to the English and Simple English Wikipedias (at http://www.readabilityofwikipedia.com/, the authors also offer the possibility to view scores directly). The effort, described as "extensive research" in a university press release, found that "overall readability is poor, with 75 percent of all articles scoring below the desired readability score. The "Simple English" Wikipedia scores better, but its readability is still insufficient for its target audience." See also the detailed earlier Signpost coverage: "Readability of Simple English and English Wikipedias called into question", and the summary of an earlier paper which applied a more diverse set of readability measures to both Wikipedias: "Simple English Wikipedia is only partially simpler/controversy reduces complexity".
 * "Wikis and Wikipedia as a teaching tool: Five years later" by longtime Wikipedian (and contributor to this research newsletter) Piotr Konieczny first gives an overview of the now widespread use of Wikipedia in the classroom and its advantages, and in a second part offers detailed practical advice drawing from the author's own "five years of experience in teaching with wikis and Wikipedia and holding workshops on the subject".
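
For readers unfamiliar with the metric used in "Readability of Wikipedia" above, the Flesch Reading Ease score can be sketched as follows. The formula itself is the standard one; the naive vowel-group syllable counter is an illustrative assumption (real readability tools use dictionaries or better heuristics), and this is this reviewer's sketch, not the study's code.

```python
import re

def count_syllables(word):
    # Rough approximation: count groups of consecutive vowels as syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Higher scores mean easier text (roughly 60–70 corresponds to plain English); the study found 75 percent of articles scoring below the desired threshold.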

Briefly

 * Recent changes visualization designed to assist admins: A paper titled "Feeling the Pulse of a Wiki: Visualization of Recent Changes in Wikipedia" will be presented at the upcoming conference "VINCI 2012: The International Symposium on Visual Information Communication and Interaction". It describes prototype software (apparently not publicly available yet) that is designed "to aid a wiki administrator to perceive current activity in a wiki", starting from the idea of mapping editors and articles in two dimensions: time and activity level. Hosted on the Toolserver, the software directly accesses a wiki's Recent Changes table, containing edits from the last 30 days. Using their tool, the authors visually discerned "six common editing patterns" on the English Wikipedia. E.g. "New article, many editors, many edits: this is the new popular article pattern which almost invariably reflects a current event". The authors also compare their tool to the previous "few and limited efforts" to visualize recent changes: WikipediaVision, Wikipulse and Wikistream.
 * Unearthing the "actual" revision history of a Wikipedia article: A paper by two researchers from Waseda University observes that "Unlike what is very common in software development, Wikipedia does not maintain an explicit revision control system that manages the detailed change through revisions. The chronologically-organized edit history fails to reveal the meaningful scenarios in the actual evolution process of Wiki articles, including reverts, merges, vandalism and edit wars". To extract this "actual" revision graph, where two neighboring nodes correspond to a revision and an earlier one which it was derived from, a similarity measure is needed. The article cites a 2007 paper and other research which had already proposed to understand a page's revision history as a directed tree and used similarity measures such as tf-idf. The present paper uses a similarity measure based on the frequency of n-grams (sequences of n words) and goes further in regarding the revision history as a directed acyclic graph. This allows for version merges, although the actual algorithm presented still focuses on the case of trees.
 * Who deletes Wikipedia – or reverts it: Wibidata, a big data analytics startup based in San Francisco, posted a follow-up to their "Who deletes Wikipedia" analysis (previous coverage), taking into account the effect of reverts, which several Wikipedians had pointed out in response to their earlier blog post.
 * Geospatial characteristics of Wikipedia articles: The authors of this paper attempt to identify what makes Wikipedia articles with geographical coordinates different from others (besides their obvious relation to geographical locations). They rather unsurprisingly find that more developed articles are more likely to have geo-coordinates, and consequently that there seems to be a correlation between article quality and the presence of geo-coordinates. They also find that articles with geo-coordinates are more likely to be linked to, likely a function of them being of above-average quality.
 * Wikipedia's affordances: This paper, framing itself as part of the ecological psychology field, contributes to the discourse about affordances (properties of an object that allow one to take certain actions). The authors submit that this concept can be developed to further our understanding of how individuals perceive their socio-technical environment. The authors refine the term "technology affordances", which they define as "functional and relational properties of the user-technology system". They then use Wikipedia as a case study to demonstrate the concept's value, listing six affordances of Wikipedia (in other words, six actions that Wikipedia editors can take): contribution, control, management, collaboration, self-presentation, and broadcasting.
 * Hematologists unsure whether "to engage with Wikipedia more constructively": A letter to the medical journal BMJ asks "Should clinicians edit Wikipedia to engage a wider world web?" The authors, a student and a senior lecturer in the field of haematology, "simulated 30 opportunistic internet searches for information on haemophilia in the top three search engines using term permutations: haemophilia or hemophilia (with or without A or B); carrier; information; child; treatment. Wikipedia was the most commonly found top 10 site in all search engines." In an apparent attempt to gauge the authoritativeness of Wikipedia content, "Analysis of editorial authorship of the Haemophilia Wiki [sic] for four weeks found 39 edits by 25 editors, only nine of whom had a profile, and none of whom were experts in haemophilia." Possibly unaware of Wikipedia's "no original research" policy, the authors ask "Given the evolving debate about open access to data, should publishers and authors be mandated to place reviews and key studies [...] in a public domain like Wikipedia?" (naming the example of a recent prominent paper in the field, which the Wikipedia article cites only in form of a New York Times news article about it). The letter concludes "as a professional group, we are not sure whether we wish to engage with Wikipedia more constructively". One-day access to the letter, which is around half a page long, can be purchased at £20/$30/€32 plus VAT, which may not be a very competitive price given the availability of more thorough evaluations of Wikipedia's quality elsewhere in the academic literature.
 * Tracking and verifying sources on Wikipedia: Ethnographer Heather Ford published the final report from her study on how editors track and verify sources on Wikipedia. The report presents an in-depth qualitative analysis of editor discussions around verifiability of information in the early editing phase of the 2011–2012 Egyptian revolution article and reviews how Wikipedia policies around primary vs secondary sources, notability and neutrality were used to make decisions about what sources to cite.
 * A recommender system for infoboxes: A team of computer science researchers at the University of Texas at Arlington developed a classification method to predict infobox template types for articles lacking them, using three types of features: words in articles, categories, and named entities (or words with corresponding Wikipedia entries). The study suggests that articles with infoboxes and articles without infoboxes exhibit substantially different distributions of the above features. The classifier was tested on data from a 2008 dump of the English Wikipedia.
 * Styles of information search on Wikipedia: A poster presented at the 2nd European Workshop on Human-Computer Interaction and Information Retrieval presents the results of an eye-tracking study looking at patterns of information search in Wikipedia articles. The study looks at task-specific differences in the context of factual information lookup, learning and casual reading activity.
 * Post-edit feedback experiment: The Wikimedia Foundation's "Editor Engagement Experiments" team reported on an experiment with a simple user interface change – adding messages that confirm that an edit has been saved – and its effect on the contributions of new editors.
 * Pilot study about Wikipedia's quality compared to other encyclopedias: The results of a pilot study commissioned by the Wikimedia Foundation, titled "Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic" have been announced.
 * Wikipedia, the first step toward communism: Sylvain Firer-Blaess and Christian Fuchs, in their "info-communist manifesto", argue that Wikipedia is an example of the communist mode of production and participatory democracy—"the brightest info-communist star on the Internet’s class struggle firmament". They suggest that Wikipedia's future will be a choice between co-option into the broader capitalist economy (through the exploitation of the commercial possibilities of Wikipedia's free licensing) or, alongside similar "info-communist" projects, displacing more and more capitalist production of informational goods.
 * Quality flaw detection competition: Maintenance templates on the English Wikipedia (e.g. "citation needed") have attracted the attention of several researchers recently, as easy-to-parse indicators of quality problems (example). An "Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia" summarizes its outcome as follows: "three quality flaw classifiers have been developed, which employ a total of 105 features to quantify the ten most important quality flaws in the English Wikipedia. Two classifiers achieve promising performance for particular flaws. An important 'by-product' of the competition is the first corpus of flawed Wikipedia articles, the PAN Wikipedia quality flaw corpus 2012 (PAN-WQF-12)", which consists of "1 592 226 English Wikipedia articles, of which 208 228 have been tagged to contain one of ten important quality flaws". One of the two "winners", the "FlawFinder" algorithm, has been described in a paper covered last month. The competition took place on the occasion of the CLEF 2012 conference, as did the first Wikipedia Vandalism Detection competition two years ago (Signpost coverage).
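
The n-gram similarity measure in the revision-history item above ("Unearthing the 'actual' revision history of a Wikipedia article") can be sketched roughly as follows. Jaccard overlap of word n-gram sets is used here as an illustrative stand-in; the paper's actual measure is based on n-gram frequencies, and its full algorithm allows the revision history to form a directed acyclic graph rather than the simple tree case sketched below. All function names here are this reviewer's own.

```python
def ngrams(text, n=3):
    """Set of word n-grams (sequences of n consecutive words) in a revision."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard overlap of the two revisions' n-gram sets (illustrative choice)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

def likely_parent(revision, earlier_revisions, n=3):
    """Tree case: pick the earlier revision most similar to this one."""
    return max(range(len(earlier_revisions)),
               key=lambda i: similarity(revision, earlier_revisions[i], n))
```

Linking each revision to its most similar predecessor in this way reconstructs a revision tree in which reverts and derived edits attach to their true source, rather than merely to the chronologically preceding revision.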