Wikipedia:Wikipedia Signpost/2013-01-28/Recent research

Lessons from the wiki research literature in "American Behavioral Scientist" special issue
A special issue of the American Behavioral Scientist is devoted to "open collaboration".
 * Consistent patterns found in Wikipedia and other open collaborations: In the introductory piece, researchers Andrea Forte and Cliff Lampe give an overview of this field, defined as the study of "distributed, collaborative efforts made possible because of changes in information and communication technology that facilitate cooperative activities" - with open source projects and Wikipedia among the most prominent examples. They point out that "[b]y now, thousands of scholars have written about open collaboration systems, many hundreds of thousands of people have participated in them, and millions of people use products of open collaboration every day." Among their "lessons from the literature", they name three "consistent patterns" found by researchers of open collaborations:
 * "Participation Is Unequal" (meaning that some participants contribute vastly more than others: "In Wikipedia, for example, it has long been shown that a few editors provide the bulk of contributions to the site.")
 * "There Are Special Requirements for Socializing New Users"
 * "Users Are Massively Heterogeneous in Both How and Why They Participate"
 * "Ignore All Rules" as "tension release mechanism": The abstract of a paper titled "Rules and Roles vs. Consensus: Self-Governed Deliberative Mass Collaboration Bureaucracies" explains "Wikipedia’s unusual policy, ignore all rules (IAR)" as a "tension release mechanism" that is "reconciling the tension between individual agency and collective goals" by "[supporting] individual agency when positions taken by participants might conflict with those reflected in established rules. Hypotheses are tested with Wikipedia data regarding individual agency, bureaucratic processes, and IAR invocation during the content exclusion process. Findings indicate that in Wikipedia each utterance matters in deliberations, rules matter in deliberations, and IAR citation magnifies individual influence but also reinforces bureaucracy."
 * Collaboration on articles about breaking news matures more quickly: "Hot Off the Wiki: Structures and Dynamics of Wikipedia's Coverage of Breaking News Events" analyzes "Wikipedia articles about over 3,000 breaking news events, [investigating] the structure of interactions between editors and articles", finding that "breaking articles emerge into well-connected collaborations more rapidly than nonbreaking articles, suggesting early contributors play a crucial role in supporting these high-tempo collaborations." (see also our earlier review of a similarly-themed paper by the same team: "High-tempo contributions: Who edits breaking news articles?")

A fourth paper in this special issue, titled "The Rise and Decline of an Open Collaboration System: How Wikipedia’s Reaction to Popularity Is Causing Its Decline", received considerable media attention this month, starting with an article in USA Today. It was already reviewed in the September issue of the research report.

Mathematical model for attention to the promoted Wikipedia articles
While the size and growth rate, editorial workflow, and topical coverage of Wikipedia have been studied extensively, little work has been done on public attention to Wikipedia articles. A working paper by a team from the Barcelona Media Foundation and the University of Twente, posted on arXiv just before Christmas, analyses and models the number of clicks on featured articles promoted on the Wikipedia Main page.

A total of 684 featured articles are considered, and their page view statistics are rescaled by the average circadian view rate extracted from a larger set of 871,395 articles over a period of 844 days. The 4-day lifetime of a promoted article on the Main page is characterised by four phases: very rapid growth in the number of clicks just after the article appears on the Main page; a rather homogeneous period lasting the rest of the first day of the promotion; a dramatic drop in the click rate when the article is replaced by a new featured article and moved to the "recently featured" part of the Main page; and finally a flat phase during the remaining 3 days at this location.
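The rescaling step can be illustrated with a short sketch. It is not the authors' code: the function names, the input format, and the toy numbers are assumptions made for illustration. The idea is simply to divide each hourly view count by the average view rate for that hour of the day, so that counts recorded at quiet and busy hours become comparable.

```python
from collections import defaultdict


def circadian_profile(samples):
    """Average views per hour of day (0-23) across a reference set.

    `samples` is an iterable of (hour_of_day, views) pairs; this input
    format is an assumption made for the sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, views in samples:
        totals[hour % 24] += views
        counts[hour % 24] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}


def rescale(hourly_views, start_hour, profile):
    """Divide each hourly count by the average rate for its hour of day."""
    rescaled = []
    for offset, views in enumerate(hourly_views):
        hour = (start_hour + offset) % 24
        rescaled.append(views / profile[hour])
    return rescaled


# Toy reference data (made up): typical traffic at 03:00 is half that at 15:00.
reference = [(3, 50), (3, 150), (15, 200), (15, 200)]
profile = circadian_profile(reference)

# 100 views at 03:00 and 200 views at 15:00 represent the same level of
# attention once the time of day is factored out.
night = rescale([100], 3, profile)[0]
afternoon = rescale([200], 15, profile)[0]
```

With the toy data, both rescaled values come out equal, which is the point of the normalisation: the remaining variation reflects attention to the article rather than the daily rhythm of overall traffic.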

In the next step, the authors introduce a rather intuitive model, based on a few parameters, to describe the full four-day cycle in a mathematical framework. The model is tuned on the data of a set of 100 featured articles and then used to predict the number of page hits for the rest of the sample, given the number of clicks each article received in the first hour of promotion. The model is relatively accurate in predicting the number of clicks, and its accuracy could be improved further by feeding it the number of clicks at the end of the first day instead of the first hour after promotion. While the paper is very clear in describing the methodology, it does not discuss or provide a deeper understanding of the social mechanisms of popularity and public attention, as the authors themselves repeatedly note.
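The calibrate-then-predict workflow described above can be illustrated with a deliberately simple stand-in. The sketch below is not the model from the paper, whose parameters are not given in this review; it uses a plain ratio estimator instead, and all names and numbers are hypothetical.

```python
from statistics import median


def fit_ratio(training):
    """Median ratio of total views to first-hour views.

    `training` is a list of (first_hour_views, total_views) pairs for the
    calibration articles. A ratio estimator is a stand-in technique chosen
    for this sketch, not the model used in the paper.
    """
    return median(total / first for first, total in training if first > 0)


def predict_total(first_hour_views, ratio):
    """Predict total views for an unseen article from its first-hour views."""
    return first_hour_views * ratio


# Made-up calibration data: (first-hour views, total views over the cycle).
training = [(100, 2000), (50, 1100), (200, 3600)]
ratio = fit_ratio(training)

# Prediction for a hypothetical article with 80 views in its first hour.
prediction = predict_total(80, ratio)
```

Replacing the first-hour count with the count at the end of the first day, as the review mentions, would only change which number is fed into the calibration and prediction steps.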

The featured article icon and other heuristics for students to judge article credibility
A paper in Information Processing and Management titled "College students’ credibility judgments and heuristics concerning Wikipedia" used the theory of bounded rationality and a heuristic-systematic model to analyze American college students’ credibility judgments and heuristics concerning Wikipedia. Not surprisingly, the authors observe that students used heuristics (mental shortcuts, such as "an article with a long list of references is more credible than one with a short list") in assessing the credibility of Wikipedia. Students (regardless of their knowledge) were much more likely to focus on the number of references than on their quality, and the same article would be seen as more or less credible depending on how many references it had. The authors conclude that educators need to teach students to judge the quality of Wikipedia articles in ways that go beyond checking whether the article has references (and how many). The authors recommend that Wikipedia make its own assessments (such as the Featured Article status, currently visible only as a small bronze star icon in the top right-hand corner of the article’s page) much more prominent. (This reviewer strongly agrees with the conclusion, but unfortunately the last community discussion appears to have achieved little.)

More interestingly, the authors also find that people with more knowledge found Wikipedia more credible, suggesting that people with low knowledge may be more uneasy with Wikipedia. The authors suggest that the reliability of Wikipedia would be increased if more professional associations implemented programs such as the Association for Psychological Science Wikipedia Initiative. In addition to getting experts more involved in Wikipedia content creation, the authors suggest that it may be a good idea for "professional associations themselves [to] provide their own endorsement for the quality of articles in their fields."

The authors also note that peer endorsement is an important factor in credibility, and that the Article Feedback Tool is a step in the right direction, as it provides another credibility assessment for readers. They note, however, that compared to similar tools implemented on other sites (such as Amazon), "Wikipedia readers need to click on 'View Page Rating,' which requires one more step to find out that information. The average reader may not be inclined to do so. It would be useful to display ratings without clicking".

Briefly

 * “Free as in sexist?”: In a paper in this month's First Monday, Joseph Reagle discusses the gender gap in free culture and free and open source software communities. Wikipedia is one of the case studies discussed, and Reagle's observations suggest that it is not so much an exception as the rule in this wider context.
 * Further criticism of "most influential people" infographic: On Ethnography Matters, a blog run by four ethnographers, one of the authors, Heather Ford, discussed a Wired infographic on "History's most influential people, ranked by Wikipedia reach". Like the reviewer in our December issue, the author criticizes the infographic and the accompanying article for lacking any serious description of methodology. She notes that given those shortcomings, the claims made by the article are rather dubious, and the cited research might well have been misquoted. The author further notes that any research that attempts to draw conclusions about "national culture" from analyzing different language Wikipedias runs into a major issue, which is that languages don't always map easily onto national cultures (consider: what is the national culture of Portuguese or English?). She further illustrates this by discussing how African-language Wikipedias are often edited primarily by individuals living outside the country most often associated with a given language.
 * "Sustainability of Open Collaborative Communities": In an article published in the Technology Innovation Management Review (based on a similar work presented at HICSS 2013 and reviewed in the October 2012 edition of WRN), Kevin Crowston, Nicolas Jullien, and Felipe Ortega present a preliminary comparison of the recruitment efficiency of 36 of the largest Wikipedias. The concept of recruitment efficiency refers to the ability of these wikis to recruit editors from the total population of readers and potential contributors. The authors estimate this quantity using aggregated data on the number of Internet users and tertiary (college) educated speakers of each of the 36 languages that correspond to the Wikipedias included in their analysis. They find suggestive patterns in the results of this comparison, including: (1) Wikipedias of moderate and smaller size exhibit great variations in terms of their recruitment efficiency; and (2) larger Wikipedias appear less efficient in recruiting new members, suggesting a pattern of decreasing returns to scale. The authors conclude that these findings warrant further investigation and analysis, but that they provide preliminary support for the idea that larger Wikipedias face distinct conditions of community sustainability compared to smaller ones. See the extended summary from the October 2012 Wikimedia Research Newsletter for further details.
 * Language comparison algorithm finds new ghosts in England and Scotland: A paper by four Japanese researchers titled "Extracting lack of information on Wikipedia by comparing multilingual articles" proposes "a method for extracting information that exists in one language version, but which does not exist in another language version." Their method proceeds in several steps, starting from a user's search query in their native-language Wikipedia, which is automatically translated (using a dictionary) for the other, "non-native" Wikipedias. It then makes use of the link structure between articles, the section structure within an article, and finally the cosine similarity between the nouns of different articles - a low similarity score indicating that information from one article is missing from the other. A small-scale test brought some successes, e.g. the detection of examples in Black dog (ghost) from England and Scotland that were not present in the corresponding article on the Japanese Wikipedia, but also showed problems with the proposed algorithm. The four authors previously published a related paper titled "Extracting Difference Information from Multilingual Wikipedia", covered in the April edition of this research report.
 * Sentiment analysis of articles about politicians: Researcher Finn Årup Nielsen, who works on a project funded to do "Wikipedia sentiment analysis for companies", blogged about applying sentiment analysis to articles about politicians on the Danish Wikipedia.
 * "Clustering Wikipedia infoboxes to discover their types": A paper presented at the CIKM’12 conference describes a method to use infoboxes to detect the entity type of an article (e.g. "movie" for Avatar (2009 film)). The authors explain that Wikipedia's existing category system is not sufficient for this: "Because Wikipedia category names are folksonomic, i.e., they are created by a group of people without the control of a central authority, they are also an unreliable source for inferring the conceptual entity type." As an example, the authors cite the article about (the 1981 film) Chariots of Fire, and argue that based on the "categories, a system like Yago would assign to the infobox concepts like film, winner, olympics, culture, university, and sport. However, only film corresponds to the entity described in the infobox." On the other hand, the naming of infoboxes (as templates) is not consistent enough either: "For example, the entity type Film is associated with template names Infobox Film, Infobox Movie, Television Film Infobox, TV film, James Bond film, Chinese film, infobox Korean film, etc." The algorithm described in the paper measures the similarity of different infoboxes based on their set of attributes: "For example, the attribute cluster discovered for the entity type Movie includes the attributes "  The authors report that their clustering algorithm, "WIClust", performed successfully on a sample of "48,000 infoboxes spanning 862 infobox templates", and that in some cases it corrects shortcomings of DBpedia, e.g. by discovering "that the templates Infobox Movie, Bond film, Japanese film, Chinese film, and Korean film belong to the same group as Infobox Film."
 * How Indic language Wikipedias fared in 2012: In a blog post, Indian Wikimedian Shiju Alex compared the article numbers, user activity levels and pageviews of Wikipedias in Indic languages between December 2011 and December 2012.
 * Students detect vandalism: Three student projects in a course on machine learning at Stanford University concerned the automatic detection of vandalism edits on Wikipedia.
 * "Algorithmic governance" in the German Wikipedia: Leonhard Dobusch, an assistant professor for organization theory at FU Berlin, blogged about an ongoing research project on the sighted revisions on the German Wikipedia as a case of "algorithmic governance".
 * Map visualization of links between geotagged articles: The "Collaborative Cybernetics" blog published maps visualizing the links between Wikipedia articles containing geocoordinates and the geographic distribution of certain topics (example: skiing-related terms).
 * Map of sister cities extracted from Wikipedia: Four researchers from the Barcelona Media Foundation (three of whom also co-authored the paper on featured article pageviews reviewed above) published a preprint where they "extracted the network of sister cites[sic] as reported on the English Wikipedia, as far as we know the most extensive but certainly not complete collection of this kind of relationships", and analyze the resulting social network, including a map visualization of worldwide twin city pairings.
 * New pageview files available: On his personal blog, Wikimedia Foundation data analyst Erik Zachte announced the release of "Monthly page requests, new archives and reports".
 * Upcoming book on "Global Wikipedia": A call for chapters has been issued for an upcoming book titled "Global Wikipedia: International and cross-cultural issues in online collaboration".
 * Wikipedia as part of one's personal memory: In a draft paper titled "Extended Cognition in Science Communication", to appear in Public Understanding of Science, David Ludwig (a scholar of philosophy at Columbia University and admin on the German Wikipedia) argues "that we should treat Wikipedia as part of your memory".
 * Is Wikipedia built on "good faith collaboration" or "destructive editing"?: In his 2010 MIT Press book about Wikipedia (now available online under a free license), Joseph Reagle posited that Wikipedia is based on a culture of “Good Faith Collaboration”. In his 2012 thesis at the University of Cambridge, titled "Destructive Editing and Habitus in the Imaginative Construction of Wikipedia", User:Thedarkfourth argues against this, highlighting the importance of conflicts instead. This month, Reagle responded to the criticism on his blog, asserting that it was based on "a new scholasticism. In this view, a work's contribution consists exclusively of interpreting an interesting phenomenon in the light of dead philosophers". According to Reagle, this view holds that scholars should look at what came before when explaining a new phenomenon: they should first consult libraries and bibliographies before drawing a new hypothesis on the whiteboard. He defends his position by arguing that his book is not guilty of ahistoricism, as Wallis seems to imply.
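As a side note on the multilingual comparison paper summarized in the list above: the final step, cosine similarity between the nouns of two articles, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes the nouns have already been extracted and translated into a common vocabulary, and the function name and sample word lists are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(nouns_a, nouns_b):
    """Cosine similarity between two bags of nouns (lists of strings).

    Returns a value between 0.0 (no nouns in common) and 1.0 (identical
    noun frequency profiles). Empty input yields 0.0.
    """
    a, b = Counter(nouns_a), Counter(nouns_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical noun lists for a section in two language versions,
# already mapped to a shared vocabulary.
section_en = ["dog", "ghost", "england", "scotland"]
section_ja = ["dog", "ghost"]

similarity = cosine_similarity(section_en, section_ja)
```

A low score for a pair of corresponding sections would flag the section as a candidate for information missing from one language version. Where to set the cut-off is a tuning decision that this sketch does not address.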