Wikipedia:Wikipedia Signpost/2012-12-31/Recent research

How Wikipedia deals with a mass shooting
Northeastern University researcher Brian Keegan analyzed how hundreds of Wikipedians gathered to cover the Sandy Hook Elementary School shooting in the immediate aftermath of the tragedy. The findings are reported in a detailed blog post that was later republished by the Nieman Journalism Lab. Keegan observes that the Sandy Hook shooting article reached a length of 50 KB within 24 hours of its creation. That makes it the fastest-growing article by length during its first day among recent English Wikipedia articles on mass shootings. The analysis compares the Sandy Hook page with six similar articles drawn from a list of 43 articles on shooting sprees in the US since 2007. Of particular interest among the analyses are the dynamics of dedicated vs. occasional contributors as the article matures. In the first few hours, contributions are evenly distributed, with a majority of single-edit editors. After hour 3 or 4, however, a number of dedicated editors show up and "begin to take a vested interest in the article, which is manifest in the rapid centralization of the article". A plot of inter-edit times also shows that these articles sustain a high frequency of revisions for days after their creation, with Sandy Hook averaging about one edit per minute around 24 hours after its first revision. The notebook and social network data the author produced for the analysis are available on his website. The Nieman Journalism Lab had previously covered the role Wikipedia plays as a platform for collaborative journalism, and why its format outperforms Wikinews, in a 2010 interview with Andrew Lih. The early revision history of the Sandy Hook shooting article was also covered in a blog post by Oxford Internet Institute fellow Taha Yasseri, albeit with a focus on coverage across different Wikipedia language editions.
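Keegan's own notebook is the authoritative source for his figures. The minimal Python sketch below only illustrates the inter-edit-time measure. It computes the gaps between consecutive revisions and the average editing rate those gaps imply. The timestamps are made up and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def inter_edit_times(timestamps):
    """Gaps, in seconds, between consecutive revisions (sorted by time)."""
    ts = sorted(timestamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(ts, ts[1:])]

def edits_per_minute(timestamps):
    """Average editing rate implied by the mean inter-edit time."""
    gaps = inter_edit_times(timestamps)
    if not gaps:
        return 0.0
    mean_gap = sum(gaps) / len(gaps)
    return float("inf") if mean_gap == 0 else 60.0 / mean_gap

# Hypothetical revision timestamps, not taken from the Sandy Hook article.
start = datetime(2012, 12, 15, 12, 0, 0)
revisions = [start + timedelta(seconds=s) for s in (0, 30, 90, 150)]
gaps = inter_edit_times(revisions)   # [30.0, 60.0, 60.0]
rate = edits_per_minute(revisions)   # 1.2 edits per minute
```

Computing the same rate over a sliding window, rather than over the whole history, would show how the editing tempo changes after an article's creation.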

Network positions and contributions to online public goods: the case of the Chinese Wikipedia
In a forthcoming paper in the Journal of Management Information Systems (presented earlier at HICSS '12), Xiaoquan (Michael) Zhang and Chong (Alex) Wang use a natural experiment to show that changes to individuals' positions within the editor network of a wiki modify their editing behavior. The data for this study came from the Chinese Wikipedia. In October 2005, the Chinese government suddenly blocked access to the Chinese Wikipedia from mainland China, causing an unanticipated decline in the editor population. As a result, the remaining editors found themselves in a new network structure. The authors claim that any ensuing changes in editor behavior are likely effects of this discontinuous "shock" to the network. The paper represents each editor as a node (vertex) in the network, with a tie (edge) created between two editors whenever both have edited the same page of the wiki. The authors then examine how changes to three aspects of individual editors' relative connectedness (centrality) to other editors within the network altered their subsequent patterns of contribution.
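The tie definition above can be sketched in a few lines of Python. This is a minimal illustration built on several assumptions: ties are unweighted, the edit log is a hypothetical list of (editor, page) pairs, and the hypothetical `remove_editors` helper only mimics the loss of blocked editors. None of this is the authors' code.

```python
from collections import defaultdict
from itertools import combinations

def coediting_network(edits):
    """Undirected editor network from (editor, page) pairs: two editors
    are tied if they have edited at least one common page."""
    editors_by_page = defaultdict(set)
    for editor, page in edits:
        editors_by_page[page].add(editor)
    graph = {}
    for editors in editors_by_page.values():
        for editor in editors:
            graph.setdefault(editor, set())  # keep editors without ties as nodes
        for a, b in combinations(sorted(editors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def remove_editors(graph, removed):
    """Copy of the network with the given editors and their ties dropped,
    a crude stand-in for an external shock such as a block."""
    removed = set(removed)
    return {e: nbrs - removed for e, nbrs in graph.items() if e not in removed}

# Hypothetical edit log.
edits = [("A", "P1"), ("B", "P1"), ("B", "P2"), ("C", "P2"), ("D", "P3")]
network = coediting_network(edits)        # A-B and B-C are tied; D is isolated
after_shock = remove_editors(network, {"B"})
```

Removing a single well-connected editor (B here) leaves the others disconnected, which shows how a sudden population loss can reshape every remaining editor's position.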

The main finding is that changes in each of the three kinds of connectedness within the network result in different changes to editors' behavior. First, an increase in the number of direct connections between an editor and the rest of the network (degree centrality) resulted in fewer edits by that editor, and more work on articles they created. Second, an increase in the overall proximity of an editor to the other members of the network (closeness centrality) resulted in fewer edits and less work on articles they created. Third, an increase in the extent to which an editor connected otherwise isolated groups in the network (betweenness centrality) resulted in more edits and more work by that editor on articles they created. Overall, these results imply that alterations to the network structure of a wiki can change both the quantity and quality of editor contributions. The researchers argue that their findings confirm the predictions of both network game theory and role theory. They also argue that future research should try to analyze the character of the network ties created within platforms for large-scale online collaboration, to better understand how changes to network structure may alter collaborative practices and public goods creation.
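For readers unfamiliar with the three measures, the sketch below computes them for a small, unweighted, undirected network. The network is stored as a mapping from each editor to the set of editors they share a page with. The sketch uses common textbook definitions:

* degree normalized by n-1;
* closeness as the number of reachable nodes minus one, divided by the sum of distances to them;
* unnormalized betweenness via Brandes' algorithm.

The paper's exact normalizations are not given here and may differ.

```python
from collections import deque

def degree_centrality(graph):
    """Number of direct ties, normalized by n-1."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in graph.items()}

def _bfs_distances(graph, source):
    """Shortest-path distances (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of distances to them; 0 for isolated nodes."""
    result = {}
    for v in graph:
        dist = _bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical path A - B - C: B brokers between A and C.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
deg, clo, btw = degree_centrality(path), closeness_centrality(path), betweenness_centrality(path)
```

In this toy network B scores highest on all three measures. In real editor networks the measures can diverge, which is what lets the paper attribute different behavioral effects to each.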

Quality of pharmaceutical articles in the Spanish Wikipedia


In an early online version of an upcoming article in Atención Primaria, researchers at the Miguel Hernández University of Elche and the University of Alicante benchmarked articles on pharmaceutical drugs in the Spanish Wikipedia against information in a pharmaceutical database, Vademécum. A sample of 386 drugs was drawn from the Vademécum corpus of 3,595 drugs by simple random sampling without replacement. Of these, 171 (44%) had entries on the Spanish Wikipedia, which were then scrutinized along several dimensions in May 2012. Usage of the drug was correctly indicated in 155 (91%) of these 171 articles, dosage in 26 (15%), and side-effects in 64 (37%), with only 15 articles (9%) scoring well in all of these dimensions. The researchers conclude that, while Wikipedia has a high potential to help disseminate pharmaceutical knowledge, the Spanish-language edition does not currently live up to this potential. As a possible solution, they suggest that the pharmaceutical community participate more actively in editing Wikipedia. The list of drugs involved has not been made public, since a similar study is currently underway whose results could be distorted by targeted intervention. The authors have told this research report that they intend to make the list available once this second study is complete.
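The percentages above follow from simple arithmetic. Coverage is computed over the 386 sampled drugs, while the quality figures are shares of the 171 articles that exist. The minimal Python sketch below reproduces them and shows what simple random sampling without replacement looks like. The sampled indices are placeholders, since the study's drug list has not been released.

```python
import random

CORPUS_SIZE = 3595   # drugs in the Vademécum corpus
SAMPLE_SIZE = 386    # drugs in the random sample

# Simple random sampling without replacement: no drug can be drawn twice.
# The indices are placeholders; the real list of sampled drugs is unpublished.
sample = random.sample(range(CORPUS_SIZE), SAMPLE_SIZE)

WITH_ARTICLE = 171   # sampled drugs with a Spanish Wikipedia entry
counts = {"usage": 155, "dosage": 26, "side effects": 64, "all three": 15}

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

coverage = pct(WITH_ARTICLE, SAMPLE_SIZE)                       # share of the 386 drugs
quality = {k: pct(v, WITH_ARTICLE) for k, v in counts.items()}  # share of the 171 articles
```

Running it gives a coverage of 44% and quality shares of 91%, 15%, 37% and 9%, matching the figures reported.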

Wikipedia editing patterns are consistent with a non-finite state model of computation
A paper posted to arXiv by Simon DeDeo, Omidyar Fellow at the Santa Fe Institute (SFI), presents evidence for non-finite-state computation in a human social system, using data from Wikipedia edit histories. Finite-state systems are the basis for the study of formal languages in computer science and linguistics. Many real-world complex phenomena in biology and the social sciences are also studied empirically by assuming underlying finite-state processes, and powerful probabilistic methods have been devised to analyze them. However, whether a system's description truly requires a finite or a non-finite (unbounded) number of states remains an open question. This matters from a functionalist point of view: can we classify a system by its computational properties, and can these properties help us understand how the system works regardless of its material details?

The paper's contribution lies in its proof of a probabilistic generalization of the pumping lemma. In theoretical computer science, the pumping lemma states a necessary condition for a language to be described by only a finite number of states. The lemma is applied to the edit histories of a number of the most frequently edited articles on the English Wikipedia. These histories are first transformed into coarse-grained sequences of "cooperative" and "non-cooperative" (reversion) edits, with reverts identified by means of the revisions' SHA1 field. A Bayesian argument then shows that the lemma cannot hold for a majority of the sequences. It follows that Wikipedia's collaborative editing system as a whole cannot be described by any aggregation of finite-state systems. The author discusses the implications of this finding for a more grounded study of Wikipedia's editing model, and for the identification of detailed computational models of other social and biological systems.
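The coarse-graining step can be illustrated with a minimal Python sketch. It assumes that a revision counts as a revert when its SHA1 matches that of any earlier revision of the same page; the paper's exact rule may be more refined. MediaWiki records a SHA1 hash for each revision, but here the hash is computed from hypothetical revision texts so that the example is self-contained.

```python
import hashlib

def sha1(text):
    """Hex SHA1 digest of a revision's wikitext."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def coarse_grain(revision_texts):
    """Map a page history to a string over {"C", "R"}.

    Assumption: a revision is a revert ("R") if its SHA1 equals that of any
    earlier revision (i.e. it restores a previous version); every other
    revision is treated as cooperative ("C").
    """
    seen = set()
    symbols = []
    for text in revision_texts:
        digest = sha1(text)
        symbols.append("R" if digest in seen else "C")
        seen.add(digest)
    return "".join(symbols)

# Hypothetical history: the fourth revision restores the second one.
history = ["a", "ab", "abX", "ab", "abc"]
sequence = coarse_grain(history)  # "CCCRC"
```

Sequences like this one are the raw material to which the probabilistic pumping-lemma test is applied.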

Wikipedia as our collective memory
Michela Ferron, a member of the SoNet (Social Networking) research group at the Bruno Kessler Foundation in Trento, Italy, submitted her PhD thesis in December 2012. She examined the idea of viewing Wikipedia as a venue for collective memory, and the language indicators of the dynamic process of memory formation in response to "traumatic" events. Parts of the thesis have already been published in journals and conference proceedings, such as WikiSym 2011 and 2012 (cf. presentation slides).

A full chapter is dedicated to background on the concept of collective memory and its manifestation in the digital world. The thesis continues with an analysis of "anniversary edits". It shows a significant increase in editing activity on articles related to traumatic events during the anniversary period, compared to a large random sample of "other" articles. More detailed linguistic indicators are introduced in the next chapter. That chapter shows statistically that terms related to affective processes, negative emotions, and cognitive and social processes occur more often in articles on traumatic events: "Specifically, the relative number of words expressing anxiety (e.g., “worried”), anger (e.g., “hate”) and sadness (e.g., “cry”) was significantly higher in articles about traumatic events".
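Dictionary-based word counting of the kind behind such findings can be sketched as follows. This minimal Python illustration rests on stated assumptions. The three toy lexicons are hypothetical, seeded only with the example words quoted above; real psycholinguistic dictionaries are far larger and also match word stems. The sketch is not the thesis's actual tooling.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, seeded with the example words quoted above.
LEXICON = {
    "anxiety": {"worried", "afraid", "nervous"},
    "anger": {"hate", "angry", "furious"},
    "sadness": {"cry", "grief", "mourn"},
}

def category_rates(text, lexicon=LEXICON):
    """Fraction of word tokens in `text` that belong to each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in lexicon}
    counts = Counter(tokens)
    return {cat: sum(counts[w] for w in words) / len(tokens)
            for cat, words in lexicon.items()}

# Hypothetical sentence: 8 tokens, one anxiety word and one sadness word.
rates = category_rates("People were worried and many began to cry")
```

Comparing such relative frequencies between articles on traumatic events and a control sample is what turns word counts into the kind of statistical contrast described above.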

Ferron then distinguished between human-made and natural disasters, finding that "human-made traumatic events were characterized by language referring to anger and anxiety, while the collective representation of natural disasters expressed more sadness". Finally, she carried out a detailed case study of the talk pages of the articles on the 7 July 2005 London bombings and the 2011 Egyptian revolution. She investigated language indicators, especially those related to emotions, in a dynamic framework and compared them across the two cases.

SOPA blackout decision analyzed
A First Monday article reviews several aspects of Wikipedia's participation in the 18 January 2012 protests against the proposed SOPA and PIPA legislation in the US. The paper focuses on the question of legitimacy, examining how the Wikipedia community arrived at the decision to participate in those protests.

The paper offers an interesting discussion of legitimacy in Wikipedia's governance, including the legitimacy of the decision to participate in the protests. The author notes that the initiative was given a major boost by Jimmy Wales' charismatic authority when Wales posted a straw poll about the issue on his talk page on December 10, 2011. The community had discussed the issue earlier (for example, in mid-November at the Village Pump), but those discussions attracted much less attention. Whether the protest would have happened without Jimbo's push for more discussion is hard to say, as that veers into "what if" territory; as things happened, Jimbo's actions did start the landslide that led to the protests. However, this reviewer is more puzzled by the claim in the article's introduction that the discussion involved a "massive involvement of the Wikimedia Foundation staff". Several WMF staffers were active in the discussions in their official capacity, and the WMF did issue some official statements about the ongoing discussion. Still, the paper provides no evidence to justify the word "massive".

The paper subsequently notes that the WMF focused on providing information and gently steering the discussion, without any coercion; this hardly justifies the claim of "massive involvement". At the very least, the paper should state precisely how many WMF staffers participated in the discussion before using an adjective as grandiose as "massive". The WMF staffers did help push the discussion forward, but this reviewer believes that the paper does not sufficiently justify the stress it puts on their participation, and thus may overestimate their influence.

The third part of the paper discusses how arguments about legitimacy, or the lack of it, framed the subsequent discourse of the voters. The author notes that after an initial period of discussing SOPA itself, the discussion turned to whether it was legitimate for Wikipedia to become involved in the protest. A major justification then emerged: that it was legitimate for Wikipedia to protest against SOPA because SOPA threatened Wikipedia itself. This is an interesting claim, but apart from a single quoted comment, no qualitative or quantitative data are provided, nor is the methodology discussed. We are not told how many individuals voted, how many commented on legitimacy or illegitimacy, or how many felt that Wikipedia was threatened. Nor do we know how the author classified comments supporting each viewpoint, or the shifts in the discussion ... this list could unfortunately go on. In one example drawn from the conclusion, the author writes that "The main factor that shaped the multi-phased process was the will to have the community accept the final decision as legitimate, and avoid backlash. This factor especially influenced those who are suspected of relying on traditional means of legitimacy such as charisma or professionalism." Yet we are given no number, no percentage, and certainly no correlation to back up this claim. Without a clear methodology or distinct data, it is hard to verify the author's claims and conclusions.

The introduction also notes that "the mass effort of planning an effective political action was not something “anyone [could] edit”" and "the debate preceding the blackout did not follow Wikipedia’s open and anarchic decision-making system"; unfortunately this reviewer finds no justification for those rather strong claims anywhere else in the article.

Overall, this is an interesting paper about legitimacy in Wikipedia, but it seems to overreach when it draws conclusions from data that are simply not presented to the reader. It does not explain the research methodology, which makes its claims very hard to verify. Due to the lack of hard data, most conclusions are unfortunately rendered dubious, and the paper tends to make strong claims that are not backed up by data or even developed later on.

Bots and collective intelligence explored in dissertation
In his Communication and Society PhD dissertation, Randall M. Livingstone of the University of Oregon explores the relationship between the social and technical structures of Wikipedia, with a particular focus on bots and bot operators. After a fairly broad literature review (which summarizes the basic approaches to Wikipedia studies from new media theory, social network analysis, science and technology studies, and political economy), Livingstone gives a concise history of the technical development of Wikipedia, from UseModWiki to MediaWiki, and from a single server to hundreds.

The most interesting chapters for Wikipedians will be V ("Wikipedia as a Sociotechnical System") and VI ("Wikipedia as Collective Intelligence"). Chapter V looks at the ways the editing community and the evolving software (both MediaWiki and the semi-automated tools and bots that interact with editors and articles) "construct" each other. Based on 45 interviews with bot operators and WMF staff, this chapter gives an interesting and varied picture of how Wikipedia works as a sociotechnical system. The account will be partly familiar to more tech-minded Wikipedians. It also offers an accessible overview of bots and their place in the ecosystem to editors who normally steer clear of bots and software development. Chapter VI looks at theories of intelligence and the concept of collective intelligence, arguing that Wikipedia exhibits (at least to some extent) the key traits of stigmergy, distributed cognition, and emergence.

Briefly

 * "History's most influential people" according to Wikipedia: While more in the realm of popular science, Wired UK, among others, published an infographic attributed to César Hidalgo, head of the MIT Media Lab's Macro Connections group, visualizing "History's most influential people". The short article notes only that rankings "are based on parameters such as the number of language editions in which that person has a page, and the number of people known to speak those languages". Beyond that, it provides no methodology and little discussion. Until a more extensive description is released, the current graph, while pretty, is little more than a trivia piece.
 * Teachers say 75% of teens use Wikipedia (or online encyclopedias) for research assignments: In a Pew Research survey of more than 2,000 US middle and high school teachers, 75% said that their teenage students use "Wikipedia or other online encyclopedia" in research assignments. That makes online encyclopedias the second most popular source for students, behind search engines such as Google. This figure was lower (68%) "among teachers of the lowest income students (those living below the poverty line)" and higher (80%) for those teaching "mostly upper and upper middle income" students. It also varied by subject, from 69% for teachers of English to 82% for science teachers. The survey report cautions that the sample "skews towards 'cutting edge' educators who teach some of the most academically successful students in the country".
 * "Wikipedia communities" as eigenvectors of its Google matrix: An arXiv preprint studies the "Spectral properties of Google matrix of Wikipedia and other networks". This Google matrix has an entry for each pair of pages (for the English Wikipedia, including non-mainspace pages like portals). Roughly speaking, it models the behavior of a surfer who moves from a page to any of the pages it links to with equal probability (or, with probability $$1-\alpha$$, jumps to a random page; the damping parameter $$\alpha$$ is set to around 0.85 in the Google search engine). The PageRank vector is the eigenvector of this matrix for the eigenvalue $$\lambda = 1$$. The paper studies the spectrum (eigenvalues) and the eigenvectors beyond this special case, interpreting them as certain topic areas: "the eigenvectors of the Google matrix of Wikipedia clearly identify certain communities which are relatively weakly connected with the Wikipedia core when the modulus of corresponding eigenvalue is close to unity. For moderate values of $$\left|\lambda\right|$$ we still have well defined communities which however have stronger links with some popular articles (e.g. countries) that leads to a more rapid decay of such eigenmodes."
 * Serial singularities: developing of a network organization by organizing events: In a paper published in the Schmalenbach Business Review, Leonhard Dobusch and Gordon Müller-Seitz of the Freie Universität Berlin argue that research on organized events has tended to treat such events as isolated, singular occurrences. The authors challenge this idea using interviews and other data on Wikimania, chapter meetings, and local meet-ups over several years. They show how many events of different scales and scopes, each with a distinct character, can interact and reinforce each other to help shape a large distributed organization like Wikimedia.
 * The web mirrors value in the real world: comparing a firm’s valuation with its web network position: In an MIT Sloan working paper, Qiaoyun Yun and Peter Gloor measure the "social network" positions of US and Chinese firms by looking at how those firms are linked to from a variety of web sources, prominently Wikipedia. They find a positive correlation between a firm's betweenness centrality in the network constructed from these online links and its innovation capability and financial performance. However, they find that Wikipedia predicts a firm's performance only in the US.
 * Teahouse analyzed: Jonathan Morgan, Sarah Stierch, Siko Bouterse and Heather Walls, from the Wikimedia Foundation Teahouse team, report on the impact of the initiative on 1,098 new Wikipedia contributors who joined the Teahouse between February and October 2012, in a paper to be presented at CSCW '13. The study reports that participants in the project "make more edits overall, and edit longer", "make more edits, to more articles" and "participate more in discussion spaces" compared to non-visitors. This paper is part of a research track entirely dedicated to Wikipedia Supported Collaborative Work, featuring three other studies.
 * Article feedback: The Wikimedia Foundation published an update about the Article feedback tool on the English Wikipedia, providing statistics about the usage of the feature, and about the moderation activities for the feedback provided.
 * New review of Good Faith Collaboration: The reviewer situates Joseph Reagle's 2010 book about Wikipedia (free online version) within a wider context of research on Wikipedia: "The reliability of the encyclopaedia’s content.. and quantitative analysis of large-scale public datasets formed the predominant approach in early empirical research on Wikipedia ... This was followed by a more social approach and the adopting of qualitative methods. In this switch to social norms and away from an ethnographic approach, Reagle's book is a main reference, particularly in terms of its cultural and historical specificity." Overall, the review finds that "The book is well documented, with an elaborative but accessible writing style, which is at times provocative. It results in a form of rich composition of eight pieces (chapters) of Wikipedia 'puzzle', even if some readers might miss a more explicit continuum linking the lines together. Finally, the book is a primary reference point for researchers aiming to study Wikipedia, especially for those unfamiliar with it."
 * Measuring the impact of Wikipedia for GLAM institutions: Ed Baker, a software developer at the Natural History Museum in London, has started a series of blog posts on "the impact and use of Wikipedia by organisations". In the first post, he examined how well the scope of pages linking to the NHM's website matches the overall scope of the institution, with pages ranked either by number of page views or by number of links to the NHM. The latter approach could help identify opportunities for collaboration between GLAM institutions and the Wikimedia communities.
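To make the Google matrix item above more concrete, here is a minimal Python sketch of the standard construction, $$G_{ij} = \alpha S_{ij} + (1-\alpha)/N$$. In it, $$S_{ij} = 1/k_j$$ if page j links to page i ($$k_j$$ being the number of out-links of j), and $$S_{ij} = 1/N$$ for pages without out-links. Power iteration then recovers the eigenvector for the eigenvalue 1, i.e. the PageRank vector. The three-page link structure is hypothetical. The other eigenvalues studied in the preprint would require a general eigensolver such as `numpy.linalg.eigvals`, which this standard-library-only sketch does not attempt.

```python
def google_matrix(links, n, alpha=0.85):
    """Column-stochastic Google matrix: G[i][j] = alpha*S[i][j] + (1-alpha)/n.

    `links` maps a page index j to the list of pages it links to.
    S[i][j] = 1/k_j if j links to i (k_j = number of out-links of j);
    pages with no out-links get a uniform column 1/n.
    """
    S = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = links.get(j, [])
        if targets:
            for i in targets:
                S[i][j] += 1.0 / len(targets)
        else:
            for i in range(n):
                S[i][j] = 1.0 / n
    return [[alpha * S[i][j] + (1 - alpha) / n for j in range(n)] for i in range(n)]

def mat_vec(G, p):
    """Matrix-vector product G p."""
    return [sum(G[i][j] * p[j] for j in range(len(p))) for i in range(len(G))]

def pagerank(G, tol=1e-12, max_iter=1000):
    """Power iteration: converges to the eigenvector of G with eigenvalue 1."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = mat_vec(G, p)
        if sum(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p

# Hypothetical three-page link structure: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
G = google_matrix({0: [1, 2], 1: [2], 2: [0]}, n=3)
p = pagerank(G)
```

Because every column of G sums to 1 and all entries are positive, the eigenvalue 1 is the largest. All other eigenvalues have modulus at most $$\alpha$$, which is why power iteration converges.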