Wikipedia:Wikipedia Signpost/2012-10-29/Recent research

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

Wikipedia governance found to be mostly informal
A paper in the Journal of the American Society for Information Science and Technology, coming from the social control perspective and employing the repertory grid technique, has contributed interesting observations about the governance of Wikipedia. The paper begins with a helpful if cursory overview of governance theories, moving towards the governance of open source communities and Wikipedia. That cursory treatment is not foolproof, though: for example, the authors mention "bazaar style governance", but attribute it incorrectly. Rather than the 2006 work they cite, the coining of this term dates to Eric S. Raymond's 1999 The Cathedral and the Bazaar. The authors interviewed a number of Wikipedians and identified a number of formal and informal governance mechanisms. Only one formal mechanism, the policies, was found important, while seven informal mechanisms were deemed important: collaboration among users, discussions on article talk pages, facilitation by experienced users, individuals acting as guardians of the articles, inviting individuals to participate, large numbers of editors, and participation by highly reputable users. Notably, the interviewed editors did not view elements such as administrator involvement, mediation or voting as important.

The paper concludes that "in the everyday practice of content creation, the informal mechanisms appear to be significantly more important than the formal mechanisms", and notes that this likely means that the formal mechanisms are used much more sparingly than informal ones, most likely only in the small percentage of cases where the informal mechanisms fail to provide an agreeable solution for all the parties. It was stressed that not all editors are equal, and certain editors (and groups) have much more power than others, a fact that is quickly recognized by all editors. The authors note the importance of transparent interactions in spaces like talk pages, and note that "the reported use of interaction channels outside the Wikipedia platform (e.g., e-mail) is a cause for concern, as these channels limit involvement and reduce transparency." Citing Ostrom's governance principles, they note that "ensuring participation and transparency is crucial for maintaining the stability of self-governing communities."

Social network analysis of Wikipedia community
This paper looks at the relationships between Wikipedians from the social network analysis perspective (nodes are defined as authors, and links as indicators of collaboration on the same article), treating Wikipedia as an online social network (similar to Facebook). The authors note that while Wikipedia is not primarily a social network site, it has enough social networking qualities to justify being seen as such. They find that Wikipedia can be seen as a very good source of information about online relationships between actors, due to the transparent and public nature of its data. The authors present a brief overview of previous work with a similar approach. Rather unsurprisingly, the authors find that in the very early days of Wikipedia, editors were much more likely to know one another and collaborate on articles than in the later years. They find that the number of editors is highly correlated with the editors' familiarity with one another, and is more relevant than the number of articles: from 2007, when the number of editors roughly stabilized, so did their levels of connectedness through collaboration.

The paper shows that with very few exceptions (low activity, specialized editors) all Wikipedia editors are connected to one another, and there are no isolated groups (or topic areas). The authors also find that the Wikipedia collaborations can be analyzed using the small-world network approach (suggesting that the distance between editors, defined as the average path length, with links being articles contributed to, is very small). The article focuses primarily on the mathematical side of social network analysis, and unfortunately offers little commentary or analysis of the findings. The validity of the results can also be questioned, as the authors treat bots and semi-automated accounts as "regular authors"; considering that the majority of Wikipedia articles have been edited by bots or editors using scripts, the finding that editor A can be connected to editor B through the fact that they both edited different pages which in turn were edited by the same bot or script-equipped editor is hardly surprising.
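
The concern about bots can be made concrete with a toy co-editing network. The following is a minimal sketch, not the authors' method: the article titles, editor names and the "CleanupBot" account are all invented for illustration, and the average path length is computed with a plain breadth-first search over connected pairs only.

```python
from collections import deque
from itertools import combinations


def coedit_graph(article_editors):
    """Link two editors whenever both have edited the same article."""
    graph = {}
    for editors in article_editors.values():
        unique = sorted(set(editors))
        for editor in unique:
            graph.setdefault(editor, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest hop count from source to every editor reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def component_count(graph):
    """Number of mutually unreachable groups of editors."""
    seen = set()
    count = 0
    for node in graph:
        if node not in seen:
            seen.update(bfs_distances(graph, node))
            count += 1
    return count


def average_path_length(graph):
    """Mean shortest-path length over connected pairs (None if there are none)."""
    total = pairs = 0
    for source in graph:
        dist = bfs_distances(graph, source)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else None


def without_accounts(article_editors, excluded):
    """Drop the given accounts (e.g. bots) from every article's editor list."""
    return {
        title: [e for e in editors if e not in excluded]
        for title, editors in article_editors.items()
    }


# Hypothetical data: two unrelated articles that share only a bot.
articles = {
    "Dance": ["Alice", "Bob", "CleanupBot"],
    "Physics": ["Carol", "Dave", "CleanupBot"],
}
with_bots = coedit_graph(articles)
without_bots = coedit_graph(without_accounts(articles, {"CleanupBot"}))

print(component_count(with_bots), average_path_length(with_bots))
print(component_count(without_bots), average_path_length(without_bots))
```

In this invented example the bot alone merges two otherwise separate groups of editors into a single connected component, which is the kind of effect that could inflate connectedness if automated accounts are counted as regular authors.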

Wikipedia's article on the Rorschach inkblot test found to have a limited effect on the test's results
Earlier this month, the Journal of Personality Assessment published a paper titled "More Challenges Since Wikipedia: The Effects of Exposure to Internet Information About the Rorschach on Selected Comprehensive System Variables". Summarizing past events (well-known to Wikipedians) from the point of view of psychologists adhering to the Rorschach test as a diagnostic tool, they write: "The availability of Rorschach information online has become of even greater concern in the last few years, since James Heilman, an emergency-room physician from Canada, posted images of all 10 Rorschach inkblots on the popular online encyclopedia, Wikipedia (Cohen, 2009; Wikipedia, 2004[sic]). This Wikipedia article also describes “common responses” to each blot, which frequently correspond to percepts that would be scored Popular under the current coding rules of Exner’s (2003) Comprehensive System (CS)." They remark that "Although many psychologists decried the publishing of the Rorschach inkblots on Wikipedia, before this study, no published studies had examined whether viewing the inkblots and other Rorschach information posted on Wikipedia would impact examinees’ scores." (As reported last year in this newsletter - see "Psychologists gauge impact of Wikipedia's Rorschach test coverage" - one of the authors had coauthored a study that had investigated the rise in prominence of information about the test on the Internet due to Wikipedia, but not tested its impact on the test itself.)

Before reporting their own results, the authors cite an unpublished dissertation, which had compared test subjects' Rorschach results before and after reading the article. Its tentative results suggested a "significant increase in shading responses [which] then likely affected the corresponding increase in [one variable]", but otherwise indicated "that the majority of CS variables do not appear to be affected by exposure to information in the Wikipedia article."

The authors' own study involved 50 participants, half of whom had to read an excerpt of the Rorschach test article (while the control group read an excerpt of the Philadelphia Phillies article) before trying to "fake good" on the test, impersonating a character who would have a huge incentive to achieve certain results in the test ("Jack is a 35-year-old father of two wonderful children ...The judge ordered that Jack have a psychological evaluation done to determine whether or not he should be given custody of his kids.")

Among the test features defined in the "CS" system, only "Populars" was found to differ significantly "between the control and experimental groups [...] likely due to the fact that the Rorschach [Wikipedia article excerpt] provided pictures of each of the inkblots, along with "common responses," which, in many cases, corresponded to those responses that are actually coded as Popular according to the CS. However, the Wikipedia information on its own did not appear to directly impact other variables associated with perceptual accuracy."

Commenting on the paper, Heilman told this research report: "That reading about the Rorschach before testing affects scores in a group of "normal" individuals is not really surprising. This analysis, however, does not show that the availability of information regarding psychological tests affects clinical important outcomes."

Efficiency of Wikipedia in editor recruitment and content production
A paper titled "Is Wikipedia Inefficient? Modelling Effort and Participation in Wikipedia" will be presented at next year's HICSS '13 conference. The main research concern of the authors is whether the saturation observed in the growth of Wikipedia is due to the maturity of the project or is rather caused by editorial obstacles and inefficient collaboration processes. To address this question, they investigate the efficiency of collaboration in 39 language editions of Wikipedia. Two different processes are studied: (1) editor recruitment, i.e. the ability of Wikipedia projects to attract editors from the pool of potential editors, and (2) the article creation process. For each of these two processes, corresponding input and output parameters are chosen, and the relative efficiency of the language projects is calculated by applying Data Envelopment Analysis. For the editor recruitment process, the input parameter is the size of the population that speaks the language, has access to the Internet and has a tertiary level of education, and the output is the number of Wikipedia editors contributing to the Wikipedia edition in that language. It is shown that the efficiency of some language editions, e.g. Estonian, Hungarian, Norwegian, and Finnish, is much higher than that of some other language editions, e.g., Malaysian, Arabic, and Chinese. A decreasing return to scale is reported for all of the studied projects; however, the effect is more pronounced for larger ones. In other words, larger projects can be considered inefficient in attracting new editors. For the production process, the number of Wikipedia editors is this time taken as the input, with three outputs: the number of edits, the number of articles, and the number of Featured articles. Here, the results generally suggest that for the larger projects the returns to scale are systematically decreasing, showing the difficulties of maintaining the efficiency of the workflow as the project grows.
Some projects, such as the Malaysian and Persian Wikipedias, are not as successful in editor recruitment but are still efficient in creating articles given the capacity of their human resources. As for the quality of articles, it is shown that in larger projects like French and German, the focus is more on increasing the quality of the existing articles, whereas in intermediate-size projects, e.g., Russian and Italian, the main effort is still on increasing the number of articles.
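
The notion of relative efficiency used here can be illustrated with the simplest special case of Data Envelopment Analysis. The sketch below is an assumption-laden toy, not the paper's model: it takes one input and one output under constant returns to scale, where a unit's score reduces to its output/input ratio divided by the best ratio observed. The study itself uses several outputs and reports non-constant returns to scale, which requires solving linear programs instead. The edition names and all figures are invented.

```python
def relative_efficiency(units):
    """Single-input, single-output efficiency under constant returns to scale.

    units maps a name to (input, output). Each score is the unit's
    output/input ratio relative to the best ratio, so the most efficient
    unit scores 1.0 and every other unit scores below 1.0.
    """
    ratios = {name: output / inp for name, (inp, output) in units.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical editions: (pool of potential editors, active editors).
editions = {
    "Edition A": (1_000, 50),
    "Edition B": (10_000, 200),
    "Edition C": (50_000, 500),
}
scores = relative_efficiency(editions)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

In this invented data the smallest edition defines the efficient frontier, while the larger ones recruit proportionally fewer editors from their pools. Note that such scores are only meaningful relative to the other units in the comparison, which is the limitation raised in the review.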

The paper notes a positive correlation between efficiency in the number of edits and efficiency in the number of articles and featured articles. Among the limitations of the study, the authors name the time period of the analysed data, which is limited to one month, and possible flaws in the demographic data used to estimate the input of the editor recruitment process. Excluding contributions from unregistered users for technical reasons could also have biased the results. Although the article starts by raising the question of the efficiency of Wikipedia in general, it ends up comparing different language editions to each other and presenting the results only in relative terms. The English Wikipedia, which could be a benchmark for such comparisons, is entirely excluded from the study. More importantly, applying data envelopment analysis, which was originally introduced for evaluating the activities of not-for-profit entities participating in public programs, to Wikipedia activity data is not well justified.

Student use of Wikipedia
How students find and evaluate information is a perpetual concern for librarians, who act as educators and guides to finding the best resources for student information needs as well as collection curators. Since the arrival of Wikipedia, librarians have grappled with how the site fits in with and compares to a more traditionally published and reviewed collection, and how best to help students understand and use Wikipedia. This study is an up-to-date addition to the body of literature on this subject. Colón-Aguirre and Fleming-May use a coded qualitative interview approach to understanding undergraduate opinions about Wikipedia, compared to their use of and attitude towards traditional library resources.

The authors conducted interviews with 21 undergraduate students in one college in a large public university in the United States. Based on student responses about their research habits, the authors divided their respondents into three categories: avid library users, occasional library users, and library avoiders. While all categories of students used Wikipedia, there were differences in purpose; avid library users used Wikipedia to gather background information before turning to library-supplied resources like books and journals, while library avoiders relied more on Wikipedia and were lost if they could not find the information they needed on the site or via Google searches. Most of the students interviewed reported getting to Wikipedia via Google or other search engines, and the authors do not report any deep awareness by the students of how the site works or how to evaluate articles; awareness of ability to contribute was not mentioned. Student use of the library versus Wikipedia was also influenced by their perceptions of library resources being difficult to use (both in-person stacks and subscription online resources), particularly compared to the ease of using Wikipedia and online searching; students were also swayed in whether they used the library by their assignment requirements and faculty advice, including professors who advised against using Wikipedia as being "not credible" and required using library resources specifically.

The authors conclude that librarians need to work more with teaching faculty to craft research assignments, and that hands-on instruction in the use of the library does aid student comfort with research. This short article will be of most interest to practicing librarians and undergraduate instructors, who will doubtless see reflections of their own students in the student interviews. Wikipedians who are involved in academic classroom education and outreach will also find this study interesting, if for no other reason than to reinforce the importance of helping students become more knowledgeable about the ways that Wikipedia works with and differs from traditional academic publications.

In brief

 * "Conflict positively influences group performance": Investigating the question "Does conflict matter in the success of mass collaboration?", a paper in the Chinese Journal of Library and Information Science examines conflict on Wikipedia, analyzing it from the social network analysis perspective (nodes are defined as individuals, and links as indicators of conflict), and differentiating between positive and negative types of conflict. The authors' goal is to increase understanding of the conflict mechanism in the mass collaboration setting. They find "that participation positively influences task complexity, conflict, and group performance; task complexity positively influences group performance but negatively influences conflict; and conflict positively influences group performance".
 * Generating a lexical network from Wiktionary: The researcher has created an open source tool – available at http://dbnary.forge.imag.fr/ – that extracts a lexical network (including definitions, translations, synonyms, antonyms, etc.) from Wiktionary data in RDF format, that can be used in existing semantic tools. The author notes that because Wiktionary – unlike traditional dictionaries – treats homonyms (words that share the same spelling and pronunciation but have a different meaning) on single pages with multiple etymology sections, it has not been possible to properly attribute the senses and lexical relations to the proper etymologies (i.e. lexemes).
 * How are article edits and page views related? We still don't know.: This paper attempts to explore the relationship between the "production" and "consumption" of Wikipedia content: the edits that build articles, and the page views from readers. For broad topic areas on English Wikipedia (such as articles in Category:Dance and its subcategories), the pattern of edits mirrors the overall trend of editing activity: rising exponentially until peaking around 2007, with a linear decline in edit rate since then. Page views for these topic areas, by contrast, show an approximately linear rise since late 2007 (which is the earliest period for which we have article traffic statistics). According to the authors, this pattern "conforms to a two-phase evolution framework: one of production followed by consumption", although they do not attempt to establish a causal link between the article content maturation and readership. Unfortunately, the lack of earlier data on article traffic makes it hard to learn much from the relationship between edit rate and article traffic, without taking a more fine-grained approach to identify articles or topic areas whose early phases of rising and peaking edit rates are also covered by page view data.
 * More WikiSym reports: Two more reports from August's annual WikiSym conference were published this month, by the recipient of a travel grant from the UK Wikimedia chapter, and by a Natural Language Processing (NLP) researcher who dubbed the conference "WikipediaSym" because "the conference submissions were mostly inclined towards the information analysis and social aspects of using wikis, in particular Wikipedia, and there were very few submissions on the actual applications of wikis (or wiki-like systems) and the open collaboration context". (See also the overview report in the last issue of the research report.)
 * German centrality: A discussion paper examined "Centrality and Content Creation in Networks [in] The Case of German Wikipedia".
 * Systemic bias: Slides of a presentation by a librarian at the University of Massachusetts Amherst (and active Wikipedian) concern "Systemic Bias in Wikipedia: What It Looks Like, and How to Deal with It".
 * Few users who edit Middle East/North Africa articles are from the region: A brief conference paper titled "The vocal minority: Local self-representation and coediting on Wikipedia in the Middle East and North Africa" (presented in a slightly different form at a workshop at the 2012 ACM Web Science Conference in June) analyzed the talk pages of English Wikipedia users who had edited articles geotagged in that region (MENA) "to assess the self-declared locational affiliations of the authors (i.e. where they live, work or were born)" and found that "there exists few authors claiming to be from the MENA region, except for Israel, Iran and to a much lesser extent Egypt."
 * Article Feedback tool as means of "peripheral participation": A paper to be presented at CSCW '13 describes the main findings from the early tests of the Article Feedback v5 on the English Wikipedia, through the lens of legitimate peripheral participation theory. The study reviews the costs and benefits of expanding reader contributions to Wikipedia, using both quantitative and qualitative methods. The results, according to the authors (members of the Wikimedia Foundation team working on the tool), indicate that peripheral contributors add value to the encyclopedia as long as the cost of identifying low quality contributions remains low.
 * Dynamics of read and edit rates on Wikipedia: The ECCS'12 Conference on Complex Systems saw the presentation of a paper titled "From Time Series to Co-Evolving Functional Networks: Dynamics of the Complex System 'Wikipedia'", reporting on research about the "access-rate time series and edit-interval time series" of articles on the English Wikipedia, and about "three organizational and dynamical networks ...: (i) the network of direct links between Wikipedia articles, (ii) the usage network as determined from cross-correlations between access-rate time series of many pairs of articles, and (iii) the edit network as determined from co-incident edit events. The major goal is to find correlations between components of these three networks that characterize the dynamics of information spread in the complex system".
 * Wikipedia articles compared to open source software projects: A paper titled "Similarities, challenges and opportunities of Wikipedia content and open source projects" argues that "the evolution of Wikipedia pages and the OSS projects share some commonalities in terms of their evolutionary patterns; in particular, it was found that a predefined, cubic model could be used to explain several of the similarities in 'abandoned' or 'completed' projects and Wikipedia pages."
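
The "usage network" mentioned in the item on read and edit rates can be sketched as follows. This is only an illustration under stated assumptions: it uses a zero-lag Pearson correlation and a fixed threshold of 0.8, neither of which is confirmed as the paper's choice, and the article names and view counts are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Zero-lag Pearson correlation of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (spread_x * spread_y)


def usage_network(views, threshold=0.8):
    """Link two articles when their access-rate series correlate strongly."""
    return {
        (a, b)
        for a, b in combinations(sorted(views), 2)
        if pearson(views[a], views[b]) >= threshold
    }


# Hypothetical daily view counts for three articles.
views = {
    "Article A": [10, 20, 30, 40, 50],
    "Article B": [12, 22, 29, 41, 52],
    "Article C": [50, 40, 30, 20, 10],
}
links = usage_network(views)
print(links)
```

With these invented series, only the two articles whose traffic rises together end up linked; the article whose traffic falls over the same period stays unconnected.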