Wikipedia:Wikipedia Signpost/2015-10-28/Recent research



Students value Wikipedia both for quick answers and for detailed explorations

 * Reviewed by Jonathan Morgan

This paper reports findings from a survey of Norwegian secondary school students about their use of Wikipedia in the context of their coursework. The survey of 168 students between the ages of 18 and 19 consisted of 33 Likert scale questions and two free response questions. The goal was to assess how Wikipedia figured into students' literacy practices, a concept that encompasses students' and teachers' attitudes towards the resources they use to learn and the social context in which they engage with those resources, as well as the process by which they read, remember, and understand the information provided by each resource.

The main finding of the study is that students' attitudes towards Wikipedia are overwhelmingly positive, but they find the information presented in Wikipedia less trustworthy than their official course materials. Although 90% of respondents rated their textbooks as more trustworthy, they cited the ease of finding factual information (such as dates, names, etc.) as a key reason for preferring Wikipedia. They also reported that Wikipedia was better than their textbooks at explaining the "big picture" of a given topic, as well as at facilitating more in-depth exploration. In the words of one survey respondent: "If you need to, you can read elaborations about a given topic, or you can just read the summary if that is what you need."

These findings suggest that the primary advantage that Wikipedia offers to students is its flexibility: it allows students to find quick answers and more detailed accounts with equal ease. The findings also suggest that both students and teachers would benefit from a better understanding of how to critically evaluate the quality of information presented in Wikipedia and other open online information resources. The study also confirmed findings from previous studies: that the vast majority of students use Wikipedia to supplement their official course resources (textbooks, etc.), that most of them access Wikipedia via Google search, and that English-speaking students tend to seek information on the English-language Wikipedia first, regardless of their first language or national origin.

Jesus, Napoleon, and Obama top the "Wikipedia social network"

 * Reviewed by Piotr Konieczny

A (conference?) paper titled "Beyond Friendships and Followers: The Wikipedia Social Network" applies social network theory to the analysis of relationships between the subjects of Wikipedia biographical articles. Using Wikidata and Wikipedia metadata, the authors produce a number of findings. Some of them will not surprise readers, such as that "By far the largest occupational groups are politicians and football players", or "The page with the most mentions of persons is Rosters of the top basketball teams in European club competitions" (with 4,694 mentions of 1,761 different persons). The most referenced persons are Jesus and Napoleon, followed by Barack Obama, Muhammad, Shakespeare, Adolf Hitler, and George W. Bush. Over four fifths of the links in Wikipedia are to male persons, which roughly reflects the gender distribution of Wikipedia biographies; a similar distribution confirms that most of the biographies focus on the 19th and 20th centuries. The authors, however, do not dwell on the social science implications of their findings, but merely suggest that their tool can be used to refine Wikipedia categories and disambiguation tools. The findings are interesting from the perspective of alternate approaches to categorization, as they may suggest possible new categories that haven't yet been created by human editors, and perhaps provide a mathematical model of how Wikipedia categories can be created.
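For readers who want to see what such counts involve, here is a minimal sketch, not the authors' tool. It assumes the mention data has already been extracted as (page, person) pairs; the function name `mention_stats` and the toy page and person names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def mention_stats(mentions):
    """Summarise (page, person) mention pairs.

    Returns three mappings:
      - how often each person is mentioned overall,
      - how many mentions each page contains,
      - how many distinct persons each page mentions.
    """
    per_person = Counter()
    per_page_total = Counter()
    per_page_people = defaultdict(set)
    for page, person in mentions:
        per_person[person] += 1
        per_page_total[page] += 1
        per_page_people[page].add(person)
    per_page_distinct = {page: len(people) for page, people in per_page_people.items()}
    return per_person, per_page_total, per_page_distinct

# Toy data (hypothetical, not taken from the paper).
mentions = [
    ("Roster page", "Player 1"),
    ("Roster page", "Player 1"),
    ("Roster page", "Player 2"),
    ("History page", "Player 1"),
]
referenced, totals, distinct = mention_stats(mentions)
print(referenced.most_common(1))
print(totals["Roster page"], distinct["Roster page"])
```

The same two per-page numbers (total mentions versus distinct persons) are the ones quoted above for the basketball rosters page.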

"Exploration of Online Culture Through Network Analysis of Wikipedia"

 * Reviewed by Piotr Konieczny

This paper also uses social network theory, as well as Hofstede's cultural dimensions theory, Schwartz's Theory of Basic Human Values, and McCrae's Five factor model of personality, to ask research questions about the concept of online culture; in particular, whether it is universal or differs between national cultures. It focused on 72 Featured Articles in 12 languages (unfortunately, the authors do not explain their reasons for choosing those particular 12 languages over the others); discounting bots, the authors analyzed more than 150,000 editors and 250,000 edits. The authors find that most Wikipedia edits are what they call self-loops: individual editors making consecutive edits to the same article, with no edit by another editor in between. They fail to make any comment on what that really means for the vision of Wikipedia as a collaborative environment. The authors find significant differences in editing patterns between certain Wikipedia projects, though this reviewer finds the description of said differences (focusing on a case study of one Japanese and one Russian article) rather curt. Similarly, their discussion of how the results fit (or don't) with the established theories of Hofstede and others is interesting, but rather short; that unsatisfying brevity may, however, be due to editorial requirements (the entire paper is only 3.5k words long, rather than the more typical 8k or so). The authors conclude that "new dimensions of online culture can be explored from directly observed online behavior", something that one hopes they'll revisit themselves, together with their dataset, in a longer paper that will do proper justice to it.
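To make the notion of a self-loop concrete, here is a small sketch under stated assumptions. It assumes an edit network in which each revision of an article links the previous editor to the next one, so that a self-loop is an edge from an editor back to themselves; the paper's exact construction may differ. The function names and the toy history are hypothetical.

```python
from collections import Counter

def edit_transitions(editors):
    """Count directed edges (previous editor -> next editor) in one article's history."""
    seq = list(editors)
    return Counter(zip(seq, seq[1:]))

def self_loop_share(editors):
    """Fraction of edges that link an editor to themselves (consecutive edits)."""
    edges = edit_transitions(editors)
    total = sum(edges.values())
    if total == 0:
        return 0.0  # fewer than two revisions: no edges at all
    loops = sum(count for (src, dst), count in edges.items() if src == dst)
    return loops / total

# Toy revision history, oldest first (hypothetical editors).
history = ["Alice", "Alice", "Bob", "Alice", "Alice", "Alice", "Carol"]
print(self_loop_share(history))
```

In this toy history, three of the six transitions are self-loops. A high share of this kind is what one would expect if editors tend to save their work in bursts of several small edits.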



Vandalism detection research neglects smaller languages

 * Reviewed by Morten Warncke-Wang

A paper at the 19th International Conference on Circuits, Systems, Communications and Computers (CSCC) provides an overview of research on vandalism detection in Wikipedia, with a focus on the use of machine learning. One of the paper’s conclusions is that future research should aim for language independence, as little progress has been made outside of the English, German, French, and Spanish Wikipedia editions.

Automatic quality assessment using the "collaboration network"

 * Reviewed by Morten Warncke-Wang

“Measuring Article Quality in Wikipedia Using the Collaboration Network” is a paper that proposes an improved model of co-authorship to be used in predicting the quality of Wikipedia articles. Trained on a stratified sample of articles from the English Wikipedia, it is shown to outperform several baselines. Unfortunately, the dataset used for evaluation omits Start-class articles for no apparent reason, and uses the latest revision of each article, which might differ considerably from the revision that was current when the article received its quality rating.

Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
 * Dissertations:
 * "607 Journalists: An evaluation of Wikipedia’s response to and coverage of breaking news and current events" See also blog post
 * "Wiki is not paper: Fixing and breaking the 'news' on Wikipedia" From the abstract: "The case studies include the "Barack Obama" article, which is used to investigate the establishment and maintenance of the "fact" that Obama is described as an 'African American,' despite his mixed-race heritage. ... The second case study uses the article on the 2008 war in the Georgian province of South Ossetia to investigate the transnational and transcultural pitfalls of 'bias' in the writing of a 'neutral' article. The final case examines the decision to publish controversial material by examining the article on the 2006 Muhammad cartoons controversy. This article was crucial on Wikipedia in establishing the protocol in publishing such images."
 * "User interaction with community processes in online communities" From the abstract: "We find that articles that are deleted from Wikipedia differ from those that are not in many significant ways. We also find, however, that most deleted articles are deleted extremely hastily, often before they have time to develop. We use our data to create a model that can predict with high precision whether or not an article will be deleted. ... We propose to deploy a system utilizing this model on Wikipedia as a set of decision-support tools to help article creators evaluate and improve their articles before posting. ... English Wikipedia’s Articles for Creation provides a protected space for drafting new articles, which are reviewed against minimum quality guidelines before they are published. We explore the possibility that this drafting process, which is intended to improve the success of newcomers, in fact decreases newcomer productivity in English Wikipedia, and offer recommendations for system designers."
 * "Detecting Vandalism on Wikipedia across Multiple Languages"
 * More recent publications:
 * "Spillovers in Networks of User Generated Content: Pseudo-Experimental Evidence on Wikipedia" From the abstract: "[On the German Wikipedia, the featuring of an article on the main page does] affect neighboring articles substantially: Their viewership increases by almost 70 percent. This, in turn, translates to increased editing activity. Attention is the driving mechanism behind views and short edits. Both outcomes are related to the order of links, while more substantial edits are not." See also by the same author: "Spillovers in Networks of User Generated Content"
 * "Peer Effects in Collaborative Content Generation: The Evidence from German Wikipedia" From the abstract: "editors who contribute to the same articles and exchange comments on articles’ talk pages work in collaborative manner sometimes discussing their work. They can, therefore, be considered as peers, who are likely to influence each other. In this article, I examine whether peer influence, measured by the average amount of peer contributions or by the number of peers, yields spillovers to the amount of individual contributions."
 * "Wikipedia Page View Reflects Web Search Trend" (see also datasets, slides) From the abstract: "We found frequently searched keywords to have remarkably high correlations with Wikipedia page views."
 * "Wikipedia edition dynamics" From the abstract: "It is argued that the probability to edit is proportional to the editor's number of previous editions (preferential attachment), to the editor's fitness and to an ageing factor." See also by the same authors: "The dynamic nature of conflict in Wikipedia"
 * "Cultural Similarity, Understanding and Affinity on Wikipedia Cuisine Pages" See also "Mining cross-cultural relations from Wikipedia - A study of 31 European food cultures"
 * "The influence of network structures of Wikipedia discussion pages on the efficiency of WikiProjects" From the abstract: "The evaluation suggests that an intermediate level of cohesion with a core of influential users dominating network flow improves effectiveness for a WikiProject, and that greater average membership tenure relates to project efficiency in a positive way."
 * "Technological Nudges and Copyright on Social Media Sites" From the abstract: "Using an adapted taxonomy, this article identifies the technological features on predominant social media sites—Facebook, YouTube, Twitter and Wikipedia—that encourage and constrain users from engaging in generative activities. Notwithstanding the conflicting narrative painted by recent litigation around copyright in relation to content on social media sites, I observe that some of the main technological features on social media sites are designed around copyright considerations." (However, the paper never mentions that Wikipedia's content is under a free license.) "In contrast to the other social media sites, I note that Wikipedia does not allow its users to comment on content; hence there is little room for this alternative form of modification."
 * "The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction"
 * "Students' use of Wikipedia as an academic resource — Patterns of use and perceptions of usefulness" (survey of 1658 undergraduate students) From the abstract: "87.5% of students report using Wikipedia for their academic work, with 24.0% of these considering it ‘very useful’. Use and perceived usefulness of Wikipedia differs by students’ gender; year of study; cultural background and subject studied. Wikipedia mainly plays an introductory and/or clarificatory role in students information gathering and research."
 * "Snooping Wikipedia Vandals with MapReduce" From the abstract: "[Using] MapReduce ... we are able to explore a very large dataset, consisting of over 5 millions articles [actually pages on enwiki, including non-articles] collaboratively edited by 14 millions authors, resulting in over 8 billion pairwise interactions. We represent Wikipedia as a signed network, where positive arcs imply constructive interaction between editors. We then isolate a set of high reputation editors (i.e., nodes having many positive incoming links) and classify the remaining ones based on their interactions with high reputation editors."
 * "An agent-based model of edit wars in Wikipedia: How and when consensus is reached" From the abstract: "We show that increasing the number of credible or trustworthy agents and agents with a neutral point of view decreases the time taken to reach consensus, whereas the duration is longest when agents with opposing views are in equal proportion." See also last issue's review of a different numerical model of edit wars: "More newbies mean more conflict, but extreme tolerance can still achieve eternal peace"