Wikipedia:Wikipedia Signpost/2013-05-27/Recent research

Motivations to contribute to the Persian Wikipedia
An article in Library Review titled "Motivating and Discouraging Factors for Wikipedians: the Case Study of Persian Wikipedia" offers a much-needed comparison of data from a population of editors outside the English Wikipedia. Most findings about why people start and continue contributing confirm previous studies – important reasons include the desire to share knowledge and to gain recognition, and these are reinforced by friendly interactions.

The authors find that "content production and improvement of Wikipedia in local language" is a significant motivation too, something missing or seen as mostly irrelevant for contributors to the English Wikipedia. The authors also look at reasons for editors to become less active, an area that is not as well understood. Their findings confirm previous research – editors may leave because they find rules too confusing or other editors too unfriendly, or because they do not have enough time. They list some additional reasons not mentioned significantly in the existing literature, such as "issues with Persian script; sociocultural characteristics, e.g. lack of research-based teaching instruction and preference for ready-to-use information; strict rules against mass copying and copyright violation; small size of Persian Web content and a shortage of online Persian references." The paper suffers from a small sample size (interviews with 15 editors) and does not report statistics or rankings for some of the data, making it difficult, for example, to conclude or verify which motivations are more and less important. (Reviewer note: the reviewed pre-print copy did not include figures, which may contain the missing data.)

Science eight times more popular on the Spanish Wikipedia than on the English Wikipedia?
This paper poses an interesting question: are there differences between what is popular in different language Wikipedias? This is measured by comparing the highest-traffic articles in different Wikipedias. The authors chose four (German, English, Spanish, and French) and used open-source software for the analysis (from the paper and the software page it is unclear whether the software was developed specifically for this project). The researchers obtained the 65 most popular articles from six random months of 2009. They then divided the pages into categories: entertainment (ENT), current issues (CUR), politics and war (POL), geography (GEO), information and communication technologies (ICT), science (SCI), arts and humanities (ART), and sexuality (SEX).

Two tables were compiled, the first showing some major differences between the popularity of articles on different Wikipedias. For example, entertainment topics form 45% of popular articles on the English Wikipedia, but only 16% on the Spanish one, where in turn science articles form 24% (compared with only 3% on the English site). The second table compares the content that received the most contributions, again noting significant differences between different Wikipedias, and suggesting no strong relationship between a content area's popularity and its number of contributors.

The paper suffers from a number of issues. The authors noted that the division of articles into categories had to be done manually, but the paper does not describe how this was accomplished (this reviewer can't help but wonder, for example, how the researchers dealt with an article that would fit into more than one category); nor is there any appendix that would list the articles in question. Given the rather surprising findings ("most remarkable"), this methodological omission raises issues about the reliability of the research. A number of similar issues plague the paper; for example, the tables contain a "MAIN" category that is explained nowhere in the paper. The paper does not discuss any potential biases or issues, such as how the results may reflect short-term media news coverage rather than cultural traits, or why the data was limited to only a few months in 2009 and how this could have affected our ability to generalize from it. There may be, for example, seasonal patterns of interest in certain topics: one could hypothesize that science topics would receive more visits during the school year than during holiday months, and if holiday months differ between the sampled countries, this could be a factor in the popularity of science topics. (On a side note, this reviewer would also like to point out that his own paper is cited totally out of context by the authors.)

Overall, such exploratory research is certainly valuable, but the authors stop short of any significant analysis of the data, in fact noting themselves that the presented data would benefit from a deeper sociological or sociocultural analysis. Unfortunately, there is no indication that their data set has been made publicly available. Nonetheless, despite the lack of significant analysis and the methodological issues, the authors' findings are quite intriguing, suggesting that there may be a much more significant difference in coverage of topics by different language Wikipedias than most have suspected so far.

Another paper by the same four authors, titled "Visitors and contributors in Wikipedia", examined a sampled pageview log of the top ten language versions of Wikipedia from 2009, distinguishing article views, views of history pages and edit requests (URLs with "action=edit" or "action=submit"). Among other things, they find that article views and edit requests "are highly correlated throughout the days of the week only for a group of Wikipedias: German, English, Spanish, French, Italian and Russian. This fact can be associated to a more participative attitude on behalf of the users of these editions as it seems that contributions come from the whole mass of visitors. On the contrary, editions where visits and edits are not correlated, or even negatively correlated [the Japanese and Dutch Wikipedias], can be considered as supported by a minority of contributors." (An earlier paper by some of the same researchers, based on the same 2009 sample, was reviewed in this space in 2011: "Wikipedians' weekends in international comparison".)
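
The weekday comparison described above can be illustrated with a small sketch. This is not the authors' code: the `pearson` helper and every count below are hypothetical, and the paper may have used a different correlation measure. The sketch computes a Pearson correlation between per-weekday totals of article views and edit requests for two made-up editions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical totals per weekday, Monday..Sunday (not data from the paper).
views = [10, 20, 30, 40, 50, 60, 70]
edits_tracking_views = [2 * v + 1 for v in views]    # edits rise with views
edits_against_views = list(reversed(edits_tracking_views))  # edits fall as views rise

r_tracking = pearson(views, edits_tracking_views)
r_against = pearson(views, edits_against_views)
print(round(r_tracking, 3), round(r_against, 3))
```

In the reading quoted above, a coefficient near 1 would correspond to an edition where edits follow overall traffic, while a value near zero or below would correspond to an edition carried by a smaller group of contributors.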

In brief
 * Winning and losing argument patterns in deletion debates: A paper subtitled "How Experience Improves the Acceptability of Arguments in Ad-hoc Online Task Groups" (presented at the CSCW'13 conference earlier this year) applied the argumentation theory of Doug Walton to classify comments in a corpus of deletion debates on 72 Wikipedia articles (all AfDs that were initiated or relisted on January 29, 2011). The four Ireland-based authors emphasize that, compared to previous related research which used simpler methods to classify deletion arguments (e.g. based on keywords or policy areas such as notability), their manual analysis is much more thorough and fine-grained, coding AfD comments into 17 categories based on Walton's classification. Among these, the "Rules" and "Evidence" categories are the most popular, making up 36% of AfD arguments. The paper's two other main results are that "familiarity with community norms correlates with [newbies'] ability to craft persuasive arguments" and that "acceptable arguments use community-appropriate rhetoric that demonstrate knowledge of policies and community values while problematic arguments are based on personal preference and inappropriate analogy to other cases" (drawing a direct comparison between Walton's list of problematic arguments and Wikipedia's list of deprecated AfD arguments, e.g. Walton's "Argument from Analogy" corresponds to WP:OTHER - "Other stuff exists").
 * Why English Wikinews rejects submissions: This write-up describes a study performed by Wikimedian LauraHale on the English Wikinews exploring the acceptance and rejection of submissions made by four types of contributors (accredited journalists, new contributors, regular contributors and University of Wollongong students). The study assessed 203 submissions that failed to pass review between January 1, 2013 and April 12, 2013. The article, published on meta, consists of a discussion of this dataset. The study shows that the primary reason new submissions fail review is a lack of "newsworthiness" and that, for the most part, University of Wollongong students struggle with a similar set of problems as new submitters do in general. However, the UoW students made slightly more submission attempts per article and were rejected more often for lack of newsworthiness than new and regular Wikinews contributors. Overall, accredited contributors seem to be the most successful at passing through the review process. LauraHale concludes with a discussion of implications and recommendations for Wikinews, such as an "improved feedback system" for managing users' unwillingness to read the style guidelines before submitting.


 * Wikipedia as a discussion forum for Malaysian students: The study looks at a tiny sample of nine undergraduates from the Sunway University in Malaysia. The students in the ENGL1050: Thoughts and writing class were assigned to discuss a topic on Wikipedia. Although the paper does not cite any specific page or account name, based on the description provided the account User:ENGL1050 can be identified. Wikipedia was used as a discussion forum, with the instructor(s) and the students using a single account, and all of their edits consisting of editing the User:ENGL1050 page. The students had a generally favorable view of the assignment, with the majority agreeing that it is a useful tool for learning, collaboration and improving their English skills. Nonetheless, it is clear that the instructor(s) is not familiar with the basics of School and university projects, nor with basic guidelines such as WP:NOTAFORUM. The described activity had nothing to do with Wikipedia as an encyclopedia, and treated Wikipedia simply as a popular wiki host. (The instructor(s) was likely not aware of the existence of Wikiversity, where such an activity would be within the project scope.)


 * Using Wikipedia to predict the stock market: In an article published in Scientific Reports, the authors studied the page views and edit numbers of Wikipedia articles to reveal correlations with stock market fluctuations. Although the idea of considering Wikipedia activity data as financial indicators has been previously introduced by other researchers (see for example the already reviewed paper on predicting movie revenues using the same source of data), applying the same idea to stock market data has led to interesting results. Moat et al. say "We present evidence in line with the intriguing suggestion that data on changes in how often financially related Wikipedia pages were viewed may have contained early signs of stock market moves". To show that, they investigate the activity data of 285 Wikipedia articles "on financial topics" from 2007–2012 and establish trading strategies for the Dow Jones Industrial Average. They report a higher return for the Wikipedia-based strategy in comparison to a random strategy. The article has been featured in news media like Wired.com and wallstreet-online.de.
 * Main NPOV concerns in articles about corporations: Promotional language and inclusion of criticism: A conference paper titled "The Ideal of Neutrality on Wikipedia: Discursive Struggle over Promotion and Critique in Corporate Entries" analyzed the edit histories of "14 Finnish corporations ... utilizing the concept of discursive struggle by Laclau and Mouffe", according to the abstract. Looking for "the particular expressions (i.e. key signifiers) that caused NPOV-claims or discussions of neutrality", the researchers identified two main points of contention: promotional language in the articles about the corporations, and "corporate critique".


 * "Gangnam Style" pageview trends: A paper presented at this month's WWW 2013 conference describes "An Approach for Using Wikipedia to Measure the Flow of Trends Across Countries". Concretely, the authors compared pageview numbers for the articles about the 2012 YouTube hit Gangnam Style and its creator, rapper PSY, in both the Korean and English Wikipedia, finding among other things that "the initial spike in views occurred first on the South [sic] Korean article 2 days before the English article" in both cases, and that they reached their peak before the Google Trends statistics for the corresponding search terms. A second part of the paper looks at the cumulative page views for the entire Category:Artist (i.e. a set of 7,752 articles retrieved using DBpedia). These showed peaks around the time of the annual Grammy Awards in February 2011 and 2012, and "time periods of dormant activity ... during the months of December and March, which correspond to worldwide holidays of cultural and religious significance, including Christmas and Easter Vacation." Comparing the Gangnam Style/PSY page views against this general backdrop, the three researchers from the University of Southampton speculate that "monitoring a subset of articles may provide an indication of articles ‘soon-to-be’ popular", but appear to delegate the development of a more specific methodology to future research.
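
The kind of spike-lag comparison reported for the Gangnam Style articles can be sketched as follows. This is an illustration only and not the paper's method, which the paper itself leaves largely unspecified: the spike rule (daily views reaching a multiple of an early baseline), the function names, the parameters and the sample numbers are all assumptions made for this example.

```python
def first_spike(views, baseline_days=3, factor=3.0):
    """Index of the first day whose views reach `factor` times the early baseline.

    The baseline is the mean of the first `baseline_days` values. Returns None
    if no later day reaches the threshold. This spike rule is an assumption
    made for illustration.
    """
    if len(views) <= baseline_days:
        raise ValueError("series must be longer than the baseline window")
    baseline = sum(views[:baseline_days]) / baseline_days
    for day in range(baseline_days, len(views)):
        if views[day] >= factor * baseline:
            return day
    return None


def spike_lag(leader, follower, **kwargs):
    """Days between the first spike in `leader` and the first in `follower`."""
    lead_day = first_spike(leader, **kwargs)
    follow_day = first_spike(follower, **kwargs)
    if lead_day is None or follow_day is None:
        return None
    return follow_day - lead_day


# Hypothetical daily page views for one article in two language editions.
korean_views = [10, 10, 10, 50, 60, 70]
english_views = [20, 20, 20, 20, 20, 100]

lag_days = spike_lag(korean_views, english_views)
print(lag_days)  # 2
```

A positive result means the first series spiked earlier; with real pageview dumps the threshold and baseline window would need tuning per article.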