Wikipedia:Wikipedia Signpost/2017-06-23/Recent research

"Wikipedia, work, and capitalism. A realm of freedom?"

 * Review by Dorothy Howard

In his first book, ''Wikipedia, Work, and Capitalism. A Realm of Freedom?'', Arwid Lund, lecturer in the program of Information Studies (ALM: Archives, Libraries and Museums) at Uppsala Universitet, Sweden, investigates the ideologies that he believes are shared by participants in peer-production projects like Wikipedia. The author typologizes the ways that Wikipedians understand their activities, including “playing v. gaming” and “working v. labouring,” (113-115) to explore his hypothesis that “there is a link between how Wikipedians look upon their activities and how they look upon capitalism.” (117) Lund characterizes peer-production projects by their shared resistance to information capitalism, such as copyright and pay-walled publishing, which they see as limiting creativity and innovation. His thesis is provocative. He claims that the anti-corporatist ideologies intrinsic to peer production and to Wikipedia are unrealistic because capitalism always finds a way to monetize free content. Overall, the book touches on many issues not usually discussed within the Wikipedia community, and it might serve as a useful entry point for those who want to consider the social impacts of the project.

Lund uses a combination of social critique and qualitative interviews conducted in 2012 to support his thesis. One recurring theme is that Wikipedia is part of a larger trend towards gamification, a design technique from human–computer interaction (HCI) that uses features associated with "play" to motivate interaction and engagement with an interface. For example, he reports that editors find Wikipedia's competitive and confrontational elements game-like. (143-144) He also claims that the balance between work and play in Wikipedians' descriptions of their activities shifts as they take on more responsibility and professional roles in the community, such as adminship. Still, it is highly questionable whether 8 interviews, mainly focused on the Swedish Wikipedia, are a large enough sample for his claims to generalize.

The culture of Wikipedia valorizes altruism in its embrace of volunteering for the project to produce information for the greater good. Lund argues that Wikipedians' belief in the altruistic aspect of the project makes it easy for them to depoliticize their work and to ignore how Wikipedia participates in the corporate information economy. To him, Wikipedia is symptomatic of the devaluation of digital work, whereas in past generations making an encyclopedia might have provided contributors with income and employment opportunities.

So, he argues, contributors believe that peer production represents a space of increased autonomy, democracy, and creativity in the production of ideas. But in his view, attempts at a “counter-economy,” “hacker communism,” or “gift economies” (239, 303) are prone to manipulation, because we cannot create utopian bubbles within capitalism that are immune to its influence. Still, peer production projects operate as if value could be created outside the capitalist system. Lund argues that Wikipedia cannot avoid competition with proprietary companies, which see Wikipedia as a threat and have an interest in harvesting its content for their own benefit. (218) Yet he could have offered more examples to support this claim. The reader is left wondering who these corporate interests are, and what exactly they derive from Wikipedia. Having this information would help us understand where Lund is coming from.

Although the word “work” in the title might suggest that Lund focuses on wage labour, the author’s aims are broader, and he uses the word to connote a variety of social, value-producing activities. (20) In particular, he is concerned with the production of “use-value,” the Marxist term for the productive social activity of creating things which are deemed useful and thus of value to be bought and sold in the market (even if producers don’t consider their work to be commodities). He draws on Marxist thinkers and semioticians, among them V.N. Volosinov, Terry Eagleton, and Louis Althusser, to unpack different approaches to explaining why Wikipedians might feel like they are playing when they are really working. (107-108) Marxists call such assumptions “false consciousness,” but the concept is difficult to apply because it requires us to analyze both manifest and latent (discursive and non-discursive) awareness. It would have been useful for Lund to look at how anthropology or psychology, both of which have researched the topic extensively, talk about ideology. More stringent ethnographic or qualitative methods might also have made his argument more convincing. But, based on the references he provides, the book's target audience seems to be media theorists and social scientists who are already familiar with Marxist political economy.

Lund makes a compelling case that capitalism instrumentalizes freely produced knowledge for its own monetary gain. Meanwhile, he says, Wikipedia's design and its heavily ideological agenda make it difficult for the community to address the issue. The book is an interesting contribution to ongoing conversations about how Wikipedia and projects motivated by copyleft principles can be defined from a social perspective.

How does unemployment affect reading and editing Wikipedia? The impact of the Great Recession

 * Review by Tilman Bayer

A discussion paper titled "Economic Downturn and Volunteering: Do Economic Crises Affect Content Generation on Wikipedia?" investigates how "drastically increased unemployment" affects contribution to and readership of Wikipedia. To study this question statistically, the authors (three economists from the Centre for European Economic Research (ZEW) in Mannheim, Germany) regarded the Great Recession that began in 2008 as an "exogeneous shock" that affected unemployment rates in different European countries differently and at different times. They relate these rates to five metrics for the language version of Wikipedia that corresponds to each country:
 * "(1) aggregate views per month, (2) the number of active Wikipedians with a modest number of monthly edits ranging from 5 to 100, (3) the number of active Wikipedians with more than 100 monthly edits, (4) edits per article, and (5) the content growth of a corresponding language edition of Wikipedia in terms of words"

The Wikimedia Foundation publishes monthly figures for each of these metrics. However, the researchers did not have access to country-level breakdowns of this data. For privacy reasons, such breakdowns are not published for every country/language combination, except in some monthly or quarterly overviews that the authors may have overlooked, which only start in 2009 anyway. Hence, "to study the relationship of country level unemployment on an entire Wikipedia, we need to focus on countries which have an (ideally) unique language". This excluded some of the European countries that were most heavily affected by the 2008 crisis, e.g. the UK, Spain or Portugal, but still left the authors with 22 different language versions of Wikipedia to study.

An additional analysis focuses on district-level (Kreise) employment data from Germany and on the German Wikipedia. None of the five metrics is available at that geographical resolution, so the authors resorted to geolocation data for the (public) IP addresses of anonymous edits. For several large German ISPs, this data is usually more precise than in many other countries.

In both parts of the analysis, the economic data is related to the Wikipedia participation metrics using a relatively simple statistical approach (difference in differences), although its robustness is checked in various ways. Still, in some cases the comparison only covered 9 months before and after the start of the crisis, rather than an entire year or several years. This leaves open the question of seasonality. For example, it is well known that Wikipedia pageviews are generally lower in the summer, possibly due to factors like vacationing that might differ depending on the economic situation.
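To make the method more concrete, here is a minimal sketch of the textbook two-group, two-period difference-in-differences estimate. It is not the paper's actual specification, which this review does not describe (covariates, fixed effects, or how treatment timing is set per country). The function name and all numbers are hypothetical and chosen only for illustration. The control group's change over the same window stands in for what would have happened to the "treated" group without the shock. This is why the design cancels out seasonal effects that both groups share, but not seasonal effects that differ between them.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook 2x2 difference-in-differences estimate.

    Each argument is a list of monthly values of one outcome metric
    (e.g. edits per article) for one group in one period.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    # Under the parallel-trends assumption, the control group's change
    # approximates the treated group's change in the absence of the shock.
    return treated_change - control_change


# Hypothetical illustrative numbers, not data from the paper.
treated_pre = [10.0, 11.0, 12.0]   # e.g. country hit by rising unemployment
treated_post = [14.0, 15.0, 16.0]
control_pre = [9.0, 10.0, 11.0]    # comparison country
control_post = [10.0, 11.0, 12.0]

print(did_estimate(treated_pre, treated_post, control_pre, control_post))  # 3.0
```

In this example the treated group rises by 4 and the control group by 1, so the estimated effect attributed to the shock is 3.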

Summarizing their results, the authors write:
 * "we find that increased unemployment is associated with higher participation of volunteers in Wikipedia and an increased rate of content generation. With higher unemployment, articles are read more frequently and the number of highly active users increases, suggesting that existing editors also increase their activity. Moreover, we find robust evidence that the number of edits per article increases, and slightly weaker support for an increased overall content growth. We find the overall effect to be rather positive than negative, which is reassuring news if the encyclopedia functions as an important knowledge base for the economy."

While leaving open the precise mechanism of these effects, the researchers speculate that "it seems that new editors begin to acquire new capabilities and devote their time to producing public goods. While we observe overall content growth, we could not find robust evidence for an increase in the number of new articles per day [...]. This suggests that the increased participation is focused on adding to the existing knowledge, rather than providing new topics or pages. Doing so requires less experience than creating new articles, which may be interpreted as a sign of learning by the new contributors."

The paper also includes an informative literature review summarizing interesting research results on unemployment, leisure time and volunteering in general. (For example: "conditional on having Internet access, poorer people spend more time online than wealthy people as they have a lower opportunity cost of time." It also cites some gender-specific results that, combined with Wikipedia's well-known gender gap, might have suggested a negative effect of rising unemployment on editing activity: "Among men, working more hours is even positively correlated with participation in volunteering" and, on the other hand, "unemployment has a negative effect on men’s volunteering, which is not the case for women.")

It has long been observed that Wikipedia relies on the leisure time of educated people, notably by Clay Shirky, who called this resource "cognitive surplus" in his 2010 book of that title. The present study provides important insights into a particular aspect of this, although the authors caution that economic crises do not uniformly increase spare time. For example, "employed people may face larger pressure in their paid job", which reduces their available time for editing Wikipedia. The paper might have benefited from a look at the available demographic data about the life situations of Wikipedia editors. For example, in the 2012 Wikipedia Editor Survey, 60% of respondents were working full-time or part-time, and a share of respondents were school or university students, with some overlap between the two groups.

How complete are Wikidata entries?

 * Author's summary by Simon Razniewski

While human-created knowledge bases (KBs) such as Wikidata usually provide high-quality data (precision), it is generally hard to assess their completeness. A conference paper titled "Assessing the Completeness of Entities in Knowledge Bases" proposes to assess the relative completeness of entities in knowledge bases by comparing the extent of their information with that of similar entities. It outlines the building blocks of this approach and presents a prototypical implementation, which is available on Wikidata as Recoin (https://www.wikidata.org/wiki/User:Ls1g/Recoin).
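The sketch below gives a rough illustration of "relative completeness". It compares an entity's properties with how often those properties occur among similar entities, for example people with the same occupation. Treat it as a hypothetical simplification for illustration only, not Recoin's actual formula. The function name, the 50% threshold and the frequency-weighted score are all assumptions.

```python
from collections import Counter


def relative_completeness(entity_props, peer_prop_sets, threshold=0.5):
    """Hypothetical relative-completeness score for one entity.

    entity_props: set of property labels the entity has.
    peer_prop_sets: list of property-label sets for similar entities.
    Returns (score in [0, 1], expected properties the entity is missing,
    most common first).
    """
    n = len(peer_prop_sets)
    # How many peers have each property (counted once per peer).
    counts = Counter(p for props in peer_prop_sets for p in set(props))
    # "Expected" properties: held by at least `threshold` of the peers.
    expected = {p: c / n for p, c in counts.items() if c / n >= threshold}
    if not expected:
        return 1.0, []
    # Weight each expected property by its frequency among peers.
    present = sum(f for p, f in expected.items() if p in entity_props)
    score = present / sum(expected.values())
    missing = sorted((p for p in expected if p not in entity_props),
                     key=lambda p: -expected[p])
    return score, missing


# Hypothetical example data, not taken from Wikidata.
peers = [
    {"date of birth", "place of birth", "occupation", "award received"},
    {"date of birth", "place of birth", "occupation"},
    {"date of birth", "occupation", "spouse"},
    {"date of birth", "place of birth"},
]
entity = {"date of birth", "occupation"}
score, missing = relative_completeness(entity, peers)
print(score, missing)
```

Weighting by frequency means that lacking a property nearly every peer has (such as a date of birth) lowers the score more than lacking a rarer one. This matches the intuition of completeness "relative" to similar entities.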

"Cardinal Virtues: Extracting Relation Cardinalities from Text"

 * Author's summary by Simon Razniewski

Information extraction (IE) from text has largely focused on relations between individual entities, such as who has won which award. However, some facts are never fully mentioned, and no IE method has perfect recall. Thus, it is beneficial to also tap content about the cardinalities of these relations, for example, how many awards someone has won. This paper introduces this novel problem of extracting cardinalities and discusses the specific challenges that set it apart from standard IE. It presents a distant supervision method using conditional random fields. A preliminary evaluation that compares information extracted from Wikipedia with that available on Wikidata shows a precision between 3% and 55%, depending on the difficulty of relations.
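The distant-supervision idea can be illustrated at the labelling step. Counts already stored in a knowledge base (for instance, a number of children recorded on Wikidata) are used to label matching number mentions in text. The labelled sentences then serve as noisy training data for a sequence labeller such as a CRF. The minimal sketch below covers only this labelling step, not the paper's full pipeline. The tag names, the small number-word table and the function names are hypothetical, and CRF training is omitted.

```python
# Hypothetical, deliberately small table of number words.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def to_number(token):
    """Return the integer a token expresses, or None if it is not a number."""
    t = token.lower()
    if t.isdigit():
        return int(t)
    return NUMBER_WORDS.get(t)


def distant_labels(tokens, known_count):
    """Tag tokens whose value equals the KB count as 'COUNT', others as 'O'.

    These automatically produced tags would then be used as (noisy)
    training labels for a sequence model.
    """
    return ["COUNT" if to_number(tok) == known_count else "O" for tok in tokens]


# Hypothetical example: the KB records 3 children for this person.
tokens = "She has three children and two grandchildren".split()
print(distant_labels(tokens, 3))
```

A coincidental match, such as "three" referring to something else in the same sentence, would also be labelled positive. Such label noise is the usual trade-off of distant supervision, and it is one reason precision can vary widely between relations.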

Conferences and events
See the research events page on Meta-wiki for upcoming conferences and events, including submission deadlines.

Other recent publications
''Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.''
 * Compiled by Tilman Bayer


 * "Learning by comparing with Wikipedia: the value to students’ learning" From the paper: "The main purpose of this research work is to describe and evaluate a learning technique that actively uses Wikipedia in an online master’s degree course in Statistics. It is based on the comparison between Wikipedia content and standard academic learning materials. We define this technique as ‘learning by comparing’. [...] The main result of the paper shows that [...] active use of Wikipedia in the learning process, through the learning-by-comparing technique, improves the students’ academic performance. [...] The main findings on the students’ perceived quality of Wikipedia indicate that they agree with the idea that the encyclopaedia is complete, reliable, current and useful. Although there is a positive perception of quality, there are some quality factors that obtain better scores than others. The most valued quality aspect was the currentness of the content, and the least valued was its completeness."
 * "Use and awareness of Wikipedia among the M.C.A students of C. D. Jain college of commerce, Shrirampur : A Study"
 * "Comparative assessment of three quality frameworks for statistics derived from big data: the cases of Wikipedia page views and Automatic Identification Systems" From the abstract: "We apply these three quality frameworks in the context of 'experimental' cultural statistics based on Wikipedia page views"
 * "Discovery and efficient reuse of technology pictures using Wikimedia infrastructures. A proposal" From the abstract: "With our proposal, we hope to serve a broad audience which looks up a scientific or technical term in a web search portal first. Until now, this audience has little chance to find an openly accessible and reusable image narrowly matching their search term on first try ..."
 * "Extracting scientists from Wikipedia" From the abstract: "... we describe a system that gathers information from Wikipedia articles and existing data from Wikidata, which is then combined and put in a searchable database. This system is dedicated to making the process of finding scientists both quicker and easier."
 * "Where the streets have known names" From the abstract: "We present (1) a technique to establish a correspondence between street names and the entities that they refer to. The method is based on Wikidata, a knowledge base derived from Wikipedia. The accuracy of this mapping is evaluated on a sample of streets in Rome. As this approach reaches limited coverage, we propose to tap local knowledge with (2) a simple web platform. ... As a result, we design (3) an enriched OpenStreetMap web map where each street name can be explored in terms of the properties of its associated entity."