Wikipedia:Wikipedia Signpost/2014-10-29/Recent research

Tl;dr: Users, informed consent and privacy policies online

 * Reviewed by Kim Osman

In new research prompted by proposed changes to data protection legislation in the European Union (EU), authors Bart Custers, Simone van der Hof, and Bart Schermer conducted a comparative analysis of social media and user-generated content websites’ privacy policies, along with a user survey (N=8,621 in 26 countries) and interviews in 13 different EU countries on awareness, values, and attitudes toward privacy online. The authors state that consent regarding personal data use is an important concept and observe, “There is mounting evidence that data subjects do not fully contemplate the consequences and risks of personal data processing.”

Custers, van der Hof and Schermer developed a set of criteria for giving informed consent about the use of personal data, including: “Is it clear who is processing the data and who is accountable?” and “Is the information provided understandable?” When existing privacy policies were assessed against these criteria, Wikipedia was the worst performing of the sites analyzed. The study recommends that Wikipedia make clear how minors are dealt with and provide additional clarity around security measures. It also notes that IP addresses may be traced, therefore making “anonymous” Wikipedia users identifiable.

The study did acknowledge issues around self-presentation and identity in different online contexts, and the question of how far a site like Wikipedia actually needs an extensive privacy policy, as users assign different value to privacy criteria in these different contexts. The authors do note, however, “Wikipedia does collect opinions that may be attributable to individuals and that may be considered privacy sensitive.”

This paper is a well-researched summary of the privacy policies of online sites (including major international platforms like Facebook, Twitter and YouTube). Although written from a European perspective (where data collection practices are arguably more stringent than in other places in the world), it raises important questions about how Wikipedia approaches its privacy policy in terms of informed user consent, and would be useful reading for anyone with an interest in how online practices are shaping approaches to user privacy.

Researchers requiring more information about ethics in online research can visit the Association of Internet Researchers' wiki.

Briefly

 * Holocaust articles compared across languages: We tell ourselves that Wikipedia works well for the most part, but that finding consensus might break down on controversial articles. Of all article topics, perhaps none is potentially more fraught than the Holocaust, and that is precisely what Rudolf Den Hartogh has tackled in his Master's thesis "The future of the Past: A case study on the representation of the Holocaust on Wikipedia". It is an in-depth compare-and-contrast analysis of the Holocaust articles in the English, German, and Dutch Wikipedias. Several curious facts come out of this. For instance, the average vandalism rate on these articles is 4%, compared with 7% globally, as these articles have been locked at some point (although the Dutch version is no longer protected). Other analyses show edit activity over time, since the articles' inception. The German version saw the height of its shaping 2 years after it was started in 2004, whereas the English and Dutch articles saw their main spurts 5 and 3 years later respectively. Moreover, the author finds "that there does not exist one representation of the Holocaust, but each language version has its own unique account of events and phenomena." Finally, they "found that none of the Holocaust entries under study is rated ‘good quality’," so we still have not definitively addressed the hardest parts of our encyclopedia.
 * Lensing Wikipedia aims to extract date, location, event and role semantic data from historical English Wikipedia articles. Of course, making sense of that automatically extracted data requires visualization. Such visualization is difficult for high-dimensional data consisting of, e.g., a date, a location, and multiple events and roles, all at the same time. A short proof of concept, "Visualizing Wikipedia using t-SNE" by Jasneet Singh Sabharwal, has done just this using a Barnes-Hut simulation variation of the t-distributed stochastic neighbor embedding algorithm. The resulting image shows the closeness of the semantic roles of features found in Wikipedia article text, with colors indicating similar events that articles are describing.
 * "Infoboxes and cleanup tags: Artifacts of Wikipedia newsmaking" looks at the use and abuse of cleanup tags and infobox elements as conceptual and symbolic tools. Based on ethnographic observations and several interviews, the author provides a lengthy description of the formative first three or so weeks of the 2011 Egyptian Revolution article. It is a valuable study of how articles are developed, and of the collaboration and conflicts that are common in high-activity articles. The author observes that "Classification work... is intensely political" and "the editing of Wikipedia articles involves continuous linking and classifying." The choice of words, categories, and article titles, but also of specific tags or infoboxes (a particular example discussed - whether to use Template:Infobox uprising or not - concerns a now deleted template), can be quite controversial. The author also puts forth an interesting argument that removal of cleanup tags may give false impressions of stability in articles that are not yet stable, and that infoboxes carry significant, perhaps undue, weight compared to other elements of the article.
 * Wikipedia's identity "based on freedom": This paper looks at Wikipedia through a number of organizational theory lenses, in particular theories of organizational identity. Of particular interest to Wikipedians is one of the aspects analyzed by the authors: the identity of the project. The authors state that "the organizational identity at Wikipedia is based on freedom". Next, they discuss the utopian ideals of freedom (such as "anyone can edit"), as contrasted with the freedom-reducing tendencies of censorship, administrative control, and bureaucratization. The authors argue that the common solution to criticism of Wikipedia, within the community, is concealment and marginalization of said criticism. The authors point to the practical defanging of the Ignore all rules policy, which has gone through a number of meaning shifts, in which it was redefined to be virtually toothless, even though the name remained the same. Another way that freedom is limited is through the end-justifies-the-means utopian vision of "free access [to Wikipedia] for everyone", replacing the older "anyone can edit" "freedom of editing" meaning. Unfortunately, the authors' discussion of "the subjugation of contesting voices" is very short on details and specifics; the authors allude to administrator power abuse, but fail to provide any specific discussion of how it occurs; an example they used of "deleted content" can be interpreted as nothing more sinister than the admin ability to delete content that does not meet Wikipedia's site policies, including uncontroversial content such as spam.
 * "Copyright or Copyleft? Wikipedia as a Turning Point for Authorship": This paper touches upon a very interesting yet understudied area: what Wikipedia's existence means for copyright law. As the authors note, Wikipedia "appears to challenge some of the notions at the heart of copyright law."
 * Critique of Wikipedia's dispute resolution procedures: This paper claims to present an ethnographic analysis and a strong critique of Wikipedia's dispute resolution procedures, and states upfront its goal as "to tease out systemic discrimination or injustice". The strongly worded abstract is attention-drawing, promising that "A number of flaws will be identified including the ability for vocal minorities to dominate the Wikipedia community consensus". Unfortunately, while the paper provides a very detailed description of Wikipedia's dispute resolution scene, it doesn't seem to present any new data; its critique of "vocal minorities", for example, is composed of a few sentences, and the entire argument is based on, and essentially a repetition of, a similar passage in Reagle's Good Faith Collaboration book. While the paper is well written and presents a number of valid arguments, it does not seem to contribute anything new to our understanding of Wikipedia, being in essence a literature review focused on the topic of dispute resolution on Wikipedia. This reviewer finds that disappointing, considering that the almost tabloid-style abstract and the introductory section promise ethnographic research, which - like anything else going beyond synthesis of existing, published research - is sadly very much absent from the paper.

Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
 * "Insights from the Wikipedia Contest (IEEE Contest for Data Mining 2011)" (earlier coverage: "Predicting editor survival: The winners of the Wikipedia Participation Challenge")
 * "A Piece of My Mind: A Sentiment Analysis Approach for Online Dispute Detection" (constructs a dispute corpus from Wikipedia talk pages)
 * "Extracting Imperatives from Wikipedia Article for Deletion Discussions" (without conclusions or published dataset, apparently)
 * "Use of Wikipedia by Legal Scholars: Implications for Information Literacy"
 * "Guiding Students in Collaborative Writing of Wikipedia Articles – How to Get Beyond the Black Box Practice in Information Literacy Instruction" (received the EdMedia Outstanding Paper Award)
 * "Two Is Bigger (and Better) Than One: the Wikipedia Bitaxonomy Project" (project home page, allowing the live creation of a taxonomy graph for an arbitrary Wikipedia article: http://wibitaxonomy.org )
 * "Analysis of the accuracy and readability of herbal supplement information on Wikipedia"
 * "Maturity Assessment of Wikipedia Medical Articles"
 * "Computer-supported collaborative accounts of major depression: Digital rhetoric on Quora and Wikipedia"