Wikipedia:Wikipedia Signpost/2013-07-31/Recent research

Multilingual ranking analysis: Napoleon and Michael Jackson as Wikipedia's "global heroes"
An arXiv preprint titled "Highlighting entanglement of cultures via ranking of multilingual Wikipedia articles", authored by a group of physicists from France, examines the Wikipedia articles on individuals and their position in the hyperlink network of the articles in each Wikipedia language edition. Nine language editions are studied. The authors try to locate the most "important" individuals ("heroes") in each language edition by calculating ranking scores: PageRank and CheiRank, plus 2DRank, which combines the two. After compiling lists of the 30 highest-ranked individuals in each language edition, the authors investigate the overlaps between the lists and introduce local and global "heroes". The lists of "global heroes" are topped by Napoleon for PageRank, and Michael Jackson for 2DRank. It is shown that both local and global heroes exist: while global heroes gain their central position in the network from links from multiple other central nodes, local heroes are mostly notable because of the large number of links pointing directly to them. Finally, based on the nationality (language of origin) of the highly ranked individuals, a network of languages is constructed and the position of each language in this network is analyzed by calculating rank scores. The authors also analyzed the activities of those important individuals, and found politicians and scientists to be quite often among the most important ones.
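To make the two basic ranking scores more concrete, here is a minimal sketch in Python. It is not the preprint's code: the toy link graph, the function names and the damping factor of 0.85 are assumptions chosen for illustration. PageRank rewards articles that receive links from other well-ranked articles, while CheiRank is the same computation on the network with every link reversed, so it rewards articles with many outgoing links.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank for a dict mapping node -> list of link targets."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its score evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def cheirank(links, damping=0.85, iterations=100):
    """CheiRank: PageRank computed on the graph with every link reversed."""
    reversed_links = {}
    for source, targets in links.items():
        reversed_links.setdefault(source, [])
        for target in targets:
            reversed_links.setdefault(target, []).append(source)
    return pagerank(reversed_links, damping, iterations)


# Hypothetical toy link graph (article -> articles it links to).
toy_links = {
    "Napoleon": ["France"],
    "France": ["Napoleon", "Paris"],
    "Paris": ["France", "Napoleon"],
    "Waterloo": ["Napoleon", "France", "Paris"],
}

pr = pagerank(toy_links)
cr = cheirank(toy_links)
print("PageRank order:", sorted(pr, key=pr.get, reverse=True))
print("CheiRank order:", sorted(cr, key=cr.get, reverse=True))
```

In this toy graph, the article that nothing links to ends up last in PageRank but first in CheiRank, because it has the most outgoing links.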



Wikipedia as Cultural Reference: Srebrenica Massacre, Art and Menstruation

 * Editor's note: the contributing editor of this section, Han-Teng Liao, participated in the DMI Summer School 2013, though he is not affiliated with the DMI or the University of Amsterdam.

The chapter "Wikipedia as Cultural Reference" in Richard A. Rogers' book "Digital Methods" can be read as an example of "digital methods" applied to Wikipedia, or as a contribution to the emerging literature comparing the same or similar encyclopedia articles across language versions and cultures in the global Wikipedia projects. Not to be confused with "big methods", "virtual methods", etc., the Digital Methods Initiative (DMI) is a school of Internet researchers at the University of Amsterdam, led by Rogers, that aims to 'create a platform to display the tools and methods to perform research that ... take advantage of "web epistemology"'. Currently the DMI has built some basic Wikipedia research tools that help social scientists analyze cross-lingual images, anonymous edits, tables of contents, etc. Thus, as part of Rogers' research agenda of advocating "digital methods", the Wikipedia projects become both a data set and a set of analytical devices that can be repurposed for social research: "as a cultural reference, a vigilant community, a scandal machine and a controversy diagnostic machine". Self-defined as "cultural research with Wikipedia", the chapter compares the Srebrenica articles (The Fall of Srebrenica, the Srebrenica Massacre, and the Srebrenica Genocide) across six language versions: Dutch, English, Bosnian, Croatian, Serbian, and Serbo-Croatian. Drawing on various kinds of data, including creation dates, edits by interlanguage article editors and top ten editors, the numbers of victims, tables of contents, referenced websites and images used, the findings show that the principle of neutral point of view does not automatically make Wikipedia articles universal (or at least similar) across language versions. The differences, especially those specific to the wiki medium, can be used for cultural analysis of the selected topics.
The resulting content is found to reflect the dynamics among power editors defending their sources and content using Wikipedia policies. Among these "umbrella articles", the English version is highly contested among many interlanguage editors, while the Serbo-Croatian version is much softer and more unifying, with very few editors.



Adopting and extending the digital methods, two groups of participants at the DMI summer school 2013 examined the cross-language-version differences on two topics: art and menstruation. The "Cross Lingual Art Spaces on Wikipedia" project (by Sangeet Kumar, Garance Coggins, Sarah Mc Monagle, Stephan Schlögl, Han-Teng Liao, Michael Stevenson, Federica Bardelli, and Anat Ben-David) sought to find the universal and specific articulations of the concept of art through (1) images and (2) concepts (i.e. strongly related articles), producing an image network visualization for 154 language versions and a concept network visualization for eight selected language versions. A Wikidata scraping tool was developed to identify different names for the same content for the process called "concept reference disambiguation". The second project, "Menstruation Across Cultures Online" (by Astrid Bigoni, Loes Bogers, Zuzana Karascakova, Emily Stacey and Sarah Mc Monagle) looked at the cultural differences of Wikipedia images and Google autocomplete suggestions to find associated images and search queries. In addition, the English version of the article on menstruation was compared with other English-language sources such as Urban Dictionary and Twitter, producing an interesting cross-platform comparative tag cloud. While not full research articles, the research outcomes of the two projects nonetheless demonstrated the potential directions for cross-cultural and cross-platform comparison, when Wikipedia projects are compared among themselves or with other online platforms that contain user-generated content and/or activities.

Decline of adminship candidatures on Polish Wikipedia
A conference paper titled "Does the Acquaintance Relation Close up the Administrator Community of Polish Wikipedia?" investigates why the Polish Wikipedia's community of administrators is growing more slowly than expected, as measured by a decrease in successful RfAs. The paper presents a useful literature review of related academic work on RfA, and is a welcome study of the under-researched population of editors at non-English Wikipedias. It seems to focus on the computer science dimension, with a well-developed statistics section but little discussion of theory. In this reviewer's opinion, it would have been stronger if the authors had engaged with more social science theory, such as the iron law of oligarchy.

The authors suggest at first that such a decline may occur because administrators are chosen on the basis of acquaintance, thus creating a closed group which people lacking the right connections cannot join. Later, they conclude that this is unlikely, instead pointing to growing expectations about new candidates. Both of those would be valid hypotheses, but neither is clearly tied to any theory or previous study. The authors' analysis of the data is problematic; at one point they contradict themselves, noting that "[One of the observed phenomena] could indicate, however, that the community is closing up after all", although their conclusion later states: "Our conclusion is that it cannot be claimed with certainty that the Polish Wikipedia community is closing up."

The authors also misunderstand how the WP:RFA process works on English Wikipedia, noting that one of the key differences between Polish and English Wikipedia is voting, as in "in the case of English version of Wikipedia, new administrators are elected not by voting, but by discussion". That the authors are ready to take such policy claims at face value does cast a little doubt on the applicability of their findings.

Overall, the paper presents some interesting statistical data on trends in an understudied community, and contributes to our understanding of the governance of Wikipedia. The analysis of the received data is however rather lacking, particularly through weak ties to literature on leadership, volunteer motivation and related social science areas.

90% of Wikipedia articles have "equivalent or better quality than their Britannica counterparts" in blind expert review
A Portuguese-language dissertation at the University of Évora, titled "Colaboração em Massa ou Amadorismo em Massa?" ("Mass collaboration or mass amateurism?") compared the quality of English Wikipedia with that of Encyclopaedia Britannica. As summarized in English on the author's blog, a representative random sample of 245 article pairs from both encyclopedias was generated, and "reformatted to hide [their] source and then graded by an expert in its subject area using a five-point scale. We asked experts to concentrate only on some [...] intrinsic aspects of the articles' quality, namely accuracy and objectivity, and discard the contextual, representational and accessibility aspects. Whenever possible, the experts invited to participate in the study are University teachers, because they are used to grading students' work not using the reputation of the source." They rated "90% of the Wikipedia articles ... as having equivalent or better quality than their Britannica counterparts".

First WikiSym 2013 papers available
The annual WikiSym research conference is taking place in Hong Kong from August 5 to 7. Since June, the organizers have been featuring the abstracts of the conference's papers on the conference blog, with online publication of full texts planned for August 5. But several authors have already made their papers available elsewhere:
 * Barnstars: "A Preliminary Study on the Effects of Barnstars on Wikipedia Editing" analyzed 21,299 barnstars awarded to 14,074 editors on the English Wikipedia, and found that users tended to be less active in article editing after receiving or presenting barnstars. Although there has been previous research questioning the effectiveness of barnstars, the authors here stop short of concluding that barnstars don't work, but instead hypothesize that the observed effect may be simply because an editor's high activity period "subsequently catches the attention of other editors, who are then more likely to reward them with barnstars."
 * News coverage on Wikipedia and Wikinews: Researcher Brian Keegan, who has published various research papers on how Wikipedia editors cover breaking news events, uses sociologist Thomas Gieryn's concept of boundary-work to explore "how Wikipedia's response to the 9/11 attacks expanded the role of the encyclopedia to include newswork" in the early years of the project, and describes the "failure of Wikinews", which according to the author "illustrates the pitfalls of misappropriating professional newswork norms as well as the challenges of sustaining online communities."
 * Software library for analyzing collaboration networks on Wikipedia: "Analyzing Multi-Dimensional Networks within MediaWikis" presents a software library for analyzing "a variety of relationships about the content, history, and editors of its articles such as hyperlinks between articles, discussions among editors, and editing histories", using NodeXL.
 * "An Actionable Quality Model for Wikipedia": Co-authored by the late John Riedl (see "Briefly" section), this paper contains both an overview of existing efforts to assess article quality on Wikipedia and a proposal for a new "simple model of article quality with actionable features".
 * "Temporal Analysis of OpenStreetMap users activity": Taha Yasseri from the Oxford Internet Institute, who has already published a paper on Circadian and Weekly Patterns of Wikipedia Editorial Activity, together with Giovanni Quattrone and Afra Mashhadi from University College London, has studied the temporal patterns of user activity on OpenStreetMap, the wiki-based collaborative mapping project. By applying Principal Component Analysis, they show how the pattern of editing has been changing over the years, most likely due to the increased use of mobile devices by mappers. The study compares mappers in London and Rome, showing a faster change in London than in Rome.
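As an illustration of how Principal Component Analysis can expose a gradual shift in daily activity patterns, such as the one described in the OpenStreetMap item above, here is a minimal sketch in Python. It is not the authors' code or data: the hourly profiles are synthetic, the shift from evening to daytime editing is an assumption made up for the example, and only the first principal component is extracted (by power iteration).

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained_share) of the first principal component."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    # Power iteration on X^T X, without forming the covariance matrix explicitly.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        proj = [sum(row[j] * v[j] for j in range(m)) for row in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(row[j] * v[j] for j in range(m)) for row in centered]
    total = sum(x * x for row in centered for x in row)
    explained = sum(s * s for s in scores) / total
    return v, scores, explained


# Synthetic hourly activity profiles (assumption): one row per year, 24 columns.
hours = range(24)
evening = [math.exp(-((h - 20) ** 2) / 8) for h in hours]   # peak around 20:00
daytime = [math.exp(-((h - 13) ** 2) / 18) for h in hours]  # peak around 13:00
mix = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # share of the daytime pattern, year by year
profiles = [[(1 - w) * e + w * d for e, d in zip(evening, daytime)] for w in mix]

direction, scores, explained = first_principal_component(profiles)
print("share of variance on first component:", round(explained, 3))
print("yearly scores:", [round(s, 2) for s in scores])
```

Each year's score on the first component places it along the dominant axis of change, so a steady drift in the scores indicates a steady change in when people edit.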

Survey participation bias analysis: More Wikipedia editors are female, married or parents than previously assumed
The fact that Wikipedia's editing community has a huge gender gap (with vastly more male than female editors contributing to the encyclopedia) was first brought to wider attention by a 2008 survey of Wikipedia readers and editors, whose results were published by UNU-MERIT and the Wikimedia Foundation in 2010. It found that only 17.8% of US-based editors were female, and 12.7% globally. As reported in the Signpost at the time, some concerns were voiced about the possible impact of participation bias on the results (an effect which is frequent in volunteer web surveys), for example because the survey had also found a gender gap in Wikipedia readers (39.9% female in the US), in contrast to other research which estimated the gender ratio among readers closer to 50%.

A new PLoS ONE paper titled "The Wikipedia Gender Gap Revisited: Characterizing Survey Response Bias with Propensity Score Estimation" has made it possible for the first time to quantify this participation bias for the subset of US-based editors. Using a method for propensity adjustment for web surveys first published in a 2011 statistical paper, the authors compare the 2008 survey with Pew Research data from around the same time, which is assumed to be free of the same kind of bias because it was based on a different methodology (a phone survey), and which had found 49.0% of US Wikipedia readers to be female. The authors write: "We estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%)." Likewise, they find evidence that the proportion of editors who are "married, or parents, [had] been underestimated, while the proportions of immigrants and students [had] been overestimated."
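Note that the "27.5% higher" and "26.8% higher" figures in the quote are relative increases, not differences in percentage points. The short Python sketch below reproduces the arithmetic from the four percentages given in the quote; the function name is ours, chosen for illustration.

```python
def relative_increase(adjusted, original):
    """Relative increase of `adjusted` over `original`, in percent."""
    return (adjusted - original) / original * 100.0


# Percentages quoted from the paper: (adjusted estimate, original survey figure).
us_adjusted, us_original = 22.7, 17.8
all_adjusted, all_original = 16.1, 12.7

us_relative = relative_increase(us_adjusted, us_original)
all_relative = relative_increase(all_adjusted, all_original)

print(f"US editors:  +{us_adjusted - us_original:.1f} points, {us_relative:.1f}% higher")
print(f"All editors: +{all_adjusted - all_original:.1f} points, {all_relative:.1f}% higher")
```

In absolute terms, the adjustment moves the estimates by 4.9 and 3.4 percentage points respectively.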

The authors emphasize that their results do not negate the existence of the gender gap in general ("the basic takeaways in regards to the underrepresentation of women in the WMF/UNU-MERIT survey remain intact"), and actually call for "the Wikimedia Foundation's strategic goal to increase female editorship to 25% [...] to be raised in light of these adjusted estimates." They observe that their method is not applicable to the three subsequent editor surveys conducted by the Wikimedia Foundation in 2011/12 (the most recent one by this reviewer), because they focused solely on editors, and therefore the necessary reader comparison data (e.g. the data from Pew Research surveys) is not available. Still, the paper's results will definitely have a positive impact on the research efforts by the Foundation and others to better understand the demographics of the Wikipedia editing community.

Briefly

 * "Researching collaboration for a better world: John T. Riedl (1962–2013)": A blog post by Dario Taraborelli in memory of computer scientist John Riedl and his numerous contributions to the understanding of Wikipedia, ranging from the development of SuggestBot to research on vandalism, deletion, quality control, editor retention and the gender gap.
 * "Coordination and Learning in Wikipedia: Revisiting the dynamics of exploitation and exploration": An academic paper published for researchers of the sociology of organizations, under the volume topic of "Managing 'Human Resources' by Exploiting and Exploring People's Potentials", applies the exploration vs. exploitation trade-off learning theory to understand the evolution of Wikipedia. The authors thus identify three periods in the evolution of Wikipedia: (i) the establishment/take-off period from 2001 to 2002, (ii) the growth/consolidation period from 2003 to 2006, and (iii) the maturation/sustainability period from 2007 onwards.
 * Overview of Wikipedia and other online encyclopedias in China: An academic blog post shares research materials for journalists to cover the Wikimania 2013 and Wikisym+Opensym 2013 events to be held in Hong Kong. It provides up-to-date information on Chinese-language user-generated content and online encyclopedias.
 * "Peer production online community infrastructures": An academic conference paper that examines the role of centralized and decentralized governance and platform architectures in determining a social software system's excludability: the degree to which users can control who contributes to or consumes the system's resources. Closed-source software platforms like Facebook and Twitter are the most excludable. Users have no control over the design of the platform and no ownership of the content, and the system owners have the right and the power to arbitrarily censor content or block contributors. Free software platforms like Kune allow both decentralized architecture and decentralized governance: they can be hosted anywhere and users themselves can decide how the platform and its content are used. Peer-to-peer network services, especially darknets, are the least excludable. These services are decentralized and anonymous, so users potentially have more privacy and information security, but these features also facilitate their use in criminal activity. Wikipedia exists somewhere in the middle: the use of the CC BY-SA license for content, and community-created policies for governance, reduce excludability. But the Wikimedia Foundation's ownership of the production servers (along with the technical power invested in administrators) makes Wikipedia's architecture and governance more centralized, introducing a degree of excludability.
 * Education Program case study: A paper titled "Wikipedia as a Tool for Teaching Policy Analysis and Improving Public Policy Content Online" shares project objectives and lessons learned from having a class at the Trachtenberg School of Public Policy and Public Administration at George Washington University participate in a Wikipedia writing assignment as part of the Wikimedia Foundation's 2010/11 Public Policy project.
 * "Digital citizens" in the classroom: Similarly, a conference paper titled "Becoming Digital Citizens: Using Wikipedia to Enhance the Classroom" describes the outcome of one course participating in the Wikipedia Education Program, including a small survey among participating students (10 respondents). Another paper about the Education Program appeared in First Monday recently.
 * Use P2P techniques to support Wikipedia hosting: According to a simulation by two German computer scientists, the Wikimedia Foundation "can reduce the traffic needed for article lookups in case of Wikipedia up to 72%" by having participants in a P2P network storing and serving some articles from their machines, while still also serving them from a central installation (cloud).
 * Dissertation about vandalism: A dissertation titled "Damage detection and mitigation in open collaboration applications" examines the subject of vandalism on Wikipedia. The author is well-known to Wikipedians as the programmer of the widely used "STiki" vandalism-fighting tool, and for conducting a controversial vandalism experiment himself in 2010.
 * Maintenance tag analysis: A thesis titled "Analyzing and Predicting Quality Flaws in User-generated Content: The Case of Wikipedia" examines the use of cleanup tags on the English Wikipedia. Some of the author's previous work was covered earlier in this newsletter (e.g. " more effective than  ").
 * Wikibooks case study: In "The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability", a group of researchers (including Wikimedian Magnus Manske) describe their use of Wikibooks as a platform to write a handbook about Next Generation Sequencing (NGS). Another paper titled "Analysis of Existing Technological Platforms for the Collaborative Production of Open Textbooks" contains a summary of the advantages and drawbacks of Wikibooks compared to similar platforms.
 * Scraping Wikipedia tables: A conference paper describes "Methods for Exploring and Mining Tables on Wikipedia", with an online demo available.
 * Wiktionary and OmegaWiki compared: A paper analyzing the usefulness of Wiktionary and OmegaWiki for translation applications summarizes the differences of the two platforms as follows: "While the openness and flexibility of Wiktionary has attracted many users, leading to a resource of considerable size and richness, the non-standardized structure of entries also leads to difficulties in the integration into translation applications. OmegaWiki, on the other hand, does not suffer from this problem, but the self-imposed limitations to maintain integrity also constrain its expressiveness and, along with that, the range of information which can be represented in the resource." The authors propose a method for using both at the same time, by automatically aligning the two resources at the level of word senses with good precision. This yields a substantial increase of coverage, especially concerning available translations."
 * Quoting Wikipedians in research papers: try to ask them: On the group blog "Ethnography Matters", researcher Heather Ford explored the ethical dilemma of how to quote online statements by members of collaborative communities such as Wikipedia in research papers: anonymously or by name? Ford arrives at the conclusion that "For now ... I'll use my best efforts to contact those whose statements and conversations on Wikipedia I want to quote. More generally, I'm going to continue to talk to Wikipedians about what they think about these issues."
 * "Algorithmic governance" of Wikipedia: A conference paper titled "Work-to-Rule: The Emergence of Algorithmic Governance in Wikipedia" "collected qualitative and quantitative data from Wikipedia in order to show how a community's consensus gradually converts social mechanisms into algorithmic mechanisms".