Wikipedia:Wikipedia Signpost/2012-08-27/Recent research

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

Wikipedia-based graphs visualize influences between thinkers, writers and musicians
In a blog post titled "Graphing the history of philosophy", Simon Raper of the company MindShare UK describes how he constructed an influence graph of all philosophers using the "Influenced by" and "Influenced" fields of Template:Infobox philosopher (example: Plato). This information was retrieved using DBpedia with a simple SPARQL query. After some cleanup, the resulting list of influence triplets was processed using the open-source graph visualization package Gephi to create an impressive overview of the philosophers within their respective spheres of influence.
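The cleanup and export step can be sketched roughly as follows. This is a minimal illustration and not the code from the blog post: the query string assumes the DBpedia ontology class `dbo:Philosopher` and property `dbo:influenced`, the sample rows are hand-written stand-ins for query results, and the helper names `clean_pairs` and `to_edge_csv` are hypothetical. The output uses the `Source`/`Target` column headers that Gephi's spreadsheet importer recognises for edge lists.

```python
import csv
import io

# Illustrative query only (an assumption, not the query used in the blog post).
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?philosopher ?influenced WHERE {
  ?philosopher a dbo:Philosopher .
  ?philosopher dbo:influenced ?influenced .
}
"""

# Hand-written stand-ins for query results: (influencer, influenced).
# The last three rows imitate the kind of noise that needs cleaning up.
SAMPLE_PAIRS = [
    ("Socrates", "Plato"),
    ("Plato", "Aristotle"),
    (" Plato", "Aristotle "),  # duplicate with stray whitespace
    ("Kant", "Kant"),          # self-loop
    ("", "Hegel"),             # missing name
]


def clean_pairs(pairs):
    """Strip whitespace, then drop empty names, self-loops and duplicates."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target or source == target:
            continue
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


def to_edge_csv(pairs):
    """Serialise directed edges as CSV text with Source/Target headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(pairs)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_edge_csv(clean_pairs(SAMPLE_PAIRS)))
```

Writing the text to a `.csv` file and importing it into Gephi as an edge table would give a directed graph on which a layout algorithm can then be run.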

Brendan Griffen extended the idea to "everyone on Wikipedia. Well, everyone with an infobox containing ‘influences’ and/or ‘influenced by’", arriving at a huge, far more dense "Graph Of Ideas" including not only philosophers, but also novelists, fantasy and science fiction writers, and comedians. In another blog post, Griffen added transitive links as well – so that each person is considered to be influenced both directly and indirectly. The most connected people in the graph were ancient Greek thinkers, with Thales, Pythagoras and Zeno of Elea occupying the top three spots. Griffen remarks that this vindicates a statement in Bertrand Russell's History of Western Philosophy (1945): "Western Philosophy begins With Thales".
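Counting indirect as well as direct influence amounts to asking how many people are reachable from each node of the directed graph. The following is a minimal sketch of that idea using breadth-first search. It is not Griffen's code: the function names are hypothetical and the letters stand in for people.

```python
from collections import defaultdict, deque


def build_adjacency(pairs):
    """Map each influencer to the set of people they directly influenced."""
    adjacency = defaultdict(set)
    for influencer, influenced in pairs:
        adjacency[influencer].add(influenced)
    return adjacency


def reachable(adjacency, start):
    """Return everyone influenced by `start`, directly or through a chain."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            # Skip already-visited people and never count the start node itself,
            # so cycles in the data do not cause loops or self-influence.
            if neighbour not in seen and neighbour != start:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def rank_by_reach(pairs):
    """Rank people by the number of others they influenced, ties broken by name."""
    adjacency = build_adjacency(pairs)
    people = {person for pair in pairs for person in pair}
    counts = {person: len(reachable(adjacency, person)) for person in people}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "D")]
    for person, count in rank_by_reach(toy_pairs):
        print(person, count)
```

Under this measure, people at the start of long chains accumulate the largest counts, which is consistent with the earliest thinkers ending up at the top of the ranking.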

Also inspired by Raper's posting, Tony Hirst posted a number of visualizations of the Wikipedia link and category structure (likewise using DBpedia and Gephi, queried via the Semantic Web Import plugin) to visualize related entries and influence graphs in the English Wikipedia. The blog posts (all of which include detailed step-by-step tutorials) examine the related graph of philosophers, and also visualize an influence graph of programming languages and one of musical genres related to psychedelic music. All these visualizations and blog posts by Hirst are released under a Creative Commons Attribution license.

Hirst also mentioned a related tool called "WikiMaps", the subject of a recent article in the International Journal of Organisational Design and Engineering. As described in a press release, the tool provides a "map of what is “important” on Wikipedia and the connections between different entries. The tool, which is currently in the “alpha” phase of development, displays classic musicians, bands, people born in the 1980s, and selected celebrities, including Lady Gaga, Barack Obama, and Justin Bieber. A slider control, or play button, lets you move through time to see how a particular topic or group has evolved over the last 3 or 4 years." A demo version is available online.

See also the recent coverage of a similar visualization, based on wikilinks instead of infoboxes: "The history of art mapped using Wikipedia"

Information retrieval scientists turn their attention to Wikipedia's page view logs
The Time-aware Information Access workshop at this year's SIGIR (Special Interest Group on Information Retrieval) conference brought a wave of attention to Wikipedia's public page-view logs. Detailing the number of page views per hour for every Wikipedia project, these files figure prominently in a variety of open-source intelligence applications presented at the workshop.

A group of researchers from ISLA, University of Amsterdam created an API providing access to this data and performing simple analysis tasks. Though the site appears to be down at the time of writing, the API supports retrieving a particular article's page-view time series as well as searching for other Wikipedia articles based on the similarity of their time series. In addition to machine-readable JSON results, the API will supply simple plots in PNG format. While the idea of providing page-specific time series is not new, support for finding other pages with similar viewing patterns highlights a fascinating new use for Wikipedia page views.
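To give a sense of what a search by time-series similarity involves, here is a minimal sketch that ranks articles by the Pearson correlation of their page-view series. The similarity measure is an assumption made for illustration, since the measure the API actually uses is not stated here. The function names and the article data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def most_similar(target, series_by_title, top_n=3):
    """Return the top_n other articles whose series correlate best with target's."""
    reference = series_by_title[target]
    scores = [
        (title, pearson(reference, views))
        for title, views in series_by_title.items()
        if title != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up hourly page-view counts for four hypothetical articles.
    views = {
        "Article A": [10, 12, 50, 11, 9],
        "Article B": [100, 120, 500, 110, 90],
        "Article C": [30, 30, 30, 30, 30],
        "Article D": [50, 11, 10, 12, 9],
    }
    print(most_similar("Article A", views))
```

Correlation ignores absolute traffic levels, so a small article can match a much larger one if both spike at the same time. That is the behaviour one would want when looking for pages driven by the same event.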

Two other papers combine Wikipedia page-view information with external time-series data sets. On the intuition that Wikipedia page views should have a strong correlation with real-world events, researchers from the University of Glasgow and Microsoft built a system to detect which hashtags frequently queried on Bing Social Search were event-related. For example, the hashtag #thingsthatannoyme doesn't clearly correspond to an event, whereas a hashtag like #euro2012 is about the UEFA European Football Championship. After tokenizing the hashtags into a list of words, the researchers queried Wikipedia for those terms and correlated the time series of hashtag search popularity with the page-view time series for the articles that were returned. This correlation score can be used to indicate which hashtags are likely to be about events, a useful feature for web searches and any other temporally aware zeitgeist application.

In a similar vein, researchers from the University of Edinburgh and University of Glasgow used the Wikipedia page-view stream to tackle the problem known as first-story detection (FSD), which aims to automatically pick out the first publication relating to a new topic of interest. While traditional techniques primarily focus on newswire or Twitter, the authors used a combination of Twitter and Wikipedia page views to construct an improved FSD system. To improve on state-of-the-art Twitter-only FSD systems, the authors aimed to filter out false positives by checking that the Twitter-based first stories corresponded to a Wikipedia page that was also experiencing heightened traffic during the same period.

Using a simple outlier detection method, the authors created a set of Wikipedia pages with unexpectedly high page views for each hour. Each Twitter-based first story (tweet) was then matched against the corresponding collection of Wikipedia outliers, employing an undisclosed metric of textual similarity that uses only the Wikipedia page titles. If the tweet failed to match any spiking Wikipedia page, it was down-weighted as a first-story candidate. The authors showed that this combined approach improves FSD precision in comparison to a Twitter-only baseline for all but the most popular Twitter-based stories. Though this research makes advances on the difficult task of first-story detection, perhaps the most immediately useful finding is that Wikipedia page views appear to lag behind Twitter activity by roughly two hours. In general, we can expect to see an increasing number of joint models over various open-source intelligence streams as we learn exactly what each stream is useful for and the relationships between the streams.
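The filtering pipeline described above can be sketched as follows. Every concrete choice here is an assumption made for illustration and not a detail from the paper: the z-score test stands in for the "simple outlier detection method", token overlap with the page title stands in for the undisclosed similarity metric, and the threshold and penalty values are arbitrary.

```python
import re
from statistics import mean, pstdev


def is_spike(history, current, threshold=3.0):
    """Flag an hourly count lying more than `threshold` std devs above the mean."""
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current > mu
    return (current - mu) / sigma > threshold


def tokens(text):
    """Lower-case alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def title_match(tweet, title):
    """Fraction of the page title's tokens that also occur in the tweet."""
    title_tokens = tokens(title)
    if not title_tokens:
        return 0.0
    return len(title_tokens & tokens(tweet)) / len(title_tokens)


def rescore(tweet, fsd_score, spiking_titles, min_match=0.5, penalty=0.5):
    """Keep the score if the tweet matches a spiking page, else down-weight it."""
    best = max((title_match(tweet, title) for title in spiking_titles), default=0.0)
    return fsd_score if best >= min_match else fsd_score * penalty


if __name__ == "__main__":
    hourly_views = {"Example City": ([100, 110, 90, 105, 95], 400)}
    spiking = [t for t, (past, now) in hourly_views.items() if is_spike(past, now)]
    print(rescore("Breaking: flooding in Example City tonight", 0.8, spiking))
    print(rescore("my lunch was great", 0.8, spiking))
```

Because only page titles are compared, a tweet about a genuine event can still be penalised when the relevant article has an unrelated or very generic title, so a filter of this kind trades some recall for precision.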

See also the Signpost coverage of a small study of the highest hourly page views on the English Wikipedia during January-July 2010, and their likely causes: "Page view spikes"

The limits of amateur NPOV history
In "The inclusivity of Wikipedia and the drawing of expert boundaries: An examination of talk pages and reference lists", information studies professor Brendan Luyt of Nanyang Technological University looks at History of the Philippines, a B-class article that had featured article status from October 2006 until it was delisted at the conclusion of its featured article review in January 2011.

Luyt argues that talk-page discussions, the types of sources cited, and the organization of the article itself, all point to a very traditional view of what constitutes history: in short, great man history concerned mainly with political and military events, and the actions of elites. This style of history does not capture the breadth of approaches used by professional historians, so does not live up to the ideal of NPOV in which all significant viewpoints published in reliable sources are represented fairly and proportionately. In practice, Luyt shows, editors (lacking sufficient knowledge of the relevant professional historical literature) end up using arguments over bias and NPOV to construct a limited and conservative historical narrative—for this article at the least, although a similar pattern could be found for many broad historical topics.

The sources cited are primarily what Luyt calls "textbookese" summaries, easily available online, which focus on bare facts without the historical debates that surround them. Between the valid sources and experts recognized by Wikipedia editors and the good-faith use of the NPOV principle to limit other viewpoints, Luyt concludes that—rather than being more inclusive of diverse views and sources than the typical "expert" community—Wikipedia in practice recognizes a considerably narrower set of viewpoints.

Three new papers about Wikipedia class assignments
An article titled "Assigning Students to edit Wikipedia: Four Case Studies" presents the experiences of four professors who participated in the Wikipedia Education Program, in a total of six courses (two of the four instructors taught two classes each). The lessons from the assignments included: 1) the importance of strict deadlines, even for graduate classes; 2) having a dedicated class for acquiring skills in editing and for understanding Wikipedia policies, or spreading this over segments of several classes; 3) the benefits of having students interact with the campus ambassadors and the wider Wikipedia community.

Overall, the instructors saw that compared with their engagement in traditional assignments, students were more highly motivated, produced work of higher quality, and learned more skills (primarily, related to using Wikipedia, such as being able to better judge its reliability). Wikipedia itself benefited from several dozen created or improved articles, a number of which were featured as DYKs. The paper presents a useful addition to the emerging literature on teaching with Wikipedia, as one of the first serious and detailed discussions of specific cases of this new educational approach.

"Integrating Wikipedia Projects into IT Courses: Does Wikipedia Improve Learning Outcomes?" is another paper that discusses the experiences of instructors and students involved in the recent Wikipedia:Global Education Program. Like most existing research in this area, the paper is roughly positive in its description of this new educational approach, stressing the importance of deadlines, small introductory assignments familiarizing students with Wikipedia early in the course, and the importance of close interactions with the community. A poorly justified (or explained) deletion or removal of content can be quite a stressful experience to students (and the newbie editors are unlikely to realize that an explanation may be left in an edit summary or page-deletion log). A valuable suggestion in the paper was that instructors (professors) make edits themselves, so they would be able to discuss editing Wikipedia with students with first-hand experience instead of directing students to ambassadors and how-to manuals; and to dedicate some class time to discussing Wikipedia, the assignment, and collective editing.

A four-page letter in the Journal of Biological Rhythms by a team of 48 authors reported on a similar undergraduate class project in early 2011, where 46 students edited 15 Wikipedia articles in the field of chronobiology, aiming at good article status. After their first edits, they were systematically given feedback by one "Wikipedia editor and 6 experts in chronobiology" before continuing their edits (in the paper's acknowledgements the authors also thank "innumerable Wikipedia editors who critiqued student edits"). Because of the high visibility of the results – most of the articles were ranked top in Google results – students found the experience rewarding. Topics were selected collaboratively by the class, and because students came up with a relatively small number of suggestions, one concern was that the project might, if repeated, run out of article topics in the given subject area.

A literature review presented at July's Worldcomp'12 conference in Las Vegas about "Wikipedia: How Instructors Can Use This Technology As A Tool In The Classroom" also recommended having students actively edit Wikipedia (as well as practice reading it critically), and concluded that "it is time to embrace Wikipedia as an important information provider and one of the innovative learning tools in the educators' toolbox."

Substantive and non-substantive contributors show different motivation and expertise
"Investigating the determinants of contribution value in Wikipedia" reports the results of a survey of Wikipedians who were asked their opinion about the "contribution value" of their edits (measured by agreement to statements such as "your contribution to Wikipedia is useful to others"), which was then related to various characteristics.

The researchers used Google to obtain a list of 1976 Wikipedia users’ email addresses (using keywords such as “gmail.com” or “hotmail.com”). They sent invitation emails that provided the URL to the online questionnaire. In six weeks, 234 editors completed all the questions. Of these, 205 – nine females and 196 males – supplied a valid user name and were considered in the rest of the analysis (anonymous editors were removed).

A content analysis was performed of 50 randomly selected edits by each respondent (or all, if the user had fewer than 50 edits), classifying them as "substantive" changes (e.g. "add links, images, or delete inaccurate content") and "non-substantive changes" (e.g. "reorganizing existing content [or] correcting grammatical mistakes and formatting texts to improve the presentation"), corresponding to "two [proposed] new contributor types in Wikipedia to discriminate their editing patterns."

An attempt was made to relate this to the "contribution value" the respondents assigned to their own edits, and to their responses in two other areas:
 * "interests" (measured by respondent ratings of a variety of different motivations to contribute to Wikipedia on how well each applied to themselves, e.g. "Enhancing your learning abilities, skills and expertise"); and
 * "resources" (meaning expertise based on education, profession and hobbies, measured by respondent ratings of their expertise in a variety of fields within each, e.g. "Hospitality and tourism").
The "breadth" of interests and resources was defined as the number of ratings above a certain threshold in each, and the "depth" as the highest rating assigned in each.
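The breadth and depth measures defined above are simple to compute from a respondent's ratings. The sketch below assumes a numeric rating scale and a threshold of 4, neither of which is specified here. The field names other than "Hospitality and tourism" are placeholders.

```python
def breadth(ratings, threshold):
    """Number of items the respondent rated strictly above the threshold."""
    return sum(1 for rating in ratings.values() if rating > threshold)


def depth(ratings):
    """Highest rating the respondent assigned to any item."""
    return max(ratings.values())


if __name__ == "__main__":
    # Hypothetical expertise ratings for one respondent (scale assumed, not sourced).
    resources = {
        "Hospitality and tourism": 2,
        "Field B": 6,
        "Field C": 5,
    }
    print(breadth(resources, threshold=4), depth(resources))
```

A respondent with moderately high ratings in many fields scores high on breadth but not necessarily on depth, while a specialist with one very high rating shows the opposite pattern. This is the distinction the authors draw on in their recommendations.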

In an "important consideration for practitioners", the authors wrote that:
 * "[T]o produce valuable contributions, users with high depth of interests and resources should be encouraged to concentrate their efforts on substantive changes. Meanwhile, for users with high breadth of interests and resources, wiki practitioners should advise them to pay more attention to nonsubstantive changes. The findings imply that practitioners can try to identify two distinct types of users. To achieve this objective, they may develop certain algorithms in wikis to automatically detect the frequencies of substantive/non-substantive changes of users. ... For example, notification messages about wiki articles that need substantive changes can be sent to users who have high levels of depth of interests and resources. Similarly, well-prepared messages about articles that need non-substantive changes can be delivered to users who have high levels of breadth of interests and resources."

Is there systemic bias in Wikipedia's coverage of the Tiananmen protests?
"Wikipedia: Remembering in the digital age" is a master's dissertation by Simin Michelle Chen, examining collective memories as represented on the English Wikipedia; she looked at how significant events are portrayed (remembered) on the project, focusing on the Tiananmen Square Protests of 1989. She compared how this event was framed in articles by The New York Times and Xinhua News Agency, and in Wikipedia, where she focused on a content analysis of Talk:Tiananmen Square protests of 1989 and its archives.

Chen found that the way Wikipedia frames the event is much closer to that of The New York Times than the sources preferred by the Chinese government, which, she notes, were "not given an equal voice" (p. 152). This English Wikipedia article, she says, is of major importance to China, but is not easily influenced by Chinese people, due to language barriers, and discrimination against Chinese sources that are perceived by the English Wikipedia as unreliable – that is, more subject to censorship and other forms of government manipulation than Western sources. She notes that this leads to on-wiki conflicts between contributors with different points of view (she refers to them as "memories" throughout her work), and usually the contributors who support the Chinese government POV are "silenced" (p. 152). This leads her to conclude that different memories (POVs) are weighted differently on Wikipedia. While this finding is not revolutionary, her case study up to this point is a valuable contribution to the discussion of Wikipedia biases.

While Chen makes interesting points about the existence of different national biases, which impact editors' very frames of reference, and different treatment of various sources, her subsequent critique of Wikipedia's NPOV policy is likely to raise some eyebrows (pp. 48–50). She argues that NPOV is flawed because "it is based on the assumption that facts are irrefutable" (p. 154), but that those facts are based on different memories and cultural viewpoints, and thus should be treated equally, instead of some (Western) being given preference. Subsequently, she concludes that Wikipedia contributes to "the broader structures of dominance and Western hegemony in the production of knowledge" (p. 161).

While she acknowledges that official Chinese sources may be biased and censored, she does not discuss this in much detail, and instead seems to argue that the biases affecting those sources are comparable to those affecting Western sources. In other words, she is saying that while some claim Chinese sources are biased, others claim that Western sources are biased, and because the English Wikipedia is dominated by Western editors, their bias triumphs – whereas ideally, all sources should be acknowledged, to reduce the bias. The suggestion is that Wikipedia should reject NPOV and accept sources currently deemed unreliable. Her argument about the English Wikipedia having a Western bias is not controversial and has been discussed by the community before (although Chen does not seem to be aware of it, and does not use the term "systemic bias" in her thesis), and reducing this bias (by improving our coverage of non-Western topics) is even a goal of the Wikimedia Foundation. However, while she does not say so directly, it appears to this reviewer that her argument is: "if there are no reliable non-Western sources, we should use the unreliable ones, as this is the only way to reduce the Western bias affecting non-Western topics". Her ending comment that Wikipedia fails to live up to its potential and to deliver a "postmodern approach to truth" brings to mind the community discussions about verifiability not truth (the existence of these debates she briefly acknowledges on p. 48).

Overall, Chen's discussion of biases affecting Wikipedia in general, and of Tiananmen Square Protests in particular, is useful. The thesis however suffers from two major flaws. First, the discussion of Wikipedia's policies such as reliable sources and verifiability (not truth ...) seems too short, considering that their critique forms a major part of her conclusions. Second, the argumentation and accompanying value-judgements that Wikipedia should stop discriminating against certain memories (POVs) are not convincing, lacking a proper explanation of the reasons why the Wikipedia community made those decisions favoring verifiability and reliable sources over inclusion of all viewpoints. Chen argues that Wikipedia sacrifices freedom and discriminates against some memories (contributors), which she seems to see as more of a problem than if Wikipedia were to accept unreliable sources and unverifiable claims.

"Low-hanging fruit hypothesis" explains Wikipedia's slowed growth?
A student paper titled "Wikipedia: nowhere to grow" from a Stanford class about "Mining Massive Data Sets" argues for the "low-hanging fruit hypothesis" as one factor explaining the well-known observation that "since 2007, the growth of English Wikipedia has slowed, with fewer new editors joining, and fewer new articles created". The hypothesis is described as follows: "the larger [Wikipedia] becomes, and the more knowledge it contains, the more difficult it becomes for editors to make novel, lasting contributions. That is, all of the easy articles have already been created, leaving only more difficult topics to write about". The authors break this hypothesis into three smaller ones that are easier to test – that (1) there has been a slowing in edits across many languages with diverse characteristics; (2) older articles are more popular to edit; and (3) older articles are more popular to read. They find support for all three of the smaller hypotheses, which they argue supports their main low-hanging fruit hypothesis.

While the overall study seems well-designed, the extrapolation from the three subhypotheses to the parent hypothesis seems problematic. The authors do not provide a proper operationalization of terms such as "novel", "lasting", and "easy/difficult", making it difficult to enter into a discourse without risking miscommunication. There may be at least four main issues in the work:
 * (a minor but annoying issue): hypothesis II is incorrectly and confusingly worded in the section dedicated to it: "Older articles (those created earlier) will be more popular to read than more newly created articles"; however, their study of hypothesis II is based on the number of edits to the article, not the number of page views (those are analyzed in the subsequent hypothesis III);
 * regarding the claim "all of the easy articles have already been created, leaving only more difficult topics to write about", it is true that the majority of vital/core articles are developed beyond stub, and their subsequent expansion is more difficult (it takes more and more effort to move the article up through assessment classes). However, while the older articles are more popular, they are not necessarily easier to edit, as The Core Contest illustrates. While almost everyone may be able to quickly define (stub) Albert Einstein, it is questionable whether developing this article is easier than developing an article on a less well-known subject, where fewer sources mean the editors need to do less research. Moreover, while almost everyone knows who Einstein was, everyone also has knowledge of at least some less popular subjects. As Missing articles illustrates, there are still many articles in need of creation, and for a fan/expert, it may be easier to create an article on an esoteric subject than to edit the article on Einstein.
 * The claim that "[it is more difficult] for editors to make novel, lasting contributions" is difficult to analyze due to the lack of operationalization of those terms by the authors, but 1) regarding novel, if it means new, see the Missing articles argument above – there is still plenty to write about; and 2) regarding lasting – the authors do not cite any sources suggesting that deletionism on the English Wikipedia may be on the rise.

Overall, the paper presents four hypotheses, three of which seem to be well supported by data, and contribute to our understanding of Wikipedia, but their main claim seems rather controversial and poorly supported by their data and argumentation.

See also the coverage of a related paper in a precursor of this research report last year: "IEEE magazine summarizes research on sustainability and low-hanging fruit"

Briefly

 * Barnstars at ASA annual conference: Two Wikipedia papers were presented at the 2012 annual meeting of the American Sociological Association last week, both focusing on "barnstar" awards on Wikipedia. Michael Restivo and Arnout van de Rijt presented their research on the effect of barnstars, titled "Experimental Study of Informal Rewards in Peer Production", which had found that assigning "editing awards or 'barnstars' to a subset of the 1% most productive Wikipedia contributors ... increases productivity by 60% and makes contributors six times more likely to receive additional barnstars from other community members", as stated in the abstract. See the review in the April issue of this report: "Recognition may sustain user participation". Benjamin Mako Hill, Aaron Shaw, and Yochai Benkler presented "Status, Social Signaling, and Collective Action: A Field Study of Awards on Wikipedia", with a more skeptical look at the effect of barnstars. According to the abstract, "Willer has argued for a sociological mechanism for the provision of public goods through selective incentives. Willer posits a "virtuous circle" in which contributors are rewarded with status by other group members and in response are motivated to contribute more. [... But] there is reason to suspect that not all individuals will be equally susceptible to status-based awards or incentives. At the very least, Willer's theory fails to take into account individual differences in the desire to signal contributions to a public good. We test whether this omission is justified and whether individuals who do not signal status in the context of collective action behave differently from those who do in the presence of a reputation-based award. [Analyzing barnstars on Wikipedia,] we show that the social signalers see a boost in their editing behavior where non-signalers do not."
 * How high school, college and PhD students evaluate Wikipedia quality: "Trust in online information: A comparison among high school students, college students and PhD students with regard to trust in Wikipedia" is a master's thesis that looks at how these three groups judge the trustworthiness of Wikipedia articles, based on the "3S-model" by the advisors of the thesis (Lucassen and Schraagen (2011), Factual Accuracy and Trust in Information: The Role of Expertise. Journal of the American Society for Information Science & Technology, 62, 1232–1242). Unsurprisingly, the more educated the group is, the more detailed their analysis will be. High school students usually focus on accuracy, completeness, images, length, and writing style. College and PhD students go beyond those five elements, also looking at authority, objectivity, and structure. Interestingly, the differences between college and PhD students were much smaller than those between high school students and the other two groups. Another important finding of the study was that the less educated the group, the less likely they are to be aware of Wikipedia being open source and open to editing by anyone. Further, high school students seem to have much more difficulty in distinguishing between a high and low quality article, and overall, seem much more likely to simply not question the trustworthiness of the sources given.
 * Doctors widely use Wikipedia as a reference: A literature review of 50 articles about the use of social media by clinicians found that "Wikipedia is widely used as a reference tool" among them, despite concerns about its accuracy. The authors remark that "we found multiple projects that sought to emulate Wikipedia's success in crowd-sourcing useful medical content, while additionally emphasizing editorial credibility by verifying credentials of contributors. These include RadiologyWiki, announced in 2007 and currently dormant, and Medpedia, which launched in 2009 with substantial institutional backing. We did not find articles reporting success metrics for these projects or similar ones."
 * Predicting quality flaws in Wikipedia articles: A notebook paper to be presented at the annual PAN workshop at the Conference and Labs of the Evaluation Forum meeting (CLEF '12) introduces FlawFinder, a toolset to predict quality flaws in Wikipedia articles. The paper is one of the winning entries in a Competition on Quality Flaw Prediction in Wikipedia. The paper defines 11 types of quality flaws, spanning low-level issues (such as orphaned or unreferenced articles) and high-level quality flaws (such as notability or original research). It uses a corpus of articles tagged with cleanup templates (154,116 articles from a January 2012 dump of the English Wikipedia) as a training set to predict whether articles in a separate, uncategorized set suffer from the same flaws. The model uses a variety of features of the training set based on revision data, lexical properties, structural properties of the article and the reference section, and network properties of the link graph. The results suggest, among other things, that the strongest non-lexical features for the advert flaw are links pointing to external resources, while the number of discussions on the article's talk page is the strongest feature to predict original research.
 * Quality of text and quality of editors: A poster presented at the 2012 ACM Conference on Hypertext and Social Media (HT 2012) describes a method to measure the quality of Wikipedia articles by combining text survival metrics and the quality of editors editing these articles, where editor quality is calculated recursively as a function of the quality of their contributions. The method claims to be "resistant to vandalism"; however, no empirical validation is presented in the poster.
 * WikiSym 2012: WikiSym, the annual conference "dedicated to wiki and open collaboration research and practice" was happening in Linz, Austria as this issue of the research report went to press. Links to online versions of all conference papers have been posted in the program; expect fuller coverage in the September issue.