Wikipedia:Wikipedia Signpost/2011-12-26/Recent research

A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, edited jointly with the Wikimedia Research Committee and republished as the Wikimedia Research Newsletter.

Mental health information on Wikipedia more accurate than Britannica and Kaplan & Sadock psychiatry textbook
In an article for Psychological Medicine, ten researchers from the University of Melbourne conclude that "the quality of information on depression and schizophrenia on Wikipedia is generally as good as, or better than, that provided by centrally controlled websites, Encyclopaedia Britannica and a psychiatry textbook."

The study focused on ten mental health topics (e.g. "antidepressants and suicide in young people" or "side-effects of antipsychotics"), five each in the areas of depression and schizophrenia. "Using the topic terms (or synonyms) as key words for the searches or through manual browsing, content relating to these topics was extracted from [Wikipedia and 13 other websites selected for prominent Google results for depression and schizophrenia] and from the most recent edition of Kaplan & Sadock’s Comprehensive Textbook of Psychiatry ... and the online version of Encyclopaedia Britannica" by two reviewers. For both depression and schizophrenia, three psychologists with clinical and research expertise in that area evaluated these extracts on accuracy, up-to-dateness, breadth of coverage, referencing and readability, on a scale from 1 to 5 ("e.g. Accuracy: 1 = many errors of fact or unsubstantiated opinions, 3=some errors of fact or unsubstantiated opinions, 5 = all information factually accurate"). As in an earlier study of the quality of health information on Wikipedia (Signpost coverage: "Wikipedia's cancer coverage is reliable and thorough, but not very readable"), readability was also measured using a Flesch–Kincaid readability test, which is calculated from word and sentence lengths.

For both depression and schizophrenia, Wikipedia scored highest in the accuracy, up-to-dateness, and references categories – surpassing all other resources, including WebMD, NIMH, the Mayo Clinic and Britannica online. In breadth of coverage, it was behind Kaplan & Sadock and others for both areas. And "of the online resources, Wikipedia was rated the least readable [by the human reviewers], although some of its topics received an average rating." Likewise, the Wikipedia content had relatively high Flesch–Kincaid Grade Level indices (around 16 for schizophrenia and 15 for depression – indicating that a tertiary level of education is necessary to understand the content), similar to those of Britannica but higher than those of most other resources examined.
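To illustrate how a grade level index of this kind is derived from word and sentence lengths, here is a minimal sketch of the standard Flesch–Kincaid Grade Level formula. The syllable counter is a rough vowel-group heuristic chosen for this illustration, and the function names are hypothetical; this is not the tooling used in the study.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Naive sentence split on terminal punctuation; abbreviations will miscount.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Longer sentences and longer words both push the score up, which is why dense clinical prose lands at grade levels in the mid-teens.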

The authors note that their "findings largely parallel those of other recent studies of the quality of health information on Wikipedia" (citing eight such studies published between 2007 and 2010):
 * "Despite variability in the methodologies and conclusions of these studies, the overall implication is that Wikipedia articles on health topics typically contain relatively few factual errors, although they may lack breadth of coverage. ... Given the number of patients, would-be patients and concerned others using the internet to search for information on health issues, it seems that Wikipedia is an appropriate recommendation as an information source."

Psychologists gauge impact of Wikipedia's Rorschach test coverage
A paper in the Journal of Personality Assessment tried to assess the impact of the Wikipedia article Rorschach test on psychologists' use of that test. As summarized by the authors, "In the summer of 2009, an emergency room physician [User:Jmh649 – James Heilman, MD] posted images of all 10 Rorschach inkblots on ... Wikipedia. The images were accompanied by descriptions of “common responses” to each blot. ... a fierce debate ensued between some psychologists who claimed that posting the inkblots is a threat to test security and other individuals, including some psychologists and other mental health professionals, who argued that all information should be freely available, including full details of the Rorschach". (In fact, the debates on whether to display versions of the inkblots in the article go back to at least 2005, at first accompanied by rather spurious copyright claims – Rorschach died in 1922.) The authors note that the inkblots had already been revealed to the general public in a 1980s book and cite an earlier study that had found "particularly damaging information" about personality assessment tests on the Internet as early as 2000, "including examples of test stimuli from... the Rorschach" (presumably including this site). Still, "Internet coverage of the Rorschach appeared to grow exponentially during" the 2009 debate about the Wikipedia article, which made it to the front page of the New York Times (Signpost coverage: "Rorschach test dispute reported").

The first part of the study examined the top 50 Google search results for "Rorschach" (excluding "watchmen" to filter out results about a comic book and film) and "inkblot test", coding them into four levels representing the "threat each site presents to test security and the extent to which the content of the site might aid an individual in dissimulating on the Rorschach". 44% of the sites were classified as Level 0 ("no threat"), e.g. the home pages of bands with "Rorschach" in their name, and 15% as Level 1 ("minimal threat"). The 22% Level 2 ("indirect threat") sites, which "tended to discuss test procedures more explicitly", apparently included "several 'official' Rorschach Web sites, where one is able to register for Continuing Education Rorschach workshops, [and which] also allow visitors to purchase materials that contain sensitive test information. For example, certain training Web sites allow individuals to purchase training texts and instructional media without requiring a license or other professional credentials". The authors find it "disturbing" that many sites in this threat category "were authored by psychologists". 19% of the sites were classified as the highest level, "direct threat", e.g. many that contained depictions of one or more Rorschach inkblots, or specific information about how responses are interpreted. Together with results about the high percentage of Internet users consulting Wikipedia for health information (36% in the US in 2007 according to Pew research), the authors conclude that "we can no longer presume that examinees have not been exposed to this information prior to an assessment".

The second part of the study likewise starts out with a Google News search for "Rorschach" and "Wikipedia", noting that "of the 25 news stories reviewed, 13 included one or more of the Rorschach inkblots, with Card I as the most frequently displayed", and eventually arriving at five media stories about the controversy which allowed readers' comments. The 520 comments on these stories were "coded according to the opinion expressed by the writer regarding each of the following categories: (a) the field of psychology, (b) psychologists, and (c) the Rorschach." While the vast majority did not state a clear opinion on the first two categories, the authors note that "Of those comments that did express an opinion toward psychologists [ca. 16%] most were overwhelmingly negative." Many more of the commenters on the Wikipedia/Rorschach news stories expressed an opinion about the test itself: "In total, 182 (35%) of comments were classified as unfavorable toward the Rorschach, whereas only 55 (11%) were coded as favorable toward the Rorschach. The remaining 283 (54%) of comments were categorized as neutral or not mentioned." Among those who identified as mental health professionals, 61% expressed a favorable opinion about the test and 15% a negative one.

Asked for his comment on the paper, Heilman said: "My main criticism of their paper is that they seem to take as axiomatic that exposure to these images hurts test reliability without any real evidence to back it up. Otherwise it is an interesting piece." (The paper includes a section reviewing literature on "the impact of 'coaching' on psychological tests"; however, it does not mention results pertaining specifically to the Rorschach test, and mostly concerns subjects who deliberately try to "cheat" on such tests, rather than those who have accidentally been exposed to a test's material before.)

Spell-checking the English Wikipedia
University of Nebraska-Lincoln MBA candidate Jon Stacey reports on the results of a proof-of-concept tool to measure the rate of misspelled words in the English Wikipedia over time. A text parser (code available for download) was applied to a random sample of 2,400 articles. Instead of considering the latest revision, a random revision from the history of each article was used. The final corpus was obtained by stripping markup and non-ASCII characters as well as article sections such as the references and table of contents. Words were matched against a dictionary obtained by manually combining 12dicts and SCOWL (source) with Wiktionary.
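The general shape of such a measurement can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the author's actual parser: the markup-stripping regular expressions are crude approximations, the function name is hypothetical, and the dictionary is just whatever set of lowercase words the caller supplies.

```python
import re


def misspelling_rate(text: str, dictionary: set[str]) -> float:
    """Share of words in `text` that are absent from `dictionary` (0.0 to 1.0)."""
    # Drop non-ASCII characters, mirroring the ASCII-only approach described above.
    cleaned = text.encode("ascii", errors="ignore").decode("ascii")
    # Crude markup stripping (assumed rules): templates, HTML tags, wikilinks.
    cleaned = re.sub(r"\{\{.*?\}\}", " ", cleaned, flags=re.DOTALL)
    cleaned = re.sub(r"<[^>]+>", " ", cleaned)
    # Keep only the visible label of [[target|label]] or [[target]].
    cleaned = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", cleaned)
    words = re.findall(r"[a-z]+", cleaned.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in dictionary)
    return unknown / len(words)
```

Note that dropping non-ASCII characters turns a word like "café" into "caf", which is then counted as a misspelling. This is one way an ASCII-only pipeline can produce false positives.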

The results show that the percentage of misspellings has been growing steadily, reaching 6.23% for revisions created in 2011. Several weaknesses with the method are discussed, including the lack of Unicode support, the high rate of false positives, and the possibility that the rising rate might be associated with a rise in the complexity of content. The concluding remarks speculate on how semi-automated spell-checking may support editorial work at a large scale. (Wikipedians have used lists of common misspellings for many years, also integrated in semi-automatic editing tools such as AutoWikiBrowser.)

In related news, the developers of an open-source multilingual proofreading application called LanguageTool released a beta application for proofreading Wikipedia articles. wikiCheck proofreads articles from the English and German Wikipedias based on a set of customizable syntax and grammar rules. A bookmarklet is available to access the application from a browser.

Wikipedians are "smart but fun", and have expertise in topics they edit
Three researchers from Stanford University and Yahoo! Research used a novel method to construct "a data-driven portrait of Wikipedia editors", as described in a preprint currently undergoing review for publication. While earlier studies relied on Wikipedians participating in surveys (and identifying themselves as such), the authors mined data from users of the Yahoo! Toolbar for Wikipedia URLs containing an  parameter, thereby arriving at a sample of 1900 editors of the English Wikipedia.

Their first main finding is that "on broad average, Wikipedia editors seem, on the one hand, more sophisticated than usual Web users, reading more news, doing more Web searches, and looking up more things in dictionaries and other reference works; on the other hand, they are also deeply immersed in pop culture, spending much online time on music- and movie-related websites." However, these "entertainment lovers ... form only a highly specialized subgroup that contributes many edits".

Based on the toolbar data, the paper also tries to answer the question "Do Wikipedia editors know their domain?" and related questions, positively: "across all topical domains Wikipedia editors show significant expertise. ... We also show that more substantial edits tend to come from experts", and that logged-in editors show more expertise than IP address editors. A final result is that "About half of the click chains culminating in an edit start with a Web search, with the other half originating on Wikipedia’s main page."

Wikipedia as a database for structured biological data
A special issue of Nucleic Acids Research features 11 articles describing how wikis and collaborative technology can be used to enhance biological databases. A commentary by Robert Finn, Paul Gardner and Alex Bateman discusses in particular how to leverage Wikipedia, its collaborative infrastructure and large editor community to better integrate articles and biological data entries: the authors argue that the project offers an opportunity for crowdsourcing the curation and annotation of biological data, but faces major challenges for expert engagement, i.e. "how to get scientists en masse to edit articles" and "how to allow editors to receive credit for their work on an article".

Another article in the same issue presents the Gene Wiki, an open-access and openly editable collection of Wikipedia articles about human genes. The article describes how structured data available on Gene Wiki articles is kept in sync with the data from primary databases via an automated system and how to automatically compute the quality of articles in the project at word or sentence-level using WikiTrust.

Individual and social drivers of participation in Wikipedia
A thesis entitled Individual and social motivations to contribute to Commons-based peer production was submitted by University of Minnesota student Yoshikazu Suzuki for an MA in mass communication. The thesis presents and discusses the results from a small series of interviews as well as a survey exploring individual and social motivations of Wikipedia contributors, drawing on social identity theory, volunteerism and uses and gratifications theory. The survey, run in July 2011 with support from the Wikimedia Research Committee, collected 208 responses from a random sample of 950 among the top English Wikipedia editors. The results, obtained by applying principal components analysis to the responses, reveal eight distinct motivational factors: providing information, the seeking of creative stimulation, concern for others’ well-being, the need to be entertained, the avoidance of negative self-affect, cognitive group membership, career benefits, and social desirability. An analysis of the relative strength of each factor indicates that providing information, the seeking of creative stimulation, and concerns for others’ well-being were the three strongest motivational dimensions. Grouping the eight factors into two macro-categories according to self- and other-focused motivations, the other-focused motivations were found to be significantly stronger than the self-focused motivations. The thesis reviews the implications of these results for the design of incentives for participation and editor retention. The full text of the thesis and an executive summary are available under open access.

Mining article revision histories for insights into open collaboration
A paper in this month's edition of First Monday, ambitiously titled "Understanding collaboration in Wikipedia", reports on a statistical analysis of a complete dump of the English Wikipedia (225 million article edits) with regard to several quantities, starting with two that were introduced in a 2004 paper by Andrew Lih "as a simple measure for the reputation of [an] article within the Wikipedia": the total number of edits an article has received ("rigor") and the number of (logged-in and anonymous) users who have edited the article ("diversity"). The First Monday paper cites a 2007 study from the same journal, which found that featured articles tend to have more edits and contributors (while controlling for a few other variables) as a justification for using "rigor" and "diversity" as proxies for article quality, but includes other quantities such as the article size change for an edit. The paper cites earlier work on evaluating Wikipedia article quality (e.g. dismissing the well-known 2005 Nature study based on the mistaken assumption that it had "only focused on featured articles"), but does not discuss existing attempts at more sophisticated quantitative quality heuristics.

The First Monday paper highlighted that if consecutive edits by the same user are counted as one, the overall number of article revisions drops by more than 33%, "revealing that one in three revisions in Wikipedia consist of users responding to their own edits or continuing an ongoing edit begun by themselves". "Article diversity" ranged up to 12,437 contributors per article, with a median of 12 and an average of 32. One of the main conclusions is that "rather than reflecting the contributions and expertise of a large group of people, the typical article in Wikipedia reflects the efforts of a relatively small group of users (median of 12) who make a relatively small number of edits (median of 21)."
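As a minimal sketch of how these per-article quantities could be computed from a revision history, assume the history is given as a chronological list of editor names, one per revision. The function name and data layout are hypothetical and not taken from the paper.

```python
from itertools import groupby


def article_stats(editors: list[str]) -> dict[str, int]:
    """Compute simple revision-history statistics for one article.

    `editors` lists the editor of each revision in chronological order.
    """
    rigor = len(editors)            # total number of edits
    diversity = len(set(editors))   # number of distinct editors
    # groupby merges runs of adjacent equal items, so each run of consecutive
    # edits by the same user is counted as a single revision.
    collapsed = sum(1 for _ in groupby(editors))
    return {"rigor": rigor, "diversity": diversity, "collapsed_revisions": collapsed}
```

For a history such as `["A", "A", "B", "A", "C", "C", "C"]`, the seven edits come from three distinct editors and collapse to four revisions, since only adjacent edits by the same user are merged.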

Supporting the assumption that most edits do not result in significant changes in content, the study finds that 31% of all revisions cause a size change of fewer than 10 characters, and 51% a change of fewer than 30 characters, with an apparently significant peak at a four-character difference, presumably related to the insertion or removal of the four brackets ("[[ ... ]]") that generate a wikilink.

The author notes the slight decrease in the overall number of edits since 2008, but tentatively explains it by the increasingly complete coverage of encyclopedic topics, and doesn't share the widespread concerns about declining or stagnating editor activity: "participation in Wikipedia seems to remain as healthy as ever as revisions made per article created each year has annually increased since 2001 without exception".

A different paper from last year's "Collaborative Innovation Networks Conference" similarly promises far-reaching insights from "Deconstructing Wikipedia" solely based on revision history statistics without analyzing the actual content changes, using a much smaller sample – 30 featured articles from the English Wikipedia, but also including timestamps. The data did not confirm the hypothesis that "the editor who initiated an article would have a high level of involvement in the article’s creation": for only five of the 30 articles, the initial author was the most frequent contributor.

A second conclusion is that for all of the articles in the sample, "there is a single Wikipedian whose contributions far exceed all others", ranging from 8% to 82% of the articles with an average of 39% (but the analysis does not seem to have sought to quantify the extent to which this exceeds the contributions of the second most frequent contributor). The author indicates that this supports Jaron Lanier's "oracle illusion" criticism of a supposed presentation of Wikipedia as a product of "the crowd". Somewhat tautologically, the author observes "that the control of an individual editor seemed to be reduced as more editors joined the process", and points to the need to analyze "a significantly larger number of articles" to answer the question of whether "too many cooks spoil the stew" (apparently unaware of the significant body of earlier literature on this subject, starting with a 2005 paper that presented an answer in its title: "Too many cooks don't spoil the broth", and including the 2007 study on which the First Monday paper reviewed above relied).

A third result of the paper, which likewise might not surprise those already familiar with Wikipedia's editing processes, is that "the creation process is continuous and can go on for a very long time", with even articles about historic events from the distant past continuing to receive edits.

The author, an assistant professor in management and marketing at Virginia State University, concludes the paper by urging his readers to start "thinking about how the wiki platform, itself, is influencing the creation process".

Briefly

 * The Wikimedia Research Committee launched a public consultation on the future data/research infrastructure for Wikimedia, in an effort to understand how to best serve the research and developer community with open data from our projects. The consultation will remain open through January 2012 and the full set of responses will be shared under a CC0 license.
 * Semantic enhancements: In "Enhancing Wikipedia with semantic technologies", Lee et al. review existing interfaces for semantic search and present their own platform for enhancements. Based on small-scale user tests, they find that one of their three enhancements – range-based queries – is strongly preferred by users, who would find it desirable not only in Wikipedia but on the wider web. A longer summary is available on AcaWiki.
 * English and Finnish Wikipedias egalitarian, Japanese hierarchical: A paper titled "Analyzing cultural differences in collaborative innovation networks by analyzing editing behavior in different-language Wikipedias" (from the 2010 Collaborative Innovation Networks Conference, as was the revision statistics paper reviewed above) applied social network analysis to collaboration on featured articles on the English, German, Japanese, Korean, and Finnish Wikipedias. It "found notable differences in the communication behavior among egalitarian cultures such as the Finnish, and quite hierarchical ones such as the Japanese. While the English language Wikipedia shows a distinctive pattern, most likely because it is by far the largest and frequently exploring new concepts copied by others, it seems to follow more the Finnish egalitarian, than the Japanese hierarchical style".
 * User:Emijrp shared a link on wiki-research-l listing 2,596 scholarly references on wikis, obtained by scraping Google Scholar results (on December 22, 2011), as part of a project to build a comprehensive bibliography about wikis – a challenging task that has seen various earlier attempts and was the subject of a workshop at this year's WikiSym (see the October and April editions of this research report).
 * Should doctors use and edit Wikipedia?: An editorial in the Journal of the Royal Society of Medicine asks whether doctors should reject the use of Wikipedia. The two-page article (one-day access: US$30.00) cites results about the popularity of Wikipedia among medical students, young physicians and the general public, and for some reason highlights the malicious edits of British journalist Johann Hari as an example of the downsides of Wikipedia's free editability. It contains a review of the literature on the reliability of Wikipedia's medical information, which is less thorough than that of the Psychological Medicine article reviewed above, and comes to a less approving but still somewhat positive conclusion: "Although Wikipedia entries are often poorly structured and difficult to understand, they are comparable in accuracy to some online resources, such as health insurance websites." The authors seem to lean towards recommending against ignoring Wikipedia: "One risk of clinicians disengaging from Wikipedia is that only contributors motivated by personal experience (e.g. patient anecdote) or vested interests (e.g. individual clinicians, institutions or companies promoting their own ideas and products) will remain."