Wikipedia:Wikipedia Signpost/2015-05-27/Recent research

German study finds Wikipedia's pharma articles accurate and largely complete

 * Review by William Skaggs

Recently when my 83-year-old father was undergoing medical treatment, the doctor wanted to change one of his blood pressure drugs, and in order to let us know what the effects would be, she printed out the Wikipedia article on the drug and handed it to us. This accords with the overall impression I have developed: Wikipedia's articles on drugs are pretty good – good enough to impress even doctors. A new research study adds some substance to that impression.

A team of German pharmacologists picked a set of 100 drugs described in pharmacology textbooks, and compared the textbook descriptions with Wikipedia articles about the drugs, for accuracy (meaning that the Wikipedia article matched the information in the textbook) and comprehensiveness. They found that 99.7% of the facts in the Wikipedia articles were accurate, and 83.8% of the facts from the textbooks made it into the Wikipedia articles. These numbers were derived from the German Wikipedia, but the authors state that similar results were obtained for the English language version. They conclude that "our results suggest that Wikipedia is an accurate and informative source of drug information for undergraduate medical students." They also revisited the drug articles examined in 2010 by an earlier study which came to less positive conclusions (see coverage in this newsletter: "Quality of drug information in Wikipedia"), and "found the quality of pharmacological information significantly improved". Upon reviewing several other empirical studies which evaluated the quality of medical information on Wikipedia, the authors observe that "despite different methodologies, the main conclusion of these studies was that Wikipedia articles on health topics contain few errors and are well referenced, while the information provided often lacks depth."

Obviously this is something we should be proud of, but let me note a caveat. Articles about specific drugs are a prime example of the sort of thing Wikipedia is best at: articles about topics that can be handled in a systematic way, without requiring mastery of a large body of literature. As a rule, the more comprehensive a topic, the lower the quality of the Wikipedia article. Thus our article on the drug chlordiazepoxide (commonly known as Librium) is better than our benzodiazepine article, which covers the class of drugs to which Librium belongs. The latter article contains a lot of good information but is poorly organized. Our pharmaceutical drug article shows this flaw to an even greater degree. The general take-home message, supported by the German study, is that our medical articles can be very useful to people who are looking for specific facts, but tend to be less useful to people who are trying to understand broad principles.

Notable women "slightly overrepresented" (not underrepresented) on Wikipedia, but the Smurfette principle still holds

 * Review by Maximilianklein

"It's a man's Wikipedia? Assessing gender inequality in an online encyclopedia", presented at the Ninth International AAAI Conference on Web and Social Media (ICWSM) this week, investigates gender in the biography articles of six different Wikipedias. The four biases investigated are coverage bias (who makes it into the encyclopedia), structural bias (which articles link to which), lexical bias (the type of words used in the articles), and visibility bias (who is featured on the Main Page).

Coverage bias is analysed by seeing who from the reference databases of notable humans of Freebase, MIT's Pantheon, and Human Accomplishment are in Wikipedia. A surprising result here is that women are not proportionally underrepresented as hypothesised, but even "slightly overrepresented". (The researchers acknowledge that the first two of these three are at least partly based on Wikipedia themselves, but try to address this issue by "seeking patterns that exist across all three datasets".)

Structural bias is a graph-theoretical measure of how articles about men and articles about women link to each other. Here it is shown that across all six languages, articles about women tend to link more to articles about men than vice versa. The Smurfette Principle, that women are less central in the link graph, is also tested. The in-degree of the two gendered article categories is compared, and it is found that men are indeed significantly more central in all language editions, except in the Spanish Wikipedia, where men and women are equally central.
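The in-degree comparison can be illustrated on a toy link graph. This is only a minimal sketch: the edge list and gender labels below are made up, the function name is hypothetical, and it compares plain means rather than reproducing the significance tests used in the paper.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data: (source article, target article) links.
links = [
    ("A", "B"), ("C", "B"), ("D", "B"),
    ("B", "A"), ("C", "A"),
    ("A", "C"),
]
# Hypothetical gender label for the subject of each biography.
gender = {"A": "man", "B": "man", "C": "woman", "D": "woman"}


def mean_in_degree(links, gender):
    """Average number of incoming links per article, grouped by gender."""
    in_deg = Counter(target for _, target in links)
    groups = {}
    for article, label in gender.items():
        # Articles that nobody links to get an in-degree of 0.
        groups.setdefault(label, []).append(in_deg.get(article, 0))
    return {label: mean(values) for label, values in groups.items()}


result = mean_in_degree(links, gender)
print(result)
```

In this toy graph the two articles about men receive more incoming links on average than the two about women, which is the kind of asymmetry the in-degree test looks for.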

The notion of lexical bias stems from the idea of the Finkbeiner test, that a female scientist will often be noted as a woman as much as a scientist. It is indeed found that articles about women place linguistic emphasis on relationships, gender, and family, whereas the top terms in men's articles focus on their professions. The authors mention that this ties into the concept of male as the null gender. For instance, the word "divorced" is 4.4 times more frequent in a woman's article than in a man's on English Wikipedia. For German and Russian, that multiplier increases to 4.7 and 4.8 times, respectively.
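A multiplier of this kind can be read as a ratio of two frequencies. The sketch below is one plausible way to compute such a ratio, not the paper's actual procedure: it assumes the frequency is the share of articles that contain the word at least once, and both the sample texts and the function names are invented for illustration.

```python
def share_containing(articles, word):
    """Fraction of articles whose text contains `word` as a whole token."""
    hits = sum(1 for text in articles if word in text.lower().split())
    return hits / len(articles)


def frequency_ratio(articles_a, articles_b, word):
    """How many times more common `word` is in group A than in group B.

    Raises ZeroDivisionError if the word never occurs in group B.
    """
    return share_containing(articles_a, word) / share_containing(articles_b, word)


# Hypothetical miniature corpora standing in for biography texts.
women = ["she divorced in 1990", "physicist and mother",
         "divorced twice", "a noted painter"]
men = ["he divorced in 1985", "a noted engineer",
       "footballer and coach", "composer of operas"]

ratio = frequency_ratio(women, men, "divorced")
print(ratio)
```

Here the word appears in half of the first group and a quarter of the second, so the ratio is 2; the figures of 4.4, 4.7 and 4.8 reported above come from the study's full corpora.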

Lastly, visibility bias, the propensity of gendered articles to appear on the English Wikipedia Main Page, is tested. No significant difference is found between the two genders in this respect.

Unfortunately, this paper suffers from its Euro-focus: the six languages in question are English, German, French, Italian, Spanish and Russian. The breadth of the methods used nevertheless reveals wide-scale issues. The authors conclude that Wikipedia does show some signs of addressing systemic bias, such as equal visibility on the Main Page and equality in coverage, but that there are still stark differences in how men and women are portrayed. Whether this is due to biases in the real world, or to the way that Wikipedians write about the real world, is, they say, still unknown.

Editors who use user talk pages are more involved in high-quality articles

 * Review by Piotr Konieczny

An article in the Journal of the Association for Information Science and Technology (JASIST) examines Wikipedia editors' public communication using social network analysis theory. This research suggests that Wikipedia editors who engage in communication with others using user talk pages "are more experienced in editing high quality articles and are more integrated in the community". The author distinguishes quantitative and qualitative contributions, noting that the use of communication tools is more directly related to contributing not just to many articles, but to high quality articles, as well as to a larger number of namespaces. The use of such tools is centered on "coordinating and mentoring editors who edit lower quality articles"; in other words, the author observes that editors who edit high quality articles and use communication tools a lot seem to be more likely to reach out to less experienced editors than the other way around. The author concludes that online collaboration systems are improved through features that allow the creation of what the author calls a "personal" communication network. Though the study excluded bots, it does not seem to have investigated the details of communication (e.g. templates, warnings, awards, others), and so its conclusions on the nature of communications (rather than who engages in it) are more tentative.

"Wikipedia, collective memory, and the Vietnam war"

 * Review by Piotr Konieczny

This paper, likewise published in the JASIST, looks at the Talk:Vietnam War page (and its archives) and analyses it in the context of theories dealing with the concept of collective memory (cultural memory, memory space, and the "floating gap" concept introduced by Pentzold (2009) in his paper on Wikipedia). As such, this paper is one of several works that argue that Wikipedia is a place where the modern world's memories are being recorded and, to some extent, shaped for posterity. The paper finds that the Wikipedia article is affected by two major debates ("(a) whether the US actually lost the war and (b) whether the voice of the American Vietnam veteran should be privileged"). It reviews major, recurring arguments presented by the talk page participants, and concludes that Wikipedia allows us to study how collective memory is shaped. The author also argues that it is the very fact that such debates can be observed on Wikipedia that may distance some educators, primarily librarians, who are used to works that conceal their knowledge production processes. The author ends with a call for librarians to edit Wikipedia, and help their patrons do the same, in order to participate in the 21st century curation of collective memories.

In a separate paper, published earlier in the Journal of Documentation, the author examined the debate about reliable sources on the same talk page and concluded (according to the abstract) that while much of it "is conducted without acrimony, the level of analysis one finds in the talk pages is rather shallow while the attention of individual contributors is not overly concentrated."

Survey of secondary school use of Wikipedia

 * Review by Gamaliel

Three researchers have conducted a survey of the use and perceptions of Wikipedia among secondary school teachers and librarians in the United States. Twenty-two teachers and librarians responded to the survey. The vast majority (91%) reported that "Wikipedia had some effect on student research". Responses were mixed about how positive or negative that effect was, however. Positive comments included responses that Wikipedia is "easily understood...thorough, up-to-date, and easily edited" and "students use it to get the basic ideas for their research, then go to other websites to verify it." Negative comments largely centered on the fact that many students did not go beyond Wikipedia in their research, such as the responses that "students rely on it too heavily and do not expand their research to prove or disprove their findings" and "Students don’t want to check sources when they can just get their work done in one stop." Most (91%) reported that their schools had no policy regarding the use of Wikipedia, but responses were roughly split regarding the need for one. Teachers and those responding that Wikipedia had a negative effect were more likely to respond there was a need for such a policy, as opposed to librarians and those responding it had a positive effect. Based on the results, the authors concluded that any policy should not restrict Wikipedia use. They write "instead of banning and fighting against the usage, students need to be taught the skills to utilize it an effective way, such as how to use Wikipedia as a jumping off point to other potentially more trustworthy resources and how to evaluate the reliability of articles." Given the very small sample size of the survey, this article is more useful for its excellent literature review.

Briefly

 * "User engagement on Wikipedia, a review of studies of readers and editors": Another ICWSM conference paper frames itself as a literature review of topics that are of key interest to the Wikipedia community: editor motivations, engagement, and retention. Unfortunately, it lacks a proper methodology (how did the author select papers to review?), which makes it difficult to assess its comprehensiveness. It nonetheless provides a good summary of many other key works in this field, and creates an interesting framework for recognizing some patterns in this subfield of Wikipedia studies. Unsurprisingly, the authors conclude that the Wikipedia community needs to improve its communication with newbies in order to increase their retention (fewer templates and stark warnings; more friendly personal outreach). (Review by Piotr Konieczny)
 * Freedom of panorama in Europe: This paper advocates the adoption of freedom of panorama laws in the context of European Union law harmonization. It is enriched with case studies from the Wikipedia community's history, and has been supported by the Wikimedia Foundation (though the paper does not make it clear how, nor is it released under a free license itself). While it suffers from a few minor issues (such as not clearly recognizing that Wikimedia Commons does not accept non-commercial images, so that a law granting freedom of panorama only to non-commercial uses would be of little value to Wikipedia) and is heavily geared towards the European legislative framework, it is a valuable addition to the discussion of the freedom of panorama concept. (Review by Piotr Konieczny)
 * Talking like an admin: linguistic mimicry and network centrality on Wikipedia. A new conference paper in the field of sociolinguistics examines whether Wikipedia editors are more likely to linguistically coordinate with (use the same words as) their interlocutors when those others are more centrally located within the social network of Wikipedia, or when speaking to admins. The study draws on an annotated corpus of talkpage discussions in which the admin status of each participant is known, and uses several measures of network centrality (Betweenness and Eigenvector) to calculate the distance between all editors in terms of the number of times they have directly replied to others in a talkpage thread. The authors determine that while editors align their vocabularies more when speaking to admins than non-admins, highly central editors (those who have engaged in a lot of discussions with a lot of different editors) tend to be aligned with whether or not they are admins. Their results suggest that admin status follows high centrality, not the other way around.

Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.


 * From March's CSCW conference (see also m:Research:CSCW 2015):
 * "Functional roles and career paths in Wikipedia"
 * "'Is' to 'was': coordination and commemoration in posthumous activity on Wikipedia biographies"
 * "The virtuous circle of Wikipedia: recursive measures of collaboration structures"
 * "Effects of a Wikipedia orientation game on new user edits" (about The Wikipedia Adventure)
 * "Wikipedia and the politics of openness" (book, see also 2011 Signpost interview with the author)
 * "Wikipédia, objet scientifique non identifié" ("Wikipedia, unidentified scientific object", book in French)
 * "Improving disease surveillance: sentinel surveillance network design and novel uses of Wikipedia"
 * "Disaster monitoring with Wikipedia and online social networking sites: structured data and linked data fragments to the rescue?"
 * "Barriers to the localness of volunteered geographic information"
 * "Amateur encyclopedia editors as nonprofessional journalists: Wikipedia as a gateway for breaking news" (German, with extended abstract in English)
 * "How to extract seasonal features of sightseeing spots from Twitter and Wikipedia"
 * "Analysing the use and perception of Wikipedia in the professional context of translation"
 * "Cross-language Wikipedia editing of Okinawa, Japan"
 * "Property type distribution in Wordnet, corpora and Wikipedia"
 * "Quality assessment of Wikipedia articles using h-index" From the abstract: "In this paper, we propose a method for assessing quality values of Wikipedia articles from edit history using h-index. One of the major methods for assessing Wikipedia article quality is a peer-review based method. In this method, we assume that if an editor's texts are left by the other editors, the texts are approved by the editors, then the editor is decided as a good editor [ see m:Research:Content persistence ]. However, if an editor edits multiple articles, and the editor is approved at a small number of articles, the quality value of the editor deeply depends on the quality of the texts. In this paper, we apply h-index [... to improve this method. ...] the accuracy of article quality assessment in our method outperforms the existing peer-review based method."
 * "Social Interactions vs Revisions, What is important for Promotion in Wikipedia?" From the abstract: "[We look] at the process of election for administrator in the English Wikipedia community. We modeled the candidates according to their revisions and/or social attributes. [...] Our model combining knowledge contribution variables and social networking variables successfully explain 78% of the results which is better than the former models. It also helps to refine the criterion for election. If the number of knowledge contributions is the most important element, social interactions come close second to explain the election. But being connected with the future peers (the admins) can make the difference between success and failure, making this epistemic community a very social community too."
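
For readers unfamiliar with the h-index mentioned in the "Quality assessment of Wikipedia articles using h-index" abstract above, the standard definition can be sketched in a few lines. The abstract does not spell out which quantities the authors feed into it, so the per-article scores below are purely hypothetical; only the h-index definition itself (the largest h such that at least h items have a score of h or more) is standard.

```python
def h_index(scores):
    """Largest h such that at least h of the scores are >= h."""
    ranked = sorted(scores, reverse=True)
    h = 0
    for rank, score in enumerate(ranked, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-article approval scores for two editors.
steady_editor = [10, 8, 5, 4, 3]   # decent scores across many articles
one_hit_editor = [25, 1, 1]        # one very high score, little else

print(h_index(steady_editor), h_index(one_hit_editor))
```

The second editor's single high score does not raise the index, which matches the abstract's concern about editors who are approved on only a small number of articles.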