Wikipedia:Wikipedia Signpost/2016-04-01/Recent research



"Employing Wikipedia for good not evil: innovative approaches to collaborative writing assessment"

 * Reviewed by Piotr Konieczny

This paper is a good example of how to write articles for the "teaching with Wikipedia" field. The authors report their positive experiences with several under- and postgraduate classes at the University of Sydney, developing articles such as pregnancy vegetarianism, Cleo (magazine) or Slave Labour (mural). They describe a number of assignments and assessment criteria in relative detail, and discuss the benefits their Wikipedia assignments bring to the community (improving valuable and underrepresented content) and to the students themselves (improving their writing, research and collaborative skills). The paper could have benefited from a more comprehensive literature review, however. It describes a useful set of educational activities, and rather well at that, but these are not groundbreaking: practically all the activities discussed in this paper have been discussed by others in the peer-reviewed literature. Unfortunately, the authors fail to cite many related works (I count only about five citations to other peer-reviewed works from the much larger field of teaching with Wikipedia). Furthermore, the authors seem unaware of the Education Program. None of their courses appears to have been registered on Wikipedia so far. Sadly, they have no on-wiki homepage that would identify all edited articles or participating students, and it is also unclear whether the instructors themselves have Wikipedia accounts. This suggests a failing both on the part of the researchers (who spent years reading about, researching and engaging with the teaching with Wikipedia approach without realizing that a major support infrastructure is in place to assist them) and on the part of the Wikipedia community and the Education Program itself. The program is clearly still neither visible nor active enough to identify and reach out to educators like these, who have been teaching with Wikipedia for several years.
Hopefully, in the future we can better integrate these and other educators into our framework.

Using eyetracking to find out how Wikipedia articles are being read

 * Reviewed by Tilman Bayer

Researchers from the University of Regensburg in Germany have used eyetracking methods to find out which article elements readers focus on while searching for information on Wikipedia, depending on the nature of the search task (factual information lookup, learning, or casual reading—a classification taken from a 2006 article about exploratory search in general).

In two 2012 articles, the researchers summarized the methodology and results of one of their lab experiments with 28 participants. Besides eyetracking, the experiment also incorporated data from survey questionnaires, browser logs and electromyography of two facial muscles that indicate emotional reactions (the corrugator and the zygomaticus major). Among the results of this first study (see also a related paper in English with illustrations explaining the various article elements):
 * During lookup tasks, tables and graphical representations were preferred, but illustrative/decorative images were almost never looked at. (As the authors point out, their test question, about the number of passengers on the Titanic, focused on textual information.) On the other hand, "in 'learn' tasks users concentrate more on the introduction and lists. In the 'casual leisure' area, many different content elements are used." [this and other quotes have been translated from German]
 * Users tend to skim the article during lookup tasks, but read more parts of the text in the other tasks.
 * According to a post-task survey, user satisfaction in both the lookup and learn tasks was independent of the number of images.
A subsequent German-language PhD thesis (see also a 2012 conference poster) contains much more detail. For example, it reports that readers spend >45% of their time scanning the table of contents and lists in "lookup" tasks, whereas in "learn" tasks these account for <10% of the time.

A second PhD thesis, covered in a brief paper last year, examined, for example, which elements readers look at first within an article. The experiment involved 163 German Wikipedia articles and 90 participants who were asked to prepare themselves for a course on the history of Bavaria in the 20th century, i.e. a "learning" overview task. The table of contents was the most frequent entry point (36%), followed by the lead section (31%) and then the text body itself. The author further observes that "the article heading and images serve less often as entry point. The text heading [presumably the first section heading after the lead] and image captions very rarely occurs as points of first contact". Another publication by the same author focused on "users' interaction with pictorial and textual contents ...[The spread] of information within the articles and the relation between text and images are analyzed. ... By now 30 articles have been analyzed according to this scheme. [Within these, there] are 639 contact points leading to images. Results show that 39% of all contact points lead from image to image, in mutual directions (previous or next). All text contact points [e.g. citations] sum up to a total of 37%. In 5% of all cases, an introduction triggers a saccade to an image. The remaining types of contact points occur rather rarely."

A later overview article summarizes other aspects in less detail, e.g.:
 * More experienced readers used the table of contents less often.
 * Overall, search strategies did not differ much between the "learning" and casual reading ("non-work-based") tasks, but both differed to a statistically significant degree from information-seeking behavior in fact lookup tasks. The largest differences concerned the consumption of text, images and the TOC (cf. above). Readers also spent a larger proportion of their time navigating rather than analyzing content.

(For an overview of other new data sources shedding light on how readers navigate within articles, see also this reviewer's recent tech talk at the Wikimedia Foundation, and a research overview page on Meta about the question "Which parts of an article do readers read?")

Other recent publications
A list of other recent publications that could not be covered in time for this issue – contributions are always welcome for reviewing or summarizing newly published research.
 * "Political Advertising on the Wikipedia Market Place of Information" From the abstract: "Wikipedia’s popularity and reputation give politicians incentives to use it for enhancing their online appearance effectively and tailored towards their constituency. [...] we assemble data covering editing activity for articles on all 1,100 members of the German parliament (MPs) for the three last legislatures. We find editing to be a persistent phenomenon that is practiced by a substantial amount of MPs and is growing throughout election years."
 * "Identifying missing dictionary entries with frequency-conserving context models" From the abstract: "Upon training our model with the Wiktionary—an extensive, online, collaborative, and open-source dictionary that contains over 100,000 phrasal-definitions—we develop highly effective filters for the identification of meaningful, missing phrase-entries. With our predictions we then engage the editorial community of the Wiktionary and propose short lists of potential missing entries for definition, developing a breakthrough, lexical extraction technique, and expanding our knowledge of the defined English lexicon of phrases."
 * "Population automation: An interview with Wikipedia bot pioneer Ram-Man" From the abstract: ".... an in-depth interview with Wikipedia user Ram-Man, [...] creator of the rambot, the first mass-editing bot. Topics discussed include the social and technical climate of early Wikipedia, the creation of bot policies and bureaucracy, and the legacy of rambot and Ram-Man’s work."
 * "Mining Wikipedia to Rank Rock Guitarists" From the abstract: "The influence of a guitarist was estimated by the number of guitarists citing him/her as an influence and the influence of the latter. [...] The results are most interesting and provide a quantitative foundation to the idea that most of the contemporary rock guitarists are influenced by early blues guitarists. Although no direct comparison exist, the list was still validated against a number of other best-of lists available online and found to be mostly compatible."
 * Predicting tennis players' Wikipedia popularity from tournament performance: From the abstract of a paper titled "Untangling Performance from Success": "We show that a predictive model, relying only on a tennis player's performance in tournaments, can accurately predict an athlete's popularity [as measured by Wikipedia pageviews], both during a player's active years and after retirement."
 * "Request for Adminship (RFA) within Wikipedia: How Do User Contributions Instill Community Trust?" From the abstract: "... we examine the impact of different forms of contribution made by adminship candidates on the community's overall decision as to whether to promote the candidate to administrator. To do so, we collected data on 754 RFA cases and used logistic regression to test four hypotheses. Our results supported the role of total contribution, and clarification of contribution in RFA success while the impacts of social contribution was partially supported and the role of content contribution was not supported. Also, both control variables (tenure and number of attempts) showed significant relationships with RFA success."
 * "Wikidata: A platform for data integration and dissemination for the life sciences and beyond" From the abstract: "Our group is [...] populating Wikidata with the seeds of a foundational semantic network linking genes, drugs and diseases. Using this content, we are enhancing Wikipedia articles to both increase their quality and recruit human editors to expand and improve the underlying data. We encourage the community to join us as we collaboratively create what can become the most used and most central semantic data resource for the life sciences and beyond."
 * "A matter of words: NLP for quality evaluation of Wikipedia medical articles" From the abstract: "We prove the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which have been previously manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain sensible improvements with respect to existing solutions, mainly for those articles that other approaches have less correctly classified."