Wikipedia:Wikipedia Signpost/2022-02-27/Recent research

How earthquakes and terrorist attacks may elicit anger and sadness among Wikipedia editors and readers
In a study titled "Emotions in Wikipedia: the role of intended negative events in the expression of sadness and anger in online peer production", four Germany-based researchers (three among them from the field of psychology) argue that while "Wikipedia explicitly strives to provide objective and neutral information in unbiased language [, ...] Wikipedia articles might still contain subtle expressions of emotions from the experiences of their authors."

Specifically, the authors "... analysed N = 330 [English] Wikipedia articles with automatic linguistic text analyses and found that Wikipedia articles on man-made attacks (e.g. terrorist attacks, shooting rampages) contained more anger-related content than Wikipedia articles on man-made disasters (e.g. ship accidents, train accidents) and natural disasters (e.g. hurricanes, flooding) [...]. Wikipedia articles on man-made attacks also contained fewer sadness-related words than articles on natural and man-made disasters [...]. Depending on the kind of negative event, individuals seem to express certain negative emotions in the respective Wikipedia article to a greater extent than others. It seems that these collective emotional expressions are driven by the psychological mechanism of intentional harm that may explain the current findings".

The automated linguistic analysis method is Linguistic Inquiry and Word Count (LIWC), a tool developed in the 1990s by Pennebaker and others. LIWC has been widely used and vetted, but nowadays exists alongside more sophisticated sentiment analysis methods – a fact that the paper's limitations section coyly alludes to. The researchers used LIWC to detect the percentage of an article's words that match "three specific negative emotion categories [...]: sadness (e.g. 'loss', 'sorrow', or 'grief'), anger (e.g. 'offensive', 'brutal', or 'violent'), and anxiety (e.g. 'panic', 'afraid', or 'scared')." They illustrate them with the following examples:
 * Sadness-related words: "The Haitian art world suffered great losses " (from 2010 Haiti earthquake)
 * Anger-related words: "He had a history of violence, including an arrest in July 2009 for assaulting his girlfriend" (from Boston Marathon bombing)
 * Anxiety-related words: "... which sparked fears in the scientific community of massive numbers of fish dying" (from Hurricane Katrina)

To avoid confounds, the text of references and external links was excluded from this analysis. Also, the authors "did not include the word 'attack' within the anger category or the word 'terror' within the anxiety category in all analyses to avoid the possibility that the topic of the articles could be a confounding factor in our analyses". However, they appear to have made no attempt otherwise to distinguish emotions that are expressed directly in the text (in what editors call "Wikipedia voice") from emotions that are merely reported and attributed to others (such as the scientific community's "fears" in the Hurricane Katrina example, or quoted reactions from politicians etc.).
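For readers unfamiliar with dictionary-based text analysis, the measure described above (the share of an article's words that fall into an emotion category, with topic words such as "attack" and "terror" left out) can be sketched in a few lines of Python. This is not LIWC itself: the category word lists below are toy lists loosely based on the example words quoted above, the names `CATEGORIES`, `EXCLUDED` and `category_percentages` are hypothetical, and the real LIWC dictionaries are far larger. Entries ending in `*` are treated as stem patterns, in the style of LIWC's wildcard entries.

```python
import re

# Toy category dictionaries (hypothetical), loosely based on the example words
# quoted above. An entry ending in "*" matches any word starting with that stem.
CATEGORIES = {
    "sadness": {"loss*", "sorrow*", "grief"},
    "anger": {"offensive", "brutal*", "violen*", "attack*"},
    "anxiety": {"panic*", "afraid", "scared", "fear*", "terror*"},
}

# Topic words dropped per category, mirroring the exclusion described above.
EXCLUDED = {"anger": {"attack*"}, "anxiety": {"terror*"}}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """True if `word` matches a dictionary entry (exact word or stem pattern)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, categories, excluded=None):
    """Percentage of all words in `text` that fall into each category."""
    excluded = excluded or {}
    words = tokenize(text)
    result = {}
    for name, entries in categories.items():
        kept = entries - excluded.get(name, set())
        hits = sum(1 for w in words if any(matches(w, e) for e in kept))
        result[name] = 100.0 * hits / len(words) if words else 0.0
    return result


text = "The attack was brutal and violent."
print(category_percentages(text, CATEGORIES))            # anger: 3 of 6 words
print(category_percentages(text, CATEGORIES, EXCLUDED))  # anger: 2 of 6 words
```

The toy sentence shows why the exclusion matters: with "attack" counted, half of its words register as anger-related; without it, the score reflects only the descriptive wording ("brutal", "violent"), not the topic label. Note that, like the study, this counts every match regardless of whether the emotion is expressed in Wikipedia's own voice or attributed to someone else.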

The authors suggest that online peer production systems such as Wikipedia "...could design user interfaces in such a way that Internet users are hinted by alerts, for instance, to the fact that they are about to write about a negative event that could potentially produce negative emotions. Although Wikipedia's control system for counteracting potential violations of objectivity and a neutral point of view is already very elaborate and sophisticated, it could potentially benefit from taking emotional aspects into account. It would be possible, for example, to highlight certain emotional passages by the computer system while people are writing a text, so that Wikipedia users are aware of emotional expressions. Other Wikipedia authors, administrators, and bots could flag content that needs correction also with respect to emotional wording."
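The paper does not specify how such highlighting would work. As a purely illustrative sketch, an editing tool could score each sentence of a draft by its share of emotion words and flag those above a cut-off. The stem list, the 10% threshold and the function name `flag_emotional_sentences` are all hypothetical assumptions, and the naive sentence splitter would need replacing in any real tool.

```python
import re

# Hypothetical mini-lexicon of emotion word stems; a real tool would use a
# full emotion dictionary.
EMOTION_STEMS = ("loss", "sorrow", "grief", "brutal", "violen", "panic", "afraid", "scared")

# Hypothetical cut-off: flag a sentence if over 10% of its words are emotional.
THRESHOLD = 0.10


def flag_emotional_sentences(text, stems=EMOTION_STEMS, threshold=THRESHOLD):
    """Return (sentence, share of emotion words) for sentences above the threshold."""
    flagged = []
    # Naive sentence split: break after ".", "!" or "?" followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z']+", sentence.lower())
        if not words:
            continue
        share = sum(1 for w in words if w.startswith(stems)) / len(words)
        if share > threshold:
            flagged.append((sentence, share))
    return flagged


draft = ("The storm reached the coast on Monday. "
         "Survivors described brutal scenes of grief and loss.")
for sentence, share in flag_emotional_sentences(draft):
    print(f"{share:.0%}  {sentence}")
```

Only the second sentence is flagged here. A threshold on the per-sentence share, rather than a raw count, keeps long neutral sentences from being flagged merely for containing a single emotional word.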

The paper extends and replicates results from a 2017 publication by the same authors (which had also examined article talk pages, finding that "Surprisingly, Wikipedia articles on those two [types of] events contained more emotional content than related Wikipedia talk pages").

Having "demonstrated that Wikipedia articles on terrorist attacks contained more anger-related content than Wikipedia articles on earthquakes", two of the authors replicate and extend this result by directly measuring the emotional reactions of Wikipedia readers in a more recent study. Specifically, "... raters rated their emotional reactions during and after reading the content of Wikipedia articles. We conducted two studies, each with a different focus. In Study 1, four raters rated 60 existing Wikipedia articles on earthquakes and terrorist attacks regarding their emotional reactions while reading the articles. As a conceptual extension, in Study 2, 35 participants [all native speakers of German, and 29 of them female] serving as independent raters indicated their emotional reactions after reading four existing Wikipedia articles on earthquakes and terrorist attacks. Moreover, Study 2 used an Asian and a European earthquake as well as an Asian and a European terrorist attack in order to take the geographical proximity of the negative event into account."

The researchers conclude "... that Wikipedia articles on terrorist attacks elicited more threat, anger, sadness, and anxiety than Wikipedia articles on earthquakes. These effects occurred for negative events in Europe but were absent for events in Asia, with one exception. The anger effect was the same across Europe and Asia. [...] The findings of Study 2 showed that the Wikipedia article on the nearby (i.e., European) terrorist attack elicited more threat appraisal than the Wikipedia article on the nearby earthquake, which was not the case when the negative events happened far away (i.e., in Asia). For the elicitation of anger, however, the geographical proximity of the negative event did not matter."

Briefly

 * See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
 * The deadline for paper submissions to the "Wiki-M3L" workshop has been extended to March 9. The event is intended to be "a space for the Wikipedia community and the multimodal & multilingual research community to share and support each other."

Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Multilingual Entity Linking System for Wikipedia with a Machine-in-the-Loop Approach"
This paper describes the development of a machine learning model used in the "Add a link" task suggestion feature for new Wikipedia editors (deployed by the Wikimedia Foundation's "Growth" team on several language Wikipedias last year). From the abstract: "...despite Wikipedia editors' efforts to add and maintain its content, the distribution of links remains sparse in many language editions. This paper introduces a machine-in-the-loop entity linking system that can comply with community guidelines for adding a link and aims at increasing link coverage in new pages and wiki-projects with low-resources. To tackle these challenges, we build a context and language agnostic entity linking model that combines data collected from millions of anchors found across wiki-projects, as well as billions of users' reading sessions. We develop an interactive recommendation interface that proposes candidate links to editors who can confirm, reject, or adapt the recommendation with the overall aim of providing a more accessible editing experience for newcomers through structured tasks. Our system's design choices were made in collaboration with members of several language communities. [...] Our experimental results show that our link recommender can achieve a precision above 80% while ensuring a recall of at least 50% across 6 languages covering different sizes, continents, and families." (See also: research project page on Meta-wiki)
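The paper's model is not reproduced here. As a rough illustration of what "data collected from millions of anchors" can provide, the sketch below builds an anchor dictionary from existing wikilinks and ranks candidate targets by "commonness", i.e. how often an anchor text points to each article. This is a standard candidate-generation step in entity linking, not necessarily the paper's exact method; the function names are hypothetical, and the real system adds further signals (such as reading sessions) and a learned model before a human confirms or rejects each suggestion.

```python
from collections import Counter, defaultdict


def build_anchor_dictionary(links):
    """links: iterable of (anchor_text, target_article) pairs from existing wikilinks."""
    counts = defaultdict(Counter)
    for anchor, target in links:
        counts[anchor.lower()][target] += 1
    return counts


def candidates(anchor, anchor_dict, top_k=3):
    """Return up to top_k targets ranked by commonness, P(target | anchor text)."""
    counter = anchor_dict.get(anchor.lower())  # .get avoids creating empty entries
    if not counter:
        return []
    total = sum(counter.values())
    return [(target, n / total) for target, n in counter.most_common(top_k)]


existing_links = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("paris", "Paris"),
    ("Paris", "Paris, Texas"),
]
anchor_dict = build_anchor_dictionary(existing_links)
print(candidates("Paris", anchor_dict))  # the editor then confirms or rejects
```

In a machine-in-the-loop setting, such scores would only decide which suggestions to show; the final decision stays with the editor.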

"Wikipedia Entities as Rendezvous across Languages: Grounding Multilingual Language Models by Predicting Wikipedia Hyperlinks"
From the paper and abstract: "We introduce the multilingual Wikipedia hyperlink prediction objective to contextualise words in a text with entities and concepts from an external knowledge source by using Wikipedia articles in up to 100 languages. Hyperlink prediction is a knowledge-rich task designed to (1) inject semantic knowledge from Wikipedia entities and concepts into the MMLM [Multilingual Masked Language Model] token representations, and (2) [...] to inject explicit language-independent knowledge into a model trained via self-supervised learning [...]. We devise a training procedure where we mask out hyperlinks in Wikipedia articles and train the MMLM to predict the hyperlink identifier similarly to standard MLM but using a hyperlink vocabulary of 250k concepts shared across languages [...]" "In our experiments, we use Wikipedia articles in up to 100 languages and already observe consistent gains compared to strong baselines when predicting entities using only the English Wikipedia."
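To make the training objective more concrete, here is a minimal sketch of how one training example might be built: linked spans are replaced by mask tokens, and the target for each span is a class index in one concept vocabulary shared by all languages. The `LinkSpan` structure, the concept IDs (`C_PARIS`, `C_FRANCE`) and the `mask_prob` parameter are hypothetical assumptions; the paper's actual procedure works on subword tokens and feeds these targets to a prediction head over its 250k-concept vocabulary.

```python
import random
from dataclasses import dataclass

MASK = "[MASK]"


@dataclass
class LinkSpan:
    start: int    # index of the anchor's first token
    end: int      # index one past the anchor's last token
    concept: str  # language-independent concept ID (hypothetical IDs below)


def mask_hyperlinks(tokens, links, concept_vocab, mask_prob=0.5, rng=None):
    """Mask linked spans; return (masked tokens, [(start, end, class index)])."""
    rng = rng or random.Random()
    masked = list(tokens)
    targets = []
    for link in links:
        if link.concept not in concept_vocab:
            continue  # concept outside the shared vocabulary: leave the anchor as-is
        if rng.random() < mask_prob:
            masked[link.start:link.end] = [MASK] * (link.end - link.start)
            targets.append((link.start, link.end, concept_vocab[link.concept]))
    return masked, targets


# One vocabulary shared by all languages: the same concept gets the same class.
concept_vocab = {"C_PARIS": 0, "C_FRANCE": 1}
en = ["Paris", "is", "the", "capital", "of", "France"]
de = ["Paris", "ist", "die", "Hauptstadt", "von", "Frankreich"]
links = [LinkSpan(0, 1, "C_PARIS"), LinkSpan(5, 6, "C_FRANCE")]

for tokens in (en, de):
    print(mask_hyperlinks(tokens, links, concept_vocab, mask_prob=1.0))
```

Because the English and German sentences produce identical targets, the model is pushed to map "France" and "Frankreich" to the same concept. This is the language-independent knowledge that the shared hyperlink vocabulary is meant to inject.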

"Automatic Wikipedia Link Generation Based On Interlanguage Links"
From the abstract: "This paper presents a new way to increase interconnectivity in small Wikipedias (fewer than a 100,000 articles), by automatically linking articles based on interlanguage links. Many small Wikipedias have many articles with very few links, this is mainly due to the short article length. [...] Due to the fact that Wikipedias are translated in to many languages, it allows us to generate new links for small Wikipedias using the links from a large Wikipedia (more than a 100,000 articles)."
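The core idea can be sketched as a lookup chain: follow the small-wiki article's interlanguage link to its large-wiki counterpart, take the articles that counterpart links to, map each back to the small wiki, and propose those that exist there but are not yet linked. This is a simplified sketch of the idea as the abstract describes it, not the paper's implementation; the data layout, the `xx:` titles and the function name are hypothetical, and a real tool would still need to find suitable anchor text in the small article.

```python
def propose_links(article, to_large, large_links, existing_links):
    """Suggest new link targets for `article` in a small Wikipedia.

    to_large: small-wiki title -> large-wiki title (interlanguage links)
    large_links: large-wiki title -> set of large-wiki titles it links to
    existing_links: small-wiki title -> set of small-wiki titles already linked
    """
    to_small = {large: small for small, large in to_large.items()}
    counterpart = to_large.get(article)
    if counterpart is None:
        return []  # no interlanguage link, so nothing to borrow
    already = existing_links.get(article, set())
    proposals = []
    for large_target in sorted(large_links.get(counterpart, set())):
        small_target = to_small.get(large_target)
        # Keep targets that exist in the small wiki and are not linked yet.
        if small_target and small_target != article and small_target not in already:
            proposals.append(small_target)
    return proposals


to_large = {"xx:Amsterdam": "Amsterdam", "xx:Netherlands": "Netherlands", "xx:Amstel": "Amstel"}
large_links = {"Amsterdam": {"Netherlands", "Amstel", "Rembrandt"}}
existing_links = {"xx:Amsterdam": {"xx:Netherlands"}}
print(propose_links("xx:Amsterdam", to_large, large_links, existing_links))
```

Here "Rembrandt" is dropped because the small wiki has no article on it, and "Netherlands" because it is already linked; only the Amstel link is proposed.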

"Improving Website Hyperlink Structure Using Server Logs"
From the abstract and paper: "Here we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain implicit signals indicating which nonexistent links would be useful if they were to be introduced. We leverage these signals to model the future usefulness of yet nonexistent links. Based on our model, we define the problem of link placement under budget constraints and propose an efficient algorithm for solving it. We demonstrate the effectiveness of our approach by evaluating it on Wikipedia [...]" "in the English Wikipedia, of all the 800,000 links added to the site in February 2015, the majority (66%) were not clicked even a single time in March 2015, and among the rest, most links were clicked only very rarely [...] In a nutshell, simply adding more links does not increase the overall number of clicks taken from a page. Instead, links compete with each other for user attention." (See also: Research project page on Meta-wiki)
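One simple way to read "implicit signals" from logs: if many reading sessions that pass through page s later reach page t, even though s does not link to t, then a direct link from s to t might be useful. The sketch below scores such nonexistent links by that fraction and keeps the top ones within a link budget. It is a simplified proxy under stated assumptions (sessions as lists of page titles; hypothetical function names), not the paper's model or algorithm, which also account for links competing with each other for attention.

```python
from collections import Counter


def candidate_scores(sessions, existing_links):
    """Score nonexistent links (s, t): fraction of sessions through s that later reach t.

    sessions: list of page sequences reconstructed from server logs
    existing_links: dict page -> set of pages it already links to
    """
    visits = Counter()  # number of sessions containing each page
    later = Counter()   # number of sessions in which t follows s
    for path in sessions:
        seen = set()
        for i, s in enumerate(path):
            if s in seen:
                continue  # count each source at most once per session
            seen.add(s)
            visits[s] += 1
            for t in set(path[i + 1:]):
                if t != s and t not in existing_links.get(s, set()):
                    later[(s, t)] += 1
    return {(s, t): n / visits[s] for (s, t), n in later.items()}


def place_links(scores, budget):
    """Greedy: keep the `budget` highest-scoring new links (ties broken by name)."""
    return sorted(scores, key=lambda st: (-scores[st], st))[:budget]


sessions = [["A", "B", "C"], ["A", "C"], ["A", "B"], ["B", "C"], ["B", "D"]]
existing_links = {"A": {"B"}, "B": {"C"}}
scores = candidate_scores(sessions, existing_links)
print(scores)
print(place_links(scores, budget=1))
```

A plain ranking like this treats every candidate independently. The paper's finding that added links mostly go unclicked, because links compete for attention, is exactly why its own formulation is more careful than this.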