Wikipedia:Wikipedia Signpost/2023-01-01/Recent research

"How to disagree well: Investigating the dispute tactics used on Wikipedia"


This paper, presented earlier this month at the Empirical Methods in Natural Language Processing conference, applies a modified version of Graham's hierarchy of disagreement to classify talk page comments on the English Wikipedia. As explained by the authors:

The authors call these "rebuttal tactics", and distinguish them from a second category of dispute tactics, "attempts to promote understanding and consensus (referred to as coordination tactics)." Coordination tactics are classified with a separate set of "non-disagreement labels", compiled from comment types identified in several previous research publications about Wikipedia talk pages (e.g. a paper by Ferschke et al. that was summarized in our March 2012 issue: "Understanding collaboration-related dialog in Simple English Wikipedia"). The non-disagreement labels are:
 * "Bailing out" ("An indication that an editor is giving up on a conversation and will no longer engage.")
 * "Contextualisation" (where "an editor 'sets the stage' by describing which aspect of the article they are challenging. This does not directly disagree with anyone")
 * "Asking questions"
 * "Providing clarification"
 * "Suggesting a compromise"
 * "Coordinating edits" to the article page ("This can signal that a compromise has been found.")
 * "Conceding / recanting"
 * "I don’t know" (i.e. "Admitting that one is uncertain. This signals that an editor is receptive to the idea that there are unknowns which may impact their argument.")
 * "Other"

The authors provide a dataset "of 213 disputes (comprising 3,865 utterances) on Wikipedia Talk pages, manually annotated with the dispute tactics employed in the process of resolving a disagreement between editors", allowing multiple labels for each comment ("up to three rebuttal strategies and two resolution strategies per utterance", see examples below).
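The annotation scheme described above (several labels per comment, with a cap per category) can be pictured as a small data structure. The following is only an illustrative sketch, not the authors' actual data format: the class name `AnnotatedUtterance`, its field names and the helper `label_counts` are hypothetical, and the rebuttal labels in the example are placeholders rather than the paper's label names. Only the coordination labels and the two caps are taken from the summary above.

```python
from collections import Counter
from dataclasses import dataclass, field

# Coordination ("non-disagreement") labels as listed in the summary above.
COORDINATION_LABELS = {
    "Bailing out", "Contextualisation", "Asking questions",
    "Providing clarification", "Suggesting a compromise",
    "Coordinating edits", "Conceding / recanting", "I don't know", "Other",
}

MAX_REBUTTAL = 3      # "up to three rebuttal strategies"
MAX_COORDINATION = 2  # "two resolution strategies per utterance"


@dataclass
class AnnotatedUtterance:
    """One talk page comment with its labels (hypothetical schema)."""
    speaker: str
    text: str
    rebuttal: list[str] = field(default_factory=list)      # placeholder names
    coordination: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the per-category caps quoted from the paper.
        if len(self.rebuttal) > MAX_REBUTTAL:
            raise ValueError("too many rebuttal labels")
        if len(self.coordination) > MAX_COORDINATION:
            raise ValueError("too many coordination labels")
        unknown = set(self.coordination) - COORDINATION_LABELS
        if unknown:
            raise ValueError(f"unknown coordination labels: {sorted(unknown)}")


def label_counts(utterances: list[AnnotatedUtterance]) -> Counter:
    """Count how often each label occurs across a dispute."""
    counts: Counter = Counter()
    for utterance in utterances:
        counts.update(utterance.rebuttal)
        counts.update(utterance.coordination)
    return counts


# Example: a two-comment dispute with placeholder rebuttal labels.
dispute = [
    AnnotatedUtterance("User A", "Which source says that?",
                       coordination=["Asking questions"]),
    AnnotatedUtterance("User B", "The cited book does; see page 12.",
                       rebuttal=["placeholder-rebuttal-label"],
                       coordination=["Providing clarification"]),
]
counts = label_counts(dispute)
```

Keeping the two label categories in separate fields mirrors the paper's distinction between rebuttal and coordination tactics, and makes the separate caps easy to check.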

These discussions are drawn from the authors' own "WikiDisputes" dataset, "which is annotated according to whether the dispute was resolved without the need for a moderator." This allows the researchers to identify relations between specific dispute tactics and the risk of a conversation escalating. For example, they

In particular, they examine the effect of personal attacks, finding e.g. that conversations can still recover after a personal attack happens:

Furthermore,

The study proceeds to use machine learning to automatically assign these multiple labels to talk page comments. A BERT-based model performed best (according to three different performance metrics), but still struggled with some of the labels:

Lastly, they apply this to the separate task of predicting whether a conversation will escalate, already examined in their earlier paper that gave rise to the "WikiDisputes" dataset. Namely, they use "multitask training with escalation as the main task and tactics as the auxiliary task, such that the features that are predictive of dispute tactics are incorporated in the escalation predictions." This improves upon their earlier prediction algorithm, "indicating that knowledge of these dispute tactics is useful for tasks beyond classifying the tactics employed."
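The idea of multitask training with a main and an auxiliary task can be illustrated with a combined loss: the escalation loss plus a weighted tactic-classification loss. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the use of binary cross-entropy for both tasks, and the `aux_weight` parameter are all hypothetical choices made for illustration.

```python
import math


def binary_cross_entropy(prob: float, target: int, eps: float = 1e-12) -> float:
    """Cross-entropy for one binary prediction, clamped for stability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def multitask_loss(
    escalation_prob: float,
    escalated: int,
    tactic_probs: list[float],
    tactic_targets: list[int],
    aux_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Main-task loss (escalation) plus weighted auxiliary loss (tactics)."""
    if not tactic_probs or len(tactic_probs) != len(tactic_targets):
        raise ValueError("tactic predictions and targets must be non-empty and aligned")
    main = binary_cross_entropy(escalation_prob, escalated)
    # Multi-label auxiliary task: one independent binary decision per tactic.
    aux = sum(
        binary_cross_entropy(p, y) for p, y in zip(tactic_probs, tactic_targets)
    ) / len(tactic_probs)
    return main + aux_weight * aux


# Example: an uncertain model on a dispute that did escalate.
loss = multitask_loss(0.5, 1, [0.5, 0.5], [1, 0])
```

Because both terms are minimized together, a shared encoder trained on this loss is pushed to retain features that predict the tactics, which is the mechanism the quoted passage describes.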

The following table (adapted from Figure 1 in the paper) shows the labeling of several comments by two different users in one talk page discussion:

Briefly

 * See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
 * The Wikimedia Foundation's Research team published its seventh biannual activity report.

Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Analyzing Digital Discourses: Between Convergence and Controversy"
From the abstract: "This study analyses Wikipedia’s sites for negotiating convergence, conflict and identity, concentrating on two aspects. First, convergence and conflict at the macro-level of intercultural comparison are investigated using the example of the construction of concepts of nationalism, citizenship, identity and tribe in their English and German language versions. Second, the English articles serve as a basis to examine the types of convergence and conflict tendencies at the micro-level of the Talk-section."

From the paper's section on talk pages: "[...] in our data, criticism of content (81 instances/31% of all 259 conflictual codings) is the most frequent conflictual category [...], followed by general metapragmatic criticism concerning clarity and more general stylistic features [...], metapragmatic criticism related to Wikipedia's principles (each comprising about half of the total of 81 metapragmatic tokens), or a mixture of both [...].

Giving reasons for disagreeing is the mitigating strategy used most frequently in all four Talk1-sections, followed by suggesting, inviting and hedged imperatives to induce further improvement of an article, agreement and additional explanation to clarify an issue [...]."

A Discursive Perspective on Wikipedia: More than an Encyclopaedia? (book)
From the publisher's description: "This book provides a concise yet comprehensive guide to Wikipedia for researchers and students of linguistics, discourse and communication studies [...]. Drawing on Herring's situational and medium factors, as well as related developments in (critical) discourse studies, the author studies the online encyclopaedia both theoretically and empirically, examining its origins, production and consumption before turning to a discussion of its societal significance and function(s)."

"What’s hot and what's not in lay psychology: Wikipedia’s most-viewed articles"
From the abstract: "We studied views of articles about psychology on 10 language editions of Wikipedia from July 1, 2015, to January 6, 2021. We were most interested in what psychology topics Wikipedia users wanted to read, and how the frequency of views changed during the COVID-19 pandemic and lockdowns. [...] We made two important observations. The first was that during the pandemic, people in most countries looked for new ways to manage their stress without resorting to external help. [...] We also found that academic topics, typically covered in university classes, experienced a substantial drop in traffic, which could be indicative of issues with remote teaching."

"Building a Public Domain Voice Database for Odia"
From the abstract and paper: "The pilot detailed in this paper is about creating a large freely-licensed public repository of transcribed speech in the Odia language as such a repository was not known to be available. The strategy and methodology behind this process are based on the OpenSpeaks project [which is hosted on the English Wikiversity at https://en.wikiversity.org/wiki/OpenSpeaks ]." "The 'Methodology' section details the process of collecting words [from a dump of Odia Wikipedia], compiling a wordlist [making use of Wikidata lexeme forms to generate additional forms], recording the pronunciation of those words, and uploading the speech data to Wikimedia Commons using Lingua Libre."
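The first two steps quoted above (collecting words from Wikipedia text and compiling a wordlist) can be sketched roughly as follows. This is an illustrative assumption about what such a step could look like, not the paper's actual pipeline: the function `build_wordlist` and its `min_count` parameter are hypothetical, and the sketch simply treats any run of characters from the Unicode Oriya block (U+0B00 to U+0B7F) as a word, which ignores zero-width joiners and other details a real tokenizer would handle.

```python
import re
from collections import Counter

# Runs of characters from the Unicode Oriya (Odia) block, U+0B00..U+0B7F.
ODIA_WORD = re.compile(r"[\u0B00-\u0B7F]+")


def build_wordlist(text: str, min_count: int = 1) -> list[str]:
    """Return distinct Odia-script words, most frequent first.

    `min_count` (hypothetical parameter) drops words seen fewer times.
    """
    counts = Counter(ODIA_WORD.findall(text))
    return [word for word, count in counts.most_common() if count >= min_count]


# Example: mixed-script text; Latin words and digits are ignored.
sample = "ଓଡ଼ିଆ ଭାଷା ଓଡ଼ିଆ Wikipedia 2023"
wordlist = build_wordlist(sample)
```

Ordering by frequency is one plausible way to prioritise which words to record first; the paper's own selection criteria are not described in the excerpt above.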