Wikipedia:Wikipedia Signpost/2020-03-29/Recent research

See also the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

"Uncertainty During New Disease Outbreaks in Wikipedia"
From the abstract and the discussion section:  "New disease outbreaks [e.g. Ebola, MERS, Swine influenza] are often characterized by emergent and changing information which, in turn, require Wikipedia editors to spend time and effort to retrieve and understand information that is sometimes ambiguous, complex, and contradictory. [...] the goals of this study are to identify types of uncertainty expressed by Wikipedia editors during new disease outbreaks, and examine different strategies deployed by Wikipedia editors to manage uncertainty. [...]

Wikipedia editors depend on several strategies to cope with uncertainty during a disease outbreak. These strategies rely primarily on consulting authoritative sources, reporting the uncertainty to the public, ignoring the uncertainty in the interests of maintaining simplicity, and, to a far lesser extent, setting up a mailing list to gather information and science as they emerge over time."

"Analyzing Wikipedia Deletion Debates with a Group Decision-Making Forecast Model"
From the abstract:  "we show that machine learning with natural language processing can accurately forecast the outcomes of group decision-making in online discussions. Specifically, we study Articles for Deletion, a Wikipedia forum for determining which content should be included on the site. Applying this model, we replicate several findings from prior work on the factors that predict debate outcomes; we then extend this prior work and present new avenues for study, particularly in the use of policy citation during discussion. Alongside these findings, we introduce a structured corpus and source code for analyzing over 400,000 deletion debates spanning Wikipedia's history."

"Science Is Shaped by Wikipedia: Evidence From a Randomized Control Trial"
From the abstract and discussion section:  "Incorporating ideas into Wikipedia leads to those ideas being used more in the scientific literature. We provide correlational evidence of this across thousands of Wikipedia articles and causal evidence of it through a randomized control trial where we add new scientific content to Wikipedia. In the months after uploading it, an average new Wikipedia article in Chemistry is read tens of thousands of times and causes changes to hundreds of related scientific journal articles. Patterns in these changes suggest that Wikipedia articles are used as review articles, summarizing an area of science and highlighting the research contributions to it. Consistent with this reference article view, we find causal evidence that when scientific articles are added as references to Wikipedia, those articles accrue more academic citations. [...]

For each Wikipedia article that we created for this experiment we paid students $100. Assuming one Wikipedia article (or equivalent contribution) per research paper, the implicit tax on research would be ($100/$220,000) = 0.05%. [...] even with many conservative assumptions, dissemination through Wikipedia is ∼ 120× more cost-effective than traditional dissemination techniques." This research caused community discussions that ultimately led to the creation of a "Wikipedia is not a laboratory" policy on the English Wikipedia.
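The implicit-tax figure in the quotation can be checked with a few lines of arithmetic. This is only a sketch: both dollar amounts are taken from the quoted passage, and the variable names are illustrative rather than the authors' own.

```python
# Figures as quoted from the paper; the names are hypothetical labels.
payment_per_article = 100        # dollars paid per Wikipedia article
cost_per_paper = 220_000         # dollars, the denominator used in the quote

# Implicit "tax" on research, expressed as a percentage.
tax_percent = payment_per_article / cost_per_paper * 100

print(f"{tax_percent:.3f}%")     # unrounded value, about 0.045%
print(f"{tax_percent:.2f}%")     # rounded to two decimals, as in the quote
```

The unrounded value is roughly 0.045%, which the paper rounds up to 0.05%.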

"'This is exactly how the Nazis ran it': (De)legitimising the EU on Wikipedia"
From the abstract: "The data examined consist of Wikipedia contributors' debates that took place on a Wikipedia discussion site ('talk page'). Taking a corpus-assisted approach combined with argumentation analysis and aspects of systemic functional linguistics, I found that Wikipedia editors repeatedly propose that Nazi Germany might have been a precursor of the EU today. However, the Wikipedia community ultimately rejects this notion and emphasises the voluntary nature guiding the EU's creation process. Thus, while the EU's legitimacy is indeed contested in the course of the debates, the Wikipedia community eventually rejects this challenge."

"The Dynamics of Peer-Produced Political Information During the 2016 U.S. Presidential Campaign"
From the abstract: "Drawing on systems justification theory and methods for measuring the enthusiasm gap among voters, this paper quantitatively analyzes the candidates’ biographical and related articles and their editors. Information production and consumption patterns match major events over the course of the campaign, but Trump-related articles show consistently higher levels of engagement than Clinton-related articles."

"Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia"
From the tool documentation and abstract: "Wikipedia2Vec is a tool for learning embeddings of words and entities from Wikipedia. The learned embeddings map similar words and entities close to one another in a continuous vector space.

This tool learns embeddings of words and entities by iterating over entire Wikipedia pages and jointly optimizing the following three submodels:
 * Wikipedia link graph model, which learns entity embeddings by predicting neighboring entities in Wikipedia's link graph [...]
 * Word-based skip-gram model, which learns word embeddings by predicting neighboring words given each word in a text contained on a Wikipedia page.
 * Anchor context model, which aims to place similar words and entities near one another in the vector space [...]

The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. [...] We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings."
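To make the word-based skip-gram submodel more concrete, the sketch below generates the (target word, neighboring word) training pairs that a generic skip-gram model learns to predict. It is not Wikipedia2Vec's own code: the function name `skipgram_pairs`, the `window` parameter, and the toy sentence are assumptions made purely for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every token and each of its
    neighbors within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Toy sentence standing in for the text of a Wikipedia page.
tokens = ["wikipedia", "is", "an", "encyclopedia"]
pairs = skipgram_pairs(tokens, window=1)
for target, context in pairs:
    print(target, "->", context)
```

A skip-gram model is then trained so that each target word's vector is good at predicting its context words. The link graph model described above applies the same idea to pairs of linked entities instead of neighboring words.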

"Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs"
From the abstract: "... we provide an overview over [...] recent advancements [in question answering research], focusing on neural network based question answering systems over knowledge graphs [including "the most popular KGQA datasets": 8 based on Freebase, 2 on DBpedia, one on DBpedia and Wikidata]. We introduce readers to the challenges in the tasks, current paradigms of approaches, discuss notable advancements, and outline the emerging trends in the field."

"Automatic Fact-guided Sentence Modification"
From the abstract: "Online encyclopediae like Wikipedia contain large amounts of text that need frequent corrections and updates. The new information may contradict existing content [...] we focus on rewriting such dynamically changing articles. [...] To this end, we propose a two-step solution: (1) We identify and remove the contradicting components in a target text for a given claim, using a neutralizing stance model; (2) We expand the remaining text to be consistent with the given claim, using a novel two-encoder sequence-to-sequence model with copy attention. Applied to a Wikipedia fact update dataset, our method successfully generates updated sentences for new claims..." See also university press release: "Automated system can rewrite outdated sentences in Wikipedia articles" ("Text-generating tool pinpoints and replaces specific information in sentences while retaining humanlike grammar and style") and media coverage.

"Transforming Wikipedia into Augmented Data for Query-Focused Summarization"
This preprint presents a query-focused summarization dataset using Wikipedia's citations to align queries and documents.

"Knowledge Graphs and Knowledge Networks: The Story in Brief"
This summary of the journey of knowledge graphs for Artificial Intelligence also covers Wikidata: "Wikidata (wikidata.org/) is wikipedia’s open-source machine-readable database with millions of entities where everyone can contribute and use (with reading and editing permissions) with a user-friendly query interface.

It covers a wide variety of domains and contains not only textual knowledge but also images, geocoordinates, and numerics. Wikidata uses unique identifiers for each entity/ relation for accurate querying and provides provenance metadata, unlike DBpedia and schema.org. For instance, it includes information about a fact’s correctness in terms of its origin and temporal validity (reference point of time during of the fact). Wikidata is one of the latest projects acknowledging the dynamic nature of KG and is continuously updated by human contributors unlike DBpedia which is curated from wikipedia once in a while."

"Strangers in a seemingly open-to-all website: the gender bias in Wikipedia"
From the abstract: "Based on action research with a mixed evaluation method and two rounds of interviews, the research followed the steps of 27 Israeli women activists who participated in editing workshops.

Findings: [...] having the will to edit and the knowledge of how to edit are necessary but insufficient conditions for women to participate in Wikipedia. The finding reveals two categories: pre-editing barriers of negative reputation, lack of recognition, anonymity and fear of being erased; and post-editing barriers of experiences of rejection, alienation, lack of time and profit and ownership of knowledge. The research suggests a “Vicious Circle” model, displaying how the five layers of negative reputation, anonymity, fear, alienation and rejection enhance each other, in a manner that deters women from contributing to the website."