Wikipedia:Wikipedia Signpost/2020-11-29/Recent research

Wikipedia succeeds where libraries fail, showing "an unmet interest" in the Shoah and Israel, also in Muslim countries
In a paper titled "The Political Geography of Shoah Knowledge and Awareness, Estimated from the Analysis of Global Library Catalogues and Wikipedia User Statistics", Austrian political scientist Arno Tausch finds a disturbing "global North-South and North-East divide in the library presence of Shoah-related titles", contrasting it with "a more optimistic [trend], based on freely available information on the internet": the availability and popularity of Wikipedia articles about the same topic in multiple languages. For example, the study highlights the pageview numbers of these articles in the Farsi, Arabic and Indonesian Wikipedias as "truly a hopeful sign".

The bulk of the paper consists of detailed bibliometric examinations: "Based on the data of our research project covering 165 library catalogues (54 nationwide union catalogues, 81 national libraries, 16 legislative-assembly libraries, 14 libraries of international organizations) and the OCLC Worldcat, which by itself includes no less than 70,000 libraries in more than 170 countries, we found that there is indeed a huge global gap in Shoah library holdings. Some 69.3% of the global library presence of the leading peer-reviewed journal in the field, Holocaust and Genocide Studies, in principle available to global publics, is encountered in libraries within the geographical distance of less than 1,000 miles from New York City or Brussels. We particularly analyze the lack of Shoah knowledge and awareness in many Muslim and Catholic countries."

The author contrasts this with webometrics, where "we must regard Wikipedia download statistics [i.e. page view data] as a first and very reliable seismograph of global social network trend ... Its 49.3 million articles in almost 300 languages are therefore also a treasure trove for the research on Judaism, Israel, the memory of the victims of the Shoah, and global anti-Semitism. ... [To] estimate whether or not a given language community on Wikipedia has a high or a low relative tendency to seek information on the Shoah", he compares the pageview numbers for the corresponding article with the annual pageview numbers for the entire Wikipedia in that language, or alternatively with "the culturally most neutral article in this context, the Wikipedia article on the encyclopedia Wikipedia itself."
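The normalization described above can be illustrated with a minimal Python sketch. It is not the paper's actual procedure: the function names, the language codes "xx" and "yy", and all numbers are hypothetical placeholders chosen only to show how the two baselines (total pageviews of a language edition, or pageviews of its article about Wikipedia) can rank the same language editions differently.

```python
def relative_interest(article_views: float, baseline_views: float) -> float:
    """Return the topic article's pageviews as a fraction of a baseline count."""
    if baseline_views <= 0:
        raise ValueError("baseline_views must be positive")
    return article_views / baseline_views


def rank_by_interest(data: dict, baseline_key: str) -> list:
    """Order language codes from highest to lowest relative interest."""
    scores = {
        lang: relative_interest(views["topic"], views[baseline_key])
        for lang, views in data.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical placeholder numbers, NOT figures from the paper.
# "topic": pageviews of the topic article in that language edition
# "total": pageviews of the whole language edition
# "wikipedia_article": pageviews of that edition's article about Wikipedia
sample = {
    "xx": {"topic": 500, "total": 1_000_000, "wikipedia_article": 2_000},
    "yy": {"topic": 300, "total": 2_000_000, "wikipedia_article": 1_000},
}

if __name__ == "__main__":
    print(rank_by_interest(sample, "total"))
    print(rank_by_interest(sample, "wikipedia_article"))
```

With these placeholder values the two baselines disagree about which edition shows more relative interest, which is one reason a study might report both.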

The paper's detailed bibliometric studies contain many observations on particular countries or regions, e.g. "Among the countries holding less than 100 titles in their combined entire countrywide library system, we find countries where considerable numbers of Jews were sent to the Nazi German death camps".

While praising Wikipedia as an information source with more even coverage (across languages or countries), the author still notes that "compared to the presumed size of the Wikipedia user community [i.e. total pageview numbers], the Portuguese, Spanish, German, Italian, Persian, and French speaking Wikipedia users had a higher tendency to download the main Shoah Wikipedia article. Results for the Wikipedia downloads in Japanese, Turkish, Russian, Chinese, Swedish, Polish, Korean, Ukrainian, Czech, Finnish, English, Indonesian, Arabic, and Dutch (in descending order) were below the trend." The study similarly examines the article about Israel, observing e.g. that "With 844 daily downloads of the Israel article in Persian and 1,254 daily downloads of the Israel article in Arabic, a certain presence of the theme of Israel among Wikipedia audiences in the Middle East has now been achieved."

The article was published in a journal of the Jerusalem Center for Public Affairs, a think-tank which (according to the English Wikipedia article about it) "is considered to be politically neo-conservative". That said, few of the author's previous publications appear to have focused on topics related to the Holocaust or the Arab-Israeli conflict.

Libraries and their biases are the main focus of the paper, with the Wikipedia-related results occupying a smaller part. Still, the library findings are of interest to Wikimedians and Wikipedia researchers as well, for example as evidence of possible risks in GLAM-WIKI collaborations, where the biases and political constraints of such cultural institutions might negatively affect Wikipedia's efforts to achieve a neutral point of view.

Briefly

 * See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications
''Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.''

"NwQM: A neural quality assessment framework for Wikipedia"
From the abstract:  "In this paper we propose Neural wikipedia Quality Monitor (NwQM), a novel deep learning model which accumulates signals from several key information sources such as article text, meta data and images to obtain improved Wikipedia article representation. We present comparison of our approach against a plethora of available solutions and show 8% improvement over state-of-the-art approaches with detailed ablation studies."

"Evidence of a mostly productive and continuous effort to improve the quality of references" on English Wikipedia
From the abstract:  "... we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. [...] We use the dataset to study the temporal evolution of Wikipedia references as well as users' editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase of reference and document identifiers (DOI, PubMedID, PMC, ISBN, ISSN, ArXiv ID), and (2) most of the reference curation work is done by registered humans (not bots or anonymous editors)."

"The network structure of scientific revolutions"
From the abstract:  "Philosophers of science have long postulated how collective scientific knowledge grows. Empirical validation has been challenging due to limitations in collecting and systematizing large historical records. Here, we capitalize on the largest online encyclopedia to formulate knowledge as growing networks of articles and their hyperlinked inter-relations. We demonstrate that concept networks grow not by expanding from their core but rather by creating and filling knowledge gaps, a process which produces discoveries that are more frequently awarded Nobel prizes than others. Moreover, we operationalize paradigms as network modules to reveal a temporal signature in structural stability across scientific subjects."

"Using logical constraints to validate information in collaborative knowledge graphs: a study of COVID-19 on Wikidata"
From the abstract:  "we catalog the rules describing relational and statistical COVID-19 epidemiological data and implement them in SPARQL, a query language for semantic databases. We demonstrate the efficiency of our methods to evaluate structured information, particularly COVID-19 knowledge in Wikidata, and consequently in collaborative ontologies and knowledge graphs, and we show the advantages and drawbacks of our proposed approach by comparing it to other methods for validation of linked web data."

"PNEL: Pointer Network based End-To-End Entity Linking over Knowledge Graphs"
From the abstract:  "Question Answering systems are generally modelled as a pipeline consisting of a sequence of steps. In such a pipeline, Entity Linking (EL) is often the first step. [...] In this work we present a novel approach to end-to-end EL by applying the popular Pointer Network model, which achieves competitive performance. We demonstrate this in our evaluation over three datasets on the Wikidata Knowledge Graph."

"A decade of writing on Wikipedia: A comparative study of three articles"
From the abstract:  "This article reports what observable writing activities characterized three Wikipedia articles, archive, design, and writing, over a three-year period from 2012–2014. It then compares these results to writing in these same three articles 10 years earlier, from 2002–2004. Results show that articles were longer and more referenced in 2012–2014. The most frequent written contributions in 2012–2014 were adding and deleting content, followed by vandalizing and reverting vandalism. Ten years earlier, content addition was likewise the most frequent activity, though vandalism and its removal were not found."