Wikipedia:Wikipedia Signpost/2021-04-25/Recent research

How aquatic scientists improve Wikipedia
An article in Limnology and Oceanography Letters reports on the activities of WikiProject Limnology and Oceanography (WP L&O), founded in 2018 by "a group of early career aquatic scientists" concerned about "inadequate representation of aquatic information on Wikipedia": "Unfortunately, limnology‐ and oceanography‐related Wikipedia articles are in poor condition. For example, as of 05 May 2020, the English Wikipedia article for Hypolimnion was only six sentences long and contained no references yet had been viewed over 11,000 times per year on average since 2015. [...] Through recruiting new editors and hosting five “edit‐a‐thons” (focused editing time), WP L&O has added over 50,000 words to limnology‐ and oceanography‐related Wikipedia articles; however, more than 60% of the articles assessed for the project remain in poor condition and lack references, have poor structure, or are missing crucial information." The open access paper also contains a helpful overview of "ways to promote more equitable dissemination of aquatic scientific information through Wikipedia".

See also an earlier paper involving some of the same authors: "Ripples on the web: Spreading lake information via Wikipedia"

Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"The Quality and Readability of English Wikipedia Anatomy Articles"
From the abstract:  "Forty anatomy articles were sampled from English Wikipedia and assessed quantitatively and qualitatively. Quantitatively, each article's edit history was analyzed by Wikipedia X-tools, references and media were counted manually, and two readability indices were used to evaluate article readability. This analysis revealed that each article was updated 8.3 ± 6.8 times per month, and referenced with 33.5 ± 24.3 sources, such as journal articles and textbooks. Each article contained on average 14.0 ± 7.6 media items. [...] the articles had low readability and were more appropriate for college students and above. Qualitatively, the sampled articles were evaluated by experts using a modified DISCERN survey. According to the modified DISCERN, 13 articles (32.5%), 24 articles (60%), 3 articles (7.5%), were rated as "good," "moderate," and "poor," respectively. There were positive correlations between the DISCERN score and the number of edits (r = 0.537), number of editors (r = 0.560), and article length (r = 0.536). Strengths reported by the panel included completeness and coverage in 11 articles (27.5%), anatomical details in 10 articles (25%), and clinical details in 5 articles (12.5%). The panel also noted areas which could be improved, such as providing missing information in 28 articles (70%), inaccuracies in 10 articles (25%), and lack or poor use of images in 17 articles (42.5%)."

"Wikipedia Edit-a-thons as Sites of Public Pedagogy"
From the abstract: "Through the analysis of interviews with 13 edit-a-thon facilitators [...] we find motivations for running edit-a-thons extend far beyond adding content and editors. In this paper, we uncover how a range of personal and institutional values inspire these event facilitators toward fulfilling broader goals including fostering information literacy and establishing community relationships outside of Wikipedia. Along with reporting motivations, values, and goals, we also describe strategies facilitators adopt in their practice. Next, we discuss challenges faced by facilitators as they organize edit-a-thons. [...] Finally, we suggest new ways in which edit-a-thons, as well as similar peer production events and communities, can be understood, studied, and evaluated." (The paper also cites two earlier reports by the Wikimedia Foundation related to the topic.)

"How trust in Wikipedia evolves: a survey of students aged 11 to 25"
From the abstract:  "we analyse the answers given by 841 young people [in France], aged eleven to twenty-five, to a questionnaire. To our knowledge, this is the largest study ever published on the topic. It focuses on (1) the perception young people have of Wikipedia; (2) the influence teachers and peers have on the young person’s own opinions; and (3) the variation of trends according to the education level. [...] Results. Trust in Wikipedia depends on the type of information seeking tasks and on the education level. There are contrasting social judgments of Wikipedia. Students build a representation of a teacher’s expectations on the nature of the sources that they can use and hence the documentary acceptability of Wikipedia. The average trust attributed to Wikipedia for academic tasks could be induced by the tension between the negative academic reputation of the encyclopedia and the mostly positive experience of its credibility." From the "Results and findings" section:  "When young people consider their past experience on Wikipedia use, they mostly provide positive judgments about the quality of information available on Wikipedia, whatever their education level. The major point of view is that most often the collaborative encyclopaedia enables them to access information they qualify as useful (93%), understandable (92%), and accurate (92.7%). Indeed, in open questions about the perception of Wikipedia, only one of the 841 respondents reported that he found mistakes in Wikipedia. [...] the higher the education level, the more participants attribute to their teachers a negative image of the collaborative encyclopaedia [...]. While only 31.7% of collège pupils feel that their teachers’ opinion about Wikipedia is low (or very bad), 67.2% of lycée pupils, 68.2% of bachelor’s students and 76.7% of master’s students answer this way." 
See also our review of a related earlier paper involving one of the authors: "Wikipedia: An opportunity to rethink the links between sources’ credibility, trust, and authority"

"How do academic topics shift across altmetric sources? A case study of the research area of Big Data"
From the paper (preprint version):  "Taking the research area of Big Data as a case study, we propose an approach for exploring how academic topics shift through the interactions among audiences across different altmetric sources. [...] we attempt to investigate the semantic similarity between topics from publications [about Big Data] and those from the discussions of audiences mentioning and disseminating [these] publications across different altmetric sources, including Blogs, News, Policy documents, Wikipedia and Twitter. [... To obtain topics, considering] the short titles of Wikipedia articles, we choose to use the first sentence in the summary which is a condensed explanation of an event, and is equivalent to the titles of blogs, news and policy documents in part. [...] By comparison [with blog and news publications], policy documents and Wikipedia entries have a more limited focus on Big Data publications with fewer publications mentioned. Specifically, the high-frequency terms in these two groups suggest a quite a different concern of topics on these platforms. Wikipedia entries are more oriented towards the research and application of technologies on internet and web, while policy documents have an obvious orientation to more general issues related to social progress like “EU law” and “Climate change”. [... Based on a cluster analysis of all terms,] publications from Wikipedia are more oriented towards academic, technical and more theoretical topics (e.g., “university”, “cloud computing”, or “theory”)."

IMGpedia: Enriching Wikimedia Commons images with metadata from DBpedia and Wikidata
The IMGpedia dataset "brings together descriptors of the visual content of 15 million images [from Wikimedia Commons], 450 million visual-similarity relations between those images, links to image metadata from DBpedia Commons, and links to the DBpedia resources associated with individual images [as well as links to Wikidata, in a later version]. It allows people to perform visuo-semantic queries over the images." It is the topic of several academic publications:

A 2017 conference paper and thesis, which, as summarized in the abstract, "defines the process of analysis of 15 million images from Wikimedia Commons in order to build the knowledgebase. First, the visual descriptors must be calculated; later, we propose an efficient strategy to compute similarity links between them; afterwards, we define a method for obtaining relations between the images and DBpedia resources of the Wikipedia articles that use them; and finally, the data is published as an RDF graph ready to be queried through the SPARQL endpoint service mounted for IMGpedia."

A 2018 conference paper titled "Querying Wikimedia Images Using Wikidata Facts" reports on enhancements:  "IMGpedia [...] is a knowledge-base that incorporates similarity relations between the images based on visual descriptors, as well as links to the resources of Wikidata and DBpedia that relate to the image. Using the IMGpedia SPARQL endpoint, it is then possible to perform visuo-semantic queries, combining the semantic facts extracted from the external resources and the similarity relations of the images. This paper presents a new web interface to browse and explore the dataset of IMGpedia in a more friendly manner, as well as new visuo-semantic queries that can be answered using 6 million recently added links from IMGpedia to Wikidata ..."

The IMGpedia dataset has also been published, but appears to have been removed since.
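To make the idea of "visual-similarity relations" more concrete, here is a minimal sketch of how similarity links between images could be derived from descriptor vectors. It is not the papers' method: the abstract mentions an "efficient strategy" without detailing it here, whereas this example uses a brute-force nearest-neighbour search with Euclidean distance. The file names, the two-dimensional descriptors and the function names are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_links(descriptors, k=1):
    """Map each image to its k nearest neighbours by descriptor distance.

    Brute force (every pair is compared), which is an assumption made for
    clarity and would not scale to millions of images.
    """
    links = {}
    for image, vector in descriptors.items():
        candidates = [
            (euclidean(vector, other_vector), other)
            for other, other_vector in descriptors.items()
            if other != image
        ]
        candidates.sort()
        links[image] = [other for _, other in candidates[:k]]
    return links


# Hypothetical toy descriptors; real visual descriptors have many more dimensions.
descriptors = {
    "a.jpg": [0.0, 0.0],
    "b.jpg": [0.1, 0.0],
    "c.jpg": [5.0, 5.0],
    "d.jpg": [5.0, 5.2],
}
links = similarity_links(descriptors, k=1)
print(links)
```

In this toy data, the two images near the origin link to each other, as do the two images near (5, 5).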

See also the "Structured data on Commons" project, a separate effort.

"Inauthentic Editing: Changing Wikipedia to Win Elections and Influence People"
This is the topic of two blog posts published by the Stanford Internet Observatory earlier this year (on the occasion of Wikipedia's 20th birthday): "Building on the work of Wikipedia editors catching politically motivated editing [citing a recent Signpost report by User:Smallbones], we investigate one case of 'inauthentic editing'—where individuals targeted the Wikipedia pages of two contending politicians during the 2020 British Columbia (BC) general provincial election. Through this deep-dive case study, we also show a process for investigating Wikipedia, and identify Wikipedia’s strengths and weaknesses in dealing with inauthentic edits." The studied example involved conflict-of-interest editing in the article about Canadian politician John Horgan.

In the second post, the student authors provide a detailed overview of how to investigate such cases based on Wikipedia's publicly available data and tools. As summarized by one of the authors: "we explore the key markers of suspicious pages—such as a high edit-to-pageview ratio, or a high number of edits over time from a single otherwise-inactive user. We also include a free program that anyone can use to assess hundreds of pages at a time [called "Wikipedia Scanner" and published in the form of a Jupyter notebook for use on Google's Colaboratory platform]."
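As an illustration of the first marker mentioned above, the following sketch ranks pages by their edit-to-pageview ratio. It is not the authors' "Wikipedia Scanner" notebook: the `PageStats` structure, the threshold of 0.01 and the sample figures are hypothetical, and in practice the counts would have to be gathered from Wikipedia's public edit histories and pageview statistics.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    title: str
    edits: int      # edits within the observation window
    pageviews: int  # pageviews within the same window


def edit_to_pageview_ratio(page):
    """Edits per pageview; an edited page with zero views counts as infinite."""
    if page.pageviews == 0:
        return float("inf") if page.edits > 0 else 0.0
    return page.edits / page.pageviews


def flag_suspicious(pages, threshold=0.01):
    """Return titles whose ratio exceeds the threshold, highest ratio first.

    The threshold is an arbitrary assumption, not a value from the blog posts.
    """
    scored = [(edit_to_pageview_ratio(p), p.title) for p in pages]
    flagged = [(ratio, title) for ratio, title in scored if ratio > threshold]
    flagged.sort(reverse=True)
    return [title for _, title in flagged]


# Hypothetical sample data for three pages.
pages = [
    PageStats("Example politician A", edits=120, pageviews=3000),
    PageStats("Example politician B", edits=15, pageviews=90000),
    PageStats("Example politician C", edits=40, pageviews=2000),
]
print(flag_suspicious(pages))
```

A high ratio is only a starting point for closer inspection of a page's history; it does not by itself show that editing was inauthentic.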