Wikipedia:Wikipedia Signpost/2024-04-25/Recent research

Survey dataset of over 100,000 Wikipedia readers and contributors
From the abstract: "The dataset focuses on Wikipedia users and contains information about demographic and socioeconomic characteristics of the respondents and their activity on Wikipedia. The data was collected using a questionnaire available online between June and July 2023. The link to the questionnaire was distributed via a banner published in 8 languages on the Wikipedia page. [...] The survey includes 200 questions about: what people were doing on Wikipedia before clicking the link to the questionnaire; how they use Wikipedia as readers ("professional" and "personal" uses); their opinion on the quality, the thematic coverage, the importance of the encyclopaedia; the making of Wikipedia (how they think it is made, if they have ever contributed and how); their social, sport, artistic and cultural activities, both online and offline; their socio-economic characteristics including political beliefs, and trust propensities. More than 200 000 people opened the questionnaire, 100 332 started to answer, and constitute our dataset, and 10 576 finished it."

This dataset paper doesn't contain any results from the survey itself. The communications around it (including the project's page on Meta-wiki at Research:Surveying readers and contributors to Wikipedia) also leave it unclear whether and when the authors or others plan to publish any analyses. Hence we are taking a quick look ourselves at some topline results below (note: these are taken directly from the "filtered" dataset published by the authors, without any weighting by language or other debiasing efforts). It is to be hoped that more use will be made of this data soon, especially since various questions appear to have been designed for compatibility with certain previous surveys.



These gender ratios are somewhat more balanced than, e.g., the figures from the Wikimedia Foundation's "Community Insights" surveys of recent years; however, those targeted a different population consisting exclusively of contributors. Still, the gender gap in this new survey data is also somewhat smaller than that found for English-language Wikipedia readers in a past survey by the Wikimedia Foundation (cf. below).



Unless we are dealing with a data anomaly here, this chart shows a general preponderance of left-of-center political positions among Wikipedia users, partly balanced out by a substantial share of far-right users (10 on a scale from 1 = left to 10 = right).

Briefly

 * The Wikimedia Foundation invites feedback on a whitepaper about "Wikimedia Research Best Practices Around Privacy" (until April 30); see also News and notes in this Signpost issue.
 * The Wikimedia Foundation's research department invites proposals (deadline: April 29) for the "Wiki Workshop Hall", a new feature of the annual Wiki Workshop online conference consisting of two 30-minute sessions "for Wikimedia researchers and Wikimedia movement members to connect with each other."
 * See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"Global Gender Differences in Wikipedia Readership"
From the abstract and introduction: "From a global online survey of 65,031 readers of Wikipedia and their corresponding reading logs, we present first evidence of gender differences in Wikipedia readership and how they manifest in records of user behavior. More specifically we report that (1) women are underrepresented among readers of Wikipedia, (2) women view fewer pages per reading session than men do, (3) men and women visit Wikipedia for similar reasons, and (4) men and women exhibit specific topical preferences" "Across 16 surveys, men represent approximately two-thirds of Wikipedia readers on any given day. Additionally, we observe that women view fewer pages per reading session than men do. However, we also find that on average, men and women visit Wikipedia for similar reasons. That is, the depth of knowledge that they seek, referred to as information need for the remainder of this paper, and their triggers for reading Wikipedia, referred to as motivations, are remarkably similar. Finally, men and women exhibit specific topical preferences. Readership of articles about sports, games, and mathematics is skewed towards men, while readership of articles about broadcasting, medicine, and entertainment is skewed towards women. We further observe evidence of self-focus bias[...], i.e. that women tend to read relatively more biographies of women than men do, whereas men tend to read relatively more biographies of men than women do." "closing content gaps is not a panacea as evidenced by prior research on Welsh Wikipedia, where a majority of the biographies are about women [...], a majority of Welsh speakers are women,[...] but readership is still heavily skewed towards men" See also the project page on Meta-wiki: m:Research:Characterizing_Wikipedia_Reader_Behaviour/Demographics_and_Wikipedia_use_cases and a subsequent literature review which formulated various potential explanations for the observed gender gap in Wikipedia readers.

"Hunters, busybodies and the knowledge network building associated with deprivation curiosity"
From the abstract:  "A recently developed historicophilosophical taxonomy of curious practice distinguishes between the collection of disparate, loosely connected pieces of information and the seeking of related, tightly connected pieces of information. With this taxonomy, we use a novel knowledge network building framework of curiosity to capture styles of curious information seeking in 149 participants as they explore Wikipedia for over 5 hours spanning 21 days. We create knowledge networks in which nodes consist of distinct concepts (unique Wikipedia pages) and edges represent the similarity between the content of Wikipedia pages. We quantify the tightness of each participants' knowledge networks using graph theoretical indices and use a generative model of network growth to explore mechanisms underlying the observed information seeking. We find that participants create knowledge networks with small-world and modular structure. Deprivation sensitivity, the tendency to seek information that eliminates knowledge gaps, is associated with the creation of relatively tight networks and a relatively greater tendency to return to previously-visited concepts. We further show that there is substantial within-person variability in knowledge network building over time and that building looser networks than usual is linked with higher than usual sensation seeking." See also an explanatory Twitter thread by one of the authors
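To make the notion of "tight" versus "loose" knowledge networks concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the abstract does not say which graph-theoretical indices were used, so the average clustering coefficient is chosen here purely as an illustrative assumption, and the page names, similarity scores and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def build_graph(similarity, threshold):
    """Build an undirected graph, keeping an edge when similarity >= threshold."""
    graph = {}
    for (u, v), score in similarity.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if score >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count how many pairs of this node's neighbours are themselves linked.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


# Hypothetical content-similarity scores between visited pages (not study data).
tight_session = {("Cat", "Lion"): 0.9, ("Lion", "Tiger"): 0.8, ("Cat", "Tiger"): 0.85}
loose_session = {("Cat", "Jazz"): 0.6, ("Cat", "Volcano"): 0.55, ("Jazz", "Volcano"): 0.1}

for name, session in [("tight", tight_session), ("loose", loose_session)]:
    print(name, average_clustering(build_graph(session, threshold=0.5)))
```

In this toy setup, the session whose pages are all similar to one another forms a closed triangle, while the session that hops between unrelated topics leaves the triangle open, which is the kind of contrast a tightness index is meant to capture.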

"Architectural styles of curiosity in global Wikipedia mobile app readership"
From the abstract: "[...] most curiosity research relies on small, Western convenience samples. Here, we expand an analysis of a laboratory study with 149 participants browsing Wikipedia to 482,760 readers using Wikipedia's mobile app in 14 languages from 50 countries or territories. By measuring the structure of knowledge networks constructed by readers weaving a thread through articles in Wikipedia, we provide the first replication of two distinctive architectural styles of curiosity: that of the busybody and of the hunter [in reference to the above paper involving some of the same authors ...] Finally, across languages and countries, we identify novel associations between the structure of knowledge networks and population-level indicators of spatial navigation, education, mood, well-being, and inequality." See also research project page on Meta-wiki: m:Research:Understanding Curious and Critical Readers

"Quantifying knowledge synchronization [between Wikipedia language versions] with the network-driven approach"
From the paper: "[...] we explore the dominant path of knowledge diffusion in the 21st century using Wikipedia, the largest communal dataset. We evaluate the similarity of shared knowledge between population groups, distinguished based on their language usage. When population groups are more engaged with each other, their knowledge structure is more similar, where engagement is indicated by socio-economic connections, such as cultural, linguistic, and historical features. Moreover, geographical proximity is no longer a critical requirement for knowledge dissemination. We used Wikipedia SQL dump of 59 different language editions on February 1, 2019. [...] Specifically, we used two collections of the Wikipedia dump: category membership link records (*-categorylinks.sql.gz) and interlanguage link records (*-langlinks.sql.gz). [...] From the linkage between Wikipedia pages and categories, we extracted a hierarchical knowledge network of each language edition. [...Based on these per-language structures] we constructed the similarity network from the pairwise knowledge structure similarity, where nodes represent the language of Wikipedia, and the link's weight indicates similarity between languages." "English is in the center and serves as a hub node, while intermediate hub languages such as Spanish, German, French, Russian, Portuguese, Chinese, and Dutch also function as cluster centroids"
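The construction of a language similarity network described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the excerpt does not name the similarity measure, so Jaccard similarity over sets of (parent, child) category edges is a stand-in, and the three tiny "knowledge structures" are invented toy data rather than anything extracted from the dumps.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (stand-in measure, an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(structures):
    """Weighted network: nodes are languages, edge weights are pairwise similarities."""
    return {
        (x, y): jaccard(structures[x], structures[y])
        for x, y in combinations(sorted(structures), 2)
    }


def strength(network):
    """Weighted degree of each language, i.e. the sum of its edge weights."""
    totals = {}
    for (x, y), weight in network.items():
        totals[x] = totals.get(x, 0.0) + weight
        totals[y] = totals.get(y, 0.0) + weight
    return totals


# Hypothetical category edges, assumed to be already aligned across languages
# (in the paper, alignment would rely on the interlanguage link records).
structures = {
    "en": {("Science", "Physics"), ("Science", "Biology"), ("Arts", "Music"), ("Arts", "Film")},
    "es": {("Science", "Physics"), ("Arts", "Music"), ("Arts", "Film")},
    "nl": {("Science", "Biology"), ("Science", "Physics")},
}

network = similarity_network(structures)
totals = strength(network)
hub = max(totals, key=totals.get)
print(network)
print("highest-strength node in this toy example:", hub)
```

Ranking nodes by strength is one simple way to look for hub-like languages in such a network; the toy data was built so that one edition overlaps most with the others and says nothing about the real editions.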

Despite teachers' skepticism, 86% of Estonian high school students use Wikipedia at least a couple of times per month (female students more often)
From the abstract:  "The article is based on a quantitative study in which 381 Estonian school children [9th and 12th grade students] participated in filling out an online survey. The questionnaire included both multiple-choice and open-ended questions. Findings: Statistical analyses and responses to open-ended questions showed that students often use Wikipedia as a primary source of information, but that their use of the site for learning tasks is guided by teachers’ attitudes and perceptions towards Wikipedia. Students perceive Wikipedia as a quick and convenient source of information but are uncertain about its reliability." From the "Results" section: "[...] 5% of the students surveyed use Wikipedia every day, 51% at least a couple of times a week and 30% a couple of times a month. To compare the groups, we conducted a t-test, which concluded that statistically significant differences were present across gender and grades. For the purpose of the calculations, we treated responses as numerical (rarely/not at all = 1, a few times a year = 2, a few times a month = 3, a few times a week = 4, every day = 5). For gender, the mean is 3.73 for women and 3.46 for men (p < 0.05). Thus, there is a statistically significant difference in the frequency of Wikipedia use between the two groups, with female students using Wikipedia more often than male students. [...] 24% of the students surveyed said that teachers had no objection to using Wikipedia, 3% said that teachers did not allow to use Wikipedia, 47% said that some teachers did and some did not and 10% said that they did not know. Teachers do not explicitly forbid students from using Wikipedia for learning tasks, but they do recommend that students use more trustworthy sources [...]"
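The numerical coding and group comparison described in the "Results" section can be illustrated with a short Python sketch. The coding scheme is the one quoted above; everything else is an assumption: the responses are made up, and since the paper only says "a t-test", the choice of Welch's unequal-variance variant here is ours. The sketch stops at the t statistic and degrees of freedom, because computing a p-value would require a statistics library beyond the standard one.

```python
from math import sqrt
from statistics import mean, variance

# Coding scheme as quoted from the paper.
LIKERT = {
    "rarely/not at all": 1,
    "a few times a year": 2,
    "a few times a month": 3,
    "a few times a week": 4,
    "every day": 5,
}


def encode(responses):
    """Map answer labels to their numerical codes."""
    return [LIKERT[r] for r in responses]


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical responses for two groups (NOT the survey data).
group_a = encode(["a few times a week"] * 6 + ["a few times a month"] * 3 + ["every day"])
group_b = encode(["a few times a week"] * 4 + ["a few times a month"] * 4 + ["a few times a year"] * 2)

t, df = welch_t(group_a, group_b)
print(f"mean A = {mean(group_a):.2f}, mean B = {mean(group_b):.2f}, t = {t:.2f}, df = {df:.1f}")
```

Treating ordinal answer categories as equally spaced numbers, as the paper does, is a common simplification; a rank-based test would be an alternative when that assumption seems too strong.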

"With or without Wikipedia? Integrating Wikipedia into the Teaching Process in Estonian General Education Schools"
From the abstract: "The study is based on semi-structured interviews with 49 teachers from 11 general education schools in Estonia. The results of the qualitative content analysis of the interviews indicate that teachers consider the use of Wikipedia to be a suitable for teaching, alongside other information sources and environments. However, teachers acknowledge some uncertainty and caution towards Wikipedia, as they do not consider it a very reliable teaching tool: an attitude largely inherited from the early days of Wikipedia. While teachers themselves are active and frequent Wikipedia users, and allow students to search for information, they do not assign Wikipedia-based text-creation tasks to students."