Wikipedia:Wikipedia Signpost/2021-10-31/Recent research

Automated welcome messages don't improve retention of new editors
A field experiment involving 57,084 newly created user accounts on French Wikipedia examined the effect of pre-formulated welcome messages, finding no statistically significant effect on retention (i.e. whether the user made an edit within one week) or on time spent contributing (estimated from edit timestamps). The researchers also compared two variants of the welcome message, with and without a "human mentorship" offer (i.e. a "contact me" phrase), and likewise found no significant difference between the two. The study was carefully pre-registered (a practice that has been gaining popularity in various research fields amid the replication crisis, but is still uncommon among the publications we usually cover here).

The authors note that their findings are "consistent with field experiments on other language Wikipedia communities ... however, [they] are in contrast to previous work that observed a positive effect of sending an invitation to the newcomer forum (the "Teahouse") on new editor retention in English Wikipedia [cf. our previous review]. The divergent findings could have multiple explanations. First, the Teahouse study used a slightly different intervention. [... But] The Teahouse study might also have been a false positive" because of a statistical problem involving multiple comparisons.
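For readers unfamiliar with the multiple comparisons problem mentioned above, the following minimal sketch illustrates the general idea. It assumes independent tests and uses illustrative numbers only; none of the values come from the Teahouse study or the French Wikipedia experiment, and the function names are hypothetical. When each test uses a significance level of 0.05, the chance of at least one false positive grows quickly with the number of tests (to roughly 64% for 20 tests), which is why corrections such as Bonferroni's tighten the per-test threshold.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across `num_tests` tests.

    Assumes the tests are independent and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


if __name__ == "__main__":
    alpha = 0.05  # illustrative significance level, not taken from either study
    for num_tests in (1, 5, 20):
        fwer = familywise_error_rate(alpha, num_tests)
        threshold = bonferroni_threshold(alpha, num_tests)
        print(f"{num_tests:2d} tests: FWER = {fwer:.3f}, "
              f"Bonferroni per-test threshold = {threshold:.4f}")
```

The Bonferroni correction is only one of several possible adjustments and is shown here because it is the simplest; the sketch makes no claim about which correction, if any, either study applied.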

In an accompanying blog post, the authors note that "at the Wikimedia Foundation, the Growth Team has been re-thinking the new editor experience through designs like the Newcomer homepage." This work has likewise involved randomized experiments to "learn more about what types of content is effective in driving activation and retention rates". While the full results on the effects of providing newcomers with an initial version of that homepage have not been published yet, informal reports indicate that it did not increase retention, which is consistent with the French Wikipedia experiment. However, the WMF Growth team subsequently found that adding "newcomer tasks" to the Newcomer homepage increased both retention and the number of edits made during the first week.

The French Wikipedia study was conducted by Cornell University's "Citizens and Tech" lab (originally founded as CivilServant), which has carried out various other studies on Wikimedia projects, including a related one about the "thanks" feature (see our previous review: "Receiving thanks increases retention of new editors, but not the time contributed to Wikipedia"). On Twitter, the lab's founder J. Nathan Matias connected the study to current debates on "how to govern big tech", arguing that "This kind of study is part of the answer:
 * open, transparent research designed with/for the public
 * platform-independent
 * public-interest problem-solving rather than PR / profit"

Briefly

 * See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.

Other recent publications
''Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.''

"BioGen: Generating Biography Summary under Table Guidance on Wikipedia"
From the abstract and paper: "we propose the task of table-guided abstractive biography summarization, which utilizes factual tables [i.e. infoboxes in biographical Wikipedia articles] to capture important information and then generate a summary of a biography. We first introduce the TaGS (Table-Guided Summarization) dataset, the first large-scale biography summarization dataset with tables. Next, we report some statistics about this dataset to validate the quality of the dataset. We also benchmark several commonly used summarization methods on TaGS and hope this will inspire more exciting methods. [...] In the real-world application which provided automatic biography summarization service, we will employ human editors to double-check the generated summary to ensure the correctness of content and grammar before publish[ing] the summary." The paper contains no examples of the generated summaries, and the code and corpus that it describes are not publicly available. However, the authors (five Beijing-based researchers) indicate that they might make the latter accessible upon request to researchers who agree to a copyright statement claiming that "The dataset is only for research purposes. Without permission, it may not be used for any commercial purposes and distributed to others."

Uncertainty avoidance, masculinity, and government policy affect Wikipedia contribution rates in Asian countries
From the abstract: "This research measures the participation of Asian countries in [the sharing economy] through their contributions to a global sharing economy platform - Wikipedia. This study uses language as a proxy for each country, which allows for a macro-scale comparison of factors related to participation in sharing economy. The study finds that in addition to expected factors related to the global digital divide and the country's development level, other factors such as country's size, dominant language, and cultural factors also play a significant role. Lower development levels, multi-ethnic (multi-language) and smaller populations can be a severe impediment to the development of the sharing economy. Government policy (China) or unique Internet structure (South Korea) can create significant outliers. Contributing to the sharing economy is also more common in countries located near the self-expression and rational-secular ends of the Inglehart-Welzel model, and the uncertainty avoidance, masculinity, and long-term orientation dimensions of the Hofstede model." (see also presentation slides)

"How do Editors Collaborate in the Farsi and Chinese Wikipedias?"
From the abstract: "... we expanded upon the known collaborative mechanisms on the English Wikipedia and demonstrated that the collaboration model is best captured through the interplay of these mechanisms. We annotated talk page conversations for types of power plays or vies for control over edits that are made to articles, to understand how policy and power play mechanisms in editors' discussions account for behavior in English (EN), Farsi (FA), and Chinese (ZH) language editions of Wikipedia. Our findings show that the same power plays used in EN exist in both FA and ZH but the frequency of their usage differs across the editions."

Predicting pageviews around elections and football matches
From the abstract: "... we propose a simple model that describes the dynamics around peaks of popularity by incorporating key features, i.e., the anticipatory growth and the decay of collective attention together with circadian rhythms. The proposed model allows us to develop a new method for predicting the future page view activity and for clustering time series. To validate our methodology, we collect a corpus of page view data from Wikipedia associated to a range of planned events, that are events which we know in advance will have a fixed date in the future, such as elections and sport events. [...] restricting to Wikipedia pages associated to association football, we observe that the specific realization of the event, in our case which team wins a match or the type of the match, has a significant effect on the response dynamics after the event."