Wikipedia talk:Wikipedia Signpost/2015-05-27/Recent research


 * Thanks for the coverage of this research. The gender research is most useful, although the centrality and gendered language results were already known, if I remember correctly. All the best: Rich Farmbrough, 22:15, 29 May 2015 (UTC).


 * With regard to coverage bias: I would have to wonder if it's not more likely that the sources used as comparisons underrepresent women, than that Wikipedia overrepresents them. With regard to language bias, historically notable females have had some family-related reason to become notable, be it family connections, early widowhood, or simply the identities of the men they married. I'm less concerned, then, with language bias on articles for women who were born 50+ years ago; it'd be interesting to see if the detected language bias still exists for newer biographies. Powers T 23:39, 29 May 2015 (UTC)
 * Found this research article on popular pages' content quality: Wasted effort and missed opportunities. Is there a WikiProject or TaskForce addressing improvements of the most popular pages based on the Top 25 Report or other similar rating? — Preceding unsigned comment added by JayaJune (talk • contribs) 17:58, 31 May 2015 (UTC)
 * That paper was already covered in last month's issue, see also the discussion on the talk page (including one of the researchers). See here on how to alert us about new papers that should be covered. Regards, Tbayer (WMF) (talk) 07:05, 2 June 2015 (UTC)


 * I'm skeptical about the conclusion that women are "slightly overrepresented". How do we know that they are not instead underrepresented in these databases? One of these databases is crowdsourced, which means it would certainly be subject to the same systemic biases as Wikipedia, and the other is a database generated from a book by a single author with a history of dubious scholarship. Gamaliel (talk) 19:26, 31 May 2015 (UTC)
 * I don't know of a reason why "crowdsourced" databases should be inherently more biased than those compiled by professionals, but of course you are asking a very important question about the authors' choice of gold standard. Here is what they wrote about it:
 * It is important to understand that a biased reference dataset will obviously impact our results. If, for example, our reference dataset is already biased towards men (i.e., it covers only extremely famous women but also less famous men) than the proportion of women who are represented on Wikipedia would probably be higher than the proportion of men. To address this issue we analyze the coverage using several independent reference datasets (Jaccard coefficient between the three datasets ranges from 0.0 to 0.12 for different language editions), assuming that each of them will have a different bias and seeking patterns that exist across all three datasets.
 * While I don't expect this paper will be the last word on gender-related content bias of Wikipedia, it's a lot more solid than many other claims that have been made about the topic, especially in the media. It is also consistent with a recent blog post by Magnus Manske, who compared Wikidata with VIAF and ODNB (finding both more "sexist" than Wikidata) and concluded that
 * "Strong gender bias towards men exists in the number of biographical items on Wikipedia and Wikidata, however, this bias appears to be to a large degree due to historical and/or cultural bias, rather than generated by Wikimedians. Since our projects are not primary sources, we are restricted to material gathered by others, and so reflect their consistent bias."
 * On the other hand, the 2011 "WP:Clubhouse" paper found evidence that "female" films are less well covered on WP than "male" films, and a 2011 paper by Joseph Reagle and Lauren Rhue concluded that "Wikipedia provides better coverage and longer articles, and that it typically has more articles on women than Britannica in absolute terms, but we also find that Wikipedia articles on women are more likely to be missing than are articles on men relative to Britannica".
 * Also, Max was too modest to mention his ongoing WIGI (Wikipedia Gender Index) project in his review. While - AIUI - it won't examine coverage bias directly, it will surely yield a lot of data that should make it much easier for others to look at possible evidence for such bias.
 * Regards, Tbayer (WMF) (talk) 07:05, 2 June 2015 (UTC)


 * Piotrus, about the paper saying that user talk page communication is associated with high-quality contributions: Does that analysis still hold true if you exclude current and former FAC coordinators from the dataset? Or is it only true when people whose "job" it is to edit FACs are included? WhatamIdoing (talk) 22:17, 31 May 2015 (UTC)
 * I don't believe the article differentiates between coordinators and regular editors. --Piotr Konieczny aka Prokonsul Piotrus | reply here 05:57, 1 June 2015 (UTC)
 * I think it might get different (less significant, maybe still positive) results if the coordinators were excluded. WhatamIdoing (talk) 20:29, 6 June 2015 (UTC)