User:MargaretRDonald/sandbox/Gender-age bias

Gender bias
Hill and Shaw (2013) critique a widely quoted survey of Wikipedians. "Opt-in surveys are the most widespread method used to study participation in online communities, but produce biased results in the absence of adjustments for non-response. A 2008 survey conducted by the Wikimedia Foundation and United Nations University at Maastricht is the source of a frequently cited statistic that less than 13% of Wikipedia contributors are female. However, the same study suggested that only 39.9% of Wikipedia readers in the US were female – a finding contradicted by a representative survey of American adults by the Pew Research Center conducted less than two months later. Combining these two datasets through an application and extension of a propensity score estimation technique used to model survey non-response bias, we construct revised estimates, contingent on explicit assumptions, for several of the Wikimedia Foundation and United Nations University at Maastricht claims about Wikipedia editors. We estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%)." The online references given in Hill and Shaw can no longer be found at the addresses given. However, the methodology for the Pew survey seems to be described here.
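The "27.5% higher" and "26.8% higher" figures in the quotation are relative increases, not percentage-point differences (which are 4.9 and 3.4 points respectively). A minimal sketch of the arithmetic, using only the four percentages quoted above; the function name is my own and is not taken from the paper:

```python
def relative_increase(adjusted, original):
    """Percentage by which `adjusted` exceeds `original`, relative to `original`."""
    return 100.0 * (adjusted - original) / original


# Percentages of female editors as quoted from Hill and Shaw (2013).
us_adults = relative_increase(22.7, 17.8)    # adjusted vs. original, US adult editors
all_editors = relative_increase(16.1, 12.7)  # adjusted vs. original, all editors

print(f"US adult editors: {us_adults:.1f}% higher")
print(f"All editors: {all_editors:.1f}% higher")
```

Rounded to one decimal place, these reproduce the 27.5% and 26.8% given in the abstract.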

For commentary by Wikimedia on the 2008 Wikimedia survey, see Research:UNU-MERIT Wikipedia survey.

Age distribution of Wikipedians
The Wikipedia report of 2011 states that, of those who responded, 13% were younger than 17 (18?), 14% were aged between 18 and 21, 26% between 22 and 29, 19% between 30 and 39, and 28% were 40 or over. This site gives access to the questionnaire, and states: "Every registered user (editor) will see a notification once to participate in the survey. Anyone may click on this link and participate in the survey. We are also looking into the possibility of providing a link to take the survey if respondents miss the notification the first time." That is, this survey is an opt-in survey, with all the known problems of non-representativeness.

The numbers given by the 2009 Wikipedia survey results for age distributions look in part as if they are a function of internet access in 2008-2009 (school, university and work). Incorrect in 2009, they are not likely to be true for 2020. Given that internet access in first-world countries is now almost ubiquitous, I would expect the age distribution to have changed for the US and for most of the developed world.

A figure from the Hill and Shaw (2013) paper gives adjusted age estimates as follows:


 * 1) The median age in the 2009 Wikipedia survey data is 23, but 27 in the adjusted estimates;
 * 2) The 75th percentile changes from 34 to 47 under adjustment. That is, the Wikipedia report indicated that just 25% of Wikipedians were over 34, while the adjusted estimates for 2008-2009 indicate that 25% of Wikipedians were over 47.

Comment:

 * 1) As an opt-in study this report suffers all the known problems of attempting to summarise non-representative samples. Its summaries certainly represent those who opted in, but there is no reason to believe that they represent the community being discussed. It is generally the case that the populations who "opt in" differ in many significant respects from those who "opt out".
 * 2) If the number of respondents in the Wikipedia survey were 5000 and they had been randomly sampled, then the original result for the proportion of women editors would be 17.8% with 95% confidence limits (16.7%, 18.9%), while the adjusted estimate would be 22.7% with confidence limits (21.5%, 23.9%).
 * 3) With respect to the reported age quantiles, confidence limits for quantiles can (and should) be calculated, but doing so requires the data, or assumptions about the distribution of the data.
 * 4) For a very famous poll with an extraordinarily high number of respondents which got the results wrong, see The Literary Digest presidential poll.
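The confidence limits in comment 2 can be reproduced with the usual normal-approximation (Wald) interval for a proportion. This is a sketch under the same hypothetical assumptions stated there (5000 respondents, simple random sampling); the function name and the choice of the Wald interval are mine, not something taken from the survey report or from Hill and Shaw:

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    p: observed proportion (0 to 1)
    n: sample size, assumed to be a simple random sample
    z: standard normal quantile (1.96 gives roughly 95% coverage)
    """
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


n = 5000  # hypothetical number of respondents, as assumed in comment 2

for label, p in [("original", 0.178), ("adjusted", 0.227)]:
    low, high = wald_interval(p, n)
    print(f"{label}: {100 * p:.1f}% ({100 * low:.1f}%, {100 * high:.1f}%)")
```

These limits describe sampling error only. They say nothing about the opt-in bias discussed in comment 1, which a larger sample does not reduce.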