Wikipedia talk:Wikipedia Signpost/2019-11-29/Special report

Well, I am a Wikipedia editor from Sri Lanka, and of course English is not a popular language, though it is one of the official languages of the country. It is regarded as a link language in the country. Surprised to see Portuguese-speaking Brazil in the top 15 list, even ahead of South Africa, for gaining popularity in English Wikipedia. Abishe (talk) 03:36, 30 November 2019 (UTC)
 * Fascinating. I'm surprised Nigeria did not make the list. -Indy beetle (talk) 08:04, 30 November 2019 (UTC)
 * This is all there is from the September file:

language	country	quant.	lower limit	upper limit
 * arwiki	Nigeria	5 to 99	1	10
 * enwiki	Nigeria	100 or more	11	20
 * enwiki	Nigeria	5 to 99	251	260
 * hawiki	Nigeria	5 to 99	1	10
 * jawiki	Nigeria	5 to 99	1	10
 * yowiki	Nigeria	5 to 99	1	10

yowiki is prob. yorba (sp?). Smallbones( smalltalk ) 14:12, 30 November 2019 (UTC)
 * See Yoruba language. MPS1992 (talk) 15:13, 30 November 2019 (UTC)


 * These are interesting figures, but they need to be viewed with caution, for a number of reasons, including the following. For a start, the records are of the assumed location, and not necessarily the nationality, of the contributor. So, eg, if a contributor is a foreign student in the USA, the UK or Australia, all of which have big foreign student populations, the records will not be a record of the nationality of the contributor. Secondly, the assumed location may not be correct. So, eg, if a contributor in the PRC is using a VPN that says that the contributor's location is the USA, then the record will show the USA, not the PRC, as the location, even though the true location is the PRC. Thirdly, the records are likely to be skewed not simply towards native speakers (as the article hypothesises), but also towards fluent non-native speakers. That would explain, eg, the high performance of the Netherlands and Germany (the latter of which has more fluent speakers of English than Australia), and possibly also Brazil, a country with a large population (about 2 1/2 times that of Germany) and mandatory learning of at least one foreign language for all 12 grades of compulsory schooling. Fourthly, one needs to be cautious about Canada, where only about 56% of the population speaks English as a mother tongue, and about 21% use French as their mother tongue. Fifthly, the figures indicate that some countries with smaller populations have disproportionately large numbers of contributors. So, eg, New Zealand and Ireland both have populations of about one fifth of that of Australia, but New Zealand has more than one fifth as many prolific contributors, and Ireland has more than one fifth as many small contributors. Similarly, the number of contributors from the UK (both prolific and small) is disproportionately large by comparison with the USA. Bahnfrend (talk) 09:46, 2 December 2019 (UTC)
 * Thanks for this - all datasets have limits or quirks of course, and it's important for everybody to understand the limits. Also, you're starting to get into some new hypotheses about the data (e.g. from foreign students, fluent non-native speakers, or VPNs), which is the start of really understanding the data. This dataset has some special quirks, designed right in to hide identities, and even the most basic measure - what is an "edit"? - is pretty vague, covering everything from changing a comma to a semicolon, to adding 1,000 words to an article. Beyond simple curiosity, I suppose my motivation has to do mostly with so-called "political bias". A lot of Americans seem to think WP has a liberal bias, but is that due to age, gender, or country of residence of editors? I do think that this dataset will be examined in detail, so getting out all the quirks, biases, and hypotheses now is a worthwhile exercise. Thanks. Smallbones( smalltalk ) 17:19, 2 December 2019 (UTC)