Wikipedia:Articles for deletion/List of countries by English-speaking population (2nd nomination)


 * The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review).  No further edits should be made to this page.

The result was Keep. But please work to improve the current article to meet our standards (as discussed here). --MZMcBride (talk) 01:20, 18 January 2009 (UTC)

List of countries by English-speaking population

The article contains information of two kinds: information that is already presented in more relevant articles, and information that violates WP:OR, is unsourced, and is unlikely to be possible to source. All the article consists of is a table, containing:
 * 1) The number of English speakers in Anglophone countries. This information is based on census results and is no doubt correct, at least as correct as could reasonably be expected. However, it is already listed in English language, where we find a table of Anglophone countries and the number of English speakers in each. Having that information in this article only copies what we already have, and the page could perhaps be redirected to English language.
 * 2) The number of English speakers in other countries. These numbers, if I may be frank, constitute an exercise in original research. For European countries, the numbers build on a language survey in the Eurobarometer. To be more precise, the numbers are claimed to build on that source, but the source does not support them. Nowhere in the Eurobarometer is the number of English speakers in European countries presented. All that is presented in the source is a series of restricted surveys among adult respondents in EU countries, with a sample of around 1000 respondents per country. That is all. After that, some Wikipedia editor has taken the percentages found in the surveys, calculated those percentages on the whole population and claimed that the resulting number equals the number of English speakers per country! This is a rather obvious breach of WP:OR, not to mention that it is wrong. First, the respondents in the study were all adults, so calculating the returns on the whole population is incorrect. It is far more likely that 100 Swedes aged 25-35 will speak English than that 100 Swedes aged 5-10 will, so the percentage in the survey is almost guaranteed to be higher than in the population as a whole. This limitation has been disregarded in this WP article. Even if it were not, using the results of a restricted survey to try to claim the total number of English speakers based on one's own calculations would quite obviously not be a proper use of sources. The Eurobarometer cannot be used as a source to claim the number of English speakers in any country, and that was never its intention. For the rest of the world, there are two options. For some countries, there are no sources at all, only some editors' own estimates. For most other countries, the source is a book, The Cambridge Encyclopedia of the English Language (Crystal, 2005). This is a great book and David Crystal is an acclaimed academic, but once again, we are dealing with very rough estimates and the Wikipedia editor has taken Crystal too literally. I'm also worried that ripping a few pages of his book may constitute a copyright infringement.
All in all, we have no possible way to know the number of English speakers in most countries, as censuses do not record them. The vast majority of countries in the world have not even been the subject of a survey such as the Eurobarometer, and not even the Eurobarometer measures what is claimed in this article, nor did it intend to. As the rather faulty table is all the content in the article and the only content that can be properly sourced is already found elsewhere, I suggest that this page should redirect to English language. JdeJ (talk) 14:14, 13 January 2009 (UTC)
 * Keep This is an appropriate spinoff of English language, particularly since there is far more that can be said about any language besides the number of speakers. The parent article has a table regarding the seven nations with the largest population (which is about right for space limitations), and a link to this article for people who want to know about more than 7 countries.  I can't tell whether the proposal is that we make the article English language much larger, or whether we just don't say anything about nations other than the U.S., Britain, Australia, Canada, India, Nigeria, and the Philippines; but neither of those would be a good idea.  The only objection that I think might be valid is the danger of doing rankings (i.e., Israel #78, Japan #80), but the sortable table would make "rank" unnecessary. There are serious problems with this article, as the nominator (JdeJ) has found by taking the time to look at the cited sources.  Mandsford (talk) 15:20, 13 January 2009 (UTC)
 * Comment The table in English language could of course be lengthened with those countries for which we have sourced data, but 90% of the data for countries in this article is only guesses and even if we decide to keep the article, they would have to be removed as violations of WP:OR for the reasons given above. The data in the table is not supported by the sources; it is the personal guesses of a Wikipedia editor. As the article does not contain anything else, that is the reason I propose moving the countries for which we have census results into the already existing table in English language and redirecting the article there. JdeJ (talk) 16:11, 13 January 2009 (UTC)
 * The article at English language would be TOO long if it contained a detailed table of so many countries with populations that speak the language. It belongs on its own article although the sources and content itself should be improved. BritishWatcher (talk) 10:54, 16 January 2009 (UTC)


 * Weak Keep if editorial improvements can be done. Both articles bizarrely include Nigerian pidgin as English, when it is about as far from English as Dutch. 'I no know wetin u dey yarn' means 'I don't know what you are talking about' per the article on the pidgin (which is not a variety or dialect of English). This falsely elevates Nigeria to have more English speakers than the UK. Both articles also overstate the number of English speakers in India. These issues hit at the verifiability requirement and original research in the form of WP:SYNTHESIS. The topic seems notable and deserving of an article, since there have been many scholarly and popular works on the prevalence of the English language in various countries. Edison (talk) 17:06, 13 January 2009 (UTC)
 * Keep - claims of OR are unfounded. Some uncited numbers might be removed if they can't be cited, not a big deal.  But there's no problem that requires deletion here. It is certainly not only the anglophone countries that have good census data (for instance, native speakers in Russia is hard census data).  Merging to English language would immediately necessitate spinning back out - hence that's unviable. Wily D  18:36, 13 January 2009 (UTC)
 * (a) The last hard census data on Russia is dated 1989, and a significant share of legal aliens (i.e. embassy staff) were off limits. Later censuses aren't hard in any way. (b) I won't be surprised if the U.S. Embassy in Moscow employs more U.S. nationals than the number stated in the table for the whole country. Check the expat newspapers for membership stats, these may give a better estimate. Sure it halved after the economy collapsed (again :)) but it's still in five digits for Moscow alone. NVO (talk) 20:40, 13 January 2009 (UTC)
 * With all due respect, the above comment is not factual. Yes, censuses in some countries record the number of native English speakers and that kind of information is fine. We could possibly have a list about that. The problem here is the number of second language speakers, such as those reported in the Eurobarometer. I would argue that anyone seriously claiming that the way the article currently uses them isn't OR simply has not understood what OR means. Taking a survey with a small sample (with only adult respondents) and using the findings of that survey to calculate one's own number of English speakers is OR, we don't even have to argue about that. The data in the table that makes up all of the article is not found in the sources, it is produced by editors trying to calculate the number of English speakers by using their own interpretation of the source. The fact that their interpretation is inaccurate is only moderately important, it would be OR even if they interpreted it correctly. We present sources, we don't interpret them. That is OR in a nutshell and that is why most of the data in the article is unsuitable.JdeJ (talk) 18:56, 13 January 2009 (UTC)


 * Keep Strong keep. None of the nom's arguments (OR, copyright infringement, no survey data) hold water. (I lost a lengthy rebuttal to connection problems, so will have to respond to these points in pieces.) As Edison said, this is an important topic, with reliable sources, and the list should be kept. -- Avenue (talk) 19:21, 13 January 2009 (UTC)
 * As you have admitted yourself now, "Some of the recent calculations in the list are badly wrong and should be deleted". Given that, I find it a bit strange that you first voted "keep" and claimed that none of my arguments hold water, then admitted that some of the data is bad, but instead of weakening your "keep" when you discovered that some of the data is "badly wrong and should be deleted", you strengthened it to strong. JdeJ (talk) 21:40, 13 January 2009 (UTC)
 * My original response (which I lost) acknowledged that some of the information was badly wrong, and should be deleted. I apologise if my making comments in batches has made them hard to follow. But I stand by my original view that none of your arguments are valid reasons for deletion of the whole list. The incorrect calculations can perhaps be viewed as OR. In any case they are wrong, and should be deleted, as we agree. But you also claimed that the original calculations from the Eurobarometer survey are OR. They are not, and that is why I believe your OR argument for deletion of the list is invalid. -- Avenue (talk) 01:03, 14 January 2009 (UTC)
 * Well, we have different views on OR and it might be an idea to take that argument to the OR page and get comments from others. I maintain that taking percentages from one population and calculating them on another population is OR.JdeJ (talk) 08:23, 14 January 2009 (UTC)
 * From that comment, it seems you don't have the faintest idea what my point is. I am certainly not arguing for "taking percentages from one population and calculating them on another population". See my edit here where I explain that the figures being multiplied referred to the same population, and came from the same report. We might have the same views on OR after all. -- Avenue (talk) 07:42, 16 January 2009 (UTC)
 * With all due respect, from that comment, it seems you don't quite understand the vocabulary in statistics. The population in the Eurobarometer is the sample, nothing else. It also presents the total population over 15 in different countries, but its survey is done on a sample of that population, not on that population. JdeJ (talk) 08:14, 16 January 2009 (UTC)
 * The Eurobarometer percentages are not simply based on sample counts. They are weighted survey estimates of the true percentage in the relevant population. The 15+ population figures shown in the report were used for weighting the survey data (see the "Technical specifications" annex to the report). This means that the survey's analysts considered that the population covered by the survey was effectively the same as that estimated by the 15+ population figures. That is their judgment, not mine. -- Avenue (talk) 10:13, 16 January 2009 (UTC)
 * On copyright, we do not copy all of Crystal's list (which is on just one page of his book). More importantly, we combine it with data from several sources to produce a more comprehensive list. I believe our use of his figures therefore does not constitute copyright infringement. The same is true for the Eurobarometer and Ethnologue figures (and for figures from many other sources on other Wikipedia pages).
 * I'll be traveling and offline for the next couple of days, so apologies in advance if I'm slow to respond to any further queries. -- Avenue (talk) 03:24, 14 January 2009 (UTC)


 * Comment - As it was claimed above that there is no OR in the article, here are just a few of the claims the article makes:

As can be seen, the Eurobarometer is used as the source for many of the figures. This Eurobarometer can be found here. Can anyone find 6,600,000 English speakers in Sweden? 10,000,000 in Spain? 14,000,000 in Italy? Can anyone find the number of English speakers in a single country in the Eurobarometer? I doubt it, since it doesn't make any such claim. Yet this article happily uses it as a source for claiming those numbers, and we even have a user here trying to argue that that is not OR. And what about these figures for Lebanon, Honduras and other countries? No sources at all, so where do they come from? Once again, while it can be argued to keep or redirect the article, I fail to understand how anyone could claim that there is no OR here. If inventing numbers of speakers for countries isn't OR, then what is? JdeJ (talk) 19:27, 13 January 2009 (UTC)
 * (ec) First, there is nothing wrong with quoting figures from sample surveys. One fallacy you fall into here is a false dichotomy between sample surveys and censuses. Both are inaccurate to some degree. The literature on post-enumeration surveys shows that practically all census figures are not completely accurate. In the context of our list, the varying concepts used in the census questions for English speaking ability give me greater concern than the calculations on the Eurobarometer data.
 * On to OR. According to WP:OR, "The "No original research" rule does not forbid routine calculations" including calculating percentages. Multiplying the Eurobarometer English speaking percentages by the 15+ population figures given in the Eurobarometer report is a routine calculation, and does not constitute OR. For example, the figure for Sweden of 6,600,000 English speakers was calculated as 7,376,680 * 89% = 6,565,245, then rounded to two significant figures (to match the precision of the percentages reported by Eurobarometer).
 * On the other hand, the more recent recalculation of percentages using total population figures, and the display of these to two decimal places, is horribly innumerate. These percentages should be deleted from the list immediately. -- Avenue (talk) 21:02, 13 January 2009 (UTC)
 * There is nothing wrong with quoting figures from sample surveys; the problem arises when you go on to produce figures. What you have done here is to take the result from a survey on one population (the respondents in the sample) and then calculate that percentage on another population (the total population over 15). A routine calculation is counting the percentage found in a study on the population used in that study, but taking the percentage from one study and using it to make calculations on another population is no longer a "routine calculation", it is OR. JdeJ (talk) 21:33, 13 January 2009 (UTC)
 * All right, let me be more explicit. We agree there is nothing wrong with quoting figures from sample surveys. There is also nothing wrong with showing figures derived by routine calculations from the survey results, as stated in our original research policy. This includes adding two percentages, calculating percentages by dividing the estimated number of people with some characteristic by the total number of people in that population (if the two numbers are from the same report), and, in my view, calculating the number of people with some characteristic by multiplying the percentage having that characteristic by the total number of people in the population (again, if the two numbers are sourced from the same report). If I understand you right, you feel that the last calculation is OR, but the second is not. I do not see any real difference between them, and I believe neither is OR. -- Avenue (talk) 01:33, 14 January 2009 (UTC)
 * The two calculations in question can be written as:
 * p1 = n1 / nT
 * and
 * n1 = p1 * nT
 * in case this is getting lost in my verbiage. -- Avenue (talk) 01:41, 14 January 2009 (UTC)
 * About the unsourced numbers, I suspect they are from Ethnologue. I think that citation got dropped improperly in early 2006. I'll add it back. -- Avenue (talk) 00:46, 14 January 2009 (UTC)
 * Oops, I see it's still there. Inline citations would be nice though. -- Avenue (talk) 00:50, 14 January 2009 (UTC)
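
For readers following the arithmetic in this thread, the Sweden example and the two inverse formulas can be reproduced with a short Python sketch. The 15+ population (7,376,680) and the 89% figure are the ones quoted above. The function names are hypothetical, chosen only for illustration, and the rounding helper is just one possible way of rounding to two significant figures, not necessarily how the list's editors did it.

```python
from math import floor, log10


def round_sig(n: int, sig: int = 2) -> int:
    """Round a positive integer to `sig` significant figures."""
    magnitude = floor(log10(n))           # e.g. 6 for 6,565,245
    return round(n, sig - 1 - magnitude)  # negative ndigits rounds to tens, hundreds, ...


def speakers_from_percentage(population_15_plus: int, percent: int) -> int:
    """n1 = p1 * nT, with p1 given as a whole-number percentage."""
    return round(population_15_plus * percent / 100)


def percentage_from_speakers(speakers: int, population_15_plus: int) -> float:
    """p1 = n1 / nT, returned as a percentage."""
    return 100 * speakers / population_15_plus


# Figures quoted in the discussion for Sweden.
sweden_15_plus = 7_376_680
sweden_percent = 89

estimate = speakers_from_percentage(sweden_15_plus, sweden_percent)
print(estimate)             # 6565245
print(round_sig(estimate))  # 6600000
print(round(percentage_from_speakers(estimate, sweden_15_plus)))  # 89
```

The sketch only shows that the two formulas undo each other numerically; whether applying them to survey estimates counts as a routine calculation is the policy question being argued in this thread.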


 * Journalism, perhaps? On the topic, I would not mind deletion: the patchwork of real data is incompatible to the point that it makes rankings useless. NVO (talk) 20:31, 13 January 2009 (UTC)
 * I'd be happy to see the rankings removed. -- Avenue (talk) 21:10, 13 January 2009 (UTC)


 * Strong Delete JdeJ is right. After looking at the Eurobarometer and the table above, I'm changing my vote.  Percentage of English speakers is on page 14 of that report and the numbers don't at all match up with the table to which they are sourced.  France, 36%; Italy 29%; Netherlands 87%; Spain 27%; etc. -- There's nothing that I see that has the precision of the numbers listed here-- 24.82% for France for instance, that's not from the 2006 report.  While it's a legitimate topic in the hands of someone who has accurate information, I would not rely upon this article for anything.  One of the cautions of Wikipedia is that you shouldn't rely on it as your exclusive source for information, simply because it's not a substitute for printed sources.  It's this type of article that gives Wikipedia a bad name.  I'd love to be able to eat crow on this, be proven wrong, but clearly false information is the worst failing of any encyclopedia. Mandsford (talk) 20:28, 13 January 2009 (UTC)
 * I agree strongly that some of the information here should be deleted, but I think you want to throw the baby out with the bathwater. Please see my response to JdeJ above. -- Avenue (talk) 21:04, 13 January 2009 (UTC)


 * Delete. Calculating new data from numbers in a report is by definition original research. When you use a different total population for your calculation than the one listed in the study, you're further skewing the results because of improper WP:SYNTHESIS. All the reliable data is already in the English language article. The rest isn't reliable and can't be edited to fix its shortcomings, so it needs to go. - Mgm|(talk) 20:42, 13 January 2009 (UTC)
 * Not just original synthesis, but badly done original synthesis. If anyone was surprised by the figure that only 89.33% of Americans speak English, it's the result of dividing the total number of English speakers over the age of 5 (English only, or speak English "very well" or "well"-- 251 million) by the entire 2000 population, of all ages -- (281,421,906).  As it turns out, about 11 million of the 262 million people aged 5 and over said that they spoke English "not well" or "not at all", slightly more than 4 percent.  Mandsford (talk) 20:55, 13 January 2009 (UTC)
 * See my response to JdeJ above. Some of the recent calculations in the list are badly wrong and should be deleted, including most of the percentages column, but the numbers of English speakers given are not OR. -- Avenue (talk) 21:09, 13 January 2009 (UTC)
 * They still are. A routine calculation would be to calculate the percentage of a study on the population in the same study. Taking the percentage found on one population and calculating it on another population is definitely OR.JdeJ (talk) 21:33, 13 January 2009 (UTC)
 * The two calculations are simply inverses of each other; there is no difference in complexity. And both the percentage and population figures are sourced from the same Eurobarometer report, not from different sources. Where is the OR? -- Avenue (talk) 23:16, 13 January 2009 (UTC)
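
The denominator problem described for the United States figure can be illustrated with the rounded numbers quoted in this thread (251 million speakers, 262 million people aged 5 and over, and a total 2000 population of 281,421,906). This is only a sketch: the inputs are the rounded values from the comment above, so the first result approximates the article's 89.33% rather than reproducing it exactly, and the variable names are made up for illustration.

```python
# Rounded figures quoted in the discussion (2000 US census).
speakers_5_plus = 251_000_000      # English only, or speak it "very well" or "well"
population_5_plus = 262_000_000    # everyone aged 5 and over
total_population = 281_421_906     # all ages

# Mismatched populations: speakers aged 5+ divided by people of all ages.
mismatched = 100 * speakers_5_plus / total_population

# Matched populations: numerator and denominator both cover ages 5+.
matched = 100 * speakers_5_plus / population_5_plus

print(f"{mismatched:.1f}%")  # 89.2%
print(f"{matched:.1f}%")     # 95.8%
```

The gap between the two results comes entirely from the choice of denominator, which is the point made about the percentages column.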


 * Delete Here's another problem with using the Eurobarometer survey results in this fashion: a lot of the countries on that list have a non-negligible number of British immigrants. The survey, it seems, "covers the national population of citizens of the respective nationalities and the population of citizens of all the European Union Member States that are residents in those countries and have a sufficient command of one of the respective national language(s) to answer the questionnaire."  Many of those British migrants to these countries don't have sufficient command of the national languages in question.  This will skew the results slightly. JulesH (talk) 21:18, 13 January 2009 (UTC)
 * Is that really true? I don't have any figures at hand, but I would suspect that there were only a few countries where English-speaking immigrants are not negligible (taking into account all the other sources of inaccuracy operating here). It might be worth a note in our article, but I don't see this as a good reason for deletion. -- Avenue (talk) 23:46, 13 January 2009 (UTC)


 * Keep per Avenue.- gadfium 22:41, 13 January 2009 (UTC)
 * Keep The problems raised are essentially problems with how it should be edited (btw, I agree with the comment that Nigerian pidgin or other pidgins should not be included here, but that's for the talk page). As long as the source for the data is given, it's not OR. DGG (talk) 00:10, 14 January 2009 (UTC)
 * Then this embarrassment should be taken down to someone's userspace while someone tries to figure out which parts are true and which parts are false. The citations to the 2006 Eurobarometer have been exposed by the nominator as a falsehood.  The "only 89% of the people in the U.S. speak English" has been shown to have been inaccurate because nobody questioned it until now.  The same incorrect information is tainting the English language article as well.  Maybe someone can find a table that has been published in, say, a book or some other source that doesn't lend itself to constantly re-editing. Mandsford (talk) 02:46, 14 January 2009 (UTC)
 * I agree the list is currently a mess, and maybe it would be best to revert to a version without the percentage column, e.g. back in May 2008. That would be worth discussing on the list's talk page. But you don't seem to understand the situation with the Eurobarometer figures. They are not a falsehood, as I explained above. The new columns giving total populations and calculating incorrect percentages from this are the problem. And constant re-editing is the nature of Wikipedia. -- Avenue (talk) 03:07, 14 January 2009 (UTC)


 * I don't seem to understand, huh? Okay, enlighten me -- Can you tell me which page of that report says that there are 16,000,000 English speakers in France?  How about those 14 million speakers in Italy?  Where's that from?  Twelve million in the Netherlands?  Is there a source for that?  Did anyone else even look at the report (there's a link to it above)?  Was there anybody, other than the nominator, who has taken the time to see whether the information was reliable?  This is why it is preferable to rely upon a published table of data instead of having 100 people construct one.  Mandsford (talk) 14:19, 14 January 2009 (UTC)
 * These figures (for second language speakers of English in these countries) are based on a routine calculation from the 15+ population for each country and the proportion of this population who claim to speak English as a second language, not as their mother tongue. The 15+ population figures come from the second page in the "Technical specifications" annex (no page number shown, but it's the 70th page in the PDF file), and the percentages come from table D48T on page 13. I gave a worked example for Sweden earlier in this edit. The corresponding calculation for France is 44,010,619 * 36% = 15,843,823 (rounded to 16 million), for Italy it's 49,208,000 * 29% = 14,270,320 (rounded to 14 million), and for the Netherlands it's 13,242,328 * 87% = 11,520,825 (rounded to 12 million). -- Avenue (talk) 09:02, 16 January 2009 (UTC)
 * Right, we do not take articles down while we work on them because of the present low quality--if the quality embarrasses anyone, they can and should  improve it right as it stands. DGG (talk) 04:25, 14 January 2009 (UTC)
 * If DGG were right, the article could be kept, but the problem is that we simply do not have sources for most countries, so the edits he is claiming could be done simply are not possible. If they were, they would probably already have been done. Nobody argues that we should not keep the countries for which we have census returns, but they make up less than 10% of the table and they are already included in another article. However, if DGG can provide sources that present the numbers of English speakers in countries such as Sweden, France, Honduras, Lebanon and others, then please do so. JdeJ (talk) 08:23, 14 January 2009 (UTC)
 * It's not simply a question of whether there are census figures. For instance, we only have figures from the Swiss census on people using English as their main language. Ranking Switzerland based on that figure, while the figures for other countries include second language speakers, yields silly results (as you pointed out below). There is no standard approach to these questions across censuses from different countries. Some include second language speakers; some do not. And what qualifies someone as knowing a language varies widely. Restricting the list to census results could give a very incomplete and misleading picture as a result. -- Avenue (talk) 08:24, 16 January 2009 (UTC)
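
The France, Italy and Netherlands figures quoted in this thread can be checked in the same spirit. The sketch below assumes only the 15+ populations and percentages stated in the comment above; the dictionary and variable names are hypothetical, and rounding to the nearest million is used here simply because it matches the "rounded to" values given in that comment.

```python
# (15+ population, percentage claiming English as a second language),
# as quoted in the comment above.
quoted = {
    "France": (44_010_619, 36),
    "Italy": (49_208_000, 29),
    "Netherlands": (13_242_328, 87),
}

results = {}
for country, (population_15_plus, percent) in quoted.items():
    exact = round(population_15_plus * percent / 100)
    # All three products have eight digits, so rounding to the nearest
    # million keeps two significant figures.
    results[country] = (exact, round(exact, -6))
    print(f"{country}: {exact:,} -> {results[country][1]:,}")
```

Running it reproduces the products and rounded values given in the thread, which confirms the arithmetic but says nothing about whether the underlying survey percentages are suitable for this use.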


 * Comment - As a result of the inconsistent use of sources in the article, it produces some rather strange (I could say funny) results. Switzerland and Austria are two neighbours that in many regards are very similar, including similar levels of standard of living and education. According to our article, the percentage of English speakers in Austria is 46.76% (no source supports that claim) while in Switzerland the same percentage is a meager 0.96%. The reason is of course that we use a proper source for Switzerland (the Swiss census) while we don't have a source supporting our Austrian percentage, but a reader might not know that and think there is an enormous difference between the level of English in Austria and in Switzerland. JdeJ (talk) 08:30, 14 January 2009 (UTC)
 * See my comment about census figures above. The problem here does not relate to the use of a sample survey versus a census, but results from comparing a figure including second language speakers to one restricted to "main language" speakers only. -- Avenue (talk) 08:24, 16 January 2009 (UTC)
 * In fact, if you ignore the silly percentages column, these two countries illustrate well why census figures alone are not enough. The Eurobarometer figures, although subject to some sampling error, provide a fuller picture of the situation in Austria than the limited Swiss census figures do for that country. -- Avenue (talk) 10:51, 16 January 2009 (UTC)


 * Keep but article should be overhauled to use only a single source. Barring that, a multi-column table with figures given by a reputable source for each column would be ok. The topic is notable enough for an article. The article may be in terrible shape but deletion is not the way to fix the article. --Polaron | Talk 15:03, 14 January 2009 (UTC)
 * As I've tried to point out many times, the problem here is the lack of such a source. Almost every user who has voted "keep" has agreed that the sources are not good but that it can be fixed with good sources. However, as stated already in the nomination, no such sources exist and that is the reason why the article looks the way it does.JdeJ (talk) 15:46, 14 January 2009 (UTC)
 * The cited reference by David Crystal already includes such a table, although some might not want to use that exclusively because it excludes countries where English is neither the predominant native language nor an official language. A newer version of that table is in the second edition of Crystal's "English as a Global Language". Use that reference as the basis and remove figures and countries not cited in that book. Then, let other people over time add other countries where English is classified as a foreign language and ensure that the figures added are referenced to a reliable source that virtually no one disputes (e.g. a national census authority). --Polaron | Talk 16:58, 14 January 2009 (UTC)


 * Strong Delete. Very clear WP:SYNTHESIS violation. THF (talk) 01:15, 16 January 2009 (UTC)
 * Which part do you feel is a synthesis violation? The new percentage column, or other bits as well? -- Avenue (talk) 09:05, 16 January 2009 (UTC)


 * Comment: I'm taking the liberty of copying across the only response to JdeJ's AfD notification on the list's talk page. This editor also made a plea elsewhere on the talk page for retaining Nigerian pidgin in the list. -- Avenue (talk) 09:18, 16 January 2009 (UTC)
 * "Why can't a survey be used as data? To be sure, the usage of these figures should be revised on the basis of what you say, but I don't think that makes the data unusable. I think this article makes a worthwhile effort to give some estimate of who speaks English in the world, and I wouldn't want to end that.  Agh.niyya (talk) 04:05, 16 January 2009 (UTC)"


 * Strong Keep - The article needs work but it is a useful and noteworthy article and should not have been rushed to AFD without more debate about the article's content being held on the talk page first. Certain tags are clearly justified because of the current content but that is NO reason to simply delete an entire article. BritishWatcher (talk) 10:50, 16 January 2009 (UTC)
 * Keep Not sure where all these people (including the nominator) appeared from (since I have never seen them on the talk page and I'm the person who has most criticized the article). As I've said on the talk page, the article needs to be improved, not deleted.  Please read my post there.   Fowler&fowler  «Talk»  17:32, 16 January 2009 (UTC)
 * PS I do agree with the nominator that the numbers for European countries (and perhaps for non-Anglophone countries) are mostly unreliable and should be removed; however, I think that the table in the English page is too short.  Fowler&fowler  «Talk»  17:38, 16 January 2009 (UTC)


 * Conditional Keep The topic itself meets WP:Notability. This is primarily a sourcing dispute, and unreferenced/unreliably sourced material can be taken out, and better sourced statistics reinserted with discussion on the talk page. Failing that, Userfy for anyone who wishes to work on it. --Patar knight - chat/contributions 21:37, 17 January 2009 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.