User talk:Piotrus/Wikipedia interwiki and specialized knowledge test

We don't need them!
Wikipedia doesn't need anything like 400 million articles. Having that many would increase the housekeeping requirements by a huge multiple and the utility of the site by a small percentage. Even as Wikipedia is now, the vast majority of its value lies in a few percent of the articles. Calsicol 00:00, 24 July 2006 (UTC)
 * "the vast majority of its value lies in a few percent of the articles" [citation needed] :D Honestly, what makes you say so? The fact that some articles are more popular or controversial does not translate directly into their utility: I'd argue that a good entry on sociocultural evolution is more useful than the most popular article on Wiki, a biography of GWB. By the same token, I think that the less popular entries on subjects like kanclerz and Polish-Romanian Alliance, which to my knowledge are not described on any English-language online pages, are of immense utility, even if only to a few people interested in those fringe topics, who likely would not be able to dig up information on them (at least, not online). Therefore I would say that the vast majority of Wikipedia's value lies in its highly specialized articles on fringe topics, which hopefully will increase to 400 million, the sooner the better.--Piotr Konieczny aka Prokonsul Piotrus talk 03:39, 24 July 2006 (UTC)

At 400 million articles, that would make 400 times the cleanup load. It would also mean that there would be close to half a million admins and over half a billion users. I think we might be hitting the point at which we outgrow the size of the current internet. We are already close to the top ten websites. At 400 million pages, we would be far above the number two site.--Rayc 06:03, 27 July 2006 (UTC)
 * Nothing wrong with that, especially as at that time we should have stuff like quantum computers, brain implants and such that should allow us to continue to scale :) --Piotr Konieczny aka Prokonsul Piotrus talk 15:50, 27 July 2006 (UTC)
 * OMG, imagine all those wikimaniacs devoting the precious CPU time of their implants to this project instead of helping to find the cure for cancer. As a chain smoker I should be devoted to the latter rather than the former, huh? // Halibutt 22:06, 14 August 2006 (UTC)
 * Nothing's stopping you from doing both: Rosetta@home, Folding@home... :) --Piotr Konieczny aka Prokonsul Piotrus 09:23, 15 August 2006 (UTC)

Good fast estimation study
It shows that we will never reach the limit of our knowledge, as subjects will continue to be added and more pages will always be dedicated to newer topics. It also shows that we can actually counter systemic bias by communicating and translating articles in such a way that all the important subjects all over the world would have an article on them. There used to be a project that offered money (to Wikipedia, I think) when people created articles on subjects that aren't covered in English books. Lincher 19:56, 15 August 2006 (UTC)

Wales mention
Jimbo mentioned this essay (more or less) in a NYT interview (Jimmy Wales: 2 Million Articles Down and More to Do, NYT, Wednesday, August 8, 2007). --Piotr Konieczny aka Prokonsul Piotrus 13:02, 8 August 2007 (UTC)

Count
I counted roughly what is contained in langlinks dump files, see


 * de -> en: 339.1 K links
 * fr -> en: 314.9 K links
 * pl -> en: 180.7 K links
 * en -> de: 338.5 K links
 * en -> fr: 310.2 K links
 * en -> pl: 174.5 K links

(Preceding unsigned comment added 8 August 2007 by User:West29north34) —Preceding unsigned comment added by Dsp13 (talk • contribs) 16:11, 7 January 2008 (UTC)

Italian paper encyclopedia results
This test prompted me to look at a paperback volume of the Italian Garzanti encyclopedia for literature (see also it:Garzanti). The English Wikipedia had articles for 10 of the 13 entries on page 1. The Italian Wikipedia had 12, and was only missing the French writer fr:Abeozen. The volume lists 5500 entries, so if 3 out of 13 are missing, that means the English Wikipedia is missing over 1000 articles or stubs that are at least considered notable in a 1974 printed literature encyclopedia. I added requests for the 3 missing articles with these edits and this one. 84user (talk) 22:57, 20 March 2009 (UTC)
 * Interesting; I'd strongly recommend testing a few more pages, as one page may give a significantly biased result. That's why in my interwiki test I am testing 100 pages - 100 is a good number, likely to give much more reliable results. --Piotr Konieczny aka Prokonsul Piotrus 04:45, 21 March 2009 (UTC)

I've now examined 17 more entries (two more pages from the Garzanti volume), and of the 30 entries checked in total I find 5 missing from the English Wikipedia (see User:84user/Sources) and only one missing from the Italian Wikipedia (see w:it:Utente:84user). I will probably continue at this rate of 10 entries per month. 84user (talk) 01:57, 24 May 2009 (UTC)
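The extrapolation in this thread (3 missing out of 13, later 5 out of 30, scaled to the volume's 5500 entries) can be sketched as below, together with a rough 95% Wilson score interval to show how wide the uncertainty is for such small samples. This is only an illustration: the function names are made up for this sketch, and it assumes the checked entries behave like a random sample of the volume, which the first few alphabetical pages may well not be.

```python
import math


def wilson_interval(missing, sample, z=1.96):
    """Approximate 95% Wilson score interval for the missing fraction."""
    p = missing / sample
    denom = 1 + z * z / sample
    centre = (p + z * z / (2 * sample)) / denom
    half = z * math.sqrt(p * (1 - p) / sample + z * z / (4 * sample * sample)) / denom
    return centre - half, centre + half


def extrapolate(missing, sample, total):
    """Scale a sample's missing fraction up to the whole volume.

    Returns (point estimate, lower bound, upper bound) as entry counts.
    Assumption: the sample is representative of all `total` entries.
    """
    low, high = wilson_interval(missing, sample)
    point = missing / sample * total
    return round(point), round(low * total), round(high * total)


TOTAL_ENTRIES = 5500  # entries listed in the Garzanti volume, per the comment above

for missing, sample in [(3, 13), (5, 30)]:
    point, low, high = extrapolate(missing, sample, TOTAL_ENTRIES)
    print(f"{missing}/{sample} missing: about {point} entries (roughly {low} to {high})")
```

The wide intervals are consistent with the advice above to test more pages before drawing conclusions from the estimate.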

A little relativisation
Your essay is interesting but opens the question as to the relevance of every article in every language.

For instance, there are 36000 communes in France, a vast majority of which have fewer than 2000 inhabitants. I expect to find an article about each on the francophone wiki, but not on the anglophone wiki, where an article on each of the major ones and a 'list of communes in France' with redirects to the francophone wiki would be far more advisable.

Likewise, your dictionary of notable Poles is a good measure of what should be covered in the Polish-language Wikipedia, but honestly, are all those people notable for a non-Polish-speaking person, and could many of them not be better treated in a paragraph in the main article for which they are notable? (Not to bash Poles; the same holds true of Malians, or of French, or of Americans in the Polish Wikipedia.) 193.251.2.126 (talk) 15:30, 23 March 2009 (UTC)
 * Dear anon (please consider registering); you may want to look at What Wikipedia is, at Notability, and perhaps most importantly, at WP:BIAS. Briefly, to not disturb a dead horse too much, we don't (intend to) discriminate based on language or culture. Yes, Polish Wikipedia will have better coverage of Polish topics (for obvious reasons), but that doesn't mean it is a satisfactory situation. Ideally, all Wikipedias should have the same coverage, so that the language barrier will not impede any research. There may only be a few English speakers interested in some forgotten French village, or a footnote-to-history-book Pole, but they should be able to find that information here. --Piotr Konieczny aka Prokonsul Piotrus 17:12, 23 March 2009 (UTC)

Garbage
The 40 million assumes that because the Wikipedia doesn't have many Polish biographies, it doesn't have many biographies at all; this is an incredibly unsound assumption.

We don't have very many Polish contributors in the English Wikipedia, so we don't have many Polish biographies, but we have lots of English-speaking ones, and hence lots of American and English biographies.

If you divide the number of biographies by the fraction of Polish biographies, then of course you get a nonsense number.

For example, let's assume that we currently have half the biographies that we should have (say a million), but only 3% of the Polish ones; dividing a million by 3% (0.03) gives 33 million. But we said we had half the biographies. Unless the Polish biographies are representative of the whole Wikipedia, you just can't do that. It doesn't give you anything like the right answer; we already said that we only needed 2 million.

And of course the Polish biographies are not representative; the Wikipedia almost certainly has a particular blind spot on Polish biographies, but they're only a small fraction of biographies anyway: 25,000. 84.93.138.90 (talk) 16:11, 15 February 2011 (UTC)


 * There are a few hundred current countries. Poland is one country; America and England are just 2 countries. If we've covered the latter, then we have covered a few percent at most. Most countries' biographies are like Poland - not like America & England. (How are our Chinese & Indian biographies? Pretty pitiful I'd guess.) And then there are all the countries which no longer exist... --Gwern (contribs) 06:47 16 February 2011 (GMT)
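The arithmetic in the anonymous comment above can be sketched as follows. The figures (one million existing biographies, 3% Polish coverage, a "true" target of two million) are the hypothetical ones from that comment, not measurements, and the function name is made up for this illustration. The sketch only shows that the division recovers the target when the coverage fraction used is representative of the whole; whether Polish coverage is representative is exactly what the thread disputes.

```python
def naive_target(current_total, coverage_fraction):
    """Estimate the 'complete' size by dividing the current total
    by the coverage fraction measured on one subset."""
    return current_total / coverage_fraction


# Hypothetical figures taken from the comment above
current_biographies = 1_000_000
true_target = 2 * current_biographies              # "half the biographies that we should have"
overall_coverage = current_biographies / true_target  # 0.5
polish_coverage = 0.03                             # coverage measured on Polish biographies only

biased = naive_target(current_biographies, polish_coverage)
unbiased = naive_target(current_biographies, overall_coverage)

print(f"using Polish coverage:  {biased:,.0f}")
print(f"using overall coverage: {unbiased:,.0f}")
print(f"overestimate factor:    {biased / true_target:.1f}x")
```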

This page should be updated
The data here was written about 12.5 years ago; it may now be outdated and should be updated. Just a random Wikipedian (talk) 15:42, 14 August 2023 (UTC)