Wikipedia talk:Size of Wikipedia/Archive 1

Older comments
Informative graph, Anome!

If the data is available it would be cool to graph the number of editors and/or the number of hits on the same time axis. This might show us any emerging lead-lag relationships. user:mirwin

It's a nice graph, but there's an awful lot of white space on it. Can somebody who knows how to do these things trim it? --Camembert
 * Additionally, the JPEG compression makes it a bit muddy. Would it be possible to resave the original as a PNG file? --Brion
 * New and improved graphs now in place. The Anome 09:35 Sep 21, 2002 (UTC)
 * time to update graphs? --Lightning 19:42 Oct 19, 2002 (UTC)

anyone notice something funky about the following:


 * 2002 Oct 20,  66372,     mpacIII
 * 2002 Oct 19,  61128,     mpacIII
 * 2002 Oct 17,  54339,     mpacIII

Lightning 05:14 Oct 21, 2002 (UTC)

Not really - User:Ram-Man's bot was pumping in about 10-20 US cities a minute for much of the day. --mav


 * Yep, he's ramming them in there. I wonder if that's how he selected his user name? ;-) --Ed Poor


 * Alas, it comes from my real name. -- Ram-Man

I made a new graph. Uhm.. I'll try to keep it updated.. I'm sorry if it doesn't look great, but I'm just pumping it out with a spreadsheet program. --Lightning 05:38 Oct 23, 2002 (UTC)


 * Looks good to me. --mav


 * Are you going to change the graph below (rate of increase) as well? -- WillSmith (Malaysia)


 * I want to wait till Ram-Man is done, because the bot massively inflates this number; once the bot is done running, I'll take a week's worth of samples and do it. --Lightning 19:49 Oct 24, 2002 (UTC)


 * Fire away! I've for the most part finished it up (at least the large scale automation anyway!) -- Ram-Man

how about a graph showing the amount of data hosted by wikipedia and the average size per page? Lir 05:56 Oct 23, 2002 (UTC)


 * No access to the db, so I can't make SQL queries to get these numbers.. --Lightning 19:49 Oct 24, 2002 (UTC) It would be interesting though to get the mean number of content bytes per article and wait like a year and take it again for comparison purposes. --Lightning 19:49 Oct 24, 2002 (UTC)

The new graph has an x-axis which is not evenly spread in time. The slope is now dependent on the number of samples in any given period. This is a bit confusing in my opinion. Erik Zachte


 * The best thing to do is to use an x/y scatter-plot setting for the graph tool. This will allow for the non-uniform sampling, which will otherwise distort the graph.

Illustration:

Image:Article_growth_chart.png

The graph above does not allow for the non-uniform sampling in time: compare with:

Image:Wikipedia article count graph to Oct 04 2002.png


 * I'll look into it --Lightning 19:49 Oct 24, 2002 (UTC)

Is it possible to give the growth of Wikipedia without including the Ram-Man bot additions? The bot is adding around 1,000 articles a day (and seems to have around 30,000 in total to add) and it would be interesting to see the rate of growth without this distortion.

---

Note: the article count feature is currently disabled, with the article counter stuck at 90679. -- 15 November 2002

Note: The article counter is incrementing again. -- 18 November 2002

--

Is the article counter fixed now? If not, there is very little point in continuing to update this data by hand. If much of the past mpacIII data is questionable, perhaps someone would be so kind as to regenerate the mpacIII data from the database dumps? The Anome


 * The count is still calculated stupidly (comma count?!!) but it's now fixed, yes. I see absolutely zero purpose in regenerating older counts, since A) the number is pure hype with limited value, B) we only have a limited number of dumps kept on hand at ~1 month intervals (keeping the old ones around at a higher rate would waste A LOT of disk space), and C) the margin of error from the drift is probably smaller than the margin of error of our crappy count system (comma count?!!), except for that one >100000 entry. --Brion 20:33 Dec 17, 2002 (UTC)


 * Speaking of which, what happened to the idea to redefine the count? I still think we shouldn't count anything below 500 bytes as an article. That, along with the dreaded comma count, IMO, would give a more accurate measure of our true progress (~80,000 articles). My only concern with this plan, though, is what it might do to the morale of the non-English wikis. Maybe we could have a counter that would display the more conservative article count (we could even up the ante by excluding anything below 1 kilobyte).  --mav

If we are going to make graphs, and then analyze them, shouldn't we take RamBot's contributions into account? --Uncle Ed

-

The new article count system is now active on the English Wikipedia. (And the counter is no longer stuck. ;) If desired, I can go back through my backup dumps and run counts of the new algorithm on older databases for comparison purposes. --Brion 06:02 25 May 2003 (UTC)


 * Is it also up for the Dutch wiki? I find differences between the count on our main page (6901) and the count obtained by

SELECT count(*) FROM cur WHERE cur_namespace=0 AND cur_is_redirect=0 AND cur_text LIKE '%[[%'

(8427) TeunSpaans 12:41 18 Jun 2003 (UTC) -
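The kind of discrepancy TeunSpaans describes is easy to reproduce in miniature. A hypothetical sketch (made-up page texts, not the real MediaWiki logic) of how the old comma-count heuristic and a link-based count can disagree over the same set of pages:

```python
# Two article-count heuristics applied to hypothetical page texts
# (illustrative only -- not the actual MediaWiki implementation).

def comma_count(pages):
    """Old heuristic: a page is an 'article' if its text has a comma."""
    return sum(1 for text in pages if "," in text)

def link_count(pages):
    """Link heuristic: a page counts if it contains a wikilink."""
    return sum(1 for text in pages if "[[" in text)

pages = [
    "A stub with a [[link]] but no punctuation",
    "Plain text, with a comma, but no links",
    "Another page, also comma-only",
    "A real article, with a comma and a [[wikilink]]",
]

print(comma_count(pages))  # 3
print(link_count(pages))   # 2
```

Depending on which criterion a wiki's counter uses, the same database can yield noticeably different "article" totals, which would explain a gap like 6901 vs. 8427.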

I have replaced Fonzy's analysis of growth with a new treatment, which produces a new growth model that tries to eliminate the effects of outliers, data dumps, recalibration, and slow-downs. It's a remarkably good (coincidental?) fit for the past, but who knows about the future? -- The Anome 16:58 11 Jun 2003 (UTC)

An HTML idiot writes: is there any way either that this page can be made a sensible width, or that I can view it (IE6) as a screen-width page? jimfbleak 17:30 11 Jun 2003 (UTC)

Update desperately needed!
This page hasn't been edited since April except to correct a spelling error, and the page linked to (here) hasn't been updated since May!!!!


 * I have made a few graphs, and scripts to update them. I am not sure how accurate they are (I didn't do the database query myself), but it looks reasonable. The details are on my user page. Perhaps this can be used here? Amaurea 14:48, 23 April 2006 (UTC)

Other kinds of growth
There are some other kinds of growth, like this: Image:Vandalism.png. I have made this diagram to compare the situation with the German Wikipedia (Image:Vandalismus.png). --Markus Schweiss 06:41, 9 December 2006 (UTC)

size in GB
Could someone get and add information about how much space the text actually takes up? Or perhaps an estimate of how many printed pages all the text would take? There isn't anything here that really gives me a good idea of how BIG wikipedia is when compared to other information compendiums, which is all I wanted when I came to this page.24.128.152.12 08:06, 19 December 2006 (UTC)greg

Ditto. Not much seems to be going on here, in terms of updates. ALTON  .ıl  07:34, 11 May 2007 (UTC)


 * Never mind, I think I found it. According to the dump download page, the entire HTML of Wikipedia weighs in at 8042 MB, or about 7.9 GB. Surprising? ALTON   .ıl  07:38, 11 May 2007 (UTC)

Not including the sizes of pictures in gigabytes doesn't make any sense to me. The images of the encyclopedia are just as important as the text information. Also, to compare this to a "book" you would have to look at how much space the average wiki page has in images and include that as well. —Preceding unsigned comment added by 68.154.41.177 (talk) 05:04, August 29, 2007 (UTC)


 * The original question was how much space the text takes up. Whether you want to include media files or not in this number completely depends on what you want to do with the knowledge. —Kri (talk) 14:06, 23 February 2014 (UTC)

Hmm, aren't the pictures hosted by the Wikimedia Foundation rather than Wikipedia itself? Canterwoodcore (talk) 23:22, 9 March 2011 (UTC)
 * Some are hosted on commons, many are still hosted on en.WP.  Oreo Priest  talk 13:11, 10 March 2011 (UTC)

I'm sure the entire point of asking this question is to wonder: "If I wanted to download ALL of Wikipedia's latest content, as it appears, for my own personal offline viewing pleasure, how much of my hard drive would it take?" That is the question in my mind when I came to this page. I don't care about page update history or discussion pages, just the meaningful article content, to include all pictures, media, LaTeX/SVG imagery, sounds, etc. -CogitoErgoCogitoSum 75.172.58.58 (talk) 03:03, 10 December 2012 (UTC)


 * Why would you want to do that, just for your own personal offline viewing pleasure? It sounds more reasonable to me that someone would make a program of some sort that would use the contents of Wikipedia to iterate on, maybe for machine learning purposes. And in that case it is not obvious that you want to include media files. —Kri (talk) 14:13, 23 February 2014 (UTC)


 * You would do that in order to guarantee access to important articles concerning history, math, physics, chemistry, and so on. A great deal of the information on subjects of academic interest can be explained through the media content in Wikipedia, and in some cases can only be explained through some form of media different from text. Specifically, I want to download Wikipedia to my Raspberry Pi to have a highly portable, nearly universal reference guide to knowledge. Yes, it will have errors, and yes, there are better solutions, but it's still something I want to do, and it would aid me greatly if I knew the total size in GB before I started. — Preceding unsigned comment added by 50.182.238.147 (talk) 12:22, 5 November 2014 (UTC)
 * Check Modelling Wikipedia's growth. Apparently, it's about 10 GB of text.  Oreo Priest  talk 07:30, 6 November 2014 (UTC)


 * I started adding some of this data to the article. Help needed - it's complex!  ★NealMcB★ (talk) 20:22, 1 July 2015 (UTC)

ISO 8601 dates
Should use ISO 8601 dates (eg. 2007-05-30) for all Wikipedia stuff, including graphs and charts. In this age of international commerce and communication, it seems foolish to use ambiguous dates, especially since Wikipedia English is edited and read by a large minority of English speakers outside the US. Anthony717 19:10, 30 May 2007 (UTC)


 * I agree that the date format is lacking. If this was in the article space I would have just fixed it by wiki-linking the dates and letting the servers format it on the fly; and maybe that should be done here. ISO-8601 would be better than what is here now, but ISO-8601 is for the benefit of computers, is it not? I think most people would find "30 May 2007" more humanistic. --Charles Gaudette 09:18, 31 May 2007 (UTC)


 * I'll change the date format in the "Wikipedia growth" plots during the next update. The plot in "Comparisons with other Wikipedias" was grabbed from Commons, so I'm not sure where the source data for it is.  Maybe I can get the data from http://stats.wikimedia.org/EN/Sitemap.htm and generate a new plot (later, when I have more free time).  --Seattle Skier (talk) 09:01, 2 June 2007 (UTC)

DONE. The "Wikipedia growth" plots have been updated. I also found a couple of hours to combine the data from http://stats.wikimedia.org/EN/TablesArticlesTotal.htm with the data from this page to generate two new plots in "Comparisons with other Wikipedias". --Seattle Skier (talk) 01:26, 4 June 2007 (UTC)

Scanty information
This page doesn't answer some of the obvious questions (as noted above), like how many gigabytes it is. Another question that comes to mind: how many gigabytes are the images used in articles (hosted here or on Commons), since they are definitely part of Wikipedia as well. Also, how many servers are there currently, how many watts of electricity do they use, how much total RAM - all these are interesting questions. -- fourdee ᛇᚹᛟ 11:09, 7 August 2007 (UTC)

Reliable source
Wikipedia: proving the Web's freedom of space and How much paper would it take to print out Wikipedia? cite Nikola Smolenski, a contributor to this article, as a reliable source for the amount of paper it would take to print out Wikipedia. --  Jreferee  (Talk) 15:05, 29 August 2007 (UTC)

The question about the data of statistics
I am an MBA graduate student from Taiwan. My advisor, Professor Chu, and I are very interested in the diffusion phenomenon of the famous Wikipedia website, and we have some questions about the diffusion data from this URL below,

http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

We hope to apply a formal diffusion model from management science to understand the success story of Wikipedia.

At the bottom of this website there is a data set describing the shape of English Wikipedia's growth. It raises two questions for me. First of all, faced with this data set, I can hardly distinguish how much of the growth comes from the auto-posting robot, Rambot, and how much from real people. Could you help me obtain data in which those two processes (edits by program and edits by human beings) have already been separated?

Second, what confuses me is that the spacing of the dates is irregular. I was wondering why the data set appears that way. Is something happening behind those irregular samples? Could you provide further background or ideas that may help me figure it out?

Thank you in advance for your response. I hope to become acquainted with Wikipedia's statistics, which can help us explore the nature of Wikipedia's diffusion.

Once again, thank you very much.

Best wishes, —Preceding unsigned comment added by Jackiewi (talk • contribs) 12:57, 10 December 2007 (UTC)

People just post updates of the article count when they feel like it; it is not a robot doing it. Keep us updated with the models you are going to use... and also use Google. I have seen a couple of good articles studying how Wikipedia grows Diego Torquemada (talk) 23:47, 10 December 2007 (UTC)


 * Hey, I saw your question and tried to come up with a better answer. I think the only way to distinguish human and automated editing is checking all editors entries in the bot category. You can find comments on unusual growth in some of the Category:Wikipedia statistics articles. Some are also slashdot or similar effects. Good luck. --Ben  T/C 14:57, 18 December 2007 (UTC)
 * See also User:Dragons flight/Log analysis. You will find there some graphs of edits by bots/registered users/unregistered users. HenkvD (talk) 18:27, 29 December 2007 (UTC)

Bookshelf graphic
The sign above the silhouette says 813 (current as of Aug. 5, 2008) while the graphic shows 613. Which one is correct? --HJKeats (talk) 11:17, 5 August 2008 (UTC)

Where's the evidence of exponential growth?
This page says there was exponential growth at some point in time. But the percent increase keeps changing from year to year. "Exponential growth" is a precisely defined mathematical concept that means the percent increase over any time period is THE SAME as the percent increase over any other period of the same length of time.

So what is the basis for the assertion that there was ever exponential growth? Michael Hardy (talk) 02:57, 11 January 2009 (UTC)
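Michael Hardy's constant-ratio criterion is easy to check numerically. A minimal sketch with made-up numbers (the series below are purely illustrative, not Wikipedia data):

```python
import math

def ratios(counts):
    """Period-over-period growth ratios of a time series."""
    return [b / a for a, b in zip(counts, counts[1:])]

# A hypothetical exponential series N(t) = N0 * exp(k*t): the ratio
# between consecutive equally spaced samples is always exp(k).
N0, k = 20000, 0.5
exponential = [N0 * math.exp(k * t) for t in range(6)]
print(all(abs(x - math.exp(k)) < 1e-9 for x in ratios(exponential)))  # True

# A series whose percent increase changes from step to step is,
# by this definition, not exponential.
irregular = [20000, 30000, 39000, 46800]
print(ratios(irregular))  # [1.5, 1.3, 1.2]
```

A year-over-year percent increase that keeps changing, as the table on the page shows, therefore rules out exact exponential growth; at best the early data was approximately exponential over a limited window.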


 * Why not replace the "Wikipedia growth" graphs and associated "Notes" with the more up-to-date graphs and text at Modelling Wikipedia's growth which show logistic growth. The stuff being replaced could be preserved by being copied to the other article.  (Also, the second paragraph of the lead section is not useful; it could be removed and the new Wikipedia statistics summary link moved to the See also section.) JonH (talk) 09:51, 11 January 2009 (UTC)


 * In its first years Wikipedia grew faster than linearly. At that time it was thought to be exponential growth. I noticed the percent increase kept going down from year to year, and proposed the logistic growth on 28 February 2007. Others had hinted at logistic growth before, but nobody modelled it. Only recently has it become more or less accepted that growth is logistic (or linear, certainly not exponential). Feel free to change the pages accordingly. HenkvD (talk) 20:09, 12 January 2009 (UTC)

Logistic growth
I'm extremely sceptical about the claim of logistic growth. Just because the growth is now sub-exponential does not mean it's logistic. If the project was in an exponential regime, and is now in a more or less linear regime, that does not mean its growth is going to fall off to zero. Perhaps the exponential regime was as Wikipedia was being discovered, and now we're in a linear regime, where everyone knows about it and pretty much anyone who wants to edit already knows about it and is editing? In any case, the logistic growth claims are an overzealous projection from limited data. - Oreo Priest  talk 03:07, 7 July 2009 (UTC)
 * Feel free to propose a sub-exponential model. The logistic model was proposed 2 years ago, when the growth was still bigger each month. We now see the growth has peaked and is even getting smaller, maybe not exactly as the bell-curve, maybe more like the Extended-growth model. As far as I can see it is not a linear regime either. Will the growth fall to zero: Nobody knows for sure. It is even possible, I hope not, that the growth will be less than zero if more and more stubs are deleted in a final stable version. HenkvD (talk) 11:46, 8 July 2009 (UTC)
 * My concern is that a fancy toolbox of curve-fitting tools is being used to extrapolate an entire regime there's no evidence of. Real-world factors are also being ignored; the only way we could have zero growth is if editors lose interest or if we run out of things to write about; I find both of those quite implausible. User:Piotrus estimated the maximum size of WP to be around 400 million articles (he likely has at least the order of magnitude right), so the latter is out, and tell me, do you see the WP community just giving up in a year or so? I certainly don't. Perhaps the decrease in growth rate is due to having picked the low hanging fruit in terms of article creation, but that doesn't mean there's none left.
 * In any case, I don't think logistic growth should be presented as fact on this page. The extended growth model seems much more believable and conservative, in terms of not extrapolating trends that haven't been seen yet. - Oreo Priest  talk 14:56, 8 July 2009 (UTC)
 * Two years ago the logistic curve seemed the simplest model that could explain the non-exponential growth. I feel it still is a reasonable model. I like the idea of the extended-growth model; as a model it still assumes growth will fall to zero. The English Wikipedia could grow much more if Wikipedians would translate from other languages, as user Piotrus calculates, but the fact is that this is not done. If that were true then all languages could grow to the size of the English Wikipedia. This is not happening because 1) the language skills are missing, 2) the interest in foreign cities, people, history etc. is not as big as for your local cities etc. and 3) translating is not as satisfying as writing a new article. My personal feeling is that a small steady growth will keep Wikipedia up-to-date. HenkvD (talk) 21:28, 11 July 2009 (UTC)

JonH (talk) 08:51, 12 July 2009 (UTC)
 * Back in January 2009, I thought that the old graph with a logarithmic scale (File:EnglishWikipediaArticleCountGraphs.png) no longer showed clearly how the number of articles was growing. So I replaced it with two graphs by HenkvD that I found at WP:GROWTH.  These included the logistic curves, so I had to include an explanation of what they mean.
 * I think this page should mainly show the growth up to the present, rather than future predictions. But it is helpful to describe the logistic model, as it provides a rational explanation for the observed decline in the rate of growth since 2006.  The page does say that the growth only "approximately follows a logistic growth model".
 * I have today changed the captions so that they no longer refer to "extrapolations". In my view the logistic curves are just provided for comparison.  (The caption for the first graph also said the thick line is "smoothed to match thin model lines", but that seems unlikely to me, so I removed the comment.  Perhaps HenkvD can confirm this.)
 * I suggest keeping the logistic curves for comparison, at least while the growth rate remains between the curves for 3 and 4 million articles.
 * The smoothing relates to the Rambot action of 2002. The growth at that month was enormous. I smoothed that especially for the related growth charts. EnwikipediaArt.PNG. A version without logistic comparison and without smoothing Rambot of 2002 is available as well. HenkvD (talk) 11:41, 12 July 2009 (UTC)

Linear growth
The assumption that "more content also leads to less potential content" is ridiculous. There's no reason to assume that potential content will ever run out. There will always be more obscure events and biographies to add, no matter what. News, new releases of products, entertainment, etc. will all continue as usual in the future. The number of article-worthy events per year is also going to remain constant. Even the graph does not visually follow the logistic fit placed on it. Growth of Wikipedia was polynomial at first due to its newness and the number of eager new contributors, but the base of contributors is going to remain roughly constant, as will their rate of contribution in the future, and from these facts and from the graph itself it becomes apparent that growth will remain linear. Barring some massive policy changes, by 2013 there will be roughly 5 million English-language Wikipedia articles and my comment will be here having predicted it. Perhaps a section detailing linear or other growth predictions should be added. —Preceding unsigned comment added by 69.253.221.174 (talk) 12:55, 23 October 2009 (UTC)

Decreasing talk page growth = increasing consensus?
When I read various talk pages I get the impression that there was much more activity in 2006 and 2007 than in 2009. Are there any statistics about that? If so, I think that there is more and more consensus about the article content. Åkebråke (talk) 14:32, 29 December 2009 (UTC)

GB!
What about the real, total, complete size in GB? I seriously doubt that it is slightly higher than 8 GB. - --189.216.65.4 (talk) 17:53, 7 November 2010 (UTC)


 * So is anyone gonna answer this? 8 GB is a lie anyway; if not GB, then how many servers do you run? — Preceding unsigned comment added by 140.198.245.209 (talk) 02:34, 30 November 2012 (UTC)


 * Based on the logic at Size in volumes, the text of the encyclopedia is 14 GB. Keep in mind this doesn't include any pictures, videos, audio files or anything else but text.  Oreo Priest  talk 16:54, 30 November 2012 (UTC)


 * So any estimates on the size in GB (or TB, or whatever) including pictures, videos, and audio files? — Preceding unsigned comment added by 209.203.138.185 (talk) 19:13, 9 October 2013 (UTC)

The total size of Wikipedia and most of its sub-wikis, based on the disk space it occupies when using XOWA (a local software application on a local computer), is approximately 150,761,865,216 bytes, or roughly 140 GB as reported by the Windows 7 operating system. This value includes all downloaded and decompressed data dumps for the following: Wikimedia Commons, Wikibooks, Wikinews, Wikipedia, Wikiquote, Wikiversity, Wiktionary, Wikispecies, MediaWiki and Wikidata. This is not a complete listing, as there is obviously site overhead that must be considered as well, and other wiki data pages not included in the local version of Wikipedia. This is simply a general size analysis of Wikipedia and its various parts and domains.

The original dump files gathered by XOWA and subsequently decompressed were deleted after the various wikis' setups were completed, leaving the value of 140 GB reported by the operating system. Again, this is the size of a local copy of most of Wikipedia, but not all of it. The actual size of the entire Wikipedia database is a subject best described by the system administrators of the Wikipedia site. Not included in this value are Meta, Incubator, Wikisource, Wikivoyage and Wikimedia Foundation, which in total equal approximately 1691.1 MB of additional compressed data.

Current dump sizes as reported by the XOWA software are as follows; the values below are of compressed database files, not to be confused with the decompressed size of the local copy described above. As of 2015-01-29:
 * Commons 3.7 GB
 * Wikidata 3.9 GB
 * MediaWiki 60.2 MB
 * Wikispecies 84.1 MB
 * Meta 164.7 MB (not included in the size value above)
 * Incubator 52.1 MB (not included in the size value above)
 * Wikimedia Foundation 6.6 MB (not included in the size value above)
 * Wikipedia 10.7 GB
 * Wiktionary 432.8 MB
 * Wikisource 1.4 GB (not included in the size value above)
 * Wikibooks 122.6 MB
 * Wikiversity 54.9 MB
 * Wikiquote 78.5 MB
 * Wikinews 36.3 MB
 * Wikivoyage 67.7 MB (not included in the size value above)

So in conclusion, the full decompressed size of Wikipedia would be close to 150-160 GB of physical storage space (English only); this is an approximation and the actual value will vary. This compilation of data values is for the English Wikipedia only and does not include any of the dumps for other languages. A 100% complete and decompressed copy of Wikipedia including all languages, images, and framework would have to be somewhere in the 200 GB range, probably more. This however is just an educated guess; Wikipedia is far too complicated to derive a final, finite value for the physical disk space it occupies, not to mention that the database is constantly growing due to users adding data to it daily. (Contributed by Britton Burton)


 * See above for more discussion. ★NealMcB★ (talk) 20:22, 1 July 2015 (UTC)

How big are the Images?
The Size of Wikipedia in Volumes leaves out the images completely. I have tried to calculate how large an area all the images would cover if they were to be printed on a 600 dpi printer. (I guess 600 dpi is what you'd print your family snaps at, right?) The database stores in a table the height and width in pixels of every image. If we multiply these for each image and add them all together we get the total number of pixels in all the images. So I got the latest SQL dump of the image table for the English Wikipedia (the Commons one is over 3 GB compressed) and ran this query: select sum(img_width*img_height) from image WHERE img_media_type = "BITMAP";. The result is 676 071 025 703. (That is 676 gigapixels just for en.wikipedia.) Now to find out what area that would cover if printed: the square root of 676071025703 is 822235.383392736. Divide by 600 to get 1370.392305655 inches per side of a printed square. Multiply by 0.0254 to get meters, giving 34.807964564.

If all the images on the English Wikipedia were printed at 600 dpi on a square, you would need a square sheet of paper 34.8 meters on each side, or around 1211.59 m².

If someone can double-check my calculation to make sure it's correct, I'll try and get the same SQL query above run on Commons (and all other wikis) on the toolserver. Then I'd try to create some fancy graphics to illustrate this (e.g. how much of Belgium would all the Commons pictures cover?) --Inkwina 14:02, 4 February 2011 (UTC)
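Inkwina's arithmetic can be double-checked mechanically. A small sketch reproducing the calculation above (the pixel total is the figure from the quoted SQL query; everything else is unit conversion):

```python
import math

# Total pixels across all bitmap images on en.wikipedia,
# per the SQL query quoted above.
total_pixels = 676_071_025_703

side_pixels = math.sqrt(total_pixels)  # ~822235.4 pixels per side
side_inches = side_pixels / 600        # printed at 600 dots per inch
side_meters = side_inches * 0.0254     # 1 inch = 0.0254 m

print(round(side_meters, 1))    # 34.8 (meters per side)
print(round(side_meters ** 2))  # 1212 (square meters)
```

This agrees with the 34.8 m per side and ~1211.59 m² quoted above, so the unit conversions check out.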

Predicted Gompertz maximum vs. List of encyclopedia topics
It's interesting to note that, as of 2011-04-10, Wikipedia is currently at 3608254 articles, 82.4% of the currently-predicted Gompertz maximum of 4378449, at the same time as List of encyclopedia topics is listed as 79.6% done. -- 188.28.14.237 (talk) 19:04, 10 April 2011 (UTC)
 * I have a feeling that was one factor in deciding on 4378449 as the maximum.Brightgalrs ( /braɪtˈɡæl.ərˌɛs/ )[1] 04:51, 2 April 2012 (UTC)
 * No, the 4378449 is determined by fitting the best Gompertz function. This maximum was predicted in June 2010. HenkvD (talk) 12:48, 2 April 2012 (UTC)
 * Well then 188.28.14.237's observation is pretty amazing. Brightgalrs ( /braɪtˈɡæl.ərˌɛs/ )[1] 04:27, 17 April 2012 (UTC)

A new Gompertz fit
In the process of investigating the recent disparity between the Gompertz curve prediction and the recent data, I've just re-fitted the Gompertz curve using up-to-date data, using only data between 2004 and the present day, on the basis that Rambot activity unduly distorted the activity before that. (Which you can see quite clearly in the graph at Image:EnwikipediapercgrowthGom.PNG)

Using the same units as in the main article, the new fitted curve has the parameters

a = 4471486, b = -15.344927, c = -0.379785

which gives the graph below:



However, even though this fits better than the previous Gompertz fit, there is still a clearly discernible and growing trend away from the Gompertz curve in favour of continued article creation, starting in roughly mid-2011. -- The Anome (talk) 20:16, 10 June 2012 (UTC)
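For reference, the Gompertz function being fitted has the closed form N(t) = a·exp(b·exp(c·t)). A small sketch using the parameters above (the time units and origin of t belong to the main article's fit and are not restated here, so treat this purely as a shape illustration):

```python
import math

# Gompertz growth curve with the re-fitted parameters quoted above.
a, b, c = 4471486, -15.344927, -0.379785

def gompertz(t):
    """N(t) = a * exp(b * exp(c*t)); t in the fit's own time units."""
    return a * math.exp(b * math.exp(c * t))

# With b < 0 and c < 0, the inner exponential decays to zero over time,
# so N(t) rises monotonically toward the ceiling a -- the fit's
# predicted maximum article count.
print(round(gompertz(100)))  # 4471486
```

The parameter a is thus the new fit's asymptotic maximum, directly answering "what is the maximum in this new fitted curve?" below.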


 * I think it would be best to ignore everything before that large jump at ~November 2002 when doing the fit. By the way, what is the maximum in this new fitted curve? Nevermind... Brightgalrs ( /braɪtˈɡæl.ərˌɛs/ )[1] 00:32, 1 July 2012 (UTC)


 * You might want to take a look at Modelling Wikipedia's growth for my most recent thoughts on this. -- The Anome (talk) 09:19, 1 July 2012 (UTC)
 * Wow! Thanks for that. For all of my Pools predictions I've been using the old Gompertz model to calculate the approximate date - so with your new model 20 million articles can (theoretically) be reached by 2048 instead of ~2120 aka within my lifetime. Pretty exciting stuff. Brightgalrs ( /braɪtˈɡæl.ərˌɛs/ )[1] 10:25, 1 July 2012 (UTC)

Updates needed for the mid-2010s
The page is outdated when it comes to graphs during the mid-2010s. It would be great to have more updated graphs. Thank you. Johnny Au (talk/contributions) 00:18, 1 November 2015 (UTC)

Doubling time
For the annual growth rate table, should we add in doubling time? Johnny Au (talk/contributions) 03:12, 9 December 2015 (UTC)


 * Sure, why not. It's interesting. Sagittarian Milky Way (talk) 03:21, 9 December 2015 (UTC)


 * I have corrected and clarified the calculations. Johnny Au  (talk/contributions) 05:03, 15 December 2015 (UTC)
 * I have added in the calculation for 2016 as well. Johnny Au  (talk/contributions) 00:24, 1 January 2016 (UTC)
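For reference, the doubling time implied by a steady annual growth rate r follows from solving (1 + r)^t = 2, giving t = ln 2 / ln(1 + r). A short sketch (the rates below are made-up examples, not figures from the table):

```python
import math

def doubling_time(annual_rate):
    """Years to double at a constant annual growth rate.

    Solves (1 + r)**t == 2 for t.
    """
    return math.log(2) / math.log(1 + annual_rate)

# Hypothetical example rates, not values from the article's table.
for r in (0.60, 0.20, 0.04):
    print(f"{r:.0%} per year -> doubles in {doubling_time(r):.1f} years")
```

A 100% annual rate gives a doubling time of exactly one year, which is a quick sanity check on the formula.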

"Comparisons with other Wikipedias" unlabeled graph
The graph at the top of the section Comparisons with other Wikipedias has no title and the y-axis is unlabeled. Is it the number of articles on each Wikipedia? The total number of pages? The memory in kB?

OK, I assume it's the number of articles, but this fact is never mentioned. Numbers of articles are finally explicitly mentioned in the last two sentences of the section, but even then it's not clear that this is what was on the graph. Eebster the Great (talk) 07:58, 25 November 2015 (UTC)


 * It needs Cebuano and Waray-Waray, as these two are well in the top ten Wikipedias. Johnny Au  (talk/contributions) 02:18, 1 February 2016 (UTC)

Data set
I made the data set collapsible, especially given how many samples have been collected over the English Wikipedia's history. Johnny Au (talk/contributions) 04:24, 13 February 2016 (UTC)
 * It looks much neater as well. Johnny Au  (talk/contributions) 02:10, 1 May 2016 (UTC)

Visits? Page Views?
On my user page, I use the following wikimarkup...

...to generate this:

''' As of Monday, July 2024,  (UTC), The English Wikipedia has  registered users,  active editors, and  administrators. Together we have made edits, created  pages of all kinds and created  articles. '''

Is there some way I could add something like "number of visits" or "number of page views" to the above? --Guy Macon (talk) 19:38, 17 February 2013 (UTC)


 * (Sound of Crickets...) --Guy Macon (talk) 06:55, 27 February 2013 (UTC)


 * Wow. Three months and no response, not even an "I don't know" or "you stink, you redacted !!!"... --Guy Macon (talk) 07:22, 31 May 2013 (UTC)


 * It would be a very large and rapidly changing number, and therefore probably too expensive to compute in real time. https://stats.wikimedia.org/EN/SummaryEN.htm ...so you can take 2.5 years of silence for a "No, sorry." ;-) SageGreenRider (talk) 13:43, 1 November 2015 (UTC)


 * Could we add page views if we updated once a month or even once a year? --Guy Macon (talk) 18:29, 1 November 2015 (UTC)
 * Even then it would be rather cumbersome and we would need consistency regardless. Johnny Au  (talk/contributions) 02:36, 1 March 2016 (UTC)


 * For some reason Wikipedia doesn't use an analytics solution; instead, the data seems to be obtained from the server logs. However, I see no effort by Wikipedia's tech people to actually consolidate their server logs and run analytics against them, so yes, this is cumbersome. (It wouldn't be expensive to run; it's just extra work to organise, and the tech people are clearly busy with other, more important stuff.) A Guy into Books (talk) 12:51, 18 August 2017 (UTC)

Cebuano Wikipedia's extremely rapid growth
For the "Comparisons with other Wikipedias" section, there should be a sub-section dedicated to the Cebuano Wikipedia and other primarily bot-generated Wikipedias such as the Waray Wikipedia (and the Swedish Wikipedia to some extent). All of the Wikipedia comparison charts need to display the Cebuano Wikipedia.

The Cebuano Wikipedia reached 5 million articles and is well on its way to exceeding the English Wikipedia before the end of this year.

What do you think? Johnny Au (talk/contributions) 02:17, 10 August 2017 (UTC)
 * I have mentioned this in the lead. Johnny Au  (talk/contributions) 02:11, 1 October 2017 (UTC)
 * Looks like the Cebuano Wikipedia's rapid growth has practically halted, perhaps for quality control. Johnny Au  (talk/contributions) 04:32, 1 December 2017 (UTC)

Size of Content vs. Meta-Discussion on Wikipedia?
I'm having trouble finding data regarding the size and growth (preferably in word count) of the actual content pages of English Wikipedia, in comparison to other sorts of pages, especially guideline and talk pages. Is that data available somewhere, and if so, should it be included on this page? Aquaticonions (talk) 18:27, 30 November 2020 (UTC)
 * The Number of words section of this page covers the total number of words in all content pages. Hope this helps! AmericanLemming (talk) 00:24, 1 December 2020 (UTC)
 * Gotcha. Is there separate data that includes the word count on non-content pages then? Aquaticonions (talk) 17:16, 2 December 2020 (UTC)

What's the size of Wikipedia in 2018, after all?
Hi people! How are you? Thanks for this article and its important contribution to knowledge discovery in datasets. I just couldn't understand what the 2018 size of Wikipedia is in gigabytes. Could anybody help me, please? Thank you very much, Lu Brito (talk) 22:14, 2 September 2018 (UTC)
 * I have done some analysis. You can take a look at the Excel spreadsheet I made. See for yourself. It constitutes original research though. Johnny Au  (talk/contributions) 00:33, 1 December 2019 (UTC)
 * Thank you so much. I appreciate it :) Lu Brito (talk)
 * You are welcome. Johnny Au  (talk/contributions) 16:55, 22 November 2020 (UTC)
 * The spreadsheet will no longer be updated, as the content is original research and dumps.wikimedia.org/enwiki and its pages archived by archive.org have the database sizes anyway, albeit rounded to the nearest tenth rather than the nearest hundredth. There is no need to be very precise. Johnny Au  (talk/contributions) 04:24, 5 December 2020 (UTC)

Word count
This article says the English Wikipedia contains over 3.9 billion words, I'd like to know which parts of an article count toward this number (do references, lists or tables count?). Tools like User:Caorongjin/wordcount and Prosesize don't count the words inside certain tables, so the contents of the episode list section in One Piece (season 1) are not counted (plenty of prose gets skipped because it's inside a table), while I assume they would be counted here. How is the counting done and which words get counted? — Preceding unsigned comment added by Jasper Norbert (talk • contribs) 23:50, 31 May 2021 (UTC)
 * That is a good question. I don't really know, as how words are counted isn't stated. Johnny Au  (talk/contributions) 00:20, 1 November 2021 (UTC)

wp:size under discussion
The redirect page wp:size is currently discussed at WP:RFD. --George Ho (talk) 10:13, 23 December 2021 (UTC)

Quantity and quality
This page talks about the size of Wikipedia, over 6 million pages, but it doesn't mention quality. It might be worthwhile to include a section that gives some space to also discuss quality. It could paraphrase and point to Featured_articles, which says there are 6,092 featured articles out of 6,498,196 articles on the English Wikipedia (and also Good articles).

I had expected maybe there would be a link in the See also section, and since I didn't see any I decided to make this small suggestion. -- 109.76.199.51 (talk) 04:58, 14 May 2022 (UTC)
 * Thanks. Johnny Au (talk/contributions) 00:12, 1 July 2022 (UTC)

Talk Page
To be honest, this page needs more information about how many articles are created on Wikipedia. This new section should include a total comparison, since Wikipedia is the largest encyclopedia with over 6.5 million articles. Another edit would be adding new pictures as well. More pictures should be added because I want to improve this article, and I tend to agree with other editors on any contribution. What do you guys and others think? --76.20.110.116 (talk) 19:15, 29 May 2022 (UTC)
 * I agree. Many of the images are outdated. Johnny Au (talk/contributions) 00:27, 1 June 2022 (UTC)
 * That is exactly it; we should upload new images and add them to this page. 76.20.110.116 (talk) 21:46, 2 June 2022 (UTC)
 * Thanks. Johnny Au (talk/contributions) 00:12, 1 July 2022 (UTC)

Chart dates are wrong
it says 1990-

it should be "pre-1990" or similar. please fix ASAP Electricmaster (talk) 20:37, 23 November 2022 (UTC)

Only contains file estimates without media
The file sizes in the article all exclude media such as images, but it should also give file sizes that include the media and images. 80.189.100.37 (talk) 22:31, 10 January 2023 (UTC)

Main space articles by size
I've been asked to add this data, which I collected for a separate purpose, but I haven't yet determined how best to do it. The table of this data is as follows; it excludes redirects, as well as almost all dab and list pages.

BilledMammal (talk) 23:40, 21 March 2023 (UTC)

Yearly growth rate
The 71% growth in word count from 2010 to 2018 is averaged to 8.9% in the table, but this is misleading. Taking compounding into account, the yearly growth rate is about 6.9% (1.71^(1/8) ≈ 1.069). Any advice on how to make the change so the explanation is brief and easy to understand? MrFennicus (talk) 18:07, 15 June 2023 (UTC)
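 * The compounding correction above can be checked in a few lines of Python: dividing 71% by 8 years gives the misleading 8.9%, while the geometric (compounded) rate is the eighth root of the total growth factor. The figures here are taken from the comment, not recomputed from the table.

```python
# Total growth of 71% over the 8 years 2010-2018.
total_growth = 1.71   # word count in 2018 divided by word count in 2010
years = 8

# Naive average: 71% / 8 ≈ 8.9% per year (what the table shows).
naive_rate = (total_growth - 1) / years

# Compounded annual growth rate: solve (1 + r)**8 = 1.71 for r.
annual_rate = total_growth ** (1 / years) - 1

print(f"naive: {naive_rate:.1%}, compounded: {annual_rate:.1%}")
# → naive: 8.9%, compounded: 6.9%
```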