Wikipedia talk:Modelling Wikipedia extended growth

No equation actually fits
Where's the equation for this model? It looks to me like it's just been eyeballed, and it doesn't seem to have been well justified in the article. - (User) Wolfkeeper (Talk) 20:59, 1 November 2009 (UTC)


 * The model was developed as a graphical curve to fit the overall pattern of the data, which does not follow a simple mathematical model because many batches of new articles are added by bots and short-term groups, rather than as "random" additions by the general public. Hence, no simple equation could fit the actual data, which fluctuates wildly whenever bot programs are triggered to load numerous new articles, such as the many protein-sequence articles. There is no simple mathematical "process generator" to simulate new-article growth. A detailed operational model would not be an equation, but rather a logical, procedural computer model. However, the growth impact of articles from the general public has been much greater than that of the short-term group efforts, so the overall pattern appears to be a somewhat linear decline in the growth rate for new articles, when averaged over several months, or over 3 future years, at a time. Perhaps a rough equation would reduce the new-article growth by 11% each year, with the understanding that the decline slows further each June/July but the rate rises again each August, probably tied to school vacations in the Northern Hemisphere. Bear in mind that if a massive bot were triggered to load 700,000 new articles from a "Who's Who in Science", then the new-article rate would soar for months, and would always appear as an anomaly, an upward bump, in the declining overall curve during the next 20-65 years. -Wikid77 (talk) 08:00, 1 February 2010 (UTC)
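
As a minimal sketch of the "rough equation" mentioned above, the following reads "reduce the new-article growth by 11% each year" as a geometric decline, which is an assumption rather than something the model page states. It ignores the seasonal June-August pattern and any bot-driven bumps, and the function names and any starting rate passed in are hypothetical.

```python
def projected_rates(initial_rate, years, yearly_decline=0.11):
    """Return the new-article rate for year 0..years, shrinking by a fixed
    fraction each year (assumed geometric reading of the 11% figure)."""
    return [initial_rate * (1.0 - yearly_decline) ** t for t in range(years + 1)]


def cumulative_new_articles(initial_rate, years, yearly_decline=0.11):
    """Total articles added over the first `years` years (year 0 up to years-1)."""
    return sum(projected_rates(initial_rate, years - 1, yearly_decline))


def limit_total(initial_rate, yearly_decline=0.11):
    """Limit of the cumulative total as the number of years grows without bound
    (sum of a geometric series with ratio 1 - yearly_decline)."""
    return initial_rate / yearly_decline
```

Under this reading the yearly rate never reaches zero, but the cumulative total stays bounded at roughly nine times the first year's rate. A one-off bot load would sit on top of this curve as a separate term.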

A question for future research
So has anyone compared the ratio of stub to non-stub articles? If the ratio since the 2006 peak of new articles has shifted in favor of non-stub articles, that would support the "low-hanging fruit" hypothesis of declining new-article creation -- viz., it is easier to improve an existing article than to create a new one. But if the ratio has remained roughly the same before, during, & since the 2006 peak, then the cause of the new-article fall-off needs to be found elsewhere. (And I hope that this decline is due to the "low-hanging fruit" hypothesis, rather than, say, increased barriers to new-article creation.) -- llywrch (talk) 18:57, 15 March 2011 (UTC)
 * I have begun scanning through the recent Special:NewPages, to confirm that many new articles are bio pages of lesser-known athletes, authors, artists, actors, musicians (etc.), hoping to meet the WP:Notability requirements. It seems easy to pass WP:ATHLETE (kick a ball and you're in!?!). Meanwhile, I keep noticing all the red-links still in various articles every day: those red-links are only slowly getting changed into articles, and hence many mountain names, lakes, ships, books, songs, museum artefacts, the Eureka Diamond, string grammar (etc.) are still likely to become articles in the future (hence I think +3 million more). -Wikid77 23:31, 17 March 2011 (UTC)

Drop in English-WP new users after March 2007


I am repeating this topic here, as to why enwiki user levels slowed after March-May 2007, to allow other enWP users to read and reply with any explanations they might offer. The new-articles graph (at right) shows a similar slowing in the addition of new articles, as though the major factor(s) which generated new articles had also been thwarted in early 2007. Some analysts have noted how new users often create articles about musicians, artists or authors they know (sometimes a current girlfriend, or themselves), to expand the bio pages. Meanwhile, I feel strongly that the drop (in 2007) was mainly due to the notorious banning of Wikipedia in U.S. schools and colleges, as demanded by more academic officials beginning in February 2007. See the following essay about those bans, with 19 news articles:
 * "WP:Schools and colleges banned WP in 2007-2008"

Other bans were suggested in England. Those early bans, coupled with the typical 3-month school vacations (June-August), seem to be what thwarted growth of the English Wikipedia. Recall how some similar reductions occurred in the German Wikipedia in early 2007 (which had a similar 50% of users younger than 22 years) despite deWP's faster initial growth in new users, while the French Wikipedia graph showed only a steady, seasonal variation in adding new users, with no sign of bans in France to deter new users from joining French WP. Because the French WP shows no signs (yet) of a dramatic decline in new users, the French statistics can be used as a control group, to rule out the updates for MediaWiki release 1.09 (etc.) as having a strong impact in deterring new users. Obviously any large colleges banning Wikipedia use, among 2,000 to 50,000 students per college, would cause a massive decline in user access during 2007-2008 and beyond. -Wikid77 23:31, 17 March 2011 (UTC)


 * I've looked at your list of reports about universities and schools banning WP, and it confirms what my quick google found when you first made the suggestion. There was ONE school (Middlebury) that banned it in Jan 2007, followed by 2 universities in March (Penn and UCLA) and a single prof at Syracuse in April. The "ban" in Pennsylvania schools in your reports in November amounts to a teacher putting up "Just say no" messages beside the terminals, as far as I can see. Where is the evidence that a "Pennsylvania school district" has banned it? This blog says much the same.
 * That's not a strong ban - indeed it might make more kids aware of WP than it discourages from using it. And in particular it says nothing about actually participating/editing. The whole discussion is about using and citing it. There are some indications that teachers were encouraging pupils to update particular articles as projects - and it's conceivable that they all suddenly stopped doing it as a result of Ted Stevens' proposed Senate Bill or other pressure (did it ever get voted on?). If so, I'd like to see the evidence.
 * My observation on the discussion page was that in a single month (the peak was March 2007) the number of new active WPians changed from increasing by 2,100/month to decreasing by 1,400 a month. It's a huge change and even if 3/4 of Wiki editors at the time were schoolkids, I'd suggest that it would take at least one or two state-wide bans to produce that sort of effect, and there's no evidence of it.
 * I haven't done the math yet, but I guess the effect might show up by looking at the change in the proportion of editors who achieve "active" status - i.e. 5 or 10 edits. If the 1-edit numbers don't show the same abrupt change, we would know it was something internal to the wiki process rather than things going on outside. I've downloaded some of the figures but am not sure I have enough to check it out yet.
 * About your graph: what makes you put the asymptotic tail in growth of article numbers at zero rather than some other number? Why not 10,000 or 20,000? Chris55 (talk) 01:29, 18 March 2011 (UTC)
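
The check proposed in this thread (comparing how many one-edit newcomers go on to reach 5 edits) could be sketched roughly as below. This is only an illustration: the function names are hypothetical, the monthly counts are invented and are not the downloaded figures, and the real statistics may need different handling (for example, newcomers who reach 5 edits in a later month than their first edit).

```python
def activation_rates(one_edit_counts, five_edit_counts):
    """Fraction of each month's one-edit newcomers who reached five edits.
    Assumes every monthly one-edit count is non-zero."""
    return [five / one for one, five in zip(one_edit_counts, five_edit_counts)]


def largest_drop(rates):
    """Return (month_index, change) for the steepest month-over-month fall."""
    changes = [later - earlier for earlier, later in zip(rates, rates[1:])]
    index = min(range(len(changes)), key=changes.__getitem__)
    return index + 1, changes[index]


# Hypothetical monthly counts, invented purely for illustration.
one_edit = [10000, 10200, 10100, 10300]
five_edit = [3000, 3060, 2525, 2575]

rates = activation_rates(one_edit, five_edit)
month, change = largest_drop(rates)
```

If the one-edit series stays flat while the activation rate falls sharply in a single month, that would point toward something internal to the wiki process, as suggested above; if both series fall together, an outside cause becomes more plausible.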

Here are the graphs I suggested. First, a very simple graph showing the total number of people who have created a user id and made one edit, and those who persisted until they had done 5:

So there were plenty of new Wikipedians after March 2007 - the problem is that the number of active WPians starts to decline. Does this simply mean that more people are leaving than joining, or that fewer people became active (i.e. completed 5 edits) even though they had joined? There certainly was an abrupt change, as shown by the second graph.

In words, up to March 2007 about 20% more newcomers became "active" by completing their first 5 edits than old active editors dropped out. After that date the process goes into reverse. Unfortunately this graph doesn't differentiate between old active editors who have stopped and new editors who haven't become active, and though I suspect the latter, I can't prove it. There's obviously an amount of noise in the graph, because people take different amounts of time to become active or to quit, and they take holidays. The figures are based on monthly totals, so much, but not all, of the noise is absorbed.
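
To make the arithmetic behind these figures concrete, here is a small sketch. It assumes, purely for illustration, that the "about 20% more" ratio and the earlier figure of a 2,100/month increase describe the same month; the thread does not confirm that, and the function names are hypothetical.

```python
def replacement_ratio(newly_active, dropped_out):
    """Newly active editors per editor who dropped out in the same month."""
    return newly_active / dropped_out


def net_change(newly_active, dropped_out):
    """Monthly change in the number of active editors."""
    return newly_active - dropped_out


def implied_counts(net_gain, excess_fraction):
    """Back out (newly_active, dropped_out) from a net monthly gain and the
    fraction by which newcomers exceed dropouts (0.2 for '20% more')."""
    dropped_out = net_gain / excess_fraction
    newly_active = dropped_out * (1.0 + excess_fraction)
    return newly_active, dropped_out
```

Under that assumption, `implied_counts(2100, 0.2)` gives roughly 12,600 newly active editors against 10,500 dropouts per month. A later net loss of 1,400/month could then come from fewer newcomers becoming active, more old editors quitting, or both, which is exactly what the monthly totals cannot separate.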

To get a definite answer someone's going to have to frame a new query into the raw records. If you can put me in contact with the people who are close to this I'll liaise with them. Chris55 (talk) 01:03, 20 March 2011 (UTC)