User:Amaurea

I spend most of my time on Wikipedia reading articles and following links from one article to the next, especially in articles about science and technology. I also read articles about games and entertainment, and most of my edits are probably in that category. I have made a few major edits, but most of my edits are small: minor corrections and vandalism reverts.

I am fascinated by how the wiki process has built Wikipedia, and seeing an open, collaborative and altruistic project surpass existing encyclopedias in just a few years has restored some of my faith in humanity (which I tend to lose gradually as I read, again and again, about big industry lobbying through new laws that take away people's rights and lock down our cultural heritage). I hate seeing limitless resources made artificially scarce through copyright and patents, and seeing the elegant distribution system created by the internet, which lets anyone publish anything at low cost, hampered as a result. Artificial scarcity, much like advertising, is a waste of resources.

On stable pages
Recently the concept of stable versions of Wikipedia's pages has been rapidly gaining ground. Most supporters portray this as a rather small change that would make Wikipedia more reliable and help it be taken more seriously in academic circles. The idea is understandably tempting, since vandalism and the decay of articles have received a lot of attention recently, but I do not think it is the easy, safe change many seem to think it is.

Wikipedia was started as an alternative to the stagnating Nupedia, which had the same goal as Wikipedia (to make a free encyclopedia) but tried to reach it through the traditional means of experts and peer review. Nupedia failed because it had few editors, no fresh supply of new ones, and a process that made changes difficult. Wikipedia's wiki model changed that by making every reader an editor and making editing easy, and what followed was an exponential increase in the number of articles that has lasted to this day. One explanation for this could be as follows. For any encyclopedia it is reasonable to assume that the number of readers is proportional to the number and quality of its articles. By making every reader an editor, Wikipedia added a proportionality between the number of readers and the article creation rate. Together, these mean that the rate of article creation became proportional to the number of articles, a recipe for exponential runaway growth.
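The argument can be written as a short derivation. Here N(t) is the number of articles, R(t) the number of readers, and α and β are hypothetical proportionality constants introduced only for illustration; article quality is held fixed and both proportionalities are assumed to be exact:

```latex
\begin{align}
  R &= \alpha N, \\
  \frac{dN}{dt} &= \beta R = \alpha\beta N, \\
  N(t) &= N_0 \, e^{\alpha\beta t}, \qquad T_2 = \frac{\ln 2}{\alpha\beta}.
\end{align}
```

T_2 is the doubling time. Taking the growth rate of the exponential fit reported in the growth section (αβ ≈ 0.00208562 per day), it would be about 332 days.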

By marking a version of an article as stable, and presenting that version to normal visitors, we break the coupling between the number of readers and the number of editors. The whole point of a wiki, and the key to Wikipedia's incredible growth, is that every reader is an editor, and in light of that it isn't a good idea to create separate views of an article for readers and editors. Every reader who reads the stable version instead of the current version is one less potential editor to improve the current version. One could hope that people who find faults in the stable version would go to the draft version to make improvements there, but simply labeling a version as stable will discourage edits, and people who still want to edit will be further discouraged by the fact that their edits are not seen by the general public but hidden away in a draft version of the article. Thus, this will discourage positive edits for the same reason it will discourage vandalism: editing becomes slightly harder, and, more importantly, the results aren't immediately visible in the main version of the article (the one most people read).

Regarding vandalism and bad pages, the wiki answer is that we have many people to fix those problems for the same reason the problems are there: many people read and edit Wikipedia. There will be more vandalism the bigger Wikipedia grows, but for the same reason there will also be more people who can spot and fix it. This is inherently scalable, so there should be no need to change the way we deal with vandalism just because the wiki has reached a given size. Wikipedia is by no means finished. There are still millions of articles waiting to be created, fleshed out and polished, and many existing articles that need to be improved. We should therefore continue to let our readers edit our articles. Not because the wiki process is sacred in itself, but because it has proven to be the fastest way to create an encyclopedia.

(According to some sources, such as Nature, quality isn't a big problem either.)

Automatic analysis of Wikipedia growth
I have made a small Ruby script and a gnuplot script that automatically fit a set of functions to the size of Wikipedia, given a file containing the creation dates of all pages (or better, the times when articles became big enough to count as articles). The Ruby script, partition.rb, partitions the dates into a given number of boxes and counts how many fall into each. The gnuplot script expects to find the results in result.txt and produces four images showing the growth together with several candidate functions describing it. Since gnuplot needs a set of hints (starting values) to fit correctly, these are provided in wikivar_f.txt, wikivar_g.txt and wikivar_h.txt. The results look like this: wiki.png and wiki_log.png.
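Because gnuplot's `fit` needs reasonable starting values, one way to produce hints for the exponential case is to fit a straight line to the logarithm of the size, since ln(size) = ln(a) + b*days when size = a*exp(b*days). The Ruby sketch below does this. It is a hypothetical helper and not part of the original scripts. The function names, the parameter names a and b, and the idea of feeding its output to gnuplot are all assumptions. A least-squares fit in log space also weights the points differently from gnuplot's nonlinear fit, so it only gives a rough first guess. It does not cover the logistic or polynomial forms.

```ruby
# Hypothetical helper: estimate starting values for size ~ a*exp(b*days)
# by ordinary least squares on ln(size). This is a straight-line fit in
# log space, NOT gnuplot's nonlinear fit; it only supplies initial guesses.
def loglinear_fit(days, sizes)
  # Logarithms are only defined for positive sizes, so drop empty points.
  pts = days.zip(sizes).select { |_, s| s > 0 }
  raise ArgumentError, "need at least two positive points" if pts.size < 2

  xs = pts.map { |d, _| d.to_f }
  ys = pts.map { |_, s| Math.log(s) }
  mx = xs.sum / xs.size
  my = ys.sum / ys.size
  # Slope and intercept of the least-squares line through (x, ln y).
  sxy = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  sxx = xs.sum { |x| (x - mx)**2 }
  b = sxy / sxx
  a = Math.exp(my - b * mx)
  [a, b]
end

# Format the estimates as "name = value" lines, the layout gnuplot reads
# from a parameter file in `fit ... via "file"`. The names a and b are placeholders.
def gnuplot_params(a, b)
  "a = #{a}\nb = #{b}\n"
end
```

For example, you could pass in only the points from about day 700 onward, where the exponential fits well, and write `gnuplot_params(a, b)` to a hypothetical hints.txt. A command along the lines of `fit f(x) "result.txt" via "hints.txt"` could then read it, assuming f is defined in terms of a and b.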

Updating this graph is as simple as producing a new dump of article creation dates and running "ruby partition.rb dump.txt 1000; gnuplot wikifit.gp". However, the Ruby step is very slow. In hindsight it might have been better to write it in something faster than Ruby, though since this was one of my first attempts at writing Ruby, it can probably be made to run faster even in Ruby.
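Without the original partition.rb it is hard to say where the time goes. If the script compares every date against every box, though, the work grows with the number of dates times the number of boxes. A hypothetical single-pass alternative computes each date's box index directly, as in the Ruby sketch below. The sketch makes three assumptions. The dump holds one 14-digit timestamp per line (YYYYMMDDHHMMSS, the format of 20010206182124). Those timestamps are UTC. Each output line holds the box's upper edge in days, the count in that box and the running total. The column layout that wikifit.gp actually expects is not known, so the output format is an assumption too.

```ruby
# Hypothetical sketch, not the original partition.rb.
# Assumes one 14-digit timestamp (YYYYMMDDHHMMSS) per line of the dump.

# Convert a timestamp string into a Time, treating it as UTC.
def parse_stamp(s)
  Time.utc(s[0, 4].to_i, s[4, 2].to_i, s[6, 2].to_i,
           s[8, 2].to_i, s[10, 2].to_i, s[12, 2].to_i)
end

# Split the span between the first and last date into `nbins` equal boxes
# and count the dates in each with a single pass over the input.
# Returns rows of [box upper edge in days, count in box, running total].
def partition(stamps, nbins)
  times = stamps.map { |s| parse_stamp(s) }
  first, last = times.minmax
  width = (last - first) / nbins          # box width in seconds (Float)
  counts = Array.new(nbins, 0)
  times.each do |t|
    # Compute the box index directly instead of searching for it.
    i = width.zero? ? 0 : ((t - first) / width).floor
    counts[[i, nbins - 1].min] += 1       # the last date sits on the top edge
  end
  total = 0
  counts.each_with_index.map do |c, i|
    total += c
    [(i + 1) * width / 86_400.0, c, total]
  end
end

if __FILE__ == $0 && ARGV.size >= 1
  stamps = File.readlines(ARGV[0]).map(&:strip).reject(&:empty?)
  partition(stamps, Integer(ARGV[1] || "1000")).each do |day, count, total|
    puts format("%.3f %d %d", day, count, total)
  end
end
```

It could be run as, for example, `ruby partition_sketch.rb dump.txt 1000 > result.txt`, where the file name partition_sketch.rb is hypothetical.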

Currently three functions are tested: exponential, logistic and third-degree polynomial. The best fits are the exponential and the logistic, which are almost identical and fit quite well from about 700 days after the wiki's creation onward. Before that point the growth is faster, but very irregular. The best-fit exponential function is size(days) = 21976.1*exp(0.00208562*days), which predicts that Wikipedia should reach 2M articles 2163 days after the first article in the dump (timestamp 20010206182124, i.e. 6 February 2001, 18:21:24, in the dump used), that is, at 20070109182124 (9 January 2007).
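The 2M prediction can be checked by inverting the fitted function, days = ln(size/21976.1)/0.00208562, and adding the result to the first timestamp. The Ruby sketch below does this. It treats the dump timestamp as UTC and a day as exactly 86 400 seconds, and both of these are assumptions. The constant names are illustrative.

```ruby
# Parameters of the best-fit exponential: size(days) = A*exp(B*days).
A = 21976.1
B = 0.00208562
# Timestamp of the first article in the dump used (20010206182124), assumed UTC.
FIRST = Time.utc(2001, 2, 6, 18, 21, 24)

# Invert size = A*exp(B*days) to get the (fractional) day a target size is reached.
def days_to_reach(target)
  Math.log(target / A) / B
end

days = days_to_reach(2_000_000).ceil       # first whole day at or above 2M
date = FIRST + days * 86_400               # whole days of 86 400 seconds
puts days                                  # 2163
puts date.strftime("%Y%m%d%H%M%S")         # 20070109182124
```

The unrounded value is about 2162.9 days, so rounding or taking the ceiling both give 2163.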