User talk:Dantheox/Stub percentages

Drivers for increasing stub %
Editing behavior tends to fall into one of two camps (very loosely): Mergist vs. Reductionist. The Mergist view is that most stubs can be merged into larger articles thereby providing more context for information and decreasing fragmentation of information, while the Reductionist view is that it works against the power of the wiki to merge stubs and fragmentation facilitates flexibility. In my editing, I've encountered very vocal resistance to the combining of related stubs into larger articles as opposed to jumping right to expansion of each stub individually. I've the feeling that the environment generally favors the Reductionist approach, though there seems to be a boundary of article size (can't say what this is) beyond which an article becomes an "attractor", rather like it having a "gravitational pull" that tends to sweep its topic area. That's a very fuzzy articulation of a gut feeling I've come to have over time and the only experiment that comes to mind that might address this would be to look at something like article length vs. topic area vs. stub count in topic area, anticipating that the resulting curve(s) or surface(s) might show a shoulder or deflection point. User:Ceyockey ( talk to me ) 12:11, 1 March 2006 (UTC)

I like your project, and agree with your methodology; and I'd love to see that million-article update. I think one additional factor is that stubs are increasingly labeled as such. The familiarity of stub-tags, the visibility of the stub-sorting project, improvements in Recent Changes and New Page patrol, and a tendency for newer editors (whose numbers are also increasing) to do drive-by tag-slapping rather than substantial fixing have all led to more stub-tagging. This may contribute somewhat to the increase in percentage, comparable to the spikes in the reported incidence of autism and Asperger's syndrome as those diagnoses became more familiar to doctors. Probably not enough to make a huge difference, but worth mentioning.


 * I think that both of these points are entirely correct. I'll look into separating out old articles that are relabeled as stubs and new articles that are marked as stubs (within the first day or week, perhaps). --Dantheox 04:22, 2 March 2006 (UTC)
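The split proposed above (new articles tagged as stubs soon after creation versus older articles relabeled later) could be sketched roughly as follows. This is only an illustration: the function name `classify_stub_tagging`, the labels, and the one-week default window are assumptions, and real page-history data would have to be supplied separately.

```python
from datetime import datetime, timedelta


def classify_stub_tagging(created, first_tagged, window=timedelta(days=7)):
    """Label a stub tag as applied to a 'new' article or a 'relabeled' old one.

    created      -- when the article was created
    first_tagged -- when a stub tag was first added
    window       -- assumed cutoff (one week here; one day is another option)
    """
    if first_tagged < created:
        raise ValueError("an article cannot be tagged before it exists")
    return "new" if first_tagged - created <= window else "relabeled"


# Hypothetical timestamps, not taken from any real article history.
created = datetime(2006, 1, 1)
print(classify_stub_tagging(created, datetime(2006, 1, 3)))
print(classify_stub_tagging(created, datetime(2006, 3, 1)))
```

Counting the two labels separately per month would show how much of the rise in stub percentage comes from relabeling existing articles.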

Stub awareness

 * Disclaimer: I wrote that before reading all of the comments above.
 * Have you considered stub awareness and usage? I.e. once there were no stub tags, and when they were created few people knew about them. As more people began to use them, more articles were tagged with stub tags. Thus it is possible the stub percentage has always been stable, and it is just our knowledge of stubs (usage of stub tags) that is growing. Perhaps analysis of articles by size would be useful here?--Piotr Konieczny aka Prokonsul Piotrus Talk 06:50, 5 March 2006 (UTC)

Article size statistics
Article size statistics for the period up to December 2005 are available here - the percentages of articles greater than 500 bytes and 2000 bytes are shown in columns J and K. (Note that the data from before March 2002 is not reliable, and so is best ignored for this purpose.)

The conclusion from this is that the average size of articles has been increasing, slowly but surely - in March 2002 63% of articles were greater than 500 bytes. This increased at a slow, steady rate to 74% in December 2005. You get similar conclusions if you look at articles over 2000 bytes. So stubs on Wikipedia, measured by article size, have been getting rarer over time as a proportion of articles.
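For anyone wanting to reproduce this kind of figure from a dump of article sizes, a minimal sketch of the calculation is below. The function name `share_above` and the sample sizes are made up for illustration; only the 500 and 2000 byte thresholds come from the statistics discussed here.

```python
def share_above(sizes, threshold):
    """Return the fraction of articles whose size in bytes exceeds threshold."""
    if not sizes:
        raise ValueError("no articles to measure")
    return sum(1 for size in sizes if size > threshold) / len(sizes)


# Hypothetical article sizes in bytes, NOT real Wikipedia data.
sample_sizes = [120, 480, 650, 900, 2100, 3500, 150, 5000]

for threshold in (500, 2000):
    share = share_above(sample_sizes, threshold)
    print(f"over {threshold} bytes: {share:.1%}")
```

Running the same calculation on a snapshot for each month would give a size-based trend that can be set beside the tag-based stub percentage.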

The stub percentages measured according to what is tagged show a very different story. Over time on Wikipedia, people have been tagging longer and longer articles as stubs. When the stub tag was first introduced, it tended to be applied to very small, one sentence articles with virtually no useful content, saying things like "Paris is the capital of France", whereas now it is commonly applied to articles with three or four paragraphs.

The fact that Wikipedia articles are getting longer on average is a surprising result in a way - it surprised me when I first looked at similar stats a few years ago. It's easy to get the impression that Wikipedia is getting flooded with lots of short, pointless articles, and it seemed that way in the early days of the project as well. In early 2002 Wikipedia was being deluged by hundreds of sub-stubs of Simpsons and Star Trek characters. At the time, there was a lot of fuss and worry on the mailing lists about how too many articles were being created, and how Wikipedia would become a mess of mini-articles with no useful information that no one would ever want to improve. But so far time and the statistics have shown these worries to be unfounded - the average length (and, in my view, the quality) of Wikipedia articles has been slowly improving despite the constant barrage of short new articles.

Enchanter 21:53, 10 May 2006 (UTC)


 * Note that there isn't a 1.000 correlation between length and stubworthiness. Quite a few of the articles now dubbed stubs have one sentence followed by a list, or one sentence followed by an infobox. Until recently infoboxes were only added to fairly lengthy articles, whereas now there are even cases of articles which are only infoboxes. This could at least in part explain why stub length appears to be increasing. Grutness...wha?  05:16, 11 May 2006 (UTC)

credulity in re "stubs"
Are these data based on an assumption that articles that say "This...is a stub" are in fact stubs? What happens is that someone puts a "stub" notice on an article and then it keeps growing, and when it's half as long as War and Peace and is the basis of graduate study in its topic, it still says it's a stub. One should often (maybe even usually) disbelieve articles' self-reported "stub" status. Michael Hardy 19:43, 13 December 2006 (UTC)
 * This is true, of course - which is why so many stub articles are regularly checked, to see whether such stub templates should be removed. Thankfully, with close to 400 members, the stub sorting WikiProject can get round quite a large proportion of the articles marked stub fairly regularly, although some still do slip through. If you notice any articles that are well beyond what can be considered stub status, please remove the templates (but also note the point above - anything with one sentence of explanation followed by screeds of list or infobox is still only really a one-sentence article and therefore still a stub!) Grutness...wha?  04:04, 14 December 2006 (UTC)
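A rough way to shortlist tagged articles that may have outgrown stub status, while respecting the point that lists and infoboxes should not count, might look like the sketch below. It is a crude heuristic and not how the stub sorting project actually works: the function names, the line-prefix filter, and the 2000-character cutoff are all assumptions, and anything it flags would still need a human look.

```python
# Line prefixes treated as non-prose: list items, table rows, template lines.
NON_PROSE_PREFIXES = ("*", "#", "|", "!", "{{", "}}", "{|")


def prose_length(wikitext):
    """Count characters on lines that do not look like list or template markup."""
    total = 0
    for line in wikitext.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(NON_PROSE_PREFIXES):
            continue
        total += len(stripped)
    return total


def looks_overgrown(wikitext, tagged_as_stub, min_prose=2000):
    """Flag a stub-tagged article whose prose reaches the assumed cutoff."""
    return tagged_as_stub and prose_length(wikitext) >= min_prose


# Hypothetical article: one sentence followed by a list and an infobox.
example = "\n".join([
    "Paris is the capital of France.",
    "* item one",
    "* item two",
    "{{Infobox city",
    "| name = Paris",
    "}}",
    "{{stub}}",
])
print(prose_length(example), looks_overgrown(example, tagged_as_stub=True))
```

Because only the single sentence is counted, the example stays below the cutoff however long its list or infobox grows.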