User:Squidonius/who watches the watchmen

Quis custodiet ipsos custodes?

This is an analysis of the Wikipedia project MCB and of the pages under its jurisdiction.

WP:MCB itself
The portal is where controversial edits and large-scale projects are discussed. The following graph shows the number of edits in the four WP:MCB talk pages. Note: the graph has been smoothed with a 5-month simple moving average (SMA-5; made in a hurry, hence the odd 5 given that 12 months make a year). Basically, WP:MCB's heyday was in 2006, when Help and Announcements were split from Discussion. Up to last year it was gathering strength, with a nasty dip between Sept and Dec 08, but now it is dramatically losing edits. Hypothetical explanations:
 * Incorrect analysis
 * The gnomish hypothesis: a switch from large-scale coordinated edits to fine-tuning edits, which happen silently.
 * Decrease in edits: Fewer people are editing or editors are less active.
 * Wikipedia is nearly the codex of all human knowledge, so fewer edits are needed. Utter bollocks.
 * Harder for new editors to join.
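For anyone wanting to redo the smoothing, here is a minimal sketch of a trailing simple moving average. The function name and the monthly counts are made up for illustration; the original graph may have used a centred window or a spreadsheet instead.

```python
from collections import deque

def simple_moving_average(values, window=5):
    """Trailing simple moving average.

    Returns one value per full window, so the first window-1 points
    of the input produce no output.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)  # holds the most recent `window` values
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]     # this value is about to be pushed out
        buf.append(v)
        total += v
        if len(buf) == window:
            out.append(total / window)
    return out

# Made-up monthly edit counts, for illustration only.
monthly_edits = [10, 20, 30, 40, 50, 60, 70]
print(simple_moving_average(monthly_edits, window=5))  # [30.0, 40.0, 50.0]
```

Calling it with `window=12` gives the 12-month version, which removes the yearly cycle rather than leaving part of it in.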

Pages in MCB
To verify the hypothesis that all the edits go on in the background, aka the gnome hypothesis :), I wrote a script to download and analyse all the MCB pages. The decrease is quite marginal. (The data are unsmoothed, and the black line is a 12-month moving average, i.e. a smoothed-out version; the previous graph had relatively few edits per month, so it was noisier and had to be smoothed.) So here is a more succinct pictograph (why a pictograph? It makes the data look more believable in the popular press!), showing that there is a 15% drop from last year. I am talking about an absolute difference of 30k edits, so any stats will give infinitesimal p-values. So there are fewer edits, but it is not as dramatic as first predicted, and it actually started last year... Shame, I quite liked the Gnome hypothesis.

However, I think there is the Bot hypothesis, in which bots have done most edits, although I have had trouble automatically filtering out bot edits (up to a 3-fold underestimate), which account for over 1/7th of all edits... Parenthetically, edits and editors, as expected, form a power law distribution with a kink caused by bots.
 * I spotted the error in the filtering: I was checking for the "(bot)" tag in the description, whereas all the bot usernames seem to end in -bot. Having fixed that, here is the monthly graph split into bots, humans and "gods", the dozen or so editors who have done over 10k edits, although some of these may be bots in disguise. (Figure: Bot hypothesis.jpg)
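The corrected filter can be sketched roughly as follows. This is not the original script: the function name, the input format (one username per edit) and the example names are assumptions. The suffix test is only a heuristic, so it will miss bots named otherwise and mislabel any human whose name happens to end in "bot".

```python
from collections import Counter

GOD_THRESHOLD = 10_000  # "over 10k edits", as in the text

def classify_editors(edit_usernames, god_threshold=GOD_THRESHOLD):
    """Label each editor as 'bot', 'god' or 'human'.

    edit_usernames: iterable with one username per edit (assumed format).
    Bots are detected purely by a name ending in 'bot' (case-insensitive).
    """
    counts = Counter(edit_usernames)
    groups = {}
    for name, n_edits in counts.items():
        if name.lower().endswith("bot"):
            groups[name] = "bot"
        elif n_edits > god_threshold:
            groups[name] = "god"
        else:
            groups[name] = "human"
    return groups

# Hypothetical usernames and a tiny threshold so the example stays small.
edits = ["ExampleBot"] * 3 + ["Alice"] * 2 + ["Prolific"] * 5
print(classify_editors(edits, god_threshold=4))
```

Because the bot test runs first, a bot with more than 10k edits is counted as a bot and not as a god, but a "bot in disguise" still ends up among the gods.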

The decrease is caused mostly by bots: Bot edits have dropped by 65% from last year (October-Dec 2010 corrected) and 75% from 2008, whereas human edits have dropped by 5% per annum since 2010 (gods dropped by 45%), so not so dire after all, but still not good.
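The arithmetic behind such year-on-year figures is simple enough to sketch. The numbers below are invented, and the `annualise` helper is only one plausible way to compare an incomplete year with a full one; it is an assumption and not necessarily the correction applied to the figures above.

```python
def percent_change(previous, current):
    """Percentage change from `previous` to `current` (negative means a drop)."""
    if previous == 0:
        raise ValueError("previous total must be non-zero")
    return 100.0 * (current - previous) / previous

def annualise(total, months_observed):
    """Scale a partial-year total up to 12 months (assumes a flat monthly rate)."""
    if not 1 <= months_observed <= 12:
        raise ValueError("months_observed must be between 1 and 12")
    return total * 12 / months_observed

# Invented totals, for illustration only.
last_year_bot_edits = 2000
this_year_bot_edits = 700
print(percent_change(last_year_bot_edits, this_year_bot_edits))  # -65.0
print(annualise(900, months_observed=9))                         # 1200.0
```

The flat-rate scaling ignores the seasonal pattern, so for a year missing only its last quarter it is a rough estimate at best.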

Trends
I started Matlab to see the periodicity of the edits using an FFT, but then I realised that I knew which period I cared about, and the poor neglected neuron of mine which keeps the Ockham's razor principle was screaming at me, so the following are bog-standard pivot tables.

The edits during the week show that procrastination on Wikipedia is highest on Tuesdays and Mondays. It is in fact a Tuesday right now as I am typing. I was hoping for a crazier distribution.

The distribution over the day is truly stunning, but due to the world's time zones it needs some guesswork. I assumed the time was the same as that in my browser as a registered user, which is set to NZ local time; however, I checked the file and it appears to be GMT, so the peak is at 6pm GMT, i.e. 6am NZ, 1pm NYC and 10am San Fran, which makes way more sense than NZ time.

Northern-hemisphere summer months have fewer edits. Two contributing factors:
 * The editors are pesky undergraduate students and edit topic x after reading it?
 * Summer means less computer-related procrastination?
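The weekday and hour-of-day tallies can be reproduced without a spreadsheet. The sketch below assumes the timestamps are GMT/UTC strings of the form `2010-03-02T18:05:00Z`; the function names and sample data are made up. The local-time conversion uses fixed standard-time offsets (NZ +12, New York -5, San Francisco -8) and ignores daylight saving.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def tally(timestamps):
    """Count edits per weekday and per hour of day (both in GMT/UTC)."""
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        by_weekday[WEEKDAYS[t.weekday()]] += 1  # weekday(): Monday is 0
        by_hour[t.hour] += 1
    return by_weekday, by_hour

def local_hour(utc_hour, offset_hours):
    """Shift an hour of day by a fixed whole-hour offset, wrapping at midnight."""
    return (utc_hour + offset_hours) % 24

# Made-up timestamps, for illustration only.
sample = ["2010-03-01T09:30:00Z", "2010-03-02T18:05:00Z", "2010-03-02T18:45:10Z"]
by_weekday, by_hour = tally(sample)
print(by_weekday.most_common())
print(by_hour.most_common(1))

# An 18:00 GMT peak seen from elsewhere, using fixed standard-time offsets.
for place, offset in [("NZ", 12), ("NYC", -5), ("San Fran", -8)]:
    print(place, local_hour(18, offset))
```

Checking which zone the timestamps are in before tallying is what matters here: a 12-hour mistake turns an evening peak into a morning one.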

gods
Who are these chaps who edit like crazy?

Hypotheses to answer

 * What is the ratio of reversions for IPs, bots, humans and gods?
 * Is the quantity increase decreasing?
 * Is the number of good and featured WP:MCB articles promoted per month decreasing?
 * Decrease in new members?
 * IP address users: do the reversion rates increase with time?
 * Suggestions please