Wikipedia talk:Wikipedia Signpost/2009-06-22/Vandalism

Author's note
I appreciate any and all commentary and criticism of this study — use the Discussion page for this. If you edit this report, please do so with extreme care. Aetheling (talk) 05:35, 15 June 2009 (UTC).

Sample size
Have you thought about using a larger sample? You'll admit, a sample size of only 100 has some pretty big error bars on it. I know it's tedious to do more, but hey ... that's what grad students are for :-P

Also, I'd like to see some way to take into account the "importance" of a page. You could use monthly page view numbers as a rough proxy. My guess is that vandalism reversion time and the popularity of an article are highly correlated. So while 4 minutes might be the median across all articles, the median across articles that people are actually reading (let's admit, most articles barely get read at all) might be smaller still. -- Cyde Weys 03:09, 23 June 2009 (UTC)
 * Thanks for the comments. On your first point: data from 100 pages produced a very clear survival curve, which was after all the goal of the study. I am reasonably certain that adding more pages to get a smoother curve is just a waste of time. I did consider asking a grad student to help with the data collection, but in the end the sad fact is that I just don't trust a grad student to be careful enough with reading through all those edits (it helps to have some OCD traits). On your second point, I think this is an excellent idea. I wish I had thought of it! Perhaps the resulting data will allow some form of logistic regression. Next time... —Aetheling (talk) 05:19, 23 June 2009 (UTC).


 * Nice informative study. I also reacted negatively to the mention up front of a sample size of 100.  Agree on Cyde's idea of somehow normalizing for the page views.  Tempshill (talk) 19:42, 26 June 2009 (UTC)
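The page-view normalization suggested above could be checked with a rank correlation between popularity and reversion time. The sketch below is a minimal illustration on invented numbers (no real page-view or reversion data from the study), using a hand-rolled Spearman coefficient so it needs only the standard library:

```python
# Sketch of Cyde's suggestion: use monthly page views as a rough proxy for
# "importance" and test whether reversion time falls as popularity rises.
# All numbers below are hypothetical, not measurements from the study.

def ranks(values):
    """Return average 1-based ranks for a list of values (ties averaged)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across a run of ties
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical sample: (monthly page views, minutes until reversion)
views   = [120, 950, 15000, 300, 80, 42000, 2100, 7]
minutes = [12,  90,  2,     45,  200, 1,    8,    3000]

rho = spearman(views, minutes)
print(round(rho, 3))  # strongly negative: busier pages revert faster
```

A strongly negative rho on real data would support the guess that the four-minute median mostly reflects articles people actually read.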

Some thoughts on tools
Thanks for the study -- it was quite interesting to read. My first thought was: did you correct the two instances of vandalism that had not been corrected before? If not, tell me what they were, and I'll do it. JesseW, the juggling janitor 19:05, 16 June 2009 (UTC)

Regarding what tools might be helpful -- better history analysis tools would seem to be of considerable help. For quite a while now, I've wanted to take the time to craft a number of such tools: one to show all the text that has been added to an article over a given time-frame (even if it was removed within that period); one to highlight the age of text; one to highlight text that has been altered during a given time frame; etc. I think such tools would go a long way toward rooting out vandalism that got lost in the history. The remaining problem would be intentionally subtle vandalism incorporated within otherwise correct changes, or subtle factual lies or bias, which are even harder to handle. Your thoughts would certainly be appreciated. JesseW, the juggling janitor 19:05, 16 June 2009 (UTC)
 * Sorry for not making this clear. Yes, I did correct the two instances that I found. —Aetheling (talk) 03:41, 19 June 2009 (UTC).
 * I assume this fix to Leslie Roy Marston was the one from 2007, as it was vandalized on the day you mentioned; but what was the other one? 70.213.92.234 (talk) (really, User:JesseW/not logged in) 04:49, 25 June 2009 (UTC)

uncorrected vandalism
I have to ask: What were the two articles with uncorrected vandalism? Kaldari (talk) 00:49, 17 June 2009 (UTC)
 * Sorry, I didn't keep a record. They were quite forgettable. To get a feeling for how banal most Wikipedia articles are, try clicking the Random article link for a while. You will see endless details about popular culture, obscure geographical locations, odd lists, etc. It leads me to think that a better study might focus on the more substantive articles, which also tend to attract more vandalism. —Aetheling (talk) 03:41, 19 June 2009 (UTC).

Hmm...
I think this examination is a very good start, but the small size of the sample, combined with the use of such expressions as "50% of all vandalism is being detected" and "10% of all vandalism endures for months" (emphasis mine) makes me very uncomfortable. While those were the percentages that turned up in your (rather small) sample, it's a bit over-reaching to assert that 100 samples are absolutely representative of "all vandalism". – ClockworkSoul 05:23, 23 June 2009 (UTC)


 * It is a small sample, but the resulting survival curve is convincing and the overall results are similar to previous studies. I don't think sampling error is a problem here. Incidentally, sampling error has no effect on the "50%" figure, because that comes from the definition of the median. It does affect the median itself (the "four minutes to reversion" figure). I have changed the wording slightly so as to make it less absolute. —Aetheling (talk) 03:23, 24 June 2009 (UTC).
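The sampling-error point about the median itself can be quantified with a bootstrap: resample the 100 observations with replacement and see how much the median moves. This is a sketch on simulated heavy-tailed reversion times (not the study's actual data), centered near the reported four minutes:

```python
# How much could sampling error move the four-minute median with n = 100?
# Bootstrap confidence interval on a hypothetical, simulated sample of
# reversion times (minutes); the data are NOT the study's measurements.
import random

random.seed(42)
# Simulate 100 heavy-tailed reversion times with a median near 4 minutes.
sample = [4 * random.lognormvariate(0, 2) for _ in range(100)]

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

# Resample with replacement, recompute the median each time.
boots = sorted(
    median([random.choice(sample) for _ in range(len(sample))])
    for _ in range(2000)
)
lo, hi = boots[int(0.025 * 2000)], boots[int(0.975 * 2000)]
print(f"median = {median(sample):.1f} min, 95% CI = ({lo:.1f}, {hi:.1f})")
```

With heavy-tailed data the interval around the median is noticeably wide at n = 100, which is the quantitative version of the "big error bars" worry raised in the first section.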

Query
This is excellent work. I’ve been sceptical about the usual reassuring statements about vandalism reversion for a long time, having come across many instances of ancient vandalism persisting, even in rather high-traffic articles. This neatly describes what is going on. A question: is it possible to estimate from these results what percentage of articles are currently vandalised? (I realise that 2% of the sample was in this state, but I am not clear what, with any confidence, can be drawn from that.) Ian Spackman (talk) 06:30, 23 June 2009 (UTC)


 * I didn't record the data that is required to estimate the percentage of articles that are currently vandalized. It's a shame, because it would have been easy to do, had I thought about it. Next time... —Aetheling (talk) 03:23, 24 June 2009 (UTC).

It is probably impossible to prevent all vandalism - whether creative/humorous or destructive - and some examples will overlap with truthiness, POV-ism and/or genuine misunderstanding. Even if there were a drive to ensure that "every last article as of 1 January 2010 is free of error, vandalism, POV and other problems", a few examples would survive - and there would be a fresh crop of such things emerging.


I would guess that, e.g., the present Pope, Prime Minister, Monarch, President, Sports Champion etc. will be more subject to vandalism than their equivalents from 100/200/500 years ago or any other date. — Preceding unsigned comment added by 83.104.132.41 (talk • contribs)


 * There are some good points here. I appreciate that restricting the survey's focus to clear instances of vandalism was probably necessary, but I think the grey area of subtle vandalism/POV/misunderstanding presents interesting problems of its own. Presenting the study's conclusions in terms of "all vandalism" seems to sweep them a bit too far under the carpet for my taste. -- Avenue (talk) 08:52, 24 June 2009 (UTC)

Possible bias
Thanks for this interesting study. I do agree with your general conclusions. However I think that the results may be biased (in the technical sense) due to the sample design, and in particular your investigation of only the most recent instance of vandalism from each article. This means that instances of vandalism in heavily vandalised articles would be given equal weight in the results to instances in less vandalised articles, and thus each instance of vandalism in a heavily vandalised article is less likely to enter your sample than instances in less vandalised articles. If vandalism to heavily vandalised articles is corrected more quickly and thoroughly, e.g. because people expect it and watch for it more closely, then your measures of time to correction would tend to be overstated.

It might be possible to correct for this effect, e.g. by weighting based on some measure of vandalism rate. However there are other potential biases lurking here too, e.g. due to some articles being older than others. Adjusting for everything may be difficult. Another option would be to think through any assumptions you are making, and hedge the results accordingly. -- Avenue (talk) 08:38, 24 June 2009 (UTC)
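The weighting correction suggested here could look like the sketch below: if each article contributes only its most recent incident, then an incident from a heavily vandalised article stands in for many incidents, so each observed reversion time can be weighted by an estimate of that article's vandalism rate. All numbers are hypothetical, and the rate column would have to come from somewhere (edit histories, abuse-filter hits, etc.):

```python
# Hypothetical illustration of rate-weighting to correct the sample design:
# one (reversion time, vandalism rate) pair per sampled article.

def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= total / 2:
            return v

# (reversion time in minutes, estimated incidents per month for the article)
obs = [(1, 30), (2, 20), (3, 10), (15, 2), (120, 1), (3000, 1)]
times = [t for t, _ in obs]
rates = [r for _, r in obs]

mid = sorted(times)[len(times) // 2 - 1:len(times) // 2 + 1]
print("unweighted median:", sum(mid) / 2)            # treats articles equally
print("rate-weighted median:", weighted_median(times, rates))
```

In this toy example the weighted median is much smaller than the unweighted one, consistent with the suspicion above that the study's design would overstate time to correction if heavily vandalised articles are patrolled more closely.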

Some topics are vandal magnets, while "in the news" topics (in the broadest sense) are likely to suffer much vandalism, "errors arising from overlapping editing" and other sources of error, all of which will drop significantly after the event passes into history (e.g. articles on George W Bush and Tony Blair are likely to show this phenomenon). And "vandalism and errors" in articles on obscure topics are likely to remain undetected for some time. Could "someone statistical" be brought in to determine suitable bases for "low", "medium" and "high" activity articles? (A more technical analysis would involve comparing articles across the various languages in which they appear - to see the way in which a particular controversy "travels".) —Preceding unsigned comment added by 83.104.132.41 (talk)

Wikiproject
Ideally, I had set up the WikiProject Vandalism studies to do just the sort of study that is mentioned here. Hopefully, with this study there may be more interest in getting that project going again. Remember (talk) 16:52, 24 June 2009 (UTC)

Methodology
I see some problems with matching the conclusions to the results. We (or you) state that "50% of all vandalism is being detected and reverted within an estimated four minutes." I'm not sure you studied that. I think what you found was that 50% of previously vandalized articles had their most recent vandalism reverted within an estimated four minutes. I think there are two effects you are ignoring.

1. You're not taking a good sample of "vandalism." Instances of vandalism are edits, and so your sample should select randomly from edits. Instead, you sample randomly from articles. This substantially overweights Ted Chabasinski, which is one article with only 6 edits, and substantially underweights George W. Bush, which has far more edits, and thus more vandalism. If those two articles were the entire sample, you would say that 50% of articles have never been vandalized and 50% of articles have their vandalism reverted in seconds, for a median of "in seconds." What you should have done instead was take a random sample of edits, determine which of those edits were vandalism, and determine the reversion time on those edits. You can select a random edit from the database in multiple ways - I'm certain the more technically literate can help you figure out the most random way.

2. You're ignoring "still exists" vandalism that was covered by more recent vandalism. Imagine an article was vandalized in a very subtle and damaging way a year ago (say, alleging that the person was involved in the assassination of JFK). Then imagine that someone, 1 minute ago, wrote "PENIS PENIS PENIS PENIS" over the header, and that this was instantly reverted. Your study would show that the vandalism on this article was instantly reverted when, in actuality, a sample of all vandalism would show one TTL of "instant" and one TTL of "never reverted".

These two effects would seem, to me, to pull in different directions. My expectation is that if you gave a distribution, you would show a median TTL that was too long (the 4-minute figure being too large), but with a distribution that was far too normal (i.e., vandalism has a fatter tail than even you discuss, consisting of subtle, damaging vandalism designed to disparage people the vandal does not like). This is also being discussed offsite, though one should have a very thick skin and be able to deal with all comers if they engage at that location. Hipocrite (talk) 16:41, 25 June 2009 (UTC)
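Point 1 above can be demonstrated with a toy simulation: a wiki with a few heavily vandalised, quickly reverted pages and many quiet pages where vandalism lingers. Sampling one incident per article (the study's design) versus sampling incidents directly gives very different medians. Everything here is invented for illustration:

```python
# Toy simulation of article-level vs edit-level sampling of vandalism.
# Hypothetical: 10 busy pages with many fast reverts (~1 minute each),
# 90 quiet pages with a single slow revert each.
import random

random.seed(0)

articles = []
for _ in range(10):   # busy, closely watched pages: 100 fast reverts each
    articles.append([random.uniform(0.5, 2) for _ in range(100)])
for _ in range(90):   # quiet pages: one lingering incident each
    articles.append([random.uniform(10, 5000)])

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

# Article-level sample: one incident per article (the study's design).
per_article = [random.choice(a) for a in articles]
# Edit-level sample: every incident counts once.
per_edit = [t for a in articles for t in a]

print("median, one incident per article:", round(median(per_article), 1))
print("median, per incident:", round(median(per_edit), 1))
```

Note that in this construction the biases run the opposite way to point 1's George W. Bush example: because the busy pages revert fast, the per-incident median is much *smaller* than the per-article one, which is exactly why the direction of the bias depends on how revert speed correlates with vandalism rate, as Avenue's "Possible bias" section argues.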


 * I raised some similar issues in the section above. Thanks for the link to the external discussion; there are some good ideas there, along with the vitriol. -- Avenue (talk) 19:19, 25 June 2009 (UTC)


 * I strongly agree, especially to point number 2. As a result, this report might be encouraging underestimation of the real urgency that Wikipedia is facing due to vandalism. - Subh83 (talk | contribs) 05:22, 13 April 2011 (UTC)

Never?
I have to say, I like this study and what it sets out to do. We can learn from this, then repeat it, with bigger samples to see if we have improved.

I quibble with your use of the word "never" - why not just state the time between the vandalism and the time you found it? It could have been only 1 day for all we know. I can't really see what "never" could mean in this context. Stevage 01:13, 26 June 2009 (UTC)
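In survival-analysis terms, the "never" cases are right-censored observations: the vandalism had not yet been reverted when the study checked, which is not the same as "reverted at infinity". The standard tool for a survival curve with censoring is the Kaplan-Meier estimator; below is a minimal sketch on invented (time, reverted?) pairs, where a False flag means "still standing when checked after that many minutes":

```python
# Minimal Kaplan-Meier estimator treating "never reverted (yet)" as
# right-censored data. The observations are hypothetical, not the study's.

def kaplan_meier(observations):
    """observations: list of (time, event) with event=True if reverted.
    Returns [(time, survival probability)] at each reversion time."""
    obs = sorted(observations)
    n_at_risk = len(obs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        reverts = sum(1 for tt, e in obs if tt == t and e)
        removed = sum(1 for tt, _ in obs if tt == t)
        if reverts:  # censored times reduce the risk set but add no step
            surv *= 1 - reverts / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
        i += removed
    return curve

# Hypothetical data: minutes until reversion; False = censored ("not yet").
data = [(1, True), (2, True), (4, True), (8, True),
        (60, True), (1440, False), (10000, False), (20000, True)]
for t, s in kaplan_meier(data):
    print(f"S({t} min) = {s:.3f}")
```

Recording "checked after N days, still present" instead of "never" would let a future version of the study feed the censored cases into exactly this kind of estimator.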

Studies
Would a "compare and contrast" of "rearrangements and vandalism" to Michael Jackson, Farrah Fawcett and AN Other Minor Notable be useful?

Guesstimating the likelihood of non-constructive rearrangements for the three persons.

Mistake?
I haven't read the article very carefully, but in Wikipedia Signpost/2009-06-22/Vandalism... since when has 5 been 25% of 25? 20% surely? Maybe it should also be made clearer that this is the percentage of the vandalised articles, not of the total - any fool reading the page could work that out, but there are a lot of fools on Wikipedia. —Vanderdecken∴ ∫ξφ 20:45, 1 July 2009 (UTC)