Wikipedia:STiki/Draft

STiki: A new layer in the anti-vandalism sieve

Understanding Wikipedia requires grasping just how many layers an edit has to go through to stay on a page. It begins with the edit-filter--automatic triggers which block very suspicious edits such as "GAY!!!!!!!!!!!!!!!!!!!!!!!!!". This is the low-hanging fruit, and the edit-filter knocks off the most obvious vandals and predictable patterns. After this initial gate, edits are subject to a gauntlet of off-wiki tools. It begins with Cluebot, a neural network program that is astonishingly fast at reverting vandalism. Even the frantic Hugglers often lament with admiration, "Cluebot beat me to it!"

After Cluebot the web spreads out to anti-vandal patrollers. They're human--yes--but they are also machine-assisted. After the edits which are almost definitely vandalism, the edits which are likely vandalism get presented to a patroller for review. Different tools tackle this triage process in their own way. Most focus on IP edits, some target naughty words, others consider sophisticated metadata and quantified editor reputation metrics.

I have experimented with almost all of the anti-vandal tools. I've searched through recent changes, used Lupin's various feeds, booted up Igloo, and marveled at the Huggle race. Anti-vandal patrol is a vital part of keeping Wikipedia free from embarrassment, and the tools which make it efficient are a force-multiplier.

Being a machine-assisted anti-vandal patroller is a good deal like being a Pending Changes reviewer. I have the reviewer right, which doesn't do me much good now that Pending Changes imploded under the surrounding controversy. The difference, of course, is that anti-vandalism tools do their work after the edit has gone live. Still, despite the failure of Pending Changes to gain traction, I review hundreds of diffs a day. And my favorite tool of the moment is STiki.

Unlike Huggle, the street cop on the vandalism beat, STiki is like the desk detective. It presents edits based on metadata--features of the edit like what part of the world it came from, the length of the edit comment, and about 60 others. Whereas Huggle specializes in obvious vandalism, STiki searches for that, as well as for more subtle indicators. It's great for catching single-word changes in the midst of a large paragraph, blanking of sentences, and the insertion of spam links.
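To make the metadata idea concrete, here is a deliberately simplified sketch of scoring edits without looking at the diff text at all. The feature names and weights below are invented for illustration--STiki's actual classifier uses roughly 60 features and machine-learned weights, not these hand-picked numbers.

```python
# Illustrative sketch only: these features and weights are hypothetical,
# not STiki's real model, which learns from ~60 metadata features.

def vandalism_score(edit):
    """Return a rough priority score from edit metadata alone (no diff text)."""
    score = 0.0
    if edit["is_anonymous"]:                 # IP editors vandalize more often
        score += 0.3
    if len(edit["comment"]) == 0:            # an empty edit summary is a weak signal
        score += 0.2
    if 8 <= edit["local_hour"] < 16:         # e.g. school hours in the editor's region
        score += 0.1
    if edit["bytes_changed"] < 0:            # removals are slightly riskier
        score += 0.1
    return score

edits = [
    {"is_anonymous": True, "comment": "", "local_hour": 10, "bytes_changed": -40},
    {"is_anonymous": False, "comment": "fix typo", "local_hour": 22, "bytes_changed": 3},
]

# The highest-scoring edits get shown to human reviewers first.
queue = sorted(edits, key=vandalism_score, reverse=True)
```

The point of the metadata approach is that a score like this can be computed instantly for every edit, so human attention is spent where it is most likely to be needed.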

There's no competition between tools, and each one plays a valuable role in the anti-vandal ecosystem--the filtration system. But what does make STiki promising is that it is beginning to act as a clearing-house for all the different methods of vandalism detection.

In addition to the STiki metadata algorithm, STiki also handles the leftovers from Cluebot--edits which the bot was pretty sure were vandalism but not sure enough to revert without a human eye. Combined with human review, Cluebot's flagging can be utilized on 90% rather than 60% of the worst edits. STiki also processes the WikiTrust feed, which is based on how durable an editor's contributions are. Most recently, STiki added a linkspam feed, which presents the most likely candidates for commercial and junk URLs. Each of STiki's feeds leverages a different approach to tracking down unconstructive or outright destructive edits, and using the tool I can browse through each of them at my own pace.

That is one distinct difference between Huggle and STiki--time to review carefully. Unlike the mad-dash Huggle race, STiki sends diffs out of a queue using a 'reservation system'. When you're reviewing a particular diff with STiki, no one else is. This gives you the ability to track down a reference for a BLP addition, to check the sports scores for an updated goal figure, or to browse through the contribution history of the editor to make a more informed judgment, without worrying that you'll be beaten to the punch.
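The reservation system described above amounts to a priority queue where a fetched diff leaves the pool until the reviewer is done with it. Here is a minimal sketch of that behavior; STiki's server-side implementation surely differs in detail (timeouts, persistence, and so on are omitted).

```python
import heapq

class ReservationQueue:
    """One reviewer at a time: a fetched diff leaves the pool until recycled.

    A hypothetical sketch of the 'reservation system' idea, not STiki's code.
    """

    def __init__(self):
        self._heap = []  # (-score, diff_id): highest score pops first

    def add(self, diff_id, score):
        heapq.heappush(self._heap, (-score, diff_id))

    def fetch(self):
        """Reserve the most-likely-vandalism diff; no other reviewer gets it."""
        if not self._heap:
            return None
        _, diff_id = heapq.heappop(self._heap)
        return diff_id

    def recycle(self, diff_id, score):
        """A 'pass' click: return the diff for another editor to evaluate."""
        self.add(diff_id, score)

q = ReservationQueue()
q.add("diff-1", 0.9)
q.add("diff-2", 0.4)
first = q.fetch()   # the highest-priority diff, now exclusively reserved
```

Because a fetched diff is out of the pool, a reviewer can take their time checking sources without any risk of duplicated effort--the property the essay contrasts with the Huggle race.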

All of that is possible without a tool, but STiki makes it convenient and efficient. It prioritizes the most likely vandalism and presents it through a clear user interface. If it's vandalism, one click, and the edit is immediately rolled back and a warning posted to the user talkpage. If it's innocent, one click, and the diff leaves the queue with a seal of implicit approval. Not sure? Clicking pass recycles the diff for another editor to evaluate.

Since its start, STiki has reverted nearly 70,000 incidents. I like to sit down and go through a few hundred at a time. It's a way to remove a good portion of the estimated 7% of edits which have nothing worthwhile to add. Also, using STiki is a way to 'see the Wiki', to pop in and out of different subject areas and grasp the full scope of contributions to the site, both good and bad.

The only bad thing I can say about STiki is that few people know about it and even fewer have used it. STiki has the full support of University of Pennsylvania PhD candidate Andrew G. West and his well-credentialed advisers. West made a splash a year ago when his research projects injected spam onto several articles. It was a "breaching experiment", and it was not well received. I had an inkling, despite the backlash he received, that West was worth holding onto, mentoring perhaps, and ultimately utilizing to the community's full advantage. As is often the case with hackers, it's much better to have them on your side. Fulfilling that hunch, West proceeded to "pay back" the community by not only pointing out ways to stop the kind of attack on the site he had perpetrated, but by creating a tool to prevent others from succeeding at the same. That tool was STiki. In addition to the program, West has put out a bevy of research papers analyzing vandalism and spam. He's won awards for some of them. He presented at Wikimania in 2010 and 2011. In short, he's one of the foremost and most active authorities on wiki security, and we're lucky to have him on board.

I'll end this with a shameless request. If you haven't tried it before, go check out STiki. Run through a few diffs. If it works for you, make it part of your wiki routine, or an alternate for the times when you don't feel like dealing with controversy and humans. If it's not for you, no harm done, and good luck finding something else which is a better fit. As Wikipedia matures, the role of anti-vandalism patrollers and their tools becomes ever more essential to maintaining the quality of what we have collectively produced. We have the knowledge. We have the tools. Now we just have to get more people to use them.