Wikipedia:Wikipedia Signpost/2011-07-18/WikiProject report

This week, we spent some time with WikiProject Spam. The project describes itself as a "voluntary Spam-fighting brigade" that seeks to eliminate the three types of Wikispam: advertisements masquerading as articles, external link spam, and references that serve primarily to promote the author or the work being referenced. WikiProject Spam applies policies regarding what Wikipedia is not and guidelines for external links. The project received some help in February 2007 when the English Wikipedia tagged external links as "NOFOLLOW", which tells search engines not to count those links toward a site's ranking and limits the incentive for many spammers to use Wikipedia as a search engine optimization tool. The project maintains outreach strategies, detailed steps for identifying and removing spam, a variety of search tools, several bots for detecting spam, and a big red button to report spam and spammers. The project was started by Jdavidb in September 2005 and has grown to include 371 members. One of the project's most active members, MER-C, agreed to show us around.

'''How much time do you typically devote each week to fighting spam?'''
 * I find the time commitment required for anti-spam work to be extremely variable. Monitoring the IRC feed isn't particularly taxing, and it isn't too difficult to clean up a few possible copyright problems, edit a few articles, or do non-WP related work or leisure activities at the same time.

'''WikiProject Spam is the most active project by edits (including bots) and the second most watched project on Wikipedia. What accounts for this high activity and interest by the Wikipedia community?'''
 * This is an illusion. 98% of those edits are from User:COIBot, a spam reporting bot. The remaining 2% are to the project's talk page, which serves as a noticeboard for reporting spam campaigns. A good chunk of the edits to the talk page are from a handful of anti-spam specialists. I can't explain the number of watchers though.

'''What type of wikispam do you come across most often? Do you use any special tools to detect spam or do you simply remove spam you notice while reading and editing articles?'''


 * Cleaning out spam haphazardly while reading articles works, but it doesn't address the cause of the problem. I target the spammers themselves, i.e. I identify the domains owned by the spammer and systematically remove spammed links to those domains. Doing it properly requires heavy use of tools beyond the usual contribution analysis:


 * Special:Linksearch and its cross-wiki counterpart
 * Cross-wiki contributions
 * User:Versageek and User:Beetstra maintain a database of link additions to all Wikimedia projects. New links are reported to the IRC channel (don't go there yet, it's not currently working) and others. User:XLinkBot, a spam reversion bot, and User:COIBot use this channel as their source of link additions. Reports are triggered when a small group of users is responsible for a large fraction of link additions to a particular site, or they can be requested through IRC or User:COIBot/Poke (administrators and trusted users only).
 * Various external tools, including Whois, reverse DNS lookups, HTML analysis, Google AdSense and Google Analytics databases and a bit of Google-fu.
 * The Firefox extensions NoScript and RequestPolicy to detect redirects to other domains and protect against the mystery meat nature of spammed sites.
 * A text editor that has fuzzy find and replace functionality, usually implemented using regular expressions.
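The report trigger described in the list above (a small group of users responsible for a large fraction of link additions to a site) can be illustrated with a minimal sketch. This is not COIBot's actual code: the function name `concentrated_domains`, the `(user, url)` input format, and the thresholds `top_n`, `min_share` and `min_links` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def concentrated_domains(additions, top_n=3, min_share=0.8, min_links=5):
    """Return {domain: share} for domains dominated by a few link adders.

    additions: iterable of (user, url) pairs, one per added external link.
    top_n, min_share, min_links: hypothetical thresholds, not COIBot's.
    """
    # Count link additions per user, grouped by domain.
    per_domain = defaultdict(Counter)
    for user, url in additions:
        host = urlparse(url).hostname
        if host is None:
            continue  # skip strings that do not parse as URLs
        if host.startswith("www."):
            host = host[4:]  # treat www.example.com and example.com alike
        per_domain[host][user] += 1

    flagged = {}
    for domain, by_user in per_domain.items():
        total = sum(by_user.values())
        if total < min_links:
            continue  # too few additions to say anything
        # Share of additions made by the top_n most active adders.
        top = sum(count for _, count in by_user.most_common(top_n))
        share = top / total
        if share >= min_share:
            flagged[domain] = share
    return flagged
```

A heuristic like this only catches concentrated campaigns. Links spread thinly over many IP addresses or accounts would stay under the threshold, which is one reason the other tools in the list are still needed.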


 * I target external link additions, so I encounter vanilla external link spam most frequently. The most annoying and widespread spam campaigns, however, involve multiple spam tactics. That said, I've noticed the following recent spam trends -- note the tendency towards avoiding scrutiny from RC patrollers:


 * The spreading of spam edits over multiple IP addresses and user accounts; one spam link per IP address/account isn't uncommon.
 * Spam masquerading as citations. This typically involves the repeated addition of a certain "reference" by a given person; the spammy nature isn't apparent until you look at the big picture.
 * Replacement of existing links and/or citations
 * Inline spamming, the insertion of external links into article prose purely for search engine optimization
 * Misleading edit summaries

'''Have you had any heated conversations with spammers after removing spam from an article? What are some strategies you've used to resolve these conflicts?'''
 * Personal attacks, edit warring and vandalism are surefire ways to expedite blacklisting of the spammer's sites. A couple of months ago, I dealt with a spammer who edit warred to include links to his website. He responded by vandalising my userpage, and so the relevant sites were promptly blacklisted. Apart from a bad faith delisting request, we haven't heard from them since. This is typical; blacklisting is a very effective way of removing spammers from Wikipedia. (Unlike blocks, blacklisting requires money to evade: the spammer needs to purchase new Internet domains.)

'''Has your experience fighting spam resulted in any humorous stories? Have you heard any amusing excuses and special pleading from spammers trying to defend their edits?'''
 * See Grief for details on the usual routine of spammers.

Next week, we'll look at the social construct of naming a rose a "rose". Until then, think deep thoughts in the archive.