Wikipedia:Bots/Requests for approval/SpamReportBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.

SpamReportBot
This bot is just something I wish to create so that detailed information about link additions can be published. It's written in Perl and will work with an external database. The bot will run every hour or so to update the statistics.

Basically, what I wish to do is publish to its userspace a list of all links added by only one user over a given timespan. These tend to be spam links, though they are not always spam, so the reports will need some human review. These humans will most likely be drawn from WPSPAM ;). I'm just going to ask for approval to edit anywhere in the bot's userspace; it's never going to edit outside this space unless I specifically request it here. It won't be making that many edits either, perhaps 2 or 3 an hour, depending on how many reports I can generate ;) ——  Eagle 101 Need help? 12:45, 21 March 2008 (UTC)

Discussion
It's not very clear who will define the list of users to be stalked, and how; this needs an unhurried discussion. But if you need to debug the bot or demonstrate its functionality, you could track my edits or your own. Feel free to catch me on IRC if you want me to make some specific edits. MaxSem (Han shot first!) 15:23, 21 March 2008 (UTC)
 * Alright, basically every time someone adds a link that is not already whitelisted (admin, known user), we put a line in a database. This database is analyzed looking for patterns that match spammers. If you check my user contribs, I've found about 6 spammers we missed just by picking the obvious ones out of this report. I'd like to make that information open to others so I don't have to do the work of reverting them XD.
 * Your trial limits won't work, and I hope you understand why now. (Both you and I are on the whitelist of the linkwatcher bots to start with). ——  Eagle 101 Need help? 15:26, 21 March 2008 (UTC)
 * I'll talk to you on IRC if you get on :). Look for 'nixeagle'. I'm in #wikipedia-bag ——  Eagle 101 Need help? 15:27, 21 March 2008 (UTC)
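The logging step described above (one database row per link addition, skipping whitelisted users such as admins and known users) can be sketched as follows. This is a minimal Python illustration, not the bot's actual Perl code: the SQLite table layout, the `WHITELIST` contents, and the function names are all hypothetical.

```python
import sqlite3
from urllib.parse import urlparse

# Hypothetical whitelist: admins and known users whose additions are never logged.
WHITELIST = {"AdminUser", "KnownEditor"}


def open_db(path=":memory:"):
    """Open (or create) a hypothetical table of link additions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS additions "
        "(wiki TEXT, username TEXT, domain TEXT, ts TEXT)"
    )
    return conn


def record_addition(conn, wiki, username, url, ts):
    """Log one link addition unless the user is whitelisted.

    Returns True if a row was written, False if the user was skipped.
    """
    if username in WHITELIST:
        return False
    # Store only the host part, so later queries can group by domain.
    domain = (urlparse(url).hostname or "").lower()
    conn.execute(
        "INSERT INTO additions VALUES (?, ?, ?, ?)", (wiki, username, domain, ts)
    )
    return True
```

Grouping by domain rather than by full URL is a design assumption here; it lets one spammer's many different pages on the same site count together.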


 * The bot is meant to search a database of link additions and see which match certain patterns (e.g. the same link being added to different Wikipedias in a short time). That is a pattern of 'cross-wiki spamming', which will be investigated. If I understand it correctly, Eagle_101 wants this data to be available to the members of Wikipedia talk:WPSPAM, so this type of spam can be found and handled more quickly. Eagle_101, maybe you should list your criteria and the minimum thresholds you use, to show that it should hardly ever catch regular editors. Hope this explains a bit. --Dirk Beetstra T  C 15:38, 21 March 2008 (UTC)
 * It won't catch users that are whitelisted on the linkwatcher bots. The lower bound at the moment is any domain that is added more than 5 times by only one user. I'm going to work on developing more queries. ——  Eagle 101 Need help? 15:43, 21 March 2008 (UTC)
 * This bot will report both potential cross-wiki spam and spam on this wiki, as two separate reports. I have placed an example of the cross-wiki report (unformatted) at User:Eagle_101/crosswiki. The parameters for the cross-wiki search are: more than 3 wikis with the same domain, and fewer than NUMBEROFLINKS + 3 users adding the link. ——  Eagle 101 Need help? 15:50, 21 March 2008 (UTC)
 * MaxSem, I'd also like to object to your use of 'stalk'; that implies we have something against these folks (new users and IPs). We will be 'stalking' the edits in much the same way countervandalism stalks edits. Just so everyone has an idea of the number of links added to Wikipedia in a 16-hour period: the English Wikipedia gets 24,021 links, and across all Foundation wikis (some 724 wikis) we get 34,759 link additions. Are there any other concerns, technical or otherwise? ——  Eagle 101 Need help? 16:04, 21 March 2008 (UTC)
 * Pardon, I misunderstood how it works at first. MaxSem (Han shot first!) 16:23, 21 March 2008 (UTC)
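The two report criteria stated above can be sketched over a list of logged additions. This Python sketch is an illustration only and assumes one record per link addition. The single-user report flags domains added more than 5 times on one wiki, all by a single user. The cross-wiki report flags domains seen on more than 3 wikis, with the user bound applied literally as written (distinct users fewer than NUMBEROFLINKS + 3, reading NUMBEROFLINKS as the domain's number of additions). The `LinkAddition` record and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class LinkAddition(NamedTuple):
    """One logged link addition (hypothetical record layout)."""
    wiki: str
    user: str
    domain: str


def single_user_domains(rows: Iterable[LinkAddition], wiki: str,
                        min_additions: int = 5) -> List[str]:
    """Domains on `wiki` added more than `min_additions` times, all by one user."""
    users = defaultdict(set)
    counts = defaultdict(int)
    for r in rows:
        if r.wiki == wiki:
            users[r.domain].add(r.user)
            counts[r.domain] += 1
    return sorted(d for d, n in counts.items()
                  if len(users[d]) == 1 and n > min_additions)


def crosswiki_domains(rows: Iterable[LinkAddition], min_wikis: int = 3,
                      user_slack: int = 3) -> List[str]:
    """Domains added on more than `min_wikis` wikis by fewer than
    (number of additions + `user_slack`) distinct users."""
    wikis = defaultdict(set)
    users = defaultdict(set)
    counts = defaultdict(int)
    for r in rows:
        wikis[r.domain].add(r.wiki)
        users[r.domain].add(r.user)
        counts[r.domain] += 1
    return sorted(d for d, n in counts.items()
                  if len(wikis[d]) > min_wikis and len(users[d]) < n + user_slack)
```

For example, one user adding the same domain six times on "en" would appear in `single_user_domains(rows, "en")`, and one user adding another domain once each on four different wikis would appear in `crosswiki_domains(rows)`. Either way, a human still has to review the output, since many flagged domains will not be spam.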

I have one such concern: the OTRS team periodically receives complaints that COIBot's reports (which may or may not indicate intentional COI spam) show up high in Google results for companies' names, thus linking them to spam. It would be great if there were a way to hide these reports from search spiders, or to word them as neutrally as possible. MaxSem (Han shot first!) 16:42, 21 March 2008 (UTC)
 * Right, have a look at the current COIBot reports, http://meta.wikimedia.org/wiki/User:COIBot/LinkReports/fracassi.net for example. They are fairly neutral and have a disclaimer. Perhaps the reports from this bot can follow similar procedures? As these reports stand now, it's a list of 200 potential hotspots, and as I refine the filters to exclude more links and users, that number should go down. I don't know of a way to make Google go away, short of putting the bot's userspace in robots.txt. The cross-wiki reports more or less have to list domains that are not spam, as it takes a human to analyze the reports and make sense of them. This should be a new tool in WPSPAM's quiver, though. ——  Eagle 101 Need help? 16:51, 21 March 2008 (UTC)

It is my understanding that editing in userspace at such a low rate does not require approval. The idea of publishing statistics to Wikipedia (and the fact that google picks it up) is out of the BAG's jurisdiction. — Werdna talk 00:23, 22 March 2008 (UTC)
 * I agree with Werdna; one probably doesn't need approval to do userspace-only edits at such a low rate. (Although, when I was new, I sought and received it for a one-edit-per-day bot in my userspace :P) SQL Query me!  11:31, 22 March 2008 (UTC)
 * I don't want to enter the debate over where the line falls between when approval is needed and when it isn't. But I see no harm in letting this BRFA run, especially considering that COIBot still generates quite a bit of controversy. I think a normal BRFA is in order.  Snowolf How can I help? 11:36, 22 March 2008 (UTC)


 * Another sample report is at User:Eagle_101/crosswiki/1. I'll check back tomorrow to see whether you guys have figured out the status of this :) ——  Eagle 101 Need help? 02:22, 22 March 2008 (UTC)

MaxSem (Han shot first!) 11:38, 22 March 2008 (UTC)

What's the status of this request? — Werdna talk 13:56, 4 April 2008 (UTC)

Haven't heard from ya in quite a while. Re-open this if you wish to, when/if you come back. SQL Query me! 13:29, 11 April 2008 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.