Wikipedia:Bots/Requests for approval/SoxBot III 3


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

SoxBot III
Operator: X clamation point

Automatic or Manually Assisted: Automatic

Programming Language(s): PHP

Function Summary: Anti testing bot

Edit period(s) (e.g. Continuous, daily, one time run): Continuous

Already has a bot flag (Y/N): Y, old tasks inactive

Function Details: This bot is based on ClueBot, in that it uses heuristics and a score function to rate test edits. It is designed to combat edits that add strings such as "<math>Insert formula here</math>== Headline text =='''Bold text'''''Italic text''". It sits in the RC feed, and if it detects an edit by an IP with fewer than 250 edits, or a user with fewer than 50 edits, it checks the diff. It checks the parts added and removed against a score list, in which common testing techniques are scored according to how likely each is to produce a false positive. If the score falls below a certain threshold, the bot reverts the edit.
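
The scoring described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the bot's actual PHP source: the pattern list, the penalty values, and the `REVERT_THRESHOLD` cutoff are all assumptions chosen for the example, and the names `score_text`, `score_edit`, and `should_revert` are hypothetical.

```python
import re

# Hypothetical score list: (pattern, penalty). The penalties are made-up
# values; a more negative score means the text looks more like a test edit.
TEST_PATTERNS = [
    (re.compile(r"'''Bold text'''"), -5),
    (re.compile(r"''Italic text''"), -5),
    (re.compile(r"==\s*Headline text\s*=="), -5),
    (re.compile(r"Insert formula here"), -5),
]

# Assumed cutoff: edits scoring below this value are reverted.
REVERT_THRESHOLD = -8


def score_text(text):
    """Sum the penalties of every test pattern found in the text."""
    return sum(penalty * len(rx.findall(text)) for rx, penalty in TEST_PATTERNS)


def score_edit(added, removed):
    """Score a diff: added test strings lower the score, removed ones raise it."""
    return score_text(added) - score_text(removed)


def should_revert(added, removed):
    """Revert only when the edit's score falls below the threshold."""
    return score_edit(added, removed) < REVERT_THRESHOLD
```

Under these assumed values a single stray `'''Bold text'''` scores -5 and is left alone, while an edit that adds several toolbar defaults at once falls below the cutoff.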

Discussion
An IP with less than 250 edits? That's probably 90% of anons. Probably most of the ones with more are shared IPs. If the threshold is that high, it would probably be more efficient to just check every IP edit. Also, will it be limited to checking mainspace pages? Mr.Z-man 06:59, 28 December 2008 (UTC)
 * I could make it all IPs, I guess. And yes, it is limited to Mainspace. X clamation point  07:00, 28 December 2008 (UTC)

What will it do if someone makes an otherwise legitimate edit where they accidentally include a string such as Bold text ? rspεεr (talk) 09:54, 28 December 2008 (UTC)
 * In that case, they can make the edit again, and the bot won't revert it. X clamation point  18:52, 28 December 2008 (UTC)
 * In addition, it will not revert if the edit adds more than 1000 characters, unless the score is unusually high. X clamation point  06:54, 30 December 2008 (UTC)
 * Sure. It sounds like the score function deals with the cases I was worried about. rspεεr (talk) 08:13, 30 December 2008 (UTC)
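
The size safeguard mentioned in this thread could look something like the sketch below. Only the 1000-character limit comes from the discussion; the two score cutoffs are hypothetical, and "unusually high" is interpreted here as a score far beyond the normal revert threshold (more negative meaning more test-like).

```python
MAX_ADDED_CHARS = 1000   # limit stated in the discussion
REVERT_THRESHOLD = -8    # hypothetical normal cutoff
STRONG_THRESHOLD = -20   # hypothetical cutoff for an "unusually high" score


def should_revert(score, added_chars):
    """Decide whether to revert, given an edit's score and its added size."""
    if score >= REVERT_THRESHOLD:
        # Not test-like enough to act on at all.
        return False
    if added_chars > MAX_ADDED_CHARS:
        # Large additions are probably real content; revert only on a
        # very strong score.
        return score <= STRONG_THRESHOLD
    return True
```

The idea is that a long contribution containing one stray toolbar string is more likely a good-faith edit than a test, so it takes a much stronger score to override the size check.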

I could have sworn there was a bot already approved for this just recently... --MZMcBride (talk) 09:55, 28 December 2008 (UTC)
 * Ameliorationbot? Yes, but Ameliorate has just retired. X clamation point  18:52, 28 December 2008 (UTC)

Seems to be making good reversions. But as of 1500GMT today (Jan 2) it seems to have stopped issuing warnings to users. Martin 22:05, 2 January 2009 (UTC)

Short dry run
From a short dry run, here's what the bot would have reverted:

X clamation point  07:49, 29 December 2008 (UTC)
 * Update: The results above are from an overnight dry run. X clamation point  15:31, 29 December 2008 (UTC)

I have a few questions. First, at least one of the previous bots that did this was Bots/Requests for approval/TestEditBot (which has a better name at least...). There seem to be a lot of missing safeguards here (or at least they're not described). Thanks! --MZMcBride (talk) 19:27, 29 December 2008 (UTC)
 * 1) Does it give user warnings to users making test edits, and if so, which templates will it use?
 * It will give warnings, but I have not written the page which will warn as of yet. See below. X clamation point  20:05, 29 December 2008 (UTC)
 * 2) If you're using ClueBot's classes and ClueBot is (as far as I'm aware) still running, why wouldn't this functionality be built into that bot?
 * Because it has a whole different mindset on what to do with a test edit than ClueBot does for vandalism. ClueBot is more of "angry reversions, report it to AIV, and everywhere else", while this bot is more of a "revert and warn nicely" approach (That's my view on it, anyway). X clamation point  20:05, 29 December 2008 (UTC)
 * 3) How does it check the number of edits for a given IP?
 * It doesn't anymore, per Z-Man's comment above. Before, it used the API. X clamation point  20:05, 29 December 2008 (UTC)
 * 4) Further, is the number of edits for an IP relevant to whether test code was added, and if so, how?
 * See above question. X clamation point  20:05, 29 December 2008 (UTC)
 * 5) You vaguely mention that this bot won't revert the same person twice on the same day. Is this source code publicly available and has it been checked by others?
 * The source is not publicly available, as of yet. Right now, the code is pretty ugly looking. I will work on making it cleaner code, and then release it publicly, and have it reviewed. X clamation point  20:05, 29 December 2008 (UTC)
 * I have released the code at User:SoxBot III/Source, and it has been reviewed by Chris G. X clamation point  03:25, 30 December 2008 (UTC)
 * 6) Going beyond that, how much of the code for this bot was written by you (Soxred)? As far as I'm aware, Cobi is pretty inactive, so if something breaks, will you be able to fix it? Or was this code simply re-used (not that there's anything wrong with that) from previous bots? And if it is re-used code from previous test edit detection bots, why has this project been abandoned previously? (This also sort of ties in to why ClueBot isn't doing this itself, though I have my suspicions.)
 * Maybe 60% was written by me. The other 40% is the IRC bot part (for getting the live feed, and also for getting a machine readable diff). I understand the concern about Cobi being inactive, but if it breaks, it most likely will be easy to fix by myself. It is also not using any code from previous test detection bots. X clamation point  20:05, 29 December 2008 (UTC)
 * Also, I am not as inactive as you may think — I do pay attention to Wikipedia, just don't always respond. And Soxred knows how to get a hold of me if he needs me.  -- Cobi(t 06:32, 15 January 2009 (UTC)
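
The "won't revert the same person twice on the same day" safeguard raised in this list could be tracked along these lines. This is only a sketch of one way to do it, not taken from the bot's source; the `RevertLog` class and its method names are hypothetical.

```python
from datetime import datetime, timezone


class RevertLog:
    """Remembers the last UTC date on which each user was reverted."""

    def __init__(self):
        self._last_revert = {}  # username -> date of most recent revert

    def may_revert(self, user, now):
        """Allow a revert unless this user was already reverted today."""
        return self._last_revert.get(user) != now.date()

    def record(self, user, now):
        """Note that the user was reverted at the given time."""
        self._last_revert[user] = now.date()


# Usage: the second test edit on the same day is left alone.
log = RevertLog()
first = datetime(2008, 12, 29, 7, 0, tzinfo=timezone.utc)
if log.may_revert("192.0.2.1", first):
    log.record("192.0.2.1", first)
```

A guard of this kind is what lets an editor who was reverted by mistake simply make the edit again, as described earlier in the discussion.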

This looks pretty good. I take it this is just for the test edit type things and not the other types of vandalism scoring that cluebot does (various words etc)? I'm not much for reading code, but the tests I saw were of the '/\'\'\'Bold text\'\'\'/' type. Either way this is great since a fair number of these slip through and the more a bot can get the better. Another thought I had, what's the chance of checking if a user is doing this repeatedly and reporting them? Anybody doing the same thing after being warned for example would clearly be an issue. - Taxman Talk 19:18, 1 January 2009 (UTC)
 * I suppose I could make it report to AIV if they've been warned 4 times. Would that work, Taxman? X clamation point  19:41, 1 January 2009 (UTC)
 * For IPs I think that would be about right, though for registered users I don't see why more than two or three is needed. If the bot really warned in error because someone added the mistakes unintentionally, a report shouldn't be a problem, because it can be explained on their talkpage or whatever and they wouldn't get blocked. Whoever reviews the report will look into it. But by two or three false warnings someone really needs to be reminded to use preview or show changes. :) By two or three real warnings they need a block. But for IPs who knows who is behind it, so unless the warnings are several or over a short period, no report is likely needed, justifying the higher number. - Taxman Talk 20:18, 1 January 2009 (UTC)
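
The escalation proposed in this exchange might be sketched as follows. The limit of four warnings for IPs comes from the proposal above; the limit of three for registered users is one reading of "two or three", and the function names and return strings are hypothetical.

```python
import ipaddress

IP_REPORT_AFTER = 4          # proposed above for IPs
REGISTERED_REPORT_AFTER = 3  # assumed; the suggestion was "two or three"


def is_ip(username):
    """Treat a username that parses as an IP address as an anonymous editor."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def next_action(username, prior_warnings):
    """Pick the next warning level, or a report once the limit is reached."""
    limit = IP_REPORT_AFTER if is_ip(username) else REGISTERED_REPORT_AFTER
    if prior_warnings >= limit:
        return "report"
    return f"warn-level-{prior_warnings + 1}"
```

With these values an IP receives up to four escalating warnings before being reported, while a registered account is reported one step sooner.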

Warning templates
== December 2008 == Welcome, and thank you for experimenting with Wikipedia. Your test on the page : worked, and it has been automatically reverted. Please take a look at the welcome page to learn more about contributing to this encyclopedia. If you would like to experiment further, please use the sandbox. If you believe there has been a mistake and would like to report a false positive, please report it here, but be sure to mention this number: . Thank you. SoxBot III (talk | owner) 06:52, 30 December 2008 (UTC)

Please refrain from making test edits in Wikipedia pages, such as those you made to :, even if your ultimate intention is to fix them. Such edits appear to be vandalism and have been automatically reverted. If you would like to experiment again, please use the sandbox. If you believe there has been a mistake and would like to report a false positive, please report it here, but be sure to mention this number: . Thank you. SoxBot III (talk | owner) 06:52, 30 December 2008 (UTC)

Please stop making test edits to Wikipedia, as you did to :. It is considered vandalism, which, under Wikipedia policy, can lead to blocking of editing privileges. If you would like to experiment again, please use the sandbox. If you believe there has been a mistake and would like to report a false positive, please report it here, but be sure to mention this number: . Thank you. SoxBot III (talk | owner) 06:52, 30 December 2008 (UTC)

This is the last warning you will receive for your disruptive edits. The next time you disrupt Wikipedia, as you did to :, you will be blocked from editing. If you believe there has been a mistake and would like to report a false positive, please report it here, but be sure to mention this number: . Thank you. SoxBot III (talk | owner) 06:52, 30 December 2008 (UTC)
 * -- Chris  04:22, 31 December 2008 (UTC)
 * Trial started - After a full day of trying to get it to work, with server hopping and the toolserver being down, I FINALLY got it to work. It is now reverting and warning users. X clamation point  04:34, 1 January 2009 (UTC)
 * Trial complete - I have also changed it to simply undo the edit, rather than rollback. I think that will cause fewer false positives. A report of false positives can be found here. X clamation point  21:02, 6 January 2009 (UTC)

Additional discussion
I noticed that the false positives were caused by a user entering legitimate text and then something like Italic text. Of course vandals also often enter such strings in their crap edits. Based on your response to the one FP, I'm guessing the bot will no longer make any changes if a certain threshold of "non-test" text is also entered. Perhaps, in these cases, the bot could just remove the "test" portion (probably without warning), but only when no text was deleted in the process. But perhaps not... the downside would be potentially vandalistic edits being only partially reverted. Just something to consider. --ThaddeusB (talk) 23:06, 6 January 2009 (UTC)
 * I have lowered the point value for those strings. X clamation point  04:19, 10 January 2009 (UTC)

Extended trial
BJ Talk 02:17, 15 January 2009 (UTC)
 * During the trial, I got 2 or 3 complaints about false positives. Those are likely not to happen again, as I have raised the threshold to eliminate those specific false positives. X  clamation point  00:30, 29 January 2009 (UTC)
 * Everything looks good, I see no reason not to approve. Richard 0612  21:22, 30 January 2009 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.