Wikipedia:Bots/Requests for approval/DASHBot 15


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

DASHBot 15
Operator:

Automatic or Manually assisted: Automatic

Programming language(s): Python

Source code available: ...

Function overview: Reverting blatant vandalism.

Links to relevant discussions (where appropriate): N/A (ClueBot already does this)

Edit period(s): Continuous

Estimated number of pages affected: Dependent on vandalism.

Exclusion compliant (Y/N): Y

Already has a bot flag (Y/N): Y (But it won't use it; all of its functions should be in watch-lists)

Function details:


 * Main bot function


 * 1) The bot downloads User:DASHBot/Vandalism and parses it for regexes.
 * 2) It compiles those regexes and keeps each one associated with its score.
 * 3) It then connects to irc://irc.wikipedia.com, to channel #en.wikipedia
 * 4) When a new edit is made, it continues only if the user is not in Huggle's whitelist.
 * 5) Then, it checks the edit against DASHBot's User:DASHBot/Ignore list.
 * 6) If none of those steps excuse the edit, the bot downloads the diff and parses out all new text (red text, and new green sections that do not appear on the old side).
 * 7) It checks each regex and adds the regex's score to the edit's score. Scores are cumulative, so if the new text says "fuk" 5 times, the score changes by -3 × 5 (score += -3*5).
 * 8) If the edit is smaller than 400 new characters, the score to revert is -3.
 * 9) If it is less than 150 characters, it is -2.
 * 10) If it is less than 15 characters, it is -1.
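For illustration, steps 7 to 10 can be sketched in Python. This is a minimal sketch and not DASHBot's actual source: the rule list `RULES`, the function names, and the choice to revert when the score is at or below the threshold are all assumptions. The steps above do not state a threshold for edits of 400 or more new characters, so the sketch leaves that case undecided.

```python
import re

# Hypothetical rule list of (compiled regex, score) pairs.
# The real rules are read from User:DASHBot/Vandalism.
RULES = [(re.compile(r"fuk", re.IGNORECASE), -3)]


def score_text(added_text, rules=RULES):
    """Sum each rule's score once per match, so repeated hits accumulate."""
    total = 0
    for pattern, score in rules:
        total += score * len(pattern.findall(added_text))
    return total


def revert_threshold(new_char_count):
    """Return the score needed to revert, based on the size of the new text."""
    if new_char_count < 15:
        return -1
    if new_char_count < 150:
        return -2
    if new_char_count < 400:
        return -3
    return None  # assumption: no threshold is stated for larger edits


def should_revert(added_text, rules=RULES):
    """Assumption: revert when the score is at or below the threshold."""
    threshold = revert_threshold(len(added_text))
    if threshold is None:
        return False
    return score_text(added_text, rules) <= threshold
```

With the single hypothetical rule above, text containing "fuk" five times gets five hits at -3 each, which is well past the -2 threshold that applies to an edit of that size.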


 * Logging

Logging for false positives is incredibly important in this case. For that reason, I made a simple error-report generator that can be accessed online. Try it out with this edit.


 * Extra Notes


 * All regex pages are reloaded as soon as the bot detects that they have been edited.


 * I have been running a dry run in the bot's userspace. See the results of it at User:DASHBot/Dryrun. (Note the items at the bottom will be a more accurate representation of the bot's ability).

Discussion
Please run this on a different account. It's bad enough to have 10 tasks running under one account, but also having an anti-vandalism bot that runs constantly will cause problems down the road.
 * "If the edit is smaller than 400 new characters, the score to revert is -3."
 * I think you mean "more than or equal to"?


 * "Source code available: ..."


 * "Already has a bot flag (Y/N): Y (But it wont use it, all of its functions should be in watch-lists)"
 * I hope this is some horrible typo, watch-lists have nothing to do with bot flags...

Also, per the Dryrun, this and this would have been reverted. I know it's great to detect a large amount of vandalism, but being too sensitive and having false positives is not an acceptable side effect. This is present in ClueBot and other antivandalism bots. FinalRapture - † ☪ 21:18, 8 June 2010 (UTC)
 * Both those false positives have been fixed, even before I read your comments. Finding errors like these was the whole purpose of the dry run.
 * The source is not yet available because right now it looks like a 5-year-old wrote it. I'm cleaning it up now, and will publish it after the full trial.
 * It won't edit with a bot flag because its edits should not appear in the recent changes feed. Sorry, it was early when I filled this all out. Tim  1357  talk  23:47, 8 June 2010 (UTC)
 * What are the problems you have with running this under the same bot account? It's easier for me to do, and I don't think it makes it any harder for others to use, given the proper documentation and logging. Tim  1357  talk  23:47, 8 June 2010 (UTC)
 * "It won't edit with a bot flag because its edits should not appear in the recent changes feed.". What the hell? FinalRapture - † ☪ 02:07, 9 June 2010 (UTC)

If I can quote WP:BOTPOL: "Edits by such accounts are hidden by default within recent changes." Sorry if I was unclear. Tim 1357  talk  01:10, 10 June 2010 (UTC) . -- Tim 1357  talk  00:50, 12 June 2010 (UTC)
 * The comment about the watchlist isn't as ridiculous as FinalRapture makes it, since there is a user preference that enables you to hide bot edits from your watchlist. Then again, you can't run one task with a bot flag and another without one on the same account. Ucucha 16:54, 11 June 2010 (UTC)
 * Hate to be contrary, but you can. Quoted from the API documentation:"* : If set, mark the edit as bot, even if you are using a bot account the edits will not be marked unless you set this flag."
 * My bad, thanks for teaching me something new. Ucucha 06:15, 13 June 2010 (UTC)


 * Compliance with 1RR


 * I've put some thought into how to keep this bot from reverting the same edit over and over again. The process I've come up with is this:


 * When the bot reverts an edit, it makes a hash (to save memory) of the username, the page, and the rendered diff. This hash is then stored, along with a time stamp.


 * If, within 24 hours, another edit has an identical hash (meaning it is the same user, making the same edit on the same page), the bot will make the revert_score threshold lower. Instead of needing a score of -4 to be reverted, it would need a score of -10 (or something like that). This means extremely blatant vandalism will be reverted again and again. However, this feature can be turned off.

Tim 1357  talk  17:19, 12 June 2010 (UTC)
 * IMO AVBots should be 1RR. It's better than picking arbitrary numbers as a cutoff. Q  T C 01:39, 13 June 2010 (UTC)
 * So even though an edit has a score of -1,000, the bot should not re-revert? Or are you saying the bot should not revert the same user in the same day, even if it is a different page/edit? Tim  1357  talk  01:45, 13 June 2010 (UTC)
 * After a brief discussion on IRC, I have compromised so that the hash includes the user and the page title, so that the bot will never revert two edits to the same page by the same user within 24 hours of each other. Tim  1357  talk  02:31, 13 June 2010 (UTC)
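For illustration, the compromise described above (a hash of the user and page title, stored with a timestamp, and no second revert within 24 hours) could look roughly like the following Python. It is a minimal sketch under stated assumptions, not the bot's actual code: the class name `RevertLog`, its methods, the use of SHA-1, and the in-memory dictionary are all hypothetical.

```python
import hashlib
import time

WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour window from the discussion


class RevertLog:
    """Remembers when each (user, page title) pair was last reverted."""

    def __init__(self):
        self._last_revert = {}  # hash -> timestamp of the last revert

    @staticmethod
    def _key(user, title):
        # Hash the pair so only a fixed-size digest is stored.
        # The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        raw = f"{user}\x00{title}".encode("utf-8")
        return hashlib.sha1(raw).hexdigest()

    def may_revert(self, user, title, now=None):
        """True if this user has not been reverted on this page in the window."""
        now = time.time() if now is None else now
        last = self._last_revert.get(self._key(user, title))
        return last is None or now - last >= WINDOW_SECONDS

    def record(self, user, title, now=None):
        """Store the time of a revert for this user and page."""
        now = time.time() if now is None else now
        self._last_revert[self._key(user, title)] = now
```

A caller would check `may_revert(user, title)` before reverting and call `record(user, title)` afterwards; the optional `now` argument exists only to make the sketch easy to test.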


 * If I'm reading User:DASHBot/Vandalism/Die correctly, it says that any user can change the regex. This can be dangerous. Sole Soul (talk) 19:25, 13 June 2010 (UTC)


 * Yes, but the payoff is that other users can help build the regex base. Additionally, I will request that the page be fully protected, so that only admins may edit it. I believe that admins will be smart enough to not modify a regex if they do not know how. I might be overestimating them though. Tim  1357  talk  21:14, 13 June 2010 (UTC)
 * Full protection is good, but then how could you edit it? I suggest moving the page to User:Tim1357/regex.css or something similar. Sole Soul (talk) 21:26, 13 June 2010 (UTC)


 * Yes, I have been considering making the page a redirect to a .css page of mine. In fact, that's just what I'll do. Tim  1357  talk  21:33, 13 June 2010 (UTC)


 * Addition: "if the user is not in Huggle's whitelist", does that mean that the bot can revert some autoconfirmed users? Sole Soul (talk) 19:31, 13 June 2010 (UTC)


 * Are there autoconfirmed users that aren't in the whitelist? Tim  1357  talk  21:14, 13 June 2010 (UTC)
 * I don't know, I don't use Huggle :) Sole Soul (talk) 21:26, 13 June 2010 (UTC)
 * Apparently not, "Huggle whitelists users with edit counts above 500". Autoconfirmed users should not be reverted by a bot. Note: 3 of the 6 false positive edits reported were made by autoconfirmed users. Sole Soul (talk) 22:30, 13 June 2010 (UTC)
 * If it's alright with you, I'd like to stick with the Huggle whitelist. I spent some time looking, and it appears there is nowhere that the API will let me download a list of autoconfirmed users. In fact, the database does not even have a place where it saves a list of these users. Hopefully the bot is coded well enough so that this will not affect performance. Tim  1357  talk  03:32, 14 June 2010 (UTC)
 * Of course, if there is a technical limitation, you have no choice, but I wonder how ClueBot and AVBOT handled the situation. I'm not sure. Sole Soul (talk) 07:51, 14 June 2010 (UTC)

Let me look at AVBOT's code. Tim 1357  talk  13:56, 14 June 2010 (UTC)
 * ClueBot doesn't revert users with > 50 edits, or IPs with > 500 edits. ( X! ·  talk )  · @219  · 04:15, 15 June 2010 (UTC)
 * Do you know if it loads that entire list all at once, or if it checks for each edit? Tim  1357  talk  04:54, 15 June 2010 (UTC)
 * Each edit. ( X! ·  talk )  · @248  · 04:56, 15 June 2010 (UTC)
 * I think all IP edits should be checked, as school IPs can accumulate large edit numbers. Sole Soul (talk) 11:06, 15 June 2010 (UTC)
 * I agree. The bot will ignore edits by non-IP users that have more than 50 edits. The bot still uses Huggle's whitelist, because that list is helpful to strip bots, admins, and other experienced editors before it downloads the edit. It checks the user's edit count after it downloads and evaluates the edit. Tim  1357  talk  16:17, 15 June 2010 (UTC)
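For illustration, the edit-count rule agreed above (ignore registered users with more than 50 edits, but always check IP edits) can be sketched in Python. This is a minimal sketch, not the bot's actual code: the function names are hypothetical, and detecting IP editors by parsing the username with the standard `ipaddress` module is an assumption. How the edit count is fetched is left out.

```python
import ipaddress


def is_ip(username):
    """True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def exempt_by_edit_count(username, edit_count, limit=50):
    """Registered users above the limit are ignored; IP edits never are."""
    return (not is_ip(username)) and edit_count > limit
```

Under this rule, a registered account with 51 edits is skipped, while a school IP with thousands of edits is still checked.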


 * Oh yeah, I finally got a GitHub, and I've been updating the source here. Tim  1357  talk  16:17, 15 June 2010 (UTC)
 * Yes? D Tim  1357  talk  22:08, 25 June 2010 (UTC)
 * Trial or approval at this point? I could go for a trial.  MBisanz  talk 18:50, 26 June 2010 (UTC)

Trial plz. Tim 1357  talk  00:34, 27 June 2010 (UTC)
 *  MBisanz  talk 04:36, 27 June 2010 (UTC)
 * 50 reversions. There were a few false positives, and all were related to one regex. That particular regex has since been removed. There were a few other bugs, but all were easy to fix. Tim  1357  talk  02:06, 28 June 2010 (UTC)
 * As you requested. ( X! ·  talk )  · @133  · 02:11, 28 June 2010 (UTC)

Done, along with a few extra monitored sessions. No errors were encountered. I suggest either approval or another long trial period (a week or so). Tim 1357  talk  17:07, 4 August 2010 (UTC)


 * D Tim  1357  talk  17:07, 4 August 2010 (UTC)
 *  MBisanz  talk 06:23, 8 August 2010 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.