Wikipedia talk:Bots/Anti-vandal bot requirements

Obviously, wording is still to be done properly, especially the explanatory sections, but that gives us a good starting point.

Notice I have no specification for how to define a "bad" edit.

So, thoughts? &mdash; Coren (talk) 18:27, 13 November 2007 (UTC)

Rename

 * Note not listed at requested moves, but requirements is too strict. — xaosflux  Talk  01:34, 16 November 2007 (UTC)


 * I would tend to agree, on second look, but let's wait until we finish hammering it out first? &mdash; Coren (talk) 01:42, 16 November 2007 (UTC)
 * Sounds fine to me. — xaosflux  Talk  02:18, 16 November 2007 (UTC)

Requirement list
Two comments at this point. First: what is less than a day old, the AV bot or the edit?
 * Must not revert to an edit from an AV bot less than a day old
 * The edit. &mdash; Coren (talk) 00:52, 19 November 2007 (UTC)

Although it would presumably be allowed to revert past a bot edit, e.g. SineBot signing vandalism or SmackBot dating.
 * Must not revert an edit marked as a bot edit
 * Yes, the "target" of the revert (the edit that was flagged) must not be a bot edit. &mdash; Coren (talk) 00:52, 19 November 2007 (UTC)

With these guidelines, how many anti-vandal bots are expected to run simultaneously? Three is quite enough, in my opinion. Grace notes T § 05:28, 17 November 2007 (UTC)

A few requests

 * 1) If there is to be a global whitelist, please can it be published (presumably on a fully-protected page) in a machine-readable form that can be accessed by human-driven anti-vandal tools, as well as by bots?
 * That's very useful, and it's pretty much an unavoidable side effect anyway. :-) &mdash; Coren (talk) 00:54, 19 November 2007 (UTC)
 * 2) Please can there be a standard for saying, in the edit summary of a warning, the level of warning issued? ClueBot does this, but the others don't, so far as I can see.  It makes it easier to scan the user's previous warnings when deciding what level of warning to issue and whether to go to AIV.
 * Excellent idea. Adding.  &mdash; Coren (talk) 00:54, 19 November 2007 (UTC)
 * 3) At the risk of becoming repetitive (see User_talk:ClueBot_Commons/Archives/2007/October), please can there be a standard for how bots deal with dynamic IP vandalism? The common practice seems to be to revert all edits by a vandal to the last edit by a different editor, but this can have the effect of locking in earlier undetected vandalism by the same user on an adjacent IP.  I think this is a case where the bots might do better to leave the vandalism alone and have a human work out how far back the reversion needs to go.  At the very least, please can there be a standard similar to the one now implemented by ClueBot for the edit summary of a revert, so that cases where a bot has reverted to a version by a similar IP can be searched for? Philip Trueman (talk) 15:04, 18 November 2007 (UTC)

Headers
At the present time, the headers at a repeat vandal's talk page are likely to be a mess. Some bots enter ==November 2007==, others ===November 2007=== , and the more original ones will say ==Your edits to Example Page==. I think there should be a standard followed by anti-vandalism bots, user scripts, and human editors alike. It would make it easier to check whether a user has any recent warnings. Puchiko (Talk-email) 02:23, 19 November 2007 (UTC)
 * Currently, the best way to check if any recent warnings exist is with the &lt;!-- Template:Uw-whatever1 --&gt; comment tags, and this works quite well. It's a bit easier to standardize those than to standardize headers, in my opinion. Also, if you know where a warning is, you can also see what header it's under, when it was added, and from that you know whether or not to create a new header. Grace notes T § 03:33, 19 November 2007 (UTC)

 * Yes, but that does not work if I'm using a user script, such as Twinkle. Twinkle will check if there is a ==November 2007== header. If there is, the warning will be added under it. If there isn't a header with that exact name, Twinkle will create one. And anti-vandal bots have the same problem: there can be several headings for one month because the bot did not realise a heading was already there. Puchiko (Talk-email) 04:03, 19 November 2007 (UTC)


 * Personally, I think substing a template to the bottom of the page is the best way. The only reason ClueBot adds a dated header is because someone decided to change the template ClueBot substs to include it.  -- Cobi(t 04:07, 19 November 2007 (UTC)


 * I prefer the ==November 2007== because it makes it easier to spot recent activity and there won't be more than 12 a year. What if another section headline appears?  Should a duplicate November 2007 section be created, or unique section names (2007a)? (SEWilco (talk) 05:19, 19 November 2007 (UTC))


 * My personal preference is either ==November 2007== or no sections at all (just slap the warning to the bottom of the page). But whatever it is, I would like it to be uniform and standardised (take a look at a disruptive school IP's page, and you'll understand). Puchiko (Talk-email) 05:45, 19 November 2007 (UTC)

It seems like this boils down to: if a header containing the most recent warning is not the bottom-most header on a page, should the warning be added under the header, or to a new header at the bottom of the page? Amelvand takes the second strategy, since such added warnings might otherwise get lost between non-warning sections. Any other thoughts about this? Grace notes T § 16:34, 19 November 2007 (UTC)
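
Gracenotes's comment-tag approach above could be sketched like this. A minimal sketch, assuming substituted warnings leave an &lt;!-- Template:Uw-xxxx1 --&gt; comment behind; the page text and function name are invented for illustration.

```python
import re

# Match the hidden comments that substituted uw-* warning templates
# leave in the wikitext, capturing the warning level digit.
WARNING_RE = re.compile(r'<!--\s*Template:[Uu]w-[a-z]+(\d)\s*-->')

def highest_warning_level(wikitext):
    """Return the highest warning level found on a talk page, or 0."""
    levels = [int(m.group(1)) for m in WARNING_RE.finditer(wikitext)]
    return max(levels) if levels else 0

page = """
== November 2007 ==
Please stop. <!-- Template:Uw-vandalism2 -->
This is your last warning. <!-- Template:Uw-vandalism4 -->
"""
print(highest_warning_level(page))  # -> 4
```

Note this sidesteps the header question entirely: the scan works no matter which heading style each bot or script used.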

ClueBot
Right now, ClueBot checks to see if the user it is going to revert has fewer than 50 edits. Is this sufficient to satisfy the "no reverting admins" and "no reverting bots" rules? I don't think that any user would be promoted to admin with fewer than 50 edits. I also don't think that any bot out of its trial period would have fewer than 50 edits. -- Cobi(t 04:13, 19 November 2007 (UTC)


 * I would say that definitely satisfies the "no reverting admins" rule, but the other rule is about not reverting edits marked with the bot flag, rather than any edit made by a bot account (although I would expect those to overlap nicely).  &mdash; Coren (talk) 04:28, 19 November 2007 (UTC)
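
Cobi's heuristic, combined with the draft's bot-flag rule, might be sketched as below. The 50-edit threshold comes from his description; the function name and the way the bot flag is passed in are assumptions.

```python
# Skip reverting established users: admins and flagged bots should
# always clear this bar, per the reasoning above.
EDIT_COUNT_THRESHOLD = 50

def may_revert(editor_edit_count, edit_has_bot_flag):
    # Never revert an edit carrying the bot flag, per the draft rule.
    if edit_has_bot_flag:
        return False
    # Treat anyone with 50+ edits as too established to auto-revert.
    return editor_edit_count < EDIT_COUNT_THRESHOLD

print(may_revert(12, False))    # -> True
print(may_revert(5000, False))  # -> False
print(may_revert(12, True))     # -> False
```
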

Coordination between bots
Should the bots coordinate their activities, such as by agreeing on which bot will handle a given article or revert a specific piece of vandalism? (SEWilco (talk) 05:26, 19 November 2007 (UTC))
 * There is no current method for bots to agree among themselves. Because all of the bots are different at their core, this would be very hard to implement.  However, Wimt suggested here that the bots run in a delayed sequence, so that one bot always gets the first chance at any given edit.  -- Cobi(t 06:49, 19 November 2007 (UTC)
 * Perhaps the "delayed sequence" could be different for each bot based on the first (or last) letter of the article (or editor), so one bot doesn't try to do all the work. Split up the work among all the bots with them backing each other up.  (SEWilco (talk) 07:21, 19 November 2007 (UTC))


 * I know there is no current method. Should there be a requirement for coordination?  (SEWilco (talk) 07:21, 19 November 2007 (UTC))

This is more effectively achieved on the server side (not en.wikipedia.org, but the toolserver or someone else's server). It is then easy to use semaphores to get one bot/account to revert. The reverter should pass the appropriate parameters (revid, page title, cookies, a hash for security, etc) to the server. But this would require a complete rewrite of the anti-vandal bots and tools.

Trying to do this on the client (bot) side is, IMHO, stupendously hard to implement. MER-C 12:38, 19 November 2007 (UTC)
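
An in-process toy of that semaphore idea, with the HTTP layer and the security hash omitted; the class and method names are invented.

```python
import threading

class RevertCoordinator:
    """First-come-first-served claims on revision IDs, so only one
    bot account actually performs each revert."""

    def __init__(self):
        self._lock = threading.Lock()
        self._claimed = set()

    def claim(self, revid):
        # Returns True only for the first caller asking about this revid.
        with self._lock:
            if revid in self._claimed:
                return False
            self._claimed.add(revid)
            return True

coord = RevertCoordinator()
print(coord.claim(12345))  # -> True (first bot gets the revert)
print(coord.claim(12345))  # -> False (everyone else backs off)
```
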


 * I don't think there should be any standard for coordination that bots should adhere to - it might stifle development of new bots. Much better to have some healthy competition between bots, IMHO.  If two bot developers wish to experiment with cooperation it should be a matter for them. Philip Trueman (talk) 12:55, 19 November 2007 (UTC)


 * It might not require a complete rewrite. If existing bots are driven from an RC RSS feed, they could instead subscribe to a toolserver RC RSS feed which was intended for them.  Perhaps the toolserver would buffer RC and give a batch of different articles to each bot, so they're each looking at different articles.  If that is the case the bots would only need to be told a different URL for their RSS feed (maybe all bots would use the same URL, or maybe a different URL for each bot).  More elaborate coordination can be built on that as needed.  (SEWilco (talk) 17:59, 19 November 2007 (UTC))


 * The whole point of having more than one bot is that one bot's heuristics may not always detect everything another bot's might, not to have them in a round-robin. If we wanted them in a round-robin, it would be best done with one bot process randomly picking which account to login as and make the revert.  And, like Philip Trueman said, healthy competition is a good thing.  It keeps bot operators on their toes making their bot better.  And a better bot means it is better at reverting vandalism which means a better encyclopedia, which should be the ultimate goal for every Wikipedian.  -- Cobi(t 22:10, 19 November 2007 (UTC)
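
If bot operators did want to experiment with the "delayed sequence" split suggested above, it might look roughly like the following. The bot names, the 5-second step, and the use of a title hash are all invented for the example.

```python
import hashlib

BOTS = ["ClueBot", "VoABot 2", "CVBot"]

def revert_delay(bot, title):
    """Seconds this bot should wait before reverting on this article.

    A stable hash of the title picks which bot goes first; the others
    wait progressively longer, acting as fallbacks in case the first
    bot's heuristics miss the vandalism."""
    first = int(hashlib.md5(title.encode("utf-8")).hexdigest(), 16) % len(BOTS)
    rank = (BOTS.index(bot) - first) % len(BOTS)
    return rank * 5

for bot in BOTS:
    print(bot, revert_delay(bot, "Example article"))
```
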

Coordination with humans
Should the bots coordinate with humans, such as by avoiding having humans examining articles which bots are processing? For example, the "Recent changes" page might be delayed for several minutes with bots having a more recent feed so they can process articles first and thus avoid the race condition with RC Patrol seeing an edit which a bot has already reverted. (SEWilco (talk) 05:26, 19 November 2007 (UTC))
 * It would be possible for bots to hide both the vandalism and the bot's reversion in Special:Recentchanges with tools such as rollback; however, there is a general consensus against hiding anti-vandal bots' edits on the RC page. -- Cobi(t 06:52, 19 November 2007 (UTC)
 * I said "which bots are processing". I was referring to articles which had not yet been altered by bots, not hiding bot edits.  Right now there is a race condition between humans and bots trying to process the same articles.  (SEWilco (talk) 07:08, 19 November 2007 (UTC))
 * See above for a possible solution. MER-C 12:40, 19 November 2007 (UTC)
 * Some human-driven anti-vandalism tools do effectively provide a delayed feed, but at the choice of the user, which is how it should be. This can be taken further - I have modified my version of Lupin's tool to ignore 'Blanked the page' edits if a bot is running, and it works for me.  But providing humans with a second-class data feed might deter them from participating.  Philip Trueman (talk) 13:10, 19 November 2007 (UTC)
 * At present RC is already a second-class data feed, because humans are racing the bots and are thus deterred from wasting their time participating; often one sees a "newer edit" link which has to be examined, or one ends up making a null edit. A tool which shows bot-ignored edits would be helpful.  (SEWilco (talk) 17:51, 19 November 2007 (UTC))

Best to keep Recentchanges "pure", I think, and leave the divide/conquer strategy to an external party. Doing this with both bots and humans seems close to impossible (doing it with humans alone, by contrast, is possible, e.g. with IRC). Grace notes T § 16:38, 19 November 2007 (UTC)

Self-reverted edits
I've been seeing much vandalism where the vandal immediately reverts their alteration. Sometimes a third preceding edit was not self-reverted. How should vandalbots handle a full self-reversion and a partial self-reversion? A full self-reversion consumes a human RC Patroller's time to examine what has been done to the article. (SEWilco (talk) 05:33, 19 November 2007 (UTC))
 * If it was a full self-reversion, then the bot's reversion would be a null edit and not be recorded in the history of the article. Since it was not recorded in the article's history, the bot shouldn't then warn the user.  -- Cobi(t 06:55, 19 November 2007 (UTC)
 * You ignored the triple edit situation, which implies that bots consider a sequence of edits as if it were a single alteration. Should that be part of these requirements?  And should the numerous people doing temporary edits be advised to not do that because people have to examine their mess?  (SEWilco (talk) 07:12, 19 November 2007 (UTC))
 * When deciding whether or not to revert ClueBot looks at only one edit. When reverting, ClueBot treats a series of up to 5 edits by the same user as a single edit and reverts past all of them.  If the series is longer than 5 edits, it aborts the reversion.  -- Cobi(t 00:34, 20 November 2007 (UTC)
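
The behaviour Cobi describes could be sketched as below; the history format (newest first, as username/revid pairs) and the function name are invented, and a real bot would work from the API's revision list.

```python
MAX_RUN = 5  # abort if the same user made more consecutive edits

def pick_revert_target(history, vandal):
    """Return the revid to revert to, or None to abort.

    history is a newest-first list of (username, revid) pairs."""
    run = 0
    for user, revid in history:
        if user == vandal:
            run += 1
            if run > MAX_RUN:
                return None  # series too long: leave it to a human
        else:
            # First edit by someone else: revert to it, but only if
            # the vandal's edits are actually at the top of the history.
            return revid if run else None
    return None

history = [("Vandal1", 103), ("Vandal1", 102), ("GoodUser", 101)]
print(pick_revert_target(history, "Vandal1"))  # -> 101
```
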

Ignoring new warnings?
What is the point of the delays in the following? I can understand ignoring old warnings (and, indeed, ClueBot does), but why new warnings? -- Cobi(t 07:01, 19 November 2007 (UTC)
 * Must not warn if last AV bot warning is less than ((120 secs?)) old
 * Must only report if level 4 AV bot warning present and more than ((120 secs?)) old


 * I think that's so people have a chance to notice the warning. The vandal won't get the Talk notification until they view a page, so if they're in the middle of a vandalism during a warning they won't see the Talk notification until they complete that vandalism.  The delay gives them a chance to see the warning without getting their unwarned vandalism counted as a second event.  (SEWilco (talk) 07:15, 19 November 2007 (UTC))


 * It's also to avoid the case where, should two bots catch the same vandalism, or two acts of vandalism occur in very quick succession, the user is reported to AIV a few seconds after the final warning was given (I have seen AIV reports less than 5 seconds after the last warning). &mdash; Coren (talk) 16:16, 19 November 2007 (UTC)
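
The two timing rules could then be sketched like this; the 120-second figure is the draft's own placeholder, and the function names are invented.

```python
import time

GRACE = 120  # seconds; the ((120 secs?)) placeholder from the draft

def should_warn(last_bot_warning_ts, now=None):
    """Warn only if the last AV bot warning is at least GRACE seconds old."""
    now = time.time() if now is None else now
    return last_bot_warning_ts is None or now - last_bot_warning_ts >= GRACE

def should_report(warning_level, level4_ts, now=None):
    """Report to AIV only once a level-4 warning has stood for GRACE seconds."""
    now = time.time() if now is None else now
    return warning_level >= 4 and level4_ts is not None and now - level4_ts >= GRACE

print(should_warn(1000, now=1100))       # -> False (warning only 100 s old)
print(should_report(4, 1000, now=1200))  # -> True (level 4, 200 s old)
```
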

Respecting human warnings
Should bots respect recent warnings left by humans? For example, a human gives a level 2 warning to an IP, thus a bot reverting the next vandal edit should give a level 3 warning or higher. MER-C 13:13, 19 November 2007 (UTC)
 * Certainly they should. But don't they honor them already? MaxSem (Han shot first!) 13:32, 19 November 2007 (UTC)
 * Cluebot does, VoABot 2 doesn't. MER-C 02:00, 20 November 2007 (UTC)

Must not revert to an edit from an AV bot less than a day old
Most of the rules that exist make sense; however, this one doesn't. The only instance where I can see this rule being useful is where a bot could be editing incorrectly and reverting articles badly. My bot obeys a one-revert-per-24-hours-per-article rule. But if an article was being heavily vandalised then the other bots would revert (if all following the same rule) for another 2-3 edits (depending on whether the bots notice). Other than that, I can't think of anything reasonable to justify this rule. Quite happy for people to correct me on this one; I might just be overlooking something. Lloydpick (talk) 18:44, 19 November 2007 (UTC)


 * How about a limit of one rv per day per article per username? SEWilco (talk) 03:09, 20 November 2007 (UTC)
 * Would cause major problems if the bot made a false positive, as it would revert basically everyone who reverted the bot back again. Lloydpick (talk) 11:52, 20 November 2007 (UTC)


 * Hmm.. if a bot was concerned about an article, it could remember an article and revisit it later. SEWilco (talk) 03:13, 20 November 2007 (UTC)
 * If that were the case then every revert would be something it revisits later. However, false positives are the reason it reverts only once per 24-hour period. If a bot reverts a good edit then the usual policy is for the editor to re-make their edit and remove the warning. Giving it 24 hours before the bot edits again allows the article to be put back to what it should be, and by that point it's never going to be in the recent changes list again unless re-edited. Lloydpick (talk) 11:52, 20 November 2007 (UTC)
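
The one-revert-per-24-hours rule described above could be sketched like this; the in-memory dict stands in for whatever state a real bot keeps.

```python
DAY = 24 * 60 * 60  # seconds

class RevertThrottle:
    """Allow at most one revert per article per 24-hour window."""

    def __init__(self):
        self.last_revert = {}  # article title -> epoch seconds

    def may_revert(self, title, now):
        last = self.last_revert.get(title)
        if last is not None and now - last < DAY:
            return False  # already reverted this article today
        self.last_revert[title] = now
        return True

t = RevertThrottle()
print(t.may_revert("Example", now=0))        # -> True
print(t.may_revert("Example", now=3600))     # -> False (an hour later)
print(t.may_revert("Example", now=DAY + 1))  # -> True (window expired)
```
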

Resetting the warning level
CVBot currently counts up the warnings for the current month, and then issues one based on that count. If we are not going to use a month as the duration, then we should readjust how warnings are collected; for example, by not using a monthly header for vandalism warnings. Lloydpick (talk) 18:48, 19 November 2007 (UTC)


 * Is it a good idea to reset on the calendar month? Vandals race to the computer on the 1st?  SEWilco (talk) 03:18, 20 November 2007 (UTC)
 * Given the nature of vandalism, I wouldn't really expect vandals to wait for a particular day to start vandalising simply to bypass 3 warnings. Lloydpick (talk) 11:53, 20 November 2007 (UTC)

Global whitelist
An idea for the global whitelist: one easy and centralized way to coordinate such a whitelist is with the MediaWiki software that runs Wikipedia. In this vein, one means would be a page that only admins can edit; another would be a vandalism-clean up user group. If we create a user group, then, why not give those in the user group the rollback permission? Then the bots need only download the list of whitelisted users directly from MediaWiki's API before running (and the list of admins, if possible), and the users can efficiently revert vandalism without playing data ping-pong with the server. Grace notes T § 20:56, 19 November 2007 (UTC)
 * I'm personally against a whitelist, as to be whitelisted you would have to have a significant number of edits in the main space; otherwise, who would trust you? It's easier for the bots to check the edit count and then make a decision. For example, anyone with over 200 edits could be considered safe. Plus, a whitelist is just another thing that needs to be kept up to date. Lloydpick (talk) 11:56, 20 November 2007 (UTC)
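
For reference, the API-based approach suggested above might be sketched like this. The group name is whichever group the whitelist ends up using (sysop here, as an assumption), and the canned response stands in for the JSON a real bot would fetch over HTTP from something like /w/api.php?action=query&list=allusers&augroup=sysop&format=json.

```python
import json

# Canned API response of the shape list=allusers returns; the user
# names are invented for the example.
sample = json.dumps({
    "query": {"allusers": [{"name": "ExampleAdmin"},
                           {"name": "AnotherAdmin"}]}
})

def whitelist_from_api(payload):
    """Extract the set of whitelisted usernames from an allusers reply."""
    data = json.loads(payload)
    return {u["name"] for u in data["query"]["allusers"]}

print(sorted(whitelist_from_api(sample)))  # -> ['AnotherAdmin', 'ExampleAdmin']
```

The edit-count cutoff suggested above can be read from the same API (usprop=editcount on list=users), so the two approaches are not mutually exclusive.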