Wikipedia:Bots/Requests for approval/DrilBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

DrilBot
Operator: –Drilnoth (T • C • L)

Automatic or Manually Assisted: Automatic

Programming Language(s): AutoWikiBrowser

Function Overview: Cleans up various common errors in articles using the lists at WikiProject Check Wikipedia.

Edit period(s): Basically whenever I'm editing Wikipedia.

Already has a bot flag (Y/N): N

Function Details: Using AWB and my own RegExp (after it has been tested using the AutoEd script), DrilBot would use the daily lists at WikiProject Check Wikipedia to find and repair common errors, such as Unicode control characters, bold text in and colons at the end of section headings, misplaced categories and interwikis, and links to the current article. This would be done using the basic "general fixes" of AWB, which repair some of the errors, and custom regular expressions after they have been tested to ensure a minimum of false positives using the assisted editing script AutoEd. It would also do other cleanup at the same time... header name improvements ("Weblinks" → "External links", for example), link simplification and cleanup, and adding bullet points to external links. When running, anyone should be able to shut the bot off by posting on its talk page. I'd only run DrilBot when I'm around so that I can deal with any errors quickly. –Drilnoth (T • C • L) 16:27, 7 May 2009 (UTC)
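To make the kind of fix described above concrete, here is a minimal Python sketch of three of the listed cleanups: stripping Unicode control characters, removing bold markup and trailing colons from section headings, and renaming "Weblinks" to "External links". It is an illustration only and not DrilBot's actual configuration: the bot runs on AutoWikiBrowser, and the names `CONTROL_CHARS`, `HEADING`, `HEADING_RENAMES`, `clean_heading` and `cleanup`, as well as the particular set of control characters, are assumptions made for this example.

```python
import re

# Hypothetical rule set for illustration; not the bot's real AWB settings.
# An assumed sample of invisible control characters (directional marks, BOM).
CONTROL_CHARS = re.compile(r"[\u200e\u200f\u202a-\u202e\ufeff]")

# A balanced wiki heading on its own line, e.g. "== Title ==".
# The backreference \1 requires the same number of "=" on both sides.
HEADING = re.compile(
    r"^(={2,6})[ \t]*([^=\s](?:.*?[^=\s])?)[ \t]*\1[ \t]*$",
    re.MULTILINE,
)

# Assumed rename table; only the "Weblinks" example comes from the request.
HEADING_RENAMES = {"weblinks": "External links"}


def clean_heading(match):
    """Return a normalised heading line for one regex match."""
    level, title = match.group(1), match.group(2)
    title = title.replace("'''", "")      # drop bold markup
    title = title.rstrip(": \t").strip()  # drop trailing colon(s)
    if not title:
        return match.group(0)             # nothing sensible left, leave as is
    title = HEADING_RENAMES.get(title.lower(), title)
    return f"{level} {title} {level}"


def cleanup(wikitext):
    """Apply the control-character and heading fixes to a page's text."""
    wikitext = CONTROL_CHARS.sub("", wikitext)
    return HEADING.sub(clean_heading, wikitext)
```

Running `cleanup` only on pages taken from the matching error list, rather than on every article, keeps the scope for false positives small.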

Discussion
(n.b. I am a member of aforementioned WikiProject.) I suggest anyone interested looks at the main project page, to try to grapple with the scale of the problems on merely a day-to-day basis (new article feed etc.). Would you be prepared to actually watch it, or just be around? - Jarry1250 (t, c) 16:34, 7 May 2009 (UTC)
 * I am aware of the sheer number of errors there and how many more there are each day; I'd do my best to have DrilBot running more or less constantly while I'm editing to work on some of the backlogs which can be done automatically, so that human editors can focus on doing the things that can't be fixed by bot, like incorrect brackets and ISBNs. –Drilnoth (T • C • L) 16:36, 7 May 2009 (UTC)
 * Don't worry Drilnoth, I'm sure you know the scale of the problem! (It was the bot people around here I was primarily addressing, ;) ). Well, I've never found much room for false positives on some of the simpler tasks, as Drilnoth says. Just slightly too close to trial you myself, but it shouldn't take too long to get the trial to make sure this is fertile ground for a bot, considering the possibility for false positives. - Jarry1250 (t, c) 16:44, 7 May 2009 (UTC)
 * Ah, gotcha. Yes there certainly will be some false positives... I won't deny that. AWB's general fixes have some changes which get some edits incorrect, e.g. moving all {{for}} tags to the top of the page, but that is uncommon enough, and really is an error with the article which just isn't fixed correctly, that I think the benefits would far outweigh the small number of false positives that the bot would get. –Drilnoth (T • C • L) 16:51, 7 May 2009 (UTC)

The Wikipedia community tends to be extremely intolerant of "bot false-positives", meaning automated changes that need to be reverted. A 5% false positive rate would be far too much, especially with a heavy volume of edits. With that in mind, I have some questions. One, are all these fixes things you've previously used AWB to fix under your own account, manually assisted? And two, if enough people complain about a certain class of false-positive, are you willing to suspend that function, even if you feel it's doing far more good than harm? – Quadell (talk) 19:06, 7 May 2009 (UTC)
 * I certainly don't think that there would be anywhere near 5% false positives. To answer your first question, yes. I have used AWB's general fixes a lot as can be seen in my contributions. Additionally, any custom-added regular expressions would first be tested out either with manual supervision in AWB or in the AutoEd script, so that it can be ensured that the change can be made reliably. To answer your second question, if anything is causing an at all unacceptable number of false positives I will not hesitate to stop the bot from making that change, regardless of my own feelings in the particular case. A false positive is worse than no edit at all, so deactivating a problematic change is the only logical thing to do. Of course, deactivating due to a single false positive doesn't really make sense in my mind, but if there are multiple complaints or concerns then that change should definitely be deactivated. Bots may be run and maintained by just one user, but I feel that their actions should really be determined by the community, not just one person. –Drilnoth (T • C • L) 19:52, 7 May 2009 (UTC)


 * Approving bots with super-generic "fixes"-type tasks is somewhat frowned upon now. What fixes, other than AWB general fixes will it be making? Also, bots doing solely AWB general fixes are denied. What makes the additional fixes significant enough that this needs to be done quickly with a bot, rather than just adding them to general fixes and letting people do them when they make more significant edits? Mr.Z-man 06:01, 8 May 2009 (UTC)
 * (added some more indentation). There are a few questions here, and I'll do my best to answer all of them.
 * 1) Non-AWB fixes which I'd add would include things like better "unicodifying", removal of problematic Unicode control characters, and some more template/link cleanup as they are tested in AutoEd.
 * 2) The additional fixes themselves aren't what I feel makes this sort of bot needed, rather it's the massive backlog at WP:CHECKWIKI. When you say "rather than just adding them to general fixes and letting people do them when they make more significant edits?", that isn't really what happens with AWB in relation to CHECKWIKI... you use the lists there to generate a list for AWB and then basically just run the general fixes through it. However, doing this still takes quite a bit of time, so the backlogs there are still building up. Since many of these edits don't require humans to actually look at the article too much, it would be a huge timesaver to have a bot do them. For example, about a week or two ago I ran AWB on a list of about 300 or 400 articles which had the CHECKWIKI error "Link equal to linktext". While doing so, all that I really did was glance over what general fixes were done to make sure that there weren't errors and click "save". This still took probably close to an hour and a half. If a bot had been doing this, I could have been working on some of the other problems which can't be done as easily. In essence, right now those lists are fixed by just using AWB's general fixes without any "more significant edits" being done at the same time, but a bot could handle it faster. These edits also appear on watchlists and in recent changes when they really shouldn't need to be.
 * Since DrilBot would only use the lists created by CHECKWIKI, almost every article that it edits would have an error which it could fix, so there won't be a ton of edits that just fix things like whitespace or reference order, which I agree would be kind of useless. Right now User:D6 does some of the fixes, but I'm not sure if it should be since I can't find a BRFA for that task. I feel that having an approved bot to help manage the CHECKWIKI lists would make it much easier to maintain since then human editors could focus on the things which bots can't do rather than trying to manage the whole list. If there aren't many false positives and the bot is flagged to prevent its appearance on watchlists/recent changes, I don't really see how this could be problematic.
 * (also, as a side note, it looks like Lightbot's controversy was because it changed date formatting; DrilBot shouldn't be doing anything that could cause that much controversy since all of the fixes would be to known errors, not things like date-formatting which can vary from article to article). –Drilnoth (T • C • L) 14:13, 8 May 2009 (UTC)

Let's get a feel for the sorts of changes we're talking about here. – Quadell (talk) 15:17, 8 May 2009 (UTC)
 * Can do; thanks. –Drilnoth (T • C • L) 15:53, 8 May 2009 (UTC)

The trial went almost perfectly; I'll just report the handful of false positives that I saw (and reverted or fixed):
 * Report
 * Kenneth W. Royce: Date changes where the dates were part of a book name. I fixed this error to prevent its occurrence in the future at about 16:30 UTC by disabling that particular AWB general fix.
 * Balázs Megyeri and Balázs Vattai: Changed REFERENCES to ReferenceS, with the capital "S". Both the all caps and the some caps versions are wrong, so it was just a change from one wrong version to a different (maybe slightly less) wrong version.
 * Coat of arms of Whitehorse, Yukon, Coat of arms of Victoria, British Columbia, Coat of arms of Victoria, British Columbia, and Coat of arms of Victoria, British Columbia: These four have very poorly done DEFAULTSORTS (specifically, ) and DrilBot got confused. These DEFAULTSORTS really shouldn't be there in the first place (they make no difference on categorization), and I don't think that AWB was set up to handle it. This should be a very rare occurrence; there was just a set of almost identical articles and almost all of them had this error. I've never seen this particular code before, and once these articles are manually fixed I doubt that it should ever really come up again.

Shall I continue the trial or stop for now? –Drilnoth (T • C • L) 16:48, 8 May 2009 (UTC)


 * Yes, you have 22 hours or so left to go. :) – Quadell (talk) 17:04, 8 May 2009 (UTC)
 * Okay; I wasn't sure if you wanted me to give occasional reports or just lump it all together at the end. –Drilnoth (T • C • L) 17:54, 8 May 2009 (UTC)
 * I can't speak for Quadell, but I would advise you make the most of the trial - perfect the regexes, get some edits done. Record your fixes (and what you could fix) and report back here at the end. Good going so far, by the way. - Jarry1250 (t, c) 17:57, 8 May 2009 (UTC)

 * Note
 * If you are working on the various lists from WP:CHECKWIKI, I think it's worth mentioning in the edit summary which list the bot is currently processing. This can help checking what was fixed, even if other general fixes are applied too.
 * If you come across general fixes that need improvement, e.g. defaultsort in the sample above, I think they should be mentioned at WP:AWB/BUGS. This avoids repeating them. ( should probably be added to the general fixes)
 * If there are fixes that could be applied to all operations, I think it's worth suggesting them at WP:AWB/FR. This gives the developers feedback and helps develop the program.
 * AWB might fix some of WP:CHECKWIKI, but it won't fix all of them. (I once did the other similar report). There are other reports, e.g. #7 where AWB can fix many, but you will still need to do the remaining ones manually. As for report #64, I think it should be de-activated, there isn't really a point in fixing mainly " link ".
 * Keep in mind that it's less risky to use a regex that fixes problem #xx through a list of articles with problem #xx than the same applied to all articles. You might want to look at the various general fixes and make sure you really want all of them applied automatically (see WP:AutoWikiBrowser/Custom_Modules for the details). Smackbot probably doesn't use all of them either.
I think you did a lot of good work at WP:CHECKWIKI and I'd happily back any proposal that helps you. -- User:Docu   22:42, 8 May 2009 (UTC)
 * Thanks; I have a few comments here.
 * I can include which list is being processed, although sometimes I might just omit that part of the summary if I'm doing a short list (like 15-30 items).
 * Of course; I just didn't mention this one because, as I said, it's an extraordinarily rare error.
 * Most definitely; I've already mentioned a few things at WP:AWB/FR, although most of what I'd suggest are already implemented in the next version.
 * I had been under the impression that the "AWB assisted" problems required human attention to fully fix... is that not correct?
 * The way that I see it, if a general fix doesn't have many false positives it is beneficial to apply it at the same time as the CHECKWIKI edits. If a particular change seems prone to false positives I'll deactivate it the way you mentioned, which I already did with date reformatting.
 * Thanks! –Drilnoth (T • C • L) 01:24, 9 May 2009 (UTC)
 * Some of the unbalanced brackets issues (both square and curly) are fixed automatically (approx 20-30% in my experience), but most do indeed need human supervision. - Jarry1250 (t, c) 10:08, 9 May 2009 (UTC)
 * Ah; thanks for the info. –Drilnoth (T • C • L) 13:33, 9 May 2009 (UTC)
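The edit-summary suggestion discussed above (name the CHECKWIKI list being processed, but possibly omit it for short lists) could be sketched as follows. This is a hedged illustration in Python, not the bot's real summary logic: the function name `build_edit_summary`, the base summary text, and the cut-off of 30 items (taken from the "15-30 items" remark) are all assumptions.

```python
# Assumed cut-off, based on the "15-30 items" remark; hypothetical value.
SHORT_LIST_THRESHOLD = 30

# Assumed base wording; the real summary text is not given in the discussion.
BASE_SUMMARY = "Cleanup using AWB general fixes"


def build_edit_summary(error_list, list_size):
    """Name the CHECKWIKI list in the summary unless the list is short."""
    if list_size > SHORT_LIST_THRESHOLD:
        return f"{BASE_SUMMARY} (WP:CHECKWIKI: {error_list})"
    return BASE_SUMMARY
```

For example, `build_edit_summary("Link equal to linktext", 350)` names the list, while the same call with a list size of 20 returns only the base summary.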

I'm going to be away for a few hours, so here's my report on the 500+ edits that DrilBot made yesterday (I checked all of them for errors):
 * Report after trial
 * Things went almost perfectly; there were just a handful of false positives as described below.
 * There were some cases where the error detected at CHECKWIKI couldn't be fixed; in some cases DrilBot then skipped the article entirely and in some cases it made other improvements. This isn't a problem per se, just saying that not all articles that it edits will still have the error detected at CHECKWIKI.
 * In addition to the false positives above (which were weird situations or which have now been fixed), there were three other pages where DrilBot caused an error: Real Madrid C.F., Sport Club Corinthians Paulista, and Undrafted sportsperson. These were all instances where it moved the {{for}} or {{dablink}} templates to the top of the article when they properly belonged in a section. I reported this at WP:AWB/B and it should be fixed in the SVN builds (which I plan to try and download if DrilBot is approved, to have the latest version) and the next full release. Until I can update my version of AWB, I'll have the bot skip any page containing either of those templates.
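The interim workaround in the last point (skip any page that contains either template) amounts to a simple pre-check on the page text. A minimal Python sketch follows, assuming the templates appear under their plain names; the names `SKIP_TEMPLATES` and `should_skip` are hypothetical, and AWB's own skip options are not shown here.

```python
import re

# Matches "{{for" or "{{dablink" followed by "|" or "}}", so that longer
# template names such as "{{format price" are not caught by accident.
# Case is ignored entirely, which is slightly broader than MediaWiki's
# first-letter-only rule; redirects to these templates are not covered.
SKIP_TEMPLATES = re.compile(
    r"\{\{\s*(?:for|dablink)\s*(?:\||\}\})",
    re.IGNORECASE,
)


def should_skip(wikitext):
    """Return True if the page uses a template the bot should avoid."""
    return SKIP_TEMPLATES.search(wikitext) is not None
```

A page for which `should_skip` returns True would simply be left for a human editor.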

I didn't notice any other potentially problematic edits. –Drilnoth (T • C • L) 13:32, 9 May 2009 (UTC)
 * Oops, hadn't seen this before: . –Drilnoth (T • C • L) 18:15, 9 May 2009 (UTC)
 * The {{for}} and {{dablink}} problem should now be fixed thanks to an AWB update. –Drilnoth (T • C • L) 15:05, 10 May 2009 (UTC)
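For comparison with the 5% figure raised earlier in the discussion, the trial reports can be turned into a rough upper bound on the false-positive rate. The Python sketch below counts only the pages named in the two reports and takes 500 as the floor for "500+ edits"; both choices are assumptions, the grouping labels are invented for this example, and the item whose link is missing from the later "Oops" comment is not counted.

```python
# Pages named in the trial reports, grouped by cause (labels are ours).
reported_pages = {
    "date change inside a book title": 1,
    "REFERENCES heading capitalisation": 2,
    "DEFAULTSORT handling": 4,
    "for/dablink template moved to top": 3,
}

# "500+ edits" is treated as exactly 500, which makes the rate an upper bound.
total_edits_floor = 500

false_positives = sum(reported_pages.values())
rate_upper_bound = false_positives / total_edits_floor

print(f"{false_positives} of at least {total_edits_floor} edits "
      f"(at most {rate_upper_bound:.1%})")
```

Under these assumptions the reported rate is at most 2%, and several of the listed causes were disabled or fixed during the trial.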

This seems fine to me. If there are no objections in the next few days, I'm inclined to approve. – Quadell (talk) 15:34, 10 May 2009 (UTC)
 * Thanks. I've posted about DrilBot at WP:VPM per Jarry1250's suggestion on my talk page. –Drilnoth (T • C • L) 16:05, 10 May 2009 (UTC)

Looks good. – Quadell (talk) 01:22, 12 May 2009 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.