Wikipedia:Bots/Requests for approval/WildBot 5


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

WildBot 5
Operator:

Automatic or Manually assisted: Automatic

Programming language(s): Python

Source code available: https://svn.toolserver.org/svnroot/josh/WildBot/book_checker.py

Function overview: Note a book's problematic links on the talk page

Links to relevant discussions (where appropriate): Bot requests

Edit period(s): Continuous, with periodic sweeps for related changes that affect books.

Estimated number of pages affected: The current standing of the world:
 * 590 community books, containing 3030 redirects, 35 redlinks, 64 disambiguation pages
 * 3077 user books, containing 8737 redirects, 819 redlinks, 671 disambiguation pages

Exclusion compliant (Y/N): Standard in pywikipedia

Already has a bot flag (Y/N): Y

Function details: Books in Category:Wikipedia books (community books) and Category:Wikipedia books (user books) are checked for duplicated entries (including those due to redirects), redirects, linking to disambiguation pages and redlinks. Redirects in the book are followed to the target article. A note will be placed on the book's talk page if any problems are found.
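
The checks described above can be sketched roughly as follows. This is only an illustration, not WildBot's actual code (which lives in book_checker.py and reads its data through toolserver SQL): the `check_book` function, the `pages` lookup table and its `redirect_to` / `disambig` keys are all hypothetical names, and redirects are followed one hop only.

```python
from collections import Counter


def check_book(links, pages):
    """Classify the links of one book.

    `links` is the list of titles in the book. `pages` is a hypothetical
    lookup: title -> dict with optional keys 'redirect_to' (str) and
    'disambig' (bool). A title missing from `pages` counts as a redlink.
    """
    problems = {"redirects": [], "redlinks": [], "disambiguations": []}
    resolved = []
    for title in links:
        info = pages.get(title)
        if info is None:
            problems["redlinks"].append(title)
            continue
        target = info.get("redirect_to")
        if target is not None:
            # Report the redirect, then follow it to the target article.
            problems["redirects"].append(title)
            info = pages.get(target, {})
        else:
            target = title
        if info.get("disambig"):
            problems["disambiguations"].append(title)
        resolved.append(target)
    # Duplicates are counted after redirects are resolved, so two different
    # links that end up at the same article are caught as well.
    counts = Counter(resolved)
    problems["duplicates"] = sorted(t for t, n in counts.items() if n > 1)
    return problems
```

A talk-page note would only be needed when at least one of the four lists comes back non-empty.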

The notes will look something like:

This bot will be tightly coupled to the toolserver, utilizing SQL queries for most of its data-access.

Discussion
I don't see any problems with this task, and no one has voiced any opposition, so - The Earwig @ 22:20, 12 February 2010 (UTC)
 * Two edits have been made:
 * http://en.wikipedia.org/w/index.php?title=Book_talk:Fr%C3%A9d%C3%A9ric_Chopin&diff=prev&oldid=344813729
 * http://en.wikipedia.org/w/index.php?title=Book_talk:Acid_jazz&diff=prev&oldid=344812625
 * The message box for all problems is the same: User:WildBot/b01 - it accepts parameters for each kind of problem. I'm hoping HeadBomb will fiddle with it. I'll do the rest of the run within about a day, barring unforeseen circumstances. Josh Parris 14:01, 18 February 2010 (UTC)


 * Looks good, although for the revision I would use
 * Problems have been found in this book (rev 334669812, 13:03, 18 February 2010 (UTC))
 * rather than
 * Problems have been found in this book http://en.wikipedia.org/w/index.php?title=Book:Frédéric_Chopin&redirect=no&useskin=monobook&oldid=334669812 at 13:03, 18 February 2010 (UTC)
 * I've made the change in the template and gave it a spitshine. The changes should be self-explanatory (revision is now simply the revision id rather than the url, time is specified separately). It will require very minor modifications to the bot (aka it can be less verbose since things are now explained through the template). Not too sure what "Internal links" are, but that can be addressed later if need be. As far as I'm concerned, the bot works just as intended, and the sooner it's unleashed, the happier I am. Whatever kinks are left in it can be worked out later. Headbomb {ταλκ κοντριβς – WP Physics} 20:23, 18 February 2010 (UTC)
 * Yes, internal links are of the form - naturally, there shouldn't be any and they make no sense. I haven't searched for them (the test just naturally fell out when looking for links to targetpage) and I felt the distinction important. Josh Parris 09:37, 20 February 2010 (UTC)

The remaining trial edits are here: http://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=20100220092400&target=WildBot&limit=28 they represent a hand-picked list of community books with multiple problems (dabs and redirects, plus 5 with redlinks); the hand-picking took a while, but I did discover a problem with my original queries for disambiguation pages in books (revised figures above). Headbomb: I haven't finished the handful of tweaks needed for broader run; I won't be doing that full run until the tweaks are finished (shouldn't take long). When that happens I can easily reprocess all pages currently tagged with the template in case those changes affect any of them. Some of the pages in the trial run now have pretty large templates on them. Josh Parris 09:37, 20 February 2010 (UTC)


 * I've reviewed the edits and most are fine. However, and  (I note that there are only dabs reported) are not using User:WildBot/b01, but rather User:WildBot/msg.


 * Also the large templates don't matter as much as in mainspace since discussions are usually minimal in the book namespace. Although I suppose a good way of saving space could be to use columns-list. Headbomb {ταλκ κοντριβς – WP Physics} 09:45, 20 February 2010 (UTC)


 * Off the top of my head I can't explain those two edits; internally WildBot splits everything from Wikipedia books (community books) into a separate processing stream very early on. I'm going to have to figure that bug out before any wider run. Expect that to take a couple of days (if it's not obvious what's wrong, it's going to be weird). Josh Parris 10:12, 20 February 2010 (UTC)


 * I'm no coder, but looking at that code I suspect the problem lies with calls to "dab_template_placer.py" or "dab_template_placer.py" itself. Headbomb {ταλκ κοντριβς – WP Physics} 15:58, 20 February 2010 (UTC)


 * That particular bug is squashed. It was down to shoddy coding inside the category splitter: it wasn't handling the division of data MediaWiki performs when keeping the size of queries down. So, I fixed that, and also wound the query size up to something other than the default. Now I've just got to do those tweaks. Josh Parris 18:54, 20 February 2010 (UTC)
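
For readers unfamiliar with the "division of data" mentioned above: MediaWiki returns long listings in batches, and a client has to keep asking until no continuation token comes back. The sketch below shows that loop under some assumptions. It follows the `continue` convention of the current MediaWiki web API (the 2010 API and pywikipedia handled this differently), `category_members` is a hypothetical helper, and `fetch` is a caller-supplied function that performs the request and returns the decoded JSON. `cmlimit` is the parameter that raises the batch size above the default.

```python
def category_members(fetch, category, limit=500):
    """Collect every page title in `category`, following continuation.

    `fetch` is a hypothetical callable: it takes a dict of API parameters
    and returns the decoded JSON response as a dict.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,  # batch size; the server default is much smaller
        "format": "json",
    }
    continuation = {}
    titles = []
    while True:
        data = fetch({**params, **continuation})
        titles.extend(m["title"] for m in data["query"]["categorymembers"])
        if "continue" not in data:
            # No continuation block: this was the last batch.
            return titles
        # Feed the continuation values back into the next request.
        continuation = data["continue"]
```

Stopping after the first response, as the buggy splitter effectively did, silently drops every member beyond the first batch.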


 * Are we ready to approve?  MBisanz  talk 03:13, 22 February 2010 (UTC)
 * If you're comfortable those tweaks will be done and the bot operated responsibly, sure. Josh Parris 03:14, 22 February 2010 (UTC)
 * I understand you still need to make some "tweaks". I'm not sure what they are specifically, but I hope they address these two concerns: I'm confused about the parameter that is not specified by the bot, yet used in the template. It appears to be a mistake. Also, as Headbomb suggested above, I think it would be a good idea to use rev 334669812 instead of http://en.wikipedia.org/w/index.php?title=Book:Frédéric_Chopin&redirect=no&useskin=monobook&oldid=334669812, as it is neater. The bot can simply strip the oldid from the end of the url, then use that for the  parameter. The template could use something like: rev . However, aside from those two things, the bot seems okay and capable of being approved. - The Earwig (talk) 04:07, 22 February 2010 (UTC)
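
Pulling the revision id out of such a URL is a small job for the standard library. A minimal sketch, assuming the id always arrives as an `oldid` query parameter; `revision_id` is a hypothetical helper name, not something from WildBot's source:

```python
from urllib.parse import urlparse, parse_qs


def revision_id(url):
    """Return the numeric oldid from a MediaWiki URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("oldid")
    return int(values[0]) if values else None
```

Parsing the query string rather than slicing the end of the URL keeps this working if the parameters ever appear in a different order.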


 * The tweaks seem to be of the nature of making the bot use rev and time correctly (the first version of the template used http://...), as well as reduce the verbosity of lines such as "Ballades (Chopin) is present more than once, possibly because of redirects." to something like "Ballades (Chopin)", since the template already mentions that this is possibly due to redirects. Headbomb {ταλκ κοντριβς – WP Physics} 04:13, 22 February 2010 (UTC)
 * I'm also not entirely happy with the redlink text, and was hoping to template that so it doesn't require a code-change, but templates inside templates are a bugger to parse with regexes; I'm getting close to biting the bullet and doing this properly with a real parser. I'm wanting to change the duplicate detection code so it lists duplicates, and there are other small things regarding the presentation of the information - but nothing that changes what it operates on, or what ultimately gets reported - just how friendly and helpful it is. Josh Parris 04:54, 22 February 2010 (UTC)

Tweaks
I noticed that WildBot just removed all the notices with the summary "all problems fixed", even on books where the problems weren't fixed. What gives? Headbomb {ταλκ κοντριβς – WP Physics} 06:02, 22 February 2010 (UTC)
 * Remind me never to tweak anything ever again. The blanking was due to the regex removing the existing box and an internal logic failure associated with the link-parsing regex. I've re-written the regex and output code and now it's all templated, so Headbomb you can fiddle all you like with the formatting. Now duplicate links with an accompanying HTML comment are not listed as duplicates, all output is templated (with the side effect that the message now takes up much less wikisource) and I figured out a cunning way of avoiding using a proper parser. I've discovered that article text isn't available to toolserver SQL queries, Dispenser's dab_solver doesn't yet work for books, and I need to rewrite pywikipedia's getall function. Updated book talk pages are listed at http://en.wikipedia.org/w/index.php?title=Special:Contributions&offset=201002231314009&limit=23&target=WildBot Josh Parris 13:29, 23 February 2010 (UTC)
 * BTW: Headbomb, there seem to be a number of duplicate or near duplicate books. Josh Parris 13:29, 23 February 2010 (UTC)
 * Yes, I am aware. I notify WikiProjects of this whenever I see them, but they are slow to respond. Headbomb {ταλκ κοντριβς – WP Physics} 13:32, 23 February 2010 (UTC)


 * BTW, the bot adds a useless "|b01" at the end of each template. Headbomb {ταλκ κοντριβς – WP Physics} 13:46, 23 February 2010 (UTC)


 * Not useless, cunning. That's my end-of-template marker for the regex. Josh Parris 14:09, 23 February 2010 (UTC)


 * Ah. I should've written a-priori useless :P. Headbomb {ταλκ κοντριβς – WP Physics} 14:11, 23 February 2010 (UTC)
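
Why the trailing "|b01" helps: the message box can contain other templates, so a regex that stops at the first "}}" would cut the box off partway through. Ending every box with a fixed marker gives the regex an unambiguous stopping point without a real wikitext parser. The sketch below illustrates the idea only. The exact wikitext WildBot writes is not shown in this discussion, so the `{{User:WildBot/b01|...|b01}}` shape and the `remove_box` name are assumptions.

```python
import re

# Match from the opening of the box to the first "|b01}}" marker.
# DOTALL lets the box span several lines; the non-greedy ".*?" stops at
# the first marker, so two boxes on one page are matched separately.
BOX_RE = re.compile(r"\{\{User:WildBot/b01\b.*?\|b01\}\}\n?", re.DOTALL)


def remove_box(wikitext):
    """Strip every WildBot book notice from a talk page's wikitext."""
    return BOX_RE.sub("", wikitext)
```

The trick relies on "|b01}}" never occurring inside the box body, which holds only as long as the bot controls everything it writes there.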

Also, could you amass all the links to the source codes (there's more than one, I'm sure)? This way it would be easy for other bot-coders to retrieve everything they need and adapt this bot for the other Wikipedias / other wikis. Headbomb {ταλκ κοντριβς – WP Physics} 14:15, 23 February 2010 (UTC)


 * Just drop the file name off the end of the URL and you get all of WildBot's sources ( https://svn.toolserver.org/svnroot/josh/WildBot/ ). Which reminds me... Josh Parris 14:23, 23 February 2010 (UTC)


 * Oh one last thing, could WildBot edit the saved book and add yes? This way users who download books know that the book might not be in tip-top shape. (It doesn't do anything yet, but I'll add this functionality to saved book soon). Headbomb {ταλκ κοντριβς – WP Physics} 11:30, 24 February 2010 (UTC)
 * I can do this. It will take a little while (WildBot's never altered non-talk pages before). I presume that the parameter needs to be removed when the book is a happy place. Josh Parris 12:10, 24 February 2010 (UTC)
 * Indeed. Headbomb {ταλκ κοντριβς – WP Physics} 12:14, 24 February 2010 (UTC)
 * saved book is ready. Anyone feeling like rubber-stamping WB5 as approved or should this be tried out first? Headbomb {ταλκ κοντριβς – WP Physics} 21:29, 25 February 2010 (UTC)

←Hm. As long as the main part of the bot is the same, it should be fine, but because I want to make sure the changes will work properly, - The Earwig (talk) 16:54, 26 February 2010 (UTC)


 * See the list of currently tagged books to inspect the state they're in. Adding saved book created numerous problems, all fixed. I intend to keep WildBot's new functionality on a short leash for the first hundred or so edits and will only be enabling it when I've got several hours free to supervise it. Josh Parris 14:02, 3 March 2010 (UTC)


 * Looks good to me. Headbomb {ταλκ κοντριβς – WP Physics} 15:08, 3 March 2010 (UTC)


 * Approved. The Earwig (talk) 01:55, 4 March 2010 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.