Wikipedia:Bots/Requests for approval/Alaibot 4


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

Alaibot
Operator: Alai

Automatic or Manually Assisted: Automatic

Programming Language(s): Python (modification of replace.py from pywikipedia)

Function Summary: Populating categories from infoboxes and navigation templates

Edit period(s) (e.g. Continuous, daily, one time run): Series of one-off runs, at sporadic intervals as required

Edit rate requested: 6-15 edits per minute (upper rate only at off-peak hours)

Already has a bot flag (Y/N): Y

Function Details: In a number of cases, infoboxes and other templates encode information that's also required for categorisation (i.e. where there's a backlog in populating an existing category, or a consensus to create a category scheme with that information). For example, albums by year; albums by artist; albums by genre, for stub re-tagging; German towns and villages by region and district. (There's discussion of each of these examples at the albums and Germany wikiprojects.) This should be pretty much false-positive-free: the relatively fixed format of the infobox makes this much easier than attempting to extract the info from free text. Additionally, I'm going to avoid category redlinks, either by checking against a pattern (where the categories are completely regular, such as with the albums) or by polling for existence of the category page, in either case skipping and logging any page that fails the check. Here are some individually supervised test edits for the albums-by-year example:, , , , ; article Stitches Split correctly skipped (data not present). Fixed edit summary to be something more sensible. Alai 03:23, 11 April 2007 (UTC)
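
As a rough illustration of the approach described above (not Alaibot's actual code, which is a modification of replace.py), the extraction and redlink check might look like the sketch below. The `Released` field name, the `Category:YYYY albums` title format, and the names `propose_category`, `category_exists` and `skipped_log` are all assumptions made for this example; `category_exists` stands in for whatever existence poll the bot really uses.

```python
import re

# Assumed infobox field layout: "| Released = <free-form date>" on one line.
RELEASED_FIELD = re.compile(
    r"^[ \t]*\|[ \t]*Released[ \t]*=[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)
# A four-digit year in the 1900s or 2000s.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def propose_category(page_title, wikitext, category_exists, skipped_log):
    """Return the by-year category for an album article, or None to skip.

    category_exists is a caller-supplied function (hypothetical here) that
    reports whether the category page exists, so no redlink is ever added.
    Every skip is recorded in skipped_log with a reason.
    """
    field = RELEASED_FIELD.search(wikitext)
    year = YEAR.search(field.group(1)) if field else None
    if year is None:
        skipped_log.append((page_title, "no release year in infobox"))
        return None

    category = f"Category:{year.group(0)} albums"
    if not category_exists(category):
        skipped_log.append((page_title, f"category page missing: {category}"))
        return None
    return category
```

Restricting the match to a single infobox line is what keeps the extraction narrow: a year mentioned elsewhere in the article, or in a neighbouring field, is never picked up.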

Discussion
You don't mention it above, so I'll just ask for clarity: does the bot avoid adding the same category twice (i.e. if the page is already in the required category when the bot comes along)? Martinp23 23:15, 11 April 2007 (UTC)
 * Good point, yes, and just to be on the safe side, it also skips the page if there's any other category on the same "pattern" (e.g. any other album-by-year cat). I should also add I'm working from a db dump-generated list of candidates that transclude the infobox, but lack the category, so this is just a double-check that no such category's been added in the meantime. Alai 05:29, 12 April 2007 (UTC)
 * OK - are you going to be running this on any sort of infobox/category tie-in, or just "albums by year"? I'm presuming that it's the former case, in which case I'd like to just make sure that the bot will only run where there is consensus for the change, and where there are only going to be very few false positives. Basically, I'm just asking you to follow the standard caveats for this sort of thing (like wikiproject tagging). That said, approved for trial for up to 100 edits. Report back with diffs when done :) Martinp23 13:07, 15 April 2007 (UTC)
 * In short, yes to all of the above. These will be either pre-existing categories (as with the albums by year), or where there's clear consensus to have them (as with the German districts, which is also in line with other geographical categorisation);  and in either event, will be requested by, or well-flagged to, editors in that subject area.  (Infoboxes seem to be associated with wikiprojects with fairly high probability, so that's the obvious stop in such cases.)  OK, run completed, 99 edits to albums.  Results seem OK;  the only complication I hadn't anticipated was two release dates in the same field, which I've tweaked the regex to resolve in favour of the first one, textually.  Alai 18:02, 15 April 2007 (UTC)
 * OK - the edits look fine, and everything you say above seems fine - just make sure that when you plan to do a run, you examine the validity of it just as you would expect us to. Kingboyk makes some fair comments below, well worth looking into - would you like to re-apply to put that functionality in later, or leave it in this task, and wait for approval on both? Martinp23 19:11, 15 April 2007 (UTC)
 * Certainly. I'll re-file.  I'd prefer to address that separately, partly because I may not want to integrate the two immediately;  might be easier to do it by way of separate runs, at least in the first instance.  Alai 21:10, 15 April 2007 (UTC)
 * Approved. Martinp23 11:39, 17 April 2007 (UTC)
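
The two safeguards discussed in this thread, skipping pages that already carry any category on the same pattern and resolving a field with two release dates in favour of the first one textually, could be sketched roughly as follows. This is an illustration only: the regexes and the names `already_categorised` and `first_year` are assumptions, not the bot's real code.

```python
import re

# Any existing album-by-year category link, with or without a sort key,
# e.g. [[Category:1999 albums]] or [[Category:1999 albums|Foo]].
ALBUM_YEAR_CATEGORY = re.compile(
    r"\[\[\s*Category\s*:\s*\d{4} albums\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def already_categorised(wikitext):
    """True if the page already has any category on the by-year pattern."""
    return ALBUM_YEAR_CATEGORY.search(wikitext) is not None


def first_year(field_value):
    """Return the first year that appears in the field text, or None.

    re.search scans left to right, so when a field lists two release
    dates the earlier one in the text wins.
    """
    match = YEAR.search(field_value)
    return match.group(0) if match else None
```

For a field such as "1 May 1998 (UK), 3 June 1999 (US)", `first_year` picks 1998 simply because it comes first in the text, not because it is the earlier date.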

Might you be interested in updating the talk page template too? Album has several parameters which might reasonably be set by a bot: --kingboyk 16:18, 15 April 2007 (UTC)
 * class=Stub and auto=yes, if a stub template is present in the article
 * needs-infobox=yes if the album article has no infobox (although I guess you'll never visit these, since you're filtering by transclusion, you might encounter some which say yes when in fact the article now has one)
 * Both of those would be something I'd certainly look into, though they'd need a little bit of additional coding. The second one I could do on a separate run, assuming I can get a "clean" list of actual albums to set-diff the transclusions from. (As against "articles categorised somewhere under ", which seems to be significantly different.) The album-stubs are probably a safe place to start, though. I mentioned this a while back at ALBUM; might be better to continue discussion there for the time being. Alai 18:20, 15 April 2007 (UTC)
 * Yes, the album categories are quite messy; I discovered that when doing a talk page run last year. I've asked Martin to come back and look at your diffs. Cheers. --kingboyk 18:59, 15 April 2007 (UTC)
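
The set-diff idea for the needs-infobox parameter can be sketched as below, assuming three title lists are already to hand: a clean list of album articles, the articles transcluding the infobox, and the articles whose talk page currently says needs-infobox=yes. The function name and its inputs are hypothetical, and how those lists would be obtained is left open, as it is in the discussion.

```python
def infobox_flag_changes(album_articles, infobox_transclusions, flagged_needs_infobox):
    """Work out which needs-infobox flags to set and which to clear.

    Returns (to_set, to_clear) as sorted lists of article titles:
      to_set   - albums with no infobox that are not yet flagged
      to_clear - articles flagged as needing an infobox that now have one
    """
    albums = set(album_articles)
    with_infobox = set(infobox_transclusions)
    flagged = set(flagged_needs_infobox)

    missing_infobox = albums - with_infobox      # the set-diff itself
    to_set = sorted(missing_infobox - flagged)
    to_clear = sorted(flagged & with_infobox)    # stale "yes" flags
    return to_set, to_clear
```

The quality of the result depends entirely on how clean the album list is, which is the difficulty raised above about the album categories.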


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.