User:Josh Parris/Bot op experience

Wikignoming is an under-the-radar activity on Wikipedia that improves articles in subtle ways, changing them from dumps of information into polished articles that are integrated into an encyclopedia. Examples include improving prose, fixing typos, correcting poor grammar, finding illustrations, placing infoboxes, organizing articles, categorization, creating and maintaining interwiki links, and repairing broken links. The list of fixes is enormous, and quite a number of them can be partially or fully automated. That's where bots come in. Bots act as superchargers for wikignomes.

Bots, like wikignomes, come in all varieties; you've probably already seen the high-profile glamor magnets - antivandal bots - in action. But there are bots that do things like delink dates, repair broken citations, tag articles for wikiprojects, migrate templates and populate articles with boilerplate assembled from census results. Bots work behind the scenes, improving consistency and performing thousands of tedious edits.

I got introduced to Wikipedia bots when I picked up Pywikipedia to help my efforts in the Disambiguation Challenge (a bot whose every edit is approved by a human can be run by anyone). I customized it and decided I ought to take the customizations and turn them into an authorized bot. I went about things almost the right way: I picked the framework first, and then had a look at the API. I chose the framework based on maturity and wide use; looking at recent BRfAs I saw it was a commonly used framework. I thought that would translate into good support, a full feature set and a reliable, bug-free implementation; I found none of these. A better call would have been to find a few active bot developers and ask them for their recommendations, given my plans; and to pick a framework that implemented all of the MediaWiki API, or at the very least those parts of it I expected to use.

While the documentation for the API is not terribly approachable, if you learn to read between the lines you can find out plenty. For example, it turns out that the API covers most, but not all, of what the MediaWiki web interface can do. On the other hand, a single API query can combine things in ways that aren't possible in the web interface - for example, in one call I can find out which pages in a list are redirects, plus all the categories those pages are in (handy for cheaply checking birth and death years against those given for people listed on a page). I don't know of any framework that allows you to do these combined queries, so I ended up coding a surprising number of calls to the API myself; thankfully the API is pretty easy to use, so only a minimal amount of head-scratching and swearing was involved.
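As an illustration, here is a minimal sketch of that kind of combined query, written in Python against the live api.php endpoint using the requests library; the page titles are just placeholders, and error handling is omitted.

```python
# Minimal sketch: one API call that reports, for a list of pages, whether
# each is a redirect and which categories it belongs to.
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "format": "json",
    "prop": "info|categories",   # page metadata plus category membership
    "titles": "Ada Lovelace|Charles Babbage|Analytical engine",
    "cllimit": "max",            # as many categories per request as allowed
}

response = requests.get(API, params=params).json()

for page in response["query"]["pages"].values():
    is_redirect = "redirect" in page   # the 'redirect' key appears only on redirects
    categories = [c["title"] for c in page.get("categories", [])]
    print(page["title"], "redirect" if is_redirect else "article", categories)
```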

How is it that I always propose the contentious bots? I think it's because I aim too high, and don't understand the consequences of what I'm proposing.

One of the most frustrating and rewarding things was the BRfA process; I went into it with a cunningly planned, stupid idea. Repeatedly.

Another mistake I made was filing subsequent BRfAs while there was still a support burden from my existing functionality. A bot is more helpful if it doesn't accidentally do things that confuse, annoy or delay people - so it's preferable to concentrate on bug fixing before extending functionality.

Advice for potential operators? A good reason to build a bot is when it's easier to build a bot to fix the problem than to fix it by hand - for example, Bots/Requests for approval/WildBot 4 was to fix a bunch of incorrect redirects which would have been a nightmare to do manually. And once the tool is built, no human ever needs to do that task again; the bot will just keep on running. Additionally, a chunk of the code from that bot got used in another bot that checks for broken #section links - the cost of building a chunk of code is amortized over time and, possibly, over a number of related tasks. So, my advice would be: be lazy. Work hard to avoid work.

-- Josh Parris

(14:24:08) Tim1357: I'd point out that Wikipedia is losing editors. It is important for as much work as possible to be done by bots.

It's easy to build a Wikipedia bot; there are libraries for many languages with swathes of example code, and many existing bots publish their source code or have operators who are happy to hand it over on request. Getting a minimal bot going is not hard, and often there are tools that already do what you want to do - AutoWikiBrowser and pywikipedia spring to mind. Most interwiki bots - the bots that maintain the interlanguage links between the different language editions - are simply the interwiki bot that ships with pywikipedia.
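To give a feel for how little code a minimal bot needs, here is a sketch using Pywikibot, the current successor to the pywikipedia framework; the page, the appended text and the edit summary are placeholders, and any bot editing outside the sandbox would of course need approval first.

```python
# Minimal Pywikibot sketch: read a page, decide whether to change it, save it.
# Assumes a configured user-config.py with login credentials.
import pywikibot

site = pywikibot.Site("en", "wikipedia")          # connect to English Wikipedia
page = pywikibot.Page(site, "Wikipedia:Sandbox")  # a page that's safe to edit

text = page.text                                  # current wikitext of the page
if "example change" not in text:                  # only edit if there's something to do
    page.text = text + "\n\nexample change"
    page.save(summary="Bot: test edit in the sandbox")
```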

But one thing is consistent: we need more bot developers and operators. There is an ongoing flood of requests at BOTREQ and the active developers just can't keep up, so many worthy projects are being left unpursued. If you have development experience, and are an editor in good standing, please consider helping out - picking up one of the outstanding requests at BOTREQ is a good place to start.

Why programmers? Because bots require programming. Straightforward tasks are often handled by users with generalised tools, especially AWB and the pywikipedia tools.

Once problems move beyond the capabilities of a tool that examines wikitext, makes a decision and changes the same wikitext, custom code - and someone to write and run it - becomes necessary.
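For contrast, the pattern those generalised tools automate - read wikitext, decide, write wikitext back - is simple enough to sketch in a few lines; the regex rule below is purely illustrative, not an actual AWB rule.

```python
# Sketch of the "examine wikitext, decide, change wikitext" pattern.
import re

def fix_typo(wikitext: str) -> str:
    """Replace a common misspelling. Handling contexts where the text must be
    left alone (quotations, <nowiki> blocks) is deliberately omitted - one
    reason simple regex rules eventually hit their limits."""
    return re.sub(r"\brecieve\b", "receive", wikitext)

before = "The mayor will recieve the delegation on Tuesday."
after = fix_typo(before)
if after != before:          # the "decision": only save if something changed
    print(after)             # a real tool would write this back to the wiki
```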