Wikipedia:Bots/Requests for approval/CbmBOT


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.

User:CbmBOT

 * What: This bot updates the "Number of Articles Remaining" table on Category:Cleanup by month. It can be run manually or scheduled to run automatically (it is not currently scheduled), and needs to run no more than once a day to keep the table as current as it is kept now. The bot is a PHP 5.1.4 command-line script that runs on a Unix machine and makes use of cURL and extensive regular-expression parsing.
 * Why: The data for this table changes constantly, so updating it by hand is tedious. The bot retrieves all the relevant information and updates the section automatically. Because manual updates tend to introduce errors (see the update history), automating the task is the more reliable approach.
 * How: The bot uses cURL and regular expressions to pull and parse a minimal number of pages: the Cleanup by month category pages, plus Category:Cleanup by month, Category:Music cleanup by month, and Special:Statistics. An average run takes about three to four minutes and pulls fewer than one hundred pages. If at any time a page is pulled incorrectly (usually the result of a timeout), the bot aborts, pulling no further pages and making no changes to the category page.
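
The abort-on-failure behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the bot's actual PHP code; the names `FetchError`, `fetch_all_or_abort`, `run`, and the injected `fetch` and `save` callables are all hypothetical, and treating an empty response as a bad pull is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional


class FetchError(Exception):
    """Raised by a fetcher when a page cannot be retrieved (e.g. a timeout)."""


def fetch_all_or_abort(
    titles: Iterable[str],
    fetch: Callable[[str], str],
) -> Optional[Dict[str, str]]:
    """Fetch every page in order; return None as soon as one fetch fails."""
    pages: Dict[str, str] = {}
    for title in titles:
        try:
            text = fetch(title)
        except FetchError:
            return None  # abort: no further pages are pulled
        if not text:
            return None  # assumption: an empty body counts as a bad pull
        pages[title] = text
    return pages


def run(
    titles: Iterable[str],
    fetch: Callable[[str], str],
    save: Callable[[Dict[str, str]], None],
) -> bool:
    """Write the result only if every page was pulled successfully."""
    pages = fetch_all_or_abort(titles, fetch)
    if pages is None:
        return False  # nothing is written to the category page
    save(pages)
    return True
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP library, so the same logic would apply whether pages are pulled with cURL or something else.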

On each category page, the line "There are ## pages in this section of this category." is parsed to determine how many pages are listed on that page of the category. Because a category can span multiple pages, "(next 200)" links are also followed and the counts are summed. The same is then done for the subcategories of Category:Music cleanup by month. Once all counts have been retrieved, the bot pulls the total number of articles in the English Wikipedia from Special:Statistics and formats a new wikitable for output into the page. The bot can also be configured to write the wikitable to stdout rather than edit the page, if necessary.
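
The counting and formatting steps might look roughly like this. It is a Python sketch for illustration only, not the bot's PHP source: the exact HTML around the "next 200" link, the table's column layout, and the names `count_category` and `format_wikitable` are assumptions, and `fetch` is a hypothetical callable that returns a page's HTML.

```python
import html
import re
from typing import Callable, Dict, Optional

# The sentence quoted in the description; the count may contain commas.
COUNT_RE = re.compile(
    r"There (?:is|are) ([\d,]+) pages? in this section of this category"
)
# Assumed markup for the paging link; the real HTML may differ.
NEXT_RE = re.compile(r'<a href="([^"]+)"[^>]*>next 200</a>')


def count_category(start_url: str, fetch: Callable[[str], str]) -> int:
    """Sum the per-page counts of a category, following "next 200" links."""
    total = 0
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen:
        seen.add(url)  # guard against a paging loop
        page = fetch(url)
        match = COUNT_RE.search(page)
        if match is None:
            raise ValueError(f"count line not found on {url}")
        total += int(match.group(1).replace(",", ""))
        nxt = NEXT_RE.search(page)
        # hrefs in HTML are entity-escaped (&amp;), so unescape before reuse
        url = html.unescape(nxt.group(1)) if nxt else None
    return total


def format_wikitable(counts: Dict[str, int], total_articles: int) -> str:
    """Render the counts as wikitable markup (column layout is assumed)."""
    lines = ['{| class="wikitable"', "! Month !! Articles remaining"]
    for month, n in counts.items():
        lines.append("|-")
        lines.append(f"| {month} || {n}")
    lines.append("|-")
    lines.append(f"| Total articles || {total_articles}")
    lines.append("|}")
    return "\n".join(lines)
```

Raising an error when the count line is missing fits the abort behaviour described earlier: a page that does not parse is treated as a bad pull rather than counted as zero.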

The bot keeps track of a number of statistics, including total number of pages processed, total time, etc. While functionality does not yet exist to do so, it would not be hard to extend the bot script to maintain these statistics on a subpage of the User page. —The preceding unsigned comment was added by Dvandersluis (talk • contribs) 20:43, 19 July 2006 (UTC)
 * OK, looks fine. Can you make a trial run and post a diff, please? -- Tawker 07:24, 21 July 2006 (UTC)
 * I'm not exactly sure what you're asking me to do... run the bot once? –Dvandersluis 03:31, 23 July 2006 (UTC)
 * Pretty much. The trial run lasts a week, and you can run the bot while carefully checking its edits. During the test, keep the edits to no more than 2-3 per minute. After the run(s), post the diffs here for review by the group/community. — xaosflux Talk 00:55, 26 July 2006 (UTC)
 * Oops, hadn't seen this earlier. Here are the diffs:
 * diff for Thursday, July 27, 2006
 * diff for Friday, July 28, 2006
 * diff for Monday, July 31, 2006
 * diff for Tuesday, August 1, 2006
 * diff for Wednesday, August 2, 2006
 * diff for Thursday, August 3, 2006


 * Everything looks in order here; the bot appears useful and does not appear to be a resource hog. BOT APPROVED. It does not appear to need a flag. — xaosflux Talk 01:30, 4 August 2006 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.