User:AllyUnion/NekoDaemon

Categories for deletion / category population / category depopulation bot assistant
This bot would be solely responsible for depopulating, and repopulating categories. Primarily used in the assistance of categories for deletion, but I realized how it may be used for other purposes. -- AllyUnion (talk) 15:01, 17 Mar 2005 (UTC)
 * I thought we already had one?? --ssd 13:04, 18 Mar 2005 (UTC)
 * We did. But I think it was banned. -- AllyUnion (talk) 08:15, 19 Mar 2005 (UTC)
 * Pearle was mistakenly banned for 24 hours by someone who thought she wasn't authorized to tag articles.  Then there was a period of time when she was out of service because people had complained about style details and I was too busy to implement the significant reprogramming that resolving the complaint required.  Then there was the recent problem with wholesale deletion of interwiki links.  But everything is back to normal, and Pearle has been clearing the backlog on WP:CFD. -- Beland 04:27, 20 Mar 2005 (UTC)

Process of how the bot does it:

Depopulation process

 * 1) Check user subpage for category de-population request. - May run at specific times or intervals
 * 2) Check if the request is signed and from an administrator from the list of Special:List of administrators
 * 3) Check if the category has gone through the proper channels, specifically if it has been listed on CFD.
 * 4) Assert no unprocessed rename request has been requested, and assert no "undo" request has been made either
 * 5) Generates a list of pages that need to be edited, and posts it in wiki list format on a user subpage.
 * The purpose of this is to allow the bot to keep track of articles so that it can undo the damage if necessary.
 * 6) Queries the administrator by posting a message on their talk page, confirming the requested action.  Then waits for an "okay" from the administrator (This particular step may be difficult to program, so I'm uncertain if I can program it in...)
 * 7) Goes ahead and depopulates the category.
 * 8) Once complete, notifies the administrator on their talk page that the requested task was complete.
 * 9) Strikes out the request using, posts underneath that the request has been filed
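The precondition checks in steps 2 to 4 above could be sketched roughly as follows. This is only an illustration, not the bot's actual code: `DepopulationRequest`, `check_depopulation_request`, and all of the parameters are hypothetical names, and the sets of administrators, CFD-listed categories, and pending requests are assumed to have been gathered by some other part of the bot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepopulationRequest:
    category: str   # category the request wants emptied
    signed_by: str  # user name taken from the request's signature


def check_depopulation_request(request, admins, cfd_listed,
                               pending_renames, pending_undos):
    """Return the reasons to reject the request; an empty list means it may proceed."""
    problems = []
    # Step 2: the request must be signed by an administrator.
    if request.signed_by not in admins:
        problems.append("not signed by an administrator")
    # Step 3: the category must have been listed on CFD.
    if request.category not in cfd_listed:
        problems.append("category has not been listed on CFD")
    # Step 4: no unprocessed rename request and no undo request may exist.
    if request.category in pending_renames:
        problems.append("an unprocessed rename request exists")
    if request.category in pending_undos:
        problems.append("an undo request exists")
    return problems
```

Returning every failed check, rather than stopping at the first, would let the bot post one complete explanation to the requester's talk page.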

Undoing the damage of a removal request

 * 1) Check user subpage (separate from the depopulation page) for re-population / undo request
 * 2) Check if request has been completed and listed at the de-population request user subpage.
 * 3) Check if the request is signed from an administrator
 * 4) Check if the request is from the de-population requesting administrator.
 * 5) If no: Wait until a second and third administrator confirms the request. (preventing abuse)
 * 6) If yes: Don't wait for a second and third administrator approval request. (if the nominating administrator screwed up, the bot can fix this with no problems to the administrator.)
 * 7) Queries the administrator(s) by posting a message on their talk page(s), confirming the requested action.
 * 8) Waits for approval from all administrator(s) who have confirmed or requested the undo.
 * 9) Goes ahead and repopulates the category
 * 10) Strikes out the request using, posts underneath that the request has been filed
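The approval rule in steps 3 to 6 above (the original requester may undo alone; anyone else needs a second and third administrator) might look something like this. The function name and parameters are hypothetical, and how signatures are collected from the subpage is left out.

```python
def undo_is_authorized(requester, original_requester, confirmations, admins):
    """Decide whether an undo request has enough administrator backing.

    confirmations: names of users who confirmed the undo request.
    admins: set of administrator names (assumed to be fetched elsewhere).
    """
    # The undo request itself must come from an administrator.
    if requester not in admins:
        return False
    # The administrator who asked for the original action may undo it alone.
    if requester == original_requester:
        return True
    # Otherwise two further, distinct administrators must confirm.
    others = {name for name in confirmations
              if name in admins and name != requester}
    return len(others) >= 2
```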

Rename process request

 * 1) Check user subpage for category renaming request. - May run at specific times or intervals
 * 2) Check if the request is signed and from an administrator from the list of Special:List of administrators
 * 3) Check if the category has gone through the proper channels, specifically if it has been listed on CFD.
 * 4) Assert no depopulation request has already been made (it follows that if a depopulation request has already been made, then renaming becomes redundant)
 * 5) Generates a list of pages that need to be edited, and posts it in wiki list format on a user subpage.
 * The purpose of this is to allow the bot to keep track of articles so that it can undo the damage if necessary.
 * 6) Queries the administrator by posting a message on their talk page, confirming the requested action.  Then waits for an "okay" from the administrator (This particular step may be difficult to program, so I'm uncertain if I can program it in...)
 * 7) Goes ahead and renames the category.
 * 8) Once complete, notifies the administrator on their talk page that the requested task was complete.
 * 9) Strikes out the request using, posts underneath that the request has been filed

Undoing the damage of a rename request

 * 1) Check user subpage for rename undo request
 * 2) Check if request has been completed and listed at the de-population request user subpage.
 * 3) Check if the request is signed from an administrator
 * 4) Check if the request is from the renaming requesting administrator.
 * 5) If no: Wait until a second and third administrator confirms the request. (preventing abuse)
 * 6) If yes: Don't wait for a second and third administrator approval request. (if the nominating administrator screwed up, the bot can fix this with no problems to the administrator.)
 * 7) Queries the administrator(s) by posting a message on their talk page(s), confirming the requested action.
 * 8) Waits for approval from all administrator(s) who have confirmed or requested the undo.
 * 9) Goes ahead and renames the category back to the original state
 * 10) Strikes out the request using, posts underneath that the request has been filed

Population process
User process:
 * 1) A user creates a list of pages on the bot's user subpage
 * 2) A user then makes a request

The bot follows up with:
 * 1) Check user subpage for category population request
 * 2) Check if the request is signed and from a logged in user
 * 3) Verify that the list has been approved and reviewed by an administrator, and that the last edit was made by an administrator
 * 4) Verify that another administrator (even if the logged in user IS an administrator) has approved the population request
 * 5) Assert no unprocessed requests (above) have been made for the category requesting to be populated
 * 6) Edits the subpage list of pages, marking that it is or has been processed.  The purpose behind this is for the bot to "tag" the state of the article, making certain that if it ever needed to undo the damage, it has a point of reference to do so.
 * 7) Queries the administrator by posting a message on their talk page, confirming the requested action.  Then waits for an "okay" from the administrator (This particular step may be difficult to program, so I'm uncertain if I can program it in...)
 * 8) Goes ahead and populates the list of articles
 * 9) Once complete, notifies the approving administrator and the requesting user on their talk pages that the requested task was complete.
 * 10) Strikes out the request using, posts underneath that the request has been filed

Population undo request

 * 1) Check user subpage for population undo request
 * 2) Check if no request has previously been made, and assert that there are no open requests of the same article in the population request
 * 3) Assert that this undo request is valid, and does not conflict with the CFD process.  If the undo request is made, and the category is listed on CFD, the bot will refuse to undo the population request.
 * 4) Check if the request is signed from an administrator
 * 5) Queries the administrator(s) by posting a message on their talk page(s), confirming the requested action.
 * 6) Waits for approval from the administrator who has confirmed the undo
 * 7) Goes ahead and reverses all its edits made under the population list
 * 8) Strikes out the request using, posts underneath that the request has been filed

Automatic page archival
Would automatically archive subpages after 50 requests.
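One reading of the 50-request rule, sketched under assumptions: requests are kept oldest first, and only full groups of 50 are moved to an archive subpage. `split_for_archive` is a hypothetical helper, and writing the archive page itself is not shown.

```python
ARCHIVE_THRESHOLD = 50  # from the proposal: archive after 50 requests


def split_for_archive(requests, threshold=ARCHIVE_THRESHOLD):
    """Return (to_archive, remaining) for a list of requests, oldest first.

    Only complete groups of `threshold` requests are archived; the rest
    stay on the live subpage.
    """
    full = (len(requests) // threshold) * threshold
    return requests[:full], requests[full:]
```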

General considerations for security and abuse
These are a few considerations that need to be made to prevent abuse and vandalism.
 * 1) Assert the signed name did come from that user
 * 2) Assert that no alterations have occurred to the list of pages that it needs to undo.
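For the second point, one possible approach (an assumption, not something settled in this proposal) is for the bot to record a digest of the page list when it posts it, and to compare digests before an undo. The helper names are hypothetical; `hashlib.sha256` is from the Python standard library.

```python
import hashlib


def page_list_digest(pages):
    """Return a SHA-256 hex digest of the page titles, in order."""
    h = hashlib.sha256()
    for title in pages:
        h.update(title.encode("utf-8"))
        h.update(b"\n")  # separator, so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def list_unaltered(pages, recorded_digest):
    """True if the list still matches the digest recorded when it was posted."""
    return page_list_digest(pages) == recorded_digest
```

The recorded digest would have to be stored somewhere editors cannot change it, such as the bot's local disk; otherwise the check proves nothing.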

-- AllyUnion (talk) 15:01, 17 Mar 2005 (UTC)

Discussion

 * Why not simply protect the user subpage so that only administrators can edit it, and make the Talk: page or a subpage available for requests? r3m0t talk 18:10, Mar 17, 2005 (UTC)
 * Protected pages considered harmful. Also it isn't completely necessary.  Although, technically, the bot could simply post a static page somewhere on the WWW. -- AllyUnion (talk) 22:08, 17 Mar 2005 (UTC)
 * You are going to require administrator signatures for all changes anyway. Protecting the page simplifies the job of the bot and makes more obvious the procedure for the user. r3m0t talk 22:15, Mar 17, 2005 (UTC)
 * So what do you do for requested lists for population? Protect them when they have been reviewed by an administrator? -- AllyUnion (talk) 06:29, 18 Mar 2005 (UTC)

Pearle perspectives
Well, this definitely overlaps a lot with what Pearle does now. I have working code to:
 * Add an article to a category.
 * Remove an article from a category.
 * Change an article from one category to another in a single edit.
 * Depopulate an entire category.
 * Move an entire category, including intro text.
 * Add the tag to a page.
 * Remove the tag from a page.
 * Maintain proper ordering for category, interwiki, stub, and other tags. (This was a non-trivial parsing problem, and this code is used by all of the other editing commands.)
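To make the first three operations concrete, here is a minimal sketch of removing or changing a category link in page wikitext. This is not Pearle's code. It assumes the English `Category:` prefix and an exact match on the category name, and it ignores the tag-ordering problem described in the last item. The function names are hypothetical.

```python
import re


def _category_pattern(name):
    # Matches [[Category:Name]] or [[Category:Name|sort key]], plus one
    # trailing newline if present. Group 1 is the optional "|sort key".
    return re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(name)
        + r"\s*(\|[^\]]*)?\]\](\n?)"
    )


def remove_category(text, name):
    """Remove every link to category `name` from the wikitext."""
    return _category_pattern(name).sub("", text)


def replace_category(text, old, new):
    """Point links to category `old` at `new`, keeping any sort key."""
    def repl(match):
        sort_key = match.group(1) or ""
        return "[[Category:" + new + sort_key + "]]" + match.group(2)
    return _category_pattern(old).sub(repl, text)
```

A real implementation would also need to handle underscores versus spaces, first-letter case, and localized namespace names.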

I have found that there are certain complications that do crop up.
 * Many times there are mistakes in specifying the desired command, or weirdnesses in the names or contents of pages (including special characters) that cause execution to fail with an error message. Many errors are intentionally fatal, because bad input on the first line is often a good indication that subsequent input lines are also bad, and the early warning prevents undesirable edits.
 * Some edits require human followup. Instead of capturing terminal output and examining it, I have taken to tagging articles and then checking Category:Pearle edits needing manual cleanup and the like.
 * Category-frobbing operations and batch operations in general generate a considerable amount of server load. Some batches take a long time (on the order of hours) to run, because there is a lot of artificial delay, to allow the servers plenty of time to service human editors.
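The artificial delay mentioned in the last point can be sketched as a loop that pauses between edits. The 10-second default is an arbitrary placeholder, not a figure from this discussion, and `run_throttled` and `apply_edit` are hypothetical names.

```python
import time


def run_throttled(edits, apply_edit, delay_seconds=10.0, sleep=time.sleep):
    """Apply each edit in turn, pausing between edits.

    apply_edit: callable that performs one edit (supplied by the bot).
    sleep: injectable so the pause can be replaced in tests.
    Returns the number of edits applied.
    """
    done = 0
    for index, edit in enumerate(edits):
        if index > 0:
            sleep(delay_seconds)  # leave the servers time for human editors
        apply_edit(edit)
        done += 1
    return done
```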

My general advice:
 * A big, red "emergency stop" button would be a good idea.
 * I like the idea of posting errors and completion notices to user talk pages.
 * The "undo" feature is an excellent idea. It may not be necessary to maintain verbose logs on a wiki page.  As long as there's a way to select which command should be undone, most of the rest of the information that humans need to see is accessible from the "User contributions" page.  The bot could simply store "undo" information on local disk instead of a "really hope no one changed it" wiki page.
 * Design the input interface so it's hard to make massive errors, and avoid command-line-like syntax. An HTML form with a "from" field and a "to" field might be better than a special Wiki page for this reason.  That way, you don't have to worry about whitespace, and you can provide immediate validation.  On the other hand, it makes authentication against Wiki user harder.  (And you could squawk about input errors to their talk page, anyway.)  A CGI-powered HTML interface might also reduce complexity in dealing with race conditions, edit conflicts, unexpected changes, etc.  Certainly some sort of status reporting mechanism is necessary.
 * The ability to queue multiple batches would be nice. That way, if there is an error with one batch, it can move on to the next.  This would also enable multiple users to queue sequential requests.  You're pretty much going to have to have a queue of some kind, because a lot of "umbrella" requests that are basically long lists of changes are punted to the CFD bot.
 * You will have to be careful not to allow cleverly crafted user input to compromise the machine that the bot is running on, circumvent the authentication mechanism, or cause undesirable behavior that could affect Wikipedia operations.
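A rough sketch of how the queue and the "emergency stop" might fit together, assuming each batch is a named list of commands and that any error inside a batch is fatal for that batch only. `BatchQueue` and its methods are hypothetical.

```python
from collections import deque


class BatchQueue:
    """Runs batches in order; a failure abandons one batch, not the queue."""

    def __init__(self):
        self._batches = deque()
        self.stopped = False  # the "emergency stop" flag

    def add(self, name, commands):
        self._batches.append((name, list(commands)))

    def stop(self):
        self.stopped = True

    def run(self, execute):
        """Run queued batches with `execute`; return {batch name: status}."""
        report = {}
        while self._batches and not self.stopped:
            name, commands = self._batches.popleft()
            try:
                for command in commands:
                    if self.stopped:  # checked before every command
                        report[name] = "stopped"
                        break
                    execute(command)
                else:
                    report[name] = "done"
            except Exception as exc:
                # Fatal for this batch only; move on to the next one.
                report[name] = "failed: " + str(exc)
        return report
```

The returned report is the sort of status information that could be posted to the requesting user's talk page.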

With regard to user authentication and security restrictions...

Current policy requires that all categories being moved or deleted be listed on WP:CFD. However, because of the complexities of how nominations are made, especially the large batch moves that require bot assistance, it is not possible to automatically verify that a category has been listed there. It is possible to automatically require that a page have been previously tagged or  or whatever, and to print a fatal error message explaining that this is a prerequisite.

The upshot of this is that at some point a human is going to have to look at WP:CFD and decide what input the bot should get to implement any given decision. On the one hand, it's good to keep this as open as possible, to prevent a backlog of requests from piling up. On the other hand, it's important to keep the bot out of the hands of vandals and people who don't cooperate with the CFD process. For similar reasons, people who want to operate bots have to get approval from this page. So I would agree that unrestricted access would be a bad idea.

My advice would be to start with a relatively simple authentication model, and add complexity only if problems occur. I can think of three different mechanisms:
 * Allow access by authenticated Wikipedia administrators.
 * Allow access by authenticated Wikipedia users on a special list.
 * Allow access by anyone, but run commands on a 24-hour delay, to allow others the chance to veto execution.

Perhaps some combination of these would be optimal. Personally, I'm not an administrator, and I do clear out a great deal of old and complicated WP:CFD requests, so I'm hoping some kind of mediated access for non-admins will be allowed. I do actually like the idea of a 24-hour delay, because I have certainly made spelling mistakes and misinterpreted people's suggestions before, so a little pre-publication peer review might be a good thing. But on the other hand, it's not like there isn't plenty of peer review after the fact, and an "undo" feature would make the difference somewhat smaller.
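One such combination, sketched only as an illustration: administrators and users on the special list run immediately, and everyone else waits out the 24-hour veto window. `earliest_run_time` and its parameters are hypothetical names, and how a veto is recorded is not shown.

```python
from datetime import datetime, timedelta

VETO_WINDOW = timedelta(hours=24)  # the 24-hour delay from the third option


def earliest_run_time(user, submitted_at, admins, approved_users):
    """Return when a request submitted by `user` may be executed."""
    if user in admins or user in approved_users:
        return submitted_at            # trusted: no delay
    return submitted_at + VETO_WINDOW  # anyone else: allow time for a veto
```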

I would be happy to share my existing code base to speed up this project, or to give advice about a re-implementation, including how to avoid complaints and behavior deemed undesirable by the community. In fact, it would be nice to pass this rather mundane janitorial position on completely to another party or a community project, and move on to more interesting things. -- Beland 07:31, 21 Mar 2005 (UTC)

Authentication of requests
How about we use a combo? A request is made off-site through a CGI form and stored somewhere the bot can access, and the stored request is static. The requesting user then confirms it by signing a Wikipedia subpage. The file name of the request should match the subpage name. By checking the page history, the bot can confirm who requested it and the like. Also, it can match the authorization methods that I suggested above. -- AllyUnion (talk) 07:07, 23 Mar 2005 (UTC)
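A minimal sketch of the matching step, assuming the request is stored as a file whose base name equals the last segment of the subpage title, and that the list of users in the subpage's edit history has already been fetched. All names and paths here are hypothetical.

```python
import os


def request_is_confirmed(request_path, subpage_title, history_users, requester):
    """Check an off-site request file against its on-wiki confirmation subpage.

    history_users: user names appearing in the subpage's edit history.
    """
    # The file name (without extension) must match the subpage name.
    file_name = os.path.splitext(os.path.basename(request_path))[0]
    subpage_name = subpage_title.rsplit("/", 1)[-1]
    if file_name != subpage_name:
        return False
    # The requester must actually have edited (signed) the subpage.
    return requester in history_users
```

Reading the user name from the page history rather than from the signature text means a forged signature would not pass.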


 * Hmm...I wonder if embedding an HTML form on a wiki page would make things easier. In any case, I'm merely trying to make suggestions to simplify your job and make using the thing as easy as possible.  Whatever sort of interface you see fit to code is fine with me.  It's certainly better to have a simple thing that works than a complex thing that's halfway finished. -- Beland 02:42, 24 Mar 2005 (UTC)