User:DYKToolsBot/BRFA 2 Draft

Definitions

Nomination: A nomination template, i.e. a subpage of Did you know nominations.

Hook: A string starting with "..." and ending with "?", optionally including a tag such as "ALT1".

Target: An article referenced from a hook using a bolded wikilink. All hooks have one or more targets.

Hookset: A template containing a collection of hooks along with other metadata. A hookset is one of: Did you know (i.e. the current hookset on the main page), the 7 numerically named subpages of Did you know/Queue, or the 7 numerically named preparation areas (Did you know/Preparation area 1, etc.).
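As a rough illustration of the hook and target definitions, the Python sketch below pulls hooks out of hookset wikitext and collects their bolded wikilinks. It is not the bot's code: the one-hook-per-line layout, the regular expressions, and the names `extract_hooks`, `extract_targets`, and `protectable_targets` are assumptions made for illustration, and the patterns deliberately ignore template-based targets.

```python
import re

# Assumption: one hook per line, optionally prefixed by a "*" bullet and a
# tag such as "ALT1:", running from "..." to the last "?" on the line.
HOOK_LINE_RE = re.compile(r"^\*?\s*(?:ALT\d+:\s*)?(\.\.\..*\?)\s*$")

# A target is a wikilink wrapped in bold: '''[[Title]]''' or '''[[Title|label]]'''.
# Bold italics ('''''[[Title]]''''') also match, since they contain the bold quotes.
TARGET_RE = re.compile(r"'''\[\[([^\]|]+)(?:\|[^\]]*)?\]\]'''")


def extract_hooks(wikitext: str) -> list[str]:
    """Return the hook strings found in a hookset's wikitext."""
    hooks = []
    for line in wikitext.splitlines():
        match = HOOK_LINE_RE.match(line.strip())
        if match:
            hooks.append(match.group(1))
    return hooks


def extract_targets(hook: str) -> list[str]:
    """Return the titles of bolded wikilinks in a hook, roughly normalized."""
    titles = []
    for raw in TARGET_RE.findall(hook):
        # Approximate MediaWiki title normalization: underscores become
        # spaces and the first letter is capitalized.
        title = raw.strip().replace("_", " ")
        titles.append(title[:1].upper() + title[1:])
    return titles


def protectable_targets(hookset_texts: list[str]) -> set[str]:
    """Union of targets across several hooksets (e.g. main page + queues)."""
    return {
        target
        for text in hookset_texts
        for hook in extract_hooks(text)
        for target in extract_targets(hook)
    }


sample = """
* ... that '''[[Foo bar]]''' was built in 1900?
* ... that '''''[[HMS Example|Example]]''''' sank twice?
* Not a hook.
* ALT1: ... that [[Unbolded]] and '''[[baz_qux]]''' met?
"""
print(sorted(protectable_targets([sample])))  # ['Baz qux', 'Foo bar', 'HMS Example']
```

Note that the unbolded link in the last hook is not a target; only bolded links count.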

DYKToolsBot is already approved for a different task, but does not have admin rights. This new account (DYKToolsAdminBot) will handle tasks that require admin rights. They share the same code.

There are two distinct tasks proposed here, protect and unprotect. Both tasks run as scheduled Toolforge jobs. Currently both run every 10 minutes, offset from each other by a few minutes. The exact timing is not critical.

The protect task does the following:
 * 1) Parses the main page and queue hooksets, extracting all the hooks. From the hooks, it extracts the targets which need protecting ("protectable targets"). These titles are indicated by wikilinks set in bold. There is typically one target per hook, but there can be more than one.
 * 2) Applies indefinite move=sysop protection to each protectable target.
 * 3) Includes, in each page protection log message, a link to a page in the bot's userspace explaining the process.

The unprotect task does the following:
 * 1) Queries the bot's user log with type=protect for the previous N days, where N is long enough to cover any hook which has progressed through the normal promotion process, plus extra time to account for intra-queue hook swapping. It's currently set to 9, but might need to be increased. The exact value is not critical. The targets found in this log are the "unprotectable targets".
 * 2) Acquires the current list of protectable targets, as in the protect task.
 * 3) Unprotects any target in the unprotectable set which is not also in the protectable set.
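The set logic of the unprotect task can be sketched as below. This assumes the bot's protection log has already been fetched and reduced to (timestamp, title) pairs; fetching it from the API is out of scope, and `unprotect_candidates` is a hypothetical name rather than the bot's actual function.

```python
from datetime import datetime, timedelta, timezone


def unprotect_candidates(protect_log, protectable, now, window_days=9):
    """Targets the bot protected within the window that are no longer protectable.

    protect_log: iterable of (timestamp, title) pairs from the bot's own
    protection log (hypothetical shape; fetching is out of scope here).
    """
    cutoff = now - timedelta(days=window_days)
    # "Unprotectable targets": anything the bot protected inside the window.
    unprotectable = {title for ts, title in protect_log if ts >= cutoff}
    # Only targets no longer found in any hookset get unprotected.
    return unprotectable - set(protectable)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
log = [
    (datetime(2024, 1, 9, tzinfo=timezone.utc), "Still queued"),
    (datetime(2024, 1, 5, tzinfo=timezone.utc), "Already ran"),
    (datetime(2023, 12, 1, tzinfo=timezone.utc), "Too old"),
]
print(unprotect_candidates(log, {"Still queued", "New hook"}, now))  # {'Already ran'}
```

The illustrative "Too old" entry falls outside the window and is skipped, which is the failure mode described under Known problems.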

I considered computing an expiration date and only protecting until then. The problem is that the expiration date is a moving target. Hooks often get shuffled around when problems are discovered. Sometimes hooks get unpromoted entirely after reaching a queue (or even while they're on the main page). Sometimes the queue processing schedule is disrupted by failure of the bot which manages that process (this has happened a couple of times in the past few weeks). A few times a year, queue processing switches between one and two hooksets per day. Keeping track of all these possibilities and updating the expiration time would add significant complexity for no benefit. It's far simpler to use a declarative approach, in the style of Puppet: periodically figure out what state each target should be in right now and make it so, regardless of history.
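A minimal sketch of the declarative idea, assuming the bot can know which titles it currently holds protected (in practice approximated from its protection log). The function `reconcile` is hypothetical. The protect task described earlier applies protection to every protectable target on each run, so this diff-based form is a generalization rather than a description of the bot.

```python
def reconcile(desired: set[str], bot_protected: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_protect, to_unprotect) from the desired and current states.

    Only the two states matter; how a hook got where it is does not.
    """
    return desired - bot_protected, bot_protected - desired


state = {"A", "B"}
# Hooks shuffled between queues: same targets, so nothing to do.
print(reconcile({"B", "A"}, state))  # (set(), set())
# B's hook is pulled; C's hook is promoted into a queue.
to_protect, to_unprotect = reconcile({"A", "C"}, state)
print(to_protect, to_unprotect)  # {'C'} {'B'}
# Once the actions are applied, running again is a no-op.
state = (state | to_protect) - to_unprotect
print(reconcile({"A", "C"}, state))  # (set(), set())
```

Because each run depends only on the current hooksets, a missed or delayed run is corrected by the next one.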

Known problems

On rare occasions, hook targets are written as templates, such as one of the (many) Ship variants. The current code does not recognize these properly. This happens infrequently enough, correct handling is difficult enough (it requires a call out to Parsoid), and the consequence is mild enough (a page doesn't get the move protection it should) that I'm not going to make it a blocker for an initial deployment.
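One low-cost mitigation, not part of the current code, would be to detect bolded template calls so such hooks could at least be reported for manual protection. The sketch below is a heuristic under that assumption; resolving the template to an actual article title still needs rendering (for example through Parsoid), and `unresolved_template_targets` is a hypothetical name.

```python
import re

# Heuristic: a bolded template call, e.g. '''{{ship|HMS|Example}}'''.
# Captures only the template name; it does not resolve the article title.
BOLD_TEMPLATE_RE = re.compile(r"'''\{\{([^{}|]+)")


def unresolved_template_targets(hook: str) -> list[str]:
    """Return names of bolded templates in a hook, to flag for manual review."""
    return [name.strip() for name in BOLD_TEMPLATE_RE.findall(hook)]


print(unresolved_template_targets("... that '''{{ship|HMS|Example}}''' sank twice?"))  # ['ship']
```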

If a target was already move-protected before entering the DYK pipeline, it will have that protection removed when it leaves DYK. The probability of this happening is low enough that I'm going to ignore it. The alternative would be to maintain a database of pre-existing protections so they could be restored properly, which seems like more trouble than it's worth.

If too little protection log history is examined, the bot may fail to unprotect a target which spent an abnormally long time in the DYK queues. If that happens, the target can be unprotected manually and the history window increased.
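A back-of-the-envelope way to size the history window, assuming a hook is first protected while in the queue furthest from the main page. The parameters and `min_window_days` are illustrative, not taken from the bot. With one set per day and one day of slack this gives 9, consistent with the current setting, though that is not necessarily how the value was chosen.

```python
import math


def min_window_days(queue_count=7, sets_per_day=1, slack_days=1):
    """Rough upper bound on days from first protection to leaving the main page."""
    # A hook in the queue furthest from the main page waits for the set now
    # on the main page and the queue_count - 1 queued sets ahead of it, then
    # spends one period on the main page itself: queue_count + 1 periods.
    set_periods = queue_count + 1
    # slack_days is a hypothetical allowance for reshuffling and delays.
    return math.ceil(set_periods / sets_per_day) + slack_days


print(min_window_days())                # 9
print(min_window_days(sets_per_day=2))  # 5
```

Since running 2 sets per day shortens the bound, sizing the window for 1 set per day covers both schedules.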