Wikipedia:Bots/Requests for approval/DYKToolsAdminBot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard. The result of the discussion was

DYKToolsAdminBot
Operator:

Time filed: 15:34, Wednesday, March 1, 2023 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python

Source code available: https://github.com/roysmith/dyk-tools/tree/main/dyk_tools/bot

Function overview: Applies move protection to DYK articles which are on the main page or in the queue to be placed on the main page soon.

Links to relevant discussions (where appropriate): Wikipedia talk:Did you know/Archive 188

Edit period(s): Continuous

Estimated number of pages affected: 10 per day

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Adminbot (Yes/No): Yes

Function details:

First, some definitions:

Nomination: A nomination template, i.e. a subpage of Did you know nominations.

Hook: A string starting with "..." and ending with "?". Optionally includes a tag such as "ALT1".

Target: An article referenced from a hook using a bolded wikilink. All hooks have one or more targets.

Hookset: A template containing a collection of hooks along with other metadata. One of Did you know (i.e. the current hookset), the 7 numerically named subpages of Did you know/Queue, or the 7 numerically named Did you know/Preparation area 1, etc.
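
As an illustration of the Hook and Target definitions, here is a minimal sketch of pulling targets out of a hook's wikitext. It assumes bolding is written either as `'''[[Title]]'''` or as `[[Title|'''label''']]`. The name `extract_targets` and both regular expressions are hypothetical and are not taken from the dyk-tools source; links produced by templates are not handled.

```python
import re

# Assumed form 1: bold markup wrapped around the link, e.g. '''[[Title|label]]'''
BOLD_OUTSIDE = re.compile(r"'''\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]'''")
# Assumed form 2: bold markup inside the link label, e.g. [[Title|'''label''']]
BOLD_INSIDE = re.compile(r"\[\[([^\[\]|]+)\|'''[^\[\]]*'''\]\]")


def extract_targets(hook: str) -> list[str]:
    """Return the titles of bolded wikilinks, in order of appearance."""
    found = []
    for pattern in (BOLD_OUTSIDE, BOLD_INSIDE):
        for match in pattern.finditer(hook):
            found.append((match.start(), match.group(1).strip()))
    targets = []
    for _, title in sorted(found):
        if title not in targets:  # a title bolded twice is still one target
            targets.append(title)
    return targets
```

A hook with two bolded links yields two targets, while links that are not bolded are ignored.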

DYKToolsBot is already approved for a different task, but does not have admin rights. This new account (DYKToolsAdminBot) will handle tasks that require admin rights. They share the same code.

There are two distinct tasks proposed here, protect and unprotect. Both tasks are run as scheduled toolforge jobs. Currently both tasks run every 10 minutes, offset by a few minutes. The exact timing is not critical.

The protect task does:

Parse the main page + queue hooksets, extracting all the hooks. From the hooks, extract the targets which need protecting ("protectable targets"). These titles are indicated by wikilinks set in bold. There is typically one target per hook, but there can be more than one. For each protectable target, indefinite move=sysop protection will be applied. The page protection log messages will include a link to a page in the bot's userspace explaining the process.
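
For concreteness, the protection step could be expressed as a MediaWiki `action=protect` request along these lines. This is only a sketch: `build_protect_params` and the explanation page name are hypothetical, the reason text is invented for illustration, and the CSRF token the API requires is deliberately left out. The actual bot is built on Pywikibot and may construct its requests differently.

```python
# Hypothetical explanation page; the real page name is not given in this request.
EXPLANATION_PAGE = "User:DYKToolsAdminBot/Protection"


def build_protect_params(title: str) -> dict:
    """Return action=protect parameters for one protectable target."""
    return {
        "action": "protect",
        "title": title,
        "protections": "move=sysop",  # restrict moves to administrators
        "expiry": "infinite",         # no expiry; the unprotect task cleans up
        "reason": f"DYK target on or queued for the main page; see [[{EXPLANATION_PAGE}]]",
        "format": "json",
    }
```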

The unprotect task does:

Queries the bot's user log with type=protect for the previous N days, where N is long enough to cover any hooks which have progressed through the normal promotion process, plus extra time to account for intra-queue hook swapping. N is currently set to 9, but might need to be increased; the exact value is not critical. The targets found in this log are the "unprotectable targets". The current list of protectable targets is acquired as in the protect task. Any targets in the unprotectable set which are not also in the protectable set are unprotected.
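
The selection logic of the unprotect task amounts to a set difference over a time window. A minimal sketch, assuming the log has already been fetched as (title, timestamp) pairs; the function names and the input shape are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def recently_protected(log_entries, now, window_days=9):
    """Titles the bot protected within the last window_days days."""
    cutoff = now - timedelta(days=window_days)
    return {title for title, timestamp in log_entries if timestamp >= cutoff}


def targets_to_unprotect(log_entries, protectable, now, window_days=9):
    """Unprotectable targets that are no longer in the protectable set."""
    return recently_protected(log_entries, now, window_days) - set(protectable)
```

With this shape, a target that left the queues more than `window_days` ago is never seen, which is the failure mode described under Known problems.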

I considered computing an expiration date and only protecting until then. The problem is that the expiration date is a moving target. Hooks often get shuffled around when problems are discovered. Sometimes hooks get unpromoted entirely after hitting a queue (or even when they're on the main page). Sometimes the queue processing schedule is disrupted by failure of the bot which manages that process (this has happened a couple of times in the past few weeks). A few times a year, queue processing toggles between 1 per day and 2 per day. Keeping track of all these possibilities and updating the expiration time would add significant complexity for no benefit. It's far simpler to use a declarative approach, in the style of Puppet: periodically figure out what state each target should be in right now and make it so, regardless of history.

This is currently running on testwiki. See https://test.wikipedia.org/wiki/Special:Log/DYKToolsAdminBot. Reviewers should feel free to exercise the bot by editing the DYK queues on testwiki.

Known problems

On rare occasions, hook targets are written as templates such as one of the (many) Ship variants. The current code does not recognize these properly (github bug). This happens infrequently enough, it's difficult enough to do correctly (it requires a call out to Parsoid), and the consequences are mild enough (a page doesn't get the move protection it should), that I'm not going to make it a blocker for an initial deployment.
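
One alternative raised in the discussion below is to scan rendered HTML rather than wikitext, since template-generated links are already expanded there. The following is a sketch only, using the standard library's `html.parser`; the class and function names are hypothetical, fetching the HTML is out of scope, and treating any title containing a colon as a non-article page is a crude assumption.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class BoldLinkParser(HTMLParser):
    """Collect article titles from links that are rendered in bold."""

    def __init__(self):
        super().__init__()
        self.bold_depth = 0       # number of currently open <b> elements
        self.current_href = None  # href of the currently open <a>, if any
        self.targets = []

    def _record(self, href):
        if not href or not href.startswith("/wiki/"):
            return  # not a link to an existing local page
        path = href[len("/wiki/"):].split("#")[0]
        title = unquote(path).replace("_", " ")
        # Crude assumption: a colon means a non-article namespace.
        if title and ":" not in title and title not in self.targets:
            self.targets.append(title)

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold_depth += 1
            self._record(self.current_href)  # bold inside an open link
        elif tag == "a":
            self.current_href = dict(attrs).get("href")
            if self.bold_depth:
                self._record(self.current_href)  # link inside open bold

    def handle_endtag(self, tag):
        if tag == "b" and self.bold_depth:
            self.bold_depth -= 1
        elif tag == "a":
            self.current_href = None


def bold_link_targets(html):
    """Return titles of bolded article links found in an HTML fragment."""
    parser = BoldLinkParser()
    parser.feed(html)
    parser.close()
    return parser.targets
```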

If a target was already move protected before entering the DYK pipeline, it will have that protection removed when it transitions out of DYK. The probability of this happening is so low, I'm going to ignore it. The alternative would be to maintain a database of pre-existing protections so they could be restored properly, which seems like more trouble than it's worth.
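
To give a sense of what tracking pre-existing protections would involve, here is a minimal sketch using any dict-like key-value store (a plain `dict` here, though a Redis hash would have the same shape). The function names are hypothetical, and the expiry of the original protection is ignored for brevity.

```python
def remember_prior_protection(store, title, current_move_level):
    """Record the move protection a page had before the bot's first change.

    An existing record is never overwritten, so a later run that sees the
    bot's own move=sysop setting does not clobber the original level.
    """
    if title not in store:
        store[title] = current_move_level or ""  # "" means unprotected


def level_to_restore(store, title):
    """Pop and return the level to restore, or None for plain unprotection."""
    return store.pop(title, "") or None
```

When a target leaves the DYK pipeline, the unprotect task would restore the returned level instead of unconditionally removing move protection.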

If enough protection log history isn't examined, it's possible to miss unprotecting a target which spent an abnormally long time in the DYK queues. If it happens, the target can be manually unprotected and the history window size increased.

Discussion

 * If a target was already move protected before entering the DYK pipeline, it will have that protection removed when it transitions out of DYK. Why would this be the case as you say it [q]ueries the bot's user log with type=protect for the previous N days? Would it not unprotect only the pages that were protected by it? – SD0001  (talk) 03:05, 2 March 2023 (UTC)
 * You set move=autoconfirmed, then the bot changes that to move=sysop. It'll lose your original protection when it migrates off the main page and the bot unprotects.  But this is enough of a corner case, I'm not going to worry about it. -- RoySmith (talk) 14:12, 2 March 2023 (UTC)
 * Every article going through DYK losing its move protection seems like a problem to be worried about. While I understand new articles are rarely protected, recently promoted GAs could be. Can this be fixed? If a database is too much trouble, you can use redis since the data here is easily represented as key-value pairs. – SD0001  (talk) 17:13, 7 March 2023 (UTC)
 * I need to think a bit on this. I had previously assumed existing move protection was such a rare thing, it wasn't worth worrying about much.  But I just did a quick scan of WP:Recent additions and found:
 * Recent additions/2022/January, protect_count=28
 * Recent additions/2022/February, protect_count=14
 * Recent additions/2022/March, protect_count=6
 * Recent additions/2022/April, protect_count=12
 * Recent additions/2022/May, protect_count=12
 * Recent additions/2022/June, protect_count=9
 * Recent additions/2022/July, protect_count=20
 * Recent additions/2022/August, protect_count=15
 * Recent additions/2022/September, protect_count=17
 * Recent additions/2022/October, protect_count=13
 * Recent additions/2022/November, protect_count=21
 * Recent additions/2022/December, protect_count=9
 * The protect_counts are how many targets had any move protection in their page protection log at all. The ones I spot-checked either had that protection already expired by the time they got to DYK, or applied after DYK was over, but it's still more than I had expected to see.  I'm working on some ideas of how to deal with this. -- RoySmith (talk) 13:57, 8 March 2023 (UTC)
 * On rare occasions, hook targets are written as templates such as one of the (many) Ship variants. You could parse the HTML instead of wikitext. Scanning the HTML for <a> tags leading to article namespace can be easier than parsing wikitext and doesn't require parsoid. – SD0001  (talk) 03:07, 2 March 2023 (UTC)
 * I'm not totally following you. The output of parsoid is HTML (sort of) so calling out to parsoid is indeed parsing HTML.  But it would add complexity which isn't justified for an initial rollout.  The logic for this is contained in Hook.targets, so at least plugging it in later wouldn't be too disruptive. -- RoySmith (talk) 14:40, 2 March 2023 (UTC)
 * Well, it turns out this was easier to do than I thought it would be. Pywikibot's Site.expand_text handles everything.  I suspect it's calling Parsoid under the covers, but haven't gone digging to verify that.  In any case, it works just fine. -- RoySmith (talk) 04:26, 7 March 2023 (UTC)
 * mw:API:Parse is what I meant, it gets you the HTML without going through parsoid. There's also mw:API:Expandtemplates which might be what Site.expand_text uses under the hood. – SD0001  (talk) 15:03, 7 March 2023 (UTC)
 * Well, in any case, it's working. Can this be approved for a trial? -- RoySmith (talk) 15:09, 7 March 2023 (UTC)
 * If enough protection log history isn't examined, it's possible to miss unprotecting a target which spent an abnormally long time in the DYK queues. Why not set a liberal value for N, say 25 - since anyway at the processing step it will skip the pages that don't have protection any longer? I'm assuming the unprotect task only needs to be run at quite a lower frequency than every 10 minutes. – SD0001  (talk) 03:12, 2 March 2023 (UTC)
 * Yeah, there's very little downside to making the history window longer. Setting it to 25 wouldn't be a problem.  You're also correct that the unprotect task could run at a lower frequency, and that's easy to change. -- RoySmith (talk) 14:45, 2 March 2023 (UTC)
 * Has WP:AN been notified of this bot task per WP:ADMINBOT? Primefac (talk) 10:30, 8 March 2023 (UTC)
 * Ooops, I didn't realize that was required. I just dropped a notification on WP:AN. -- RoySmith (talk) 13:48, 8 March 2023 (UTC)
 * The discussion linked looks like a local consensus to me. Policy is against pre-emptive protection and move-protecting DYKs looks like a solution in search of a problem. As far as I can tell, only one example was given of an article being moved while on DYK and that was generally considered a good move. HJ Mitchell | Penny for your thoughts? 14:00, 8 March 2023 (UTC)
 * We already do protect some portions of the main page. For example DYK is protected, as are all images while they're on the main page.  And Bots/Requests for approval/TFA Protector Bot 3 appears to have established that TFA should be protected as well. -- RoySmith (talk) 14:18, 8 March 2023 (UTC)
 * Is page-move vandalism of DYK articles on the main page an issue? Our protection policy dictates that we do not protect pages preemptively. It also seems like a significant net negative for all DYK articles that have been protected in good faith to become unprotected, particularly GAs and BLPs. If the bot could instead restore any pre-existing protection rather than letting it expire, that would generally solve my concerns. Ivanvector (Talk/Edits) 16:38, 8 March 2023 (UTC)
 * There's at least some level of pushback against this. It doesn't help either that the linked discussion was held at the local DYK talk page and closed by an involved editor. I suggest opening a discussion on WP:VPR to ensure this has consensus, as BRFA is not the right place for having such a discussion. – SD0001  (talk) 16:45, 8 March 2023 (UTC)
 * @SD0001 OK, I'll start a discussion on VPR. I'm not sure what the process is here, but let's put this BRFA on hold until the VPR discussion concludes.  In the meantime, I'm going to continue to play around on testwiki to explore some possible solutions to the technical issues which have come up here. -- RoySmith (talk) 18:06, 8 March 2023 (UTC)
 * Posted at WP:VPR -- RoySmith (talk) 18:13, 8 March 2023 (UTC)
 * On hold until that discussion concludes. – SD0001  (talk) 16:38, 9 March 2023 (UTC)
 * I'm involved so I won't be the one to close that discussion (if it needs closure) but consensus there looks pretty heavily against this. HJ Mitchell | Penny for your thoughts? 11:38, 27 March 2023 (UTC)
 * Yes, somebody should close it. -- RoySmith (talk) 14:30, 27 March 2023 (UTC)
 * Question: If the bot removes move protection and another admin applies it before N days have elapsed since the bot's protection, will the bot re-unprotect it? Animal lover |666| 17:08, 8 March 2023 (UTC)
 * Consensus at WP:VPR appears to be against the proposed bot task. Please feel free to reopen if the consensus changes. – SD0001  (talk) 11:34, 1 April 2023 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard.