Wikipedia:Bots/Requests for approval/MusikBot 9


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

MusikBot 9
Operator: MusikAnimal

Time filed: 02:49, Wednesday, December 9, 2015 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Ruby

Source code available: GitHub

Function overview: Monitors Category:Wikipedia pages with incorrect protection templates and repairs protection templates on those pages, or removes the templates, as necessary

Links to relevant discussions (where appropriate): (permalink),

Edit period(s): Continuous

Estimated number of pages affected: Probably between 15 and 30 a day

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): Yes

Function details: Sorry, I realize this makes my third open BRFA. There aren't any immediate ones coming after this, promise :)

A general overview of how to fix protection templates can be found under "Remedies" at Category:Wikipedia pages with incorrect protection templates.

The bot is capable of automatically generating all the correct protection templates from scratch. That is, it checks the current protection info and generates the generic templates based on that. This is pretty cool as it will definitively remove the page from the category, but it may not always be what we want. For instance, in someone's userspace they may want pp-protected but not pp-move, even if the page is move-protected. So the logic is as follows:


 * Removes any protection templates representing a protection that isn't present. So if it's semi'd but not move-protected, pp-move will be removed but the semi left as-is, or repaired if necessary
 * Removes any protection templates on template pages that have documentation (or any of its redirects) or collapsible option (or any of its redirects). Those templates automatically add the padlock icon. If they are not present, the bot will wrap the appropriate protection templates with <noinclude>, or place them inside any existing noinclude.
 * If it is protected, it will try to repair only what's already on the page. So if pp-blp is there but incorrectly used, but the page is also move-protected, the bot will only repair pp-blp and not add pp-move if it didn't already exist
 * If there is a protection template such as pp but its usage is completely wrong or lacking any indication of what it's supposed to be, the bot will assume the user just didn't know what they were doing and will auto-generate all templates relevant to the current protection info
 * Protection templates always get moved to the top of the page, even if they were originally at the bottom. I assume this is fine.
 * The bot is aware of all protection templates, even the old ones that redirect to pp, etc, and if a repair is done the bot will use the target template instead of the redirect. It does this by fetching all the redirects to the templates listed at protection templates and generating a map of all the templates and their target templates (e.g. sprotected is actually pp), along with the type of protection they represent (e.g. pp-blp is for edit, pp-move is for move). This information is cached for a week, as the redirects are unlikely to change that often.
 * You'll notice at Category:Wikipedia pages with incorrect protection templates one of the last-ditch efforts to fix pages is to make a null edit. The bot does not attempt this. I'll have to revisit it as I don't have a good solution, and also don't have a reliable way of reproducing this scenario.
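The redirect map and its one-week cache described in the list above could look roughly like the following. This is only a sketch, not the bot's actual code: the class name, the `fetch_redirects` callable (standing in for an API request) and the name normalization are all assumptions made for illustration.

```ruby
ONE_WEEK = 7 * 24 * 60 * 60 # cache lifetime in seconds

# Hypothetical helper: maps every known template name (core templates and
# their redirects) to its target template and the protection type it represents.
class ProtectionTemplateMap
  # core_templates: hash of target template name => protection type (or nil
  #                 when the type depends on parameters, as with pp)
  # fetch_redirects: callable returning the redirect names for a template;
  #                  in a real bot this would query the wiki
  # clock: callable returning the current time (injectable for testing)
  def initialize(core_templates, fetch_redirects, clock: -> { Time.now })
    @core_templates = core_templates
    @fetch_redirects = fetch_redirects
    @clock = clock
    @map = nil
    @built_at = nil
  end

  # Returns { target:, type: } for a known template name, or nil otherwise.
  def lookup(name)
    rebuild if stale?
    @map[normalize(name)]
  end

  private

  def stale?
    @map.nil? || @clock.call - @built_at > ONE_WEEK
  end

  def rebuild
    @map = {}
    @core_templates.each do |target, type|
      entry = { target: target, type: type }
      @map[normalize(target)] = entry
      @fetch_redirects.call(target).each { |r| @map[normalize(r)] = entry }
    end
    @built_at = @clock.call
  end

  # Simplified normalization (an assumption): ignore case and underscores.
  def normalize(name)
    name.strip.tr('_', ' ').downcase
  end
end
```

With a map like this, a lookup of sprotected would resolve to pp, so any repair can write the target template rather than the redirect.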

Some examples of what the bot would do (assume the repaired expiries and protection levels are correct):
 * &rarr;
 * &rarr;
 * &rarr;
 * &rarr;
 * &rarr;
 * &rarr;
 * on template page: &rarr;
 * when edit/move protected: &rarr;

So on and so forth. Given this task covers all ground for correcting protection templates, it incidentally also partially does the job of 's task #1 (removing pp-pc1 if it's no longer protected). Pinging for input. I don't think this is a problem; MusikBot and Cyberbot both handle conflicts, so it should be fine. I see that made this same argument at Cyberbot's BRFA.

The last thing I wanted to bring up was correcting protection templates on fully protected pages. That requires the admin bit, as you know. My last adminbot experience was not so great, but I feel like here the situation is different, as we're only making minor edits. I do not have a strong desire to cover the fully protected pages; just know that I'm confident it can do it reliably. I'm also open to creating a new bot account dedicated to this task, if that means anything. &mdash; MusikAnimal  talk  02:49, 9 December 2015 (UTC)

Discussion
Definitely going to suggest a unique bot account this time if we decide to go down the adminbot route, since it's continuous. I am unclear about the exact role the bot would fill; there's a redundancy aspect (with DumbBOT) for sure, but how much work is not already covered? More specific comments incoming. —  Earwig   talk 03:38, 9 December 2015 (UTC)
 * Pinging and . They seem to stay busy correcting protection templates: Wrong expiry, date format, wrapping templates in noinclude on templates pages, etc, but most commonly just removing them from unprotected pages, it seems. The latter is all DumbBOT does as I understand. I do not know why it misses so many pages. &mdash;  MusikAnimal  talk  03:54, 9 December 2015 (UTC)


 * The template code is likely to be problematic; @content.gsub!(/\\s*\<\/noinclude\>/, "") if @is_template will mess up templates that use as a syntax guard (constructs like  and  ), and inserting the protection template into existing s unconditionally could do strange things, as they might occur in random places in the middle of the template, or even conditional blocks (something like   – weird but valid). As parsing wikicode is basically impossible (trust me, I know), I suggest always putting it at the top and then merging with an adjacent if there happens to be one there. —  Earwig   talk  03:56, 9 December 2015 (UTC)
 * Nice catch! Indeed that could potentially cause some serious issues =P I could do some crazy negative lookbehinds but I think, especially with an admin bot and template/fully protected templates, simply putting the templates wrapped in noinclude at the top as you say is the safest. Kind of surprised I didn't think of this or run into it during testing! &mdash; MusikAnimal  talk  04:37, 9 December 2015 (UTC)
 * As I mentioned here I wouldn't be of much help with the coding - though I am learning new things as I read this thread so thanks for that. I have been wikignoming away with this category for five or six months now and what I can tell you is that DumbBOT is only reliable in removing the "Pending changes" templates when that protection has expired. It might get some of the other protection templates but I wouldn't see that normally. I can tell you that I have seen protections that expired two and even three months ago show up in the category. I hasten to add that does not happen very often but it does happen. As ever I am happy to continue to help in removing these when I can. Cheers to you both. MarnetteD|Talk 04:21, 9 December 2015 (UTC)
 * I hadn't read all of this page when I made my first post so I need to add that I have seen the Cyberbot II remove PC protections as well. MarnetteD&#124;Talk 04:27, 9 December 2015 (UTC)
 * Thanks for the info. I also wanted to mention I admittedly made a complete and wild guess as to how many pages show up in that category every day (5 to 10). What would your estimate be? If it is less than that, maybe the "continuous" editing period could be reduced to X number of times a day, but I feel like checking the category is relatively inexpensive, and obviously the bot wouldn't try to do anything if there's nothing in the category &mdash; MusikAnimal  talk  04:37, 9 December 2015 (UTC)
 * I would say it ranges between 15 and 30 a day though it is closer to the 15. I check it in the morning my time (-7 UTC) and there will be upwards of a dozen. Then a few more will pop up through the day. Then at midnight (no not that @midnight) UTC another 5 to 10 hit the list, which is to be expected. Every so often there will be 100 or more pages but that is invariably a protection template on a highly transcluded page that needs the "nowiki" brackets and Redrose has taught me how to track those down. Once in a couple of blue moons a really tricky one shows up but R's expertise leads to its getting taken care of. I'm about to head offline but if I think of anything else I will let you know. MarnetteD|Talk 05:57, 9 December 2015 (UTC)
 * One thing to add though I don't think it will change what you are doing. I see a lot of user/draft/sandbox and recently created article pages show up in the cat in the course of a week. Most of the time an actual protection has not been placed on the page and I don't know if they think that placing the template there means that others can't edit the page or not. There has been an uptick in this since October and a lot of them are people from the Philippines and other parts of Asia trying to create Facebook style pages about themselves. MarnetteD&#124;Talk 06:07, 9 December 2015 (UTC)
 * This is very helpful. I think it's safe to say the bot can't handle every scenario, but I'm realizing I still have some work to do for handling templates &mdash; MusikAnimal  talk  06:41, 9 December 2015 (UTC)
 * On previous bots like DumbBot, these have been discussed before, such as at User talk:Redrose64/unclassified 13; also User talk:Redrose64 (including subthreads) - basically, although these bots remove a lot of expired prot templates, they always seem to miss some. I stopped monitoring for about nine months at one stage - when I returned to it, I found several pages where the prot had expired eight or so months earlier. I am therefore certain that existing bots do not do a complete job. -- Red rose64 (talk) 21:30, 9 December 2015 (UTC)

Reworking
I've reworked the bot quite a bit, trimming it down more to the basics, with some additional configurable features and more informative edit summaries. It will only remove templates when a protection has expired, wrap them in <noinclude> within the template space, or auto-generate protection tags if there is invalid use of pp. I also would like to retire the idea of an admin bot for the time being. I'd rather see it perform well on semi'd/unprotected templates for a good while before allowing it to edit highly visible pages. Finally, the bot is highly configurable. See User:MusikBot/FixPP for the full documentation.

So, a rundown of the logic again; relevant config options are in parentheses:
 * 1) Loops through all the pages in Category:Wikipedia pages with incorrect protection templates.
 * 2) The bot does not attempt to parse any page that uses PROTECTIONLEVEL, suggesting the usage of protection templates is automated via parser functions.
 * 3) First, check if there's no protection at all on the page. If so, remove all protection templates. Plain and simple. (remove_all_if_expired)
 * 4) Remove any protection templates representing a protection that is not currently on the page, leaving other protection templates as-is. (remove_individual_if_expired)
 * 5) Normalize any instances of pp; for instance convert  to the more appropriate . This parsing of pp and its params is necessary in order for the bot to determine whether or not it should be removed. If it turns out it matches the current protection state and should not be removed, we might as well provide an option to save the normalized template, as it may be responsible for why the page is in the maintenance category. (normalize_pp_template)
 * 6) If while parsing pp we aren't able to determine what it represents, e.g., the bot can assume the user didn't know how to add the template and auto-generate all the appropriate templates given the current protection state. (auto_generate)
 * 7) Wrap protection templates with <noinclude> on pages in the template namespace. (noinclude_in_template_space)
 * 8) Remove protection templates on pages in the template namespace if the page contains collapsible option or documentation, or any of their redirects. (remove_from_template_space_if_doc_present)
 * 9) If the bot wasn't able to do any of the above, it gives up and will let the hard-working humans take over by caching the "touched" timestamp of the page, and not processing the page again unless it has been changed.
 * 10) The core protection templates are defined in the config. The bot fetches all redirects to those templates in order to know how to map them to the target template and know what they represent. This information is cached for a week.
 * 11) There is also configuration for the valid values of small and the reasons specified at Template:Pp
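Steps 2 to 4 of the list above are the removal logic, and could be sketched as follows. This is an illustration under stated assumptions rather than the bot's real implementation: the `page` hash layout and the function name are hypothetical, and only the two config option names (`remove_all_if_expired`, `remove_individual_if_expired`) come from the list itself.

```ruby
DEFAULT_CONFIG = {
  remove_all_if_expired: true,
  remove_individual_if_expired: true
}.freeze

# page is a hypothetical hash:
#   text:        the wikitext of the page
#   protections: protection types currently active, e.g. [:edit, :move]
#   templates:   protection templates found on the page, each
#                { name: 'pp-move', type: :move } (type nil if undetermined)
# Returns the names of the templates that should be removed.
def templates_to_remove(page, config = DEFAULT_CONFIG)
  # Step 2: pages using PROTECTIONLEVEL are assumed to be automated; skip them.
  return [] if page[:text].include?('PROTECTIONLEVEL')

  active = page[:protections]
  found = page[:templates]

  # Step 3: no protection at all, so every protection template can go.
  if active.empty?
    return config[:remove_all_if_expired] ? found.map { |t| t[:name] } : []
  end

  # Step 4: remove only templates whose protection type is not active,
  # leaving matching (or undetermined) templates as-is.
  return [] unless config[:remove_individual_if_expired]

  found.reject { |t| t[:type].nil? || active.include?(t[:type]) }
       .map { |t| t[:name] }
end
```

For a page that is semi-protected but not move-protected, this returns only pp-move, which matches the behaviour described earlier in the request.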

Hopefully this new approach addresses some of the aforementioned concerns. I have been testing rigorously in production, manually making the bot-suggested edits using my alternate account, and am fairly confident the bot is stable. During the initial trial any runs will be manually invoked and fully monitored &mdash; MusikAnimal  talk  08:23, 12 December 2015 (UTC)
 * Something we've not mentioned - redirects. If a redir is protected, and has any form of prot template, remove it; if there is no, add one. -- Red rose64 (talk) 00:12, 13 December 2015 (UTC)
 * I tried this on testwiki and a redirect with pp did not put it in the maintenance category. Either way I think handling this (adding ) might fall outside the scope of this bot task. Let's revisit this idea at a later time. I also am inclined to keep the normalize_pp_template (#5 above) disabled. The issue is we can still end up with a modification to a page that does not necessarily remove it from the maintenance category, when that was what we set out to do. The other subtasks are definitively constructive, so maybe I should stick to those for the time being. Thoughts? &mdash; MusikAnimal  talk  04:57, 14 December 2015 (UTC)
 * Incidentally, are we sure there's not something wrong with this series of modules/templates when it comes to when they decide to add that category? Several of the members of those categories seem fine. and are inexplicably in the category despite all template parameters appearing normal (particularly when the expiry is near but hasn't actually passed). -- slakr  \ talk / 05:05, 19 December 2015 (UTC)
 * In these cases, is the datestamp complete? That is, does it contain a valid, correct time as well as the date? If it does, is that time expressed in UTC (although not necessarily stating "(UTC)")? This last case might occur if the person setting up the pp template didn't realise that it needs a UTC time, see Village pump (technical)/Archive 142. -- Red rose64 (talk) 08:19, 19 December 2015 (UTC)

All the tests I have done add the date in the format hh:mm, day month year and go by what the API gives us, which is in UTC. I've found pages show up in the maintenance category a few hours or so before they actually expire, supporting the theory of some timezone differentiation. However, I'm led to believe the templates themselves are possibly not coded correctly, as again putting in a valid UTC time doesn't do the trick. For this reason I'm with in that maybe adding the time shouldn't be bot-automated, as it is indeed contingent on how the template is coded, which may be subject to change. Moreover, I just don't like the potential of redundant bot edits. E.g. I'm finding similar issues with template pages, see testwiki:Template:Test. The protection templates are wrapped in <noinclude>, and are valid, yet the page is still in the maintenance category? By contrast, removing templates for which a corresponding protection doesn't exist is fool-proof, and I'd like to move forward with just that for now. So essentially, this task is just a more comprehensive DumbBOT, as that bot seems to miss many pages. How does this sound, at least for a start? The code is there for the other tasks, so we can consider enabling them later once we see that those remedies definitively work &mdash; MusikAnimal  talk  18:19, 20 December 2015 (UTC)
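For reference, the "hh:mm, day month year" format mentioned above can be produced from a UTC timestamp along these lines. This is a minimal sketch, assuming the API hands back expiries as ISO 8601 strings such as "2016-01-20T14:30:00Z" or the literal "infinity"; the function name is hypothetical.

```ruby
require 'time' # for Time.iso8601

# Converts an API expiry string into the "hh:mm, day month year" form.
# Returns nil for indefinite protection (assumed to be reported as "infinity").
def format_expiry(api_expiry)
  return nil if api_expiry == 'infinity'

  Time.iso8601(api_expiry).utc.strftime('%H:%M, %-d %B %Y')
end
```

The time is kept in UTC throughout, so no local timezone offset can leak into the template parameter.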

Where do we stand on this? As stated above, I'd like to move forward with the simplest of the tasks, which is removing protection templates for which a corresponding protection type does not exist. So if you go by the big numbered list above, the only actionables are #3 and #4. There is still a continual flow of pages meeting these criteria that current bots seem to overlook, so I think this by itself is still worthwhile. I'd like to revisit the other features at a later time, once we iron out exactly how they should work. A subsequent BRFA can be filed for those, if we feel that is necessary &mdash; MusikAnimal  talk  04:09, 30 December 2015 (UTC)


 * Alternatively I could just restart WP:Bots/Requests for approval/Lowercase sigmabot and we could end this BRFA right now. → Σ σ  ς . (Sigma) 00:15, 3 January 2016 (UTC)


 * that seems reasonable. Might as well see how this goes. That said,, I'm slightly concerned about any clashes between the bots. If the old bot runs, that'd likely be fine (as long as nothing's substantially changed when it comes to what it used to be expected to do versus what it'd be expected to do now), but then we also need to make sure that the newer bot doesn't create a situation where one does something to a template, the other does something else, then the other one "fixes" that bot's fix, etc... nor vice versa. -- slakr  \ talk / 03:01, 3 January 2016 (UTC)
 * Also to be clear (though I thought it fairly evident from the adjustment to the scope given in the "reworking" section above), I don't personally feel this task warrants an adminbot. After all, a slightly wrong template on a full-protected page doesn't have to be perfect; MediaWiki makes it very clear should a user try to edit the page that the page is protected, why, and when it expires. :P -- slakr  \ talk / 03:09, 3 January 2016 (UTC)
 * I'm not actually familiar with Σ's protection template bot. Does it modify the templates, or just remove them when the protection has expired? This task came about as for whatever reason many pages go unnoticed by existing bots. At any rate, MusikBot will not reprocess the same page within a 3-hour window. It also won't try anything if it was the last editor to the page. Finally, if it removes (or in the future possibly repairs) a template, it should as intended remove the page from the maintenance category, meaning it wouldn't process it again anyway as that's what the bot goes off of. Edit conflicts are also handled accordingly, so we should be ok with any potential bot wars. I agree about the admin bot or even template-editor bot idea. Consider it withdrawn. I would like to eventually enable some features like repairing of expiries, etc, once we see it actually removes the page from the maintenance category, as sometimes it does not. There are so many factors involved, so I'm going to continue to work on that and once ready perhaps we can do another trial through this same BRFA, or a new one, whatever is advisable &mdash; MusikAnimal  talk  05:56, 3 January 2016 (UTC)
 * Sigma's bot worked quite well, but it always seemed to miss something - some non-protected pages would be left with inapplicable protection templates, and it would be left to somebody passing by (often myself) to clean them up. Hence threads like User talk:Σ/Archive/2014/August; User talk:Σ/Archive/2013/September. It would also occasionally add unnecessary prot templates to pages that already had them. -- Red rose64 (talk) 15:49, 3 January 2016 (UTC)
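The two safeguards against bot wars mentioned in this thread (no reprocessing within a 3-hour window, and no action when the bot itself was the last editor) amount to a small guard like the one below. It is a sketch only: the function name, the `page` hash and the way the last-processed time is stored are assumptions.

```ruby
REPROCESS_WINDOW = 3 * 60 * 60 # three hours, in seconds

# page: hypothetical hash with :last_editor (username of the latest revision)
# last_processed_at: Time the bot last handled this page, or nil if never
def should_process?(page, last_processed_at, now: Time.now, bot_name: 'MusikBot')
  # Never edit on top of the bot's own edit.
  return false if page[:last_editor] == bot_name
  # Leave the page alone if it was handled less than three hours ago.
  return false if last_processed_at && now - last_processed_at < REPROCESS_WINDOW

  true
end
```

A guard of this kind bounds how often two bots could undo each other on the same page, independently of how edit conflicts are handled.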

Alright, the base remove-template-if-no-protection trial is complete. The bot stayed pretty busy, and the trial would have been completed much earlier if I hadn't disabled it a few times, as I didn't want to leave it running overnight.

A few diffs:

The diffs you care about:
 * Notice the edit summary. The page is move-protected, so "Removing protection templates from unprotected page" would not be correct.
 * Nasty bug where was changed to . Don't worry it's fixed, and the edit summary which should have said "edit-protected". This does however illustrate how the bot knows only to remove the incorrect protection templates, and leave others as-is
 * Removed newlines, making the templates right next to each other
 * Fixed the above issue, but leaves extraneous newlines at the top of the page
 * Another fix, removing extraneous newlines at the top of the page
 * (deleted edit, so admin-only) A sort of happy medium for newline handling, but it did actually introduce a visible newline

I guess the solution is remove one newline if another newline exists before/after the protection template, or if it's at the top of the page.

Examples:
 * \n &rarr;
 * \n\n &rarr; \n
 * \n\n &rarr; \n
 * \n\n\n\n &rarr; \n\n\n
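One possible reading of the newline rule above, sketched in Ruby: when the template sits at the top of the page or directly after a newline, the newline that follows it is removed together with the template. The function name and the exact condition are assumptions made for illustration, not the bot's code.

```ruby
# Removes the first occurrence of `template` from `text`, dropping one
# trailing newline when the template occupied the start of a line.
def remove_template(text, template)
  idx = text.index(template)
  return text unless idx

  before = text[0...idx]
  after = text[(idx + template.length)..-1]

  at_line_start = before.empty? || before.end_with?("\n")
  after = after[1..-1] if at_line_start && after.start_with?("\n")

  before + after
end
```

Under this reading each removal takes away at most one newline, so a template on its own line disappears without leaving a blank line behind, while intentional blank lines around it are otherwise preserved.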

I'll need an extended trial to prove I can make this happen, but I assume you can believe it's easily fixable.

As for 's bot, correct me if I'm wrong, but it adds protection templates to protected pages, and removes them when they are no longer protected. MusikBot however only repairs existing protection templates, which may or may not include removing them as they are no longer applicable. That being said the bots should coexist peacefully.

If we are OK with what we have here with this trial, I'd like to now put focus on the task of correcting expiries. A little background:
 * by itself, with no expiry, does not always put the page in the maintenance category
 * : this defaults to midnight, so the page is only put in the maintenance category for less than 24 hours (the time between midnight and when the expiry actually is). There's no need to update the expiry there, the template will be removed soon anyway.

MusikBot will only update expiries where the protection has been extended, but the expiry in the template has not been updated. This is definitely constructive and won't yield any redundant edits. This is what I'd like to pursue next.
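The condition described above could be expressed roughly as follows. This is a hedged sketch: the function name is hypothetical, both arguments are assumed to be UTC `Time` objects, and treating a midnight template expiry on the same day as "date only, leave it alone" is an assumption drawn from the background points, not confirmed bot behaviour.

```ruby
# template_expiry: expiry parsed from the protection template (UTC Time);
#                  a date given without a time is assumed to parse as midnight
# actual_expiry:   expiry of the protection reported by the wiki (UTC Time)
def expiry_needs_update?(template_expiry, actual_expiry)
  # Only act when the protection outlasts what the template claims.
  return false unless actual_expiry > template_expiry

  date_only = template_expiry.hour.zero? &&
              template_expiry.min.zero? &&
              template_expiry.sec.zero?
  same_day = [template_expiry.year, template_expiry.month, template_expiry.day] ==
             [actual_expiry.year, actual_expiry.month, actual_expiry.day]

  # A date-only expiry on the same day is off by less than 24 hours and the
  # template will be removed soon anyway, so it is not worth an edit.
  !(date_only && same_day)
end
```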

Meanwhile if you will allow the bot to continue removing templates for which a protection doesn't exist, that'd be great. Please advise, and many thanks! &mdash; MusikAnimal  talk  20:46, 4 January 2016 (UTC)


 * it adds protection templates to protected pages, and removes them when they are no longer protected. MusikBot however only repairs existing protection templates, which may or may not include removing them as they are no longer applicable That is right. So if lcsb added a protection template to a protected page, MusikBot would not edit it.
 * That being said the bots should coexist peacefully. That seems to follow.
 * → Σ σ  ς . (Sigma) 22:26, 9 January 2016 (UTC)

Hoping to move forward with this, at least the basic task of removing unneeded protection templates. I know the other stuff is a little complicated and a lot to read into... I can talk on IRC if that helps! Thanks &mdash; MusikAnimal  talk  00:27, 11 January 2016 (UTC)


 * Would you be satisfied if this BRFA closed with the verdict that your bot be free to remove, but not add or tweak, protection templates? → Σ σ  ς . (Sigma) 23:11, 12 January 2016 (UTC)
 * That'd make me feel better about putting all this work into it, yes :) But I have code ready to go to repair expiries, something that is currently tediously being done manually. Many of the other features are more prone to error, I think, and could be explored later in a different BRFA &mdash; MusikAnimal  talk  23:38, 12 January 2016 (UTC)
 * As one of those doing the manual work I would like to make a suggestion if the repairing of expiries is not approved. Since I am not an admin I do not know exactly what the steps are in adding protection templates. What I do know is that - if a page that is currently protected has that protection extended - the template rarely gets updated by the admin performing the extension. This is especially true in the case of PC protections. Now, I can't say that it never happens because if an admin does edit the template with the new expiry time I would not see that :-) So my suggestion is that a step be added to the tools used for applying protections that reminds the admin to update any protection templates that are on the article where the protection is being extended. If this is not possible or practical I am happy to continue working with those pages. We wikignomes can turn even the most tedious of tasks into a heigh ho situation. Thanks for your time. MarnetteD|Talk 23:54, 12 January 2016 (UTC)
 * Thanks :) This is a bot-friendly task, in this case, as MusikBot is already going through the same category. Since it knows all the redirects, it can parse any given protection template, and spit it back out with the correct expiry. That being said, I can also work on an update to Twinkle to correct existing protection templates. I will also make sure it adds in the time, which it currently does not. This is what causes so many pages to show up in the maintenance category just before the protection is set to expire (as the absence of a time is treated as midnight) &mdash;  MusikAnimal  talk  02:07, 13 January 2016 (UTC)

I am still unsure about a few things... —  Earwig   talk 10:14, 16 January 2016 (UTC)
 * Why do we have variations of pp when pp itself seems to support (or could support) the functionality of all the other templates? (Looking through their code now, it seems they all invoke the same module; should they be merged?) You mentioned above convert[ing] to the more appropriate – what makes this more appropriate?
 * noinclude_in_template_space: This is a minor thing, but sometimes pages in other namespaces are transcluded. It might be useful to wrap pp in <noinclude> whenever a page has transclusions, but shouldn't pp be able to figure this out on its own and just not display on transcluded pages?
 * Does expiry matter when yes?
 * MediaWiki has a magic word – why can't we just use that instead of expiry?
 * A broader thing to think about. Because of the tight coupling between protecting a page and communicating that protection, maybe displaying indicators or full mboxes on protected pages should be in MediaWiki proper?
 * is a core template, around which the others like are wrapped. It's not for direct use, in the same way that  or  are not for direct use.
 * A time should always be provided, since the absence of a time is interpreted as midnight by all the date parsers that we use, and protections (with the exception of the move prot that is always applied to TFA) are rarely set to expire on the stroke of midnight.
 * A page cannot know whether it is being transcluded or not, so cannot know whether it is being used on a transcluded page.
 * An expiry date of some sort (whether it be expiry or something else) always matters, since it is used to determine if the prot has expired; if so, the icon or banner is not displayed and the page should be placed in which is what this whole exercise is about.
 * Maybe it should be in MediaWiki proper. But that is a proposal that would need to go through and it would take years to get any action, they have far higher priorities. Even if they did look at it, they would need to be sure that it was a feature that all wikis using MediaWiki (there are hundreds, if not thousands) wanted before they imposed it on everybody. They've been burned before. -- Red rose64 (talk) 10:29, 16 January 2016 (UTC)
 * Per above, but frankly I am perfectly content sidestepping these tasks and saving them for another more easily digestible BRFA. They are complicated and this BRFA could go on for quite a while before we get all of this ironed out and encountered during a trial. I'd like to just move forward with removing protection templates when the page is no longer protected. This is pretty straightforward, it works, and would keep the bot quite busy :) Next after that I want to focus on updating expiries when the page has been re-protected – but again that can be saved for a different BRFA. Sorry for all the confusion! &mdash;  MusikAnimal  talk  19:31, 16 January 2016 (UTC)

Approved (for removing protection templates only) – Time to put this request out of its misery, really. The underlying task of removing protection templates from pages where they don't apply is straightforward, and I'm satisfied enough with the newline solution that we can go ahead as long as the bot's initial edits are supervised. I do think the usage and syntax of protection templates needs a broader look (and I still don't understand why we can't deprecate expiry in favor of ), but that's out of scope here, and we should do that with a clean discussion or proposal. —  Earwig   talk 05:03, 18 January 2016 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.