User talk:Monkbot/task 16: remove replace deprecated dead-url params

Undecided
My bot WP:WAYBACKMEDIC and Cyberpower678's User:InternetArchiveBot are together responsible for about 80% of all archives added to Enwiki and are also continually checking and fixing existing archives. Both bots add dead-url when it is missing (soon to be url-status). Hopefully, we can be on the same page with Monkbot to avoid conflict. I think having it there is a good idea, even if it defaults to 'live'. There are a couple of reasons, but mainly it is to get users into the habit of associating all three parameters with archives, because many forget or neglect to add the third even when the link is dead. It also makes it easy to switch to dead status without having to remember and/or add the parameter name. Eventually all links die, so the argument will be needed. -- Green  C  00:52, 2 June 2019 (UTC)
 * I don't see why you marked anything "undecided" when the source code is ready and you've requested a bot run. For this run I suppose you plan to keep it as is. However, I don't see the purpose of this run changing or removing any parameters except dead-url and its alias. And contrary to that (step #3), you seem to go to great lengths to not remove an empty url-status.
 * I haven't seen the discussion on why to obsolete dead-url at all, but I do think url-status should be left out in most cases (in the meaning not decided). Setting it "live" needs to be questioned anyways, setting it dead when not dead seems unconstructive. That is, when no archive-url is given the assumption is url-status=live, when archive-url is given the url-status=dead is assumed. Having the possibility to override is useful, but should be rare.
 * With my suggestion, the argument will not be needed if archive-url is only set for dead links. This minimizes manual work for people and reduces the need to remember the param name. I could also have suggested url-status=archived as a better name for the override, using url-status=dead only for links that are dead but not archived (or keeping it for archived too). That should give more information, so that a bot can monitor an override separately from already found dead links. JAGulin (talk) 18:05, 3 September 2019 (UTC)
 * I'm not clear what you mean by 'undecided'. That word does not appear anywhere in User:Monkbot/task 16: remove replace deprecated dead-url params.  Similarly 'this run' is also not found on that page so perhaps you are quoting me from someplace else?  I intentionally leave url-status in response to Editor GreenC's comment at the top of this thread.
 * In Module:Citation/CS1/Configuration, the meta parameter  is set to  .  Before today's change, it was   set to  .  This is the only parameter that is defaulted in this way and it means that when editors do not set url-status to anything (omitted or empty) then the module will assume that the presence of archive-url with a value means that the value in url is dead.  url-status is ignored by the module when archive-url is empty or omitted.  Preemptive archive links are permitted and may, in fact, be encouraged.
 * —Trappist the monk (talk) 18:36, 3 September 2019 (UTC)
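The defaulting rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in this thread, not the Lua in Module:Citation/CS1; the function name and the `None` return for the "ignored" case are assumptions made for the example.

```python
from typing import Optional


def effective_url_status(archive_url: Optional[str],
                         url_status: Optional[str]) -> Optional[str]:
    """Return the status a citation is treated as having.

    Hypothetical helper: returns None when url-status is ignored.
    """
    # Without an archive-url value, url-status plays no role at all.
    if not (archive_url and archive_url.strip()):
        return None
    status = (url_status or "").strip().lower()
    # Omitted or empty url-status: archive-url implies the original url is dead.
    if not status:
        return "dead"
    # An explicit value (for example "live" on a preemptive archive) overrides.
    return status


print(effective_url_status("https://web.archive.org/web/2019/x", None))
print(effective_url_status("https://web.archive.org/web/2019/x", "live"))
print(effective_url_status("", "dead"))
```

Under these assumptions the three calls give dead, live, and None respectively: the parameter only matters when an archive exists, and only has to be written out when the link is still live.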


 * If an archive URL exists, best practice is to have all three arguments (archive-url, archive-date and url-status). archive-url can exist regardless of whether the url is live or dead, so nothing can be assumed there ("the argument will not be needed if archive url is only set for dead links"). Likewise, URLs die organically, and having url-status in the cite makes it easier for anyone to change url-status to dead vs. not remembering the argument name or even knowing it exists. We can assume all links die in time, and will need it (if an archive-url is set). -- Green  C  19:01, 3 September 2019 (UTC)


 * Is that "best practice" your opinion or is it documented somewhere? I think we should not add useless manual work. You quoted my if-statement, but talk about the else-path I left out. In that case, the parameter is needed, but it doesn't mean that it should be introduced in the case when it is not needed. That archive-url can exist regardless is true, but anyone manually adding it would either do it because the link is dead or because they like to do extra work. In the latter case, they should then add either "live" or "archived" to signal which kind of template override they want. Ttm just agreed that no status parameter will make the template assume status=dead and I agree that's a case where the parameter is required.
 * When the link dies someone should set the archive-url, but leave status unset. If there's already a premature archive added, the parameter should also be available, so my suggestion also seems to give what you ask for.
 * The wording was Still to be decided:  is the default. I assumed that it was the part this thread was all about. On the other hand I see nothing here asking you to keep empty parameters. What is the wording?
 * This run was obviously a reference to the Bots/Requests for approval/Monkbot 16 or anything running the User:Monkbot/task 16: remove replace deprecated dead-url params. You mention "ancillary tasks", but I think changes like this should be focused on the main task. The deleted_count doesn't count that work, so it isn't mentioned in the change logs.
 * Since this is not so much about your code as it is about the parameters to the template, could you please point me to the discussion where the url-status was defined? JAGulin (talk) 19:39, 3 September 2019 (UTC)
 * Still to be decided ... is a leftover from before the time that I agreed to retain url-status and dead (not really needed because this is the default case when url-status is omitted). That paragraph can go away.
 * So this run is not a quote from me but your own words? The ancillary task, there is only one, was an extension of the deletion of empty url-status (before the decision to retain empty url-status was taken).  Because I was deleting empty url-status, deleting other empty parameters in that same template was (is) a simple task – the ancillary task does not delete empty parameters from templates that were not modified to fix dead-url.  If needs must, it can go away though keeping it is harmless.
 * —Trappist the monk (talk) 20:01, 3 September 2019 (UTC)
 * The  shows as empty parameter. I assumed you meant "proper value", not "always empty", in your initial post. Did I misunderstand? Do you see value in keeping empty url-status? JAGulin (talk) 19:55, 3 September 2019 (UTC)
 * I disagree. The url-status parameter should be set only in the rare cases where the default is not appropriate. Don't fill the edit box with cruft. Don't fill my watchlist with bot edits doing nothing useful. Users should flag a url as dead by applying the archive-url and archive-date parameters only. --Srleffler (talk) 02:29, 12 September 2019 (UTC)
 * That is opposite how it works, the default is the same as url-status=no should that param be missing. Nobody is filling edit boxes, sounds like a misunderstanding what the discussion is about. -- Green  C  03:00, 12 September 2019 (UTC)
 * Clearly you are confused. The default state is dead.
 * —Trappist the monk (talk) 03:04, 12 September 2019 (UTC)
 * Guess so. Late night posting. Anyway, checking the code, I was mistaken: the bots do not add the param when it is missing as a sole action, but only when doing something else like adding/changing/deleting the archive. Filling it in as a sole action would be cosmetic. But since most citations contain all three arguments (most archives were created/edited by bot), deleting the argument would be a lot of edits. From that perspective. --  Green  C  04:52, 12 September 2019 (UTC)

Test cases
I don't see any unit testing, and that should be the first step for regex like this. Even for currently incorrect template usage, the bot outcome should be predictable and known to the task reviewers. Please add tests where the following is part of the template parameters:
 * deadurls=no
 * dead-url=nose
 * dead-url=truely
 * deadurl==yes
 * deadurl=http://true
 * deadurl=|archive-url=yes/no
 * deadurl=y|dead-url=false|archive-url=yes/no

Cheers, JAGulin (talk) 19:39, 3 September 2019 (UTC)
 * deadurl=|dead-url=no|no-archive-url=
 * url=example.com/deadurl/404.html|dead-url=yes
 * url=//example.com/?deadurl=no|dead-url=yes
 * url=//example.com/page{!}name|dead-url=yes
 * dead-url=yes|url=//example.com/page{!}name|other={\{1}}|leftover=true
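A minimal sketch of what such a test table could look like, for illustration only. The pattern, the value mapping and the `rewrite` function are hypothetical and are not Monkbot's actual code; the inputs are taken from the list above with a leading pipe added, and `x` stands in for the "yes/no" placeholder.

```python
import re

# Hypothetical pattern (not Monkbot's): a pipe, the parameter name, "=",
# then a value that runs up to the next pipe, closing braces, or end of text.
DEAD_URL = re.compile(
    r"\|\s*(?:dead-url|deadurl)\s*=\s*([^|{}]*?)\s*(?=\||\}\}|$)"
)

# Assumed value mapping for this sketch.
STATUS = {"yes": "dead", "y": "dead", "true": "dead",
          "no": "live", "n": "live", "false": "live"}


def rewrite(params: str) -> str:
    """Rename recognised dead-url values to url-status; leave the rest as is."""
    def repl(match):
        value = match.group(1).lower()
        if value in STATUS:
            return "|url-status=" + STATUS[value]
        return match.group(0)  # unknown or empty value: do not touch
    return DEAD_URL.sub(repl, params)


CASES = [
    "|deadurls=no",
    "|dead-url=nose",
    "|dead-url=truely",
    "|deadurl==yes",
    "|deadurl=|archive-url=x",
    "|deadurl=y|dead-url=false|archive-url=x",
    "|url=example.com/deadurl/404.html|dead-url=yes",
    "|url=//example.com/?deadurl=no|dead-url=yes",
]

for case in CASES:
    print(f"{case!r} -> {rewrite(case)!r}")
```

With this particular pattern the malformed values are left alone and the text inside the url values is not touched, but the case with both deadurl and dead-url ends up with two url-status parameters. That is the kind of outcome a test table makes visible before a run.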

What if it is within " finds about 850 cases. Modify to   (extra "*" at the end) it is about 46,000 cases. It matches   with no space between the = and | .. the reason is   says "there must be something other than one of these characters" and since there is nothing there but one of those characters it doesn't match. The "*" makes it optional. Or it could be   is over 54,000 articles. --  Green  C  03:20, 13 October 2019 (UTC)
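The point about the trailing "*" can be shown with a small sketch. The two patterns below are stand-ins written for this example (the character set is an assumption), not the search strings quoted above.

```python
import re

# Hypothetical patterns: the negated class alone demands one value character;
# adding "*" makes that character optional.
ONE_CHAR = re.compile(r"dead-url=[^|}]")
OPTIONAL = re.compile(r"dead-url=[^|}]*")

samples = [
    "|dead-url=yes|archive-url=x",  # has a value
    "|dead-url=|archive-url=x",     # empty, no space between = and |
]

for text in samples:
    print(text,
          "one-char:", bool(ONE_CHAR.search(text)),
          "optional:", bool(OPTIONAL.search(text)))
```

The first sample is found by both patterns; the empty one is found only by the pattern with the "*", which is why the counts differ so much.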