Wikipedia:Bots/Requests for approval/BHGbot 9


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard. The result of the discussion was

BHGbot 9
Operator:

Time filed: 00:21, Thursday, August 19, 2021 (UTC)

Function overview: Remove the banner tag Cleanup bare URLs from articles which no longer have any WP:Bare URLs.

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB module (C#)

Source code available: will be published once written, and before any trial run. I don't want to spend time coding it unless there is support in principle for this task. Bots/Requests for approval/BHGbot 9/AWB module

Links to relevant discussions (where appropriate):

Edit period(s): Initial run to clear the backlog, then about weekly

Estimated number of pages affected: Initial run ~1,650 pages. Thereafter a rough guesstimate of ~50 pages per week. Updated estimate, 20:20, 18 October 2021 (UTC): 1,424 pages

Namespace(s): Article, Draft.

Exclusion compliant (Yes/No): Yes

Function details: initial article list to consist of all main- and draft-space transclusions of Cleanup bare URLs. With each page:
 * 1) check that the page contains the banner template Cleanup bare URLs, or one of its many aliases.  If not, skip the page.
 * 2) count the number of Bare URL inline tags in the page, including aliases
 * 3) count the number of untagged bare URL refs in the page, i.e. those which match the regex
 * 4) if the total matches of step 2 + step 3 is greater than zero, then skip the page
 * 5) Optional check for bare URLs not in ref tags:
 ** Test for existence on the page of any other URLs which are not:
 *** wrapped in a cite tag, or
 *** wrapped in URL, or
 *** formatted as, or
 *** the value of a http://www.example.com/foo parameter in any infobox
 ** if any such URLs exist, then skip the page
 * 6) remove the banner Cleanup bare URLs, and save the page with AWB genfixes, using an edit summary of the form

I think that this approach is overly cautious, because in practice the Cleanup bare URLs tag seems to be used overwhelmingly for bare URLs within ref tags. However, I am happy to include this step unless there is consensus to omit it. If the bot is set to skip pages with bare URLs not in ref tags, the initial run will be significantly less than 1,650 pages, but until I run the bot in pre-parse mode I won't know how much less.
 * Note 1
 * Step 5 (check for bare URLs not in ref tags) is based on the discussion at User talk:Citation bot/Archive 26, where both @ and @ advocated retaining the banner tag if there are any bare URLs anywhere on the page.
 * Note 2
 * My estimate of ~1,650 pages in the initial run is based on comparing the 7,362 pages currently transcluding Cleanup bare URLs with a scan of the 17 August database dump which found 459,013 pages with 1 or more bare URLs in ref tags. That comparison found 1,665 pages transcluding Cleanup bare URLs but without bare URLs.
 * Note 3
 * Coding the AWB module is not complicated, but testing it and debugging it without a proper development environment is very slow. So I don't want to put in a few hours work without having first checked that the task has approval in principle.
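The decision procedure in steps 1 to 4 of the function details, plus the final banner removal, can be sketched as follows. This is a minimal Python illustration, not the AWB C# module. The alias lists and `BARE_REF_RE` are assumptions made for the example, since the request does not reproduce the bot's actual alias lists or regex.

```python
import re

# Hypothetical alias lists: the real templates have many more redirects.
BANNER_NAMES = ["Cleanup bare URLs", "Bare URLs"]
INLINE_NAMES = ["Bare URL inline"]


def template_pattern(names):
    """Regex source matching {{Name}} or {{Name|params}} for any listed name."""
    alternatives = "|".join(
        r"[ _]+".join(re.escape(part) for part in name.split(" "))
        for name in names
    )
    return r"\{\{\s*(?:" + alternatives + r")\s*(?:\|[^{}]*)?\}\}"


# IGNORECASE is looser than MediaWiki, which ignores case only on the first letter.
BANNER_RE = re.compile(template_pattern(BANNER_NAMES), re.IGNORECASE)
INLINE_RE = re.compile(template_pattern(INLINE_NAMES), re.IGNORECASE)

# Assumed definition of an untagged bare URL ref: a ref whose whole content
# is a single URL, optionally wrapped in square brackets.
BARE_REF_RE = re.compile(
    r"<ref[^>/]*>\s*\[?\s*https?://[^\s<>\]]+\s*\]?\s*</ref>",
    re.IGNORECASE,
)


def decide(wikitext):
    """Return the action for one page, following steps 1 to 4."""
    if not BANNER_RE.search(wikitext):                 # step 1
        return "skip: no banner"
    inline_tags = len(INLINE_RE.findall(wikitext))     # step 2
    bare_refs = len(BARE_REF_RE.findall(wikitext))     # step 3
    if inline_tags + bare_refs > 0:                    # step 4
        return "skip: bare URLs remain"
    return "remove banner"


def remove_banner(wikitext):
    """Final step: drop the first banner and any blank line it leaves behind."""
    return BANNER_RE.sub("", wikitext, count=1).lstrip("\n")
```

Every uncertain case falls through to a skip, so an imperfect regex in this sketch errs towards leaving the banner in place.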

Discussion
Regarding step 2 & step 3, what if you've got a bare URL tag but the URL is no longer bare, and elsewhere you've got an untagged bare URL? Your bot would skip this if it's only relying on counts? Ditto if there's an inline tag for a URL that's no longer bare? ProcrastinatingReader (talk) 00:34, 19 August 2021 (UTC)
 * thanks for that observation. I hadn't factored in the case of a ref which has been fixed, but the Bare URL inline tag has not been removed.  I think that such cases will be rare, and that it will be even more rare to have that oddity and a banner tag Cleanup bare URLs (without which this bot will reject the page in step 1).
 * If you like, I can add extra check for such misplaced Bare URL inline tags, but I would prefer not to do so, simply to avoid adding extra complexity to accommodate a very rare case whose consequence would be a mistaken skip rather than the more serious matter of a mistaken removal. --  Brown HairedGirl  (talk) • (contribs) 00:57, 19 August 2021 (UTC)
 * PS I just ran https://petscan.wmflabs.org/?psid=19858257 to check for main- and draft-space pages which transclude both Cleanup bare URLs and Bare URL inline: total 12 pages.
 * I checked them all for the case you described, and found only one kind of match, on List of gangs in New Zealand. An IP had wrongly added Bare URL inline after , instead of the correct placement before it.  Then reFill filled a bunch of refs, but didn't remove Bare URL inline because it was not inside the ref tags.  I have now fixed that page. --  Brown HairedGirl  (talk) • (contribs) 01:27, 19 August 2021 (UTC)
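The extra check for misplaced Bare URL inline tags discussed above could look roughly like this. It is a hedged Python sketch. It assumes that a correctly placed tag sits inside the ref element directly after the bare URL, and `stale_inline_tags` is a hypothetical helper name. Template aliases are ignored.

```python
import re

# Assumption: only the canonical template name is handled, not its aliases.
INLINE_TAG = r"\{\{\s*Bare[ _]+URL[ _]+inline\s*(?:\|[^{}]*)?\}\}"
INLINE_TAG_RE = re.compile(INLINE_TAG, re.IGNORECASE)

# Assumed correct placement: <ref>URL {{Bare URL inline}}</ref>
TAGGED_BARE_REF_RE = re.compile(
    r"<ref[^>/]*>\s*\[?\s*https?://[^\s<>\]]+\s*\]?\s*" + INLINE_TAG + r"\s*</ref>",
    re.IGNORECASE,
)


def stale_inline_tags(wikitext):
    """Count inline tags that are not attached to a bare URL ref."""
    total = len(INLINE_TAG_RE.findall(wikitext))
    attached = len(TAGGED_BARE_REF_RE.findall(wikitext))
    return total - attached
```

A page where this count is positive is one where a purely count-based step 2 would cause a mistaken skip, which is the harmless failure direction described above.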

Regarding step 5 and note 1, I don't see why step 5 is necessary. Is there an example of a page with such a URL (of the 'non-ref bare URL' variety) so I can see a valid use case? ProcrastinatingReader (talk) 00:42, 19 August 2021 (UTC)
 * Thanks again. I have identified no such cases. I added Step 5 solely out of respect for the objections already made by the two highly experienced and technically skilled editors who raised the issue at  User talk:Citation bot/Archive 26. I can't see the use cases myself, but I have high regard for their judgement, which is why I am willing to accommodate their concerns unless there is consensus to proceed without Step 5.
 * Maybe @ and/or @ could comment here? -- Brown HairedGirl  (talk) • (contribs) 01:04, 19 August 2021 (UTC)

For bare URLs without ref tags, here is a basic example: According to a report published at http://www.example.com, 63% of statistics are made up.
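A check for this kind of in-text bare URL, as proposed in step 5, might be sketched in Python as below. It is a simplification under stated assumptions: all templates are stripped, not only cite, URL and infobox templates, and every regex here is illustrative and not taken from the bot.

```python
import re

REF_RE = re.compile(r"<ref[^>/]*>.*?</ref>", re.IGNORECASE | re.DOTALL)
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")            # innermost templates only
BRACKETED_RE = re.compile(r"\[https?://[^\s\]]+[^\]]*\]")
URL_RE = re.compile(r"https?://[^\s<>\[\]{}|]+")


def bare_urls_outside_refs(wikitext):
    """Return URLs left in the text after removing refs, templates and bracketed links."""
    text = REF_RE.sub("", wikitext)
    # Strip templates repeatedly so that nested templates are removed too.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE_RE.sub("", text)
    text = BRACKETED_RE.sub("", text)
    # Trailing sentence punctuation is not part of the URL.
    return [url.rstrip(".,;:") for url in URL_RE.findall(text)]
```

Applied to the example sentence above, the function reports the one in-text URL. A page whose URLs all sit in refs, templates or bracketed links yields an empty list.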

Other views sought

This BRFA seems to have run into the sands, and I would appreciate some feedback from other BAG members.

The disagreement comes down to the bot trial's handling of this edit to World War II: When Lions Roared.

The page had been tagged in May with Cleanup bare URLs because of a bare URL ref to Amazon. In October, that ref was filled in by this edit by Citation bot.

By the time of the bot's trial run, there was no remaining completely bare URL ref, so the bot removed the Cleanup bare URLs tag.

However, there was one ref which @ProcrastinatingReader and @Headbomb argue is still bare, so the Cleanup bare URLs tag should not have been removed:

This is in part a GIGO issue: better source needed should be placed after the  tag. This placing of it inside the  tag is an input error.

The view of ProcrastinatingReader & Headbomb seems to be that the bot should ignore the existence of the misplaced tag, so it should count  as a bare URL, and therefore not remove the  Cleanup bare URLs tag.

I disagree, for several reasons:
 * 1) that IMDB ref is already adequately tagged to note the core problem, viz. that it is a bad source.  The remedy for that is to use a better source ... and it would therefore be unhelpful to tag it as bare.  It is even more inappropriate to retain a big "bare URL" banner at the top of the page, for only one URL whose bareness is at best only a secondary problem.  We should not be inviting editors to "please fill this ref before it is removed as inappropriate".
 * 2) The same applies to the dozen or so similar cleanup tags which might be misplaced inside  :  e.g. Failed verification, Unreliable source?, Promotional source, COI source, Obsolete source,  Irrelevant citation, Self-published inline, Unreliable fringe source.  In each the problem is not that the ref is bare; the problem is that the ref should not be there.
 * 3) Checking for misplaced tags in that bad-ref family would hugely complicate the regex, increasing the risk of error.  A regex to accommodate all these templates and their many aliases would amount to several lines of regex soup.
 * 4) Even if others are not fully persuaded that the tag removal was appropriate in this case, I hope that they will agree that it is at worst a marginal issue, one where there is a reasonable case for removing it.
 * 5) This issue arose in only one of the 50 pages in the trial, so it is rare.
 * 6) This bot is not altering the encyclopedic content of the article, nor the refs or metadata. All it is doing is removing a cleanup notice, and if it removes a tag from an occasional article where another editor might perhaps have kept the tag, that will in no way degrade the content of the articles.
 * 7) Meanwhile, over 1,400 articles still have this tag when it should have been removed. That actively impedes cleanup, by leading editors to pages which don't need refs filled. For example I used https://petscan.wmflabs.org/?psid=20904751 to find Ireland-related articles with bare URLs to fill, but I gave up after only 4 pages because 3 of the first 4 pages still had tags after the refs had been filled by a bot.   The encyclopedia will be improved by removing these tags, allowing editors to get on with the cleanup.
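The difference between the two positions can be made concrete with a pair of regexes. This is a Python sketch under stated assumptions: the tag names are those mentioned in this discussion, without their aliases. The example URL is hypothetical, and neither pattern is the bot's actual regex.

```python
import re

# Tag names from the discussion; their many aliases are omitted.
CLEANUP_TAGS = [
    "Better source needed", "Failed verification", "Unreliable source?",
    "Promotional source", "COI source", "Obsolete source",
    "Irrelevant citation", "Self-published inline", "Unreliable fringe source",
]

URL = r"\[?\s*https?://[^\s<>\]{}]+\s*\]?"


def name_pattern(name):
    """Allow spaces or underscores between the words of a template name."""
    return r"[ _]+".join(re.escape(part) for part in name.split(" "))


TAG = (
    r"\{\{\s*(?:" + "|".join(name_pattern(n) for n in CLEANUP_TAGS)
    + r")\s*(?:\|[^{}]*)?\}\}"
)

# Strict reading: a ref is bare only if it contains nothing but the URL.
STRICT_RE = re.compile(r"<ref[^>/]*>\s*" + URL + r"\s*</ref>", re.IGNORECASE)

# Tolerant reading: a URL followed by misplaced cleanup tags still counts as bare.
TOLERANT_RE = re.compile(
    r"<ref[^>/]*>\s*" + URL + r"\s*(?:" + TAG + r"\s*)*</ref>", re.IGNORECASE
)


def is_bare(ref, tolerant):
    """Classify one ref element under the chosen reading."""
    pattern = TOLERANT_RE if tolerant else STRICT_RE
    return pattern.fullmatch(ref) is not None
```

Under the strict reading a ref such as `<ref>https://www.example.com/x {{Better source needed|date=October 2017}}</ref>` is not counted as bare, so the banner would be removed. Under the tolerant reading it is counted, so the page would be skipped. The tolerant pattern grows with every tag name and alias added to the list.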

Please can we just get on with this? In a task such as this, excessive attention to rare and marginal cases of tag removal is a real enemy of improving the 'pedia. -- Brown HairedGirl  (talk) • (contribs) 01:33, 16 December 2021 (UTC) The bare URLs tag is redundant when the ref should be removed. The fact that this IMDB URL is bare is wholly secondary to the fact that it shouldn't be there at all. The priority is to remove the ref ... and its bareness doesn't deserve a mention at all, let alone being given top billing in a banner at the top of the page. Or in simple language, because it is deeply absurd to invite editors to fill a ref which should be removed, and which has already been tagged for removal.
 * URLs don't cease to be bare because you disagree they should be there.  IS a bare url. I will not approve a bot, i.e. dumb-as-a-brick-no-context-mindless-automaton, to remove valid cleanup tags because you do not personally agree the source should be present in the first place. I do not think you'll find any BAG member that will approve such a task either. The scope is to remove no-longer relevant bare URL tags, not remove unreliable sources. Headbomb {t · c · p · b} 01:52, 16 December 2021 (UTC)
 * on the contrary, the dumb-as-a-brick-no-context-mindless-automaton (your phrase) is Headbomb's insistence that a ref which shouldn't be there at all needs a big banner at the top of the page to say that it should be filled in before deletion. That banner is a completely inappropriate response to the issues on that page.
 * Your statement that this is about my view (you do not personally agree the source should be present in the first place) is demonstrably false. I did not add the better source needed tag; it was added in this October 2017 edit by User:Rfl0216.
 * Do you really want to argue that a WP:USERGENERATED website should not have been tagged as better source needed?
 * Or do you really truly believe that a ref to an unreliable source which has been inline-tagged as such also needs a big top-of-the-page banner saying that it is bare? Really really really? --  Brown HairedGirl  (talk) • (contribs) 02:13, 16 December 2021 (UTC)
 * I'm not saying it shouldn't have been tagged with 'better source needed', I'm saying it's still a bare url making the removal of the 'this article has bare urls' tag inappropriate. Headbomb {t · c · p · b} 02:45, 16 December 2021 (UTC)
 * @Headbomb: thank you for dropping that absurd claim that an IMDB ref being correctly tagged by someone else as unsuitable was some sort of weird personal quirk of mine. However, it seems to me that you are still taking a robotic approach which wholly misses the purpose of this exercise.
 * Per the nutshell of WP:CLEANUPTAG, tags are used "to inform readers and editors of specific problems with articles or sections".  So this is about how best to solve problems.  These tags are not some sort of attempt at perfect scientific classification of all the flaws on a page.
 * The guidance at WP:CLEANUPTAG is very helpful:
 * "Don't insert tags that are similar or redundant".
 * "If an article has many problems, tag only the highest priority issues".
 * So if we follow the guidance, that Cleanup bare URLs tag was removed correctly.
 * Do you really want to argue that the guidance would support its retention?
 * The purpose of Cleanup bare URLs is very simple: to inform readers and editors that a bare URL ref needs to be filled. But on that page there is no bare URL which needs to be filled; there is a bare URL which needs to be removed.
 * Why do you want to waste the time and energy of editors who clean up bare URLs by drawing their attention to a page which does NOT have a bare URL to be filled?   Brown HairedGirl  (talk) • (contribs) 03:14, 16 December 2021 (UTC)
 * I'll flip the question around: why do you insist on including this tiny minority of articles in the scope of your bot, when two BAG members independently told you they were problematic? I will not approve this task as is, and I doubt any other BAG member will approve it either, short of having an RFC where the community deems it acceptable for bots to remove bare url tags when there are still bare urls in the article. Headbomb {t · c · p · b} 03:20, 16 December 2021 (UTC)
 * @Headbomb: the answer to that question is very clearly answered above. But I will repeat:
 * because although the removal of those tags is a GIGO quirk, it is a quirk which will always be appropriate, because per WP:CLEANUPTAG the Cleanup bare URLs banner gives undue priority to a secondary issue.
 * because programming the bot to accommodate the guideline-denying demands of two BAG members in respect of this minority issue would add a lot of complexity. That would waste my time, reduce transparency, increase the risk of error ... and all to retain a tag which should not be there.
 * As to your demand for an RFC, that is also absurd. Why on earth do you want an RFC to determine whether to follow existing guidance?
 * I'm sorry to say this, Headbomb, but at this stage your stance is starting to look like perverse obstructionism.  Demanding an RFC on whether it is appropriate to remove a banner "fill this bare URL" tag for a ref which should be removed?  Really really really?
 * I am trying to fill bare URLs, and through various methods I have in the last five months filled all the bare URLs in well over 100,000 articles, and filled some of the URLs in many tens of thousands more articles.  Removing redundant cleanup tags will assist my work and that of other editors.
 * So what on earth are you trying to achieve by making a stand in favour of inviting editors to fill a ref which should be removed? Is this about something other than the issue at hand?   Brown HairedGirl  (talk) • (contribs) 03:49, 16 December 2021 (UTC)

There is a lot of potential for good bot work to be done here, but this task cannot be approved as is. This comes from the lack of demonstrated consensus for a bot to remove valid Cleanup bare URLs tags, the lack of willingness of the operator to limit the scope of the bot to obviously non-controversial edits (over several months of the BRFA being open), and the general WP:BATTLEGROUND mentality on display here. The task can be resubmitted in a new BRFA when and if these concerns have been addressed, either through an RFC establishing that the community supports the bot-removal of valid Cleanup bare URLs tags when the bare urls are potentially problematic, or a modification of the bot task's scope to avoid removal of Cleanup bare URLs tags when bare urls remain in the article. Headbomb {t · c · p · b} 08:10, 16 December 2021 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard.