Wikipedia:Bots/Requests for approval/Gaelan Bot 2


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard. The result of the discussion was

Gaelan Bot 2
Operator:

Time filed: 12:07, Monday, February 7, 2022 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): JS, Rust

Source code available: a bit of a mess at the moment, but happy to publish on request

Function overview: On file pages, remove fair use rationale and friends for pages that no longer use that file.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Remove_redundant_FURs_from_file_pages

Edit period(s): one time run for now

Estimated number of pages affected: <5,842

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: Many file pages include fair-use rationales that are no longer necessary. For example, File:AppleIIGSOS.png has a FUR for Palette (computing), but that article doesn't actually use that image. This bot finds those cases as follows:


 * 1) An XML dump and parse_wiki_text are used to find all File: pages containing one of these templates with an Article parameter.
 * 2) This is cross-checked against the results of this query (and a list of redirects, also extracted from the XML dump) to find which FURs are unused.
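The cross-check in step 2 can be sketched roughly as follows. This is a minimal illustration in Python, not the bot's actual JS/Rust code: the function names, the in-memory data shapes, and the simplified title normalization are all assumptions, and the sample titles in the usage note are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize_title(title: str) -> str:
    """Approximate MediaWiki title normalization (assumption: underscores
    become spaces, whitespace is collapsed, first letter is uppercased)."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def resolve_redirect(title: str, redirects: Dict[str, str], max_hops: int = 5) -> str:
    """Follow redirects from the dump, stopping on loops or after max_hops."""
    seen: Set[str] = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title


def find_unused_furs(
    fur_claims: Iterable[Tuple[str, str]],   # (file page, Article parameter)
    file_usage: Dict[str, Set[str]],         # file page -> articles using it
    redirects: Dict[str, str],               # redirect title -> target title
) -> List[Tuple[str, str]]:
    """Return the (file, article) FURs whose article does not use the file."""
    normalized_redirects = {
        normalize_title(src): normalize_title(dst) for src, dst in redirects.items()
    }
    unused = []
    for file_title, article in fur_claims:
        target = resolve_redirect(normalize_title(article), normalized_redirects)
        used_on = {normalize_title(a) for a in file_usage.get(file_title, set())}
        if target not in used_on:
            unused.append((file_title, article))
    return unused
```

With hypothetical input such as a file "File:Foo.png" carrying FURs for "Article A" and "article_B" but used only on "Article B", the function reports only the "Article A" rationale. A FUR that points at a redirect is kept as long as the file is used on the redirect's target.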

The resulting data is here. I've hand-checked a few dozen of these, and they seem fine. There are some cases like File:ContinentalSquare.JPG which have a FUR that accidentally links to the wrong article (the FUR links to Continental Center instead), which'll get removed by this bot. My thinking is that this is fine - it should get flagged up as having no FUR, and someone can rescue it from history? Not sure.

The actual editing part of the bot isn't implemented yet, but it should just consist of using pywikibot or mwn to loop over the JSON linked above, double-check that the FUR still exists and is unused (as I'm working with a dump that's a week old at this point) and remove it.
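A rough sketch of that re-check, under stated assumptions: the JSON is taken to be a list of objects with "file" and "article" keys (the real schema is not given here), and the two lookup callables are hypothetical stand-ins for whatever pywikibot or mwn calls would fetch the live file page and its current usage. Redirect handling and the edit itself are left out.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Set


def still_removable(
    candidates: Iterable[Dict[str, str]],
    fur_articles_now: Callable[[str], Set[str]],  # hypothetical: live FUR targets on a file page
    usage_now: Callable[[str], Set[str]],         # hypothetical: live articles using the file
) -> Iterator[Dict[str, str]]:
    """Yield only the dump-derived candidates that are still valid on the live wiki."""
    for candidate in candidates:
        file_title = candidate["file"]
        article = candidate["article"]
        if article not in fur_articles_now(file_title):
            continue  # the FUR was already removed or changed since the dump
        if article in usage_now(file_title):
            continue  # the file is back in use on that article; keep the FUR
        yield candidate


# Example with made-up data; a real run would query the live wiki instead.
candidates = json.loads(
    '[{"file": "File:Foo.png", "article": "Article A"},'
    ' {"file": "File:Bar.png", "article": "Article B"},'
    ' {"file": "File:Baz.png", "article": "Article C"}]'
)
live_furs = {"File:Foo.png": {"Article A"}, "File:Baz.png": {"Article C"}}
live_usage = {"File:Baz.png": {"Article C"}}
confirmed = list(
    still_removable(
        candidates,
        lambda f: live_furs.get(f, set()),
        lambda f: live_usage.get(f, set()),
    )
)
```

Passing the lookups in as callables keeps the decision logic testable without network access; only candidates that were unused in the dump and are still unused at edit time survive.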

For now, this'll just be a one-time run; I'd like to figure out an efficient way to run it continuously, but I'll file a new BRFA when we get to that point.

Discussion
"My thinking is that this is fine - it should get flagged up as having no FUR, and someone can rescue it from history?" But will someone? Or will some other bot or bot-like human come along and tag it for lacking a FUR? Keep in mind that fair use bots have historically been highly controversial. To what extent that was inherent in the task versus was due to the attitude of the operator I don't know, but people still may be touchy about the whole idea. It might be safer to limit the initial version to just those images that will remain fully FURred after the bot's edit, and to tag images also lacking a needed FUR for human attention (or ignore them for now). Anomie⚔ 12:48, 7 February 2022 (UTC)

P.S. If you do get to the point of wanting to run it continuously, the fact that the current version gives time for humans to revert vandalism that may have removed the images from the articles (by working from a week-old dump and only removing the FURs that were unused then and are unused "now") is a good thing that should be preserved. Anomie⚔ 12:48, 7 February 2022 (UTC)
 * I'm uncomfortable with the idea of a bot removing fair use rationales at all, but at minimum it must account for vandalism (it seems to do this) and page moves, mergers and splits (I'm not certain it attempts this). In the case of page moves and at least some mergers, the bot should follow any redirects and update the FUR if the image is used on the target page. If a redirect has been nominated at RfD then the bot should still follow the redirect - while most redirects from moves should be kept, there are occasional exceptions and there is going to be a large overlap between editors who don't know they are usually kept and those who don't know that FURs will need updating. I don't know how splits can be automatically detected. If there is consensus to remove FURs that are unused, it would be much better for the bot to move them to the talk page with an explanation, perhaps something like:
 * "On Gaelan Bot found that this image was not in use on the article(s) listed in the template below.
 * If the image has been restored or moved to a different article or title and the file page has no Fair Use Rationale (FUR) for the current location, you should either move the template below back to the file page and update it appropriately or write a new FUR.
 * If the file page does contain a FUR for all current uses there is no action you need to take.
 * If you think the bot got something wrong, please leave a message with details at ."
 * Thryduulf (talk) 10:49, 23 February 2022 (UTC)
 * I am not seeing a lot of confidence from those who have commented that this will be able to effectively deal with the issue presented at the BOTREQ without creating too many false positives and situations where images might be improperly altered after this removal. If these issues can be accounted for (noting that the bot operator has yet to respond to any of the above comments) then discussion can go further, but at the moment I am leaning towards declining this. Primefac (talk) 13:58, 27 February 2022 (UTC)
 * Hi. Sorry, Real Life has been a lot these past few weeks. I'll try to come back to this soon, but some quick notes:
 * "It might be safer to limit the initial version to just those images that will remain fully FURred after the bot's edit": This is a good idea. If we went ahead with this, I'd limit it to files that have at least one other FUR, and maybe separately maintain a list of pages that seem to have no valid FURs.
 * "In the case of page moves and at least some mergers, the bot should follow any redirects and update the FUR if the image is used on the target page." This is partially done: if a FUR refers to a redirect, that redirect is followed, and the destination page considered. I wasn't planning on updating the FUR to link to the redirect target, but that might be a good idea at some point.
 * One other possibility—and I haven't looked at the data to see how many useful edits this would exclude—is to only remove FURs when every current usage of the file is covered by an existing FUR. That (along with the anti-vandalism delay) should remove most of the issues with removing useful FURs—if every existing usage is covered, further FURs are (I think) pretty clearly not useful. Gaelan 💬✏️ 23:30, 28 February 2022 (UTC)
 * "One other possibility—and I haven't looked at the data to see how many useful edits this would exclude—is to only remove FURs when every current usage of the file is covered by an existing FUR." That's what I was suggesting by saying "fully FURred". It's easier to start small and expand than to start too big then have to fight pushback. Anomie⚔ 12:50, 1 March 2022 (UTC)
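The "fully FURred" rule discussed above could be sketched like this. It is only an illustration: the function name and return shape are hypothetical, titles are assumed to be already normalized with redirects resolved, and skipping files with no current usage at all is an extra conservative assumption, not something settled in this discussion.

```python
from typing import Set, Tuple


def plan_fur_removal(fur_articles: Set[str], used_on: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (FURs safe to remove, usages lacking a FUR that need human review).

    FURs are removed only when every current usage of the file is already
    covered by a FUR, so the file stays fully FURred after the edit.
    """
    uncovered = used_on - fur_articles
    if uncovered or not used_on:
        # Either some usage has no FUR (flag it for a human) or the file is
        # unused everywhere (assumption: leave such files alone).
        return set(), uncovered
    return fur_articles - used_on, set()
```

For a file with FURs for "A" and "B" that is used only on "A", the plan removes the "B" rationale. If the file is instead used on "C", nothing is removed and "C" is reported for human attention.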


 * What is the status of this request? Following up as it's been over a month since the last activity on this page. Are you still wanting to pursue this? -- The SandDoctor Talk 17:09, 3 April 2022 (UTC)
 * No reply from operator after two months. Primefac (talk) 14:59, 7 May 2022 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at Bots/Noticeboard.