Wikipedia:Bots/Requests for approval/GreenC bot 10


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was

GreenC bot 10
Operator:

Time filed: 15:25, Wednesday, February 6, 2019 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Awk

Source code available: TBU

Function overview: Add the Shadows Commons template to candidate File pages.

Links to relevant discussions (where appropriate): Bot_requests

Edit period(s): Weekly

Estimated number of pages affected: 30

Exclusion compliant (Yes/No): Yes

Already has a bot flag (Yes/No): Yes

Function details: Add the Shadows Commons template to File: pages on EnWiki that have the same name as a file on Commons. It uses Quarry 18894 to find candidate pages.

Discussion

 * — xaosflux  Talk 15:12, 7 February 2019 (UTC)


 * Comment - I would very strongly suggest the bot run its own query directly rather than relying on a manually updated Quarry one. ShakespeareFan00 (talk) 18:32, 7 February 2019 (UTC)
 * I hadn't planned on that. Just downloading the JSON with each run. What problem do you foresee? -- Green  C  18:52, 7 February 2019 (UTC)
 * There's no guarantee that a Quarry query is updated in a timely way. ShakespeareFan00 (talk) 01:00, 8 February 2019 (UTC)
 * What does 'timely' mean for Quarry? The database on Tools has a replication lag, also. The tool is only running once a week or so. -- Green  C  01:22, 8 February 2019 (UTC)
 * are you doing any checks if the shadow template is already in place (to avoid placing a second one), and/or that the file is actually shadowing? If so it won't really matter too much if this is delayed or using older data. This would be for cases where on edit someone else has already tagged the file, or the commons file has since been moved or deleted (i.e. the same checks we would expect of a human editor). —  xaosflux  Talk 13:42, 10 February 2019 (UTC)


 * Those are good points. I was going to check whether the template already exists on the page, but hadn't thought to check that the shadowing file still exists on Commons. Both checks are relatively easy and not costly, and yeah, they would resolve any problem with delays in the replication server pool. -- Green  C  15:52, 10 February 2019 (UTC)
 * Looks like Quarry is not stable: the link to the JSON file changes with each run of Quarry. The bot will connect to the DB directly instead. -- Green  C  19:29, 10 February 2019 (UTC)
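The two pre-edit checks discussed above (template already present, Commons file still there) could look roughly like the sketch below. It is written in Python for illustration only; the bot itself is in Awk and its source is not shown here. The function names, the regex, and the boolean inputs are all assumptions: a real run would get the existence flags from the API or the database.

```python
import re

# Assumed pattern for the tag and its space/underscore/no-space spellings.
# Hypothetical: the real bot may recognise a different set of aliases.
SHADOW_TEMPLATE_RE = re.compile(
    r"\{\{\s*(?:template\s*:\s*)?shadows[ _]?commons\s*(?:\||\}\})",
    re.IGNORECASE,
)


def already_tagged(wikitext: str) -> bool:
    """Return True if the page text already carries the shadow template."""
    return SHADOW_TEMPLATE_RE.search(wikitext) is not None


def should_tag(wikitext: str, local_exists: bool, commons_exists: bool) -> bool:
    """Decide at edit time whether tagging is still warranted.

    local_exists / commons_exists are assumed to be looked up live just
    before the edit, so a stale candidate list cannot cause a bad edit.
    """
    if not (local_exists and commons_exists):
        # The file was deleted or moved on one side: nothing is shadowed.
        return False
    return not already_tagged(wikitext)
```

With checks like these run immediately before saving, a delayed candidate list only costs a skipped page rather than a duplicate or incorrect tag.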

Images are the same

 * One of the images is File:Mosh kashi self portrait.jpg (Commons). According to the template instructions, when the images are the same, the template should not be used. It is not the only case: there is also File:Léon-Vasseur.jpg and probably others. What would happen in these cases? A bot can't determine that the images are the same. Should it add the template anyway, or is the bot not viable? -- Green  C  07:23, 11 February 2019 (UTC)
 * One solution: add the template regardless. The burden will be manual removal of the template. This is less work than manual addition of the template, as the ratio of additions to removals is high. It can also leave instructions in the template like:
 * The bot will keep a record and not add a second time. As a bonus the bot will now have a list of images that are the same, if ever needed. -- Green  C  08:12, 11 February 2019 (UTC)
 * That sounds reasonable. Identifying images for CSD F8 (i.e. identical images) would be a related task. You could use an image hash to check, IIRC. ShakespeareFan00 (talk) 09:31, 11 February 2019 (UTC)
 * Like with File:Mosh kashi self portrait.jpg they have different dimensions so it's complicated. Will keep image comparison in mind, it would probably require a machine learning API and some other work. Currently the bot is skipping images with templates, , and  (+ aliases) as well as anything with the magic word  . Anything else to avoid?  --  Green  C  16:31, 11 February 2019 (UTC)
 * Huh. I was expecting that someone would file such a bot in due time. Having worked on Shadows Commons cases in the past, I have a few thoughts:
 * Not sure that files with and  should be ignored. They simply say that a file can't be copied over and should be kept (respectively), not that it should stay at its file name.
 * Shadows Commons has  and   parameters; perhaps if the bot encounters files with  and  it should set the parameter to "yes"? And in the case of  it might also set the parameter   to ""?
 * What is the problem with  files?
 * Jo-Jo Eumerus (talk, contributions) 17:01, 11 February 2019 (UTC)
 * Hi Jo-Jo Eumerus, thanks for the info.  as they are high-risk (use on the main page etc) so renaming or moving to Commons would likely be avoided? I'm on-board with keeplocal as replacement for . Not positive about  as that template is further embedded in 8 other templates. Something like   and moving any of those 8 templates creates complexity of embedded templates and reason (for future bots and tools). It would still work with separate templates I believe.  --  Green  C  18:28, 11 February 2019 (UTC)
 * It is confusing with all the moving parts. Current thinking what action to take when the bot encounters:
 * No templates - add Shadows Commons
 * Shadows Commons - do nothing
 * - do nothing? Or add Shadows Commons. Uncertain.
 * Keep local - delete and replace with Shadows Commons with yes
 * Do not move to Commons - keep and add Shadows Commons
 * Now Commons - keep and add Shadows Commons
 * Thoughts / comments? -- Green  C  22:16, 11 February 2019 (UTC)
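The draft rules in the list above can be read as a small decision function. The sketch below is Python for illustration and mirrors only the named entries as proposed at this point in the discussion; the unnamed "uncertain" entry is left out, and the `Action` names and `decide` function are hypothetical.

```python
from enum import Enum
from typing import Iterable


class Action(Enum):
    ADD = "add Shadows Commons"
    SKIP = "do nothing"
    REPLACE_KEEP_LOCAL = "replace Keep local with Shadows Commons"


def decide(templates: Iterable[str]) -> Action:
    """Map the templates found on a File page to the drafted action."""
    # Treat underscores and spaces alike and ignore case (a simplification).
    names = {t.replace("_", " ").strip().lower() for t in templates}
    if "shadows commons" in names:
        return Action.SKIP                # already tagged
    if "keep local" in names:
        return Action.REPLACE_KEEP_LOCAL  # fold Keep local into the new tag
    # "Do not move to Commons" and "Now Commons" are kept in place and the
    # tag is added, which is the same outcome as a page with no templates.
    return Action.ADD
```

For example, `decide({"Keep local"})` gives `Action.REPLACE_KEEP_LOCAL`, while `decide(set())` gives `Action.ADD`.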


 * I would ignore anything tagged Now Commons, as those have already been identified. ShakespeareFan00 (talk) 17:49, 12 February 2019 (UTC)
 * Done. -- Green  C  17:59, 12 February 2019 (UTC)
 * Actually it was done in the SQL you gave me, but I added a few more aliases and a backup regex check in the source. The current SQL list is below; the additions are all aliases.

('ShadowsCommons',  'Shadows_commons',   'Shadows_Commons',   'Now_Commons',   'NowCommons',   'Nowcommons',   'NowCommonsThis',   'Now_commons',   'CommonsNow',   'NC',   'NCT',   'Nct',   'Db-now-commons',   'Db-nowcommons',   'Uploaded to Commons',   'Pp-template',   'Keep_local_high-risk',   'Pp-upload',   'C-uploaded',   'C-upload',   'C uploaded',   'C-uploaded',   'M-protected',   'Main page protected',   'Mpimgprotected',   'Mprotect',   'Mprotected',   'PP-main',   'PP-main-page',   'PP-mainpage',   'ProtectedMainPageImage',   'Uploaded_from_Commons',   'Protected_sister_project_logo',   'Rename_media',   'lfr',   'Image_move',   'Media_rename',   'Rename_file',   'Rename_image',   'Rename-image',   'Rename_media',   'RenameMedia',   'Renamemedia',   'Ffd',   'FFD',   'lfd',   'Imagevio',   'PUF',   'Puf',   'PUi',   'Pui',   'PUIdisputed'  )
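The "backup check in the source" could, under some assumptions, be a membership test against the same list. The Python sketch below uses only a subset of the names above and relies on two MediaWiki title rules: spaces and underscores are equivalent, and only the first letter is case-insensitive (which is why the list carries both 'Now_Commons' and 'Now_commons'). `SKIP_TEMPLATES`, `normalize`, and `is_skipped` are hypothetical names, not the bot's own.

```python
from typing import Iterable

# Subset of the skip list above, for illustration only.
SKIP_TEMPLATES = (
    "ShadowsCommons", "Shadows_commons", "Shadows_Commons",
    "Now_Commons", "NowCommons", "Now_commons",
    "Keep_local_high-risk", "Rename_media", "Ffd", "PUF",
)


def normalize(title: str) -> str:
    """Put a template name in database form: underscores, first letter upper."""
    title = title.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


SKIP_SET = {normalize(t) for t in SKIP_TEMPLATES}


def is_skipped(page_templates: Iterable[str]) -> bool:
    """Return True if any template on the page is in the skip list."""
    return any(normalize(t) in SKIP_SET for t in page_templates)
```

For example, `is_skipped(["Information", "ffd"])` is true, while `is_skipped(["Information"])` is false.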

Trial results
Trial results:


 * File:AcharavadeeW.jpg
 * File:BT_Group_logo.png
 * File:BVN_logo.svg
 * File:Arbeiderpartiet.svg
 * File:BetterDays.jpg
 * File:Bahar2.jpg
 * File:SISTER.jpg
 * File:INSIDENOAPARACAL.jpg
 * File:DJ_Brian_Rikhotso.jpg
 * File:Prerana.jpg
 * File:Logo_of_UCC.jpg
 * File:Rompecabezas.jpg
 * File:Heart_of_Darkness.jpg
 * File:Elsa_Cayat.jpg
 * File:William_Whiteley.jpg
 * File:TFCA_logo.png
 * File:Garia_railway_station.jpg
 * File:Slingshot.jpg
 * File:Showdown.jpg
 * File:Gualberto_Villarroel.jpg
 * File:Puppet_Master.jpg
 * File:John_Kronus.jpg
 * File:Friedrich_Körner.jpg
 * File:Field_Day.jpg
 * File:Cheer_Up.jpg
 * File:Jugs.jpg
 * File:Fortuneteller.jpg
 * File:Spotted_Dove.jpg
 * File:Meri_Maa.jpg
 * File:Love_symbol.png
 * File:Hugh_Macmillan,_Baron_Macmillan.jpg
 * File:Wonderworld.jpg
 * File:Léon-Vasseur.jpg
 * File:The_Violinist.jpg
 * File:Partner.jpg
 * File:Love_is_all_around.jpg
 * File:Garlin_Gilchrist_II_in_Ann_Arbor_(cropped).jpg
 * File:DBS_Bank_Logo.svg
 * File:Create_logo.svg
 * File:OUTSIDENOAPARACAL.jpg
 * File:New_Garia_railway_station.jpg
 * File:Josef_Grohé.jpg
 * File:Jiraiya.jpg
 * File:Harold_Day.jpg

I accidentally passed "-continuous" to jsub, which circumvented the bot's internal halts, so it processed all available files (44) instead of 33. I also forgot the bot message, which is now included. Question about a few cases like File:Garlin Gilchrist II in Ann Arbor (cropped).jpg that have and have been copied but the image still exists on Enwiki. Should it be tagged? -- Green  C  17:41, 12 February 2019 (UTC)
 * I think yes, they should still be tagged. Jo-Jo Eumerus (talk, contributions) 17:43, 12 February 2019 (UTC)
 * Ok. -- Green  C  18:00, 12 February 2019 (UTC)

SQL Query me! 18:04, 19 February 2019 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.