Wikipedia:Bots/Requests for approval/Stefan2bot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Stefan2bot
Operator: Stefan2

Time filed: 00:59, Monday September 10, 2012 (UTC)

Automatic, Supervised, or Manual: Automatic.

Programming language(s): Python.

Source code available: User:Stefan2bot/shadowsCommons.py

Function overview: Adding ShadowsCommons to Wikipedia files shadowing a Commons file.

Links to relevant discussions (where appropriate): Wikipedia talk:Database reports

Edit period(s): One-time run.

Estimated number of pages affected: ≈6,000–7,000

Exclusion compliant (Yes/No): Yes.

Already has a bot flag (Yes/No): No.

Function details: At WT:DBR, there was a request for a database report for Commons "shadows". A Commons shadow is a local file on Wikipedia which hides a Commons file because both files have the same file name. For example, File:Moonrise.jpg is a typical shadow: there is a local file which prevents Wikipedia pages from using the Commons file.

I have a text file listing a few thousand file name conflicts, and I have been looking at the file manually in the past few months, trying to resolve conflicts. The request at WT:DBR suggests that other users may be interested in the contents of the file.

The idea is to let a bot read that file and add ShadowsCommons if needed. The bot will read the text file and confirm that there is indeed a file with the same name on both Wikipedia and Commons. If the files are different, the bot will add ShadowsCommons to the Wikipedia file information page. --Stefan2 (talk) 00:59, 10 September 2012 (UTC)
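The procedure described above could be sketched roughly as follows. This is only an illustration, not the code at User:Stefan2bot/shadowsCommons.py: the function name `files_to_tag` and the two lookup callables are hypothetical, and each lookup is assumed to return some fingerprint of the file contents, or `None` if the file does not exist on that project.

```python
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical lookup type: file name -> content fingerprint, or None if absent.
Lookup = Callable[[str], Optional[str]]


def files_to_tag(names: Iterable[str],
                 local_digest: Lookup,
                 commons_digest: Lookup) -> Iterator[str]:
    """Yield the names that exist on both projects with different contents."""
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the text file
        local = local_digest(name)
        remote = commons_digest(name)
        # The list may be stale: one of the two files may have been deleted.
        if local is None or remote is None:
            continue
        # Identical copies on both projects are not shadows.
        if local != remote:
            yield name
```

Passing the lookups in as arguments keeps the decision logic separate from any network access, so the list can be checked against test data before a real run.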

Discussion
Do you intend to tag only files where the file contents are different or do you intend to tag all files where there is naming overlap? How will you generate such a list? (Would it help to have a database report for you to work off of?) --MZMcBride (talk) 01:38, 10 September 2012 (UTC)


 * Database report: Someone posted a list of overlapping file names somewhere (Commons Village pump?) about half a year ago, and I downloaded that file. Most files in the list are still present on both projects. An updated list would be convenient, but it would still be possible to tag several thousand files without a new list.


 * What to tag: It would be very stupid to tag ShadowsCommons everywhere if the same file name exists on both projects. For example, lots of keep local files exist in identical copies on both projects and should not be tagged with ShadowsCommons. Also, a file should not be tagged with a second ShadowsCommons tag if it already has one. The idea is to check the MD5 hash of the files and confirm that the value differs. --Stefan2 (talk) 02:10, 10 September 2012 (UTC)
 * The API provides SHA-1 hashes; did you mean that? Or are you planning on calculating each MD5 hash yourself? LegoKontribsTalkM 02:38, 10 September 2012 (UTC)
 * Oops, yes, wrong hash function. --Stefan2 (talk) 03:00, 10 September 2012 (UTC)
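A minimal sketch of the SHA-1 lookup mentioned above, assuming the standard MediaWiki `action=query` module with `prop=imageinfo` and `iiprop=sha1`, and the default JSON layout in which pages are keyed by page ID. The helper names `sha1_query_params` and `extract_sha1` are made up for illustration, and the sketch does not send any request itself.

```python
from typing import Dict, List, Optional


def sha1_query_params(titles: List[str]) -> Dict[str, str]:
    """Build the query parameters for an imageinfo SHA-1 lookup."""
    return {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "sha1",
        "titles": "|".join(titles),
        "format": "json",
    }


def extract_sha1(response: dict) -> Dict[str, Optional[str]]:
    """Map each title to the SHA-1 of the file stored on the queried wiki.

    A title maps to None when the wiki has no file of its own under
    that name, including when it only shows the shared Commons file.
    """
    result: Dict[str, Optional[str]] = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = page.get("imageinfo")
        # "imagerepository" is "local" only for files hosted on this wiki.
        if page.get("imagerepository") == "local" and info:
            result[page["title"]] = info[0].get("sha1")
        else:
            result[page["title"]] = None
    return result
```

The same two helpers could be pointed at the Wikipedia API and the Commons API in turn; a name is a candidate for tagging only when both return a hash and the hashes differ.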

 MBisanz  talk 00:38, 11 September 2012 (UTC)

See Special:Contributions/Stefan2bot. Only 48 edits, though. Some comments about specific files:
 * File:ITV4 HD.svg: Note that the files really are different. The file on Wikipedia is 67 bytes, but the one on Commons is 66 bytes.
 * File:Full Circle.jpg: The bot code contained a typo, so files were tagged with ShadowsCommons even if they already had a tag. This has now been corrected. --Stefan2 (talk) 09:55, 12 September 2012 (UTC)
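The duplicate-tag problem noted for File:Full Circle.jpg can be guarded against with a check along these lines. This is a sketch under assumptions, not the bot's actual fix: the regular expression only recognises the template under the name ShadowsCommons (no redirects), and placing the tag at the top of the page is a choice made here for illustration.

```python
import re

# Matches {{ShadowsCommons}} or {{ShadowsCommons|...}}; the first letter of a
# template name is case-insensitive on MediaWiki. Redirects are not handled.
_TAG = re.compile(r"\{\{\s*[Ss]hadowsCommons\s*(?:\||\}\})")


def has_shadows_tag(wikitext: str) -> bool:
    """Return True if the page text already carries the tag."""
    return _TAG.search(wikitext) is not None


def add_shadows_tag(wikitext: str) -> str:
    """Prepend the tag unless the page already has one."""
    if has_shadows_tag(wikitext):
        return wikitext
    return "{{ShadowsCommons}}\n" + wikitext
```

Because `add_shadows_tag` returns the text unchanged when the tag is present, running it twice on the same page gives the same result as running it once.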

Can you publish your source code? --MZMcBride (talk) 07:48, 13 September 2012 (UTC)

 MBisanz  talk 00:48, 19 September 2012 (UTC)

Source code available at User:Stefan2bot/shadowsCommons.py. You need to go to the edit window in order to read the code properly. Sorry for the delay. I wanted to add some extra comments to the code to make it more readable to other people and I kept postponing this. --Stefan2 (talk) 00:42, 25 September 2012 (UTC)
 * No worries. I edited the page just now to make it a bit easier to read. You really want to include a license in the file as well. GPL, MIT, CC-0, etc. Plus author information (your username) and year of creation. This isn't strictly necessary, but it's essential for anyone else to be able to (safely) re-use your code. --MZMcBride (talk) 17:59, 30 September 2012 (UTC)
 * The source code was placed on a Wikipedia page. Any text on a Wikipedia page is automatically licensed under GFDL and CC-BY-SA. Not ideal licences for software, I know, but the code is licensed. I suppose I could add GPL or something to make it easier to combine the code with other programs. I re-added the <nowiki>: I like to have it there in source code on wiki pages to prevent accidental categorisation by template transclusion. Nothing seemed to put this program in a category, but I think that it is useful to always have a tag there. --Stefan2 (talk) 18:43, 30 September 2012 (UTC)


 * Approved, but you can talk about licensing on your talk pages.  MBisanz  talk 19:36, 30 September 2012 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.