Wikipedia:Bots/Requests for approval/Hamlet Prince of Robots


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.

Hamlet Prince of Robots
Operator:

Automatic or Manually assisted: Auto

Programming language(s): PHP, my classes

Source code available: soon

Function overview: Deleting "Images available as identical copies on the Wikimedia Commons," per CSD

Links to relevant discussions (where appropriate): Botreq - will also spam the notice boards once I have posted this

Edit period(s): daily

Estimated number of pages affected: 50-100 a day? (unsure; once the major backlog is gone, the number deleted daily will be reduced significantly - I think I will introduce a hard limit of max 200 deletions per day, just to spread the load a bit)

Exclusion compliant (Y/N): Yes, obeys templates such as do not move to Commons and hangon as well as nobots

Already has a bot flag (Y/N): N

Function details:
 * Gets images from Category:Wikipedia files with the same name on Wikimedia Commons
 * If it complies with CSD it deletes the image, otherwise the bot ignores it
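
As a rough illustration of the flow outlined above, the sketch below walks a list of candidate files, skips any that carry an exclusion template, and stops at a daily cap. It is written in Python rather than the operator's PHP classes, which were not published. The template names and the 200-per-day figure come from the request; `select_for_deletion`, `meets_f8`, and the dictionary layout are hypothetical names introduced only for this example.

```python
from typing import Callable, Iterable, List

# Exclusion templates named in the request (lower-case for comparison).
EXCLUSION_TEMPLATES = {"nobots", "hangon", "do not move to commons"}

def select_for_deletion(
    files: Iterable[dict],
    meets_f8: Callable[[dict], bool],
    daily_limit: int = 200,
) -> List[str]:
    """Return the titles the bot would delete in one run; the rest are ignored."""
    selected: List[str] = []
    for f in files:
        if len(selected) >= daily_limit:
            break  # hard cap proposed in the request
        templates = {t.lower() for t in f.get("templates", [])}
        if templates & EXCLUSION_TEMPLATES:
            continue  # exclusion compliant: leave the file for a human
        if meets_f8(f):  # hypothetical predicate standing in for the CSD check
            selected.append(f["title"])
    return selected
```

The CSD test is passed in as a callable so that the hard part, deciding whether a file really satisfies the criterion, stays separate from the bookkeeping.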

Discussion
The only major problem I see would be determining whether "The image's license and source status is beyond reasonable doubt, and the license is undoubtedly accepted at Commons.", but I feel this could be done correctly by ensuring that the image is tagged with a license accepted at Commons and checking that the image does not contain any non-free image tags (e.g. sometimes Linux screenshots are tagged with both GPL and non-free tags; in such cases the bot would leave it for a human), or any deletion/license dispute tags. There will obviously be the risk of false positives, but I feel the need for such a bot to clear the backlog outweighs the small number of images that may be deleted on Commons and have to be undeleted here. -- Chris 10:08, 8 June 2010 (UTC)
 * As an admin who occasionally helps out in CSD#F8, I think this is long overdue. OhanaUnitedTalk page 11:44, 8 June 2010 (UTC)

Oh and another point about the first one, the bot would ignore any images that no longer exist on commons -- Chris 10:01, 9 June 2010 (UTC)
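
The tag-based check Chris describes could look roughly like the following Python sketch. The three tag sets are placeholders chosen for illustration (a real list would have to be built from the licensing templates actually accepted at Commons), and `license_is_clear` is a hypothetical name.

```python
# Placeholder tag sets, assumed for illustration only.
COMMONS_ACCEPTED = {"gfdl", "gpl", "cc-by-sa-3.0", "pd-self"}
NON_FREE = {"non-free fair use", "non-free software screenshot"}
DISPUTE_OR_DELETION = {"ffd", "puf", "di-no source"}

def license_is_clear(tags, exists_on_commons):
    """True only when the file has a Commons-accepted license and no red flags."""
    if not exists_on_commons:
        return False  # the Commons copy is gone, so the bot ignores the file
    tags = {t.lower() for t in tags}
    if tags & NON_FREE or tags & DISPUTE_OR_DELETION:
        return False  # mixed or disputed tagging: leave it for a human
    return bool(tags & COMMONS_ACCEPTED)
```

A screenshot tagged with both GPL and a non-free template is rejected by the second test even though GPL appears in the accepted set.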
 * I don't like this and am very much opposed to it. A gracious plenty of images that get moved to Commons are obvious copyvios where someone just slapped a tag on them and a well-meaning, but ignorant of image policies, user moved the image over to Commons.  Others are copied there without OTRS data or with incorrect attribution information that makes it sound like the Commons user transferring the image is the uploader.  While I agree that the backlog is long and needs to be dealt with, blindly processing it is not the answer.  Perhaps if it were only used to delete images that were moved by some limited list of trusted users or some such thing that would be ok, but there are far too many of them that are just plain incorrect.  I randomly clicked on a few images just to prove my point.  The first one I clicked on was File:The Firm 2009 film.jpg, a fair use image which some time ago was deleted at Commons.  The second one was  File:DSCN5920-w-d-e-close 300x400.JPG, which is a user's personal photo.  Even though yes, it's a free license and we can move it, etc, we have always respected that a user may not want their personal photos moved to Commons and respected their wishes in that matter.  A bot would blindly delete it, but an admin would ask the user if he really wanted it to be moved. The next one was File:Wakefields knuckleball.jpg, which I see a lot of these so I'm sure it's a bot that does it, but the Commons information page gives as the date, the date that the image was transferred to Commons.  A human deleting the Wikipedia image would look at this and correct the date to be the one in the EXIF data or the date it was uploaded to Wikipedia or something more useful.  File:WaterlooLogo.jpg is a flickr image and the bot uploading it credits the en uploader rather than the flickr author.  Sorry, but every single one of these I click on has something that needs to be fixed.  A bot isn't going to do that. --B (talk) 13:21, 8 June 2010 (UTC)
 * Ok
 * File:The Firm 2009 film.jpg - the bot would have ignored this image as it has nonfree copyright tags on it
 * File:DSCN5920-w-d-e-close 300x400.JPG - the bot could leave a notice on the uploader's talk page warning of the impending deletion - if the user hasn't removed the deletion tag in seven days the bot deletes the image?
 * File:Wakefields knuckleball.jpg - the Commons date clearly states that it is the date when the image was uploaded, so I don't think there would be any confusion there; however, it would be possible for the bot to determine when there is an inconsistency between the date shown on the description page and the date in the image metadata
 * File:WaterlooLogo.jpg - actually it preserves the source information in the description field, so it is easy to tell that the image is from flickr
 * Support in principle: Is this something a bot should do? Yes. Is this something a bot can do? Maybe. B makes some good points about why bots working in the area should be careful. But as a bot operator myself, I can see that there are possibilities here. I think that really, asking admins to check everything that B suggests is going to lead to no-one bothering to clear the already large backlog. Why don't we trust the people that moved the files? Wakefields never had the true date of its capture in its description. Why put the onus on the admin who has to delete the duplicate image once it's already been transferred to fix that? (Why don't we/Commons instead, say, have a bot that runs on exif data adding it to FileDescs?) Would a deleting admin really bother to add it anyway? WaterlooLogo is and remains attributed to its Flickr author, just not in the author field. In fact, it's attributed better on Commons because Commons is more used to handling Flickr files. So why shouldn't we delete the dupe here to reveal that. And so on. Let's just make it a bloody good bot. - Jarry1250 [Humorous? Discuss.] 13:44, 8 June 2010 (UTC)
 * Compare the upside with the downside. The upside is we have a backlog cleared.  While clearing a backlog is an achievement, I suppose, there's really no downside to this particular backlog - these images are all free (supposedly) so there's no copyright issue ... nothing is really deleted anyway so we're not saving disk space ... so the upside is we get the "backlog cleared" feather in the cap.  The downside is that we send junk to Commons and potentially hide the correct information such that instead of any person on the planet being able to fix it, and instead of a process that guarantees at least one human in whom the community has expressed confidence will see it, the image with potentially incorrect data is on Commons where it may never be noticed and only en admins can fix it.  It's better to correctly edit the description when moving them to Commons to begin with, but unfortunately, we can't control that process, so the next best thing is for a human admin to review and fix it.  I don't think it has to be an admin - I think that we could have something analogous to the Commons "trusted user" idea where non-admins who understand what they are looking for and aren't just blindly copying things over can be depended upon to copy the correct data ... but I don't think the bot should blindly be led by the blind. --B (talk) 14:26, 8 June 2010 (UTC)
 * How about allowing any user to review the transfer, fixing things as they see fit, and then tagging the image in some way, giving an adminbot the heads up to delete? Something like ? The problem is that only admins can deal with this problem at present, which makes the backlog pile up. We already have Orphaned image deletion bot deleting user-tagged unused media, so why not this? Nautica Shad  es  14:39, 8 June 2010 (UTC)
 * Deleting fair use orphans isn't really a problem because we're not dumping them off on someone else. I think allowing the bot to delete images that have been flagged by a trusted user would be fine.  Another possibility would be for the bot to fix problems (missing OTRS, incorrect attribution, obviously incorrect dates, etc) then flag the image.  Something else that might take out a lot of the copyright violations is to not delete images uploaded by people who have had over some threshold of their images deleted as copyvios.  The uploads of such users obviously need increased scrutiny. --B (talk) 14:46, 8 June 2010 (UTC)


 * The bot should yield to the c-uploaded tag. These images shouldn't be in the speedy deletion category to begin with, but just in case. Shubinator (talk) 15:15, 8 June 2010 (UTC)
 * Yes, it will also ignore any protected images as is stated on WP:CSD -- Chris 09:48, 9 June 2010 (UTC)


 * I'm opposed to this idea in general. Deletions are something that, in my view, should be manually checked by the administrator deleting them to guarantee that there are no false positives (I know they can be undeleted but that shouldn't be necessary). I also wish to draw the bot proposer's attention back to the policy with regard to the name that they have chosen: it should be named after the operator or the function that it runs where possible. Peachey88 (Talk Page · Contribs) 09:08, 9 June 2010 (UTC)
 * In an ideal world we would have the numbers to have admins do all deletions manually and assess the situation properly; however, that is not the case. At the moment I think we have around six adminbots approved to do deletions, and I think this is an area that is botable and could use the help of an adminbot to clear the backlog. That specific section of policy is poorly worded and intends to dictate common practice. Its intention is to ensure that a bot account is easily identifiable as a bot account and that it is also easy to identify the specific bot's task. This bot fits the first requirement both in that it has 'robot' in its username and in that (as with all my bots) it will have the text (adminbot) in its deletion summaries. As for the second requirement, once the bot is approved and we have worked out the specific details of which images the bot will delete and which it will leave for humans, I will update the userpage with these details as well as a link to this BRFA, so it is easy for a passing user to determine the bot's task and whether it is acting within its approved scope. This, I feel, is more informative than a username such as 'CommonsDeletionBot', which would fit the policy's requirements but fail to inform the user of the extent of the bot's scope and function. -- Chris 09:48, 9 June 2010 (UTC)
 * I am sufficiently nervous of this proposal that I am presently opposed. Checking for non-free licences and delete tags is nowhere near sufficient for dealing with "license and source status is beyond reasonable doubt". Would the bot check to see if the criteria "to avoid deletion on Commons" are met? The image description on Commons would have to be tested by parsing potentially free text. How can the bot reliably determine "Country where the artistic work represented by the image is situated, or where it was first published" and be sure any countries mentioned are intended to signify the location of the artistic work and not, say, the nationality of the artist? In free text, how does the bot distinguish dates of death, creation, publication etc? Or, pardon my ignorance, is all this information required to be specified in templates these days? Is it suitable to delete files whose Commons descriptions are not machine-parseable? Will the bot honour the stipulations in Category:Wikipedia files with the same name on Wikimedia Commons, particularly the first stipulation? Warning the uploader is not satisfactory since important images may have been uploaded by users long departed from WP: images are unlike articles. There is at least a possibility the bot will delete all sorts of images inappropriately. Is there a general requirement on bots that a reversion process is quickly and easily available or does that have to be stipulated for each bot (again, showing my ignorance)? Thincat (talk) 11:24, 9 June 2010 (UTC)
 * This could lead to inappropriate deletions, but human interaction doesn't fully guarantee otherwise either. For such a huge backlog, though, probably the unambiguous images (I'm not sure how widespread is information's use currently, but images using it would be a good start) will still be numerous enough for this bot to provide a noticeable relief. Admins could then focus on the cases really requiring human intervention. --Waldir talk 11:59, 9 June 2010 (UTC)
 * Perhaps if we tried to better restrain the Commons uploading process itself and come up with a way to have one ncd tag for trusted users that understand image policies and another one for people who have not yet earned that level of trust. A trusted user would need to check over the latter, but the bot could just delete the former.  If we could get a handle on people blindly uploading garbage to Commons without copying all of the relevant data or exercising a modicum of discretion over whether it really belongs there, then having a bot delete it would be less likely to be a problem.  Until then, having a bot delete the image takes out our only opportunity for a (theoretically) trusted user to look at the image before deleting it. --B (talk) 13:08, 9 June 2010 (UTC)
 * Checking these images is a task which should be done by humans. This bot can't replace humans. multichill (talk) 18:37, 9 June 2010 (UTC)
 * Strongly oppose. Images should NOT be deleted until it has been checked that all info has been transferred correctly to Commons. Once the image is deleted only enwiki admins can check the info. As long as the original is still visible on enwiki, everyone can check the info. I have found many examples where users "copy" the image to Commons and change a PD-self to GFDL or something like that. Therefore admins should NOT delete the image until they have checked that everything is ok. When they have done that they can also click the "Check now!" link on the image on Commons. Right now thousands of images need a check in Commons:Category:Files moved from en.wikipedia to Commons requiring review, but because enwiki admins have deleted the originals we cannot check the images. That means that enwiki admins should ALSO check the images in this category.
 * I suggest:
 * Transfer process is improved. Either a better transfer bot or a function just like "move" so it is possible to move images to Commons without loss of information.
 * Admins do NOT delete images before they are checked. That way more users can help check the images. That way info can be corrected on Commons and then it is much easier for deleting admin to make sure it is ok.
 * If enwiki admins cannot manage the job alone then all Commons admins also get admin rights on enwiki so they can help check the images. If you do not like the idea that Commons admins get full access (ability to block users etc.) then invent some "file (un)deletion admin"-account. --MGA73 (talk) 18:49, 9 June 2010 (UTC)
 * There is a user access level called researcher that allows users to review deleted file histories. Granting this permission to Commons admins might be a very good idea.  Regarding the rest of it, if admins are blindly deleting images without removing them from the various bot-generated categories, that's a problem and should be pointed out to them, but it would obviously only be worse with bots deleting the images.  I think some number of admins (maybe a SWAG of 75%) do the appropriate due diligence before deleting an image, but if a bot starts doing it, that drops to zero. --B (talk) 22:02, 9 June 2010 (UTC)
 * This is a very interesting proposition. Besides being able to view deleted revisions of images, it would also be good that commons admins could actually delete images here. Probably a new user group would have to be created for that, but I think it's worth it, as that would help a great deal in clearing the backlog. What do you guys think? --Waldir talk 16:33, 14 June 2010 (UTC)
 * Perhaps this issue should be "moved" to somewhere else? I mean this is not a bot request :-) --MGA73 (talk) 17:25, 14 June 2010 (UTC)
 * What do you think of the Idea lab? --Waldir talk 07:49, 20 June 2010 (UTC)
 * I'm happy with anything :-) But as you can see at the bottom it seems that there has been a discussion like this already. But perhaps the time is right to make a change. --MGA73 (talk) 21:13, 20 June 2010 (UTC)
 * Actually it would be quite easy for the bot to work around this problem. First of all, the bot can ensure that the license tag(s) on enwiki match the tag(s) on the Commons image; if they do not match, the bot ignores the image (or it could list it on a page for human review, or whatever). Secondly, before the bot deletes the image on enwiki, it could copy the text on the enwiki description page over to Commons (place it in a new section on the image's talkpage or something), thus allowing Commons admins to verify the transfer (this would help solve the problem of enwiki admins deleting the images before Commons admins can verify them, although it would also mean the bot would have to be approved to run on Commons - but since it won't need any admin rights that shouldn't be too hard). -- Chris 10:07, 10 June 2010 (UTC)
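
The two safeguards proposed in this comment can be sketched in Python as follows. The sketch assumes that a mismatch between the two sets of license tags is what makes the bot leave a file alone. The function names, the section heading, and the use of a `<pre>` wrapper to keep the copied wikitext from being rendered are all illustrative choices rather than anything agreed in the discussion.

```python
def licenses_match(enwiki_tags, commons_tags):
    """Compare license tags case-insensitively, ignoring order and duplicates."""
    return {t.lower() for t in enwiki_tags} == {t.lower() for t in commons_tags}

def archive_section(enwiki_title, enwiki_text):
    """Build wikitext for a new Commons talk-page section holding the enwiki description."""
    heading = f"== Original description from en.wikipedia ({enwiki_title}) =="
    # <pre> keeps templates in the copied text from being expanded on the talk page.
    return f"{heading}\n<pre>\n{enwiki_text}\n</pre>\n"

def plan(enwiki_title, enwiki_text, enwiki_tags, commons_tags):
    """Return ('skip', None) on a mismatch, else ('archive-then-delete', section_text)."""
    if not licenses_match(enwiki_tags, commons_tags):
        return ("skip", None)
    return ("archive-then-delete", archive_section(enwiki_title, enwiki_text))
```

Copying the description before deleting means a Commons reviewer never needs access to deleted enwiki revisions to verify a transfer.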


 * Would it be workable for the bot to simply move the entire image description page over to Commons, just to ensure information does not get lost? (Perhaps in a separate section.) Ucucha 16:46, 11 June 2010 (UTC)
 * *cough*, it doesn't have to be moved to the talkpage, but I think the talk page would be better as it would remove the confusion as to why the description page seems to be duplicated. -- Chris 03:10, 12 June 2010 (UTC)
 * How about this? The bot checks the Commons image history for, and checks for a diff of it being removed by a Commons user. This ensures that the information transferal has been verified. Nautica Shad  es  16:56, 13 June 2010 (UTC)
 * If the bot checks that has been removed by a commons user and perhaps also if license is the same then the risk of errors should be reduced to a limited level. Perhaps we should make a test to see which images a bot like that would delete? Then we have something to check for any errors.
 * Sadly that only works with the "new" deletions. We still have thousands of "old" deletions so if it is possible to give Commons admins access to deleted versions it would be nice. --MGA73 (talk) 18:24, 13 June 2010 (UTC)
 * I commented on this idea above, where you first proposed it. --Waldir talk 16:33, 14 June 2010 (UTC)
 * I'm not sure I understand you, MGA. Would you mind explaining? Nautica Shad es  18:44, 15 June 2010 (UTC)
 * I guess MGA73 is talking about deleted image review. multichill (talk) 20:05, 15 June 2010 (UTC)
 * Yes or even allow Commons admins to delete (F8) and in some cases undelete (no good example right now). --MGA73 (talk) 19:59, 17 June 2010 (UTC)
 * Doesn't X! run the adminbot EyeEightDestroyerBot to do exactly the same thing? (apologies if I'm not allowed to edit here)  Set Sail   For The   Seven Seas   291° 10' 30" NET   19:24, 23 June 2010 (UTC)
 * No, that's only for files tagged with PBB Image citation, but yeah, probably the code could be reused, or the bot be approved for a second, broader, task. But I still think giving commons admins file deletion permissions on enwiki would greatly help on the cases where a bot couldn't unambiguously decide. --Waldir talk 06:39, 26 June 2010 (UTC)
 * Strong oppose. Images need to be checked by a human for conformity with CSD#F8. Very often images are tagged with free licenses that are actually copyrighted (e.g. nearly every modern American sculpture photograph on Wikipedia). These images need to have their tags changed to fair use tags instead. If this bot is run, the images will simply be moved to Commons where they will be summarily deleted. Kaldari (talk) 01:17, 20 July 2010 (UTC)
 * What do you think of (no deletions, just tagging)? multichill (talk) 04:33, 20 July 2010 (UTC)
 * Support alternate approach. Sounds like a much better idea that will cause fewer headaches :) Kaldari (talk) 21:05, 21 July 2010 (UTC)

Test

 * BAG assistance needed. Question: Would it be possible to get a test? I think we should give it a try. --MGA73 (talk) 19:11, 27 June 2010 (UTC)
 * I don't think we need a trial (which, for adminbots, can be problematic) just yet, why not have a list of pages which would (not) have been deleted? - Jarry1250 [Humorous? Discuss.] 20:35, 4 July 2010 (UTC)
 * Yes a list of files not to delete but also a list of files that can be deleted would be nice. That way we can see if the code works as it should. --MGA73 (talk) 20:56, 4 July 2010 (UTC)

/Log gives a basic idea of what would be deleted and what would be left (note, the bot also does other checks such as making sure both the commons & enwiki images have the same hash (i.e. are the same image), however they aren't shown in the log because they were never triggered) -- Chris 09:34, 11 July 2010 (UTC)
 * I checked the first images and so far it looked good. Of course some images still look a little bit messy on Commons and could need a cleanup. I would prefer that the bot skipped images if the version on Commons still had a "Bot Check needed" like . This image also had a Cc-by-sa-3.0-migrated and therefore had both a GFDL with and without disclaimers. Such images should also be skipped to start with. We can always see how many will be left after the bot has deleted all the others (if it gets approved). However, it is too late now so I have not checked all images - I would like to check some more before I say "Let's roll". --MGA73 (talk) 22:34, 11 July 2010 (UTC)
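
The hash comparison Chris mentions, which never fired in the log, amounts to something like the Python sketch below. The discussion does not say which hash the bot uses; SHA-1 is assumed here, and the function names are hypothetical.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a file's raw bytes."""
    return hashlib.sha1(data).hexdigest()

def same_image(enwiki_bytes: bytes, commons_bytes: bytes) -> bool:
    """True when both copies are byte-for-byte identical (equal digests)."""
    return sha1_hex(enwiki_bytes) == sha1_hex(commons_bytes)
```

A re-encoded or resized copy produces a different digest, so such a file would be left for a human even if it looks the same.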

I checked some more images and have these comments:
 * I do not think we should pass files with “PD-self” on Commons unless the user is the same.
 * Files should not be passed if they are nominated for deletion on Commons (example Commons: File:Tyumen Wooden Buildings 01.JPG)
 * Commons: File:Trimerliamide.jpg is cc-by-sa-3.0 but enwiki is both cc-by-sa-3.0 and GFDL. File:Ranchi.jpg is PD on enwiki but PD+GFDL+CC-BY on Commons. Files should only be passed if the licenses are the same.
 * File:Rephotography-crocker.jpg is from Flickr. We should be careful with these (it is PD on enwiki but All rights reserved on Flickr). Perhaps it is best to skip files from Flickr
 * File:Pocketp.gif is from the web – not sure we can check that by bot except if we skip all images containing www or http.
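
The five points above translate fairly directly into skip rules. The Python sketch below is one hedged reading of them: the dictionary keys (`licenses`, `uploader`, `source`, `nominated_for_deletion`) and the function name `review_flags` are invented for the example, and the source test is the crude substring match suggested in the last point.

```python
def review_flags(enwiki, commons):
    """Return reasons to skip a file; an empty list means no objection was found."""
    reasons = []
    en_lic = {t.lower() for t in enwiki["licenses"]}
    co_lic = {t.lower() for t in commons["licenses"]}
    if "pd-self" in co_lic and enwiki["uploader"] != commons["uploader"]:
        reasons.append("PD-self on Commons but uploader differs")
    if commons.get("nominated_for_deletion"):
        reasons.append("nominated for deletion on Commons")
    if en_lic != co_lic:
        reasons.append("licenses differ")
    source = enwiki.get("source", "").lower()
    if "flickr" in source:
        reasons.append("sourced from Flickr")
    elif "www" in source or "http" in source:
        reasons.append("sourced from the web")
    return reasons
```

Returning the reasons rather than a bare yes or no would let a log page group files by problem type, which fits the later suggestion of reviewing files with one kind of "error" at a time.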

I do like the idea and would hate to see this project dropped because it is too hard to make a bot that can check all possible issues. But I would also not like a bot to delete too many images, that should not be deleted. Perhaps we could start this project in a few steps:

We create a template for the bot to add on passed files. The template should say something like this:
 * Attention! This file has been reviewed by “Hamlet Prince of Robots” and the bot found this image to be transferred correctly to Commons. This image can therefore probably be deleted per F8. However, this bot is still in test so admins should NOT delete this image without a review [link to relevant place] and any problems should be reported to [link to special page or this bot request].

Template should put images in Category:Wikipedia files reviewed on Wikimedia Commons by Hamlet Prince of Robots and the category should have a warning telling admins to check before they delete (something like the template).

Then we could make the bot do a run on 100 files, and if that goes as planned we could run on 500 files, and if that also works well then we could either run on the rest this way or allow the bot to delete if it works with almost no errors.

Even if the bot does not delete images I think this is a good help. It makes it easier to spot files that are expected to be easy to check. After that we could consider letting the bot review files with one of the “errors” on the list, for example if it is from Flickr. That way we get a lot of files with a similar problem to check. --MGA73 (talk) 12:09, 12 July 2010 (UTC)

Those comments are valid and I am working on incorporating extra checks in the bot to deal with them. I would rather not have to run the bot through another BRFA sometime down the track to get it approved to delete, so I would like to suggest that we run an extended trial (somewhere between a few weeks and a month or two), during which the bot tags images with a template like you suggested. Afterwards I think we will be able to better determine the bot's effectiveness and decide whether it should be approved only for tagging, or if it is safe enough to delete images. -- Chris 11:52, 16 July 2010 (UTC)
 * If a new test works just fine I would be happy to let the bot delete the images but I think that a bot needs more than just my vote ;-) --MGA73 (talk) 17:07, 16 July 2010 (UTC)

Another approach
Loop over all images marked with NowCommons. This will dig up a lot of images. This will also make it easier for users. If I review a file at Commons I know the bot will catch up and mark the image at the enwp as reviewed. It also makes it easier to hunt down sloppy reviewers. This should be pretty straightforward to implement (at least in pywikipedia it is). multichill (talk) 11:23, 17 July 2010 (UTC)
 * If it's reviewed, skip this image
 * Look if the file is available at Commons
 * If not, remove NowCommons
 * If it is, check if BotMoveToCommons is still on it
 * If it is, skip this image
 * If it's not, search the history to find out which user checked the image (=removed the template)
 * If you can't find it, skip the image
 * If you can find it, check if the removing user has a SUL account with an active account on the English Wikipedia
 * If that's not the case, skip the image
 * If that's the case, mark the image as reviewed by that user. You probably want to set the bot field too
 * That should also work if deleting admins on en-wiki remember to check images before they delete to see if "the reviewer" knows how to do it right. Not all reviewers have the same quality of work, so batch deleting is a bad idea unless we are pretty sure that the reviewer is doing good work.
 * We also have to be sure only to delete images with the same name or unused images. If image has a different name and is used then usage should be replaced before deletion. --MGA73 (talk) 19:01, 17 July 2010 (UTC)
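
The decision steps multichill lists above can be sketched as a pair of small Python functions. This is not pywikipedia code: page state and revision history are passed in as plain values so the logic can be read on its own, and the names `find_remover` and `next_action`, the `(user, wikitext)` revision format, and the returned action strings are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple

def find_remover(revisions: List[Tuple[str, str]],
                 template: str = "BotMoveToCommons") -> Optional[str]:
    """revisions: (user, wikitext) pairs, oldest first. Return who removed the template."""
    marker = "{{" + template.lower()
    remover = None
    previous_had_it = False
    for user, text in revisions:
        has_it = marker in text.lower()
        if previous_had_it and not has_it:
            remover = user  # keep the most recent removal
        previous_had_it = has_it
    return remover

def next_action(already_reviewed: bool,
                on_commons: bool,
                commons_revisions: List[Tuple[str, str]],
                has_active_enwiki_account: Callable[[str], bool]) -> str:
    """Follow the listed steps and return what the bot should do with one file."""
    if already_reviewed:
        return "skip"
    if not on_commons:
        return "remove NowCommons"
    latest_text = commons_revisions[-1][1].lower() if commons_revisions else ""
    if "{{botmovetocommons" in latest_text:
        return "skip"  # nobody has checked the transfer yet
    reviewer = find_remover(commons_revisions)
    if reviewer is None:
        return "skip"  # cannot tell who checked it
    if not has_active_enwiki_account(reviewer):
        return "skip"
    return f"mark reviewed by {reviewer}"
```

Recording the reviewer's name in the result is what makes it possible to trace sloppy reviews back to a particular user, as the proposal intends.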

Sorry, I've been neglecting this way too much. I thought I would have time to work on this, but I don't. -- Chris 04:46, 7 August 2010 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.