Wikipedia:Bots/Requests for approval/BJBot 3


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

BJBot
Operator: Bjweeks

Automatic or Manually Assisted: Automatic

Programming Language(s): Python with pywikipedia

Function Summary: Remove fair use images from non-mainspace pages.

Edit period(s) (e.g. Continuous, daily, one time run): Daily

Edit rate requested: 8 per minute

Already has a bot flag (Y/N): Y

Function Details: The bot will go through a list of fair use images being used outside of mainspace and remove each image from any pages using it that are not in the mainspace. If the image is then an orphan, the bot will tag it as appropriate. If the image was removed from a user page, the bot will leave the editor a message advising him/her on why it was removed.
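
As an illustration of the steps described above, here is a minimal sketch in plain Python. It is not the bot's actual code: the `Plan` class and the `plan_for_image` function are hypothetical names, and the input is assumed to be a list of `(namespace_number, page_title)` pairs for one image. Only the namespace numbers (0 for articles, 2 for user pages) are standard MediaWiki values.

```python
from dataclasses import dataclass, field

MAIN_NAMESPACE = 0  # standard MediaWiki namespace number for articles
USER_NAMESPACE = 2  # standard MediaWiki namespace number for user pages


@dataclass
class Plan:
    """Hypothetical container for what the bot would do for one image."""
    remove_from: list = field(default_factory=list)   # page titles to edit
    notify_users: list = field(default_factory=list)  # user names to message
    tag_as_orphan: bool = False


def plan_for_image(usages):
    """Plan removals for one image.

    usages: list of (namespace_number, page_title) tuples (assumed input shape).
    """
    plan = Plan()
    for namespace, title in usages:
        if namespace == MAIN_NAMESPACE:
            continue  # usage in an article is left alone
        plan.remove_from.append(title)
        if namespace == USER_NAMESPACE:
            # "User:Example/Sandbox" -> "Example"
            owner = title.split(":", 1)[1].split("/", 1)[0]
            if owner not in plan.notify_users:
                plan.notify_users.append(owner)
    # With every non-article usage removed, the image is orphaned
    # unless at least one article still uses it.
    plan.tag_as_orphan = not any(ns == MAIN_NAMESPACE for ns, _ in usages)
    return plan
```

For example, `plan_for_image([(0, "Foo"), (2, "User:Alice/Gallery")])` would plan one removal and one message to Alice, and would not tag the image as an orphan because an article still uses it.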

Discussion
I support this (and indeed perhaps take some collateral blame for the idea). I would, however, add some caveats: obviously doing this in the user: space has in the past proven somewhat controversial (though selective linkifying is a 'lighter touch' approach than some have used in the past, such as mass blanking). Also, there may be some "legit" instances of fair use images in the template namespace, though these should be the rare exception to the general rule: ideally there'd be a 'whitelist' maintained somewhere. Other namespaces I see no reason not to go ahead with immediately. It appears the majority of these actually occur on article talk pages, for some unknown reason; I'll have to double-check that for false positives arising from whackiness in the categorisation of images, of which I have little doubt there will be large amounts. (Probably images in both the "free" and "fair use" categories, which probably need some sort of separate treatment by way of a "please clarify the status of this image" tag.) Alai 19:18, 23 February 2007 (UTC)
 * For false positives I'm checking to make sure a known fair use template is on the page first, just like with the OrphanedFairUse bot. A whitelist is a good idea and I was already thinking of putting one in, but I think it should be private so SomebodyWhoKnowsWhatThey'reDoing™ must add the page to the list. BJ Talk 17:40, 27 February 2007 (UTC)
 * Is there a distinction between "known fair use template", and anything that categorises images in the "fair use images" category? Is there such a thing as "known 'free image' tags", and likewise, does that differ from the "free images" category?  The overlap between the latter pair seems quite large, but possibly my criteria are blunter than the bot's (though we can take the detail of that off-page).  I'd prefer if the whitelist were not private as such, on the basis of transparency being the surest and speediest way to fix problems, but I could see a case for protecting it if there's thought to be potential for abuse.  Alai 05:05, 14 March 2007 (UTC)
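
The template check discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration, not the bot's code: the two tag sets are hypothetical examples (the real lists are not given in this discussion), and `template_names` and `classify_image` are made-up helper names. Images carrying both a fair use tag and a free tag are reported as a conflict rather than acted on, which matches the "please clarify the status" idea raised above.

```python
import re

# Hypothetical tag lists for illustration only; the bot's real lists are not given here.
FAIR_USE_TAGS = {"non-free logo", "albumcover", "money"}
FREE_TAGS = {"pd-self", "gfdl", "cc-by-sa-2.5"}

# Captures the template name in "{{Name}}" or "{{Name|params}}".
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def template_names(wikitext):
    """Return the lowercased names of templates used in the wikitext."""
    return {name.strip().lower() for name in TEMPLATE_NAME.findall(wikitext)}


def classify_image(description_wikitext):
    """Return 'fair use', 'conflict' (fair use and free tags together), or 'skip'."""
    names = template_names(description_wikitext)
    fair = bool(names & FAIR_USE_TAGS)
    free = bool(names & FREE_TAGS)
    if fair and free:
        return "conflict"  # leave for a human to sort out
    if fair:
        return "fair use"
    return "skip"          # no known fair use tag: the bot does nothing
```

Matching on template names rather than on category membership keeps the check narrow: an image is only touched when a tag from the known list is present on its description page.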

Depending on how you plan to handle image inclusion that occurs via template expansion, you might want to give some thought into how your bot will interact with Template:Gallery. I've seen it in use on many user pages, usually to display thumbs of images that the user has uploaded or that they particularly admire. Template:Gallery also occasionally appears elsewhere outside mainspace. I don't think the bot could safely convert these images into links. How do you plan on handling these galleries? Do you intend to remove fair use images from galleries, perhaps by commenting them out in the template parameter? If so, perhaps a note should be added to the corresponding talk page? —RP88 14:59, 28 February 2007 (UTC)
 * Mmm those can be dealt with via a regex replace function as far as I understand. Instead of Find '\[\[(? Image:Blah)\]\]' Replace . You should be able to modify that to \[?\[?\s*(? Image:Blah)\]?\]?' Replace  With some success. I've not looked into this further to see what to do in cases of say... image captions, but that should be a rather trivial modification of the above regex. I am of course assuming that you are using regex in the first place ;) ——  Eagle  101 Need help? 20:34, 6 March 2007 (UTC)
 * For images removed from the userspace a message will be left. As for breaking people's galleries, I really don't care, they can fix it themselves. BJ Talk 17:25, 10 March 2007 (UTC)
 * No, that doesn't work for me. Even if the user is doing something wrong, you can't break their userpage. — METS501 (talk) 18:15, 10 March 2007 (UTC)
 * Time to go do some testing, will post when done. BJ Talk 18:36, 10 March 2007 (UTC)
 * Ok, linking is not going to work; the images will have to be commented out, which doesn't break anything. Changing request as such. BJ Talk 18:41, 10 March 2007 (UTC)
 * Can't you just skip usages in galleries on user pages? Or, comment those out, and linkify all others?  Unless I'm misunderstanding something, this sounds like a matter of Bigger and Better regexes, as Eagle says.  Alai 05:13, 14 March 2007 (UTC)
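
To make the regex approach in this thread concrete, here is a small sketch using Python's standard `re` module. It is an illustration under assumptions, not the regex the bot ended up using: `comment_out_image` and `has_gallery_style_use` are hypothetical names, only the `Image:` prefix is handled, and captions are assumed to contain at most one level of nested `[[links]]`.

```python
import re


def _name_pattern(image_name):
    # Spaces and underscores are interchangeable in MediaWiki titles.
    return "[ _]".join(re.escape(part) for part in re.split(r"[ _]", image_name))


def comment_out_image(wikitext, image_name):
    """Wrap inline [[Image:...]] uses of image_name in HTML comments.

    Returns (new_wikitext, number_of_uses_commented_out).
    """
    pattern = re.compile(
        r"\[\[\s*(?i:Image)\s*:\s*" + _name_pattern(image_name) +
        # Optional "|options|caption", allowing [[links]] inside the caption.
        r"\s*(?:\|(?:[^\[\]]|\[\[[^\[\]]*\]\])*)?\]\]"
    )
    return pattern.subn(lambda m: "<!-- " + m.group(0) + " -->", wikitext)


def has_gallery_style_use(wikitext, image_name):
    """True if a line starts with a bare 'Image:Name', as in gallery markup."""
    pattern = re.compile(
        r"^\s*\|?\s*(?i:Image)\s*:\s*" + _name_pattern(image_name),
        re.MULTILINE,
    )
    return pattern.search(wikitext) is not None
```

A caller could use `has_gallery_style_use` to skip or flag pages where the image appears in gallery markup, and apply `comment_out_image` everywhere else. Note that running the replacement twice on the same page would nest comments, so a real bot would need to guard against that.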


 * This bot's code will need to be adjusted to not function on the pages in Category:Wikipedia fair use exemptions, or their subcategories. These pages are oftentimes crucial to operating the project (see FUE). — xaosflux  Talk 01:45, 15 March 2007 (UTC)
 * (addendum) These are mostly categories so would not be affected by this bot, but discussion pages get added here from time to time to deal with issues. — xaosflux  Talk 01:50, 15 March 2007 (UTC)
 * I don't see anything on that page that will be affected by the bot. BJ Talk 02:12, 15 March 2007 (UTC)
 * Will BJBot_3 allow exceptions in general? I'm specifically thinking of the fair use images used on templates that were or are on the main page. --Iamunknown 02:00, 15 March 2007 (UTC)
 * Yes, a whitelist exists. BJ Talk 02:12, 15 March 2007 (UTC)
 * Those templates are now tagged in the FUE exemptions category. — xaosflux  Talk 04:46, 15 March 2007 (UTC)
 * OK, not anymore, looks like we don't want FU on MP anymore (Wikipedia talk:Fair_use_exemptions). — xaosflux  Talk 12:18, 15 March 2007 (UTC)
 * Fair use can appear in FA of the day, and it's for humans to decide which are allowed, not a bot. Likewise the FA of the day queue - where folks post the synopsis of FAs for consideration for front page status - should also be skipped. There must be a whitelist and if there's any doubt humans must decide. Beyond that, this seems to be a worthwhile task for a bot. --kingboyk 16:05, 18 March 2007 (UTC)
 * I think that instead of a whitelist, the bot could be set to only remove the images in question from userspace and user talkspace, creating a list of those it finds in all other namespaces to be sorted through by humans. The fact that (excepting Main page related things, vandalism, and BJAODN) I've only seen one fair use image placed outside of the user (talk) and article (talk) spaces suggests that there will really hardly be any instances of this. Because of that, I think a whitelist is likely to be more time-consuming than a simple dump on a subpage. Picaroon 20:09, 26 March 2007 (UTC)
 * What BJAODN page? They shouldn't be there. --Iamunknown 19:25, 28 March 2007 (UTC)
 * Of course they shouldn't, but they are. This and this are two removals I made from just one page; who knows how many there are throughout the whole 60 or so? Picaroon 19:33, 28 March 2007 (UTC)
 * Thanks for doing that. At one point I went through all 61 of the main BJAODN pages and watchlisted all of the fair use images; but, they got swallowed by my watchlist and I haven't had time to go back yet. --Iamunknown 19:35, 28 March 2007 (UTC)
 * Agreed, I will put that into the code today, thanks. BJ Talk 20:45, 26 March 2007 (UTC)
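
The approach agreed on in this thread (remove only from user and user talk pages, and dump everything else on a subpage for humans) could be sketched as follows. This is a hedged illustration, not the code BJ added: `triage` and `review_list_wikitext` are hypothetical names, the whitelist is assumed to be a plain set of page titles, and only the namespace numbers (0, 2, 3) are standard MediaWiki values.

```python
MAIN = 0                 # articles
USER, USER_TALK = 2, 3   # standard MediaWiki namespace numbers


def triage(usages, whitelist=frozenset()):
    """Split non-article usages into automatic removals and a human review list.

    usages: list of (namespace_number, page_title) tuples (assumed input shape).
    whitelist: page titles the bot must never touch (assumed to be exact titles).
    """
    to_remove, to_review = [], []
    for namespace, title in usages:
        if namespace == MAIN or title in whitelist:
            continue
        if namespace in (USER, USER_TALK):
            to_remove.append(title)   # the bot edits these itself
        else:
            to_review.append(title)   # humans decide
    return to_remove, sorted(to_review)


def review_list_wikitext(titles):
    """Format the review list as a bulleted list of wiki links."""
    # The leading colon makes category and image titles render as plain links.
    return "\n".join("* [[:%s]]" % title for title in titles)
```

Keeping the review list as generated wikitext means it can be saved to a subpage and worked through by hand, which is the "simple dump" suggested above.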


 * Please be aware that some fair use tags such as Money can be applied to public domain items as well. Also, images in Category:Fair use images used with permission should be left alone. I am working on a similar project at User talk:HighInBC/FU in userspace. HighInBC(Need help? Ask me) 19:23, 28 March 2007 (UTC)


 * I've got another question for BJ: why will it only comment out the images? They should be removed straight out so they can't get put back in. Unlike the images OrphanBot removes from articles these images are never going to be appropriate outside of articlespace. So there's no reason to simply leave them commented out. Picaroon 19:39, 28 March 2007 (UTC)

What is the status of this request? There seem to be a lot of questions above, which really need answering, with appropriate modifications made to the bot. Thanks, Martinp23 11:49, 17 April 2007 (UTC)

BJ seems to be pretty seriously wikibroken currently, so this'll perhaps just have to be archived before too much longer. But for the sake of clarity, for the possible benefit of anyone else proposing to tackle a similar task: have we determined whether, and in what circumstances, image transclusions should be removed without a trace, vs. commented out, vs. linkified? Alai 16:31, 18 April 2007 (UTC)
 * I don't think the images should be commented out. I think they should just be removed. The model I imagine is removing the image and leaving a message on the talk page of the person in whose userspace the page exists (if it is in userspace). In all cases, leave detailed edit summaries like I do. --Durin 21:50, 18 April 2007 (UTC)
 * I agree. Generally fair-use images are removed, and commenting them out makes it easier for an unsuspecting editor to put them back in. It also has to be clear that it isn't vandalism. If possible, though it's probably hard, the person who added it should receive an advisory. ST47 Talk 23:12, 18 April 2007 (UTC)

Here's another point for possible posterity: do we have a clear understanding as to what to do on a per-namespace basis? From the above, it appears that they should be removed immediately from the user: and user_talk: spaces, and "listified" from the template namespace. But what about other talk, category, and wikiproject pages? It'd be very odd if any of those were being transcluded into a FUE page, and really ought to be fixed if they are. Can those be blanket-removed? (I'm not sure either way about portalspace.) Incidentally, the listification could be done without a bot, it's just a matter of synching up with a db dump to ensure it's fairly recent and accurate, so if there's individuals or a cleanup project wanting to work on these... Alai 00:24, 19 April 2007 (UTC)
 * I'm not sure that we can vary the determination per namespace. The display of fair use images is not allowed outside of mainspace and that policy makes no distinction whether it be user, portal, or projectspace.
 * I've done a few thousand of these removals. The vast, vast majority (probably more than 99.7-8%) of them never required me to turn them into links. Occasionally, it does happen. Witness this diff. So, can a bot do this work since it requires a Mk 1 Human Eyeball to figure out when to remove and when to link? Yes, a bot can. My reason is this: even if the bot is wrong, and should have linked instead of removed, in the rare instances when this would happen other contributors can come along and fix it without there being much disruption to Wikipedia, the project, which is the encyclopedia...not these other namespaces. --Durin 12:58, 19 April 2007 (UTC)
 * Just for 100% clarity: are you saying that's the case for all non-article namespaces aside from template:, or all non-articles, bar none?  Alai 02:31, 20 April 2007 (UTC)
 * I think the only times I found it appropriate to link instead of remove are on user galleries of images they have uploaded or like, and on talk pages. But, it's been rare. --Durin 16:28, 20 April 2007 (UTC)
 * Here, I'm not referring to linking vs. removal, I mean with regard to fair use exemptions in template transclusions, as discussed above. Alai 17:17, 20 April 2007 (UTC)
 * Forgive me, I wasn't certain. Fair use simply isn't permitted on templates. There are ways to make a fair use image appear through the use of a template if it is transcluded to a main namespace article without it appearing anywhere else. That situation is acceptable, but there's precious few templates that do that. I'm not sure how to train the bot to find (and ignore) such cases. Or perhaps the better way is to insist that a template using a fair use image must call that image as a variable, thus preventing the need for having a fair use image even mentioned in a template. --Durin 17:38, 20 April 2007 (UTC)

I'm going to close this as expired then - BJ is free to reopen the request whenever they get back. On the discussion directly above, it would be helpful if this could continue on a Village pump or a noticeboard, so we can get a consensus/clarification. Request expired. Thanks, Martinp23 20:29, 21 April 2007 (UTC)

New discussion
I'm back so I'm reopening this. I wrote the SQL query to get all pages using fair use images outside of mainspace and there are 2339 pages with fair use images in userspace alone. My regexp skills have improved since this request was originally opened so ignore my stupid comments above. The main issue left is: are there legit uses of fair use images in non-mainspace, and if so, how should they be dealt with?
 * That's quite a few more than the 2064 I got last time I ran ImageBacklogBot - can you post the query? --uǝʌǝsʎʇɹnoɟʇs (st47) 13:50, 30 December 2007 (UTC)
 * It seems you already have a bot coded to do this, if so I uploaded the results of the query here, User:Bjweeks/Bad fair use and you can close this request. BJ Talk 22:00, 30 December 2007 (UTC)
 * Please run your bot against that list, as mine and yours certainly have separate regexen and such. --uǝʌǝsʎʇɹnoɟʇs (st47) 22:02, 30 December 2007 (UTC)
 * Run of about 50 edits done, no mistakes after I fixed a few bugs. BJ Talk 06:27, 1 January 2008 (UTC)

– Minor bug has been worked out; operation is otherwise flawless. — madman bum and angel 20:05, 6 January 2008 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.