Wikipedia:Bots/Requests for approval/ImageResizeBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.

ImageResizeBot
This is a proposal for an automatic process run by Eagle 101 to resize oversized non-free images according to NFC.

The concept is very simple. We have ~8,000 - 9,000+ (using higher thresholds, we get 5000+ to clean up) oversized non-free images. By this I mean the images are obviously too large, and we can make these images smaller by resizing them down to the size of the thumbnail on the image page. For example, let's consider Image:Logchart.jpg. It's 7,808 × 4,448 pixels, very large for an unfree image. As such, this should be resized to a smaller size. I've arbitrarily chosen the size of the thumbnail itself to be the new size; that would be 800 × 456 pixels. Quite a difference in size eh :).

The bot will do all the work of resizing the image to the correct size (a ratio from the thumbnail size), and uploading this new revised image to the encyclopedia. The only tag that I can foresee the bot needing to apply is directly to the image page itself, so that an administrator can come along and delete the old oversized revisions. This tag is fair use reduced. If we desire, we can have the bot notify the uploaders, but in this case I feel this is overkill as there will be no noticeable differences to the image in the article.

The bot will resize all images larger than 500,000 square pixels (raised from the original 360,000); this can be changed and adjusted if need be. 500,000 pixels is roughly equivalent to a 700x700 square image (360,000 was 600x600).

Again, the bot should have no effect on how existing images render in the articles themselves, just the full sized image will be reduced. As such I ask that this bot be allowed to operate at about 3 uploads per minute until the backlog is cleared. This will require about 40-50 hours of operation.
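The sizing arithmetic in the proposal can be sketched roughly as follows. This is a Python illustration, not the bot's Perl source (which is not shown in this discussion). The function names are hypothetical, and the 800 × 600 bounding box is an assumption taken from the preview size mentioned later in the discussion.

```python
# Illustrative sketch only; not the bot's actual Perl source.
# All function names and default values here are assumptions.


def needs_resize(width, height, area_threshold=500_000):
    """Return True when the pixel area exceeds the proposed threshold."""
    return width * height > area_threshold


def thumbnail_size(width, height, box_w=800, box_h=600):
    """Largest size with the same aspect ratio that fits inside the box.

    The 800x600 box is an assumed default; images that already fit are
    returned unchanged (never upscaled).
    """
    if width <= box_w and height <= box_h:
        return width, height
    if width * box_h >= height * box_w:
        # Width is the limiting dimension.
        return box_w, max(1, round(height * box_w / width))
    # Height is the limiting dimension.
    return max(1, round(width * box_h / height)), box_h


def estimated_hours(image_count, uploads_per_minute=3):
    """Rough run time at a fixed upload rate."""
    return image_count / uploads_per_minute / 60


if __name__ == "__main__":
    print(needs_resize(7808, 4448))
    print(thumbnail_size(7808, 4448))
    print(estimated_hours(9000))
```

For Image:Logchart.jpg (7,808 × 4,448) this gives 800 × 456, matching the figure in the proposal, and 8,000 to 9,000 images at 3 uploads per minute works out to roughly 44 to 50 hours.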

I do request that as soon as we agree this task is desired, even if we don't agree to all the parameters, that I be allowed to test the bot on a few fake images. :) In addition, I'll have the perl source released as soon as I finish basic testing (making sure we can upload etc). ——  Eagle 101 Need help? 18:57, 17 March 2008 (UTC)


 * You guys can have a look at what images are affected by taking your browsers to User:ImageResizeBot/List1; just a warning, it's long. The number next to each image link is the image's current size. ——  Eagle 101 Need help? 20:21, 17 March 2008 (UTC)

Todo
Just a small todo section to help us keep track of what needs to be done, technically before we can make this thing live.
 * Category:Image_copyright_tags and Category:Non-free image copyright tags should not intersect, except for the following: Template:Dated dfu, Template:Deletable image, Template:Di-disputed fair use rationale, Template:Di-no fair use rationale, Template talk:Di-no fair use rationale, Template:Di-no license, Template:Di-no source, Template:Di-orphaned fair use, Template:Di-replaceable fair use, Template:Di-replaceable fair use disputed, Template:Di-replaced fair use, Template:Don't know.
 * 1) Test animated GIF issue. Approved, just need a free image. The earlier plan was to ignore all images of mime-type 'image/x-gif', but even easier, it works out of the box :). See http://en.wikipedia.org/wiki/Image:ImageResizeBotTestImage.gif. I know it's a square, but I did not try to make it proportional. This is just a PoC. ——  Eagle 101 Need help? 05:25, 18 March 2008 (UTC)
 * 2) Make use of User:MBisanz/FURD to replace.
 * 3) Create and make use of a tag category? to allow exceptions in cases where Wikipedia needs a larger non-free image. Can someone other than me make this tag? Thanks.

Discussion

 * Sounds like a great idea. What program / software will be used to do the reductions? --MZMcBride (talk) 19:23, 17 March 2008 (UTC)
 * Perl + ImageMagick. ——  Eagle 101 Need help? 19:24, 17 March 2008 (UTC)
 * I'd say that you should not apply furd. First, a human should compare the images to make sure that no technical bug caused a significant change to the image as displayed.  Second, taking your example image above, the original uploader already made an editorial decision to use only a fragment of the true original for the image.  That editorial decision is a creative act, so it doesn't seem appropriate to eliminate that authorship history.  Is there already a better template that indicates that human evaluation is required?  If not, I suggest creating one and using it instead.  I think the main thrust of the bot task is reasonable.  GRBerry 19:29, 17 March 2008 (UTC)
 * OK, that's reasonable. I would hope the admin would confirm with the furd, but I can see your point. If someone wants to make an alternate template, please do so, as I suck at templates. XD As far as the reduction goes, that image is clearly too large. We can easily reduce the image size to no effect at all on the article proper. Unless I'm missing something of course. :) Edit: I should be a little clearer, the program's reduction of the image size in this case will not affect the article at all, see Logarithm, look to the right. ——  Eagle 101 Need help? 19:33, 17 March 2008 (UTC)
 * Ok, replying to self, I think I picked up why removing the original is bad. We lose the authorship history. A good way to prevent this would be to perhaps have the bot put those details in the description. Is this a possible answer to this concern? Sorry that it took me a second to realize what you were getting at ;). ——  Eagle 101 Need help? 19:41, 17 March 2008 (UTC)
 * That is indeed the thrust of my second reason, and that solution would satisfy the need to preserve authorship data. We still need a human to compare the two images prior to deletion; just to make sure the bot didn't change the image.  Imagine a technical bug that processed a run one file off; so every small image got uploaded on the page for the prior/later image in the run.  No article pages would have been updated - but all the images would be wrong.  Or a Perl bug that fed the wrong parameters to ImageMagick and the images came out rotated, or ...  Those are the class of bugs I'm still worried about having a human check for prior to deletion of the original.  GRBerry 19:50, 17 March 2008 (UTC)
 * Ok, sounds good, we need a new template. Does someone want to offer their skills? It would be greatly appreciated. As far as that bug you mention, it's not possible due to the way things are processed :). However before this does anything serious, as in 1,000's of changes, the source code will be made public under the GPL. ——  Eagle 101 Need help? 19:54, 17 March 2008 (UTC)
 * I don't see any reason why we'd need a huge image, but I would suggest creating a template that you make known to existing users to include on an image page as part of the rationale of why they need an image that large for non-free works. (Technically, this would help us further machine-readable-ize our NFCs, by saying that any image over a certain number of pixels will have the potential to be reduced unless this template, which should include a more descriptive reason for the large size, is included).  Then the bot should ignore such images, though I'd recommend that any image that is tagged as such should be looked at closely by an admin and evaluated if the rationale is sane.  --M ASEM  19:39, 17 March 2008 (UTC)
 * Yes, that is a possibility. However I'm hoping to stick to images that are large enough that there really is no reason for a larger image. Remember fair use images only need to be good enough for the article, we don't have to have a higher resolution available. As far as I understand, it's better that we don't. If you can elaborate, please do. I also welcome someone actually creating this template and linking that template :). ——  Eagle 101 Need help? 19:43, 17 March 2008 (UTC)
 * There is a Low_resolution parameter in many existing FUR templates that allows an author to input their reasoning. I am not sure creating another template is the solution here (simply because it would have to be put into common use, which would take a while). I am not sure 600 x 600 is large enough to be definitely too big, myself. - AWeenieMan (talk) 19:50, 17 March 2008 (UTC)
 * Please see my response below, if that does not address what you are getting at please re-ask below... trying to keep things somewhat organized XD. Thanks! ——  Eagle 101 Need help? 19:56, 17 March 2008 (UTC)
 * (after many edit conflicts) I am not convinced doing this without an editor request to shrink the image beforehand is a good idea. What about a large non-free image with a very specific rationale that explains why that level of detail is necessary (I could imagine there are pictures over 600 x 600 that include such a rationale)? Are you prepared to dissect that information in the multitude of ways it could be presented (templates, non templated, etc)? I would be more comfortable with you creating a template (or using a variation of ) that would basically have a human say, this needs to be reduced (maybe even input a size) and have a bot just do the dirty work (you could also add the when you are done, because an admin will have to visit the page to delete the image anyway). Also, what size are you planning on uploading the images at exactly (are you always using the thumbnail size)? For example, I would reckon that you could scale down any album cover to say 400px by 400px (probably even smaller). But if I am reading your proposal correctly, they would most likely end up at varying sizes. Also, is there a reason you have chosen not to use the standard BRFA template? I think the bot task has a lot of merit, I just think some of the details should be hammered out a little more before letting it loose.
 * I did not use the BRFA template, because I felt that writing it as prose was better. Wikipedia is not a bureaucracy. If someone insists on the template, they may go ahead and add it. I figured there would be loose ends, hence the reason why I've cross posted this to 3 different locations :). As far as the size issues, the idea is to nab those that are way too large. If you can show me one example of an image that is greater than 360,000 pixels in size, that should not be reduced, we can consider upping the threshold. If you would like, I can place a list of all affected images to a subpage of the bot's userspace if this would assist you and others in evaluating the task at hand. ——  Eagle 101 Need help? 19:52, 17 March 2008 (UTC)
 * I only asked about the BRFA template out of curiosity. I don't really care, myself (as I have read your prose and gathered most of the answers for myself). The template simply makes it easier to grab some of the facts quickly. I would be interested to see a list of affected images, but I am not going to lie to you, I have no intention of going through them all. Out of curiosity, how many are there in total? My only concern is that some editor may have felt the need to have a larger image (even explained why), and they may be upset to find a bot reduced it without discussion. - AWeenieMan (talk) 20:06, 17 March 2008 (UTC)
 * To my other point, I am just not sure thumbnail size is the optimal size to choose. It may be a good default setting, but I think there may be some other heuristics for specific image types (determined by the licensing template) that might be useful. - AWeenieMan (talk) 20:17, 17 March 2008 (UTC)
 * The requested list is at User:ImageResizeBot/List1. Its long, but all the information is there. The image that would be resized, and the image's current size. ——  Eagle 101 Need help? 20:20, 17 March 2008 (UTC)
 * Excellent, thank you. Alright, well, there does appear to be many ridiculously sized images on that list, so I definitely see the need. (Asking new question below). - AWeenieMan (talk) 21:31, 17 March 2008 (UTC)
 * (ec) Musing on the image size issue. I'm uncertain how the Mediawiki software handles image sizing for display on monitors. But I know that available monitor resolutions are growing over time.  Will there come a point where monitor resolutions will have grown to a size that a 600x600 image is a standard size thumbnail? If so, we'll want large images again then.  I know current top of the line digital monitors display by default in a resolution that is a multiple of the best I can achieve on my home CRT monitor, and it gets a multiple of the monitor resolutions available when I started using computers (an Apple II).  On the other hand, I don't know of a Moore's Law for monitor resolutions, but our article says it applies to digital camera resolutions.  GRBerry 20:12, 17 March 2008 (UTC)
 * Well... you have to remember we are dealing with non-free images here. By that I mean, the goal is not super high resolution here, just what is currently needed. :S Discuss :) There is a reason we are asked to keep non-free images small, we really don't need to display much larger than the size required by the article currently. P.S. I can explain to you how mediawiki handles it, but I don't want to go too far off topic, to put it short, it stores thumbnails that are displayed on the article, and stores a full sized version if someone clicks the image. For non-free images we only need the thumbnail. These are usually kept fairly small due to bandwidth usage. You really don't want to load 10 1MB images to read an article :) ——  Eagle 101 Need help? 20:23, 17 March 2008 (UTC)


 * Question: What do you think about an image such as Image:Cannibalised.jpg? As it stands, it is just above your mark (600 × 601) (I agree, it's much too big in its current state), but shrinking it to thumbnail size makes it just below your mark (599 × 600), which seems like a minor difference. And then there are images like Image:Aroundtheworld.jpg where the thumbnail seems to be the full size of the image. These are really just test cases to me, as I am just wondering how you plan on handling them (not an argument to ignore them in any way). - AWeenieMan (talk) 21:31, 17 March 2008 (UTC)
 * There is probably some marginal difference in size where the costs exceed the minimal benefits. That particular example would shrink 1 pixel in each dimension, and that is a 0.33% change in total pixel count.  If the images are to be reviewed for deletion of older revisions, that marginal difference that is appropriate is higher than otherwise.  Does the software to be used indicate any level of noise/deterioration that it might produce by resizing?  I'll be shocked if it does (certainly not in the marketing materials) but if so that could inform a benchmark.  What are the other costs - data storage of additional versions (deleted versions presumably remaining in the deleted history), download/upload bandwidth, bot edits in history, possible human review...  Being over the 360K limit by 25% would when shrunk to 360K pixels shrink each dimension 10.56%.  Being over by 10% would when shrunk to 360K pixels shrink each dimension 4.65%. (1 - 1/sqrt(1+%over360K)) So 10% to 25% pixel count margins seem plausible to me - and the bot could flag these as "current version too large, please shrink to X by Y when the image is edited for another reason".    GRBerry 22:14, 17 March 2008 (UTC)
 * (ec) Well seeing those, I'll probably increase the minimal size to 400,000 square pixels. Basically that puts a bit of leeway, and gets rid of the problems you mention. Now, the idea from here would be to do the resizing of the obvious cases, get those down to at least thumbnail size. As far as the costs, this bot is actually operating from the wikimedia toolservers, so all the queries are direct database queries, as far as getting that list. The actual resize requires that the bot download the image (to memory, not to hard disk), change the size, and upload the new image size. This is all done inside of RAM. The point at this stage is really to get the obvious violators, not split hairs, I did not realize that the smaller ones would result in a 1 pixel change. ——  Eagle 101 Need help? 23:20, 17 March 2008 (UTC)
 * Well, I fear that you might eliminate a lot of the issues, but you will still have borderline cases. For example, Image:ChoosecologotypeTM.jpg will be resized to 410k (and Image:ShereKhanJBSM.jpg will be resized minimally). This is all a factor of the size of the thumbnail being as large as possible to fit inside an 800x600 box (this is configurable in preferences, too, so there are multiple thumbnail sizes on the server, it would seem). I see two potential ways to handle this. One would be to only work with images above size X (eg 400k) but make sure to size them below size Y (eg 360k). The other would be to just define the size you are going to make them (i.e. define a maximum dimension). Am I missing the reason that you are tying the new size to the thumbnail? - AWeenieMan (talk) 23:53, 17 March 2008 (UTC)
 * Not a very large reason, no, but it does give us a baseline for what can reasonably be considered a thumbnail. I do understand there are going to be a few edge cases. Not everything will be able to be machine resized; that was not and never will be the goal. However both images can use some resizing, especially the second one you mentioned. I'm starting to think a broad first pass on the larger cases will do us best. Perhaps we should start with images with areas larger than 500,000 pixels. At least get those down to an acceptable size, then work on a category by category basis getting things down to an acceptable size. All this will have to be debated as to what the max size is, and there will be exceptions to any rule we put up; these will have to be checked by admins before they delete the old revision containing the larger image. It's really easy to undo a resize, I think non-admins can undo the bot as well. We just need to come up with a way that is acceptable to everyone as to how to mark things as an exception to the general rules. Using 500,000 as the minimum size, we still have 5,000+ images to resize. ——  Eagle 101 Need help? 00:07, 18 March 2008 (UTC)
 * Sounds like a good idea. Now that you are well above 480k, you should be able to size to thumbnails and eliminate any really close edge cases. And yes, any user can simply revert the resize, so that's not a big deal if the error rate is acceptably low. - AWeenieMan (talk) 00:14, 18 March 2008 (UTC)
 * From User:David Shankbone, I understand that deleting the oversize version doesn't actually save any space on the servers, since the oversize version is still kept around (like a deleted article). If this theory is correct, has it been factored into the calculation of benefits? EdJohnston (talk) 23:16, 17 March 2008 (UTC)
 * (ec x2)Correct, we know that, the point is to remove the high resolution version of the image. Using our non-free content policies means we use the smallest version we can use. ——  Eagle 101 Need help? 23:20, 17 March 2008 (UTC)
 * Thinking this through, if the bot's upload summary or other note had "shrunk from X1*Y1 to X2*Y2" in it, then even if the original is deleted ordinary editors could know that a more detailed one is in the deleted history and what size it was, should things ever get to the point we want it back. Things can get there two ways - the image falling out of copyright due to timing or release by the copyright holder and monitor resolutions obeying some form of Moore's law.  If an ordinary editor can know, then when appropriate they can also make the needed undeletion request.  GRBerry 03:19, 19 March 2008 (UTC)
 * It is probably easier to just put the full information somewhere on the description page, or if we can fit it in the summary, we can place the name of the original uploader, original size, and date of first upload. That may suffice, and would be easier on me to actually program ;). Lets discuss this below, finding this point in the discussion is difficult. Would there be any concerns if we did it this way? ——  Eagle 101 Need help? 14:28, 19 March 2008 (UTC)
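The margin arithmetic and the two-threshold idea raised in this thread can be sketched as below. This is a hedged Python illustration, not the bot's code: the function names are hypothetical, the 400k trigger and 360k target are the example values floated above, and the summary wording is only one possible form of the "shrunk from X1*Y1 to X2*Y2" note.

```python
import math

# Illustrative sketch only; names and defaults are assumptions.


def linear_shrink_fraction(over_fraction):
    """Per-dimension shrink needed to return to the limit.

    Implements 1 - 1/sqrt(1 + over), where `over` is how far the pixel
    count exceeds the limit (0.25 means 25% over).
    """
    return 1 - 1 / math.sqrt(1 + over_fraction)


def scale_to_area(width, height, target_area):
    """Scale both dimensions by one factor so the area is at most target_area."""
    scale = math.sqrt(target_area / (width * height))
    # Round down so the result never exceeds the target area.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def plan_resize(width, height, trigger_area=400_000, target_area=360_000):
    """Only touch images above the trigger; size them to the lower target.

    Returns None for images at or below the trigger, which avoids
    one-pixel resizes of borderline images.
    """
    if width * height <= trigger_area:
        return None
    return scale_to_area(width, height, target_area)


def upload_summary(old_size, new_size):
    """One possible wording for the upload summary (hypothetical)."""
    return f"Shrunk from {old_size[0]}x{old_size[1]} to {new_size[0]}x{new_size[1]}"


if __name__ == "__main__":
    print(linear_shrink_fraction(0.25))
    print(plan_resize(600, 601))
    print(upload_summary((7808, 4448), (800, 456)))
```

With a gap between the trigger and the target, a 600 × 601 image is simply left alone, while anything that is processed shrinks by a visible amount.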

Logos
I've recently discovered that when the Cat:Logos was converted to Cat:Non-Free Logos, the appropriate counter-cat Cat:Free Logos, for simple geometric shapes and words that can't be copyrighted, was not created. One estimate is that about 10% of the 70,000 logo images are actually free. Is there some way you could exclude this cat at first while I try and figure out how to sort and re-tag the logos? Could you give me an intersection of Large Images and Non-Free Logos to see how big an issue this is?  MBisanz  talk 23:18, 17 March 2008 (UTC)
 * About 800 logos will be affected, you can see the full list at User:ImageResizeBot/List2. If needed we can delay actions on this category if what you mention is a real concern. Check the list and see if anything in there should not be resized because the image is possibly free. I appreciate anything you can do to assist. ——  Eagle 101 Need help? 23:31, 17 March 2008 (UTC)
 * Bah! Image:Five.png. :S How long do ya think it will take to sort this category out? At least the 800 items I've listed? I'm willing to delay actions on this category, provided that folks fix the issues with the category so we can do it at a later time. Cheers! ——  Eagle 101 Need help? 23:40, 17 March 2008 (UTC)


 * Went through the list. These 10 are the only ones I'd say could be free, and even then shrinking wouldn't destroy them. Given that this would be a rate of error of .01%, I'm not concerned.


 * How would the bot handle images with 2 license tags. Like a Non-free historic image and an OTRS permission?  MBisanz  talk 23:57, 17 March 2008 (UTC)
 * How should it handle it? ——  Eagle 101 Need help? 00:08, 18 March 2008 (UTC)
 * P.S. Lest someone else get fooled by your extremely small percentage, the correct error rate here is 1.25%. (10/800). Is that acceptable to folks? Remember this bot will be fairly easy to undo. :) ——  Eagle 101 Need help? 00:10, 18 March 2008 (UTC)
 * I took it as a % of 70,000, but that's probably my logo-centric view. I'd say it should shunt multi-license/conflicting license images to some holding list for human review.  A lot of times it's that the OTRS guy forgot to remove the non-free code or that someone came along and slapped a non-free tag on something like a NASA free image.  But that's just my view.  MBisanz  talk 00:14, 18 March 2008 (UTC)
 * Well... things like that need looking at and fixed, no? Someone/something is going to have to work on making that determination, and I have a feeling that a bot won't do that. :) ——  Eagle 101 Need help? 01:18, 18 March 2008 (UTC)
 * If we had an idea of what the conflicting licenses might be, we could probably just run an intersection and fix them without involving the bot. If the bot does stumble across one, I think logging and ignoring is probably the safest bet. - AWeenieMan (talk) 01:27, 18 March 2008 (UTC)
 * Well, give me an idea of what I need to skip and log? I'm sorry I'm not totally uptodate with all the tags, etc. My forte is programming :) ——  Eagle 101 Need help? 01:37, 18 March 2008 (UTC)

(undent) I would say take a look here. If ever you find two tags from different sections, skip it and log it. If it finds two non-free tags (which happens a lot for some reason), it should proceed. Basically, if it sees any evidence of a free tag, it should skip it and log it. I would have to look into the OTRS templates (I cannot find a good list), but here are a couple:,. I would say if you come across any image with an OTRS template, to log it.
 * Alright, I'll need a full listing of templates that indicate for me to skip. I would prefer someone familiar with this area to give me a list, rather than me inventing one. If I invent one, I'll probably be wrong in several cases, and cause unneeded drama ;). Feel free to edit in the bot's userspace and create a list of templates to skip on. I'll go from there as far as this issue. ——  Eagle 101 Need help? 02:02, 18 March 2008 (UTC)
 * Ideally nothing in Category:Image_copyright_tags should cross with Category:Non-free image copyright tags outside of Template:Dated dfu, Template:Deletable image, Template:Di-disputed fair use rationale, Template:Di-no fair use rationale, Template talk:Di-no fair use rationale, Template:Di-no license, Template:Di-no source, Template:Di-orphaned fair use, Template:Di-replaceable fair use, Template:Di-replaceable fair use disputed, Template:Di-replaced fair use, Template:Don't know, which should intersect on a temporary basis.  MBisanz  talk 03:24, 18 March 2008 (UTC)
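The skip-and-log rule discussed in this section could look roughly like the sketch below. It is a Python illustration under stated assumptions: the tag sets are short placeholders rather than the real category contents, the function name and return values are hypothetical, and treating an image with no recognised non-free tag as "skip and log" is a conservative guess, not something settled in the discussion.

```python
# Placeholder tag sets for illustration only. The real lists would come
# from the copyright-tag categories and would be much longer.
NON_FREE_TAGS = {"Non-free logo", "Non-free album cover", "Non-free historic image"}
FREE_TAGS = {"PD-self", "GFDL", "PD-USGov-NASA"}
OTRS_TAGS = {"PermissionOTRS"}


def decide(tags):
    """Return (action, reason) for the set of templates on an image page.

    action is "resize" or "skip_and_log".
    """
    tags = set(tags)
    if tags & OTRS_TAGS:
        return "skip_and_log", "OTRS template present"
    if tags & FREE_TAGS:
        return "skip_and_log", "evidence of a free license"
    if tags & NON_FREE_TAGS:
        # Two or more non-free tags are common and are fine to process.
        return "resize", "only non-free tags found"
    # Assumption: anything unrecognised goes to the human-review log.
    return "skip_and_log", "no recognised non-free tag"


if __name__ == "__main__":
    print(decide({"Non-free logo", "Non-free album cover"}))
    print(decide({"Non-free historic image", "PermissionOTRS"}))
```

Keeping the reason alongside the action means the log of skipped images can double as the holding list for human review.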

Where to start
I figured a new section might make things easier to follow. I am wondering what people think the proper place to start would be. Obviously working off of the list posted, but I am not sure blindly shrinking them in one run is the way to go. Ultimately, I think we will need to find some consensus for the size of images that fall into several categories (if not a consensus for the smallest size, a consensus for an acceptable size). Starting with just the biggest images might mean they should be resized again. I am thinking it might be best to take a section of the large image list (something fairly simple like album covers or movie posters) and just work on determining a good size for them (just taking the ones off your list for now). Doing it this way, you can ignore edge cases really (as the new size will be below your original mark of 600x600). I think this approach might be easiest to follow in the end. Thoughts? - AWeenieMan (talk) 01:25, 18 March 2008 (UTC)
 * That is a possible approach, and is something the community needs to figure out. With the number of images we are talking about, I don't think resizing twice will be an issue, however this bot still needs to demonstrate its technical merits at some point in the future. Ie, I need approval to do testing on free (test) images to make sure the bot will behave as advertised. Could bag look into approving me to run this bot on test images only, as many as I need to do until I'm satisfied it works? (I've done plenty of programs for wikipedia before, so I'm not new at this, I know where my bounds are :)). The approval for testing on images is just so that we are clear that the bot is running and that I am debugging it on images that don't have any real value to the encyclopedia.
 * As far as the community discussion goes, we really really need to get a handle on this. It won't take long to actually implement any resizings, but the discussion and debate beforehand is going to take time. Probably more than the time that the bot will actually operate ;). As this is a touchy situation, I'm going to ask BAG to be extremely clear with what the bot is and is not allowed to do. Let's do this on a task by task basis. In this way, we can hopefully avoid misunderstandings like betacommandbot has been having.
 * We know that we have to reduce the images, but how much? Ideally (in my view) the images should be no larger than the thumbnail required to display them in the article. We need to figure out what that size is for each class of images. This discussion can possibly happen apart from this BRFA. Perhaps on one of the image policy pages? Get everyone interested involved, but please try not to drag it out, find things we can agree to and let's get moving. What is the average size of a logo? a cd cover? book cover? etc. We want to enforce a maximum threshold unless the uploader gives a reason for a larger image than the max. (perhaps a special tag or field in the fur, that the bot can detect). ——  Eagle 101 Need help? 01:36, 18 March 2008 (UTC)
 * Try this out on test images and make only five decent resizings. Please report your results when done. — E  talk 03:02, 18 March 2008 (UTC)
 * Results of the trial are here, I don't see a need for further testing unless we are testing edge cases as mentioned below. ——  Eagle 101 Need help? 04:10, 18 March 2008 (UTC)

Moving GIFs
How are moving GIFs handled by the Bot? What sort of JPEG artifact issues are there? 600 px on the longest side seems like a good enough reduction for non-free purposes. It scales well even to the largest monitors, but wouldn't be, IMO, an infringement on non-wiki uses.  MBisanz  talk 03:29, 18 March 2008 (UTC)
 * This works properly. ——  Eagle 101 Need help? 05:27, 18 March 2008 (UTC)
 * That may be a problem, do we have a test case? As an aside, BAG, please grant permission to test this edge case. Thanks ——  Eagle 101 Need help? 04:15, 18 March 2008 (UTC)
 * Go for it!  SQL Query me!  04:20, 18 March 2008 (UTC)
 * Here is a moving GIF Image:HillstonCSLogo.gif. Working on finding a good JPEG  MBisanz  talk 04:23, 18 March 2008 (UTC)
 * If a JPG were to artifact, these would be good examples Image:Flowerscakes.JPG, Image:Dkpposter.jpg, Image:Newponllogo.jpg, Image:Marshallhs.jpg.  MBisanz  talk 04:25, 18 March 2008 (UTC)
 * Alright, lets test both cases, but let us start with the former. BAG, please approve the latter :) ——  Eagle 101 Need help? 04:29, 18 March 2008 (UTC)
 * I hate to nit-pick, but can we have a free image to test, I really don't want to be messing with non-free images as far as testing. ——  Eagle 101 Need help? 04:30, 18 March 2008 (UTC)
 * How about if I just reupload them under another name? I'll start looking for a moving GIF though at Feature Pics.  MBisanz  talk 04:31, 18 March 2008 (UTC)
 * Here's your animated GIF Image:Cicada molting animated-2.gif  MBisanz  talk 04:32, 18 March 2008 (UTC)
 * Here are your free JPEGs Image:Biohazard.jpg, Image:GNTlogo.jpg, Image:Grid1.jpg.  MBisanz  talk 04:39, 18 March 2008 (UTC)
 * You're free to work on moving images of any format, if it helps :P SQL Query me!  04:57, 18 March 2008 (UTC)
 * Ok, by the looks of it I can just ignore images of selected mime types. We will start by ignoring images that are of 'image/x-gif', animated gifs. Update: it works out of the box :) ——  Eagle 101 Need help? 05:17, 18 March 2008 (UTC)
 * As far as jpeg's, see http://en.wikipedia.org/wiki/Image:ImageResizeBotTestImage.jpg. Thoughts? ——  Eagle 101 Need help? 05:22, 18 March 2008 (UTC)
 * .JPG images seem to be alright per the test images. — E  talk 06:31, 18 March 2008 (UTC)
 * It's alright. I'm seeing some artifacting on the smaller version.  This conversion tool you're using, can it automatically convert a file from, say, JPG to PNG/SVG?  If so that might be interesting.  If not, the loss in quality isn't enough to object to this bot.  MBisanz  talk 06:53, 18 March 2008 (UTC)
 * I'm using ImageMagick, more or less the same thing that mediawiki does for image resizing. ——  Eagle 101 Need help? 07:09, 18 March 2008 (UTC)
 * Ok, I see no problems with the bot. Just remember to add the date variable to the template or else it kicks an error.  MBisanz  talk 07:12, 18 March 2008 (UTC)
 * Mmm, alright, I'm working on that as we speak, but I'm tired, so I'm probably going to hit the sack before I finish. There are some things I need to do before we can really trial the bot on real images. What I would like to see is discussion on what to do with the various categories. We can treat each category differently. ——  Eagle 101 Need help? 07:18, 18 March 2008 (UTC)

Resizing
I recently saw a page on commons (I don't recall the page name) describing a known issue that there is a problem with ImageMagick outputting massive 48-bit RGB images when resizing black-and-white or grayscale(?) images, and that this has affected file sizes on thumbnails. Can you flag cases where the resized images are larger in byte size than the originals for manual intervention, in case this also affects your bot? —Random832 03:48, 18 March 2008 (UTC)
 * We could check this, if you can give me a specific test case I'd be very happy. For anyone interested, the test trial went as expected. I don't need to do any more edits until we begin trialing the actual bot itself. I still have to test applying a tag to the image for review of the resize. I'll need a bot specific replacement for, one that emphasizes checking that the bot was correct. ——  Eagle 101 Need help? 04:09, 18 March 2008 (UTC)
 * Do you still want it to say "Administrators: if the previous version(s) did not satisfy the non-free content criteria, please delete them on"? I'm recoding a version for you now.  MBisanz  talk 04:16, 18 March 2008 (UTC)
 * Yes, however it should emphasize that this is a bot doing the resizing, and that it may be appropriate to undo the bot. However please place a portion where it asks those that undo the bot to please leave a message on its talk page or here. (Otherwise I won't know something is wrong). ——  Eagle 101 Need help? 04:21, 18 March 2008 (UTC)
 * Image:ImageResizeBotTestImage.jpg seemed to shrink too much.  MBisanz  talk 04:19, 18 March 2008 (UTC)
 * Correct, but I was just testing all the facilities of the bot with wikipedia. That was not a true test run, more like just making sure all the parts work :). We will probably do a trial once we figure out where to start. As in trial on 10 images and see how it handles. ——  Eagle 101 Need help? 04:21, 18 March 2008 (UTC)
 * Ok, in that case, the trial seems to have gone fine.  MBisanz  talk 04:26, 18 March 2008 (UTC)
 * Here is your template User:MBisanz/FURD.  MBisanz  talk 04:30, 18 March 2008 (UTC)
 * Great!, we will subst that :) I'll add that to the todo list :) ——  Eagle 101 Need help? 04:32, 18 March 2008 (UTC)
 * Even though it's in my userspace, feel free to edit it as things develop.  MBisanz  talk 06:55, 18 March 2008 (UTC)
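
Random832's request above (flag resized files that come out larger in bytes than the original) can be sketched as a simple comparison. This is a minimal illustration only, not the bot's actual Perl code; the function name and the sample file names and sizes are hypothetical.

```python
def flag_if_larger(original_bytes: int, resized_bytes: int) -> bool:
    """Return True when the resized file is larger in bytes than the original."""
    if original_bytes < 0 or resized_bytes < 0:
        raise ValueError("byte sizes must be non-negative")
    return resized_bytes > original_bytes


# Hypothetical (name -> (bytes before, bytes after)) pairs, for illustration only.
candidates = {
    "Example_A.png": (120_000, 95_000),
    "Example_B.png": (40_000, 310_000),
}

# Files that grew after resizing are queued for manual intervention.
needs_review = sorted(
    name for name, (before, after) in candidates.items()
    if flag_if_larger(before, after)
)
```

Files in `needs_review` would be left for a human rather than uploaded automatically.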

Allowing the bot to detect exceptions
There are plenty of exceptions to the resolution limit. Eagle101 said "We want to enforce a maximum threshold unless the uploader gives a reason for a larger image then the max. (perhaps a special tag or field in the fur, that the bot can detect)" - I would like to see such a tag implemented before the bot goes ahead. How difficult would this be, and could the bot leave a message saying how to use this tag? The tag should also put the "exceptions" in a category for human review. This was discussed here. It would also be a good idea to get the views of people who regularly deal with image resizing requests. They will have a good feel for how many exceptions there are and what examples are out there. Carcharoth (talk) 13:35, 18 March 2008 (UTC)
 * Well, hopefully someone experienced with the issue, can create a tag to apply. We can just have the bot place notice of the exception tag when it resizes an image in the custom FUR tag. Users can always revert the bot and put the exception tag on with a reason. This does not require an admin to do. However, we should be careful about what reasons are considered valid. The only one I can think of right now is "The image needs to be larger size because article X uses it in that larger size". If a resizing does not affect any articles, there really is no need to have a larger version. ——  Eagle 101 Need help? 23:17, 18 March 2008 (UTC)
 * Just a note, I've added this to the todo list. This bot won't trial without this in place. However, folks need to come to an agreement on what is a legit reason for having a large image. I'm still struggling to think of something other than "an article is currently using a larger size, and the article's use of this larger size does not conflict with non-free image policy". ——  Eagle 101 Need help? 00:07, 19 March 2008 (UTC)


 * I mentioned this above briefly, but when you all say "tag" do you mean yet another banner on the image page? I would prefer to see something that works within the current FUR template design. There is a Low_resolution option (and yes, it is misused, a lot) but it exists in most FUR templates. This allows for an explanation to be given (all the bot would need to look for is the word "Yes" to know it is fair game to resize, I would also say if it wasn't filled out it's fair game). A little cleanup effort would then be to change the templates (there aren't that many of them) to 1) make it a required field and 2) make it auto populate a category with all that were filled out and not equal to "Yes" (for manual review). Using a brand new tag approach would take a long time to get going and might make things even more confusing. Using existing parameters already gives us an established starting point (and has the benefit that the explanation will fall in the FUR). - AWeenieMan (talk) 01:17, 19 March 2008 (UTC)
 * That is what I was thinking of. Use the existing template and parameters. It is confusing because people think of the rationale as NFCC#10c, but in fact a good rationale template will cover more than just that criterion. In this case, the "low resolution" parameter covers NFCC#3b. Maybe someone could analyse the current usages of the template and see what sort of wordings are used. For example Image:American jewish history.gif, which uses Template:Non-free image data, has "Sufficient resolution for illustration, but considerably lower resolution than original.". So it is not as simple as "Yes/No". Carcharoth (talk) 02:10, 19 March 2008 (UTC)
 * Alright, my preference is something that is not so commonplace as the low-resolution parameter. What I wish to do with this bot task is eliminate obviously high resolution non-free images. Have a look at the latter 3/4ths of that list I posted at the start of this and you should notice very large fair use images. Those are the ones I wish to work on. In a future task, I'll be making a program where trusted users can add a template like, or 90% to indicate either reduce to specified params, or reduce to 90% of the existing image size. In this way humans will be able to do resizings without so much effort, and it will allow humans to do the areas where there are edge cases. In other words, the point here is not to robotically set an average limit, but to set a high limit, one which if you need to exceed it, you should explain why using a special tag or a new parameter in the templates. The problem with the existing low resolution parameter is that it's not uniform or easy to machine parse. There are plenty of ambiguities there, and that is not something I wish to get into, as it will cause far more consternation and confusion than need be. I wish to restrict the potential for bugs in the operation of the bot to the absolute minimum. Am I making sense here? ——  Eagle 101 Need help? 03:07, 19 March 2008 (UTC)
 * To emphasize my point, we currently have 2,084 non-free images larger than 1000x1000, i.e., their area is larger than 1,000,000 pixels. I really doubt any of these can claim to be low resolution as required by our fair use policy. I'm looking at setting the high point for the bot at 500,000 pixels area. This covers about 5,645 images. If any of those images need to have that size, this tag or parameter should be added. I'm talking about less than 2% of our non-free images being affected. Would anyone like me to provide a list of the images larger than 500,000 square pixels for their review? My understanding here is that none of our non-free images should be larger than this, and if there is a good reason, then an exception can be made. ——  Eagle 101 Need help? 03:21, 19 March 2008 (UTC)


 * People have a near comical notion of low resolution for non-free images sometimes. I am running a query on images greater than 360k pixels that include a "Low_resolution" parameter. I think we will have a much better idea of the use of this parameter when I post the results. Upon a quick look at what has come through so far, many of these large images just say "Yes" and many say "No" (how has no one thought to reduce those ones before). Eagle, I get your point of wanting to go after giant unthinkable ones. Personally, I can not think of an instance where they would ever be necessary. Unfortunately, I fear that if you hit 1 image that has such a reason (maybe even it just needed to be discussed), you could be opening a can of worms. And personally, I am just not sure adding another thing to the image page is ideal here (as in a specialized tag), as an explanation for why the image is large really belongs in the FUR. - AWeenieMan (talk) 03:42, 19 March 2008 (UTC)
 * You are right it belongs in the FUR, however bots can't parse the FUR, at least not with any accuracy. There needs to be something on that image page that wards this bot off, and that something needs to be something I can reliably parse 100% of the time. :S This can be a new template parameter, or its own small template, or even just a new category, presuming the reason really is in the FUR. I just need something I can parse reliably. ——  Eagle 101 Need help? 04:22, 19 March 2008 (UTC)
 * Here's the list (my apologies for the handful of errors, I will URL encode it next time). It should give us a starting point to figure out 1) if the Low_resolution parameter is being abused enough to warrant using a different method of tagging acceptable large images and 2) if it looks like editors are giving any convincing arguments for keeping a high res image (note: this only covers images using the common fur templates, untemplated furs are acceptable also). - AWeenieMan (talk) 04:04, 19 March 2008 (UTC)
 * Can you paste your toolserver query for me? I'd like to play with that, namely at different levels of resolution. ——  Eagle 101 Need help? 04:22, 19 March 2008 (UTC)
 * Unfortunately it is a bit more than just a query (I used wikiproxy to get the page text). The script is here. It should be world readable on the toolserver, so feel free to copy it and play around (a warning, it does take a little time to run because of wikiproxy). I should also note that this is not ALL the high res images using the Low_resolution parameter. I used the abbreviated page text in the database as a starting point. It would be easy to alter to get them all (but that involves running wikiproxy on every image on your list). Doable, it would just take a long time (if any one really wants me to do it, let me know). - AWeenieMan (talk) 04:41, 19 March 2008 (UTC)
 * Nah, I really don't need you to do that work :), if we really need a full list, I can generate one from a toolserver query, just I'd have to write the query. Hence I was hoping you had a direct database query already that I could steal and change a few parameters on ;). ——  Eagle 101 Need help? 04:53, 19 March 2008 (UTC)
 * If I am not mistaken, the image page text is not available on the toolserver (so wikiproxy is inevitable if you really want to do what I did). I actually just didn't feel like joining with templatelinks and finding every possible title for a template that uses the parameter (and then grabbing page text for those). Too tedious for me at the moment, though the more correct way of doing it. - AWeenieMan (talk) 05:05, 19 March 2008 (UTC)
 * Example abuse of the low resolution tag, you cannot argue this image is low resolution :P Image:Flowerscakes.JPG, I'd say that is quite high resolution :). The article its in does not require that level of detail. For what its doing, it could get away with something under 700x700, heck the thumbnail shows quite well there are a lot of different products in the picture. For anyone's amusement, that image is the 7th largest non-free image on wikipedia. ——  Eagle 101 Need help? 04:30, 19 March 2008 (UTC)
 * (ec)Upon a quick review of the list I posted, I would say you could resize any of them (I know your cutoff is higher then the list now anyway) as I don't find any of the reasons posted to be particularly compelling (the only understandable one is really the "You won't be able to read anything on this screenshot if it is smaller" argument, but they are not used at that resolution in articles anyway). I wouldn't mind some thought going towards correcting the Low_resolution parameter for images the bot resizes (just changing it to "Yes"), as it seems simple enough to do (I didn't really have any trouble parsing the parameter myself, though my regex is probably not "bot acceptable") and I would hate to see a bot knowingly invalidating a part of the fur (yes, I know there is potential to introduce bugs by doing this). In simple terms: My take on this particular set of images is that it should not be required by your bot to parse that parameter unless you plan on correcting it. - AWeenieMan (talk) 04:34, 19 March 2008 (UTC)
 * (ec with above post)A few more examples: Image:THEHAUNTING.jpg, Image:THEDEAD2.jpg, and Image:MOUSECOVER3.jpg, all which call their high resolution low resolution. Basically the uploader is making the claim that since they are of some lower resolution then the original, they are low resolution by our image policy, I have a feeling this uploader is wrong. Our articles really only need a 400x400 image of those album covers, if that. Their usages look about 250x250 thumbnails in the info boxes. For reference, these are the 17th 18th and 19th largest non-free images on the encyclopedia. I don't think the low resolution parameter is something we can trust. ——  Eagle 101 Need help? 04:38, 19 March 2008 (UTC)
 * (and most other specific fur templates) have something like that as default text (all of those images have a blank Low_resolution parameter). As I have stated previously, I think you should have free reign on any image that says "Yes" or is blank. I would also add "No" to that list. Of course, those would be highly conservative rules (as I explained below, I am coming around to your cat idea and not stopping on this parameter, just correcting). - AWeenieMan (talk) 04:59, 19 March 2008 (UTC)
 * AWeenieMan, however I need something that can be added to images to ward the bot off if for some reason it is needed. My suggestion at this point is a category, not a true template or parameter. If you have justified the use of a larger image in the FUR and you don't want the bot to resize it, add your image to Category:Some_CAT to keep the bot from resizing the image. This also allows images in this category to be reviewed in case the uploader is lying or just does not understand what we mean by low resolution. ——  Eagle 101 Need help? 04:43, 19 March 2008 (UTC)
 * A category sounds like an excellent idea to me. Something like Category:High resolution non-free images with rationale. Here is my thinking, if you are only going after the big ones (in this particular task), the parameter might not be a guiding factor (because these are so out of compliance, and we are looking at a decent sample and seeing them as such). In future tasks, it seems we either 1) have a consensus on the maximum size depending upon image type or 2) an editor will request the change. In all these cases, I see correcting the parameter as probably the correct course of action. I am agreeing that halting based on what it says in that parameter would be a near impossible task (without a huge amount of false negatives), but it should be that parameter that includes the rationale why (and maybe the category that halts the bot). Of course a category could also be abused, but it should be easier on the bot. - AWeenieMan (talk) 04:54, 19 March 2008 (UTC)
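
To illustrate why a category is easier for a bot to check than the free-text Low_resolution parameter, here is a rough sketch of both checks. It assumes the category name suggested in this thread (which had not been created at the time) and a simple `| Low_resolution = ...` template syntax. The function names and the regular expression are illustrative assumptions, not the bot's actual code.

```python
import re

# Category name as suggested in the discussion; an assumption, not an existing page.
EXEMPT_CATEGORY = "Category:High resolution non-free images with rationale"

# Matches "| Low_resolution = value" up to the next pipe, brace, or newline.
LOW_RES_RE = re.compile(r"\|\s*Low_resolution\s*=[ \t]*([^|}\n]*)", re.IGNORECASE)


def is_exempt(categories):
    """True if the page carries the opt-out category (underscores and spaces treated alike)."""
    wanted = EXEMPT_CATEGORY.replace("_", " ").lower()
    return any(c.replace("_", " ").strip().lower() == wanted for c in categories)


def low_resolution_value(wikitext):
    """Return the raw Low_resolution text, '' if left blank, or None if the parameter is absent."""
    match = LOW_RES_RE.search(wikitext)
    return match.group(1).strip() if match else None
```

The category check is a plain membership test, while the parameter check only recovers free text that a human still has to interpret ("Yes", "No", blank, or a full sentence), which is the ambiguity described above.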

Arbitrary break
(un-ident) Correct, so perhaps someone can create that category and add that category to the FURD template the bot will be using. The template is at User:MBisanz/FURD. The bot will apply that template on resizing. I could possibly have the program correct the low resolution statement to just "yes", but it might be better that the admin, or user reviewing the bot's changes do this change, as there might be a better statement to make for that image. However I do understand we are talking about 5,000 images. :S ——  Eagle 101 Need help? 04:59, 19 March 2008 (UTC)
 * Not to hate on the admins, but I believe their review will probably be cursory at best and I am not convinced they will spend the time to edit that parameter themselves. And you will be creating quite the backlog for them to go through. I don't want to create the category quite yet, as some others might have a better wording for it. - AWeenieMan (talk) 05:11, 19 March 2008 (UTC)
 * Righto, and yeah that is why I mentioned we are dealing with 5,000 images. We need to figure out how to get this bot's changes reviewed by someone that may have an interest in seeing that the image license data is correct. I could attempt to change the low resolution parameter to "yes" in the case that it exists on images that the bot resizes, however I could not magically produce a template, or figure out where to put the parameter if the parameter is not already existing. Both of these circumstances are open to way too many bugs and corner cases that I'd rather not have to chase down. ——  Eagle 101 Need help? 05:17, 19 March 2008 (UTC)
 * I'll agree with AWM that any admin reviewing the template post, will probably not look at the variable in the FUR. The only thing I could imagine is processing this in small chunks the image-junkies like Carc and myself could go through (although I'd have to learn how to delete the old large images, is there a how-to page for that?) I can process about 200 a night on a good night, maybe a 1,000 a week?  Is that too slow for your needs?   MBisanz  talk 05:28, 19 March 2008 (UTC)
 * (ec) Entirely agreed with your thoughts Eagle. I wouldn't even want to touch the idea of actually adding a template or parameter myself. Just altering an existing one. And there are some interested administrators (you have a couple who posted on this page). I just fear those that see a backlog and take great satisfaction in reducing said backlog. At the moment, the proposed FURD template puts the pages into Category:Rescaled fairuse images, which is the same category in which human resized images are put. This has the benefit of drawing attention to them, but also the downside of drawing attention to a backlog (and after these are done, admins really could take some time in reviewing the changes, it doesn't have to be rushed). Perhaps not using that category and having a good explanation on Category:Rescaled fairuse images by ImageResizeBot on what is expected of an admin who goes through them? I suppose admins who actually work on such tasks would have a better idea of how they approach them. - AWeenieMan (talk) 05:29, 19 March 2008 (UTC)
 * I'd say anything going into Category:Rescaled fairuse images will be hosed by admins with semi-automated scripts. Category:Rescaled fairuse images by ImageResizeBot is good if after they're hand-processed, they are reclassed into both cats.  Most admins will understand detailed wording, so that should be ok at the Cat page.  MBisanz  talk 05:34, 19 March 2008 (UTC)
 * What cats should they be reclassed in? The rescaled cats seem to just be temporary maintenance cats. - AWeenieMan (talk) 05:46, 19 March 2008 (UTC)
 * We could do a limit of 250 images a day, that means this task will complete 22 days from now when the task starts. Could folks keep up with that? I can't promise that other image resize tasks won't take place before this task is complete, but this would help you guys keep up. The other task I have in mind right now is one where humans invoke the bot to do the resizings for them, as resizing an image by hand is a pain in the butt. You have to download the original image, edit it in an editor, and re-upload. ——  Eagle 101 Need help? 05:37, 19 March 2008 (UTC)
 * I can try very hard to keep up that rate. Since the admin-y part is just deleting the old image, is there some way I could just go through and delete the old image and re-tag it for a FUR correction by hand?  MBisanz  talk 05:40, 19 March 2008 (UTC)
 * I'm not up to date about the process for deleting old image revisions, but I would hope those patrolling the bot would update the FUR themselves, otherwise the bot might as well add two tags, one to alert admins, and another one to alert folks that the FUR needs updating and checking over. ——  Eagle 101 Need help? 05:42, 19 March 2008 (UTC)
 * Also, MBisanz I hope you are not the only one patrolling the bot ;) ——  Eagle 101 Need help? 05:43, 19 March 2008 (UTC)
 * I'd actually argue for a higher rate based on that, it should hopefully not be just you patrolling. Perhaps the bot can leave a tag that says "check the fur" when it resizes an image. When you guys patrol, you just remove the tag. This way we can do more images in a run. You guys won't be duplicating each other's work, just checking images in the category/template that the bot puts on the page. ——  Eagle 101 Need help? 05:45, 19 March 2008 (UTC)
 * (ec)Well I went back to admin school and checked, deleting an old version is simply just clicking a link that appears on the old version. So I can do that part rather quickly. What would be annoying is having to edit the image page, replace the size info by hand, and then re-save. I can do it, but it will cut down on the number of images I can do per day (straight deleting I could probably get to 3,000 a week without problem) editing FURs will probably cut that in half. And don't be surprised if I'm the only admin doing this. I've done checks of my logo edits, and I'm fairly certain I'm the only editor who does those (outside of my bribing Blathnaid to do them) and they represent 70,000 of 280,000 non-free images. AWM does a good deal of other images, and Carc also does his fair share, but I can't name another editor, let alone Admin who hand edits this many.  MBisanz  talk 05:49, 19 March 2008 (UTC)
 * (ec) I would think the safest route is to just keep an eye on the backlog and not let it get out of hand. Maybe set a minimum number just to keep people interested. - AWeenieMan (talk) 05:51, 19 March 2008 (UTC)
 * Well, we can always spam AN :). As far as editing the image page, remove the size info by hand and then resave, what do you mean by that? What size info are we talking about, I'm not following you :( ——  Eagle 101 Need help? 05:56, 19 March 2008 (UTC)
 * Also, btw, in case you have not noticed, I'm an admin. I'll probably put out an attempt to patrol a few of these a night, but I'd prefer to be programming, and fixing bugs :). ——  Eagle 101 Need help? 05:58, 19 March 2008 (UTC)

(undent) do'h! Didn't realize you had the bit. So then you see that if I click delete image, it takes me to the deletion complete screen, and then I need to toggle back to the image page, edit, change the low res variable in the FUR from No, Other, Empty to Yes with an edit and save.  MBisanz  talk 06:00, 19 March 2008 (UTC)
 * Ok, that makes sense, by the way, if you use firefox you can hold CTRL (the control key) down while you push delete, it will cause it to open in a new tab, which saves you from having to hit back. You just have to close a bunch of tabs every now and then :) This can be done by simply right clicking the tab you wish to save and clicking "closing all other tabs". Good way to close 100 tabs quickly :D ——  Eagle 101 Need help? 06:04, 19 March 2008 (UTC)
 * 100 pages would make my computer scream, but I get the point that I can do it with the 25 tabs I usually use. Just a personal request, could we test this on the logo cat when it goes live?  I know that region pretty well and would be able to spot errors there better than say Screenshots or Historic images.  MBisanz  talk 08:07, 19 March 2008 (UTC)
 * Sure we can do logos. It really does not matter to my perl code ;) ——  Eagle 101 Need help? 08:23, 19 March 2008 (UTC)
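
As a rough check of the pacing figures in this section, the arithmetic can be written out as below. It uses the backlog of about 5,645 images and the daily and weekly rates quoted in the thread; the function name is illustrative, and rounding up to whole days or weeks is an assumption.

```python
import math


def periods_to_clear(backlog: int, per_period: int) -> int:
    """Whole days (or weeks) needed to work through a backlog at a fixed rate."""
    if per_period <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(backlog / per_period)


BACKLOG = 5645  # images above 500,000 square pixels, per the figure quoted earlier

days_at_250 = periods_to_clear(BACKLOG, 250)     # bot capped at 250 uploads a day
days_at_500 = periods_to_clear(BACKLOG, 500)     # bot capped at 500 uploads a day
weeks_at_1000 = periods_to_clear(BACKLOG, 1000)  # one reviewer at about 1,000 a week
```

At 250 a day the backlog takes a little over 22 days (23 when rounded up), in line with the estimate above; a single reviewer at 1,000 a week would need about six weeks to check everything.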

Restating the task
To prevent others and I from getting overly confused, I'm going to restate the task. Please tell me if I have read something wrong, or am otherwise off course. The last thing I want to do is think I'm approved/you all agree with me on a part of this task and be wrong.


 * 1) This bot will work on images with areas larger than 500,000 square pixels. This is roughly a 700x700 image.
 * 2) This bot works on all image formats that MediaWiki (ImageMagick) handles.
 * 3) The bot will resize to the thumbnail presented when users select the "my preferences" -> "files" -> "Limit images on file description pages to:" 640 x 480 px.
 * To emphasize, we will be preserving image ratios here.
 * 4) If Category:High resolution non-free images with rationale is on the page, skip the image. In theory, the rationale will explain why.
 * 5) If the image has one of Category:Image_copyright_tags and one of Category:Non-free image copyright tags, do not resize; rather, flag this in a log somewhere. There are exceptions to this listed at User:ImageResizeBot/Intersection_exceptions. These exceptions can be changed by any user, or if we wish to restrict who can edit it, an admin may protect that page. (The bot will read from that page every so many uploads).
 * 6) The bot will copy the image contributions table at the bottom of every image to a section called ===Something here, probably history or the like?=== . This is to preserve upload history, for when the admin deletes an older revision.
 * The other possibility proposed is to put the details of the original upload in the upload summary: user, date and original size.
 * 7) The bot will tag the image with User:MBisanz/FURD; this will not be subst'd, to make it easier to remove.
 * 8) The bot will limit itself to 250-500 uploads a day. If the upload rate is higher than 250 a day, the bot will put a temporary category or template on the image, that humans can review. Alternatively the bot can place in its userspace a list of images that it has uploaded over. Humans can review and work off of that list (removing or striking out reviewed images).
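
The area threshold, skip category, tag-intersection check, and ratio-preserving resize in the list above can be put together as one decision function. This is a minimal sketch of the stated rules only: it is not the bot's Perl source, the function and argument names are hypothetical, rounding to the nearest pixel is an assumption, and the exceptions page for tag intersections is not modelled.

```python
MIN_AREA = 500_000  # only images with a larger area are candidates
BOX = (640, 480)    # file-description-page thumbnail limit named in the task


def fit_in_box(width, height, box=BOX):
    """Scale (width, height) to fit inside box, preserving the aspect ratio; never upscale."""
    scale = min(box[0] / width, box[1] / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def decide(width, height, exempt=False, has_free_tag=False, has_nonfree_tag=False):
    """Return ('skip' | 'log' | 'resize', target_size_or_None) for one image."""
    if width * height <= MIN_AREA:
        return ("skip", None)   # at or below the threshold: leave alone
    if exempt:
        return ("skip", None)   # page carries the high-resolution-with-rationale category
    if has_free_tag and has_nonfree_tag:
        return ("log", None)    # conflicting license tags: flag for a human
    return ("resize", fit_in_box(width, height))
```

For the 7,808 × 4,448 example from the proposal, `decide(7808, 4448)` returns a resize to 640 × 365 under these assumptions; a 700 × 700 image (490,000 square pixels) is skipped.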

Did I miss anything? Discuss, and fill in the few questions I added. I think we are getting most of the issues hashed out here. ——  Eagle 101 Need help? 08:22, 19 March 2008 (UTC)


 * Looks good, my FURD should be substed to work the date tag properly I believe, since the regular FURD is also subst'd. All the intersections and exceptions look good to me.  MBisanz  talk 08:29, 19 March 2008 (UTC)
 * Alright, I'll be waiting for more comments and clarification to point 5, what should I call the section? I'll probably be ready for a real trial by this weekend, but most of these features are not actually coded in the bot XD (points 4,5,and 6). So I have to do that work. ——  Eagle 101 Need help? 08:34, 19 March 2008 (UTC)
 * Also point 8, category or list, I'm leaning towards a list, because you guys can strike out sections at a time etc. Ie, the bot can make the backlog and you guys can go through it at your pleasure without having to edit a template or category out of each image description page. ——  Eagle 101 Need help? 08:36, 19 March 2008 (UTC)
 * Just to nitpick, the bot is really not making a backlog, just identifying one that already exists ;). This task has to be done sometime :P ——  Eagle 101 Need help? 08:43, 19 March 2008 (UTC)
 * I see no reason to subst the template (the template is really just substs in the date into . I would recommend doing the same. Something like adding  . This should make it a lot cleaner and slightly quicker to remove when reviewing. To point 8. Your FURD template is already making a category (potentially sub cats by date, as that's trivial) and it needs to be removed from the image anyway. So, I would say you already are going to have a category to go through. Unless I am missing something. - AWeenieMan (talk) 14:22, 19 March 2008 (UTC)
 * Ok, we can not subst the thing. The list is so that non-admins can follow the changes and double check the bot. I hate to say it, but putting this into a normal deletion category will get the bot's resizings looked over by admins using semi-automated tools... which means they won't get proper attention to their copyright rationales, and other details. Some of these may need to be deleted outright, but this determination needs to be made by a human. As well, some of these are just fine, but some of the parameters will need to be changed based on the resized image. More or less, the list is for accountability reasons. ——  Eagle 101 Need help? 14:55, 19 March 2008 (UTC)

Potential (small) problem
I think this bot is a good idea, and my post here will probably turn out not to be a problem, but I wanted to mention it anyway. The Belgian comics publisher Dupuis (editor of The Smurfs, Lucky Luke, Spirou et Fantasio, ..., i.e. "owner" of many fair use images on this Wikipedia) has a FAQ (in French) where it directly discusses the use of fair-use images. The problem is that the images may not be visibly degraded through compression... Can e.g. Image:Spirou4heritiers.jpg be further compressed without (more) visible degradation? Or these ones: Image:JournalDeSpirou-01.jpg, Image:Maurice_Tillieux_001_Street_Scene.jpg? This is a case where the copyright holder would prefer us to have less compression, not more. Fram (talk) 08:49, 19 March 2008 (UTC)
 * I'm wondering if it's not more efficient to convert these images to a PNG or SVG that can be resized without losing clarity? But that's something outta my league.   MBisanz  talk 09:04, 19 March 2008 (UTC)
 * This sounds like a human task, and a task that we may not be able to do with non-free images. I'm not a lawyer though. ——  Eagle 101 Need help? 09:31, 19 March 2008 (UTC)
 * I'm not compressing, I'm resizing. Wikipedia already resizes when they show the thumbnail in the article. I'd understand they don't want us compressing them so they look ugly, but simply resizing should not be an issue. Especially as we don't need a larger copy than the image in the article. Our non-free content policy says we should use images of low quality for articles. I'm reading that as not having images larger than the thumbnail displayed in the article proper. Also, just so ya know, the images you listed won't be touched by this task. :) ——  Eagle 101 Need help? 09:26, 19 March 2008 (UTC)
 * Yeah, I realised that a bit after I posted, I somehow read 500,000px as 500*500px :-) I don't think "I'm not compressing, I'm resizing" is really what they had in mind though, the problem is the loss of quality, and resizing obviously can give a loss of quality. But anyway, I don't think the problem is big enough to be a worry, certainly not as long as we have the fairly large 500,000px threshold. If it should somehow become a problem with one or two images, we can always give an explanation and add the "no-resize" category you described above. Fram (talk) 09:55, 19 March 2008 (UTC)
 * Right, but you have to understand, wikipedia already resizes images to display in articles. :) I would understand if I were taking jpg images and compressing them to something like 99% compression. We would be losing visual quality. I don't think anyone can dispute our right to resize images. ——  Eagle 101 Need help? 10:33, 19 March 2008 (UTC)
 * I should also note that the folks we are taking non-free images from really don't get much say in the matter, as long as we follow all the relevant laws, which as I believe, our policies are a superset of those laws. If folks were able to tell us we were not allowed to resize images, we would not be able to make thumbnails of them at all for display in the articles. If you really feel this is an issue, and these folks can place that limit on us, take the issue up on the proper policy page. The way I see it, if they told us we were not allowed to use the images at all, we could still use it, as long as our use fits within the constraints of "fair use". If you think resizing images is an issue, I'll emphasize, take it up on the image policy pages, that is something that would need resolved project wide. ——  Eagle 101 Need help? 13:58, 19 March 2008 (UTC)
 * You are right, they can't force us (and are probably not really worried about our use), it's just that one of the (main) reasons we have the policy of making our images as small as possible is to follow the law and avoid problems, while we have here one "supplier" of fair use images who prefers to have the images not too small (understandable, from an artistic point of view). It's hard to please everyone, isn't it? Anyway, I am quite happy with the proposed bot, so I'll shut up now. Fram (talk) 16:03, 19 March 2008 (UTC)

Dimension ratios
The revised task description point 3 says the bot output will be "640 x 480 px". This is 307,200 pixels with a 4x3 dimension ratio. If the dimension ratio is not 4x3 (we've seen an example above that was 1x1, and we could have 3x4 or many other ratios) then as described the bot will be significantly altering the image by changing the dimension ratio, distorting the image in the vertical or horizontal direction. I thought the intent was to preserve the dimension ratio while reducing the size. Can you rephrase this to reflect an output that preserves the dimensional ratio of the image while limiting the pixel size? GRBerry 14:04, 19 March 2008 (UTC)
 * That is the setting you can set in your account. It will preserve image ratios. Doing so otherwise would just be... dumb ;) ——  Eagle 101 Need help? 14:20, 19 March 2008 (UTC)
 * Can you say which way round the reduction will be? I presume it is 'longest side goes to 640 pixels', but saying that would help. Incidentally, many panoramic pics will be useless at 640 pixels, many being better at 3000x100 (say). I would suggest the bot avoids extreme dimension ratios, as those will usually need human judgment. Carcharoth (talk) 14:24, 19 March 2008 (UTC)
 * Maybe pass those images where at least one of the dimensions is below either 640 or 480 px? Carcharoth (talk) 14:26, 19 March 2008 (UTC)
 * We will be passing all images smaller than 500,000 square pixels. If you can find one on the list that should be passed, feel free to let me know, worst case is someone has to undo the bot, and place a category on it. I don't think any articles really will use an image as large as we are talking about. And if they do, we can deal with it on a case by case basis. I'm going to be doing my best to make sure the changes done by the bot are trackable. I will publish shortly a new list that has the current images that will be resized under the current size proposal. ——  Eagle 101 Need help? 14:32, 19 March 2008 (UTC)
 * It could be possible to say that if the image falls outside the aspect ratio from 10:16 to 16:10, it will resize it with aspect ratio to have the final image less than (500k ? 350k?) pixels. The math is simple to determine what that needs to be, and it would allow for panoramics like the 30:1 example.  If it is within that ratio, it will resize the longest dimension to 640.  --M ASEM  14:29, 19 March 2008 (UTC)
 * Please see the list proposed to be modified below. I don't think what you list will be an issue. We will not change the proportion of existing images, only making them smaller. I do think we are rapidly approaching the time to actually do a trial, (as soon as I code in the additional requests). A trial on 3 images or so will help to clarify things. :) ——  Eagle 101 Need help? 14:47, 19 March 2008 (UTC)
 * A good example here is Image:Simpsons cast.png, which despite being only 3:1 ratio should be larger to allow people to see the characters. It is below the current limit in any case. I uploaded a larger resolution of this pic as Image:Image-Simpsons cast.jpg, but it got deleted because I forgot to put it in the article (was still discussing things at that point). It is rather annoying to have example images hanging around and to have them deleted as orphans, which is why it is better to have them in the revision history. Can anyone merge the history of the two images? I can undelete, but have never merged before. Can anyone tell me what the dimension ratio of the deleted image was (I think it was 1000 along the largest side). Carcharoth (talk) 14:35, 19 March 2008 (UTC)
 * Your example still would not be affected by the current size proposal, as the area would still be 300,000 pixels, which is smaller than 500,000 pixels. I'm generating the new list now, but I can't make the query run any faster ;) ——  Eagle 101 Need help? 14:38, 19 March 2008 (UTC)
 * How many images exist between 500,000 pixels and the next step down, say, 300,000 pixels? And how many in total (ie. how many between 300,000 and 0)? Carcharoth (talk) 14:59, 19 March 2008 (UTC)
 * Why that is relevant to this discussion I'm not sure, but the original proposal was for 360,000 pixels (see list at User:ImageResizeBot/List1). However I have been shown some that were borderline cases there, so I upped it to first 400,000 square pixels, then to 500,000 square pixels. As far as to 0, That would be... *runs query*  ... 281881 items of media in Category:All_non-free_media. I hope you forgive me for not providing you this final list ;). I guess this goes to show that we are talking about a very small number of the total non-free media we have on wikipedia. ——  Eagle 101 Need help? 15:18, 19 March 2008 (UTC)
 * Comparing to the total number is always good. Keeps people focused on the size of the task at hand. Any way to analyse the type of images within that list of 5000. You already have a logo sublist. Are the others mainly albums or what? Carcharoth (talk) 15:51, 19 March 2008 (UTC)
 * Unless I can check every category we have... not really. If someone gave me a list of all non-free image categories I could do a comprehensive query, however things like logos can often be made even smaller than the fairly large amount we are allowing here. Yes I can do it, but I need a list :) ——  Eagle 101 Need help? 18:18, 19 March 2008 (UTC)
 * Would User:BetacommandBot/Non-Free Template Useage help? Carcharoth (talk) 18:59, 19 March 2008 (UTC)
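The sizing rules discussed in this section can be sketched roughly as follows. This is only an illustration in Python, not the bot's Perl source: the function names are hypothetical, the 500,000 square pixel threshold and the 640 x 480 box are the figures quoted above, and the rounding behaviour is an assumption. `fit_under_area` corresponds to the alternative suggested for extreme aspect ratios, where the image is scaled until its area drops under the limit.

```python
import math

MAX_AREA = 500_000   # square-pixel threshold quoted in the discussion
BOX = (640, 480)     # bounding box quoted in the discussion (width, height)


def needs_resize(width, height, max_area=MAX_AREA):
    """Images at or below the area threshold are passed untouched."""
    return width * height > max_area


def fit_in_box(width, height, box=BOX):
    """Scale down to fit inside the box, keeping the aspect ratio.

    Never scales up: the factor is capped at 1.0.
    """
    scale = min(box[0] / width, box[1] / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def fit_under_area(width, height, max_area=MAX_AREA):
    """Alternative for extreme ratios: scale so the area is <= max_area."""
    scale = min(1.0, math.sqrt(max_area / (width * height)))
    # Floor both sides so rounding can never push the area over the limit.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    for w, h in [(7808, 4448), (640, 2656), (700, 700)]:
        if needs_resize(w, h):
            print((w, h), "->", fit_in_box(w, h), "or", fit_under_area(w, h))
        else:
            print((w, h), "passed")
```

With the box rule, a tall image such as 640 x 2656 comes out only 116 pixels wide, which is why very tall or very wide images are the ones likely to need the exemption category or the area-based rule.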

Clarification: Are you limiting the new size to 640 x 480 or 640 x 640? There seems to be talk about both in this section and the difference between the two would be noticeable on landscape images. - AWeenieMan (talk) 15:53, 19 March 2008 (UTC)
 * The former, if you can find any images that would be affected let me know. However I emphasize that our non-free images should be low resolution. ——  Eagle 101 Need help? 16:11, 19 March 2008 (UTC)
 * I found one for ya, Image:Titansvillains.jpg. The image is too large, however what the bot would reduce it to is too small for its use. This is the special case that I'm afraid we are going to have a few of. Just make use of the exemption category and resize it to something appropriate. If we prefer a 640 by 640, instead of 640 by 480, we could do that, however I really doubt that it would matter in the case I put out. It's a special case :(. It should be smaller than it is now, but not as small as the bot would make it. ——  Eagle 101 Need help? 16:17, 19 March 2008 (UTC)
 * Well done. I actually wasn't arguing your point so much as making sure I understood (as there is some talk of longest side to 640 above). I agree, it wouldn't matter in this case at all. However, I am sure there must be an image out there that is really tall and narrow. It would matter then. I'll see if I can find one. - AWeenieMan (talk) 16:24, 19 March 2008 (UTC)
 * Oh, bah I had height and width sorted out, if the image is too tall and actually got measurably affected, we could do an exemption for it and resize it to something reasonable. As long as the total area is less than 500,000 square pixels, it won't get resized by this task. ——  Eagle 101 Need help? 16:29, 19 March 2008 (UTC)

I'll help you out: here is the shortlist of very tall, narrow images that the bot will modify in this task. Salchow.jpg I could argue is not a valid fair use image, as someone in the stands could have taken a video of this... so it's replaceable. The others I have not looked at. However, as you see, it's a very small number :) (this is anything with a ratio of width:height of 1:4 or narrower). ——  Eagle 101 Need help? 16:38, 19 March 2008 (UTC)
 * img_name           | img_height | img_width | img_size | img_height * img_width |
 * Qp-mist1.ans.png   |       2656 |       640 |   125902 |                1699840 |
 * Salchow.jpg        |       2748 |       372 |   829523 |                1022256 |
 * NewarkHawks14d.gif |       2760 |       632 |   799963 |                1744320 |
 * Krux-ice-dec94.png |       2928 |       640 |   124511 |                1873920 |
 * We can resize the one in Krux-ice-dec94.png as its usage is not requiring a size larger than the one in the thumbnail. Plus both articles it is in could arguably have the image removed. Especially the second one. ICE Advertisements, which has quite excessive fair use for the little discussion of those images taking place. This bot is going to help in revealing some excessive fair use any way we look at it, if someone uploaded a very large image under the mistaken impression that it was "ok", they may have misread or ignored other parts of our policy as well. ——  Eagle 101 Need help? 16:42, 19 March 2008 (UTC)
 * Image:Qp-mist1.ans.png would need to be resized to an acceptable size, but the size that the bot will make it is too small. Add about 100 pixels and it should be fine. As long as the total size is below 500,000 square pixels it won't need the exemption category, and if it is larger than this, then place the exemption cat on with a reason why. This one seems to be used according to policy, however it could probably be updated a bit to go with the times :) (a template, better fur, etc), but its usage is correct as far as I can tell. ——  Eagle 101 Need help? 16:53, 19 March 2008 (UTC)
 * Image:NewarkHawks14d.gif - looks like it is designed to be read rather than shown in the article proper. In any case, there is not an adequate FUR explaining the situation here. Also is it possible that this article has gone out of copyright by now? This article is over 70 years old, I'm not sure what the print copyright length is. ——  Eagle 101 Need help? 16:53, 19 March 2008 (UTC)
 * I was just about to run a similar query. Agreed on all counts. Salchow should probably be removed (it is used in an article about the jump, not this particular person's jump). Image:NewarkHawks14d.gif isn't really even used in an article (it is used as a citation). That seems odd to me too. - AWeenieMan (talk) 16:48, 19 March 2008 (UTC)
 * If its used as a citation, it needs to be removed (I think). We may have to take this one to the folks on the image noticeboards. ——  Eagle 101 Need help? 16:55, 19 March 2008 (UTC)
 * I took it to MCQ. - AWeenieMan (talk) 17:56, 19 March 2008 (UTC)
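For anyone wanting to reproduce the shortlist above without database access, the filter can be approximated like this. It is a rough Python sketch rather than the query that was actually run: the function name and the row layout are made up for illustration, the four rows are copied from the table above, and the 1:4 cut-off is read as "width at most a quarter of the height".

```python
MAX_AREA = 500_000  # square-pixel threshold under the current proposal

# (name, height, width), copied from the table in this section
ROWS = [
    ("Qp-mist1.ans.png", 2656, 640),
    ("Salchow.jpg", 2748, 372),
    ("NewarkHawks14d.gif", 2760, 632),
    ("Krux-ice-dec94.png", 2928, 640),
]


def tall_and_narrow(rows, max_ratio=0.25, max_area=MAX_AREA):
    """Return names of images over the area limit whose width:height
    ratio is max_ratio (1:4 by default) or narrower."""
    return [
        name
        for name, height, width in rows
        if height * width > max_area and width / height <= max_ratio
    ]


if __name__ == "__main__":
    for name in tall_and_narrow(ROWS):
        print(name)
```

All four listed images pass both tests, so any of them would be both resized by the task and candidates for a manual look.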

New list
I'll make this a subsection here so you guys can see. I've generated a new list of images that would be resized based on the current area proposal of 500,000 square pixels. This list is at User:ImageResizeBot/500000. If you guys can find a substantial number of problem images, please let me know. The idea is to set a high limit, not try to thread a needle. ——  Eagle 101 Need help? 14:44, 19 March 2008 (UTC)
 * mistake: I've made a mistake, that is only non-free logos that need resizing. The full list is at: User:ImageResizeBot/500000-full when the new run completes... give it a few moments. —Preceding unsigned comment added by Eagle 101 (talk • contribs)
 * I thought there were rather a lot of logos there... :-) Carcharoth (talk) 15:04, 19 March 2008 (UTC)
 * MISTAKE AGAIN: Please disregard the prior list and use User:ImageResizeBot/500000-full-for-real for your evaluation instead. ——  Eagle 101 Need help? 16:04, 19 March 2008 (UTC)
 * There are, that was only logos... the full list is 5,000 images long.... :( ——  Eagle 101 Need help? 15:09, 19 March 2008 (UTC)
 * Most of those look fine for resizing. On a quick look through, I found Image:Seal.and.serpent.letters.jpg and Image:Flowerscakes.JPG. In those cases, I might allow a slightly larger than normal image to allow more detail to be seen when someone clicks on the image. In any case, Image:Flowerscakes.JPG is not something that will cause problems if the image is larger than strictly needed. What would be helpful is a way to tag pictures that are montages, and discuss when they can be larger than normal. But, yes, most of the list is fine, though I haven't looked at them all. If anyone wants to create a gallery without breaching the rules, see the 'preview' method outlined at User:Carcharoth/Image clean-up galleries. Carcharoth (talk) 14:58, 19 March 2008 (UTC)
 * Why would Image:Seal.and.serpent.letters.jpg need to be larger than one side having 640 pixels? Its use in the article is much smaller than the thumbnail presented there. See Seal and Serpent in the infobox for its actual usage. The flowerscakes thing is obviously oversized. Should it be larger than 640 pixels on one side, that is debatable. I'd argue, no it should not be, because the point of the image was just to demonstrate that they had many products (reading the rationale), but this is an argument for another day and location. ——  Eagle 101 Need help? 15:05, 19 March 2008 (UTC)
 * In the infobox, people will click on it to see it larger. The products thing is a montage. Some montages need the individual items to be at a certain resolution, others don't. But really, a scan of a collection of old food labels. It doesn't really matter either way. I doubt the copyrights were renewed in any case, but as you say, another argument for another day. Carcharoth (talk) 15:50, 19 March 2008 (UTC)
 * I thought SVG files like Image:L%26T-logo.svg were exempt from the size issue, since they're vector images. Meh, I'm probably forgetting the related-policy Q that explains why they are included.  MBisanz  talk 18:47, 19 March 2008 (UTC)
 * I'm not sure on that, perhaps send that example to Media copyright questions‎ as well. If that is ok by the folks there, (I presume they know what they are talking about ;) ), then we can just ignore svg's. ——  Eagle 101 Need help? 20:52, 19 March 2008 (UTC)
 * It is my understanding that they are exempt only in so much as they have no true resolution (and then there have been arguments that they should be removed entirely because of this fact). However, they should never be rendered any larger than necessary (this would include on the image description page). In reality, you could excessively shrink SVGs without quality loss. See . I know it's not a policy, but it's probably a correct rule of thumb. - AWeenieMan (talk) 21:02, 19 March 2008 (UTC)
 * Then I guess we can ignore those, even if I shrank it, it's just as easy to make it larger again from the same source. ——  Eagle 101 Need help? 08:36, 20 March 2008 (UTC)
 * Since shrinking won't affect the images, I wouldn't mind if you shrank them just for the sake of complying with policy. But since I gather there is both a time issue (foundation compliance deadline) and a resource issue (we can only fix so many so fast), we could do the SVGs at the end of the run, after the GIFs, PNGs, and JPGs.  MBisanz  talk 08:53, 20 March 2008 (UTC)
 * Or just not at all :P Really as shrinking them serves no useful point, the only point to shrinking them would be to follow the letter of some policy. SVGs are fundamentally different from GIF, PNG, and JPG, as their size is irrelevant. It's going to maintain the same quality whether I resize it or not. I'll be glad to shrink them, but it's just a question of... why? ——  Eagle 101 Need help? 08:56, 20 March 2008 (UTC)
 * My thought would be some random admin sees an image, sees its FU, sees its some outrageous number like 3,000 * 2,000 and deletes as being a bad image, without realizing its a technical thing with the image itself.  MBisanz  talk 08:59, 20 March 2008 (UTC)
 * Well... by the large number we have now, that is not happening. Otherwise we would not have 5,000 oversized images. Plus I think most admins have the sense to ask for it to be resized rather than outright deleting it. At least with older images. ——  Eagle 101 Need help? 09:01, 20 March 2008 (UTC)
 * Apparently you've never seen a group of admins compete to see who can clear CAT:CSD the quickest. Won't affect the 5,000 or so images in total, but I can see a couple images a week being lost to oblivious admins.  And since its a bot that'll be re-sizing, I'll volunteer to click the D link to keep those couple images a week.  MBisanz  talk 09:05, 20 March 2008 (UTC)
 * Hah, alright I know what you mean on CSD ;) though those competitions are usually over the articles... everyone I know of shrinks away from the images. (or it was like this 3 months ago when I was last active). I'll go ahead and shrink these anyway, though as I note, there is no point :P ——  Eagle 101 Need help? 09:07, 20 March 2008 (UTC)

Tracking the category contents
Am I right in thinking that once the images have been resized and the old versions deleted, that the tag is removed and they disappear from the categories? If that is so, we need either a permanent category (or list) for all images touched by this bot (maybe the contribs log instead?), or a way to track the history of this category. Otherwise there is no way to follow or watch the backlog being dealt with, other than seeing a category go from full to empty, with no real sense of what was in the category. Can we set up a tracker to list the contents of the category? Carcharoth (talk) 14:17, 19 March 2008 (UTC)
 * Right, which is why I suggest the bot logs a list of what it has touched to a page. I can do it in subsections of 50 pages per section or something, so it is manageable to editors wishing to tackle the backlog. They can mark sections as incomplete, etc. I'm sorry that the revised proposal is not fully exact, we are still brainstorming here ;). ——  Eagle 101 Need help? 14:22, 19 March 2008 (UTC)
 * I kinda like the category idea... Category:Images resized by ImageResizeBot? (Or something...) SQL Query me!  15:26, 19 March 2008 (UTC)
 * We can do both :P. The log idea I think is more to assist folks who are actually trying to double check what the bot touches, and make sure the various licenses are all in order. ——  Eagle 101 Need help? 15:27, 19 March 2008 (UTC)
 * I don't really see the point of a permanent category, myself. We will have the maintenance category (Category:Rescaled fairuse images by ImageResizeBot) that can be used to go through the backlog. Besides that, a list would really be duplicating the contribs log (but no complaints, it would make things easier in some regards, I suppose, and certainly would make things even more transparent). But another category seems excessive. Is there a precedent for categories by contributor? - AWeenieMan (talk) 15:47, 19 March 2008 (UTC)
 * We have special templates for some bots, no reason why it can't be extended to categories if it serves a useful purpose. I'll do the list mainly because it keeps folks happy, and happy wikipedians are nice to be around :) ——  Eagle 101 Need help? 15:49, 19 March 2008 (UTC)
 * I didn't realize that we had a maintenance category, never mind :) There's a lot of stuff to follow here... SQL Query me!  16:08, 19 March 2008 (UTC)
 * If its annoying to have the bot write a log, we can use AWB to run a log from its full contributions. YEA! I finally get to make a technical suggestion!  MBisanz  talk 19:03, 19 March 2008 (UTC)
 * Heh, its not that big of a deal, just 5 lines of perl code :) ——  Eagle 101 Need help? 19:49, 19 March 2008 (UTC)
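The log page described in this section (a list of touched images, split into subsections of 50) could be generated along these lines. This is a hedged Python sketch, not the five lines of Perl mentioned above: the function name, the section heading wording, and the example file names are all invented for illustration.

```python
def build_log(titles, per_section=50):
    """Return wikitext listing the given image titles,
    split into sections of per_section entries each."""
    lines = []
    for start in range(0, len(titles), per_section):
        chunk = titles[start:start + per_section]
        # Hypothetical heading format; editors can mark a section done or incomplete.
        lines.append(f"== Images {start + 1} to {start + len(chunk)} ==")
        # The leading colon links to the image page instead of embedding the image.
        lines.extend(f"* [[:Image:{title}]]" for title in chunk)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    example_titles = [f"Example{i}.jpg" for i in range(1, 121)]
    print(build_log(example_titles))
```

Linking with a leading colon keeps the log page itself free of embedded non-free images, which matters if the log is meant to be browsed section by section.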

Collaboration
I was trolling through edit logs and noticed that run by  seems to do something with cleaning up the rescale tag. Could someone else take a look and see if it could help us?  MBisanz  talk 08:26, 20 March 2008 (UTC)
 * If other bots and tools are helpful to this task, please feel free to get them involved. However this task really is not dependent on these other tools. ——  Eagle 101 Need help? 08:38, 20 March 2008 (UTC)
 * Well my other concern is if this bot parses the rescale cat, and removes notices of those rescaled automatically, it will be difficult for involved users to track which images need the Low Res FUR variable changed. I'll ping East on it.  MBisanz  talk 08:42, 20 March 2008 (UTC)
 * That is why this bot will log to a wikipedia page. :) ——  Eagle 101 Need help? 08:44, 20 March 2008 (UTC)
 * 718 bot is a one-off bot to follow East around and tidy up his admin actions. So it can't help or interfere with this plan.  MBisanz  talk 19:11, 20 March 2008 (UTC)

Current status
So what's the current status of this bot? I'm set to do my bit with the bit. As soon as it's ready to run, I'll full protect my userspace'd FURD since it's not subst'd and start deleting, etc. Also, I'll look into getting someone to intersect the Free and Non-free Cats in advance, so I can tackle that issue (of de-intersecting them) at the same time.  MBisanz  talk 08:16, 23 March 2008 (UTC)
 * I think we are going to wait a bit on this one. I have to do a bit of programming to get the various requests in, all the while I'm working on some counterspam stuff. Example: http://meta.wikimedia.org/wiki/User:SpamReportBot/test At the very least I need to repeat the conditions and task again to make sure that what I'm doing is what is supposed to be done. That will probably happen tomorrow or Tuesday. ——  Eagle 101 Need help? 12:06, 23 March 2008 (UTC)

non-BAG expire, reopen when you continue to pursue the request. BJ Talk 10:18, 5 June 2008 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.