Wikipedia:Bots/Requests for approval/718 Bot 2


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

718 Bot

 * Operator: east718
 * Automatic or Manually Assisted: Fully automatic
 * Programming Languages: Python plus Twisted
 * Function Summary: Convert all images on the English Wikipedia to the more efficient PNG format when necessary.
 * Edit period: One very long run, then once a week.
 * Already has a bot flag: yes
 * Function Details: Just like it says on the can, this bot will attempt to optimize all images on the English Wikipedia. On the first run, I'll let it loose on all images; from then on, it'll only attempt to convert ones tagged with ShouldBePNG, badJPEG, and badGIF. Now for the technical details: it'll convert the image using ImageMagick's convert, then shrink the result with three separate pipelines: optipng -o7 followed by advpng -z4, pngcrush -brute, and pngout. If the smallest of the three resulting PNGs is smaller than the original image, the bot will upload it, preserving the image description page and carrying over all of the history associated with the old image. Lastly, it will update all references from the old image to the new PNG and tag the old image with PNG version available. All free images will remain until a human decides to clear out the PNG duplicate backlog, and all fair-use images, now orphaned, will eventually be deleted by the bots. east.718 at 05:36, June 17, 2008
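The conversion step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the bot's actual code: the function names (`pick_smallest`, `build_candidates`, `optimize`), the temporary-directory layout, and the choice to tolerate a non-zero exit code from pngout are all hypothetical. It only shells out to the tools named in the request, which must be installed and on the PATH, and it returns a candidate only when the smallest PNG beats the original file.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Dict, Optional


def pick_smallest(original_size: int, candidate_sizes: Dict[str, int]) -> Optional[str]:
    """Return the name of the smallest candidate, or None if none beats the original."""
    if not candidate_sizes:
        return None
    best = min(candidate_sizes, key=candidate_sizes.get)
    # Re-upload if and only if the best PNG is strictly smaller than the source file.
    return best if candidate_sizes[best] < original_size else None


def build_candidates(source: str, workdir: str) -> Dict[str, str]:
    """Convert `source` to PNG, then produce one optimized copy per pipeline.

    Hypothetical helper: requires convert, optipng, advpng, pngcrush and pngout.
    """
    base = os.path.join(workdir, "base.png")
    subprocess.run(["convert", source, base], check=True)

    # Pipeline 1: optipng -o7, then advpng -z4 (both rewrite the file in place).
    opt = os.path.join(workdir, "optipng_advpng.png")
    shutil.copyfile(base, opt)
    subprocess.run(["optipng", "-o7", opt], check=True)
    subprocess.run(["advpng", "-z4", opt], check=True)

    # Pipeline 2: pngcrush -brute writes to a separate output file.
    crushed = os.path.join(workdir, "pngcrush.png")
    subprocess.run(["pngcrush", "-brute", base, crushed], check=True)

    # Pipeline 3: pngout may exit non-zero when it cannot improve the file
    # (assumption), so its output is kept only if it was actually written.
    pngout = os.path.join(workdir, "pngout.png")
    subprocess.run(["pngout", base, pngout], check=False)

    pairs = [("optipng+advpng", opt), ("pngcrush", crushed), ("pngout", pngout)]
    return {name: path for name, path in pairs if os.path.exists(path)}


def optimize(source: str) -> Optional[str]:
    """Return the path of a smaller PNG copy of `source`, or None to leave it alone.

    The temporary directory is left in place so the caller can upload the file.
    """
    workdir = tempfile.mkdtemp()
    candidates = build_candidates(source, workdir)
    sizes = {name: os.path.getsize(path) for name, path in candidates.items()}
    best = pick_smallest(os.path.getsize(source), sizes)
    return candidates[best] if best is not None else None
```

Keeping the size comparison in a pure function like `pick_smallest` makes the "upload only if smaller" rule easy to check without running any of the external tools.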

Discussion
I like this idea and am inclined to trial it if I don't hear objections soon. Can you resize large unfree images during the conversion?  MBisanz  talk 06:08, 17 June 2008 (UTC)
 * This is not a task which I feel is appropriate for a bot. east.718 at 16:05, June 17, 2008
 * Ok, I understand, just checking, I do see where certain images would be worse resized.  MBisanz  talk 21:36, 17 June 2008 (UTC)

Is it worth trying to convert JPEG images? I'd expect that JPEG artifacts would compress especially poorly in PNG format. --Carnildo (talk) 08:30, 17 June 2008 (UTC)
 * I expect so, but it's my computing cycles being wasted. :) If I manage to shrink even a handful out of every thousand JPEGs, I'll have done some good here. east.718 at 16:05, June 17, 2008

Why are uploading them as 'new' images? Can't you just replace the existing one with the new version? Also, why don't you run this on commons as well? -- maelgwn - talk 11:57, 17 June 2008 (UTC)
 * Well, uploading a PNG over a JPEG or GIF is kind of silly, no? Notwithstanding that, MediaWiki will automatically rename the file anyway. east.718 at 16:05, June 17, 2008

Shouldn't this task be restricted to GIFs? PNG was designed as a replacement for the GIF format, not for JPEGs. JPEGs should remain as JPEGs. Also, how are you planning on handling animated GIFs? Does your bot specifically detect and ignore them? Are you also planning on converting all SVG images? If so, what would be the point? Kaldari (talk) 22:14, 17 June 2008 (UTC)

The ShouldBePNG template states that:
 * This template should not be used for:
 * images for which only a JPEG source is available; recompressing with PNG will not remove artifacts and will produce larger files
 * animated images. PNG does not support animation so GIF should be used instead
 * images which contain strictly vector (non-raster) data. SVG should be used in this case.
I would recommend applying the same criteria to this task, i.e. only converting non-animated GIFs (and maybe BadJPEGs). Kaldari (talk) 22:24, 17 June 2008 (UTC)
 * To answer your questions one by one:
 * True, recompressing JPEGs will not remove artifacts, and it will often, though not always, produce larger files; an image will be re-uploaded if and only if conversion reduces its file size. The artifacting problem is a whole different beast, far removed from what this bot is intended to do; this task will neither resolve nor exacerbate it in the slightest.
 * Animated GIFs, multi-layered or indexed XCFs, and vector images will be completely ignored. PNGs will also be skipped over, but I might try that with a later bot.
 * Most bitmaps can be expressed as vector data given the effort anyway, but I can skip over all images already tagged with ShouldBeSVG. Alternatively, I can attempt conversion as usual and preserve the tag, which is the current behavior; again, this doesn't affect the problem of the image being rasterized to begin with.
 * Thanks for the questions and ideas! Is there anything I've missed or can help with? east.718 at 23:58, June 17, 2008
 * Thanks for taking the time to answer my questions. I think I'm satisfied that you've thought this through sufficiently. Kaldari (talk) 15:23, 18 June 2008 (UTC)
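The skip rules discussed above (no animated GIFs, no vector images, no existing PNGs) can be illustrated with a small eligibility check. This is a hedged sketch: east718 does not say how the bot detects animation, so counting GIF image descriptors by walking the file's block structure is an assumed technique, and the names `count_gif_frames` and `should_attempt` are hypothetical. For simplicity it skips every XCF file, which is stricter than the reply (only multi-layered or indexed XCFs).

```python
import os
from typing import Optional

# Hypothetical allow-list: only these extensions are considered for conversion;
# PNG, SVG and XCF (and anything else) are skipped outright in this sketch.
CONVERTIBLE = {".gif", ".jpg", ".jpeg"}


def _skip_sub_blocks(data: bytes, pos: int) -> int:
    """Advance past a chain of GIF data sub-blocks, ending after the 0x00 terminator."""
    while True:
        size = data[pos]
        pos += 1
        if size == 0:
            return pos
        pos += size


def count_gif_frames(data: bytes) -> int:
    """Count image descriptors in a GIF; more than one means it is animated."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    packed = data[10]  # packed fields of the logical screen descriptor
    pos = 13
    if packed & 0x80:  # global colour table present
        pos += 3 * (2 ** ((packed & 0x07) + 1))
    frames = 0
    while pos < len(data):
        marker = data[pos]
        if marker == 0x3B:  # trailer: end of file
            break
        if marker == 0x21:  # extension: introducer, label, then sub-blocks
            pos = _skip_sub_blocks(data, pos + 2)
        elif marker == 0x2C:  # image descriptor: one frame
            frames += 1
            local = data[pos + 9]
            pos += 10
            if local & 0x80:  # local colour table present
                pos += 3 * (2 ** ((local & 0x07) + 1))
            pos = _skip_sub_blocks(data, pos + 1)  # skip LZW minimum code size byte
        else:
            raise ValueError(f"unexpected block 0x{marker:02x}")
    return frames


def should_attempt(filename: str, data: Optional[bytes] = None) -> bool:
    """Decide whether a file is a conversion candidate (GIFs need their bytes)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in CONVERTIBLE:
        return False
    if ext == ".gif":
        # Without the file contents we cannot rule out animation, so skip it.
        return data is not None and count_gif_frames(data) <= 1
    return True
```

Walking the block structure avoids decoding any pixel data, so the check stays cheap even on a run over every image.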

The only thing that pops into my mind is that there are a handful of images (just a handful) in Category:Images which should be in PNG format that require renaming (tagged with, some with a suggested title, some without). I can think of no better time to rename them than when a bot is re-uploading them anyway. It would certainly add another layer of complexity to this task, but I thought I would throw it out there. - AWeenieMan (talk) 00:47, 18 June 2008 (UTC)
 * I was thinking about this, but came to the conclusion that this is also unsuitable for a bot. A while back, I tried surreptitiously running a mass-deletion bot under my main account that would find and remove duplicate images, and the one crippling (and unfixable) flaw was that it wasn't able to choose which filename should be preferred. The same problem pops up here: a bot just isn't smart enough to figure out that moving Descriptive_filename_12.jpg to a8fh3jkg9f3j39f.pdf or HAGGER?????.jpg isn't appropriate. To distill somewhat, the rename media tag is applied with human judgment, and that's where the inherent failure in the system is. east.718 at 02:32, June 19, 2008
 * I would agree with that... we have a separate process for renaming and I think that's appropriate. The one thing that could possibly be taken into consideration here is that the rename media template contains a field for the new filename, including extension. If this bot converts an image with the rename template, the template should be carried to the new file, but the file extension in rename media should possibly be changed to .png. For example, if this bot converts Image:ASDGGFCHJGV.gif, and the old image had, the new image Image:ASDGGFCHJGV.png should have a template that now says  . Hopefully I explained this correctly.  Kelly  hi! 02:38, 19 June 2008 (UTC)
 * Yep, that's a great idea, and one which I've thrown into the code now. east.718 at 02:40, June 19, 2008
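Carrying the rename media template over with a corrected extension might look something like the following. This is a hypothetical sketch, not the code east718 added: it assumes the proposed filename is the template's first positional parameter (named-parameter forms are not handled), and the function name `retarget_rename_media` is invented for illustration.

```python
import re

# Assumed template shape: {{rename media|Proposed name.gif|optional reason}}.
# Only the first positional parameter is rewritten, and only for GIF/JPEG names.
_RENAME = re.compile(
    r"(\{\{\s*rename[ _]media\s*\|\s*)([^|}]*?)\.(?:gif|jpe?g)(\s*[|}])",
    re.IGNORECASE,
)


def retarget_rename_media(wikitext: str) -> str:
    """Point a carried-over rename media suggestion at the new .png extension."""
    return _RENAME.sub(r"\1\2.png\3", wikitext)
```

For example, `{{rename media|Better name.gif|unclear title}}` on the old page would become `{{rename media|Better name.png|unclear title}}` on the new one.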


 * BJ Talk 02:53, 19 June 2008 (UTC)

Rather than sic it on random images, I decided to cherry-pick the test sample to cover all possible bases. There was one bug: the wikitext in the edit summary portion of the history in Image:718test1a.png got parsed. I squashed this, as evidenced in Image:718test1c.png. I can haz approval plz? :) east.718 at 04:34, June 19, 2008
 * Image:718test1a.jpg was a poorly optimized JPEG, which the bot correctly moved to Image:718test1a.png, replacing badJPEG with.
 * Image:718test1b.jpg was a well optimized JPEG which remained untouched, save the removal of the badJPEG tag.
 * Image:718test1c.gif was a poorly optimized GIF which was used in User:east718/test. The bot correctly moved it to Image:718test1c.png, copying over all entries in the history and replacing its usage on the test page while tagging the original with.
 * Image:718test1d.gif was an animated GIF with a badGIF tag that remained completely untouched.
 * Image:718test1e.svg was a vector image and also remained untouched.
 * Edits appear proper,  MBisanz  talk 04:39, 19 June 2008 (UTC)
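The edit-summary bug reported above, where wikitext in the copied upload history got parsed, is the kind of problem usually fixed by rendering each summary literally. east718 does not say how it was squashed, so the following is only an assumed approach: each copied summary is wrapped in nowiki tags, with `<` and `&` escaped so that a literal `</nowiki>` inside a summary cannot close the wrapper early. The `Revision` structure and the row layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    """One row of the old file's upload history (hypothetical structure)."""
    timestamp: str
    user: str
    width: int
    height: int
    comment: str


def escape_summary(comment: str) -> str:
    """Render an upload summary literally: no links, templates or formatting."""
    # Escape '&' first so the '&lt;' introduced below is not double-escaped;
    # escaping '<' stops a '</nowiki>' inside the summary from ending the wrapper.
    safe = comment.replace("&", "&amp;").replace("<", "&lt;")
    return f"<nowiki>{safe}</nowiki>"


def history_row(rev: Revision) -> str:
    """Format one bullet of the copied history for the new description page."""
    return (f"* {rev.timestamp} [[User:{rev.user}|{rev.user}]] "
            f"{rev.width}x{rev.height} {escape_summary(rev.comment)}")
```

Only the summary is escaped here because it is free text typed by the uploader; the other fields come from the file history itself.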


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.