User:DeltaQuadBot/Local File Usage Task

Original VPP
Taken from WP:VPR on 15:03, 27 December 2011 (UTC)

An amazing number of pictures get uploaded to WP with no (or poor) descriptive text and with ambiguous names, and then the uploader disappears forever. Almost every time an image is uploaded, the uploader edits an article to add that image, often with a descriptive caption or edit summary. Years go by, somebody edits the article so the image is no longer linked, the uploader is gone, and we now have no idea what the image is of. Sometimes the linked article is deleted, so even the edit history is useless to regular editors. These images get deleted all of the time.

I am proposing we have bots add this information to the file pages, so that even if a file is not currently being used, we will at least have descriptive text and know what article someone thought it would be useful for. I recently spent several days moving hundreds of images that were in this situation. I had to go through each uploader's edit history, and sometimes I would just guess (the worst guess I had to make is now called File:Possibly Ecuador.jpg). Adding this information to the file pages would save a great deal of time and effort. Specifically, I am proposing two things:
 * 1) We run a bot that adds these tags as the file is added to the article. The information will be the User name, article and section heading the file was added to, edit summary of the addition (if one was provided), and caption (if one was provided).
 * 2) We run a bot for existing local images (starting with orphaned images), scraping info from histories.

For free images, this will make them a lot more useful when they are moved to Commons. Also, it might be a good idea to put this info in a collapsed template, so the page won't be cluttered, but the text will still hit searches. If this idea meets approval here, I will take it to bot requests. ▫  Johnny Mr Nin ja  04:17, 20 December 2011 (UTC)
 * Update: A bot is now being worked on to complete this task, see Bot requests/Archive 44. ▫  Johnny Mr Nin ja  12:47, 24 December 2011 (UTC)


 * This sounds like a good idea to me. It would be very hard to search histories for unused images. Graeme Bartlett (talk) 10:50, 20 December 2011 (UTC)
 * Sounds brilliant!  S ven M anguard   Wha?  03:25, 21 December 2011 (UTC)
 * Yes. Now!   Ebe 123  → report on my contribs. 13:30, 21 December 2011 (UTC)
 * Strong support! Should also include the diff in which it was inserted, because often uploaders use a descriptive caption in the article that gives more information about the image. I'd also put this on the talk page so it isn't front and center (potential vandalism etc, plus no real use having this visible to casual users who presumably won't be stumbling upon orphaned images anyways.) Calliopejen1 (talk) 18:58, 21 December 2011 (UTC)
 * As I mentioned above, I think putting the info in a collapsed template on the file page is the best idea, so casual readers wouldn't see it anyway. Most people forget that files have talk pages, because they are so rarely used. Putting this text there would also prevent the text from helping during File: namespace searches, and would also make WP:moving to Commons more complicated (the text may not even be noticed and the file page will be deleted when the file is moved). ▫  Johnny Mr Nin ja  22:51, 21 December 2011 (UTC)

Bot request
Taken from WP:BOTREQ on 15:03, 27 December 2011 (UTC)

Per Village pump (proposals) - I would like to request a bot to do two related tasks:
 * 1) On all local files, as they are added to articles, note the specifics of the addition on the file page. This information will hopefully include the diff, as well as the user name, title of the article, title of the section, text of the caption and text of the edit summary (if available).
 * 2) Perform this action on all existing local files, searching through article histories (including deleted articles if possible)

Rather than copy-pasting my full reasoning here, I ask that you see the VP link above. This will also require a template to be made here and at Commons (something like Usage history), and my thinking is that something close to Template:Tracklist would be best (with auto-collapse). I can do the template myself, but not well (I would be find/replacing Template:Tracklist), so if someone more skilled in template code would do that part as well, that would be great. Thanks! ▫  Johnny Mr Nin ja  21:09, 23 December 2011 (UTC)
 * I would be willing to pick this one up, with some time to code over the break here. So let me get this straight: what do you want the bot to pick up on to decide which images to search article histories for? Would no summary be fine? (That way we could add a category and easily track it.) So then you want the bot to look up and find the diff in which it was added, by whom it was added, the title (I'll consider the section, shouldn't be that hard), when it was added (in case of deletion), and a caption (again, shouldn't be hard, I'll consider it). Then you want that info listed on the file page? -- DQ  (t)   (e)  21:49, 23 December 2011 (UTC)
 * I'm pretty sure that you nailed it. By "Would no summary be fine?" do you mean the bot providing an edit summary on the file once it adds the information? Because that doesn't matter. If you mean "no summary" as in not including the summary of the edit that added the file to the page, having this info would be preferable. I did see a few images with no descriptors and no caption but with an explanatory edit summary. The rest sounds right. ▫  Johnny Mr Nin ja  22:06, 23 December 2011 (UTC)
 * I was talking about how to find the images to do this with, but are you saying all images? (That could take one hell of a long time to start up initially, because there must be a ton of files, but I'd be willing to do it) -- DQ  (t)   (e)  22:14, 23 December 2011 (UTC)
 * Yes, the intention is all images, probably starting with Category:Copy to Wikimedia Commons (because of the upcoming drive), then moving to orphaned files. Fair use images can be skipped completely, as they are easily replaced and already linked. The backlog would be huge, but obviously there is no time limit. ▫  Johnny Mr Nin ja  22:57, 23 December 2011 (UTC)
 * Ok. Sounds good, I'll start looking into the project and draft a template or something up, will keep you informed. -- DQ  (t)   (e)  23:04, 23 December 2011 (UTC)
 * Awesome, thanks! ▫  Johnny Mr Nin ja  23:36, 23 December 2011 (UTC)
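
The information requested in this thread (diff, user name, article, section, caption, edit summary) could be carried to the file page as one template call per usage. The following is only a rough sketch under stated assumptions: the template name "Usage history" is the one suggested above, but the parameter names (user, article, diff, section, summary, caption) and the `UsageRecord` class are hypothetical, since no template had been written at this point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRecord:
    """One addition of a file to an article (hypothetical structure)."""
    user: str
    article: str
    diff: int
    section: Optional[str] = None
    summary: Optional[str] = None
    caption: Optional[str] = None


def _literal(value: Optional[str]) -> Optional[str]:
    # Free text is wrapped in <nowiki> so the raw wikitext is shown as
    # written instead of being rendered on the file page.
    return f"<nowiki>{value}</nowiki>" if value else None


def render_usage(record: UsageRecord) -> str:
    """Build the wikitext for one template call.

    The parameter names are assumptions made for this sketch; empty
    fields are left out entirely.
    """
    fields = [
        ("user", record.user),
        ("article", record.article),
        ("diff", str(record.diff)),
        ("section", record.section),
        ("summary", _literal(record.summary)),
        ("caption", _literal(record.caption)),
    ]
    lines = ["{{Usage history"]
    for name, value in fields:
        if value:
            lines.append(f" | {name} = {value}")
    lines.append("}}")
    return "\n".join(lines)
```

Wrapping the summary and caption in nowiki tags is a design choice for this sketch, not something agreed in the discussion. It keeps the archived text searchable and should also keep stray pipes from being read as parameter separators, but that is worth verifying on a sandbox page before relying on it.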

Questions

 * Which categories are to be scanned first?
 * Find some category or template that is common to fair use images (FUI) so we can ignore them. Does this exist?
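
If such a marker category does exist, skipping fair use images could be as simple as the sketch below. Everything here is an assumption made for illustration: the function name, the input shape (a mapping from file title to its category names, however those are obtained), and the marker category used in the usage note, which would need to be checked against the live wiki.

```python
from typing import Iterable, List, Mapping


def _norm(title: str) -> str:
    # Treat underscores and spaces alike and ignore case when comparing.
    return title.replace("_", " ").strip().casefold()


def free_candidates(
    file_categories: Mapping[str, Iterable[str]],
    fair_use_markers: Iterable[str],
    limit: int = 250,
) -> List[str]:
    """Return up to `limit` file titles that carry none of the marker categories."""
    markers = {_norm(m) for m in fair_use_markers}
    kept: List[str] = []
    for title, categories in file_categories.items():
        if markers.isdisjoint(_norm(c) for c in categories):
            kept.append(title)
            if len(kept) >= limit:
                break
    return kept
```

For example, `free_candidates(files, ["Category:All non-free media"])` would return the first 250 titles without that category; the category name is an unverified guess at a suitable marker, not a confirmed answer to the question above.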

Coding to do list

 * 1) Test search (list 250 titles, verify they are clear of FUI)
 * 2) Code article searching/template insertions
   * Find a way to use the API or the file page text to learn which articles the file is in
   * Search the article for the file link text
   * Search diffs (is there a tool for this?) for image insertions
   * Pull info and transfer to template
 * 3) File BRFA
 * Save file page with all instances
 * Code the rechecker (non-priority)
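
The "search diffs for image insertions" step can be sketched without any special tool by walking an article's revisions from oldest to newest and recording each revision where the file link appears after being absent. This is a minimal sketch, not the bot's actual code: the `Revision` and `Insertion` classes and all function names are hypothetical, fetching the revisions (for instance through the MediaWiki API) is left out, and the link parsing is deliberately simplified (no templates, galleries or infobox image parameters).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Revision:
    revid: int
    user: str
    timestamp: str
    comment: str  # edit summary
    text: str     # full wikitext of the article at this revision


@dataclass
class Insertion:
    revid: int
    user: str
    timestamp: str
    summary: str
    section: Optional[str]
    caption: Optional[str]


# Common image options; anything else after the file name may be a caption.
OPTION = re.compile(
    r"thumb|thumbnail|frame|frameless|border|left|right|center|none"
    r"|upright(?:=.*)?|\d+px|x\d+px|\d+x\d+px|alt=.*|link=.*",
    re.IGNORECASE | re.DOTALL,
)
HEADING = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def link_pattern(filename: str) -> "re.Pattern[str]":
    """Match the start of [[File:name ...]]; spaces and underscores are equivalent."""
    parts = [re.escape(p) for p in re.split(r"[ _]+", filename.strip())]
    name = "[ _]+".join(parts)
    return re.compile(
        r"\[\[\s*(?:File|Image)\s*:\s*" + name + r"\s*(?=\||\]\])",
        re.IGNORECASE,
    )


def read_link(text: str, start: int) -> Optional[str]:
    """Return the content between the [[ at `start` and its matching ]]."""
    depth, i = 0, start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "[[":
            depth, i = depth + 1, i + 2
        elif pair == "]]":
            depth, i = depth - 1, i + 2
            if depth == 0:
                return text[start + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced brackets


def split_params(inner: str) -> List[str]:
    """Split on pipes that are not nested inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("[[", "{{"):
            depth, i = depth + 1, i + 2
            current.append(pair)
        elif pair in ("]]", "}}"):
            depth, i = depth - 1, i + 2
            current.append(pair)
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(current))
            current, i = [], i + 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts


def caption_of(inner: str) -> Optional[str]:
    """The last parameter that is not a recognised option is taken as the caption."""
    for part in reversed(split_params(inner)[1:]):
        part = part.strip()
        if part and not OPTION.fullmatch(part):
            return part
    return None


def section_before(text: str, pos: int) -> Optional[str]:
    """Title of the last heading that starts before `pos`, if any."""
    section = None
    for match in HEADING.finditer(text):
        if match.start() >= pos:
            break
        section = match.group(2)
    return section


def find_insertions(revisions: Sequence[Revision], filename: str) -> List[Insertion]:
    """Revisions (given oldest first) in which the file goes from absent to present."""
    pattern, found, present = link_pattern(filename), [], False
    for rev in revisions:
        match = pattern.search(rev.text)
        if match and not present:
            inner = read_link(rev.text, match.start())
            found.append(Insertion(
                rev.revid, rev.user, rev.timestamp, rev.comment,
                section_before(rev.text, match.start()),
                caption_of(inner) if inner else None,
            ))
        present = match is not None
    return found
```

Calling `find_insertions(revisions, "Possibly Ecuador.jpg")` returns one record per insertion, so a file that was removed and later re-added yields several, which fits the "save file page with all instances" item. Comparing full revision texts is slow on long histories; a real run would likely need a smarter search, and deleted articles would need separate handling.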