Wikipedia:Bots/Requests for approval/Theo's Little Bot 21


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Theo's Little Bot 21
Operator:

Time filed: 21:20, Thursday June 13, 2013 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: github

Function overview: Adds Information to self-published files uploaded to Wikipedia that don't already have an information template.

Links to relevant discussions (where appropriate): request by

Edit period(s): Daily

Estimated number of pages affected: Unknown

Exclusion compliant (Yes/No): Nope, no need

Already has a bot flag (Yes/No): Yep

Function details: For all files in Category:Self-published work, the bot checks to see if they already have an Information-esque template. If not, then a pre-filled version of Information (including uploader and upload date, gleaned from the file info) is prepended to the page.
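The check-then-prepend behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function names, the list of accepted template names, and the exact layout of the pre-filled template are all assumptions.

```python
import re

# Assumption: only {{Information}} itself counts; the real bot may accept more names.
INFO_TEMPLATE_NAMES = ("information",)


def has_information_template(wikitext):
    """Return True if the page text already transcludes an Information-like template."""
    names = "|".join(re.escape(name) for name in INFO_TEMPLATE_NAMES)
    pattern = r"\{\{\s*(?:" + names + r")\s*[|}]"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


def prepend_information(wikitext, uploader, upload_date):
    """Prepend a pre-filled Information block unless one is already present."""
    if has_information_template(wikitext):
        return wikitext
    block = (
        "{{Information\n"
        "| description = \n"
        "| source      = Own work\n"
        "| date        = " + upload_date + "\n"
        "| author      = [[User:" + uploader + "|" + uploader + "]]\n"
        "}}\n"
    )
    return block + wikitext
```

Running the function a second time on its own output leaves the page unchanged, which is the property a daily run relies on.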

Discussion
 ·addshore·  talk to me! 18:55, 16 June 2013 (UTC)
 *  Theopolisme ( talk )  03:38, 17 June 2013 (UTC)
 * Please use EXIF creation date whenever available and specify where the date comes from: "YYYY-MM-DD (according to EXIF)" or "YYYY-MM-DD (original upload date)". --Stefan2 (talk) 20:07, 20 June 2013 (UTC)


 * Template_talk:User-multi, please change the template used to add user name, talk page and uploads link. Sfan00 IMG (talk) 15:55, 21 June 2013 (UTC)
 * Let me know when you are ready for another trial.  ·addshore·  talk to me! 10:13, 22 June 2013 (UTC)
 * Sfan00 IMG, what exactly do you want me to do? Sorry for the belated response, I was traveling.  Theopolisme ( talk )  20:54, 30 June 2013 (UTC)

Also, is there any way to make text that is already there go into the "description" parameter, like CommonsHelper does when moving files to Commons? :Jay8gInspect-Berate-Know  WASH-BRIDGE-WPWA-MFIC 17:20, 23 June 2013 (UTC)
 * I can look into it. Also, note that the  in your signature makes it appear rather...strange.  Theopolisme  ( talk )  20:55, 30 June 2013 (UTC)
 * Is it sufficient to get all of the contents of the image description before the first section header (if there is one), strip newlines from it, and then save it as the description parameter (github issue)? Thoughts? Thanks,  Theopolisme ( talk )  04:04, 1 July 2013 (UTC)
 * Well, it is often under a header (often "Summary" or similar), so that wouldn't work in those cases:Jay8g [ V•T•E ] 04:12, 1 July 2013 (UTC)
 * What about stripping all headers and newlines from the page and then using the resulting content?  Theopolisme ( talk )  04:21, 1 July 2013 (UTC)
 * That would work:Jay8g [ V•T•E ] 00:15, 2 July 2013 (UTC)
 * ✅ with commit; still working on EXIF data; and I still need clarification on the Template_talk:User-multi thingamajigger.  Theopolisme ( talk )  02:30, 2 July 2013 (UTC)
 * If the page says "PD-self", then I suspect that the output would be "". Could this be avoided? Some templates (like music) obviously belong in the description while other templates (like PD-self) are better placed elsewhere. I'm not sure exactly how tools for moving files to Commons work with templates, but they often get unusual templates wrong. --Stefan2 (talk) 21:05, 2 July 2013 (UTC)
 * Good point. Perhaps just skip over templates?  Theopolisme ( talk )  23:29, 2 July 2013 (UTC)
 * Also, I suggest that you don't use User-multi. It doesn't exist on Commons and use of the template on file information pages might cause problems when files with the template are moved to Commons. I suggest that you indicate user names as User:Username . Usually, there is only a single link anyway, going to the user page. --Stefan2 (talk) 21:10, 2 July 2013 (UTC)
 * Right now I'm substituting, which does basically what you suggest.  Theopolisme ( talk )  23:29, 2 July 2013 (UTC)
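The "strip all headers and newlines" proposal from this thread could look something like the sketch below. It is an assumption-laden illustration (the regex and function name are hypothetical), and it deliberately leaves templates in place so that the PD-self concern raised above is visible in the output.

```python
import re

# A section header is a line that starts and ends with one or more "=" signs.
HEADER_RE = re.compile(r"^=+[^\n]*=+[ \t]*$", re.MULTILINE)


def flatten_description(page_text):
    """Drop section header lines, then collapse all whitespace runs to single spaces."""
    without_headers = HEADER_RE.sub("", page_text)
    return " ".join(without_headers.split())


sample = "== Summary ==\nA photo of\na bridge.\n== Licensing ==\n{{PD-self}}"
flattened = flatten_description(sample)
# flattened == "A photo of a bridge. {{PD-self}}"
```

The licence template survives the flattening, which is why skipping templates was suggested as a follow-up step.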

✅: the bot will now use EXIF creation dates if available, commit.  Theopolisme ( talk )  01:09, 3 July 2013 (UTC)
 * Another trial?  Theopolisme ( talk )  01:09, 3 July 2013 (UTC)
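For illustration, the date selection requested earlier (prefer the EXIF creation date and say where the date came from) might be sketched like this. The function name and the input formats are assumptions: EXIF timestamps are taken to be "YYYY:MM:DD HH:MM:SS" strings and upload timestamps to be ISO 8601 strings as returned by the MediaWiki API. Whether the bot labels the date exactly this way is not stated in this discussion.

```python
from datetime import datetime


def describe_date(upload_timestamp, exif_datetime=None):
    """Return a labelled date, preferring a parseable EXIF creation date."""
    if exif_datetime:
        try:
            taken = datetime.strptime(exif_datetime, "%Y:%m:%d %H:%M:%S")
            return f"{taken:%Y-%m-%d} (according to EXIF)"
        except ValueError:
            # Unparseable or zeroed EXIF value: fall back to the upload date.
            pass
    uploaded = datetime.strptime(upload_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return f"{uploaded:%Y-%m-%d} (original upload date)"
```

Falling back silently on a bad EXIF value keeps one malformed file from halting a long run.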


 * Theopolisme, another note: when using {{subst:usernameexpand}}, any spaces in the user name have to be replaced with underscores, for reasons to do with the underlying template code. Sfan00 IMG (talk) 21:22, 3 July 2013 (UTC)
 * ✅ with commit  Theopolisme ( talk )  22:08, 3 July 2013 (UTC)
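The underscore fix amounts to a one-line transformation. In this sketch the helper name is hypothetical, and passing the user name as the first positional parameter of the template is an assumption about how it is called.

```python
def usernameexpand_call(username):
    """Build the substituted template call with spaces replaced by underscores."""
    # Assumption: the user name is the template's first positional parameter.
    return "{{subst:usernameexpand|%s}}" % username.strip().replace(" ", "_")
```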

BAGAssistanceNeeded Another trial, perhaps?  Theopolisme ( talk )  21:31, 9 July 2013 (UTC)
 * -- Chris 15:27, 15 July 2013 (UTC)
 * with some bug fixes  Theopolisme ( talk )  16:25, 15 July 2013 (UTC)


 * I've been cleaning up a few of these. Do you think you could add a polite nag notice to uploaders, to let them know a bot has added basic information for an image they uploaded, so that they can come and clean up what the bot left if needed? Sfan00 IMG (talk) 10:32, 16 July 2013 (UTC)
 * Hmmm, yes, I suppose. Could you work on writing the notice's text?  Theopolisme ( talk )  14:01, 16 July 2013 (UTC)
 * See un-botfill, which in its final wording should perhaps be substituted. Sfan00 IMG (talk) 14:33, 16 July 2013 (UTC)
 * actually, I'd put it unprefixed so you could supply a list, group multiple batches of additions like the 2 existing notifications from the bot do :) Sfan00 IMG (talk) 21:35, 16 July 2013 (UTC)
 * Actually, I think it would be fairly difficult to provide batch notifications, and here's why: The existing tasks don't actually edit the images in question. However, this one does. What does that mean? Well, for starters, notifications can't be delivered (for obvious reasons) until the end of the bot's run, since prior to that it obviously wouldn't know which pages it edited...since it wouldn't have edited them yet! The problem with that is that in the process of adding information to articles, it's possible for the bot to be disrupted, confused, etc...in turn resulting in incomplete or otherwise corrupt notifications. Worst case scenario, yes, but with such great separation between edit and notification (the bot's complete run will take quite a while), I'm worried that the potential benefit would be rather difficult to achieve.  Theopolisme ( talk )  03:39, 17 July 2013 (UTC)
 * OK That's reasonable then, thanks for tweaking the template :) Sfan00 IMG (talk) 07:57, 17 July 2013 (UTC)
 * Alrighty, I've made it so the bot won't substitute a new notification each time -- instead, it does this. I've tested it (as you can see!) and pushed the new code to github, but I'm not averse to another trial if that's what needs to happen.  Theopolisme ( talk )  15:40, 17 July 2013 (UTC)

I have checked the latest trial, and I have a few comments.
 * The bot adds . In most other cases, the author parameter only contains a link to the user page. Not sure if this is important.
 * This is the default behavior of Usernameexpand.  Theopolisme ( talk )  15:43, 19 July 2013 (UTC)

with the newlines -- Chris 14:42, 20 July 2013 (UTC)
 * The bot removes newlines from the text it adds to the description field (example: File:"Girl with a Pearl Earring" (after Jan Vermeer) by Lawrence Saint.jpg). This means that anyone wishing to move the file to Commons needs to clean up the description field by restoring the newlines. Alternatively, if there haven't been any further edits after the bot's edit, you could use rollback, which is faster.
 * Yes, the bot does remove newlines; what would you like it to do instead? Take a look at this for an example of what happens when it doesn't remove newlines...you can see that it looks like spaces.  Theopolisme ( talk )  15:43, 19 July 2013 (UTC)
 * You can use
 * trouts himself. Okay, I've implemented that.  Theopolisme ( talk )  15:10, 20 July 2013 (UTC)


 * The bot sometimes inserts strange things in the description field (example: File:2 active lanes no emergency.svg).
 * I assume by weird insert you're talking about the . That issue was fixed during the second trial, and you can see that shortly afterwards the bot correctly edited that page.  Theopolisme  ( talk )  15:43, 19 July 2013 (UTC)


 * The description also appears below the Information template, so anyone wishing to move the file to Commons either needs to rollback the bot's edit or remove the duplicate description below the Information template. This takes extra time.
 * Would it make sense for the bot to try to remove the description (i.e., find and replace)? The one problem with this that I can see is the issue of trying to remove the description header, if there is one -- they come under a variety of names, and there would obviously be cases where it missed an unusual header and just ended up removing the description text, not the section header.  Theopolisme ( talk )  15:43, 19 July 2013 (UTC)


 * The bot states that the uploader is the author and that the source is "own work", but the licence templates state that the uploader is the copyright holder. Not sure if this difference is important. Compare with the Commons templates Commons:Template:PD-heirs, Commons:Template:GFDL-heirs and Commons:Template:CC-BY-SA-3.0-heirs. --Stefan2 (talk) 13:25, 19 July 2013 (UTC)
 * Hmm, I don't really know either. Alternatives?  Theopolisme ( talk )  15:43, 19 July 2013 (UTC)

I've replied above, inline, to various concerns. I'd like to get this task going, so your replies would be appreciated. Thanks!  Theopolisme ( talk )  17:46, 31 July 2013 (UTC)
 * I don't see any major issues Sfan00 IMG (talk) 17:48, 31 July 2013 (UTC)

BAGAssistanceNeeded Per discussion with Bot Operator, this is in BAG's court for approval. Hasteur (talk) 23:45, 19 September 2013 (UTC)
 * Which discussion are you referring to? — HELL KNOWZ  ▎TALK 16:58, 21 September 2013 (UTC)
 * I would hardly call it a discussion, more like a "do you mind if I poke BAG about this" on IRC. With that said, though, the requester appears satisfied with the bot's operation and to the best of my knowledge the above issues have been resolved.  Theopolisme ( talk )  19:13, 21 September 2013 (UTC)
 * I see, it wasn't the botop pinging BAG, so I wasn't sure what's happening. — HELL KNOWZ  ▎TALK 19:26, 21 September 2013 (UTC)

Edits look good to me, but since the above issues, another It doesn't look from the source code that you do many checks on the text found in the summary. Would, for example, adding templates inside templates mess the Regex up? Or embedding images? — HELL KNOWZ  ▎TALK 19:26, 21 September 2013 (UTC)
 * I'm not sure, so... Tomorrow I'll convert the code to use the far superior mwparserfromhell module, which has the fabulous strip_code function (removes all unprintable -- including templates -- wikicode from the string).  Theopolisme ( talk )  22:41, 21 September 2013 (UTC)

Third trial complete
edits  Theopolisme ( talk )  01:57, 9 October 2013 (UTC)

Sorry this took so long. — HELL KNOWZ  ▎TALK 11:09, 5 November 2013 (UTC)
 * etc. -- Signature in description
 * etc. -- Stripping links?
 * etc. -- Image strip left behind "300px"
 * -- Summary wasn't picked up? Was that just excluded due to funky syntax?
 * -- Nowiki interfered with your br's
 * Really minor: -- trim brackets?
 * Theopolisme, how come you didn't spot these problems during your trial? Josh Parris 09:41, 7 November 2013 (UTC)

Hmm, I think this is a time for decisions...and firming up exactly what should be removed, as well as what shouldn't. Templates? Images? Signatures? Should images and templates be converted to links? Or removed outright? Sfan00 IMG, thoughts as the requestor and as someone familiar with the file process?  Theopolisme ( talk )  15:45, 9 November 2013 (UTC)
 * As others have said. Sfan00 IMG (talk) 15:48, 9 November 2013 (UTC)

where to next? Josh Parris 09:06, 11 November 2013 (UTC)
 * Images should be converted to links. Existing links should not be modified. Templates should be discarded. Signatures should be removed if possible (obviously some unusual ones might slip by). Am I reading "consensus" correctly?  Theopolisme ( talk )  22:54, 11 November 2013 (UTC)
 * I'm not so sure about discarding templates; some of them might be relevant. Sfan00 IMG (talk) 23:21, 13 November 2013 (UTC)
 * The templates remain on the page, just not in the description field.  Theopolisme ( talk )  00:26, 14 November 2013 (UTC)
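The rules proposed in this exchange (images become links, ordinary links are untouched, templates are left out of the description field, default-style signatures are removed) can be sketched with the standard library alone. This is not the bot's implementation, which was said above to be moving to mwparserfromhell; the names and regexes here are assumptions, and the image pattern does not handle captions that themselves contain links.

```python
import re

# Matches [[File:...]] / [[Image:...]] embeds; captions with nested links are not handled.
IMAGE_RE = re.compile(r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)[^\]]*\]\]", re.IGNORECASE)

# Assumed "default pattern" signature: user link, talk link, UTC timestamp.
SIG_RE = re.compile(
    r"(?:--\s*)?\[\[User:[^\]|]+(?:\|[^\]]*)?\]\]\s*"
    r"\(\[\[User talk:[^\]|]+\|talk\]\]\)\s*"
    r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)"
)


def strip_templates(text):
    """Remove {{...}} templates, including nested ones, by tracking brace depth."""
    out, depth, i = [], 0, 0
    while i < len(text):
        if text.startswith("{{", i):
            depth += 1
            i += 2
        elif depth and text.startswith("}}", i):
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])
            i += 1
    return "".join(out)


def clean_description(text):
    """Apply the rules in order: drop templates, drop signatures, link images."""
    text = strip_templates(text)
    text = SIG_RE.sub("", text)
    text = IMAGE_RE.sub(lambda m: "[[:File:%s]]" % m.group(1).strip(), text)
    return text.strip()
```

The original page text is not altered by this step, so the discarded templates stay on the page below the Information block.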

On the assumption code changes have taken place, Josh Parris 01:19, 16 November 2013 (UTC)

Fourth trial complete
Here are tests showing fixes for the specific issues reported above:
 * no longer stripping links
 * converts images to image links
 * removes user signatures (when they match the default pattern)
 * re: the others...
 * trim brackets: out of scope of this bot
 * summary wasn't picked up: summary was just a template, which was stripped
 * nowiki interfered with br's: descriptions that include nowiki tags will now be skipped (too complicated for the bot to deal with)

If you'd like me to still run an additional trial I can do that, but since it takes a good deal of computing power (and time) to probe through the entire category (since a good number of files already have descriptions), might this be sufficient?  Theopolisme ( talk )  02:12, 16 November 2013 (UTC)


 * Thanks, cunning solution to finding other erroneous cases - just re-edit the broken ones!
 * I've had a look back through the BRfA, and I'm perplexed by the newline stuff. The description= parameter can take wikitext, so why all the futzing around with <br />s? If there's a pipe symbol you'll be in trouble, but apart from that...
 * Identified bugs fixed. So, what's with the newline stuff?
 * Stripping all templates seems to be a problem but a decent solution would require a fair amount of effort. Is anyone going to be following after the bot, looking for bad edits like this one? 03:18, 16 November 2013 (UTC)


 * Well, the problem is that the newlines are automatically converted to spaces, as seen in this diff. This is resolved by using br tags.
 * As far as "looking out for bad edits" goes... I truthfully don't think it's really that big a deal, although of course I'll keep an eye on it to some extent. Worst case scenario is that the file is tagged as "description missing" and eventually someone adds a description.  Theopolisme ( talk )  03:44, 16 November 2013 (UTC)
 * But newlines (as opposed to new paragraphs) are meant to be joined up in wikitext.
 * Can you point to a lump of wikitext that would have rendered wrong if it was just dumped into the template without inserting <br />s? Josh Parris 05:02, 16 November 2013 (UTC)
 * Sorry, somehow I missed this.  Theopolisme  ( talk )  23:42, 20 November 2013 (UTC)
 * Yeah, in that edit you've eaten all the newlines. Of course it'll look crap.  Try this:


 * Heh, thanks -- you're entirely correct (and some of the commenters earlier in this BRFA weren't ;) ). Looks like the parsing engine was doing something weird in that version of the code. I'll remove the br conversion line.  Theopolisme  ( talk )  02:34, 21 November 2013 (UTC)
 * No worries. I thought you knew something I didn't. Make sure you check for pipe symbols. Anyways, I'm planning on approving this task. Any objections? Josh Parris 02:50, 21 November 2013 (UTC)
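A pipe check of the kind mentioned here might look like the sketch below. The function name is hypothetical, and the sketch assumes that pipes inside [[...]] links and {{...}} templates are safe within a template parameter while any other pipe would split the description= value.

```python
import re

# Innermost [[...]] link or {{...}} template with no further nesting inside it.
INNERMOST_RE = re.compile(r"\[\[[^\[\]]*\]\]|\{\{[^{}]*\}\}")


def has_bare_pipe(description):
    """Return True if a "|" remains once links and templates are peeled away."""
    previous = None
    while previous != description:
        previous = description
        # Peel one nesting level per pass until nothing more can be removed.
        description = INNERMOST_RE.sub("", description)
    return "|" in description
```

A description that fails the check could be skipped or have its pipes escaped; which of the two the bot does is not stated in this discussion.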

An established operator, with a task that has consensus. Josh Parris 19:45, 21 November 2013 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.