Wikipedia:Bots/Requests for approval/Demibot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Request Expired.

Demibot
Operator:

Time filed: 20:33, Friday November 21, 2014 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: On Github

Function overview: Generate indexes of talk page archives

Links to relevant discussions (where appropriate):

Edit period(s): Daily (more often if discussion believes it's warranted)

Estimated number of pages affected: 3207 (based on current configuration and logging)

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: Builds a list of all pages transcluding (currently only in the Talk, User Talk, Wikipedia Talk, File Talk, and MediaWiki Talk namespaces, although I will extend that to other namespaces when the bot is operational) and generates archive indexes according to the specification of HBC Archive Indexerbot. Specifically:


 * Finds and parses the Optin template on each page
 * Builds a list of all pages that fit the mask specified on the page
 * Iterates through that list and finds each thread with a regex, then does the following for each thread:
 * Stores the title of the topic
 * Estimates the number of replies
 * Finds the earliest and latest times in the thread, stores those as well as the difference between them
 * Generates a link to the thread
 * Grabs either the default or a specified template and generates the talk page archive index from this
 * Writes the talk page archive index to the page specified in the Optin template (this is the only edit the bot will make aside from to its own log page)
 * Logs its actions to User:Demibot/log
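
The per-thread steps listed above could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function name `parse_threads`, the dictionary keys, and both regular expressions are hypothetical, and it assumes threads are level-2 sections signed with standard English UTC timestamps. Counting timestamps and subtracting one is just one possible way to estimate replies.

```python
import re
from datetime import datetime

# Assumption: a thread starts at a level-2 heading such as "== Title ==".
# The [^=] guard keeps "=== Subsection ===" from being treated as a thread.
HEADING_RE = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# Assumption: signatures end with e.g. "20:33, 21 November 2014 (UTC)".
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")
TIMESTAMP_FORMAT = "%H:%M, %d %B %Y (UTC)"


def parse_threads(page_title, wikitext):
    """Return one dict per thread found in an archive page's wikitext."""
    headings = list(HEADING_RE.finditer(wikitext))
    threads = []
    for i, match in enumerate(headings):
        # A thread's body runs until the next level-2 heading (or end of page).
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[match.end():end]
        stamps = [datetime.strptime(s, TIMESTAMP_FORMAT)
                  for s in TIMESTAMP_RE.findall(body)]
        first = min(stamps) if stamps else None
        last = max(stamps) if stamps else None
        title = match.group(1).strip()
        threads.append({
            "title": title,
            # Rough estimate: every timestamp after the first is a reply.
            "replies": max(len(stamps) - 1, 0),
            "first": first,
            "last": last,
            "duration": (last - first) if stamps else None,
            "link": "[[{}#{}]]".format(page_title, title),
        })
    return threads
```

Threads without any timestamp come back with `None` for the dates and duration, which leaves the choice of placeholder to the code that renders the index.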

Discussion
I've been writing the bot already, and it's mostly functional for my own talk page, but before I write anything that makes edits (even to my own talk page index) I wanted to bring this to BRFA. The code is a bit of a mess right now, partly because it's my first time writing Python, but it's functional. It doesn't generate the index from a template yet, but the functionality is there; I just need to write a parser for the template. This bot replaces the now-inactive HBC Archive Indexerbot and the inactive Task 15 on Legobot, neither of which appear to be coming back any time soon. If anybody has any questions, I'll be glad to answer them. If anyone has any concerns or suggestions, I'll be glad to hear them. demize (t · c) 20:33, 21 November 2014 (UTC)
 * This is much-needed functionality, to the extent that I was considering coding it and requesting permission to run from Their Majesties. I'm sure it will have wide-ranging support.
 * Note: You can of course run it on your own talk pages, even before trial approval.
 * All the best: Rich Farmbrough, 15:35, 22 November 2014 (UTC).


 * This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT ⚡ 18:12, 24 November 2014 (UTC)
 * It did, and I tagged the page it created for deletion (it's been deleted) and fixed the problem that made it create a page that it shouldn't have. I've also run it on my own talk page, and it generates the archive index excellently, in my opinion, with the one flaw that sections without any timestamps have a first comment date of December 31, 9999 and a last comment date of January 1st, 1900. I'm thinking that I'll just make it output dashes instead of dates and durations when there are no timestamps, that'll be a bit better. demize  (t · c) 19:00, 24 November 2014 (UTC)
 * Just an update on the current status of the bot: As per above, I have tested this on my own talk page. I have also made a version of the bot that should work on all talk pages. That's been pushed to GitHub, but the changes haven't yet been pulled to the Tools Labs server. What happens next with the bot is up to the BAG. demize  (t · c) 22:56, 24 November 2014 (UTC)
 * ((BAGAssistanceNeeded)) As it's been a week since I filed the request with no comment from a BAG member, I'm requesting BAG assistance. I've been making some changes to the code on Github over the past week, but there's very little I can do now without approval for a trial and a trial scope. demize  (t · c) 18:20, 28 November 2014 (UTC)
 * Any specific reason why the bot isn't marked as exclusion compliant? APerson (talk!) 00:19, 30 November 2014 (UTC)
 * Strictly speaking, it isn't. It's opt-in rather than opt-out; it doesn't look for the template at all, since if someone wanted to remove it from their page they could remove the opt-in template. It's still just as easy to stop from editing your pages, just not technically exclusion compliant.  demize  (t · c) 04:50, 30 November 2014 (UTC)
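
On the timestamp-less sections mentioned above (first comment 31 December 9999, last comment 1 January 1900): those dates look like leftover initial values from a min/max search, although that is only a guess about the cause. A minimal sketch of the proposed "output dashes" behaviour, with a hypothetical function name and output format, might be:

```python
from datetime import datetime

PLACEHOLDER = "-"  # shown instead of dates and durations when nothing was found


def summarize_timestamps(stamps):
    """Return (first, last, duration) as display strings for one thread.

    `stamps` is a list of datetime objects; an empty list means the
    section contained no recognisable timestamps.
    """
    if not stamps:
        # No sentinel dates: emit dashes for all three columns.
        return PLACEHOLDER, PLACEHOLDER, PLACEHOLDER
    first, last = min(stamps), max(stamps)
    fmt = "%Y-%m-%d %H:%M"  # assumed display format, not the bot's real one
    return first.strftime(fmt), last.strftime(fmt), str(last - first)
```

Checking for the empty list up front means no sentinel value is ever needed, so it cannot leak into the generated index.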

Trial

 * Definitely useful. Please do not do a full run but instead trial against one or two users' talk page archives, such as mine! (which should already have the HBC template, if not then just add it :) ). Post links once complete!  ·addshore·  talk to me! 15:15, 4 December 2014 (UTC)
 * Thanks! I'll get this done tomorrow. I'll run it against mine and yours, and if another user or two volunteers their page I can run it against theirs as well. demize  (t · c) 15:23, 4 December 2014 (UTC)
 * Feel free to index my talk page ;) (iirc, I don't have HBC template, so please add it!) -- Revi 15:41, 4 December 2014 (UTC)
 * Fixed a few issues that popped up (turns out my attempts to catch exceptions threw a few, and I added some other things without testing them... they're fixed now). It took longer than I expected due to real life and my need to figure out what on earth the masks for -revi's talk page archives could be. The mask option really wasn't designed for monthly archives, it turns out. That aside, the bot worked as expected once I ironed out the bugs. demize  (t · c) 21:03, 9 December 2014 (UTC)


 * In this edit, the bot/botowner added a template with multiple duplicate parameters mask. I commented that out. This is about CAT:DUPARG. User talk:Demize described that this is part of the bot, but I think we should not expect multiple parameters (that being undocumented). DePiep (talk) 01:17, 10 December 2014 (UTC)
 * brought up quite a valid point on my talk page regarding the Optin template just now (User talk:Demize). The duplicate parameters are an issue, and an easily correctable one if I want to make this bot not entirely compatible with the current operating instructions. I could make the masks all be specified as one parameter, with multiple masks separated by semicolons (and this could be done while preserving the ability to specify them as multiple parameters). If there's any way to exclude the template this bot uses from CAT:DUPARG, then that could be done as well. Certainly an issue to consider. demize  (t · c) 01:24, 10 December 2014 (UTC)
 * AFAIK, there are no exceptions possible (cat:duparg is built deep inside mw software). Also, one should consider what such repetition intends versus regular template usage & practice (mw-level). DePiep (talk) 01:29, 10 December 2014 (UTC)
 * I'm certainly not averse to changing how the bot reads the template. The only issue is that HBC Archive Indexerbot and Legobot both read multiple parameters from the template rather than cramming them all into one parameter. Either way, it doesn't affect anything on the MW level: the template has no content since it's used simply as a way for the bot to find pages to work on and to know what to do on that page. I can certainly see the point of not repeating the parameters to keep within the accepted practices for templates though, and I'll probably write it in with semicolon delimiters when I get the chance... for now, it's almost 21:00 and I have an exam at 08:00, so I'm off for the night. demize  (t · c) 01:49, 10 December 2014 (UTC)
 * Umm, User talk:-revi/Archive Index links has some errors; blah blah are shown on link - it should not be like this... -- Revi 11:53, 10 December 2014 (UTC)
 * Indeed, it does. I'll make it strip HTML tags when I get home, as well as write another regex to make it handle wikilinks, something else I didn't notice was an issue until now. Thanks. demize  (t · c) 12:15, 10 December 2014 (UTC)
 * Because it breaks the section link. -- Revi 12:17, 10 December 2014 (UTC)
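
The semicolon-delimited mask idea discussed above, kept compatible with repeated `mask` parameters, could look something like the sketch below. It assumes the template has already been parsed into ordered (name, value) pairs; the function name `collect_masks` and that input shape are hypothetical, and the example mask strings are made up.

```python
def collect_masks(params):
    """Gather archive masks from parsed template parameters.

    `params` is a list of (name, value) pairs in template order, so a
    repeated "mask" parameter simply appears more than once. Each value
    may also hold several masks separated by semicolons.
    """
    masks = []
    for name, value in params:
        if name.strip() != "mask":
            continue
        for part in value.split(";"):
            part = part.strip()
            # Skip empty pieces and duplicates, keeping first-seen order.
            if part and part not in masks:
                masks.append(part)
    return masks
```

One caveat with this approach: a semicolon is a legal character in page titles, so a mask for a page whose name contains one would be split incorrectly.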

So, just a few comments from me / some things I would love to see... Naturally when people have a lot of talk page archives the index page is pretty big! Would it be possible to shorten "0 days, 15 hours, 39 minutes" to maybe "0d 15h 39m"? On big index pages this should make quite a difference. Another space saver could be in the Link column changing the text displayed from the link from "User talk:Addshore/Archive 1#Thanks" to "Archive 1#Thanks" for example! Also, if we have the first and the duration do we really need the last? I guess it could help people sort? I'm starting to think the template should have more epic options!  ·addshore·  talk to me! 20:19, 10 December 2014 (UTC)
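
Both space savers suggested here are small formatting changes. A rough sketch, with hypothetical function names and assuming non-negative durations held as `timedelta` values:

```python
from datetime import timedelta


def compact_duration(delta):
    """Format a non-negative timedelta as e.g. '0d 15h 39m'."""
    total_minutes = int(delta.total_seconds()) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return "{}d {}h {}m".format(days, hours, minutes)


def short_link(full_target):
    """Build a piped wikilink that displays only the subpage and section."""
    # Split off the section first so a "/" in a section title is left alone.
    page, sep, section = full_target.partition("#")
    display = page.rsplit("/", 1)[-1] + sep + section
    return "[[{}|{}]]".format(full_target, display)
```

For the examples quoted above, `compact_duration(timedelta(hours=15, minutes=39))` gives `0d 15h 39m`, and `short_link("User talk:Addshore/Archive 1#Thanks")` links to the full title while displaying `Archive 1#Thanks`.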
 * The template I used for these pages was the one that has all the variables in it, so it's a bit bigger than the default one (which just has a few columns). I'll definitely take your other suggestions into account, and if you have a template you think should be the default, then let me know!
 * As for the bot configuration template, I actually just came up with another idea: would work for -revi's talk page, and is much simpler than how it is now. It also takes care of the issue of duplicate parameters, but remains mostly backwards-compatible with them (so long as I don't mess up the code too badly). The optional start and end parameters say which number to start from and end at. Start would default to 1. If the end parameter is specified, it'll keep looking through pages until it hits the number specified in end, otherwise the behavior is as it is now (it will stop looking through pages once it finds one that doesn't exist). <#M> would be replaced with month names in order to make monthly archives so much easier to work with. How does this sound?  demize  (t · c) 21:07, 10 December 2014 (UTC)
 * Well, since nobody seems to have any objections and I have a fair bit of free time coming up, I'll get started with this either later today or tomorrow. My todo list:
 * Make the bot strip HTML from section titles
 * Remove any extra equals signs from the beginning and end of the section title (minor change to the regex should work; instead of explicitly ==, look for ={2,})
 * Parse wikilinks so that the link title shows up without the rest of the link
 * Implement the other changes I described above
 * And then I'll be back here, ready to set the bot loose on a few pages again! Some of the items in that list (that is, the first three) might be easier to fix than I'm expecting, I'll have to see if the Python library I'm using makes it easier... demize  (t · c) 17:10, 22 December 2014 (UTC)
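
The first three todo items could be handled with the standard `re` module alone, as in the sketch below. The patterns and the name `clean_title` are illustrative assumptions rather than the bot's code, and the library the bot actually uses may already offer something better.

```python
import re

# "={2,}" accepts any heading level, so extra equals signs never end up
# inside the captured title.
HEADING_RE = re.compile(r"^={2,}[ \t]*(.+?)[ \t]*={2,}[ \t]*$", re.MULTILINE)
# Opening or closing HTML tags such as <span style="..."> or </b>.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
# [[Target]] or [[Target|Label]]; group 1 is the text a reader would see.
WIKILINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def clean_title(raw_title):
    """Reduce a raw section title to the plain text shown in the index."""
    title = TAG_RE.sub("", raw_title)       # strip HTML tags
    title = WIKILINK_RE.sub(r"\1", title)   # keep only the link's visible text
    return title.strip()
```

Note that the cleaned title is meant for display only; whether it also works as the anchor after `#` depends on how the section link is built, which this sketch does not address.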

Trial2

 * Approved for trial with the same conditions as before once you have done your poking. Let us know how it goes!  ·addshore·  talk to me! 16:45, 9 January 2015 (UTC)
 * Just an update to let everyone know I didn't forget about this. I have most of the changes implemented, but school and work are taking up a fair amount of time right now so I've had less of a chance to work out all the implementation details regarding the updated Optin template than I'd have liked. I should have this ready to go soon! demize  (t · c) 15:43, 28 January 2015 (UTC)
 * So, how's this going? Josh Parris 15:00, 9 March 2015 (UTC)

Editor has not edited since January. Shall we close as expired and reopen if necessary? -- Magioladitis (talk) 22:07, 22 March 2015 (UTC)


 * Feel free to re-apply at any time, but it seems that you have other things on your plate at this time. Josh Parris 22:35, 24 March 2015 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.