User talk:Trialpears/Archiving manifesto

Thoughts
Hi Trialpears! I saw you referenced this at BOTREQ, and just wanted to say I'm super pleased to see work being done in this area and hope it succeeds. A few quick thoughts: Cheers, Sdkb talk 23:58, 12 July 2021 (UTC)
 * Alluding to the obligatory xkcd, I think the ultimate path this will have to take is not just for a third bot to be introduced, but for the previous two to be deprecated, as getting them to integrate with the new scheme would be difficult (I think?). I see this as a plus, but it might generate some opposition just by virtue of being such a big change.
 * I would strongly suggest that the new bot be named User:ArchiveBot rather than something else. This bot will be among the most visible on Wikipedia (alongside SineBot and ClueBot), and giving it an intuitive rather than insidery username will help newcomers understand it better (and help communicate its communal ownership).
 * I think one of the biggest challenges editorially may be determining what the optimal archiving settings are. Personally, I think setting a high minimum thread count (4+) is often advisable, as it prevents overly aggressive archiving if activity ever drops off. But others might disagree. Having an entirely separate discussion to battle out this question might be best, but at the least it should be a distinct subsection.
 * Continuing on process thoughts, developing this will likely need multiple stages. An initial discussion could be held at VPI, VPT, or somewhere similar to get wider input and figure out the likely challenges. That would then be used to craft a solid CENT-listed proposal where the decision to adopt the system would be made. I'm happy to be involved in any of the stages in whatever ways I can be helpful.
 * Thanks for the feedback! My plan currently is to finish this essay and contact the bot ops tomorrow and then, assuming the first part goes well, draft some sort of discussion. I think going thru VPI is probably a good idea to get some input from people who aren't tech editors. Your help would then be very appreciated for drafting the big RfC.
 * After, or in parallel with, the RfC, I will presumably work on the actual bot code. I'm sure plenty of places will come up where more help would be appreciated.
 * For the specific points you brought up:
 * I agree we should deprecate the other bots, or at the very least one of them, if this is to pass. Without backwards compatibility, converting existing configurations would also be a big project, which kind of ruins a lot of the benefits.
 * The bot should definitely have a good and descriptive name, though ArchiveBot is already the name of an inactive Commons bot.
 * I'm worried a no consensus on defaults could be a major problem. --Trialpears (talk) 22:52, 13 July 2021 (UTC)
 * "I think one of the biggest challenges editorially may be determining what the optimal archiving settings are." The key is acknowledging there is no one-size-fits-all archiving setting. Accept that most use cases will need to override the default. The default (say 30 days) is there to achieve the goal of making the no-param call work. Once you understand that, you realize how silly it would be to do battle over a default. Sure, the default should probably not be "1 day" or "10 years", but that's about it. CapnZapp (talk) 22:56, 1 August 2021 (UTC)

So happy to see this being drafted
I've been trying to simplify instructions at Help:Archiving a talk page for a while, finally gave up and made Help:Archiving (plain and simple) just to give myself one simple 2-part step to copy/paste. Simplifying it even more (and maybe even just making it automatic?) would be I think a big help to the casual editor who happens into an overlong talk page, goes looking for instructions, finds the main help page, and just gives up. Or follows the directions incorrectly or incompletely. Is there discussion now? —valereee (talk) 15:37, 20 July 2021 (UTC)
 * There's no discussion yet. I sent out emails to the current bot ops, but haven't gotten any reply (which is completely understandable as we are all volunteers). I'm hesitant to start a larger discussion before having a word with them and will give it some more time. --Trialpears (talk) 19:28, 20 July 2021 (UTC)

I took a similar step. Have a look here: User:Lowercase_sigmabot_III/Archive_HowTo CapnZapp (talk) 22:59, 1 August 2021 (UTC)


 * I don't understand what that page is telling me. It keeps saying "(Please remember, this is an example and this exact code won't work on your page)" and it doesn't include an archives search box? Why are we even showing people examples that won't work? And why the heck does the average nontech editor care that "This tells the bot to archive threads over thirty days old (leaving the four most recent) from User talk:Example to User talk:Example/Archive 1 (more about variables below) until it fills up to 150 kilobytes, whereupon the bot will move to 2 (updating the counter when saving page). Remember to specify the maximum size of an archive, or it will behave pretty much like in the first example. In addition, each archive page is given a banner, which makes it easy to move between the different archive pages."? I have zero idea why any of that is important, and I don't care. All I want is 1. copy X and 2. paste it at Y and 3. save. Et voila, archives. This needs to be for dummies like me, with (for those who do care) links to this kind of explanation of how many kilobytes the archive is, how it updates the counter, etc. Seriously, all I want is for the talk page not to have eighty gazillion sections, some of them fifteen years old, but to still be able to find old discussions.
 * Maybe we need a tech namespace to let people who care find this stuff but keep it out of the results for people who just want to be able to set up a simple archive and search box? It's really all just noise for people like me. —valereee (talk) 17:54, 2 August 2021 (UTC)
 * @Valereee See Help:Archiving (plain and simple). ― Qwerfjkl talk 19:04, 2 August 2021 (UTC)
 * @Qwerfjkl, yes, I wrote that lol... —valereee (talk) 21:21, 2 August 2021 (UTC)
 * ― Qwerfjkl talk 21:24, 2 August 2021 (UTC)
 * hahaha. I wrote it because I'd tried to get folks at Help:Archiving talk pages to simplify the instructions, and I got pushback that seemed to be about (?) the value of teaching people how to use templates (don't quote me on that). I just wanted one place I could go to copy/paste and done, no decisions to make, just set up a simple archives with a search box. —valereee (talk) 23:02, 2 August 2021 (UTC)
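For those who do care, the counter behaviour quoted from the how-to page (archive to User talk:Example/Archive 1 until it fills up to 150 kilobytes, then move to 2 and update the counter) can be sketched roughly as below. This is a minimal, hypothetical Python sketch and not Lowercase sigmabot III's actual code: the function names, measuring sizes in bytes and treating 150 kilobytes as 150,000 bytes are all assumptions.

```python
def archive_title(base: str, counter: int) -> str:
    """Build the numbered archive title, e.g. 'User talk:Example/Archive 1'."""
    return f"{base}/Archive {counter}"


def assign_to_archives(thread_sizes: list[int], archive_sizes: dict[int, int],
                       counter: int,
                       max_archive_size: int = 150_000) -> tuple[list[int], int]:
    """Assign each archived thread (size in bytes) to a numbered archive.

    Hypothetical sketch: a thread goes into the current archive unless that
    archive has already reached max_archive_size, in which case the counter
    is bumped first. Returns the archive number chosen for each thread and
    the new counter, which the bot would save back to the talk page.
    """
    assignments = []
    for size in thread_sizes:
        if archive_sizes.get(counter, 0) >= max_archive_size:
            counter += 1  # current archive is full: start the next one
        archive_sizes[counter] = archive_sizes.get(counter, 0) + size
        assignments.append(counter)
    return assignments, counter


sizes = {1: 140_000}  # Archive 1 already holds 140,000 bytes
assignments, counter = assign_to_archives([20_000, 5_000], sizes, counter=1)
print([archive_title("User talk:Example", n) for n in assignments])
# prints ['User talk:Example/Archive 1', 'User talk:Example/Archive 2']
```

In this version an archive may run slightly past the limit (Archive 1 ends at 160,000 bytes) rather than a thread being split across two archives; a real bot could just as well check the limit before adding each thread.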

Defaults
Here would be my defaults: ― Qwerfjkl talk 16:49, 1 August 2021 (UTC)
 * minimum number of threads to keep: probably 5, maybe 10
 * maximum archive size: 75k - this is always going to be arbitrary, but smaller is better for slower devices
 * minimum amount of threads to archive: 1
 * the archive header: aan - simple, and an index can be more complete
 * archive: after 30 days
 * @Qwerfjkl, I overall really like this. My only reservation would be that on slower-traffic pages, 30 days is just too darn quick. Seriously, look at this exact discussion. For 24 days, not a single person replied to you. If I hadn't seen this for another week, under default, your thread would be gone and we wouldn't have been able to interact! For pages that get <50 views a day, can we really expect the discussions to be lively and timely? I would suggest a default of 90 or 180 days, with higher-traffic pages necessitating a quicker setting.
 * I feel it's important to have the burden of change be placed on the higher traffic pages, and for the default to be slower. I say this because the higher traffic pages will just naturally have more eyes on the talk page, AKA more people to adjust the settings. If we burden slow pages with changing the default, then we are dooming them to abrupt and repeated discussions.
 * We could also use a "max size" setting that automatically archives threads until the main talk page is down to 75k, so that high-traffic pages are still caught by default. — Shibbolethink (♔ ♕) 23:24, 25 August 2021 (UTC)
 * @Shibbolethink I agree, and have created the Setup cluebot archiving template with 90 as the default (perhaps Setup auto archiving should be changed to have 90 as the default as well). ― Qwerfjkl talk 07:17, 26 August 2021 (UTC)
 * @Shibbolethink Re: "If I hadn't seen this for another week, under default, your thread would be gone and we wouldn't have been able to interact!", it would have been fine with minimum threads set to 5. ― Qwerfjkl talk 19:03, 26 August 2021 (UTC)
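Taken together, Qwerfjkl's proposed defaults, Shibbolethink's 90-day alternative and the max-size idea describe a fairly concrete selection rule. Below is a hypothetical Python sketch of it; it is not the logic of Lowercase sigmabot III or ClueBot III, and the Thread structure, the function and parameter names, the oldest-first ordering and the way the size cap gives way to the minimum-thread floor are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Thread:
    title: str
    last_comment: datetime  # time of the most recent timestamped comment
    size: int               # wikitext size in bytes


def select_for_archiving(threads: list[Thread], now: datetime,
                         age_days: int = 30,
                         min_threads_left: int = 5,
                         min_threads_to_archive: int = 1,
                         max_page_size: Optional[int] = None) -> list[Thread]:
    """Pick the threads to move off the talk page (hypothetical rule set)."""
    cutoff = now - timedelta(days=age_days)
    # Least recently active threads first, so stale threads form a prefix.
    by_age = sorted(threads, key=lambda t: t.last_comment)
    # Never archive so many that fewer than min_threads_left remain.
    limit = max(len(threads) - min_threads_left, 0)
    stale = sum(1 for t in by_age if t.last_comment < cutoff)
    n = min(stale, limit)

    # Optional size cap: keep archiving the oldest remaining threads until
    # the page fits, but still respect the minimum-thread floor.
    if max_page_size is not None:
        page_size = sum(t.size for t in by_age[n:])
        while n < limit and page_size > max_page_size:
            page_size -= by_age[n].size
            n += 1

    chosen = by_age[:n]
    # Only edit the page if enough threads qualify.
    return chosen if len(chosen) >= min_threads_to_archive else []
```

With a floor of 5, a page holding five or fewer threads keeps all of them whatever their age, and passing `age_days=90` gives Shibbolethink's slower default without touching the other settings.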

Defaults
First off: while your aims are commendable, you're going the wrong way about it if you first reduce options and customizability, and only second increase the automation level. Please leave everything intact WHILE you make this happen.

age and units: please don't try to enforce a single time period for all pages. Being able to supply the archiving period to whatever solution you end up with is an absolute must; there are too many scenarios for a single value to fit every case. A default of 30 days seems the most natural, but you absolutely must be able to override it easily and conveniently: some pages need as little as 7 or even 3 days, while for other pages 90 or even 180 days are more appropriate. Actually, 30 days is probably one of the least used periods! (too slow for frequent archiving, too fast for infrequent archiving). But that's okay, since all it means is that one-size-fits-all is a non-starter for this parameter.

bot: Obviously good if (and only if) you manage to achieve a state where the existing bots are no longer used, or where they become completely interchangeable.

minthreadsleft: I don't think you will ever achieve consensus for removing the ability for editors to manually tweak this. Assuming a default if left undefined is of course okay, provided the default number is higher than 3. Archiving down to fewer than four talk page sections means the TOC automatically disappears, and that makes newly archived pages look very empty and barren. A few editors insist this is not an issue, but I strongly disagree. The archive bot should leave the TOC alone unless the default is explicitly overridden (obviously there might be cases where even 0 makes sense, but that should not be the default, as (I believe) it is today; i.e. if you do not supply a minthreadsleft parameter, every thread can be archived).

N.B. I am aware the parameter names used here are not the actual ways you supply the parameters to the bots. Instead, it's how you present the parameter values to the reader via Archives or Talk Header or Auto Archiving Notice. CapnZapp (talk) 17:33, 1 August 2021 (UTC)


 * I definitely don't want to enforce one solution for all pages. Like you said, the needs are different for different pages. I do, however, want giving any parameters at all to be optional. If you want to change something, you just add the parameter and the bot respects it. The potential downside is that it's more difficult to modify the settings if you don't see anything relevant in the wikicode. My thought on how to solve that is to make the bot add a descriptively named archiving_age or similar parameter with the current value. That would also allow the bot to be a bit more clever when choosing an archiving age if none is present, and it could for instance check the page history to see how active the page is. That means that if you tell the bot to run on a very active page without specifying anything, it may pick a week, but on a not-so-active page a year. It would only do this once per page and respect the parameter after that. I'll go through my thoughts on the rest of the parameters later today. --Trialpears (talk) 17:50, 1 August 2021 (UTC)
 * Ok. Aiming for no params is entirely fine, but I recommend you abandon the idea of making the code (bot, AI) do clever guesses. Focus on clearly stating a) what the default is, b) how you override it and c) suggested values for common use cases, and you're golden. Users supplying no arguments will then get exactly what's written on the tin, and if they (or any subsequent editor) don't like that, everybody will easily know what to do. Another way of looking at AI guesswork is that the code is no longer deterministic, and that's much more of a headache than a help; it also opens the Pandora's box of subsequent "AI improvements"... CapnZapp (talk) 22:49, 1 August 2021 (UTC)
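CapnZapp's approach (a documented default, a simple override, no guesswork) and Trialpears's idea of writing the effective value into the wikicode fit together in a small, deterministic settings resolver. This is a hypothetical Python sketch: apart from archiving_age, which Trialpears mentions, the parameter names are invented for illustration, and the values are taken from Qwerfjkl's proposed defaults in the earlier Defaults section.

```python
# Hypothetical parameter names; only archiving_age comes from the discussion.
# Values follow Qwerfjkl's proposed defaults (ages in days, sizes in bytes).
DEFAULTS: dict[str, int] = {
    "archiving_age": 30,
    "min_threads_left": 5,
    "min_threads_to_archive": 1,
    "max_archive_size": 75_000,
}


def resolve_settings(page_params: dict[str, str]) -> tuple[dict[str, int], list[str]]:
    """Merge parameters given on the talk page over the documented defaults.

    Returns the effective settings plus the names that fell back to a
    default, so the bot could write those values into the wikicode once.
    Unknown names are rejected instead of being silently ignored.
    """
    unknown = set(page_params) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown archiving parameters: {sorted(unknown)}")
    settings = {name: int(page_params.get(name, default))
                for name, default in DEFAULTS.items()}
    filled_in = [name for name in DEFAULTS if name not in page_params]
    return settings, filled_in
```

Because the result depends only on what is written on the page, the same page always gets the same settings; a call such as `resolve_settings({"archiving_age": "90"})` overrides just the age and reports the other three names as filled in from defaults.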