User:Kslotte/Auto-archiving

''This is a draft and a work in progress. Other users should feel free to contribute, but please edit with care and avoid major rewriting; so far these instructions reflect my own view on auto-archiving. The plan is to move this out of my user space at some stage to receive more criticism.''

Auto-archiving is about balancing the flow of talk page content: content should stay on the talk page long enough for discussion, but be archived once it becomes outdated. A talk page that is too long becomes hard to navigate and edit. A talk page that is too short or empty gives the impression of an inactive discussion.

It should also be taken into consideration that the talk page may be visited by users who are not actively watching it. Therefore, there is little reason to archive too frequently if the page has space. With frequent archiving, some users may get the impression that discussion is being censored. Allowing a longer time to elapse before archiving threads, and keeping more threads on the talk page, gives occasional visitors the opportunity to respond.

Implementation
According to the talk page guidelines (WP:TPG), archiving should be done once 10 threads are reached or the talk page size is more than 50K. Implement auto-archiving, notifications, indexing and archive boxes only once they are needed. There is no need to prepare the implementation in advance, since only a small percentage of talk pages will ever need archiving procedures.

Different approaches to archiving should be taken depending on the discussion activity on the talk page. Details are found below.
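The threshold rule above can be sketched as a small check. This is only an illustration of the two numbers quoted from WP:TPG in this essay; the function name and the use of kilobytes as the unit are assumptions.

```python
def needs_archiving(thread_count: int, size_kb: float) -> bool:
    """Return True once either threshold quoted above is reached."""
    max_threads = 10  # "once 10 threads are reached"
    max_size_kb = 50  # "more than 50K"
    return thread_count >= max_threads or size_kb > max_size_kb
```

A page with nine threads and exactly 50K would not yet qualify under this reading, while a page with four very long threads totalling 51K would.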

Regular discussions
... normal auto-archiving ... have to be active, if there are more than seven threads posted within a year. ... how to choose suitable archive time

Sporadic discussions
It is likely that talk page discussion is only occasional, and that few editors actively watch the page, if fewer than eight threads are posted within a year. In such cases it is better to base the archiving settings on the number of threads rather than on elapsed time. One approach is to archive a specific number of threads once a maximum number of threads is reached. Such an approach fills up the talk page history log less rapidly than archiving by elapsed time. There are three phases:
 * 1) Archive process: add MiszaBot code snippet
 * 2) Archive box: using archives
 * 3) Archive indexing: using HBC Archive Indexerbot
See the Setup section below for details.

Inactive discussions
It may not be worth setting up bot archiving on a talk page where most discussions are inactive and there have been no comments in the last few months. For example, a three-year-old event won't have much discussion going on. If the talk page is smaller than 70K, consider doing nothing. If you believe that the page is large enough to benefit from removing old discussions, you can do cut-and-paste archiving and set up archive boxes. There are three phases:
 * 1) Archive process: cut and paste into archives
 * 2) Archive box: using archives
 * 3) Archive indexing: using HBC Archive Indexerbot
See the Setup section below for details.

For regular discussions
...



For sporadic discussions
MiszaBot configurations can be used to archive five threads whenever twelve threads are displayed on the talk page. Another working setup, for longer threads, is to archive three threads whenever seven threads are reached. For very short threads, one can archive three threads whenever seven threads are reached. An elapsed time ("age") setting of about 180 days (half a year) also ensures that the archive process won't archive too many recent threads.

You can do manual cutting and pasting to give the archiving process a kick start. This is preferred, since there is a risk that MiszaBot won't archive threads that are unsigned or have a non-standard signature format.
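The "archive five once twelve are displayed" behaviour can be illustrated with a simplified model of one bot pass. This is a sketch under stated assumptions, not the bot's actual code: it assumes threads are examined in page order, that a thread is eligible once its age exceeds the limit, that the bot stops once only the minimum number of threads would remain, and that the whole pass is skipped if too few threads qualify. All names are hypothetical.

```python
def select_threads_to_archive(ages_days, max_age_days=180,
                              min_threads_left=7, min_threads_to_archive=5):
    """Return the indexes of threads one pass would archive (simplified model)."""
    # The most threads that can be removed while keeping the required minimum.
    removable = len(ages_days) - min_threads_left
    chosen = []
    for index, age in enumerate(ages_days):
        if len(chosen) >= removable:
            break  # archiving more would leave too few threads on the page
        if age > max_age_days:
            chosen.append(index)
    # Skip the pass entirely if the chunk would be smaller than the minimum.
    if len(chosen) < min_threads_to_archive:
        return []
    return chosen
```

With twelve old threads, the model archives the first five and leaves seven; with eleven old threads nothing happens, because only four could be removed.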

The example settings below, which are user-configurable, do the following: set a maximum archive size of 100K; count how many archive pages have been kept (with the count started at "1"); force the archive bot to keep at least seven threads visible on the talk page; tell the bot not to archive unless at least five threads (out of twelve in total) meet the other archiving criteria (here "age", how long it has been since each thread was updated); and set the age for threads to be archived at 180 days since the last update to each thread. The setting Talk:XXXX has to be replaced by the actual talk page name.
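As a rough illustration, the settings described above could be rendered into a configuration template by a small helper. The helper itself is hypothetical, and the template and parameter names (apart from minthreadstoarchive, which this essay names) are given as I recall them from MiszaBot's documentation; treat them as assumptions and check the bot's own instructions before use.

```python
def render_miszabot_config(talk_page, max_archive_size="100K", counter=1,
                           min_threads_left=7, min_threads_to_archive=5,
                           age_days=180):
    """Build the wikitext for an archive configuration (names are assumptions)."""
    lines = [
        "{{User:MiszaBot/config",
        f"|maxarchivesize = {max_archive_size}",            # maximum archive size
        f"|counter = {counter}",                            # current archive number
        f"|minthreadsleft = {min_threads_left}",            # threads kept on the page
        f"|minthreadstoarchive = {min_threads_to_archive}", # minimum chunk per pass
        f"|algo = old({age_days}d)",                        # age since last update
        f"|archive = {talk_page}/Archive %(counter)d",      # assumed archive naming
        "}}",
    ]
    return "\n".join(lines)


print(render_miszabot_config("Talk:XXXX"))
```

Here Talk:XXXX is the placeholder from the paragraph above and must be replaced by the actual talk page name.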



For inactive discussions
Instead of implementing automatic archiving, the easiest way is to cut and paste the oldest threads into archives manually. Leave a few of the latest threads on the page to indicate that discussion isn't dead. If you leave fewer than four (for long threads, more than 75K in total), be sure to manually add the location of the table of contents with code:

For regular discussions
Add a notification box with an auto-archiving notice, or include the notice in the archive box.

For sporadic and inactive discussions
Do not add a notification box, since none exists for sporadic and inactive discussions.

Setup of an archive box
Add an archive box with links to the archives, archive searching and, if available, a search index. The template to use is archives. Template Archive box is deprecated and should not be used for new archive box implementations.

For active discussions
If you have added a notification, leave out the parameters bot and age and use the code snippets for sporadic and inactive discussions instead. ...

If there is only one archive add code:

If there are several archives you may also include an index:

If there are more than six archives consider using the parameter  like:

For sporadic and inactive discussions
If there is only one archive add code:

If there are several archives you may also include an index:

If there are more than six archives consider using the parameter  like:

Archive indexing
Consider creating an archive index if you expect that the archiving process will produce multiple archive pages. Archive indexing can be implemented using HBC Archive Indexerbot. Two steps are needed to get an archive index working:
 * 1) Implement the archive indexing process
 * 2) Set up the archive index sub-page

Archive indexing process
The archive indexing process is implemented by adding the following code to the talk page: Add several mask rows for topic-specific talk pages. For example, a sub-page named POV will have a mask row as. Read the instructions for more advanced configurations if needed.

Archive index sub-page
The archive index sub-page named Archive index is created with content: Without this code snippet, the indexing won't start. A later section explains what to do if your implementation will not start.

Tweaking
To tweak an auto-archive process there should be a balance between not having a talk page that is too long and keeping the talk page long enough for users to discuss and reply about generally discussed issues:


 * A talk page with fewer than seven threads and less than 25 KB in size is recommended to be configured with a longer archive time.
 * A talk page with more than fifteen threads and larger than 75 KB in size is recommended to be configured with a shorter archive time.
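The two recommendations above can be written down as a small decision helper. This is only a sketch of the stated thresholds; the function name and the "keep" result for everything in between are assumptions.

```python
def recommend_archive_time(thread_count: int, size_kb: float) -> str:
    """Suggest which way to adjust the archive time for a talk page."""
    if thread_count < 7 and size_kb < 25:
        return "longer"   # page is short and small: archive less eagerly
    if thread_count > 15 and size_kb > 75:
        return "shorter"  # page is long and large: archive more eagerly
    return "keep"         # no clear recommendation from the two rules
```

Note that both conditions of a rule must hold; a page with few threads but a large size falls through to "keep" and needs manual judgment.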

Archive time
The age of threads to archive is the most commonly adjusted setting. A thread will be archived on the next pass of the bot once the days elapsed since the last reply ("age") exceed what is set. Try to find a suitable age for auto-archiving by checking how many threads will remain after an archiving pass, and how much content will remain on the talk page. An optimal archive process leaves between 7 and 15 threads, with a size between 25 KB and 75 KB. Size is more important to follow than the number of threads. Very active talk pages (more than 10 KB a day on average) can have somewhat more threads and a larger size, to avoid interrupting ongoing discussions.
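One way to try out candidate ages is to estimate what a pass would leave behind. The sketch below assumes each thread is described by a pair of (days since last reply, size in KB) and ignores any minimum-thread settings; the function names and the candidate ages are hypothetical.

```python
def remaining_after_pass(threads, max_age_days):
    """Return (thread count, total size in KB) left after archiving old threads."""
    kept = [(age, size) for age, size in threads if age <= max_age_days]
    return len(kept), sum(size for _, size in kept)


def suitable_ages(threads, candidates=(30, 60, 90, 180)):
    """List the candidate ages that leave 7-15 threads and 25-75 KB on the page."""
    result = []
    for age in candidates:
        count, size = remaining_after_pass(threads, age)
        if 7 <= count <= 15 and 25 <= size <= 75:
            result.append(age)
    return result
```

For twenty threads of 4 KB each, last updated 10, 20, ... 200 days ago, only the 90-day candidate leaves a page within both ranges (nine threads, 36 KB).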

Minimum threads reached
It is good to have some type of archive time notification that shows the current archive time. Because of the MiszaBot parameter, old threads can be left on the talk page. This is caused by the archive time being set too low or  too high. In such cases the notification misleads the user. To resolve this, increase the archive time so that more threads (more than  is set to) will be kept on the page.

The optimum is to set  and let the elapsed time take care of archiving. A table of contents is then created automatically, which gives the impression that discussion hasn't died. A value below four won't automatically create a table of contents.

Discussion peaks
Also take into consideration special cases such as whether the discussion on the talk page is at a peak when you view it. Check the talk page history to see what the page sizes were during recent archive runs. For peaks, leave the archive time as set, because there is more confusion than improvement from changing the archive setting back and forth.

Unbalanced activity
The threads may be unbalanced in both size and activity, such that a few threads are much more active and take up more space than the others. In that case, you shouldn't decrease the archive age setting, because the other threads won't be given enough time to remain visible, and the more active threads may not be archived anyway (examine the situation). You should examine each page from the point of view of the less active threads and decide on archive settings that give users time to respond to them.

A simple way to detect if a talk page is unbalanced is to click on a thread in the middle of the table of contents. When such a thread is clicked, your browser's page scroll bar should also be about in the middle of your screen. If this is not the case, examine the threads in more detail.
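The scroll bar check can be approximated numerically if the size of each thread is known. This is a rough sketch: it measures how much of the page's content sits above the middle thread and flags the page when that share is far from one half. The function names and the tolerance of 0.2 are assumptions chosen for illustration.

```python
def middle_thread_position(thread_sizes_kb):
    """Fraction of the page content that lies above the middle thread."""
    total = sum(thread_sizes_kb)
    if total <= 0:
        raise ValueError("thread sizes must sum to a positive value")
    middle = len(thread_sizes_kb) // 2
    return sum(thread_sizes_kb[:middle]) / total


def is_unbalanced(thread_sizes_kb, tolerance=0.2):
    """True if the middle thread is far from the middle of the page."""
    return abs(middle_thread_position(thread_sizes_kb) - 0.5) > tolerance
```

Ten threads of equal size put the middle thread exactly halfway down the page, while one 40 KB thread followed by seven 2 KB threads pushes it far towards the bottom.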

MiszaBot parameter minthreadstoarchive
The MiszaBot parameter  defines the minimum size of each archiving chunk. At the same time, this affects how often archiving is done. Setting this value too low, such as , will fill up the history  with bot messages about archiving. Filling up the talk page history with many archive messages isn't informative for users following the talk page history. The default value  should not be altered without good reason. The following values can be used if archiving needs advanced tweaking:
 * , when a talk page has long threads (more than 20 KB each on average)
 * , when a talk page has short threads (less than 3 KB each on average)
 * , when a talk page has very short threads (less than 1 KB each on average)
Having  more than 3 seems to work well on WikiProject talk pages, where most of the messages only notify users without follow-up replies.

Follow-up
Database reports/Long pages

Bot follow-up:, , Index log

Category:Archive_requests

AWB: transclusions of MiszaBot, pages using archives or archive box

Alternative solutions
The essay concentrates mostly on how to implement auto-archiving with one clear approach, where the reader doesn't need to make many decisions about which solution to choose. Below are a few alternative solutions that aren't covered in this essay:
 * User:ClueBot III, a bot that does auto-archiving, an alternative to MiszaBot