User:Trialpears/ArchiveBOT

This page is used for planning ArchiveBOT, a new archiving bot I will be working on to make a better and easier-to-use bot for archiving talk page discussions. It is part of my overarching plans to improve the ways we handle archiving, which you can read more about at User:Trialpears/Archiving manifesto. To achieve this goal I am making a "mocumentation" page where I document what I want the bot to be capable of doing. {{documentation|content= This bot takes old discussions on a talk page and moves them to a subpage, keeping talk pages at a reasonable length while keeping old discussions accessible. This process is called archiving and is described at Help:Archiving a talk page.

Quick-Start
To quickly get a complete, working archiving setup with appropriate automatic archiving settings, links to archived discussions, and information about the archiving setup, you can use either the banner style, which may look like the example below, or the box style, which can be seen to the right on desktop or above on mobile.

How your page is archived
ClueBot III can be used to automatically archive sections/threads from a given page. Usually the page you are archiving from will be a talk page. Your archive pages can be organized by date, or numbered. Combinations of dates and numbering are also possible. This is configured by the arguments used for the format and archiveprefix parameters. Please see the parameter descriptions and examples for a few of the many ways to do this.

How does ClueBot III know when to archive my page?
ClueBot III works from the revision history of the page in question, not from the timestamps in signatures. This means that manually adding a timestamp far in the future will not prevent ClueBot III from archiving a thread. Furthermore, on each archival sweep ClueBot III selects a single archive target and archives all eligible threads to it. As a result, if your archives are untidy when you first set up ClueBot III, it will sweep all of the old conversations into whichever archive it deems to be the current one. It also means that on pages with very long discussion threads, each entire thread is archived into the current archive, which can slightly overstuff size-based archives.

Keeping linked
Once a series of threads is archived, ClueBot III looks for all pages that could link to the moved sections and updates the links that need fixing. When a page is linked from a huge number of other pages, this process takes a long time; this was the cause of the collapse described at User talk:Jimbo Wales/Archive_202. In short: do not use ClueBot III on a page with a very long "What links here" list.

Index generation
In addition to archiving your page, ClueBot III can also generate an index of all your archive pages. Assuming that you are archiving the page Talk:YourPage, the archive index generated by ClueBot III is located at: User:ClueBot III/Master Detailed Indices/Talk:YourPage You can make use of this either directly or by transcluding it onto your Talk:YourPage/Archive index page. One way to use this directly would be to include it as the archive index page in your archive box. For example:

will produce an archive box with an appropriate link to your index.

The talk header template automatically detects the existence of this index and will link it accordingly.

General template format
The general format for the archive template:
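As a rough sketch of that general format, the hypothetical Python helper below (render_archive_template is not part of ClueBot III) renders a set of parameters into a template call. It assumes the common one-parameter-per-line layout; the sample values are taken from the by-year example later on this page.

```python
def render_archive_template(params: dict) -> str:
    """Render ClueBot III parameters as a User:ClueBot III/ArchiveThis call.

    Hypothetical helper: one "|name=value" line per parameter is a layout
    convention assumed here, not something the bot is known to require.
    """
    lines = ["{{User:ClueBot III/ArchiveThis"]
    for name, value in params.items():
        lines.append(f"|{name}={value}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_archive_template({
        "archiveprefix": "Talk:YourPage/Archives/",
        "format": "Y %%i",
        "age": 2160,
        "maxarchsize": 150000,
        "archivebox": "yes",
    }))
```

In practice the archiveprefix value would be produced with immediate substitution, as described under archiveprefix.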

Required parameters
age

Type: unsigned integer

Default: 0

Description: This parameter must be set to the number of hours a thread can go without a reply before it is archived. If you do not set it, all threads will qualify for archiving. For 30 days, enter 720; for one year, enter 8760.
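The hour arithmetic, and the eligibility rule it feeds, can be sketched in Python. The helper names are hypothetical, and the exact comparison (at least age hours, rather than strictly more) is an assumption; the last-edit time stands for what the bot reads from the page's revision history.

```python
from datetime import datetime, timedelta

HOURS_PER_DAY = 24


def days_to_age(days: int) -> int:
    """Convert a number of days into the hour count the age parameter expects."""
    return days * HOURS_PER_DAY


def is_old_enough(last_edit: datetime, now: datetime, age_hours: int) -> bool:
    """Hypothetical eligibility check: a thread qualifies once its last edit
    (taken from the page history, not from signatures) is at least
    age_hours old. With age_hours=0 every past edit qualifies, matching the
    default described above."""
    return now - last_edit >= timedelta(hours=age_hours)


print(days_to_age(30))   # 720
print(days_to_age(365))  # 8760
print(days_to_age(90))   # 2160
```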

archiveprefix

Type: string

Description: This parameter must be set to a fully qualified page name under the page you wish to archive. For example, if User talk:Cobi were being archived using dated archives, then archiveprefix=User talk:Cobi/Archives/ would be appropriate. For the same page being archived with numbered archives, archiveprefix=User talk:Cobi/Archive would be appropriate. Not setting this parameter correctly can have unexpected results. The FULLPAGENAME variable cannot actually be passed to ClueBot III; the fully qualified page name must be hard-coded. The easiest way to do this is to use immediate substitution, as follows:

For archives organized by date, enter:

archiveprefix={{subst:FULLPAGENAME}}/Archives/

For numbered archives, enter (note /Archive instead of /Archives/):

archiveprefix={{subst:FULLPAGENAME}}/Archive

Warning: system variables such as {{subst:FULLPAGENAME}} are replaced by the name of the page being archived when the page is saved. However, some punctuation characters that can appear in a page name are replaced by HTML character codes that the bot does not recognise (see mediawikiwiki:Manual:PAGENAMEE encoding for details about these and other characters). For example, an apostrophe (') is replaced with &#39;, so for the bot to work, &#39; must be changed back to ' by hand.
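If a substituted prefix does end up containing HTML character codes, they can be turned back into literal characters before the value is saved. A minimal Python sketch, assuming the codes are standard HTML entities (which is what Python's html.unescape decodes); fix_substituted_prefix is a hypothetical helper name:

```python
import html


def fix_substituted_prefix(prefix: str) -> str:
    """Replace HTML character codes such as &#39; or &amp; in a substituted
    archiveprefix with the literal characters the bot expects."""
    return html.unescape(prefix)


print(fix_substituted_prefix("Talk:O&#39;Brien/Archives/"))  # Talk:O'Brien/Archives/
```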

format

Type: string

Default: ""

Description: This parameter must be set to a valid argument to PHP's date function. You may also include %%i, which is used for numbered archives instead of dated archives and is replaced with the archive number. Once variable substitution has occurred, the result is concatenated onto the end of the value of archiveprefix, and that concatenation is the name of the archive page(s). See the examples. Most commonly, for archives organized by date, use:

format=Y/F. This gives "Archives/2016/January". To obtain "Archives/2016/01 (January)", use format=Y/m (F).

For numbered archives use:

Note for date-based archives: ClueBot III stores threads into a single archive page each time it runs. With date-based archives, the page name is archiveprefix concatenated with format, evaluated for the date NOW minus age (where age is in hours). For example, if you start YYYY/Month archiving from scratch with many old threads on the page being archived, ClueBot III will put all of the threads into a single archive page, not multiple pages. If you want a single archiving run to be split into multiple pages based on the last date in each thread, you will need to use lowercase sigmabot III. Note: a coarser format, such as one archive per year, is better in most cases; one archive per month can end up as many tiny archives that take a long time to look through manually when you are not sure which search term to use with the archive search tool.
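The naming rule (take the date NOW minus age, format it, substitute %%i, and append the result to archiveprefix) can be sketched in Python. Python cannot call PHP's date function, so the sketch implements only a small, hypothetical subset of its format characters (Y, y, m, n, F, M, d, j, plus backslash escapes); the real bot passes format to PHP, which supports many more. The format is split on %%i before date formatting because a lone i is PHP's minutes code.

```python
from datetime import datetime, timedelta

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Small subset of PHP date() format characters (assumption: enough for the
# examples on this page; PHP itself supports many more).
PHP_DATE_CODES = {
    "Y": lambda d: f"{d.year:04d}",
    "y": lambda d: f"{d.year % 100:02d}",
    "m": lambda d: f"{d.month:02d}",
    "n": lambda d: str(d.month),
    "F": lambda d: MONTHS[d.month - 1],
    "M": lambda d: MONTHS[d.month - 1][:3],
    "d": lambda d: f"{d.day:02d}",
    "j": lambda d: str(d.day),
}


def php_date_subset(fmt: str, when: datetime) -> str:
    """Format `when` like PHP date(), for the characters listed above only."""
    out, i = [], 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt):  # backslash escapes a character
            out.append(fmt[i + 1])
            i += 2
            continue
        code = PHP_DATE_CODES.get(ch)
        out.append(code(when) if code else ch)
        i += 1
    return "".join(out)


def archive_page_name(prefix: str, fmt: str, now: datetime,
                      age_hours: int, number: int) -> str:
    """archiveprefix + format evaluated for NOW - age, with %%i -> number."""
    cutoff = now - timedelta(hours=age_hours)
    pieces = fmt.split("%%i")  # keep %%i away from the date formatter
    return prefix + str(number).join(php_date_subset(p, cutoff) for p in pieces)


print(archive_page_name("Talk:YourPage/Archives/", "Y/F",
                        datetime(2016, 4, 1), 2160, 1))
# Talk:YourPage/Archives/2016/January
```

Because the date comes from NOW minus age rather than from each thread, every thread archived in one run lands on the same page, which is the behavior the note above describes.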

Optional parameters
These parameters shape the behavior of the archiver.

archivenow

Type: comma delimited array of strings

Default: ""

Description: This optional parameter should contain a comma-separated list of strings for which ClueBot III will search within the threads on the page. If any of these strings is found in a thread, the bot will archive the thread immediately. The bot will also convert in this list to  upon archival. This can be useful on pages where templates such as resolved are used. The User:ClueBot III/ArchiveNow blank template is available for this use. However, it has no special properties: it is just another string, which happens to be a template, that ClueBot III can be told to search for. In addition to the following typical usage, an example of its use is shown below.

Typical usage:
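Independently of the exact parameter value, the matching described for archivenow can be sketched as follows. This is a hypothetical Python illustration: it assumes a plain substring search and trims whitespace around each comma-separated entry, neither of which is confirmed by this page.

```python
def parse_archivenow(value: str) -> list:
    """Split a comma-separated archivenow value into search strings
    (assumption: surrounding whitespace is ignored, empty entries dropped)."""
    return [s.strip() for s in value.split(",") if s.strip()]


def archive_immediately(thread_text: str, markers: list) -> bool:
    """True if any marker string occurs anywhere in the thread's text."""
    return any(marker in thread_text for marker in markers)


markers = parse_archivenow("{{User:ClueBot III/ArchiveNow}}, {{done}}")
print(archive_immediately("== Typo ==\nFixed. {{done}}", markers))  # True
```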

header

Type: string

Default: "Archive"

Description: When creating a new archive page, the bot will put this at the top of the new page.

headerlevel

Type: unsigned integer, between 1 and 7 inclusive

Default: 2

Description: This is the header level for the threads the bot will archive. Anything on the page before the first header of this level will not be archived. A level 1 header is =Header=, a default thread (level 2) header is ==Header==, and the highest level header is a level 7: =======Header=======.
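How headerlevel divides a page can be sketched in Python. The regular expression, and the rule that deeper headers stay inside their parent thread, are assumptions made for illustration; split_threads is a hypothetical helper.

```python
import re


def split_threads(page: str, level: int = 2) -> tuple:
    """Split wikitext into (preamble, threads) at headers that use exactly
    `level` equals signs on each side. The preamble (everything before the
    first such header) is never archived; deeper headers such as
    '=== Sub ===' under a level 2 thread stay inside that thread."""
    marks = "=" * level
    header = re.compile(rf"^{marks}(?!=).*?(?<!=){marks}\s*$", re.MULTILINE)
    starts = [m.start() for m in header.finditer(page)]
    if not starts:
        return page, []
    bounds = starts + [len(page)]
    threads = [page[a:b] for a, b in zip(bounds, bounds[1:])]
    return page[: starts[0]], threads


page = "Intro text\n== A ==\nfirst\n=== A1 ===\nsub\n== B ==\nsecond\n"
preamble, threads = split_threads(page)
print(len(threads))  # 2
```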

key

Type: string

Default: ""

Description: The value of this parameter must match an internally generated key in order for the archives to be stored anywhere other than as subpages of the page being archived.

maxarchsize

Type: unsigned integer, greater than 10000

Default: 0

Description: The target maximum size of the archive, in bytes, before %%i (see format) is incremented. If 0, this is disabled. In general, this parameter is used for numbered archives, not for archives organized by date. This is not a hard limit: resulting archive page sizes will almost always exceed this number, sometimes by a great amount. Each time ClueBot III runs on a page, it archives all threads that are old enough to qualify into a single file. If you have maxarchsize=100000 with a current archive size of 90k, and there turn out to be 60 threads to archive with a total size of 250k, the current archive will grow to 340k despite the 100k limit.
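The arithmetic in that example can be reproduced with a small Python model. It assumes, consistently with the description above but without confirmation of the details, that the size limit is checked only once, before a run, and that every eligible thread then goes into the same archive; plan_run is a hypothetical helper.

```python
def plan_run(current_number: int, current_size: int,
             thread_sizes: list, maxarchsize: int) -> tuple:
    """Return (archive number used, its size after this run).

    Assumption: the limit is checked only before the run. If the current
    archive has already reached maxarchsize, %%i moves to the next number;
    otherwise everything is appended to the current archive, which is why
    the limit is soft. maxarchsize=0 disables the check.
    """
    if maxarchsize > 0 and current_size >= maxarchsize:
        current_number += 1
        current_size = 0
    return current_number, current_size + sum(thread_sizes)


# 60 threads totalling 250,000 bytes, appended to a 90,000-byte archive.
sizes = [4_000] * 40 + [4_500] * 20
print(plan_run(1, 90_000, sizes, 100_000))  # (1, 340000)
```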

maxkeepbytes

Type: unsigned integer

Default: 0

Description: If greater than 0, this is the maximum number of thread content bytes to keep on the talk page. Older threads are forcibly archived if there are more than this number of thread content bytes on the page. If 0, this option is disabled.

maxkeepthreads

Type: unsigned integer

Default: 0

Description: If greater than 0, this is the maximum number of threads to keep on the talk page. Older threads are forcibly archived if there are more than this number of threads on the page. If 0, this option is disabled.
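How maxkeepthreads and maxkeepbytes force older threads out can be sketched in Python. This assumes the thread sizes are listed oldest first and that both limits apply together, with 0 disabling either one; forced_archive_count is a hypothetical helper.

```python
def forced_archive_count(thread_sizes: list, maxkeepthreads: int = 0,
                         maxkeepbytes: int = 0) -> int:
    """Number of oldest threads to archive so the page respects both limits.

    thread_sizes holds each thread's content size in bytes, oldest first
    (assumption). A limit of 0 is treated as disabled.
    """
    count, remaining, total = 0, len(thread_sizes), sum(thread_sizes)
    while remaining > 0 and (
        (maxkeepthreads > 0 and remaining > maxkeepthreads)
        or (maxkeepbytes > 0 and total > maxkeepbytes)
    ):
        total -= thread_sizes[count]  # archive the oldest remaining thread
        remaining -= 1
        count += 1
    return count


print(forced_archive_count([100] * 10, maxkeepthreads=6))  # 4
```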

minarchthreads

Type: unsigned integer

Default: 0

Description: The bot will not archive unless this many or more sections need archival.

minkeepthreads

Type: unsigned integer

Default: 0

Description: The bot will not archive if this many or fewer sections would be left on the page.
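The two thresholds, minarchthreads and minkeepthreads, can be combined into one check, sketched here in Python with a hypothetical helper. The sketch follows the wording above literally, so with minkeepthreads at its default of 0 it refuses only a run that would leave no sections at all; how the bot itself treats that edge case is not stated here.

```python
def should_archive(eligible: int, total: int, minarchthreads: int = 0,
                   minkeepthreads: int = 0) -> bool:
    """Decide whether a run goes ahead, given how many of the page's `total`
    sections are `eligible` for archiving."""
    if eligible == 0:
        return False
    if eligible < minarchthreads:          # need this many or more to archive
        return False
    if total - eligible <= minkeepthreads:  # too few sections would remain
        return False
    return True


print(should_archive(5, 10, minarchthreads=5))  # True
```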

nogenerateindex

Type: unsigned integer (boolean)

Default: 0

Description: If this is set to 1, the bot will not generate an index under User:ClueBot III/Indices/. This option should very rarely be needed. If it is used, the index option will no longer work correctly.

numberstart

Type: unsigned integer

Default: 1

Description: Default value for %%i in format.

transformheader ''Warning! An invalid option here can screw up your archives!''

Type: string, search===replace pairs delimited by &&&

Default: ""

Description: Converts archived thread headers. For each pair, search must be a valid regular expression and replace is a replacement string. See the documentation for PHP's preg_replace function for more information: search corresponds to $pattern, replace corresponds to $replacement, and the thread header corresponds to $subject. If you do not understand what this does, don't try to use it; instead, ask Cobi for help.
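The search===replace&&&search===replace syntax can be illustrated in Python. This is only an analogy: the bot uses PHP's preg_replace, whose patterns normally include delimiters (for example /^Re: /) and which writes backreferences as $1, whereas the sketch below uses the syntax of Python's re module. The helper names are hypothetical, and splitting each pair at its first === is an assumption.

```python
import re


def parse_transformheader(value: str) -> list:
    """Split 'search===replace&&&search===replace' into (search, replace)
    pairs, splitting each pair at its first '===' (assumption)."""
    pairs = []
    for item in value.split("&&&"):
        if not item:
            continue
        search, sep, replace = item.partition("===")
        if not sep:
            raise ValueError(f"missing '===' in {item!r}")
        pairs.append((search, replace))
    return pairs


def transform_header(header: str, pairs: list) -> str:
    """Apply each regex replacement in order, like successive preg_replace calls."""
    for search, replace in pairs:
        header = re.sub(search, replace, header)
    return header


pairs = parse_transformheader("^Re: ===&&&help===assistance")
print(transform_header("Re: Need help", pairs))  # Need assistance
```

Rejecting an entry without === up front is one way to avoid the "invalid option" damage the warning mentions.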

Cosmetic parameters
These parameters shape the archive box displayed on the page where archiving is active.

archivebox

Type: string ("yes" or "no")

Default: "no"

Description: Causes an archive box to be displayed, similar to the one produced by the Archives template.

The example archive box uses the non-default value archivebox=yes.

box-width

Type: string

Default: "238px"

Description: The width of the archive box. This parameter is only valid if archivebox=yes.

box-advert

Type: string ("yes" or "no")

Default: "no"

Description: Displays the string "This page is archived by ClueBot III." at the bottom of the archive box. This parameter is only valid if archivebox=yes. The example archive box uses the non-default value box-advert=yes. Compare this example to the example next to the archivebox parameter.

box-separator

Type: string ("yes" or "no")

Default: "yes"

Description: Displays separator lines in the archive box. This parameter is only valid if archivebox=yes. The example archive box uses the non-default value box-separator=no. Compare this example to the example next to the archivebox parameter.

image

Type: string

Default: " "

Description: If set, this is an alternate image file to use as the archive icon in the archive box. This parameter is only valid if archivebox=yes. The example archive box uses a non-default value. Note that the image size must be specified as part of the argument to this parameter, not with the image-width parameter. Compare this example to the example next to the archivebox parameter.

image-width

Type: string

Default: "40px"

Description: The width of the image in the archive box. This parameter is only valid if archivebox=yes and you are using the default image.

search

Type: string ("yes" or "no")

Default: "yes"

Description: Displays a search field in the archive box. This parameter is only valid if archivebox=yes. The example archive box uses the non-default value search=no. Compare this example to the example next to the archivebox parameter.

talkcolor / talkcolour

Type: string

Default: ""

Description: This parameter is only valid if archivebox=yes and the page on which the archive box is displayed is in the User talk namespace. If it is not set, or is set to anything other than yes, an archive box on a User talk page will not use the standard talk page color scheme; instead, it uses the same color scheme as archive boxes on non-talk pages. If it is set to yes and the page on which the archive box is displayed is in the User talk namespace, the archive box uses the standard talk page color scheme. The two parameter names are equivalent and are provided only to accommodate both American and British spelling.

index

Type: string

Default: ""

Description: All values of this parameter are equivalent except yes. This parameter should not be set to yes unless you have wrapped the ClueBot III template with the Archives template. Using index=yes causes the index automatically generated by ClueBot III, at User:ClueBot III/Indices/Talk:YourPage, to be transcluded onto Talk:YourPage in place of the User:ClueBot III/ArchiveThis template.

This parameter is only valid if. If  this parameter has no effect on the index being included in the archive box and does not cause a copy of User:ClueBot III/Indices/Talk:YourPage to be transcluded onto Talk:YourPage.


Examples
The following examples can be cut and pasted into the top portion of the page you desire to archive. {{subst:FULLPAGENAME}} will be substituted by the name of the page you are editing when you save the page.

The text explaining the following examples assumes that {{subst:FULLPAGENAME}} evaluates to Talk:YourPage. In other words, that Talk:YourPage is the page you are archiving.

The age in all of these examples is set to 2160 hours, which is 90 days. If you want a different amount of time, then change the age argument.

Example: Archives by date (without archive box)
The archive subpages produced by this example will be in the format of:

Talk:YourPage/Archives/2013/June

Talk:YourPage/Archives/2013/July

...

Example: Archives by date (with archive box)
The subpages created, as needed, for your archives by this example will be named similar to:

Talk:YourPage/Archives/2013/June

Talk:YourPage/Archives/2013/July

...

Example: Numbered archives (without archive box)
The subpages created, as needed, for your archives by this example will be named similar to:

Talk:YourPage/Archive 1

Talk:YourPage/Archive 2

...

Example: Numbered archives (with archive box)
The subpages created, as needed, for your archives by this example will be named similar to:

Talk:YourPage/Archive 1

Talk:YourPage/Archive 2

...

Example: By year and numbered (with archive box)
{{User:ClueBot III/ArchiveThis}} with the following parameters:
 * archiveprefix={{subst:FULLPAGENAME}}/Archives/
 * format=Y %%i
 * age=2160
 * minarchthreads=0
 * minkeepthreads=0
 * archivenow= {{User:ClueBot III/ArchiveNow}},{{resolved|,{{Resolved|,{{done}},{{Done}}
 * header= {{Automatic archive navigator}}
 * headerlevel=2
 * nogenerateindex=0
 * maxkeepthreads=0
 * maxkeepbytes=0
 * maxarchsize=150000
 * numberstart=1
 * archivebox=yes
 * box-advert=yes

The subpages created, as needed, for your archives by this example will be named similar to:

Talk:YourPage/Archives/2012 1

Talk:YourPage/Archives/2012 2

...

Talk:YourPage/Archives/2013 1

Talk:YourPage/Archives/2013 2

...

Example: Changing from MiszaBot to ClueBot III
The subpages created, as needed, for your archives by this example will be named similar to:

Talk:YourPage/Archive 21

Talk:YourPage/Archive 22

...

The numberstart value is one more than the last number used by MiszaBot because the first time ClueBot III runs it may archive a large number of sections. Increasing the number by one prevents those sections from being appended to the current archive. In the case from which this example is taken, an additional 100k was appended to a 90k archive page, leaving a 190k page whose maximum was supposed to be 100k. This has been reported as a bug.

Age is in hours, not days. 90 days is 2,160 hours.

If you are using an Archives template, or other template that shows which archiving bot you are using, don't forget to change:

to:

}}