User:ListGenBot/Details

This is a specification for a bot that I intend to write. The initial inspiration was to better generate List of songs featured in The Simpsons, but I think it would be appropriate for all sorts of list pages.

I'd like any feedback on whether this would be a generally useful bot to have around, and whether there are any tweaks that could make it more appropriate to your particular requirements --Mortice 21:26, 28 November 2006 (UTC)

Introduction
ListGenBot is a Wikipedia bot which generates lists from page entries. If a series of pages have sections with appropriate markup, another page will be generated with a copy of each of those sections.

A typical use would be if you have a page corresponding to each episode of a TV show. If there is a section that's common to each episode page (with appropriate bot markup), this bot will populate a section of another page with all of those sections.

Markup
The bot is driven by templates that have no visible appearance on the page. They contain the markup that ListGenBot reads, and they carry category entries that the bot uses to identify which pages have bot markup on them.

To add a section of text on a page to a list, the section should start with:

 

and end with:

 

Here listname is the name of a list. Any name can be used, as long as it is common to all entries that should appear in the list.

To generate a list of these sections of text, where all lines in all sections are taken together and sorted alphabetically with the original page name added in brackets after it, put this onto the page that should contain the list:

 

 

When the bot has processed the page, these templates will have the appropriate text inserted between them (see example below). The template entries on the page must be maintained in order to tell the bot where to put updates.

To generate a list of these sections of text, where the list contains a level 3 heading named with the name of the original page followed by the text from that page:

 

 

Example
Page MyFavouriteColours contains:

This is a list of my favourite colours:
* Red
* Green

Page MyHatedColours contains:

This is a list of the colours I hate:
* Blue
* Yellow

Then a page containing:

Colours I have an opinion about include:

would, once processed by ListGenBot, become:

Colours I have an opinion about include:
* Blue (MyHatedColours)
* Green (MyFavouriteColours)
* Red (MyFavouriteColours)
* Yellow (MyHatedColours)
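The sorted-list behaviour in this example can be sketched in Python roughly as follows. This is an illustration only, not the bot's actual source: the function name `build_sorted_list`, the dictionary input (page name mapped to the lines collected from that page), and the case-insensitive sort order are all assumptions made for the sketch.

```python
def build_sorted_list(sections):
    """Merge lines from every source page into one alphabetical list.

    `sections` maps a page name to the list lines collected from that
    page (a hypothetical input format chosen for this sketch).
    """
    entries = []
    for page, lines in sections.items():
        for line in lines:
            # Drop the leading bullet so sorting uses the item text itself.
            text = line.lstrip("* ").strip()
            if text:
                entries.append((text, page))
    # Assumption: sort case-insensitively, breaking ties by page name.
    entries.sort(key=lambda entry: (entry[0].casefold(), entry[1]))
    # Each item is followed by its original page name in brackets.
    return [f"* {text} ({page})" for text, page in entries]


sections = {
    "MyFavouriteColours": ["* Red", "* Green"],
    "MyHatedColours": ["* Blue", "* Yellow"],
}
for line in build_sorted_list(sections):
    print(line)
```

Run on the two example pages, the sketch prints the same four lines as the generated list in the example.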

Then a page containing:

==My opinions on colours==

would, once processed by ListGenBot, become:

==My opinions on colours==
=== (MyFavouriteColours) ===
* Red
* Green
=== (MyHatedColours) ===
* Blue
* Yellow
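The heading-per-page form could be sketched in the same way. Again this is only an assumed illustration: `build_headed_list` is a hypothetical name, the heading text follows the format shown in the example, and the sketch simply keeps pages in the order they are supplied because the specification does not state an ordering.

```python
def build_headed_list(sections):
    """Emit a level 3 heading per source page, followed by that page's text.

    `sections` maps a page name to the lines collected from that page
    (a hypothetical input format chosen for this sketch).
    """
    output = []
    for page, lines in sections.items():
        # Heading format copied from the example: === (PageName) ===
        output.append(f"=== ({page}) ===")
        # The page's own lines are kept in their original order, unsorted.
        output.extend(lines)
    return output


sections = {
    "MyFavouriteColours": ["* Red", "* Green"],
    "MyHatedColours": ["* Blue", "* Yellow"],
}
print("\n".join(build_headed_list(sections)))
```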

Other information

 * The list data is generated dynamically - if you update the source data, the generated lists will be updated 'shortly' afterwards (depending on how often the bot runs and how long the run takes)
 * You can declare multiple text areas for multiple lists on a page, and have the bot generate multiple lists on a page, in any combination and as many as you need
 * You can specify multiple list names in a source template line, such as which will add the source to all three lists. This could be useful for instance to add TV series episode data to a list for the series and a list for the season of the series
 * If ListGenBot detects a problem such as mismatched tags or a template that names a list for which there is no source data, it will ignore the tags without warning, so you should ensure the lists are generated as you expect them when you first set up tags
 * ListGenBot will attempt to spot and report on edits that users may have made to the generated list - see the next section for details
 * When the bot generates text, it will prefix it with a comment recording the time when the text was updated
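As a rough sketch of how one source section could feed several lists, the grouping step might look like the Python below. It is not the bot's real implementation: `group_by_list`, the tuple input format, and the page and list names used in the example are all hypothetical, and the template syntax itself is deliberately left out.

```python
from collections import defaultdict


def group_by_list(sources):
    """Group source sections by list name.

    `sources` is an iterable of (page, list_names, lines) tuples, a
    hypothetical format standing in for whatever the bot parses out
    of the source templates.
    """
    lists = defaultdict(dict)
    for page, list_names, lines in sources:
        # A section that names several lists is added to each of them.
        for name in list_names:
            lists[name].setdefault(page, []).extend(lines)
    return dict(lists)


# Hypothetical episode pages feeding a series list and a season list.
sources = [
    ("Episode 1", ["series", "season1"], ["* Song A"]),
    ("Episode 2", ["series"], ["* Song B"]),
]
grouped = group_by_list(sources)
print(grouped["series"])
print(grouped["season1"])
# A list name with no source data is simply absent, so it can be skipped.
print(grouped.get("season2"))
```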

Spotting unexpected updates
The list generated by ListGenBot will appear to be standard text on the page, so it will be no surprise if users ignore warnings and make edits to that list. Whenever there's a change to the source data, ListGenBot will replace the generated list on the page with the newly generated list, which would overwrite any manual changes made by users. To try to address this, ListGenBot will attempt to spot this happening and report it on the talk page.

Note, this has nothing to do with changes to the source list data - if a user makes a change to the data that ListGenBot uses to generate a list, that update will be used to generate an updated list just as you'd expect.

Let's take an example of a user editing the generated list data. Data page 'S' is the source for generated list page 'G'. S contains 'A B C' and so G will contain 'A B C' as well. (For the purpose of this example 'A' represents a whole line, so S and G have 3 lines each.)

A user changes S to 'A B C D' and ListGenBot changes G to 'A B C D' as well. That's normal behaviour, so nothing is reported on the talk page.

But then another user changes G to 'A B X C D'. ListGenBot generates the list from the source page (in memory for the moment) which is 'A B C D' and spots that the generated page G has an extra X in it. It writes 'A B C D' to the page (overwriting X) but adds an entry to the talk page reporting that X has been overwritten.

It's not very 'clever' - if a line is edited on G, ListGenBot can't tell it was an edit and assumes it's a new line (since it doesn't match any of the source lines in S), so it reports it to the talk page in the same way.
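The comparison described in this section can be sketched in Python as a simple line-membership check. This is an assumed illustration rather than the bot's actual code: `find_overwritten_lines` is a hypothetical name, and each list element stands for one whole line, as in the S and G example.

```python
def find_overwritten_lines(generated, current):
    """Return lines on the generated page that the bot is about to overwrite.

    `generated` is the list freshly built from the source data (held in
    memory); `current` is what is on page G right now.
    """
    expected = set(generated)
    # Any line on G that matches no freshly generated line is treated as
    # a manual addition, whether it was truly new or an edited old line.
    return [line for line in current if line not in expected]


generated = ["A", "B", "C", "D"]

# A user inserted X into G: X is reported, then G is rewritten as A B C D.
print(find_overwritten_lines(generated, ["A", "B", "X", "C", "D"]))

# A user edited line B on G: the edited line is reported as if it were new.
print(find_overwritten_lines(generated, ["A", "B (edited)", "C", "D"]))

# G already matches the generated list: nothing to report.
print(find_overwritten_lines(generated, ["A", "B", "C", "D"]))
```

Because the check only asks whether a line exists in the generated list, it cannot distinguish an edited line from an added one, which is the limitation noted above.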

Implementation details

 * I will publish the source for the bot, for others to comment on (and to implement in the unlikely event that my system has a long-term failure)
 * The use of templates and the wiki 'what links here' feature means that the bot's influence is strictly limited to pages that contain the bot's templates - it does not need to trawl every page looking for markup, so it will have much less server impact than a trawling bot
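As one possible way to find only the pages that carry the bot's templates, a bot could ask the MediaWiki API for the pages that transclude a given template using the `embeddedin` list. The Python sketch below only builds the query URL and makes no network request. It is an assumption about how this could be done, not a description of ListGenBot's implementation, and the template title `Template:ListGenBot-Source` is a made-up placeholder.

```python
from urllib.parse import urlencode

# English Wikipedia's API endpoint; other wikis use their own host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def embeddedin_query_url(template_title, limit=50):
    """Build a MediaWiki API URL listing pages that transclude a template."""
    params = {
        "action": "query",
        "list": "embeddedin",      # pages that embed (transclude) the title
        "eititle": template_title,
        "eilimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Hypothetical template name, used purely for illustration.
url = embeddedin_query_url("Template:ListGenBot-Source")
print(url)
```

Only the pages returned by such a query would need to be fetched and parsed, which is what keeps the load lower than scanning every page.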

More information
If you have any questions, leave a message on my talk page.

If you would like to see any enhancements, check on User:ListGenBot/Enhancements and if necessary add to the talk page there.