Wikipedia:Bots/Requests for approval/ListGenBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

ListGenBot
Operator: Mortice

Automatic or Manually Assisted: Automatic

Programming Language(s): Python

Function Summary: Generation of lists on pages based on marked sections on other pages

Edit period(s) (e.g. Continuous, daily, one time run): Once a minute

Edit rate requested: 6 edits per minute

Already has a bot flag (Y/N): No

Function Details: This bot is driven by the presence of specific ListGenBot templates on pages. Those templates add the page to a category, so the bot will only examine pages in those categories. Some pages will use the templates to identify 'source data', other pages will use templates to identify 'destination' - the bot will copy all the source data from all the appropriate pages to the destination page.
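The source-to-destination flow described above can be sketched in Python. The actual template names and markup are not spelled out in this request, so the marker templates and the bullet-item convention below are assumptions for illustration only:

```python
import re

# Hypothetical marker templates -- the real ListGenBot template names are
# documented at User:Mortice/ListGenBot, not here, so these are assumed.
SRC_START = r"\{\{ListGenBot-SourceStart\|(?P<name>[^}|]+)[^}]*\}\}"
SRC_END = r"\{\{ListGenBot-SourceEnd\}\}"

def extract_items(wikitext: str, list_name: str) -> list[str]:
    """Collect bullet items found between the source start/end templates
    whose first argument matches list_name."""
    items = []
    pattern = SRC_START + r"(?P<body>.*?)" + SRC_END
    for m in re.finditer(pattern, wikitext, flags=re.DOTALL):
        if m.group("name").strip() == list_name:
            for line in m.group("body").splitlines():
                line = line.strip()
                if line.startswith("*"):  # keep only bullet items
                    items.append(line.lstrip("* ").strip())
    return items

def build_list(pages: dict[str, str], list_name: str) -> str:
    """Merge the items from all source pages into one sorted bullet list,
    ready to be written between the destination templates."""
    merged = sorted({item for text in pages.values()
                     for item in extract_items(text, list_name)})
    return "\n".join(f"* {item}" for item in merged)
```

The first argument of the start template ('MyColours' in the example below) is what ties source pages to a destination, so the same bot code serves any number of independent lists.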

An example of typical use is for pages relating to TV episodes, where each episode lists guest stars. This bot could be used to automatically generate a page listing all the guest stars, dynamically maintained from the episode pages, so avoiding duplication of effort and mismatching between the two sources of guest star information.

The use of templates and categories means that the bot's influence is absolutely limited to pages that are members of the bot's categories - it does not need to trawl every page looking for markup, so it will have much less server impact than a trawling bot.

The initial purpose relates to The Simpsons, generating pages such as a list of which songs are used in each episode.

Full details can be seen at User:Mortice/ListGenBot.

Discussion
This is an interesting idea that on the surface doesn't appear to have problems. Still, I'm not sure that adding the templates to the source articles purely to facilitate list generation is the right way to go. Adding " " and "  " to each article seems like it complicates the articles too much. Will it scare off users who are not accustomed to this? I'm worried mainly here about feature creep. If the list could be generated without these templates, I'd be happier. Still in the interest of giving this idea a chance, could you produce one full example of what this bot will do and post the diffs here? -- RM 13:19, 29 November 2006 (UTC)


 * Thanks for the feedback - yes you're right about it being not obvious for newcomers. I would expect anyone adding the templates to also add HTML comments along the lines of 'add new list items within these templates', but I appreciate that's no guarantee of avoiding problems.


 * One desire for the design would be to make it as general as possible so I didn't want the bot to have page or section names hardcoded into the bot code.


 * Perhaps an alternative would be for the bot to take all the text that's in a section; then it would only be necessary to have a single " " entry in the section, but that runs the risk that someone would add a paragraph in the section that was not intended for the list (see in the example the phrase "This is a list of my favourite colours"). But I think that would be worth experimenting with.


 * For an example of the source and generated text, see this example and let me know if you'd like to see a more detailed one, or an example of the alternative suggested above that would copy the whole section --Mortice 13:38, 29 November 2006 (UTC)


 * I'm noticing from the example that at minimum you have to be able to specify which pages go together. How do you know to group "MyHatedColors" and "MyFavoritColors" pages together? Also, how do you know which list format to use? You have two formats along with headings such as "Colours I have an opinion about include:" and "My opinions on colours". Where does that text come from? It would seem to me that you already have to do some hardcoding or a dynamic interface to allow you to control this. Perhaps I'm misunderstanding this. If you already have to do such manual control, why not use hardcoding on the articles themselves? Sure, it's a pain, but it would work. I know with my bot I chose to go the route of automatically updating the text, but if it didn't match a format it knew, it forced me to update the bot or to perform the edits manually. It was more work, but there was no clutter in the articles. -- RM 13:50, 29 November 2006 (UTC)


 * In my example, the 'start' templates all have a first argument of 'MyColours', which is the list name and connects the sources and destinations together. The same templates with a different list name would work independently of these.


 * The two formats in my example come from the fact that they are generated by two different templates - 'ListGenBot-ListAlphabeticalStart' and 'ListGenBot-ListSectionedStart'. The text "My opinions on colours" is outside the 'start' template, so that remains on the list page and is unaffected by the bot (I figure you'd never want a page that was purely a list; you'd always want some text on the page as well). I believe this all should be doable with no hardwiring of anything in the bot, except the Wikipedia language (and that could be adapted)
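The two start templates named above imply two renderers in the bot. The exact output the real templates produce is not shown in this discussion, so the following is a minimal sketch of what "alphabetical" versus "sectioned" output might look like, assuming items are already grouped by source page:

```python
def render_alphabetical(items_by_page: dict[str, list[str]]) -> str:
    """One flat, sorted bullet list -- the kind of output
    'ListGenBot-ListAlphabeticalStart' appears to select."""
    merged = sorted({i for items in items_by_page.values() for i in items})
    return "\n".join(f"* {i}" for i in merged)

def render_sectioned(items_by_page: dict[str, list[str]]) -> str:
    """Items grouped under one heading per source page -- the kind of
    output 'ListGenBot-ListSectionedStart' appears to select."""
    parts = []
    for page in sorted(items_by_page):
        parts.append(f"=== {page} ===")
        parts.extend(f"* {i}" for i in items_by_page[page])
    return "\n".join(parts)
```

Because the renderer is chosen by which start template the destination page uses, nothing format-specific needs to be hardwired in the bot itself.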


 * My thought (perhaps I'm optimistic) is for this to become a generally used feature, so that anyone with this requirement would be able to use the bot to generate lists on pages without having to get me to edit the bot source. Perhaps there's another way of configuring a 'driver file' to tell the bot what to do - categories are a fine way of alerting the bot to pages, but they're not so appropriate for sections within a page. Any thoughts on a compromise between the requirements for 'generic feature' and for 'clean page'? --Mortice 14:24, 29 November 2006 (UTC)


 * I'd have to give this some more thought; nevertheless, I don't think that my desire for a clean page is necessarily sufficient reason to block this bot. It seems useful enough, and a couple of lines in the article are not that big of a deal. Still, I'd like to get other opinions on this if possible, that is, if anyone else cares to comment on this bot.

Due to the nature of this bot, I'm going to authorize a small trial. You may perform this bot function for up to 5 different generated lists. Try to limit the number of articles on which the lists are based to no more than 8 source articles each, and preferably much less than that. Please post your results here when you are finished. If you'd like, you can wait to see if there are more comments regarding this bot from other members of the community (which may or may not happen); that way, if there turn out to be lots of objections, you won't waste your effort. It's entirely up to you. -- RM 15:16, 29 November 2006 (UTC)


 * Thanks for the trial approval. I'm having second thoughts about using categories to identify pages with markup (particularly the source pages), because it necessarily requires that an entry is added to the 'categories' list at the bottom of the article, so it is more visible than I'd want. Are there better ways to manage this, short of either defining a page that lists all the pages the bot should manage, or processing every page ever updated to look for markup? --Mortice 15:32, 29 November 2006 (UTC)


 * There is a better idea: Use the "What links here" option from the template. That will give you a list of articles that transclude the template. Your template already uniquely identifies its purpose, so perhaps that is sufficient? -- RM 15:49, 29 November 2006 (UTC)
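The "What links here" (transclusions) view suggested above has a machine-readable equivalent: the MediaWiki API's `list=embeddedin` query, which returns the pages transcluding a given template. How the bot actually fetched this list in 2006 is not described here, so the following is only a sketch of the query parameters such a lookup would use today (the template name is the assumed one from the earlier examples):

```python
def embeddedin_params(template: str, namespace: int = 0) -> dict:
    """Build MediaWiki API query parameters for 'list=embeddedin',
    which lists the pages that transclude the given template --
    the API equivalent of the template's 'What links here' view."""
    return {
        "action": "query",
        "list": "embeddedin",
        "eititle": f"Template:{template}",
        "einamespace": namespace,   # 0 = article namespace
        "eilimit": "max",
        "format": "json",
    }
```

A request built from these parameters (e.g. with `urllib.request` against `/w/api.php`) would give the bot its worklist without any category entry appearing at the bottom of the source articles.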


 * Of course! Yes you're quite right that's a very logical way to find what pages are using the markup templates. I'll investigate... --Mortice 16:02, 29 November 2006 (UTC)

UPDATE - I've been occasionally running the bot code (I've not yet run it continuously) and (after discussion at Wikipedia talk:WikiProject_The_Simpsons) added the bot source tags to the pages for the first 5 episodes of the 18th season of The Simpsons (starting with The Mook, the Chef, the Wife and Her Homer).

ListGenBot is populating the lists on WikiProject The Simpsons/Example generated lists, which is currently a test dumping ground for all the potential lists. I've invited page editors (of pages such as List of guest stars on The Simpsons) to add the lists (still dynamically updated) to those pages, as they see the need.

In addition, I have numerous odd pages for testing the list generation. I've not seen any problems caused by the bot so far - the only pages that it updates are ones with lists on them (which will never be very many), and the Python code it uses for reading pages has throttling which seems to only allow it to read a page every 10 seconds or so. When the code is tidier and commented, I'd like to get some comments from a wiki Python bot expert to ensure I'm using the wiki in the most efficient way.

It would, I think, be desirable but not essential for the bot to have the 'bot flag' - without the flag, bot edits would be intermingled with other edits in the history. See the history for WikiProject The Simpsons/Example generated lists to see what that would look like - one edit each time the bot spots that the source data for a list has changed.

So I'd like approval for 'continuous' running of this bot (which will be subject to throttling) --Mortice 20:53, 9 December 2006 (UTC)

If I used these articles a lot, I'd hate all of the tags cluttering things up. Nevertheless, I've read the comments by those who also manage those pages, and everything seems to be quite positive. So my personal opinions aside, there is no real reason to prevent this. I'll approve this bot with a bot flag and a maximum rate of 6 edits per minute. -- RM 23:39, 11 December 2006 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.