Wikipedia:Bots/Requests for approval/Tom's Tagging Bot


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

Tom's Tagging Bot
Operator:

Time filed: 09:15, Wednesday August 22, 2012 (UTC)

Automatic, Supervised, or Manual: automatic

Programming language(s): Python (using mwclient)

Source code available: Not yet, but I'm happy to do so within the next few weeks.

Function overview: Bulk creation of talk pages with WikiProject banners

Links to relevant discussions (where appropriate):

Edit period(s): Continuous

Estimated number of pages affected: 20,000+ in the course of a few months. I have just under 4,000 tasks sitting in the queue at the moment already.

Exclusion compliant (Yes/No): No

Already has a bot flag (Yes/No): No

Function details: There are currently 250,000+ article pages on English Wikipedia that do not have a talk page and are thus not part of a WikiProject. I have been manually tagging hundreds of these from my personal account. Tom's Tagging Bot should be able to make a serious dent in this and help bring to the attention of WikiProjects pages that they didn't even know were part of their remit.

The bot is built around a simple message queue, namely beanstalkd. I have 'reader' processes that locate tasks by scanning through categories and checking whether each page has a talk page. When a reader finds a page without a talk page, it adds a task to the queue, which is basically a mapping of page name to WikiProjects. When I set the reader tasks going, I specify what the WikiProjects are based on context. Beanstalkd has a binlog, so if the power goes out, I can restore the queue to a recent version. The 'reader' process also checks to make sure the page isn't a redirect.
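As a rough illustration of the reader step described above, the sketch below builds one task per page that is not a redirect and has no talk page. It is not the bot's actual code: `FakePage`, `find_untagged`, the sample titles and the banner text are hypothetical, and a plain `deque` stands in for the real message queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for a wiki page object."""
    name: str
    redirect: bool = False
    talk_exists: bool = False


def find_untagged(pages, banners):
    """Yield one task (page name -> banners) per article lacking a talk page."""
    for page in pages:
        if page.redirect:
            continue  # redirects are never tagged
        if page.talk_exists:
            continue  # the bot only creates new talk pages
        yield {"page": page.name, "banners": list(banners)}


queue = deque()  # in-memory stand-in for the message queue
category_members = [
    FakePage("Article A", talk_exists=True),
    FakePage("Article B", redirect=True),
    FakePage("Article C"),
]
for task in find_untagged(category_members, ["{{WikiProject Catholicism}}"]):
    queue.append(task)
```

Only "Article C" ends up in the queue here; the other two are filtered out before any task is created.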

The page creator process simply pops jobs off the queue and handles them. Both the page creator process and the reader process use mwclient for Python, which automatically handles replag. The page creator process checks immediately before page creation to ensure that the page hasn't been created already. I'm planning to add a log of these cases for later human review too.
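The page creator step might look something like the following sketch, which re-checks for an existing talk page just before writing and logs any skipped cases for later review. Everything here is an assumption for illustration: `process_queue` is a hypothetical name, a dict stands in for the wiki, and a `deque` stands in for the queue.

```python
from collections import deque


def process_queue(queue, talk_pages, skipped_log):
    """Pop tasks and create each talk page unless it already exists."""
    created = []
    while queue:
        task = queue.popleft()
        talk_title = "Talk:" + task["page"]
        # Re-check right before writing: the page may have been created
        # between the reader's scan and now.
        if talk_title in talk_pages:
            skipped_log.append(talk_title)  # kept for later human review
            continue
        talk_pages[talk_title] = "\n".join(task["banners"])
        created.append(talk_title)
    return created


# The dict below is a hypothetical stand-in for existing talk pages.
talk_pages = {"Talk:Article B": "existing discussion"}
jobs = deque([
    {"page": "Article A", "banners": ["{{WikiProject Catholicism}}"]},
    {"page": "Article B", "banners": ["{{WikiProject Catholicism}}"]},
])
skipped = []
created = process_queue(jobs, talk_pages, skipped)
```

The second check matters because a job can sit in the queue for some time, so the state seen by the reader may be stale by the time the job is handled.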

A very simple example of this might be: the reader process might iterate through every page on Category:History of Roman Catholicism, finding any that do not have a talk page. It would then add a job to the queue with the page name and the tag. The writer process would then pick the job up and create the talk page with the banner from WikiProject Catholicism.

The code currently isn't exclusion compliant. I see no need to add exclusion compliance to bot code for page creation since pages which do not exist are highly unlikely to contain a bot exclusion template. If the BAG feels it necessary, I am happy to check to ensure that the main article page does not have an exclusion template before creating the talk page.
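If a check of the main article were wanted, a simplified sketch of it could look like the code below. It covers only the common `{{nobots}}` and `{{bots|allow=...}}` / `{{bots|deny=...}}` forms and ignores other parameters; `allow_bots` and the bot name `ExampleBot` are hypothetical, and this is not the bot's code.

```python
import re


def allow_bots(text, bot_name):
    """Return False if the wikitext opts out of edits by bot_name."""
    if re.search(r"\{\{\s*nobots\s*\}\}", text, re.IGNORECASE):
        return False
    match = re.search(
        r"\{\{\s*bots\s*\|\s*(allow|deny)\s*=\s*([^}]*)\}\}",
        text,
        re.IGNORECASE,
    )
    if match is None:
        return True  # no exclusion template found
    kind = match.group(1).lower()
    names = {n.strip().lower() for n in match.group(2).split(",")}
    listed = "all" in names or bot_name.lower() in names
    if kind == "deny":
        return not listed
    return listed  # allow: only listed bots (or "all") may edit


ok_plain = allow_bots("Some article text.", "ExampleBot")
ok_nobots = allow_bots("{{nobots}} text", "ExampleBot")
ok_denied = allow_bots("{{bots|deny=ExampleBot, OtherBot}}", "ExampleBot")
ok_other = allow_bots("{{bots|deny=OtherBot}}", "ExampleBot")
```

The writer would call such a function on the article's wikitext and skip the job when it returns False.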

Discussion
Name fails WP:BOTACC.  Rcsprinter  (post)  @ 10:31, 22 August 2012 (UTC)
 * I can set up a new account that's less fabulous and more literal... —Tom Morris (talk) 11:02, 22 August 2012 (UTC)
 * ✅ —Tom Morris (talk) 15:07, 22 August 2012 (UTC)

Sounds like a great idea. How are the categories mapped to WikiProjects? Kaldari (talk) 04:30, 23 August 2012 (UTC)
 * I look at the category in question and take a rough stab at it. —Tom Morris (talk) 06:52, 23 August 2012 (UTC)
 * I would like you to add tags to all articles in category:algae and all its subcategories. Can you also make them class:stub if the article is in a stub category? And, could you then provide a list of articles you tagged so I can rate their importance? I thought I saw a bot already that did this, but could not find it again. Eau (talk) 12:13, 23 August 2012 (UTC)
 * Yep, although the first iteration of the bot won't handle subcategories. I may write a function that recursively walks the category tree and does so. I handle the stub thing manually. In fact, I can specify any WikiProject banner text I want when setting the jobs going and add custom functionality for specific types of article when necessary. —Tom Morris (talk) 17:51, 23 August 2012 (UTC)
 * Following subcategories is dangerous. I would advise against it unless it is manually supervised. Kaldari (talk) 03:24, 24 August 2012 (UTC)
 * I would be glad to check the subcategories before you template them.Eau (talk) 03:51, 24 August 2012 (UTC)
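
One way to reconcile the two positions in this thread is a depth-limited walk that only produces a list of subcategories for a human to review before anything is tagged. The sketch below assumes a hypothetical `walk_categories` helper and a made-up category tree held in a dict; it does not query the wiki.

```python
def walk_categories(root, subcats_of, max_depth=1):
    """Breadth-first walk of a category tree, limited to max_depth levels."""
    seen = {root}
    frontier = [root]
    order = [root]
    for _ in range(max_depth):
        next_frontier = []
        for cat in frontier:
            for sub in subcats_of.get(cat, []):
                if sub not in seen:  # guards against category loops
                    seen.add(sub)
                    next_frontier.append(sub)
                    order.append(sub)
        frontier = next_frontier
    return order


# Hypothetical tree with a loop (Subcat A links back to Algae).
tree = {
    "Algae": ["Subcat A", "Subcat B"],
    "Subcat A": ["Algae", "Subcat C"],
    "Subcat C": ["Unrelated topic"],
}
one_level = walk_categories("Algae", tree, max_depth=1)
two_levels = walk_categories("Algae", tree, max_depth=2)
```

Capping the depth keeps the walk from drifting into loosely related categories, and the returned list can be checked by hand before the reader is started on it.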

Well, with the caveat that you notify WikiProjects before doing a run when the bot is approved, I don't see any reason not to go forward with a trial of this. Since I'm part of WP:PHYS, and we do stuff like this all the time, I'm going to approve the bot for a run on Category:Physicists and its sub-categories. Headbomb {talk / contribs / physics / books} 17:49, 25 August 2012 (UTC)
 * Tag with
 * Inherit assessments from other projects
 * If no assessments from other projects, add |class=stub if a stub template is found in the article


 * Not all articles in that category are biographies. We couldn't have done a trial run with an algae category? Eau (talk) 18:29, 25 August 2012 (UTC)


 * ✅ The bot has run through Physicists and made 50 edits. —Tom Morris (talk) 20:53, 25 August 2012 (UTC)
 * "wikiprojects" isn't a very good edit summary. A better one would be something like: "Bot: Tagging with ". Or something to that effect. LegoKontribsTalkM 21:17, 25 August 2012 (UTC)
 * Good point. Will write a function to do just that. —Tom Morris (talk) 22:23, 25 August 2012 (UTC)
 * Hmmm.... right, you're only tagging new pages, so no need for inheriting class from other banners. However, it would be good if you could assess articles as stubs based on length, much like AWB does. An article with 300 readable prose characters (i.e. excluding infoboxes, navboxes, references, categories, etc..., see for a neat script that calculates this) can safely be assessed as a stub, but that length could be doubled/tripled/etc. by project upon request. Headbomb {talk / contribs / physics / books} 09:20, 26 August 2012 (UTC)
 * That seems reasonable. Added a prose size calculator to the todo list. —Tom Morris (talk) 10:01, 26 August 2012 (UTC)
 * Any update on these two extras? The prose size calculator I'm not overly concerned about - it's a nice extra to have but not essential. The edit summary does need to be improved however. - Kingpin13 (talk) 15:13, 26 September 2012 (UTC)
 * See above - Kingpin13 (talk) 20:25, 2 October 2012 (UTC)
 * I haven't gotten around to adding the prose size calculator. I was going to rewrite the JavaScript as Python, but the code is indiscernibly unreadable (or I suck at reading JavaScript). I can fix up the edit summary super-duper quickly. —Tom Morris (talk) 21:45, 2 October 2012 (UTC)
 * ✅ I've taken the liberty of running a test: you can see the formatting here. —Tom Morris (talk) 17:38, 3 October 2012 (UTC)
 * Very little chance of anything going wrong here, since the bot only edits previously red-linked talk pages. Wikiproject tagging is a common task for bots, so no worries there. Please do add bot to the bot's user page. - Kingpin13 (talk) 17:43, 3 October 2012 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.