Wikipedia:Bots/Requests for approval/Rootology Bot 2


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Denied.

Rootology Bot 2
Operator: rootology ( C )( T )

Automatic or Manually Assisted: Automatic

Programming Language(s): AWB

Function Summary: Add to any main space articles under Category:Living people that lacks them currently.

Edit period(s) (e.g. Continuous, daily, one time run): Continuous.

Already has a bot flag (Y/N): N

Function Details: I will generate a list of articles for BLP subjects in Category:Living people, and if their talk pages lack the appropriate WPBiography tags for living=yes, I'll add them with AWB.

Discussion
Wouldn't it be better to add other appropriate WikiProject tags at the same time? And perhaps also define a class and an importance level. Having multiple edits when one would suffice seems wasteful. --MZMcBride (talk) 04:07, 16 October 2008 (UTC)
 * I talked with MZMcBride a bit about his concerns, and I agree with him. It would (in the long term end) lead to more edits. The primary goal I was after was getting them all tagged for reasons essentially. I did a tiny sample after I figured out the method I wanted in AWB with my main account, and it seemed to be running around 10% needing Biography project/BLP tagging. This would get the articles tagged for BLP, and into Category:Unassessed biography articles, which has a bit of a backlog. We discussed that the real evaluations would require a human touch, and they would. You'd need a fairly complex bot to begin to assess them on the scale by a variety of factors (inbound links, length, etc., if you could even get it right).
 * My thinking in the end was that the benefit of tagging up all the unmarked BLPs would help to facilitate the process, in the end, of getting them under the Wikiproject, marked as BLPs, and in the pipeline. Sort of like kitchen prep work for the next generation smarter bot yet to be written or the person that can go through and evaluate them all, in the end. My ultimate thought was that the cost/benefit of some extra edits down the road are balanced against the importance of BLP--a tag won't stop vandalism or bad edits, but it's every little deterrent that helps to make Wikipedia better for BLP subjects that helps in the end. If that leads to another couple edits days or years down the line, I think it's worth it... in the end. I hope this makes sense. rootology ( C )( T ) 04:55, 16 October 2008 (UTC)

There are limits to the extent a bot can assess articles. If they have a stub template, it could presumably class them as a stub. It could also copy any assessment made by other projects, though I gather some projects differ in their approach to classing articles. A bot certainly can't determine importance though, and importance can clearly vary depending on the point of view an article is being looked at from. Adding articles to "unclassed" categories increases the chance someone will come along and class them - which in cases of BLPs means extra scrutiny, which should be a good thing. Whilst I agree that as a general rule project tagging is better done by humans than bots, this task seems beneficial. I would also note that there are a number of bots that have been approved in the past for general Wikiproject tagging of articles. WJBscribe (talk) 17:58, 16 October 2008 (UTC)

Why can't the tags just be added if/when the article gets assessed? That seems like it would make more sense. A backlog of 61,000 unassessed articles doesn't seem like it's going away any time soon. I really don't see the tag on the talk page as a deterrent at all to bad edits. Most tagging runs do a few dozen to a couple thousand articles. Based on category counts, this would do ~58,000, nearly doubling the size of the WPBIO unassessed backlog and increasing the total scope of the project by more than 10%. Has there been any discussion in the project about doing this? It seems like it would be better to just generate lists of articles needing tagging, iff the project actually wants it. Mr.Z-man 18:07, 16 October 2008 (UTC)
 * I've asked the Wikiproject to weigh in here as you suggested. Technically, each and every one of these pages already belongs to WikiProject Biography, and to BLP coverage. The backlog is huge, yes, but having them in is better than having them out, for the benefit of centralizing all of it, and for the (light, possibly) security benefit to the BLP articles themselves of carrying the tag they'll pick up via this project tagging. The if/when may or may not ever happen, or there could be a project organized to go through and clean up the BLP articles, which could result in anything--there's no way to know, but the tags are beneficial in those two ways: for the hopefully light to moderate bit of deterrence BLP tags offer, as WJBscribe mentions, and for getting them shunted into the proper work queue, like I wrote in my reply to MZMcBride above. I think the light cost of having a few extra bot edits floating through doesn't outweigh the benefit here. There are already bots that do similar approved tasks -- User:Yobot's task #9 comes to mind. rootology  ( C )( T ) 23:28, 16 October 2008 (UTC)
 * First, what's the benefit of the tag vis a vis BLP security? The pages are already in a category that defines them as a BLP. So there's no benefit for tracking purposes. I find it hard to believe that there are users who think to themselves "I'm going to insert into this article" but then change their mind as a result of something on the talk page.
 * Second, lists of articles would seem to have the same effect as a category, but with only 0.2% as many edits (with 500 articles per list) and has the extra benefit of being able to organize the list in something other than alphabetical order that could possibly help people work faster (such as ordering by occupation so people more experienced with scientists can work on scientists etc.). Or the lists could be further subdivided so people can easily take a chunk of the list to work on.
 * Third, 58,000 edits is not "a few" (it's more than 4 times as many edits as Yobot made for task 9), especially when done with an inefficient program like AWB. Mr.Z-man 00:27, 17 October 2008 (UTC)
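The "0.2% as many edits" figure in the second point can be checked with a short calculation. This is only a sketch of the arithmetic, assuming one edit per talk-page tag and one edit per published list page; the function name `list_pages_needed` is made up for illustration.

```python
import math


def list_pages_needed(article_count: int, per_list: int) -> int:
    """Number of list pages required if each page holds `per_list` titles."""
    return math.ceil(article_count / per_list)


# Figures quoted in the discussion: ~58,000 untagged articles, 500 per list.
articles = 58_000
per_list = 500

pages = list_pages_needed(articles, per_list)
ratio = pages / articles  # list-page edits as a share of per-article edits

print(f"{pages} list pages instead of {articles} tagging edits ({ratio:.1%})")
```

Under these assumptions, 116 list pages would stand in for roughly 58,000 individual talk-page edits.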


 * If you are going to look for biographical articles that don't have WPBiography on their talk pages, no need to limit yourself to living people. You could check all the articles under Category:Births by year and Category:Deaths by year, and putting "living=no" if they have a death year, and leaving those without a death year and no "living people" tag for humans to check. This won't, of course, pick up biographical articles that lack the "living person" tag and lack birth and death year categories, but it would be a bit more comprehensive than your proposal. Which I'm now going to read in case there is a reason you are limiting yourself to those in category "living people". For what it is worth, I think User:Kingboyk and his bot did a lot of the WPBiography tagging in the past. Why not look at how his bot did this? see User:Kingbotk and the associated pages and FAQ. He's not very active, but did make a few edits a few days ago, so you could even drop him a line. Carcharoth (talk) 01:48, 17 October 2008 (UTC)
 * Interesting... it looks like he used AWB as well, with this plugin of his: User:Kingbotk/Plugin. rootology ( C )( T ) 01:52, 17 October 2008 (UTC)
 * I must admit I'm mildly surprised that no-one else here seems to have known about or remembered Kingbotk. Though I had forgotten his plug-in, so I can't really claim perfect memory. I think some comments about institutional memory and reinventing the wheel might also be appropriate. :-) Carcharoth (talk) 01:55, 17 October 2008 (UTC)
 * I'd been thinking if this went ahead after picking off the low-lying "easy" fruit of flagrant pages that just had no tags (like I did with my tiny sample here to get a vague idea of what percentage would be touched up) that I'd be regenerating lists for various re-tweaked edits on different batches in a much harder format. His plugin would make this incredibly easy to pick up where he left off, from looking at it so far. rootology ( C )( T ) 02:01, 17 October 2008 (UTC)
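The rule Carcharoth describes above (a death-year category means living=no, anything without a death year and without the "living people" category is left for a human) could be sketched roughly as follows. This is an illustration only, not how Kingbotk's plugin works; the function name `living_param` and the regular expression are assumptions, and it only recognizes plain "YYYY deaths" category names.

```python
import re
from typing import Iterable, Optional

# Assumption: death-year categories are named like "2001 deaths".
DEATH_CATEGORY = re.compile(r"^\d{1,4} deaths$")


def living_param(categories: Iterable[str]) -> Optional[str]:
    """Suggest a value for the WPBiography `living` parameter.

    Returns "no" if a death-year category is present, "yes" if the article
    is in "Living people", and None when a human should check.
    """
    cats = set(categories)
    if any(DEATH_CATEGORY.match(c) for c in cats):
        return "no"
    if "Living people" in cats:
        return "yes"
    return None  # birth year only, or no clue at all: leave for a human


print(living_param(["1950 births", "2001 deaths"]))
print(living_param(["1950 births", "Living people"]))
print(living_param(["1950 births"]))
```

Returning None rather than guessing keeps the ambiguous cases out of the automated run, which matches the suggestion to leave them for humans.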


 * A quick comment about the "61,000" backlog. Some people here seem to think that this is some massive backlog that has built up. In fact, it is the opposite. It is the remnant of a massively reduced backlog. See WikiProject Biography/Assessment and have a look at the total number of assessed articles: 483,046. That is a massive amount that have been assessed. For the history, look at Version 1.0 Editorial Team/Biography articles by quality statistics. Compare March 2007 (109,434 assessed, 120,809 unassessed, total of 230,243) with October 2008 (483,046 assessed, 61,767 unassessed, total of 544,813). Those stats alone are fairly eye-opening. It should be a well-known stat now that biographical articles make up between 1 in 4 and 1 in 5 of Wikipedia's articles. Take into account the rate of article creation, and you can estimate how many new biographical articles are being created every day. From March 2007 to October 2008, the number of tagged biographical articles more than doubled (some of that is due to article creation, and some is due to tagging untagged articles). The increase in the number of assessed articles from around 109,000 to around 483,000 is mainly due to a combination of automated assessments as stubs (based on a stub template on the article) and several assessment drives (people dispute how helpful these drives were in actually assessing articles, but at a minimum, those drives carried out a rough sweep to gauge some level of assessment). See also WikiProject Biography/Assessment and WikiProject Biography/Assessment. For details of those assessment drives, see WikiProject Biography/Assessment/Assessment Drive. You can see that WikiProject Biography/Assessment/Assessment Drive/Spring 2008 reduced things by about 20,000. But it seems things have crept up again. Anyway, that is the background, mainly so that people know what sort of numbers are involved, and how big (or small) a backlog 61,000 is in this context. 
Carcharoth (talk) 01:48, 17 October 2008 (UTC)
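For anyone who wants to check the numbers quoted above, here is a minimal sketch that recomputes the totals and the unassessed share for the two dates. The figures are the ones given in the comment; the dictionary layout and the `summarize` helper are made up for illustration.

```python
# Assessed / unassessed counts as quoted in the comment above.
stats = {
    "March 2007": {"assessed": 109_434, "unassessed": 120_809},
    "October 2008": {"assessed": 483_046, "unassessed": 61_767},
}


def summarize(counts: dict) -> tuple:
    """Return (total tagged articles, unassessed share of that total)."""
    total = counts["assessed"] + counts["unassessed"]
    return total, counts["unassessed"] / total


for label, counts in stats.items():
    total, share = summarize(counts)
    print(f"{label}: total {total:,}, unassessed share {share:.1%}")

# Growth in tagged articles between the two dates.
growth = summarize(stats["October 2008"])[0] / summarize(stats["March 2007"])[0]
print(f"Tagged articles grew by a factor of {growth:.2f}")
```

The totals match those quoted (230,243 and 544,813), and the unassessed share falls from a little over half of all tagged biographies to a little over a tenth.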


 * Relevant category deletion discussion is here. Carcharoth (talk) 07:22, 21 October 2008 (UTC)
 * It was closed with a resounding keep and endorsement of the idea. rootology ( C )( T ) 17:14, 25 October 2008 (UTC)

to test plugin with an eye toward approving for 61K tagging run.  MBisanz  talk 07:51, 8 November 2008 (UTC)
 * Actually, I was already into my trial here and done with 50 before I saw Z-Man's reply. I'd be happy to try to learn the more advanced bot scripting, and Carcharoth laid out the advantages. I'll admit that I'm confused why one bot (KingBot) would be fine running this task, but another (me) would not be...? rootology ( C )( T ) 16:51, 8 November 2008 (UTC)


 * I guess my main issue is, you've got 2 lists of articles, the current unassessed category and the list of pages to run the bot on. The proposal is to use 60,000 edits to merge the 2 lists, so that 60,000 more edits can be made to make the list actually useful to people other than the people making the second 60,000 edits. That just seems really ... inefficient. Much of that is just the inefficiencies in our category and WikiProject tagging systems. But this is how I see it: Imagine if the list of pages to tag was already published on Wikipedia, so we've got 2 big backlogs, the list and the category. Instead of leaving it as 2 backlogs, we're going to use 60,000 edits to ... merge them into 1 really, really big backlog? Looking at it like that, it seems almost nonsensical. Mr.Z-man 22:35, 8 November 2008 (UTC)
 * On the surface, I'd agree, and I considered scrubbing this nomination, since in hindsight it did look possibly borderline to even me. But that was before Carcharoth detailed how easily they went through and obliterated the previous backlogs, which were easily as large, with their last BLP assessment/tagging drive. There are even more people now to do that, so the next one will be more efficient. The problem is that there is no other "easy" way to generate that backlog for them (and do the BLP aspect of the tagging, for the degree of coverage that offers our BLPs). Yes, it's possibly a lot of edits, but in the grand scheme of things, not that many. Within a year it won't even be a notable statistical blip on the radar--it would just be another project done, wrapped, signed, sealed, and delivered with the next WP:Biography drive. :) As for the generation of the lists of things I'd be going through in the background for tagging, I can get what I'd need with a couple (maybe one?) decent MySQL queries that I can get run for me, since I don't have access. I can take that list of raw pages and reformat them trivially with a few passes of sed/awk and a for loop or two. Like I did for this thing, which I began working on also before my break. From the standpoint of WP-system resources, we're in the end talking about a big DB query and the actual tagging by me. The rest is my own time, energy, and local cycles. rootology ( C )( T ) 23:34, 8 November 2008 (UTC)


 * Before that's done, or at least anything more than that is done, though, I'd really like to see some responses to my concerns (00:27, 17 October 2008 comment). I still think it would be much better, from a technical standpoint and an organizational standpoint, to use a real bot framework to generate lists of articles. The benefits from the current proposal seem to be imaginary to minimal. Only Carcharoth actually responded to the inquiry on the project talk page, so we still have no idea if the project really wants this (and the lack of response really makes me question whether they'll ever be able to reduce this backlog, if they don't even respond to things like this on the main project talk page). Mr.Z-man 16:27, 8 November 2008 (UTC)
 * Too bad I didn't see this one before the trial. WPBIO uses priority instead of importance. (The importance param does nothing)  §hep   •   ¡Talk to me!  04:54, 30 November 2008 (UTC)
 * Hi, thanks for the heads up--I could change the settings in AWB for that easily, to correct it. rootology ( C )( T ) 05:38, 4 December 2008 (UTC)
 * Just wondering if you're using the KingBotK Biography tagging plugin or your own custom module? §hep   •   ¡Talk to me!  05:45, 4 December 2008 (UTC)
 * You know, I think I just realized that my test run there ran with my own basic regex, not a plugin, set to do just the basic generation, same as the KingBot plugin. Should I rerun the 50 with the plugin on? rootology ( C )( T ) 06:31, 4 December 2008 (UTC)
 * I'd assume so, since that's what the trial was for, but we all know what happens when I assume something. But please, don't do anything off my word...or else I'd get in trouble. §hep   •   ¡Talk to me!  06:42, 4 December 2008 (UTC)
 * Using the plugin is much better. The plugin can also add living=yes to an existing template, among other things. How are you going to load Category:Living people in AWB? The category has far more articles than AWB supports, and working with so many articles at once may cause problems for the server. -- Magioladitis (talk) 00:10, 13 December 2008 (UTC)
 * For the overall volume, to accommodate AWB and to avoid problems, I was going to chunk it up into sections, and have someone pull me the content in a DB query like I did for images at User:Rootology/Images, which I've been working on, on and off. Once the group got stale after so many weeks or months, or I completed it, I'd ask for a fresh query, which would be by then a much more reasonably sized batch. It would get progressively easier to maintain over time. rootology ( C )( T ) 02:59, 13 December 2008 (UTC)

Status of this bot? Trial finished a month ago. -- Magioladitis (talk) 01:30, 30 December 2008 (UTC)
 * It wasn't tagged. BJ Talk 01:38, 30 December 2008 (UTC)
 * Echoing Mr. Z-Man's comments, I can't see the usefulness of making thousands of edits just so that someone else can come along and assess the articles. The relevant categories can be used to tag the articles and assess at the same time. Furthermore, using DB queries to split the work doesn't strike me as very efficient. If this were to be done, it would be better to have the bot process articles in batches by getting the first 1000 articles, then the next, etc. until it reaches the end, which AWB cannot do.  Richard 0612  17:47, 24 January 2009 (UTC)
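The batch-by-batch processing suggested above (first 1000 articles, then the next, until the end) can be sketched generically. This is a hedged illustration, not any particular bot framework's API: `in_batches` is a hypothetical helper, and the title source here is a stand-in list rather than a real category listing.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def in_batches(titles: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` titles until the source is exhausted."""
    it = iter(titles)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Stand-in for a lazily fetched category listing (hypothetical titles).
titles = (f"Article {n}" for n in range(2500))

for number, batch in enumerate(in_batches(titles, size=1000), start=1):
    # A real bot would tag or list the batch here; this only reports its size.
    print(f"batch {number}: {len(batch)} titles")
```

Because the source is consumed lazily, only one batch is held in memory at a time, which is the property that makes this approach scale to a category far larger than a single load can handle.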


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.