Wikipedia:Bots/Requests for approval/BG19bot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Withdrawn by operator.

BG19bot 3
Operator:

Time filed: 10:29, Monday February 27, 2012 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): AWB

Source code available: Yes

Function overview: Updating Persondata templates for sports people only

Links to relevant discussions (where appropriate): Village_Pump_(proposals) and Bots/Requests for approval/RscprinterBot 3

Edit period(s): One-time run. An occasional run may be done at a future date if warranted.

Estimated number of pages affected: 150,000

Exclusion compliant (Y/N): Yes

Already has a bot flag (Y/N): Yes

Function details: Grab a list of people from a certain category, for example category:footballers or category:cricketers. The bot will parse through each article. If the Persondata short description parameter is empty, the appropriate value for the parameter will be added.
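
The step described in the function details can be sketched roughly as follows. This is a minimal illustration in Python, not the operator's AWB module: the function name `fill_short_description` and the regular expressions are assumptions made for the example. The guard that skips pages without exactly one Persondata template is an added precaution in this sketch and is not something the function details state.

```python
import re

# Matches the SHORT DESCRIPTION line only when its value is empty.
# Only horizontal whitespace ([ \t]) is allowed after "=", so the match
# cannot run on into the next parameter line.
EMPTY_SHORT_DESC = re.compile(
    r"^(?P<prefix>[ \t]*\|[ \t]*SHORT DESCRIPTION[ \t]*=)[ \t]*$",
    re.MULTILINE,
)

# Counts openings of the Persondata template (hypothetical helper pattern).
PERSONDATA_OPEN = re.compile(r"\{\{\s*Persondata\b", re.IGNORECASE)


def fill_short_description(wikitext: str, description: str) -> str:
    """Return wikitext with an empty SHORT DESCRIPTION filled in.

    The text comes back unchanged if the page does not contain exactly one
    Persondata template, or if the parameter already has a value.
    """
    if len(PERSONDATA_OPEN.findall(wikitext)) != 1:
        return wikitext  # assumption of this sketch: skip ambiguous pages
    # A function is used as the replacement so that backslashes in the
    # description are never interpreted as regex escapes.
    return EMPTY_SHORT_DESC.sub(
        lambda m: f"{m.group('prefix')} {description}", wikitext, count=1
    )
```

Here the description would be the single value chosen for the whole category run, so the sketch makes no attempt to judge whether that value is the right one for a given person.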

Discussion

 * How does the bot determine what description to use? Do you simply add the value from the currently parsed category (at least it looks that way from the source)? What if there are several persondata templates? What about more important descriptions that the bot wouldn't recognize (like a king was also a footballer, but only "footballer" got listed)? Are we OK with partial data? — HELL KNOWZ  ▎TALK 10:22, 28 February 2012 (UTC)
 * Yes, the value used comes from the parsed category.
 * Only one persondata template per article is the rule. If the article is about two or more people, then the persondata template does not get applied.
 * I agree that there will be cases where football isn't the only description that should be applied. For example, the Prince of Monaco was also an Olympic bobsledder. Sports is probably the only big category that can be done by a bot. The vast majority of sports people do not do multiple sports or become notable for multiple things. Plus, sports can be summarized easily. There are exceptions of course. Danny Ainge played baseball with the Toronto Blue Jays, was an NBA basketball player and is the general manager of the Boston Celtics. However, I think the error rate will be well under 1%.
 * Are we OK with partial data? There are 150,000 articles missing the description.  I don't see them getting done anytime in the future.  Personally, I'm ok with it for sportspeople only.  Go beyond sports and the error rate goes beyond 1%. Bgwhite (talk) 21:14, 28 February 2012 (UTC)
 * Just from Rscprinter123's run, I saw a case where there were 2 templates on the same article. My question was: what does the bot do? Your source code doesn't skip cases like these and would change both templates.
 * Personally, I wouldn't approve a bot that makes incomplete edits while "knowing" clarification is possible. Also, I doubt the rate is 1% unless you have some empirical testing evidence. I don't want to approve a large trial just to make a point, and to test 1% you'd have to have at least a 100-edit trial. So I can only suggest dry runs and making a report of what would have been changed. — HELL KNOWZ  ▎TALK 21:37, 28 February 2012 (UTC)
 * I don't remember that much about stats, but for a 1% error rate to show up reliably I'm pretty sure you need an edit count in the multiple hundreds area. Josh Parris 02:36, 29 February 2012 (UTC)
 * Yeah. I was working on the assumption it's >1%. — HELL KNOWZ  ▎TALK 09:24, 29 February 2012 (UTC)
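
The trial-size question in the thread above can be made concrete with a short calculation. This is a sketch under a simplifying assumption that is not in the discussion: each edit is treated as an independent event with the same chance of being wrong. The function names are made up for the example.

```python
import math


def chance_of_seeing_an_error(error_rate: float, edits: int) -> float:
    """Probability that at least one error shows up in `edits` independent edits."""
    return 1.0 - (1.0 - error_rate) ** edits


def edits_needed(error_rate: float, confidence: float) -> int:
    """Smallest trial size whose chance of showing at least one error
    reaches `confidence`, assuming independent edits."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - error_rate))
```

Under that assumption, a 100-edit trial with a true error rate of 1% has only about a 63% chance of showing any error at all (`chance_of_seeing_an_error(0.01, 100)`), and roughly 299 edits are needed for a 95% chance (`edits_needed(0.01, 0.95)`). That is in line with the estimate that the trial would need to run into the hundreds.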


 * Let's take Ehab Karim for example. He falls into
 * category:1981 births
 * category:2007 deaths
 * category:Iraqi footballers
 * category:Murdered footballers
 * category:Iraqi football biography stubs
 * category:Articles needing additional references from August 2011
 * category:All articles needing additional references
 * category:Persondata templates without short description parameter

Does the SHORT DESCRIPTION = footballer? Josh Parris 11:55, 28 February 2012 (UTC)
 * Yes, the description is just footballer. That is how I would classify him manually. Everything but "Murdered footballer" is irrelevant to the description parameter. How one died is very rarely added to the description parameter, as it is not usually the reason one became notable. btw... Karim is not a notable footballer under current football guidelines per WP:NFOOTBALL. Bgwhite (talk) 21:14, 28 February 2012 (UTC)
 * Okay, but he's not in the footballers category. What about Mike Sheahan?  Like Ehab Karim he's not in category:Footballers, but he is in
 * category:Australian journalists
 * category:Australian rules football commentators
 * category:Lists of players of Australian rules football
 * category:Living people
 * category:Australian television presenters
 * category:1945 births
 * category:Australian journalist stubs
 * category:Australian rules biography, 1940s birth stubs
 * category:Persondata templates without short description parameter
 * Does the SHORT DESCRIPTION = footballer? Josh Parris 02:45, 29 February 2012 (UTC)
 * I think Bgwhite just adds pages from a category, then adds the same description to every page based on what category they chose. — HELL KNOWZ  ▎TALK 09:24, 29 February 2012 (UTC)
 * If that's the case, that's unacceptably naive. Not everyone who was a footballer is most notable for being a footballer. Josh Parris 10:37, 29 February 2012 (UTC)
 * I did mention that "What about more important descriptions that the bot wouldn't recognize (like a king was also a footballer, but only "footballer" got listed)?" and that's where the 1% error rate estimate was from. — HELL KNOWZ  ▎TALK 10:40, 29 February 2012 (UTC)
 * The number was "less than 1%", which sounds like a WAG to me. Josh Parris 10:54, 29 February 2012 (UTC)

I've formed the opinion that this task as proposed is unacceptable due to the naive approach. Unless Bgwhite can talk me 'round, I don't like the idea of any expected error rate. As mentioned in Bots/Requests for approval/RscprinterBot 3, crowdsourcing the solution seems to be a better approach, possibly coupled with algorithmic guesses to accelerate the process. Josh Parris 10:52, 29 February 2012 (UTC)
 * I'm with JP on this. I'm not opposed to bot-assisted work, but the algorithms needed for this would need to be highly-refined, and human-review is still needed to handle mistaggings, or lack of sufficient information. Seems much better to leave as semi-automated tasks, and even there it seems to require lots of attention, given the misfire rates of RscprinterBot 3. Even as a semi-automated task this has the potential to be very explosive in the hands of someone who does not exercise great care. Headbomb {talk / contribs / physics / books} 16:53, 29 February 2012 (UTC)

Thank you for the message calling me "another Kuimoko wannabe who I will personally drum out of Wikipedia if you ever do a bot." Lovely. As pointed out, it will take a year for one person working 10 hours a day to do this. As Waacats and I have been the only naive ones even working on this, it means it will never get done. As the message said for me to "get lost", I shall. Bgwhite (talk) 20:30, 29 February 2012 (UTC)
 * What are you talking about? No one said that. Headbomb {talk / contribs / physics / books} 22:49, 29 February 2012 (UTC)
 * I concur. This absolutely isn't about you, or your motivations.  It's about the performance of the task as specified - the performance of the bot.  Please AGF.  This isn't a matter of "getting lost", it's about solving the problem in an acceptable way.  Do you feel you've been mistreated? Josh Parris 02:22, 1 March 2012 (UTC)
 * Sorry to hear about the comments BG, those comments are truly disappointing after all I tried to do. Glad to see my efforts were appreciated. :-( 71.163.243.232 (talk) 04:55, 1 March 2012 (UTC)
 * Mistreated here, no. Elsewhere is a different story. Bgwhite (talk) 20:09, 1 March 2012 (UTC)
 * I've looked at your contributions, talk page, people you contacted, etc... in the last week or so and I can't find anything remotely close to what you're suggesting above. Headbomb {talk / contribs / physics / books} 02:22, 2 March 2012 (UTC)

Refinement
What if the task were intentionally limited: only process footballers who fall into a footballer category and have no categories other than:
 * 1) birth;
 * 2) death;
 * 3) Living people;
 * 4) one specific footballer category;
 * 5) footballer-only stubs; and
 * 6) other hidden categories
so no-one who has a category other than footballers, for example category:Monarchs, would be processed. Presumably there's still a bunch of people who are notable only for their sport. I'm not suggesting everything under category:footballers; I'm saying Category:Australian Football Hall of Fame inductees (for example), because it has no sub-categories. Could that work? Josh Parris 06:26, 1 March 2012 (UTC)
 * I'm not sure how to generate such a list. There are people in a hall of fame that are also notable for something else.  John Madden or Bill Bradley pops into my head. Bgwhite (talk) 20:09, 1 March 2012 (UTC)
 * John Madden - Categories:
 * Category:1936 births (cool)
 * Category:Living people (cool)
 * Category:American football offensive linemen (this would be the one specific footballer category you'd be populating from)
 * Category:American people of Irish descent (blerk! unexpected category)
 * Category:American sports radio personalities (blerk! unexpected category)
 * Category:American sportswriters (blerk! unexpected category)
 * Category:Cal Poly Mustangs football players (blerk! unexpected category)
 * Category:California Polytechnic State University, San Luis Obispo alumni (blerk! unexpected category)
 * Category:Madden NFL (blerk! unexpected category)
 * Category:National Football League announcers (blerk! unexpected category)
 * Category:Oakland Raiders coaches (blerk! unexpected category)
 * Category:Oakland Raiders head coaches (blerk! unexpected category)
 * Category:People from Austin, Minnesota (blerk! unexpected category)
 * Category:People from Santa Maria, California (blerk! unexpected category)
 * Category:Philadelphia Eagles players (blerk! unexpected category)
 * Category:Pleasanton, California (blerk! unexpected category)
 * Category:Pro Football Hall of Fame inductees (blerk! unexpected category)
 * Category:San Diego State Aztecs football coaches (blerk! unexpected category)
 * Category:Sid Gillman coaching tree (blerk! unexpected category)
 * Category:Sports Emmy Award winners (blerk! unexpected category)
 * Category:All articles with dead external links (acceptable hidden cat)
 * Category:Articles with dead external links from October 2011 (acceptable hidden cat)
 * Category:Use mdy dates from October 2011 (acceptable hidden cat)
 * Category:Persondata templates without short description parameter (expected hidden cat)
 * So you ought not process John Madden, it seems he's notable for something other than playing football. I'd accept People from... and ...of xxx descent aren't categories of notability.  Some discussion may be needed regarding the ...players categories.  But you still butt heads against Category:San Diego State Aztecs football coaches, for example. So you just skip over this one and move onto the next.  Naturally, the bot would need the lists of acceptable categories fully populated before you began. Josh Parris 12:58, 2 March 2012 (UTC)
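
The category filter discussed above could look something like the sketch below. It is only an illustration of the idea: the function name `is_safe_to_process` and the patterns in `ALLOWED_PATTERNS` are hypothetical, the allow-list is deliberately incomplete (footballer-only stub categories are not covered), and the set of hidden categories is assumed to be supplied by the caller.

```python
import re

# Hypothetical allow-list of categories that say nothing about notability.
# Category names are given without the "Category:" prefix. The real lists
# would have to be agreed on and fully populated before any run.
ALLOWED_PATTERNS = [
    r"\d{3,4} births",
    r"\d{3,4} deaths",
    r"Living people",
    r"People from .+",
    r".+ of .+ descent",
]


def is_safe_to_process(categories, source_category, hidden_categories):
    """Return True only if the page is in the source category and every
    other category is either hidden or matches the allow-list."""
    if source_category not in categories:
        return False
    for cat in categories:
        if cat == source_category or cat in hidden_categories:
            continue
        if any(re.fullmatch(pattern, cat) for pattern in ALLOWED_PATTERNS):
            continue
        return False  # unexpected category: skip the page and move on
    return True
```

With the John Madden categories listed above and `source_category="American football offensive linemen"`, the check fails on categories such as "American sportswriters" or "San Diego State Aztecs football coaches", so the page would be skipped rather than guessed at.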

There's been no activity here for several days. Are you still intending to pursue this BRfA? Josh Parris 00:22, 8 March 2012 (UTC)
 * I had removed this page from my watchlist a while back. So, I don't intend to pursue this. A couple of other editors have taken up the cause and are going through the lists manually.
 * FYI... Looks like you are Australian. John Madden is in the Football Hall of Fame for coaching then was known for sportscasting.  Bill Bradley is in the Basketball Hall of Fame and was a U.S. Senator and Presidential candidate. Bgwhite (talk) 05:14, 8 March 2012 (UTC)
 * Oh, what is the status of BG19bot 2? Bgwhite (talk) 05:17, 8 March 2012 (UTC)

Let's find out. Josh Parris 11:21, 8 March 2012 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.