Wikipedia:Bots/Requests for approval/WildBot 3


 * The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.

WildBot 3
Operator: Josh Parris

Automatic or Manually assisted: Automatic

Programming language(s): Python, pywikipedia

Source code available: https://svn.toolserver.org/svnroot/josh/ (revision 6)

Function overview: Add checking of #section anchors for existence to existing bot

Links to relevant discussions (where appropriate): Guideline: Linking

Edit period(s): Continuous

Estimated number of pages affected: I'd guess less than 5% of new pages have #section links, and perhaps 20% of those would be wrong. At 1000 new pages a day, this would be about 10 edits a day. Hard figures show that 4% of new pages have #section links and 32.5% of these are wrong; at 1000 new pages a day, this would be about 13 edits a day.
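The arithmetic behind both estimates can be checked with a short sketch. The function name and parameters are illustrative only and are not taken from WildBot's source; the inputs are the figures quoted above.

```python
def estimated_edits_per_day(new_pages_per_day, share_with_section_links, share_broken):
    """Expected daily edits = new pages x share with #section links x share of those that are wrong."""
    return new_pages_per_day * share_with_section_links * share_broken


# Initial guess: 5% of new pages have #section links, 20% of those are wrong.
guess = estimated_edits_per_day(1000, 0.05, 0.20)

# Hard figures: 4% of new pages have #section links, 32.5% of those are wrong.
measured = estimated_edits_per_day(1000, 0.04, 0.325)

print(round(guess), round(measured))
```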

Exclusion compliant (Y/N): Y, standard in pywikipedia

Already has a bot flag (Y/N): Y

Function details: At the same time as checking a new page's wiki markup for links to dab pages, the bot will also check links containing a #section anchor to ensure the anchor exists on the target page. Normally the anchor is a section heading, but some techniques (templates like Anchor and raw HTML tags) create an anchor without a ==section==; to detect these cases, the HTML of the target page will be downloaded and searched for these anchors.
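As a rough illustration of searching rendered HTML for anchors, the sketch below collects every id attribute and every name attribute on an a tag, then tests whether a link's #section fragment is among them. It is a minimal sketch in present-day Python using only the standard library, not WildBot's actual code; the names AnchorCollector and anchor_exists are hypothetical, and it assumes only that spaces in anchors are written as underscores.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every id="..." value and every <a name="..."> value in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def anchor_exists(html, fragment):
    """Return True if the #fragment of a link matches an anchor in the HTML."""
    collector = AnchorCollector()
    collector.feed(html)
    # Assumption: spaces in the fragment correspond to underscores in the anchor.
    return fragment.replace(" ", "_") in collector.anchors
```

For example, anchor_exists(page_html, "Early life") would be true for a page whose rendered heading carries id="Early_life", and also for an anchor created by a template or a raw HTML tag with that id.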

Discussion
This is certainly a good idea. @harej 16:22, 17 January 2010 (UTC)
 * Isn't there an inline template for this, similar to deadlink or something? If there is, it would certainly be more helpful. Would you mind telling me why you think this should be limited to new pages? You could do a dump scan for the whole project. Tim1357 (talk) 16:57, 17 January 2010 (UTC)
 * Bandwidth; I just don't have it. Running the bot as is consumes a solid 20% of the bandwidth I have available. Unless I get a Toolserver account, recent changes or a database scan is off the cards. Additionally, I've got plans to make the bot smarter and more helpful, so I don't want to bomb every broken page link in the 'pedia with a mere advisory. Josh Parris 22:20, 17 January 2010 (UTC)
 * You could get a Toolserver account if you'd like; it would probably help with the running of your bot and it's not very difficult to get one if you can demonstrate need. @harej 00:34, 18 January 2010 (UTC)
 * DaB said he'll look at my application from 29th Dec on Sunday. Today's Sunday in Germany I believe.  Or has it just finished? Anyway, WildBot's approval may help things along there. Josh Parris 00:53, 18 January 2010 (UTC)
 * I've had a look, there's nothing for inline work. It may be inappropriate to inline too, because the link still kind-of works, it just goes to the target page rather than a part thereof. Josh Parris 00:58, 18 January 2010 (UTC)


 * On another note, I would appreciate it if the bot works only in the main namespace. Tim1357 (talk) 16:57, 17 January 2010 (UTC)
 * I was thinking along these lines but couldn't think of a reason not to check the other namespaces the bot currently patrols. What difficulties do you foresee outside of mainspace? Josh Parris 22:20, 17 January 2010 (UTC)


 * Damn, one other thing. It is generally frowned upon for bots to download the HTML markup. If I may suggest a more server-friendly version: use http://en.wikipedia.org/w/index.php?title= &action=raw&templates=expand . That solves the problem of the Anchor template. Tim1357 (talk) 17:06, 17 January 2010 (UTC)
 * That's pretty much what I've done; I called the API version (which I'm not sure, having seen your suggestion, is the best idea). Josh Parris 22:20, 17 January 2010 (UTC)
 * So in order of questions:


 * That's OK; if you only want to do new pages, that's fine. You could look into the Toolserver idea if you want.
 * My reasoning is that there really is no need for notifications outside of the mainspace. Plus there are no "talk" pages for talk pages, if you know what I mean.
 * OK, if you really need to download the HTML, that's fine. I just thought the templates=expand bit would be helpful; I myself just found out about it. Tim1357 (talk) 00:55, 18 January 2010 (UTC)
 * Yes, I prefer your method over my API call. WildBot task 1 doesn't do talk pages, so no probs there. Toolserver account is in process. Josh Parris 01:07, 18 January 2010 (UTC)
 * Nice, this gets the thumbs up from me as long as this acts in the manner that the Disambiguation WildBot does. Tim1357 (talk) 01:10, 18 January 2010 (UTC)
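The request suggested in the thread above can be sketched as a small URL builder. The base URL and the action=raw and templates=expand parameters are the ones quoted in the discussion; the function name raw_expanded_url is hypothetical, and the page title is a caller-supplied argument because the discussion leaves it blank. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode


def raw_expanded_url(title, base="http://en.wikipedia.org/w/index.php"):
    """Build the URL for a page's raw wikitext with templates expanded."""
    query = urlencode({"title": title, "action": "raw", "templates": "expand"})
    return f"{base}?{query}"


print(raw_expanded_url("Main Page"))
```

Because urlencode escapes the title, titles containing spaces or punctuation stay in a single, well-formed query parameter.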

@harej 03:30, 23 January 2010 (UTC)

Adding this functionality has demanded a substantial internal redesign for WildBot, as it's no longer making one edit to a talk page (at least, not internally). The hard figures above were produced by a very rough draft. Josh Parris 22:20, 24 January 2010 (UTC)

The trial has commenced; some preliminary results are available from a seeded group of #section checks, with nine hits. The rest of the results are going to be spread out through the normal run of WildBot. There's code to limit it to 50 #section edits per run. Josh Parris 01:01, 26 January 2010 (UTC)
 * Thus far 25 edits have been made, and I've discovered a number of things. It turns out that pywikipedia has code to detect valid section references, but it doesn't work correctly when there's markup; the common case is an article link in a section header. People put all kinds of crazy stuff into section headers. I won't bore you with the stories. I seem to have bitten off quite a large, chewy part of the world. The internal re-coding has been shaken out, so I'll soon be tidying up the code and running that in production. Josh Parris 13:57, 27 January 2010 (UTC)
 * 33 edits Josh Parris 04:33, 28 January 2010 (UTC)
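To illustrate why markup in section headers complicates matching, the sketch below reduces a header to the text a reader sees before comparing it with a link's #section fragment. It is a deliberately simplified sketch, not pywikipedia's or WildBot's logic: the name heading_to_anchor is hypothetical, and it handles only wiki links and bold/italic quotes, not templates, HTML tags, or special-character encoding.

```python
import re

# [[Target]] -> Target, [[Target|label]] -> label
WIKI_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def heading_to_anchor(heading):
    """Approximate the anchor produced by a section header containing simple markup."""
    text = WIKI_LINK.sub(r"\1", heading)   # keep only the visible link text
    text = re.sub(r"'{2,}", "", text)      # drop ''italic'' and '''bold''' quotes
    text = " ".join(text.split())          # collapse runs of whitespace
    return text.replace(" ", "_")          # spaces become underscores in anchors


print(heading_to_anchor("History of [[Paris]]"))
```

A header such as "History of [[Paris]]" is then compared as History_of_Paris rather than with the brackets included.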

I'll be posting links to the edits in a few hours. Josh Parris 02:45, 29 January 2010 (UTC) If I might add, this has been terribly buggy. I'm going to be keeping a very close eye on it in its early life; the multitude of problems that turned up during the trial haven't endeared the code to me. Josh Parris 11:48, 29 January 2010 (UTC)
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340037212
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340037371
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340037392
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340037642
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340037710
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340037816
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340038259
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340038285
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340052590
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340054153
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340066511
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340066727
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340306801
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340306925
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340306956
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340459065
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340461285
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340461444
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340462031
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340462264
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340463673
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340465392
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340465459
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340483435
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340483807
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340486642
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340496044
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340523826
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340530881
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340534529
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340534637
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340535508
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340535773
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340536163
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340638024
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340638848
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340638907
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340639103
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340640770
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340641235
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340642113
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340642165
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340645876
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340690850
 * http://en.wikipedia.org/w/index.php?diff=prev&oldid=340694909
 * Seems good to me. Tim1357 (talk) 00:54, 30 January 2010 (UTC)
 * The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.