Wikipedia:Bots/Requests for approval/ShakingBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.

ShakingBot
Operator: ShakingSpirit

Automatic or Manually Assisted: Automatic

Programming Language(s): Python (pywikipedia framework)

Function Summary: To detect dead external links in articles and post a notice on each affected article's talk page

Edit period(s) (e.g. Continuous, daily, one time run): Weekly to Monthly

Edit rate requested: 30 edits per minute

Already has a bot flag (Y/N): N

Function Details: I noticed that there are a fair number of dead-link bots based on the weblinkchecker.py script included with the pywikipedia framework, but (to the best of my knowledge) none are actually up, alive and kicking. After playing with the weblinkchecker.py script for a while, it's easy to see why: although it's a very easy bot to set up, it requires a very large amount of time to run (a few weeks at the very least per sweep), because going through every single article on Wikipedia and grabbing links is no small task, even using something like Special:Export! And even after a full sweep, it needs at least one more sweep spaced some time apart, to check that a link isn't just down temporarily. ShakingBot abandons the idea of checking through every article as it stands, and instead uses the monthly externallinks.sql database dump. It goes through the following process:
 * 1) Check through every article alphabetically in externallinks.sql for bad links, and log any it finds
 * 2) 7 days later, run through this check again, and note any links which were marked 'bad' in both checks
 * 3) Go through the list of 'bad' links and check to see if each link is still used in the 'live' article (as it may have been removed since the dump was taken) - discard those which aren't
 * 4) Add a note to the article's talk page, letting users know that the link is bad and should be reviewed or removed
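The two-sweep workflow above can be sketched in Python (the bot's own language). This is an illustrative outline, not ShakingBot's actual source: the function names, the HEAD-request check, and the `(article, url)` pair format are all assumptions for the sake of the example.

```python
# Hypothetical sketch of ShakingBot's two-sweep dead-link workflow.
# Names and data shapes are illustrative, not taken from the real bot.
import urllib.request
import urllib.error


def looks_dead(url, timeout=15):
    """Return True if the link appears dead (connection failure or HTTP error)."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status >= 400
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses,
        # and URLError/OSError for DNS failures, refused connections, timeouts.
        return True


def sweep(link_pairs):
    """One pass over (article, url) pairs from externallinks.sql; return the dead ones."""
    return {pair for pair in link_pairs if looks_dead(pair[1])}


def persistent_dead(first_sweep, second_sweep):
    """Links flagged dead in both sweeps (run roughly a week apart) - step 2."""
    return first_sweep & second_sweep
```

The set intersection in `persistent_dead` is what filters out links that were merely down temporarily during the first pass; only links dead in both sweeps go on to the live-article check in step 3.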

Please note that ShakingBot will not remove links by itself, as some links are important and should be replaced with similar pages or an archived version of the original link - this requires human intervention.

Currently I am working on detecting whether ShakingBot has previously noted a dead link which hasn't been removed since the last run; if so, it will log the link to a file, and to a noticeboard in my userspace. I hope to have this finished for if and when ShakingBot is approved.

Discussion
Seems pretty safe, wanna trial it for a bit -- Tawker 03:12, 27 December 2006 (UTC)


 * That would be great, but it will need to run for about a month to get results in; it takes a long time as it is, and I'm still optimizing the number of threads it uses for my server's specific CPU and bandwidth ShakingSpirit talk  03:55, 27 December 2006 (UTC)


 * For a trial, choose a select few articles, and have it work on them. No need to have it be the whole encyclopedia.  Cheers, ✎ Peter M Dodge  ( Talk to Me  • Neutrality Project  ) 16:25, 27 December 2006 (UTC)


 * Two questions:
 * Would it be possible (make more sense?) to work with the lists of bad links being generated by the system, at Dead external links, or to otherwise get a list from the Wikipedia database?
 * That would be ideal if it was possible, but to the best of my knowledge the lists on Dead external links haven't been updated for nearly a year now. ShakingSpirit talk  04:11, 28 December 2006 (UTC)
 * Why not TAG the external link (similar to ) IN the article, rather than posting to the talk page?  John Broughton  |  Talk 01:05, 28 December 2006 (UTC)
 * I thought about tagging them with, but this doesn't say "please fix me!" like a note on the talk page does, and personally I think it makes inline refs look ugly and confusing. I'd like to hear other people's opinions though ^_^  ShakingSpirit talk  04:11, 28 December 2006 (UTC)

Please look at Dead external links — Iamunknown 01:47, 29 December 2006 (UTC)
 * Please look at what I've written above; Dead external links is very much out of date. ShakingSpirit talk  05:44, 29 December 2006 (UTC)
 * Not sure how I missed that. Maybe you could update the lists with the ShakingBot. I for one still occasionally go through the Dead external links lists and try to help out. — Iamunknown 20:28, 29 December 2006 (UTC)
 * That's certainly a possibility, as I've already completed the code to post persistent dead links to User:ShakingBot/Persistent dead links, and it wouldn't take too much effort to re-organize them to Dead external links's style and post them there instead/as well ShakingSpirit talk  07:09, 30 December 2006 (UTC)

Regarding the inline tag, < > is long and ugly. I was thinking of something like this: [ bad link ]. (I've not used a template here, but rather a modified version of Template:Fact, for illustration.) John Broughton  |  Talk 03:14, 30 December 2006 (UTC)
 * I guess that would work, as long as the link wasn't inside a or somesuch, as that breaks it. I'll try various other ways of tagging and see if it's feasible ^_^ ShakingSpirit talk  07:09, 30 December 2006 (UTC)

Combining John Broughton and Iamunknown's ideas above, what about tagging the links with something like [ bad link ] and, as part of that template, adding the page to a category (in the same way that [ citation needed ] adds the page to Category:Articles with unsourced statements)? This has the advantage of the list of bad-link pages being dynamic, and doesn't need to have fixed links manually removed and new links added in bulk as the ones on Dead external links do. Thoughts, opinions? ShakingSpirit talk 07:23, 30 December 2006 (UTC)
 * I think that would definitely be appropriate. It might help stem the tide of what I've experienced in fixing dead links, which is the removal of them. It would keep the content (links) and the process (a link to Dead external links) in the same spot (the article), whereas previously the content and the process were both kept totally separate from that spot. Great idea! — Iamunknown 22:34, 31 December 2006 (UTC)


 * 30 edits per minute? Or do you mean 30 page-gets per?  I'm afraid, though, that this seems to very much fall foul of BOT:  "Bots that download substantial portions of the page database are prohibited. Instead, download the database dumps. See also Wikipedia:Mirrors and forks."  I'd suggest if you can do that, and then check only the "positive" pages from the dump against the live database, that would be much less of a resource hit.  Alai 05:01, 12 January 2007 (UTC)

What is the status of this bot? — Mets501 (talk) 20:56, 25 January 2007 (UTC)
 * Don't pin me to this but I think it is still in test phase. RED  skunk  TALK 01:24, 2 February 2007 (UTC) P.S.  Shaking Spirit has gone inactive for a while.
 * I notice that he "unidled" long enough to remove an image upload notification from his talk page, but given the lack of a response here, or to queries about this matter on his talk page, I think this is certainly at the "archival" stage. Alai 16:53, 8 February 2007 (UTC)

— Mets501 (talk) 22:36, 8 February 2007 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.