Wikipedia:Bots/Requests for approval/MerlLinkBot


 * The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Approved.

MerlLinkBot
Operator: Merlissimo

Automatic: supervised

Programming Language(s): Java (framework written by myself, in long-term use on dewiki)

Function Overview: replaces outdated links in articles when they can be successfully replaced by a new one

Edit period(s): when links have to be changed ;-)

Already has a bot flag (Y/N): dewiki

Function Details: The bot replaces URLs that have to be changed. This can be a simple domain change or a more complex change to a website's page structure. Links are detected with the help of the API (not with regex) and are only replaced if the web server of the new URL returns a 200 status response for the new resource. Links in “normal article text” are not changed.
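The check described above (replace a link only if the new resource answers with status 200) could look roughly like the following Java sketch. It is not the bot's actual code: the class name, the prefix-swap rule, and the example domains are assumptions made for illustration, and the status lookup is passed in as a function so the rule can be tried without network access.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Illustrative sketch only; names and rule format are hypothetical.
class LinkReplacementSketch {

    /** Swaps an old URL prefix for a new one; returns the input unchanged if the prefix does not match. */
    static String rewrite(String url, String oldPrefix, String newPrefix) {
        return url.startsWith(oldPrefix)
                ? newPrefix + url.substring(oldPrefix.length())
                : url;
    }

    /** Returns the new URL only if the rule applies and the new resource answers with 200. */
    static Optional<String> replacementFor(String url, String oldPrefix, String newPrefix,
                                           ToIntFunction<String> statusOf) {
        String candidate = rewrite(url, oldPrefix, newPrefix);
        if (candidate.equals(url)) {
            return Optional.empty(); // rule does not apply to this link
        }
        return statusOf.applyAsInt(candidate) == 200
                ? Optional.of(candidate)
                : Optional.empty(); // any other status: leave the wiki untouched
    }

    /** Fetches the status code without following redirects; -1 means the request failed. */
    static int httpStatus(String url) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .followRedirects(HttpClient.Redirect.NEVER)
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(10))
                    .GET()
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (Exception e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Stubbed status lookup so the demo runs offline; example.yu / example.rs are placeholders.
        ToIntFunction<String> stub = u -> u.endsWith("/page") ? 200 : 404;
        System.out.println(replacementFor("http://example.yu/page",
                "http://example.yu/", "http://example.rs/", stub));
        System.out.println(replacementFor("http://example.yu/gone",
                "http://example.yu/", "http://example.rs/", stub));
    }
}
```

For a live run, `LinkReplacementSketch::httpStatus` would be passed as the last argument instead of the stub. Because a failed or non-200 check simply yields no replacement, running the same job again later is harmless.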

You can have a look at my edits on dewiki, where my bot is changing .yu domains at the moment. You can also check the fault report of the yu job on dewiki to see why some links weren't replaced.

On dewiki this job was previously done by my first bot.

Discussion
I'm liking the fault report; I'm interested, though, in the 406 errors - I tried a couple and they seemed fine to me (how do they come about, anyway?). Anyhow, if you're going to be doing .yu->.rs, you might want to have a chat with NicDumZ - his application, which is currently open, would do some of the block, en masse stuff; yours might handle the individual transitions. Clearly though, broken links should be avoided, and anything that helps is fine by me. - Jarry1250 (t, c) 09:48, 19 February 2009 (UTC)
 * I think some of the errors occur because some servers try to block web crawlers. I usually do a second run some hours later, which solves some of the problems. If the bot has a problem with a link, no replacement is made on the wiki, so a second run causes no problems.
 * DumZiBot wants to fix some links, as mentioned on the mailing list, where only the domain name has changed. I think this works >99% correctly for these few listed websites. But on dewiki and enwiki >>1000 yu-domains are linked, and most of them can't be fixed by simple domain replacement. Next week I will have a closer look at these domains to see how the links can be corrected. I think most of them respond with a redirect header to the new non-yu URL, which can be used for the replacement. We have just started to analyse the websites.
 * The bot is able to do more complicated replacements, e.g. for relinking Austrian laws in December, the content of the old pages had to be analysed to build the new, completely different URLs. --Merlissimo 22:59, 19 February 2009 (UTC)
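The idea mentioned above, using the redirect header of the old server to find the new URL, might be sketched in Java as follows. This is an illustration under stated assumptions, not the bot's implementation: the class and method names are hypothetical, and accepting only permanent redirects (301 and 308) is a conservative choice made here, not something stated in the discussion. A candidate found this way would still need to pass the 200 status check from the function details before any replacement.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;

// Illustrative sketch only; names are hypothetical.
class RedirectTargetSketch {

    /**
     * Turns a redirect response into a candidate URL.
     * 'location' is the raw Location header value, or null if the header is absent.
     */
    static Optional<String> candidateFromRedirect(String oldUrl, int status, String location) {
        boolean permanent = status == 301 || status == 308; // assumption: ignore temporary redirects
        if (!permanent || location == null || location.isBlank()) {
            return Optional.empty();
        }
        try {
            // Location may be relative, so resolve it against the old URL.
            return Optional.of(URI.create(oldUrl).resolve(location.trim()).toString());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // malformed URL or header value
        }
    }

    /** Asks the old server once, without following the redirect, and extracts a candidate. */
    static Optional<String> probe(String oldUrl) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .followRedirects(HttpClient.Redirect.NEVER)
                    .connectTimeout(Duration.ofSeconds(10))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(oldUrl))
                    .timeout(Duration.ofSeconds(10))
                    .GET()
                    .build();
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            String location = response.headers().firstValue("Location").orElse(null);
            return candidateFromRedirect(oldUrl, response.statusCode(), location);
        } catch (Exception e) {
            return Optional.empty(); // unreachable server: no candidate, no edit
        }
    }
}
```

Keeping the header handling in `candidateFromRedirect` separate from the network call makes the rule easy to check with fixed inputs, for example a 301 with an absolute target or a 308 with a relative path.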


 * What's the status of this bot request? ST47 (talk) 03:26, 6 March 2009 (UTC)
 * I don't know. Nobody asked me to do any test edits and the bot flag has not been granted. So what would you like to know? --Merlissimo 19:44, 8 March 2009 (UTC)
 * I never saw anything that described what this task was. All I saw was a 'this bot is able to change links.'  If there is not a specific task that needs done, this BRFA was filed a bit prematurely.  Q  T C 00:57, 13 March 2009 (UTC)
 * The bot does URL rewriting if a popular website changes its domain (e.g. new company name) or its website structure (e.g. a database changes its web interface) and the old URL will not be reachable in future. For example, the last really big job (~12000 external links) on dewiki was changing links to Austrian laws, because the official website of the government changed its link structure. (e.g. the old link is now a dead link but  has the same content now).
 * If I have to specify every URL I'd like to change on enwiki in a separate new bot approval request, I won't do that job on enwiki, because that would end up as 10-20 requests per year. --Merlissimo 12:13, 13 March 2009 (UTC)
 * Today I will change links to www.tigr.org on dewiki. Would you like me to change them on enwiki as well? None of the links work anymore (e.g. http://www.tigr.org/reptiles/families/Agamidae.html) --Merlissimo 16:36, 20 March 2009 (UTC)

Edit the request to reflect this and we'll likely give ya a trial. Q T C 16:39, 20 March 2009 (UTC)

If you want me to file a new request for every domain my bot will change, I withdraw my request. It is too much overhead for me if a URL is linked only 20 times (like most of them). Perhaps you can show me a page where I can post URLs which will break in future, so that other bots can do this job. --Merlissimo 17:37, 20 March 2009 (UTC)
 * I would archive this but I'd like to see Q's response; withdrawn must be the worst possible outcome. Personally, I don't think that a BRFA would be necessary for every transition, especially with the extra checks you've got - but I'm not convinced that's what Q is going on about. - Jarry1250 (t, c) 09:18, 21 March 2009 (UTC)
 * Ah, there was an IRC conversation, which would explain the unfortunately swift change of direction. My suggestion would be to go get yourself a list of transitions that you'd like to do as part of a trial; I'm certain you'd get a trial based on them. What do you say? We'd hate to lose the support of someone who obviously knows what they're talking about. - Jarry1250 (t, c) 09:48, 21 March 2009 (UTC)
 * In the past, most of the replacements needed hints from the old URL or contained some checks to guarantee that the old and new pages contain the same content. But most of the old URLs are disabled now. So I have to check first which old replacements are still possible and will post them here.
 * But I let my bot run for tigr.org in debug mode (= no edits) on enwiki. I have posted the log to User:MerlLinkBot/Log. There you can find the replacement rules and how URLs would be replaced. Red lines are replacements which my bot won't do or which contain some hints for me (e.g. the old domain name is still mentioned in the page text, because my bot only changes linked URLs and not the link description). --Merlissimo 22:46, 21 March 2009 (UTC)


 * Seems reasonable, example switches work, as long as the error checking works as intended. Q  T C 09:27, 26 March 2009 (UTC)
 * Sorry, I was a bit busy. I'll start the test tomorrow. --Merlissimo 00:01, 6 April 2009 (UTC)

Nudge: – Quadell (talk) 23:10, 15 April 2009 (UTC)

OK, here is some explanation of my first edits: --Merlissimo 04:36, 23 April 2009 (UTC)
 * First I made some nonsense edits in my userspace to get autoconfirmed status, because my bot cannot answer the captcha
 * replacing links to "Bonner Kant-Korpus" (informed by website operator)
 * the domain tigr.org will be switched off soon. For this test I have replaced only a few links.
 * the website of the Austrian government changed its DB structure at the end of last year

Looks great. – Quadell (talk) 18:57, 24 April 2009 (UTC)


 * The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.