User talk:WebCiteBOT/Archive

AfD
Is it doing what you want with "Riders of the Flood"? It has an AfD tag, and the bot apparently archived several pages for it. - Peregrine Fisher (talk) (contribs) 02:47, 18 April 2009 (UTC)
 * I'm basing this on User:WebCiteBOT/Logs/2009-04-17.log. Where it says "Attempting to archive http://www.tuckwillergallery.com/...SUCCESS" etc. - Peregrine Fisher (talk) (contribs) 02:48, 18 April 2009 (UTC)
 * I hadn't actually implemented the feature to skip AfD articles (forgot). I will do so now. --ThaddeusB (talk) 13:51, 18 April 2009 (UTC)
 * Cool. - Peregrine Fisher (talk) (contribs) 15:46, 18 April 2009 (UTC)

Here there is a bare URL, which the bot then formats with the cite web template. The problem is that the page doesn't have a references section. It also doesn't do anything to the ref in single brackets, which may be intentional, I don't know. - Peregrine Fisher (talk) (contribs) 07:14, 19 April 2009 (UTC)
 * I have now added a function to add a reference section if it converts a bare URL and no reference section exists yet. At this time, it doesn't touch any other bare URLs. --ThaddeusB (talk) 14:38, 19 April 2009 (UTC)
 * Sounds good. - Peregrine Fisher (talk) (contribs) 16:28, 19 April 2009 (UTC)

Manually run the bot on a given article
I would love an option (maybe via the toolserver) to set the bot loose on a given article. So if I want to archive all the references in any one article I'm working on, then I could go to a page and queue it for the bot to work on. Just an idea. — LinguistAtLarge • Talk  17:39, 24 April 2009 (UTC)
 * Another useful option would be to go after all WP links to a specific base URL. See the Administrators' noticeboard for a couple of cases in point where this might be useful. LeadSongDog come howl  18:42, 24 April 2009 (UTC)

Thanks for the suggestions. I will look into expanding the bot's scope after the initial form is fully approved (which hasn't happened yet). --ThaddeusB (talk) 14:22, 25 April 2009 (UTC)
 * No hurry, but I've had the same thought. It would be great to archive all the links in an article that just made FA or GA and has had a thorough going through, for instance. - Peregrine Fisher (talk) (contribs) 14:51, 25 April 2009 (UTC)
 * @ThaddeusB - Sounds good, thanks. — LinguistAtLarge • Talk  20:08, 29 April 2009 (UTC)

Any updates on this? – Quadell (talk) 00:16, 8 May 2009 (UTC)
 * Still working out the kinks in the main version... almost done with that. {fingers crossed} --ThaddeusB (talk) 21:37, 9 May 2009 (UTC)
 * You can use Checklinks. Its main purpose is to fix dead links, but there is a button that says "Archive with cite web". Tim1357 (talk) 23:34, 26 September 2009 (UTC)

FT article
WebCiteBOT recently visited 10,000 Women and tried to archive this FT story, but the archived copy doesn't display correctly. Gareth Jones (talk) 23:19, 26 April 2009 (UTC)
 * Thank you for alerting me to this error. The problem is on WebCite's end (it archives a piece of javascript rather than the file requested.)  I have just alerted them of the bug and hopefully it will be corrected shortly.  In the meantime, I will write a check function to prevent such archives from being recognized as successful by my program. --ThaddeusB (talk) 23:57, 26 April 2009 (UTC)

Interwiki map
Is it worth having WebCite URLs listed on the interwiki map? Mapping to http://webcitation.org/$1 should work as far as I can tell. That way instead of linking to  in refs you could link to something like   (looks like "wc" is free btw). I appreciate this isn't a huge difference, but clearing out extraneous addressing fragments from the mass of wikitext involved in ref formatting makes it easier for humans to parse, plus, as that page on meta shows, we already have huge numbers of obscure linking shortcuts for websites that won't be linked to anywhere near as much as WebCite, so there seems to be consensus that these things can only be of help to editors. I thought I'd seen discussion somewhere where you or someone else said you'd like to build WebCite syntax into the cite templates, although I can't now find it. Implementing this in the interwiki map would avoid having to change the templates at all, which some might feel privileges a particular archiving website over others. The logic seems clear to me! If this isn't done, can I at least suggest removing the unnecessary www from WebCite URLs, on the grounds that any shortening of the ref clutter is helpful, and also that www is deprecated! 79.64.170.147 (talk) 02:07, 7 May 2009 (UTC)
 * Hello,
 * It was me that raised the idea of working WebCite into the cite templates' code, but I quickly dropped the idea when it became clear there was no need.
 * I certainly wouldn't object to such a listing on the interwiki map. If you want to create a proposal (or whatever you want to call it) to add one, I'd be happy to comment on it.  Just leave me a link to the discussion here if you decide to do so.  Changing the bot's code would be a trivial matter if/when the change went live.  --ThaddeusB (talk) 02:48, 7 May 2009 (UTC)
 * This is an excellent idea, in my opinion. If the IP wants to suggest it, I'll lend my support as well. — Huntster (t • @ • c) 03:31, 7 May 2009 (UTC)

Proposed here 79.64.254.219 (talk) 11:38, 7 May 2009 (UTC)


 * Thanks 219, I'll add my support. I've realised a slight problem with this, however. Since any archive url, be it WebCitation or the Internet Archive, should be placed in the  field of the various "Cite X" templates, an interwiki link such as   won't function. For example:
 * Any ideas on how this can be fixed or otherwise manipulated to work? — Huntster (t • @ • c) 22:22, 7 May 2009 (UTC)
 * Yah, that is definitely an issue. I imagine it would take a change in the wiki software to correct, although possibly a fix to the Citation/core template would do.  I have no ideas on possible workarounds either. --ThaddeusB (talk) 21:37, 9 May 2009 (UTC)
 * Sure, but this is a good first step. All in good time. — Huntster (t • @ • c) 23:43, 9 May 2009 (UTC)

Many thanks to the creators of this!
Many, many thanks to the creators of this! I always had this gnawing feeling that the contributions I made would be lost or distorted in some way over time. But with this function you have put my mind at ease! Many, many thanks! Boyd Reimer (talk) 13:10, 15 May 2009 (UTC)
 * Love-fest pile on! I wish your bot had lips so I could kiss it. – Quadell (talk) 20:28, 15 May 2009 (UTC)

I applaud this effort as well. -- C. A. Russell ( talk ) 17:15, 3 June 2009 (UTC)


 * Thanks guys, the praise is appreciated. --ThaddeusB (talk) 18:55, 3 June 2009 (UTC)

WebCiteBOT – will you be my valentine? Pslide (talk) 12:22, 11 July 2009 (UTC)

Stats?
Is your bot keeping track of its work? It would be interesting to know how many refs are added in a day and whatnot. - Peregrine Fisher (talk) (contribs) 05:22, 16 May 2009 (UTC)
 * I will put adding a feature to track statistics on my to do list. Thanks for the suggestion. --ThaddeusB (talk) 20:22, 18 May 2009 (UTC)
 * See User:WebCiteBOT/Stats - more stats will be added soon. --ThaddeusB (talk) 19:19, 27 May 2009 (UTC)
 * Nice job. Why hasn't the bot been going lately? - Peregrine Fisher (talk) (contribs) 19:24, 27 May 2009 (UTC)
 * Because I was adding and testing a feature to capture all the human-supplied metadata on Wiki pages in order to build a database of publishers and such. All done now, so the bot will be back in force later today. It should be running 24/7 by the weekend. --ThaddeusB (talk) 19:30, 27 May 2009 (UTC)
 * Cool! - Peregrine Fisher (talk) (contribs) 20:12, 27 May 2009 (UTC)

Links that are already archived
I have in the past manually supplied Internet Archive or WebCite archive links in a reference. How will WebCiteBot handle links that are already cited in the context of an archived version? I do hope it will avoid creating a circular reference (archiving an archived copy). --User:Ceyockey ( talk to me ) 01:34, 28 May 2009 (UTC)
 * Fortunately, the bot is intelligent enough to skip links that point to existing archives. --ThaddeusB (talk) 03:07, 28 May 2009 (UTC)

Reaction from WebCite
Do you have a notion of the reaction from the managers of WebCite to suddenly seeing an upswing in archiving activity as a result of this Bot completing its tasks? --User:Ceyockey ( talk to me ) 01:36, 28 May 2009 (UTC)
 * I have been in contact with Gunther Eysenbach throughout the process. They have been very supportive of the project (in fact, they had the idea of creating a bot like this in their FAQ before it was independently thought up here.) --ThaddeusB (talk) 03:07, 28 May 2009 (UTC)

Please clarify "not all links are caught when approximately 4 or more links are added at once"
I spend a lot of my time on Wikipedia adding references to articles that are nominated for deletion, and usually do all of my additions in one edit, so I often add "approximately 4 or more links" at once. I'd like something better than "approximately" to work on, so that I know how often I should save. For example, if I save an article after adding three links will that guarantee that they will be seen by this bot? Phil Bridger (talk) 22:26, 28 May 2009 (UTC)


 * I'll try to be as precise as possible... IRC has a hard limit on the number of characters allowed per line.  The bot that reports to the IRC feed that I rely on puts all the added links on one line, regardless of how long this makes the line.  After you take away the other info it reports (person who added the link, diff link, etc.) there are about 300 usable characters for the actual links.  ~15-25 of these are taken by the wiki page name.  For each URL reported, the IRC bot also reports the number of times it has been added to Wikipedia & the number of links the adding user has added.  This takes up about 16 characters.  So to get an exact measure you have to find the length of the URLs you added and add 16 bytes per URL (except for the last one, as it doesn't matter if the extra info gets cut off).  When you get beyond about 275 bytes the URLs will start to get cut off.  In practice, you get 4 URLs of normal length, or 3 longer ones, into 275 bytes. --ThaddeusB (talk) 23:57, 28 May 2009 (UTC)
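For anyone who wants to estimate this in advance, the accounting described above can be sketched roughly as follows. The ~300-character budget and ~16-character per-URL overhead come straight from the explanation; the function name and exact bookkeeping are illustrative, not the IRC feed's actual format:

```python
# Rough sketch of the IRC line-budget check described above.
# The figures (300 usable chars, ~16 chars of per-URL metadata, no
# overhead counted for the last URL) come from the explanation; the
# real feed format may differ in detail.

def links_likely_reported(page_title, urls, usable=300, per_url_overhead=16):
    """Return True if all `urls` probably fit on one IRC feed line."""
    used = len(page_title)  # the wiki page name takes ~15-25 characters
    for i, url in enumerate(urls):
        used += len(url)
        if i < len(urls) - 1:       # overhead doesn't matter for the last URL
            used += per_url_overhead
    return used <= usable

# Example: three typical-length URLs added to one article in one save
urls = ["http://example.com/story/12345"] * 3
print(links_likely_reported("2008 French Grand Prix", urls))
```

In practice this matches the rule of thumb above: four normal-length URLs (or three longer ones) fit, and anything beyond that risks being truncated.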
 * OK, thanks, I think I can work with that. Phil Bridger (talk) 01:00, 29 May 2009 (UTC)

WebCiteBOT going wrong
This WebCiteBOT has recently edited the article I am working on, the 2008 French Grand Prix. Several times now, it has added authors in refs where there are not any, so I don't want them. Archiving pages and all that is fine, but adding authors I do find frustrating, especially as it has put the author for one of them as "glopes" (goodness knows how it came up with that). Is there a way to fix this? Darth Newdar  (talk)  07:07, 29 May 2009 (UTC)
 * Aargh, it's done it on the 2008 German Grand Prix article now. I really do know what I am talking about when I say I do not want an author for some of these refs. Darth Newdar   (talk)  07:14, 29 May 2009 (UTC)
 * Very interesting, and not good. Thaddeus, may I strongly suggest that the bot not add Author or any other data of this nature? Metadata elements are simply too prone to error to try and scrape them. — Huntster (t • @ • c) 10:04, 29 May 2009 (UTC)
 * First of all, thank you for your valuable contributions to Wikipedia. Now, in fairness the bot didn't actually edit the page several times - it edited it twice.  It also did not re-add the author info you deleted - it just added author info to a newly added source.  The author "glopes" came directly from the PDF file - in all likelihood someone with the last name "Glopes" was responsible for putting together the info in the file.  Even if it is incorrect, it is not the bot's fault that the pdf contained an inaccurate author - that information was supplied by a human and just copied by the bot.
 * The software already has some checks to prevent bad author info from getting copied, but the "GrandPrix.com" case slipped through, so I will look into improving the code to prevent that from happening again. --ThaddeusB (talk) 18:41, 29 May 2009 (UTC)
 * Thanks. Darth Newdar   (talk)  19:22, 29 May 2009 (UTC)
 * It's added the author info "glopes" on the 2008 Turkish Grand Prix now. Darth Newdar   (talk)  11:07, 2 June 2009 (UTC)

www.webcitation.org not working?
Hey, what's going on with www.webcitation.org? I haven't been able to manually archive anything for a while now; is anyone else getting constant error messages? Argh. -- œ ™ 20:12, 5 June 2009 (UTC)
 * Yes, I got the error message too. Rettetast (talk) 20:23, 5 June 2009 (UTC)
 * Yah it has been down for at least 48 hours straight now. These down times seem fairly common, but this is the longest I've seen. --ThaddeusB (talk) 23:58, 5 June 2009 (UTC)

WebCiteBOT on nl?
Hello Thaddeus, I think that this bot is a great tool. I've been running weblinkchecker.py on nl: until now, but your bot is much better. Could you run it also on nl: (I'm quite sure that it will get approved), or alternatively, could I use your script on nl.wikipedia? Please let me know how you prefer to approach other language editions of wikipedia. Kind regards, --Maurits (talk) 08:02, 7 June 2009 (UTC)
 * I do plan to port the BOT to other Wikipedias, but the task is a little more complicated than just changing the name of the encyclopedia, as each site has its own conventions for how references are handled (and of course non-English template names, "reference" sections, and such). I am still perfecting the English version, but when I am ready to start porting it I will be sure to contact you for help.  Thank you for the offer. --ThaddeusB (talk) 19:00, 7 June 2009 (UTC)

Thank you for your reaction, I'll be patient :). Some details about the Dutch Wikipedia in advance:
 * 1) Our IRC is #wikipedia-nl-vandalism.
 * 2) We don't have a dead-link template; as an alternative I normally use <!-- dode link --> ("dead link").
 * 3) Our deletion-templates (or for the consideration thereof) are: Artikelweg, Auteur, Auteur2, WB, Reclame, Wiu, Nuweg, Nuweg-reclame, Transwiki, Weg2, NE, Xauteur, Xreclame, Xwb, XNe, Xweg, Xwiu. Their prefix is Sjabloon:. There are some redirects to these templates too: Weg, Artweg, Wb, Woordenboekdefinitie, Promo, Promotie, WIU, Delete, Speedydelete, Speedy, Db, Reclame-nuweg. (This enumeration excludes those for images, files, categories, et cetera; if my assumption that these are unnecessary is false, please let me know).
 * 4) Our web reference-templates are 'Cite web' (identical to the english version) and 'Voetnoot web' (almost identical; if you need some translation/interpretation, let me know). Redirects: 'Citeweb', 'Cite Web' (to the former), 'Citeer web' (to the latter).
 * 5) Links can be mainly found between -syntax and in reference sections with the titles: 'Externe verwijzing', 'Externe verwijzingen', 'Voetnoten', 'Voetnoot', 'Referenties', 'Noten', 'Bronvermelding'. The following templates include a
 * with


 * that should prevent the bot from re-adding the info. (Make sure to delete archivedate=... as well since it wouldn't make sense to have a date for a non-existent archive.)  If that doesn't work, let me know. --ThaddeusB (talk) 03:48, 13 September 2009 (UTC)

Works fine. Thanks. --Farry (talk) 18:32, 14 September 2009 (UTC)

FlickreviewR bot on Commons
I would like to suggest a new use for the WebCiteBOT: use it with the commons:User:FlickreviewR bot to archive the flickr pages. That way, if the image is later deleted or its license changed, there won't be any question about whether it was actually available under a free license at the time of the upload. It would be very helpful in substantiating claims under images tagged with this: commons:Template:Flickr-change-of-license --Blargh29 (talk) 01:45, 22 September 2009 (UTC)


 * If I ever get the bot working on a consistent basis, I will definitely expand it to do that. --ThaddeusB (talk) 05:39, 27 September 2009 (UTC)

Backlog
The bot seems to still be in the "2"s. Will it ever get to later in the alphabet? It seems to have been in "2"s for months now. - Peregrine Fisher (talk) (contribs) 05:19, 27 September 2009 (UTC)


 * Yah, I've been having a lot of problems. Between the API timing out on me and WebCite inexplicably reporting entire sequences of pages as 404 when they aren't, I've had to make the bot redo the same pages over & over again.  That said, it should fly through the rest of the #s after it gets past "2009...", which it almost is past now. --ThaddeusB (talk) 05:43, 27 September 2009 (UTC)
 * Sounds good. I look forward to the "3"s and beyond. - Peregrine Fisher (talk) (contribs) 05:51, 27 September 2009 (UTC)

Is there any way to see the backlog - that is, the list of articles that still need to be processed?—NMajdan •talk 16:15, 4 October 2009 (UTC)


 * Not directly, but I can query it locally. As of right now there are 1.1M pages in the backlog of which a large number are duplicates - probably around 300k unique pages. If you want a more precise count, let me know and I'll generate one. --ThaddeusB (talk) 16:48, 4 October 2009 (UTC)
 * No, I was more or less curious to see the order in which the bot may be tackling these articles. Maybe when the backlog gets down to a manageable size, you can have the bot update a page once a day or so with the backlog. Frankly, I was just curious if citations I have been making were going to be included in a future archival, so my question wasn't entirely altruistic.—NMajdan •talk 21:53, 5 October 2009 (UTC)
 * Another question: at what rate are you adding to the backlog? Just trying to figure out how long before the bot is caught up. Looks like the average rate in September was about 6500/day. At 300,000 articles in the backlog now, it would take about 45 days to get caught up assuming no more articles are added. Obviously, I know this bot provides a tremendously valuable service to Wikipedia, so I'm a bit more inquisitive than normal.—NMajdan •talk 21:58, 5 October 2009 (UTC)
 * Just following up. Still very curious about these aspects.—NMajdan •talk 14:28, 15 October 2009 (UTC)
 * Not sure why I never replied the first time - it just slipped my mind, I guess... The bot currently functions by first sorting everything into alphabetical order so that it can easily combine duplicate entries into one.  Thus, pages nearer the start of the alphabet will be processed sooner, regardless of when exactly the links were added (this only happens because of the backlog problem.)
 * The backlog is still growing (although naturally the rate of growth slows as a larger percentage of new entries become duplicates of old entries). I'm not sure if there has been a single day where the program to monitor additions was running and the backlog didn't grow. I don't have precise numbers, though.  I finally wrote an effective workaround for the API problems - hopefully that will at least get it to the break-even point.
 * Currently the bot isn't running at all because I am making some modifications so it can rapidly archive all GeoCities links before they go dead later this month. I think it will probably be back up later tonight.  Failing that, tomorrow for sure. --ThaddeusB (talk) 01:02, 17 October 2009 (UTC)
 * P.S. Feel free to ask as many questions as you like - I don't mind answering them at all. --ThaddeusB (talk) 01:02, 17 October 2009 (UTC)
 * That open invitation for questions may not have been a good one :). I was wondering if the bot requests that pages be archived all the time, so that even when it is updating the article, it is still sending requests to WebCite. If it isn't, then it seems that would be a more efficient use of time. Tim1357 (talk) 20:25, 23 October 2009 (UTC)
 * The way it currently works is:
 * pulls the first N links waiting to be archived
 * pulls up the associated Wikipedia page for the first/next link and checks to see if the link is still there & used as a reference
 * if the link is a reference, it makes sure the webpage is valid & pulls some metadata from it
 * if the page is valid, it sends an archive request
 * return to step 2, until all N links are processed
 * waits an hour (per request by WebCite people)
 * goes through each link & checks the status of the archive
 * if the archive was valid, it updates the Wikipedia page (some links can't be archived due to robots.txt or no_cache settings)
 * It works like this because archiving isn't always instantaneous (in recent history it has been, but historically it hasn't). Again, the backlog problem isn't due to time constraints, but rather getting the code up to a level where it is stable enough to run 24/7.   The current setup should be able to process ~10K unique links a day once it is fully stable, and the "true" rate of unique new links being added a day is most likely under 1K. --ThaddeusB (talk) 04:08, 26 October 2009 (UTC)
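For readers curious how those steps fit together, here is a minimal sketch of that two-phase loop in Python. Everything here is a placeholder - the bot's actual code hasn't been published, so all the object and method names are assumptions standing in for the steps listed above:

```python
import time

def process_batch(links, wiki, webcite, n=50, delay=3600):
    """Hypothetical sketch of the two-phase loop described above.
    `links`, `wiki`, and `webcite` are stand-in objects, not real APIs."""
    batch = links[:n]                        # 1. pull the first N waiting links
    requested = []
    for link in batch:
        page = wiki.fetch(link.page)         # 2. pull the associated wiki page
        if not page.uses_as_reference(link.url):
            continue                         #    link removed, or not a reference
        meta = webcite.probe(link.url)       # 3. check the page is valid, grab metadata
        if meta is None:
            continue                         #    dead or invalid page: skip
        webcite.request_archive(link.url)    # 4. send the archive request
        requested.append((link, meta))
    time.sleep(delay)                        # 5. wait an hour, per WebCite's request
    for link, meta in requested:             # 6. re-check each archive's status
        if webcite.archive_status(link.url) == "ok":
            wiki.add_archive_url(link, meta) #    robots.txt / no-cache can block it
```

The `delay` parameter exists only so the sketch can be exercised without actually waiting an hour; the point is the two distinct passes, which is why archiving not being instantaneous doesn't stall the bot.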

uic.com.au
From WP:VPM, I was advised to make a request here. Can this bot correct those citations? --Quest for Truth (talk) 18:21, 28 September 2009 (UTC)


 * Yes, I will make sure it archives the links before they disappear in December. --ThaddeusB (talk) 00:05, 29 September 2009 (UTC)

Archiving links on this page
Would it be possible to have WebCiteBOT periodically archive the links on this page, to prevent any archive.org link rot or removal? If it isn't, sorry for bothering you; if it is, it would be great if you could do that. Thanks for your time. JimmyBlackwing (talk) 05:36, 30 September 2009 (UTC)

Request
Hi, please archive the references (1 and 2) on Detroit Lions Television Network. They seem to have died. Thanks. TomCat4680 (talk) 21:52, 30 September 2009 (UTC)

Also do so for ref 18 on Detroit Lions. TomCat4680 (talk) 21:56, 30 September 2009 (UTC)


 * If a link has already died, it cannot be archived (as in, there's nothing there to archive). You'll have to try finding it at http://web.archive.org. — Huntster (t @ c) 22:15, 30 September 2009 (UTC)
 * Also Checklinks is a useful tool.Tim1357 (talk) 02:02, 1 October 2009 (UTC)

Concerns about webcitation.org
I was at the .pst article and wanted to see the source reference. Normally I hover the mouse over a link before I jump and was surprised to see it was pointing to something which conceals the ultimate destination much like tinyURL does.

The problems I'm seeing with links to www.webcitation.org are:


 * 1) When looking at links I regularly hover the mouse over the link and look at the status bar to see what site is being linked. Usually that's enough for me and I don't click. Converting the links to use the Webcitation.org web site breaks this feature.
 * 2) I regularly use Special:LinkSearch to see if a particular web site is in "good standing" as far as being a source of references. Having many, and presumably at some point, all, of the article reference links converted to use webcitation.org destroys the usability of Special:LinkSearch.
 * 3) The WP Foundation takes WP user privacy seriously. There are many policy hurdles to get the IP address of an editor via CheckUser. You'd need to subpoena the foundation to get the IP address of a reader or to have them tell you which pages an IP has visited. Webcitation.org is able to collect five pieces of information for every person that follows a link off Wikipedia: 1) the person's IP address, 2) the Wikipedia page the person was on (referrer tag), 3) the web page the user is interested in, 4) the date/time the request was made, and 5) other information that web browsers send to web sites, such as the operating system, browser used, etc.
 * 4) Related to Wikipedia user privacy is that webcitation.org is tracking users via cookies meaning that they will be able to tie my use of links from Wikipedia to the use of links from other web sites.
 * 5) Webcitation.org's privacy policy is not reassuring with "From time to time, we may use customer information for new, unanticipated uses not previously disclosed in our privacy notice." While Canada's privacy rules are better than what's in the USA I see that the webcitation.org servers are physically in Texas meaning USA rules apply.
 * 6) Wikipedia has a spam blacklist intended to prevent links to certain sites from getting added to Wikipedia pages. Webcitation.org allows editors to circumvent the blacklist because the link to webcitation.org looks like http://www.webcitation.org/5k40hOrFo where "5k40hOrFo" is a random value that's only meaningful to the webcitation.org web servers.
 * 7) Wikipedia editors frequently evaluate the potential of links by inspection alone. If I see a link to someuser.blogger.com then I know the odds are low it'll be a reliable source. Webcitation.org breaks evaluation by inspection and instead forces editors to click through to see what is actually being linked.
 * 8) Webcitation.org is a private web site that has full control over the outer frame. At present this frame is a plain blue bar. Their policy contains no prohibition against inserting advertising in the frame or even modifying the content they are showing in the lower frame.
 * 9) People who are not familiar with or aware of webcitation.org will believe the content they are viewing comes from webcitation.org. When I was on the .PST article and saw the link to www.webcitation.org/5k40hOrFo I assumed this would work like tinyurl and redirect me. I clicked and was thinking "Why would a site called www.webcitation.org have a page about file extensions?" I then thought this was a bootleg site where someone had stolen a www.FILExt.com page and uploaded it to webcitation.org.

I would like the WebCiteBOT modified so that it adds articles to webcitation.org as it does now, but that any links in the article go to the original web site. The webcitation.org link can be maintained as a comment visible to Wikipedia editors. Should the original site fail, then an editor would see the commented-out archive link and could start using that pending finding an appropriate substitute for the original site. There's no reason at all for live links from Wikipedia to this site if the original site is available. Webcitation.org would still serve its (very useful) function of archiving content so that links to citations will not rot completely should the source site be updated or removed. Webcitation.org is also useful in that it can store older versions of the page.

The existing webcitation.org links should also be commented out and links to the source web site restored.

I have also asked that adding links to webcitation.org be blocked at MediaWiki talk:Spam-blacklist. --Marc Kupper|talk 04:33, 13 October 2009 (UTC)
 * But here's a workable solution to some of your other concerns about the hover-over URL identification. Instead of asking to comment out the Webcitation.org archived pages, which would totally defeat their purpose, I suggest that the Citation templates be altered to show the original URL as the main link and the archived link as the secondary URL. This would be an easy fix at the template level and would assuage some of your concerns, while preserving the work of the many people, including myself and ThaddeusB, who have taken the time to preserve Wikipedia's web sources for the future. Perhaps the discussion can take place at Citation templates. --Blargh29 (talk) 05:15, 13 October 2009 (UTC)
 * As for the original complaint, it shows a fundamental misunderstanding of the situation. Point 2 is bogus since the original link is always retained by the bot and should be by any human editor. Point 3 applies to every external link to any site (do you want to ban all external links?). Point 4 also applies to pretty much every external link as well, and I note that webcitation.org's cookie is just the default PHPSESSID used for PHP session handling and will be removed when the browser is closed (which is far better than you'll get on many sites commonly used in external links). Point 5 seems needlessly reactionary, as the quoted section of their privacy policy is geared towards people who actually create an account there rather than visitors; regarding visitors, they basically collect what would already be in the web server log anyway (Wikipedia does that too! Oh noes!). Point 6 is easily enough refuted considering that point 2 is bogus: any webcitation.org link without a corresponding original link can and should be subject to scrutiny. Point 7 is basically a duplicate of point 1 and depends on the bogus point 2 to really make any sense. Point 8 is again rather reactionary, as many sites used in external links (for example, pretty much every "mass media" news site) already have advertisements. And as for point 9, there's no accounting for people jumping to conclusions instead of taking the time to click the "What's this?" link in the blue header. Anomie⚔ 11:49, 13 October 2009 (UTC)
 * Webcitation.org and the Internet Archive are the best tools we have to combat linkrot, which I believe to be one of the biggest threats to the content of Wikipedia. Please let me address some of your concerns. First, Webcitation.org is not like TinyURL; rather, it is an archive of web content, which is clearly shown by the blue frame in every Webcitation.org archive. It's used in many academic journals. Second, I really doubt that any spammers, whose business relies on the quick and robotic insertion of their links, would use webcitation.org, which requires a 2-step process to create an archive and requires an email address.
 * The counter to that suggestion is that when the original location is known to be dead, we most likely do want the "main" link to go to the archive copy. WebCiteBOT already adds a "deadurl=no" parameter to citation templates when adding the archiveurl parameter, so the change could very easily be done only to citation templates where that parameter is specified and the current behavior retained when "deadurl=yes" or deadurl is unspecified.
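The link-ordering rule being proposed there can be sketched as follows. This is hypothetical Python, not the actual Citation/core code (which is wikitext template syntax); only the parameter names (|url=, |archiveurl=, |deadurl=) and the rule itself come from the discussion above:

```python
# Sketch of the proposed rule: the original URL stays primary unless the
# citation marks it dead, in which case the archive copy leads. The
# function itself is a hypothetical stand-in for template logic.

def primary_and_secondary(url, archiveurl=None, deadurl="yes"):
    """Return (primary link, secondary link) for a citation."""
    if archiveurl and deadurl == "yes":
        return archiveurl, url      # dead original: lead with the archive
    if archiveurl:                  # deadurl=no: original stays primary
        return url, archiveurl
    return url, None                # no archive at all

print(primary_and_secondary("http://example.com/a",
                            "http://www.webcitation.org/5k40hOrFo",
                            deadurl="no"))
# → ('http://example.com/a', 'http://www.webcitation.org/5k40hOrFo')
```

With `deadurl` unspecified, the sketch defaults to the then-current behavior of leading with the archive, matching the suggestion that only deadurl=no citations change.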
 * I pretty much agree with the above responses... The use of the archived copy as primary is done by the template and not the bot, so that really is beyond my control. The correct place to ask for the original to be primary is the template talk page.  Past discussion about which should be primary has been fairly evenly split, so no change has been made thus far.  In any case, the fact that the link is archived is clearly indicated in the reference itself: "Archived from the original on 2009-01-01." If you are interested in seeing the original without going through webcitation.org, all you have to do is click (or hover over) that link instead.  The rest of the complaints are pretty much true of every external link (and indeed some external links are far worse, containing, for example, malicious script exploits.) --ThaddeusB (talk) 14:52, 13 October 2009 (UTC)
 * I'd like to answer some of this today but was without power for a good part of the day plus running around dealing with all of the loose ends a season's first major uncovers. --Marc Kupper|talk 08:44, 14 October 2009 (UTC)

WebCiteBot on other wikis
Hi,

do you run WebCiteBot on other wikis? Or is the code public? I'm looking for a way to preserve Geocities references on hu.wikipedia, and Webcite seems like the obvious choice. --Tgr (talk) 19:13, 13 October 2009 (UTC)


 * I do have plans to eventually expand it beyond enwiki, but so far it isn't yet stable here so I haven't pursued the matter. I haven't released the code under GFDL/CC-BY-SA, but even if I did it wouldn't do you much good, as it would need to be modified to run on a foreign wiki.  Plus, each wiki has its own bot policies, so I'd have to investigate that and possibly get approval for each one - in short, it is a time-consuming matter which I haven't pursued yet.
 * Now, in regards to GeoCities links, you raise a very good point. I won't have a bot up and running to add the archived links to xx.wikipedia before the end of the month, but what I can do is have it go ahead and do the actual archiving for links found at hu. (and elsewhere) so that an archived version of the link will at least exist, which can then be manually or bot updated at a later date. --ThaddeusB (talk) 15:46, 14 October 2009 (UTC)

Thanks, I would appreciate if you could do that. (And I would definitely support if you wanted to run WebCiteBot as a global bot, though I suppose compatibility with the various cite templates would be tricky.) --Tgr (talk) 20:14, 15 October 2009 (UTC)
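For reference, the archiving step discussed in this thread amounts to a single HTTP GET against WebCite's public archiving endpoint. The sketch below is not the bot's actual code; the endpoint path and parameter names follow WebCite's documented archiving interface as I understand it, and the email address is a placeholder:

```python
# Sketch of submitting one URL to WebCite for archiving.
# Assumptions: the "archive" endpoint and the url/email/returnxml
# parameters, taken from WebCite's published API description.
from urllib.parse import urlencode

ARCHIVE_ENDPOINT = "http://www.webcitation.org/archive"

def build_archive_request(target_url, notify_email):
    """Return the full request URL asking WebCite to archive target_url."""
    params = urlencode({
        "url": target_url,
        "email": notify_email,   # WebCite mails the archive ID to this address
        "returnxml": "true",     # request a machine-readable response
    })
    return ARCHIVE_ENDPOINT + "?" + params

# Sending the request would then be one call, e.g.:
#   import urllib.request
#   response = urllib.request.urlopen(
#       build_archive_request("http://example.com/page", "bot@example.com"))
print(build_archive_request("http://example.com/page", "bot@example.com"))
```

A bot would repeat this for each newly added reference URL and record the returned archive ID for later insertion into the citation template.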

Wikipedia talk:WikiProject Spam/LinkReports/webcitation.org
We should keep an eye on this as well: WikiProject Spam/LinkReports/webcitation.org--Blargh29 (talk) 13:59, 14 October 2009 (UTC)
 * I don't put much stock in those pages since they don't seem to care whether they list various well-known bots (e.g. AnomieBOT gets on their lists fairly often when it fixes orphaned references with a link that their bot doesn't like, and nothing ever seems to come of it). I suppose WikiProject Spam finds some use in them, but what that might be I don't know. Anomie⚔ 16:35, 14 October 2009 (UTC)

Expanded linkrot policy
I think that Wikipedia needs a stronger linkrot policy. A page that explains 1) what linkrot is, 2) why linkrot is a problem, 3) what can be done to prevent it (e.g. WebCite), 4) what can be done to repair it (e.g. Internet Archive), and 5) how to mitigate unfixable rotted links. Some of this is covered by WP:DEADREF, but that information needs to be beefed up.

But, the first step is to rename Dead external links to Dead links, so that the policy is clear that it applies to ALL links, including inline citations, and not just those in the "External links" sections. Please make your comments at Wikipedia talk:Dead external links.--Blargh29 (talk) 03:26, 20 October 2009 (UTC)


 * Wikipedia encourages people to be BOLD, so if you think the current policies need better explanation go ahead and modify them. If you think we need a new set of instructions, go ahead and draft one (in user space if you like) and I'll be happy to look it over.
 * An overhaul of WP:Dead external links is on my agenda (it is horribly out of date and poorly organized), but I have no idea when I'll get to it. --ThaddeusB (talk) 12:56, 20 October 2009 (UTC)

Other wikis
Hi, what about running this bot on other wikis as well? --Nemo 12:13, 27 October 2009 (UTC)
 * Ah, I see there's already a discussion about this above. Well, if you want, I can translate templates, check policies, make the request, etc. for you on it.wiki. --Nemo 12:15, 27 October 2009 (UTC)
 * Great, thanks for the offer. I'll get back to you when I'm ready to port the bot. --ThaddeusB (talk) 14:50, 27 October 2009 (UTC)

Bot Status
No contribs since 11/1. Is the bot down/broken? Or is there just a delay in getting the bot switched back from Encarta/GeoCities archiving to general archiving (we all can understand that real life can get in the way)? Just curious.—NMajdan • talk 19:13, 9 November 2009 (UTC)
 * The lack of editing was just due to a real-life time crunch. --ThaddeusB (talk) 02:09, 10 November 2009 (UTC)
 * When will the bot be operational again?—NMajdan • talk 20:25, 18 November 2009 (UTC)

Looks like the bot made about 200 edits after it was restarted on 11/21 but has again been down for a week. When will the bot be fully operational again?—NMajdan • talk 16:14, 2 December 2009 (UTC)
 * After this week, I will be on real world vacation and have a lot more time for Wikipedia. Thus, you can expect it running full time by next week at the latest. --ThaddeusB (talk) 00:02, 3 December 2009 (UTC)

AHH BOT ERROR!
diff Hey man, saw the bot got up again! Great job, but you have a space before your new references that puts them in a box. Scroll down through the page and you'll see what I mean! Peace,

Tim1357 (talk) 01:31, 23 November 2009 (UTC)


 * Thx for the notice. I have adjusted the code accordingly. --ThaddeusB (talk) 01:46, 23 November 2009 (UTC)

Support for other archiving services besides webcitation.org?
There is a very useful site where articles expire very quickly. Webcitation.org doesn't work with the site. However, freezepage.com, for example, does. Would it be possible to add freezepage support to the bot? (Or maybe some other archiving service that works.) Offliner (talk) 18:51, 30 November 2009 (UTC)
 * One has to be careful with newer archiving sites, as there are legal issues involved with copying content that they may or may not have looked into. That said, I will definitely look into the suggestion. --ThaddeusB (talk) 21:42, 30 November 2009 (UTC)
 * Haven't tried it yet, but I don't think FreezePage is going to be a good alternative. From their FAQ: "To save space on our system, we require that you use your account regularly, i.e. that you log in or visit any page on our site. If you are an unregistered user, you must visit our site every 30 days. If you are a member (sign up for free), we only require you to log in every 60 days. If you don’t, we may delete your account and all the frozen pages in it." - Kollision (talk) 16:10, 23 June 2010 (UTC)

Disappearing source: Editor & Publisher
Wikipedia has over 600 links to the legendary publishing periodical Editor & Publisher, which is now ceasing publication. The list is here. I suspect that the website will soon be shuttered as well. Is WebCiteBOT able to be deployed to WebCite these links before they disappear? Please see Wikipedia talk:Linkrot to help coordinate. --Blargh29 (talk) 03:32, 11 December 2009 (UTC)

I miss you
Hey WebCiteBOT

I was wondering when you would be able to start up again. I know you have been busy with Geocities and all. Good luck

Tim1357 (talk) —Preceding undated comment added 14:52, 10 January 2010 (UTC).
 * He left a comment on his regular user page saying that he is back to normal access after over a month of limited access. So I would assume he would begin on this pretty quickly. Fingers crossed it's up and running by the end of the month. This is an extremely useful bot and brings a lot to Wikipedia, so it's a shame it's been down for so long. I wonder if the backlog has continued to build?—NMajdan • talk 22:47, 12 January 2010 (UTC)


 * Yes, your speculation is accurate - I will get the bot back up ASAP. It isn't #1 on my agenda, but it's near the top.  The backlog has continued to grow, although at a slower-than-normal rate since the logging program also wasn't online all the time. I will post a status update within a few days. --ThaddeusB (talk) 02:46, 17 January 2010 (UTC)
 * Update? Your talk page says you're about caught up. Hopefully this bot will be back up and running soon.—NMajdan • talk 21:09, 2 February 2010 (UTC)
 * I am hopeful it will be up soon as well. --ThaddeusB (talk) 20:11, 6 February 2010 (UTC)

@WebCiteBOT: Miss you big time! Great job earlier. Hope it will resume soon. Nsaa (talk) 08:54, 9 February 2010 (UTC)
 * News?Tim1357 (talk) 02:58, 13 April 2010 (UTC)
 * Last update.—NMajdan • talk 13:22, 13 April 2010 (UTC)

URGENT: NY Times and WebCiteBOT
New York Magazine is reporting that the New York Times is going to cease providing free content and will install a "metered" payment system. Is it possible for WebCiteBOT to archive NY Times articles before this happens?--Blargh29 (talk) 22:39, 18 January 2010 (UTC)


 * Perhaps this is the last gasp of NY Times. This should definitely be a priority. — Huntster (t @ c) 00:19, 19 January 2010 (UTC)


 * It has been announced that the NY Times will begin the pay model in 2011, so we have all of this year to archive NYT articles.—NMajdan • talk 14:42, 20 January 2010 (UTC)


 * Very good, thanks for the update NMajdan. Sometimes companies like to jump into such things quickly...glad this is not the case. — Huntster (t @ c) 21:02, 20 January 2010 (UTC)


 * Let's not panic. IIRC, the NYT plan is to allow IPs a few articles per month. So one could still access them, just not in bulk. --Gwern (contribs) 19:20 2 February 2010 (GMT)

Actually, I agree with Blargh29 that we should do the archiving as soon as possible. Looking at this VPM thread, I see that The Times (of London) has added NOARCHIVE to its pages in anticipation of its move behind a paywall, in which cases webcitation.org will not grab the content. I think there's a reasonable risk that NYTimes will do the same thing. We should archive these pages while we still can. user:Agradman editing for the moment as 160.39.221.164 (talk) 06:33, 4 May 2010 (UTC)

Priority : Archiving of BBC News articles
BBC has announced that several sections of its old websites would be axed and its old content pruned, owing to a funding shakeup to BBC Online. I'm concerned that this is likely to include old versions of BBC News articles dating back to 1999, which an awful lot of articles heavily depend upon for reliable sourcing (some of them the only source, in fact). I think we should start converting them into WebCites before they are removed and then we'll have a huge sourcing problem in our hands. - Mailer Diablo 16:25, 2 March 2010 (UTC)

Times / Sunday Times
More news: The Times / Sunday Times will charge from June. Rd232 talk 07:33, 26 March 2010 (UTC)

WebCiteBOT is not operating
It seems the bot has made no edits since November 2009. Any hints why? User:LeadSongDog come howl  15:36, 13 April 2010 (UTC)
 * Look a couple of threads up.—NMajdan • talk 15:52, 13 April 2010 (UTC)
 * I saw that, but it doesn't give any hints why the bot is down, just says that it is down. There are ways other users might be able to help, with bug reporting, analysis, code inspection/review, test cases, etc, but right now, we're in the dark as to the problem. User:LeadSongDog come howl  16:44, 13 April 2010 (UTC)
 * I hope to have it up again tonight or tomorrow at the latest. --ThaddeusB (talk) 23:14, 25 April 2010 (UTC)
 * Great to hear, Thaddeus. — Huntster (t @ c) 01:02, 26 April 2010 (UTC)
 * That is great, but should a bot request be made to have a second bot that does webcite citations? - Peregrine Fisher (talk) 02:21, 26 April 2010 (UTC)
 * What do you mean, webcite citations? This bot takes newly added references and archives them using WebCitation. I just want to clarify your question before Thaddeus responds.—NMajdan • talk 12:12, 26 April 2010 (UTC)
 * I mean the same thing that this bot does. I was the one who made the original bot request, so I know how it works.  It's really cool, when it's working. - Peregrine Fisher (talk) 14:35, 26 April 2010 (UTC)
 * Ha. Ok. You're wanting a duplicate bot. And I would agree. A duplicate would be nice. Once this one gets up and running and stabilizes, I'd like to see the ability to do certain articles on demand. Sorry for the confusion.—NMajdan • talk 15:03, 26 April 2010 (UTC)

User:WebCiteBOT/Stats
It says it's done about a million links this year, but looking at its contributions, it seems to be around 500. Any idea what's going on, or am I reading it wrong? - Peregrine Fisher (talk) 03:48, 29 April 2010 (UTC)


 * That's how many website URLs it has collected from articles, I believe. It then sorts through them and attempts to archive them. — Huntster (t @ c) 05:46, 29 April 2010 (UTC)


 * I was just about to ask that as well. What Huntster said was my conclusion too. It seems that this whole time WebCiteBOT has been down, it has still been collecting new references in its database, which it will archive when it comes online. He definitely has his work cut out for him!—NMajdan • talk 15:11, 29 April 2010 (UTC)


 * Looking at the last entries in WebCiteBot's contributions and log, they are available through webcitation.org's query page. It looks like it's a simple issue of the bot no longer requesting the additions since November 2009. If the bot's maintainer can't spare the time to get it running, there may be no alternative to creating another bot. I must say though that recently webcitation.org's parent organization University Health Network has been devoid of any mention of webcitation. Perhaps someone should contact Gunther Eysenbach to find out what's going on? LeadSongDog come howl  16:47, 29 April 2010 (UTC)


 * Why don't we get a second bot going? Even if this one worked perfectly, we should probably have a backup.  It looks like there are several people aware of the situation right now, so we could work on it together (it's mostly just a bot request).  I can make the request, but I don't spend much time on wiki anymore.  So, if someone who's more active would do it, that would be best.  If no one else wants to do it, I'll try and get it done in the next week or so, but as I said, I'm not that active.  We should probably drop Thaddeus a line on his talk page as well.  I think he doesn't give out his code (or free license it), so the new bot may have to be created from scratch.  But, if we ask nice, he might help someone else get up and running really quickly. - Peregrine Fisher (talk) 03:20, 30 April 2010 (UTC)


 * We probably do need a second WebCiteBOT. Primarily because there are so many sources on Wikipedia that one bot can't be expected to handle them all. And lately, with GeoCities, Encarta and now the Times, there is constantly some source going offline that requires priority handling, so the general references don't get archived. Also, User:WebCiteBOT has been offline since November and the bot operator has posted numerous times that the bot would be running again soon, but nothing has ever happened (here, here and here). The bot operator was busy in the real world for the last part of 2009 and first part of 2010, but it appears he has resumed normal Wikipedia activity, so it is obvious this bot is just very low on his priority list. Because of the good this bot can do, I would really like to see a second bot, even if the current bot resumes normal activity. The current bot has a database of references that have been added since it went live, but (I don't believe) it will archive references that existed before it went live, so a new bot could handle those. Now, I don't know how to create bots, but I will support your request if you do make one.—NMajdan • talk 13:21, 4 May 2010 (UTC)

(redent) All you have to do is make a friendly request at Bot requests in Plain English. Other people do the actual coding. - Peregrine Fisher (talk) 18:16, 4 May 2010 (UTC)


 * I am aware of that. I've made a similar request there before, so I think it best if someone else handles the request.—NMajdan • talk 18:29, 4 May 2010 (UTC)


 * Looking again at my previous request, it seems WebCiteBOT's owner said that WebCitation limits bots to one query every five seconds. Now, I do not know whether that limits each bot to one query every 5 seconds, or all bots combined to one query every 5 seconds. If the former, then no big deal. If the latter, then the two bot owners would have to tailor their bots to not violate this. For instance, each bot would probably have to be limited to one query every 10 seconds. Regardless, we really do need at least one bot up and running.—NMajdan • talk 14:13, 5 May 2010 (UTC)
 * As a simple suggestion, building the database can be done by one or both bots, independently from sending requests from it to WebCitation. I'd suggest both bots should have an active and a watchdog mode. In the latter mode, perhaps send just one request an hour, to be sure that a)both bots are working and b)WebCitation is working. In the former mode, burn through the database as fast as WebCitation.org is willing to let us go.
 * It's also worth asking WebCitation.org if it would make a difference to their server loading if we send the requests grouped (or ungrouped) by host order. One obstinate host might tie them up a long time in the first case, or they may already have code to manage that. LeadSongDog come howl  22:13, 5 May 2010 (UTC)
 * It could also help if ThaddeusB ran the bot soon. :) Thaddeus, do you think you could graduate the bot to a crontab job so we don't have to bug you to run it? Tim1357 talk 21:11, 13 May 2010 (UTC)
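The coordination scheme discussed above (one query per 5 seconds shared by two bots, so 10 seconds each) is just a client-side throttle. A minimal sketch, with illustrative interval values; the 5- and 10-second figures come from the discussion in this thread, not from any WebCite documentation:

```python
# Sketch of a client-side throttle so a bot never exceeds one archiving
# request per `interval_seconds`. Two cooperating bots sharing a 5-second
# budget would each run with interval_seconds=10.
import time

class Throttle:
    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self.last_request = None  # monotonic timestamp of the last request

    def wait(self):
        """Sleep just long enough to honor the interval, then record the time."""
        now = time.monotonic()
        if self.last_request is not None:
            remaining = self.interval - (now - self.last_request)
            if remaining > 0:
                time.sleep(remaining)
        self.last_request = time.monotonic()

# Usage: call wait() before each archiving request.
throttle = Throttle(interval_seconds=1)  # a real bot would use 5 or 10
for url in ["http://example.com/a", "http://example.com/b"]:
    throttle.wait()
    # ...send the archive request for `url` here...
```

This only enforces the limit locally within one bot; two independent bots cannot see each other's timestamps, which is why each would have to run at half the shared rate (or better, coordinate through one shared queue).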

What has Thaddeus said lately? I missed it if he's commented somewhere. - Peregrine Fisher (talk) 04:41, 14 May 2010 (UTC)
 * He's still active on other articles. LeadSongDog come howl!  05:51, 14 May 2010 (UTC)

UPDATE 2010 Aug 25 - Related conversation at Bot requests/Archive 37 - Hydroxonium (talk | contribs) 01:08, 25 August 2010 (UTC)

Ah, this is terribly discouraging. I guess we'd better get moving on another bot. LeadSongDog come howl!  20:24, 1 October 2010 (UTC)

UPDATE 2011 Feb 23 - ThaddeusB posted ":Note: I have just returned to Wikipedia and hope to have the original WebCiteBOT back up and running within the next few days unless people object to me doing so here --ThaddeusB (talk) 23:33, 22 February 2011 (UTC)".  —  Jeff G.  ツ  22:33, 23 February 2011 (UTC)

UPDATE 2011 May 24 - Unfortunately, Special:Contributions/ThaddeusB tells us that good intentions came to naught. We have to get on with an alternative. LeadSongDog come howl!  16:53, 24 May 2011 (UTC)

http://webcitation.org/ at 10:24, 12 May 2010 (UTC): "WebCite is currently under maintenance. We will be back up soon."
Another reason why we should build our own. I am having nightmares that one day they are going to break the entire (scholarly) internet.

Anyhow back to exams... AGradman / talk.

Little Thetford and WebCite
Running Little Thetford through the WebCite comb produces 673 possible URLs, each with a tick box. Many of them are duplicates and many others are wikilinks or Wikipedia maintenance pages. Manually trawling through each identified URL will take a while. A little research revealed a potential solution, WebCiteBot, which, if I understand correctly, will do the job for me. In particular, I believe it will archive the references that contain URLs and then edit each such reference to include archiveurl and archivedate! Magic! Is the bot still working? Is there a version that can be targeted on one page? How much money does the author of WebCiteBot want? --Senra (talk) 13:14, 2 August 2010 (UTC)

webcitebot can make wikipedia money!
I posted this project on the Bounty Board, as explained in this post at the village pump. Good luck and best wishes. AGradman / talk / how the subject page looked when I made this edit 18:13, 11 November 2010 (UTC)

WebCiteBOT edit 'Belgium' article on 31 October 2009
WebCiteBOT edit (dif) inserted <B>|archiveurl=http://www.webcitation.org/5kwPxLurr|archivedate=2009-10-31|deadurl=yes</B> twice. The first insert directs to the Encarta Encyclopedia as expected; the identical second one does as well, which is definitely NOT expected: it should show (the content of) a .pdf from an entirely different source that is (at least today) a dead URL. Can this still be corrected to retrieve the proper web archive?

Please find the cause of this apparent malfunction, and try to find out where else it may have occurred so as to correct it there as well. ▲ SomeHuman 2011-01-28 17:04 (UTC)