Talk:ReCAPTCHA/Archive 1

Where are the results?
This article, and the FAQ on their site, repeat the mantra that reCAPTCHA is a wonderful idea because, instead of just wasting people's time with useless captchas, it actually does a good thing: it helps digitize content. That would be true if the content were somehow made available. But where is it? Are there any books freely available thanks to reCAPTCHA? The article says they are now working on the New York Times. Where are the results? Are they freely available? If all the results remain hidden by some company, then I don't see what benefit this gives to humanity. A simple one-word captcha would have sufficed... 84.111.127.77 (talk) 21:48, 27 May 2010 (UTC)

I agree. This is very important missing info that urgently needs to be added. Are we all unknowingly helping Google become richer? Would someone please add the info? 190.100.246.23 (talk) 07:49, 2 June 2010 (UTC)
 * I am that someone. I finally added this information into the summary. :-) --84.159.190.96 (talk) 06:04, 19 February 2013 (UTC)
 * I've cut this, as it seems like a stretch of WP:SYNTHESIS by itself. If no source has commented on the apparent irony, then Wikipedia shouldn't either - it's our own point of view to assume that something "unfair" is going on here (when it's possible that cleanly-scanned text is a rare commodity and NYT are generous in providing it) and if nothing else, it might not be accurate ("many of The New York Times’ archived articles are behind a paywall", but maybe they're releasing the ReCAPTCHA-scanned ones). --McGeddon (talk) 09:31, 19 February 2013 (UTC)

Are Google eBooks the result of the use of reCAPTCHA, including the PAID ones? —Preceding unsigned comment added by 200.120.83.94 (talk) 22:04, 7 December 2010 (UTC)

Pay for the results here: http://www.nytimes.com/ref/membercenter/nytarchive.html — Preceding unsigned comment added by 190.134.5.228 (talk) 08:15, 9 June 2011 (UTC)

reCaptcha Solver
NOTE: reCAPTCHA.net has officially been solved algorithmically. Here is the Slashdot article. It was also presented at DEFCON 18, the largest hacking convention in the world. —Preceding unsigned comment added by 98.250.57.53 (talk) 04:24, 6 August 2010 (UTC)

Why are some editors removing the reCaptcha solver part? If software exists that defeats reCaptcha, we should not hide that fact. Let's discuss it here first. —Preceding unsigned comment added by Deepbluepanther (talk • contribs) 13:14, 23 November 2009 (UTC)

I searched Google with the key "reCaptcha Solver" and got the site www.rajtuhin.com on the first page of results. Why are some editors saying Google does not have any data for this? —Preceding unsigned comment added by Deepbluepanther (talk • contribs) 13:41, 23 November 2009 (UTC)


 * rajtuhin.com doesn't qualify as a reliable source - please see WP:N and WP:RS for more information. Laurent (talk) 13:51, 23 November 2009 (UTC)

I added a section on Security. I properly sourced the references in the Slashdot article that is based on Chad Houck's presentation to the Def Con 18 conference. I don't think that links to programs that defeat reCaptcha are merited when the research papers showing it's possible are available. In an effort to maintain neutrality, I've attempted to write my section without bias, adding some of the positive aspects of reCaptcha security. --Sully343 (talk) 23:57, 17 November 2010 (UTC)

AfDM
Why on earth would this article be deleted? It describes a real tool used by thousands (maybe millions) of people. The article should very much remain. --Thorwald 00:31, 3 July 2007 (UTC)


 * Neither can I see a reason for deleting this article. And before deleting it completely, I'd merge it into Captcha. Glaubigern 11:41, 6 July 2007 (UTC)


 * This article looks like the features page of a product website. More information on its history, adoption, etc. needs to be added. htmnssn 05:08, 10 October 2007 (UTC)


 * As the creator of this article, I actually agree that it is spam...but I did not create it that way! When I created it, I made sure to write a whole section titled "Disadvantages". It was a specific anonymous user that censored that section and thus turned this article into an advertisement. You are welcome to restore that section. -Lwc4life (talk) 14:33, 22 December 2007 (UTC)

Real?
What I wonder about: Why do the captchas look like ordinary captchas? If they are from digitized books, they shouldn't have those strokes in them. Or are the unrecognizable words additionally distorted?
 * Yes, more distortion is added. 201.212.190.133 (talk) 15:10, 22 February 2008 (UTC)

If there is an unreadable word, just type in anything you want; reCAPTCHA can't tell the difference. —Preceding unsigned comment added by 173.242.246.70 (talk) 23:53, 7 August 2010 (UTC)

Second thing I wonder about: How do they avoid using really unreadable words? A captcha used on a real page must be recognizable for the user in every case. But scanned text can be distorted to whatever degree.
 * I have seen really unreadable words. You just press Refresh to get a new one, or do your best try and get a new one if it's wrong. 201.212.190.133 (talk) 15:10, 22 February 2008 (UTC)
 * Unreadable? Like this? ;)  216.138.230.98 (talk) 02:41, 17 September 2010 (UTC)
 * I've even seen things that aren't letters at all.

If Wikipedia didn't stress that this is a reputable university project, I would have said this is a spammer trying to trick people into solving captchas for them. Please enlighten me. --:Slomox:: >< 23:13, 14 February 2008 (UTC)
 * Why are the pictures further distorted? If they are words that current software is unable to read in the first place, then there wouldn't be any need to further distort them, would there? And if they are distorted to be used on websites as working security measures, then these websites must have a check on whether the answer was correct or not. And if that check is in place, then anything you do to help is pointless, since that particular word has already been translated/transcribed. --83.249.208.52 (talk) 12:50, 17 July 2009 (UTC)
 * Because you don't have to *reliably* guess the correct word in order to break the captcha. Say a word can be OCR'd 10% of the time (without added distortion). That's not good enough for the server because it needs to know 100% what the word is, but it *is* good enough for the evil client because it can just make 10 guesses. 155.198.65.73 (talk) 13:15, 18 March 2010 (UTC)
 * I believe they distort it to make it difficult to automatically identify which word came from an actual book and which is the word they are already 100% sure about. Since they don't know for sure the correct response for one of the words, if it were easy for computers to tell which is the unknown word, an attacker could put all efforts into the other one and just write random gibberish for the unknown, and still have a better chance of getting in faster/cheaper than a human would. --TiagoTiago (talk) 00:42, 10 September 2011 (UTC)
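
The "10% per guess, so just make 10 guesses" argument in the replies above can be checked with a little arithmetic. Below is a minimal Python sketch, assuming every guess is an independent attempt with the same success rate (an assumption for illustration; the thread does not say how real attempts are distributed). The function name `chance_of_passing` is hypothetical.

```python
def chance_of_passing(per_guess_success: float, guesses: int) -> float:
    """Probability that at least one of `guesses` independent attempts succeeds."""
    if not 0.0 <= per_guess_success <= 1.0:
        raise ValueError("per_guess_success must be between 0 and 1")
    if guesses < 0:
        raise ValueError("guesses must be non-negative")
    # P(at least one success) = 1 - P(every attempt fails)
    return 1.0 - (1.0 - per_guess_success) ** guesses


if __name__ == "__main__":
    # Assumed figures from the comment above: 10% OCR success, 10 attempts.
    print(round(chance_of_passing(0.10, 10), 3))
```

Under that assumption, ten guesses give the client roughly a 65% chance of getting through at least once rather than a guaranteed pass, which still supports the point that a client does not need reliable recognition while the server does.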

If the example graphic is a real example, and the stroke through the middle and the waviness have been added, then it doesn't say much for state-of-the-art OCR that these words couldn't be automatically deciphered in their original state. I thought the technology was a lot better than that these days...!! Matt 22:50, 3 March 2008 (UTC). —Preceding unsigned comment added by 86.142.110.150 (talk)
 * Modern OCR software isn't amazingly good at recognition. It recognises the word and, if it isn't in the dictionary, looks for the closest similar words, then guesses which similar word is likely to fit in the sentence ("verbed 15 nouns" is more likely than "verbed is nouns"). This approach falls down when you have new words or words without context. 86.156.199.238 (talk) 03:49, 14 June 2008 (UTC)
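
The dictionary step described in the reply above can be sketched in a few lines of Python. This is only an illustration of "look for the closest similar word", not how any particular OCR engine works: the tiny word list, the `correct_token` helper, and the 0.6 similarity cutoff are all assumptions, and the sentence-context step ("verbed 15 nouns") is left out. It uses `difflib.get_close_matches` from the standard library.

```python
import difflib

# Hypothetical mini-dictionary; a real OCR engine would use a full word list.
DICTIONARY = ["morning", "evening", "warning"]


def correct_token(token, dictionary=DICTIONARY, cutoff=0.6):
    """Return (word, known): the closest dictionary word, or the raw token if none is close."""
    lowered = token.lower()
    if lowered in dictionary:
        return lowered, True
    # Ask difflib for the single best match above the similarity cutoff.
    matches = difflib.get_close_matches(lowered, dictionary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], True
    # Nothing similar enough: this is the "new word / no context" case from the comment.
    return token, False
```

For example, a misread such as `"rnorning"` is pulled to `"morning"`, while a token with no near neighbour in the list comes back flagged as unknown, which is the kind of word that would be left for a human to read.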

Here's a better question: If the word ends up in reCaptcha because it can't be recognized, how does the computer know if the user entered the word correctly or not? 129.2.231.243 (talk) 19:34, 5 May 2008 (UTC)
 * They send the same words out to different people and keep the answer that agrees. 86.156.199.238 (talk) 03:49, 14 June 2008 (UTC)
 * If you're lucky and get a word that hasn't yet been shown enough times for them to have a consensus, you can just write gibberish for that word and focus on reading the other one correctly, and the system wouldn't be able to tell that the gibberish doesn't match. --TiagoTiago (talk) 00:42, 10 September 2011 (UTC)
 * Unless I am reading the article wrong, they send out two words, one UNKNOWN and one KNOWN. If the user answers the KNOWN word incorrectly, they fail the captcha. However, if the known word is typed correctly, the user passes the captcha regardless, and the answer to the unknown word is passed to the captcha service for evaluation as a possible solution (explained in the article as a points-based evaluation). So to "pass" the unknown word incorrectly would also require them to correctly type the known word anyway. In terms of how a computer passes or fails a user on the test, all it does is compare the answer you give it to the answer it KNOWS is right on one of the words. MrZoolook (talk) 08:51, 17 September 2011 (UTC)
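
The known/unknown scheme described in the replies above can be sketched as follows. This is a simplified illustration, not reCAPTCHA's actual implementation: the names `submit`, `consensus` and `votes` are hypothetical, and a plain vote threshold (`min_votes=3`, an arbitrary assumption) stands in for the points-based evaluation the article mentions.

```python
from collections import Counter, defaultdict

# unknown-word id -> tally of answers typed by users who passed the control word
votes = defaultdict(Counter)


def submit(control_word, typed_control, unknown_id, typed_unknown):
    """Pass or fail the user on the KNOWN word only; record the UNKNOWN guess on a pass."""
    if typed_control.strip().lower() != control_word.lower():
        return False  # wrong control word: fail, and ignore the other answer
    votes[unknown_id][typed_unknown.strip().lower()] += 1
    return True


def consensus(unknown_id, min_votes=3):
    """Return the agreed reading once one answer has at least `min_votes`, else None."""
    tally = votes[unknown_id]
    if not tally:
        return None
    answer, count = tally.most_common(1)[0]
    return answer if count >= min_votes else None
```

In this sketch a single gibberish answer for the unknown word is recorded but never becomes the accepted reading unless enough other users type the same thing, which matches the "keep the answer that agrees" explanation above.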

But then, it's not using wasted "most precious resource: human brain cycles"; it's asking for additional human brain cycles to resolve their digitization problem while using the same method as any other captcha to detect bots... If they took out the words that their OCR can't read, wouldn't they have the same security but be easier for users? — Preceding unsigned comment added by 83.35.27.55 (talk) 13:58, 21 March 2012 (UTC)

logo
They should have a logo on it. —Preceding unsigned comment added by 76.68.10.113 (talk) 00:40, 29 April 2008 (UTC)

I got permission from Recaptcha to use their logo. However, I do not know how to put it on. —Preceding unsigned comment added by 67.142.130.12 (talk) 19:42, 30 May 2009 (UTC)

OK, I added a logo and then moved the previous image to the Operation section. --Tim1357 (talk) 01:56, 2 August 2009 (UTC)

Where is the reCaptcha Data??
Does anyone know where the plaintext data from reCaptcha is? There is nothing on this webpage about it, no link. Also, there is nothing in the F.A.Q. or the wiki on recaptcha.net either, it basically just explains how to implement the API! 76.14.42.156 (talk) 22:18, 15 August 2008 (UTC)

Couldn't this be abused?
Theoretically, since I would assume it's random which of the 2 words is known, couldn't you get lucky and intentionally type one word wrong, but still pass the captcha and mess up the results? 96.227.180.75 (talk) 00:43, 7 August 2009 (UTC)


 * Yes, and it has been done before, by Anonymous, to hack the Time 100 Most Influential People poll. This may be worth mentioning in the article... Oops, I lied --64.9.97.44 (talk) 01:51, 7 August 2009 (UTC)
 * No, the unknown words are run through the recaptchas multiple times, and if the two answers match up, then it moves on to the next one.--Tim1357 (talk) 16:37, 13 August 2009 (UTC)

Yes, I abuse it every day in protest. It's easy. —Preceding unsigned comment added by 173.242.246.70 (talk) 17:45, 6 August 2010 (UTC)

As of 2010, it's well known how to guess the control word for reCAPTCHA. It is NOT a previously validated unOCRable word. The words are now randomly generated syllables, not from English or any other language but pronounceable gibberish in English, never containing punctuation, numbers or uppercase letters. I guess this is meant to defeat dictionary-based or word-frequency-enhanced OCR algorithms that try to beat reCAPTCHA. Try to find RSes on this info and add it to the article. 24.90.127.71 (talk) 20:15, 5 September 2010 (UTC)

My proposal for Wikimedia to use reCaptcha
See my proposal for Wikipedia to use reCaptchas here. --Tim1357 (talk) 16:37, 13 August 2009 (UTC)

Validation of "unknown words"
There is one thing I cannot figure out:
 * "The system assumes that if the human types the control word correctly, the questionable word is also correct."

That would mean that the user actually needs to type only one of the two words correctly to pass through the captcha. However it does not seem to be the case. The Science 2008 article is not very clear about it: they merely state that "suspicious words" are sent to multiple users, but that doesn't answer how the system can recognize the word as correct in the beginning of the process? A clarification would be useful. Calimo (talk) 14:00, 10 April 2010 (UTC)
 * Only one word, the known one, needs to be correct. Obviously it would be hard to guess which one that is. --DieBuche (talk) 13:07, 28 May 2010 (UTC)

No, it's easy actually, because they are obvious.

4chan abusing reCaptcha
People on 4chan are encouraging each other to replace the unidentifiable word (now very obvious due to the high traffic through the site, and the illegibility of some of the Captchas) with N*****. As the article says, if enough people do this, it will be accepted as the proper word and we'll have NYT articles with racist words all over the place. 203.206.64.167 (talk) 17:46, 31 August 2010 (UTC)
 * This won't work because 4chan just doesn't represent very many people. To successfully fake one of the inputs, two or three 4chan users would need to all see the same word. As the article says, over 100 million recaptcha images are displayed daily. I don't think any internet group is large enough to have a non-trivial chance of faking it. Even were they able to accomplish this, it would be trivial to set up a filter such that any suspicious solutions were flagged for verification, and for all we know they already HAVE such a filter. 203.217.150.69 (talk) 00:07, 21 January 2011 (UTC)
 * I guess you are right, but on the other hand it is a fact that 4chan has a very fast posting rate per user due to the extremely low quality standards.--81.223.23.159 (talk) 09:12, 26 January 2011 (UTC)

Non-Profit?
Is ReCAPTCHA a non-profit organization? If not, why are websites implementing it instead of any other service that offers the same? There must be enough non-profit organizations that are trying to digitize old texts.

Is the only reason that the website administrator can outsource the work of managing a captcha?

Anyway, it should be mentioned whether it is a for-profit or non-profit organization.

138.246.51.109 (talk) 15:56, 12 November 2010 (UTC)

The NYTimes is profiting from all the reCAPTCHAs you've been filling in for free:

http://www.nytimes.com/ref/membercenter/nytarchive.html — Preceding unsigned comment added by 190.134.5.228 (talk) 08:09, 9 June 2011 (UTC)

Agreed - and what's more, neither Carnegie Mellon nor Google emailed webmasters to inform them when the service was sold to Google. This wasn't just a change of ownership, it was a change of social purpose. The end result is that since last year webmasters like myself have thought they were contributing to an academic exercise when in fact they've been contributing to Google, NYT, etc. Surely a worthwhile note in the "Criticism" section if ever it returns. — Preceding unsigned comment added by 124.180.219.104 (talk) 13:46, 1 February 2012 (UTC)

Removing Criticism Section
It has no references and makes little sense. Ftc08 (talk) 05:14, 19 January 2011 (UTC)


 * Fair enough. Coverage of criticism is welcome so long as it is referenced to reliable sources and not disproportionate. It is certainly not for individual editors to air their personal grievances. --DanielRigal (talk) 15:59, 22 January 2011 (UTC)

Fair enough? NO NO NO! Want references? Type "reCaptcha annoying" or "reCaptcha disability" into Google and there you have your references. Why not add some references like http://www.bbc.co.uk/news/magazine-18367017 instead of deleting a valid section? What you have done by requesting and/or doing deletion is to lazily censor valid criticism instead of improving the article. What an outrageous abuse of Wikipedia. I would put a criticism section back in, but suspect it would be deleted and so why should I bother. Seriously, this is utterly unacceptable. Zctyp18 (talk) 23:21, 5 January 2014 (UTC)

Privacy implications
There seems to be no mention of anything about the potential privacy implications of using such a third party service, compared to using a local server-side implementation. Users must accept that their HTTP-Referer (and potentially other information, this has to be verified) tell the remote party (in this case Google) the site they are visiting. This then becomes similar to using third party counters or ad services (and in some contexts may be called a web bug). However, although proxy administrators may easily block ad syndication and counter sites, blocking a captcha provider also prevents the sites that utilize them from functioning properly. Fortunately, proxies can also forge HTTP-Referers, but that's hardly a default or common configuration, and it still discloses the IP address and browser information (some browsers also disclose OS version, installed plugins via the Accept header, etc). Basically it gives a lot of power to the third party service provider. In fact, even the server becomes vulnerable to the employees of such a third party, since they rely on it for a user-verification feature, but that's another topic. Ah, the joys of outsourcing... 66.11.179.30 (talk) 06:39, 24 January 2011 (UTC)
 * This is not specific at all to ReCaptcha. All third party tools on the web are affected. This topic would deserve its own article rather than a section here. Calimo (talk) 08:10, 24 January 2011 (UTC)
 * I agree that it's a more general topic which is (or should be) described elsewhere. However, considering this article has a "Security" section, it should probably still link to the relevant article(s) with a short note about the issues. This article is not about just any web feature (it is about one which in part controls user access, and is popular), nor about just any owner company, so this information becomes of increased relevance. It appears that some recommend that Wikimedia itself use the service in the future, too (of course that's off-topic, but an example of the service's popularity and the potential implications). 66.11.179.30 (talk) 08:52, 24 January 2011 (UTC)

Also, it should be added that reCaptcha can be a bad thing, as a user may be used to solve someone else's captcha. We don't know if the second word is from "a book that needs digitizing" or is some other captcha from another legit site. So if you want entry to a site that uses reCaptcha, you may be letting it use you as a bot to solve another site's captcha. — Preceding unsigned comment added by 141.237.108.140 (talk) 13:47, 18 March 2013 (UTC)

Facebook not using re-Captcha anymore
I tried creating a new account on Facebook. After email and name validation, a captcha is required, and it doesn't seem to be a re-Captcha. If I remember correctly, it was one before. Should the article be updated? — Preceding unsigned comment added by Pannini (talk • contribs) 18:04, 19 February 2011 (UTC)

It seems to be inconsistent; sometimes it's used and sometimes not. At a guess, Facebook was exhausting the reCAPTCHA resources. Facebook alone would be hitting millions of these every day. 49.193.43.13 (talk) 07:47, 29 June 2011 (UTC)

Inaccurate images
The "captcha challenges" in this article, in both the 2007 and 2009 examples, are not a single challenge as the text makes them out to be but the computer generated words from two separate captchas put onto a single image. This is relatively easy to tell with experience, and is both inaccurate and annoying. 66.183.75.131 (talk) 12:42, 25 April 2011 (UTC)


 * The spacing looks a little wide in the 2007 example, but that's subjective; I wouldn't call the image a combination of two separate captchas based on that alone. —C.Fred (talk) 15:38, 25 April 2011 (UTC)


 * Commonly however, there is little to no distortion added to the "real" book word. On both the images, the distortion usually reserved for computer generated words is applied, leading me to believe that it is a composite. 70.78.12.203 (talk) 00:23, 1 May 2011 (UTC)


 * I'd like to draw attention back to this. In addition, reCaptcha has changed yet again. To use an unfortunately horrid example, but the only one I can think of with a prominent reCaptcha, 4chan.org (NSFW) has the latest captcha. You can also see from that one that one word is a scan and the other is computer generated. 173.180.124.102 (talk) 07:38, 3 August 2011 (UTC)


 * Untrue. I made the second image and it is not manipulated at all. I assume you did *read* the article right? You know recaptcha works by providing two separate words? TimmmmCam (talk) 17:43, 31 August 2011 (UTC)


 * Actually, he means one of the two words/phrases is undistorted because it is a scan from a book. This is true, I have never seen a captcha where both words were distorted in the manner shown here. 68.0.150.57 (talk) 03:48, 24 June 2012 (UTC)


 * These images are accurate for the time they were captured. They used to distort both words. I believe that led to some unsolvable words, which is why it was eventually changed. ReCAPTCHA has been through so many iterations that it would be nice to have a timeline of pictures to show its evolution. Sully343 (talk) 05:59, 12 September 2012 (UTC)

New York Times finished?
The first paragraph of the article mentions "Twenty years of The New York Times have been digitized and the project planned to have completed the remaining years by the end of 2010." Anyone know if this has been completed? The page should probably reflect more recent progress and projects. Paulish (talk) 05:26, 13 July 2011 (UTC)
 * Second this. Zeyra (Zeldakitten or so) 18:41, 5 August 2011 (UTC) — Preceding unsigned comment added by Zeldakitten (talk • contribs)
 * According to the website and the NYT, they are now working on Google Books. It's unclear whether the Times project is actually finished, though. 8/ -- Beland (talk) 19:06, 16 March 2012 (UTC)

ReCaptcha a service of Google Inc?
It would appear that ReCaptcha is operated by Google Inc, but the article has no mention of this. E.g. http://www.google.com/recaptcha/terms - see section 4, if the URL does not make it clear all by itself.
 * Yeah, nowadays recaptcha.net forwards to http://www.google.com/recaptcha and Luis von Ahn's website mentions that reCAPTCHA and the ESP game have been acquired by Google, though I can't find any mention on when that happened and on what terms. Does anybody know?