Wednesday, June 27, 2007

reCAPTCHA & Other Reads

You know how leaving comments on people's blogs sometimes involves having to figure out some squiggly letters that are hard to read? Well, I just learned that some of these are linked to reCAPTCHA, a programme that digitises old scanned texts by having people around the world solve these 'puzzles', all the time. It's pretty interesting, and if you go to the site, you can actually help digitise scanned words. (After 20 'successes', I also realised it's easy to get addicted to reCAPTCHAing.) Anyway, never once did it cross my mind that all those annoying validation processes might serve a noble cause.

Two other pretty good reads:
The Worst Jobs in Science 2007 (from Popular Science) - think 'Whale-Feces Researcher' or 'Elephant Vasectomist'
The Formula (from The New Yorker) - what if we built a machine to predict hit movies?

2 comments:

HuiChuan said...

cool! yeah, heard about it too, that we're actually helping to digitise scanned text. haven't really read about its methods and all yet, but i find it strange that when you type it incorrectly, the computer rejects it. Does the computer already recognize the text?

a passing cloud said...

from the reCAPTCHA website:

"But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR (Optical Character Recognition) is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct."

But some sites just give you random letters mixed with numbers, not actual words. I hate those and always get 'em wrong.
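
For the curious, the scheme quoted above from the reCAPTCHA website can be sketched in a few lines of Python. This is only a rough illustration and not reCAPTCHA's actual code: the class name, the method names and the `votes_needed` threshold are all made up for the example (the website only says "a number of other people"), and a real system would pair each unknown word with a different known word every time rather than a single fixed one.

```python
from collections import Counter


def normalise(text):
    """Ignore case and surrounding spaces when comparing answers."""
    return text.strip().lower()


class PairedWordChallenge:
    """Hypothetical sketch: one unknown word shown next to one known word."""

    def __init__(self, known_answer, votes_needed=3):
        self.known_answer = normalise(known_answer)
        self.votes_needed = votes_needed  # assumed threshold, not from the source
        self.votes = Counter()            # readings proposed for the unknown word

    def submit(self, known_guess, unknown_guess):
        """Return True if the user passes the check.

        Only the known word is actually checked. If it is right, the
        user's reading of the unknown word is counted as one vote.
        """
        if normalise(known_guess) != self.known_answer:
            return False
        self.votes[normalise(unknown_guess)] += 1
        return True

    def accepted_reading(self):
        """Return the most popular reading once enough users agree, else None."""
        if not self.votes:
            return None
        reading, count = self.votes.most_common(1)[0]
        return reading if count >= self.votes_needed else None
```

So a rejection only ever happens because of the word the system already knows. Something like `challenge = PairedWordChallenge("morning", votes_needed=2)` followed by a couple of `challenge.submit("morning", "upon")` calls would make `challenge.accepted_reading()` return `"upon"`, while a wrong guess on the known word is turned away and its reading of the new word is not counted.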