Digitizing Old Books Using Human Computation and reCAPTCHA, with Luis von Ahn

Luis von Ahn, CEO of Duolingo and one of the inventors of CAPTCHA, explains how reCAPTCHA harnesses the abilities of both humans and computers in order to accomplish tasks such as the digitization of old books.

Luis von Ahn: So human computation, the idea is that there are problems that computers cannot yet solve. It's funny, because some of these problems are seemingly very simple. For example, a computer cannot tell you what's inside an image. It can tell you some things, but it can't really quite tell you there's a cat next to a dog and they're both running. A computer can't do that. Well, humans, we can do it super easily. And there are many things that computers cannot do that humans can. Conversely, there are also things that computers can do that humans can't. I mean, computers can multiply humongous numbers; humans may be able to do it, but very slowly, and we're error-prone. And so the idea with human computation is to combine humans and computers on a very large scale to solve problems that neither can solve alone.

The project of mine that has been used by the most people is called reCAPTCHA. The idea with reCAPTCHA is that we take a problem that neither humans nor computers can solve by themselves, which is fully digitizing books. The way this process works is you start with a book and then you scan it. The scan is a picture of words, so the next step is that the computer needs to be able to decipher all of the words in this picture. The problem is that sometimes the computer cannot decipher these words, because for older books the ink has faded a little or the pages have turned yellow. But humans can.

So what we're doing with reCAPTCHA: if you've ever seen these distorted letters that you have to type all over the Internet, for example when you buy tickets on Ticketmaster or whenever you get a Facebook account, that thing is called a CAPTCHA, and I was one of the people who helped invent it. Its primary purpose is to make sure that you're a human and not a computer, and it works because humans can read these squiggly characters but computers can't. This is a security mechanism and it has been there for a while, but at some point I realized it could have a second use, which is helping to digitize books. The idea is that nowadays some of these words are actually coming from books, words that the computer could not recognize in this process, and we're using what people enter to help us digitize the books. So that's the idea.
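The division of labor described here can be sketched in a few lines of Python. This is only an illustration, not reCAPTCHA's actual implementation: the `ScannedWord` record, the confidence threshold, and the agreement rule (at least three answers, 60% of them matching) are all assumptions made for the example. The only parts taken from the talk are that the computer reads what it can and that what people type is used for the words it could not recognize.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ScannedWord:
    image_id: str          # hypothetical identifier for a cropped word image
    ocr_guess: str         # what a (hypothetical) OCR engine read
    ocr_confidence: float  # 0.0 to 1.0, assumed to be reported by that engine


def split_by_confidence(words, threshold=0.9):
    """Separate words the computer read confidently from those it could not."""
    accepted = {w.image_id: w.ocr_guess for w in words if w.ocr_confidence >= threshold}
    needs_human = [w.image_id for w in words if w.ocr_confidence < threshold]
    return accepted, needs_human


def consensus(answers, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if people have not agreed yet.

    Answers are lowercased before comparison, a simplification that
    discards capitalization.
    """
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None


def digitize(words, human_answers, threshold=0.9):
    """Combine OCR output with what people typed for the unreadable words."""
    text, needs_human = split_by_confidence(words, threshold)
    unresolved = []
    for image_id in needs_human:
        agreed = consensus(human_answers.get(image_id, []))
        if agreed is None:
            unresolved.append(image_id)  # would be shown to more people
        else:
            text[image_id] = agreed
    return text, unresolved
```

In this sketch a faded word is only accepted once several people independently type the same thing; until then it stays in the `unresolved` list.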

And so this is a project where about 1.1 billion people in the world have helped us digitize at least one word out of a book. So here we're taking a very large number of humans to do precisely the step that computers cannot do in the book digitization process. This is a company that was bought by Google, and by now Google is digitizing the equivalent of about 2 million books a year, basically with humans typing some of the words every now and then through CAPTCHAs all over the Internet. So that's the idea of human computation.

Directed/Produced by Jonathan Fowler, Elizabeth Rodd, and Dillon Fitton

 

Back at the beginning of the century, Luis von Ahn helped invent CAPTCHA, the online security device featuring squiggly letters that you have to re-type in order to prove you're human. In 2007, von Ahn invented reCAPTCHA, a new form of CAPTCHA that serves a second purpose: the digitization of old books.


In this video clip, von Ahn describes how reCAPTCHA works while discussing the power of human computation, a term he helped coin that describes the harnessing of both human and computer abilities in order to accomplish difficult tasks.
