Are You Likely to Commit a Crime? Here’s What Google Data Reveals

Seth Stephens-Davidowitz, Data Scientist, Contributing Writer at The New York Times

Are you a future criminal? You might not think so, says data scientist Seth Stephens-Davidowitz, but what do you look like on paper? Have you ever searched something suspicious online? Ever been curious about a dark topic? Just as in the film Minority Report, where "future murderers" are arrested before they commit their crimes, we have a similar predictive tool ready-made: Google's search data. People really do search for things like 'how to kill your girlfriend' or 'how to dispose of a body', but as Stephens-Davidowitz points out, it’s not supposed to be illegal to have bad thoughts. Beyond privacy and ethics, data science also backs the idea that you can't predict with any accuracy who will commit a crime, as he says: "a lot of people have horrific thoughts or make horrific searches without ever going through with a horrific action." Data also provides intriguing correlations about who will or won't pay back their loans based on a single word used in their loan application, and reveals the questions people in the Bible Belt are too afraid to ask aloud. This kind of data in the wrong hands can leave people vulnerable to discrimination or worse, if society lets its ethics slide. Stephens-Davidowitz is the author of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.

  • Transcript


Seth Stephens-Davidowitz: A big question—if anybody’s seen the movie 'Minority Report' where people are arrested for crimes before they actually commit them, just because the data suggests they’re going to commit a crime—is: are we entering this world with so much data available? And there definitely are clues on the internet that people are considering committing a crime. People really do type “how to kill your girlfriend” on Google or “how to commit a murder”. So what should we do with this information as a society? I think we have to be really, really careful. There’s an ethical and privacy reason to be careful; as a society it’s not supposed to be illegal to have bad thoughts.

But I think there’s also a data science reason for this. One of the things that you do see in this data is that a lot of people have horrific thoughts or make horrific searches without ever going through with a horrific action. So it may be that when we have all this data we think we’re just going to be able to figure out exactly who is at risk of committing a crime or doing something bad, but it may be that it’s just really, really hard because a huge percentage of people look really, really bad on paper but never go through with the action.
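[Editor's illustration] The data science point above is a base-rate argument, and a small calculation may make it concrete. The sketch below is not from the interview: every number in it (population size, share of people who go on to act, how often each group makes an alarming search) is a hypothetical placeholder chosen only to show the arithmetic, and the function name is made up for this example.

```python
def share_of_flagged_who_act(population, base_rate,
                             search_rate_if_actor, search_rate_if_not):
    """Among people who made an alarming search, what fraction go on to act?"""
    actors = population * base_rate          # people who will actually act
    non_actors = population - actors         # everyone else
    flagged_actors = actors * search_rate_if_actor
    flagged_non_actors = non_actors * search_rate_if_not
    return flagged_actors / (flagged_actors + flagged_non_actors)


# Hypothetical inputs, NOT real statistics:
#   1,000,000 people, 1 in 10,000 will act,
#   half of those make the search, and so do 1% of everyone else.
precision = share_of_flagged_who_act(
    population=1_000_000,
    base_rate=0.0001,
    search_rate_if_actor=0.5,
    search_rate_if_not=0.01,
)
print(f"Share of flagged searchers who would act: {precision:.2%}")
```

Under these assumed numbers, roughly 50 of about 10,000 flagged people would ever act, so well over 99% of the people who "look bad on paper" are false alarms. Different assumptions change the exact figure, but the pattern holds whenever the action is rare and the search is comparatively common.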

There is a study of Prosper, a peer-to-peer lending firm. So people can apply for loans, and scholars analyzed what people wrote in their loan application, and whether they paid back their loan. And they found that you could predict whether someone will pay back the loan based on the words that person used in their loan application. So if a person uses the phrase “I promise” they’re much less likely to pay back a loan, because I guess everybody lies, so “I promise” is a clue that you’re not going to pay back the loan. And one of the more striking indicators, one of the single highest indicators you’re not going to pay back the loan, is if you use the word “God” in your loan application. And this is kind of a little bit eerie and suggests a potentially dark future. It means that someone, a lender, would be “wise” to not give a loan to anybody who mentions God. If someone says, “God bless you” in a loan application they’re put together in a large group of other people who tend, on average, not to pay back their loans.

So there’s real danger to some of this big data where a lot of the correlation—everything kind of correlates with everything else, and sometimes for reasons that we don’t understand, some words people use, or likes they have on Facebook, predict that they’re going to do bad things, even if they’re not really going to do bad things, and they may be punished without even realizing why.

One thing you see in the Google search data related to religion is the questions people have, and they’re usually concentrated in the Bible Belt. But people have kind of loaded questions about God. So “why does God allow bad things to happen to good people?” or “why does God allow suffering?” or “why does God need so much praise?” These are questions that people might not raise aloud because they don’t want to share their doubts with others, but they turn to Google and ask some really, really loaded questions about some of the stories that they hear related to religion.