Do Your Google Searches Reveal the Real You?

Data Scientist, Contributing Writer at The New York Times

Seth Stephens-Davidowitz has a sneaking suspicion that everybody lies, at least to each other: we seem to be far more honest with a website than with other people. The things that people type into the Google search bar, Stephens-Davidowitz says, reveal far more about a person than any in-depth interviewer could ever dream of, even how racist someone can be. What's alarming is that prior to the 2008 election, Stephens-Davidowitz saw a big uptick in racist searches coming from unexpected places. He had expected most of these searches to come from the South, but he was shocked to see them coming from Michigan, Pennsylvania, and more. And to cap that off, these were hardly fringe searches: they were as frequent as searches for mainstream terms like Lakers, migraines, and The Daily Show. Stephens-Davidowitz is the author of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.

  • Transcript


SETH STEPHENS-DAVIDOWITZ: So, for the past 80 years, if you’ve wanted to know what people want, why they do the things they do, what they’re going to do, you’ve asked them in a survey. But people may lie to surveys. It’s been shown, though, that people are really, really honest on certain internet sources, particularly their Google searches, so they tell Google things they might not tell anybody else. They might not tell family members, friends, surveys, even themselves sometimes. And by mining this data we can get better insights into who we are.
All the data I analyze is anonymous and aggregate, but you can see patterns in this. So, for example, before an election if you ask people in a survey, “Are you going to vote,” pretty much everybody says yes, or a huge percentage of people say yes, even if they have absolutely no intention to vote just because it makes them feel good to tell a survey that they’re voting.
But you can actually see—based on where searches for how to vote or where to vote are highest—how high turnout really will be in different parts of the country. And that’s a very accurate predictor of who is actually going to vote.
When I started this research I measured how frequently people made racist searches in the United States. And the searches I looked at were very, very strikingly racist searches looking for basically jokes mocking African Americans. And I was struck by how frequently people are making these searches. In the time period I was looking at, it was as frequent as searches for “Lakers” and “migraine” and “Daily Show,” so not a fringe search. And I was also shocked by where these searches were located. If you had asked me, “Where’s racism highest in the United States?” before I saw this Google data I would have said the South.
If you think of our country’s history, of slavery, the Civil War, we usually think of racism as a Southern issue. But the places with the highest racist search volumes included western Pennsylvania, eastern Ohio, upstate New York, industrial Michigan.
The real divide in racism these days is not north versus south. It’s east versus west, where you get a lot less of this west of the Mississippi River as compared to east of the Mississippi River.
And then if you remember the 2008 election, all the way back then when Barack Obama was elected president, after the election there was this question: did people care that he was black? And Gallup asked people and some other surveys asked people and 98 percent, 99 percent of Americans said “No, no, no, no—of course not. This was not a factor in our voting decision.”
But you actually see very, very clearly in the data that Obama did far worse than previous Democratic candidates in places with higher racist search volumes. And this isn’t explained by anything else in the data. It’s not explained by demographics or gun ownership or church attendance or liberalism. The main factor that predicts where Obama did worse than other Democrats is how frequently people there made racist searches on Google.
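The claim that the search measure still predicts Obama's underperformance after accounting for other factors can be illustrated with a small regression sketch. This is not the analysis behind the interview, which is not described in enough detail to reproduce. It is a minimal illustration that assumes a single control variable and uses purely synthetic data. The names `vote_gap`, `search_rate`, `control`, and the value of `TRUE_EFFECT` are hypothetical, chosen only to show how a predictor's slope changes once a correlated control is removed from both sides.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def partial_slope(target, predictor, control):
    """Slope of target on predictor after the control's linear effect
    has been removed from both (the Frisch-Waugh-Lovell approach)."""
    return fit_line(residuals(control, predictor),
                    residuals(control, target))[1]


# Synthetic "media markets": every number below is made up for illustration.
rng = random.Random(0)
n_areas = 500
TRUE_EFFECT = -2.0  # hypothetical effect built into the fake data

control = [rng.gauss(0, 1) for _ in range(n_areas)]  # e.g. a demographic index
search_rate = [0.5 * c + rng.gauss(0, 1) for c in control]
vote_gap = [TRUE_EFFECT * s + 1.5 * c + rng.gauss(0, 1)
            for s, c in zip(search_rate, control)]

naive = fit_line(search_rate, vote_gap)[1]                 # ignores the control
adjusted = partial_slope(vote_gap, search_rate, control)   # accounts for it

print(f"naive slope:    {naive:.2f}")
print(f"adjusted slope: {adjusted:.2f}")
```

In this toy setup the adjusted slope lands closer to the effect built into the fake data than the naive one does, which is the sense in which a relationship "isn't explained by" a control. A real analysis would use several controls at once and a proper statistics library.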
Anyway, this data kind of languished I think on my website for a while, but then during the 2016 Republican Primary some data journalists got data on where Trump was doing best in the Republican Primary. Trump, of course, was saying some very, very racially charged things and people were expecting that these were gaffes, that he would collapse because he was saying things that you are not supposed to say, you know: retweeting false statistics about how frequently African Americans commit crimes, or not repudiating support from a former leader of the KKK, saying that Black Lives Matter protestors should be roughed up. And then what these data journalists found is, the single highest predictor of where Trump was doing well was the measure of racist searches on Google. So the same hidden racism that was secretly hurting Obama was waiting for a candidate to support and helped Trump tremendously in the Republican Primary.
So in the 2016 election one thing that was very, very clear in the data is that African American turnout was going to be way down. Because if you looked at cities with 90 percent black populations or 95 percent black populations, searches for information on voting were way lower than in the previous two elections. So this was a clue that Trump might do better than expected because Clinton wasn’t going to get the same black support that Obama had gotten.
And I think as far as predicting the elections just based on searches, we’re not really there yet because we haven’t had enough elections to test the models on. We’ve only had a few elections to build the models, whereas polls have been building models over many, many elections. I think there definitely are clues in searches for which way people will go. They’re a little more subtle than we usually think. So, for example, you can’t predict which way a state is going to go based on how frequently people there search for a candidate. It’s not like places that search for Trump more go for Trump and places that search for Clinton more go for Clinton. The problem, and you can probably guess it, is that you might search Trump because you love him or you might search Trump because you hate him. So it doesn’t really tell us too much.
There are some subtle indicators that seem to have predictive power. One that I’ve found is the order in which people search candidates. About 26 percent of searches for Clinton in the previous election cycle also included the word Trump. So people searched for Clinton Trump polls or Trump Clinton debate. And it turns out, interestingly, there’s a subtle clue in which way people will go based on the order in which they list the candidates in their search. So if people search Trump Clinton polls they’re much more likely to go Trump. If people search Clinton Trump polls they’re much more likely to go Clinton.
But it’s going to take a lot of elections to kind of build these models and weight these models and figure out exactly how to translate the searches to vote totals. But I think there is some information in these searches for the purposes of predicting elections.
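The name-order signal described above can be sketched in a few lines. This is a minimal illustration, not the method used in the research: the functions `first_mentioned` and `order_shares` are hypothetical helpers, and the sample queries are a tiny made-up list (the first three echo phrases from the transcript, the rest are invented). It only shows how one might tally which candidate is named first in searches that mention both.

```python
from collections import Counter


def first_mentioned(query, names=("trump", "clinton")):
    """Return the candidate named earliest in the query,
    or None unless every name in `names` appears."""
    words = query.lower().split()
    positions = {name: words.index(name) for name in names if name in words}
    if len(positions) < len(names):
        return None
    return min(positions, key=positions.get)


def order_shares(queries):
    """Among queries naming both candidates, the share listing each one first."""
    counts = Counter(
        first for first in map(first_mentioned, queries) if first is not None
    )
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()} if total else {}


# Illustrative queries only; real work would use aggregate, anonymous data.
sample_queries = [
    "Trump Clinton polls",
    "Clinton Trump polls",
    "Trump Clinton debate",
    "Clinton rally",   # names only one candidate, so it is skipped
    "where to vote",   # names neither candidate, so it is skipped
]

shares = order_shares(sample_queries)
print(shares)
```

Computing these shares per region and comparing them with actual vote totals over many elections is the kind of model building and weighting the passage says still lies ahead.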