Studies have shown that people who have recently read online obituaries are more likely to rent cars for the weekend. Why this is true isn't exactly clear to Dave Morgan, founder of Tacoda Inc., an online advertising company that was acquired by Aol. in 2007 for $275 million. But the correlation in the data is significant enough that Avis, Hertz, and Enterprise Rent-a-Car ads should start appearing in front of you soon after you have read about the passing of an old friend, a loved one, or (as is often the case when reading obituaries) someone you didn't know at all.
Consumers today are knowingly and unknowingly providing businesses with more data than they've ever been capable of collecting before. Internet entrepreneurs, privacy analysts, and business consultants alike believe that for the next fifty years, capitalism around the world will (for better or worse) be focused on sussing out what all this data actually means. "We are finding things that are completely non-intuitive," says Morgan. "This is just the very beginning of this enormous explosion of information being available about what people do, how they react to information, and how they interact with each other."
Most big businesses today are already data businesses at heart. Facebook, which claims to help you connect and share with the people in your life, is worth a reported $33.7 billion largely because 400 million people have shared with it the details of their personal lives. Google processes about a petabyte of information every hour. To put this seemingly insignificant number in perspective, that's 1,125,899,906,842,624 bytes, or about a fifth of all the information delivered in letter form by the USPS in a year. What's worse—or better, if you're in the data business—businesses are not only recording your data while you're surfing the Web. Long gone are the days of a bifurcated online and offline world.
"There are ways that we probably don't think about that we're sharing information," says Mike Spinney, Senior Privacy Analyst at the Ponemon Institute. "Whether it's an E-Z Pass, or a public transit card or swiping your credit card to make micro-purchases." Wal-Mart collects data from more than 278 customer transactions every second.
Our growing use of digital technology is creating so much "data exhaust," as industry insiders call it, that entire economies are forming, and will continue to form, purely around the collection, preservation, protection, implementation, and, most importantly, understanding of our data.
"The big trend that's just starting now and will continue for the rest of the foreseeable couple of decades is the creation and manipulation of vast quantities of data in a meaningful way so that you can learn from the data in ways that are actionable," says Jim Cortada, Director at IBM's Institute for Business Value. He likens the problem of data management and analysis to the challenge of herding sheep. "One of the things that's happening is the emergence of data management tools, hundreds of them," says Cortada. "Think about data like a whole bunch of sheep on a hillside: you gotta get them in. Herders use dogs. Businesses are increasingly using software to get the data herd in."
The companies that will thrive most in the coming decades are those that manage to aggregate and recycle their consumers' data toward improving their products or services in an automated fashion. Google, for example, hones its search algorithm with each of the 35,000 queries it receives every second. Amazon tracks the books you buy, but also records your browsing history to better recommend other books you might like to buy. Netflix uses a collaborative filtering process to make suggestions based on data from other people's preferences.*
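For readers curious about what "collaborative filtering" means in practice, here is a minimal sketch of one common variant, user-based filtering. It is not Netflix's actual system, which the article does not describe. The viewers, titles, ratings, and function names are all made up for illustration. The idea is to find people whose tastes resemble yours, then score the titles you haven't seen by how those people rated them.

```python
from math import sqrt

# Hypothetical ratings (1-5 scale); every name and number here is invented.
ratings = {
    "ana": {"Film A": 5, "Film B": 4, "Film C": 1},
    "ben": {"Film A": 4, "Film B": 5, "Film D": 5},
    "cat": {"Film A": 1, "Film C": 5, "Film D": 2, "Film E": 3},
}


def similarity(a, b):
    """Cosine similarity computed over the titles both viewers have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings):
    """Rank titles the user has not rated by a similarity-weighted average
    of other viewers' ratings (an assumed scoring rule, chosen for brevity)."""
    seen = all_ratings[user]
    weighted_sum = {}
    weight_total = {}
    for other, their_ratings in all_ratings.items():
        if other == user:
            continue
        sim = similarity(seen, their_ratings)
        if sim <= 0:
            continue
        for title, score in their_ratings.items():
            if title in seen:
                continue
            weighted_sum[title] = weighted_sum.get(title, 0.0) + sim * score
            weight_total[title] = weight_total.get(title, 0.0) + sim
    predicted = {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}
    # Highest predicted rating first
    return sorted(predicted, key=predicted.get, reverse=True)


suggestions = recommend("ana", ratings)
print(suggestions)
```

In this toy data, "ana" rates films much as "ben" does and almost the opposite of "cat," so ben's enthusiasm for "Film D" counts for more than cat's lukewarm rating of it. Production systems work with far more data and more elaborate models, but the underlying intuition is the same.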
Jeff Hirsch, CEO of AudienceScience, a data management and audience targeting platform, predicts that recommendation systems that consider not just your purchase history and search behavior but also seemingly unconnected data (like whether you've read an obituary lately) will soon be digitally ubiquitous. "The consumer is going to be able to go onto a digital device and get exactly what they want," says Hirsch. "We'll look back at the days when we used to have to search for what we wanted and think it was archaic." Television, he says, is a medium that could do a better job of "pushing" consumers what they're looking for. "If I watch various shows, why doesn't my TV give me the shows I like?" he asks. Once companies begin to find meaningful and actionable information in their data, only their imagination will limit the possible applications of these types of smart recommendation systems.
This increase in targeting and automation comes with its own set of challenges. The May 6th Flash Crash and the mortgage crisis more broadly are just two examples of what can happen when businesses rely too heavily on automated processes for managing their risk. Furthermore, some companies may begin to automate their sales behavior based on data that many would consider private, and in ways that may be considered predatory. "There has already been very significant use of very personal data," says Morgan. "Credit card companies buy your bank account balance from your bank and they don't buy it anonymously, they buy it with your name and address. They say 'So-and-so bounced three checks in the last month, maybe they need a prepaid credit card.' But you don't know that. All you know is that you get a prepaid credit card in the mail. The analog world is very slow to understand what's going on."
For example, you may be surprised to know that a simple visit to Dictionary.com results in 223 different businesses placing third-party tracking files on your computer, each of which assigns it a unique identifying number that marketing and data-gathering companies use to record your behavior online.** The more sophisticated of these files even record everything that you type into your browser. Over the past few years a robust market has emerged around the buying and selling of these files. The marketing and advertising industries have been at the forefront of using them to target advertisements at online audiences more efficiently.
Privacy analyst Mike Spinney understands why people may find it creepy for businesses to be collecting so much of their data, but he says the Luddites amongst us should not be too concerned about their personal data being abused by corporate entities. "People may think industry self-regulation does not work — that they're inherently evil — and yet the opposite is true. They recognize there's a fine line and they do not want to cross it." Spinney, Morgan, and Cortada all agree that though the increased use of data may at times seem invasive, successful businesses will always be most concerned with using it to provide their consumers a better product or service.
In the future, Morgan predicts, we could well reach the "holy grail of advertising," where we receive only ads that we like and are therefore happy to get. Cortada wonders whether current forms of technology will be capable of handling the enormous amount of data that the future holds in store. "I am even questioning whether we're going to have computers made out of metal or composite material," he says. "As a human race we're going to need a whole series of different platforms." Biology, he says, provides a good template for foreseeing the complexity of the data feedback systems that we can expect to develop down the road. "If you take a look at how much data there is in living human cells, the patterns of data collection that the body uses to be a successfully functioning organism are colossally larger than anything we do today."
No one knows how long it will take before businesses begin implementing the kinds of complex feedback systems that biologists see in nature, but for now one thing is for certain: the world is sitting at the foot of what will continue to be an unfathomable mountain of data with the potential to profoundly revolutionize much more than just the way that businesses target us with pesky advertisements. There is already so much data, in fact, that the very thought of beginning to mold it into useful information is enough to make one throw their hands up in the air and give up. "We are almost at a point now where trying to do an inventory on all this data is almost a superfluous exercise," says Cortada. "It's like trying to count all the stars in the sky."
* For a comparison of Amazon's and Netflix's recommendation engines, check out this blog post by Chris Dixon.
** For a fascinating look at the number of tracking files each of the 50 most popular Web sites places on your computer, explore the data in the Wall Street Journal's "What They Know" interactive series.