Computer Uses Ordinary Journalism to Forecast the Future

At the end of War and Peace Tolstoy compares belief in free will to medieval cosmologies where the Sun revolved around the Earth. To know the true cosmos, he writes, people had to give up their certainty that the ground beneath them was still, "and to recognize a motion we do not feel." So to understand history, people have to recognize their individuality counts for nothing—that the real causes of our actions are to be found in vast social patterns of which we are a tiny and unwitting part. That's a difficult viewpoint to hold onto for a species that focusses so much brainpower on individual personalities and relationships. But maybe computers can help. That's certainly the impression I took from this mind-blowing paper in the online journal First Monday. It describes a technique for finding hidden social motion by analyzing the texts of news stories over time. This "window into national consciousness," it claims, predicted upheavals in Tunisia and Egypt and approximated the location of Osama bin Laden within 200 kilometers.

Author Kalev H. Leetaru's approach uses huge databases of news stories from around the world (translated into English on a daily basis for American and British intelligence purposes) and performs on them the sort of analysis I would have thought required a human. The system, he writes, engages in "tone mining," to take the measure of national mood; geocoding, to deduce the location of news subjects from the location of stories about them; and network analysis, to show who is reading (or viewing or listening) about whom.

Leetaru says this work is an extension of the concept of "culturomics" (which I wrote about here). Culturomics, as launched in this paper last October, is retrospective, measuring changes in how often published words are used over decades or centuries. Leetaru's "Culturomics 2.0," by contrast, works on the near-real-time evidence of the news cycle, and it assigns meaning to the frequency changes. For instance, it tallies negative words ("terrible," "awful") and positive ones ("good," "nice") to score news stories' sentiments.
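To make the word-tallying idea concrete, here is a minimal sketch in Python. It is not Leetaru's code: the two tiny word lists reuse only the examples quoted above, and the function name `tone_score` and the choice to divide by total word count are assumptions made for illustration.

```python
import re

# Illustrative word lists only (an assumption); a real tone dictionary
# would contain far more entries than the four examples quoted above.
POSITIVE = {"good", "nice"}
NEGATIVE = {"terrible", "awful"}

def tone_score(text: str) -> float:
    """Return (positive count - negative count) as a share of all words.

    Positive results suggest an upbeat story, negative results a gloomy one.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return (positives - negatives) / len(words)
```

Averaging such scores over every story published in a given month would give one tone value per month, which is the kind of series the rest of this post discusses.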

Human analysts have been doing this kind of thing for governments for decades (among the many things I learned from Leetaru's paper is that more than 80 percent of the "actionable intelligence" that the Cold War West got about the Soviet Union came out of this kind of work done on newspaper articles, conference proceedings, news broadcasts, technical reports and similar non-secret sources). That computer algorithms can do this sort of work (and are being used by corporations to monitor their brands) is interesting, but the big news in the paper is this: Leetaru says a computer's score of the emotional tone of journalism and other open sources in a nation can predict when conflict is likeliest to occur there.

For example, his system analyzed a collection of the British Summary of World Broadcasts' 52,438 articles in any language from January 1979 until March 2011 that mentioned an Egyptian city (in other words, it included both Egyptian sources and foreigners' views of the country). The computer's score for the aggregate emotional tone of the articles showed a plunge toward negativity in January 2011. The drop was equalled only by January 1991 (the beginning of the first Iraq War) and nearly equalled in March 2003 (the start of the U.S. invasion of Iraq). An analysis of Egypt-only and Arabic-only sources from the same database showed the same pattern, but with a less extreme swing downward, which Leetaru attributes to censorship.
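One simple way to spot a "plunge" like this in a monthly tone series is to ask how far each month sits below the long-run average. The sketch below is an assumption about how such a check could work, not a description of the paper's method; the function name `unusual_months`, the month labels, and the cutoff of two standard deviations are all hypothetical.

```python
from statistics import mean, pstdev

def unusual_months(monthly_tone: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Return the months whose tone lies at least `threshold` standard
    deviations below the mean of the whole series."""
    values = list(monthly_tone.values())
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # Every month has the same tone, so nothing stands out.
        return []
    return [
        month
        for month, tone in monthly_tone.items()
        if (tone - average) / spread <= -threshold
    ]
```

Given thirty years of monthly scores, a call such as `unusual_months(scores)` would return only the handful of months that stand far below the historical norm.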

"While such a surge in negativity about Egypt would not have automatically indicated that the government would be overthrown," Leetaru writes, "it would at the very least have suggested to policy–makers and intelligence analysts that there was increased potential for unrest." An additional indicator, he adds, is that the 13,061 stories in the database that mentioned Hosni Mubarak showed the most negative tone in three decades, in the weeks before the Egyptian revolution began.

Interestingly, despite the Internet's rep for unequalled reaction-time, a cross-check with a database of web-only news showed that the tone there followed the mainstream non-American journalistic outlets by about a month. In turn, articles in The New York Times lagged behind the web sources.

More surprising, to me anyway, was Leetaru's attempt to see if geocoding of news sources could be used to find a prominent person. To do this, he crunched all the articles in the Summary of World Broadcasts that mentioned "bin Laden" between January 1979 and April 2011, coding every geographic reference. Northern Pakistan is the most frequently mentioned geographic area in the articles, the analysis found. And two cities there, Islamabad and Peshawar, were among the five most-mentioned non-Western cities in the texts. Hence, Leetaru writes, "global news content would have suggested Northern Pakistan in a 200 kilometer radius around Islamabad and Peshawar" as the place to hunt for bin Laden.
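The counting step behind that result can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's pipeline: the three-entry `GAZETTEER` (with approximate coordinates) and the function names are hypothetical, and real geocoding has to cope with ambiguous and misspelled place names that this simple substring match ignores.

```python
from collections import Counter

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "islamabad": (33.7, 73.0),
    "peshawar": (34.0, 71.6),
    "kabul": (34.5, 69.2),
}

def place_counts(articles: list[str]) -> Counter:
    """Count how often each gazetteer place is mentioned across all articles."""
    counts = Counter()
    for text in articles:
        lowered = text.lower()
        for place in GAZETTEER:
            counts[place] += lowered.count(place)
    return counts

def top_place(articles: list[str]) -> tuple[str, tuple[float, float]]:
    """Return the most-mentioned place together with its coordinates."""
    place, _ = place_counts(articles).most_common(1)[0]
    return place, GAZETTEER[place]
```

Run over every article mentioning a person, a tally like this produces a ranked list of locations, and the top entries mark out the region the coverage points to.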

Well, not too many points for being right—this analysis, like the one on Egypt, was done retrospectively to test the system. I hope if similar indicators crop up in the future, Leetaru will be willing to make some forecasts, just to see if the project works in real-time conditions. For the moment, though, there's no denying that it's a fascinating set of results.

Every time I look at this Tolstoyan approach to human behavior (for instance here and here and here), I'm struck by its eeriness. It is hard to wrap my mind around the notion that the real causes and effects of our actions are hiding in plain sight all around us, traceable in the ups and downs of the stock market, or the rise and fall of hemlines. It is especially hard to envision what the chain of causes could be that links adjectives chosen by journalists with some individual's decision to set himself on fire. It all has an air of haruspicy, somehow.

Still, if ever humanity can find a way to describe society's motions that we do not feel (which, of course, will have to also include a description of the effects of the description), politics will never be the same.
