Using Google to Tell Real Science from Fads

Most hot ideas and discoveries fade with time. But some scientific papers are genuine breakthroughs, whose importance only increases as the decades pass. This one, published in Science last week, could be one of the big ones. It describes a database of words from millions of books digitized by Google (4 percent of all books ever printed). It's a fabulous source of ideas and evidence for theories about society, and it's fabulously democratic: Google offers a handy analyzer, the Ngram Viewer, which anyone can use to test an idea. A case in point: yesterday, the social psychologist Rob Kurzban argued that the tool can distinguish between genuine scientific theories and intellectual fads.

Google's viewer graphs how often any word or phrase appears in books over time, as a share of all the words (or all phrases of the same length) in each year's books. A sound scientific hypothesis, Kurzban reasoned, ought to climb steadily over the years as it is tested, validated, re-validated, and taught. A theory that flunks the reality test, on the other hand, should peak and then ebb away as people lose interest in it. Real science should generate rising slopes, while faddish notions should be mountains, sloping up and then back down to intellectual oblivion. Like Daniele Fanelli, whose PLoS ONE paper I posted about last spring, Kurzban wants a reliable way to quantify the difference between science-ish theories and real knowledge.
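Kurzban's rule of thumb is easy to state as code. What follows is a minimal Python sketch under stated assumptions, not Kurzban's own procedure: it assumes you already have a list of yearly frequencies for one term (copied by hand from the Ngram Viewer, say), and the function name `classify_trajectory` and its default cutoff of half the peak are hypothetical choices made for illustration.

```python
from typing import Sequence


def classify_trajectory(freqs: Sequence[float], drop_fraction: float = 0.5) -> str:
    """Label a yearly frequency series as a "slope" or a "mountain".

    Hypothetical rule (not Kurzban's): if the latest value has fallen below
    drop_fraction times the series' peak, the term rose and then fell, so it
    is a "mountain"; otherwise it is still near its high point, a "slope".
    """
    if not freqs:
        raise ValueError("need at least one yearly value")
    peak = max(freqs)
    latest = freqs[-1]
    if peak > 0 and latest < drop_fraction * peak:
        return "mountain"
    return "slope"


# Made-up numbers, for illustration only.
print(classify_trajectory([1, 2, 4, 7, 9]))  # slope
print(classify_trajectory([1, 5, 9, 4, 2]))  # mountain
```

Note that a curve that never rises at all also counts as a "slope" under this rule; a more careful version would check for growth as well as for decline.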

Like many social psychologists, Kurzban isn't impressed with the state of his own field. When he compared canonical concepts from physics and biology, he writes, both had the upward slope of solid science. But big ideas from his training in psychology showed the up-and-down pattern of fashion.

Here is Kurzban's map of terms from physics since 1920, with years on the horizontal axis and each term's relative frequency in that year's books on the vertical. The curves have been smoothed (I think with a smoothing value of about 9 on Google's menu of options).
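Google's smoothing option is a moving average: the Ngram Viewer's documentation describes a smoothing of 1 as averaging each year's value with one year on either side, and larger settings widen that window. A minimal Python sketch of that idea follows; the function name `smooth` is hypothetical, and its treatment of the first and last years (it simply uses whatever neighbors exist) is an assumption that may not match what Google does.

```python
from typing import List, Sequence


def smooth(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average over `window` neighbors on each side.

    Near the ends the window is truncated to the available years
    (an assumption; Google's edge handling may differ).
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n, i + window + 1)
        chunk = values[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


# A one-year spike, smoothed with a window of 1.
print(smooth([0, 0, 9, 0, 0], 1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

A one-year spike ends up spread thinly across 2 * window + 1 years, which is why a curve smoothed at 9 looks gentler than the same data at the default of 3.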


That's quite a contrast to his map of big theories in psychology:


"From this," Kurzban writes, "—and I want to reiterate that this is an extremely crude and very flawed method—you might think that social psychology looks really different from the hard sciences."

Well, maybe. I love Kurzban's notion that the Ngram viewer can be used to map differences in the character of ideas. But my own noodling around with it left me skeptical that it can tell real science from false starts.

Here, for instance, is what I found when I mapped "thermodynamics" since 1850 (with a smoothing of 3, the Ngram Viewer's default, which I used for all the graphs in this post):


Mentions of the term have become less frequent since 1960, but I don't think that's because we've outgrown this great 19th-century discovery. Similarly, references to the Big Bang are fewer lately, but that doesn't mean physics has chucked the idea.

I'd guess that after a time, real scientific theories become part of "what everyone knows," and so are cited less often.

Others, though, get co-opted by popular culture and have to be renamed. Then, a word disappears, but the idea lives on. "Sociobiology" might be a good example. Its occurrences since 1960 show the same up-and-down pattern as the social-psych theories on Kurzban's charts.


I don't think this is because scientists have lost interest in applying evolutionary principles to human behavior. I know that Kurzban doesn't think so either, as he is one of the sharpest advocates of that approach in his field.

Instead, I think the graph shows how sociobiology ceased to be a scientific word. Rightly or wrongly, in the 1980s it became linked with a political and cultural outlook in which science replaced God as the world's authoritative opponent of social change. Instead of a Deity that forbade socialism or polyamory or female fighter pilots, it would be Evolution. This didn't have much to do with science, but it certainly pissed people off, and the resulting culture war looks to have killed off the word "sociobiology." The idea that evolutionary reasoning applies to psychology, though, is still around. Its rebranded moniker, "evolutionary psychology," is sloping up:



So I don't think a downward slope in Ngrams is a sure diagnostic of scientific faddishness.
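If the word dies but the idea lives on, one rough remedy is to follow the idea under all its names at once. The Python sketch below adds up the yearly frequencies of several terms treated as aliases; the function name `combine_aliases` and the numbers in the example are invented for illustration, not real Ngram data. It is only a crude gauge, since Google measures one-word and two-word phrases against different totals.

```python
from typing import Dict, List, Sequence


def combine_aliases(series_by_term: Dict[str, Sequence[float]]) -> List[float]:
    """Sum, year by year, the frequencies of several terms that name one idea.

    Every series must cover the same years in the same order.
    """
    lengths = {len(series) for series in series_by_term.values()}
    if len(lengths) != 1:
        raise ValueError("need one or more series, all of the same length")
    return [sum(year_values) for year_values in zip(*series_by_term.values())]


# Invented numbers: the old name fades while the new one rises.
toy = {
    "sociobiology": [0, 4, 8, 5, 2],
    "evolutionary psychology": [0, 0, 1, 4, 7],
}
print(combine_aliases(toy))  # [0, 4, 9, 9, 9]
```

In this toy case the old word alone looks like a mountain, while the combined series rises and then holds steady.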

But by now I was hooked on the surprisingly addictive pleasure of poking through a vast trove of data (try playing with the Ngram Viewer yourself; you'll see what I mean). So I decided to plug in some once wildly popular, really crappy notions about the human mind and see if any patterns popped up.

Here's the graph of mentions of Wilhelm Reich's "orgone" energy idea since 1950:


And here is the "Oedipus complex" since 1920:


These graphs aren't mountains. They're cliffs: each term has a sudden, fast ascent and then a long fall. That makes me wonder if the place to look for fads might sometimes be at their beginnings rather than their ends. Perhaps a very fast spread through books is itself a sign that an idea is, or has become, unscientific: that it's so interesting and exciting to so many different kinds of writers that it isn't being tested as a hypothesis would be.
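One way to put a number on the cliff shape is to compare how long a term spends on the way up to its peak with how long it spends coming down. The Python sketch below is a hypothetical measure, not one used by Kurzban or the Science authors: it counts the years, before and after the peak, during which the curve stays at or above half its peak value (the half-peak threshold and the name `rise_and_fall_years` are assumptions). A cliff in the sense above would show a short rise and a long fall.

```python
from typing import Sequence, Tuple


def rise_and_fall_years(freqs: Sequence[float], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (rise, fall) for a series with one value per year.

    rise: years from the first point of the run at or above fraction * peak
          up to the peak.
    fall: years from the peak to the last point of that run.
    """
    if not freqs:
        raise ValueError("need at least one yearly value")
    peak_i = max(range(len(freqs)), key=lambda i: freqs[i])
    level = fraction * freqs[peak_i]
    # Walk back from the peak while earlier years stay at or above the threshold.
    start = peak_i
    while start > 0 and freqs[start - 1] >= level:
        start -= 1
    # Walk forward from the peak while later years stay at or above the threshold.
    end = peak_i
    while end < len(freqs) - 1 and freqs[end + 1] >= level:
        end += 1
    return peak_i - start, end - peak_i


# Invented numbers: a sudden jump followed by a slow decline.
print(rise_and_fall_years([0, 1, 10, 9, 8, 7, 6, 4, 2]))  # (0, 4)
# A symmetric mountain, for comparison.
print(rise_and_fall_years([0, 2, 6, 8, 10, 8, 6, 2, 0]))  # (2, 2)
```

The threshold matters: a lower fraction stretches both numbers, so comparisons only make sense between terms measured with the same setting.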

Now, my Ngram surfing was just as crude and unrigorous as Kurzban's, but I find it intriguing that ideas could have such sharply different paths, and that similar patterns should come up repeatedly. This makes me suspect that "culturomics," the term that the Science authors propose for the quantitative study of culture, is going to be one of those ideas that slopes steadily upward as the years pass.

Michel, J., Shen, Y., Aiden, A., Veres, A., Gray, M., The Google Books Team, Pickett, J., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M., & Aiden, E. (2010). Quantitative Analysis of Culture Using Millions of Digitized Books. Science. DOI: 10.1126/science.1199644

Fanelli, D. (2010). “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE, 5 (4). DOI: 10.1371/journal.pone.0010068
