Study: Cities Speak A Different Language Than Their Small-Town Neighbors

Whatever your native language, you've probably noticed that city people speak it differently than do country folk. But so what? It's also true that Chicagoans speak a bit differently than do Baltimoreans, and the French of Marseilles is not that of Paris. When it comes to differences in accent, grammar and vocabulary, you might expect that region, culture, social class and gender would count for more than the size of your town. So the people of, say, Caracas, should sound more like their fellow Venezuelans than like people in Miami. But according to this paper, you would be wrong. "The Spanish language," its authors write, "is split into two superdialects"—a city dialect in which Caracas and Miami have a lot in common, versus a dialect of rural regions and small towns.


As novel as the finding is the method that Bruno Gonçalves and David Sánchez used to distinguish the dialects: They analyzed every Spanish-language tweet posted over a two-year period that also carried geolocation data (they don't say which years). Breaking down these 50 million tweets according to the different words used for "computer," "car," and other key concepts revealed the boundaries of the two dialects.

The researchers used Spanish because it is widely spoken and spread across several continents. Spanish also has plenty of Twitter users (unlike Chinese) to supply evidence. And written Spanish is logical: the letters you see represent the sounds you'd hear. In English, on the other hand (as noted here), the same letter combo can represent five different sounds ("Though I cough through the day, this rough bough comforts me"). Likewise, words that are spelled identically can be pronounced differently ("Archer, I bow to your bow, and I will lead you to the mines of lead"). That sort of thing, which has incensed sensible people for centuries, messes up textual analysis.

The researchers divided the Spanish-tweeting world into cells of approximately 25 square kilometers each, and noted in each cell the majority-endorsed words for 131 key things. That gave them a map distinguishing, for example, places where the word for "computer" was "computadora" from those where the word was "computador" or "ordenador." They then applied their algorithms to identify cells that are closely related to each other. In this way, they discovered "a profound correlation" between one widespread dialect and areas of high population density. In other words, one of their superdialects was spoken mostly in cities, even cities as widely scattered about the globe as Buenos Aires, San Diego and San Juan. The other was spoken outside major urban centers. "This suggests a natural lexical bipartition of Spanish into two superdialects," they write. "Superdialect α is utilized by speakers in main American and Spanish cities and corresponds to an international variety with a strongly urban component while superdialect β is comprised mostly of rural areas and small towns."
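For readers who think in code, here is a rough sketch of the first two steps described above: binning geolocated tweets into grid cells and picking the dominant word for each concept in each cell. It is not the authors' code. The cell size in degrees, the function names, the sample tweets and the simple "agreement" score are all assumptions made for illustration, and the paper's actual method for finding related cells is not spelled out here.

```python
from collections import Counter, defaultdict
import math

# Assumed cell size in degrees; the article only says cells were
# "approximately 25 square kilometers", so this value is a placeholder.
CELL_DEG = 0.25


def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the integer (row, col) index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def dominant_words(tweets):
    """tweets: iterable of (lat, lon, concept, word) tuples.

    Returns {cell: {concept: most frequent word in that cell}}.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for lat, lon, concept, word in tweets:
        counts[cell_of(lat, lon)][concept][word] += 1
    return {
        cell: {c: ctr.most_common(1)[0][0] for c, ctr in concepts.items()}
        for cell, concepts in counts.items()
    }


def agreement(a, b):
    """Fraction of shared concepts on which two cells use the same dominant word.

    A stand-in similarity score, not the measure used in the paper.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[c] == b[c] for c in shared) / len(shared)


# Invented sample data: three tweets near one point, two near another.
sample_tweets = [
    (40.4, -3.7, "computer", "ordenador"),
    (40.4, -3.7, "computer", "ordenador"),
    (40.4, -3.7, "computer", "computadora"),
    (40.4, -3.7, "car", "coche"),
    (19.4, -99.1, "computer", "computadora"),
    (19.4, -99.1, "car", "coche"),
]

cells = dominant_words(sample_tweets)
score = agreement(cells[cell_of(40.4, -3.7)], cells[cell_of(19.4, -99.1)])
```

With this invented sample, the two cells agree on "car" but not on "computer." A real analysis would compute such scores across all 131 concepts and every cell, then group the cells that resemble one another.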

Why cities? Because people who move to cities want to communicate with one another (and, I am guessing, want to sound as if they didn't just step off the boat from Nowheresville). For the sake of efficiency and identity, then, city-dwellers are inclined to drop the more idiosyncratic parts of their speech. They come to talk like their fellow city-dwellers, not Mom and Pop back home. "This leveling process," write Gonçalves and Sánchez, is present throughout the Spanish-speaking cities, where it "is reinforced by the rapid increase of worldwide social ties and the powerful influence of mass media precisely located in important metropolitan areas (Madrid, Mexico City, Miami)."

That Twitter can be used to find heretofore unrecognized dialects surprised me (who knew 140-character utterances could be so revealing?), but Gonçalves and Sánchez believe it's likely to be a rich Big-Data source of insights into language. In fact, they think the abundance of tweets worldwide, combined with GPS data, could soon permit linguists to track language differences in real time, as they arise and evolve among different regions.

I was tempted to call their paper a "Big Data" approach to language analysis. But the term is almost a misnomer. They made a new finding not because their data was abundant but because it was different. Instead of having to go out and interview (often male, often rural) people to ask about their language use, the researchers had an immense river of language use ready and waiting for them. This is the new kind of data all of us are generating every day, in tweets, Facebook likes, YouTube clicks and so on. Where once we had to be asked about a topic, and think about our answers, we now reveal ourselves without thinking. This may not be great for our notions of personal autonomy, but it is going to be a great source of insight into human behavior for a long time to come.

Illustration: Geographical distribution of the dominant word for the concepts Computer (left) and Car (right), from the paper.

Follow me on Twitter: @davidberreby
