Study: Cities Speak A Different Language Than Their Small-Town Neighbors

Whatever your native language, you've probably noticed that city people speak it differently than do country folk. But so what? It's also true that Chicagoans speak a bit differently than do Baltimoreans, and the French of Marseilles is not that of Paris. When it comes to differences in accent, grammar and vocabulary, you might expect that region, culture, social class and gender would count for more than the size of your town. So the people of, say, Caracas, should sound more like their fellow Venezuelans than like people in Miami. But according to this paper, you would be wrong. "The Spanish language," its authors write, "is split into two superdialects"—a city dialect in which Caracas and Miami have a lot in common, versus a dialect of rural regions and small towns.

As novel as the finding is the method that Bruno Gonçalves and David Sánchez used to distinguish the dialects: They analyzed every tweet made in Spanish over two years for which geolocation data was also available (they don't say which years). Breaking down these 50 million tweets according to different words used for "computer," "car," and other key concepts revealed the boundaries of the two dialects.

The researchers used Spanish because it is widely spoken and spread across several continents. Spanish also has plenty of Twitter users (unlike Chinese) to supply evidence. And written Spanish is logical: the letters you see represent the sounds you'd hear. In English, on the other hand (as noted here), the same letter combo can represent five different sounds ("Though I cough through the day, this rough bough comforts me"). Likewise, two words spelled identically can be pronounced differently ("Archer, I bow to your bow, and I will lead you to the mines of lead"). That sort of thing, which has incensed sensible people for centuries, messes up textual analysis.

The researchers divided up the Spanish-tweeting world into cells of approximately 25 square kilometers each, and noted in each cell the most commonly used word for each of 131 key concepts. That gave them a map distinguishing, for example, places where the word for "computer" is "computadora" from those where it is "computador" or "ordenador." They then applied their algorithms to identify cells that are closely related to each other. In this way, they discovered "a profound correlation" between one widespread dialect and areas of high population density. In other words, one of their superdialects is spoken mostly in cities, even cities as widely scattered about the globe as Buenos Aires, San Diego and San Juan. The other is spoken outside major urban centers. "This suggests a natural lexical bipartition of Spanish into two superdialects," they write. "Superdialect α is utilized by speakers in main American and Spanish cities and corresponds to an international variety with a strongly urban component while superdialect β is comprised mostly of rural areas and small towns."
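For readers who want to see the mechanics, here is a minimal sketch of the gridding step described above: bin geolocated tweets into cells, then pick the dominant word for each concept in each cell. It is an illustration only, not the authors' code. The cell size in degrees, the whitespace tokenizer, and the names `cell_of`, `dominant_words` and `agreement` are all assumptions of mine, and the simple agreement score stands in for whatever clustering algorithm the paper actually uses.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 0.25  # assumed cell size in degrees; chosen only for illustration


def cell_of(lat, lon, size=CELL_DEG):
    """Map a latitude/longitude pair to an integer grid-cell index."""
    return (math.floor(lat / size), math.floor(lon / size))


def dominant_words(tweets, variants):
    """Return {(cell, concept): most frequent variant seen in that cell}.

    tweets   -- iterable of (lat, lon, text) tuples
    variants -- dict mapping a concept to its competing words,
                e.g. {"computer": ["computadora", "computador", "ordenador"]}
    """
    counts = defaultdict(Counter)
    for lat, lon, text in tweets:
        tokens = set(text.lower().split())  # naive tokenizer (assumption)
        cell = cell_of(lat, lon)
        for concept, words in variants.items():
            for word in words:
                if word in tokens:
                    counts[(cell, concept)][word] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def agreement(dominant, cell_a, cell_b):
    """Fraction of shared concepts on which two cells pick the same word.

    Returns None when the two cells have no concept in common.
    """
    a = {c: w for (cell, c), w in dominant.items() if cell == cell_a}
    b = {c: w for (cell, c), w in dominant.items() if cell == cell_b}
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(a[c] == b[c] for c in shared) / len(shared)
```

Feed it a handful of tweets from Madrid and Mexico City and the two cells should disagree on "computer" ("ordenador" versus "computadora"). Cells that agree on most concepts are the ones a clustering step would then group into the same dialect.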

Why cities? Because people who move to cities want to communicate with one another (and, I am guessing, want to sound as if they didn't just step off the boat from Nowheresville). For the sake of efficiency and identity, then, city-dwellers are inclined to drop the more idiosyncratic parts of their speech. They come to talk like their fellow city-dwellers, not Mom and Pop back home. "This leveling process," write Gonçalves and Sánchez, is present throughout the Spanish-speaking cities, where it "is reinforced by the rapid increase of worldwide social ties and the powerful influence of mass media precisely located in important metropolitan areas (Madrid, Mexico City, Miami)."

That Twitter can be used to find heretofore unrecognized dialects surprised me (who knew 140-character utterances could be so revealing?) but Gonçalves and Sánchez believe it's likely to be a rich Big-Data source of insights into language. In fact, they think, the abundance of tweets worldwide, combined with GPS data, could soon permit linguists to track language differences in real time, as they arise and evolve among different regions.

I was tempted to call their paper a "Big Data" approach to language analysis. But the term is almost a misnomer. They made a new finding not because their data was abundant but because it was different. Instead of having to go out and interview (often male, often rural) people to ask about their language use, the researchers had an immense river of language use ready and waiting for them. This is the new kind of data all of us are generating every day, in tweets, Facebook likes, YouTube clicks and so on. Where once we had to be asked about a topic, and think about our answers, we now reveal ourselves without thinking. This may not be great for our notions of personal autonomy, but it is going to be a great source of insight into human behavior for a long time to come.

Illustration: Geographical distribution of the dominant word for the concepts Computer (left) and Car (right), from the paper.

Follow me on Twitter: @davidberreby
