Skip to content
Politics & Current Affairs

Data firm left detailed profiles of 48 million people on a publicly accessible website

In the wake of Facebook’s Cambridge Analytica scandal, another data firm was discovered to have amassed similar user profiles of millions of people.
Joel Kjellgren, Data Center Manager walks in one of the server rooms at the new Facebook Data Center. (Photo: GETTY/JONATHAN NACKSTRAND)
Sign up for the Smarter Faster newsletter
A weekly newsletter featuring the biggest ideas from the smartest people


A report published Wednesday reveals how a data firm built psychographic profiles on 48 million people, using data from Facebook, Twitter, LinkedIn, Zillow, and others—and then left that trove of data unprotected on a cloud storage repository.

The data was compiled by LocalBlox, a firm that “automatically crawls, discovers, extracts, indexes, maps and augments data in a variety of formats from the web and from exchange networks” to build consumer profiles that it sells to companies.  

In February, Chris Vickery, an ethical data breach hunter and director of cyber risk research at the security firm UpGuard, was able to access millions of these profiles on an unlisted and unprotected Amazon Web Services S3 bucket. The bucket contained a 151.3-gigabyte file that, when decompressed, amounted to a 1.2 terabyte that contained the user profiles. It was aptly named “final_people_data_2017_5_26_48m.json.”

“In the wake of the Facebook/Cambridge Analytica debacle, the importance of massive sets of psychographic data is becoming more and more apparent,” UpGuard’s report reads. “The exposed LocalBlox dataset combines standard personal information like name and address, with data about the person’s internet usage, such as their LinkedIn histories and Twitter feeds. This combination begins to build a three-dimensional picture of every individual affected—who they are, what they talk about, what they like, even what they do for a living—in essence, a blueprint from which to create targeted persuasive content, like advertising or political campaigning.”

The consumer profiles amassed by LocalBlox vary in level of detail. Much of the information can be harvested from public sources—the email address listed on your Facebook profile, or the city of residence shown on your Twitter page. Some of the information is believed to have been collected from non-public sources, such as purchased marketing data.

In a ZDNet article published Wednesday, LocalBlox’s chief technology officer Ashfaq Rahman said most of the data discovered by Vickery was fabricated for internal tests, and that Vickery had “hacked in” to the publicly accessible repository. But Vickery had informed LocalBlox that he accessed the repository after discovering the vulnerability in February, and it was reportedly secured soon after.

“Rahman would not say why he restricted the bucket’s permissions hours later,” reads the ZDNet article.

According to Rahman, “no other individual is believed to have accessed this file from the S3 bucket.”

LocalBlox didn’t break any laws in its harvesting of consumer data, though it’s not clear whether it violated the terms of websites like LinkedIn, Facebook, and Zillow, all of which explicitly prohibit data scraping.

In a 2013 article, LocalBlox’s president Sabira Arefin said it’s “up to the individual sites and system to determine the terms and conditions and then enforce any security mechanism in place if they want to prevent scraping.”

Vickery said that companies like LocalBlox should be more responsible in the way they handle and stores people’s data.

“Concentrating millions of people’s details can become by its very nature a weaponized thing, and something that can lead to a lot of harm,” Vickery said.

UpGuard’s report concludes:

“The profitability gained by data must come with the responsibility of protecting its integrity and privacy. Cloud storage itself provides functionality and speed at a reasonable cost, but cloud assets require careful configuration—the thin line between private and public can be erased with the flip of a single switch. The lack of controls around common IT processes are what allow critical errors like this to slip into production, eroding the privacy of millions of people.”

Sign up for the Smarter Faster newsletter
A weekly newsletter featuring the biggest ideas from the smartest people

Related