If its claims are true, Clearview AI has quietly blown right past privacy norms to become the nightmare many have been fearing.
- Recent reporting has revealed the existence of a company that has probably scraped your personal data for its facial recognition database.
- Though social platforms forbid it, the company has nonetheless collected personal data from everywhere it can.
- The company's claims about its accuracy and its popularity with law enforcement agencies are a bit murky.
Your face is all over the internet in images you and others have posted, along with lots of personal information. For those concerned about all those pictures being matched somehow with all that information, there's some small comfort in public assertions by Google, Facebook, and other platforms that they won't use your data for nefarious purposes. Of course, taking the word of companies whose business model depends on data-mining is a bit of a reach.
Meanwhile, as revealed recently by the New York Times, with further reporting from BuzzFeed News and WIRED, a company called Clearview AI has been quietly scraping up much of this data; it claims a database of 3 billion images collected from everywhere it can. Its sources presumably include all sorts of websites as well as social platforms, including Facebook, Twitter, LinkedIn, YouTube, and so on. It even scrapes Venmo, a particularly chilling revelation given the rigorous security one would expect a money-exchanging site to employ.
Combining its database with proprietary artificial intelligence, Clearview AI says it can identify a person from a picture nearly instantaneously, and it is already selling the service to police departments for identifying criminals. You may think you own your face, but Clearview has probably already acquired it without your even knowing, much less granting permission.
Is this legal? And does it matter?
Image source: Anton Watman/Shutterstock
In terms of federal law protecting personal data, the regulations are far behind today's digital realities. The controlling legislation appears to be the anti-hacking Computer Fraud and Abuse Act (CFAA), enacted in 1986, well before the internet we know today. Prior to a Ninth Circuit Court of Appeals ruling last year, the law had been used to fight automated data-scraping; that ruling, however, determined that this type of scraping doesn't violate the CFAA.
Social media sites generally include anti-scraping stipulations in their user agreements, but these are hard — and perhaps impossible given programmers' ingenuity — to enforce. Twitter, whose policies explicitly forbid automated scraping for the purposes of constructing a database, recently ordered Clearview AI to knock it off. Given last year's CFAA ruling, though, sites have little legal recourse when their policies are violated. In any event, tech is a troublingly incestuous industry — for example, a Facebook board member, Peter Thiel, is one of Clearview AI's primary investors, so how motivated would such people really be to block mining of their data?
Is Clearview AI legit?
Image source: Clearview AI, through Atlanta public-records request by New York Times
Clearview has taken pains to remain off the public's radar, at least until the New York Times article appeared. Its co-founders long ago scrubbed their own social identities from the web, though one of them, Hoan Ton-That, has since reemerged online.
In its efforts to remain publicly invisible while simultaneously courting law enforcement as customers, the company has been quietly distributing an array of targeted promotional materials (the Times, BuzzFeed, and WIRED have acquired a number of these via Freedom of Information requests and from private individuals). The ads make some extraordinary and questionable claims regarding Clearview's accuracy, its successes, and the number of law enforcement agencies with which it has contracts. Not least among the questions about the company's integrity, of course, is its extensive scraping of data from sites whose user agreements forbid it.
According to Clearview, over 600 law enforcement agencies have used its product in the last year, though the company won't supply a list of them. There are a handful of confirmed clients, however, including the Indiana State Police. According to the department's then-captain, police were able to identify the perpetrator in a shooting case in just 20 minutes thanks to Clearview's ability to find a video the man had posted of himself on social media. The department officially declined to comment on the case to The New York Times. Police departments in Gainesville, Florida and Atlanta, Georgia are also among Clearview's confirmed customers.
Clearview has tried to impress potential customers with case histories that apparently aren't true. For example, it sent prospective clients an email titled "How a Terrorism Suspect Was Instantly Identified With Clearview," describing how its software cracked a New York subway terrorism case. The NYPD says Clearview had nothing to do with the case and that it used its own facial recognition system. Clearview even posted a video on Vimeo telling the story; the video has since been removed. Clearview has claimed several other successes that have been denied by the police departments involved.
There is skepticism regarding Clearview's claims of accuracy, a critical concern given that in this context a false positive can send an innocent person to jail. Clare Garvie, of Georgetown University's Center on Privacy and Technology, tells BuzzFeed, "We have no data to suggest this tool is accurate. The larger the database, the larger the risk of misidentification because of the doppelgänger effect. They're talking about a massive database of random people they've found on the internet."
Clearview has not submitted its results for independent verification, though a FAQ on its site claims that an "independent panel of experts rated Clearview 100% accurate across all demographic groups according to the ACLU's facial recognition accuracy methodology." Moreover, the accuracy of a facial recognition system is usually derived from a combination of variables, including its ability to detect a face in an image at all, its correct-match rate, its reject rate, its non-match rate, and its false-match rate. As for the FAQ's claim, Garvie notes that "whenever a company just lists one accuracy metric, that is necessarily an incomplete view of the accuracy of their system."
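Garvie's point is easy to see with numbers. Here is a minimal sketch, using invented comparison scores rather than anything from Clearview: depending on where the decision threshold is set, the same system can boast a high correct-match rate while quietly racking up false matches, the error that puts innocent people in lineups.

```python
# A minimal sketch, using invented scores (not Clearview data), of why a
# single "accuracy" number is incomplete. A face recognizer scores how
# similar two photos are; the threshold at which a score counts as a
# "match" trades one error rate off against another.

def recognition_metrics(same_person_scores, different_person_scores, threshold):
    """Compute several standard face-recognition metrics at one threshold."""
    tp = sum(s >= threshold for s in same_person_scores)       # correct matches
    fn = len(same_person_scores) - tp                          # false rejections
    fp = sum(s >= threshold for s in different_person_scores)  # false matches
    tn = len(different_person_scores) - fp                     # correct rejections
    return {
        "correct_match_rate": tp / len(same_person_scores),
        "false_non_match_rate": fn / len(same_person_scores),
        "false_match_rate": fp / len(different_person_scores),  # the error that implicates innocents
        "correct_reject_rate": tn / len(different_person_scores),
    }

# Invented similarity scores, for illustration only.
same_person = [0.91, 0.88, 0.97, 0.62, 0.85]
different_people = [0.12, 0.33, 0.71, 0.08, 0.25]

for t in (0.5, 0.7, 0.9):
    print(t, recognition_metrics(same_person, different_people, t))
```

Quoting only one of these numbers, as the FAQ does, says little about how the system behaves on the others.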
Image source: Andre_Popov/Shutterstock
Clearview may or may not be doing what it claims, and its technology may or may not be as accurate, or as widely used by police departments, as it says. Regardless, there can be little doubt that the company, and likely others, are working toward the goal of making reliable facial recognition available to law enforcement and other government agencies (Clearview also reportedly pitches its product to private detectives).
This has many people concerned, as it represents a major blow to personal privacy. A bipartisan effort in the U.S. Senate has seemingly failed: in November 2019, Democrats introduced their own privacy bill of rights, the Consumer Online Privacy Rights Act (COPRA), while Republicans introduced the United States Consumer Data Privacy Act of 2019 (CDPA). States have also enacted, or are considering, new privacy legislation. Preserving personal privacy without unnecessarily constraining acceptable uses of data collection is complicated, and the law is likely to continue lagging behind technological reality.
In any event, the exposure of Clearview AI's system is pretty chilling, setting off alarms for anyone hoping to hold onto what's left of their personal privacy, at least for as long as it's possible to do so.
UPDATE: The ACLU announced on Thursday that it is suing Clearview in the state of Illinois. CNET reports that Illinois is the only state with a biometric privacy law, the Biometric Information Privacy Act, which requires "informed written consent" before companies can use someone's biometrics. "Clearview's practices are exactly the threat to privacy that the legislature intended to address, and demonstrate why states across the country should adopt legal protections like the ones in Illinois," the ACLU said in a statement.
For more on the suit, head over to the ACLU website.
We know that body language reveals a lot. But language is an even bigger tell if you know what to look for.
I can read your face better than you can. The same holds true for you. While the role of mirror neurons is still not well understood (and sometimes disputed), the fact that we can tell what another person is feeling, often more quickly than they can, is a consequence of being a social animal. This transcends facial expressions. We read bodies all of the time. For example, if we meet for the first time and I cross my arms, I’m more likely to trust you if you follow suit and cross yours. If we’re in a group and you’re the only one who doesn’t follow this pantomime, I’m less likely to trust you. Social cues have been tried and tested for a long time, so much so they don’t need to be consciously understood to be effective.
New research published in Proceedings of the National Academy of Sciences has uncovered another telling clue regarding one aspect of our inner state, stress: shifts in language. A team led by the University of Arizona's Matthias Mehl found that certain markers in language detect stress levels better than conscious self-ratings do. Stress, in turn, affects gene expression in our immune system: the more stressed we are, the more genetic inflammation activity occurs, while antiviral genes are turned down.
One hundred and forty-three American adults were recruited to wear audio recorders. Over a two-day period, 22,627 clips were collected. After transcribing the tapes, Mehl analyzed the language the volunteers used, focusing on "function words" such as pronouns and adverbs. We consciously choose "meaning words," i.e. nouns and verbs, while function words "are produced more automatically and they betray a bit more about what's going on with the speaker."
Function words change, Mehl says, when we face a crisis, as they do following terrorist attacks. Volunteers self-reported feeling less stressed, anxious, and depressed than they actually were, according to the white-blood-cell gene-expression measures Mehl's team took.
Researchers focused on two aspects of language: volume and structure. The more stressed volunteers were, the less they talked at all. When they did speak, they used more adverbs, such as "incredibly" and "really," and focused their speech less on others and more on themselves.
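To make the method concrete, here is a toy sketch of this kind of marker-counting, with an invented word list and transcript; the actual study used far richer linguistic categories than a few hand-picked words.

```python
# A toy sketch of the marker-counting described above. The word lists and
# the sample transcript are invented for illustration; they are not the
# study's actual feature set.

INTENSIFIER_ADVERBS = {"really", "incredibly", "very", "totally", "so"}
SELF_PRONOUNS = {"i", "me", "my", "mine", "myself"}
OTHER_PRONOUNS = {"you", "he", "she", "we", "they", "them", "him", "her"}

def language_markers(transcript: str) -> dict:
    """Per-word rates of the stress markers discussed above, plus volume."""
    words = transcript.lower().split()
    n = len(words) or 1  # avoid dividing by zero on an empty clip
    return {
        "words_spoken": len(words),  # "volume": stressed speakers talk less
        "adverb_rate": sum(w in INTENSIFIER_ADVERBS for w in words) / n,
        "self_focus": sum(w in SELF_PRONOUNS for w in words) / n,
        "other_focus": sum(w in OTHER_PRONOUNS for w in words) / n,
    }

print(language_markers("I was really so tired and I just stayed in my room"))
```

A classifier trained on rates like these across thousands of clips is what lets language "detect" stress without asking the speaker anything.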
This research could lead to more effective means of understanding and treating stress. As I recently wrote about, Twitter might become a new avenue for discovering sufferers of depression and PTSD. Just as Israeli airport security guards focus heavily on behavioral detection (such as body language) for detecting threats, doctors and therapists could use natural language patterns to better understand potential psychological disorders. As Mehl and team conclude,
Statistical pattern analysis of natural language use may provide a useful behavioral indicator of nonconsciously evaluated well-being (implicit safety vs. threat) that is distinct from the information provided by conventional self-report measures and more closely tracks the activity of underlying CNS processes which regulate peripheral physiology, gene expression, and health.
So it might be true that we don’t know ourselves as well as others know us. Rather than treating this as an invasion of privacy, using it as a therapeutic means of dealing with inner conflict could help a world experiencing rising rates of anxiety and depression. Anthropologists have long known group fitness is the main driver behind our evolutionary triumph in the animal kingdom. Though we might live in an individualistic culture, remembering where our strength lies—in depending on others—could not be more timely.
Researchers at Human Longevity have developed technology that can generate images of an individual's face using only their genetic information. But not everyone is convinced.
What if a computer could generate a realistic image of your face using only your genetic information?
That's precisely the technology researchers at Human Longevity, a San Diego-based company with the world's largest genomic database, claim to have developed. The team, led by genome-sequencing pioneer Craig Venter, reported its findings in a controversial paper published in the journal Proceedings of the National Academy of Sciences.
To train the A.I. to generate facial images, the team first sequenced the genomes of 1,061 people of various ages and ethnicities. They also took high-definition 3D photos of each participant. Finally, they fed the photos and genetic information to an algorithm that taught itself how small differences in DNA relate to facial features, like cheekbone height or protrusion of the brow. The algorithm was then given genomes it hadn't seen before and used them to generate images of the individuals' faces, images that could be reliably matched to real photos.
Well... sort of.
The team successfully matched eight out of ten generated images to the real photos. However, the rate fell to just five out of ten when researchers analyzed participants of a single race, since facial features vary less within one group. Judge for yourself how well the algorithm did.
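For the curious, here is a deliberately simplified sketch of the shape of that pipeline, using random numbers in place of real genomes and 3D scans. Nothing below comes from Human Longevity's paper; their model is far more sophisticated. The point is only the two-step structure: learn a genotype-to-face mapping, then match predicted faces against real ones.

```python
# Toy version of the pipeline: random "genomes" and "face scans" stand in
# for real data. All sizes and the linear model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps, n_face_features = 200, 500, 30

# Fake genotypes (0/1/2 allele counts) and fake face measurements that
# depend linearly on them, plus noise.
genomes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)
hidden_effects = rng.normal(size=(n_snps, n_face_features))
faces = genomes @ hidden_effects + rng.normal(scale=5.0, size=(n_people, n_face_features))

train, test = slice(0, 150), slice(150, 200)

# "Training": least-squares fit from genotypes to face features.
W, *_ = np.linalg.lstsq(genomes[train], faces[train], rcond=None)

# "Matching": predict faces for unseen genomes, then pair each prediction
# with the nearest real face scan in the test pool.
predicted = genomes[test] @ W
distances = np.linalg.norm(predicted[:, None, :] - faces[test][None, :, :], axis=2)
matches = distances.argmin(axis=1)
print("match rate:", (matches == np.arange(50)).mean())
```

The smaller and more homogeneous the test pool, the easier this matching game becomes, which is one reason critics questioned what the reported match rates actually demonstrate.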
The potential applications of this technology are especially intriguing for fields like forensic science — what if investigators were able to use genetic information left at a crime scene to “see” the perpetrator?
Interesting as the applications may be, Human Longevity is more concerned with the implications its findings have for privacy in genomics research, namely that technologies like this could be used to match people's supposedly anonymous genetic information to their online photos.
“A core belief from the HLI researchers is that there is now no such thing as true deidentification and full privacy in publicly accessible databases,” HLI said in a statement.
Privacy concerns seem to be widely shared in the community. But some scientists say that the paper is misleading. One reason is that the Human Longevity researchers already knew the age, sex and race of the participants — demographic information that could have been used to achieve the same matching rate without using the computer-generated photos at all.
“I don't think this paper raises those risks, because they haven’t demonstrated any ability to individuate this person from DNA,” said Mark Shriver, an anthropologist at Pennsylvania State University in University Park, in an interview with Nature.
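A toy illustration of this criticism: in a small pool, traits that can be read or estimated from a genome anyway (sex from chromosomes, ancestry from common variants, approximate age from DNA methylation) can single a person out with no face prediction at all. The pool below is invented.

```python
# Hypothetical candidate pool; in the study, researchers already knew
# each participant's age, sex, and race.
POOL = [
    {"id": 1, "age": 34, "sex": "F", "ancestry": "European"},
    {"id": 2, "age": 61, "sex": "M", "ancestry": "East Asian"},
    {"id": 3, "age": 33, "sex": "M", "ancestry": "African"},
    {"id": 4, "age": 47, "sex": "F", "ancestry": "European"},
    {"id": 5, "age": 29, "sex": "M", "ancestry": "European"},
    {"id": 6, "age": 52, "sex": "F", "ancestry": "Latino"},
]

def match_by_demographics(query, pool, age_tolerance=5):
    """Everyone in the pool consistent with the query's demographics."""
    return [p for p in pool
            if p["sex"] == query["sex"]
            and p["ancestry"] == query["ancestry"]
            and abs(p["age"] - query["age"]) <= age_tolerance]

print(match_by_demographics({"age": 35, "sex": "M", "ancestry": "African"}, POOL))
# One candidate remains, and no face was ever generated.
```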
Jason Piper, a former employee of Human Longevity, took issue with what he considered a lack of accuracy in the images, writing on Twitter:
“everyone looks close to the average of their race, everyone looks like their prediction.”
But perhaps the most exhaustive criticism came from computational biologist Yaniv Erlich, who published a paper entitled "Major flaws in 'Identification of individuals by trait prediction using whole-genome sequencing data'", part of which reads:
“The results of the authors are unremarkable. I achieved a similar re-identification accuracy with the Venter cohort in 10 minutes of work without fancy face morphology...”
Just days later, the team behind the original paper issued a rebuttal, titled simply "No major flaws in 'Identification of individuals by trait prediction using whole-genome sequencing data'".
(It may seem mundane to those outside the field, but it's a pretty vicious beef in the scientific community at the moment, as seen by the "shots fired!" and "I'm gonna grab my popcorn..." comments under both papers.)
Access to genomics data
Underlying this whole debate is a question of access. Genomic data is used across various fields of study, but perhaps most importantly in research that seeks to combat diseases. In an interview with Nature, Piper said that Human Longevity has a vested interest in restricting access to DNA databases because it's a for-profit company that's trying to build the largest genome database in the world.
“I think genetic privacy is very important, but the approach being taken is the wrong one,” Piper said. “In order to get more information out of the genome, people have to share.”
Rather than privatizing and restricting access to genomic data, Piper said that a better solution would be to make data public while using techniques that still allow individuals to remain anonymous.