How AI is learning to convert brain signals into speech
The first steps toward developing tools that could help disabled people regain the power to speak.
- The technique involves training neural networks to associate patterns of brain activity with human speech.
- Several research teams have managed to get neural networks to "speak" intelligible words.
- Although similar technology might someday help disabled people regain the power to speak, decoding imagined speech is still far off.
Several research groups have recently made significant progress in using neural networks to convert brain activity into intelligible computer-generated speech, developments that could mark some of the first steps toward radically improving the quality of life for people who've lost the ability to speak.
As a recent article from Science notes, the groups, which have published several separate papers on the preprint server bioRxiv, aren't yet able to convert people's purely imagined words and sentences into computer-generated speech. Still, the teams were successful in getting neural networks to reconstruct words that various participants had either heard, spoken aloud or mouthed silently.
To accomplish that, the teams recorded brain signals and fed them to a neural network, which then matched the signals with associated sounds or mouth movements.
Unfortunately, this kind of work requires opening the skull; researchers need extremely precise data that can only be obtained by surgically implanting electrodes directly onto regions of the brain associated with speech, listening or motor functioning. Making matters more complicated is the fact that each person shows unique neural activity in these regions, so what an AI learns from one person doesn't translate to the next.
"We are trying to work out the pattern of … neurons that turn on and off at different time points, and infer the speech sound," Nima Mesgarani, a computer scientist at Columbia University, told Science. "The mapping from one to the other is not very straightforward."
For the research, the teams relied on participants who were already scheduled to undergo invasive surgery to remove brain tumors or receive pre-surgery treatments for epilepsy.
One team, led by Mesgarani, fed a neural network with data from participants' auditory cortexes that was obtained while they listened to recordings of people telling stories and listing numbers. Using the brain data alone, the neural network was able to "speak" numbers to a group of listeners who were able to identify the digits correctly about 75 percent of the time.
Another team, led by neurosurgeon Edward Chang and his team at the University of California, San Francisco, recorded epilepsy patients' brain activity as they read sentences aloud, and fed the data to a neural network. A separate group of people then listened to the neural network's attempts to reconstruct the sentences, and after selected from a written list which sentences they thought it was trying to reproduce. In some cases, they chose correctly 80 percent of the time.
Chang's team also managed to get a neural network to reproduce words that participants had only mouthed silently, an achievement that marks "one step closer to the speech prosthesis that we all have in mind," as neuroscientist Christian Herff at Maastricht University in the Netherlands told Science.
Deciphering imagined speech
A scene from The Diving Bell and the Butterfly (2007).
The techniques described above work because neural networks were able to find patterns between two relatively defined sets of data: brain activity and external speech functions (such as spoken words or mouth movements). But those external functions aren't present when someone merely imagines speech, and, without that data to use for training, it's unclear whether neural networks would ever be able to translate brain activity into computer-generated speech.
One approach, as Herff told Science's Kelly Servick, involves giving "feedback to the user of the brain-computer interface: If they can hear the computer's speech interpretation in real time, they may be able to adjust their thoughts to get the result they want. With enough training of both users and neural networks, brain and computer might meet in the middle."
It's still speculative, but it's easy to see how technology of the sort could greatly improve the lives of people who've lost the ability to speak, many of whom rely on speech-assist technology that requires people to make tiny movements in order to control a cursor that selects symbols or words. The most famous example of this is the system used by Stephen Hawking, who described it like this:
"My main interface to the computer is through an open source program called ACAT, written by Intel. This provides a software keyboard on the screen. A cursor automatically scans across this keyboard by row or by column. I can select a character by moving my cheek to stop the cursor. My cheek movement is detected by an infrared switch that is mounted on my spectacles. This switch is my only interface with the computer. ACAT includes a word prediction algorithm provided by SwiftKey, trained on my books and lectures, so I usually only have to type the first couple of characters before I can select the whole word. When I have built up a sentence, I can send it to my speech synthesizer. I use a separate hardware synthesizer, made by Speech Plus. It is the best I have heard, although it gives me an accent that has been described variously as Scandinavian, American or Scottish."
- Mind-reading tech is here (and more useful than you think ... ›
- Computer system transcribes words users “speak silently” | MIT News ›
- #MindControl: Translating Brain Signals to a Voice Synthesizer ... ›
As religious diversity increases in the United States, we must learn to channel religious identity into interfaith cooperation.
- Religious diversity is the norm in American life, and that diversity is only increasing, says Eboo Patel.
- Using the most painful moment of his life as a lesson, Eboo Patel explains why it's crucial to be positive and proactive about engaging religious identity towards interfaith cooperation.
Pulitzer Prize-winner Jared Diamond explains why some nations make it through epic crises and why others fail.
- "A country is not going to resolve a national crisis unless it acknowledges that it's in a crisis," says Jared Diamond. "If you don't, you're going to get nowhere. Many Americans still don't recognize today that the United States is descending into a crisis."
- The U.S. tends to focus on "bad countries" like China, Canada and Mexico as the root of its problems, however Diamond points out the missing piece: Americans are generating their own problems.
- The crisis the U.S. is experiencing is not cause for despair. The U.S. has survived many tragedies, such as the War of Independence and the Great Depression – history is proof that the U.S. can get through this current crisis too.
If you don't want to know anything about your death, consider this your spoiler warning.
- For centuries cultures have personified death to give this terrifying mystery a familiar face.
- Modern science has demystified death by divulging its biological processes, yet many questions remain.
- Studying death is not meant to be a morbid reminder of a cruel fate, but a way to improve the lives of the living.
When it comes to sniffing out whether a source is credible or not, even journalists can sometimes take the wrong approach.
- We all think that we're competent consumers of news media, but the research shows that even journalists struggle with identifying fact from fiction.
- When judging whether a piece of media is true or not, most of us focus too much on the source itself. Knowledge has a context, and it's important to look at that context when trying to validate a source.