For the first time, scientists report they have devised a method that uses functional magnetic resonance imaging brain recordings to reconstruct continuous language. The findings are the next step in the quest for better brain-computer interfaces, which are being developed as an assistive technology for those who can’t speak or type.
In a preprint posted September 29 on bioRxiv, a team at the University of Texas at Austin details a “decoder,” or algorithm, that can “read” the words that a person is hearing or thinking during a functional magnetic resonance imaging (fMRI) brain scan. While other teams had previously reported some success in reconstructing language or images based on signals from implants in the brain, the new decoder is the first to use a noninvasive method to accomplish this.
“If you had asked any cognitive neuroscientist in the world twenty years ago if this was doable, they would have laughed you out of the room,” says Alexander Huth, a neuroscientist at the University of Texas at Austin and a coauthor on the study.
Yukiyasu Kamitani, a computational neuroscientist at Kyoto University who was not involved in the research, writes in an email to The Scientist that it’s “exciting” to see intelligible language sequences generated from a noninvasive decoder. “This study . . . sets a solid ground for [brain-computer interface] applications,” he says.
Using fMRI data for this type of research is difficult because the fMRI signal is slow compared to the speed of human thought. Instead of detecting the firing of neurons, which happens on the scale of milliseconds, MRI machines measure changes in blood flow within the brain as proxies for brain activity; such changes take seconds. The reason the setup in this research works, says Huth, is that the system is not decoding language word-for-word, but rather discerning the higher-level meaning of a sentence or thought.
Huth and his colleagues trained their algorithm with fMRI brain recordings taken as three study subjects (one woman and two men, all in their 20s or 30s) listened to 16 hours of podcasts and radio stories, including The Moth Radio Hour, TED talks, and John Green’s Anthropocene Reviewed. Huth says it was important that the subjects listened to a broad range of media so that the decoder would be accurate and widely applicable. He notes that the amount of fMRI data collected matches most other studies that use fMRI recordings, though his had fewer research subjects.
Based on its training on the 16 hours of fMRI recordings of the individual’s own brain, the decoder made a set of predictions of what the fMRI readings would look like. Using these “guesses” was the key to ensuring that the decoder was able to translate thoughts that didn’t relate to one of the known audio recordings used in the training, according to Huth. These “guesses” were then checked against the real-time fMRI recording, and the prediction that most closely matched the real reading determined the words the decoder finally generated.
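The guess-and-check idea described above can be illustrated with a toy sketch. This is not the team’s code: the word features in TOY_FEATURES, the predict_response function standing in for a trained per-subject model, and the choice of Pearson correlation as the measure of “closest match” are all assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signal carries no information to match against
    return cov / math.sqrt(var_a * var_b)

# Hypothetical word features; a real system would learn a mapping from
# language to brain responses during the hours of training scans.
TOY_FEATURES = {
    "dog":   [1.0, 0.0, 0.2, 0.0],
    "cat":   [0.0, 1.0, 0.2, 0.0],
    "ran":   [0.1, 0.0, 1.0, 0.0],
    "slept": [0.0, 0.1, 0.0, 1.0],
    "home":  [0.3, 0.3, 0.0, 0.5],
}

def predict_response(text):
    """Toy stand-in for the trained model: predicted 'scan' for a guess."""
    predicted = [0.0, 0.0, 0.0, 0.0]
    for word in text.split():
        for i, value in enumerate(TOY_FEATURES.get(word, [0.0] * 4)):
            predicted[i] += value
    return predicted

def best_guess(candidates, recorded):
    """Check every guess against the recording; keep the closest match."""
    return max(candidates, key=lambda c: pearson(predict_response(c), recorded))

# Invented 'recording': a slightly noisy version of the response to
# "dog ran home".
recorded = [1.3, 0.4, 1.1, 0.6]
candidates = ["dog ran home", "cat slept", "dog slept"]
decoded = best_guess(candidates, recorded)
print(decoded)
```

In this toy setup the recording is never translated into words directly; each guess is turned into a predicted recording, and the guess whose prediction looks most like the real one is kept.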
To determine how successful the decoder was, the researchers scored the similarity of the decoder’s generation to the stimulus presented to the subject. They also scored language generated by the same decoder that had not been checked against an fMRI recording. They then compared those scores and tested the statistical significance of the difference between the two.
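A rough sketch of that kind of comparison follows. The word-overlap score, the example sentences, and the simple empirical p-value are illustrative assumptions, not the similarity measures or statistical tests reported in the preprint.

```python
def word_overlap(generated, reference):
    """Jaccard overlap between the sets of words in two sentences."""
    gen_words = set(generated.lower().split())
    ref_words = set(reference.lower().split())
    union = gen_words | ref_words
    if not union:
        return 0.0
    return len(gen_words & ref_words) / len(union)

def empirical_p_value(decoded_score, baseline_scores):
    """Share of baseline scores at least as high as the decoded score.

    The +1 terms keep the estimate from ever being exactly zero.
    """
    at_least = sum(1 for s in baseline_scores if s >= decoded_score)
    return (at_least + 1) / (len(baseline_scores) + 1)

# All sentences below are invented for illustration.
stimulus = "she walked to the store to buy some bread"
decoded_text = "she went to the store and bought bread"
# Text generated without being checked against any brain recording.
baseline_texts = [
    "the weather was cold that morning",
    "he played guitar in a small band",
    "they drove across the country last summer",
]

decoded_score = word_overlap(decoded_text, stimulus)
baseline_scores = [word_overlap(t, stimulus) for t in baseline_texts]
p_value = empirical_p_value(decoded_score, baseline_scores)
print(decoded_score, baseline_scores, p_value)
```

With only three baseline sentences the smallest possible p-value here is 0.25, so a real analysis would need many more baseline generations before the difference could be called significant.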
The results indicated that the algorithm’s guess-and-check procedure eventually generates a whole story from fMRI recordings, which, says Huth, matches “pretty well” with the actual story being told in the audio recording. However, it does have some shortcomings; for example, it isn’t very good at preserving pronouns and often mixes up first and third person. The decoder, says Huth, “knows what’s happening pretty accurately, but not who is doing the things.”
Sam Nastase, a researcher and lecturer at the Princeton Neuroscience Institute who was not involved in the research, says using fMRI recordings for this type of brain decoding is “mind blowing,” since such data are typically so slow and noisy. “What they’re showing with this paper is that if you have a smart enough modeling framework, you can actually pull out a surprising amount of information” from fMRI recordings, he says.
Since the decoder uses noninvasive fMRI brain recordings, Huth says it has higher potential for real-world application than do invasive methods, though the expense and inconvenience of using MRI machines are obvious challenges. Magnetoencephalography, another noninvasive brain imaging technique that is more portable and more temporally precise than fMRI, could potentially be used with a similar computational decoder to provide nonverbal people a method of communication, he says.
Huth says the most exciting element of the decoder’s success is the insight it affords into the workings of the brain. For instance, he notes, the results reveal which parts of the brain are responsible for creating meaning. By using the decoder on recordings of specific areas such as the prefrontal cortex or the parietal temporal cortex, the team could determine which part was representing what semantic information. One of their findings was that those two parts of the brain represented the same information to the decoder, and the decoder worked similarly well when using recordings from either brain region.
Most surprising, adds Huth, is that the decoder was able to reconstruct stimuli that didn’t use semantic language, even though it was trained on subjects listening to spoken language. For example, after training, the algorithm successfully reconstructed the meaning of a silent film subjects viewed, as well as a participant’s imagined experience of telling a story. “The fact that those things are so overlapping [in the brain] is something we’re just starting to appreciate,” he says.
For both Kamitani and Nastase, the Huth lab’s results, which have not yet been peer reviewed, bring up questions about how decoders process underlying meaning versus text-like or speech-like language. Since the new decoder detects meaning, or semantics, rather than individual words, its success can be difficult to measure, as numerous combinations of words could count as a “good” output, says Nastase. “It’s an interesting problem that they’re introducing,” he says.
Huth acknowledges that to some, technology that is able to effectively “read minds” can be a bit “creepy.” He says his team has thought deeply about the implications of the research, and, out of concern for mental privacy, examined whether the decoder would work without the participant’s willing cooperation. In some trials, while audio was being played, the researchers asked the subjects to distract themselves by performing other mental tasks, like counting, naming and imagining animals, and imagining telling a different story. Naming and imagining animals was most effective at rendering the decoding inaccurate, they found.
Also notable from a privacy point of view is that a decoder trained on one individual’s brain scans could not reconstruct language from another individual, Huth says, returning “basically no usable information” in the study. So someone would need to participate in extensive training sessions before their thoughts could be accurately decoded.
To Nastase, the fact that the researchers looked for evidence of mental privacy protections was encouraging. “You could very easily have published this paper six months ago without any of those [privacy] experiments,” he says. However, he adds, he’s not convinced the authors definitively showed that privacy won’t be a concern down the road, since future research could possibly find ways around the mental privacy stopgaps detailed by the researchers. “It’s a question of if the benefits of technology like this outweigh the possible pitfalls,” Nastase says.