Speech is a window into our brains—and not just when we’re healthy. When neurological issues arise, they often manifest in the things we say and how we say them.
IBM computer scientist Guillermo Cecchi came to appreciate just how important language is in medicine as a psychiatry postdoc at Weill Cornell Medicine in the early 2000s. Despite advances in brain imaging, “it’s still [through] behavior, and fundamentally through language, that we assess mental health,” he says. “And we deal with it through therapy. . . . Language is essential for that.”
In the digital age, hardware and software are available for “natural language processing”—a type of artificial intelligence (AI) famously employed by IBM’s Watson—that extracts useful content from written and spoken language. But while companies such as Google and Facebook use language processing to evaluate our social media interactions, emails, and browsing histories in order to personalize the ads we see in our news feeds, the tools have yet to be harnessed for medical applications. In the clinic, “all that technology is completely ignored,” says Cecchi. “We still judge the language production of the patient on a subjective basis.”
With recent advances in AI, that’s starting to change, and Cecchi’s team at IBM is one of several groups now developing machine learning algorithms to analyze patient language. “I would say in the last five years there’s been an explosion of interest,” he says. The approaches are all in the earliest stages of development, with most models suffering from small training and testing datasets. But several studies have yielded promising results across a range of psychiatric and neurological conditions.
One area that Cecchi has explored is the prediction of schizophrenia and psychosis onset. These conditions have an intimate connection to language because, as Cecchi explains, “it’s your thought process that is disordered.” A few years ago, Cecchi and his colleagues developed a machine learning algorithm to analyze two features of speech known to be affected in psychosis: syntactic complexity and semantic coherence, a technical term for the flow of meaning.
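To make the idea of semantic coherence concrete: one common proxy in the research literature (a simplification of what Cecchi’s group does, which uses richer word embeddings) is to represent each sentence of a transcript as a vector and measure how similar consecutive sentences are. Low average similarity flags abrupt jumps in meaning. The sketch below, assuming nothing beyond the Python standard library, uses simple bag-of-words vectors; the function names are illustrative, not from any published code.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def coherence(sentences: list[str]) -> float:
    """Mean similarity between consecutive sentences in a transcript.
    Low values suggest abrupt shifts in meaning (tangential speech)."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return sum(sims) / len(sims) if sims else 0.0
```

A transcript that keeps returning to the same topic scores near 1.0, while one whose sentences share no vocabulary scores near 0.0; in the actual studies, such coherence scores are one input feature to the classifier, alongside syntactic measures.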
In two small-scale validation studies, Cecchi’s team trained the algorithm using transcripts of interviews with patients, and showed that the resulting model could predict, with 85–100 percent accuracy, whether psychosis onset was imminent (in the next two years) in young, high-risk patients (npj Schizophr, 1:15030, 2015; World Psychiatry, 17:67–75, 2018).
The model is a long way from clinical use, cautions Cecchi, noting that the studies included data from just a few dozen subjects. “We need to reach sample sizes [of] several thousand to say we are absolutely sure this is working.” But he suspects that more work will support the use of such AI-based approaches, not only for helping psychiatrists diagnose psychosis, but also for monitoring patients who suffer from psychotic disorders.
And it’s not just psychosis, he emphasizes. “The major disorders affecting our society—depression, PTSD, addiction, and then neurological disorders, Alzheimer’s disease, Parkinson’s disease—all of them leave a mark in language.” A few years ago, for example, his group developed machine learning models that predict Parkinson’s diagnoses and severity with about 75 percent accuracy based on transcripts of patients describing their typical day (Brain Lang, 162:19–28, 2016).
While the linguistic content of speech can reveal a lot about how a person’s brain is functioning, other aspects of spoken language, such as voice, tone, and intonation, could provide additional clues about a person’s physical and mental health. “If you have a cold, the sound of your voice changes,” notes MIT computer scientist James Glass, who has investigated AI analyses of speech for detection of cognitive impairment and depression.
Monitoring people’s health by listening to the sound of their voice is the focus of researchers at Sonde Health, a startup based in Boston, Massachusetts, that aims to integrate voice-analysis technology into consumer devices such as Google Home and Amazon Echo. Company cofounder Jim Harper says the team has already developed machine learning models to predict more than 15 conditions, including neurological, respiratory, and muscular or cardiac disorders, based on the acoustic properties of short fragments of speech. The early models work “almost as well as existing measurements,” Harper says, noting that the company is already in talks with the US Food and Drug Administration about its model for detecting depression and hopes to begin a clinical study within the year.
Examining qualities of speech, such as the tone and expressiveness of a person’s voice, can be particularly revealing for identifying movement disorders such as Parkinson’s disease, which can disrupt the functioning of muscles involved in speech. Some Parkinson’s patients, for example, realize only after being diagnosed that one of their first symptoms was a flattening of their speaking tone.
Laureano Moro-Velázquez, a telecommunications engineer at Johns Hopkins University, and his colleagues are using machine learning to analyze phonemes—the discrete sounds that compose speech—as a means of diagnosis. This February, the team published a model, trained with recordings of sentences recited by about 100 people with Parkinson’s and 100 controls, that could determine, with more than 80 percent accuracy, whether or not someone had the disease (Biomed Signal Process Control, 48:205–20, 2019).
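Acoustic approaches like this start by slicing a recording into short frames and computing descriptors of each one. The sketch below, a deliberately minimal stand-in for the published pipeline (which uses much richer phoneme-level features), computes two classic frame-level descriptors from raw audio samples: log-energy, which drops during pauses and weak phonation, and zero-crossing rate, which roughly separates voiced from unvoiced sounds. The frame and hop sizes correspond to 25 ms and 10 ms at a 16 kHz sampling rate, a common convention; all names here are illustrative.

```python
import math

def frame_features(samples, frame_len=400, hop=160):
    """Per-frame log-energy and zero-crossing rate: two basic acoustic
    descriptors often computed alongside richer ones (e.g. MFCCs) when
    characterizing how a speaker produces phonemes."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame (small epsilon avoids log(0)).
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (frame_len - 1)
        feats.append((math.log(energy + 1e-12), zcr))
    return feats
```

Sequences of such feature vectors, computed per phoneme, are what a classifier like Moro-Velázquez’s learns from when distinguishing patients from controls.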
While several groups are investigating either linguistic or acoustic elements of spoken language, a combination of the two often yields the best results. Cecchi’s group, for example, recently used an analysis of recordings of Parkinson’s patients’ speech—considering both acoustic and linguistic features—to successfully identify who was taking the drug levodopa (bioRxiv, doi: 10.1101/420422, 2018).
And Tuka Alhanai, a graduate student in Glass’s lab at MIT, has developed a machine learning model to extract clues about whether a person is depressed from text and audio recordings of interviews. The model learned—both from the words used to respond to questions and from other features of speech, such as its speed—to predict, with 77 percent accuracy, depression in 142 patients whose data were held in a public repository, according to results Alhanai presented at the Interspeech conference in September 2018.
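A simple way to picture how such combined models work is “early fusion”: the linguistic features (word rate, coherence scores, and so on) and acoustic features (pitch variability, pause lengths) for each interview are concatenated into a single vector and fed to an ordinary classifier. The sketch below, which is an illustration of the general technique rather than Alhanai’s model (her system uses neural networks over sequences), fuses two toy feature vectors and trains a tiny logistic-regression classifier on them; every name and number is hypothetical.

```python
import math

def fuse(text_feats, audio_feats):
    """Early fusion: concatenate per-interview linguistic features with
    acoustic ones into a single feature vector."""
    return list(text_feats) + list(audio_feats)

def train_logreg(X, y, lr=0.5, epochs=200):
    """Minimal logistic-regression trainer (stochastic gradient descent)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))  # predicted probability of class 1
            g = p - yi                  # gradient of the log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 / (1 + math.exp(-z)) > 0.5
```

On made-up data where “depressed” interviews have uniformly higher feature values, a few hundred passes of gradient descent suffice to separate the two classes; real systems differ mainly in the richness of the features and the capacity of the classifier.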
For now, the use of speech analysis is still in the proof-of-concept stage, whatever aspects are analyzed. “I think everything suffers from small databases,” says Glass. “Do it on something ten or a hundred times bigger, and I’ll pay more attention.” But if validation of the early-stage work proves as successful as many in the field anticipate and hope, he says, “I think it just opens up new opportunities to complement existing techniques and maybe provide more comprehensive [care].”