His speech-generating device (SGD), made by Intel, used an infrared switch mounted on his spectacles that detected the slightest twitch of his cheek. With these twitches he entered the words he wanted to say into a tablet computer mounted on one arm of his wheelchair. The tablet's on-screen keyboard had a cursor that moved through rows and columns, and Hawking would stop it with a twitch of his cheek when it reached the word he wanted.
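The row-and-column selection scheme described above is known as scanning input. The sketch below is a minimal illustration of the idea, not Intel's actual software; the grid contents and timing model are invented for the example.

```python
# Minimal sketch of two-stage row/column scanning selection.
# Not Intel's actual SGD code; grid and timing are invented.

def scan_select(grid, switch_steps):
    """Simulate scanning input: a highlight advances over rows, and a
    switch press (a cheek twitch) locks the current row; the highlight
    then advances over that row's cells, and a second press selects one.

    grid: list of rows (each a list of selectable items)
    switch_steps: (row_step, col_step) - how many scan ticks elapse
    before each switch activation.
    """
    row_step, col_step = switch_steps
    row = grid[row_step % len(grid)]   # first press locks the row
    return row[col_step % len(row)]    # second press picks the cell

keyboard = [["yes", "no"], ["hello", "thanks", "help"]]
# The highlight passes row 0; the user presses on row 1, then on cell 2.
word = scan_select(keyboard, (1, 2))
print(word)  # -> help
```

Selection is slow because the user must wait for the highlight to reach the target, which is why the article calls this kind of interface less fluid than direct decoding.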
This system, as technologically advanced as it sounds, may soon become antiquated. Doctors at the University of California, San Francisco have successfully demonstrated a system that can translate brain activity into synthetic speech.
The team set out to let paralysed people communicate more fluidly than they can with devices that track eye movements and muscle twitches to control a virtual keyboard. Their system combines electrodes that map brain activity with machine learning algorithms to produce more natural speech than existing solutions allow.
The study was funded by Facebook and carried out with epilepsy patients who volunteered to help the lead researcher, Dr Edward Chang, trial his idea. The volunteers were in hospital preparing for neurosurgery for their condition, and had electrodes placed directly on the brain for at least a week to map the origins of their seizures. Dr Chang used these electrodes to record brain activity while each patient listened to nine set questions and read aloud from a list of 24 potential responses.
The team then built computer models that learned to match particular patterns of brain activity to the questions the patients heard and the answers they spoke. Once trained, the software could identify almost instantly – from the brain signals alone – which question a patient heard and what response they gave, with an accuracy of 76% and 61% respectively.
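At its core, this kind of matching is a classification problem: reduce each recording to a feature vector and assign it to the closest known pattern. The toy nearest-centroid classifier below is only a stand-in for the study's far more sophisticated models; the feature vectors and labels are entirely invented.

```python
# Toy stand-in for the study's decoder: classify a "brain activity"
# feature vector by nearest class centroid. All data here is invented;
# the real system used far richer neural recordings and models.
from math import dist

def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> centroid."""
    sums, counts = {}, {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    return min(centroids, key=lambda lab: dist(centroids[lab], vec))

# Fabricated training data: two response classes.
train = [([1.0, 0.1], "yes"), ([0.9, 0.0], "yes"),
         ([0.0, 1.0], "no"),  ([0.1, 0.9], "no")]
model = train_centroids(train)
print(classify(model, [0.8, 0.2]))  # -> yes
```

Once trained, classifying a new vector is just a handful of distance computations, which is why the real system could identify questions and answers almost instantly.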
The volunteers initially responded aloud to the questions, training the machine learning algorithms to detect when they were hearing a new question or beginning to respond, and to identify which of the two dozen standard responses they were giving. The team found that context matters: identifying the question as well as the answer improved the algorithm's speed and accuracy.
The system allowed patients to answer questions about the music they liked, how well they were feeling, whether their room was too hot or cold, or too bright or dark, and when they would like to be checked on again. In its current form, the system works only for the stock sentences it has been trained on, but the scientists are hoping it is a stepping stone towards a more powerful system that can decode the words a person intends to say in real time.
The researchers used Amazon’s Mechanical Turk crowdsourcing marketplace to test how intelligible their system was. Native English speakers were asked to transcribe the sentences they heard. The listeners accurately heard the sentences 43% of the time when given a set of 25 possible words to choose from, and 21% of the time when given 50 words.
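A simplified version of such an intelligibility score is the fraction of transcriptions that match the spoken sentence exactly, as sketched below. This is only an illustration: the actual evaluation constrained transcribers to closed word pools of 25 or 50 words, and the example transcripts are fabricated.

```python
# Simplified intelligibility metric: the fraction of crowd-sourced
# transcriptions that exactly match the reference sentence.
# The transcripts below are fabricated examples.

def accuracy(reference, transcripts):
    """Case-insensitive, whitespace-normalised exact-match rate."""
    norm = lambda s: s.lower().split()
    hits = sum(norm(t) == norm(reference) for t in transcripts)
    return hits / len(transcripts)

ref = "The room is too bright"
heard = ["the room is too bright", "the room is too light",
         "The room is too bright", "the broom is too bright"]
print(accuracy(ref, heard))  # -> 0.5
```

Shrinking the pool of allowed words makes the transcribers' task easier, which is consistent with the higher accuracy reported for the 25-word set than for the 50-word set.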
Chang believes that the technology, which has only been tested on people with typical speech, may be much harder to make work for those who cannot speak, particularly people who have never been able to speak because of a movement disorder such as cerebral palsy. Nevertheless, the team is already working on making the technology more natural and more intelligible.
While the study is generating excitement in the field, the researchers say the technology is not yet ready for clinical trials. Even if practical applications are years away, the work offers hope for people who are unable to speak to their loved ones due to illness.