GazeSpeak runs on a smartphone and uses artificial intelligence to convert eye movements into speech, so a conversation partner can understand what is being said in real time.
By Timothy Revell, New Scientist, February 17, 2017
The app runs on the listener’s device. They point their smartphone at the speaker as if they are taking a photo. A sticker on the back of the phone, visible to the speaker, shows a grid with letters grouped into four boxes corresponding to looking left, right, up and down. As the speaker gives different eye signals, GazeSpeak registers them as letters.
“For example, to say the word ‘task’ they first look down to select the group containing ‘t’, then up to select the group containing ‘a’, and so on,” says Xiaoyi Zhang, who developed GazeSpeak whilst he was an intern at Microsoft.
GazeSpeak selects the appropriate letter from each group by predicting the word the speaker wants to say based on the most common English words, similar to predictive text messaging. The speaker indicates they have finished a word by winking or looking straight ahead for two seconds. The system also takes into account added lists of words, like names or places that the speaker is likely to use. The top four word predictions are shown onscreen, and the top one is read aloud.
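The prediction step described above resembles the disambiguation used by keypad predictive text, and can be sketched roughly as follows. This is a minimal illustration, not GazeSpeak's actual code: the letter grouping in `GROUPS` is hypothetical (chosen only so that 't' falls under "down" and 'a' under "up", as in the example), and the vocabulary and frequency counts in `WORD_COUNTS` are invented for the demonstration.

```python
from collections import defaultdict

# Hypothetical letter grouping; the real sticker layout may differ.
GROUPS = {
    "up": "abcdef",
    "left": "ghijklm",
    "right": "nopqrs",
    "down": "tuvwxyz",
}
LETTER_TO_GESTURE = {ch: g for g, letters in GROUPS.items() for ch in letters}

# Toy vocabulary with made-up frequency counts (assumption, not real data).
WORD_COUNTS = {
    "task": 300,
    "warm": 250,
    "wash": 120,
    "tank": 80,
    "yank": 10,
    "hello": 500,
}


def encode(word):
    """Return the gesture sequence that would spell `word`."""
    return tuple(LETTER_TO_GESTURE[ch] for ch in word.lower())


def build_index(word_counts):
    """Group every known word under the gesture sequence that spells it."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[encode(word)].append((count, word))
    return index


def predict(gestures, index, top_n=4):
    """Return up to `top_n` candidate words, most frequent first."""
    candidates = sorted(index.get(tuple(gestures), []),
                        key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in candidates[:top_n]]


INDEX = build_index(WORD_COUNTS)
print(predict(["down", "up", "right", "left"], INDEX))
# prints ['task', 'warm', 'wash', 'tank']
```

Under these assumptions, several words share the same four-gesture sequence, so frequency decides the ranking and the first entry is the one that would be read aloud. Personal word lists, such as names or places, could be modelled by adding entries to the vocabulary before building the index.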
“We’re using computer vision to recognise the eye gestures, and AI to do the word prediction,” says Meredith Morris at Microsoft Research in Redmond, Washington.
The app is designed for people with motor disabilities like ALS, because eye movement can become the only way for people with these conditions to communicate. ALS progressively damages nerve cells, affecting a person’s ability to speak, swallow and eventually breathe. The eye muscles are often some of the last to be affected.
Board of the old
“People can become really frustrated when trying to communicate, so if this app can make things easier that’s a really good thing,” says Matthew Hollis from the Motor Neurone Disease Association.
There are currently limited options for people with ALS to communicate. The most common is to use boards displaying letters in different groups, with a person tracking the speaker’s eye movements as they select letters. But it can take a long time for someone to learn how to interpret these eye movements effectively.
GazeSpeak proved much faster to use in an experiment with 20 people trying both the app and the low-tech boards. Completing a sentence with GazeSpeak took 78 seconds on average, compared with 123 seconds using the boards. The people in the tests did not have ALS, but the team also got feedback on the technology from some people with ALS and their interpreters. One person who tried the device typed a test sentence in just 62 seconds and said he thought it would be even quicker in a real-life situation, as his interpreter can more easily predict what he is likely to say.
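To put the two averages in proportion, the short calculation below works out the time saved per sentence and the relative speed-up. It uses only the two figures reported above; the variable names are illustrative.

```python
# Average time to complete a sentence, as reported in the experiment.
gazespeak_seconds = 78
board_seconds = 123

time_saved = board_seconds - gazespeak_seconds   # seconds saved per sentence
reduction = time_saved / board_seconds           # fraction of time saved
speedup = board_seconds / gazespeak_seconds      # how many times faster

print(f"Time saved per sentence: {time_saved} s")
print(f"Reduction in time: {reduction:.1%}")
print(f"Speed-up factor: {speedup:.2f}x")
```

On these figures, GazeSpeak cut the average sentence time by 45 seconds, a little over a third.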
“I love the phone technology; I just think that would be so slick,” said one of the interpreters.
Other systems currently use software to track eye movements with infrared cameras. But these are often expensive and bulky, and infrared cameras don’t work very well in sunlight. The GazeSpeak app is portable and comparatively cheap, as it only requires an iOS device, like an iPhone or iPad, with the app installed.
Microsoft will present the app at the Conference on Human Factors in Computing Systems in Colorado in May. The researchers say it will be available on the Apple App Store before the conference, and the source code will be made freely available so that other people can help to improve it.
Source: New Scientist
Current eye-tracking input systems for people with ALS or other motor impairments are expensive, not robust under sunlight, and require frequent re-calibration and substantial, relatively immobile setups. Eye-gaze transfer (e-tran) boards, a low-tech alternative, are challenging to master and offer slow communication rates.
To mitigate the drawbacks of these two status quo approaches, we created GazeSpeak, an eye gesture communication system that runs on a smartphone, and is designed to be low-cost, robust, portable, and easy-to-learn, with a higher communication bandwidth than an e-tran board. GazeSpeak can interpret eye gestures in real time, decode these gestures into predicted utterances, and facilitate communication, with different user interfaces for speakers and interpreters.
Our evaluations demonstrate that GazeSpeak is robust, has good user satisfaction, and provides a speed improvement with respect to an e-tran board; we also identify avenues for further improvement to low-cost, low-effort gaze-based communication technologies.
Published on YouTube Feb 18, 2017
Microsoft Ability Summit aims to bring ‘next wave’ of technology to empower people with disabilities in Microsoft Research
New tool allows paralyzed to speak — using their eyes in The Washington Post
The eyes have it: Eyefluence may have the answer to navigating AR/VR in Venture Beat
Paralyzed patients can control computers just by moving their eyes, thanks to this free software in Tech Insider
Team GLEASON – How NFL hero Steve Gleason is battling ALS with cutting-edge technology and the power of human spirit in Microsoft Story Labs
Breaking the silence at 16 years old with the words ‘Hello Mum’ in BBC News
Create notes and text with your eyes in GazeSpeaker
Seeing the future with eye-tracking control in Electronics Weekly
Eye tracking as a way for people with disabilities or in pain to play chess on tablets, laptops and desktop in ABLEity
Eye Tracking and Head Movement Detection: A State-of-Art Survey, Al-Rahayfeh A, Faezipour M. IEEE J Transl Eng Health Med. 2013 Nov 6;1:2100212. doi: 10.1109/JTEHM.2013.2289879