Electronic communication devices are evolving from robotic speech to personalized voices.
Max Plansky is 16 years old, and like any teenager, he is trying to find his own voice. But for Max, the effort is complicated by the fact that he needs to use a communication device to talk.
At nine months, Max was diagnosed with cerebral palsy, a neurological disorder. His vocal cords vibrate, and Max can make sounds. But the affected area of his brain, which controls his muscles, also governs his ability to speak. When Max wants to talk, he chooses words and sentences on an electronic device, which then speaks in a computerized voice often known as “Perfect Paul.”
Perfect Paul is a common option on devices like the one Max uses. But it doesn’t sound anything like a 16-year-old boy. It sounds more like an adult male robot.
Only recently have advances in computers and other technologies made it possible to analyze both the characteristics and the more ethereal aspects of human voices and mix them together to the point where an artificial voice doesn’t sound so, well, artificial.
Most of these efforts are commercial and geared toward improving the synthetic voices that give you directions or speak to you on the phone when you try to make a flight reservation. But researchers are also starting to apply the technology to help people like Max, who, because of medical conditions, must rely on electronic communication devices. Those users want a voice that not only sounds more natural but also manages to capture who they are.
“Speech is not just a means of communication—it is a window into the soul,” says Matthew Aylett, chief scientific officer of an Edinburgh-based company, CereProc, that sells technology for personalizing synthetic voices.
To capture inflection and tone in a synthetic voice is an enormous challenge, Dr. Aylett says. People speak faster, louder or at a higher pitch when they are upset, or slower, softer and deeper when they are sad. The stronger the emotion, the harder it is to simulate, he says. “Telling a joke is also tough.”
Another thing that is hard to capture, says Dr. Aylett, is the myriad ways a voice can be used to convey different meanings of the same word. “When the waiter asks if you have decided what you want,” he says, “it means something different if you say ‘Yes!’ or ‘Yessss.’ ”
Building Max a personalized voice is not only a technical issue; it also involves questions about his identity. What sounds might capture the different aspects of Max’s personality? Moreover, if Max is to have a synthesized voice, should it sound like a 16-year-old speaking, or should it sound more like who Max might become later?
These are issues that Rupal Patel, a speech-technology professor on leave from Northeastern University, has been thinking about for years. Dr. Patel is chief executive of Vocal ID, a Belmont, Mass., company that custom-builds voices for the speech-impaired. She became interested in personalizing synthetic voices, she says, after attending an assistive-technology conference where she heard a young girl and an older man conversing by means of the same synthetic voice.
“How come this young girl has the same voice as this adult male?” Dr. Patel recalls wondering.