Silent Speech Recognition

Thad Starner
Himanshu Sahni, Abdelkareem Bedri, Gabriel Reyes, Pavleen Thukral, Zehua Guo

In this study, we address the problem of performing continuous speech recognition where audio is not available (e.g., due to a medical condition) or is highly noisy (e.g., during fighting or combat). Our Tongue Magnet Interface (TMI) uses 3-axis magnetometers to measure the movement of a small magnet glued to the user's tongue. Tongue movement corresponding to speech is isolated from the continuous data by comparing the variance of a sliding window of data to the variance of a signal corresponding to silence. Recognition relies on hidden Markov model (HMM) based techniques. Using a custom headset with four magnetometers placed close to the participant's cheeks, we achieve a maximum user-dependent recognition rate of 99.8% on a set of 12 phrases spoken by able-bodied participants; the average accuracy across four users is 95.9%. Using the single magnetometer aboard Google Glass, a commercial wearable computing device worn at eye level, one of 12 phrases can be selected with 93.8% average accuracy. To improve on the latter result, we introduce a new interface, the Outer Ear Interface (OEI), which captures lower-jaw movement by measuring the deformation it causes in the ear canal. This measurement is made with a pair of infrared proximity sensors, one in each ear. We hypothesize that combining features from both interfaces will improve accuracy significantly.
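The variance-based isolation of speech from silence described above can be illustrated with a minimal sketch. The abstract does not give the decision rule, so the function name `detect_activity`, the window length, and the `ratio` threshold are all assumptions made for illustration, and the input is taken to be a single stream of readings (for example, one magnetometer axis) rather than the full multi-sensor data.

```python
from statistics import pvariance

def detect_activity(signal, silence, window=10, ratio=3.0):
    """Return (start, end) index pairs, end exclusive, marking regions
    whose sliding-window variance exceeds `ratio` times the variance of
    a reference recording of silence.

    `window` and `ratio` are illustrative values, not taken from the study.
    """
    threshold = ratio * pvariance(silence)

    # Mark every sample covered by at least one high-variance window.
    active = [False] * len(signal)
    for start in range(len(signal) - window + 1):
        if pvariance(signal[start:start + window]) > threshold:
            for i in range(start, start + window):
                active[i] = True

    # Collapse the per-sample mask into contiguous segments.
    segments = []
    seg_start = None
    for i, flag in enumerate(active):
        if flag and seg_start is None:
            seg_start = i
        elif not flag and seg_start is not None:
            segments.append((seg_start, i))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(signal)))
    return segments
```

Because every sample in a high-variance window is marked, each detected segment extends slightly into the surrounding silence. That padding may or may not match the behavior of the actual system; a stricter variant could mark only the center of each window.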

Thad Starner

The Contextual Computing Group (CCG) creates wearable and ubiquitous computing technologies using techniques from artificial intelligence (AI) and human-computer interaction (HCI). We focus on giving users superpowers through augmenting their senses, improving learning, and providing intelligent assistants in everyday life. Members' long-term projects have included creating wearable computers (Google Glass), teaching manual skills without attention (Passive Haptic Learning), improving hand sensation after traumatic injury (Passive Haptic Rehabilitation), educational technology for the Deaf community, and communicating with dogs and dolphins through computer interfaces (Animal Computer Interaction).