Showing posts with label speech recognition. Show all posts

Thursday, July 9, 2009

iPhone for hearing loss?


Apparently, you can now cure your hearing loss with a $9.99 gadget that turns your iPhone into a hearing aid, according to this CNET article. Huh? Why, then, is everyone with an acquired hearing loss spending thousands of dollars on state-of-the-art hearing aids? The article is ludicrous. Obviously what we have all missed is that a combination of a simple volume amplifier and an equalizer is enough to restore one's hearing to optimal levels. Ginger Labs claims that "For those with a hearing loss, soundAMP reawakens your sense of hearing; sounds comes to life and you hear better again." For some reason this excerpt reminds me of those bogus $5 "hearing aid" ads you see in the back pages of pop U.S. magazines.

SoundAMP offers a feature that repeats the last few seconds of recorded sound. For a person with acquired
hearing loss, this is actually a useless feature! Let me first note that repeating "recorded sounds" is not the same
as repeating speech/conversation (which is what the article says this feature does). "Pure tone" sounds are usually
no problem for people who use hearing aids; all you need to "perceive" these sounds is an amplifier, and any
hearing aid can fit that bill. Speech is much more complex and presents a much taller order. Deciphering
speech involves, beyond the capacity for perceiving plain sensory input, recognition, processing and understanding,
all very different cognitive processes in the brain.

The article states: "According to the developers, SoundAMP improves your hearing quality in a variety of
environments, including lecture halls and noisy restaurants. Thus, it has the potential to help students as well as the
hearing-impaired." This quote is as vague as they come! How exactly are students and "the hearing impaired"
similar audiences? It just shows the author's blatant ignorance of the mechanics and complexities of hearing loss
and of related hearing aid technology. Surely, if hearing loss could improve as simply as by turning up the volume
on an iPhone, none of us would be spending so much money on *actual* hearing aids, nor would the government
bother with research into hearing loss and related brain, speech, and signal processing technology. Let's set the record
straight: simply amplifying the volume may help about 5-15% of people with acquired hearing loss. Those same
people can often have a simple surgery which restores their (conductive) hearing loss to "normal" levels.
The majority of people with acquired hearing loss need much more than a $5 amplifier with equalizing and recording
options.

They need a state-of-the-art speech processor. Of course, such a repeat feature could become more useful
if it replayed sounds after a (slight) frequency-mapping procedure, with a shift made around the areas where
a person has reduced sensitivity to certain frequencies. However, that's exactly what hearing aids are (supposed
to be) doing! At least in theory, they are programmed to tweak your auditory input to match your audiogram's levels.
However, both the audiogram and the 'tweaking' process have severe limitations. For instance, audiograms are not
an exact science! And hearing aids, depending on their underlying digital or analog platform, may not allow you any type
or level of 'tweaking'. And of course, even if you had the perfect hearing aid to match your individualized (perfectly
measured) audiogram, there is always the X-factor of how your brain works with all this. Some people are unable to
benefit from state-of-the-art hearing aids despite their best efforts.
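To make the idea concrete, here's a minimal sketch of what "tweaking input to match an audiogram" could look like in software. The audiogram values are hypothetical, and the "half-gain rule" (amplify each band by half the measured loss) is just one classic prescriptive rule; real hearing aids use far more sophisticated fitting formulas and compression.

```python
import numpy as np

# Hypothetical audiogram: hearing loss in dB at standard test frequencies (Hz).
AUDIOGRAM = {250: 10, 500: 20, 1000: 30, 2000: 45, 4000: 60, 8000: 70}

def audiogram_gain_db(freqs_hz):
    """Interpolate the audiogram and apply the half-gain rule per frequency."""
    points = sorted(AUDIOGRAM.items())
    xs = np.array([f for f, _ in points], dtype=float)
    ys = np.array([loss for _, loss in points], dtype=float)
    loss_db = np.interp(freqs_hz, xs, ys)  # flat extrapolation outside range
    return loss_db / 2.0

def compensate(signal, sample_rate):
    """Boost each frequency band of a signal according to the audiogram."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    gain = 10 ** (audiogram_gain_db(freqs) / 20.0)  # dB -> linear amplitude
    return np.fft.irfft(spectrum * gain, n=len(signal))
```

Even this toy version shows why a flat volume knob is not a hearing aid: the gain at 4 kHz (30 dB, about 32x) is completely different from the gain at 250 Hz (5 dB, under 2x).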

Maybe an improvement could be seen in replaying the recorded sound after applying a noise filter for specific problem
frequencies, e.g. a white noise filter, or one that cuts every frequency above 5000 Hz? Still, I would never trust a machine
to choose noise filters for me, except at the extremes of the sound band. Speech is so much more complex than that.
Current state-of-the-art hearing aids offer a number of programmable channels. Users can try and test a
channel that functions as an automatic noise filter; such a channel adjusts environmental sounds and speech input
to the user's audiogram and decides which frequencies to filter out based on indications in the audiogram. For many
of the reasons I mentioned above, this is utterly useless! Real speech and real brains are multifaceted, dynamic systems;
no technology has come to address that level of complexity yet.
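For the record, the "cut everything above 5000 Hz" idea itself is trivial to implement; that's exactly why it's not the hard part. Here's a crude brick-wall low-pass sketch (the 5000 Hz cutoff is just the figure from the example above; a real device would use a proper filter design, not bin zeroing):

```python
import numpy as np

def lowpass(signal, sample_rate, cutoff_hz=5000):
    """Crudely zero out all spectral content above cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0  # brick-wall cutoff
    return np.fft.irfft(spectrum, n=len(signal))
```

The filter does what it says; the problem is deciding *when* such a cut helps speech perception for a particular listener, and no simple rule answers that.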

In an older post on this blog last summer, I complained that the new iPhone was not hearing aid
compatible the way most popular cell phones are. The iPhone doesn't carry a hearing aid compatibility rating, so
you have no way to tell how well it fares with the microphone and telecoil of a modern hearing aid.

Incidentally, if you're curious about the hearing aid compatibility ratings of various popular phones, go
to Phonescoop's Phone Finder and choose a weighted search showing all options, where you can search for phones with high hearing aid compatibility. Anything at M3/T3 or above is what most people with acquired hearing loss need (M4/T4 is the top rating, and most phones may not achieve the same rating for both the M(icrophone) and the T(elecoil)). The telecoil switch is what eliminates background noise in modern hearing aids, and it's very useful for speaking on the phone in the middle of the street or some other public area. When you have your telecoil switch on, though, you lose the volume of the microphone, so you usually need pretty good phone compatibility with both your hearing aid's microphone and its telecoil.

Without a hearing aid compatibility (HAC) rating, you cannot tell whether your new phone will work with your hearing aid without interference. In terms of usability, interference is critical, since it renders hearing aids useless, and the phones, by extension, become nothing but a nuisance to people with acquired hearing loss. For the hearing impaired, speech perception is near impossible when the aid picks up buzzing, humming, or whining noise while the user is on the phone.

Although it's nice to see anything like an improvement in sound clarity on the iPhone, this is still nowhere near what customers with acquired hearing loss need in order to use an iPhone for listening purposes (not just for music but also for speech).




Monday, December 22, 2008

SpinVox (again)

SpinVox's voicemail/speech-to-text service is becoming increasingly important for my mobile communications. Today I noticed that although they had screwed up the names (a person's name and a company name) in a message, they had actually just spelled them phonetically. The person who was calling was a stranger, so I had no way to know the correct name. However, I took a few educated guesses at the likely correct spellings, on the assumption that the resulting spelling was a phonetic representation of the input string. And I was right! Within seconds of googling, I found both the person's name and the name of their affiliated company.
A suggestion for SpinVox: maybe run Soundex over the engine's output and pick the most frequently correct spelling for the names the engine recognizes. I think that would considerably improve performance.
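For readers who haven't met it, Soundex is exactly the kind of cheap phonetic key that would let SpinVox group "Katarina" and "Katerina" together. Here's a sketch of the standard American Soundex algorithm:

```python
# Consonant classes of American Soundex: similar-sounding letters share a digit.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(name):
    """American Soundex: first letter + three digits, zero-padded."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    digits = []
    prev_code = SOUNDEX_CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W don't break a run of identical codes
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev_code:
            digits.append(code)
        prev_code = code  # vowels reset prev_code, so repeats re-emit
    return (name[0] + "".join(digits) + "000")[:4]
```

Names that sound alike collapse to the same key (e.g. "Robert" and "Rupert" both map to R163), so an engine could index candidate spellings by Soundex key and then pick the statistically most likely one.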

Wednesday, December 17, 2008

SpinVox

I signed up to try SpinVox's voicemail-to-text service on my mobile phone. They set it up quickly, and I was impressed that it's a complimentary demo. Performance seemed to be lacking in proper-name recognition rather than in catching exotic accents. I had a Danish friend leave a voicemail for me with the details of the next day's meeting. SpinVox caught everything my Danish friend said except for her (Danish) name, my (Greek) name, and the name of the place (unfortunately for SpinVox, we were meeting at a local Starbucks, so no excuses for not catching that! /wahaha.......... /hmm).
To SpinVox's credit, the call was placed in the middle of the street, with a lot of noise in the background, and the caller had an accent.
However, under real conditions (if I had really depended on that converted-to-text voicemail for my meeting) their performance was poor, and the text I got was useless, as the proper names SpinVox missed were unfortunately the critical information. For instance, I wouldn't know who was calling, since they screwed up her name, and I wouldn't know where she wanted to meet, because SpinVox didn't catch the place name. So in that respect, although an admirable effort, it leaves a lot to be desired.
Maybe users could build their own local dictionaries of names based on, say, their address books. They could upload to SpinVox's server a dictionary of names pronounced with the particular user's accent, to help augment the central server's dictionary of names and accents. Still, a lot of real-time speech comes with a high unpredictability factor, since all sorts of callers may call a particular user. Only an adaptive speech recognition system could actually learn from ad hoc input in order to improve itself. Imagine, for instance, if every time I had a new caller, SpinVox could learn to memorize and subsequently recognize their accent and language model; so if your boss has an American accent and usually talks about project XYZ and about meeting you at Room 234B in Building ABC, SpinVox could learn to expect this type of "talk" (and accent) the next time he calls you. That would of course improve speech recognition accuracy, and it would involve the successful marriage of a memory (lexicon/vocabulary + accents) with an adaptive learning algorithm.
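Even without touching the recognition engine, the address-book idea could work as a post-processing step: snap each garbled name in the transcript to the closest entry in the user's contacts. A minimal sketch, using string similarity and an entirely made-up contact list:

```python
import difflib

# Hypothetical personal lexicon built from the user's address book.
CONTACTS = ["Katerina Papadopoulou", "Starbucks Astoria", "Room 234B"]

def correct_name(transcribed, contacts=CONTACTS, cutoff=0.6):
    """Snap a (possibly phonetically garbled) transcription to the closest
    contact; leave it alone if nothing is similar enough."""
    matches = difflib.get_close_matches(transcribed, contacts, n=1, cutoff=cutoff)
    return matches[0] if matches else transcribed
```

A real system would combine this with a phonetic key (like the Soundex idea above) and per-caller statistics, but even this simple lookup would have recovered the Starbucks from my Danish friend's voicemail.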