SpinVox's voicemail-to-text service is becoming increasingly important for my mobile communications. Today I noticed that although they had screwed up the names (a person's name and a company name) in a message, they had actually just spelled them phonetically. The caller was a stranger, so I had no way of knowing the correct name. However, I made a few educated guesses at the likely spellings, on the assumption that the transcribed spelling was a phonetic rendering of what had been said. And I was right! Within seconds of googling, I found both the person's name and the name of the company this person is affiliated with.
A suggestion for SpinVox: maybe run Soundex over the names the engine recognizes and, among the known names that sound alike, pick the spelling that is most frequently the correct one. I think that would considerably improve performance on names.
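To make the Soundex suggestion concrete, here is a minimal sketch in Python. It is only my guess at how such a step could look, not anything SpinVox actually does: `soundex` is a plain implementation of standard American Soundex, while `best_spelling` and the `known_names` frequency table are hypothetical names I made up for illustration.

```python
from collections import Counter

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Keep a digit only if it differs from the previous coded letter.
        if digit and digit != prev:
            code.append(digit)
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = digit
    return ("".join(code) + "000")[:4]


def best_spelling(heard: str, known_names: Counter) -> str:
    """Hypothetical helper: pick the most frequent known name that
    shares a Soundex code with the spelling the engine produced."""
    target = soundex(heard)
    candidates = {n: c for n, c in known_names.items() if soundex(n) == target}
    if not candidates:
        return heard  # nothing sounds alike, keep the engine's spelling
    return max(candidates, key=candidates.get)


# Assumed frequency table (made-up counts, e.g. from a name directory).
known = Counter({"Smith": 50, "Schmidt": 20, "Smyth": 3})
print(best_spelling("Smitt", known))
```

Soundex is coarse (it was designed for English surnames), so in practice a different phonetic key might work better; the point is only that a phonetic misspelling can be mapped back to a likely real name.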
Everything that comes to mind about language, linguistic software, and life in between. Just an alternative linguist's blog, I guess.
Showing posts with label Spin Vox. Show all posts
Monday, December 22, 2008
SpinVox (again)
Wednesday, December 17, 2008
SpinVox
I signed up to try SpinVox's voicemail-to-text service on my mobile phone. They set it up quickly, and I was impressed that the demo is complimentary. Performance was lacking in proper name (named entity) recognition, though not so much in handling exotic accents. I had a Danish friend leave me a voicemail with the details of our meeting the following day. SpinVox caught everything my Danish friend said except for her (Danish) name, my (Greek) name, and the name of the place (unfortunately for SpinVox, we were meeting at a local Starbucks, so no excuses for not catching that!).
To SpinVox's credit, the call was placed in the middle of the street, with a lot of noise in the background, and the caller had an accent.
However, in real conditions (if I had really depended on that voicemail-to-text conversion for my meeting), their performance was poor and the text I got was useless, because the proper names SpinVox missed were the critical information. For instance, I wouldn't know who was calling, since they screwed up her name, and I wouldn't know where she wanted to meet, because SpinVox didn't catch the place name. So in that respect, although it is an admirable effort, it leaves a lot to be desired.
Maybe users could build their own local dictionaries of names based on, say, their address books. They could upload to SpinVox's server a dictionary of names pronounced in their own accent, to augment SpinVox's central dictionary of names and accents. Still, real-time speech is highly unpredictable, since all sorts of callers can be expected to call any given user. Only an adaptive speech recognition system could actually learn from ad hoc input in order to improve itself. Imagine, for instance, that every time I had a new caller, SpinVox learned to memorize and subsequently recognize their accent and linguistic model; so, if your boss has an American accent and usually talks about project XYZ and meeting you at Room 234B in Building ABC, SpinVox could learn to expect this type of "talk" (and accent) the next time he calls you. That would of course improve speech recognition accuracy, and it would involve the successful marriage of a memory (lexicon/vocabulary + accents) with an adaptive learning algorithm.
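Here is a rough sketch of what I mean by marrying a memory with an adaptive algorithm, written in Python. Everything in it is an assumption for illustration: I pretend the recognizer returns several candidate transcripts with scores (higher is better), and `CallerMemory`, `learn`, `bonus`, and `rerank` are hypothetical names, not part of any SpinVox interface. It covers only the vocabulary side of the idea and leaves accents out entirely.

```python
import math
from collections import Counter, defaultdict


class CallerMemory:
    """Per-caller word memory used to re-rank recognizer hypotheses."""

    def __init__(self, weight: float = 1.0):
        self.weight = weight                 # how much the memory counts
        self.counts = defaultdict(Counter)   # caller id -> word counts

    def learn(self, caller: str, confirmed_text: str) -> None:
        """Remember the words of a transcript the user confirmed or corrected."""
        self.counts[caller].update(confirmed_text.lower().split())

    def bonus(self, caller: str, text: str) -> float:
        """Average add-one-smoothed log-probability of the words in `text`
        under this caller's history (0.0 for an unknown caller)."""
        seen = self.counts.get(caller)
        words = text.lower().split()
        if not seen or not words:
            return 0.0
        total = sum(seen.values())
        vocab = len(seen) + 1  # +1 reserves mass for unseen words
        return sum(math.log((seen[w] + 1) / (total + vocab))
                   for w in words) / len(words)

    def rerank(self, caller: str, hypotheses: list) -> str:
        """Return the hypothesis text with the best combined score.
        `hypotheses` is a list of (text, recognizer_score) pairs."""
        best = max(hypotheses,
                   key=lambda h: h[1] + self.weight * self.bonus(caller, h[0]))
        return best[0]


memory = CallerMemory()
memory.learn("boss", "meet me at room 234B in building ABC about project XYZ")

# Made-up candidate transcripts with made-up recognizer scores.
candidates = [("meet me at room two thirty for bee", -1.0),
              ("meet me at room 234b", -1.2)]
print(memory.rerank("boss", candidates))      # memory favours the known room
print(memory.rerank("stranger", candidates))  # no history, recognizer wins
```

The design choice is deliberately simple: the memory never overrides the recognizer outright, it only nudges the ranking toward words this caller has used before, so a brand-new caller is transcribed exactly as before.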
Labels: accent, adaptive, algorithm, Danish, entity, learning, lexicon, memory, mobile, place, proper name, recognition, speech recognition, Spin Vox, SpinVox, Starbucks, text, vocabulary, voicemail