And so it came to pass, on 08-06-96 23:50,
that Bonnie Goodwin spake unto Steve Holmes:
BG> Voice Recognition is a little beyond the general scope of this
BG> echo, but much of what is required to accomplish it is based on
BG> acoustical principles.
Anything to tie it in and keep the moderator happy :)
BG> I first saw it demonstrated at the IBM exhibit at the Seattle
BG> World's Fair when I was very young, where I also saw Bell Telephone
BG> demonstrate video phones. It was on a washing-machine-sized
BG> computer that was about the equivalent of a four function
BG> calculator today. The guy demoing it (unsuccessfully) had a Texas
BG> accent, and I wondered how it was going to universally detect what
BG> he or anyone else said. That is the biggest difficulty in voice
BG> recognition: turning utterances spoken into a microphone into
BG> something useful for the computer. It means building a library of
BG> the sounds a person makes that relate to commands the computer
BG> recognizes. Translating the sounds from the microphone into that
BG> database requires differentiating between all of the various sounds
BG> that speech uses, which would have to be taught.
Bonnie, that's the difference between VOICE recognition and the SPEECH
recognition now used by IBM's system. In a voice recognition system, the
user must train the system with specific commands, and those commands must
subsequently be spoken exactly the same as they were recorded in the first
place. It also limits the recognition to the words that have been
pre-recorded, and to the user(s) that trained it.
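To make the limitation concrete, here is a rough sketch of that kind of trained-template matching. It is only an illustration, not any real product's code: the TemplateRecognizer class, the three-number "feature vectors" and the threshold value are all made up for the example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TemplateRecognizer:
    """Hypothetical speaker-dependent recognizer: one stored template per command."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed cut-off; larger means more forgiving
        self.templates = {}         # command name -> features recorded at training

    def train(self, command, features):
        self.templates[command] = list(features)

    def recognize(self, features):
        """Return the nearest trained command, or None if nothing is close enough."""
        best, best_d = None, float("inf")
        for command, template in self.templates.items():
            d = distance(features, template)
            if d < best_d:
                best, best_d = command, d
        return best if best_d <= self.threshold else None


rec = TemplateRecognizer(threshold=0.3)
rec.train("open", [0.9, 0.1, 0.3])
rec.train("close", [0.2, 0.8, 0.6])

near_open = rec.recognize([0.85, 0.15, 0.3])  # spoken almost as it was trained
unknown = rec.recognize([0.5, 0.5, 0.5])      # too far from every template
print(near_open, unknown)
```

Anything that was never trained, or that is spoken too differently from the recording, falls outside the threshold and is rejected.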
BG> One way this can be done is to analyze the separate components of
BG> speech spectrally over time and use that database of sounds to
BG> compare with the commands input.
This is essentially how ICSS and VTD work - speech is broken down into
"phonemes" - its individual sounds. This lets it operate without special
training for a wide range of voices and speech styles.
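As a very rough sketch of the general idea (spectral analysis over short time frames, then comparison against a library of known sounds), something like the following works on toy data. It is not how ICSS or VTD are actually implemented. The function names, the 64-sample frames, the four frequency bands and the sine tones standing in for speech sounds are all assumptions for illustration. Note that each frame is boiled down to four numbers, which is one way the data gets much smaller than the raw audio.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one frame via a direct DFT (lower half only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]


def band_energies(spectrum, bands):
    """Collapse a spectrum into a few band energies, normalised to sum to 1."""
    size = len(spectrum) // bands
    energies = [sum(m * m for m in spectrum[i * size:(i + 1) * size])
                for i in range(bands)]
    total = sum(energies) or 1.0
    return [e / total for e in energies]


def features(signal, frame_len=64, bands=4):
    """One band-energy vector per non-overlapping frame of the signal."""
    return [band_energies(frame_spectrum(signal[i:i + frame_len]), bands)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def classify(vector, library):
    """Name of the library entry whose features are nearest to the vector."""
    return min(library,
               key=lambda name: sum((a - b) ** 2
                                    for a, b in zip(vector, library[name])))


def tone(cycles_per_frame, n=64):
    """Synthetic stand-in for a speech sound: one frame of a pure sine tone."""
    return [math.sin(2 * math.pi * cycles_per_frame * t / n) for t in range(n)]


# A two-entry "library of sounds", built from reference tones.
library = {"low": features(tone(3))[0], "high": features(tone(27))[0]}

# Three frames of input that are similar, but not identical, to the references.
signal = tone(4) + tone(26) + tone(2)
labels = [classify(v, library) for v in features(signal)]
print(labels)
```

Because the comparison is made on coarse band energies rather than on the raw waveform, inputs that differ somewhat from the reference sounds still land on the right library entry.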
The VTD dictation system also uses a complex context-sensing system - it may
first put down what word it THINKS the user said, but will continue to
monitor the context and will go back and change the word if it doesn't fit
later context.
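The "go back and change the word" behaviour can be sketched with a small best-path search over candidate words. This is only a toy under stated assumptions, not VTD's actual algorithm: the best_sequence function, the acoustic scores and the context bonus table are all invented for the example.

```python
def best_sequence(candidates, pair_score):
    """Pick one word per position so that the total score is highest.

    candidates: list of {word: acoustic_score} dicts, one per spoken word.
    pair_score: {(previous_word, word): context_bonus}; missing pairs score 0.
    """
    # paths maps each possible last word to (total score, words chosen so far).
    paths = {word: (score, [word]) for word, score in candidates[0].items()}
    for options in candidates[1:]:
        new_paths = {}
        for word, acoustic in options.items():
            new_paths[word] = max(
                (score + acoustic + pair_score.get((prev, word), 0.0),
                 seq + [word])
                for prev, (score, seq) in paths.items()
            )
        paths = new_paths
    return max(paths.values())[1]


# Hypothetical scores: "there" sounds slightly more likely than "their",
# but "their dog" fits the context better than "there dog".
heard = [{"there": 0.6, "their": 0.5}, {"dog": 1.0}]
context = {("their", "dog"): 0.4}

# Re-decode after each new word, the way a dictation display might update.
for n in range(1, len(heard) + 1):
    print(best_sequence(heard[:n], context))
```

After the first word the sketch shows "there"; once "dog" arrives, the earlier choice is revised to "their".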
BG> I've used this kind of analysis for many years in aligning sound
BG> systems. It can tend to be very intensive in amounts of data
BG> required to represent all of the sounds required for speech.
BG> Certainly there are ways to reduce this data sizably from real time
BG> audio. The IBM Aptiva is advertised as having voice recognition
BG> capabilities but I haven't had a chance to try it out yet.
The Aptiva also uses a variation on the ICSS *speech* recognition technology.
The ones that come with Windows preloaded typically only have Navigation
enabled, while the ones that come with Warp installed have Dictation
activated as well (apparently Windows just doesn't have the nards for VTD to
operate satisfactorily :).
BG> I suspect that there is still a lot of work to be done before it
BG> will be commonplace in day-to-day usage.
Watch for Merlin! It's an integrated part of the OS now.
IKEA ... Swedish for "particle board."
--- Sqed/32 1.10/unreg
---------------
* Origin: la Point Strangiato... (1:153/7040.106)