echo: audio
to: BONNIE GOODWIN
from: MATT ION
date: 1996-06-10 12:14:00
subject: Re: Voice Recognition Softwar

And so it came to pass, on 08-06-96 23:50,
   that Bonnie Goodwin spake unto Steve Holmes:
 BG>  Voice Recognition is a little beyond the general scope of this
 BG> echo, but much of what is required to accomplish it is based on
 BG> acoustical principles. 
Anything to tie it in and keep the moderator happy :)
 BG> I first saw it demonstrated at the IBM exhibit at the Seattle
 BG> World's Fair when I was very young, where I also saw Bell Telephone
 BG> demonstrate video phones. It ran on a washing-machine-sized
 BG> computer that was about the equivalent of a four-function
 BG> calculator today. The guy demoing it (unsuccessfully) had a Texas
 BG> accent, and I wondered how it was going to universally detect what
 BG> he or anyone else said. That is the biggest difficulty in voice
 BG> recognition: turning utterances spoken into a microphone into
 BG> something useful for the computer, i.e. building a library of the
 BG> sounds a person says that maps to commands the computer
 BG> recognizes. Translating the sounds from the microphone into that
 BG> database requires differentiating between all of the various
 BG> sounds that speech uses, which would have to be taught. 
Bonnie, that's the difference between VOICE recognition, and the SPEECH 
recognition now used by IBM's system.  In a voice recognition system, the 
user must train the system with specific commands, and those commands must 
subsequently be spoken exactly the same as they were recorded in the first 
place.  It also limits the recognition to the words that have been 
pre-recorded, and to the user(s) that trained it.
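To make the distinction concrete, here's a minimal sketch of the template-matching idea behind that kind of voice recognition system. Everything here is hypothetical illustration (the feature vectors, the commands, the distance threshold), not IBM's actual code: each trained command is stored as a feature vector, and an incoming utterance is matched to the nearest stored template, with anything too far away rejected.

```python
import numpy as np

# Hypothetical templates: one feature vector per pre-recorded command,
# captured during the user's training session.  A real system would use
# spectral features, not three made-up numbers.
templates = {
    "open":  np.array([0.9, 0.1, 0.3]),
    "close": np.array([0.2, 0.8, 0.5]),
    "print": np.array([0.4, 0.4, 0.9]),
}

def recognize(utterance, max_distance=0.5):
    """Return the trained command closest to the utterance's features,
    or None if nothing is close enough -- untrained words are rejected,
    which is exactly the limitation described above."""
    best, best_dist = None, float("inf")
    for command, template in templates.items():
        dist = np.linalg.norm(utterance - template)
        if dist < best_dist:
            best, best_dist = command, dist
    return best if best_dist <= max_distance else None
```

Note that the same word spoken slightly differently (or by a different user) lands farther from the template and gets rejected - hence the "must be spoken exactly as recorded" restriction.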
 BG> One way this can be done is to analyze the separate components of
 BG> speech spectrally over time and use that database of sounds to
 BG> compare with the commands input. 
This is essentially how ICSS and VTD work - speech is broken down into 
"phonemes" - its individual sounds.  This lets it operate without special 
training for a wide range of voices and speech styles.
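The spectral-analysis-over-time idea Bonnie describes can be sketched in a few lines: slice the audio into short overlapping frames and take the FFT magnitude of each, giving spectral content as a function of time (a spectrogram). The frame length, hop size, and test tone below are my own arbitrary choices for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Break a mono signal into overlapping frames and take the FFT
    magnitude of each frame -- spectral content as a function of time."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# One second of a 440 Hz tone sampled at 8 kHz: the energy
# concentrates in a single frequency bin in every frame.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

A phoneme-based recognizer works from patterns in exactly this kind of time-frequency representation rather than from raw waveforms.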
The VTD dictation system also uses a complex context-sensing system - it may 
first put down what word it THINKS the user said, but will continue to 
monitor the context and will go back and change the word if it doesn't fit 
later context.
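The go-back-and-revise behavior can be illustrated with a toy two-pass decoder. The word alternatives, acoustic scores, and bigram numbers below are all invented for the example (VTD's real model is far more elaborate): the first pass takes the best acoustic guess per word, then a second pass rescores each word against the word that follows it and swaps in an alternative that fits the context better.

```python
# Hypothetical acoustic alternatives: each position lists candidate
# words with an acoustic score; "right"/"write" are confusable.
alternatives = [
    [("right", 0.6), ("write", 0.4)],
    [("a", 1.0)],
    [("letter", 1.0)],
]

# Toy bigram context model: how plausible word B is after word A.
bigram = {
    ("write", "a"): 0.9, ("right", "a"): 0.1,
    ("a", "letter"): 1.0,
}

def decode(alternatives):
    """First pass: best acoustic guess per word.  Second pass: rescore
    each word with the following context and revise it if an
    alternative now fits better."""
    words = [max(cands, key=lambda c: c[1])[0] for cands in alternatives]
    for i in range(len(words) - 1):
        def context_score(cand):
            word, acoustic = cand
            return acoustic * bigram.get((word, words[i + 1]), 0.5)
        words[i] = max(alternatives[i], key=context_score)[0]
    return words
```

Here "right" wins acoustically, but once "a letter" arrives the decoder goes back and changes it to "write" - the same revise-on-later-context behavior described above.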
 BG> I've used this kind of analysis for many years in aligning sound
 BG> systems. It tends to be very intensive in the amount of data
 BG> required to represent all of the sounds needed for speech.
 BG> Certainly there are ways to reduce this data sizably from real-time
 BG> audio. The IBM Aptiva is advertised as having voice recognition
 BG> capabilities, but I haven't had a chance to try it out yet. 
The Aptiva also uses a variation on the ICSS *speech* recognition technology. 
 The ones that come with Windows preloaded typically only have Navigation 
enabled, while the ones that come with Warp installed have Dictation 
activated as well (apparently Windows just doesn't have the nards for VTD to 
operate satisfactorily :).
 BG> I suspect that there is still a lot of work to be done before it
 BG> will be commonplace in day to day usage.
Watch for Merlin!  It's an integrated part of the OS now.
IKEA ... Swedish for "particle board."
--- Sqed/32 1.10/unreg
---------------
* Origin: la Point Strangiato... (1:153/7040.106)

SOURCE: echomail via exec-pc
