Comparisons of Language Network Representations for a Constrained Vocabulary Speech Recognition Task Using Commercially Available Software
Location
CSU 253
Start Date
25-4-2006 10:30 AM
End Date
25-4-2006 11:30 AM
Student's Major
Computer Information Science
Student's College
Science, Engineering and Technology
Mentor's Name
Colin Wightman
Mentor's Department
Computer Information Science
Mentor's College
Science, Engineering and Technology
Second Mentor's Name
Rebecca Bates
Second Mentor's Department
Computer Information Science
Second Mentor's College
Science, Engineering and Technology
Description
Speech recognition is the process of converting acoustic waveforms into text. This requires acoustic models that map sounds to words and a language model that estimates the probabilities of hypothesized word sequences. Given a set of acoustic models, different language models can produce profoundly different recognition results. This work uses the Microsoft speech recognition engine available with Windows XP to recognize a set of test utterances from a military communications task. The engine comes with its own language model, which is intended for general dictation applications. Even though this model was constructed from a large amount of training data, it is unlikely to perform well on the military task, which uses many word sequences that would be regarded as unlikely in typical office dictation. Minimal training data is available for developing a new language model for the military communications task, so direct construction of a statistical language model is not feasible. For this particular application, however, the structure of the messages is well known and highly constrained. A finite-state network, which requires relatively little data to generate and which explicitly represents the constraints of the task, will replace the default language model in the recognition engine. Comparisons of the networks will be presented to show the effects of using a well-tuned model on the task.
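As a rough illustration of the kind of constraint a finite-state language network imposes, the sketch below represents a message as an ordered sequence of slots, each holding the words allowed at that position. The message template and vocabulary here are invented for illustration only; they are not the actual task grammar, and this does not replicate the Microsoft engine's grammar interface.

```python
# Hypothetical sketch: a tiny finite-state language network for a
# constrained message format, standing in for a statistical language model.
# The slots and word lists below are invented examples.

from itertools import product

# Each slot is the set of words allowed at that position;
# a complete message picks one word from each slot, in order.
NETWORK = [
    ["alpha", "bravo", "charlie"],      # call sign
    ["proceed", "hold", "return"],      # command
    ["to"],                             # fixed connector
    ["waypoint"],                       # fixed connector
    ["one", "two", "three", "four"],    # waypoint number
]

def valid_sequences(network):
    """Enumerate every word sequence the network accepts."""
    for words in product(*network):
        yield " ".join(words)

def accepts(network, utterance):
    """Check whether a hypothesized word sequence is in the network."""
    words = utterance.lower().split()
    if len(words) != len(network):
        return False
    return all(w in slot for w, slot in zip(words, network))

if __name__ == "__main__":
    print(sum(1 for _ in valid_sequences(NETWORK)), "sequences accepted")
    print(accepts(NETWORK, "bravo proceed to waypoint three"))   # True
    print(accepts(NETWORK, "bravo please proceed to waypoint"))  # False
```

Because the network enumerates only the word sequences the task actually permits, it can be built from task knowledge rather than large training corpora, which is the motivation for replacing the general-purpose dictation model in this work.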
Recommended Citation
Thom, Alex and Ivan Marte. "Comparisons of Language Network Representations for a Constrained Vocabulary Speech Recognition Task Using Commercially Available Software." Undergraduate Research Symposium, Mankato, MN, April 25, 2006.
https://cornerstone.lib.mnsu.edu/urs/2006/oral-session-N/4