Comparisons of Language Network Representations for a Constrained Vocabulary Speech Recognition Task Using Commercially Available Software

Location

CSU 253

Start Date

April 25, 2006, 10:30 AM

End Date

April 25, 2006, 11:30 AM

Student's Major

Computer Information Science

Student's College

Science, Engineering and Technology

Mentor's Name

Colin Wightman

Mentor's Department

Computer Information Science

Mentor's College

Science, Engineering and Technology

Second Mentor's Name

Rebecca Bates

Second Mentor's Department

Computer Information Science

Second Mentor's College

Science, Engineering and Technology

Description

Speech recognition is the process of converting acoustic waveforms into text. This requires acoustic models that map acoustics to words and a language model that estimates the probabilities of hypothesized word sequences. Given a fixed set of acoustic models, different language models can produce profoundly different recognition results. This work uses the Microsoft speech recognition engine that is available with Windows XP to recognize a set of test utterances from a military communications task. The engine comes with its own language model, which is intended for general dictation applications. Even though this model was constructed from a large amount of training data, it is unlikely to perform well on the military task, which uses many word sequences that would be regarded as unlikely in typical office dictation. Minimal training data is available for the military communications task, so direct construction of a new statistical language model is not feasible. For this particular application, however, the structure of the messages is well known and highly constrained. A finite-state network, which requires a relatively small amount of data to generate and which explicitly represents the constraints of the task, will replace the default language model in the recognition engine. Comparisons of the networks will be presented to show the effects of using a well-tuned model on the task.
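The abstract does not specify how the finite-state network is supplied to the engine; the C++ sketch below illustrates one plausible route through the Microsoft Speech API (SAPI 5), the programming interface to the Windows XP recognizer. The rule name, word sequences ("alpha", "proceed to", "checkpoint one/two"), and network shape are illustrative placeholders, not the actual military-task grammar. In the standard formulation the recognizer picks the word sequence W that maximizes P(A|W)P(W), where the acoustic models supply P(A|W) and the language model or grammar supplies P(W); swapping the dictation model for a finite-state grammar restricts the search to paths through the network.

// Minimal sketch, assuming the Microsoft Speech API (SAPI 5) that underlies the
// Windows XP recognizer. Rule name, words, and structure are placeholders,
// not the actual military-task network. Error checking omitted for brevity.
#include <windows.h>
#include <atlbase.h>
#include <sapi.h>
#include <sphelper.h>

int main()
{
    ::CoInitialize(NULL);
    {
        CComPtr<ISpRecognizer>  cpRecognizer;   // shared desktop recognition engine
        CComPtr<ISpRecoContext> cpContext;
        CComPtr<ISpRecoGrammar> cpGrammar;

        cpRecognizer.CoCreateInstance(CLSID_SpSharedRecognizer);
        cpRecognizer->CreateRecoContext(&cpContext);
        cpContext->CreateGrammar(1, &cpGrammar);

        // The engine's general dictation model would be attached with
        //   cpGrammar->LoadDictation(NULL, SPLO_STATIC);
        // Instead, build a small finite-state network in its place:
        //   start --"alpha"--> s1 --"proceed to"--> s2 --"checkpoint one"|"checkpoint two"--> end
        SPSTATEHANDLE hStart, hS1, hS2;
        cpGrammar->GetRule(L"TASK", 0, SPRAF_TopLevel | SPRAF_Active, TRUE, &hStart);
        cpGrammar->CreateNewState(hStart, &hS1);
        cpGrammar->CreateNewState(hStart, &hS2);

        cpGrammar->AddWordTransition(hStart, hS1,  L"alpha",          L" ", SPWT_LEXICAL, 1.0f, NULL);
        cpGrammar->AddWordTransition(hS1,    hS2,  L"proceed to",     L" ", SPWT_LEXICAL, 1.0f, NULL);
        cpGrammar->AddWordTransition(hS2,    NULL, L"checkpoint one", L" ", SPWT_LEXICAL, 1.0f, NULL); // NULL target = rule end
        cpGrammar->AddWordTransition(hS2,    NULL, L"checkpoint two", L" ", SPWT_LEXICAL, 1.0f, NULL);

        cpGrammar->Commit(0);
        cpGrammar->SetRuleState(L"TASK", NULL, SPRS_ACTIVE);

        // Recognition results would then be read from SPEI_RECOGNITION events
        // on cpContext; omitted here for brevity.
    }
    ::CoUninitialize();
    return 0;
}

Building the network programmatically, rather than loading a compiled grammar file, keeps the word-sequence constraints explicit and easy to vary when comparing alternative network representations.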

Recommended Citation

Thom, Alex and Ivan Marte. "Comparisons of Language Network Representations for a Constrained Vocabulary Speech Recognition Task Using Commercially Available Software." Undergraduate Research Symposium, Mankato, MN, April 25, 2006.
https://cornerstone.lib.mnsu.edu/urs/2006/oral-session-N/4