Analysis of Pronunciation Variation and Linguistic Structure Using Decision Trees
Location
CSU Ballroom
Start Date
27-4-2009 1:00 PM
End Date
27-4-2009 3:00 PM
Student's Major
Electrical and Computer Engineering and Technology
Student's College
Science, Engineering and Technology
Mentor's Name
Rebecca Bates
Mentor's Department
Computer Information Science
Mentor's College
Science, Engineering and Technology
Description
As automatic speech recognition becomes more heavily used in applications such as computer-enhanced dialog systems and automatic dictation, an improved understanding of linguistic structure and the physiology of speech becomes more important. There is great variability in how people speak depending on gender, health, age, geographic origin, and education level, and this variability makes it difficult for computers to recognize speech. Recognition accuracy for read speech is typically over 90%, but for spontaneous conversational speech, which has greater pronunciation variation, it drops to about 70%. This work examined pronunciation variation and different structures of articulatory-feature-based linguistic models to assess their usefulness for speech recognition applications. Articulatory features describe characteristics that distinguish specific speech sounds, or phonemes, and are related to the human vocal tract. Groups of phonemes can share the same features; however, each phoneme has a unique combination of them. This set of features defines a sound and makes it distinguishable from all other sounds. Using the differences between the dictionary pronunciations of words and hand-labeled pronunciations of spoken words, decision trees were built to predict feature changes. Decision trees were used because they provide a descriptive means of calculating conditional probabilities and help to visualize patterns among different features. Decision tree models were built to represent two different linguistic models and were tested using data held out from the training process. While a long-term goal is to improve automatic speech recognition, this work contributes a more detailed understanding of ways to quantify linguistic theory and improve pronunciation modeling.
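The idea of feature changes and conditional probabilities can be illustrated with a minimal sketch. It is not the method used in this work: the feature table, the sample data, the context question, and the function names `feature_changes` and `leaf_probabilities` are all hypothetical, and a single yes/no split stands in for a full decision tree.

```python
from collections import Counter, defaultdict

# Hypothetical, heavily simplified articulatory feature table (illustrative only).
# Phonemes share individual features, but each combination is unique.
FEATURES = {
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "stop"},
    "d": {"voicing": "voiced", "place": "alveolar", "manner": "stop"},
    "n": {"voicing": "voiced", "place": "alveolar", "manner": "nasal"},
    "s": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
}


def feature_changes(dictionary_phone, spoken_phone):
    """Return {feature: (dictionary value, spoken value)} for features that differ."""
    base, surface = FEATURES[dictionary_phone], FEATURES[spoken_phone]
    return {f: (base[f], surface[f]) for f in base if base[f] != surface[f]}


def leaf_probabilities(samples, question):
    """Split samples by one yes/no question about the context and estimate
    P(voicing changed | answer) by relative frequency, as a tree leaf would."""
    counts = defaultdict(Counter)
    for context, dictionary_phone, spoken_phone in samples:
        changed = "voicing" in feature_changes(dictionary_phone, spoken_phone)
        counts[question(context)][changed] += 1
    return {answer: c[True] / sum(c.values()) for answer, c in counts.items()}


# Made-up aligned samples: (following context, dictionary phone, hand-labeled phone).
SAMPLES = [
    ("vowel", "t", "d"),
    ("vowel", "t", "d"),
    ("vowel", "t", "t"),
    ("pause", "t", "t"),
    ("pause", "t", "t"),
]

# Assumed question: "is the following context a vowel?"
probs = leaf_probabilities(SAMPLES, lambda context: context == "vowel")
print(feature_changes("t", "d"))
print(probs)
```

In a real tree, many such questions would be chosen automatically from training data and applied in sequence, with the probabilities at each leaf evaluated on held-out data.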
Recommended Citation
Mamchuk, Tatyana V. "Analysis of Pronunciation Variation and Linguistic Structure Using Decision Trees." Undergraduate Research Symposium, Mankato, MN, April 27, 2009.
https://cornerstone.lib.mnsu.edu/urs/2009/poster-session-B/1