Analysis of Pronunciation Variation and Linguistic Structure Using Decision Trees

Location

CSU Ballroom

Start Date

27-4-2009 1:00 PM

End Date

27-4-2009 3:00 PM

Student's Major

Electrical and Computer Engineering and Technology

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Department

Computer Information Science

Mentor's College

Science, Engineering and Technology

Description

As automatic speech recognition becomes more widely used in applications such as computer-enhanced dialog systems and automatic dictation, an improved understanding of linguistic structure and the physiology of speech becomes more important. How people speak varies greatly with gender, health, age, geographic origin, and education level, and this variability makes it difficult for computers to recognize speech. Typical recognition accuracy for read speech is over 90%, but for spontaneous conversational speech, which has greater pronunciation variation, accuracy drops to about 70%. This work examined pronunciation variation and different structures of articulatory-feature-based linguistic models to assess their usefulness for speech recognition applications. Articulatory features describe characteristics that distinguish specific speech sounds, or phonemes, and are related to the human vocal tract. Groups of phonemes can share individual features; however, each phoneme has a unique combination of them. This set of features defines a sound and makes it distinguishable from all other sounds. Using the differences between the dictionary pronunciations of words and hand-labeled pronunciations of the same words as spoken, decision trees were built to predict feature changes. Decision trees were used because they provide a descriptive means of calculating conditional probabilities and help to visualize patterns among features. Decision tree models were built to represent two different linguistic models and were tested on data held out from the training process. While a long-term goal is to improve automatic speech recognition, this work contributes a more detailed understanding of ways to quantify linguistic theory and improve pronunciation modeling.
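The idea of predicting feature changes from the difference between dictionary and hand-labeled pronunciations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the small feature table, the aligned phone pairs, the two context questions (`next_is_vowel`, `word_final`), and the single-split tree are hypothetical and are not the data, feature set, or tree-building tool used in this work. The sketch checks that each phoneme has a unique feature combination, derives which features changed, picks the context question with the lowest weighted entropy, and reads off conditional probabilities at the resulting leaves.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical, heavily simplified articulatory feature table (illustration only).
FEATURES = {
    "t": {"voicing": "voiceless", "place": "alveolar", "manner": "stop"},
    "d": {"voicing": "voiced", "place": "alveolar", "manner": "stop"},
    "s": {"voicing": "voiceless", "place": "alveolar", "manner": "fricative"},
    "z": {"voicing": "voiced", "place": "alveolar", "manner": "fricative"},
    "p": {"voicing": "voiceless", "place": "bilabial", "manner": "stop"},
    "b": {"voicing": "voiced", "place": "bilabial", "manner": "stop"},
}


def all_unique(table):
    """True if every phoneme has its own combination of feature values."""
    combos = [tuple(sorted(feats.items())) for feats in table.values()]
    return len(set(combos)) == len(combos)


def feature_changes(dict_phone, surface_phone):
    """Map each differing feature to (dictionary value, spoken value)."""
    d, s = FEATURES[dict_phone], FEATURES[surface_phone]
    return {name: (d[name], s[name]) for name in d if d[name] != s[name]}


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(rows, questions, target):
    """Choose the question whose split leaves the lowest weighted entropy."""
    def split_entropy(question):
        groups = defaultdict(list)
        for row in rows:
            groups[row[question]].append(row[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return min(questions, key=split_entropy)


def leaf_probabilities(rows, question, target):
    """Estimate P(target | answer to question) by counting within each leaf."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[question]][row[target]] += 1
    return {answer: {label: n / sum(c.values()) for label, n in c.items()}
            for answer, c in counts.items()}


# Made-up aligned data: (dictionary phone, spoken phone, next_is_vowel, word_final).
ALIGNED = [
    ("t", "d", True, False), ("t", "d", True, False),
    ("t", "t", False, True), ("t", "t", False, False),
    ("s", "z", True, True), ("s", "s", False, False),
    ("p", "p", True, True), ("p", "b", True, False),
]

ROWS = [{"next_is_vowel": nv, "word_final": wf,
         "voicing_changed": "voicing" in feature_changes(d, s)}
        for d, s, nv, wf in ALIGNED]

best = best_split(ROWS, ["next_is_vowel", "word_final"], "voicing_changed")
probs = leaf_probabilities(ROWS, best, "voicing_changed")
print(best, probs)
```

A full decision tree would apply the same split selection recursively within each leaf. The single split is kept here only to show how leaf counts turn into conditional probabilities of a feature change given its context.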

Recommended Citation

Mamchuk, Tatyana V. "Analysis of Pronunciation Variation and Linguistic Structure Using Decision Trees." Undergraduate Research Symposium, Mankato, MN, April 27, 2009.
https://cornerstone.lib.mnsu.edu/urs/2009/poster-session-B/1