Symbolic Phonetic Features for Modeling of Pronunciation Variation

Document Type

Article

Publication Date

2-2007

Abstract

A significant source of variation in spontaneous speech is intra-speaker pronunciation change, often realized as small feature changes (e.g., nasalized vowels or affricated stops) rather than full phone transformations. Previous computational modeling of pronunciation variation has typically involved transformations from one phone to another, in part because most speech processing systems use phone-based units. Here, a phonetic-feature-based prediction model is presented in which each phone is represented by a vector of symbolic features, each of which can be on, off, unspecified, or unused. Feature interaction is examined using different groupings of possibly dependent features; a hierarchical grouping with conditional dependencies gives the best results. Feature-based models are shown to be more efficient than phone-based models, in the sense that they require fewer parameters to predict variation while giving smaller distance and perplexity values when predictions are compared to the hand-labeled reference. A parsimonious model is better suited to incorporating new conditioning factors, and this work investigates high-level information sources, including both text (syntax, discourse) and prosody cues. Experiments show that feature-based models benefit from prosody cues but not from text cues, and that phone-based models do not benefit from any of the high-level cues explored here.
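
The symbolic representation described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the article: the four-feature inventory in `FEATURES`, the example phones, the helper `make_phone`, and the Hamming-style `feature_distance`, which simply counts differing features and is not claimed to be the distance measure used in the paper.

```python
from enum import Enum


class FeatureValue(Enum):
    """The four symbolic values a phonetic feature can take."""
    ON = "+"
    OFF = "-"
    UNSPECIFIED = "?"
    UNUSED = "0"


# Hypothetical feature inventory, chosen only for this illustration.
FEATURES = ("voiced", "nasal", "continuant", "strident")


def make_phone(**specified):
    """Build a feature vector; features not listed default to UNUSED."""
    unknown = set(specified) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return tuple(specified.get(name, FeatureValue.UNUSED) for name in FEATURES)


def feature_distance(a, b):
    """Count the features whose symbolic values differ (Hamming distance)."""
    return sum(1 for x, y in zip(a, b) if x is not y)


ON, OFF = FeatureValue.ON, FeatureValue.OFF

# Illustrative phones (assumed feature values, not the article's tables).
plain_vowel = make_phone(voiced=ON, nasal=OFF, continuant=ON)
nasalized_vowel = make_phone(voiced=ON, nasal=ON, continuant=ON)
stop_t = make_phone(voiced=OFF, nasal=OFF, continuant=OFF, strident=OFF)

# Nasalization changes a single feature, whereas a vowel and a stop
# differ in several features under this toy inventory.
small_change = feature_distance(plain_vowel, nasalized_vowel)
large_change = feature_distance(plain_vowel, stop_t)
```

Under these assumptions, a nasalized vowel sits one feature away from its plain counterpart, which is the kind of small change that a phone-to-phone substitution model would have to treat as a switch to an entirely different unit.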

Department

Integrated Engineering

Publication Title

Speech Communication

DOI

10.1016/j.specom.2006.10.007
