Oral Session 10

The State of the Art in Speaker Adaptation for Automatic Speech Recognition (ASR)

Zhejian Wang, Minnesota State University, Mankato

Location

CSU 202

Start Date

11-4-2017 1:05 PM

End Date

11-4-2017 2:05 PM

Student's Major

Integrated Engineering

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Department

Integrated Engineering

Mentor's College

Science, Engineering and Technology

Description

Automatic speech recognition (ASR) draws on knowledge and research in linguistics, computer science and electrical engineering to develop methodologies and algorithms that translate human speech into text. In ASR, speaker adaptation refers to techniques that adapt acoustic features to better model the variation among individual speakers. Its goal is to reduce the mismatch between individual speakers and the acoustic model in order to reduce the word error rate (WER). Adaptation strategies include long short-term memory recurrent neural networks (LSTM-RNN), maximum likelihood linear regression (MLLR) for hidden Markov models (HMM), and i-vectors. Recently, deep neural networks (DNN) have become an alternative modeling approach. Combined with older adaptation techniques, DNNs have improved ASR performance significantly. This research presents a review of adaptation techniques used with DNNs, examines existing experimental results, and investigates speaker differences in recognition using a virtual machine (VM) from the Speech Recognition Virtual Kitchen (SRVK). The SRVK toolkit comprises Linux-based VMs that allow users at teaching-focused institutions to participate in ASR research. The TI-digits corpus will be used as the training dataset, as it has sufficient data per speaker to separate for adaptation experiments. WER is the main indicator for performance evaluation. The work presented includes a discussion and comparison of the results of each strategy used with DNNs, an overview of the SRVK toolkit, results of recognition performance, and potential methods to improve adaptation within the toolkit.
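Since WER is the main evaluation measure named above, a minimal sketch of how it is commonly computed may help: the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function names, the `(speaker_id, reference, hypothesis)` input layout and the digit strings in the usage note are assumptions made for illustration; they are not taken from the SRVK toolkit or from the presented experiments.

```python
from collections import defaultdict


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def wer_by_speaker(utterances):
    """Pool errors and reference words per speaker.

    `utterances` is an iterable of (speaker_id, reference, hypothesis)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for speaker, reference, hypothesis in utterances:
        ref_words = reference.split()
        errors[speaker] += edit_distance(ref_words, hypothesis.split())
        words[speaker] += len(ref_words)
    return {s: errors[s] / words[s] for s in words if words[s] > 0}
```

For instance, `word_error_rate("one two three four", "one two four")` counts one deletion against four reference words. Pooling errors per speaker with `wer_by_speaker`, rather than averaging per-utterance rates, keeps short utterances from dominating a speaker's score, which matters when comparing recognition performance across individual speakers.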

Recommended Citation

Wang, Zhejian. "The State of the Art in Speaker Adaptation for Automatic Speech Recognition (ASR)." Undergraduate Research Symposium, Mankato, MN, April 11, 2017.
https://cornerstone.lib.mnsu.edu/urs/2017/oral-session-10/2