Event Title

The State of the Art in Speaker Adaptation for Automatic Speech Recognition (ASR)

Location

CSU 202

Start Date

11-4-2017 1:05 PM

End Date

11-4-2017 2:05 PM

Student's Major

Integrated Engineering

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Department

Integrated Engineering

Mentor's College

Science, Engineering and Technology

Description

Automatic speech recognition (ASR) draws on knowledge and research in linguistics, computer science, and electrical engineering to develop methods and algorithms that transcribe human speech into text. In ASR, speaker adaptation refers to techniques that adapt acoustic features to better model the variation among individual speakers. Its goal is to reduce the mismatch between an individual speaker and the acoustic model, and thereby lower the word error rate (WER). Adaptation strategies include long short-term memory recurrent neural networks (LSTM-RNNs), maximum likelihood linear regression (MLLR) for hidden Markov models (HMMs), and i-vectors. Recently, deep neural networks (DNNs) have become an alternative modeling approach. Combined with older adaptation techniques, DNNs have improved ASR performance significantly. This research presents a review of adaptation techniques used with DNNs, examines existing experimental results, and investigates speaker differences in recognition using a virtual machine (VM) from the Speech Recognition Virtual Kitchen (SRVK). The SRVK toolkit consists of Linux-based VMs that allow users at teaching-focused institutions to participate in ASR research. The TI-digits corpus will be used as the training dataset, as it has sufficient data per speaker to separate for adaptation experiments. WER is the main indicator for performance evaluation. The work presented includes a discussion and comparison of results for each strategy used with DNNs, an overview of the SRVK toolkit, recognition performance results, and potential methods to improve adaptation within the toolkit.
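
Because the abstract relies on WER as its main performance indicator, a minimal sketch of the standard definition may help: WER is the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The function name `word_error_rate` and the digit strings below are illustrative assumptions and are not taken from the SRVK toolkit or from the presented work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; not part of the SRVK toolkit.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("two" -> "too") and one deletion ("four")
    # against a four-word reference.
    print(word_error_rate("one two three four", "one too three"))
```

Under this definition, a speaker adaptation method helps when it lowers the WER measured on the same speaker's test utterances relative to the unadapted model.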

Recommended Citation

Wang, Zhejian. "The State of the Art in Speaker Adaptation for Automatic Speech Recognition (ASR)." Undergraduate Research Symposium, Mankato, MN, April 11, 2017.
https://cornerstone.lib.mnsu.edu/urs/2017/oral-session-10/2