Enhancing an Offline Transcriber for the Speech Recognition Virtual Kitchen

Location

CSU 255

Start Date

18-4-2016 2:10 PM

End Date

18-4-2016 3:10 PM

Student's Major

Computer Information Science

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Department

Integrated Engineering

Mentor's College

Science, Engineering and Technology

Description

The Speech Recognition Virtual Kitchen (SRVK) is a web resource (http://speechkitchen.org) created to improve community research and education infrastructure for automatic speech recognition (ASR). SRVK has been developed by researchers at Carnegie Mellon University, the Ohio State University and Minnesota State University, Mankato. The resource is composed of Linux-based virtual machines (VMs) and open-source software that can be run on multiple platforms, allowing a wide range of users to participate in ASR research. This project evaluates the Eesen offline transcriber, a Kaldi-based tool that transcribes audio speech files into text files. It is intended to be easy to use for researchers who are not familiar with ASR software but who would benefit from transcribed data. Kaldi (http://kaldi.sourceforge.net) is an open-source ASR toolkit developed at Johns Hopkins University, typically used for research. The speech data used in this project are interviews with SRVK users about their experiences, which also provide evidence for toolkit improvements. We investigated changing parameters within the decoding script, improving existing acoustic models, and ways to improve both the transcription of non-native speakers and the performance of speaker diarization, that is, segmentation of the speech signal by individual speaker. We use Sclite to calculate the word error rate (WER), which we use to evaluate the performance of our VM. Our goal is to reduce WER, thus enhancing the performance of the existing offline transcriber. The work presented includes an overview of the toolkit and its uses, results of transcription performance, and avenues for improving transcription for non-native speakers of American English.
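For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words needed to turn the reference transcript into the recognizer output, and N is the number of words in the reference. The sketch below illustrates that definition with a word-level edit distance. It is not Sclite and does not reproduce Sclite's text normalization or alignment reports; the function name `word_error_rate` and the sample sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: words are split on whitespace and compared exactly,
    with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference and recognizer output, not project data.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In the sample call, the hypothesis drops one word from a six-word reference, so the result is 1/6. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.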

Recommended Citation

Wang, Zhejian and Sungwoo Choi. "Enhancing an Offline Transcriber for the Speech Recognition Virtual Kitchen." Undergraduate Research Symposium, Mankato, MN, April 18, 2016.
https://cornerstone.lib.mnsu.edu/urs/2016/oral-session-13/3