A Virtual Machine Toolkit to Implement Automatic Speech Recognition (ASR)

Location

CSU Ballroom

Start Date

20-4-2015 2:00 PM

End Date

20-4-2015 3:30 PM

Student's Major

Integrated Engineering

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Email Address

rebecca.bates@mnsu.edu

Mentor's Department

Integrated Engineering

Mentor's College

Science, Engineering and Technology

Description

Speech Recognition Virtual Kitchen (SRVK) is a toolkit created to facilitate automatic speech recognition (ASR) research. It is being developed and evaluated by a team of researchers at Carnegie Mellon University, the Ohio State University, and Minnesota State University, Mankato. The toolkit consists of state-of-the-art Linux-based virtual machines (VMs) with pre-compiled software tools for running various ASR experiments. Our contribution to this project was to evaluate the usability of the VMs for undergraduate students and users new to the field. We focused on two VMs: 1) a system using the open-source Kaldi recognizer, which transcribes speech into text, and 2) the Interaction in Virtual Worlds (IVW) VM, where users interact with a virtual agent. In the Kaldi experiments, we changed parameters within the decoding script and observed the resulting variations in the word error rate, a measure of the words incorrectly recognized by the system. In the IVW system, we expanded user interaction with the avatar by adding new words to the language model. This created a larger vocabulary shared by the user and the agent and enabled more types of interaction. Related activities to support the project included reviewing the repository website at speechkitchen.org to improve the user experience, running other ASR experiments with open-source data sets, and providing feedback to the multi-institutional research team developing the SRVK toolkit. The work presented includes a description of the available experimental tools, results of experimental changes for both VMs, and a discussion of future enhancements expected for the SRVK.
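
The word error rate mentioned above is commonly defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the scoring code used in the SRVK VMs; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration only; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of six reference words is missing from the hypothesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.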

Recommended Citation

Kodippily, Rajeev. "A Virtual Machine Toolkit to Implement Automatic Speech Recognition (ASR)." Undergraduate Research Symposium, Mankato, MN, April 20, 2015.
https://cornerstone.lib.mnsu.edu/urs/2015/poster_session_B/31