Event Title

Improving Speech Recognition for Interviews with Both Clean and Telephone Speech

Location

CSU 202

Start Date

April 11, 2017, 1:05 PM

End Date

April 11, 2017, 2:05 PM

Student's Major

Computer Information Science

Student's College

Science, Engineering and Technology

Mentor's Name

Rebecca Bates

Mentor's Department

Integrated Engineering

Mentor's College

Science, Engineering and Technology

Description

The accuracy of automatic speech recognition (ASR) depends on the conditions under which the speech was recorded. For example, cleanly recorded speech is recognized more accurately than speech transmitted over telephone lines. Telephone speech is band-pass filtered (conventional narrowband telephony typically passes roughly 300 to 3,400 Hz), which limits the frequencies available for computation. The transmitted signal may also be distorted by noise. Together, these effects lead to higher word error rates (WER). The main goal of this research is to examine approaches that improve recognition of telephone speech while maintaining or improving results for clean speech in recordings that mix telephone and clean speech. The test data consist of recorded interviews in which the interviewer was near a hand-held, single-channel recorder and the interviewee took part by speakerphone, with the phone's speaker placed near the recorder. Available resources include the Eesen offline transcriber and two acoustic models, one trained on clean speech and one trained on telephone speech. The Eesen offline transcriber runs on a virtual machine available through the Speech Recognition Virtual Kitchen. It transcribes audio into text using a deep recurrent neural network (RNN) acoustic model and a weighted finite-state transducer (WFST) decoder. This project addresses the high WER that results when telephone speech is decoded with models trained on clean speech. It does so by 1) replacing the clean model with a telephone model and 2) analyzing errors and addressing them through data cleaning, corrected audio segmentation, and additions to the dictionary. These approaches reduced the overall WER. The presentation includes an overview of the transcriber and acoustic models, the methods used to improve speech recognition, and the transcription results.
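The results above are reported as word error rate. WER is conventionally computed as (S + D + I) / N. Here S, D, and I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment of the hypothesis words against the reference words, and N is the number of reference words. The Python below is a minimal illustrative sketch, not the project's scoring tool. The names `ErrorCounts`, `align_counts`, and `wer_by_condition`, the lowercase/whitespace tokenization, and the "clean"/"telephone" segment labels and example sentences are all hypothetical. It scores each condition separately, in the spirit of improving telephone speech while tracking clean speech.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    """Error tallies for one or more aligned segments."""
    substitutions: int = 0
    deletions: int = 0
    insertions: int = 0
    ref_words: int = 0

    def __add__(self, other: "ErrorCounts") -> "ErrorCounts":
        return ErrorCounts(
            self.substitutions + other.substitutions,
            self.deletions + other.deletions,
            self.insertions + other.insertions,
            self.ref_words + other.ref_words,
        )

    @property
    def wer(self) -> float:
        if self.ref_words == 0:
            raise ValueError("WER is undefined for an empty reference")
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.ref_words


def align_counts(reference: str, hypothesis: str) -> ErrorCounts:
    """Count S, D, I from a word-level Levenshtein alignment.

    Assumption: words are compared after lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Each cell is (total_edits, subs, dels, ins); tuples compare by total first.
    # Row for an empty reference: every hypothesis word is an insertion.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        curr = [(i, 0, i, 0)]  # empty hypothesis: every reference word is deleted
        for j in range(1, len(hyp) + 1):
            e, s, d, n = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)          # match, no cost
            else:
                diag = (e + 1, s + 1, d, n)  # substitution
            e, s, d, n = prev[j]
            delete = (e + 1, s, d + 1, n)    # reference word missing from hypothesis
            e, s, d, n = curr[j - 1]
            insert = (e + 1, s, d, n + 1)    # extra word in hypothesis
            curr.append(min(diag, delete, insert))
        prev = curr
    _, s, d, n = prev[-1]
    return ErrorCounts(s, d, n, len(ref))


def wer_by_condition(segments):
    """Pool errors per condition label, then divide by that condition's reference words."""
    totals = {}
    for condition, ref, hyp in segments:
        totals[condition] = totals.get(condition, ErrorCounts()) + align_counts(ref, hyp)
    return {condition: counts.wer for condition, counts in totals.items()}


if __name__ == "__main__":
    # Hypothetical example segments: (condition, reference, hypothesis)
    segments = [
        ("clean", "how long have you lived here", "how long have you lived here"),
        ("telephone", "about twenty years now", "about plenty years"),
    ]
    print(wer_by_condition(segments))
```

Errors are summed across a condition's segments before dividing, so longer segments carry more weight than shorter ones. Averaging per-segment WERs would instead weight every segment equally. Because insertions count as errors, WER can exceed 1.0.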

Recommended Citation

Choi, Sung. "Improving Speech Recognition for Interviews with Both Clean and Telephone Speech." Undergraduate Research Symposium, Mankato, MN, April 11, 2017.
http://cornerstone.lib.mnsu.edu/urs/2017/oral-session-10/4