Abstract

This thesis examines the feasibility of creating an automated system to identify mistakes and errors in audiobook performances, with the intent of speeding up and streamlining the otherwise lengthy process of audiobook production and editing. The primary method explored is to compare the intended script with the audio recording of the performance, as transcribed by an automatic speech recognition (ASR) system, and to search for discrepancies between the two. Other topics examined include the background and history of audiobooks and ASR systems, the difficulties of creating a bespoke machine-learning system for a specific performer, the strengths and limitations of freely available tools such as Kaldi and Aeneas, and the possibility of using statistical and mathematical techniques to identify the patterns of certain unwanted artifacts within the performance. The results of the experiments conducted were promising, but fell short of full success in several ways. Most importantly, the Kaldi-based system did not achieve error recognition with a sufficiently high degree of confidence or consistency when using a speaker-independent model. Additionally, the designed purpose of many ASR systems is at odds with this project's intent: for most applications, they are built to accommodate errors and ignore background noise. This work demonstrates that there is potential to improve the performance and accuracy of existing ASR tools for error detection.
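
The script-versus-transcript comparison described in the abstract can be illustrated with a minimal sketch. This is not the thesis's Kaldi-based implementation; it assumes an ASR transcript is already available as plain text and uses Python's standard `difflib` module for word-level alignment. The function names `tokenize` and `find_discrepancies` and the sample sentences are hypothetical.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_discrepancies(script, transcript):
    """Return (tag, script_words, spoken_words) for each mismatch.

    Hypothetical helper: aligns the two word sequences and keeps every
    region that is not an exact match.
    """
    script_words = tokenize(script)
    spoken_words = tokenize(transcript)
    matcher = difflib.SequenceMatcher(
        a=script_words, b=spoken_words, autojunk=False
    )
    issues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            issues.append((tag, script_words[i1:i2], spoken_words[j1:j2]))
    return issues


# Made-up example: one misread word and one repeated word.
script = "The quick brown fox jumps over the lazy dog."
transcript = "the quick brown fox jumped over the the lazy dog"
for issue in find_discrepancies(script, transcript):
    print(issue)
```

A sketch like this only flags differences in the recognized text. As the abstract notes, many ASR systems are built to accommodate errors, so a real mistake in the performance may never appear in the transcript at all.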

Advisor

Rebecca Bates

Committee Member

Rushit Dave

Committee Member

Richard Liebendorfer

Date of Degree

2023

Language

English

Document Type

Thesis

Program

Cognitive Science

Degree

Bachelor of Science (BS)

Department

Computer Information Science

College

Science, Engineering and Technology

Recommended Citation

Tietz, M. (2023). Use of speech recognition tools to predict mistakes in audio book recordings [Bachelor of Science thesis, Minnesota State University, Mankato]. Cornerstone: A Collection of Scholarly and Creative Works for Minnesota State University, Mankato. https://cornerstone.lib.mnsu.edu/undergrad-theses-capstones-all/7/

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

All Undergraduate Theses and Capstone Projects

Use of Speech Recognition Tools to Predict Mistakes in Audio Book Recordings
