Abstract

This thesis examines the feasibility of creating an automated system to identify errors in audiobook performances, with the intent of speeding up and streamlining the otherwise lengthy process of audiobook production and editing. The primary method explored is to compare the intended script with the audio recording of the performance, as transcribed by an automatic speech recognition (ASR) system, and to search for discrepancies between the two. Other topics examined include the background and history of audiobooks and ASR systems, the difficulties of creating a bespoke machine-learning system for a specific performer, the strengths and limitations of freely available tools such as Kaldi and Aeneas, and the possibility of using statistical and mathematical techniques to identify the patterns of certain unwanted artifacts within the performance. The results of the experiments were promising, but fell short of full success in several ways. Most importantly, the Kaldi-based system did not achieve error recognition with a sufficiently high degree of confidence or consistency when using a speaker-independent model. Additionally, the designed purpose of many ASR systems is at odds with this project's intent: for most applications, they are built to accommodate errors and ignore background noise. This work demonstrates that there is potential to improve the performance and accuracy of existing ASR tools for error detection.
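The script-versus-transcript comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's Kaldi-based pipeline: it assumes the ASR output is already available as plain text, and it uses Python's standard `difflib` module for word-level alignment. The function names `tokenize` and `find_discrepancies` are hypothetical and chosen only for illustration.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def find_discrepancies(script, transcript):
    """Align script words with transcribed words and return the mismatches.

    Each result is a tuple (tag, script_words, transcript_words), where tag is
    'replace', 'delete' (words in the script but not heard), or 'insert'
    (words heard but not in the script).
    """
    script_words = tokenize(script)
    heard_words = tokenize(transcript)
    # autojunk=False disables difflib's heuristic for very frequent items,
    # so common words such as "the" are always considered during alignment.
    matcher = difflib.SequenceMatcher(a=script_words, b=heard_words, autojunk=False)
    return [
        (tag, script_words[i1:i2], heard_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    script = "The quick brown fox jumps over the lazy dog."
    # Assumed ASR output containing a repeated word and a substituted word.
    transcript = "the quick brown fox fox jumps over a lazy dog"
    for tag, expected, heard in find_discrepancies(script, transcript):
        print(tag, expected, heard)
```

A text-only comparison like this can only flag errors that survive transcription. As the abstract notes, many ASR systems are built to accommodate errors, so some performance mistakes may never appear as discrepancies in the transcript.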

Advisor

Rebecca Bates

Committee Member

Rushit Dave

Committee Member

Richard Liebendorfer

Date of Degree

2023

Language

English

Document Type

Thesis

Program

Cognitive Science

Degree

Bachelor of Science (BS)

Department

Computer Information Science

College

Science, Engineering and Technology
