This thesis examines the feasibility of an automated system for identifying mistakes in audiobook performances, with the aim of speeding up and streamlining the otherwise lengthy process of audiobook production and editing. The primary method explored is to compare the intended script against a transcript of the recorded performance produced by an automatic speech recognition (ASR) system, and to search for discrepancies between the two. Other topics examined include the background and history of audiobooks and ASR systems, the difficulties of creating a bespoke machine-learning system for a specific performer, the strengths and limitations of freely available tools such as Kaldi and Aeneas, and the possibility of using statistical and mathematical techniques to identify the patterns of certain unwanted artifacts within the performance. The results of the experiments were promising but fell short of full success in several ways. Most importantly, the Kaldi-based system, when using a speaker-independent model, did not recognize errors with a sufficiently high degree of confidence or consistency. Additionally, the design purpose of many ASR systems is at odds with this project's intent: for most applications, they are built to accommodate errors and ignore background noise. This work demonstrates that there is potential to improve the performance and accuracy of existing ASR tools for error detection.


Rebecca Bates

Committee Member

Rushit Dave

Committee Member

Richard Liebendorfer


Cognitive Science


Bachelor of Science (BS)


Computer Information Science


Science, Engineering and Technology