Abstract

Machine learning (ML) techniques developed in computer science have revolutionized nearly every sector of industry. Despite the prevalence and usefulness of ML, students outside of computer science rarely receive training in it. Students frequently receive training in statistical analysis, often using the software package R, which is free, open source, and has additional downloadable modules. A popular module is the ML package caret, which contains 238 different ML algorithms, each with 0-9 hyperparameters. caret is powerful, flexible, and provides consistent syntax across algorithms. In the hands of an experienced practitioner, this tunability is welcome and can increase accuracy. However, for a beginning student, the large number of options can become overwhelming and hinder learning. babyCaret is an ML package for R developed in this work to reduce this complexity and support student learning while matching caret's syntax. The goal is to teach users about the application of ML directly inside their familiar R environment. babyCaret contains integrated tutorials activated by a function call. These tutorials teach users about the application, interpretation, and technical aspects of four algorithms: k-nearest neighbors, apriori, k-prototypes, and decision trees. The k-nearest neighbors implementation was designed by the author. Decision trees are computed via the rpart R package for its visualization capabilities, k-prototypes uses a modified implementation by Gero Szepannek, and apriori uses the arules R package. A limited number of hyperparameters are available for tuning. The rest have either been automated or fixed to their simplest configuration to reduce complexity, which may affect accuracy. babyCaret is an open-source teaching tool, a simple and functional beginner ML package, and a stepping-stone to the more complex caret.
Evaluation includes a runtime comparison between k-nearest neighbors computed using caret and babyCaret, and a runtime comparison between Szepannek's implementation of k-prototypes and our modified version. babyCaret's k-nearest neighbors implementation had a lower runtime than caret's, and the modified version of Szepannek's k-prototypes implementation had a lower runtime than the original. Evaluation of the tutorials involved distributing them to an intelligent systems class as supplemental course content. Tutorials were successfully used in the course setting.

Advisor

Rebecca Bates

Committee Member

Dean Kelley

Committee Member

Adam Steiner

Date of Degree

2020

Language

English

Document Type

Thesis

Program

Cognitive Science

Degree Program/Certificate

Cognitive Science with an emphasis in Computer Science

Degree

Bachelor of Science (BS)

Department

Integrated Engineering

College

Science, Engineering and Technology
