Abstract

Machine learning enables a computer to learn a relationship between two presumably related types of information. Using the learned relationship, one type of information can then be used to predict missing information in the other. Over the last decades, collecting biological information has become cheaper, which has resulted in increasingly large amounts of data. Biological information such as DNA is currently analyzed with a variety of tools. Although machine learning has already been used in various projects, a flexible tool for analyzing generic biological problems has not yet been developed. Recent advances in DNA sequencing technology (next-generation sequencing) have reduced the time needed to sequence a human genome from weeks to hours, and the cost from a million dollars to a thousand dollars. Because of this drop in cost, large amounts of genomic data are being produced. This thesis implements supervised and unsupervised machine learning algorithms for genomic data. Distances are an integral part of many machine learning algorithms and hence play a central role in the analysis of most genomic data. The distance used for a particular task can have a profound effect on the output of a machine learning method, so it is essential to use the same distance method when comparing machine learning algorithms.
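A small sketch can illustrate how the choice of distance changes a method's output. This is not code from the thesis. The sequences, the helper names (`kmer_vector`, `nearest`), and the use of nucleotide composition (k-mer counts with k = 1) as features are all hypothetical and chosen only for illustration. The same query is matched against two candidate sequences with a nearest-neighbor rule. Under Euclidean distance, the nearest candidate is the shorter sequence whose composition differs slightly. Under cosine distance, which ignores overall length, the nearest candidate is the longer sequence with identical composition.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=1):
    """Count every k-mer over the alphabet ACGT, in a fixed (lexicographic) order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def euclidean(u, v):
    """Straight-line distance; sensitive to absolute counts (sequence length)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; compares only the direction (relative composition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def nearest(query, candidates, dist, k=1):
    """Return the name of the candidate sequence closest to the query under `dist`."""
    q = kmer_vector(query, k)
    return min(candidates, key=lambda name: dist(q, kmer_vector(candidates[name], k)))


# Hypothetical toy sequences, invented for illustration only.
query = "ACGTACGT"
candidates = {
    "long_same_composition": "ACGTACGTACGTACGT",
    "short_shifted_composition": "ACGTACGA",
}

print(nearest(query, candidates, euclidean))        # short_shifted_composition
print(nearest(query, candidates, cosine_distance))  # long_same_composition
```

Both distances are computed on the same feature vectors, so the different answers come only from the choice of distance. This is why comparisons between algorithms are meaningful only when the distance is held fixed.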

Advisor

Mezbahur Rahman

Committee Member

In-Jae Kim

Committee Member

Ruijun Zhao

Date of Degree

2019

Language

English

Document Type

Thesis

Degree

Master of Science (MS)

Department

Mathematics and Statistics

College

Science, Engineering and Technology

Recommended Citation

Jung, J. (2019). A statistical analysis and machine learning of genomic data [Master’s thesis, Minnesota State University, Mankato]. Cornerstone: A Collection of Scholarly and Creative Works for Minnesota State University, Mankato. https://cornerstone.lib.mnsu.edu/etds/899/

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Statistics and Probability Commons

A Statistical Analysis and Machine Learning of Genomic Data