Abstract

Glioblastoma multiforme (GBM) remains one of the most lethal brain tumors, necessitating improved survival prediction models that integrate clinical and molecular data. This study develops a machine learning pipeline that uses TCGA-derived multi-omics datasets to predict binary survival outcomes. The framework compares four classifiers: Logistic Regression, Random Forest, XGBoost, and Support Vector Machine (SVM). Preprocessing includes MCAR testing, KNN imputation, and feature scaling, with hyperparameter optimization via GridSearchCV. SMOTE was applied to mitigate class imbalance and improve model robustness for the minority survival class. Comparative performance analyses showed Random Forest and XGBoost to be the top performers, achieving the highest recall and ROC-AUC (~0.80–0.81), while SVM remained stable under high dimensionality and Logistic Regression provided a strong interpretable baseline. Both ensemble models captured nonlinear gene–methylation interactions and exhibited balanced precision–recall performance, highlighting a trade-off between interpretability and predictive strength. SHAP analysis identified key clinical and genomic predictors, including age, IDH mutation, MGMT promoter methylation, and G-CIMP-related methylation clusters, that aligned with known biological mechanisms. These findings suggest that ensemble-based, interpretable ML frameworks can complement existing prognostic markers, offering clinically transparent decision-support tools for GBM patient risk stratification and personalized treatment planning.
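The abstract names SMOTE as the class-imbalance step but does not say which implementation was used. As a rough illustration of the idea only, the following from-scratch sketch generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy data are hypothetical and are not taken from the study.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Illustrative sketch of the SMOTE idea; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 3 minority samples with 2 features each.
minority_samples = [[1.0, 2.0], [1.5, 2.5], [3.0, 1.0]]
new_points = smote_oversample(minority_samples, n_new=4, k=2)
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. In a pipeline like the one described, oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic samples.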

Advisor

Rushit Dave

Committee Member

Mansi Bhavsar

Committee Member

Katner Samantha

Committee Member

Rajeev Bukralia

Date of Degree

2025

Language

English

Document Type

APP

Degree

Master of Science (MS)

Program of Study

Data Science

Department

Computer Information Science

College

Science, Engineering and Technology

Included in

Data Science Commons

Rights Statement

In Copyright