Machine learning for complex disease prediction: A case study for asthma dataset
Abstract
Machine learning is a powerful alternative approach for analyzing high-dimensional biological data and understanding the complex phenomena underlying them. In this study, machine learning is used to analyze an individual's single nucleotide polymorphism (SNP) profile in order to predict asthma occurrence at its onset stage. Machine learning algorithms including support vector machine (SVM), k-nearest neighbors (kNN), random forest, and naïve Bayes were applied to an asthma case-control dataset. SVM achieved the highest classification performance, with accuracy, precision, sensitivity, and area under the receiver operating characteristic (ROC) curve of 55.47%, 51.03%, 52.63%, and 0.52, respectively, which is comparable to the performance of the other machine learning models. This study demonstrates the potential of machine learning for extensively analyzing biological data and understanding disease etiology in support of complex disease prediction.
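For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, sensitivity, and the area under the ROC curve can be computed for a binary case-control classifier. It is an illustration only, not the evaluation code used in the study: the function names, the label coding (1 = asthma case, 0 = control), and the pairwise-ranking formulation of the area under the curve are assumptions made here for clarity.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming labels are coded 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, and sensitivity (recall) from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of (case, control)
    pairs in which the case receives the higher score; ties count as half."""
    case_scores = [s for t, s in zip(y_true, scores) if t == 1]
    control_scores = [s for t, s in zip(y_true, scores) if t == 0]
    if not case_scores or not control_scores:
        raise ValueError("both classes must be present to compute the AUC")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data, not taken from the asthma dataset.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.5, 0.3, 0.1]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_metrics = classification_metrics(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this formulation, an area under the curve of 0.5 corresponds to ranking cases and controls no better than chance, which gives a reference point for reading the value of 0.52 reported above.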