Explainable machine learning for multi-survey classification of astronomical objects
Abstract
Classification of astronomical objects remains a fundamental yet challenging problem in astronomy, particularly in the current era of large-scale sky surveys, whose data volumes make manual classification impractical. Machine learning has recently emerged as a pivotal tool for such problems across physics and interdisciplinary fields. This work presents a machine learning pipeline that classifies astronomical objects into three types (stars, galaxies, and quasi-stellar objects, or QSOs) using two robust tree-based ensemble classifiers, XGBoost (XGB) and Random Forest (RF). The models are trained on cross-matched optical, infrared, and spectroscopic data from three astronomical surveys: the Sloan Digital Sky Survey (SDSS), the Wide-field Infrared Survey Explorer (WISE), and the Two-Micron All Sky Survey (2MASS). SHapley Additive exPlanations (SHAP) were used to quantify feature importance and to check whether the learned feature dependencies are consistent with known physics. Overall, the models achieved a high mean accuracy of 98-99%, with consistently high performance metrics across all three classes. SHAP revealed that classification was driven primarily by redshift, morphological concentration, and photometric features, confirming that the results align with the SDSS classification criteria.



