Topology-informed image classification of imbalanced datasets
Abstract
This study explores the application of topological data analysis (TDA) to image classification on an imbalanced dataset of micro-vertebrate bone fragments. We present a hybrid pipeline that integrates deep learning feature extraction, TDA-based topological representation, and gradient boosting classifiers. Our approach was evaluated on an archaeological dataset of bone fragment images from Callao Cave, Philippines. Results demonstrate that the TDA-enhanced pipeline consistently outperforms traditional machine learning methods, achieving 89-91% accuracy with LightGBM, XGBoost, and support vector machine (SVM) classifiers. Notably, the TDA-based approach maintains robust performance (>82% accuracy) even when trained on just 10% of the available data, and it performs particularly well on imbalanced class distributions. The findings highlight TDA as a valuable complement to conventional image classification in archaeological contexts, where limited and imbalanced datasets are common. This work contributes both to the methodological advancement of archaeological classification and to the broader application of topological methods in machine learning.
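To make the idea of a "TDA-based topological representation" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image has already been turned into a point cloud (for example, coordinates of foreground pixels or patch embeddings; this choice is hypothetical). It computes 0-dimensional persistent homology of the Vietoris-Rips filtration, summarizes it with a few scalar statistics, and concatenates those with a deep feature vector before the result is passed to a classifier such as LightGBM or XGBoost. The function names `h0_persistence`, `topological_summary`, and `hybrid_features` are illustrative only; the paper may well use other filtrations (e.g. cubical persistence on pixel grids) or other vectorizations.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Finite death times of 0-dimensional features in a Vietoris-Rips filtration.

    Every point is born at scale 0; a connected component dies when an edge
    merges it into another. These death times are exactly the edge lengths of
    a minimum spanning tree, found here with Kruskal's algorithm and union-find.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
            if len(deaths) == n - 1:
                break
    # One component never dies (infinite persistence) and is omitted.
    return deaths


def topological_summary(points):
    """Hypothetical fixed-length vector: total, max, mean persistence, entropy."""
    deaths = h0_persistence(points)
    if not deaths:
        return [0.0, 0.0, 0.0, 0.0]
    total = sum(deaths)
    probs = [d / total for d in deaths] if total > 0 else []
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return [total, max(deaths), total / len(deaths), entropy]


def hybrid_features(deep_features, point_cloud):
    """Concatenate (assumed precomputed) deep features with topological features."""
    return list(deep_features) + topological_summary(point_cloud)
```

For example, `hybrid_features([0.5, 0.1], [(0, 0), (1, 0), (0, 2)])` gives the two deep features followed by total persistence 3.0, maximum 2.0, mean 1.5, and a persistence entropy of about 0.637. Because the topological summary has a fixed length regardless of the number of points, it can be appended to any embedding and fed to a tree-based classifier without further alignment.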