Topology-informed image classification of imbalanced datasets
Abstract
This study explores the application of topological data analysis (TDA) to improve image classification on an imbalanced dataset of micro-vertebrate bone fragments. We present a hybrid pipeline that integrates deep learning feature extraction, TDA-based topological representation, and gradient boosting classifiers. Our approach was evaluated on an archaeological dataset of bone fragment images from Callao Cave, Philippines. Results demonstrate that the TDA-enhanced pipeline consistently outperforms traditional machine learning methods, achieving 89–91% accuracy across LightGBM, XGBoost, and SVM classifiers. Notably, the TDA-based approach maintains robust performance (>82% accuracy) even when trained on only 10% of the available data, and it is particularly strong on imbalanced distributions. The findings highlight TDA as a valuable augmentation for image classification tasks in archaeological contexts, where limited and imbalanced datasets are common. This work contributes to both the methodological advancement of archaeological classification and the broader application of topological methods in machine learning.
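The abstract does not specify which topological representation the pipeline uses, so the following is only a minimal sketch of one common TDA ingredient: 0-dimensional persistent homology of a point cloud, computed with a union-find over pairwise distances. The function names `h0_persistence` and `topological_features`, the choice of summary statistics, and the idea of treating feature vectors as a point cloud are all assumptions made for illustration, not the authors' method. In a pipeline like the one described, the resulting summary vector could be passed to a classifier such as LightGBM or XGBoost; the deep feature extractor and the classifier are not shown here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0. As the distance threshold grows,
    components merge; each merge records the death of one component.
    (Illustrative helper, not taken from the paper.)
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(length)
    return deaths  # n - 1 finite values; the last component never dies


def topological_features(points):
    """Fixed-length summary of the H0 diagram (hypothetical feature choice)."""
    deaths = h0_persistence(points)
    total = sum(deaths)
    return [max(deaths), total / len(deaths), total]


if __name__ == "__main__":
    # Toy point cloud standing in for features extracted from one image.
    cloud = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
    print(h0_persistence(cloud))
    print(topological_features(cloud))
```

The summary has a fixed length regardless of how many points the cloud contains, which is what allows it to be used as input to a standard tabular classifier.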
Issue
Entangled!
25-28 June 2025, National Institute of Physics, University of the Philippines Diliman
Please visit the SPP2025 activity webpage for more information on this year's Physics Congress.