Topology-informed image classification of imbalanced datasets

Authors

  • Chara Deanna F. Punzal ⋅ Data Science Program, University of the Philippines Diliman
  • Khristian G. Kikuchi ⋅ Data Science Program, University of the Philippines Diliman; College of Computer and Information Science, Mapúa Malayan Colleges
  • Patricia S. Cabrera ⋅ School of Archaeology, University of the Philippines Diliman
  • Gabrielle Anne B. Gascon ⋅ School of Archaeology, University of the Philippines Diliman
  • Ranzivelle Marianne Roxas-Villanueva ⋅ Institute of Physics, University of the Philippines Los Baños
  • Juan C. Rofes ⋅ School of Archaeology, University of the Philippines Diliman; Archéozoologie, Archéobotanique, Sociétés Pratiques et Environnements, CNRS/MNHN, France; National Museum of the Philippines
  • Giovanni A. Tapang ⋅ National Institute of Physics, University of the Philippines Diliman

Abstract

This study explores the use of topological data analysis (TDA) to improve image classification on an imbalanced dataset of micro-vertebrate bone fragments. We present a hybrid pipeline that integrates deep-learning feature extraction, TDA-based topological representation, and gradient boosting classifiers. The approach was evaluated on an archaeological dataset of bone fragment images from Callao Cave, Philippines. The TDA-enhanced pipeline consistently outperforms traditional machine learning methods, achieving 89–91% accuracy with LightGBM, XGBoost, and SVM classifiers. Notably, it maintains robust performance (>82% accuracy) even when trained on only 10% of the available data, and it performs particularly well on imbalanced class distributions. These findings position TDA as a valuable augmentation for image classification in archaeological contexts, where small and imbalanced datasets are common. This work contributes both to the methodology of archaeological classification and to the broader application of topological methods in machine learning.
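The abstract does not say which topological descriptor the pipeline computes. As a rough illustration of what a "TDA-based topological representation" can look like, the sketch below assumes 0-dimensional persistent homology of a Vietoris-Rips filtration over a point cloud (for example, hypothetical feature vectors taken from a network's embedding layer) and summarises the resulting diagram as a fixed-length vector. The function names `h0_persistence` and `persistence_features` and the `thresholds` values are hypothetical choices for this example, not the authors' code.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence of a Vietoris-Rips filtration (sketch).

    Every point is born at scale 0; a connected component dies at the edge
    length where it merges into another component. These death times are the
    edge lengths of a minimum spanning tree, found here with Kruskal's
    algorithm and a union-find structure. Returns (birth, death) pairs; the
    single component that never dies gets death = math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by Euclidean length (O(n^2 log n); small clouds only).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))
    if n:
        pairs.append((0.0, math.inf))
    return pairs


def persistence_features(pairs, thresholds=(0.5, 1.0, 2.0)):
    """Summarise a diagram as [total persistence, longest bar, counts above thresholds].

    The infinite bar is ignored so every entry stays finite.
    """
    finite = [death - birth for birth, death in pairs if death != math.inf]
    total = sum(finite)
    longest = max(finite, default=0.0)
    counts = [sum(1 for p in finite if p > t) for t in thresholds]
    return [total, longest, *counts]
```

For two tight clusters such as `[(0, 0), (0, 1), (5, 0), (5, 1)]`, `h0_persistence` returns two short bars of length 1.0, one long bar of length 5.0, and the infinite bar, and `persistence_features` yields `[7.0, 5.0, 3, 1, 1]`. In a hybrid pipeline of the kind described above, such a vector could be concatenated with the deep features before being passed to a classifier like LightGBM or XGBoost; that step is not shown and the exact combination the authors used is not stated in the abstract.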

Article ID

SPP-2025-3C-04

Section

Complex Systems and Data Analytics

Published

2025-06-18

How to Cite

[1]
CDF Punzal, KG Kikuchi, PS Cabrera, GAB Gascon, RM Roxas-Villanueva, JC Rofes, and GA Tapang, Topology-informed image classification of imbalanced datasets, Proceedings of the Samahang Pisika ng Pilipinas 43, SPP-2025-3C-04 (2025). URL: https://proceedings.spp-online.org/article/view/SPP-2025-3C-04.