Topology-informed image classification of imbalanced datasets

Authors

  • Chara Deanna F. Punzal ⋅ PH Data Science Program, University of the Philippines Diliman
  • Khristian G. Kikuchi ⋅ PH Data Science Program, University of the Philippines Diliman and College of Computer and Information Science, Mapúa Malayan Colleges Laguna
  • Patricia S. Cabrera ⋅ PH School of Archaeology, University of the Philippines Diliman
  • Gabrielle Anne B. Gascon ⋅ PH School of Archaeology, University of the Philippines Diliman
  • Ranzivelle Marianne Roxas-Villanueva ⋅ PH Institute of Physics, University of the Philippines Los Baños
  • Juan C. Rofes ⋅ PH School of Archaeology, University of the Philippines Diliman and Archéozoologie, Archéobotanique – Sociétés, Pratiques et Environnements, CNRS/MNHN, France and National Museum of the Philippines
  • Giovanni A. Tapang ⋅ PH National Institute of Physics, University of the Philippines Diliman

Abstract

This study explores the application of topological data analysis (TDA) to enhance image classification from an imbalanced dataset of micro-vertebrate bone fragments. We present a hybrid pipeline that integrates deep learning feature extraction, TDA-based topological representation, and gradient boosting classifiers. Our approach was evaluated on an archaeological dataset of bone fragment images from Callao Cave, Philippines. Results demonstrate that the TDA-enhanced pipeline consistently outperforms traditional machine learning methods, achieving 89-91% accuracy across LightGBM, XGBoost, and SVM classifiers. Notably, the TDA-based approach maintains robust performance (>82% accuracy) even when trained on just 10% of the available data, showing particular strength with imbalanced distributions. The findings highlight TDA as a valuable augmentation for image classification tasks in archaeological contexts, where limited and imbalanced datasets are common. This work contributes to both the methodological advancement of archaeological classification and the broader application of topological methods in machine learning.
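
The abstract does not say which topological representation the pipeline computes, so the following is only a minimal sketch of one common choice, not the authors' method. It assumes each image has already been reduced to a small point cloud of feature vectors (for example, patch embeddings from a pretrained network) and computes 0-dimensional persistence of a Vietoris-Rips filtration: every point starts as its own component, and a component "dies" at the distance where it first merges with another. The function names `h0_persistence` and `topological_summary` are hypothetical, and the three summary statistics are illustrative assumptions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of the finite H0 classes of a Vietoris-Rips filtration.

    All components are born at scale 0. Processing edges in order of
    increasing length (Kruskal's algorithm with union-find), each edge
    that joins two distinct components records one death.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths  # n - 1 values, in increasing order


def topological_summary(points):
    """Hypothetical fixed-length feature vector: total, max, and mean lifetime."""
    deaths = h0_persistence(points)
    if not deaths:
        return [0.0, 0.0, 0.0]
    return [sum(deaths), max(deaths), sum(deaths) / len(deaths)]


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    print(h0_persistence(cloud))
    print(topological_summary(cloud))
```

A vector like this could be concatenated with the deep features before training a classifier such as LightGBM or XGBoost. A production pipeline would more likely use a dedicated TDA library and include higher-dimensional homology.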

Issue

Entangled!
25-28 June 2025, National Institute of Physics, University of the Philippines Diliman

Please visit the SPP2025 activity webpage for more information on this year's Physics Congress.

Article ID

SPP-2025-3C-04

Section

Complex Systems and Data Analytics

Published

2025-06-18

How to Cite

[1]
CDF Punzal, KG Kikuchi, PS Cabrera, GAB Gascon, RM Roxas-Villanueva, JC Rofes, and GA Tapang, Topology-informed image classification of imbalanced datasets, Proceedings of the Samahang Pisika ng Pilipinas 43, SPP-2025-3C-04 (2025). URL: https://proceedings.spp-online.org/article/view/SPP-2025-3C-04.