Automatic identification of small skeletal remains from Ardales Cave, Málaga, Spain using a Vision Transformer model
Abstract
A Vision Transformer (ViT) model was used to classify small skeletal remains from the zooarchaeological assemblages of Ardales Cave, Málaga, Spain, into three taxonomic classes (Rodentia, Lagomorpha, and a pooled "others" category), achieving 76% validation accuracy after approximately 100 epochs. The model was fine-tuned from an ImageNet-21k-pretrained backbone on a dataset of 417 images with severe class imbalance. This work applies the ARCHAEOVISION pipeline, previously used on Philippine tropical fauna, to European Paleolithic assemblages, demonstrating that its architecture generalizes to other archaeological sites.
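Because the dataset is small and severely imbalanced, a validation accuracy figure is easier to interpret next to the accuracy of always predicting the most common class, and a common way to counter imbalance during fine-tuning is to weight the loss by inverse class frequency. The sketch below illustrates both calculations. It is not taken from the ARCHAEOVISION pipeline: the function names and the toy label list are hypothetical, and the per-class counts are invented for illustration only, since the abstract does not report them.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1, common classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# Hypothetical toy labels (NOT the Ardales Cave class counts).
toy_labels = ["Rodentia"] * 6 + ["Lagomorpha"] * 3 + ["others"] * 1

weights = balanced_class_weights(toy_labels)
baseline = majority_baseline_accuracy(toy_labels)

for cls, w in sorted(weights.items()):
    print(f"{cls}: weight {w:.3f}")
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

Weights of this form could be passed to a weighted cross-entropy loss during fine-tuning. The baseline gives the floor that a reported validation accuracy would need to exceed to show that the classifier has learned more than the class distribution.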



