Filipino text classification by Universal Language Model Fine-tuning (ULMFiT)

Authors

Mary June A. Ricaña ⋅ National Institute of Physics, University of the Philippines Diliman, Philippines
Francis N. C. Paraan ⋅ National Institute of Physics, University of the Philippines Diliman, Philippines

Abstract

A major obstacle in natural language processing is the scarcity of labeled data for some languages. Transfer learning techniques such as Universal Language Model Fine-tuning (ULMFiT) have emerged as effective solutions to this problem. This paper explores the use of ULMFiT for text classification in the Filipino language. We follow the ULMFiT approach, which involves pretraining a language model, fine-tuning it, and training a text classifier. We independently reproduce previous results for a binary text classification task on a dataset of Filipino text. Additionally, we demonstrate the promising performance of the ULMFiT model on a multi-label classification task, achieving Hamming losses as low as ~0.10, which are comparable to previous benchmark results obtained with transformer models.
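
For readers unfamiliar with the metric quoted in the abstract, the Hamming loss of a multi-label classifier is the fraction of individual label slots that are predicted incorrectly, averaged over all samples and all labels. The following is a minimal sketch of that definition in plain Python. The function name `hamming_loss` and the toy label matrices are illustrative assumptions and are not taken from the paper or its dataset.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots where prediction and ground truth disagree.

    y_true, y_pred: equal-length sequences of rows; each row is a sequence
    of 0/1 indicators, one per label.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")

    wrong = 0  # label slots that disagree
    total = 0  # all label slots seen
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row pair must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            if t != p:
                wrong += 1

    if total == 0:
        raise ValueError("cannot compute Hamming loss on empty input")
    return wrong / total


# Toy example (made-up labels): 3 samples, 4 labels each.
y_true = [[1, 0, 0, 1],
          [0, 1, 0, 0],
          [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0],   # one slot wrong
          [0, 1, 0, 0],   # all correct
          [1, 0, 1, 0]]   # two slots wrong

print(hamming_loss(y_true, y_pred))  # 3 wrong slots out of 12 -> 0.25
```

Under this definition, a Hamming loss of about 0.10 means that roughly one label slot in ten is assigned incorrectly; lower is better, and 0 corresponds to every label of every sample being predicted correctly.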

Issue

2023: Proceedings of the 41st Samahang Pisika ng Pilipinas Physics Conference

Physics: Connecting islands of knowledge
19-21 July 2023, Del Carmen, Siargao Island

Article ID

SPP-2023-PB-06

Section

Poster Session B (Complex Systems, Simulations, and Theoretical Physics)

Published

2023-07-09

How to Cite

[1] MJA Ricaña and FNC Paraan, Filipino text classification by Universal Language Model Fine-tuning (ULMFiT), Proceedings of the Samahang Pisika ng Pilipinas 41, SPP-2023-PB-06 (2023). URL: https://proceedings.spp-online.org/article/view/SPP-2023-PB-06.