Fake news detection in Philippine news corpus using LDA and sentiment analysis with machine learning
Abstract
The persistent proliferation of fake news on Philippine social media platforms poses serious threats to public discourse and safety. To address this growing concern, it is critical to continuously develop automated models that effectively classify news published online as either real or fake. This study presents an alternative approach to fake news classification that combines a VADER-extracted sentiment ratio with TF-IDF feature vectors, reduces the combined features through Linear Discriminant Analysis (LDA), and passes the result to a suite of supervised machine-learning models. We trained and evaluated these models on a publicly available corpus of real and fake news from the Philippines. Our best-performing model achieved an accuracy of 94% using only a single feature derived from LDA applied to the combination of TF-IDF features and sentiment ratio, which is comparable to benchmark models in the literature. Moreover, adding the sentiment ratio consistently improved performance across models. Overall, this study provides insights for improving fake news classifiers for Philippine news corpora.
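The pipeline summarized above can be illustrated with a minimal sketch. It is not the study's implementation: the six toy headlines, the `POSITIVE` and `NEGATIVE` word lists, and the `sentiment_ratio` helper are hypothetical. The helper is a small lexicon stand-in for VADER, and its smoothed positive-to-negative ratio is only one assumed definition of a sentiment ratio. Logistic regression is likewise an arbitrary pick from the suite of supervised models. The sketch assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy lexicon standing in for VADER (assumption; words are illustrative only).
POSITIVE = {"good", "safe", "confirmed", "official"}
NEGATIVE = {"shocking", "secret", "deadly", "scandal"}


def sentiment_ratio(text):
    """Hypothetical ratio (pos + 1) / (neg + 1); the +1 avoids division by zero."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos + 1) / (neg + 1)


# Made-up headlines for illustration; 0 = real, 1 = fake.
texts = [
    "official report confirmed new safe vaccine schedule",
    "senate approves budget for public schools",
    "weather bureau issues official storm advisory",
    "shocking secret cure hidden by doctors",
    "deadly scandal exposed in secret video",
    "shocking miracle fruit ends all disease",
]
labels = np.array([0, 0, 0, 1, 1, 1])

# TF-IDF features, densified because LDA expects a dense array.
tfidf = TfidfVectorizer().fit_transform(texts).toarray()

# Append the sentiment ratio as one extra column.
ratios = np.array([sentiment_ratio(t) for t in texts]).reshape(-1, 1)
features = np.hstack([tfidf, ratios])

# With two classes, LDA yields at most one discriminant component,
# so each document is reduced to a single feature.
lda = LinearDiscriminantAnalysis(n_components=1)
z = lda.fit_transform(features, labels)

# Any supervised classifier can consume the single LDA feature.
clf = LogisticRegression().fit(z, labels)
pred = clf.predict(z)
```

With a real corpus, the LDA projection and the classifier would be fit on a training split only and evaluated on held-out articles; fitting and predicting on the same six documents here only demonstrates the data flow.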