Identifying factors influencing the science proficiency of Filipino students in the PISA 2018 using machine learning
Abstract
Filipino students performed poorly in the science literacy domain of the PISA 2018 assessment, attaining an underwhelming mean score of 357 against the OECD average of 489. This result calls for exploring key features that may be relevant to improving science proficiency among Filipino students. In this study, we developed binary classification models to classify low (Level 1b or lower) vs. high (Level 1a or higher) science proficiency among Filipino students, using the extensive set of features in the PISA dataset, which encompasses information about various student-level and school-level contexts. Our best-performing model, based on eXtreme Gradient Boosting (XGBoost), achieved 81.62% accuracy, 83.29% precision, 84.31% recall, an 83.80% F1 score, and 81.22% AUC. We applied the SHapley Additive exPlanations (SHAP) tool to rank feature importances in classifying science proficiency, with the aim of identifying factors that may be used to identify the poorest-performing students. Among the top features were variables related to a student's reading skills and attitudes, disposition toward global issues, and growth mindset. These findings can inform strategies for targeting interventions and guide policies toward improving science literacy and achievement among Filipino learners.
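The binary target and the reported metrics can be made concrete with a small sketch. It assumes each student has already been assigned a single PISA science proficiency level given as a string (the labels in `LEVELS`, such as `"below 1b"`, are hypothetical; the actual PISA files report plausible values rather than level strings), and it assumes high proficiency is encoded as class 1, which the abstract does not specify. The helpers `binarize_level`, `metrics_from_counts`, and `f1_from` are illustrative names, not the study's code. The last line checks that the reported F1 score follows from the reported precision and recall.

```python
from dataclasses import dataclass

# Hypothetical labels for PISA science proficiency levels, lowest to highest.
LEVELS = ["below 1b", "1b", "1a", "2", "3", "4", "5", "6"]


def binarize_level(level: str) -> int:
    """Map a proficiency level to the binary target.

    Assumption: 0 = low (Level 1b or lower), 1 = high (Level 1a or higher).
    Raises ValueError for a label not in LEVELS.
    """
    return 1 if LEVELS.index(level) >= LEVELS.index("1a") else 0


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return Metrics(accuracy, precision, recall, f1)


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Consistency check using the precision and recall reported above.
    print(round(f1_from(0.8329, 0.8431) * 100, 2))  # prints 83.8
```

Because F1 is the harmonic mean of precision and recall, 83.29% precision and 84.31% recall give an F1 of about 83.80%, matching the reported value. AUC cannot be checked this way, since it depends on the model's ranked scores rather than on a single confusion matrix.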