Predicting cubic and orthorhombic crystal band gaps via a two-stage support-vector machine framework
Abstract
The electronic band gap is a fundamental property that determines the suitability of a material for modern electronic applications. Density functional theory (DFT) is the standard method for calculating band gaps, although low-cost functionals underestimate experimental values by approximately 40%. This study aims to reproduce DFT-calculated band gaps with high fidelity using machine learning. We developed a two-stage support-vector machine framework to predict the band gaps of cubic and orthorhombic crystal systems, using a dataset from the Materials Project. Structural and elemental descriptors serve as features, following rigorous pre-processing to eliminate data leakage. Both stages use a radial basis function (RBF) kernel: a support-vector classifier first distinguishes metallic from non-metallic phases, and a support-vector regressor then predicts the specific gap values. The classification stage achieves near-perfect accuracy, and the regression stage yields R² values exceeding 0.99 for both crystal systems, with a testing mean absolute error of 0.0343 eV for cubic and 0.0420 eV for orthorhombic structures. These errors approach the room-temperature thermal energy scale (k_B T ≈ 0.0259 eV), indicating that the model reproduces the DFT reference values with high precision. Overall, the pipeline offers a robust, low-cost alternative to traditional DFT simulations, enabling efficient high-throughput materials screening at the level of the established generalized gradient approximation (GGA) baseline.
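
The two-stage scheme described above can be illustrated with a minimal sketch, assuming scikit-learn as the implementation library. The hyperparameters (C, gamma, epsilon), the feature standardization, the zero-gap threshold used to label metals, the choice to train the regressor on non-metallic entries only, and the synthetic data are all illustrative assumptions; they are not the settings, features, or Materials Project data used in this study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR


def fit_two_stage(X, gap, metal_tol=0.0):
    """Fit stage 1 (RBF SVC: metal vs. non-metal) and stage 2 (RBF SVR: gap value)."""
    is_nonmetal = gap > metal_tol  # assumed labelling rule: gap of zero means metal
    # Hyperparameter values below are placeholders, not tuned settings.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    clf.fit(X, is_nonmetal)
    # Assumption: the regressor only sees entries with a finite gap.
    reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
    reg.fit(X[is_nonmetal], gap[is_nonmetal])
    return clf, reg


def predict_two_stage(clf, reg, X):
    """Return predicted gaps (0 for predicted metals) and the non-metal mask."""
    nonmetal = clf.predict(X).astype(bool)
    gap = np.zeros(X.shape[0])
    if nonmetal.any():
        gap[nonmetal] = reg.predict(X[nonmetal])
    return gap, nonmetal


# Synthetic stand-in for descriptor vectors and band gaps (not real data).
rng = np.random.default_rng(0)
X_all = rng.normal(size=(400, 5))
raw = 0.5 + X_all[:, 0] + 0.5 * X_all[:, 1] ** 2
gap_all = np.clip(raw, 0.0, None)  # negative values become "metals" with zero gap

X_train, gap_train = X_all[:300], gap_all[:300]
X_test, gap_test = X_all[300:], gap_all[300:]

clf, reg = fit_two_stage(X_train, gap_train)
pred_gap, pred_nonmetal = predict_two_stage(clf, reg, X_test)

mae = float(np.mean(np.abs(pred_gap - gap_test)))
print(f"test MAE on synthetic data: {mae:.4f} eV")
```

Routing only predicted non-metals to the regressor keeps the many zero-gap entries from dominating the regression fit, which is one plausible motivation for a two-stage design of this kind.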



