Interpreting experimental Raman spectra of amino acid mixtures via a variational autoencoder-based machine learning approach

Authors

Ming-Kang TSAI, Department of Chemistry, National Taiwan Normal University

Abstract

Amino acid detection holds significant practical value, particularly in biochemical and medical applications. Tryptophan and tyrosine are the precursors of serotonin and dopamine, respectively, and abnormalities in their metabolism are closely associated with neurodegenerative diseases such as Alzheimer's disease. This study aims to develop a rapid computational method for identifying the main components of a given amino acid mixture based on Raman spectroscopy and machine learning models. We constructed the theoretical Raman spectra of 20 amino acids using density functional theory (DFT) and subsequently generated arbitrary theoretical mixture spectra. Machine learning classifiers, namely Random Forest (RF) and XGBoost, predicted the dominant amino acid in the random theoretical mixtures with an accuracy higher than 94.7%. To predict the principal components of experimental mixtures, namely phenylalanine (Phe) and glutamic acid (Glu) mixed in arbitrary ratios, the experimental spectra of the Glu-Phe mixtures were transformed into the style of DFT spectra, sequentially by asymmetrically reweighted penalized least squares (arPLS) fitting and a variational autoencoder (VAE). As a result, the RF and XGBoost models predicted the leading amino acid in these transformed mixtures with 100% accuracy. We demonstrated that this workflow effectively reduces the discrepancy between theoretical and experimental Raman spectra and substantially improves its practical applicability in biomedical settings.
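The abstract names arPLS as the first step in mapping experimental spectra toward the DFT style, before the VAE. As an illustration only, the sketch below implements the generic arPLS iteration in NumPy: it repeatedly solves a penalized least-squares problem for a smooth baseline and down-weights points lying well above it (likely Raman bands) with a logistic function of the residual. The function names, the smoothing parameter `lam`, the convergence `ratio`, and the synthetic two-band spectrum are all assumptions for demonstration, not the parameters, code, or data used in this study; the VAE and the RF/XGBoost classifiers are not shown.

```python
import numpy as np


def second_difference_matrix(n: int) -> np.ndarray:
    """Return the (n - 2) x n second-order difference operator D."""
    D = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    D[rows, rows] = 1.0
    D[rows, rows + 1] = -2.0
    D[rows, rows + 2] = 1.0
    return D


def arpls_baseline(y, lam=1e5, ratio=1e-3, max_iter=100):
    """Estimate a smooth baseline under a 1-D spectrum y with arPLS.

    Repeatedly solves (W + lam * D^T D) z = W y, re-deriving the diagonal
    weights W from the residuals d = y - z after each solve.
    lam, ratio and max_iter are illustrative defaults (assumptions).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = second_difference_matrix(n)
    H = lam * (D.T @ D)              # roughness penalty on the baseline
    w = np.ones(n)                   # start with every point weighted equally
    z = y.copy()
    for _ in range(max_iter):
        z = np.linalg.solve(np.diag(w) + H, w * y)
        d = y - z
        neg = d[d < 0]               # residuals of points below the baseline
        if neg.size < 2:
            break
        m, s = neg.mean(), neg.std()
        if s == 0.0:
            break
        # Points above the baseline (likely Raman bands) get logistic weights
        # that fall toward 0 once d exceeds about 2s - m; points below keep 1.
        arg = np.clip(2.0 * (d - (2.0 * s - m)) / s, -700.0, 700.0)
        w_new = np.where(d >= 0, 1.0 / (1.0 + np.exp(arg)), 1.0)
        # Stop when the weight vector changes by less than `ratio` (relative).
        if np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio:
            break
        w = w_new
    return z


# Synthetic demo (hypothetical data, not the experimental Glu-Phe spectra):
# two Lorentzian bands on a curved background plus Gaussian noise.
x = np.linspace(400.0, 1800.0, 500)               # Raman shift, cm^-1
background = 50.0 + 0.02 * (x - 400.0) + 1e-5 * (x - 400.0) ** 2
bands = (100.0 / (1.0 + ((x - 1000.0) / 8.0) ** 2)
         + 60.0 / (1.0 + ((x - 1600.0) / 10.0) ** 2))
rng = np.random.default_rng(0)
spectrum = background + bands + rng.normal(0.0, 0.5, x.size)

# Baseline-corrected spectrum: bands retained, curved background removed.
corrected = spectrum - arpls_baseline(spectrum)
```

Dense solves keep the sketch dependency-light; for spectra with thousands of points, a sparse banded solver (for example from `scipy.sparse`) would be the usual choice, since `W + lam * D^T D` is pentadiagonal.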

About the Speaker

Ming-Kang TSAI, Department of Chemistry, National Taiwan Normal University

Ming-Kang TSAI received his PhD from the University of Pittsburgh in 2005. He conducted postdoctoral research at Pacific Northwest National Laboratory and Brookhaven National Laboratory in the USA from 2005 to 2010. He joined the Department of Chemistry at National Taiwan Normal University (NTNU) in 2010 and currently serves as Director of the Intelligent Computing for Sustainable Development Research Center at NTNU. His research interests focus on the development of multiscale and cheminformatics approaches for designing novel molecules and materials.

Published

2026-06-24

Issue

2026: Proceedings of the 44th Samahang Pisika ng Pilipinas Physics Conference

Section

Invited Presentations

How to Cite

[1]

M-K Tsai, Interpreting experimental Raman spectra of amino acid mixtures via a variational autoencoder-based machine learning approach, in Proceedings of the 44th Samahang Pisika ng Pilipinas Physics Conference (Philippines, 2026), SPP-2026-INV-1E-03. URL: https://proceedings.spp-online.org/article/view/SPP-2026-INV-1E-03