RT Journal Article
SR Electronic
T1 Supervised and Unsupervised Classification of Cocoa Bean Origin and Processing using Liquid Chromatography-Mass Spectrometry
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.02.09.940577
DO 10.1101/2020.02.09.940577
A1 Santhust Kumar
A1 Roy N. D’Souza
A1 Britta Behrends
A1 Marcello Corno
A1 Matthias S. Ullrich
A1 Nikolai Kuhnert
A1 Marc-Thorsten Hütt
YR 2020
UL http://biorxiv.org/content/early/2020/02/10/2020.02.09.940577.abstract
AB Liquid Chromatography-Mass Spectrometry (LC-MS) provides an unprecedented wealth of metabolomics information for food products, including insights into compositional changes during food processing. Here, we employed the largest available LC-MS dataset of around 300 cocoa bean samples to assess the capability of two popular multivariate classification methods, principal component analysis (PCA) and linear discriminant analysis (LDA), for studying the geographic origin of the beans and the compounds characteristic of each origin. The unsupervised method, PCA, provides only a limited separation of samples by bean origin. As expected, the supervised method, LDA, provides better origin clustering. However, it suffers from a strong, nonlinear dependence on the set of compounds used in the analysis. We show that for LDA a compound filtering criterion based on Gaussian intensity distributions dramatically enhances origin clustering of samples, thus increasing its predictive efficiency. In this form, LDA may help identify potential markers of a specific origin.