Abstract
We present LC-MS2Struct, a machine learning framework for the structural annotation of small molecule data arising from liquid chromatography-tandem mass spectrometry (LC-MS2) measurements. LC-MS2Struct predicts the annotations for a set of mass spectrometry features in a sample, using the ions' observed retention orders together with the output of state-of-the-art MS2 scorers. It is based on a novel structured prediction model, trained to exploit dependencies between retention times and mass spectral features to improve annotation accuracy.
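To illustrate how MS2 scores and retention order can be combined in a single prediction over several features, the following is a simplified sketch only. It is not the LC-MS2Struct model. It assumes the features form a chain sorted by observed retention time, uses a hypothetical step-function edge score that rewards consecutive candidate pairs whose predicted retention scores agree with the observed elution order, and replaces learned parameters with a fixed weight `weight`. Candidates for each feature are then ranked by their max-marginals, computed with one forward and one backward max-sum pass. The names `order_score` and `max_marginals` are hypothetical.

```python
from typing import List, Sequence


def order_score(r_prev: float, r_next: float) -> float:
    """Hypothetical edge score: 1.0 if the earlier-eluting feature's candidate
    has a lower predicted retention score than the later one's, else 0.0."""
    return 1.0 if r_next > r_prev else 0.0


def max_marginals(
    ms2: Sequence[Sequence[float]],
    rt_score: Sequence[Sequence[float]],
    weight: float = 0.5,
) -> List[List[float]]:
    """Max-sum max-marginals on a chain of features sorted by retention time.

    ms2[i][c]      -- MS2 score of candidate c for feature i (from any MS2 scorer)
    rt_score[i][c] -- predicted retention score of that candidate (assumed given)
    weight         -- fixed trade-off between MS2 node scores and order edge scores
    Returns mm[i][c]: the best total score of any joint assignment with feature i = c.
    """
    n = len(ms2)
    if n == 0:
        return []
    node = [[weight * s for s in row] for row in ms2]

    def edge(i: int, a: int, b: int) -> float:
        # Score for candidate a of feature i followed by candidate b of feature i+1.
        return (1.0 - weight) * order_score(rt_score[i][a], rt_score[i + 1][b])

    # fwd[i][c]: best score over features 0..i with feature i fixed to candidate c.
    fwd = [list(node[0])]
    for i in range(1, n):
        fwd.append([
            node[i][b]
            + max(fwd[i - 1][a] + edge(i - 1, a, b) for a in range(len(node[i - 1])))
            for b in range(len(node[i]))
        ])

    # bwd[i][c]: best score over features i+1..n-1 given feature i fixed to c.
    bwd = [[0.0] * len(row) for row in node]
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            max(edge(i, a, b) + node[i + 1][b] + bwd[i + 1][b]
                for b in range(len(node[i + 1])))
            for a in range(len(node[i]))
        ]

    # Max-marginal = best prefix ending in c + best suffix starting after c.
    return [[f + b for f, b in zip(fwd[i], bwd[i])] for i in range(n)]


if __name__ == "__main__":
    # Two features (feature 0 elutes first), two candidates each; toy values.
    ms2 = [[0.9, 0.8], [0.75, 0.7]]
    rt = [[2.0, 1.0], [1.5, 3.0]]
    mm = max_marginals(ms2, rt, weight=0.5)
    # MS2 alone prefers candidate 0 for feature 1; the order term flips this.
    for i, row in enumerate(mm):
        best = max(range(len(row)), key=row.__getitem__)
        print(f"feature {i}: best candidate {best}, max-marginals {row}")
```

On a chain, the maximum of the max-marginals is the same for every feature (it equals the score of the best joint assignment), which is a useful sanity check. A real system would learn the edge and node parameters and could use a richer graph than a chain; this sketch fixes both for clarity.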
We demonstrate the benefit of LC-MS2Struct on a comprehensive dataset of reference MS2 spectra and retention times for 4327 molecules from MassBank, measured under a variety of LC conditions. We show that LC-MS2Struct achieves significantly higher annotation accuracy than methods based on retention time prediction. Furthermore, LC-MS2Struct improves the annotation accuracy of state-of-the-art MS2 scorers by up to 66.1 percent, and by up to 95.9 percent when predicting stereochemical variants of small molecules.
Competing Interest Statement
The authors have declared no competing interest.