Abstract
Mass spectrometry is a valued method for evaluating the metabolomic content of a biological sample. The recent advent of rapid ionization technologies such as Laser Diode Thermal Desorption (LDTD) and Direct Analysis in Real Time (DART) has made high-throughput mass spectrometry possible. It can now be used for large-scale comparative analysis of populations of samples. In practice, many factors arising from the environment, the protocol, and even the instrument itself can lead to minor discrepancies between spectra, making automated comparative analysis difficult. In this work, we propose a pipeline of algorithms to correct variations between spectra. The algorithms correct multiple spectra by identifying peaks that are common to all of them and, from those peaks, compute a spectrum-specific correction. We show that these algorithms increase comparability within large datasets of spectra, facilitating comparative analyses such as machine learning.
Author summary
Mass spectrometry is a widespread technology used to measure the chemical content of samples. This measurement technique is often used with biological samples for diverse applications, such as protein sequencing, metabolomic profiling, or quantitative measurements. However, with the increasing throughput of mass spectrometry technologies and methodologies, the resulting datasets are becoming larger. In the case of Time-of-Flight (ToF) mass spectrometers, these larger datasets reveal slight shifts in the masses measured by the instruments. These shifts make spectra harder to compare and analyze in large datasets. In this article, we propose algorithms that counter mass shifts and variations in datasets of ToF mass spectra. These algorithms use no external reference points; instead, they calculate spectrum-specific corrections by finding peaks present in all spectra of a dataset. Applying these algorithms yields a representation of the mass spectra that can then easily be used for statistical or machine learning analyses.
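The general idea of deriving a spectrum-specific correction from peaks shared by all spectra can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each spectrum is a list of peak m/z values, that a peak counts as common when exactly one peak falls within a relative window (the hypothetical parameter `window_ppm`) in every spectrum, and that the correction is a single multiplicative factor per spectrum. The function names `find_common_peaks` and `correct_spectra` are illustrative only.

```python
from statistics import mean


def find_common_peaks(spectra, window_ppm=50.0):
    """Group peaks that appear exactly once in every spectrum.

    Candidates are taken from the first spectrum (an assumption of this
    sketch). Returns a list of groups; each group holds the matched m/z
    value from every spectrum, in spectrum order.
    """
    common = []
    for mz in spectra[0]:
        tolerance = mz * window_ppm * 1e-6
        matches = []
        for peaks in spectra:
            close = [p for p in peaks if abs(p - mz) <= tolerance]
            if len(close) != 1:  # peak is missing or ambiguous here
                break
            matches.append(close[0])
        else:
            # Reached only if no spectrum broke out of the loop.
            common.append(matches)
    return common


def correct_spectra(spectra, window_ppm=50.0):
    """Rescale each spectrum so its common peaks align across spectra."""
    common = find_common_peaks(spectra, window_ppm)
    if not common:
        # No shared peaks: nothing to base a correction on.
        return [list(peaks) for peaks in spectra]

    # Reference position of each common peak: its mean over all spectra.
    references = [mean(group) for group in common]

    corrected = []
    for i, peaks in enumerate(spectra):
        # Spectrum-specific factor: average ratio of reference to observed.
        factor = mean(ref / group[i] for ref, group in zip(references, common))
        corrected.append([p * factor for p in peaks])
    return corrected


if __name__ == "__main__":
    # Three toy spectra with small relative shifts between them.
    toy = [
        [100.000, 200.000, 300.000],
        [100.002, 200.004, 300.006, 350.000],
        [99.998, 199.996, 299.994],
    ]
    for spectrum in correct_spectra(toy):
        print([round(mz, 4) for mz in spectrum])
```

A single multiplicative factor is the simplest choice that uses no external reference points. A real pipeline might instead fit a mass-dependent correction, and it would need a more careful rule for deciding which peaks are truly shared.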