PT - JOURNAL ARTICLE
AU - Allen Hubbard
AU - Louis Connelly
AU - Shrikaar Kambhampati
AU - Brad Evans
AU - Ivan Baxter
TI - “A novel paradigm for optimal mass feature peak picking in large scale LC-MS datasets using the ‘isopair’: isoLock, autoCredential and anovAlign”
AID - 10.1101/2021.12.05.471237
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.12.05.471237
4099 - http://biorxiv.org/content/early/2021/12/08/2021.12.05.471237.short
4100 - http://biorxiv.org/content/early/2021/12/08/2021.12.05.471237.full
AB - Untargeted metabolomics enables direct quantification of metabolites without a priori knowledge of their identity. Liquid chromatography-mass spectrometry (LC-MS), a popular method for untargeted metabolomics, identifies metabolites as mass features defined by their combined mass/charge (m/z) and retention time. Improvements in the sensitivity of mass spectrometers have increased the complexity of the data produced, leading to computational obstacles. One outstanding challenge is calling metabolite mass feature peaks rapidly and accurately in large LC-MS datasets (dozens to thousands of samples) in the presence of measurement and other noise. While existing algorithms are useful, they have limitations that become pronounced at scale and lead to false positive metabolite predictions as well as signal dropouts. To overcome some of these shortcomings, biochemists have developed hybrid computational and carbon labeling techniques, such as credentialing. Credentialing can validate metabolite signals, but it is laborious and its applicability is limited. We have developed a suite of three computational tools to overcome the challenges of unreliable algorithms and inefficient validation protocols: isoLock, autoCredential and anovAlign. isoLock uses isopairs, or metabolite-isotopologue pairs, to calculate and correct for mass drift noise across LC-MS runs.
autoCredential leverages statistical features of LC-MS data to amplify naturally present 13C isotopologues and validate metabolites through isopairs. This obviates the need to artificially introduce carbon labeling. anovAlign, an ANOVA-derived algorithm, aligns retention time windows across samples so that the windows for mass features are delineated accurately. Using a large published clinical dataset as well as a plant dataset with biological replicates across time, genotype and treatment, we demonstrate that this suite of tools is more sensitive and reproducible than both an open-source metabolomics pipeline, XCMS, and the commercial software Progenesis QI. This software suite opens a new era of enhanced accuracy and increased throughput for untargeted metabolomics. Competing Interest Statement: AH, SK and BH have filed an invention disclosure related to this work.
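
The abstract defines an isopair as a metabolite-isotopologue pair but does not describe how the tools detect one. As a rough illustration of the concept only, and not the isoLock or autoCredential implementation, the sketch below pairs m/z peaks whose spacing matches a single 13C substitution (about 1.00335 Da divided by the charge). The function name `find_isopairs`, the `charge` and `tol_ppm` parameters, and the example peak list are all hypothetical; a real pipeline would presumably also require the two peaks to co-elute at the same retention time.

```python
# Illustrative sketch only: NOT the isoLock/autoCredential algorithm.
# All names and default values here are assumptions made for this example.

C13_C12_DELTA = 1.0033548  # mass difference between 13C and 12C, in Da


def find_isopairs(mz_values, charge=1, tol_ppm=5.0):
    """Return (lighter_mz, heavier_mz) pairs spaced by one 13C substitution.

    mz_values: iterable of observed m/z values (floats).
    charge:    assumed ion charge; the expected spacing is delta / charge.
    tol_ppm:   matching tolerance in parts per million of the expected m/z.
    """
    spacing = C13_C12_DELTA / charge
    peaks = sorted(mz_values)
    pairs = []
    for i, low in enumerate(peaks):
        target = low + spacing           # expected m/z of the 13C isotopologue
        tol = target * tol_ppm * 1e-6    # absolute tolerance in m/z units
        for high in peaks[i + 1:]:
            if high > target + tol:
                break                    # peaks are sorted; no later match possible
            if abs(high - target) <= tol:
                pairs.append((low, high))
    return pairs


if __name__ == "__main__":
    # Hypothetical peak list: the first two values differ by ~1.00335 Da.
    observed = [180.0634, 181.0668, 255.2330, 300.1000]
    print(find_isopairs(observed))
```

In this toy peak list only the first two values are spaced by roughly one 13C mass difference, so they are the only pair reported. How isoLock turns such pairs into a mass drift correction is not specified in the abstract and is not attempted here.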