RT Journal Article
SR Electronic
T1 HDP-Align: Hierarchical Dirichlet Process Clustering for Multiple Peak Alignment of Liquid Chromatography Mass Spectrometry Data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 074831
DO 10.1101/074831
A1 Joe Wandy
A1 Rónán Daly
A1 Simon Rogers
YR 2016
UL http://biorxiv.org/content/early/2016/09/12/074831.abstract
AB Matching peak features across multiple LC-MS runs (alignment) is an integral part of all LC-MS data processing pipelines. Alignment is challenging due to variations in the retention time of peak features across runs and the large number of peak features produced by a single compound in the analyte. In this paper, we propose a Bayesian non-parametric model that aligns peaks via a hierarchical cluster model using both peak mass and retention time. Crucially, this method provides confidence values in the form of posterior probabilities, allowing the user to distinguish between aligned peaksets of high and low confidence. The results from our experiments on a diverse set of proteomic, glycomic and metabolomic data show that the proposed model produces alignment results competitive with other widely used benchmark methods while also providing a probabilistic measure of confidence in the alignment results, thus allowing precision to be traded against recall. Availability: Our method has been implemented as a stand-alone application in Java, available for download at http://github.com/joewandy/HDP-Align.