RT Journal Article
SR Electronic
T1 A peptide-level multiple imputation strategy accounting for the different natures of missing values in proteomics data
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.05.29.122770
DO 10.1101/2020.05.29.122770
A1 Q. Giai Gianetto
A1 S. Wieczorek
A1 Y. Couté
A1 T. Burger
YR 2020
UL http://biorxiv.org/content/early/2020/05/30/2020.05.29.122770.abstract
AB Motivation: Quantitative mass spectrometry-based proteomics data are characterized by high rates of missing values, which may be of two kinds: missing completely-at-random (MCAR) and missing not-at-random (MNAR). Despite the numerous imputation methods available in the literature, none account for this duality, because doing so would require diagnosing the missingness mechanism behind each missing value. Results: A multiple imputation strategy is proposed that combines MCAR-devoted and MNAR-devoted imputation algorithms. First, we propose an estimator for the proportion of MCAR values and show that it is asymptotically unbiased under assumptions adapted to label-free proteomics data. This allows us to estimate the number of MCAR values in each sample and to take the nature of the missing values into account through an original multiple imputation method. We evaluate this approach on simulated data and show that it outperforms traditionally used imputation algorithms. Availability: The proposed methods are implemented in the R package imp4p (available on CRAN; Giai Gianetto, 2020), which is itself accessible through the Prostar software. Contact: quentin.giaigianetto{at}pasteur.fr; thomas.burger{at}cea.fr. Competing Interest Statement: The authors have declared no competing interest.