Abstract
Recent developments in mass spectrometry (MS) instruments and data acquisition modes have aided multiplexed, fast, reproducible and quantitative analysis of proteome profiles, yet missing values remain a formidable challenge for proteomics data analysis. The stochastic nature of sampling in Data-Dependent Acquisition (DDA), suboptimal preprocessing of Data-Independent Acquisition (DIA) runs and the limited dynamic range of MS instruments impede the reproducibility and accuracy of peptide quantification and can introduce systematic patterns of missingness that affect downstream analyses. Imputation of missing values is therefore an important element of data analysis. We introduce msImpute, an imputation method based on low-rank approximation, and compare it to six alternative imputation methods using public DDA and DIA datasets. We evaluate the methods by the error of the imputed values and the accuracy of differential expression detection. We also measure how well structures in the data are preserved after imputation at different levels of granularity. We develop a visual diagnostic to determine the nature of missingness in datasets based on peptides with a high biological dropout rate, and introduce a method to identify such peptides. Our findings demonstrate that msImpute performs well when data are missing at random and highlight the importance of prior knowledge about the nature of missing values in a dataset when selecting an imputation technique.
Competing Interest Statement
The authors have declared no competing interest.
Abbreviations
- DE: Differential Expression
- MAR: Missing At Random
- MCAR: Missing Completely At Random
- MNAR: Missing Not At Random
- SVD: Singular Value Decomposition
- FDR: False Discovery Rate
- MS: Mass Spectrometry
- LC-MS: Liquid Chromatography - Mass Spectrometry
- DDA: Data-dependent Acquisition
- DIA: Data-independent Acquisition
- ALS: Alternating Least Squares
- RMSE: Root Mean Squared Error
- HBD: High Biological Dropout
- LOQ: Limits of Quantification
- PC: Principal Components