Abstract
Introduction: Mass spectrometry-based proteomics is actively embracing quantitative, single-cell level analyses. Recent advances in sample preparation and mass spectrometry (MS) have enabled the emergence of quantitative MS-based single-cell proteomics (SCP). While exciting and promising, SCP still has many rough edges. Current analysis workflows are custom and built from scratch. The field therefore needs standardized software that promotes principled and reproducible SCP data analyses.
Areas covered: This special report is the first step toward the formalization and standardization of SCP data analysis. scp, the software that accompanies this work, successfully replicates one of the landmark SCP studies and is applicable to other experiments and designs. We created a repository containing the replicated workflow with comprehensive documentation to encourage further dissemination and improvement of SCP data analyses.
Expert opinion: Replicating SCP data analyses uncovers important challenges. We describe two such challenges in detail: batch correction and data missingness. We review the current state of the art and illustrate its limitations. We also highlight the close dependence between batch effects and data missingness and offer avenues for dealing with these exciting challenges.
Article highlights
Single-cell proteomics (SCP) is emerging thanks to several recent technological advances, but further progress is held back by the lack of principled and systematic data analysis.
This work offers a standardized solution for processing SCP data, demonstrated by replicating a landmark SCP study.
Two important challenges remain: batch effects and data missingness. Furthermore, these challenges are not independent and therefore need to be modeled simultaneously.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Email: laurent.gatto@uclouvain.be
- The title has been modified to remove the emphasis on the software and highlight the replication and computational challenges of SCP.
- Typos and rephrasing.
- Various clarifications.
- Motivation for SCoPE2 has been moved to the introduction and improved.
- New figure 3C describing the relationship between the proportion of missing data and the average abundances.
- New figure 4B showing how imputation increases differences in computed quantitation values.
- Mention and cite proDA.
- Mention references of particular interest.
https://uclouvain-cbio.github.io/SCP.replication/articles/SCoPE2.html