Abstract
Background The majority of high-throughput single-cell molecular profiling methods quantify RNA expression; however, recent multimodal profiling methods add simultaneous measurement of genomic, proteomic, epigenetic, and/or spatial information on the same cells. The development of new statistical and computational methods in Bioconductor for such data will be facilitated by easy availability of landmark datasets using standard data classes.
Results We collected, processed, and packaged publicly available landmark datasets from important single-cell multimodal protocols, including CITE-Seq, ECCITE-Seq, SCoPE2, scNMT, 10X Multiome, seqFISH, and G&T-seq. We integrate data modalities via the MultiAssayExperiment Bioconductor class, and we document and redistribute the datasets as the SingleCellMultiModal package through Bioconductor’s cloud-based ExperimentHub. The result is single-command retrieval of landmark datasets from seven single-cell multimodal data generation technologies, with no further data processing or wrangling needed to analyze them and develop methods within Bioconductor’s ecosystem of hundreds of packages for single-cell and multimodal data.
Conclusions We provide two examples of integrative analyses that are greatly simplified by SingleCellMultiModal. The package will facilitate development of bioinformatic and statistical methods in Bioconductor to meet the challenges of integrating molecular layers and analyzing phenotypic outputs including cell differentiation, activity, and disease.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
(kelly.eckenrode{at}sph.cuny.edu), (marcel.ramos{at}sph.cuny.edu), (ludwig.geistlinger{at}sph.cuny.edu), (dario.righelli{at}unipd.it), (martin.morgan{at}roswellpark.org), (ricard{at}ebi.ac.uk), (christophe.vanderaa{at}uclouvain.be), (laurent.gatto{at}uclouvain.be), (Ludwig_Geistlinger{at}hms.harvard.edu), (aedin{at}ds.dfci.harvard.edu), (stvjc{at}channing.harvard.edu)
* shared first authorship
↟ shared second authorship
1. 10,149 out of the 32,738 features are zero
Abbreviations
- 10X Multiome: 10x Chromium Single Cell Multiome ATAC + Gene Expression
- ADT: Antibody derived tag
- CITE-Seq: Cellular Indexing of Transcriptomes and Epitopes by sequencing
- ECCITE-Seq: Expanded CRISPR CITE-Seq
- G&T-seq: Genome and Transcriptome sequencing
- HDF5: Hierarchical data format V5
- HTO: Hashtag oligonucleotide
- LC: liquid chromatography
- m/z: mass over charge
- MOFA+: Multi-Omics Factor Analysis V2
- MS: mass spectrometry
- MS/MS: tandem MS
- PSM: peptide to spectrum match
- scNMT: single-cell Nucleosome, Methylation, and Transcriptome sequencing
- SCoPE2: Single Cell ProtEomics by Mass Spectrometry V2
- Single-cell RNA-seq: Single-cell RNA sequencing
- seqFISH: sequential Fluorescence In Situ Hybridization
- TMT: tandem mass tag
- UMI: unique molecular identifier sequence