Abstract
Background Single-cell multi-omics technologies allow the profiling of different data modalities from the same cell. However, each modality in isolation captures only one view of the total information in a biological cell, and an integrative analysis across modalities remains challenging. In response, bioinformatics and machine learning methodologies have been developed for multi-omics single-cell analysis. Nevertheless, it is unclear whether current tools can address both modality integration and prediction across modalities without requiring extensive parameter fine-tuning.
Results We designed LIBRA, a neural network-based framework, to learn a translation between paired multi-omics profiles such that a shared latent space is constructed. LIBRA achieves state-of-the-art performance when evaluated on its ability to increase cell-type (clustering) resolution in the latent space. When assessing predictive power across data modalities, LIBRA outperforms existing tools. Finally, considering the importance of hyperparameters, we implemented an adaptive tuning strategy, labelled aLIBRA, in the LIBRA package. As expected, adaptive parameter optimization significantly boosts the performance of learning predictive models from paired datasets. Additionally, aLIBRA provides parameter combinations that balance the integrative and predictive tasks.
Conclusions LIBRA is a versatile tool that uniquely targets both the integration and prediction tasks for single-cell multi-omics data. LIBRA is a robust, data-driven platform that includes an adaptive learning scheme. Furthermore, LIBRA is freely available as R and Python libraries (https://github.com/TranslationalBioinformaticsUnit/LIBRA).
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
- Includes analysis of additional datasets.
- Includes analysis of the computational time requirements.
- Includes automatic fine-tuning.
Abbreviations
- NN: Neural networks
- GEO: Gene Expression Omnibus
- SLS: Shared latent space
- PJI: Pairwise Jaccard Index
- DS: Data set
- predRNA: Predicted RNA
- predATAC: Predicted ATAC
- MSE: Mean squared error
- SNARE-seq: Droplet-based technology to profile chromatin accessibility and gene expression from the same cells.
- CITE-seq: Method for performing RNA sequencing along with gaining quantitative and qualitative information on surface proteins with available antibodies at the single-cell level.
- Paired-seq: Combinatorial indexing strategy to simultaneously tag both the open chromatin fragments generated by the Tn5 transposases and the cDNA molecules generated from reverse transcription.
- SHARE-seq: Strategy that uses three rounds of barcodes, ligating barcoded adaptors to both RNA (gene expression) and tagmented DNA (chromatin accessibility), to achieve multi-omic profiling from the same single cells.
- 10X: 10X Genomics Single-Cell Multiomics Solutions
- scNMT-seq: Method for profiling methylation (CpG) and chromatin accessibility (GpC).