Adaptative Machine Translation between paired Single-Cell Multi-Omics Data

Xabier Martinez-de-Morentin; Sumeer A. Khan; Robert Lehmann; Sisi Qu; Alberto Maillo; Narsis A. Kiani; Felipe Prosper; Jesper Tegner; David Gomez-Cabrero

doi:10.1101/2021.01.27.428400

Abstract

Background Single-cell multi-omics technologies allow the profiling of different data modalities from the same cell. However, each modality in isolation captures only one view of the total information in a biological cell, and an integrative analysis across modalities remains challenging. In response, bioinformatics and machine learning methodologies have been developed for multi-omics single-cell analysis. Nevertheless, it is unclear whether current tools can address both modality integration and prediction across modalities without requiring extensive parameter fine-tuning.

Results We designed LIBRA, a neural network-based framework, to learn a translation between paired multi-omics profiles such that a shared latent space is constructed. LIBRA achieves state-of-the-art performance when evaluated on its ability to increase cell-type (clustering) resolution in the latent space. When assessing predictive power across data modalities, LIBRA outperforms existing tools. Finally, considering the importance of hyperparameters, we implemented an adaptive tuning strategy, labelled aLIBRA, in the LIBRA package. As expected, adaptive parameter optimization significantly boosts the performance of learning predictive models from paired datasets. Additionally, aLIBRA provides parameter combinations that balance the integrative and predictive tasks.
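The translation idea described above can be illustrated with a deliberately simplified sketch. It is not the LIBRA architecture or API: it assumes a purely linear encoder and decoder trained by plain gradient descent on the mean squared error, and the function name `train_translator` and all of its parameters are hypothetical. One modality is passed through a low-dimensional bottleneck and decoded into the paired modality, so the bottleneck activations play the role of a shared latent space.

```python
import numpy as np


def train_translator(x, y, latent_dim=2, lr=0.01, epochs=500, seed=0):
    """Fit a linear translation x -> z -> y on paired profiles (toy stand-in).

    x: (cells, features_a) matrix for the input modality.
    y: (cells, features_b) matrix for the paired output modality.
    Returns the encoder weights, the decoder weights and the MSE per epoch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("paired data must have the same number of cells")

    rng = np.random.default_rng(seed)
    w_enc = rng.normal(scale=0.1, size=(x.shape[1], latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, y.shape[1]))

    losses = []
    for _ in range(epochs):
        z = x @ w_enc                      # latent representation of each cell
        err = z @ w_dec - y                # error of the predicted modality
        losses.append(float(np.mean(err ** 2)))

        grad_out = 2.0 * err / err.size    # d(MSE) / d(prediction)
        grad_dec = z.T @ grad_out
        grad_enc = x.T @ (grad_out @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses
```

After training, `x @ w_enc` gives the latent coordinates that could be clustered, and `x @ w_enc @ w_dec` gives the predicted second modality. The values chosen for `latent_dim`, `lr` and `epochs` are exactly the kind of hyperparameters that an adaptive tuning scheme would search over.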

Conclusions LIBRA is a versatile tool, uniquely targeting both integration and prediction tasks for single-cell multi-omics data. LIBRA is a robust, data-driven platform that includes an adaptive learning scheme. Furthermore, LIBRA is freely available as R and Python libraries (https://github.com/TranslationalBioinformaticsUnit/LIBRA).

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

- Analysis considering additional data-sets.
- Includes analysis of the computational time requirements.
- Including automatic fine-tuning.

Abbreviations

NN: Neural networks
GEO: Gene Expression Omnibus
SLS: Shared latent space
PJI: Pairwise Jaccard Index
DS: Data set
predRNA: Predicted RNA
predATAC: Predicted ATAC
MSE: Mean squared error
SNARE-seq: Droplet-based technology to profile chromatin accessibility and gene expression from the same cells.
CITE-seq: Method for performing RNA sequencing while gaining quantitative and qualitative information on surface proteins with available antibodies at the single-cell level.
Paired-seq: Combinatorial indexing strategy to simultaneously tag both the open chromatin fragments generated by the Tn5 transposase and the cDNA molecules generated from reverse transcription.
SHARE-seq: Strategy that uses three rounds of barcoding, ligating barcoded adaptors to both RNA (gene expression) and tagmented DNA (chromatin accessibility), to achieve multi-omic profiling of the same single cells.
10X: 10X Genomics Single-Cell Multiomics Solutions
scNMT-seq: Method to look at methylation (CpG) and chromatin accessibility (GpC).
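
Two of the abbreviations above, PJI and MSE, name evaluation metrics. The following minimal sketch shows only the underlying textbook quantities, the Jaccard index of two sets and the mean squared error of two vectors. How the Jaccard index is aggregated over pairs into the PJI is not specified in this section, so no aggregation is shown, and the function names are hypothetical.

```python
def jaccard_index(a, b):
    """Return |A intersect B| / |A union B| for two collections of cell labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def mean_squared_error(observed, predicted):
    """Return the average squared difference between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return total / len(observed)
```

For example, two clusters that share two of four distinct cells have a Jaccard index of 0.5.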

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.