DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification

Clémentine Decamps; Alexis Arnaud; Florent Petitprez; Mira Ayadi; Aurélia Baurès; Lucile Armenoult; HADACA consortium; Rémy Nicolle; Richard Tomasini; Aurélien de Reyniès; Jérôme Cros; Yuna Blum; Magali Richard

doi:10.1101/2020.06.06.131482

Abstract

Motivation Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities.

Results We present DECONbench, a web-based application to benchmark computational methods dedicated to quantifying cell-type heterogeneity in cancer. DECONbench includes benchmark datasets, computational methods and performance evaluation. It allows submission of new methods.

Availability and implementation DECONbench is hosted on the open-source Codalab competition platform. It is freely available at: https://competitions.codalab.org/competitions/23660.

1 Introduction

With the recent development of high-throughput sequencing technologies, cancer research has focused on characterizing the genetic and epigenetic changes that contribute to the disease. However, these studies often neglect the fact that tumors are composed of cells with different identities and origins. Quantification of tumor heterogeneity is of utmost interest to the bioinformatics and biomedical research community, as multiple components of a tumor are key factors in tumor progression, clinical outcome and response to therapy. Advanced microdissection techniques to isolate a population of interest from heterogeneous clinical tissue samples are not feasible in daily practice. Single-cell technologies, while promising, have demanding protocols and require expensive and specialized resources, currently hindering their establishment in a clinical setting (Avila Cobos et al., 2018). An alternative is to rely on deconvolution methods that infer cell-type composition in silico. Bioinformatics tools to assess the different cell populations from bulk transcriptome (Becht et al., 2016; Nazarov et al., 2019; Blum et al., 2019) and methylome (Houseman et al., 2014; Lutsik et al., 2017; Decamps et al., 2020) samples have been recently developed, including reference-based and reference-free methods. In silico deconvolution offers several advantages, notably the possibility of re-analysing a large number of publicly available datasets. However, assessment of the efficacy of these tools has been hampered by the lack of dedicated benchmarking studies, which is often the case for methodological developments (Ellrott et al., 2019). Here we present DECONbench, an open-source and freely available public benchmarking platform that aims to compare deconvolution methods for tumor heterogeneity quantification. It includes benchmarking datasets and state-of-the-art computational methods, and it enables the submission of new methods.

2 The benchmarking platform infrastructure

DECONbench takes advantage of the Codalab web-based platform (https://competitions.codalab.org/) to provide a common software environment for evaluating deconvolution methods. Users submit a full R program that is applied to the provided benchmark datasets, and its estimates are compared to the ground truth. DECONbench outputs a performance score displayed on the leaderboard (Fig. 1).

Fig. 1. Overview of the DECONbench platform.

The platform provides a set of 8 reference deconvolution methods and benchmark datasets consisting of paired methylomes and transcriptomes of in silico mixtures from pancreatic tumors. The platform outputs the performance of each method on a leaderboard and provides plots for deeper evaluation. New methods are automatically compared to the existing ones.

3 Current benchmark datasets and methods

We have generated transcriptome and methylome benchmarking datasets from primary cells from pancreatic tumors and sorted cells from public datasets (Supplementary Fig. 1). Heterogeneous samples were simulated as mixtures of individual cell populations. Sample compositions are not accessible to the users. Methods are evaluated on how accurately they estimate the cell-type proportions per sample from heterogeneous transcriptome and/or methylome profiles. The discriminating metric is the mean absolute error between the estimate and the ground truth. We recently used this unreleased dataset in a data challenge (https://tinyurl.com/hadaca2019). The best methods collectively discovered during the challenge are provided on DECONbench as reference methods (Supplementary Table 1). The methods consist of various statistical approaches and novel strategies integrating transcriptome and methylome.
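The simulation and scoring principle described above can be illustrated with a minimal sketch. It is written in Python for brevity, whereas DECONbench submissions are R programs, and it is not the platform's actual code. It assumes that each bulk sample is a convex combination of pure cell-type profiles and that the mean absolute error is averaged over all samples and cell types; the exact simulation and averaging procedures are not specified here. The function names `simulate_mixtures` and `mean_absolute_error` and the toy profiles are hypothetical.

```python
import random


def simulate_mixtures(profiles, n_samples, seed=0):
    """Build in silico bulk samples as convex combinations of pure profiles.

    profiles: one row per feature (gene or CpG probe), one value per cell type.
    Returns (bulk, proportions): bulk has one row per feature and one column
    per sample; proportions has one list of cell-type fractions per sample.
    """
    rng = random.Random(seed)
    n_types = len(profiles[0])
    proportions = []
    for _ in range(n_samples):
        # Assumption: random positive weights, normalised to sum to 1.
        raw = [rng.random() + 1e-9 for _ in range(n_types)]
        total = sum(raw)
        proportions.append([w / total for w in raw])
    # Each bulk value is the proportion-weighted sum of the pure profiles.
    bulk = [
        [sum(row[k] * sample[k] for k in range(n_types)) for sample in proportions]
        for row in profiles
    ]
    return bulk, proportions


def mean_absolute_error(estimated, truth):
    """Average |estimate - truth| over every sample and cell type."""
    errors = [
        abs(e - t)
        for est_sample, true_sample in zip(estimated, truth)
        for e, t in zip(est_sample, true_sample)
    ]
    return sum(errors) / len(errors)


# Toy example: 3 features, 2 cell types, 4 simulated samples.
pure_profiles = [[1.0, 0.0], [0.0, 1.0], [2.0, 4.0]]
bulk, truth = simulate_mixtures(pure_profiles, n_samples=4)
# A naive baseline that guesses equal proportions for every sample.
uniform_guess = [[0.5, 0.5] for _ in truth]
print(mean_absolute_error(uniform_guess, truth))
```

A lower score indicates estimates closer to the hidden composition; a method that recovers the true proportions exactly would score 0.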

4 Usage

DECONbench is designed to execute methods developed in the R statistical programming language, using a Docker image provided on our website. A list of the R packages installed on the Docker image is also provided. Users need to: i) register to DECONbench on the participate tab and download the starting kit and the public datasets; ii) develop an algorithm according to DECONbench guidelines; and iii) submit their code (a zip file) in the participate tab. Submitted algorithms are evaluated on DECONbench datasets and benchmarked against the other methods. Resulting scores appear on the leaderboard, and a fact sheet summarizing the performance is generated (Supplementary Fig. 2). Importantly, users can choose whether they want their algorithm to be public or private.

5 Perspectives

This platform is a unique opportunity to compare the performance of deconvolution methods on different omics data. It can be used to assess the performance of newly developed methods by applying them to high-quality benchmark datasets in a user-friendly fashion. The structure of DECONbench is open to evolution. Work is ongoing to generate new benchmark datasets that will be added to the platform. In the near future, we plan to expand the usability of DECONbench by offering owners of biological data the possibility to upload their own datasets. This extended functionality will allow health professionals and biologists to benefit from all developed methods to gain insights regarding the composition of their samples.

Funding

The research leading to these results was supported by Univ. Grenoble-Alpes via the Grenoble Alpes Data Institute [MR, AA] (ANR-15-IDEX-02), EIT Health Campus HADACA and COMETH programs [MR, YB], activities 19359 and 20377, and the Ligue Nationale Contre le Cancer.

Other funding: Concerted Research Actions from Ghent University (BOF.DOC.2017.0026.01 [FAC]), South-Eastern Norway Regional Health Authority (project number 2019030 [MJ]), European IMI IMMUCAN project [NS], European Union's Horizon 2020 program (grant 826121, iPC project, [JM]).

HADACA (Health Data Challenge) Consortium

Nicolas Alcala⁶, Alexis Arnaud², Francisco Avila Cobos⁷, Luciana Batista⁸, Anne-Françoise Batto⁹, Yuna Blum³, Florent Chuffart¹⁰, Jérôme Cros⁵, Clémentine Decamps¹, Lara Dirian¹¹, Daria Doncevic¹², Ghislain Durif¹³, Silvia Yahel Bahena Hernandez¹⁴, Milan Jakobi¹⁰, Rémy Jardillier¹⁵, Marine Jeanmougin¹⁶, Paulina Jedynak¹⁰, Basile Jumentier¹, Aliaksandra Kakoichankava¹⁷, Maria Kondili¹⁸, Jing Liu¹⁹, Tiago Maie²⁰, Jules Marécaille¹¹, Jane Merlevede²¹, Maxime Meylan³,²², Petr Nazarov²³, Kapil Newar¹, Karl Nyrén¹⁴, Florent Petitprez³, Claudio Novella Rausell¹⁴, Magali Richard¹, Michael Scherer²⁴, Nicolas Sompairac²¹, Katharina Waury¹⁴, Ting Xie²⁵, Markella-Achilleia Zacharouli¹⁴

Affiliations:

¹Laboratory TIMC-IMAG, UMR 5525, Univ. Grenoble Alpes, CNRS, Grenoble, France

²Data Institute, Univ. Grenoble Alpes, Grenoble, France

³Programme Cartes d’Identité des Tumeurs (CIT), Ligue Nationale Contre le Cancer, Paris, France

⁴INSERM U1068 CRCM, Marseille, France

⁶Section of Genetics, International Agency for Research on Cancer (IARC-WHO), Lyon, France

⁷Center for Medical Genetics Ghent, Department of Biomolecular Medicine, Ghent University, Ghent, Belgium

⁸Innate Pharma, Marseille, France

⁹Equipe Cancer et Immunité, INSERM Centre de Recherche des Cordeliers, Paris, France

¹⁰Institute for Advanced Biosciences, CNRS UMR 5309, Inserm, U1209, Univ. Grenoble Alpes, F-38700 Grenoble, France

¹¹Verteego, Paris, France

¹²Health Data Science Unit, BioQuant Center and Medical Faculty Heidelberg, Germany

¹³Université de Montpellier, CNRS, IMAG UMR 5149, Montpellier, France

¹⁴Uppsala University, SE-751 05, Uppsala, Sweden

¹⁵University Grenoble Alpes, CEA, INSERM, IRIG, Biology of Cancer Infection UMR_S 1036, 38000 Grenoble, France & University Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France

¹⁶Department of Molecular Oncology, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital - Oslo, Norway

¹⁷Vitebsk State Medical University & NatiVita, Vitebsk, Belarus

¹⁸Centre de Recherche de St. Antoine, Paris, AP-HP

¹⁹Institut Curie, PSL Research University, Sorbonne Universités, UPMC Université Paris 06, CNRS, UMR144, Equipe Labellisée Ligue contre le Cancer, 75005 Paris, France

²⁰Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany

²¹Institut Curie, PSL Research University, Mines Paris Tech, Inserm, U900, F-75005, Paris, France

²²INSERM U1138 Centre de Recherche des Cordeliers, France

²³Quantitative Biology Unit, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg

²⁴Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany

²⁵Centre de Recherche en Cancérologie de Toulouse, Inserm UMR 1037, F-31037, Toulouse, France

Acknowledgements

We thank all members of the HADACA consortium for helpful discussion and contributions during the HADACA data challenge 2nd edition (November 2019, Aussois, France).

We also thank Daniel Jost and the members of the BCM team for inspiring discussions during regular joint group meetings. We are grateful to the Codalab data challenge open source platform. The authors gratefully acknowledge the EpiMed core facility for their support and assistance in this work. This work is part of the national program Cartes d'Identité des Tumeurs supported by the Ligue Nationale Contre le Cancer. Where authors are identified as personnel of the International Agency for Research on Cancer / World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer / World Health Organization.

Footnotes

‡ HADACA consortium authors: N. Alcala, A. Arnaud, F. Avila Cobos, L. Batista, A-F. Batto, Y. Blum, F. Chuffart, J. Cros, C. Decamps, L. Dirian, D. Doncevic, G. Durif, S.Y. Bahena Hernandez, M. Jakobi, R. Jardillier, M. Jeanmougin, P. Jedynak, B. Jumentier, A. Kakoichankava, M. Kondili, J. Liu, T. Maie, J. Marécaille, J. Merlevede, M. Meylan, P. Nazarov, K. Newar, K. Nyrén, F. Petitprez, C. Novella Rausell, M. Richard, M. Scherer, N. Sompairac, K. Waury, T. Xie & M-A. Zacharouli. A full list of Consortium members and their affiliations is available at the end of the text.
* The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
# The authors wish it to be known that, in their opinion, the last two authors should be regarded as joint Last Authors.
https://cancer-heterogeneity.github.io/deconbench.html

References

Avila Cobos, F. et al. (2018). Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics, 34(11), 1969–1979.

Becht, E. et al. (2016). Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biology, 17(1), 218.

Blum, Y. et al. (2019). Dissecting heterogeneity in malignant pleural mesothelioma through histo-molecular gradients for clinical applications. Nature Communications, 10(1), 1–12.

Decamps, C. et al. (2020). Guidelines for cell-type heterogeneity quantification based on a comparative analysis of reference-free DNA methylation deconvolution software. BMC Bioinformatics, 21(1), 16.

Ellrott, K. et al. (2019). Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges. Genome Biology, 20(1), 1–9.

Houseman, E. A. et al. (2014). Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics, 30(10), 1431–1439.

Lutsik, P. et al. (2017). MeDeCom: discovery and quantification of latent components of heterogeneous methylomes. Genome Biology, 18(1), 55.

Nazarov, P. V. et al. (2019). Deconvolution of transcriptomes and miRNomes by independent component analysis provides insights into biological processes and clinical outcomes of melanoma patients. BMC Medical Genomics, 12(1), 132.