RT Journal Article
SR Electronic
T1 Investigating the performance of deep learning methods for Hi-C resolution improvement
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.01.27.477975
DO 10.1101/2022.01.27.477975
A1 Ghulam Murtaza
A1 Atishay Jain
A1 Madeline Hughes
A1 Thulasi Varatharajan
A1 Ritambhara Singh
YR 2022
UL http://biorxiv.org/content/early/2022/08/05/2022.01.27.477975.abstract
AB Motivation: Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of coarse quality, which limits the quality of downstream analyses. Recently, multiple deep learning-based methods have been proposed to improve the quality of these datasets by increasing their resolution through upscaling. However, existing works do not thoroughly evaluate these methods using Hi-C reproducibility metrics across different Hi-C experiments to establish their generalizability to real-world scenarios. This study extensively compares deep learning-based Hi-C upscaling methods on real-world Hi-C datasets with lower read counts. We evaluate these models using Hi-C reproducibility metrics and downstream evaluation on data from three cell lines – GM12878 (lymphoblastoid), K562 (human erythroleukemic), and IMR90 (lung fibroblast) – obtained from different Hi-C experiments. Results: We show that the existing deep learning-based Hi-C upscaling techniques are inadequate for effectively upscaling Hi-C data with low read counts. We observe that retraining these models with real-world datasets substantially improves their performance on both Hi-C similarity metrics and downstream tasks. However, we find that Hi-C upscaling methods struggle to capture the underlying biological signals on sparse datasets even when retrained with real-world datasets.
Therefore, our study highlights the need for rigorous evaluation and identifies specific improvement areas concerning current deep learning-based Hi-C upscaling methods. We facilitate our thorough analysis by developing a computational pipeline that provides a unified platform for pre-processing, upscaling, and evaluating the generated Hi-C data quality for users interested in benchmarking or using these methods. Availability: https://github.com/rsinghlab/Investigation-of-HiC-Resolution-Improvement-Methods Author Summary: This paper comprehensively evaluates deep learning-based Hi-C upscaling methods using seven datasets across four similarity metrics and two downstream evaluation methods. To facilitate this study and upcoming methods, we developed a pipeline that provides a unified interface that abstracts underlying Hi-C data representations to accelerate the development and evaluation processes. Our work finds that the existing methods trained with synthetically generated datasets tend to perform significantly worse on the datasets we have curated from real-world experiments. Moreover, we find that retraining the existing methods improves their performance on real-world datasets. Finally, even though we see significant improvements in specific metrics, we also observe comparable performance in downstream methods, implying that using a single class of metrics is inadequate for estimating the biological usefulness of enhanced Hi-C datasets. Competing Interest Statement: The authors have declared no competing interest.