Abstract
Hi-C is a popular technique for mapping three-dimensional chromosome conformation by capturing the frequency of physical contacts between pairs of genomic regions in cell populations. Although the resolution of Hi-C data is in principle limited only by the size of restriction fragments (300 bp to 4 kb), stochastic noise caused by limited sequencing coverage forces researchers to artificially reduce the resolution of Hi-C matrices by binning the genome into 5-100 kb regions, resulting in a loss of information and biological interpretability. Here, we present the Hi-C Interaction Frequency Inference (HIFI) algorithms, a family of computational approaches that take advantage of dependencies between neighboring restriction fragments to estimate restriction-fragment resolution interaction frequency matrices from Hi-C data. Cross-validation experiments on Hi-C data and comparisons to 5C data show that HIFI is superior to existing fixed-binning and state-of-the-art approaches. It also greatly improves the delineation of enhancer-promoter contacts. Finally, the high resolution afforded by HIFI reveals a new role for active regulatory regions in structuring topologically associating domains (TADs) and subTADs. Because it operates upstream of many Hi-C data analysis tools (e.g., normalization tools, as well as loop, TAD, and compartment predictors), HIFI can be easily inserted into a number of Hi-C data analysis pipelines, enabling a variety of high-resolution genomic organization analyses.
Availability: github.com/BlanchetteLab/HIFI