Abstract
Motivation The three-dimensional (3D) organization of an organism’s genome and chromosomes plays a significant role in many biological processes. Existing methods model chromosomal 3D structure from contact matrices generated by chromosome conformation capture (3C) techniques such as Hi-C. However, the effectiveness of these methods is limited by the quality of the Hi-C data, which may be corrupted by experimental noise. It is therefore valuable to develop methods that reduce the impact of noise on the quality of reconstructed structures.
Results We develop unsupervised and semi-supervised deep learning algorithms (i.e. deep convolutional autoencoders) to denoise Hi-C contact matrix data and improve the quality of chromosome structure predictions. When applied to noisy synthetic contact matrices of the yeast genome, our network consistently improves contact matrix similarity as measured by Pearson correlation, Spearman correlation and signal-to-noise ratio. These improvements hold across a wide range of parameters for both Gaussian and Poisson noise functions.
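As an illustration of the evaluation described above, the following minimal sketch corrupts a toy contact matrix with Gaussian and Poisson noise and scores the result with the three named metrics. It is not the authors' pipeline: the toy matrix, the noise parameters, the helper names and the decibel form of the signal-to-noise ratio are all assumptions made for this example, and no autoencoder is involved.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def snr_db(clean, noisy):
    """Assumed SNR definition: 10*log10(signal power / noise power)."""
    signal = sum(c * c for c in clean)
    noise = sum((c - n) ** 2 for c, n in zip(clean, noisy))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


def poisson_sample(rng, lam):
    """Draw one Poisson variate (Knuth's method, fine for modest lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def toy_contacts(n_bins):
    """Upper triangle of a toy matrix whose counts decay with bin distance."""
    return [100.0 / (1 + j - i) for i in range(n_bins) for j in range(i, n_bins)]


def demo(seed=0, sigma=5.0):
    rng = random.Random(seed)
    clean = toy_contacts(20)
    # Gaussian noise: additive, clipped at zero so counts stay non-negative.
    gaussian = [max(0.0, c + rng.gauss(0.0, sigma)) for c in clean]
    # Poisson noise: each observed count is drawn with the clean value as mean.
    poisson = [float(poisson_sample(rng, c)) for c in clean]
    for name, noisy in (("gaussian", gaussian), ("poisson", poisson)):
        print(name, pearson(clean, noisy), spearman(clean, noisy),
              snr_db(clean, noisy))


if __name__ == "__main__":
    demo()
```

Under these assumptions, a denoiser would be judged by whether its output scores higher than the noisy input on all three metrics when both are compared against the clean matrix.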
Contact mrh8x5{at}mail.missouri.edu and chengji{at}missouri.edu