Abstract
The three-dimensional (3D) structure of the genome plays an important role in regulating gene expression. Non-coding variants have been shown to alter 3D genome structures and thereby activate oncogenes in cancer. However, there is currently no method to predict the effect of DNA variants on 3D structures. We propose a deep learning method, DeepMILO, to learn DNA sequence features of CTCF/cohesin-mediated loops and to predict the effect of variants on these loops. DeepMILO consists of a convolutional and a recurrent neural network, and it can learn features beyond the presence of CTCF motifs and their orientations. Applying DeepMILO to a cohort of 241 malignant lymphoma patients with whole-genome sequences revealed CTCF/cohesin-mediated loops that are disrupted in multiple patients. These disrupted loops contain both known cancer driver genes and novel genes. Our results show that mutations at loop boundaries are associated with upregulation of the cancer driver gene BCL2 and suggest a possible new mechanism for its dysregulation via alteration of 3D loop structures.