Abstract
Inherited factors are thought to be responsible for a substantial fraction of many different forms of cancer. However, individual cancer risk cannot currently be well quantified by analyzing germline DNA. Most analyses of germline DNA focus on the additive effects of single nucleotide polymorphisms (SNPs). Here we show that chromosomal-scale length variation of germline DNA can be used to predict whether a person will develop cancer. In two independent datasets, The Cancer Genome Atlas (TCGA) project and the UK Biobank, we could classify whether or not a patient had a certain cancer based solely on chromosomal-scale length variation. In the TCGA data, we found that all 32 different types of cancer could be predicted better than chance using chromosomal-scale length variation data. We found a model that could predict ovarian cancer in women with an area under the receiver operating characteristic curve (AUC) of 0.89. In the UK Biobank data, we could predict breast cancer in women with an AUC of 0.83. This method could be used to develop genetic risk scores for other conditions known to have a substantial genetic component, and it complements genetic risk scores derived from SNPs.
Footnotes
We added an analysis of a second dataset, the UK Biobank. This analysis rests on the same idea (chromosomal-scale length variation) but is implemented differently, because the UK Biobank presents its data differently.