Abstract
Precision early diagnosis of tumors based on DNA methylation data is emerging as a state-of-the-art technology: it can capture signals of cancer occurrence 3 to 5 years in advance and can identify clinically more homogeneous groups. At present, the sensitivity of early detection for many tumors is only about 30%, which needs to be improved significantly. Nevertheless, genome-wide DNA methylation information makes it possible to characterize the entire molecular genetic landscape of tumors comprehensively, including subtle differences among tumor types. As DNA methylation data accumulate, we need high-performance methods that can model more information in a less biased way. Motivated by this, we designed a self-attention graph convolutional network (SAGCN) that learns key methylation sites automatically, in a data-driven way, for precision multi-tumor early diagnosis. Using the selected methylation sites, we then trained a multi-class support vector machine (SVM). Extensive experiments were conducted to evaluate the performance of this computational pipeline. The experimental results demonstrated the effectiveness of the selected key methylation sites, which are highly relevant for blood-based diagnosis.
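The two-stage pipeline described above (graph-based selection of key methylation sites, then a multi-class SVM on the selected sites) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give SAGCN's layer equations, so the sketch assumes a SAGPool-style attention score (Lee et al., "Self-Attention Graph Pooling", 2019), z = tanh(D^-1/2 (A + I) D^-1/2 X theta), followed by top-k node selection. The site-site graph, the fixed weights theta (learned end-to-end in a real model), and every function name (attention_scores, top_k_sites, train_linear_svm, train_ovr, predict_ovr) are hypothetical. The classifier is a one-vs-rest linear SVM trained with the Pegasos subgradient method, swapped in for whatever solver the authors used.

```python
# Hypothetical sketch: the SAGPool-style score, the site graph, and all names
# are assumptions for illustration, not the authors' SAGCN + SVM code.
import math
import random


def attention_scores(X, A, theta):
    """Assumed self-attention score per node: tanh(D^-1/2 (A + I) D^-1/2 X theta).

    X     : one row per methylation site (node), e.g. its values across samples
    A     : symmetric 0/1 adjacency of a hypothetical site-site graph
    theta : attention weights; learned in a real model, fixed here for illustration
    """
    n = len(X)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    xw = [sum(x * t for x, t in zip(row, theta)) for row in X]  # X @ theta
    scores = []
    for i in range(n):
        # Symmetric-normalized aggregation of neighbors' projected features.
        agg = sum(A_hat[i][j] * xw[j] / math.sqrt(deg[i] * deg[j]) for j in range(n))
        scores.append(math.tanh(agg))
    return scores


def top_k_sites(scores, k):
    """Indices of the k highest-scoring nodes, i.e. the retained methylation sites."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def train_linear_svm(X, y, lam=0.01, epochs=300, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos subgradient steps.

    A constant 1.0 feature is appended so the bias is learned (and, for
    simplicity, regularized) together with the weights.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # shuffled pass over data
            t += 1
            eta = 1.0 / (lam * t)
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularizer
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def train_ovr(X, labels, **kw):
    """One-vs-rest multi-class SVM: one binary SVM per tumor class."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kw)
            for c in sorted(set(labels))}


def predict_ovr(models, x):
    """Predict the class whose binary SVM gives the largest decision value."""
    xi = list(x) + [1.0]
    return max(models, key=lambda c: sum(a * b for a, b in zip(models[c], xi)))


if __name__ == "__main__":
    # Toy graph: 4 sites forming two linked pairs; keep the 2 top-scoring sites.
    s = attention_scores([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
                         [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],
                         theta=[1.0, -1.0])
    print(sorted(top_k_sites(s, 2)))  # the first pair of sites: 0 and 1
```

One-vs-rest keeps each tumor class as an independent binary problem, which makes the sketch short; in practice a library solver such as scikit-learn's `LinearSVC` (one-vs-rest by default) would replace the hand-written training loop, and the attention weights would be trained jointly with the graph network rather than fixed.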
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
1. For cfDNA methylation data, because sequencing is used, the data may not contain signals from the loci that the authors are interested in. In other words, missing values might be a problem here. How was this issue handled in Section 3.6?
Response to Reviewer: Thank you very much for the thoughtful question. In this study, we first extracted 350 DEMs with SAGCN and then performed multi-class classification with SVM. The extracted DEMs and their annotations are listed in Supplementary Table 2. As Supplementary Table 2 shows, 85.2% of all annotated methylation sites are located in protein-coding regions, indicating that they can be readily detected in cell-free DNA methylation data. In this revision, we have carefully checked the wording of the manuscript and added a more detailed description to aid understanding. Please see page 10, lines 22-26.
2. The authors should consider better approaches to presentation, using more figures and concise tables.
Response to Reviewer: Thank you very much for the thoughtful suggestion. To the best of our knowledge, this is the first time a graph neural network has been used for precision tumor diagnosis based on DNA methylation data. A large number of experiments were performed to verify the effectiveness of the computational framework. Classic machine learning models, namely random forest (RF), extremely randomized trees (ERT), decision tree (DT), Gaussian naive Bayes (GNB), and SVM, as well as state-of-the-art methods such as CGATCPred and DeepCDR, were used for comparison to demonstrate the superiority of SAGCN + SVM. In this revision, we have followed your suggestion and carefully revised our manuscript. We also carefully checked the figures and tables to make sure that they are displayed correctly. In addition, we have made every effort to improve the quality of our research.