Abstract
With the rapid advancement of sequencing technologies over the past decade, next-generation sequencing (NGS) has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from errors or artifacts introduced during NGS processes or data analysis. There is therefore an urgent need to develop best practices for cancer mutation detection using NGS. There is also a need for standard reference data sets, both to benchmark sequencing platforms, library protocols, and bioinformatics pipelines systematically, and to measure accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium, we established paired tumor-normal reference samples: a human triple-negative breast cancer cell line and a matched normal cell line derived from B lymphocytes. We generated whole-genome sequencing (WGS) and whole-exome sequencing (WES) data using 16 NGS library preparation protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the paired reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform, cross-site WGS and WES datasets from well-characterized reference samples represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies.
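To illustrate what "measuring accuracy and reproducibility" of somatic mutation calls can mean in practice, the sketch below compares a call set against a reference (truth) call set and compares two replicate runs with each other. It is a minimal illustration under stated assumptions, not the consortium's pipeline: the names `Variant`, `benchmark`, and `jaccard` are hypothetical, and variants are matched exactly on (chromosome, position, reference allele, alternate allele). Real benchmarking tools also normalize variant representation and restrict comparisons to high-confidence regions, which this sketch omits.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal representation of a single somatic variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Accuracy of a call set against a reference call set (exact matching assumed)."""
    tp = len(calls & truth)  # calls confirmed by the reference set
    fp = len(calls - truth)  # calls absent from the reference set
    fn = len(truth - calls)  # reference variants that were missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Reproducibility between two runs: shared calls over all calls made by either run."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    truth = {Variant("chr1", 100, "C", "T"), Variant("chr2", 200, "G", "A"),
             Variant("chr3", 300, "A", "G")}
    run1 = {Variant("chr1", 100, "C", "T"), Variant("chr2", 200, "G", "A"),
            Variant("chr5", 500, "T", "C")}
    run2 = {Variant("chr1", 100, "C", "T"), Variant("chr3", 300, "A", "G")}
    print(benchmark(run1, truth))
    print(jaccard(run1, run2))
```

In this toy example, `run1` has one false positive and one false negative, so its precision and recall are both 2/3. The two runs share one call out of four distinct calls, giving a Jaccard index of 0.25. Precision and recall capture accuracy against the reference, while the Jaccard index captures agreement between replicates even when no reference call set is available.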
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Figure 2 revised. Author affiliations updated. Supplemental files updated.