bioRxiv
New Results

Machine learning enables detection of early-stage colorectal cancer by whole-genome sequencing of plasma cell-free DNA

Nathan Wan, David Weinberg, Tzu-Yu Liu, Katherine Niehaus, Daniel Delubac, Ajay Kannan, Brandon White, Eric A. Ariazi, Mitch Bailey, Marvin Bertin, Nathan Boley, Derek Bowen, James Cregg, Adam M. Drake, Riley Ennis, Signe Fransen, Erik Gafni, Loren Hansen, Yaping Liu, Gabriel L Otte, Jennifer Pecson, Brandon Rice, Gabriel E. Sanderson, Aarushi Sharma, John St. John, Catherina Tang, Abraham Tzou, Leilani Young, Girish Putcha, Imran S. Haque
doi: https://doi.org/10.1101/478065

Abstract

Background Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer.

Methods Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N=546 colorectal cancer cases and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using ichorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validation to assess generalization performance.
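To illustrate what confounder-based cross-validation can look like in practice, the sketch below builds folds in which every sample sharing a confounder value (for example, the same institution or sequencing batch) is held out together. This is a minimal example under stated assumptions, not the authors' pipeline: the function name `confounder_folds`, the greedy balancing rule, and the toy institution labels are all hypothetical.

```python
from collections import defaultdict


def confounder_folds(groups, k):
    """Yield (train, test) index lists for k folds.

    `groups` holds one confounder label per sample (hypothetical input,
    e.g. an institution ID). No label ever appears in both the train and
    the test side of the same fold.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    if k < 2 or k > len(by_group):
        raise ValueError("k must be between 2 and the number of groups")

    folds = [[] for _ in range(k)]
    # Greedy balancing: put the largest remaining group into the
    # currently smallest fold (ties broken by label for determinism).
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for label in ordered:
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j in range(k) if j != i for idx in folds[j])
        yield train, test


# Toy usage: eight samples from four hypothetical institutions.
institutions = ["A", "A", "B", "B", "B", "C", "D", "D"]
for train, test in confounder_folds(institutions, k=2):
    held_out = {institutions[i] for i in test}
    seen = {institutions[i] for i in train}
    assert held_out.isdisjoint(seen)
```

Standard k-fold splitting would instead scatter each institution across folds, so a model could score well by recognizing institution-specific artifacts rather than disease signal; holding out whole groups is one way to expose that.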

Results In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91-0.93) with a mean sensitivity of 85% (95% CI 83-86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance.
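For readers unfamiliar with the "sensitivity at 85% specificity" metric, the following sketch shows one common way to compute it from classifier scores: pick the lowest score threshold at which the required share of controls is called negative, then measure the share of cancer cases scoring above it. The abstract does not state how the operating threshold was chosen, so the convention here, the function name `sensitivity_at_specificity`, and the toy scores are assumptions for illustration only.

```python
import math


def sensitivity_at_specificity(scores, labels, target_specificity):
    """Sensitivity at the lowest threshold meeting the target specificity.

    labels: 1 = cancer case, 0 = non-cancer control.
    A sample is called positive when its score is strictly above the
    threshold (an assumed convention).
    """
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one case and one control")

    # Number of controls that must sit at or below the threshold.
    needed = math.ceil(target_specificity * len(controls))
    threshold = controls[needed - 1] if needed > 0 else float("-inf")

    true_positives = sum(1 for s in cases if s > threshold)
    return true_positives / len(cases)


# Toy data: four controls followed by four cases (made-up scores).
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.5, 0.6, 0.05]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
sens = sensitivity_at_specificity(toy_scores, toy_labels, 0.75)
```

In the toy data, three of four controls must be called negative, which sets the threshold at 0.3; three of the four cases score above it.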

Conclusions A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.

  • Abbreviations

    AUC: area under the curve
    cfDNA: cell-free DNA
    CHIP: clonal hematopoiesis of indeterminate potential
    CNV: copy number variant
    CRC: colorectal cancer
    ctDNA: circulating tumor DNA
    CV: cross-validation
    DNA: deoxyribonucleic acid
    IU: intended use
    ML: machine learning
    RNA: ribonucleic acid
    SVM: support vector machine
    TF: tumor fraction
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
    Posted November 24, 2018.
    Subject Area

    • Cancer Biology