bioRxiv
New Results

Sensitive and reproducible cell-free methylome quantification with synthetic spike-in controls

Samantha L. Wilson, Shu Yi Shen, Lauren Harmon, Justin M. Burgener, Tim Triche Jr., Scott V. Bratman, Daniel D. De Carvalho, Michael M. Hoffman
doi: https://doi.org/10.1101/2021.02.12.430289
Samantha L. Wilson
1Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
Shu Yi Shen
1Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
Lauren Harmon
2Van Andel Institute, Grand Rapids, MI, United States of America
Justin M. Burgener
1Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
3Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
Tim Triche Jr.
2Van Andel Institute, Grand Rapids, MI, United States of America
Scott V. Bratman
1Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
3Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
Daniel D. De Carvalho
1Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
3Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
Michael M. Hoffman
1Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
3Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
4Department of Computer Science, University of Toronto, Toronto, ON, Canada
5Vector Institute for Artificial Intelligence, Toronto, ON, Canada
  • For correspondence: michael.hoffman@utoronto.ca

Abstract

Background Cell-free methylated DNA immunoprecipitation-sequencing (cfMeDIP-seq) identifies genomic regions with DNA methylation, using a protocol adapted to work with low-input DNA samples and with cell-free DNA (cfDNA). This method allows for DNA methylation profiling of circulating tumour DNA in cancer patients’ blood samples. Such epigenetic profiling of circulating tumour DNA indicates the tissue in which the tumour DNA originates, a key requirement of any test for early cancer detection. In addition, DNA methylation signatures provide prognostic information and can detect relapse. For robust quantitative comparisons between samples, immunoprecipitation enrichment methods like cfMeDIP-seq require normalization against common reference controls.

Methods To provide a simple and inexpensive reference for quantitative normalization, we developed a set of synthetic spike-in DNA controls for cfMeDIP-seq. These controls account for technical variation in enrichment efficiency due to biophysical properties of DNA fragments. Specifically, we designed 54 DNA fragments with combinations of methylation status (methylated and unmethylated), fragment length (80 bp, 160 bp, 320 bp), G+C content (35%, 50%, 65%), and fraction of CpG dinucleotides within the fragment (1/80 bp, 1/40 bp, 1/20 bp). We ensured that the spike-in synthetic DNA sequences do not align to the human genome. We integrated unique molecular indices (UMIs) into cfMeDIP-seq to control for differential amplification after enrichment. To assess enrichment bias according to distinct biophysical properties, we conducted cfMeDIP-seq solely on spike-in DNA fragments. To optimize the amount of spike-in DNA required, we added varying quantities of spike-in control DNA to sheared HCT116 colon cancer genomic DNA prior to cfMeDIP-seq. To assess batch effects, three separate labs conducted cfMeDIP-seq on peripheral blood plasma samples from acute myeloid leukemia (AML) patients.
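The combinations listed in the Methods can be enumerated as a simple grid. The sketch below is illustrative only: the field names and the derived CpG count per fragment are assumptions, not taken from the manuscript or the spiky package. The plain grid of the listed levels contains 2 × 3 × 3 × 3 = 54 entries, but the footnote on alternate fragment sets suggests the actual fragment set may not coincide exactly with this grid.

```python
from itertools import product

# Levels as listed in the Methods paragraph.
METHYLATION = ("methylated", "unmethylated")
LENGTH_BP = (80, 160, 320)
GC_PERCENT = (35, 50, 65)
CPG_SPACING_BP = (20, 40, 80)  # one CpG per this many bp (1/20, 1/40, 1/80)


def design_grid():
    """Return every combination of the four listed properties as a dict.

    The keys and the derived n_cpg value are hypothetical names chosen
    for this sketch; they are not the authors' schema.
    """
    return [
        {
            "methylation": status,
            "length_bp": length,
            "gc_percent": gc,
            "cpg_per_bp": 1 / spacing,
            # CpG count implied by the spacing, assuming even placement.
            "n_cpg": length // spacing,
        }
        for status, length, gc, spacing in product(
            METHYLATION, LENGTH_BP, GC_PERCENT, CPG_SPACING_BP
        )
    ]


grid = design_grid()
print(len(grid))  # 54
```

Enumerating the grid this way makes it easy to check which property combinations are covered before comparing against the sequences in Supplementary Table 1.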

Results We show that cfMeDIP-seq enriches for highly methylated regions, capturing ≥99.99% of methylated spike-in control fragments with ≤0.01% non-specific binding and a preference for both high G+C content fragments and fragments with more CpGs. A total of 0.01 ng of spike-in control DNA provided sufficient sequencing reads to adjust for variance due to fragment length, G+C content, and CpG fraction. Using the known amount of each spiked-in fragment, we created a generalized linear model that absolutely quantifies molar amount from read counts across the genome, while adjusting for fragment length, G+C content, and CpG fraction. Employing our spike-in controls greatly mitigates batch effects, reducing batch-associated variance to ≤1% of the total variance within the data.
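As a rough illustration of the modelling idea, the sketch below fits a Gaussian model with an identity link, which reduces to ordinary least squares, to predict molar amount from read count, fragment length, G+C content, and CpG fraction. Everything here is an assumption made for illustration: the identity link, the additive linear predictor, the function names, and the toy data and coefficients, which are invented and bear no relation to the estimates in Supplementary Table 2. It is not the spiky implementation.

```python
import numpy as np


def add_intercept(features):
    """Prepend a column of ones to an (n_samples, n_features) array."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(features.shape[0]), features])


def fit_gaussian_identity(features, response):
    """Least-squares fit.

    For a Gaussian family with identity link, the least-squares solution
    is also the maximum-likelihood estimate of the GLM coefficients.
    """
    response = np.asarray(response, dtype=float)
    coef, *_ = np.linalg.lstsq(add_intercept(features), response, rcond=None)
    return coef


def predict_pmol(coef, features):
    """Apply fitted coefficients to new fragments or genomic windows."""
    return add_intercept(features) @ coef


# Toy, noise-free "spike-in" data: all values are made up for this sketch.
rng = np.random.default_rng(0)
n = 60
features = np.column_stack([
    rng.integers(10, 1000, size=n),              # read count
    rng.choice([80, 160, 320], size=n),          # fragment length (bp)
    rng.choice([0.35, 0.50, 0.65], size=n),      # G+C fraction
    rng.choice([1 / 80, 1 / 40, 1 / 20], size=n),  # CpG fraction
])
TOY_COEF = np.array([0.5, 0.002, 0.001, 1.0, 4.0])  # invented coefficients
known_pmol = add_intercept(features) @ TOY_COEF     # "known" spiked amount

coef = fit_gaussian_identity(features, known_pmol)
estimated = predict_pmol(coef, features[:3])
```

In this setup the spike-ins play the role of a calibration set: because their molar amounts are known, the fitted coefficients can then be applied to read counts from genomic windows with the same covariates.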

Discussion Incorporation of spike-in controls enables absolute quantification of methylated cfDNA generated from methylated DNA immunoprecipitation-sequencing (MeDIP-seq) experiments. It mitigates batch effects and corrects for biases in enrichment due to known biophysical properties of DNA fragments and other technical biases. We created an R package, spiky, to convert read counts to picomoles of DNA fragments, while adjusting for fragment properties that affect enrichment. The spiky package is available on GitHub (https://github.com/trichelab/spiky) and will soon be available on Bioconductor.

Contact michael.hoffman@utoronto.ca

Competing Interest Statement

S.L.W., S.Y.S., T.T., D.D.De C., and M.M.H. are inventors on a patent application related to the synthetic spike-in controls. S.Y.S., S.V.B., and D.D.De C. are inventors on other patent applications related to this work. S.V.B. and D.D.De C. are co-founders of and provide consulting for DNAMx, Inc. S.V.B. and D.D.De C. have received research funding from Nektar Therapeutics.

Footnotes

  • In the first submission, we erroneously reported some 320 bp spike-in control fragment sequences as having 65% G+C content. These fragment sequences actually represented alternate fragment sets for 35% G+C content or 50% G+C content. We have corrected the G+C content in Supplementary Table 1 for these three fragments: (1) 1/80 bp CpG fraction corrected to 35% G+C content, alternate 2; (2) 1/40 bp CpG fraction corrected to 50% G+C content, alternate 2; (3) 1/20 bp CpG fraction corrected to 50% G+C content, alternate 2. We have also re-estimated the Gaussian generalized linear models to include these corrections (Supplementary Table 2). We have updated Figures 2-5 to reflect the updated model. We added an analysis to examine the differences within and between alternate fragment sets with the same fragment properties (new Table 1). We followed up on different outlying genomic windows (Table 2, previously Table 1). We removed the HOMER analysis, as we felt it did not add any value to the manuscript.

  • https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE166259

  • https://github.com/trichelab/spiky

  • https://github.com/hoffmangroup/2020spikein

  • https://doi.org/10.5281/zenodo.4533340

  • https://ega-archive.org/studies/EGAS00001005069/

  • https://doi.org/10.5281/zenodo.4568265

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted April 16, 2021.

Subject Area

  • Genomics