bioRxiv
New Results

Reliable analysis of clinical tumor-only whole exome sequencing data

Sehyun Oh, Ludwig Geistlinger, Marcel Ramos, Martin Morgan, Levi Waldron, Markus Riester
doi: https://doi.org/10.1101/552711
Sehyun Oh (1, 2), Ludwig Geistlinger (1, 2), Marcel Ramos (1, 2), Martin Morgan (3), Levi Waldron (1, 2), Markus Riester (4)
1 Graduate School of Public Health and Health Policy, City University of New York, 55 W 125th St, New York, NY 10027, USA.
2 Institute for Implementation Science and Population Health, City University of New York, 55 W 125th St, New York, NY 10027, USA.
3 Roswell Park Cancer Institute, 665 Elm St, Buffalo, NY 14203, USA.
4 Novartis Institutes for BioMedical Research, 250 Massachusetts Ave, Cambridge, MA 02139, USA.

Abstract

Background Allele-specific copy number alteration (CNA) analysis is essential for studying the functional impact of single nucleotide variants (SNVs) and the process of tumorigenesis. Most commonly used tools rely on high-quality genome-wide data with matched normal profiles, which limits their applicability in clinical settings.

Methods We propose a workflow for allele-specific CNA analysis from whole exome sequencing (WES) data without matched normals. It is based on the open-source PureCN R/Bioconductor package, used in conjunction with widely used variant-calling and copy number segmentation algorithms. We use The Cancer Genome Atlas (TCGA) ovarian carcinoma (OV) and lung adenocarcinoma (LUAD) datasets to benchmark its performance against gold-standard SNP6 microarray data and WES data with matched normal samples. Our workflow further classifies SNVs by somatic status and then uses this information to infer somatic mutational signatures and tumor mutational burden (TMB).
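To make the idea of classifying SNVs by somatic status more concrete, the sketch below shows the standard mixture model for the expected allelic fraction of a variant at a given purity and copy number, together with the usual definition of TMB as somatic mutations per megabase. It is a minimal illustration, not the PureCN implementation: the function names, parameters, and all input values are hypothetical and are not taken from the preprint.

```python
def expected_allelic_fraction(purity, tumor_copies, multiplicity, germline):
    """Expected variant allelic fraction in a tumor/normal cell mixture.

    purity: fraction of tumor cells in the sample (between 0 and 1)
    tumor_copies: total copy number of the locus in tumor cells
    multiplicity: number of tumor copies that carry the variant
    germline: True for a heterozygous germline variant (each normal cell
        carries one variant copy), False for a somatic variant (none)
    Normal cells are assumed to be diploid.
    """
    normal_variant_copies = 1.0 if germline else 0.0
    variant_copies = purity * multiplicity + (1.0 - purity) * normal_variant_copies
    total_copies = purity * tumor_copies + 2.0 * (1.0 - purity)
    return variant_copies / total_copies


def tumor_mutational_burden(somatic_flags, callable_megabases):
    """Somatic mutations per megabase of callable sequence."""
    if callable_megabases <= 0:
        raise ValueError("callable_megabases must be positive")
    somatic_count = sum(1 for flag in somatic_flags if flag)
    return somatic_count / callable_megabases


if __name__ == "__main__":
    # Hypothetical sample: 50% purity, locus with two tumor copies,
    # one of which carries the variant.
    print(expected_allelic_fraction(0.5, 2, 1, germline=False))
    print(expected_allelic_fraction(0.5, 2, 1, germline=True))
    # Hypothetical somatic calls over 30 Mb of callable sequence.
    print(tumor_mutational_burden([True, False, True, True], 30.0))
```

In this toy setting a somatic variant is expected at an allelic fraction of 0.25 and a heterozygous germline variant at 0.5. That separation is what makes somatic status inferable without a matched normal, and it shrinks as purity approaches 1.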

Results Application of our workflow to tumor-only WES data produces tumor purity and ploidy estimates that are highly concordant with estimates from SNP6 microarray data and matched-normal WES data. The presence of cancer type-specific somatic mutational signatures was inferred with high accuracy. We also demonstrate high concordance of TMB between our tumor-only workflow and matched-normal pipelines.
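The abstract does not say which statistic underlies "concordance". One common choice for comparing two sets of estimates of the same quantity is Lin's concordance correlation coefficient, which penalizes systematic offsets that Pearson correlation ignores. The sketch below assumes that metric purely for illustration; the function name and the purity values are hypothetical and are not results from the preprint.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired estimates.

    Uses population (1/n) variances and covariance. Returns 1.0 only when
    the paired values agree exactly, not merely when they are correlated.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("inputs are constant and identical; coefficient undefined")
    return 2.0 * cov_xy / denominator


if __name__ == "__main__":
    # Hypothetical purity estimates for four samples from two pipelines.
    tumor_only = [0.2, 0.4, 0.6, 0.8]
    matched_normal = [0.3, 0.5, 0.7, 0.9]  # same ranking, constant offset
    print(concordance_correlation(tumor_only, tumor_only))
    print(concordance_correlation(tumor_only, matched_normal))
```

The second comparison has a Pearson correlation of 1 but a concordance coefficient below 1, because the constant offset counts as disagreement.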

Conclusion The proposed workflow provides, to our knowledge, the only open-source option for comprehensive allele-specific CNA analysis and SNV classification of tumor-only WES with demonstrated high accuracy.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted March 10, 2019.
Subject Area

  • Bioinformatics