bioRxiv
New Results

A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer

Eliseos J. Mucaki, Natasha G. Caminsky, Ami M. Perri, Ruipeng Lu, Alain Laederach, Matthew Halvorsen, Joan HM. Knoll, Peter K. Rogan
doi: https://doi.org/10.1101/031419
Eliseos J. Mucaki
1Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Canada, N6A 2C1
Natasha G. Caminsky
1Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Canada, N6A 2C1
Ami M. Perri
1Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Canada, N6A 2C1
Ruipeng Lu
2Department of Computer Science, Faculty of Science, Western University, London, Canada, N6A 2C1
Alain Laederach
3Department of Biology, University of North Carolina, Chapel Hill, NC, 27599-3290
Matthew Halvorsen
4Institute for Genomic Medicine, Columbia University Medical Center, New York, NY, 10032
Joan HM. Knoll
5Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Canada, N6A 2C1
6Cytognomix Inc., London, Canada
Peter K. Rogan
1Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Canada, N6A 2C1
2Department of Computer Science, Faculty of Science, Western University, London, Canada, N6A 2C1
6Cytognomix Inc., London, Canada
7Department of Oncology, Schulich School of Medicine and Dentistry, Western University, London, Canada, N6A 2C1
  • For correspondence: progan@uwo.ca

ABSTRACT

Background Sequencing of both healthy and disease singletons yields many novel and low-frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next-generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein-coding variants, non-coding sequence variants have also been shown to contribute significantly to high-penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT).

Methods We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk patients without identified mutations in BRCA1/2. Aside from protein coding changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure.
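The IT-based scoring referred to above can be illustrated with a minimal sketch. It assumes a simplified individual-information model in which each position contributes 2 + log2(frequency) bits, uses a pseudocount in place of any small-sample correction, and is trained on a handful of hypothetical donor-like sequences. The function names, the pseudocount value, and the toy sites are assumptions made for illustration; they are not the study's models, software, or data.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=0.5):
    """Return one {base: weight in bits} dict per aligned position.

    Weight = 2 + log2(frequency); the pseudocount is an assumption
    standing in for a proper small-sample correction.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        matrix.append({
            b: 2.0 + math.log2((counts[b] + pseudocount) / total)
            for b in BASES
        })
    return matrix

def individual_information(matrix, seq):
    """Ri: sum of the per-position weights for one sequence (bits)."""
    return sum(col[base] for col, base in zip(matrix, seq))

def mean_information(matrix, sites):
    """Rsequence, taken here as the mean Ri over the training sites."""
    return sum(individual_information(matrix, s) for s in sites) / len(sites)

def delta_ri(matrix, ref, alt):
    """Change in Ri caused by a variant (alt minus ref), in bits."""
    return individual_information(matrix, alt) - individual_information(matrix, ref)

# Hypothetical aligned donor-like sites (exon CAG | GTAAGT intron).
sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
matrix = build_weight_matrix(sites)

ref = "CAGGTAAGT"
alt = "CAGATAAGT"  # G>A at the conserved first intronic position
d = delta_ri(matrix, ref, alt)
print(f"Ri(ref) = {individual_information(matrix, ref):.2f} bits")
print(f"Ri(alt) = {individual_information(matrix, alt):.2f} bits")
print(f"delta Ri = {d:.2f} bits")
```

In this toy model a negative ΔRi marks a variant predicted to weaken the binding site and a positive one a variant predicted to strengthen it; any cutoff used to call a change significant would have to come from the full text rather than from this sketch.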

Results 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels).
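To put the reported counts in proportion, the short calculation below expresses each stage as a share of the total. It is only arithmetic on the numbers quoted above and rests on two assumptions: that every variant not in a coding region is counted as non-coding, and that the 87 prioritized VUS are a subset of the 132 variants flagged by the IT framework.

```python
# Counts quoted in the Results paragraph.
total_variants = 15_311
coding = 245
flagged_by_it = 132
prioritized = 87

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Assumption: everything outside coding regions is non-coding.
non_coding = total_variants - coding

rows = [
    ("coding, share of all variants", percent(coding, total_variants)),
    ("non-coding, share of all variants", percent(non_coding, total_variants)),
    ("IT-flagged, share of all variants", percent(flagged_by_it, total_variants)),
    ("prioritized, share of all variants", percent(prioritized, total_variants)),
    # Assumption: the 87 prioritized VUS are drawn from the 132 flagged.
    ("prioritized, share of IT-flagged", percent(prioritized, flagged_by_it)),
]
for label, value in rows:
    print(f"{label}: {value:.2f}%")
```

Under these assumptions, fewer than 1 in 100 detected variants are flagged by the framework, and roughly two thirds of those are retained after further prioritization.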

Conclusions We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills the large number of variants detected by NGS to a limited set prioritized as potentially deleterious changes.

Footnotes

  • * EJM and NGC should be considered joint first authors.

  • LIST OF ABBREVIATIONS

    ASSEDA: Automated Splice Site and Exon Definition Analysis
    BIC: Breast Cancer Information Core Database
    CASAVA: Consensus Assessment of Sequencing and Variation
    CIS-BP-RNA: Catalog of Inferred Sequence Binding Preferences of RNA binding proteins
    CRAC: Complex Reads Analysis and Classification
    DM2: Domain Mapping of Disease Mutations
    ENIGMA: Evidence-based Network for the Interpretation of Germline Mutant Alleles
    ExPASy: Expert Protein Analysis System
    GATK: Genome Analysis Toolkit
    HBOC: Hereditary Breast and Ovarian Cancer
    HGMD: Human Gene Mutation Database
    IARC: International Agency for Research on Cancer
    IGV: Integrative Genomics Viewer
    Indel: Insertion/deletion
    IT: Information theory
    LOVD: Leiden Open Variant Database
    MGL: Molecular Genetics Laboratory
    MLPA: Multiplex Ligation Probe Amplification
    NGS: Next-Generation Sequencing
    PTB: Polypyrimidine tract binding protein
    PTT: Protein Truncation Test
    PWM: Position Weight Matrix
    RBBS: RNA-Binding protein Binding Site
    RBP: RNA-Binding Protein
    RBPDB: RNA-Binding Protein DataBase
    Ri: Individual information
    Rsequence: Mean information content
    SHAPE: Selective 2’-Hydroxyl Acylation analyzed by Primer Extension
    SNV: Single Nucleotide Variant
    SRF: Splicing Regulatory Factor
    SRFBS: Splicing Regulatory Factor Binding Site
    SS: Splice Site
    TF: Transcription Factor
    TFBS: Transcription Factor Binding Site
    UTR: Untranslated Region
    VCF: Variant Call File
    VUS: Variants of Uncertain Significance
    ΔRi: Change in individual information
    Patient sample IDs are assigned in the following manner: number-number+letter (e.g. 1–1A). If a sample was repeated, the IDs are separated by a “.” (e.g. 1–1A.2–1A).
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
    Posted November 11, 2015.