GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly

  Anthony T. Papenfuss (affiliations 1, 2, 8, 9, 10)
  1. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, 3052, Australia;
  2. Department of Medical Biology, University of Melbourne, Parkville, Victoria, 3010, Australia;
  3. Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, 3010, Australia;
  4. Translational Genomics and Epigenomics Laboratory, Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria, 3084, Australia;
  5. Department of Pathology, University of Melbourne, Parkville, Victoria, 3010, Australia;
  6. School of Cancer Medicine, La Trobe University, Bundoora, Victoria, 3084, Australia;
  7. Department of Medicine, University of Melbourne, Austin Health, Heidelberg, Victoria, 3084, Australia;
  8. Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria, 3010, Australia;
  9. Peter MacCallum Cancer Centre, Victorian Comprehensive Cancer Centre, Melbourne, 3000, Australia;
  10. Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, 3010, Australia
  • Corresponding author: papenfuss{at}wehi.edu.au
  • Abstract

    The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring model, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
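
    As an illustration of the "positional de Bruijn graph" idea named in the abstract, the sketch below builds a toy graph whose nodes are (k-mer, position) pairs rather than bare k-mers. It is not the GRIDSS implementation: the function names, the single fixed anchor position per read, and the example reads are all assumptions made for illustration, and the abstract does not say how positions are actually assigned.

```python
from collections import defaultdict


def positional_kmers(read, start, k):
    """Yield (kmer, position) pairs for a read anchored at `start` (hypothetical helper)."""
    for offset in range(len(read) - k + 1):
        yield read[offset:offset + k], start + offset


def build_positional_graph(anchored_reads, k):
    """Build node weights and edges of a toy positional de Bruijn graph.

    Assumption: each read comes with exactly one anchor position.
    """
    weights = defaultdict(int)   # (kmer, position) -> number of supporting reads
    edges = defaultdict(set)     # (kmer, position) -> successor nodes
    for read, start in anchored_reads:
        nodes = list(positional_kmers(read, start, k))
        for node in nodes:
            weights[node] += 1
        # Consecutive k-mers of a read are linked, and positions advance by one.
        for src, dst in zip(nodes, nodes[1:]):
            edges[src].add(dst)
    return weights, edges


# Two overlapping toy reads; "ACGT" occurs at two different positions.
reads = [("ACGTACGT", 100), ("CGTACGTT", 101)]
weights, edges = build_positional_graph(reads, k=4)
```

    In an ordinary de Bruijn graph the two occurrences of "ACGT" would collapse into one node and create a cycle. Here they remain separate nodes at positions 100 and 104, so the toy graph stays a simple path.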

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.222109.117.

    • Freely available online through the Genome Research Open Access option.

    • Received February 24, 2017.
    • Accepted September 14, 2017.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
