INTEGRATE: gene fusion discovery using whole genome and transcriptome data

  Christopher A. Maher (affiliations 1, 2, 4, 5)

  1. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
  2. Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
  3. Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
  4. Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
  5. Department of Biomedical Engineering, Washington University School of Medicine, St. Louis, Missouri 63110, USA

  Corresponding author: cmaher{at}dom.wustl.edu

Abstract

While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we still face the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, many computational tools predict structural variations (SVs) and gene fusions using whole-genome sequencing (WGS) and transcriptome sequencing (RNA-seq) data separately. However, because WGS and RNA-seq each have limitations when used independently, we hypothesize that the orthogonal validation gained by integrating both data types could yield a sensitive and specific approach for detecting high-confidence gene fusions. Fortunately, decreasing NGS costs have resulted in a growing number of patients for whom both data types are available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cancer cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent targeted validation, leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only six of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions, including a subset involving the estrogen receptor. Taken together, these results show that INTEGRATE is a highly sensitive and accurate tool; it is freely available for academic use.

Footnotes

  • Received October 20, 2014.
  • Accepted November 9, 2015.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
