Efficient RNA isoform identification and quantification from RNA-Seq data with network flows

Elsa Bernard; Laurent Jacob; Julien Mairal; Jean-Philippe Vert

doi:10.1093/bioinformatics/btu317

Bioinformatics. 2014 Sep 1;30(17):2447-55. doi: 10.1093/bioinformatics/btu317. Epub 2014 May 9.

Authors

Elsa Bernard¹, Laurent Jacob², Julien Mairal², Jean-Philippe Vert¹

Affiliations

¹ Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France.
² Mines ParisTech, Centre for Computational Biology, 77300 Fontainebleau, Institut Curie, 26 rue d'Ulm, 75248 Paris Cedex 05, INSERM U900, Paris F-75248, France, Laboratoire Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, INRA, UMR5558, Villeurbanne, France and LEAR Project-Team, INRIA Grenoble Rhône Alpes, 38330 Montbonnot, France.

Abstract

Motivation: Several state-of-the-art methods for isoform identification and quantification are based on ℓ1-regularized regression, such as the Lasso. However, explicitly listing the (possibly exponentially large) set of candidate transcripts is intractable for genes with many exons. For this reason, existing approaches using the ℓ1-penalty are either restricted to genes with few exons or only run the regression algorithm on a small set of preselected isoforms.
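
To give a sense of the scale involved, the following minimal Python sketch counts candidate isoforms under a simplifying worst-case assumption that is not taken from the paper: every non-empty ordered subset of a gene's exons is treated as a possible transcript. The function names `candidate_isoforms` and `count_candidates` are hypothetical and serve only as illustration; real splicing graphs usually allow far fewer combinations.

```python
from itertools import combinations


def candidate_isoforms(n_exons):
    """Yield every non-empty ordered subset of exons 1..n_exons.

    Assumption for illustration only: any exon may be skipped, so each
    subset (kept in genomic order) is a candidate transcript.
    """
    exons = range(1, n_exons + 1)
    for k in range(1, n_exons + 1):
        yield from combinations(exons, k)


def count_candidates(n_exons):
    """Closed-form count of non-empty subsets of n_exons exons."""
    return 2 ** n_exons - 1


# Explicit enumeration is feasible only for small genes.
small = list(candidate_isoforms(3))
print(small)

# The count doubles (plus one) with every additional exon.
for n in (5, 10, 20, 40):
    print(n, count_candidates(n))
```

Under this assumption a 20-exon gene already has over a million candidates, which is why a regression with one explicit variable per candidate becomes impractical.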

Results: We introduce a new technique called FlipFlop, which can efficiently tackle the sparse estimation problem on the full set of candidate isoforms by using network flow optimization. Our technique removes the need for a preselection step, leading to better isoform identification while keeping a low computational cost. Experiments with synthetic and real RNA-Seq data confirm that our approach is more accurate than alternative methods and one of the fastest available.
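
The link between network flows and isoforms can be illustrated with a generic flow decomposition. This is a sketch under stated assumptions and not the FlipFlop implementation: it assumes a directed acyclic graph whose nodes are exons plus an artificial source `s` and sink `t`, and a conserved flow on its edges. Each source-to-sink path then plays the role of one isoform, and the amount of flow it carries plays the role of its abundance. The function name `decompose_flow` and the toy graph are hypothetical.

```python
from collections import defaultdict


def decompose_flow(edge_flow, source, sink, tol=1e-12):
    """Greedily split a conserved source-to-sink flow on a DAG into paths.

    edge_flow maps (u, v) edges to non-negative flow values. Returns a
    list of (path, weight) pairs whose weights sum to the total flow.
    """
    residual = dict(edge_flow)
    out_edges = defaultdict(list)
    for u, v in residual:
        out_edges[u].append(v)

    paths = []
    while True:
        # Walk from the source along edges that still carry flow.
        path, node = [source], source
        while node != sink:
            nxt = next(
                (v for v in out_edges[node] if residual[(node, v)] > tol),
                None,
            )
            if nxt is None:
                break
            path.append(nxt)
            node = nxt
        if node != sink:
            break  # no remaining flow leaves the source

        # Peel off the largest amount this path can carry.
        steps = list(zip(path, path[1:]))
        bottleneck = min(residual[e] for e in steps)
        for e in steps:
            residual[e] -= bottleneck
        paths.append((path, bottleneck))
    return paths


# Toy gene with three exons; exon e2 is skipped by part of the flow.
flow = {
    ("s", "e1"): 5.0,
    ("e1", "e2"): 3.0,
    ("e1", "e3"): 2.0,
    ("e2", "e3"): 3.0,
    ("e3", "t"): 5.0,
}
paths = decompose_flow(flow, "s", "t")
for path, weight in paths:
    print(path, weight)
```

In this toy case the flow splits into two paths, one using all three exons and one skipping `e2`. The number of edges grows far more slowly than the number of paths, which is the intuition behind working with flows rather than with an explicit list of candidates.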

Availability and implementation: Source code is freely available as an R package from the Bioconductor Web site (http://www.bioconductor.org/), and more information is available at http://cbio.ensmp.fr/flipflop.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Exons
Gene Expression Profiling / methods*
Humans
Models, Statistical
RNA Isoforms / chemistry*
RNA Isoforms / metabolism
Sequence Analysis, RNA / methods*
Software

Substances

RNA Isoforms