PT - JOURNAL ARTICLE
AU - Michael I. Love
AU - John B. Hogenesch
AU - Rafael A. Irizarry
TI - Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation
AID - 10.1101/025767
DP - 2015 Jan 01
TA - bioRxiv
PG - 025767
4099 - http://biorxiv.org/content/early/2015/08/28/025767.short
4100 - http://biorxiv.org/content/early/2015/08/28/025767.full
AB - RNA-seq technology is widely used in biomedical and basic science research. These studies rely on complex computational methods that quantify expression levels for observed transcripts. We find that current computational methods can lead to hundreds of false positive results related to alternative isoform usage. This flaw in the current methodology stems from a lack of modeling sample-specific bias that leads to drops in coverage and is related to sequence features like fragment GC content and GC stretches. By incorporating features that explain this bias into transcript expression models, we greatly increase the specificity of transcript expression estimates, with more than a four-fold reduction in the number of false positives for reported changes in expression. We introduce alpine, a method for estimation of bias-corrected transcript abundance. The method is available as a Bioconductor package that includes data visualization tools useful for bias discovery.