A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data

  Joshua M. Akey (1,3)

  1. Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA;
  2. Department of Biostatistics and Department of Statistics, University of Washington, Seattle, Washington 98195, USA

    Abstract

    Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.
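To make the idea of combining information across loci concrete, the following is a minimal sketch and not the model used in the article. It assumes, purely for illustration, that each locus has a count of reads from one allele out of a total, that every locus shares a symmetric Beta(c, c) prior centered on equal expression, and that c is chosen from a grid by maximizing the beta-binomial marginal likelihood. The function names, the grid, and the read counts are all hypothetical.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(counts, conc):
    """Beta-binomial log marginal likelihood of all loci under a shared
    Beta(conc, conc) prior. The binomial coefficient is omitted because
    it does not depend on conc."""
    return sum(
        log_beta(conc + y, conc + n - y) - log_beta(conc, conc)
        for y, n in counts
    )


def fit_concentration(counts, grid):
    """Pick the prior concentration on the grid that best explains all
    loci jointly (a crude empirical-Bayes step, assumed for illustration)."""
    return max(grid, key=lambda c: log_marginal(counts, c))


def posterior_means(counts, conc):
    """Posterior mean allelic fraction per locus; each estimate is pulled
    toward 0.5, more strongly when the locus has few reads."""
    return [(conc + y) / (2 * conc + n) for y, n in counts]


# Hypothetical (allele-1 reads, total reads) pairs, one per locus.
counts = [(52, 100), (30, 40), (9, 20), (480, 1000), (70, 80)]
grid = [0.5, 1, 2, 5, 10, 20, 50, 100]

conc = fit_concentration(counts, grid)
means = posterior_means(counts, conc)
for (y, n), m in zip(counts, means):
    print(f"reads={y}/{n} raw={y / n:.3f} shrunk={m:.3f}")
```

In this sketch the global inference is the fitted concentration, which summarizes how much allelic imbalance varies across loci, and the locus-specific inference is each shrunken fraction. A full treatment would also report posterior uncertainty and control the false-discovery rate, neither of which is attempted here.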

    Footnotes

    • 3 Corresponding authors.

      E-mail akeyj{at}u.washington.edu.

      E-mail daskelly{at}u.washington.edu.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.119784.110.

    • Received December 22, 2010.
    • Accepted July 12, 2011.