TY - JOUR
T1 - Inferring the genetic architecture of expression variation from replicated high throughput allele-specific expression experiments
JF - bioRxiv
DO - 10.1101/699074
SP - 699074
AU - Xinwen Zhang
AU - J.J. Emerson
Y1 - 2019/01/01
UR - http://biorxiv.org/content/early/2019/07/13/699074.abstract
N2 - Gene expression variation between alleles in a diploid cell is mediated by variation in cis regulatory sequences, which usually refers to differences in DNA sequence between the two alleles near the gene of interest. Expression differences caused by cis variation have been estimated from the ratio of the expression levels of the two alleles under a binomial model. However, the binomial model underestimates the variance among replicated experiments, which exaggerates the statistical significance of estimated cis effects and thus produces many false discoveries of cis-affected genes. Here we describe a beta-binomial model that estimates the cis effect for each gene while permitting overdispersion of variance among replicates. We demonstrated with simulated null data (data without a true cis effect) that the new model fits the true distribution better, resulting in a false positive rate of approximately 5% at the 5% significance level in all null datasets, considerably better than the 6%-40% false positive rate of the binomial model. Additional replicates improve the performance of the beta-binomial model but not of the binomial model. We also collected new allele-specific expression data from an experiment comprising 20 replicates of a yeast hybrid (YPS128/RM11-1a). We eliminated the mapping bias problem by using de novo assemblies of the two parental genomes. By applying the beta-binomial model to this dataset, we found that cis effects are ubiquitous, affecting around 70% of genes. However, most of these effects are small in magnitude. The high number of replicates enabled a better approximation of the cis landscape within species and also provides a resource for future exploration of better models.
ER -