%0 Journal Article
%A Patrick Deelen
%A Daria V. Zhernakova
%A Mark de Haan
%A Marijke van der Sijde
%A Marc Jan Bonder
%A Juha Karjalainen
%A K. Joeri van der Velde
%A Kristin M. Abbott
%A Jingyuan Fu
%A Cisca Wijmenga
%A Richard J. Sinke
%A Morris A. Swertz
%A Lude Franke
%T Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels
%D 2014
%R 10.1101/007633
%J bioRxiv
%P 007633
%X Given increasing numbers of RNA-seq samples in the public domain, we studied to what extent expression quantitative trait loci (eQTLs) and allele-specific expression (ASE) can be identified in public RNA-seq data while also deriving the genotypes from the RNA-seq reads. A total of 4,978 human RNA-seq runs, representing many different tissues and cell types, passed quality control. Even though these data originated from many different laboratories, samples reflecting the same cell type clustered together, suggesting that technical biases due to different sequencing protocols were limited. We derived genotypes from the RNA-seq reads and imputed non-coding variants. In a joint analysis of 1,262 samples combined, we identified cis-eQTL effects for 8,034 unique genes. Additionally, we observed strong ASE effects for 34 rare pathogenic variants, corroborating previously observed effects on the corresponding protein levels. Given the exponential growth of the number of publicly available RNA-seq samples, we expect this approach will become relevant for studying tissue-specific effects of rare pathogenic genetic variants. Abbreviations: eQTL, expression quantitative trait locus; ASE, allele-specific expression; ENA, European Nucleotide Archive; MAF, minor allele frequency; RNA-seq, RNA sequencing; PCA, principal component analysis; QC, quality control; LCL, lymphoblastoid cell line; FDR, false discovery rate; GoNL, Genome of the Netherlands; GQ, Phred-scaled genotype quality; DR2, estimated dosage r² after imputation.
%U https://www.biorxiv.org/content/biorxiv/early/2014/08/01/007633.full.pdf