Abstract
Expression quantitative trait loci (eQTL) analysis links sequence variants to changes in gene expression and has proven a successful approach for fine-mapping variants causal for complex traits and for understanding their pathogenesis. In this work, we present EnsembleExpr, an ensemble-based computational framework for eQTL prioritization. When trained on data from massively parallel reporter assays (MPRA), EnsembleExpr accurately predicts reporter expression levels from DNA sequence and identifies sequence variants that exhibit significant allele-specific reporter expression. This framework achieved the best performance in the “eQTL-causal SNPs” open challenge of the Fourth Critical Assessment of Genome Interpretation (CAGI 4). We envision EnsembleExpr as a powerful resource for interpreting non-coding regulatory variants and prioritizing disease-associated mutations for downstream validation.