Abstract
3’ untranslated region (3’UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the Massively Parallel Reporter Assay for 3’UTRs (MPRAu) to sensitively assay 12,173 3’UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3’UTR function, suggesting that low-complexity sequences predominately explain 3’UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base-pair resolution, including an AU-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3’UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14, and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Introduction
Over the past two decades, thousands of variant-trait associations have been identified by genome-wide association studies (GWAS) (Buniello et al., 2019). However, GWAS have been hindered in elucidating the mechanisms of complex disease by two limitations: (1) linkage disequilibrium (LD) causes neighboring neutral polymorphisms to display similarly strong associations with causal alleles, greatly increasing the experimental burden for functional validation; (2) over 90% of associations reside in non-coding regions of the genome (Gusev et al., 2014; Maurano et al., 2012), where functional interpretation is much more difficult than in coding regions.
3’ untranslated regions (3’UTRs) contain a particularly important class of noncoding variants that can impact post-transcriptional and translational processes. Causal peripheral blood cis-expression Quantitative Trait Loci (eQTL) variants are 4-fold enriched to be in 3’UTRs, a level matching that of promoter elements (Wang et al., 2020b). Across all tissues in the Genotype-Tissue Expression project (GTEx), eQTLs in 3’UTRs are found to be 2-fold enriched, the largest enrichment amongst all non-coding regions (The GTEx Consortium, 2020). Untranslated regions harbor the largest enrichment of GWAS heritability (5-fold) of all non-coding categories except for transcription start sites, emphasizing a substantial role for post-transcriptional activities in human regulatory variation (Finucane et al., 2015).
Although 3’UTR variants are crucial to understanding human phenotypic variation, only a handful of causal 3’UTR variants have been described. They include BAFF-var in TNFSF13B associated with lupus and multiple sclerosis (Steri et al., 2017), rs13702 in LPL associated with HDL-cholesterol levels (Richardson et al., 2013), and rs12190287 in TCF21 associated with coronary artery disease (Miller et al., 2014). Each case required meta-analysis across multiple trait or population datasets and annotation with well-known regulatory factors, before being pursued with low-throughput luciferase confirmations. These factors demonstrate how current 3’UTR causal variant discovery is burdensome and highlight the need for high-throughput tools to characterize the functional impact of 3’UTR variants on gene expression.
The development of Massively Parallel Reporter Assays (MPRAs) has enabled simultaneous testing of thousands of variants for cis-regulatory activity to nominate causal variants in non-coding regions (van Arensbergen et al., 2019; Choi et al., 2020; Kircher et al., 2019; Klein et al., 2019; Liu et al., 2017; Sample et al., 2019; Tewhey et al., 2016; Ulirsch et al., 2016), Historically MPRA has been primarily applied to understand transcriptional regulation. Several studies have adapted MPRA to test 3’UTR sequences but to date, none have comprehensively characterized the impact of genetic variation in 3’UTRs (Bogard et al., 2019; Litterman et al., 2019; Oikonomou et al., 2014; Siegel et al., 2020; Vainberg Slutskin et al., 2018, 2019; Zhao et al., 2014).
Here, we developed the Massively Parallel Reporter Assay for 3’UTRs (MPRAu) to quantify allelic expression differences for thousands of 3’UTR variants simultaneously in a high-throughput, accurate, and reproducible manner. MPRAu detects distinct aspects of 3’UTR regulation, allowing us to understand general sequence features governing transcript abundance via computational modeling, pinpoint exact sequence architectures underlying variant functionality including RNA structure and RNA-binding protein (RBP) occupancy, and nominate causal variants. We utilize MPRAu to comprehensively test disease-associated, as well as evolutionarily adaptive, 3’UTR genetic variation in six human cell lines. From our functionally nominated causal variants, we also more deeply characterize two variants using CRISPR-induced allelic replacement.
Results
MPRAu reproducibly characterizes the functions of thousands of 3’UTR elements
We applied MPRAu to systematically evaluate the functional effects of genetic variation from 3’UTRs. To do so, we designed and synthesized 100 base pair (bp) oligonucleotides derived from human 3’UTRs, centered on, and differing only with respect to the variant’s ‘reference’ (ref) or ‘alternate’ (alt) alleles (Fig. 1a), for testing using MPRAu. We cloned the oligo pool into the 3’UTR of a plasmid reporter gene controlled by a moderately strong promoter. By transfecting our pool into cell lines of interest, and sequencing both the plasmid pool and mRNA from cells, we could compare steady-state RNA expression effects of each 3’UTR oligonucleotide (from either differential mRNA decay or transcription). We refer to 3’UTR oligo backgrounds that increase mRNA levels as having ‘augmenting’ effects and those that decrease transcript levels as having ‘attenuating’ effects. We also quantify differences between sequences bearing the ref versus alt allele and refer to alleles with a statistically significant ‘allelic skew’ as transcript abundance-modulating variants (tamVars). In addition, MPRAu employs several quality controls to minimize bias, including using random barcodes to ensure adequate library complexity (Methods).
We applied MPRAu to identify functional 3’UTR variants associated with human disease and evolutionary selection, testing 12,173 3’UTR variants. As the causal variant (s) underlying human traits and diseases can be amongst many variants associated with GWAS tag SNPs, we tested 3’UTR SNPs and insertion/deletions (indels) (minor allele frequency (MAF)≧5%) in LD with tag SNPs (minimum r2=0.8) from the NHGRI-EBI GWAS catalog (Welter et al., 2014), totaling 2,153 putative disease-associated variants from 1,556 independent association loci (Supplementary Table 1). We also incorporated a set of 9,325 3’UTR SNPs and indels overlapping regions identified as being under positive selection in humans (Grossman et al., 2013) (Supplementary Table 1). We also included a set of 46 rare 3’UTR variants (minor allele frequency (MAF) ≤0.01 in Europeans) that are in genes with outlier expression signatures across tissues in the Genotype-Tissue Expression (GTEx) project, which are known to have potential deleterious consequences (Li et al., 2017) (Supplementary Table 1). Lastly, across all tested variants, 3,055 were also tested under alternative allelic backgrounds to account for the potential effect of surrounding sequence variants. As genetic variants impacting traits can have tissue-specific effects (GTEx Consortium, 2017; Marbach et al., 2016; Parker et al., 2013), we characterized these variants across six diverse human cellular lines: HEK293 (embryonic kidney), HepG2 (hepatocellular carcinoma), GM12878 (lymphoblastoid), SK-N-SH (neuroblastoma), K562 (leukemia), and a primary cell line (HMEC, mammary epithelial).
We first sought to ensure that our assay was reproducibly capturing expected 3’UTR biological effects. Consistent with the dominant regulatory function of 3’UTRs to attenuate transcript expression, the predominant effect across all MPRAu-tested 3’UTRs was to decrease mRNA abundance (Fig. 1b). This effect is reproducibly observed in the strong correlation of normalized RNA read counts between experimental replicates across all cell types (average Pearson correlation (corr)=0.99) (Fig. 1c).
Confident in our assay’s ability to assess oligos with regulatory activity, we then identified tamVars altering 3’UTR functionality by comparing expression changes between alleles of the same 3’UTR (using as a threshold a Benjamini-Hochberg adjusted p-value (BH p-adj)<0.1) (Fig. 1d,e). We found 2,368 tamVars in total across all cell types, with HMEC having the largest number of tamVars (1,430), and K562 having the least (471), potentially driven by power differences in tamVar detection across different cell types (Supplementary Table 1). 104 tamVars were shared across every tested cell line (Fig. 1f). Out of the 3,055 variants tested with alternative allelic backgrounds, only 10 tamVars displayed function dependent on its allelic background. To confirm that 3’UTR tamVar effects were reflected in protein levels, we utilized orthogonal methods to measure allelic effects. We performed polysome profiling on HEK293 cells transfected with a subset of our tested library and found polysomal RNA expression highly correlated with steady-state RNA expression (Pearson corr=0.94, p=2.68×10−11,546) (Supplementary Fig. 1a). In addition, variant effects between polysomal and steady-state RNA were concordant (Pearson corr=0.97, p=1.4×10−106) (Supplementary Fig. 1b), recapitulating previous findings that steady-state RNA levels are a good proxy for protein levels (Oikonomou et al., 2014; Zhao et al., 2014). To demonstrate that variant effects are directly translatable to the protein level, we also tested a subset of tamVars and non-tamVar controls via luciferase assays (Supplementary Table 2). We observed a strong correlation between luciferase luminescence assays and the MPRAu-measure tamVar skew (Pearson corr=0.81, p=4.6×10−4) (Fig. 1g), which also highlights MPRAu’s ability to triage 3’UTR variant effects over low-throughput luciferase approaches.
MPRAu sensitively detects 3’UTR regulators and functional sequence variants
To confirm that MPRAu effects are consistent with molecular mechanisms underlying 3’UTR biology, we analyzed features across the entire oligo sequence. We found that GC content and secondary structure, as measured by the predicted minimal free energy, positively correlated with the level of attenuation (percent GC: Pearson corr=-0.24, p=6.59×10−384, mfe: Pearson corr=0.27, p=8.87×10−485) (Fig. 2a). This finding may be explained by the role of high GC content and structuredness in RBP occupancy and therefore functionality (Dominguez et al., 2018; Litterman et al., 2019).
At the finer sequence level, MPRAu captured the expected attenuation and augmentation effects from the presence of empirical RBP motifs (Bakheet et al., 2001; Holcik and Liebhaber, 1997; White et al., 2001) and predicted miRNA motifs (Friedländer et al., 2012; Friedman et al., 2009). Regulatory signatures such as AU-rich elements, the canonical Pumilio motif, and miRNAs motifs exhibited expected attenuating effects on expression (avg log2 fold change (log2FC) range −0.55 to −0.12, two-sided t-test p<1×10−30 for all attenuating factors) (Fig. 2b). Conversely, CU-rich elements demonstrated their expected augmenting effects on expression (avg log2FC=0.18, two-sided t-test p=1.58×10−14) (Fig. 2b). Variants perturbing these predicted elements abrogate the functional effect (two-sided t-test p<0.05 for all factors except for CU Rich Element and AU-Rich Class II, but skew directionality is consistent for the latter two) (Fig. 2b). Highlighting MPRAu’s ability to capture endogenous perturbation effects, variants with high allelic skews (| log2FC Skew |≥1) also are predominantly in regions with elevated in vivo RBP occupancy signatures (Van Nostrand et al., 2016) (two-sided Wilcoxon rank-sum test p=0.02, 0.011, for HepG2 and K562, respectively) (Fig. 2c). Furthermore, attenuation of transcript levels in a cell line is generally well explained by its top 10 expressed miRNAs, demonstrating the capacity of MPRAu to capture cell-type-specific effects (Fig. 2d). Lastly, an orthogonal analysis of the barcodes in our MPRAu design also validated the effects of known 3’UTR RBP motifs and provides a valuable resource of potential functional hexamers (Methods, Supplementary Fig. 2, 3, Supplementary Table 3). Together these data demonstrate the ability of MPRAu to detect the effects of known 3’UTR regulators and cell-type-specific regulatory mechanisms.
Computational modeling uncovers known and novel features of 3’UTR regulation
Having identified key 3’UTR features underlying MPRAu transcript levels, we trained predictive models for our tested sequences and compared the sensitivity and specificity of several classification models to predict 3’UTR elements with attenuated activity (< −0.5 3’UTR log2FC) (Methods). We used 34 total simple, sequence-specific annotations in our initial models, including features derived from nucleotide and dinucleotide composition, homopolymer content, and secondary structure (Methods). Our best model performed well in all cell types, with an average precision of 0.23-0.48 and an area under the receiver operating characteristic curve of 0.67-0.79 (Fig. 3a,b, Supplementary Fig. 4a-h). We also used the same features to predict augmenting expression (>0.5 log2FC) and found comparable performance (Supplementary Fig. 5a,b).
Because our initial model included only cell-type agnostic simple sequence features, we expanded our model to incorporate additional features related to cell-type-specific miRNA and RBP occupancy sites. However, these features did not improve model accuracy (Supplementary Fig. 5c,d). This finding suggests that simple, cell type-agnostic sequence features explain the predominant proportion of regulatory activity in 3’UTRs, consistent with recent studies showing that RBPs tend to bind low-complexity sequence motifs (Dominguez et al., 2018; Litterman et al., 2019). Our observation that 84.3% of the tested 3’UTR elements with significant activity in MPRAu are shared across multiple cell types is concordant with this result.
From our models that utilized only sequence-specific annotations, several low-complexity features were found to be important for prediction using SHapley Additive exPlanations (SHAP) (Lundberg and Lee, 2017). These features included homopolymer length, sequence diversity, and various features related to uracil content such as percent U/UC and UA/UU dinucleotide count (Fig. 3c). Investigating the individual effects of some of these important features, we found minimal free energy (mfe) monotonically negatively correlated with attenuation prediction. We also found percent GC to be anticorrelated with mfe and therefore positively correlated with attenuation (Fig. 3d). Surprisingly, we discovered the proportion of uracil content to have a nonlinear effect on attenuation with both low and high uracil content displaying attenuating effects. Specifically, longer uracil homopolymers demonstrate the most attenuating activity (Fig. 3d), concordant with their function as binding motifs for many RBPs (Dominguez et al., 2018; Mukherjee et al., 2019). The identification of attenuation features may be useful for constructing synthetic 3’UTRs with precise expression levels.
MPRAu allelic effects are reflected in gene expression and human phenotype changes
After demonstrating MPRAu’s ability to detect the activity of 3’UTR elements, we asked whether our tamVar allelic effects are supported by causal alleles altering gene expression or phenotypic traits. We compared tamVars in GM12878 to cell-type matched allele-specific expression (ASE) data from heterozygous individuals in the Geuvadis RNA-seq dataset (Lappalainen et al., 2013) and used this comparison to estimate the positive predictive value (PPV) for tamVars in the assay. We observed moderately strong concordance between our tamVars and endogenously observed ASE (66.1% directionality agreement, binomial p=0.011) (Fig. 4a), which corresponds to a PPV of 32%. With higher stringency ASE calls (two-sided t-test p<0.001, Methods), agreement in directionality increased to 77.5% (binomial p=6.8×10−4, PPV of 55%). We obtained a weaker concordance when overlapping with Geuvadis expression quantitative trait loci (eQTL) data (60.5% agreement in directionality, binomial p=0.22, PPV of 20.9%) (Supplementary Fig. 6a), potentially due to varying regulatory factors (i.e. RBP/miRNA concentrations) muting true allelic effects when aggregating across individuals.
Next, we expanded our analysis to compare tamVars with tissue eQTLs from GTEx (GTEx Consortium, 2013), where we obtained putative causal alleles from genetic fine-mapping (Benner et al., 2016; Ulirsch; Wang et al., 2020a) (Supplementary Table 4). Aggregating allelic effects across cell types and tissues, we observed variants with a high inferred probability of causality (posterior inclusion probability (PIP)>0.2) displayed significant agreement in directionality between aggregated MPRAu and GTEx median effect sizes (concordance=67%, binomial p=0.012) (Fig. 4b). Subsetting on known 3’UTR annotations improved the concordance (82%, binomial p=9.1×10−4). Even at a relaxed significance threshold, we still observed significant concordance (61%, binomial p=0.04), suggesting our findings are robust (Supplementary Fig. 7a). We demonstrate that MPRAu is highly specific to causal variants, as the agreement in directionality is lost at lower PIPs (Supplementary Fig. 7b). PPV estimates based on GTEx tissue-aggregated median effect sizes (34%-64%) are similar to those based on Geuvadis ASE (32%-55%).
We next looked for enrichment of MPRAu tamVars in genetically fine-mapped causal variants associated with 94 traits in the UK Biobank (Benner et al., 2016; Ulirsch; Wang et al., 2020a). We observed greater enrichment in MPRAu functionality with increasing causality (PIP) thresholds (two-sided t-test p=3.64×10−4) (Fig. 4c, Supplementary Table 4). This suggests that in addition to causing in vivo gene expression changes, the tamVars identified in our study have phenotypic consequences and that MPRAu is a powerful approach for dissecting association studies.
As an orthogonal approach to confirm the tamVars identified by MPRAu were correlated with in vivo expression changes, we also assayed a set of rare variants associated with large transcriptional effects (Li et al., 2017). When we compared MPRAu allelic skews with a rare variant functionality metric (RIVER score), we observed a significant positive correlation (Pearson corr=0.42, p=7.2×10−3) (Fig. 4d). This finding suggests MPRAu can identify functionality in common as well as rare 3’UTR variants, which modern association studies have a lower power to detect.
MPRAu SNV and deletion tiling dissects functional sequence motifs
To characterize the molecular mechanisms driving the regulatory effects of several highly significant tamVars with base-pair resolution, we created a MPRAu array that tiled single-nucleotide and deletion variants surrounding the variant (3’UTR tiling). Specifically, we extended MPRAu in two ways: we assayed 1) 5 bp non-overlapping deletions over the entire tested 3’UTR sequence, and 2) all single-nucleotide changes within +/- 10 bp of the tamVar. We carried out this deep analysis on 80 tamVars (Supplementary Table 1), of which three are described.
We found 3’UTR tiling useful to identify RBP sequence motifs. At rs16975240 (FAM92B), which in our screen displayed a strong allelic skew (log2FC=2.06, BH p-adj=1.33×10−60), the ref allele demonstrated strong expression attenuation (log2FC=-2.29). However, the alt allele exhibited a muted effect (log2FC=-0.24), which suggests the perturbation of an attenuating element. Deletions between the variant position and 10 bps upstream on the ref background identified such an element that when removed, alleviated this attenuation (+3.55-4.46 FC increase). As expected in the alt background, deletions of the same sequence yielded minimal changes (+1.13-1.16 FC increase). Saturation mutagenesis identified a 10 bp motif in the ref region perfectly matching the U1 snRNP binding sequence consensus (Fig. 5a). This core component of the spliceosome has also been shown to suppress gene expression by limiting poly-A tail addition (Furth et al., 1994; Guan et al., 2007). The alt allele disrupts this motif, likely explaining the underlying allelic skew.
3’UTR tiling also can be used to detect three-dimensional structures that influence RNA stability. rs3751756 (KIAA0513), displayed a large allelic skew (log2FC=-1.38, BH p-adj=2.65×10−40), with the alt allele demonstrating a significant attenuating effect that is unobserved with the ref allele. On the alt background, deletions within a 45 bp window around the tamVar restored expression to near ref levels, which suggests an overlying functional element. RNAfold (Gruber et al., 2008) predicts a stem-loop structure in the region, potentially mediating this attenuation (Fig. 5b). While the non-functional ref allele (U) creates a bulge in the stem, the alt allele (G) is predicted to create a stable stem, potentially amenable to RBP occupancy and subsequent functional attenuation. Moreover, nearly every base alteration disrupting pairing of the stem in our saturation mutagenesis alleviated repression (average Z-test p=6.13×10−4), in contrast to tolerated changes in the loop that yielded attenuation. Highlighting the sensitivity of MPRAu to small effect sizes, we recapitulated the expected effects of wobble base pairing, observing A to G mutations along the stem have the smallest effect sizes.
We similarly investigated the indel tamVar rs34448361 (LEPR) due to its large allelic skew (log2FC=-0.55, BH p-adj=2.19×10−11), and its previous identification as being within a genomic region with evidence of recent natural selection (Grossman et al., 2010). Both 3’ UTR alleles displayed attenuation, but the alt allele exhibited a significantly stronger effect (ref log2FC=-0.56, alt log2FC=-1.12), which was validated in a luciferase assay (two-sided t-test p=4.4×10−4) (Supplementary Fig. 8a). Deletions mapped a 25 bp AU-rich element consisting of AUUUA pentamer repeats which are known to impose attenuating effects (Siegel et al., 2020) (Fig. 5c). The ref 3’UTR contains four pentamers while the alt allele is a 4 bp insert that creates a fifth pentamer which is in agreement with the alt allele showing an increased effect. SNV tiling recapitulated this finding, with only disruption of AUUUA pentamers abrogating attenuation in both allelic contexts (Fig. 5c). The alt allele is highly represented in East Asian populations (allele frequency=0.79) versus other populations (African allele frequency=0.15) (1000 Genomes Project Consortium et al., 2015). rs34448361 is in the 3’UTR of a LEPR isoform (LepRb) (Supplementary Fig. 8b,c), and matches the direction of effect for an eQTL for LepRb across diverse tissues contexts including adipose (eQTL β=-0.26, p=2.35×10−7) and IFN stimulated macrophage (eQTL β=-0.50, p=6.61×10−5) (Kerimov et al., 2020). This isoform has been implicated in traits highly essential for survival: satiety and obesity (Münzberg and Morrison, 2015), as well as activation of various immune cells (Abella et al., 2017). However, the region surrounding LEPR has also been implicated in cold adaptation in modern and archaic humans (Sazzini et al., 2014). While the exact phenotype under selection is still unknown, MPRAu provides strong functional evidence for rs34448361 as a causal variant driving the evolutionary signature in the genomic region and uncovers the variant’s regulatory mechanism.
MPRAu identifies causal 3’UTR variants related to human evolution and disease
A key attribute of MPRAu is the ability to parse genetic associations based on functional evidence to nominate causal variants for a variety of traits and diseases. Amongst all 2,153 GWAS-associated 3’UTR variants tested, we found 677 GWAS loci with a significant tamVar in at least one cell type. From this tamVar set, several previously nominated causal variants were identified, including rs13702, which disrupts a miRNA binding site in LPL and has been suggested to underlie associations for triglyceride levels and type 2 diabetes (Ban et al., 2010; Richardson et al., 2013; Tang et al., 2010). Our set also provides novel functional evidence for 10 variants convincingly nominated for causality from genetics alone, such as rs5891007 (LSM1, Schizophrenia risk) (Shi et al., 2011) and rs1140711 (LIN7C, bone mineral density) (Kemp et al., 2014), among others (Table 1). While MPRAu nominates hundreds of phenotypically-relevant 3’UTR variants, a potential limitation is the episomal-based assay not capturing endogenous gene expression effects. To more directly confirm the effects of tamVars, we performed endogenous allelic replacements via CRISPR-mediated homology-directed repair (HDR) on two tamVars linked to specific human disease processes: rs705866 and rs1059273.
rs705866 (PILRB) is associated with age-related macular degeneration (odds ratio (OR)=1.13, p=5×10−9), residing amongst 151 SNPs in strong LD to a tag-SNP (r2=0.808 with rs7803454) (Fritsche et al., 2016) (Fig. 6a). MPRAu identified a suggestive allelic effect (log2FC Skew=0.3, BH p-adj=0.08) with the ref allele showing an attenuating effect on expression and the alt allele disrupting a predicted miRNA binding site (hsa-miR-374a-5p) (Fig. 6b). Other variants in this credible set (the smallest set of variants that contain the true causal variant with 95% probability) include multiple missense SNPs, although their impacts are predicted to be benign (Supplementary Fig. 9a,b). Recent work identified rs705866 as an eQTL for PILRB specifically within the focal disease tissue, the retinal macular region (eQTL β=1.25, p=5.72×10−29) (Orozco et al., 2020).
To confirm the MPRAu allelic effect in the endogenous genomic context, we used CRISPR HDR to perform allelic replacement of rs705866 (T (ref) to C (alt)) in neuronal SK-N-SH cells. Allelic ratios of RNA and DNA from cells with the desired edit were compared with unperturbed cells, and an allelic skew was found (log2FC=0.21, Fisher p=3.39×10−8) to be concordant with the effects measured by MPRAu (Fig. 6c). In addition, due to the imperfect editing from CRISPR HDR, a large proportion of cells also contained unspecific SNVs and indels edits over the miRNA binding site. Aggregating the effects of these imprecise edits led to an even larger functional disruption (log2FC=0.49, Fisher p=1.90×10−48) (Fig. 6c), providing further support for rs705866’s function via hsa-miR-374a-5p.
rs1059273 (TRIM14) lies within a genomic region that experienced positive selection in East Asian populations (Grossman et al., 2010). Han Chinese genomes display an extended haplotype heterozygosity score surrounding rs1059273, but identifying a causal allele has remained difficult (Fig. 6d). MPRAu identified attenuating effects on the ref background, but not the alt (log2FC Skew=0.73, BH p-adj=0.07) (Fig. 6e). rs1059273 disrupts a miRNA binding site (hsa-miR-142-3p), potentially explaining higher expression from the alt allele (Fig. 6e). TRIM14 has many functions, including antiviral and antimicrobial activity as part of the Type I Interferon pathway (Chen et al., 2016; Tan et al., 2017; Zhou et al., 2014). Knockouts of TRIM14 in macrophages more effectively control Mycobacterium tuberculosis replication (Hoffpauir et al., 2020), and TRIM14 has also been found to suppress Influenza A replication (Wu et al., 2019). rs1059273 is a significant eQTL for TRIM14 in T cells and NK cells (T cell, CD4, Th1/17, eQTL β=0.91, p=7.2×10−9; NK cell, CD56dim CD16+, eQTL β=0.90, p=9.7×10−9) (Schmiedel et al., 2018). Fine-mapping also assigns a high likelihood of the variant causally affecting TRIM14 expression (PIP=0.53, 0.55 in Th1, Th17 cells), and finds no other potentially causal (PIP≥0.1) variants with functional annotations within the credible set containing rs1059273 (Kerimov et al., 2020). These associations further link this variant’s potential evolutionary impact to an immunological role.
We performed allelic replacement of rs1059273 (G (ref) to T (alt)) in lymphoblastoid cells and confirmed the variant’s effects on TRIM14 (allele replacement: log2FC=0.58, Fisher p=6.79×10−699). Similar to rs708566, we aggregated the effects from cells with unspecific edits over the miRNA binding site mutations and observed an effect size larger than by rs1059273 alone (log2FC=0.69, Fisher p=3.32×10−953) (Fig. 6c).
Discussion
We developed MPRAu, a high-throughput tool to functionally characterize 3’UTR variants and used it to identify 2,368 3’UTR variants that modulate transcript abundance across six cell lines. We built powerful predictive models of 3’UTR function and identified novel modes of 3’UTR regulation. This resource characterizing GWAS, selection signals, and common variation in 3’UTRs will be useful to ongoing future studies of human adaptation and disease. We expect MPRAu will be a common experimental paradigm to test variants of unknown significance and rare variants going forward. In the future, MPRAu may be further modified to specifically detect variants impacting a particular regulatory mechanism of interest, such as transcription termination (Shalem et al., 2015) or mRNA localization (Andreassi and Riccio, 2009; Berkovits and Mayr, 2015; Tushev et al., 2018).
While MPRAu bridges a tremendous gap linking 3’UTR genomic variation, functional effects, and ultimately phenotypes, the assay has important limitations. The assay’s episomal-based nature, and the limited (100 bp) variant sequence context tested, may prevent the full recapitulation of the endogenous expression effects. Furthermore, RNA steady-state levels may not be perfectly linked with protein levels (Battle et al., 2015; Chick et al., 2016); however, flow-cytometry assays have shown overwhelmingly strong agreement of RNA expression from episomal reporter assays with protein abundance (Oikonomou et al., 2014; Zhao et al., 2014).
In addition to our reporter assay, we provide additional evidence using 3’UTR tiling and endogenous allelic replacement for three variants (rs1059273, rs705866, and rs34448361) with potentially important consequences to understanding human disease and evolution. Further experiments are needed to assess each candidate variant’s contribution in the relevant cellular context. For example, while TRIM14 expression is suppressed in multiple viral infections, including the SARS-CoV-2 virus responsible for the COVID-19 pandemic (Blanco-Melo et al., 2020), the exact extent of rs1059273 in modulating viral infectivity is unknown. While we tested our assay across six different cellular contexts, additional phenotypic-relevant variants may be found when applying our assay in disease-specific tissues.
Currently, potentially causal variants in 3’UTR elements underlying complex human diseases are largely overlooked because of the lack of tools to characterize them. The vast addition of 3’UTR measurements, especially in the context of phenotype-relevant genetic variation, may additionally inform future models of genome function. In total, MPRAu provides a framework for prioritizing regulatory variation in 3’UTRs based on functionality. Our study helps further a more comprehensive understanding of the regulatory processes important for non-coding variant function.
Author Contributions
D.G., R.T., and P.C.S. conceived and began the study. J.R.X. and S.K.R. performed the main analyses and completed the study. D.G., J.R.X., S.K.R., and K.K. performed experiments. J.U. and M.K. provided the fine-mapping datasets. J.D. and S.B.M. provided the rare variant dataset. C.D.N. provided help with polysome-profiling experiments. J.U., M.K., and D.K.Y. provided additional computational insights. S.K.R., R.T., and P.C.S. supervised the study.
Declaration of Interests
P.C.S. is a co-founder of and consultant to Sherlock Biosciences and Board Member of Danaher Corporation.
Acknowledgments
We thank S. Schaffner, S. Gosai, S. Weingarten-Gabbay, A.E. Lin, D. Kotliar, C. Myhrvold, and C. Tomkins-Tinch for thoughtful conversations and help with editing the manuscript. We thank D.P. Bartel as well as his lab for thoughtful discussions. We thank I. Shlyakhter for providing the CMS regions. This work, D. Griesemer, J.R. Xue, S.K. Reilly and R. Tewhey were supported as an ENCODE Functional Characterization Center (UM1HG009435), a Broad SPARC grant, NSF DEB-1401237, and the Howard Hughes Medical Institute. D. Griesemer was partially supported by F30GM114940. S.K. Reilly was partially supported by K99HG010669 and F32HG00922. R. Tewhey is supported by R00HG008179.