Multiple causal DNA variants in a single gene affect gene expression in trans

Identifying the specific causal DNA differences in a genome that contribute to variation in phenotypic traits is a key goal of genetic research. Trans-acting DNA variants that alter gene expression are important sources of genetic variation. Several genes are known to carry single causal variants that affect the expression of numerous genes in trans. Whether these single variants are representative of the architecture of trans-acting variation is unknown. Here, we studied the gene IRA2, which carries variants with broad, trans-acting effects on gene expression in two strains of the yeast Saccharomyces cerevisiae. We found that IRA2 contains at least seven causal nonsynonymous variants. The causal variants were located throughout the gene body and included a pair of neighboring variants with opposing effects that largely canceled each other out. The causal variants showed evidence for non-additive epistatic interactions, in particular among variants at the 5’ end of the gene. These results show that the molecular basis of trans-acting variation can involve considerable complexity even within a single gene.

To test if IRA2 is indeed the causal gene at this locus, we exchanged alleles of the IRA2 ORF in 141 strains carrying Gph1-GFP using our CRISPR-Swap strategy for rapid genome engineering 142 . We then measured Gph1-GFP fluorescence and optical density (OD 600 ) of the 143 engineered strains during 24 hours of growth in a plate reader. The Gph1-GFP expression level 144 was extracted at the inflection point of the growth curve and normalized using the OD 600 value at 145 this time (see Materials and Methods). The presence of the RM allele compared to the BY allele 146 resulted in significantly higher Gph1-GFP expression, confirming that coding variants in IRA2 147 contribute to the effects of this locus ( Figure 1C ). The effect was present in both the BY and RM  Figure 1C and D; and Table S3 ), suggesting the presence of epistatic interactions between 151 variation in the IRA2 coding region and other variants elsewhere in the genome.
152 Increased expression of Gph1-GFP in the presence of the IRA2 -RM allele is consistent with the 153 direction of known effects of this locus on GPH1 mRNA and protein (Albert et al., 2018(Albert et al., , 2014. 154 Based on the function of Ira2 ( Figure 1B ), increased GPH1 expression is expected in the 155 presence of a more active IRA2 allele. Thus, in accordance with previous results (Smith and 156 Kruglyak, 2008), our findings suggest that the RM allele of IRA2 has higher activity than the BY 157 allele. 159 To narrow in on the causal variant in the large IRA2 gene, we divided the IRA2 ORF into four 160 blocks balancing size and number of nonsynonymous variants ( Figure 2A ). We performed these 161 experiments in BY due to the higher efficiency of genome engineering in this strain background 162 (Lutz et al., 2019). We created 16 chimeric alleles representing all combinations of the four 163 blocks to also allow testing for non-additive ("epistatic") interactions between the blocks. Each 164 block combination was represented by five or six independent transformants that were each 165 assayed five times for their effect on Gph1-GFP expression.

IRA2 harbors multiple causal variants that show epistatic interactions
166 At each of the first three blocks, the IRA2 RM sequence significantly increased Gph1-GFP 167 expression, with the largest effect resulting from block 1 ( Figure 2B; Table 2; Figure S1 and 168 Table S4 ). Thus, there must be at least three causal variants in the IRA2 RM allele, with at least 169 one variant in each of blocks 1, 2 and 3. 170 We noticed that one combination of blocks (with the RM at the first 3 blocks and the BY allele at 171 the last block, "RRRB") resulted in significantly higher Gph1-GFP expression than the allele 172 carrying RM alleles at all four blocks (One-way ANOVA: p = 0.035, Figure 2B ). Thus, higher 173 Gph1-GFP levels than those driven by the full RM allele can be achieved by combinations of BY 174 and RM variants. Considering that block 4 had no effect when swapped in isolation ( Figure 2B; 175 Table 2 and Table S4 ), this difference between the RRRB construct and the full RM allele also 176 suggests that there are epistatic interactions between variants in block 4 and variants in the other 177 blocks.
178 To comprehensively test for epistatic interactions among blocks, we performed ANOVA on our 179 fully crossed set of block chimeras. Three combinations of blocks showed nominally significant 180 non-additive interactions ( Table 2 ), which exceeds the expectation of 0.6 out of 12 interaction 181 tests under a null model without epistasis. The strongest interaction was seen between block 1 182 and block 2 and is visualized in Figure 2C . In all three significant interactions, the observed 183 Gph1-GFP expression driven by the combined blocks was less than that expected from the sum 184 of the individual block effects (note negative effect signs in Table 2 ), indicating negative 185 epistasis between these blocks of variants. In summary, IRA2 harbors multiple variants that affect 186 Gph1-GFP expression in trans , some of which interact in a non-additive manner.   ( Figure 3A ). To test whether their joint effect can account for the entire IRA2 effect in an 236 additive manner, we used a bootstrap strategy that accounted for measurement error. We created 237 10,000 bootstrap datasets by randomly sampling with replacement from our individual 238 measurements. In each dataset, we computed the observed difference between Gph1-GFP 239 abundance for the IRA2 -BY and the full IRA2 -RM allele, as well as the sum of the effects driven 240 by each of the 26 single variants. The central 95% of the resulting bootstrap distributions 241 overlapped, such that we cannot formally rule out that the sum of the single variant effects 242 accounts for the effect of the full RM allele in an additive manner ( Figure 3B ). However, the 243 overlap was mostly due to the very long tail of the summed single variant distribution, such that 244 only 3.6% of this distribution exceeded the 2.5% quantile of the full RM allele effect. As also 245 suggested by the significant epistatic interactions among the four blocks above, it remains 246 possible that epistatic interactions among variants across the length of IRA2 are required to 247 generate the effect of the full IRA2 -RM allele.
248 Multiple, epistatic causal variants are near the 5'-end of IRA2 249 The results above suggest considerable complexity of genetic variation within IRA2 , with 250 multiple causal variants and non-additive interactions between different regions of the ORF. To 251 dissect this architecture further, we focused on block 1. Although block 1 had the largest effect 252 on Gph1-GFP expression, none of the eight single variants in this block had a significant effect 253 on their own. The variants in block 1 could have real but subtle effects, such that their combined 254 effect could account for the block 1 effect even if no individual variant reached statistical 255 significance. However, bootstrap analysis showed that the additive effects of these variants 256 cannot account for the effect of block 1 ( Figure 4A ), suggesting a non-additive interaction 257 between at least two of the eight variants within block 1. 258 We further divided block 1 into sub-block 1, containing the first five nonsynonymous variants 259 and assayed its effect on Gph1-GFP expression ( Figure 4B; Figure S3 and Table S6 ). The 260 effect of sub-block 1 was significant (p = 4 x 10 -7 ), but was smaller than the entire effect of block 261 1 (p = 0.05). This result suggests that the effect of block 1 is caused by multiple variants: at least 262 one among the first five variants and at least one among the following three variants. 263 Furthermore, bootstrap analysis showed that the summed effects of the first five variants are 264 unlikely to account for the effect of sub-block 1 ( Figure 4C) . Thus, further division of block 1 265 caused its effects to splinter and uncovered the existence of non-additive interactions among the 266 five variants closest to the 5'-end of the gene. 299 Discussion 300 Our fine-mapping experiments in the IRA2 gene revealed multiple causal variants spread 301 throughout the IRA2 ORF that likely act together to create the trans -eQTL hotspot at this gene. 302 Altogether, there must be at least seven causal variants in this gene. In our single-variant 303 experiments, we directly identified one variant each in block 2 and block 3, and two variants in 304 block 4. In addition, there must be at least three causal variants in block 1, with at least two in 305 sub-block 1, that epistatically interact to create the strong increase in Gph1-GFP expression 306 caused by this block. In addition to this localized epistasis within block 1, which covers a poorly 307 conserved region of the ORF with unknown function, we also found evidence for epistasis 308 between variants spread further across the gene body. Thus, this single gene harbors a complex 309 molecular genetic architecture. 310 We were able to detect the small effects of the causal variants in IRA2 because in conjunction 311 they create a large effect that manifests as a highly pleiotropic trans -eQTL hotspot that drew our  363 For solid media, 20 g/L agar was added prior to autoclaving. Yeast were grown at 30°C. 433 GPH1-GFP IRA2(BY) strains. The IRA2 single-variant strains were engineered in four batches, 434 corresponding to the four IRA2 blocks. In each batch, a new IRA2(BY) strain was also 435 engineered, and in batches 1, 2 and 4 a new IRA2(RM) strain was engineered. The IRA2(BY) 436 strains from these four batches were similar in their effect on Gph1-GFP expression and 437 therefore were grouped into one IRA2(BY) genotype that served as the wildtype baseline for 438 determining the effects of the other alleles. The IRA2(RM) strains were similarly grouped. The 439 IRA2 block1 RM strain and the IRA2 sub-block 1 RM strain were created in the same batch 440 (block1) as variants 1 -8.  551 We tested the significance of each term using a Type I ANOVA through sequential addition of 552 terms to the model above.
553 Testing for epistasis between single nucleotide variant effects 554 The single nucleotide variants were not engineered in all combinations, precluding the use of 555 interaction terms in a linear model. Instead, we asked whether the effects of single variants could 556 sum to the observed effect of their given multi-variant block or the full IRA2 RM allele. To 557 properly take into account measurement error across our experimental design, in which 558 individual transformants were run multiple times in several different runs of a plate reader, we 559 used bootstrapping. For each set of variants, we performed stratified sampling with replacement 560 across measurement plates, ensuring that each genotype was represented at the same sample size 561 as in the full data. For each bootstrapped experiment, we fit the same models as above and 562 extracted the effect estimate. Single-variant effect estimates were summed to calculate the effect 563 they would be expected to have together under an additive model. We performed 10,000 of these 564 bootstraps to generate two distributions. The first distribution represented the difference between 565 BY and the given multi-variant RM allele, and the second distribution represented the sum of the 566 respective single-variant effects. Significance testing was performed by calculating the overlap 567 between the two distributions. If the central 95% quantile range of the two distributions did not 568 overlap, we considered it significantly unlikely that the additive effects of the single variants are 569 able to account for the observed effect of the multi-variant allele. Thus, the less the two 570 bootstrapped distributions overlap, the stronger the evidence for epistatic interactions among the 571 given set of single variants.  Figure S1: Gph1-GFP expression in chimeric IRA2 block alleles 581 Figure S2: Gph1-GFP expression in single-variant alleles 582 Figure S3: Gph1-GFP expression in block 1 alleles 583 Tables 584 Table S1: Strains 585 Table S2: Primers/Oligonucleotides 586 Table S3: Allele effects and linear models of full RM in BY or RM background 587