Heteroscedastic regression modeling elucidates gene-by-environment interaction

Genotype-by-environment interaction (GxE) occurs when the size of a genetic effect varies systematically across levels of the environment and when the size of an environmental effect varies systematically across levels of the genotype. However, total variance in the phenotype may shift as a function of the moderator irrespective of its etiology such that the proportional effect of the predictor is constant. We expand the traditional GxE regression model for polygenic scores and single locus allele counts to directly account for heteroscedasticity associated with both the genotype and the measured environment, and we derive a test statistic, ξ, for inferring whether GxE can be plausibly attributed to a more general effect of the genetic or environmental moderator on the dispersion of the phenotype. We apply this method to test whether genotype-by-birth year interactions for Body Mass Index (BMI) are distinguishable from general secular increases in the variance of BMI or associations of the genetic predictors (both PGS and individual loci) with BMI variance. We provide software for estimating heteroscedastic GxE regression models and calculating ξ.


Introduction
Genotype-by-Environment interaction (GxE) studies test whether the genotype-phenotype association varies in magnitude across the range of a measured environmental variable and whether the environment-phenotype association varies in magnitude across the range of the measured genotype. Investigations of GxE have been of particular interest in the study of complex traits, leading to a variety of methods for estimating polygenic GxE [1,2]. Development of these analytic approaches coincides with a large increase in the availability of genomic data, particularly in the context of population-based studies that also contain a broad range of phenotypes and environmental measures. Here, we consider circumstances where either the genetic or the environmental measure selected for GxE is associated with the magnitude of variance in the target phenotype. Such heteroscedasticity introduces a key interpretational problem to results from standard GxE approaches: any observed GxE may arise as an artifact of the fact that the total variance in the phenotype shifts across the range of the genetic or environmental moderator, irrespective of etiology. We resolve this problem by expanding the standard regression-based model for testing GxE in both single locus and polygenic score settings by introducing a simple yet flexible approach to directly model heteroscedastic residuals, and we develop a test statistic for inferring whether GxE can be plausibly attributed to a more general effect of the moderator on the dispersion of the phenotype.
All else equal, when the dispersion of a phenotype is directly controlled by the level of the moderator, the standard regression-based model will identify a significant interaction between that moderator and any correlate of the phenotype, whether genetic or non-genetic. For example, when the effectiveness, intensity, or dosage of a phenotype-altering intervention varies proportionally to baseline levels of the phenotype (e.g. when hours of individualized education are assigned based on previous year's test scores; or when intensity of a weight-loss intervention is determined from baseline body mass index), then the variance of the phenotype is expected to increase in response to the intervention, and the unstandardized effect sizes for all correlates of baseline levels of the phenotype are expected to increase. Similarly, when a genetic variant confers greater plasticity in a phenotype, then the variance of the phenotype is expected to differ across alleles (such that the variant is identified as a variance quantitative trait locus, vQTL). We would then expect the variant to moderate the effect sizes for all correlates of that phenotype proportionally to one another. Only when a moderator acts preferentially on specific mechanisms of variation in the phenotype are interactions with various predictors expected to depart from expectations that follow from differences in total variance across the range of the moderator. Figure 1 illustrates how heteroscedasticity across the range of a measured environmental moderator (E; left panel) or genetic moderator (G; right panel) can produce the impression of GxE. In the left panel, we illustrate a scenario in which a PGS is associated with a constant proportion of variance in the phenotype across the range of the measured environment. That the proportion of variance explained by the PGS at any given location on the x-axis is constant indicates that the expected percentile location within the distribution of the phenotype will also be constant across the range of E (assuming that the shape, but not dispersion, of the distribution remains constant across the range of E). Nevertheless, a clear fan-spreading effect is observed, incorrectly implying that the PGS is increasingly penetrant at higher levels of E. The right panel represents an analogous scenario with respect to a single genetic variant that moderates the variance of the phenotype (i.e a vQTL [3, 4, 5, 6, 7]). In this scenario, the environment is associated with a constant proportion of variance in the phenotype across the range of the genotype, but appears to become increasingly predictive of the phenotype as the effect allele count increases as a result of heteroscedasticity. By directly modeling this heteroscedasticity, we are able to distinguish between scenarios such as these-in which GxE is an epiphenomenon of variance modulation-from those representing more meaningful patterns of GxE.
We develop and validate a flexible heteroscedastic regression model for testing GxE. We apply this model elucidate to the well-studied interaction between birth cohort and genetics linked to body mass index (BMI). We test whether both PGS-by-Birth Year and SNP-by-Birth Year interactions on BMI can be attributed to more general increases in the total dispersion of BMI over historical time. That is, we ask whether birth year is associated specifically with amplification of genetic risk for BMI or more generally with amplification of all variation in BMI.

Overview of Method
In settings wherein focus is on a measured genotype, as indexed by a single-locus allele count or a polygenic score (PGS), the standard regression model for GxE is where Y i is the phenotype for person i, G i is the measured genotype, E i is the measured environment, and e i is an error term. Under this model, the effect of G i on Y i is allowed to vary as a linear function of E i , as indexed by the regression coefficient β 3 . However, a crucial assumption is that the residuals, e i , are homoscedastic across all levels of E i (e.g., e i ∼ N(0, σ 2 )). The standard regression model given by Eqn 1 can be expanded so as to relax the assumption of homoscedasticity of residuals across E i by specifying the following environmental heteroscedasticity Figure 1: Hypothetical examples of an Environmental moderator (left) or a genetic moderator (right) of total variance in a phenotype producing the impression of GxE. Left: Residuals of the regression of the Phenotype on the Measured Environment are heteroscedastic. high, average, and low polygenic scores (PGSs) are associated with constant percentiles of the phenotype at any given location along the x axis. However, because variance of the phenotype expands across the range of the Measured Environment, the PGS accounts for increasing unstandardized variance across this range. Right: A variance quantitative trait locus (vQTL) in which the effect allele is associated with greater variance in the phenotype. Unstandardized scores on the phenotype that are associated with high, average, and low levels of the measured environment (E) become more distinct with increasing number of effect alleles. However, because the total variance in the phenotype expands across the x axis, the percentile locations of these scores within each genotype (0, 1, or 2) is constant across all levels of the genotype.
Here, τ 1 is the main effect of the measured environment, π 0 is the main effect of G i , π 1 is the GxE analogue to β 3 in Eqn 1, λ 0 is the main effect of the error i (which we assume, without loss of generality given λ 0 , to have unit variance), and λ 1 is the coefficient used to index heteroscedasticity. Note that the heteroscedasticity term E i · i takes a form parallel to the GxE term, G i · E i . The heteroscedasticity model is a flexible model allowing for both conventional GxE and heteroscedasticity. In order to test whether a more general pattern of heteroscedasticity as a function of E i is sufficient to account for the observed GxE, we propose a restricted form version of this model that does not directly model GxE but instead allows for the total variance of the phenotype to vary as a function of E i . This scaling model represents a scenario wherein GxE-in the sense of a significant estimate of β 3 in Eqn 1-arises as a function of differences in the total variance of Y i as a function of E i . We choose this nomenclature given that the model emphasizes the importance of the scale of the outcome. Under this model, the proportional contribution of G i is constant over the range of E i , but the scale of Y i systematically varies across the range of E i ; i.e., E i acts as a "dimmer" [8].
The scaling model for moderation of the variance of Y i as a function of E i takes the form Here, Y i is an unobserved factor representing unexplained variation in Y i incremental of E i . The b 0 + b 1 E i coefficient on the Y term produces heteroscedasticity in Y i as a function of E i . We let where G is a measured genotype standardized to have unit variance and i is an unobserved error term (we assume it is also scaled to have unit variance). Finally, we identify the units of Y by specifying e = √ 1 − h 2 . The penetrance of the measured genotype is thus controlled via the relative magnitudes of h and e. Note, in particular, that in terms of Y i , the penetrance of the measured genotype is constant. The test that we introduce relies on this property; i.e., when properly scaled, the penetrance of the measured genotype to Y i is constant. Suppose that b 1 = 0. In that case, Y i is affected solely by environmental and genetic main effects. If b 1 = 0, then the role of Y i with respect to Y i varies as a function of E, and as we elucidate next, so do the raw (but not relative) contributions of G i and i to Y .
In Section A of the supplemental information (SI), we show that the environmental heteroscedasticity model reduces to the environmental scaling model when We use this observation to derive the test statistic ξ E and associated hypothesis test of whether an empirical estimate of GxE is distinguishable from the scaling model as We provide parallel heteroscedasticity and scaling models for moderation of the variance of Y i as a function of G i (leading to an analogous test statistic, ξ G ) in Section A.3 of SI.
We conduct a variety of simulation studies (see Section B of SI). In summary, results of the simulation study suggest that ξ E is a reliable indicator of whether the scaling model is the basis for GxE. When a test of the statistic fails to reject H 0 : ξ E = 0 and there is significant GxE (π 1 ) observed, we cannot rule out that the scaling model is driving observed GxE. When the test suggests rejection of H 0 : ξ E = 0, alternative forms of GxE are implicated. Software to estimate the models considered here is described in Section C of SI.

Heteroscedasticity in context of BMI-linked genetics and birth year
We apply our heteroscedastic GxE model to the well-studied interaction between birth year and genetics linked to body mass index (BMI) [9, 10, 11, 12]. 1 We test whether previous reports of increasing penetrance of PGS for BMI for later-born birth cohorts can be plausibly attributed to more general increases in the total dispersion of BMI, including variation unique of its genetic etiology, across birth cohorts [14,15]. We thus ask whether birth year is associated specifically with amplification of genetic risk for BMI or more more generally with amplification of all variation in BMI. We then go on to apply our heteroscedastic GxE model to examine individual locus-by-year interactions for BMI. To illustrate how our approach can be used to conduct SNP-level analyses, we apply our technique using the top hits from a recent BMI GWAS [16] to investigate whether they are associated with heteroscedastic GxE.

Polygenic Score Analysis
We consider GxE in the context of a long-running cohort study-the Health and Retirement Study (HRS) [17]-of US adults over 50 and their spouses (N=11,586). Respondents are interviewed every two years and data are collected on their early-life, anthropometrics, socioeconomic status, and mental and physical well-being. Previous examinations of increases in the penetrance of the BMI polygenic score across birth year have used this data [9, 10, 11]. We use a polygenic score for BMI [16] as constructed by the HRS [18]; further description of the data are in D of SI.
Results are presented in Table 1. We first analyze all respondents together and then conduct separate analyses by sex. In all cases, we observe considerable evidence for stronger prediction of BMI by the PGS for more recently born individuals (i.e., π 1 > 0); this is consistent with earlier findings indicating secular increases in the penetrance of a BMI PGS [9, 10, 11]. However, we also observe considerable evidence for non-PGS variance in BMI to increase with birth year (i.e., λ 1 > 0), raising the possibility that the obtained GxE may be attributable to a more general pattern of increasing variance in BMI with birth year. In sex-pooled analyses, probabilities associated with tests of H 0 are ≈ 0.02. However, findings vary slightly by gender; especially for males, we are unable to reject the null of the scaling model. Table 1: Estimates from parameters of the full heteroscedasticity model (Eqn 42) in analysis of GxE for BMI as a function of birth year in the HRS. The ξ E and ξ G estimates reported are obtained from the environmental and genetic heteroscedasticity models, respectively. We show probabilities for parameters when the maximal probability in a column is larger than 1e − 6. To better illustrate the differences in the sex-stratified analyses, we consider a graphical analysis inspired by the notion of posterior predictive checks [19]. Using estimates from the empirical data, we construct simulated data sets using Eqn 14 and Eqn 1 and the HRS data. We then consider the predicted percentage of phenotypic variance explained by the genetic predictor across birthyears. Based on Eqn 9, we compute this percentage as Results of this graphical analysis are in Figure 2. For males (top row), the empirical data (in red) produce a pattern that falls well within the distribution of results from data simulated from the scaling model, but departs more appreciably from results from data simulated under the conventional homoscedastic GxE model. The proportional contribution of the PGS to BMI does not significantly increase with birth year for males. For females, there is more change in penetrance than would be anticipated under the scaling model but less change than would be anticipated under the conventional homoscedastic GxE model. As compared to males,the proportional contribution of the PGS to BMI significantly increases with birth year for females. Thus, for females, the full heteroscedasticity model, in which the contributions of PGS and non-PGS factors independently shift with birth year, may be the most appropriate analytic tool. We also illustrate the potential for genetic heteroscedasticity using the genetic heteroscedasticity model (Eqn 37). We observe evidence for moderation of residual variance in BMI by the BMI PGS, as indicated by the λ 2 parameter. Interestingly, this result holds even for the Box-Cox transformed version of BMI, suggesting that the heteroscedasticity is not entirely an artifact of mean-variance associations induced by the skewed distribution of BMI. One interpretation of these results is that the polygenic score for BMI also captures some degree of "plasticity" (in the sense of [20]). Results of the ξ G tests (see final column of Table 1) are similar to those of the ξ E in that we cannot reject the null of the scaling model for the male respondents.

SNP Analyses
We conducted heteroscedastic regression analyses for 96 marker SNPs for the genome-wide significant loci identified in [16] using independent data from N=380,605 participants from UK Biobank (UKB; not used in the [16] GWAS). Replication results for the main effects can be found in Table  F.2. Note that the UKB is highly non-representative [21] potentially affecting the generalizability of our results.
The environmental heteroscedasticity parameter (λ 1 ) was positive and significant in all models, suggesting greater variance in BMI among those born later in the 20th century. With the full heteroscedasticity model, we identified four SNPs that exhibited significant gene-by-birth year effects, Figure 2: Gray lines represent genetic penetrance as a function of birth year simulated based on either the scaling model (Eqn 14, on left) or the standard homoscedastic GxE model (Eqn 1, on right). Red lines represent genetic penetrance as a function of birth year estimated from the real data using the heteroscedastic regression model (Eqn 9). Analyses based on standardized BMI data.
as indicated by the π 1 parameter at the Bonferroni-adjusted alpha level. Parameter estimates from the full heteroscedasticity model for these SNPs are reported in Table 2, along with ξ E and ξ G estimates obtained from the environmental and genetic heteroscedasticity models, respectively. Note that rs1558902 is a variant in the FTO gene locus. This finding replicates earlier results [22] which reported increasing penetrance of a variant with FTO over historical time using a different data set. However, ξ G was highly significant for each of these four SNPs, allowing us to reject the null that observed GxE is driven entirely by scaling.
With the full heteroscedasticity model, we identified six SNPs that exhibited significant effects on the dispersion of BMI, as indicated by the λ 2 parameter at the Bonferroni adjusted alpha level. Parameter estimates for these SNPs are reported in the supplement (Table F.1), along with ξ E and ξ G estimates obtained from the environmental and genetic heteroscedasticity models, respectively. Of the 4 Bonferonni significant variants for SNP-by-Year interaction described above, only the FTO variant (rs1558902) appears among the 6 Bonferonni significant variants for vQTL. This finding replicates earlier results [3] reporting that the FTO gene locus is associated with greater variability in BMI. The p value of ξ G for this variant is 5.1e-07, indicating the the abovereported SNP-by-birth-year interaction on BMI for FTO is not simply attributable to a general modulation of variance by rs1558902. Note that although the ξ G statistic is not significant for rs7903146, rs9925964, and rs2287019, these SNPs do not exhibit GxE at Bonferonni-corrected levels, rendering the ξ G moot.
To examine the total evidence of signal for GxE and genetic heteroscedasticity in the 96 SNPs, we provide a QQ plot of -1og10(p) for parameters indexing SNP-controlled heteroscedasticity of BMI vQTL and gene-by-birth year GxE interaction for Box-Cox transformed BMI in the left panel of Figure 3. We observe considerable departure of the observed -1og10(p) values relative to the expectation under the null, indicating strong enrichment of signal for both vQTL and GxE. In the right panel of Figure 3, we provide the scatter plots of parameter estimates for SNP main effects (π0), GxE (π 1 ), for vQTL (λ 2 ) for Box-Cox transformed BMI. We observe moderate-to-strong associations between the main effect of a SNP and both its GxE and vQTL effects, indicating that, among these variants selected on the basis of marking genome-wide significant loci for BMI in an external meta-analysis, those with larger main effects on BMI exhibit larger interactions with birth year and more pronounced moderation of BMI variance. Results of [6] suggest that similar patterns may apply genome-wide.
Our results indicate that SNP moderation of birth year effects on BMI and SNP moderation of BMI dispersion were in the same direction as the SNP effects on BMI means. We observe considerable sign congruence between SNP main effects (π 0 ) and SNP-by-Birth-Year interaction (π 0 ) for 3 out of 4 SNPs exhibiting Bonferonni significant π 1 estimates and for 71 out of 96 of all SNPs tested. We also observe considerable sign congruence between SNP main effects (π 0 ) and vQTL effects (λ 2 ) for 6 out of 6 SNPs exhibiting Bonferonni significant λ 2 estimates, and for 75 out of 96 of all SNPs tested.
Finally, we note that the main effect of birth year (τ 1 ) was approximately −0.058 in all models (all associated probabilities less than 1e-250), indicating that later-born UK Biobank participants tend to have lower BMI. Given substantial epidemiological evidence from representative samples indicating increasing BMI with birth year, we speculate that this negative association may be driven by selection bias in the UKB sample [21]. However, note that, despite decreasing mean BMI with birth year in this sample, we observe increasing variance in BMI with birth year (as indicated by the positive λ 1 parameter), and increasing penetrance of SNPs with birth year (as indicated by strong sign congruence between π 0 and π 1 parameters). This further suggests that variance moderation and SNP-by-Birth Year interactions are unlikely to have resulted from positive associations between BMI mean and variance. Table 2: Estimates from parameters of the full heteroscedasticity model (Eqn 42) in analysis of GxE for Box-Cox transformed BMI as a function of birthyear in the UK Biobank for those SNPs with significant π 1 estimates following Bonferonni adjustment. The ξ E and ξ G estimates reported are obtained from the environmental and genetic heteroscedasticity models, respectively. We show probabilities for parameters when the maximal probability in a column is larger than 1e − 6.

Discussion
When GxE is detected using the conventional regression-based model, it is common to interpret the interaction in terms of the specific genetic and environmental measures included in the interaction model. However, GxE may arise more generally when the genetic or environmental predictor moderates the total amount of variation in the phenotype. When the total amount of variation in a phenotype varies across the range of an environmental measure, traditional homoscedastic GxE models will detect interactions between that measure and all other correlates of the phenotype, both genetic and non-genetic. Here, we have delineated an expanded heteroscedastic GxE regression model that explicitly allows for heteroscedasticity in the phenotype as a function of the genetic and environmental measures. We use this model to derive a test statistic, ξ, which compares this We provide an application of the heteroscedastic regression model to empirical data, replicating previous observations that that genetic predictors of BMI have become increasingly penetrant in recent years. However, we additionally observed that the total variance in BMI, including variation not associated with the genetic predictors, increased with participant birthyear. We found that, amongst males within the sample, the interaction between PGS and birth year was indistinguishable from a scaling model in which the total variance in BMI increases over birth year. For males, although the unstandardized variance in BMI accounted for by the PGS increased with birth year, the variance in BMI unaccounted for by the PGS also increased, such that the proportion of variance accounted for in BMI did not significantly vary with birth year. Other work has considered the potential moderation of genetic risk for BMI as a function of diet [23, 24], other lifestyle characteristics [25], and educational attainment [26]. The issues discussed here may be a relevant feature of such empirical work; i.e., environmentally-linked heteroscedasticity may play a role in driving previously observed GxE findings.
We have considered situations in which the dispersion of the phenotype systematically varies as a function of the genetic and environmental measures included in the model. However, other issues surrounding the scaling of the phenotype also have the potential to have substantive effects on GxE results. For instance, skew in the distribution of the phenotype may produce both GxE and heteroscedasticity that would vanish under a transformation. This issue has been vexed geneticists since the days of Fischer [27], and is yet to be fully resolved. In our empirical analysis of BMI, we report results of GxE analyses for both the original scale and a normalizing (Box-Cox) transform of BMI and obtain similar results. In situations in which the phenotype is measured by self-reported items, as is often the case in large-scale epidemiological research, psychometric issues may magnify this problem.
The methods introduced here can be viewed as complementary to those based in other genetic research designs, such as twin/family [1] and genome-wide molecular methods [2] that focus on GxE in the form of differences in the magnitude of of genetic variance or (SNP) heritability across the environmental range. The techniques developed here could be incorporated along with other recently developed GxE techniques [28], as well as techniques to further relax assumptions regarding the parametric form of the likelihood [29,30], to further enhance the robustness of future GxE work.
GxE research faces a number of conceptual and statistical challenges [8]. Thoughtfully and rigorously engaging with these challenges is particularly salient given the substantial recent increases in the availability of both genetic resources [31] and computational tools [32] for such research. Our proposed test is designed to help clarify the nature of potential GxE findings and to shed additional light on the processes contributing to variation in the phenotype; in particular, we can assess the extent to which the genetic etiology is relatively stable across the range of the environmental measure. Using the model and test statistic presented here, researchers can test for whether variance in the phenotype that is not explained by the genetic predictor shifts across the range of the environment and whether a scaling model may account for the obtained pattern of GxE. When the scaling model holds, we would suggest that the environment largely acts as a "dimming" mechanism on phenotypic variation [8] without altering the proportional contribution of genetic variation to the phenotype.
Genotype-covariate correlation and interaction disentangled by a whole-genome multivariate reaction norm model. Nature communications, 10(

Supplemental Information (SI)
A Methods

A.1 Description of the Heteroscedastic Regression Model and Derivation of ξ
The environmental heteroscedasticity model is specified as Here, τ 1 is the main effect of the measured environment, π 0 is the main effect of G i , π 1 is the GxE directly analogous to β 3 in Eqn 1, λ 0 is the main effect of the error i (which we assume, without loss of generality given λ 0 , to have unit variance), and λ 1 is the coefficient used to index heteroscedasticity. This model will produce very similar unstandardized estimates of GxE as the standard GxE model given by Eqn 1. In other words, π 1 in Eqn 9 is directly comparable to β 3 in Eqn 1. However, the heteroscedasticity model further models whether variance in the phenotype that is not accounted for by the genetic predictor (G i ) also varies as a function of the environment (E i ) in the form of an E i i interaction, as captured by the λ 1 parameter. Importantly, when G i only index portions of total genetic etiology of a trait [33, 34, 35], the E i i interaction may result from GxE with respect to the genetic factors not indexed by Gi.
In order to test whether a more general pattern of heteroscedasticity as a function of E i is sufficient to account for the observed GxE, we propose a restricted form version of this model that does not directly model GxE but instead allows for the total variance of the phenotype to vary as a function of E i . This scaling model represents a scenario wherein GxE, in the sense of a significant estimate of β 3 in Eqn 1, arises as a function of differences in the total variance of Y as a function of E i . The scaling model for moderation of the variance of Y i as a function of E i takes the form Here, E i is a measured environment and Y i is an unobserved factor representing unexplained variation in Y i incremental of E i . We introduce a scaling transformation that produces heteroscedasticity in Y i as a function of E i via the b 0 + b 1 E i coefficient on the Y term. We let where G is a measured genotype standardized to have unit variance and i is an unobserved error term (we assume it is also scaled to have unit variance). Finally, we set the scale of Y by specifying e = √ 1 − h 2 . The penetrance of the measured gentoype is thus controlled via the relative magnitudes of h and e. Note, in particular, that in terms of Y i , the penetrance of the measured genotype is constant. The test that we introduce relies on this property; i.e. that when properly scaled, the penetrance of the measured genotype to Y i is constant in this model. Suppose that b 1 = 0. In that case, Y i is affected solely by environmental and genetic main effects. If b 1 = 0, then the role of Y i with respect to Y i varies as a function of E, and as we elucidate next, so do the raw (but not relative) contributions of G i and i to Y .
Substituting Eqn 11 into Eqn 10, and setting e = √ 1 − h 2 , yields It can be seen that the environmental scaling model (Eqn 14) constitutes a constrained form of the environmental heteroscedasticity model (Eqn 9). Both equations take the same form in terms of E and G. However, whereas the coefficients in Eqn 9 are unconstrained, these coefficients are linked to one another in Eqn 14 via the the ratio h/e which, as noted above, is fixed under the scaling model. Thus, Eqn 9 is equivalent to Eqn 14 when We use Eqn 15 as the basis for a hypothesis test. In particular, we consider the test statistic When ξ E = 0, the environmental heteroscedasticity model is indistinguishable from the simpler scaling model, whereas when ξ E is differs from 0, the environmental scaling model is rejected.

A.2 Estimation
We estimate Eqn 9 using maximum likelihood by assuming i ∼ Normal(0, 1).We assume that Y is mean-centered and that both G i and E i are non-random.
since we assume V( ) = 1. If we write note that both E(Y i ) and V(Y i ) are functions of θ. Estimation of Eqn 9 is based on the likelihood, where φ is the normal density and we use the above expressions for E(Y i ) and V(Y i ). We can thus solve for θ = {τ 1 , π 0 , π 1 , λ 0 , λ 1 } using ML based on In particular, we obtain estimators via To obtain ML estimates, it is convenient to also have the gradient. Here, we have Using these estimates, we can compute a test statistic. Replacing the values in Eqn 16 with estimates from Eqn 9, we propose to test Given that we are testing a single restriction, W has a χ 2 distribution with one degree of freedom in large samples. To estimate V(ξ E ), we use the Hessian of Eqn 24 (see E) and the delta method. Computationally, we do this via [37]. Note also that conventional tests of π 1 and λ 1 can be used to offer insight into the existence of GxE and potential heteroscedasticity; we make use of such tests below.

A.3 Extensions
A version of the heteroscedasticity model presented in Eqn 9 above can be constructed so as to relax the assumption of homoscedasticity of residuals across G i as follows Here λ 2 is the coefficient used to index heteroscedasticity as a function of G i . We assume homoscedasticity of residuals as a function of E i by omitting λ 1 . We refer to Eqn 37 as a genetic heteroscedasticity model. When a single variant is entered for G, this model consitutes a vQTL model that includes a term for GxE. Note that this model is equivalent to the model presented in Eqn 9, but with G i entered for E i and E i entered for G i . The scaling model for moderation of the variance of Y i as a function of G i takes the form and As in the scaling model for moderation the variance of Y i as a function of E i , i is assumed to have unit variance. In terms of Y i , the effect of E i constant across G i . However, in terms of Y i , the effect of E i changes across G i as result of heteroscedasticity across G i . It follows that the hypothesis test comparing the GxE model allowing for heterocedasticity as a function of G i (given by Eqn 37) to its reduced scaling model (given by Eqns 38 and 39, is Finally, we can fit a full heteroscedasticity model that allows for moderation of residual variance in Y i as a function of both G i and E i as follows This model may be particularly useful when there is nonzero gene-environment correlation between the two moderators, i.e. | r(G i , E i ) |> 0, such that G i and E i can be mutually controlled when modelling heteroscedasticity. However, to avoid added complexity, we recommend that comparisons with the scaling models rely on the ξ E and ξ G tests from the separate environmental heteroscedasticity and genetic heteroscedasticity models, respectively.

B Simulation Study
We now consider both the null distribution of the test statistic under different configurations of the relevant parameters from the scaling model (Eqn 14) and the performance of our test statistic in the case of generating models that do not conform to the scaling model (i.e. when Eqn 1 is used to generate a model with GxE and homoscedastic residuals, or Eqn 9 is used to general a model with heteroscedastic residuals but no GxE). We primarily describe the simulation models in the context of the heteroscadasticity and scaling models in which the measured environment, E i moderates the residual variance of the phenotype, Y i , as given by Eqns 9 and 14. However, as described in A.3, these models are equivalent to those in which the measured phenotype, G i moderates the residual variance of Y i (after G i and E i are substituted for one another). Thus simulation results apply to both scenarios.

B.1 Distribution of the test statistic under H 0
We first consider the behavior of the test statistic when the null hypothesis is true. We generate data according to the scaling model (Eqn 14), estimate the heteroscedasticity model (Eqn 9), and then calculate the ξ E test statistic and associated Wald-based probability. We start by illustrating that the appropriate transformation of ξ E has a χ 2 distribution. To do this, we generate 10,000 datasets from the scaling model (Eqn 14) with b 0 = 1, b 1 = 0.15, h = 0.5, N = 5000. We compute ξ E and then transform it to W via Eqn 36. Results are in Figure B.1. The null distribution of W neatly aligns with the density of χ 2 (1). Below, we report on ξ E given that it is a relatively intuitive function of the estimated parameters but also exploit the fact that ξ E can be mapped to W and inference can be done via reference to χ 2 (1). We next considered the behavior of ξ E under the null, for configurations of the relevant parameters in the scaling model (Eqn 14) meant to capture the relevant features of an analysis in which G is a PGS. We considered variation in four parameters. We used three different sample sizes (1000, 5000, 10000). We used two values respectively for b 0 (0.8, 1) and b 1 (0.05, 0.1). Finally, we used two values for h (0.1 and 0.5). The values of h suggest weakly (0.1) to strongly (0.5) penetrant genetic predictors; they were chosen to represent the range of potential PGS predictors given existing technology [33]. Recall that the value of e is fixed as a function of h. Note that we considered two different distributions for E; we sampled E from both the standard normal and the uniform distributions (where the uniform distribution was scaled to have the same standard deviation as the standard normal). As polygenic scores for complex traits are expected to be normally distributed [38], G was sampled from the standard normal distribution. To help develop intuition, suppose b 0 = 0.8, b 1 = 0.05, and h = 0.5. In that case, V(y|E = 2) = 0.736 compared to V(y|E = −2) = 0.422, a ratio of 1.74 (i.e., there is heteroscedasticity). Finally, it is important to note that the scale of ξ E is dependent on several features (e.g. scale of G and E); patterns shown here are informative about the behavior of ξ E but the values themselves do not necessarily generalize to other cases (e.g., if E or G does not have unit variance). In contrast, W and associated tests are invariant to scaling of G and E, but sensitive to sample size.
Results are shown in Table B.1. Note that the estimates of ξ E are unbiased (i.e., the means are nearly zero). Further, as we would anticipate, the false positive rate (FPR) of the test is insensitive to the parameters; note also that the mean probability associated with the test of ξ E via Eqn 36 is near 0.05, as we would expect when the null is true. 2 In contrast, the variance of the test statistic (and, thus, statistical power) depends on the sample size and simulation parameters. In particular, note the decline in the SD of the test statistics as a function of the sample size. Statistical power will also depend upon h; this test will have less power with less penetrant genetic predictors.  We next considered the behavior of of ξ E under the null for configurations of parameters meant to capture the relevant features of an analysis in which G is an allele count for a single locus (e.g. a SNP). Results are in Table B.2. The table is similar to Table B.1, but where we now consider allele counts based on variants simulated under a binomial distribution with different minor allele frequencies (denoted p). Based on Extended Data Figure 2 in [39], we chose effect sizes of h 2 ∈ {.05%, .1%}. Note that after generating G under a binomial distribution, it was standardized, such that h can be interpreted as a standardized effect size. We examine behavior using a sample size of N = 10000, which might be used for confirmatory tests of a select number of variants previously identified in a larger independent GWAS, and a sample size of N = 1000000, which might be used for genome-wide discovery. Simulation results indicate that the heteroscedasticity model and the ξ E statistic behave as expected across the range of conditions examined. Table B.2: Distribution of ξ E under the null (mean and SD) and associated test (via Eqn 36) for different values of parameters in Eqn 14 based on a standardized single-locus allele count. For each set of parameters, 100 datasets are simulated. The environmental variable is sampled from the standard normal distribution.

B.2 Performance under alternatives
We now turn to study the behavior of the heteroscedasticity model and associated ξ E test statistic when the null hypothesis is false (i.e., when Eqn 14 is not the data generating model). We consider two different data generating models: (i) a model with GxE and homoscedastic residuals, generated according to Eqn 1 (ii) a model with no GxE but with heteroscedastic residuals, generated according to Eqn 9. In these analyses, we aim to probe whether the test statistic allows us to reject H 0 under different configurations of the relevant data-generating parameters. We first study a circumstance in which the generating model is GxE with homoscedastic residuals. We use Eqn 1 for our data-generating model and suppose that e i is white-noise (i.e., e i ∼ N(0, σ 2 )) as opposed to heteroscedastic as a function of E i . In particular, we set σ 2 = 1. We focus on two tests: a test of whether π 1 = 0 and a test of whether ξ E = 0. Data are analyzed with the environmental heteroscadasticity model (Eqn 9, which subsumes Eqn 1).
Results are shown in Table B.3. When both N and b 3 are relatively small, we are poorly powered to detect GxE and to reject the scaling hypothesis (i.e., both Pr π1 and Pr ξ E are relatively large). Importantly, when GxE is not detected, the ξ E test is moot. In contrast, power to detect GxE and to reject scaling are both excellent when N or b 3 is relatively large. Throughout we reject λ 1 = 0 for α = .05 at the appropriate FPR. We next consider the circumstance in which there is no GxE, but there are G and E main effects, and there is heteroscedasticity as a function of E so as to parallel Eqn 21. In particular, we generate data via with i ∼ N(0, (λ 0 + λ 1 E i ) 2 ). We use λ 0 = 2 and vary λ 1 between 0.15 and 0.3 (if λ 1 = 0, there is no heteroscedasticity). We focus on tests of π 1 = 0, λ 1 = 0, ξ E = 0. Results are shown in Table B.4. First, note that the appropriate FPR of .05 is obtained for π 1 = 0 at alpha=.05, irrespective of λ 0 and λ 1 . This indicates that heteroscedasticity has not induced the impression of GxE under these conditions. We can also uniformly detect that λ 1 = 0 suggesting that errors are in fact heteroscedastic. However, tests of ξ E are relatively weak here (e.g., power is 0.2 at best). Importantly, as the tests of π 1 appropriate indicate no GxE, the question of whether there is scaling GxE is moot.

C Software
Code to estimate the heteroscestacity model and to estimate ξ E is available in an R package. 3 In particular, the documentation illustrates functionality via simple simulations similar to those found in Section B.

D.1 HRS Data
The first wave of HRS data collection was in 1992, the most recent in 2018 (wave 13). We focus on N=11,586 respondents of European ancestry. Descriptive information for the environment and outcome are shown in Figure D.1. We use data from the RAND HRS files on BMI, see [40] for additional documentation. In particular, we consider the mean BMI over all waves. The mean BMI was skewed, so we also considered a transformed version (see Figure D.1). HRS respondents were born across a wide span of birthyears centered around 1940. To ensure that results aren't driven by small numbers of observations with extreme birth years, we analyzed those born between 1920-1965. All data (polygenic score, birthyear, and BMI) are standardized to have mean zero and unit variance for analysis. . We standardize respondent's average BMI taken over all waves; this mean BMI is skewed (skew equals 1.14). This is a common finding with respect to BMI (see, for example, the Figure in [14]). Thus, we also consider a version of this variable in which it has been transformed, via the Box-Cox transformation, to closely approximate a normal distribution. All variables (birthyear, polygenic score, and outcome) are standardized (to have zero mean and unit variance) prior to analysis. Bottom: Illustration of increasing variance in BMI distribution as a function of birth year for analytic sample.

D.2 UKB data
We used data from UK Biobank to conduct a series of SNP-level gene-by-environment association analyses. Genotyping and imputation procedures were performed by the original UK Biobank investigators, as described in a previous publication [41]. 4 In the present study, we applied additional thresholds to minimize potential confounding, removing participants with (i) low call rate, (ii) extreme heterozygosity, (iii) chromosomal aneuploidy, (iv) discordant self-reported and chromosomal sex, or (v) missing data for the relevant outcome and covariates. We further focused our analyses on unrelated participants of non-Hispanic European ancestry to minimize the threat of population stratification. Rather than perform a genome-wide scan of gene-by-environment interactions, we focused on our analyses on loci that have been previously associated with BMI. Specifically, we selected the 97 loci identified in a large genome-wide association study of body mass index that did not include UK Biobank [16]. We successfully identified and extracted the lead SNP for 96/97 (99%) of these loci. For imputed SNPs, dosages were converted to hard calls. Any converted hard calls with a certainty less than .90 were coded as missing.
As some UK Biobank participants had multiple observations for BMI, we computed the mean BMI for each individual. We applied a Box-Cox transformation to reduce skew. Box-Cox transformed BMI was then adjusted for sex, batch, and the first 40 principal components of ancestry, which we estimated ourselves [42]. All variables (SNP, birth year, and body mass index) were standardized to have mean zero and unit variance prior to analysis. The final analytic samples size was N=380,605.