## Abstract

Genotype-by-environment interaction (GxE) occurs when the size of a genetic effect varies systematically across levels of the environment and when the size of an environmental effect varies systematically across levels of the genotype. However, total variance in the phenotype may shift as a function of the moderator irrespective of its etiology such that the *proportional* effect of the predictor is constant. We expand the traditional GxE regression model to directly account for heteroscedasticity associated with both the genotype and the measured environment. We then derive a test statistic, *ξ*, for inferring whether GxE can be attributed to an effect of the moderator on the dispersion of the phenotype. We apply this method to identify genotype-by-birth year interactions for Body Mass Index (BMI) that are distinguishable from general secular increases in the variance of BMI or associations of the genetic predictors (both PGS and individual loci) with BMI variance. We provide software for analyzing such models.

## 1 Introduction

Genotype-by-Environment interaction (GxE) studies test whether the genotype-phenotype association varies in magnitude across the range of a measured environmental variable and whether the environment-phenotype association varies in magnitude across the range of the measured genotype. Investigations of GxE have been of particular interest in the study of complex traits, leading to a variety of methods for estimating polygenic GxE [1, 2]. Development of these analytic approaches coincides with a large increase in the availability of genomic data, particularly in the context of population-based studies that also contain a broad range of phenotypes and environmental measures. Here, we consider circumstances where either the genetic or the environmental measure selected for GxE is associated with the magnitude of variance in the target phenotype. Such heteroscedasticity introduces a key interpretational problem to results from standard GxE approaches: any observed GxE may arise as an artifact of the fact that the total variance in the phenotype shifts across the range of the genetic or environmental moderator, irrespective of etiology. We resolve this problem by expanding the standard regression-based model for testing GxE in both single locus and polygenic score settings by introducing a simple yet flexible approach to directly model heteroscedastic residuals, and we develop a test statistic for inferring whether GxE can be plausibly attributed to a more general effect of the moderator on the dispersion of the phenotype.

All else equal, when the dispersion of a phenotype is directly controlled by the level of the moderator, the standard regression-based model will identify a significant interaction between that moderator and *any* correlate of the phenotype, whether genetic or non-genetic. For example, when the effectiveness, intensity, or dosage of a phenotype-altering intervention varies proportionally to baseline levels of the phenotype (e.g. when hours of individualized education are assigned based on previous year’s test scores; or when intensity of a weight-loss intervention is determined from baseline body mass index), then the variance of the phenotype is expected to increase in response to the intervention, and the unstandardized effect sizes for all correlates of baseline levels of the phenotype are expected to increase. Similarly, when a genetic variant confers greater plasticity in a phenotype, then the variance of the phenotype is expected to differ across alleles (such that the variant is identified as a variance quantitative trait locus, vQTL). We would then expect the variant to moderate the effect sizes for all correlates of that phenotype proportionally to one another. Only when a moderator acts preferentially on specific mechanisms of variation in the phenotype are interactions with various predictors expected to depart from expectations that follow from differences in total variance across the range of the moderator.

Figure 1 illustrates how heteroscedasticity across the range of a measured environmental moderator (E; left panel) or genetic moderator (G; right panel) can produce the impression of GxE. In the left panel, we illustrate a scenario in which a PGS is associated with a constant proportion of variance in the phenotype across the range of the measured environment. That the proportion of variance explained by the PGS at any given location on the x-axis is constant indicates that the expected percentile location within the distribution of the phenotype will also be constant across the range of *E* (assuming that the shape, but not dispersion, of the distribution remains constant across the range of *E*). Nevertheless, a clear fan-spreading effect is observed, incorrectly implying that the PGS is increasingly penetrant at higher levels of E. The right panel represents an analogous scenario with respect to a single genetic variant that moderates the variance of the phenotype (i.e a vQTL [3–7]). In this scenario, the environment is associated with a constant proportion of variance in the phenotype across the range of the genotype, but appears to become increasingly predictive of the phenotype as the effect allele count increases as a result of heteroscedasticity. By directly modeling this heteroscedasticity, we are able to distinguish between scenarios such as these—in which GxE is an epiphenomenon of variance modulation—from those representing more meaningful patterns of GxE.

We develop and validate a flexible heteroscedastic regression model for testing GxE. We use this model to elucidate the well-studied interaction between birth cohort and genetics linked to body mass index (BMI). We test whether both PGS-by-Birth Year and SNP-by-Birth Year interactions on BMI can be attributed to more general increases in the total dispersion of BMI over historical time. That is, we ask whether birth year is associated specifically with amplification of genetic risk for BMI or more generally with amplification of all variation in BMI.

## 2 Methods

In settings wherein focus is on a measured genotype, as indexed by a single-locus allele count or a polygenic score (PGS), the standard regression model for GxE is
where *Y _{i}* is the phenotype for person

*i*,

*G*is the measured genotype,

_{i}*E*is the measured environment, and

_{i}*e*is an error term. Under this model, the effect of

_{i}*G*on

_{i}*Y*is allowed to vary as a linear function of

_{i}*E*, as indexed by the regression coefficient

_{i}*β*

_{3}. However, a crucial assumption is that the residuals,

*e*, are homoscedastic across all levels of

_{i}*E*(e.g., ).

_{i}The standard regression model given by Eqn 1 can be expanded so as to relax the assumption of homoscedasticity of residuals across *E _{i}* by specifying the following

*environmental heteroscedasticity model*,

Here, *τ*_{1} is the main effect of the measured environment, *π*_{0} is the main effect of *G _{i}*,

*π*

_{1}is the GxE analogue to

*β*

_{3}in Eqn 1, λ

_{0}is the main effect of the error

*ϵ*(which we assume, without loss of generality given λ

_{i}_{0}, to have unit variance), and λ

_{1}is the coefficient used to index heteroscedasticity. Rather than simply adjusting standard errors and p values of the standard regression model (Eqn 1) using a robust estimation method, this model explicitly estimates heteroscedasticity as a model parameter. As we articulate next, directly modelling heteroscedasticity allows us to overcome specification bias that would result from estimating a standard homoscedastic GxE model, even with a robust estimator. We are purposeful in choosing a function form for the heteroscedasticity term,

*E*·

_{i}*ϵ*, that directly parallels that specified for the GxE term,

_{i}*G*.

_{i}· E_{i}The heteroscedasticity model is a flexible model allowing for both conventional GxE and heteroscedasticity. In order to test whether a more general pattern of heteroscedasticity as a function of *E _{i}* is sufficient to account for the observed GxE, we propose a restricted form version of this model that does not directly model GxE but instead allows for the total variance of the phenotype to vary as a function of

*E*. This

_{i}*scaling model*represents a scenario wherein GxE—in the sense of a significant estimate of

*β*

_{3}in Eqn 1—arises as a function of differences in the total variance of

*Y*as a function of

_{i}*E*. We choose this nomenclature given that the model emphasizes the importance of the

_{i}*scale*of the outcome. Under this model, the proportional contribution of

*G*is constant over the range of

_{i}*E*, but the scale of

_{i}*Y*systematically varies across the range of

_{i}*E*; i.e.,

_{i}*E*acts as a “dimmer” [8].

_{i}The scaling model for moderation of the variance of *Y _{i}* as a function of

*E*takes the form

_{i}Here, is an unobserved factor representing unexplained variation in *Y _{i}* incremental of

*E*. The

_{i}*b*

_{0}+

*b*

_{1}

*E*coefficient on the

_{i}*Y*

^{⋆}term produces heteroscedasticity in

*Y*as a function of

_{i}*E*. We let where

_{i}*G*is a measured genotype standardized to have unit variance and

*ϵ*is an unobserved error term (we assume it is also scaled to have unit variance). Finally, we identify the units of

_{i}*Y*

^{⋆}by specifying . The penetrance of the measured genotype is thus controlled via the relative magnitudes of

*h*and

*e*. Note, in particular, that in terms of , the penetrance of the measured genotype is constant. The test that we introduce relies on this property; i.e., when properly scaled, the penetrance of the measured genotype to

*Y*is constant. Suppose that

_{i}*b*

_{1}= 0. In that case,

*Y*is affected solely by environmental and genetic main effects. If

_{i}*b*

_{1}≠ 0, then the role of with respect to

*Y*varies as a function of

_{i}*E*, and as we elucidate next, so do the raw (but not relative) contributions of

*G*and

_{i}*ϵ*to

_{i}*Y*.

In Section A of the supplemental information (SI), we show that the environmental heteroscedasticity model reduces to the environmental scaling model when

We use this observation to derive the test statistic *ξ _{E}* and associated hypothesis test of whether an empirical estimate of GxE is distinguishable from the scaling model as

We provide parallel heteroscedasticity and scaling models for moderation of the variance of *Y _{i}* as a function of

*G*(leading to an analogous test statistic,

_{i}*ξ*) in Section A.3 of SI.

_{G}We conduct a variety of simulation studies (see Section C of SI). In summary, results of these simulation studies suggest that *ξ _{E}* is a reliable indicator of whether the scaling model is the basis for GxE. When a test of the statistic fails to reject

*H*

_{0}:

*ξ*=0 and there is significant GxE (

_{E}*π*

_{1}) observed, we cannot rule out that the scaling model is driving observed GxE. When the test suggests rejection of

*H*

_{0}:

*ξ*=0, alternative forms of GxE are implicated. Software to estimate the models considered here is described in Section D of SI.

_{E}## 3 Heteroscedasticity in context of BMI-linked genetics and birth year

We apply our heteroscedastic GxE model to the well-studied interaction between birth year and genetics linked to body mass index (BMI) [9–12], a literal textbook example (see Section 11.3 of [13]). We test whether previous reports of increasing penetrance of PGS for BMI for later-born birth cohorts can be plausibly attributed to more general increases in the total dispersion of BMI, including variation unique of its genetic etiology, across birth cohorts [14, 15]. We thus ask whether birth year is associated specifically with amplification of genetic risk for BMI or more more generally with amplification of all variation in BMI. We then go on to apply our heteroscedastic GxE model to examine individual locus-by-year interactions for BMI. To illustrate how our approach can be used to conduct SNP-level analyses, we apply our technique using the top hits from a recent BMI GWAS [16] to investigate whether they are associated with heteroscedastic GxE.

### 3.1 Polygenic Score Analysis

We consider GxE in the context of a long-running cohort study—the Health and Retirement Study (HRS) [17]—of US adults over 50 and their spouses (N=11,586). Respondents are interviewed every two years and data are collected on their early-life, anthropometrics, socioeconomic status, and mental and physical well-being. Previous examinations of increases in the penetrance of the BMI polygenic score across birth year have used this data [9–11]. We use a polygenic score for BMI [16] as constructed by the HRS [18]; further description of the data are in E of SI.

Results are presented in Table 2. We observe considerable evidence for stronger prediction of BMI by the PGS for more recently born individuals (i.e., *π*_{1} > 0); this is consistent with earlier findings indicating secular increases in the penetrance of a BMI PGS [9–11]. However, we also observe a birth year-linked increase in non-PGS variance in BMI (i.e., λ_{1} > 0), raising the possibility that the obtained GxE may be attributable to a more general pattern of increasing variance in BMI with birth year. In sex-pooled analyses, probabilities associated with tests of *H*_{0} : *ξ* = 0 are approximately 0.02. However, findings vary slightly by gender; especially for males, we are unable to reject the null of the scaling model. To illustrate the differences in implications in our sex-stratified analyses, we also conduct a posterior predictive check analysis [19], see Figure 2; there is less change in the penetrance of BMI PGS across birth years for males as compared to females. Using Eqn 43, we can decompose variation in *π*_{1} estimates. For females and males, 41% (95% CI: 10-57%) and 25% (95% CI: 0-54%) of the GxE (*π*_{1}) effect cannot be attributed to a simple scaling term. In an analysis of all respondents, this proportion is 37% (95% CI: 9-52%). Collectively these results indicate substantial evidence for heteroscedasticity as a function of birth year; the simple homoscedastic GxE model does not represent the data adequately. Further discussion of this analysis and findings related to genetic heteroscedasticity are discussed in the SI.

### 3.2 SNP Analyses

We conducted heteroscedastic regression analyses for 96 marker SNPs for the genome-wide significant loci identified in [16] using independent data from N=380,605 participants from UK Biobank (UKB; not used in the BMI GWAS [16]). Replication results for the main effects can be found in Table F.2. Note that the UKB is highly non-representative [20] potentially affecting the generalizability of our results.

The environmental heteroscedasticity parameter (λ_{1}) was positive and significant in all models, suggesting greater variance in BMI among those born later in the 20th century. With the full heteroscedasticity model, we identified four SNPs that exhibited significant gene-by-birth year effects, as indicated by the *π*_{1} parameter at the Bonferroni-adjusted alpha level. Parameter estimates from the full heteroscedasticity model for these SNPs are reported in Table 1, along with *ξ _{E}* and

*ξ*estimates obtained from the environmental and genetic heteroscedasticity models, respectively. Note that rs1558902 is a variant in the FTO gene locus. This finding replicates earlier results [21] which reported increasing penetrance of a variant with FTO over historical time using a different data set. However,

_{G}*ξ*was highly significant for each of these four SNPs, allowing us to reject the null that observed GxE is driven entirely by scaling.

_{G}With the full heteroscedasticity model, we identified six SNPs that exhibited significant effects on the dispersion of BMI, as indicated by the λ_{2} parameter at the Bonferroni adjusted alpha level. Parameter estimates for these SNPs are reported in the supplement (Table F.1), along with *ξ _{E}* and

*ξ*estimates obtained from the environmental and genetic heteroscedasticity models, respectively. Of the 4 Bonferonni significant variants for SNP-by-Year interaction described above, only the FTO variant (rs1558902) appears among the 6 Bonferonni significant variants for vQTL. This finding replicates earlier results [3] reporting that the FTO gene locus is associated with greater variability in BMI. The p value of

_{G}*ξ*for this variant is 5.1e-07, indicating the the above-reported SNP-by-birth-year interaction on BMI for FTO is not simply attributable to a general modulation of variance by rs1558902. Note that although the

_{G}*ξ*statistic is not significant for rs7903146, rs9925964, and rs2287019, these SNPs do not exhibit GxE at Bonferonni-corrected levels, rendering the

_{G}*ξ*moot.

_{G}To examine the total evidence of signal for GxE and genetic heteroscedasticity in the 96 SNPs, we provide a QQ plot of −1og10(p) for parameters indexing SNP-controlled heteroscedasticity of BMI vQTL and gene-by-birth year GxE interaction for Box-Cox transformed BMI in the left panel of Figure 3. We observe considerable departure of the observed −1og10(p) values relative to the expectation under the null, indicating strong enrichment of signal for both vQTL and GxE. In the right panel of Figure 3, we provide the scatter plots of parameter estimates for SNP main effects (*π*_{0}), GxE (*π*_{1}), for vQTL (λ_{2}) for Box-Cox transformed BMI. We observe moderate-to-strong associations between the main effect of a SNP and both its GxE and vQTL effects, indicating that, among these variants selected on the basis of marking genome-wide significant loci for BMI in an external meta-analysis, those with larger main effects on BMI exhibit larger interactions with birth year and more pronounced moderation of BMI variance. Results of [6] suggest that similar patterns may apply genome-wide.

Our results indicate that SNP moderation of birth year effects on BMI and SNP moderation of BMI dispersion were in the same direction as the SNP effects on BMI means. We observe considerable sign congruence between SNP main effects (*π*_{0}) and SNP-by-Birth-Year interaction (*π*_{0}) for 3 out of 4 SNPs exhibiting Bonferonni significant *π*_{1} estimates and for 71 out of 96 of all SNPs tested. We also observe considerable sign congruence between SNP main effects (*π*_{0}) and vQTL effects (λ_{2}) for 6 out of 6 SNPs exhibiting Bonferonni significant λ_{2} estimates, and for 75 out of 96 of all SNPs tested.

Finally, we note that the main effect of birth year (*τ*_{1}) was approximately −0.058 in all models (all associated probabilities less than 1e-250), indicating that later-born UK Biobank participants tend to have lower BMI. Given substantial epidemiological evidence from representative samples indicating increasing BMI with birth year, we speculate that this negative association may be driven by selection bias in the UKB sample [20]. However, note that, despite decreasing mean BMI with birth year in this sample, we observe increasing variance in BMI with birth year (as indicated by the positive λ_{1} parameter), and increasing penetrance of SNPs with birth year (as indicated by strong sign congruence between *π*_{0} and *π*_{1} parameters). This further suggests that variance moderation and SNP-by-Birth Year interactions are unlikely to have resulted from positive associations between BMI mean and variance.

## 4 Discussion

When GxE is detected using the conventional regression-based model, it is common to interpret the interaction in terms of the specific genetic and environmental measures included in the interaction model. However, GxE may arise more generally when the genetic or environmental predictor moderates the total amount of variation in the phenotype. When the total amount of variation in a phenotype varies across the range of an environmental measure, traditional homoscedastic GxE models will detect interactions between that measure and all other correlates of the phenotype, both genetic and non-genetic. Here, we have delineated an expanded *heteroscedastic* GxE regression model that explicitly allows for heteroscedasticity in the phenotype as a function of the genetic and environmental measures. We use this model to derive a test statistic, *ξ*, which compares this heteroscedasticity model to a simpler *scaling model* in which GxE arises from more general differences in phenotypic variance, irrespective of etiology. When the scaling model holds, differences in phenotypic variance across the range of the genetic or environmental moderator induce a form of GxE that is only apparent in unstandardized units; the proportional contribution of the remaining predictors to phenotypic variance remains constant across the range of the moderator.

We provide an application of the heteroscedastic regression model to empirical data, replicating previous observations that that genetic predictors of BMI have become increasingly penetrant in recent years. However, we additionally observed that the total variance in BMI, including variation not associated with the genetic predictors, increased with participant birthyear. We found that, amongst males within the sample, the interaction between PGS and birth year was indistinguishable from a scaling model in which the total variance in BMI increases over birth year. For males, although the unstandardized variance in BMI accounted for by the PGS increased with birth year, the variance in BMI unaccounted for by the PGS also increased, such that the proportion of variance accounted for in BMI did not significantly vary with birth year. While we reject the null that the scaling model produced data for females, we still estimate that a large proportion of the observed GxE is due to scaling. Other work has considered the potential moderation of genetic risk for BMI as a function of diet [22, 23], other lifestyle characteristics [24], and educational attainment [25]. The issues discussed here may be a relevant feature of such empirical work; i.e., environmentally-linked heteroscedasticity may play a role in driving some previously observed GxE findings.

In instances in which the units and scaling of the phenotype are intrinsically-meaningful, biologically-meaningful, or policy-relevant, GxE that is observed on the unstandardized scale may be of high substantive and practical relevance. However, even when GxE arising from heteroscedasticity is deemed meaningful, it may still be important to discern whether interactions are specific to the variables selected for GxE or representative of a more general pattern that would be obtained for many correlates of the phenotype. It will also stil be important to understand whether the moderator is associated with differences in the percentile location of expected values along the distribution of the phenotype, or simply with differences in raw values (e.g. has a policy specifically increased the income of a particular subgroup, or increased the income of all subgroups, such that relative income of the subgroup of interest remains constant?). The heterosedastic regression models introduced here can be used to answer these questions.

Issues surrounding the scaling of the phenotype not fully explored here have the potential to have substantive effects on GxE results. For instance, skew in the distribution of the phenotype may produce both GxE and heteroscedasticity by inducing a dependency between the mean and the variance of the distribution of the phenotype. Such forms of GxE and may altogether dissappear under the appropriate tranformation, an issue that geneticists have be aware of since the days of Fischer [26]. Importantly, heteroscedasticity is not always simply an epiphenomenon of non-normality. For instance, in our empirical analysis of BMI, we report results of GxE analyses after a normalizing (Box-Cox) transform of BMI, and continue to detect both GxE and heteroscedasticity.

GxE research faces a number of conceptual and statistical challenges [8]. Thoughtfully and rigorously engaging with these challenges is particularly salient given the substantial recent increases in the availability of both genetic resources [27] and computational tools [28] for such research. Our proposed test is designed to help clarify the nature of potential GxE findings and to shed additional light on the processes contributing to variation in the phenotype; in particular, we can assess the extent to which the genetic etiology is relatively stable across the range of the environmental measure. Using the model and test statistic presented here, researchers can test for whether variance in the phenotype that is not explained by the genetic predictor shifts across the range of the environment and whether a scaling model may account for the obtained pattern of GxE. When the scaling model holds, we would suggest that the environment largely acts as a “dimming” mechanism on phenotypic variation [8] without altering the proportional contribution of genetic variation to the phenotype.

## Supplemental Information (SI)

### A Methods

#### A.1 Description of the Heteroscedastic Regression Model and Derivation of *ξ*

The *environmental heteroscedasticity model* is specified as

Here, *τ*_{1} is the main effect of the measured environment, *π*_{0} is the main effect of *G _{i}*,

*π*

_{0}is the GxE directly analogous to

*β*

_{3}in Eqn 1, λ

_{0}is the main effect of the error

*ϵ*(which we assume, without loss of generality given λ

_{i}_{0}, to have unit variance), and λ

_{1}is the coefficient used to index heteroscedasticity.

This model will produce very similar unstandardized estimates of GxE as the standard GxE model given by Eqn 1. In other words, *π*_{1} in Eqn 8 is directly comparable to *β*_{3} in Eqn 1. However, the heteroscedasticity model further models whether variance in the phenotype that is not accounted for by the genetic predictor (*G _{i}*) also varies as a function of the environment (

*E*) in the form of an

_{i}*E*interaction, as captured by the λ

_{i}ϵ_{i}_{1}parameter. Importantly, when

*G*only index portions of total genetic etiology of a trait [1–3], the

_{i}*E*interaction may result from GxE with respect to the genetic factors not indexed by

_{i}ϵ_{i}*G*.

_{i}In order to test whether a more general pattern of heteroscedasticity as a function of *E _{i}* is sufficient to account for the observed GxE, we propose a restricted form version of this model that does not directly model GxE but instead allows for the total variance of the phenotype to vary as a function of

*E*. This

_{i}*scaling model*represents a scenario wherein GxE, in the sense of a significant estimate of

*β*

_{3}in Eqn 1, arises as a function of differences in the total variance of

*Y*as a function of

*E*. The scaling model for moderation of the variance of

_{i}*Y*as a function of

_{i}*E*takes the form

_{i}Here, *E _{i}* is a measured environment and is an unobserved factor representing unexplained variation in

*Y*incremental of

_{i}*E*. We introduce a scaling transformation that produces heteroscedasticity in

_{i}*Y*as a function of

_{i}*E*via the

_{i}*b*

_{0}+

*b*

_{1}

*E*

_{i}coefficient on the

*Y*

^{⋆}term. We let where

*G*is a measured genotype standardized to have unit variance and

*E*is an unobserved error term (we assume it is also scaled to have unit variance). Finally, we set the scale of

_{i}*Y*

^{⋆}by specifying . The penetrance of the measured gentoype is thus controlled via the relative magnitudes of

*h*and

*e*. Note, in particular, that in terms of , the penetrance of the measured genotype is constant. The test that we introduce relies on this property; i.e. that when properly scaled, the penetrance of the measured genotype to

*Y*is constant in this model. Suppose that

_{i}*b*

_{1}= 0. In that case,

*Y*is affected solely by environmental and genetic main effects. If

_{i}*b*

_{1}≠ 0, then the role of with respect to

*Y*varies as a function of

_{i}*E*, and as we elucidate next, so do the raw (but not relative) contributions of

*G*and

_{i}*ϵ*to

_{i}*Y*.

Substituting Eqn 10 into Eqn 9, and setting , yields

It can be seen that the environmental scaling model (Eqn 13) constitutes a constrained form of the environmental heteroscedasticity model (Eqn 8). Both equations take the same form in terms of *E* and *G*. However, whereas the coefficients in Eqn 8 are unconstrained, these coefficients are linked to one another in Eqn 13 via the the ratio *h/e* which, as noted above, is fixed under the scaling model. Thus, Eqn 8 is equivalent to Eqn 13 when
because, under Eqn 13
i.e.

We use Eqn 14 as the basis for a hypothesis test. In particular, we consider the test statistic

When *ξ _{E}* = 0, the environmental heteroscedasticity model is indistinguishable from the simpler; scaling model, whereas when

*ξ*is differs from 0, the environmental scaling model is rejected.

_{E}#### A.2 Estimation

We estimate Eqn 8 using maximum likelihood by assuming *ϵ _{i}* ~ Normal(0, 1).We assume that

*Y*; is mean-centered and that both

*G*and

_{i}*E*are non-random. Consider and . We first have since . We then have since we assume . If we write note that both and are functions of

_{i}*θ*. Estimation of Eqn 8 is based on the likelihood, where

*ϕ*is the normal density and we use the above expressions for and . We can thus solve for θ = {τ

_{1},

*π*

_{0},

*π*

_{1}, λ

_{0}, λ

_{1}} using ML based on

In particular, we obtain estimators via

To obtain ML estimates, it is convenient to also have the gradient. Here, we have where , , and

Using these estimates, we can compute a test statistic. Replacing the values in Eqn 15 with estimates from Eqn 8, we propose to test

If *H*_{0} holds, the underlying form of GxE cannot be distinguished from the scaling phenomenon in which the total variance in *Y _{i}* varies as a function of E. Alternatively put, failure to reject

*H*

_{0}constitutes a failure to reject the hypothesis that the

*proportional*contributions of

*G*and

_{i}*ϵ*to

_{i}*Y*do not vary with E.

_{i}*H*_{0} can be tested via a Wald test for nonlinear restrictions. To test this nonlinear restriction, we consider (see Ref [4], Eqn 6–29)

Given that we are testing a single restriction, W has a χ^{2} distribution with one degree of freedom in large samples. To estimate , we use the Hessian of Eqn 23 (see B) and the delta method. Computationally, we do this via [5]. Note also that conventional tests of *π*_{1} and λ_{1} can be used to offer insight into the existence of GxE and potential heteroscedasticity; we make use of such tests below.

#### A.3 Extensions

##### A.3.1 Decomposition of *π*_{1}

We can decompose the *π*_{1} estimate in Eqn 8 in a way that helps elucidate the contribution of heteroscedasticity to observed GxE. Consider an expanded version of the environmental scaling model (Eqn 9) wherein we add an explicit GxE term:

This expanded scaling model (in conjunction with Eqn 10) is equivalent (same number of free parameters and same model fit) to the environmental heteroscedasticy model, but decomposes the GxE effect into a component that is accounted for by scaling and a component unaccounted for by scaling (note that all environmental heteroscedasticity is assumed to operate through scaling). We can therefore equate Eqn 36 and Eqn 8 and recover the following system of equations:

Using estimates from Eqn 8 for the quantitities on the right, we can solve for the parameters on the left. In particular, we can decompose *π*_{1} into *a*_{2} and *b*_{1}*h*. The term *a*2 represents the degree to which observed data depart from the scaling model. Moreover, we can calculate the ratio
to describe the the magnitude of this departure as a proportion of the total GxE effect.

##### A.3.2 Genetically-linked heteroscedasticity

A version of the heteroscedasticity model presented in Eqn 8 above can be constructed so as to relax the assumption of homoscedasticity of residuals across *G _{i}* as follows

Here λ_{2} is the coefficient used to index heteroscedasticity as a function of *G _{i}*. We assume homoscedasticity of residuals as a function of

*E*by omitting λ

_{i}_{1}. We refer to Eqn 44 as a

*genetic heteroscedasticity model*. When a single variant is entered for

*G*, this model consitutes a vQTL model that includes a term for GxE. Note that this model is equivalent to the model presented in Eqn 8, but with

*G*entered for

_{i}*E*and

_{i}*E*entered for

_{i}*G*.

_{i}The scaling model for moderation of the variance of *Y _{i}* as a function of

*G*takes the form and

_{i}As in the scaling model for moderation the variance of *Y _{i}* as a function of

*E*,

_{i}*ϵ*is assumed to have unit variance. In terms of , the effect of

_{i}*E*constant across

_{i}*G*. However, in terms of

_{i}*Y*, the effect of

_{i}*E*changes across

_{i}*G*as result of heteroscedasticity across

_{i}*G*.

_{i}It follows that the hypothesis test comparing the GxE model allowing for heterocedasticity as a function of *G _{i}* (given by Eqn 44) to its reduced scaling model (given by Eqns 45 and 46, is

Finally, we can fit a *full heteroscedasticity model* that allows for moderation of residual variance in *Y _{i}* as a function of both

*G*and

_{i}*E*as follows

_{i}This model may be particularly useful when there is nonzero gene-environment correlation between the two moderators, i.e. | *r*(*G _{i},E_{i}*) |> 0, such that

*G*and

_{i}*E*can be mutually controlled when modelling heteroscedasticity. However, to avoid added complexity, we recommend that comparisons with the scaling models rely on the

_{i}*ξ*and

_{E}*ξ*tests from the separate environmental heteroscedasticity and genetic heteroscedasticity models, respectively.

_{G}### B The Hessian

We use the Hessian for construction of the covariance matrix of the estimation errors for the model parameters under the null model. The constituent parts of the Hessian are as follows.

### C Simulation Study

We now consider both the null distribution of the test statistic under different configurations of the relevant parameters from the scaling model (Eqn 13) and the performance of our test statistic in the case of generating models that do not conform to the scaling model (i.e. when Eqn 1 is used to generate a model with GxE and homoscedastic residuals, or Eqn 8 is used to general a model with heteroscedastic residuals but no GxE). We primarily describe the simulation models in the context of the heteroscadasticity and scaling models in which the measured environment, *E _{i}* moderates the residual variance of the phenotype,

*Y*, as given by Eqns 8 and 13. However, as described in A.3, these models are equivalent to those in which the measured phenotype,

_{i}*G*moderates the residual variance of

_{i}*Y*(after

_{i}*G*and

_{i}*E*are substituted for one another). Thus simulation results apply to both scenarios.

_{i}#### C.1 Distribution of the test statistic under *H*_{0}

We first consider the behavior of the test statistic when the null hypothesis is true. We generate data according to the *scaling model* (Eqn 13), estimate the *heteroscedasticity model* (Eqn 8), and then calculate the *ξ _{E}* test statistic and associated Wald-based probability. We start by illustrating that the appropriate transformation of

*ξ*has a χ

_{E}^{2}distribution. To do this, we generate 10,000 datasets from the scaling model (Eqn 13) with

*b*

_{0}= 1,

*b*

_{1}= 0.15,

*h*= 0.5,

*N*= 5000. We compute

*ξ*and then transform it to

_{E}*W*via Eqn 35. Results are in Figure C.1. The null distribution of

*W*neatly aligns with the density of χ

^{2}(1). Below, we report on

*ξ*given that it is a relatively intuitive function of the estimated parameters but also exploit the fact that

_{E}*ξ*can be mapped to W and inference can be done via reference to χ

_{E}^{2}(1).

We next considered the behavior of *ξ _{E}* under the null, for configurations of the relevant parameters in the scaling model (Eqn 13) meant to capture the relevant features of an analysis in which

*G*is a PGS. We considered variation in four parameters. We used three different sample sizes (1000, 5000, 10000). We used two values respectively for

*b*

_{0}(0.8, 1) and

*b*

_{1}(0.05, 0.1). Finally, we used two values for

*h*(0.1 and 0.5). The values of h suggest weakly (0.1) to strongly (0.5) penetrant genetic predictors; they were chosen to represent the range of potential PGS predictors given existing technology [1]. Recall that the value of

*E*is fixed as a function of h. Note that we considered two different distributions for

*E*; we sampled

*E*from both the standard normal and the uniform distributions (where the uniform distribution was scaled to have the same standard deviation as the standard normal). As polygenic scores for complex traits are expected to be normally distributed [6],

*G*was sampled from the standard normal distribution. To help develop intuition, suppose

*b*

_{0}= 0.8,

*b*

_{1}= 0.05, and

*h*= 0.5. In that case, compared to , a ratio of 1.74 (i.e., there is heteroscedasticity). Finally, it is important to note that the scale of

*ξ*is dependent on several features (e.g. scale of

_{E}*G*and

*E*); patterns shown here are informative about the behavior of

*ξ*but the values themselves do not necessarily generalize to other cases (e.g., if

_{E}*E*or

*G*does not have unit variance). In contrast, W and associated tests are invariant to scaling of

*G*and

*E*, but sensitive to sample size.

Results are shown in Table C.1. Note that the estimates of *ξ _{E}* are unbiased (i.e., the means are nearly zero). Further, as we would anticipate, the false positive rate (FPR) of the test is insensitive to the parameters; note also that the mean probability associated with the test of

*ξ*via Eqn 35 is near 0.05, as we would expect when the null is true.

_{E}^{1}

In contrast, the variance of the test statistic (and, thus, statistical power) depends on the sample size and simulation parameters. In particular, note the decline in the SD of the test statistics as a function of the sample size. Statistical power will also depend upon h; this test will have less power with less penetrant genetic predictors.

We next considered the behavior of of *ξ _{E}* under the null for configurations of parameters meant to capture the relevant features of an analysis in which

*G*is an allele count for a single locus (e.g. a SNP). Results are in Table C.2. The table is similar to Table C.1, but where we now consider allele counts based on variants simulated under a binomial distribution with different minor allele frequencies (denoted p). Based on Extended Data Figure 2 in [7], we chose effect sizes of

*h*

^{2}∈ {.05%, .1%}. Note that after generating

*G*under a binomial distribution, it was standardized, such that h can be interpreted as a standardized effect size. We examine behavior using a sample size of

*N*= 10000, which might be used for confirmatory tests of a select number of variants previously identified in a larger independent GWAS, and a sample size of

*N*= 1000000, which might be used for genome-wide discovery. Simulation results indicate that the heteroscedasticity model and the

*ξ*statistic behave as expected across the range of conditions examined.

_{E}#### C.2 Performance under alternatives

We now turn to study the behavior of the heteroscedasticity model and associated *ξ _{E}* test statistic when the null hypothesis is false (i.e., when Eqn 13 is not the data generating model). We consider two different data generating models: (i) a model with GxE and homoscedastic residuals, generated according to Eqn 1 (ii) a model with no GxE but with heteroscedastic residuals, generated according to Eqn 8. In these analyses, we aim to probe whether the test statistic allows us to reject Ho under different configurations of the relevant data-generating parameters.

We first study a circumstance in which the generating model is GxE with homoscedastic residuals. We use Eqn 1 for our data-generating model and suppose that *e _{i}* is white-noise (i.e.,

*e*~ N(0,

_{i}*σ*

^{2})) as opposed to heteroscedastic as a function of

*E*. In particular, we set

_{i}*σ*

^{2}= 1. We focus on two tests: a test of whether

*π*

_{1}=0 and a test of whether

*ξ*= 0. Data are analyzed with the environmental heteroscadasticity model (Eqn 8, which subsumes Eqn 1).

_{E}Results are shown in Table C.3. When both *N* and *b*_{3} are relatively small, we are poorly powered to detect GxE and to reject the scaling hypothesis (i.e., both Pr_{π1} and Pr_{ξE} are relatively large). Importantly, when GxE is not detected, the *ξ _{E}* test is moot. In contrast, power to detect GxE and to reject scaling are both excellent when

*N*or

*b*

_{3}is relatively large. Throughout we reject λ

_{1}= 0 for

*α*= .05 at the appropriate FPR.

We next consider the circumstance in which there is no GxE, but there are *G* and *E* main effects, and there is heteroscedasticity as a function of *E* so as to parallel Eqn 20. In particular, we generate data via
with *ϵ _{i}* ~ N(0, (λ

_{0}+ λ

_{1}

*E*)

_{i}^{2}). We use λ

_{0}= 2 and vary λ

_{1}between 0.15 and 0.3 (if λ

_{1}= 0, there is no heteroscedasticity). We fit the environmental heteroscedasticity model (eqn 8) and focus on tests of

*π*

_{1}=0 (which is true under the generating model), λ

_{1}=0 (which is false under the generating model), and

*ξ*=0 (which is false under the generating model).

_{E}Results are shown in Table C.4. First, note that the appropriate FPR of .05 is obtained for *π*_{1} = 0 at alpha=.05, irrespective of λ_{0} and λ_{1}. This indicates that heteroscedasticity has not induced the impression of GxE under these conditions. We can also uniformly detect that λ_{1} =0 suggesting that errors are in fact heteroscedastic. However, tests of *ξ _{E}* are relatively weak here (e.g., power is 0.2 at best). Importantly, as the tests of

*π*

_{1}appropriate indicate no GxE, the question of whether there is scaling GxE is moot.

### D Software

Code to estimate the heteroscestacity model and to estimate *ξ _{E}* is available in an R package.

^{2}In particular, the documentation illustrates functionality via simple simulations similar to those found in Section C.

### E Data

#### E.1 HRS Data

The first wave of HRS data collection was in 1992, the most recent in 2018 (wave 13). We focus on N=11,586 respondents of European ancestry. Descriptive information for the environment and outcome are shown in Figure E.1. We use data from the RAND HRS files on BMI, see [8] for additional documentation. In particular, we consider the mean BMI over all waves. The mean BMI was skewed, so we also considered a transformed version (see Figure E.1). HRS respondents were born across a wide span of birthyears centered around 1940. To ensure that results aren’t driven by small numbers of observations with extreme birth years, we analyzed those born between 1920–1965. All data (polygenic score, birthyear, and BMI) are standardized to have mean zero and unit variance for analysis.

#### E.2 UKB data

We used data from UK Biobank to conduct a series of SNP-level gene-by-environment association analyses. Genotyping and imputation procedures were performed by the original UK Biobank investigators, as described in a previous publication [10].^{3} In the present study, we applied additional thresholds to minimize potential confounding, removing participants with (i) low call rate, (ii) extreme heterozygosity, (iii) chromosomal aneuploidy, (iv) discordant self-reported and chromosomal sex, or (v) missing data for the relevant outcome and covariates. We further focused our analyses on unrelated participants of non-Hispanic European ancestry to minimize the threat of population stratification.

Rather than perform a genome-wide scan of gene-by-environment interactions, we focused on our analyses on loci that have been previously associated with BMI. Specifically, we selected the 97 loci identified in a large genome-wide association study of body mass index that did not include UK Biobank [11]. We successfully identified and extracted the lead SNP for 96/97 (99%) of these loci. For imputed SNPs, dosages were converted to hard calls. Any converted hard calls with a certainty less than .90 were coded as missing.

As some UK Biobank participants had multiple observations for BMI, we computed the mean BMI for each individual. We applied a Box-Cox transformation to reduce skew. Box-Cox transformed BMI was then adjusted for sex, batch, and the first 40 principal components of ancestry, which we estimated ourselves [12]. All variables (SNP, birth year, and body mass index) were standardized to have mean zero and unit variance prior to analysis. The final analytic samples size was N=380,605.

### F Supplemental Resultse

#### F.1 Polygenic score analysis

To better illustrate the differences in the sex-stratified analyses, we consider a graphical analysis inspired by the notion of posterior predictive checks [13]. Using estimates from the empirical data, we construct simulated data sets using Eqn 13 and Eqn 1 and the HRS data. We then consider the predicted percentage of phenotypic variance explained by the genetic predictor across birthyears. Based on Eqn 8, we compute this percentage as

Results of this graphical analysis are in Figure 2. For males (top row), the empirical data (in red) produce a pattern that falls well within the distribution of results from data simulated from the scaling model, but departs more appreciably from results from data simulated under the conventional homoscedastic GxE model. The proportional contribution of the PGS to BMI does not significantly increase with birth year for males. For females, there is more change in penetrance than would be anticipated under the scaling model but less change than would be anticipated under the conventional homoscedastic GxE model. As compared to males,the proportional contribution of the PGS to BMI significantly increases with birth year for females. Thus, for females, the full heteroscedasticity model, in which the contributions of PGS and non-PGS factors independently shift with birth year, may be the most appropriate analytic tool.

We also examined the potential for genetic heteroscedasticity using the genetic heteroscedasticity model (Eqn 44). We observe evidence for moderation of residual variance in BMI by the BMI PGS, as indicated by the λ_{2} parameter. Interestingly, this result holds even for the Box-Cox transformed version of BMI, suggesting that the heteroscedasticity is not entirely an artifact of mean-variance associations induced by the skewed distribution of BMI. One interpretation of these results is that the polygenic score for BMI also captures some degree of “plasticity” (in the sense of [14]). Results of the *ξ _{G}* tests (see final column of Table 2) are similar to those of the

*ξ*in that we cannot reject the null of the scaling model for the male respondents.

_{E}#### F.2 SNP analysis

Tables F.1 and F.2 contain results discussed in main text.

## Acknowledements

The authors would like to thank Dan Benjamin, Dalton Conley, Michel Nivard, Paul Rathouz, Mijke Rhemtulla, and Patrick Turley for helpful comments on an early version of this manuscript. This research was conducted using the UK Biobank Resource (Application No. 36046). This work was supported in part by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1656518 (ST), by the Institute of Education Sciences under Grant No. R305B140009 (ST), by NIH grants R01MH120219, R01AG054628, and R01HD083613 (EMTD), and by the Jacobs Foundation (EMTD). Any opinions expressed are those of the authors alone and should not be construed as representing the opinions of funding agencies. The authors declare no competing interests.

## Footnotes

↵1 We use Pr

_{β}to refer to probability associated with a test of the null hypothesis that some parameter*β*equals zero.↵3 See also https://www.ukbiobank.ac.uk/scientists-3/genetic-data/.