Abstract
Outcomes of interest to demographers—fertility; health; education—are the product of both an individual’s genetic makeup and his or her social environment. Yet Gene × Environment research (GxE) currently deploys a limited toolkit on the genetic side to study gene-environment interplay: polygenic scores (PGS, or what we call mPGS) that reflect the influence of genetics on levels of an outcome. The purpose of the present paper is to develop a genetic summary measure better suited for GxE research. We develop what we call variance polygenic scores (vPGS), or polygenic scores that reflect genetic contributions to plasticity in outcomes. The first part of the analysis uses the UK Biobank (N ∼ 326,000 in the training set) and the Health and Retirement Study (HRS) (N = 10,524) to compare four approaches for constructing polygenic scores for plasticity. The results show that two widely-used methods for discovering which genetic variants affect outcome variability fail to serve as distinctive new tools for GxE. Then, using the polygenic scores that do capture distinctive genetic contributions to plasticity, we analyze heterogeneous effects of a UK education reform on health and educational attainment. The results show the properties of a new tool useful for population scientists studying the interplay of nature and nurture and for population-based studies that are releasing polygenic scores to applied researchers.
1 Introduction
1.1 The growth of using genome-wide measures to study genetic moderation of environments
A wide range of research has shown how outcomes of interest to demographers—e..g, fertility; educational attainment; diseases with marked disparities such as obesity—are influenced by both an individual’s genetic makeup and his or her social environment. In turn, this research program, also called gene × environment (G×E) research, has undergone a large shift in how researchers summarize genetic variation.
Earlier research focused on how single or small sets of genetic variants moderated social environments to affect outcomes. These include studies of how polymorphisms in specific genes like MAOA, or the promoter region of 5-HTTP, moderate social conditions like stressful childhood experiences or parental abuse (e.g., Guo et al., 2008) (for a review, see Seabrook and Avison (2010)).
Two developments led researchers to abandon studying how small sets of genetic variants moderate environments. First was the failure of many single gene G×E studies to replicate (Duncan and Keller, 2011; Keller, 2014; Border et al., 2019). Second was growing evidence that most outcomes of interest to social and behavioral scientists—educational attainment; body mass index (BMI); depression—are “polygenic,” that is, the result of small contributions of many variants across the genome, rather than “monogenic”(Boyle et al., 2017). As a result, researchers have moved away from studying how single genes or small sets of genes moderate environments to using polygenic scores (PGS) that summarize genome-wide contributions. As Section 1.3 shows, PGS have become the workhorse tool that social scientists use when studying genetic moderation of environments. As a result, large social science cohort studies—the Health and Retirement Study (Ware et al., 2018); the National Longitudinal Study of Adolescent to Adult Health (Braudt and Harris, 2020); the Wisconsin Longitudinal Study; the Fragile Families and Child Wellbeing Study—have either already released or are considering releasing polygenic scores alongside their standard survey measures.
The proliferation of polygenic scores as the workhorse tool for studying genetic moderation of social environments raises the question: what genome-wide summary should researchers use? Until now, researchers have developed scores that are meant to predict the conditional mean of an outcome. The goal of the present paper is to expand social scientists’ methodological toolkit by presenting a new summary measure: scores summarizing genetic contributions to plasticity.
The remainder of the introduction proceeds as follows. First, we outline two distinct forms of genetic moderation of social environments (Section 1.2). The first is when the environment’s impact on some outcome depends on that individual’s genetic propensity to attain that same outcome–for instance, pre-K having a larger effect on academic outcomes among children with an already-high genetic propensity towards high educational achievement. We call this form of genetic moderation outcome moderation.
Second is when the environment’s impact on some outcome depends on that individual’s propensity towards variability in an outcome. We can call this form of moderation plasticity. We argue that the majority of GxE research implicitly uses the first model of genetic moderation (outcome moderation). By making these implicit choices explicit, we highlight that little existing research uses tools suited for measuring the second form of genetic moderation.
Next, we outline how variance polygenic scores (vPGS) may capture this second form of genetic moderation (Section 1.4). We show that researchers’ focus on using methods for detecting genetic contributions to plasticity have thus far used the methods to find “top hits,” or a limited set of single nucleotide polymorphisms (SNPs) significantly associated with variability. We show our paper fills a gap by using these methods to construct genome-wide summary measures useful for GxE research, complementing other recent calls for better methods to detect genetic moderation of social environments (Domingue et al., 2020).
1.2 Implicit models of genetic moderation: outcome moderation versus variability
Past typologies of different types of gene-environment interactions have focused on differences in the shape of the interaction (e.g., Boardman et al., 2014; Derringer et al., 2019). For instance, Boardman et al. (2014) and Derringer et al. (2019) each summarize three shapes of interactions: (1) diathesis-stress, where those with both a risky genotype and a highly-stressful environment have adverse outcomes; (2) vantage-sensitivity, where those with a less risky genotype and a low-stress environment have particularly good outcomes; and (3) cross-over or differential suspectibility, where those with a risky genotype have adverse outcomes in high-stress environments but also have some of the best outcomes in supportive environments (Boyce and Ellis, 2005; Ellis et al., 2011). Researchers investigating genetic moderation of environments distinguish between these shapes through both theory and the form the interaction effect takes—for instance, diathesis-stress having a crossover shape.
Yet shape is only one dimension of how genotypes can moderate environments. The second dimension, which occurs regardless of shape, is what changes we predict when the environment changes. Here, we review two types.
1.2.1 Outcome moderation
The first type of interaction is called outcome moderation. Outcome moderation occurs when a social environment either impedes or removes an impediment to the expression of a genetically-influenced outcome. For instance, people may vary in their genetic propensity to complete formal schooling (Lee et al., 2018). However, in certain societies, there may be limited access to schooling for the population or some subgroup within the population (e.g., access to higher education was limited for women for much of the twentieth century in the U.S. and elsewhere). If that constraint is removed, individuals’ genetic propensities towards higher education that had enjoyed no avenue for expression can then become manifest. In this case, we would expect a significant interaction in a model where a person’s years of schooling is regressed on (1) an indicator for the cohorts impacted by education reform and (2) a summary measure of a person’s genetic propensity to complete formal schooling. The coefficient between the genetic summary measure and reform would be null or smaller in the pre-reform years; it would become significant and positive during the post-reform years.1
As another example, economic changes in the U.S. have removed caloric constraints for much of the population. Genetic predispositions towards higher BMI now interact with an altered food environment (Guo et al., 2015; Conley et al., 2016). Those with a genetic predisposition towards higher BMI, which stems from a genetic architecture in part related to regulation of appetite and impulse control and in part related to metabolism (Locke et al., 2015), are more highly impacted by the new food environment.
1.2.2 Variability
The case of BMI, however, also highlights a different form of genetic variation that can interact with changes to the environment. Individuals may vary not only in their propensity towards higher or lower BMI, but also vary in their propensity towards changes in BMI in the face of environmental changes. Some individuals have genotypes that are less buffering of environmental changes. When the environment changes (in either direction), their BMI is likely to exhibit large changes. Other individuals have genotypes that are more buffering of environmental changes. When they enter a more calorie-rich or more calorie-restricted environment, their BMI is less likely to change because they adapt to that environment in ways that minimize changes, regardless of where they were on the BMI distribution at baseline. A genetic predisposition towards higher or lower levels of BMI might be very different than a genetic predisposition towards changes in BMI in the face of shifting environmental conditions.
This form of genetic moderation is sometimes called plasticity. Plasticity can take two forms. First is variation in within-individual plasticity, which is relevant for dynamic outcomes like BMI and depression that change as individuals progress through the life course. As we discuss in the Conclusion, estimating genetic contributions to within-person variability is complicated by the lack of large sample size datasets with both large-scale genotyping and repeated measures across individuals. Second, and more immediately tractable, is population-level variation in plasticity. Consider two subgroups within a population. In one subgroup, individuals have high polygenic scores for plasticity in BMI; in the other subgroup, members have low polygenic scores for plasticity. For the high plasticity subgroup, a counterfactual intervention—for instance, moving that individual from neighborhood A with safer spaces for physical activity to neighborhood B—would result in more variability in BMI within the subgroup (more dispersion in outcomes). For the low plasticity subgroup, counterfactual changes would produce little dispersion around the pre-counterfactual mean. Population-level variation in plasticity thus produces population-level variation in how shocks to the environment translate into changes in outcomes. Moreover, a single intervention— say a change in the education system or in norms surrounding the timing of fertility——may cause little change in the low plasticity group’s levels of or dispersion in education. In contrast, among highly plastic individuals, the levels of education or children ever born might remain unchanged but the dispersion around the mean would increase. That is, the same environmental intervention may generate changes in both directions that would go undetected in traditional GxE analysis.
1.3 Gene × environment research using genome-wide polygenic scores has largely focused on outcome moderation
The previous section showed that one form of genetic moderation of environments is outcome moderation: those with different propensities towards an outcome are differentially impacted by some environmental change. Yet in a particular context—changes to neighborhoods interacting with genotype to impact BMI; changes to education policy interacting with genotype to affect schooling—genetic predispositions towards greater variability may also play a role.
Yet researchers’ workhorse tool for studying genetic moderation of environments—polygenic scores for levels of an outcome—has inadvertently narrowed their focus to outcome moderation. Researchers use a three-step process when they develop and use these scores:
Step one: estimate separate linear regressions of some outcome (Y) in a large training sample, to develop weights that reflect each variant’s contribution to levels of that outcome
Step two: use the weights from step one to construct a polygenic score (PGS) in a separate sample
Step three: interact that polygenic score with some measure of “E” to study genetic moderation of environments
Researchers in step one have focused on genetic contributions to levels of an outcome, rather than genetic contributions to variability. Table 1, focusing on recent gene by environment studies that use polygenic scores, shows that the majority focus on how the impact of environments on some outcome vary among people with different propensities for that same outcome—e.g., the impact of neighborhood features on Type II diabetes having a larger impact on those with higher genetic propensities.
The table presents a (non-exhaustive) list of recent gene by environment studies that use polygenic scores as the measure of genotype. With the exception of the study in gray, all study outcome moderation, or how the impact of an environmental trigger or buffer on an outcome varies by an individual’s genetic propensity towards that same outcome.
With the exception of Domingue et al. (2017), who examine how a genetic risk score for well-being buffers the impact of the loss of a spouse on depression, nearly all studies examine outcome moderation. Furthermore, this focus on one form of genetic moderation is often implicit, with the researchers stating that they are studying gene by environment interactions, rather than made explicit, with the researchers stating that they are studying a particular type of gene by environment interaction. The failure to make the specific type of moderation explicit has led to missed opportunities to examine other forms of moderation.
1.4 Variance polygenic scores as a tool for examining new forms of genetic moderation
The implicit focus on one form of genetic moderation of social environments stems from the reliance on one tool for G×E research: polygenic scores trained to predict levels of an outcome. We follow others’ recent calls to expand social scientists’ toolbox for studying genetic moderation of environments. Recently, Domingue et al. (2020) discuss “dimmer-type” gene-environment interactions, which corresponds with outcome moderation, and “lens-type” gene-environment interactions, which take a different form.2 They argue that while social scientists often frame GxE research as wanting to study lens-type interactions, social scientists’ reliance on polygenic scores for levels of an outcome might impede their progress. As they put it: “The selection of PGS effects for examining lens-type GxE may be particularly challenging in that we construct PGSs from GWASs that only include main effects of SNPs. If the environmental context of the participants in the GWAS sample used to construct the PGS is similar to that in the test sample used to estimate GxE then it is unlikely to include SNPs that demonstrate lens-type patterns as the main effects of these SNPs will be close to zero”(p. 10). This call suggests that better tools for either variability moderation or “lens-type” moderation are genome-wide summary measures (PGS) constructed from weights that more closely mirror theory behind GxE.
Here, we present one approach: constructing genome-wide summary measures from models that measure genetic contributions to variability in outcomes.3 In the language of statistical genetics, these models are called “vQTL analyses,” or models for detecting variance-affecting loci. In turn, researchers have developed a variety of approaches for detecting genetic contributions to variability. But thus far, the researchers have only used these approaches to find the “top SNPs”—a few SNPs that have the lowest p-values in regressions performed separately for each SNP. They have not yet used the weights from these models to construct genome-wide scores for plasticity.
One of the earliest attempts at vQTL analysis was Yang (2012), who operationalize variability as a person’s Squared Z-score of a trait—the person deviates from the mean of an outcome in either direction. Wang et al. (2019) and others use the classic Levene’s test, which examines whether the error variance significantly differs across subgroups—in the genetics case, across the three subgroups (AA, AB, and BB) at a given variant. Yet these attempts can lead to false positives when trying to distinguish between variants that affect the mean of an outcome and variants that affect the variance.
Two methods aim to control for this mean-variance conflation. Conley et al. (2018) use sibling pairs to examine how variation in the sibling pair’s combined count of minor alleles at a locus contributes to that sibling pair’s standard deviation in the trait, controlling for the sibling pair’s mean levels of a trait. Young et al. (2018) decompose trait variance into two components–an “additive effect” and a “dispersion effect”—and argue that the latter provides a measure of “when a SNP has a variance effect beyond that which can be explained by a general mean-variance relationship”(p. 1613).4
There is a significant gap in the use of these methods to study gene-environment interplay. Researchers have used each method to find “top hit” loci that contribute to variability in traits like BMI (Yang, 2012; Conley et al., 2018; Young et al., 2018).5 Some such as Wang et al. (2019), Young et al. (2018), and Marderstein et al. (2020) have also interacted these highly significant single SNPs one by one with measures of social environments. No studies of which we are aware have explored whether the “variance weights” that these methods generate can be aggregated to produce what we call variance polygenic scores, or genome-wide summary measures of a person’s plasticity. vPGS can expand demographers’ toolbox for studying gene-environment interplay. Our study is the first to build and characterize the properties of this new tool.
1.5 Research goals/questions
What are best practices for building variance polygenic scores?
When we build these scores, do they reflect distinctive genetic contributions to variability in a trait, or are they too correlated with scores for levels of an outcome to serve as a new tool for gene-environment research?
Applying the scores to a real-world example (education reform in the UK), what forms of moderation do we see?
2 Methods
2.1 Estimating vPGS weights in the UK Biobank
To build the two types of scores for comparison—a typical PGS (hereafter: mPGS) for levels of an outcome; a vPGS for variability in an outcome—we use the UK Biobank, a dataset containing about 500,000 individuals from across the United Kingdom. The sample was limited to respondents who passed quality control and were of British ancestry, using information provided by the UK Biobank, leaving us with 408,219 in our final analytic sample.
This size of the UK Biobank allows us to divide the sample into training and test sets while still maintaining sufficient statistical power for fitting GWAS and vQTL. Training and test sets were produced by randomly sampling respondents. 80% of the British subsample of the UKB was included in the training set; the remaining 20% made up the test set.
We analyze four outcomes: height, body mass index (BMI), educational attainment, and number of children ever born, a measure of fertility. The inverse normal transformations of the outcomes were calculated. Traits were also z-scored to create a second set of dependent variables, used in the Squared Z-score analyses. Unless specified as z-scored, a trait/outcome should be assumed to be inverse normal transformed.
For each outcome, first a regression was run predicting the inverse normal of that outcome, such that the weights reflect the contribution of each genetic locus to the mean level of the outcome. These regressions were performed using the software PLINK (version 1.9), controlling for age, sex, their interaction, and the first 40 PCs.6 We refer to these regression weights as weights for Levels PGS, and they correspond to the traditional tools used in G×E research.
A set of second identical regressions predict the Squared Z-score, rather than inverse normal, of the outcome, corresponding with the method for vQTL analysis discussed in (Yang, 2012). Since the z-score is squared, values which are the same number of standard deviations above or below the mean will receive the same value. Thus, the regression predicts distance from the mean, rather than the mean-level itself, though, as we argue above, this will still be correlated with the mean. We refer to the weights and polygenic scores produced by these regressions as Squared Z.
Third, regressions were run for each outcome on the sibling subsample of the UK Biobank, which includes 22,657 white British sibling pairs. For each sibling pair, the intra-sibling mean and SD were calculated. We then residualized the SD with the mean and used this new residualized standard deviation as our outcome variable. Since each sibling pair was represented twice in the data, we used only one member of each sibling pair in the final regressions. We refer to the weights and polygenic scores produced by this method as Sibling SD.
Fourth, a Mean-Variance vQTL analysis using Levene’s test for variance heterogeneity was run using OSCA (https://www.cnsgenomics.com/software/osca) (Wang et al., 2019).The Levene’s test does not estimate the effect size and standard error, but rather assesses the equality of variances between sample groups (in this case, those that do and do not have a given allele). Thus, following (Zhang et al., 2019), OSCA re-scales the test statistics (p value) to effect size and standard error using Z-statistics. We refer to the weights and polygenic scores produced by this method as as Levene’s. Fifth, a Mean-Variance vQTL analysis using heteroskedastic linear mixed models was similarly used (Young et al., 2018). Their method produces additive (mean) and variance effects. It also allows us to derive what they term dispersion effects, which are variance effects that are independent of the mean effect. We use the weights produced by these dispersion effects in subsequent analyses, referring to them as HLMM.
Finally, to ensure that results comparing the different vPGS were due to true differences between the scores, and not due to differences that arise from the smaller sample size and lower precision in the sibling-based method, for every vQTL or GWAS analysis run on the full sample an analogous analysis was run on a randomly-chosen subsample, where the number of respondents was set to be equal to the number of sibling pairs in the UK Biobank.
2.2 Constructing vPGS in the Health and Retirement Study (HRS)
Using the weights from the previous step, we constructed vPGS in the Health and Retirement study. The HRS sample is restricted to (1) self-identified European Americans, who (2) pass the HRS preprocessing procedure and are within 2 standard deviations of the mean of the first two principal components of their racial/ethnic group. This leaves N = 10, 554 respondents. The replication code contains details on the outcome variable construction; most notably, since the HRS is time-varying with several waves, we took the most recently observed value of the outcome for each respondent.
2.3 Analytic approach
2.3.1 Relationship between mPGS/vPGS and levels of an outcome
We use three tools to explore whether vPGS can capture genetic contributions to variability in an outcome, distinct from genetic contributions to levels of an outcome.
First, we estimate the following linear regression, where i indexes a respondent, PGS indicates the levels PGS (mPGS) or a variance PGS (vPGS), and Y is levels of the outcome trait (converted to the standard normal scale). Xi includes the first 5 principal components (PCs). Our coefficient of interest is, β1—we expect the levels PGS to significantly predict levels of a trait.
In turn, SNPs are a mix of four types: (1) SNPs that affect neither levels of an outcome nor variance in an outcome, (2) SNPs that affect levels of an outcome but not its variance, (3) SNPs that affect variance in an outcome but not levels of an outcome, and (4) SNPs that affect both variance in and levels of an outcome. To serve as a useful new tool for G×E research, an ideal plasticity score will mostly contain SNPs of type three—those that affect variance without necessarily affecting mean. Therefore,, β1 when we regress levels of an outcome on a vPGS should ideally be close to zero and not statistically significant. For these regressions, our samples are the HRS and the held-out test set of the UKB.
Second, we examine whether these patterns of correlation after constructing the vPGS in each sample are also present in the underlying weights that summarize each SNP’s contribution. We use linkage disequilibrium score (LD) score regression (Bulik-Sullivan et al., 2015b) for two purposes. First, we use the technique to compare the heritability of levels of an outcome to the heritability of plasticity in that outcome measured using the four techniques discussed above (Squared Z; Levene’s test; HLMM; sibling SD). Then, we compare the underlying genetic correlations between (1) levels and variability for each outcome, (2) across outcomes in levels and variability (Bulik-Sullivan et al., 2015b) (for a social science application of genetic correlations, see: (Wedow et al., 2018))
The first analyses show that two of the polygenic scores for polasticity—one constructed using the Squared Z-score of an outcome; the other constructed using Levene’s test for variance heterogeneity—fail to summarize genetic contributors unique to variability in an outcome. However, two of the polygenic scores for plasticity—one summarizing “dispersion” effects; the other constructed from sibling variation—capture more distinctive genetic contributions. We focus on these two tools for the application we discuss in the next section.
2.3.2 Comparing mPGS versus vPGS as moderators of a UK education reform
We use these preferred plasticity scores to study heterogeneous effects of a large-scale education reform initiated in 1972 in England, Scotland, and Wales that extended how long students were legally required to stay in school from 15 to 16 years old (Barcellos et al., 2018). Using Barcellos et al. (2018)‘s regression discontinuity design, we evaluate the extent to which the two effective vPGS are able to detect different forms of genetic moderation of this educational shock than the standard levels polygenic scores. We examine two different cases using the same reform as the exogenous environmental context: one where, following Barcellos et al. (2018), we examine the moderating role that the genetic risk of obesity plays in the relationship between education on body size and another where we evaluate the influence of genetic plasticity on downstream educational outcomes. Both outcomes are affected by gene and environment interactions but differ in the kinds of GxE effects they exhibit: while the effect for body size is the result of outcome moderation, the latter results from plasticity.
More specifically, following Barcellos et al. (2018), we use 2SLS to first instrument whether someone stayed in school until 16 years of age (Educ16) with whether they were younger than 16 when the reform went into place (post reform), and were therefore legally required to stay the extra year. This residualized version of Educ16 is then interacted in the second step with BMI PGS to evaluate whether there is an the interaction between educational attainment and genes as they affect health outcomes. We also report a reduced form regression, where PGS is directly interacted with the post reform variable.
The main health outcome is Body Size - a weighted combination of BMI, waist-to-hip ratio, and body fat percentage. In a separate NBER preprint, Barcellos et al. (2019) identified the point in the Body Size distribution at which they should have the most power to detect an effect. We use this same distributional threshold to create an Above Threshold version of Body Size, the results for which can be compared to the continuous version of the outcome.
In a second set of analyses, we examine the role of genes in the impact of an additional year of education on downstream educational outcomes. Here, rather than instrumenting educational attainment, we look specifically at whether the effect of one’s genotype on education outcomes differs depending on whether one was born before or after the reform. This is akin to the reduced form regressions reported for BMI. We follow the previous literature (Barcellos et al., 2018), which found no effect of the reform on the likelihood of attending college, and focus on the outcomes of those who left school at the age of 18 or younger. We examine the effect of the reform on four educational outcomes, previously used in the literature. First, we consider whether the respondent left school at age 16 or later, since despite the reform some students still opted out of attending college through age 16 (Left School 16 or later). Second, we consider whether respondents achieved any certifications as a result of their education (Certification). For the last two outcomes, we explore whether they achieved specific certifications: O-levels or CSE (which were equivalent and later replaced by the GCSE in 1988) and A-levels.
Controls, for both sets of models, include a quadratic term for the number of days that passed from when the respondent was born until the time of reform (to factor out any time trends), dummy variables for the month born, sex, age at time of assessment in days, age squared, dummy variables for country of birth, the first 15 PCs, mPGS, the interaction between those PCs and Educ 16 (or, in the reduced form, Post Reform), and the interaction between mPGS with Educ16. Triangular kernel weights were used to assign more weight to observations closer to the reform and time trends were allowed to vary before and after the reform (Barcellos et al., 2018). Because we will show that the Squared Z-score and Levene’s test plasticity scores fail to capture variability distinct from mean effects, we do not present them in the main text but instead present them in the Supplementary Materials (Section S).
3 Results
3.1 How do the plasticity scores relate to levels of a trait?
The first question that arises when using plasticity scores for GxE research is: is the plasticity score simply capturing genome-wide contributions to levels of an outcome, rather than capturing genome-wide contributions to variability in an outcome? If the plasticity score looks very similar to social scientists’ standard tool for GxE research, it is less useful as a new tool for capturing distinctive forms of genetic moderation. Figure 1 summarizes the results of Model 1, or whether the plasticity score significantly predicts levels of an outcome. The left hand side shows the results in the smaller sample size HRS; the right panel the results in the larger sample size UK Biobank test set. Each bar represents one of the four outcomes of interest to demographers: height; BMI; education; number of children ever born (NEB).
The figure shows results from the regression specified in Equation 1 in the HRS (left panel) and UKB test set (right panel). The top panel shows that, as expected, the levels PGS predict levels of an outcome (though the relationship with fertility in HRS is weaker than for height, BMI, and education). Moving downwards, across both samples, the Squared Z plasticity score performs the least well in that the plasticity score significantly predicts levels of an outcome for all outcomes except for number of children ever born. The Sibling SD score and HLMM perform best in the HRS test set.
We see that, as expected, the levels PGS predict levels of an outcome. But in three out of the four traits, the Squared Z vPGS significantly predicts levels of a trait. In two of the four traits, the Levene’s test vPGS significantly predicts levels of a trait. In contrast, the sibling SD and HLMM vPGS were only significant for one out of the four traits in the HRS sample, though were significant for more traits in the UKB test set.
Overall, the results show that researchers hoping to use plasticity scores for gene-environment research should be careful to choose one of the tools that captures distinctive genetic contributions to plasticity apart from genetic contributions to an outcome’s mean. Online Supplement Section S.2 presents additional results, which include comparing the scores’ significance when we match the sample size of the non-sibling scores to the sample size in the sibling-based analyses.7
3.2 What is the genetic correlation between mPGS and vPGS?
The previous results show that when we aggregate weights from the different vQTL methods to produce a polygenic score for plasticity, some of the scores—most notably, the Squared Z vPGS and one based on results from Levene’s test—perform similarly to an mPGS in predicting levels of a trait. As a result, they are less useful as tools for examining certain forms of gene-environment interplay since they fail to capture distinctive genetic contributions to variability.
Here, we use LD score regression to examine two features of the genetic architecture of plasticity. First, we examine the univariate heritability of each of the four outcomes—first, levels of an outcome (replicating previous work) and second, plasticity in that outcome (extending that work). Table 2 shows the results. The first column, in line with past findings, shows the highest heritability for height followed by BMI, education, and finally number of children ever born. We see that plasticity as defined using the Squared Z score and Levene’s test have higher heritability, which the previous section shows could be related to capturing levels-relevant genetic contributions; both the sibling standard deviation method and dispersion effects from HLMM capture plasticity in a way that leads to estimates of zero heritability. Interestingly, focusing on the Squared Z estimates, we see that the rank order of heritability changes. That is, levels of height are the most heritable, followed by BMI, then education, and fertility is the least heritable. With the Squared Z method, we find plasticity in BMI to be more heritable than plasticity in height.
The cells with zero are heritabilities where, due to sampling variability, the heritability was low enough to be close to zero or negative.
Second, focusing on the Squared Z score that had non-zero heritability across outcomes, we examine the genetic correlation between (1) levels of each outcome (replicating past work by Bulik-Sullivan et al. (2015a)), (2) plasticity in each outcome (extending that work), and (3) levels and plasticity. Notably, these genetic correlations are prior to estimating the scores in a sample, so reflect a shared genetic architecture between contributors to levels of an outcome and contributors to variability in an outcome.
Figure 2 shows the genetic correlation between the levels PGS and the Squared Z vPGS for each of the outcomes. It shows that the weights for the levels PGS for that trait are significantly correlated with the weights for the Squared Z vPGS.
The x axis shows the weights for levels of an outcome (mPGS weights). The y axis shows the weights from the Squared Z score method. The correlation close to 1 on the diagonal shows that there is a high degree of overlap in the weights estimated using each method.
Figure 3 shows between-trait patterns of genetic correlation for (1) the levels PGS8 and (2) the Squared Z vPGS. The analysis investigates whether there are similar patterns of cross-trait genetic correlation in variability in addition to levels. The results show that the patterns are similar except for the relationship between BMI and height.
The left panel shows the bivariate genetic correlation between levels of each outcome variable. The right panel shows the bivariate genetic correlation between variability in that outcome, measured using the Squared Z-score vPGS. The two show nearly identical patterns, except for the relationship between BMI and height. While there is a negative genetic correlation between levels of BMI and height, there is a positive genetic correlation between variability in BMI and height.
The one exception is that levels of height and BMI are negative genetically correlated (in other words, those with genetic propensities to be taller also have genetic propensities towards lower BMI, replicating the relationship in Bulik-Sullivan et al. (2015a)) but plasticity in height and BMI is positively correlated. As we discuss in the Conclusion, this relationship deserves more attention in future research and could reflect differential sensitivity to environmental inputs to growth.
3.3 Summing up thus far: which vPGS can serve as new tools for GxE?
Taken together, the results show that the Squared Z vPGS and Levene’s test-based vPGS are less useful for social scientists looking for a new tool to examine gene-environment interplay. Both significantly predict levels of an outcome across four diverse traits (height; BMI; education; number of children ever born). The Squared Z-score also exhibits patterns of underlying genetic correlation similar to those between levels of a trait. In contrast, the sibling standard deviation method (Conley et al., 2018) and dispersion weights (Young et al., 2018) show better properties in capturing distinctive genetic contributions to plasticity.
Why might past research studying methods for vQTL have missed the ways in which certain methods fail to capture distinctive genetic effects? Section S.3 in the Online Supplement begins with the common way that researchers assess whether a method for detecting vQTLs overlaps with a method for detecting mean effects/normal QTLs: examining whether the two methods select similar SNPs as “top hits.”9 The results show that while the top hits comparison reveals some degree of overlap—for instance, the Squared Z score and Levene’s top hits display more overlap with the levels top hits than the other methods—this comparison might be too conservative. In particular, an mPGS and vPGS might not happen to have overlap in the SNPs with p values below a threshold but the two might have overlap in SNPs with non-zero weights that contribute to the final scores. Overall, the combined results show that social scientists interested in using vPGS as a new tool should look carefully at whether the vPGS is distinctive from, or nearly identical to, the mPGS for that outcome.
4 Using the vPGS to examine heterogeneous impacts of education reform
Having examined the properties of the different polygenic scores for plasticity, their distinctiveness from the standard tool for GxE research (mPGS), and their relationships to one another, we can use them to adjudicate between different mechanisms of genetic moderation. Specifically, we have argued that interactions between mPGS and the environment in predicting an outcome capture outcome moderation, while variance polygenic scores can capture a different form of heterogeneous effects.
With this in mind, we turn to using mPGS and different vPGSs in a practical example to explore which kind of genetic moderation is at play. We build upon the research of Barcellos et al. (2018). They investigate the impact of an educational reform that raised the required age of schooling from 15 to 16 years in England, Scotland, and Wales. Unlike measures like an individual’s own educational attainment or their parent’s educational attainment, which can lead to false-positive gene-environment interactions through confounding between the environmental shock and parent genotype (discussed in (Conley, 2016)), the reform’s timing is exogenous to genotype. It allows us to study the different forms of genetic moderation, as well as a chance to examine the performance of different vPGS measures in an applied example.
We evaluate two sets of outcomes. First, following Barcellos et al. (2018), we evaluate whether there was genetic moderation of the reform’s impact on health outcomes in the form of body size. If the form this moderation takes is outcome moderation, then the mPGS for BMI would significantly interact with the reform—the reform might have a larger impact on those with an already-low genetic propensity towards obesity (amplifying their advantage) or it might have a larger impact on those with a high genetic propensity (buffering their risk). Alternately, if the form this moderation takes is variability moderation (significant interaction between the vPGS and the post-reform indicator), the reform has larger impacts on those who, across many shocks, experience more swings in BMI.
Second, extending Barcellos et al. (2018), we evaluate whether there was genetic moderation of the reform’s impact on educational outcomes. Here, outcome moderation occurs if the reform has a larger impact on those with especially high or low genetic propensities towards educational attainment. Under the “education as the great equalizer hypothesis” (Barcellos et al., 2020), we might expect that those with the lowest educational polygenic score are the most impacted by the extra year of mandatory education. Variability moderation occurs if the reform has heterogeneous efffects on individuals with different underlying genetic plasticity.
4.1 Genetic moderation of the education reform’s impact on body size
For the body size models, presented in Table 3, the interaction between mPGS and being exposed to the reform is the only statistically significant result. None of the polygenic scores for plasticity show a significant interaction, with the exception of Squared Z-Score vPGS (Appendix Section S.4, which uncovers a marginally significant result (on the Above Threshold outcome)). This result is likely due to the high correlation between the Squared Z-Score vPGS and mPGS.
Impact of education reform on health outcomes
These results suggest that outcome moderation (rather than plasticity) is the main form that genetic moderation of the education reform takes when impacting these measures of health. Put differently, and as visualized in Figure 4, the reform has larger impacts on reducing obesity-related measures among those with already-lower genetic propensities towards obesity. The results largely replicate those found in Barcellos et al. (2018), and show that the mPGS the original authors used ended up corresponding to the type of genetic moderation that unfolded.
Interaction between mPGS and Instrumented Educ16 on Body Size outcomes
4.2 Genetic moderation of the education reform’s impact on educational attainment
While the reform’s impact on health outcomes follows the pattern of outcome moderation, the reform’s impact on educational attainment might take a different form. When examining this impact, we find a significant interaction between the HLMM polygenic score for plasticity and Post Reform when predicting three of the four education outcomes: Left School 16 or later, Certification, and O-levels or CSE. The interactions between the HLMM plasticity score and Post Reform remain significant when controls are included. By contrast, we find significant interactions between mPGS and Post Reform for only one of the four outcomes: Left School 16 or later. The results are presented in Table 4 and visualized in Figure 5, which compares the predicted educational outcomes for children in the lowest, middle, and upper terciles of the HLMM vPGS distribution before and after the reform. For the significant interactions, the results show that those with higher HLMM polygenic score attain lower levels of education outcomes prior to the reform but equal levels of education outcomes after the reform, potentially because they had enhanced sensitivity to the positive effects of the reform. And because the results in Section 3.1 show that the HLMM plasticity scores capture genetic contributions to plasticity in outcomes distinct from genetic contributions to the conditional mean, we are more confident that the effect is a true positive.
Genetic moderation of reform’s impact on educational attainment
Interaction between HLMM polygenic score for plasticity and Post Reform on Educational Outcomes
5 Discussion
Recognizing the biosocial nature of most outcomes of interest to demographers, social scientists are increasingly interested in how genetic variation moderates the impact of life-course events that range from society-wide education reforms to targeted policy interventions aimed at specific sub-groups. That is, in addition to estimating direct main effects of genotypes and environments, social and behavioral scientists often seek to model the mutual dependence of nature and nurture. Numerous metaphors have been offered for this causal model of human traits—e.g., genetics as a lens (Domingue et al., 2020) or genetics as a prism refracting environmental influences into heterogeneous treatment effects (Conley and Fletcher, 2018).
In this paper, we argue that social scientists’ workhorse measure of G in GxE research—a genetic summary measure that reflects genetic contributions to levels of an outcome—commits those researchers to an implicit model of genetic moderation of environments. The model corresponds to what we call outcome moderation, and to what others have recently called “dimmer-type” moderation (Domingue et al., 2020). While this model may characterize some forms of gene-environment interplay, there are likely other forms of gene-environment interplay that summary measures constructed from aggregating effects on an outcome’s mean fail to capture.
We propose the use of polygenic scores for plasticity as an addition to social scientists’ toolbox. We first investigate the properties of this tool before applying it. First, focused on best practices, we show how conflation between genetic effects on an outcome’s mean and effects on that outcome’s variance begin with SNP-level analyses but then appear in the constructed scores. The conflation also makes it difficult to investigate whether plasticity in outcomes like BMI displays different patterns of heritability than levels of those outcomes, though an initial analysis of genetic correlations shows an interest flip where levels of BMI and height are negatively genetically correlated but plasticity in the two has a positive correlation. As a whole, we argue that researchers interested in a polygenic score for plasticity as a distinctive summary measure of genotype should be careful to construct scores based on weights from methods that try to adjust for false positive effects on the mean.
Second, applying the scores to a real-world application, we show how adding an E × vPGS analysis to an E × mPGS analysis can detect a particular type of GxE interaction that deploying only an mPGS would obscure. Building on Barcellos et al. (2018), we show that, in line with their results but contrary to our priors, outcome moderation best characterizes the education reform’s impact on health outcomes. But genetic plasticity might better explain the reform closing gaps in educational attainment between low and high-plasticity youth. These results show that one cannot know in advance with great certainty which form of moderation will be operative and thus researchers should test for both forms.
5.1 Limitations and directions for future research
The first limitation is that our application of the polygenic scores for plasticity was limited to one E: an education reform in the UK. In turn, and relevant to the theoretical discussion in Section 1.2, we might imagine that different environmental treatments are more or less likely to exhibit moderation by a person’s plasticity. Due to the issues others have raised about false positives where researchers think they are detecting GxE effects but instead are detecting unobserved confounding between genotype and environment (Conley, 2016; Domingue et al., 2020), we prioritized studying the effect of an “E” that was clearly causally identified over examining how multiple, potentially-confounded “E” interact with each of the focal vPGS. Future research leveraging other natural experiments that alter environments should incorporate plasticity scores to investigate their relevance for other contexts.
Second, as we outline in Section 1.2, there are at least two ways we can think about genetic contributions to plasticity. The first, within-individual plasticity, would require an estimation strategy where we train the inputs weights to the PGS on repeated measures of the same outcome within an individual (e.g., variation in BMI across many years). This form of plasticity is a promising avenue for future research. Predicting within-person (or within-family) variation over time without having to know explicitly what the fluctuating environmental factors are may prove to be a useful exploratory exercise before researchers try to hypothesize about specific factors in the environment that may be causing the fluctuation in genotypically plastic individuals. Moreover, one could imagine using a within-person variability score to identify individuals who might be responsive to an intervention in advance—be that a drug trial or an educational intervention. The advantages of identifying such individuals include increased statistical power for the identification of effects in a pilot study before investing in a larger, more costly study. In terms of the feasibility of this second type of plasticity score, unfortunately, data sources like UKB that contain a large enough sample size to estimate new weights for polygenic scores lack large-scale repeated measures of the same individual. Once these data sources become available, future research can construct scores better designed for this form of plasticity.
In sum, the present article aims to equip demographers and social scientists with an additional tool for studying the interplay of genes and environment, one that captures a broader range of how these interactions play out in applied settings. As cohort studies make polygenic scores for the mean of outcomes available to applied researchers, our paper suggests complementing these scores with scores aimed at capturing genetic contributions to plasticity.
Footnotes
The authors would like to thank members of the Conley Biosociology Lab and the University of Wisconsin Social Genomics workshop for helpful feedback on the project. Results from this research were presented earlier at the National Institute on Aging supported 2018 Integrating Genetics and Social Science Conference (R13-AG062366) at the University of Colorado, Boulder.
1 Put differently, a polygenic score trained in societies where those constraints were attenuated or absent would poorly predict education before an expansion of schooling and then predict in an improved way — i.e. show increased genetic penetrance — once access to formal education was opened up.
↵2 As they describe: “When considering lenses, the relative effect of a given genotype may be positive for a ‘low’ level of the relevant environmental exposure and negative for ‘high’ levels of the exposure, or vice versa”(p. 9).
↵3 This approach complements the approach in Boardman et al. (2014) of studying genetic moderation of specific environmental shocks. In particular, in their study, they use what they call a genome-wide gene-by-environment interaction (GWGEI) approach that regresses level of an outcome (BMI) on each SNP’s interaction with an environmental moderator (education). They note the promise of the approach for capturing G×E, but also challenges with statistical power.
↵4 Other methods that we do not include in the present review because of their similarity to the four we focus on include the new deviation regression model (Marderstein et al., 2020), which models the absolute difference between an individual’s phenotype value and the phenotype medians within each genotype, and the double generalized linear model (DLGM) (Rönnegård et al., 2010).
↵5 These are SNPs with effects on the outcome that fall below some p-value threshold.
↵6 Terms with higher order age variables (age2, age2 * sex), which have been employed in other studies, such as (Young et al., 2018), were excluded due to issues with multicollinearity.
↵7 This robustness check helps guard against us finding that the sibling SD score does not significantly predict levels of an outcome, while the non-sibling scores do, due to inadequate power for the sibling score compared to the scores estimated in a larger sample size. The fact that the patterns hold in the matched sample size supports our claim that the Squared Z and Levene’s test scores are less useful as distinctive tools.
↵8 These replicate results from Bulik-Sullivan et al. (2015a) for overlapping outcomes and also extend their analysis to look at additional outcomes like number ever born.
↵9 Researchers use this in conjunction with simulations comparing the methods, but those simulations likewise largely focus on one or two top causal SNPs.