Estimation of indirect genetic effects and heritability under assortative mating

Both direct genetic effects (effects of alleles in an individual on that individual) and indirect genetic effects — effects of alleles in an individual (e.g. parents) on another individual (e.g. offspring) — can contribute to phenotypic variation and genotype-phenotype associations. Here, we consider a phenotype affected by direct and parental indirect genetic effects under assortative mating at equilibrium. We generalize classical theory to derive a decomposition of the equilibrium phenotypic variance in terms of direct and indirect genetic effect components. We extend this theory to show that popular methods for estimating indirect genetic effects or ‘genetic nurture’ through analysis of parental and offspring polygenic predictors (called polygenic indices or scores — PGIs or PGSs) are substantially biased by assortative mating. We propose an improved method for estimating indirect genetic effects while accounting for assortative mating that can also correct heritability estimates for bias due to assortative mating. We validate our method in simulations and apply it to PGIs for height and educational attainment (EA), estimating that the equilibrium heritability of height is 0.699 (S.E. = 0.075 ) and finding no evidence for indirect genetic effects on height. We estimate a very high correlation between parents’ underlying genetic components for EA, 0.755 (S.E. = 0.035), which is inconsistent with twin based estimates of the heritability of EA, possibly due to confounding in the EA PGI and/or in twin studies. We implement our method in the software package snipar, enabling researchers to apply the method to data including observed and/or imputed parental genotypes. We provide a theoretical framework for understanding the results of PGI analyses and a practical methodology for estimating heritability and indirect genetic effects while accounting for assortative mating.

). We performed two-genera8on PGI analysis (Methods and Figure 3) in order to es8mate a) ! , the correla8on between parents' true DGE components (that explain all the heritability); b) ℎ eq % , the equilibrium heritability; c)

Phenotype model
We consider a model with direct genetic effects (DGEs) and indirect genetic effects (IGEs) from parents: where Y ij is the phenotype of individual i in family j; ϵ ij is the residual that is uncorrelated with ∆ ij , η p(i) , η m(i) and has variance σ 2 ϵ ; where g ijl is the genotype of individual j in family i at locus l; g p(i)l is the genotype of the father in family i at locus l; g m(i)l is the genotype of the mother in family i at locus l; and f l is the allele frequency at locus l such that E [g ijl ] = 2f l (assumed to be constant across generations); the DGE of locus l is given by δ l ; and the paternal and maternal IGEs at locus l are given by η l . We assume that paternal and maternal IGEs are equal, but we relate the results for this model to a model that allows for different paternal and maternal IGEs later. In a random-mating population, the phenotypic variance can be decomposed as in the RDR method for estimating heritability [14] : Note that for this we have made the further assumption that the L causal loci segregate independently so are uncorrelated in a random-mating population.

Assortative Mating Model
We consider assortative mating that has reached an equilibrium. From Chapter 4 of Crow and Kimura [1], the correlations between alleles at equilibrium are given in Figure S1. Fig. S1: Correlations between alleles in parents and offspring. Here, double-headed arrows indicate correlations between alleles. Assortative mating induces correlations between alleles in the mother and alleles in the father. At equilibrium, the correlations between alleles are as shown for two loci l and l ′ [1] We aim to express the equilibrium phenotypic variance in terms of the random-mating variance components (Equation 1) and the equilibrium correlations between the parents' direct and indirect components ( Figure 1).

Equilibrium variance decomposition
We generalise the approach taken in Section 4.8 of Crow and Kimura [1] to apply to a trait determined by both DGEs and IGEs from parents.

Equilibrium variance due to direct genetic effects
We have that: We relate this to Cov(∆ m(i) , ∆ p(i) ) = r δ Var(∆ ij ), where r δ = Corr(∆ m(i) , ∆ p(i) ) ( Figure 1). We have that We define the effective number of independently segregating loci of equal contribution to the variance due to DGEs, L δ : Note that if all loci had equal DGEs and equal allele frequencies, then at equilibrium m ll ′ = m ll = r ∀ l, l ′ , and therefore L δ = L. See Crow and Kimura section 4.7 for further details on the model with L independently segregating loci with equal frequency and equal effects [1]. If many common genome-wide variants contribute to the variance explained by direct genetic effects, L δ will be large. This allows us to express Var(∆ ij ) as Let the equilibrium variance Var(∆ ij ) be v eq g , then

Equilibrium variance due to parental indirect genetic effects
As above in Subsection 3.1, it can be shown that at equilibrium where r η = Corr(η p(i) , η m(i) ) and is the effective number of independently segregating loci of equal contribution to the variance due to parental IGEs.
Let v eq e∼g be the equilibrium variance due to parental IGEs. We have that We further have that Therefore,

Equilibrium variance due to covariance between direct and indirect genetic effects
We compute the phenotypic variance due covariance between direct and indirect genetic effects: = 4Cov(∆ ij , η p(i) ).

Relationship to random-mating variance component
We now relate the c eq g,e to c g,e assuming that c g,e ̸ = 0.
This can be related to We define the effective number of independent loci of equal contribution to the variance due to covariance between DGEs and IGEs. Note that if all loci had equal frequency and equal DGE and IGE, then L δη = L. Assuming that c g,e ̸ = 0, we have that From above, we have that and Therefore, and therefore After some rearrangement, it can be shown that and therefore

Expression in terms of direct and indirect genetic effect correlation
We now derive the relationship between r c δη and r τ δη to give an alternative expression for c eq g,e . Recall that (Figure 1) r c δη is the correlation between direct and indirect effect components within an individual: We now define the correlation between DGE and IGE components in an individual under random mating to be r 0 δη , which can be expressed in terms of the random-mating variance components: If we further assume equal allele frequencies, i.e. f l = f ∀ l, then this is simply the genome-wide correlation of the direct and indirect genetic effects: Without assuming equal allele frequencies, it is the genome-wide correlation of the direct and indirect genetic effects for genotypes standardized to have variance 1 . Now we compute r c δη at equilibrium. First, we compute Following a similar procedure to above, we obtain: Now we obtain the correlation at equilibrium: We can then express this in terms of r τ δη = Corr ∆ m(i) , η p(i) , r δ , r η , and the random-mating variance components. Letting L δη → ∞, This can then be further simplified by expression in terms of r 0 δη : We can then substitute this into the expression for c eq g,e derived above:

Different maternal and paternal indirect genetic effects
If we allow the maternal and paternal indirect effects to be different, i.e.
then we can write the phenotype model as (see [2]) which is the same as in 1, except Ignoring the term involving the differences between paternal and maternal indirect effects, this model is equivalent to the model considered above with η l replaced by (η pl + η ml ) /2, i.e. the average of paternal and maternal indirect effects. Now we consider the statistical relationship between the difference term and the other nonresidual terms. We first consider the covariance between the DGE component and the difference term. As the offspring genotype is equally related to maternal and paternal genotypes, i.e. by symmetry, We now consider the covariance between the sum and difference terms for parental IGEs. Since Var g p(i)l = Var g m(i)l , For distinct loci, we have that This shows that the difference component is uncorrelated with the DGE and average parental IGE components. Therefore, for many applications, the difference term can be subsumed into the residual and the variance decomposition for the model with equal paternal and maternal indirect effects used -along with the interpretation that the parental IGE is the average parental IGE.
However, for some applications, such as calculating covariances between relatives, the difference term remains important. We therefore give a variance decomposition at equilibrium including the difference term. First, we give the random-mating variance of Now we consider that at equilibrium, Corr λ p(i) , λ m(i) = r λ . Let v eq lp be the equilibrium variance of λ p(i) , then the at equilibrium we have lp . Following a similar procedure to above for the other variance components, it can be shown that In limit as L λ → ∞, This shows that assortative mating makes approximately no difference in the variance due to differences between paternal and maternal IGEs when there are many variants genome-wide contributing to differences between maternal and paternal IGE components. We can thereby generalize the equilibrium variance decomposition to include the variance component due to differences between maternal and paternal IGEs:

Primary phenotypic assortment
Previously, we did not consider the mechanism of assortment: just that we had reached an equilibrium where the correlations between alleles are constant across generations.
Here we consider what the equilibrium correlation between parents' direct effect components would be under a model of assortative mating due to matching on the phenotype, called primary phenotypic assortment. The key assumption is that the paternal DGE and IGE components are conditionally independent of the maternal DGE and IGE components given the maternal and paternal phenotypes: Under this assumption and a further assumption that Nagylaki 1982[3] for a discussion of the conditions under which this assumption holds), we have that where v eq y is the equilibrium phenotypic variance. It is trivial to derive that Cov ∆ p(i) , Y p(i) = v eq g + c eq g,e /2, and therefore, after some rearrangement, We therefore see that parental IGEs, when correlated with DGEs, can inflate the equilibrium correlation between DGE components of parents over what would be expected from DGEs alone. The same logic implies that the other correlations in Figure 1, and therefore the equilibrium phenotypic variance will be further inflated by AM when DGEs and IGEs are correlated.

Estimating heritability using realized relatedness
Here, we examine heritability estimation using realized relatedness between siblings ('sibregression'), as first proposed in 2006 by Visscher et al. [4]. Let R ijk be the realized relatedness between sibling j and sibling k in family i. While the expected relatedness between siblings is given by the pedigree, the realized relatedness varies around the expectation due to random segregations during meiosis in the parents, leading to variation in the fractions of the genome the siblings share identical-by-descent (IBD). We first derive the phenotypic covariance between siblings in terms of their realized relatedness, then we derive the bias due to AM in sib-regression estimates of heritability.

Equilibrium covariance between siblings
We show how to express the equilibrium phenotypic covariance between siblings in terms of their realized relatedness and the equilibrium decomposition of the phenotypic variance. The covariance between the DGE components of a sibling pair is: The covariance between siblings' genotypes conditional on the IBD state at the locus is: Let P 0 , P 1 , P 2 be the proportion of the genome shared in IBD states 0, 1, and 2, respectively. Therefore, assuming causal variants are located at random with respect to IBD sharing, The realised relatedness between the siblings is R ijk = (P 1 + 2P 2 ) /2. Therefore, Since P 0 + P 1 + P 2 = 1, 2P 0 + (3/2)P 1 + P 2 = 2 − R ijk . Therefore, This gives the covariance between siblings' DGE components as We now relate this to Cov ∆ m(i) , ∆ p(i) = r δ v eq g : Now expressing in terms of L δ : We therefore see that the realized relatedness does not track the assortative mating induced inflation of the variance due to direct effects. Setting R ijk = 0.5, the expected relatedness given the pedigree, we obtain the same as based on pedigree alone. This gives the phenotypic covariance between siblings as Allowing for finite L δ , we obtain

Estimating heritability by sib-regression
Heritability is estimated by the slope of the regression of (Y ij − µ y ) (Y ik − µ y ) /v eq y onto R ijk across sibling pairs [4,5], where µ y is the phenotypic mean and v eq y is the equilibrium phenotypic variance. Since Assuming that ϵ ij ϵ ik is uncorrelated with R ijk across sibling pairs (which is violated when there are indirect genetic effects between siblings [5]), this implies that performing the regression across siblings gives as slope i.e. the random mating variance of the DGE component divided by the equilibrium phenotypic variance. This is smaller than the equilibrium heritability, h 2 eq , by a factor of v g /v eq g = (1 − r δ ). Although unlikely to be relevant for complex traits in humans, the above result implies that there would be a further downward bias when there is AM for a phenotype affected by only a small (effective) number of loci. For example, if L δ = 1, as for a phenotype affected by a single variant, then the heritability estimate would be which approaches zero as r δ approaches 1. The intercept is where the r δ h 2 eq term captures the increased correlation between siblings' DGE components due to AM, and the remaining terms capture variance explained by environmental factors shared between siblings. This result (in the limit as L δ → ∞) agrees with the hypothesis put forward in Kemper et al. [10], which argued that sib-regression estimates the random mating genetic variance divided by the phenotypic variance in the present generation, which is h 2 f at equilibrium. Kemper et al. supported their argument through a theoretical derivation and simulations of one generation of assortative mating. We show here that their theoretical argument was incorrect. In Supplementary Note Section 3.2 of Kemper et al. (pages 60-61 of the supplement) they use the following result to argue that sib-regression estimates h 2 f : which is stated without proof. However, as we prove above (Equation 4.1), However, using the incorrect result E[δ i1 δ i2 |R ijk ] = v g R ijk in their derivation still gives the correct slope for sib-regression because the the missing r δ v eq g term is uncorrelated with R ijk across sibling pairs. The result they give for the slope in the section is also missing a r δ v eq g term since they ignore the inflation of genetic covariance between siblings due to AM when giving the phenotypic covariance between siblings.

Polygenic index analysis under random-mating
Here we derive the expected regression coefficients from two-generation PGI analysis under random-mating in terms of the weight vector of the PGI. Let where We consider performing the regression defined by: Let Then we have that, under random-mating, Var (X ij ) = 1 1 1 2 ; and We also have that . Therefore, .
Under random-mating, we also have that 6 Direct genetic effect PGI at equilibrium Here we derive the expected regression coefficients from two-generation PGI analysis of the true DGE PGI under assortative mating at equilibrium. This is defined by the following regression: where (Equation 1 and Figure 1) Let Then we have that Var (X ij ) = v eq g 1 1 + r δ 1 + r δ 2 (1 + r δ ) ; and We also have that where we have used the fact that Cov ∆ ij , η p(i) + η m(i) = c eq g,e /2. We can again use this to compute Cov ∆ par(i) , η p Cov (X ij , Y ij ) = v eq g + c eq g,e /2 (1 + r δ ) v eq g + c eq g,e ; and therefore δ δ α δ = Var (X ij ) −1 Cov (X ij , Y ij ) = 1 c eq g,e 2(1+r δ )v eq g .

Variance explained by parent and offspring DGE PGIs
We now compute the phenotypic variance explained the regression of parent and offspring DGE PGIs (Equation 34): Therefore, the fraction of phenotypic variance explained by the regression is: The increase in variance explained compared to if there were no IGEs (i.e. α δ = 0) is

Incomplete direct genetic effect PGI
Here we derive results for two-generation analysis of an incomplete DGE PGI. We assume that all causal variants have equal allele frequency, f , and equal DGE, δ. For some 0 < k ≤ 1, the incomplete direct DGE PGI for individual j from family i is: and the equivalent incomplete parental DGE PGIs are:

Equilibrium variance of incomplete direct genetic effect PGI
As above, we consider assortative mating to have reached an equilibrium with In this simplified model, the correlations between distinct alleles ( Figure S1) are all equal, with m ll = m ll ′ = r δ 2L (1 − r δ ) + r δ as first given by Sewall Wright in 1921 [6]. (We also have that L δ = L in this simplified model.) First, we consider the equilibrium variance of PGI δ k ij : The variance of the incomplete DGE PGI is inflated from kv g (under random-mating) to where r k = Corr PGI δ k p(i) , PGI δ k m(i) .

Relationship between correlations of incomplete and true DGE PGIs
By equating the two different expressions for the equilibrium variance of the incomplete PGI (above), we obtain the relationship between the equilibrium correlation between parents' DGE components, r δ , and the equilibrium correlation between parents' incomplete DGE PGIs, r k :

Direct to population effect ratio without indirect genetic effects
We assume a model without IGEs, i.e. η p(i) = η m(i) = 0, but with assortative mating at equilibrium. We first derive the 'population effect' of the incomplete DGE PGI, which is the equilibrium regression coefficient of Y ij on PGI δ k ij . The covariance between phenotype and incomplete DGE PGI is: Therefore, the population effect is: where we have substituted in Var PGI δ k ij = δ 2 kL2f (1 − f )[1 + (2kL − 1)m] from above. We now express 1 + (2kL − 1)m in terms of r δ and L : By setting k = 1, we obtain: 1 + (2L − 1)m = 2L 2L (1 − r δ ) + r δ and therefore Since the direct effect of the incomplete DGE PGI is 1, we therefore have that

Variance explained by incomplete DGE PGI
The variance explained by regression of phenotype onto incomplete DGE PGI is therefore: We can compare this to the equilibrium genetic variance, v eq g = v g / (1 − r δ ), to obtain the fraction of heritability explained by regression of phenotype onto incomplete PGI at equilibrium: To obtain the fraction of phenotypic variance explained, we multiply by h 2 eq :

Direct to population effect ratio with indirect effects
Consider the regression of phenotype onto offspring and parental true DGE PGIs outlined in the main text and in Section 6: where α δ = c eq g,e We now compute the population effect of the incomplete DGE PGI in terms of the parameters of the regression on the true DGE PGI (Equation 50): where the first term is the population effect of the incomplete DGE PGI when there are no IGEs (above). We now compute Cov PGI δ k ij , ∆ par(i) . Since PGI δ k ij ⊥∆ par(i) |G par(i) and As we are at equilibrium, Cov PGI δ k p(i) , Letting k = 1, we also have that Since the direct effect of the incomplete DGE PGI is 1, we obtain

Average NTC of incomplete DGE PGI
We now compute the average NTC of the incomplete DGE PGI, α PGI:k . At equilibrium, Therefore, α PGI:k = (β PGI:k − 1)/(1 + r k ) (55) Since δ PGI:k = 1 and ρ k = 1 − (1 − k)r δ , we also have 8 Estimating k For the inference procedure, one first needs to estimate k, the fraction of heritability the PGI would explain in a random-mating population. Consider that one has performed the following regression: Y ij v eq y = δ PGI:k PGI δ k ij + α PGI:k PGI δ k par(i) + ϵ ij , where the proband and parental PGIs have been scaled by the inverse of the standard deviation of the proband PGI, so that the proband PGI has variance 1. (This differs from the theoretical derivations above that use the un-normalized incomplete DGE PGI.) The expected direct effect at equilibrium is (Equation 44): .
If we have estimated δ PGI:k and α PGI:k from a two-generation PGI analysis (with phenotype and PGIs standardized to have variance 1) using N independent trios, then (see Equation 36) δ PGI:k α PGI:k ∼ N δ PGI:k α PGI:k , where R 2 trio is the variance explained by regression onto proband and parental PGI, which is equal to R 2 trio = δ 2 PGI:k + 2(1 + r k )α PGI:k (α PGI:k + δ PGI:k ).
We obtain our simulation estimates of δ PGI:k and α PGI:k by simulating from the bivariate normal distribution given in (67). We obtain our estimate of β PGI:k asβ PGI:k =δ PGI:k + (1 + r k )α PGI:k . We consider that we have estimated r k by computing the sample correlation coefficient between the parents' PGIs across the N independent trios, implying that the approximate sampling distribution ofr k isr k ∼ N r k , We simulatedr k independently fromδ PGI:k andα PGI:k as we found in our other simulations of trio genotype data that estimates ofr k were uncorrelated withδ PGI:k andα PGI:k when estimated from the correlation between parents' PGIs. We simulated estimates of h 2 f from a normal distribution: and we varied the sampling variance between simulations. We did not analyze simulation replicates whereĥ 2 f < 0. We considered N = 10 4 and N = 5 × 10 5 , k = 0.05, 0.2, 0.5, and Var(ĥ 2 f )) = 0.01, 0.1. We considered all combinations of these parameters, giving 12 simulations in total. For each set of simulation parameters, we simulated 1000 replicates. For each set of parameters, we assessed the bias by comparing the sample mean of the estimated parameters across the replicates, and we assessed the estimated standard errors by comparing the median standard error estimate across the replicates to the standard deviation of the parameter estimates across the replicates. We give the results in Supplementary Table 7.