Non-additive polygenic models improve predictions of fitness traits in three eukaryote model species

To describe a living organism it is often said that “the whole is greater than the sum of its parts”. In genetics, we may also think that the effect of multiple mutations on an organism is greater than their additive individual effect, a phenomenon called epistasis or multiplicity. Despite the last decade’s discovery that many disease- and fitness-related traits are polygenic, or controlled by many genetic variants, it is still debated whether the effects of individual genes combine additively or not. Here we develop a flexible likelihood framework for genome-wide associations to fit complex traits such as fitness under both additive and non-additive polygenic architectures. Analyses of simulated datasets under different true additive, multiplicative, or other epistatic models, confirm that our method can identify global non-additive selection. Applying the model to experimental datasets of wild type lines of Arabidopsis thaliana, Drosophila melanogaster, and Saccharomyces cerevisiae, we find that fitness is often best explained with non-additive polygenic models. Instead, a multiplicative polygenic model appears to better explain fitness in some experimental environments. The statistical models presented here have the potential to improve prediction of phenotypes, such as disease susceptibility, over the standard methods for calculating polygenic scores which assume additivity.

Over a decade of Genome-Wide Associa on (GWA) studies has confirmed what quan ta ve gene cists have long suspected -that many complex and con nuous traits, including human health traits (1) and experimentally measured fitness in plants (2) and animals (3) , are controlled by hundreds if not thousands of muta ons (1) . These observa ons suggest that adapta on might generally occur in a polygenic fashion, where mul ple alleles in the genome are under selec on, rather than through single selec ve sweeps, where a single allele contributes most of the heritability and has a large fitness effect (4) . There are many possible ways in which mul ple muta ons can combine or interact to determine a trait. S ll, the vast majority of polygenic models used for gene c mapping of traits exclusively consider addi ve effects, in which muta ons are assumed to act independently on a trait, without any interac on. Considering the complex nature of biological systems (5) , this is a counterintui ve assump on, which has generated controversy in quan ta ve and popula on gene cs (6)(7)(8) .
Quan ta ve gene cs, the branch of evolu onary gene cs focused on understanding the gene c contribu on of traits, has GWA as one of its core tools. When conduc ng GWA using fitness or a fitness proxy as the trait of interest, one implicitly assumes selec on is addi ve (i.e. fitness , with being the selec on coefficients or rela ve fitness effects of a given variant). The jus fica on for this assump on is mostly sta s cal, and rooted in the origin of the field of quan ta ve gene cs and the infinitesimal model (9) , which posits that when many loci affect a trait in small quan es, even if there are interac ons, the addi ve model is a good approxima on -a theore cal no on which has been backed up by prac cal successes in ar ficial selec on for breeding plant and animals (10)(11)(12)(13) .
Despite this success of the addi ve model, much classic and modern gene c research supports the existence of epistasis (14) , encouraging researchers to explore this biological phenomenon in genome-wide or polygenic approaches. Some models in quan ta ve gene cs have extended a linear model to include pairwise epista c interac ons include pairwise epista c interac ons ( ) (15,16) . Extending the pairwise epistasis model to genome-wide approaches, for a dataset of SNPs, one would have to infer main effects and interac on terms, which likely is many more parameters than current datasets have the power to detect (16) . Furthermore, admi ng pairwise epistasis immediately leads to the ques on of higher-order epistasis, such as trios or quartets of interac ng loci (17) .
The pairwise epista c model is only one of many non-addi ve models proposed in the broader field of evolu onary gene cs (reviewed and discussed in (18) and (19) ). Other important and commonly assumed models are: The "mul plica ve model" ( ), where fitness increases in a geometric fashion with the number of alleles affec ng fitness (a pairwise version of which also exist (20) ). The "synergis c or posi ve epistasis model", where fitness increases or decreases faster than a geometric func on (21) , and the "diminishing returns or nega ve epistasis model", where fitness gains are less than expected from the combina on of adap ve muta ons and ul mately plateauing ( Fig. 1C ) (21,22) .
The choice of func on for combining muta on effects is not merely a mathema cal triviality but rather represents a hypothesis on how molecular processes affect the development of a trait and how these, in turn, are jointly affec ng fitness (18) . For instance, the choice of a mul plica ve model of fitness made by many theore cal popula on gene cists can be jus fied by assuming that muta ons decrease survivorship by a certain frac on, independently of each other. On the other hand, quan ta ve gene cists tradi onally consider traits such as height, and thus may choose a model where muta ons ac vate or repress independent growth pathways and may compensate one one another addi vely. The addi ve and mul plica ve models are approximately iden cal when effect sizes (i.e. selec on coefficients if the trait is fitness) are small, as the product terms of the effect sizes vanish, but can differ more substan ally when there are muta ons of moderate or large effects, par cularly in the tails of the distribu on.
Despite recent conceptual efforts to connect GWA results and popula on gene cs and natural selec on concepts (23) , a common GWA framework to directly test different global addi ve, mul plica ve, or epista c models is s ll lacking (but see (24) ). Here we developed NAPs (Non-Addi ve Polygenic models) for joint inference of effect sizes for loci together with parameters modeling how the effects of loci combine (addi ve, mul plica ve, etc.). These models are so-called global epistasis models because they do not parameterize interac ons between individual loci directly, but rather model the interac on as a func on of the total combined effect of all alleles in all loci (25) ( Fig. 1C ). Our aim was to develop a flexible GWA-like model that could explain individual trait varia on as a func on of genomic varia on using both addi ve and non-addi ve func ons commonly used in evolu onary gene cs. Much of the theory on epistasis models in popula on and quan ta ve gene cs deals with fitness and natural selec on, so we developed the NAP models to be especially suited for fitness traits (or proxies thereof). Fitness is a complex trait that is of special interest (natural selec on is defined as gene c varia on in fitness), but o en is non-Normally distributed and, therefore, requires careful modeling (26) . Here, we define the rela ve fitness of an organism as the expected total number of offspring produced in a life me divided by the popula on mean number of offspring. Rela ve fitness is then bounded between zero and N , with mean=1. The number of offspring of an individual can be thought of as arising from the convolu on of a Bernoulli represen ng viability and a Poisson describing the offspring number, which is poorly approximated by a normal distribu on even a er various normaliza ons (26) ( Fig. 1A ). In par cular, modeling of offspring number is o en challenged by the infla on of zeros present in the dataset ( Fig. 1B ), which cannot be easily modeled by trunca ng the nega ve side of a normal distribu on and allowing a point mass at zero equal to the CDF of the normal distribu on at this point. We therefore model the normalized number of offspring from individual with genotype as being drawn from a mixture distribu on: ( Fig. 1B ); where represents the unobserved true fitness of a given genotype in a popula on, were different genotypes have different fitness (e.g. distribu on means in Fig. 2B vary for different genotypes); and are constants that allow either for homoscedas c Gaussian variance, when , or for variance to increase with the mean, (e.g. distribu on widths in Fig. 1B vary for different genotypes). Finally, independent of the genotype, the model allows for a frac on, , of stochas c zeroes, o en observed in real offspring distribu on data and fitness assays (e.g. frac on at zero in Fig. 1B ). This framework is flexible enough to accommodate a wide array of sampling distribu ons ( Fig. 1A ). Also, in the data sets we will consider, the number of offspring per individual is rela vely large so the normal distribu on of offspring per genotype should approximate the Poisson quite well. Each genotype, , is assumed to have a fitness that is a func on of the genotypic state in the genome-wide muta ons affec ng fitness. The func ons we will use to describe the combined effects of muta ons can be addi ve: ; or mul plica ve: , Here encodes the genotypes of biallelic SNPs, with alterna ve allele dosages: 0,1,2; assuming individuals are diploid, and is the fitness of a hypothe cal reference genotype (with =0 in all loci). Both addi ve and mul plica ve func ons can have different func onal forms that make them depart from linear or geometric func ons when . Specifically, when , the genotype-trait architecture is said to have synergis c epistasis. When , it has diminishing returns epistasis. In total, six combina ons of model architectures can be tested: purely addi ve, addi ve synergis c, addi ve with diminishing returns, purely mul plica ve, mul plica ve synergis c, and mul plica ve with diminishing returns ( Fig. 1C ). Given the above sampling distribu on per genotype and parameters, the likelihood of observing rela ve offspring number of a genotype is: And the log-likelihood of all observa ons is the sum over all log-likelihoods per genotype: .
The key predictor in our NAP model is the vector of rela ve fitness devia ons or selec on coefficients on gene c variants, . Following (27) , we define a selec on coefficient as the rela ve fitness advantage or disadvantage of a minor allele with respect to the fitness of a reference haplotype (the theore cal haplotype with all major alleles).
To infer the vector and the model hyperparameters, we use the mul variate quasi-Newtonian op miza on algorithm Spectral Projected Gradient (SPG) (28) . Although this procedure has been developed for large-scale mul variate op miza on and is implemented in C++ ( h ps://github.com/MoisesExpositoAlonso/napspg ), conduc ng these analyses with millions of SNPs is computa onally-demanding and me-consuming. Hence, we instead first run a Bayesian Sparse Linear Mixed Model (BSLMM, implemented in GEMMA) (12) , a polygenic addi ve GWA model for which efficient op miza ons have been implemented. Using BSLMM, we can pre-select a set of top associated SNPs with the trait for which we then run our likelihood op miza on. The SPG op miza on was run un l convergence or for 2,500 itera ons. To assess goodness-of-fit and avoid overfi ng, we use a cross-valida on approach, where 90% of the data is used for training and 10% is used for tes ng. Accuracy is measured using Spearman's rank correla on (r) between predicted and measured values of rela ve offspring number of individuals.
Using the NAP model, we analyzed real fitness datasets and tested which polygenic architecture model best fits the data. We used datasets from three model organisms: Arabidopsis thaliana (part of the 1001 Genomes Project (2) ), Drosophila melanogaster (part of the Drosophila Gene c Reference Panel (DGRP2) (3, 29) ), and Saccharomyces cerevisiae (part of a >1000 genomes effort (30) ). For A. thaliana, we analyze a total of eight experimental datasets, 515 natural lines, and 4,438,427 biallelic SNPs. For D. melanogaster , we use one dataset, 205 natural lines, and 4.5 million biallelic SNPs. For S. cerevisiae , we use 35 datasets, 1011 natural lines, and 83,794 biallelic SNPs. BSLMM Genome-Wide Associa ons for the three species confirmed the results from the original publica on that fitness traits were polygenic, with the highest es mated number of causal loci ( ) for any of the analyzed environments (mode of 1,000 MCMC steps) being: 63 for Drosophila, 261 for Arabidopsis, and 271 for Saccharomyces. None of the previous publica ons tested whether these polygenic traits were addi ve or not. For each of these 44 datasets we applied the likelihood method to the 1,000 SNPs with highest posterior probability in BSLMM, for different (non-)addi ve architectures, and selected the best models based on cross-valida on accuracy ( Fig. 2 ).

Figure 2 | Significant non-addi ve selec on in Arabidopsis , Drosophila and Saccharomyces experiments (A) Cross-valida on predic on accuracy using Non-Addi ve Polygenic (NAP) models in 44 fitness datasets in D. melanogaster (acronym D.), A. thaliana and S. cerevisiae. Accuracy was evaluated by Spearman's r correla on coefficient between the actual and predicted value of rela ve fitness from the test set of individuals in each dataset. Fitness predic ons were generated combining inferred selec on coefficients for 1,000 SNPs using addi ve, mul plica ve, or other non-addi ve func ons. For reference, the predic on accuracy using state-of-the-art BSLMM Genome-Wide Associa on (GWA) (17) is also shown, along with the magnitude of accuracy improvement ( Δr) . (B) Percentage of the 44 datasets that were best predicted (highest r) by each of the six tested models. (C) Cross-valida on predic ons of rela ve offspring number in A. thaliana (experiment code "mli", random test set n=52), with an example of the addi ve and the mul plica ve models, and the comparison of predictability with BSLMM GWA model with all and the top 1,000 SNPs (random test set n=52). ( D-E ) Cross-valida on predic on accuracy plots of 96 simulated datasets under a mul plica ve (orange) or addi ve (black) polygenic model. Grey crosses indicate simulated mul plica ve datasets for which the best model (highest r) was addi ve. ( D ) Each simulated dataset's cross-valida on accuracy improvement using NAP over BSLMM GWA, plo ed against the fitness variance of each simulated dataset. ( E ) Each simulated dataset's cross-valida on accuracy difference between the true model (fi ng NAP with the known parameters used to simulate the dataset), and the best model (NAP run with the parameters that maximized r). ( F ) A Random Forest was used to explain the accuracy difference between NAP and GWA based on the hyperparameters used to simulate datasets (n= 96 ). Variable importance shows which simula on parameters lead to the largest changes in accuracy improvement. The parameters are: model type addi ve/mul plica ve, selec on strength, Poisson noise, Gaussian noise, random zeroes, synergis c/diminishing global epistasis (see text for explana on). ( G ) A Random Forest was used to explain which parameters (see F) characterize the simulated mul plica ve datasets where the best model was addi ve (n= 32 ).
Our results show that non-addi ve polygenic (NAP) models in general tend to provide higher predic on accuracy than standard addi ve models ( Fig. 2A-B ). The most common architectures were either addi ve with diminishing global epistasis or addi ve with synergis c global epistasis, and a significant 15% frac on were also best fit under a mul plica ve model in at least one environment for each of the three species .
As expected due to the zero-infla on and long tails in these datasets ( Fig. 1A ), the state-of-the-art Genome-Wide Associa ons had a lower predic on accuracy than the NAP models ( Fig. 2A-C ) for almost all data sets. The NAP model improved cross-valida on accuracy over GWA by an average of r=0.216 (2.5-97.5% quan les: 0.0729, 0.592). This was more pronounced when the best model was mul plica ve (Δr=0.32 mul plica ve, Δr=0.35 mul plica ve with diminishing returns) compared to addi ve (Δr=0.21 for addi ve, Δr=0.16 for addi ve diminishing, Δr=0.23 for addi ve synergis c). To exclude that these improvements were due to the winner's curse, we re-ran the GWA with the top 1,000 SNPs from the first GWA run which were used to fit NAP (for comparison of the all SNPs GWA and the top 1,000 SNPs GWA, see Fig. 2C ).
To test the robustness of our conclusions, we ran the same set of analyses in 96 simulated datasets ( Fig. 1A ). The fitness of 1,500 individuals was simulated using 10,000 biallelic SNPs in 500 causal loci, each randomly assigned a selec on coefficient assuming ; with variance either 0.01 (weak selec on) or 0.1 (stronger selec on). The strength of selec on can also be interpreted as the polygenicity of the trait, as in the weak selec on case only ~30% of SNPs have selec on coefficients > 1%, while in the strong selec on case >90% of the 500 loci have selec on coefficients > 1%. Individual-level fitness is then calculated as a combina on of all genome-wide selec on coefficients using an addi ve or mul plica ve formula (see above), and in all combina ons with or without synergis c and diminishing returns epista c parameters (using a grid of parameter values from 0.8 to 1.2 in 0.1 increments). Finally, we added different degrees and types of sampling noise, either 0% or 20% zero-infla on of random zeroes, a basal Gaussian noise of 0.01 or 0.5 ( ), and variance increased in propor on to the mean of 0.01 or 0.5 ( ); and all combina ons of the above (n=96), crea ng an array of fitness distribu on shapes. The hyperparameter values were manually selected to create realis c fitness distribu ons similar to those observed in A. thaliana, D. melanogaster and S. cerevisiae datasets ( Fig. 1A) .
Using the same approach as with the experimental data, we conducted a pre-selec on of the top SNPs using BSLMM, ran our NAP models with 1,000 SNPs under different addi ve and mul plica ve architectures and a grid of values of the epista c parameter, and then selected the best models based on cross-valida on accuracy. As before, we found that our best model had an average cross-valida on predictability of r=0.50, while GWA had an average accuracy of r=0.21. The improvement in accuracy of NAP over GWA increased as the fitness distribu on became more complex, that is, heavy-tailed (skewness, Pearson's r=0.41, P= 3.64x10 -5 ) and broad (variance, r=0.24, P= 1.89x10 -2 ) ( Fig. 2D ).
Because the true underlying genome architecture is known for the simulated datasets, we could study the performance of our likelihood op miza on and cross-valida on approaches, and compare them with the BSLMM GWA baseline. We found that 98% of all addi ve simulated datasets had the highest predic on accuracy under an addi ve model. Fi ng different global epista c parameters (0.8-1.2 in 0.05 increments) in these addi ve models and selec ng the model with highest predic on accuracy, we nevertheless iden fied the correct simulated epista c parameter 20% of the me, close to the random expecta on 10%. This may be caused by the minor influence of this global epistasis parameter in shaping fitness (see Fig. 2 F ), or due to op miza on issues and/or our coarse grid search approach.
Mul plica ve architectures did not tend to improve predic on accuracy even when the true model was in fact mul plica ve. Only 32% of mul plica ve simula ons found highest predic on accuracy under a mul plica ve model. When we studied which mul plica ve datasets had poor predic on accuracy under a mul plica ve model, we found that it was those datasets with high noise, leading to an overall low accuracy of the model ( Fig. 2D-E ). We conducted a Random Forest analysis to quan fy the simula on condi ons that characterized runs where the wrong model had the highest predic on accuracy. Unsurprisingly, it was simula ons under condi ons with weak mul plica ve selec on and high Gaussian noise that were wrongly iden fied as addi ve ( Fig. 2F-E ,  crosses), as in cases where loci have small effects, addi ve and mul plica ve models are approximately iden cal. Despite the fact that simulated mul plica ve datasets were consistently be er predicted with addi ve rather than mul plica ve models, we found that addi ve models fi ed with synergis c global epistasis parameter ( ≥1.2) had on average a 6% higher accuracy than addi ve models with diminishing returns parameter ( ≤0.85) (t-test, t = 2.433, df = 189.7, p-value = 0.0158; Wilcoxon rank sum test, W = 5631, P = 0.0079). This shows that our algorithm may detect non-linear mul plica ve-like architectures, but they appear to be well approximated by an addi ve model with synergisms (see the similarity in mul plica ve and addi ve-synergis c lines in Fig. 1C ).
Here we presented the NAP model, the first Genome-Wide Associa on pipeline that enables researchers to choose a global non-addi ve architecture in their associa on study for predic on purposes. Applying NAP to three diverse species, we find evidence for a mul plica ve genotype-fitness architecture at least in some environments, and simula ons suggest our model selec on based on predic on accuracy is conserva vely biased towards addi ve architectures. Beyond tes ng different polygenic architectures, the tailored likelihood model behind NAP accommodates non-Normal phenotypic distribu ons and achieves a higher cross-valida on accuracy when predic ng trait values of genotypes hidden from the model, which could have broad applica ons both to study natural selec on and fitness, as well as other traits. We also note that the method has poten al for improving phenotypic predic on in human disease studies and in other studies that rely on the standard addi ve polygenic risk score model (12) .
Conclusions on global epistasis could arise due to non-lineari es in the measuring scale of a trait (31) . Fitness has a natural scale, i.e. the number of offspring produced by an individual, and therefore should not be affected by this confounding. However, some proxies for fitness, e.g. growth rates for unicellular organisms, could also be problema c for measuring selec on (32) .
The evidence for non-addi ve gene c architecture presented here may not be surprising given the growing literature of experiments that require genome-wide epistasis to explain asymmetric responses to ar ficial selec on (33) , line-dependent effects of muta ons in Drosophila (14,34) , or significant quan ta ve trait loci hubs in yeast (35) . This has sparked recent development of various sta s cal approaches to test epistasis more generally, by studying the emergent pa erns of epistasis as its contribu on to variance (36) , or one genotype-to-trait map (24,25,31,37,38) as in this study. Recent systems biology approaches for crea ng massively-parallel muta ons using CRISPR/Cas9 techniques (39) , as recently aimed in yeast experiments (40) , should further enable researchers to test the underlying addi ve or epista c interac ons of muta ons. The evidence for non-addi ve gene c architecture in three key model species iden fied in this study will have substan al impact in our understanding and predictability of polygenic adapta on of species (6,41,42) .