Testing the adaptive walk model of gene evolution

Understanding the dynamics of species adaptation to their environments has long been a central focus of the study of evolution. Theories of adaptation propose that populations evolve by “walking” in a fitness landscape. This “adaptive walk” is characterised by a pattern of diminishing returns, where populations further away from their fitness optimum take larger steps than those closer to their optimal conditions. Hence, we expect young genes to evolve faster and experience mutations with stronger fitness effects than older genes because they are further away from their fitness optimum. Testing this hypothesis, however, constitutes an arduous task. Young genes are small, encode proteins with a higher degree of intrinsic disorder, are expressed at lower levels, and are involved in species-specific adaptations. Since all these factors lead to increased protein evolutionary rates, they could be masking the effect of gene age. While controlling for these factors, we used population genomic datasets of Arabidopsis and Drosophila and estimated the rate of adaptive substitutions across genes from different phylostrata. We found that a gene’s evolutionary age significantly impacts the molecular rate of adaptation. Moreover, we observed that substitutions in young genes tend to have larger physicochemical effects. Our study, therefore, provides strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale.

[...] each group, we found that nearly all correlations between ω_a and the co-factor are non-significant (Table S1). Moreover, the linear model analyses showed that in all cases but gene length, gene age is [...] gene age was only significant as an interaction with the species variable, having a positive effect only in Drosophila (Table S2). These findings, therefore, suggest that the effect of gene age on ω_a is independent of the co-factor.

To jointly estimate the effect of the potential confounding factors, we applied a recently developed method that extends the MK test with a generalized linear model [57]. This approach disentangles the effects of each factor on the rate of adaptive substitutions per nucleotide site.

However, this method does not model the distribution of fitness effects and hence cannot account for segregating slightly deleterious mutations, which can bias estimates of the rate of adaptive substitutions [58]. Hence, following the approach suggested in Huang [57], we removed sites for which the derived allele frequency was below 50% to minimize any potential bias. Despite the large reduction in the data set, this analysis revealed a significant effect of gene age (Table S3 in supplementary data). Our findings, therefore, suggest that the effect of gene age on rates of protein evolution is robust to the tested confounding factors and that a gene's age acts as a significant determinant of the rate of adaptive and non-adaptive evolution in both species.
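As a rough illustration of the frequency filter described above, the sketch below keeps only polymorphic sites whose derived allele frequency is at least 50% and then computes the classic count-based MK estimate of the proportion of adaptive substitutions, α = 1 − (Ds·Pn)/(Dn·Ps). It is neither the generalized linear model of [57] nor Grapes. The `Site` record, the function names, and all counts are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Site:
    """A polymorphic site (hypothetical record, for illustration only)."""
    derived_freq: float   # derived allele frequency in the sample, in [0, 1]
    nonsynonymous: bool   # True if non-synonymous, False if synonymous


def filter_by_derived_freq(sites: List[Site], min_freq: float = 0.5) -> List[Site]:
    """Drop sites whose derived allele frequency is below min_freq."""
    return [s for s in sites if s.derived_freq >= min_freq]


def count_polymorphisms(sites: List[Site]) -> Tuple[int, int]:
    """Return (Pn, Ps): non-synonymous and synonymous polymorphism counts."""
    pn = sum(1 for s in sites if s.nonsynonymous)
    ps = len(sites) - pn
    return pn, ps


def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic MK estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical toy data: six polymorphic sites and made-up divergence counts.
sites = [
    Site(0.05, True), Site(0.10, True), Site(0.60, True),
    Site(0.55, False), Site(0.80, False), Site(0.20, False),
]
kept = filter_by_derived_freq(sites)      # sites with frequency >= 0.5
pn, ps = count_polymorphisms(kept)
alpha = mk_alpha(dn=40, ds=50, pn=pn, ps=ps)
```

The filter discards most low-frequency polymorphisms, which is where slightly deleterious mutations are expected to segregate, at the cost of a much smaller data set.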

Lastly, we aimed at assessing the effect size of gene age on ω_a relative to other factors.

Because correlation coefficients were computed from values averaged over multiple genes and genes were categorized differently for each analysis, the comparison of correlation coefficients does not [...] analyses using genes with very low E-values that are likely to be detected in most age strata and for which the correlation with gene age was no longer significant (see Material and Methods and [...] 1.30e-03; ω_na: τ = 0.786, p = 6.49e-03; ω_a: τ = 0.643, p = 2.59e-02 in A. thaliana; and ω: τ = 0.697, p = 1.61e-03; ω_na: τ = 0.636, p = 3.98e-03; ω_a: τ = 0.636, p = 3.98e-03 in D. melanogaster, Figure S4 in supplementary data). These results suggest that the correlation of gene age with the rate of adaptive evolution cannot be attributed to errors in dating the emergence of a gene stemming from the failure to identify homologs in older taxa.
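The statistics reported above are rank correlations between gene age and a per-stratum rate. As a minimal sketch of how such a value can be obtained, the code below computes Kendall's rank correlation without tie correction, together with a two-sided p-value from the usual normal approximation. The function names and the toy values are hypothetical, and the choice of the normal approximation is an assumption rather than a description of the software used in the study.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # tied pairs count as neither concordant nor discordant
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


def tau_p_value(tau: float, n: int) -> float:
    """Two-sided p-value under the normal approximation (no ties assumed)."""
    variance = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    z = tau / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical example: six age strata (1 = oldest) and made-up mean rates.
age_rank = [1, 2, 3, 4, 5, 6]
mean_rate = [0.02, 0.05, 0.04, 0.09, 0.12, 0.20]
tau = kendall_tau(age_rank, mean_rate)
p = tau_p_value(tau, len(age_rank))
```

Because only the ordering of the strata enters the statistic, the result does not depend on the absolute ages assigned to each stratum.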

The effect of gene age on the rate of molecular adaptation does not depend on protein function

Lineage-specific genes are known to be involved in species-specific adaptive processes, such as the [...]

To further correct for the potential bias of protein function, we ran Grapes across categories of GO-annotated genes while simultaneously controlling for the effect of gene age. As some gene functions were biased towards some age categories (Figure S5), we could not do this analysis for all GO terms. We, therefore, only used the GO terms with a sufficient number of annotated genes in each [...] Grantham's distances between residues within each age stratum. We observed that substitutions in young genes tend to occur between less biochemically similar residues (Arabidopsis: τ = 1, p = 2.00e- [...] Drosophila species, we showed that the higher rate of non-synonymous substitutions in younger genes [...] molecular adaptation (Table S4)