Natural selections on both amino acid sequences and expression levels are determinants of ohnolog retention

The mechanism of ohnolog retention is a subject of concern in evolutionary biology. Natural selections on coding sequences and gene dosages have been proposed to be determinants of ohnolog retention. However, the relationship between the two models is not widely accepted, and the role of regulatory sequences on ohnolog retention has long been neglected. In this study, based on a model of complex traits’ genetic architecture, we compared the natural selection’s strength on corresponding sequences between ohnologs and non-ohnologs by comparing complex traits’ heritability enrichments. We showed that complex traits’ regulatory sequences’ heritability enrichments (p = 1.1 × 10−5 in 5 kb flanking regions) and expression-mediated heritability enrichments (p = 2.1 × 10−5) of ohnologs were significantly higher than non-ohnologs. Then, we deduced that regulatory sequences of ohnologs were under substantial natural selection, which was also a determent of ohnolog retention. Meanwhile, we showed that in coding sequences, the complex traits’ heritability enrichments of ohnologs were significantly higher than of non-ohnologs (p = 9.9 × 10−5), supporting the ohnolog retention model of natural selection on coding sequences. We also showed that complex traits’ causal gene expression effect sizes of ohnologs were significantly larger than of non-ohnologs (p = 8.8 × 10−6), supporting the ohnolog retention model of natural selection on gene dosages. In conclusion, we provide the first unified framework to show that both amino acid sequences and expression levels of ohnologs are under substantial selection, which may end the long-standing debate on ohnolog retention models.


Introduction
Ohnologs are paralogs that descended from two rounds of whole-genome duplications (WGD) early in the evolutionary history of Vertebrata. The types of natural selection that ohnologs have undergone are closely linked to the types of pathogenic variants observed in ohnologs, which makes the role of ohnologs in disease and the mechanism of ohnolog retention a subject of concern. Due to the observation of overrepresented ohnologs in pathogenic copy number mutations, ohnologs were considered dosage-sensitive, and natural selection on gene dosages was proposed to be a driving force behind the retention of ohnologs (Makino and McLysaght, 2010;McLysaght et al., 2014; Rice and McLysaght, 2017). Meanwhile, because human dominant disease genes were enriched in ohnologs, ohnologs were considered to be prone to dominant deleterious mutations, and purifying selection on coding sequencings was thought to be the driving force behind the retention of ohnologs (Singh et al., 2012(Singh et al., , 2014Roux et al., 2017). However, the relationship between the two models of ohnolog retention is unclear, and the role of ohnolog regulatory sequences in human complex traits and regulatory sequences' role on ohnolog retention has long been neglected.
In this study, we take advantage of Eyre-Walker's model, in which mutations that affect a complex trait also affect fitness (Eyre-Walker, 2010; Keightley and Hill, 1990), to deduce the relative strength of natural selection on corresponding sequences between ohnologs and non-ohnologs.
We tested the null hypotheses that there were no differences between complex traits' heritability enrichments of corresponding sequences (e.g., intron) between ohnologs and non-ohnologs.
We deduced if the natural selections on corresponding sequences (e.g., intron) of ohnologs were stronger than non-ohnologs based on the results of hypothesis tests.

Comparison of complex traits' heritability enrichments of corresponding regions between ohnologs and non-ohnologs
We took advantage of the stratified LD score regression (LDSC) (Finucane et al., 2015) to partition heritability (ℎ 2 g ) of complex traits among corresponding regions of ohnologs and non-ohnologs. LDSC calculates complex trait's ℎ 2 g enrichment of genomic regions, (proportion of ℎ 2 g )∕(proportion of SNPs), using a GWAS result of the complex trait and LD scores of an LDSC model calculated from a reference population. We made an LDSC model from the baseline-LD model that was robust to a wide variety of LD patterns (Finucane et al., 2015;Gazal et al., 2017). Our LDSC model, named baselineLD-ohnolog, included annotations of coding regions, introns, UTR and gene flanking regions (0-5 kb, 5-20 kb, and 20-100 kb) (Yao et al., 2020) of ohnologs and non-ohnologs (Singh and Isambert, 2020) (refer to Supplementary Table 5 for gene lists and refer to Supplementary Table 6 for annotations).

Comparison of complex traits' expression-mediated heritability enrichments between ohnologs and non-ohnologs
Regulation of gene expression is an essential aspect of the functions of regulatory sequences. In the previous section, we observed that in 5 kb flanking regions of protein-coding genes, the complex traits' heritability enrichments of ohnologs were significantly higher than of non-ohnologs.
Next, we took advantage of the mediated expression score regression (MESC) (Yao et al., 2020) to quantify the expression-mediated heritability (ℎ 2 med ) between ohnologs and non-ohnologs, and further test if complex traits' ℎ 2 med enrichments, (proportion of ℎ 2 med )∕(proportion of expressed genes), of ohnologs were significantly higher than of non-ohnologs.

Discussion
We showed that (i) complex traits' ℎ 2 g enrichments of regulatory sequences of ohnologs were significantly higher than of non-ohnologs ( = 6.9 × 10 −5 for 0-5 kb gene flanking regions; Figure 1D); (ii) complex traits' ℎ 2 med enrichments of ohnologs were significantly higher than of non-ohnologs ( = 8.5 × 10 −5 for the group of all tissues; Figure 2A). To make a further conclusion, we took advantage of Eyre-Walker's model (Eyre-Walker, 2010;Keightley and Hill, 1990), which hypothesized that mutations' absolute values of effect sizes and selection coefficients were positively correlated, either because the trait was a component of fitness or because the mutations had pleiotropic ef-fects on fitness. Recent empirical studies have supported the hypothesis, as heritability of complex traits is more enriched in genes under stronger natural selection regardless of whether a trait affects fitness (Gazal et al., 2017;Yao et al., 2020;Gazal et al., 2018). Under Eyre-Walker's model, we deduce that the natural selection on regulatory sequences of ohnologs is stronger than that of non-ohnologs.
Compared with previous studies, we showed that complex traits' ℎ 2 g enrichments of coding sequences of ohnologs were significantly higher than of non-ohnologs ( = 3.0 × 10 −4 ; Figure 1A). Under Eyre-Walker's model, we deduce that the natural selection on coding sequences of ohnologs is stronger than that of non-ohnologs, which is consistent with the ohnolog retention model of natural selection on coding sequences (Singh et al., 2012, 2014; Roux et al., 2017). We also showed that complex traits' causal gene expression effect sizes of ohnologs were significantly larger than of non-ohnologs ( = 3.5 × 10 −5 for the group of all tissues; Figure 2B). Because both regulatory sequences and gene copy number affect gene expression level (Stranger et al., 2007) In conclusion, previous studies showed that ohnologs were vulnerable to coding sequence mutations (Singh et al., 2012(Singh et al., , 2014Roux et al., 2017) and copy numbers mutations (Makino and McLysaght, 2010;McLysaght et al., 2014;Rice and McLysaght, 2017). We showed that ohnologs were also vulnerable to regulatory sequence mutations. We believe that both amino acid sequences and expression levels of ohnologs are under substantial natural selection. The natural selection on regulatory sequences ensures that ohnologs are stably expressed, thus maintain a functional state.

Methods and Materials
We curated GWAS results of 38 independent complex traits and diseases (Abbott et Table 1 and 4). We used the Ensembl GRCh37 annotated autosomal protein-coding genes as the universal gene set. We used the human ohnolog list generated by the intermediate criterion of Singh and Isambert (2020) (Supplementary Table 5 Table 6). We applied stratified LDSC to the 38 independent traits using the baselineLD-ohnolog model as described previously (Gazal et al., 2017).
The expression scores of ohnologs and non-ohnologs were calculated from the eQTL effect sizes of GTEx v8 tissues (Yao et al., 2020;GTEx, 2017) with tissue groups defined by Yao et al. (2020).
We applied stratified MESC on traits with significantly positive ℎ 2 med (Supplementary Figure 7-15) for each tissue group to calculate ℎ 2 med enrichments of ohnologs and non-ohnologs. We added two annotations of all SNPs within 100 kb of ohnologs and non-ohnologs in addition to the baseline-LD model to ensure the elimination of false positive ℎ 2 med enrichment (Yao et al., 2020) when performing MESC. For a comparison of ℎ 2 g or ℎ 2 med enrichments between ohnologs and non-ohnologs, as each trait had values of ohnologs and non-ohnologs, we performed a Wilcoxon signed-rank test.
We used the Benjamini-Hochberg method to calculate q-values. We used the Paule and Mandel method to perform the random-effects meta-analysis of ℎ 2 g or ℎ 2 med . All statistical analyses were performed using the Python package of Statsmodels v0.12 (Seabold and Perktold, 2010). Refer to Supplementary Materials and Methods for more details.