Abstract
In humans and other mammals, germline mutations are more likely to arise in fathers than in mothers. Although this sex bias has long been attributed to DNA replication errors in spermatogenesis, recent evidence from humans points to the importance of mutagenic processes that do not depend on cell division, calling into question our understanding of this basic phenomenon. Here, we infer the ratio of paternal-to-maternal mutations, α, in 42 species of amniotes, from putatively neutral substitution rates of sex chromosomes and autosomes. Despite marked differences in gametogenesis, physiologies and environments across species, fathers consistently contribute more mutations than mothers in all the species examined, including mammals, birds and reptiles. In mammals, α is as high as 4 and correlates with generation times; in birds and snakes, α appears more stable around 2. These observations can be explained by a simple model, in which mutations accrue at equal rates in both sexes during early development and at a higher rate in the male germline after sexual differentiation, with a conserved paternal-to-maternal ratio across species. Thus, α may reflect the relative contributions of two or more developmental phases to total germline mutations, and is expected to depend on generation time even if mutations do not track cell divisions.
1 Main
Humans tend to inherit more de novo mutations (DNMs) from their fathers than from their mothers. This phenomenon was first noted over 70 years ago, when JBS Haldane relied on the population frequency of hemophilia in order to infer that the de novo mutation rate at the disease locus is substantially higher in fathers [1]. Work since then, particularly in molecular evolution, has confirmed a “male bias” in mutation (henceforth paternal bias) [2–9], with estimates from human pedigrees indicating that, genome-wide, DNMs occur roughly four times more often on the paternal genome than on the maternal one [10, 11].
The textbook explanation for the paternal mutation bias is that it arises as a consequence of the vastly different numbers of cell divisions (and hence DNA replication cycles) necessary to produce sperm compared to oocytes [12–15]. Indeed, in humans as in other mammals, oocytes are arrested in meiotic prophase I at birth, with no subsequent DNA replication in the mother’s life, whereas spermatogonia start dividing shortly before puberty and divide continuously throughout the reproductive life of the father [13, 16]. The observation that the number of DNMs increases with paternal age has been widely interpreted in this light, as evidence for DNA replication errors being the predominant source of germline mutation [10, 11, 17, 18].
A number of recent findings have called this view into question, however. First, analyses of large numbers of human pedigrees revealed an effect of maternal age on the number of maternal DNMs [19, 20], with an additional ∼ 0.4 mutations accrued per year. Given the lack of mitotic cell division in oocytes after birth, this observation indicates that by typical reproductive ages, at least half of maternal DNMs arise from DNA damage [18]. Second, despite highly variable rates of germ cell division over human ontogenesis, germline mutations accumulate with absolute time in both sexes, resulting in a ratio of paternal-to-maternal germline mutation, α, of around 3.5 at puberty and very little increase with parental ages [21]. Third, studies in a dozen other mammals suggest that α ranges from 2 to 4 whether the species reproduces months, years or decades after birth [22, 23], when estimates of germ cell division numbers at time of reproduction would predict a much wider range in α [13, 22, 24, 25]. Explaining the observations in humans under a model in which most mutations are due to replication errors, and thus track cell divisions, would call for an exquisite balance of cell division and mutation rates across developmental stages in both sexes [26]. In males, the constant accumulation of mutations with absolute time would require varying rates of germ cell divisions over ontogenesis to be precisely countered by reciprocal differences in the per cell division mutation rates. In females, it would necessitate that the mutation rate per unit of time be identical whether mutations arise from replication errors or damage. In turn, the similarity of α across mammals that differ drastically in their reproductive ages would entail two distinct sources of mutation—replication error in males and damage in females—covarying in tight concert with generation times.
A more parsimonious alternative is that most germline mutations arise from the interplay between damage and repair rather than from replication errors [27], and that the balance results in more mutations on the paternal than the maternal genome [26]. Assuming repair is inefficient relative to the length of the cell cycle or, perhaps more plausibly, that repair is efficient but inaccurate [28, 29], mutations that arise from damage will not track cell divisions [26]. Damage-induced mutations must underlie the observed maternal age effect on DNMs in humans; they could also account for the accumulation of germline mutations in proportion to absolute time in males, assuming fixed rates of damage and repair machinery errors in germ cells [26]. In support of damage-induced mutations being predominant in the human germline, analyses of the mutation spectrum indicate that 75% of DNMs and 80% of mutations in adult seminiferous tubules are due to mutation “signatures” SBS5/40 [30, 31], which are clock-like, uncorrelated with cell division rates in the soma [32, 33], and prevalent in post-mitotic tissues [29, 34]. In addition, most substitutions in post-pubertal germ cell tumours are attributed to SBS5/40, in both females and males [35]. More generally, cell division rates do not appear to be a major determinant of mutation rates across somatic tissues [36]: notably, post-mitotic neurons accumulate mutations at a rate similar to that of granulocytes, which are the product of continuous cell divisions [29]. A decoupling between cell division numbers and mutation burden has also been described in colonic crypts across mammals [37], and in yeast, up to 90% of mutations have been estimated to be non-replicative in origin [38]. Altogether, these results suggest an important role, for both germline and soma, of mutagenic processes that accumulate with absolute time, as expected from damage-induced mutations [26].
1.1 Estimating sex differences in germline mutation rates across amniotes
In undermining the prevailing understanding of the paternal bias in human germline mutations, these observations revive the question of how the bias arises, as well as of the influences of life history traits and exogenous or endogenous environments. To investigate them, we took a broad taxonomic view, characterizing the paternal mutation bias across amniotes, including mammals but also birds and snakes, which differ in potentially salient dimensions. As two examples, in birds as in mammals, oogenesis is arrested by birth in females, while spermatogenesis is ongoing throughout male reproductive life [39, 40], but birds have internal testes whereas some mammals have external testes. In addition, mammals and birds are endotherms, in contrast to ectothermic reptiles such as snakes. More generally, the taxa considered vary widely in their life histories, physiologies, and natural habitats.
To estimate α in each lineage, we based ourselves on the evolutionary rates at putatively neutrally-evolving sites of sex chromosomes compared to the autosomes [41]. The more direct approach of detecting de novo mutations in pedigrees requires them to be available for each species, in large numbers for precise estimates. In contrast, the evolutionary method is in principle applicable to any set of species with high quality genome assemblies and a stable sex karyotype. It takes advantage of the fact that at the population level, sex chromosomes spend different numbers of generations in each sex (e.g., the X chromosome spends twice as many generations in females as in males), whereas autosomes spend an equal number in both (Figure 1A). Thus, all else being equal, if there is a paternal mutation bias, an autosome with greater exposure to the more mutagenic male germline will accumulate more neutral substitutions than the X over evolutionary timescales (Figure 1A); the inverse will be true for the autosomes compared to the Z chromosome [41].
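The logic of this comparison can be made concrete with a minimal sketch. It assumes that yearly substitution rates are set by per-generation mutation rates weighted by the fraction of generations each chromosome spends in each sex (2/3 of X generations in females, 2/3 of Z generations in males, 1/2 for autosomes). The function names and the `gen_time_ratio` parameter (paternal-to-maternal generation time, set to 1 by default) are illustrative assumptions and are not taken from the analysis scripts.

```python
def rate_ratio_from_alpha(alpha, system="XY", gen_time_ratio=1.0):
    """Expected sex-chromosome-to-autosome neutral rate ratio for a given alpha.

    gen_time_ratio is the paternal-to-maternal generation time ratio
    (hypothetical parameter name; 1.0 means equal generation times).
    """
    g = gen_time_ratio
    autosome = (1 + alpha) / (1 + g)
    if system == "XY":
        # X: two maternal transmissions for every paternal one
        sex_chrom = (2 + alpha) / (2 + g)
    elif system == "ZW":
        # Z: two paternal transmissions for every maternal one
        sex_chrom = (1 + 2 * alpha) / (1 + 2 * g)
    else:
        raise ValueError("system must be 'XY' or 'ZW'")
    return sex_chrom / autosome


def alpha_from_rate_ratio(ratio, system="XY", gen_time_ratio=1.0):
    """Invert the relationship above to recover alpha from an observed ratio."""
    g = gen_time_ratio
    if system == "XY":
        k = ratio * (2 + g) / (1 + g)
        return (2 - k) / (k - 1)
    if system == "ZW":
        k = ratio * (1 + 2 * g) / (1 + g)
        return (k - 1) / (2 - k)
    raise ValueError("system must be 'XY' or 'ZW'")
```

Under these assumptions, a paternal bias (α > 1) gives an X-to-autosome ratio below 1 and a Z-to-autosome ratio above 1, which is the asymmetry the evolutionary method exploits.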
Such evolutionary approaches have been widely applied, but until recently they were limited in the number of loci or species (e.g. [6–8, 42–44]) and did not take into account the influence of sex differences in generation times on the estimation of α [23]. An additional complication to consider is that X (Z) and autosomes differ not only in their exposures to male and female germlines but in a number of technical and biological features (notably, GC content) that may need to be controlled for [45–47]. Moreover, analyses involving closely related species can be confounded by the effects of ancestral polymorphism: for example, lower ancestral diversity in the X chromosome relative to the autosomes reduces the X-to-autosome divergence ratio, leading to overestimation of α [5] (Figure 1B). In birds, unresolved branches within the phylogeny present an additional difficulty in estimating substitution rates [48, 49].
Here, we designed a pipeline for estimating the paternal mutation bias systematically across a wide range of species, mindful of these issues. To these ends, we employed existing whole genome alignments [50, 51] or produced our own (for snakes, see Sequence alignments in Methods), focusing on assemblies with high quality and contiguity and, where possible, those based on a homogametic individual. To handle the confounding effects of ancestral polymorphism on divergence, we thinned species in the phylogeny to ensure a minimum level of divergence between them, relative to polymorphism levels (see Species selection criteria in Methods). This stringent filtering procedure resulted in three whole genome alignments including 20 mammals, 17 birds and five snake species, respectively (Table S2).
In order to estimate neutral substitution rates from the alignments and compare X (Z) and autosomes while minimizing confounding factors, we focused on non-repetitive, non-exonic regions that were orthologous across all species in an alignment and did not overlap with pseudo-autosomal regions with orthologs on the Y (W) chromosome (see Selecting non-repetitive and putatively neutral sequences in Methods; see Figure S1F for a more stringent masking of all conserved regions). To account for differences between X (Z) and autosomes in features other than their exposure to each sex, we regressed neutral substitution rates in 1 Mb genomic windows against GC content and GC content squared (Figure 1B). We took this approach because GC content is readily obtained from any genome sequence and is highly correlated with known modifiers of the mutation rate such as replication timing and the fraction of CpG dinucleotides [47, 52]. We then obtained substitution rate estimates for the X (Z) chromosome and autosomes from the regression fit. Finally, we inferred α for the terminal branches leading to the 42 amniote species from the ratio of the substitution rate estimates for the X (Z) versus the autosomes (Figure 2), taking into account sampling error as well as uncertainty in the ratio of paternal-to-maternal generation times [53] (see Estimating α from X-to-autosome substitution rate ratios in Methods).
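As an illustration of the GC adjustment, the sketch below fits a quadratic in GC content to windowed substitution rates and reads off a rate for each chromosome class at a common GC value. The exact model specification is given in Methods; here the chromosome-class indicator term, the choice of reference GC value, and all names are assumptions made for the example.

```python
import numpy as np


def gc_adjusted_rates(gc, rate, is_sex_chrom, gc_ref=None):
    """Fit rate ~ b0 + b1*GC + b2*GC^2 + b3*is_sex_chrom by least squares.

    gc, rate      -- per-window GC content and neutral substitution rate
    is_sex_chrom  -- 1 for X (Z) windows, 0 for autosomal windows
    gc_ref        -- GC value at which to compare classes (assumed: mean GC)
    Returns (autosomal rate, sex-chromosome rate) at gc_ref.
    """
    gc = np.asarray(gc, dtype=float)
    rate = np.asarray(rate, dtype=float)
    sex = np.asarray(is_sex_chrom, dtype=float)
    design = np.column_stack([np.ones_like(gc), gc, gc ** 2, sex])
    coef, *_ = np.linalg.lstsq(design, rate, rcond=None)
    if gc_ref is None:
        gc_ref = gc.mean()
    autosome_rate = coef[0] + coef[1] * gc_ref + coef[2] * gc_ref ** 2
    return autosome_rate, autosome_rate + coef[3]


# Toy, noise-free windows (made-up numbers, for demonstration only).
gc = np.linspace(0.30, 0.60, 50)
sex = (np.arange(50) % 5 == 0).astype(float)
rate = 0.010 + 0.020 * gc - 0.010 * gc ** 2 - 0.002 * sex
auto_rate, sex_rate = gc_adjusted_rates(gc, rate, sex)
```

The ratio `sex_rate / auto_rate` from such a fit is the quantity that would then be converted into an estimate of α.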
Overall, our evolutionary-based estimates of α are consistent with estimates from pedigree sequencing studies (Figure 2); in particular, the point estimates for species with the largest amount of available DNM data (e.g., humans, mice and cattle) are in very close agreement. This finding is not necessarily expected, as the evolutionary estimate is an average over many thousands of generations of evolution, whereas estimates from DNMs are based on small numbers of families at present. In principle, differences between the estimates could therefore arise if α evolves rapidly, or if the historical ratios of paternal-to-maternal generation time differ from those sampled in the pedigrees (Figure 2) [53].
Disagreement between the two estimates could also arise from mutation rate modifiers that differ between sex chromosomes and autosomes: in particular, the low evolutionary estimate of α compared to the DNM-based estimate in cats [54] could be due to unusual features of the X chromosome (as a hypothetical example, if the feline X chromosome is very late replicating relative to the autosomes). Given the many reasons for the two types of estimates to differ, the general concordance between them suggests that, with the possible exception of cats and dogs, the evolutionary approach is providing sensible estimates and the paternal bias in mutation is not rapidly evolving.
1.2 A paternal bias in mutation is widespread across amniotes
A paternal bias in mutation is seen across amniotes, with α ranging from 1 to 4 in the species considered (Figure 2). The estimates remain similar if we exclude hypermutable CpG sites (Figure S1B), or focus only on mutation types that are not subject to the effects of GC-biased gene conversion (gBGC) (Figure S1F and Figure S2). More generally, they are robustly above 1 for different choices of conservation filters (e.g., excluding all conserved regions, not just exons) and different substitution types (see Figure S1 for details). These results establish that the paternal bias in mutation is not a feature of long-lived humans or of mammals, but is instead ubiquitous across species that vary markedly in their gametogenesis, physiology and life history.
Since gBGC is induced by recombination and acts like selection for GC, and given the greater population recombination rate of autosomes relative to the sex chromosomes, we would expect the X-to-autosome substitution rate ratio of gBGC-favored mutation types (T>C and T>G) to be somewhat lower than that of mutation types unaffected by gBGC (C>G and T>A). Consistent with this expectation, estimates of α in mammals using only gBGC-favored mutation types were inflated relative to estimates from mutation types unaffected by gBGC (Figure S2). Also as expected, bird and snake species with ZW sex determination exhibit the opposite pattern (i.e., a deflated ratio of Z-to-autosome substitution rate leads to a decreased estimate of α; Figure S2). The behavior of the different mutation types therefore provides a further sanity check on our estimates.
Within mammals, the mean estimate of α is 2.7, with a range of 1.0 to 4.1 and a coefficient of variation of 0.29. In birds, the estimate of α is lower on average but also seemingly more stable, ranging from 1.5 to 2.7 (mean = 1.8, coefficient of variation = 0.19). In the handful of snake species sampled, the estimates are similar to those of birds, ranging from 1.3 to 2.2 (mean = 1.7, coefficient of variation = 0.23), in agreement with a previous evolutionary estimate for rattlesnake (α = 2.0; [55]).
In mammals, variation in α has long been known to be associated with generation times, and has been consistently interpreted as resulting from greater numbers of replication errors in species with longer-lived fathers (e.g., [4, 9, 23, 56]). We confirmed the observation here: after accounting for the phylogenetic relationship between species, mammals reproducing at older ages show a stronger paternal bias in mutation (p-value = 0.01, r² = 29%; Figure 3). Statistically significant relationships also exist between α and other life history traits (Figure S3), but these traits are strongly correlated with one another (Figure S4) and generation time is the strongest single predictor (Figure S3; see Testing relationships between α and life history traits in Methods). In contrast, a significant relationship between generation time and α is not seen in birds (p-value = 0.30, r² = 7%; Figure 3; [57]), despite similar numbers of species and a similar range of generation times to mammals. (Given the paucity of generation time and α estimates for snakes, we could not test the relationship in reptiles.) In light of more recent evidence that mutations depend on absolute time and not cell division rates, the standard explanation for this generation time effect no longer holds. These observations therefore raise the question of how else the relationship between generation times and α in mammals can be explained.
1.3 A cell-division-independent explanation for the correlation between α and generation time
In eutherian mammals, embryo development is likely independent of sex until primordial germ cell (PGC) specification and subsequent development of the gonads [58]. As a result, mutations arising during early embryogenesis (Early) are expected to occur at a similar rate in males and females (αEarly = 1), as has been inferred in the few pedigree studies in which DNMs during parental early embryogenesis are distinguished from mutations later in development, namely in humans [59], cattle [25] and mice [24] (Figure 4A). While sex differences in early development may exist [60], differences in male and female mutation rates at such an early stage are likely modest in mammals [61, 62]. At some point after sexual differentiation of the germline, however, (in what we term the Late stage) mutation rates in the two sexes need no longer be the same: sources and rates of DNA damage could differ between germ cells, as could the efficiency and accuracy of repair. Indeed, human fathers that recently reached puberty contribute over three times more mutations than similarly aged mothers [21]. Intriguingly, the magnitude of paternal bias for mutations that occurred long after sexual differentiation of the PGCs appears to be similar in mice, cattle and humans, at approximately 4:1 [24, 25, 59] (Figure 4A).
In light of these observations, we considered a simple model in which α in mammals is the outcome of two developmental stages with distinct ratios of paternal-to-maternal mutations. In the Early stage, prior to germline sex differentiation, we assumed a paternal-to-maternal mutation ratio of 1 and an expected number of mutations on par with what is observed in humans (i.e., 5 mutations per haploid genome; [59, 63, 64]) (Figure 4A). In the Late developmental stage after germline sex differentiation, which varies in length among species, we assumed a conserved ratio of paternal-to-maternal yearly mutation rates of 4, as suggested by DNM data [24, 25, 59] (Figure 4A and Table S3). This model yields a relationship between α and generation time bounded below by 1 and with a plateau at 4, assuming the same generation times in the two sexes (Figure 4B); more generally, the height of the plateau depends on the ratio of paternal-to-maternal generation times (Figure S5). The rapidity with which α reaches this asymptote is determined by the magnitude of the paternal mutation rate per year in the Late stage (Figure 4B). Most saliently, a positive relationship between α and generation time is expected as long as αEarly < αLate.
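The behavior of this two-stage model can be sketched in a few lines. The function below is a minimal illustration, not the implementation used for Figure 4B: it takes 5 Early mutations per haploid genome and a Late paternal-to-maternal rate ratio of 4 from the text, while the maternal Late rate of 0.4 mutations per year is borrowed from the human maternal age effect cited earlier and serves only as an illustrative default. All names are hypothetical.

```python
def alpha_two_stage(late_years_paternal, late_years_maternal=None,
                    early_mutations=5.0, maternal_late_rate=0.4,
                    late_ratio=4.0):
    """Paternal-to-maternal mutation ratio under a two-stage model.

    late_years_*        -- time spent in the Late stage by each sex (years)
    early_mutations     -- Early mutations per haploid genome, same in both sexes
    maternal_late_rate  -- maternal Late mutations per year (assumed value)
    late_ratio          -- paternal-to-maternal yearly rate ratio in Late
    """
    if late_years_maternal is None:
        # Default to equal generation times in the two sexes.
        late_years_maternal = late_years_paternal
    paternal = early_mutations + late_ratio * maternal_late_rate * late_years_paternal
    maternal = early_mutations + maternal_late_rate * late_years_maternal
    return paternal / maternal


# alpha rises from 1 toward late_ratio as the Late stage lengthens.
curve = [alpha_two_stage(years) for years in (0, 1, 5, 10, 30, 100)]
```

With no time spent in the Late stage the function returns 1, and with equal generation times it approaches `late_ratio` as the Late stage grows long. Raising `maternal_late_rate` (and with it the paternal rate) makes the curve approach the plateau faster, and unequal Late durations rescale the plateau by their ratio.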
Using this model, we then predicted α for the terminal branches in the mammalian tree. To estimate the number of mutations occurring in Late for each branch, we used the evolutionary rates in Figure 2A. Specifically, we calculated a sex-averaged substitution rate per generation by multiplying the autosomal yearly substitution rate in each branch by a generation time estimate for its tip (Table S2). Given a fixed ratio of paternal-to-maternal mutation rates of 4 in the Late stage, the substitution rate for each sex can be calculated for any given ratio of paternal-to-maternal generation times (Modeling the effects of germline developmental stages on α in Methods). From these quantities, we obtain an estimate of α that we can use to predict the evolutionary estimates (Modeling the effects of germline developmental stages on α in Methods). When we do so, we explain a significant proportion of the variance (r² = 37%) in the evolutionary estimates of α in mammals (p-value = 0.005; Figure 4C), or 42% of the variance after taking into account sampling error in our estimates (see Modeling the effects of germline developmental stages on α in Methods). Moreover, this remains true regardless of the precise number of Early mutations assumed (see Modeling the effects of germline developmental stages on α in Methods). The two clear outliers are carnivores, for which the evolutionary estimate may be an underestimate, given the higher estimate from DNMs in cats (Figure 2).
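The calculation described above can be sketched as follows, under assumptions that are ours rather than taken from Methods: the per-generation rate is converted to mutations per haploid genome using a nominal genome size of 3 × 10^9 bp, the sex-averaged count is the mean of the paternal and maternal counts, and the ratio of Late-stage durations equals the ratio of generation times. Function and parameter names are hypothetical.

```python
def predict_alpha(yearly_rate_per_bp, generation_time_years,
                  gen_time_ratio=1.0, early_mutations=5.0,
                  late_ratio=4.0, genome_size_bp=3.0e9):
    """Predict alpha for a branch from its yearly autosomal substitution rate.

    gen_time_ratio  -- paternal-to-maternal generation time ratio (assumed to
                       equal the ratio of time spent in the Late stage)
    genome_size_bp  -- nominal haploid genome size (assumed value)
    """
    # Sex-averaged mutations per haploid genome per generation.
    per_generation = yearly_rate_per_bp * generation_time_years * genome_size_bp
    # Sex-averaged Late mutations; clamp at zero if Early alone exceeds the total.
    late_mean = max(per_generation - early_mutations, 0.0)
    # Solve late_mean = (late_p + late_m) / 2 with late_p = k * late_m.
    k = late_ratio * gen_time_ratio
    late_maternal = 2.0 * late_mean / (1.0 + k)
    late_paternal = k * late_maternal
    return (early_mutations + late_paternal) / (early_mutations + late_maternal)


# Illustrative inputs only: 0.4e-9 substitutions per bp per year, 30-year generations.
example = predict_alpha(0.4e-9, 30.0)
```

Because the Early count is held fixed, branches with more mutations per generation (longer generation times at a given yearly rate) are predicted to have a larger α, which is the pattern tested in Figure 4C.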
These predictions rely on evolutionary estimates that are uncertain, due for instance to inaccuracies in split time estimates and the use of contemporary generation times as proxies for past ones. If we instead predict α using parameters derived from pedigree data in the nine mammalian species for which more than one trio has been studied (Modeling the effects of germline developmental stages on α in Methods), the model explains 82% of the variance in the DNM-based estimates of α (p-value = 0.001; Figure 4C). We note that this assessment is based on few phylogenetically-independent contrasts, and so while the fit is statistically significant, the high variance explained may be somewhat deceiving.
In any case, this phenomenological model clarifies that the increased α seen in long-lived mammals may simply result from a reduction in the fraction of early embryonic mutations relative to the total number of mutations per generation, consistent with the higher proportion of Early mutations in mice and cattle compared to humans (Figure 4A). In addition, the model helps to explain the only modest increase in α with parental ages observed in humans [21].
Given this explanation for an effect of generation time on α in mammals, how then to interpret the absence of such an effect in birds? One possibility is that sex differences in mutation rates arise earlier in development: unlike in mammals, the avian sexual phenotype is directly determined by the sex chromosome content of individual cells [65, 66] and PGCs are determined by inheritance of maternally derived gene products (preformation) [67]. These features of germ cell development raise the possibility that sex differences in mutation rates could appear earlier in ontogenesis in birds than in mammals, consistent with reported sex differences in the cellular phenotypes of PGCs prior to gonad development [68]. If the developmental window when both sexes have a similar mutation rate is indeed small, then assuming that the ratio of paternal-to-maternal mutation rate is roughly constant across parental ages, generation times should have no influence on α. Alternatively, the lack of an apparent generation time effect on α in birds may arise simply because the ratio of paternal-to-maternal age effects in Late is lower in birds than in mammals (e.g., 2 instead of 4). In this scenario, bird generation times would influence α within a narrower range (e.g., between 1 and 2), and our power to detect the relationship between generation time and α may be reduced.
1.4 Outlook
Analyzing diverse species with the same pipeline, we found that, far from being a feature of species with long-lived males, a paternal bias in germline mutation is ubiquitous across amniotes that differ markedly in their life history, physiology and gametogenesis. Moreover, by considering the different developmental stages over which germline mutations arise, we provide a new and simple explanation for variation in the degree of sex bias across mammals that does not require dependence on the number of cell divisions. While our findings do not account for why male germ cells might accumulate more mutations than female ones, the observation that paternal bias varies little across species exposed to disparate physical environments, and presumably exogenous mutagens, hints at sex differences in endogenous sources of DNA damage or repair (e.g., [69]). Another question raised by our findings is why, after sexual differentiation of the germline, mutation appears to be more paternally-biased in mammals (∼4:1) than in birds and snakes (∼2:1).
More generally, our results recast long-standing questions about the source of sex bias in germline mutations as part of a larger puzzle about why certain cell types (here, spermatogonia versus oocytes) accrue more mutations than others. Intriguingly, the relative mutagenicity of different tissues appears to be conserved across species: for instance, in mammals, the balance of damage and repair results in an approximately four-fold increase in mutation rates per unit of time in spermatogonia compared to oocytes (Figure 4A). Similarly, comparing mutation rates in colonic crypts [37] to estimates for spermatogonia, the ratio of crypt-to-sperm mutation rate appears relatively stable across four mammalian species (Figure S6). These observations may point to a role of stabilizing selection in maintaining the relative rates at which mutations accumulate in different tissues over evolutionary timescales.
3 Data availability
Scripts for reproducing the analyses and figures may be found at https://github.com/flw88/mut_sex_bias_amniotes.
Acknowledgements
We thank Ziyue Gao and Guy Sella for comments on an earlier version of the manuscript. We thank Rusty Lansford, Mike McGrew, and Daniel Hooper for discussions about avian development and evolution; Turk Rhen for discussions about reptile sex determination; Carla Hoge and Zach Fuller for sharing their corn snake genome assembly; Anne Bronikowski and the Vertebrate Genome Project for sponsoring and generating the Thamnophis elegans assembly; Alex Cagan for early access to data of mutation burdens in colonic crypts across mammals; Carole Charlier and Michel Georges for sharing data on de novo mutation in cattle; and Peter Andolfatto, Michael B. Eisen, Priya Moorjani, as well as William R. Milligan, Anna Yoney and other members of the Andolfatto, Przeworski, and Sella labs for helpful discussions. This work was funded by GM122975 to MP and a HFSP postdoctoral fellowship to MdM.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].
- [93].
- [94].
- [95].
- [96].
- [97].
- [98].
- [99].
- [100].
- [101].
- [102].
- [103].