Abstract
Hybrid zones provide a window into the evolutionary processes governing species divergence. While the role of postzygotic isolation has been extensively characterized in the context of hybrid zones, the contribution of prezygotic isolation is less well explored. In particular, the effect of mate choice mediated by preference learning such as self-recognition or imprinting remains largely elusive. Here, we present model-based simulations investigating the influence of the preference function, the genetic architecture of the mating trait and the role of sexual selection on spatio-temporal hybrid zone dynamics. The model is parameterized with empirical data from the hybrid zone between all-black carrion and grey-coated hooded crows allowing qualitative and quantitative inference in a natural setting. The best-fit model resulted in narrow clines for the mating trait loci coding for color polymorphism maintained by a moderate degree of assortative mating. Epistasis induced hybrid zone movement in favor of dark alleles followed by a shift in the opposite direction ∼1,200 generations after secondary contact. Unlinked neutral loci diffused near-unimpeded across the zone. This study demonstrates that assortative mating can explain steep transitions in mating trait loci without generalizing to genome-wide reproductive isolation. It further emphasizes the importance of mating trait architecture for spatio-temporal hybrid zone dynamics.
1 Introduction
In sexually reproducing organisms, speciation requires the emergence of reproductive isolation (Coyne and Orr, 2004; Dobzhansky, 1937; Mayr, 1942). Isolating barriers can take the form of any mechanism that reduces the probability of heterospecific mate choice, fertilization success or hybrid fitness (Coyne and Orr, 2004; De Queiroz, 2007). During the first steps of population divergence, a limited number of loci subject to disruptive or divergent selection can lead to incomplete reproductive isolation (Wu, 2001). Under conditions of gene flow, effective migration will consequently be reduced around these loci (now functioning as “barrier loci”), leaving elevated patterns of genetic differentiation (Ravinet et al., 2017; Wolf and Ellegren, 2017). Depending on the relative magnitude of the total selection coefficient and recombination, the localized effect around barrier loci may, or may not, generalize and reduce gene flow across the entire genome (coupling) (Barton, 1983; Bierne et al., 2011; Feder et al., 2014). Understanding how traits function as isolating barriers and how they are encoded in the genomes of diverging populations is a crucial step to fully appreciate the mechanisms spawning global biodiversity (Dobzhansky, 1937; Presgraves, 2010; Shaw and Mullen, 2011; Wolf et al., 2010).
The above mentioned processes do not take place in a void, but unfold in populations subject to the governance of space and time (Abbott et al., 2013). During periods of geographic isolation, reproductive barriers most readily evolve as a by-product of divergence and are put to test when populations come into secondary contact (Barton and Hewitt, 1985). When reproductive isolation is complete, sister species can coexist and potentially overlap in their ranges, provided there is little competition. Incomplete isolation will result in hybrid zones (Bigelow, 1965; Harrison, 1990; Hewitt, 1988; Rieseberg et al., 1999). These zones act as semipermeable barriers retaining loci under divergent selection, while unlinked neutral variants can still introgress (Key, 1968; Payseur, 2010). When the homogenizing effect of dispersal into the zone is offset by selection against hybrids, the resulting hybrid zone can reach relative stability in the form of sigmoid clines (“tension zone model”) (Barton and Hewitt, 1985; McEntee et al., 2018). As a result, unique clines are expected for different traits and the underlying genes, unless coupling elicits a genome-wide response (Barton, 1983; Flaxman et al., 2014; Feder et al., 2014). Cline width and position will be influenced by the strength of selection, dispersal distances and the genetic architecture of the trait subject to selection (Endler, 1977; May et al., 1975). Importantly, hybrid zones need not be static. Asymmetries in relative fitness between parental populations or a dominant genetic architecture of the loci contributing to divergence may affect cline centers and induce hybrid zone movement until they are caught in a trough of low population density, often aligning with physical or ecological barriers (Brodin et al., 2013; Mallet, 1986; Secondi et al., 2006).
Mathematical models and computer simulations within the framework of cline analyses have led to insights on the effects of selection strength (Barton and Hewitt, 1985), the underlying nature of selection (exogenous vs. endogenous) (Kruuk et al., 1999), the genetic architecture of the contributing loci (Bürger, 2017; Mallet and Barton, 1989; Mallet, 1986) and their complex interactions (Barton, 1983; Barton and Shpak, 2000; Kruuk et al., 1999). Most models, however, rely on the assumption of random mating, and few have explored the role of isolation arising from mate choice alone. Given the importance attributed to premating isolation, in particular during the early stages of speciation (Coyne and Orr, 1997; Stelkens et al., 2010; Mayr, 1963), this gap deserves attention. It is undisputed that sexual selection can drive rapid evolution of mating cues and preferences promoting speciation, albeit most readily in conjunction with ecological factors (Ritchie, 2007; Servedio and Boughman, 2017) in the form of sensory drive (Seehausen et al., 2008) or magic traits (Servedio et al., 2011). Yet, one of the major challenges for speciation to occur by sexual selection is recombination breaking down the association between mating cues and corresponding preferences (Felsenstein, 1981). This limitation can be overcome in two ways: physical linkage of the loci underlying genetically determined trait and preference (Merrill et al., 2019; Xu and Shaw, 2019), or preference learning of a genetically encoded mating trait (we do not consider learned traits here). With preference learning, frequencies of preference and trait become correlated, resulting in positive frequency-dependent selection. With some degree of incipient divergence (secondary sympatry, geographic isolation) the correlation between trait and preference essentially reduces to a ‘one allele model’ with the potential to accelerate speciation (Felsenstein, 1981; Servedio, 2016).
Indeed, learning of sexually selected traits has been shown to promote phenotypic divergence and maintain polymorphism in the face of gene flow (Yang et al., 2019; Brodin and Haas, 2006; Verzijden et al., 2012). In hybrid zones, simulations suggest that assortative mating can maintain diverged mating traits, but contributes little in impeding genome-wide gene flow of unlinked genetic variation (Brodin and Haas, 2009; Irwin, 2020). Empirical work in hybrid zones regularly identifies sexually selected mating traits and preferences as an important component of reproductive isolation (Yang et al., 2019; Schumer et al., 2017; Seehausen et al., 2008; Hench et al., 2019). Yet, only few studies have rigorously established the link between phenotypic divergence in mating traits, their genetic basis and preference learning.
In this study, we investigate the dynamics of a hybrid zone governed by assortative mating using a theoretical model informed by empirical data. The study is inspired by the well-studied hybrid zone between all-black carrion and grey-coated hooded crows (Corvus (corone) corone and C. (c.) cornix, respectively) that presumably arose by secondary contact in the early Holocene approximately 12,000 years (or 2,000 crow generations) ago (Mayr, 1942; Meise, 1928; Parkin et al., 2003; Vijay et al., 2016). In this system, there is only limited evidence for natural selection against hybrids (Saino, 1990; Saino and Bolzern, 1992; Saino and Villa, 1992), but multiple support for plumage based assortative mating (Meise, 1928; Randler, 2007a) and social marginalization of minority phenotypes (Saino and Scatizzi, 1991; Londei, 2013). The narrow morphological cline between the two taxa is thus believed to be mainly driven by prezygotic isolation mediated by assortative mating based on cues encoded in plumage pigmentation patterns (cf. Brodin et al. (2013); Kryukov and Blinov (1994); Meise (1928); Vijay et al. (2016)). While the mechanism underlying assortative mate choice remains elusive (imprinting, genetic coupling, self referent phenotype matching), the genetic basis of the presumed mating trait is known. In the European zone of contact, plumage pigmentation patterns are encoded by two epistatically interacting pigmentation genes (Fig. 1) that are subject to divergent selection. The rest of the genome, however, appears to introgress without much resistance (Knief et al., 2019; Poelstra et al., 2014, 2015; Vijay et al., 2016). An ecological contribution to the European hybrid zone dynamics exceeding local effects can almost certainly be excluded (Haas et al., 2010; Randler, 2007b; Rolando and Laiolo, 1994; Saino et al., 1998).
We formulated a theoretical model inspired and partly parameterized by empirical data in the crow system. This combination of theory and empirical data accomplishes two goals. First, by fitting the model to genotypic data we obtained quantitative estimates of the parameters characterizing the mate choice function. Contingent on the assumptions in the model, the study thus provides valuable, and to date rare, information on the strength of assortative mating in a natural system. Second, conditional on the basic properties of the empirical system we explored a set of variants of the main model modifying
the preference function underlying mate choice,
the genetic architecture of the mating trait and
the fitness effect of mate choice.
This exercise provides insight into the processes governing spatial and temporal hybrid zone dynamics. Furthermore, by considering reproductive isolation not only for barrier loci associated with the mating trait, but also for freely recombining neutral loci, the study sheds light on the consequences of preference learning for genome-wide divergence, and eventually speciation.
2 Material and methods
In the following, we will first detail the empirical data and then specify the assumptions and setup of the model.
2.1 Empirical data
2.1.1 Phenotypic variation
Knief et al. (2019) sampled crow individuals across the German and Italian part of the European crow hybrid zone (hereafter “central” and “southern” transects, respectively). Transects were chosen such that they were perpendicular to the geographical course of the zone and included phenotypically pure carrion and hooded crow populations (Corvus (corone) corone and C. (c.) cornix) at either end, as well as hybrids within the hybrid zone. In the southern transect, individuals were sampled as fully feathered adults allowing characterization of individual phenotypes. For a total of 129 individuals, 11 plumage patches were scored for the amount of eumelanin deposited in the feathers (coded as 0, 1, 2 with increasing blackness) and subsequently combined using principal component analysis. Principal component 1 (PC1) explained 78.22 % of the phenotypic variance with positive values reflecting a black carrion crow phenotype, negative values a grey hooded crow phenotype and intermediate values being associated with phenotypic variation in hybrids (Knief et al., 2019). In this study, we used these PC1 scores as the metric for a one-dimensional representation of phenotypic variation.
2.1.2 The genetic basis of phenotypic variation
Using genome wide association mapping, Knief et al. (2019) identified two major effect loci explaining 87.91 % of variation in plumage patterns represented by PC1. Most of the variation in PC1 (up to 76.07 %) was explained by a locus on chromosome 18 (chr18) embedded in a region of reduced recombination. For the purpose of this study, we represent allelic variation at this locus with capital letters as D = dark and L = light. The second locus, which is associated with allelic variation of the NDP gene on chromosome 1, will be denoted in small letters with alleles d = dark and l = light. Knief et al. (2019) demonstrated that phenotypic variation was best explained by recessive epistasis between chr18 and NDP explaining an additional 10.87 % of phenotypic variation captured by PC1: Allelic variation in NDP had no phenotypic effect in all-black chr18DD individuals, but accounted for most of the residual variation in chr18DL and chr18LL (Fig. 1 and online supplement Fig. S1). Under this model of epistasis, the possible nine genotypic combinations (DDdd, DDdl, DDll, DLdd, DLdl, DLll, LLdd, LLdl, LLll) thus reduce to seven phenotypic states, as DDdd, DDdl and DDll all code for the same black carrion crow phenotype. Fig. 1 illustrates the phenotypic space using representative individuals with the respective genotypic constitution.
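The reduction from nine genotypes to seven phenotypic states under recessive epistasis can be checked by enumeration. The following is a minimal sketch; the function name `phenotype_class` and the label `DD--` for the masked carrion class are illustrative choices and not part of the original analysis.

```python
from itertools import product

CHR18 = ("DD", "DL", "LL")  # genotypes at the chromosome 18 locus
NDP = ("dd", "dl", "ll")    # genotypes at the NDP locus

def phenotype_class(chr18: str, ndp: str) -> str:
    """Label the phenotypic state of a two-locus genotype.

    Under recessive epistasis, NDP variation has no phenotypic effect
    in chr18 DD individuals, so all DD genotypes share one class.
    """
    if chr18 == "DD":
        return "DD--"  # hypothetical label for the all-black class
    return chr18 + ndp

genotypes = [a + b for a, b in product(CHR18, NDP)]
classes = sorted({phenotype_class(a, b) for a, b in product(CHR18, NDP)})

print(len(genotypes), "genotypes map to", len(classes), "phenotypic states")
```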
To explore the effect of genetic architecture on hybrid zone dynamics in the model (see below), we will consider both the epistatic genetic architecture described above, and phenotypes from an additive model (see online supplement Fig. S1).
2.1.3 Sample preparation and genotyping
To quantify assortative mate choice and predict its effect on the genotypic constitution across the hybrid zone, one would ideally sample breeding pairs along a transect. This is unfortunately not possible at a larger scale in crows. We therefore used unpaired, randomly sampled adults to characterize phenotypic variation in the southern transect (see 2.1.1) and established its genetic basis (see 2.1.2). Individuals from the central part of the hybrid zone, in contrast, were sampled as nestlings at an age at which plumage characteristics cannot be inferred with certainty (Blotzheim et al., 1993); see online supplement Fig. S2. However, due to their relatedness, nestling genotypes contain information on the genotypic constitution of their parents and hence on phenotype-dependent mate choice. We therefore used the genotypic distribution of 152 nestlings sampled from 55 nests (median brood size = 3 nestlings) along the transect (Knief et al., 2019) as the readout for model fitting (see 2.2.6). This procedure rests on the assumption that the genetic architecture of plumage patterning is equivalent in both transects. This is well supported by a shared evolutionary history of the parental populations at either side of the hybrid zone (Vijay et al., 2016), congruent landscapes of genetic variation across the central and southern part of the hybrid zone (Vijay et al., 2016) and near-identical selection signatures at the underlying loci (Knief et al., 2019).
We genotyped all individuals from the central transect for a selection of 1,152 SNPs spread across the whole genome using the GoldenGate assay (Illumina). A detailed description of the assay design, sample preparation, SNP calling and quality control procedure is given in Knief et al. (2019). The final data set comprised all 152 individuals genotyped at 1,111 polymorphic loci (average call rate of 99.48%). We further included 65 individuals of the allopatric populations in the GoldenGate genotyping and added 10 hooded crows that had been sequenced on the HiSeq2000 (Illumina) platform (paired-end libraries; sequence coverage ranged from 7.12× to 13.28×, average = 9.77×, median = 9.83×) and genotyped using the HaplotypeCaller in GATK (v3.3.0; DePristo et al., 2011; Vijay et al., 2016). All individuals were sexed based on their heterozygosity for 114 SNPs located on the sex chromosome Z (excluding the pseudo-autosomal region located at chrZ ≤ 2.56 Mb, N = 15 SNPs). Genotypes of the pigmentation loci (see 2.1.2) were inferred as follows. The genetic factor on chr18 was represented by ancestry informative diplotypes spanning the 2.8 Mb region on chromosome 18. Ancestry (pure carrion chr18DD; pure hooded chr18LL; mixed chr18DL) was inferred using the NewHybrids software (v2.0+ Developmental. July/August 2007; Anderson and Thompson, 2002) as described in Knief et al. (2019). Individuals that were assigned as backcrosses with evidence for recombination were excluded (N = 12 individuals in the hybrid zone and N = 11 allopatric individuals). To represent genotypic variation in NDP we chose the SNP on chromosome 1 at 6,195,380 Mb (genome version 2.5, RefSeq Assembly ID according to the National Center for Biotechnology Information: GCF_000738735.1; Poelstra et al., 2014) that explained most of the variation in plumage coloration (Knief et al., 2019).
2.1.4 Paternity assignment
Our model assumes that individuals from the same nest are full-siblings. Thus, we estimated whether nest mates were full- or half-siblings originating from extra-pair copulations using 752 SNPs that were located on all autosomes except chromosome 18 (for details see Knief et al., 2020). Full- and half-sibling pairs could reliably be separated based on their kinship coefficients (θ, Weir et al., 2006). For one nest containing three individuals, with two full-siblings (θ = 0.23 and θ = 0.26) and one half-sibling (θ = 0.12), relationships could not unambiguously be resolved. We removed this nest (N = 3 individuals) and another six extra-pair young, leaving 52 nests containing a total of 131 individuals in the hybrid zone and 64 allopatric individuals for subsequent analyses.
2.2 Theoretical model
Here we specify our main model for the hybrid zone of carrion crows and hooded crows. The model parameters were then fit to data from the crow system (sections 2.2.4 and 2.2.6). For an overview of the different components of our models and the data sets to which we fit them, see online supplement Fig. S2. In section 2.3 we will specify three variants of our model, which differ from the main model in their assumptions on mating preferences (2.3.1), the genetic architecture of the mating trait (2.3.2) or the fitness costs associated with mate choice (2.3.3). The source code of the programs that we developed to carry out simulation studies and to fit all four model variants to the data is available from https://github.com/statgenlmu/assortative_crows/.
2.2.1 Temporal dimension
Similar to other European taxa, Eurasian carrion and hooded crows were presumably separated into refugia during the last ice age (Mayr, 1942; Hewitt, 1988), which peaked in glacial coverage around 20,000-18,000 years ago (Hewitt, 1988; Frenzel, 1992). Despite a subsequent period of increasing temperature, it was likely only after the intermission of a cooling phase in the Younger Dryas 12,800 to 11,500 years ago (Goslar et al., 2000; Rasmussen et al., 2014) that rapid warming opened up suitable crow habitat including breeding opportunities in larger shrubs or trees (Giesecke et al., 2017). We therefore set the initiation of secondary contact to approximately 12,000 years before present. Using a generation time of 6 years for the Eurasian crow (Kutschera et al., 2020; Vijay et al., 2016), this corresponds to 2,000 generations.
2.2.2 Spatial dimension
In our model, we conceptualize space as a one-dimensional grid of 200 bins. When we fit the model to the crow data, we assume that these bins represent 5 km wide strips that are parallel to the initial contact line (ICL) of the two color morphs, such that our model covers a range from 500 km to the west to 500 km to the east of the ICL. More precisely, we assume that bins 100 and 101 represent the areas that are directly adjacent to the ICL, up to 5 km to the west and east, respectively; bins 99 and 102 are the areas that are 5 to 10 km west or east of the ICL, etc. Generally, bin x for any x ∈ {1, 2, …, 200} represents the area that has a distance between |x − 100| · 5 km and |x − 101| · 5 km to the ICL and is west of the ICL if x ≤ 100 or to the east for x ≥ 101.
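The bin geometry can be written as a small helper, shown here as a sketch. The function name `bin_extent` and its return convention are our own choices for illustration.

```python
BIN_WIDTH_KM = 5
N_BINS = 200

def bin_extent(x: int):
    """Return (side, near_km, far_km) for bin x in 1..200.

    near_km and far_km are the smallest and largest distance of the
    strip to the initial contact line (ICL), following the rule
    |x - 100| * 5 km and |x - 101| * 5 km.
    """
    if not 1 <= x <= N_BINS:
        raise ValueError("bin index must lie in 1..200")
    a = abs(x - 100) * BIN_WIDTH_KM
    b = abs(x - 101) * BIN_WIDTH_KM
    side = "west" if x <= 100 else "east"
    return side, min(a, b), max(a, b)
```

For example, `bin_extent(99)` describes the strip 5 to 10 km west of the ICL, and `bin_extent(200)` the easternmost strip of the grid.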
2.2.3 Modeling approach for assortative mating
We assume that each bin is populated by the same number of crows and that this number is so large that we can neglect genetic drift and any other source of random fluctuations in genotype frequencies, which means that our model is deterministic. For each genotype $g$ with respect to the two mating trait loci, let $f_{g,x}(s) \in [0, 1]$ be the frequency of $g$ in bin $x$ at time $s$. We assume discrete time steps $s = 0, 1, 2, \dots$ corresponding to non-overlapping generations, with time $s = 0$ referring to the time of the secondary contact of carrion crows and hooded crows after the ice age. We assume that at time 0 all crows in bins 1 to 100 were carrion crows and all crows in bins 101 to 200 were hooded crows. For $x \ge 101$ this implies $f_{g,x}(0) = 1$ if $g = \mathrm{LLll}$, and $f_{g,x}(0) = 0$ otherwise. For $x \le 100$ we obtain $f_{g,x}(0) = 0$ for $g \notin \{\mathrm{DDdd}, \mathrm{DDdl}, \mathrm{DDll}\}$, and assume Hardy-Weinberg frequencies $(f_{\mathrm{DDdd},x}(0), f_{\mathrm{DDdl},x}(0), f_{\mathrm{DDll},x}(0)) = (\delta^2,\ 2 \delta (1 - \delta),\ (1 - \delta)^2)$, where $\delta$ is the initial frequency of allele d in bin $x$, which we assume to be equal in all bins $x \le 100$ at time 0.
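The initial condition can be sketched in a few lines. The data layout (a list of per-bin dictionaries) and the function name `initial_frequencies` are assumptions made for illustration only.

```python
GENOTYPES = ["DDdd", "DDdl", "DDll",
             "DLdd", "DLdl", "DLll",
             "LLdd", "LLdl", "LLll"]

def initial_frequencies(delta: float, n_bins: int = 200):
    """Genotype frequencies at s = 0; entry i of the result is bin i + 1.

    Western bins (first half) hold carrion crows with Hardy-Weinberg
    proportions at the NDP locus; eastern bins hold only LLll.
    """
    west = {g: 0.0 for g in GENOTYPES}
    west["DDdd"] = delta ** 2
    west["DDdl"] = 2 * delta * (1 - delta)
    west["DDll"] = (1 - delta) ** 2
    east = {g: 0.0 for g in GENOTYPES}
    east["LLll"] = 1.0
    half = n_bins // 2
    return [dict(west) for _ in range(half)] + \
           [dict(east) for _ in range(n_bins - half)]
```

Calling `initial_frequencies(0.777)` uses the maximum likelihood estimate of δ reported for the main model.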
The core of our model is the recursion to compute the generation $s + 1$ genotype distribution matrix $f(s + 1) = (f_{g,x}(s + 1))_{g,x}$ from the genotype distribution matrix $f(s)$. Implicitly we assume that the sex ratio is the same for all bins and all genotypes. Following a common mass-action law approach (see e.g. Otto and Day, 2011) we assume that the number of matings between individuals with genotypes $g_1$ and $g_2$ taking place in bin $x$ is proportional to the number of potential couples in bin $x$, weighted by a mate choice value $w_{g_1,g_2}$. The latter can be interpreted as the probability that a male and a female crow with genotypes $g_1$ and $g_2$ will mate if they meet (see 2.2.5 below). Thus, if $d_{z,x}$ is the dispersal probability from bin $z$ to bin $x$ (section 2.2.4), the number of matings between individuals of genotypes $g_1$ and $g_2$ taking place in bin $x$ is proportional to
$$w_{g_1,g_2} \cdot \sum_{y} \sum_{z} d_{y,x} \, f_{g_1,y}(s) \cdot d_{z,x} \, f_{g_2,z}(s),$$
where the summation variables $y$ and $z$ iterate over all combinations of bins from $x - 10$ (or 1 if $x - 10 \le 0$) to $x + 10$ (or 200 if $x + 10 > 200$), where the range of ±10 bins reflects a maximum dispersal distance of 50 km (section 2.2.4). This proportionality could arise, for example, if within each bin randomly moving individuals form a pair in the beginning of the mating season and occupy one of a limited number of territories to reproduce. Individuals that need to search longer for suitable mating partners have smaller chances of finding a vacant spot for their nest. In accordance with this heuristic consideration, our model implies that individuals that are surrounded by fewer preferred mating partners will on average have fewer offspring. Thus, assortative mating induces a reduction of fitness via frequency-dependent sexual selection.
To finally obtain the recursion to compute $f(s + 1)$, let $\mu(g \mid g_1, g_2)$ be, for each triplet $(g, g_1, g_2)$ of genotypes, the probability that an offspring of parents with genotypes $g_1$ and $g_2$ has genotype $g$, according to Mendelian genetics with free recombination between the two loci. The frequency of genotype $g$ in nestlings of generation $s + 1$ in bin $x$ is then
$$f_{g,x}(s + 1) = c \cdot \sum_{g_1, g_2} \mu(g \mid g_1, g_2) \cdot w_{g_1,g_2} \cdot \sum_{y} \sum_{z} d_{y,x} \, f_{g_1,y}(s) \cdot d_{z,x} \, f_{g_2,z}(s),$$
where the constant $c$ ensures that $\sum_g f_{g,x}(s + 1) = 1$.
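One generation of this recursion can be sketched as follows. This is our reading of the verbal description (matings in a bin proportional to the mate choice value times the product of the dispersed genotype frequencies, offspring drawn by Mendelian inheritance with free recombination, then normalization per bin) and not the authors' implementation. The genotype encoding as allele counts, the helper `toy_dispersal` and any mate choice function passed as `w` are illustrative assumptions.

```python
from itertools import product

# Genotype = (copies of D at chr18, copies of d at NDP), each 0, 1 or 2.
GENOTYPES = list(product(range(3), range(3)))

def locus_probs(k1, k2):
    """Offspring allele-count distribution at one locus (Mendelian)."""
    p1, p2 = k1 / 2, k2 / 2  # probability that each parent transmits the allele
    return [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2]

def mu(g, g1, g2):
    """P(offspring = g | parents g1, g2); unlinked loci are independent."""
    return locus_probs(g1[0], g2[0])[g[0]] * locus_probs(g1[1], g2[1])[g[1]]

def step(f, d, w):
    """Advance genotype frequencies by one generation.

    f: list over bins of dicts genotype -> frequency
    d: d[y][x] = dispersal probability from bin y to bin x
    w: function (g1, g2) -> mate choice value
    """
    n = len(f)
    new_f = []
    for x in range(n):
        # genotype frequencies among individuals that dispersed into bin x
        m = {g: sum(d[y][x] * f[y][g] for y in range(n)) for g in GENOTYPES}
        raw = {g: 0.0 for g in GENOTYPES}
        for g1, g2 in product(GENOTYPES, repeat=2):
            matings = w(g1, g2) * m[g1] * m[g2]
            if matings == 0.0:
                continue
            for g in GENOTYPES:
                raw[g] += mu(g, g1, g2) * matings
        total = sum(raw.values())  # plays the role of the normalizing constant
        new_f.append({g: v / total for g, v in raw.items()})
    return new_f

def toy_dispersal(n):
    """Hypothetical nearest-neighbour kernel, for demonstration only."""
    d = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for shift, p in ((-1, 0.25), (0, 0.5), (1, 0.25)):
            x = min(max(y + shift, 0), n - 1)  # mass leaving the grid stays at the edge
            d[y][x] += p
    return d
```

Because the double sum over source bins factorizes, the sketch computes the dispersed frequencies once per bin and multiplies them, which avoids iterating over all pairs of source bins.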
2.2.4 Dispersal
Natal dispersal distance was assumed to be equal for both sexes and parameterized with data from Siefke (1994, Table 3 and table caption), who analyzed data of hooded and carrion crow individuals that were ringed as nestlings and recovered during the breeding period. Kalchreuter (1970) investigated ring recovery data of carrion crows and obtained a similar distribution of dispersal distances. To use the dispersal distances quantified by Siefke (1994) on a two-dimensional map we first needed to project these distances to our one-dimensional representation of the contact zone. For this we assume that the ICL is a straight line going in south-north direction and decompose each dispersal movement of a crow into its component $D_x$ orthogonal to the ICL and its component $D_y$ parallel to the ICL. The values of $D_x$ or $D_y$ are positive if the direction of movement is west to east or south to north, respectively, and negative otherwise. We assume that the random distribution of the dispersal vector $(D_x, D_y)$ is a mixture of two spherical two-dimensional normal distributions. A single spherical normal distribution did not lead to a satisfying fit to the long-tailed distribution of the dispersal distance data (online supplement Fig. S3 top). Thus, the dispersal vector can be represented as $(D_x, D_y) = S \cdot (Z_x, Z_y)$, where $S$ is a random variable with two possible values $\sigma_1$ and $\sigma_2$ and the vector $(Z_x, Z_y)$ has a two-dimensional standard normal distribution. To fit the three parameters $\sigma_1$, $\sigma_2$ and $p = \Pr(S = \sigma_1)$ to the empirical dispersal distances (Siefke, 1994) we numerically maximized the likelihood of $(\sigma_1, \sigma_2, p)$ using the method of Byrd et al. (1995) as implemented in the R command optim with the option L-BFGS-B (R Core Team, 2018). To compute likelihoods we used that with probability $p$ the squared rescaled dispersal distance $\|(D_x, D_y)/\sigma_1\|^2$, or otherwise $\|(D_x, D_y)/\sigma_2\|^2$, is equal to $\|(Z_x, Z_y)\|^2$ and thus chi-square distributed with 2 degrees of freedom. The fitted values are and .
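The likelihood used in this fit can be illustrated for exact (unbinned) distances. If the squared rescaled distance is chi-square distributed with 2 degrees of freedom, the distance itself follows a Rayleigh distribution with scale σ, so the mixture density is a weighted sum of two Rayleigh densities. The sketch below assumes individually observed distances in km; the original analysis worked from the tabulated data in Siefke (1994) and R's optim, which may require a different treatment. Function names are illustrative.

```python
import math

def rayleigh_pdf(r: float, sigma: float) -> float:
    """Density of the distance r = sigma * ||(Zx, Zy)|| for r >= 0."""
    return r / sigma ** 2 * math.exp(-r * r / (2 * sigma ** 2))

def mixture_pdf(r: float, sigma1: float, sigma2: float, p: float) -> float:
    """Dispersal distance density under the two-component mixture."""
    return p * rayleigh_pdf(r, sigma1) + (1 - p) * rayleigh_pdf(r, sigma2)

def mixture_loglik(distances, sigma1, sigma2, p):
    """Log-likelihood of (sigma1, sigma2, p) for positive distances."""
    return sum(math.log(mixture_pdf(r, sigma1, sigma2, p)) for r in distances)
```

The negative of `mixture_loglik` could be handed to any bounded numerical optimizer, with bounds keeping both scale parameters positive and p between 0 and 1.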
As the bins in our models represent areas that extend as long stripes parallel to the ICL, only the component $D_x$ is relevant. Thus, for the probability that a crow that stems from bin $x$ is found in bin $x + d$, the dispersal $d$ is a discretization of $D_x$, whose distribution consists of 76.7 % of a normal distribution with a standard deviation of , and the other 23.3 % come from a normal distribution with a standard deviation of . For computational efficiency we restrict the range of $d$ to ±10 bins, corresponding to a maximum of 50 km, setting $\Pr(d = -10) = \Pr(d = 10) = \Pr(D_x < -47.5) = \Pr(D_x > 47.5)$ (online supplement Fig. S3 bottom). Furthermore, the boundaries of our binned space are reflecting. That is, probability weights of dispersal $d$ that would result in $x + d < 1$ or $x + d > 200$ are shifted to instead increase the probabilities of bin $1 + |x + d|$ or $200 - |x + d - 201|$, respectively.
2.2.5 Mate choice
We summarize the phenotype by the values of the first principal component derived by Knief et al. (2019) (see 2.1.1) and its encoding by two-locus genotypes as specified in section 2.1.2. For simplicity, phenotype values are scaled to a range between 0 (pure hooded crow phenotype, genotype LLll) and 1 (pure black carrion crow phenotype, genotypes DDdd, DDdl, DDll). Mating preferences are based on the distance to self. In the absence of decisive knowledge on the mechanism underlying mate choice in the crow system, self-referent phenotype matching is a logical first choice. It most easily generates linkage between mating trait and preference, but also serves as a good proxy for a range of other preference learning mechanisms such as maternal or paternal imprinting (Verzijden et al., 2005, 2012), as has also been invoked in the crow system (Brodin and Haas, 2006, 2009; Londei, 2013).
Let φ be the genotype-phenotype map. The mate choice value wg1, g2 then is the mating probability of two birds with phenotype values φ(g1) and φ(g2) relative to the mating probability of two birds of equal color, such that wg1, g2 = 1 if φ(g1) = φ(g2). We model the mate choice function w with two parameters η and ζ, where η is the mating preference of crows with very different phenotypes and ζ controls the width of a Gaussian kernel modeling how mating preference wg1, g2 of more similar birds depends on their phenotypic difference φ(g1) − φ(g2):
Online supplement Fig. S4 shows how wg1, g2 depends on φ(g1) − φ(g2), with parameter values η and ζ set according to maximum-likelihood values in our model with induced fitness effects. Examples for resulting mate choice values for various parameter settings are shown in Fig. 2 B.
2.2.6 Fitting hybrid zone model parameter to genotype data
Weissensteiner et al. (2019) inferred that the d allele of the NDP gene arose approximately 500,000 years ago and segregated in the ancestor of both carrion and hooded crows. Since the three genotypes DDll, DDdl and DDdd all lead to the all-black phenotype, we allow the ancestral frequency of allele d in the west to differ initially from 1. This initial frequency δ of the d allele is the third model parameter besides ζ and η. A fourth parameter is the geographic location of the initial contact line. For this, all sampling sites are projected on a one-dimensional transect as specified in Knief et al. (2019), see also section 2.1.3. The position on the transect is measured in kilometers, starting with the most western sampling site as km 0. For the position of the initial contact line (which is assumed to intersect the transect at a right angle) we allow the range from km 300 to km 700 on the transect.
To numerically optimize all four parameters, we first simulated the hybrid zone model for any triple of candidate values for the first three parameters. Then we calculated, for each position on the grid km 300, km 301, …, km 700, the probability of the empirical genotype data for the case that the initial contact line was at that position. To calculate the parameter likelihoods, that is the probability of the data, we rounded the nest locations on the transect to the bins and calculated for each possible combination of parental genotypes the probability that the chick genotypes of a nest stem from such parents. The likelihood calculations were then based on the simulated genotype distribution after 2,000 generations and mate choice according to the model assumptions. Thus we optimized the likelihood of the four parameters with respect to the genotype data. For this we used the R package optimParallel (Gerber and Furrer, 2018).
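The per-nest likelihood can be sketched as a sum over parental genotype pairs. This is one plausible reading of the procedure and not the authors' code: the pair probability is taken to be proportional to the mate choice value times the product of the genotype frequencies among potential parents in the nest's bin, and chicks are treated as independent full siblings given their parents. The names `nest_likelihood`, `adult_freq` and the allele-count genotype encoding are assumptions.

```python
from itertools import product

# Genotype = (copies of D at chr18, copies of d at NDP), each 0, 1 or 2.
GENOTYPES = list(product(range(3), range(3)))

def locus_probs(k1, k2):
    """Offspring allele-count distribution at one locus (Mendelian)."""
    p1, p2 = k1 / 2, k2 / 2
    return [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2]

def mu(g, g1, g2):
    """P(offspring = g | parents g1, g2) with free recombination."""
    return locus_probs(g1[0], g2[0])[g[0]] * locus_probs(g1[1], g2[1])[g[1]]

def nest_likelihood(chicks, adult_freq, w):
    """Probability of the chick genotypes observed in one nest.

    chicks: list of chick genotypes
    adult_freq: dict genotype -> frequency among potential parents in the bin
    w: function (g1, g2) -> mate choice value
    """
    pair_weight = {
        (g1, g2): w(g1, g2) * adult_freq.get(g1, 0.0) * adult_freq.get(g2, 0.0)
        for g1, g2 in product(GENOTYPES, repeat=2)
    }
    total = sum(pair_weight.values())
    likelihood = 0.0
    for (g1, g2), weight in pair_weight.items():
        if weight == 0.0:
            continue
        brood = 1.0
        for chick in chicks:
            brood *= mu(chick, g1, g2)  # siblings are independent given the parents
        likelihood += weight / total * brood
    return likelihood
```

The log-likelihood of the full data set would then be the sum of the logarithms of these per-nest values, evaluated for each candidate position of the initial contact line.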
To infer the point of inflection of a cline from allele frequencies $y(x)$ in all bins $x \in \{1, \dots, 200\}$, we first chose the bin $x$ for which $y(x - 1) - y(x + 1)$ was maximal, then fitted a third-order polynomial to $y$ in the range from x − 2 to x − 3 and calculated the point of inflection of the fitted polynomial.
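The inflection-point procedure can be sketched with a least-squares cubic fit in pure Python. The fitting window is not unambiguous in the text, so the symmetric window of `half_window` bins around the steepest bin is an assumption, as are the function names. Coordinates are centered on the steepest bin to keep the normal equations well conditioned.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def cline_inflection(y, half_window=3):
    """Estimate the inflection point of a cline; y[i] is the value in bin i + 1."""
    n = len(y)
    # steepest bin: largest change between the two neighbouring bins,
    # restricted so that the fitting window stays inside the grid
    candidates = range(1 + half_window, n - half_window + 1)
    x0 = max(candidates, key=lambda x: abs(y[x - 2] - y[x]))
    ts = list(range(-half_window, half_window + 1))  # centered coordinates
    powers = [[t ** i for i in range(4)] for t in ts]
    values = [y[x0 + t - 1] for t in ts]
    # normal equations of the least-squares cubic c0 + c1 t + c2 t^2 + c3 t^3
    a = [[sum(p[i] * p[j] for p in powers) for j in range(4)] for i in range(4)]
    b = [sum(p[i] * v for p, v in zip(powers, values)) for i in range(4)]
    c0, c1, c2, c3 = solve(a, b)
    if c3 == 0:
        raise ValueError("fitted polynomial has no inflection point")
    return x0 - c2 / (3 * c3)  # second derivative 2 c2 + 6 c3 t vanishes here
```

Applied to a smooth sigmoid cline, the estimate falls between bin centers, which gives sub-bin resolution for tracking cline movement over generations.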
2.3 Model variants
In the following, we formulate deviations from our main model exploring several parameters we deem of importance in the crow system and beyond. These include i) the specific mode of how phenotypic space is translated into mate choice (preference function), ii) the impact of the genetic architecture of the mating trait and iii) how the system is affected if we remove sexual selection altogether while maintaining assortative mate choice.
2.3.1 Categorical choice model
In this variant of our model, hybrids are summarized into a single category and have no mating preferences. However, carrion crows prefer carrion crows and hooded crows prefer hooded crows as mating partners. If two crows interact and both are of the same category—carrion crow, hooded crow or hybrid—their mating preference value w is 1. If one is a hybrid and the other is a hooded crow or a carrion crow, their mating preference value w is given by the parameter ψ0 or ψ1, respectively. If one is a hooded crow and the other a carrion crow, their mating preference value w is the product ψ0 ·ψ1. We combined this model with our standard genotype-phenotype map as specified for the main model (section 2.1.2).
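The categorical preference rule can be written as a small lookup. In this sketch, the assignment of genotypes to categories (all chr18 DD genotypes as carrion crows, LLll as hooded crows, everything else as hybrids) is our assumption based on the pure phenotypes of the standard genotype-phenotype map; the function names are illustrative.

```python
def category(genotype: str) -> str:
    """Assign a two-locus genotype such as 'DLdl' to a mating category."""
    if genotype.startswith("DD"):
        return "carrion"  # all-black phenotype regardless of NDP
    if genotype == "LLll":
        return "hooded"
    return "hybrid"

def categorical_w(cat1: str, cat2: str, psi0: float, psi1: float) -> float:
    """Mating preference value w for two individuals of the given categories."""
    if cat1 == cat2:
        return 1.0
    pair = {cat1, cat2}
    if pair == {"hybrid", "hooded"}:
        return psi0
    if pair == {"hybrid", "carrion"}:
        return psi1
    return psi0 * psi1  # hooded crow meets carrion crow
```

For instance, with placeholder values ψ0 = 0.8 and ψ1 = 0.7, a hooded and a carrion crow would receive a preference value of 0.56.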
2.3.2 Additive model
This model variant assumes an additive genomic architecture of plumage pigmentation patterns. The genotype-phenotype map was combined with the mate choice function of the main model (section 2.2.5). We constructed the additive genotype-phenotype map by taking the same phenotypic values for the LLll, LLdl and LLdd genotypes as in the epistatic model and the slope between LLdl and DLdl to infer all other additive phenotypic values (online supplement Fig. S1 B). Note that according to this model, only genotype DDdd leads to an entirely black carrion crow phenotype. As we assume that right after ice age all crows in the west were black, the initial allele frequency of the d allele at locus 2 was 1.0. (See online supplement section F.3; model variant with a relaxation of this assumption.)
2.3.3 Neutral model
In this variant of our model, all individuals have the same expected number of offspring. In turn, we allow in this model that the total number of individuals in a bin can vary in time and among the bins. We combined this model with the standard genotype-phenotype map of the main model (section 2.1.2).
For this model, the frequency fg,x(s) is (still) the frequency of g in x at any time s relative to the initial total frequency ∑g fg,x(0) in bin x (or any other bin), but for s ≥ 1 it is in general not the relative frequency of g in x. The initial matrix f(0) is the same as in 2.2.3. The recursion to calculate f(s + 1) now contains factors vx,g(s) that compensate induced fitness effects for individuals stemming from bin x and having genotype g. These factors could reflect, e.g., that such individuals intensify or extend their search for mating partners by a factor of vx,g(s), under the assumption that this is possible without incurring any fitness cost. For this, the values of v.,.(s) must simultaneously fulfill a condition for all (y, g1), which we solve numerically (see online supplement E for details).
3 Results
3.1 Main model: parameter estimates
The main model assumes an epistatic genetic architecture of the mating trait loci, and sexual selection for similarity to the self-referent phenotype. This model corresponded well to the empirical genotype frequency distribution, and, with a log-likelihood of −175.6, was superior to the alternative models discussed below. The cline for locus 1, which is responsible for the majority of the overall phenotypic variation, was steepest (Fig. 2). It was closely followed by a shallower and geographically offset cline of the second, epistatically interacting locus. This is in accordance with the clines that were fitted to the empirical data in Knief et al. (2019).
Consistent with the proposition that the light allele of the second locus already segregated in the ancestral population of carrion crows (Weissensteiner et al., 2019), the maximum likelihood estimate of the initial frequency of the dark allele in carrion crows was below one (0.777). As maximum likelihood parameter values for the mate choice function we found (ζ, η) = (2.17, 0.617), with km 302 on the transect as the position of initial contact. These parameters translate into pairwise probabilities of mate choice which depend on the pairwise combination of mating trait genotypes (Fig. 2). Individuals of the same phenotype had the highest preference for mating with each other. Individuals differing most extremely in appearance (i.e. pure hooded crows with genotype LLll vs. pure carrion crows with genotypes DDdd, DDdl or DDll) were only 61 % as likely to choose each other for reproduction. Note that the ultimate mating probabilities not only depend on this preference matrix but are also contingent on the frequency of each genotype in the population, which determines the relative frequency of mutual encounter.
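The interplay between the preference matrix and encounter frequencies can be illustrated with a small Python sketch. It uses a generic weighting in which the probability of a pair is proportional to the product of the two frequencies and the preference value; this normalization is an assumption for illustration and not the recursion of section 2.2. The two-phenotype setup, the function names and the frequencies are hypothetical; only the relative preference of 0.61 between pure morphs is taken from the text.

```python
def pair_probabilities(freqs, pref):
    """Probability of each ordered pair, proportional to
    frequency(a) * frequency(b) * preference(a, b)."""
    weights = {
        (a, b): freqs[a] * freqs[b] * pref(a, b)
        for a in freqs
        for b in freqs
    }
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def pref(a, b):
    # 1.0 for matching phenotypes, 0.61 for pure hooded x pure carrion
    return 1.0 if a == b else 0.61


def mixed_share(probs):
    """Total probability of pairs with two different phenotypes."""
    return sum(p for (a, b), p in probs.items() if a != b)


# Hypothetical local frequencies: carrion crows rare vs. equally common
rare = pair_probabilities({"hooded": 0.9, "carrion": 0.1}, pref)
even = pair_probabilities({"hooded": 0.5, "carrion": 0.5}, pref)
print(mixed_share(rare) < mixed_share(even))   # True
```

The same preference values thus yield different realized shares of mixed pairs depending on local phenotype frequencies.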
3.2 Hybrid zone movement
The maximum-likelihood parameterization of the main model predicted a substantial shift of the hybrid zone center to the east after 2,000 generations. More precisely, the inflection points of loci 1 and 2 were inferred to be 189.6 and 188.3 km to the east of the initial contact line (ICL). To further explore these dynamics we simulated 6,000 generations under the main model, which we parameterized with the maximum likelihood estimates described above. During the first ∼1,200 generations, the zone moved eastwards from the ICL (in favor of dark morphs). After allele d of locus 2 decreased to a certain frequency, the hybrid zone began to reverse its movement and shifted westwards (now in favor of the light morphs, Fig. 3 and online supplement H.1).
The movement of the hybrid zone can be explained as follows. Due to the epistatic architecture, three different genotypes code for all-black crows (Fig. 1). This induces a high initial frequency of dark morphs among hybrids, which induces positive frequency-dependent sexual selection in favor of darker crows. This, in turn, increases the frequency of alleles D and d in the hybrid zone (Fig. 3, arrows A and B), which promotes a shift of the hybrid zone to the east (arrow C). At the same time, the l allele moves into the west. While allele L is under negative selection in the west, allele l has no fitness effects. It freely introgresses into the DD background, which entails a decrease of allele d (arrow D). In the long run this will also decrease the frequency of d in the proximity of the hybrid zone (arrow E), which reduces the fraction of dark morphs among the hybrids and thus the fitness of dark crows. This eventually brings the hybrid zone movement to a halt after approximately 1,200 generations. As the decrease of allele d continues (arrows D and E), sexual selection provides an advantage to lighter morphs, decreasing the frequencies of D and d in the hybrid zone (arrows F and H). As a consequence, the hybrid zone travels westward (arrow G) and finally drives alleles d and D to extinction after 5,500 generations.
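To make the notion of a cline's inflection point concrete, the Python sketch below locates the position of steepest frequency change along a transect. The logistic cline shape, the cline width of 50 km and the 5 km grid are assumptions for illustration; only the displacement of 189.6 km east of the ICL is taken from the text. Positions are given in km relative to the ICL, with positive values pointing east.

```python
import math


def logistic_cline(x_km, center_km, width_km):
    """Frequency of the western (dark) allele at position x.

    Decreases eastwards; width is defined as 1 / (maximum slope).
    """
    return 1.0 / (1.0 + math.exp(4.0 * (x_km - center_km) / width_km))


def inflection_point(positions, freqs):
    """Midpoint of the interval with the steepest frequency change."""
    steepest = max(
        range(len(positions) - 1),
        key=lambda i: abs(freqs[i + 1] - freqs[i])
        / (positions[i + 1] - positions[i]),
    )
    return 0.5 * (positions[steepest] + positions[steepest + 1])


# Hypothetical sampling grid: every 5 km from 500 km west to 1000 km east
positions = [float(x) for x in range(-500, 1001, 5)]
freqs = [logistic_cline(x, center_km=189.6, width_km=50.0) for x in positions]

estimate = inflection_point(positions, freqs)
print(abs(estimate - 189.6) <= 2.5)   # True: accurate to half a grid step
```

This finite-difference estimate is only as precise as the sampling grid; fitting a parametric cline to the frequencies would be the natural refinement.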
3.3 Model variants
3.3.1 Categorical choice model
With a log-likelihood of −199, the categorical choice model fit the data substantially worse than the main model, which assumes mate discrimination between all seven phenotypes. In the categorical model, mate-choice-relevant phenotypes were categorized into pure and hybrid types. Maximum likelihood estimates were (ψ0, ψ1) = (0.318, 0.645), which translates into a larger difference in mate choice (Fig. 2D). As in the main model, pure morphs had a clear preference for phenotypically matched partners. However, while the probability of mating between carrion and hooded crows was only reduced to 61 % in the main model, pure morph mismatches were five times less likely in the categorical model (21 %). According to the best-fitting parameter values, the dark allele d at locus 2 was also fixed in the west before secondary contact.
In terms of spatio-temporal dynamics according to this model, the hybrid zone had shifted westwards since the initial contact. The inferred points of inflection of clines of loci 1 and 2 after 2,000 generations were located 142.1 and 141.7 km, respectively, to the west of the ICL. According to simulations with more than 2,000 generations, the model predicted a continuous shift of the hybrid zone westwards until carrion crows went extinct (online supplement Fig. S12).
3.3.2 Additive model
Changing the genetic architecture from epistasis between the mating trait loci to additivity had a strong impact. The best possible fit to the data for the additive model was poor (log-likelihood −228.7), and maximum likelihood estimates of ζ = 2.878 and η = 0.987 implied only very slight mating preferences overall (Fig. 2). Mate choice probabilities only differed by 1.3 % (0.987 vs. 1.0), indicating very weak sexual selection. It is thus not surprising that clines of both mating trait loci were shallow. Still, the difference of 1.3 % in mate choice probabilities shielded against gene flow, and clines were more pronounced than under simple diffusion of a neutral locus (section 3.4). Furthermore, due to additivity the frequency curves of the two alleles were still almost identical after 2,000 generations (Fig. 2). With inferred points of inflection of both clines only 0.1 km to the east of the ICL, almost no shift of the clines was predicted. The additive model thus did not capture the defining features of the empirical data, highlighting the importance of the genetic architecture of mating traits for hybrid zone dynamics.
The above considerations are based on the assumption that according to the initial phenotype distribution all crows in the west were black, which entails that the dark alleles D and d were fixed in the west. This differs from the main model, where the frequency of the d allele was free to vary. In online supplement section F.3 we report results of further parameter combinations of the additive model relaxing the assumption that all parental carrion crows were entirely black upon secondary contact. Even with this relaxation, clines were either not steep enough to fit the empirical data or the empirical disparity in cline location and inclination between the loci was absent. Overall, none of the models with additive genetic effects fit the data well.
3.3.3 Neutral model
Next, we formulated a model without sexual selection where all individuals had the same expected number of offspring and did not incur any fitness advantage or disadvantage in relation to their phenotype. With a log-likelihood of −210.3, the fit to the data was substantially worse than in the main model including sexual selection. The best-fitting parameter combination for the neutral model had an initial d allele frequency for locus 2 of 0.619, and showed a slight shift for loci 1 and 2 of 11 and 7.2 km, respectively, to the east from the ICL (measured at the inflection points of the clines).
The maximum likelihood parameter estimates of the mating function, ζ = 5.53 and η = 0.0237, translate to the most extreme avoidance of non-self phenotypes of all investigated model alternatives. Matings of the opposite morph (carrion with hooded crows) were 50 times less likely than between equivalent phenotypes (Fig. 2). As for the categorical model, this may be explained by the need to maintain a narrow hybrid zone against increased gene flow in the center of the zone, caused by hybrids traveling far to seek an appropriate partner (with no cost in this model).
While the model somewhat mimicked the empirical disparity in allele frequency clines between loci, it did not capture the steepness of the cline for locus 1. This could be achieved by setting the model parameters to values reflecting stronger assortative mate choice. In this case, however, the frequency curve for locus 2 was shifted too far to the east. This would produce many LLdd individuals in the hybrid zone, which were near-absent in the empirical data, and explains the poor fit (compare online supplement Fig. S6 to online supplement Fig. S7 and online supplement Fig. S8).
Interestingly, the model predicted a decrease in population density in the center of the hybrid zone accompanied by a population increase at a distance of around 200 km on either side (online supplement Figs. S6 and S7, bottom). Assuming no cost of mate choice and movement, this shift in densities is due to individual compensation for differences in sexual attractiveness induced by the preference function. As we assume that individuals of a phenotype that is rare in their environment intensify their search for mating partners, these individuals may effectively migrate over larger distances and deplete local population densities.
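The model comparison of sections 3.1 and 3.3 can be summarized by the difference in log-likelihood between each variant and the best model. The short Python sketch below only restates the four reported values; the dictionary keys are illustrative labels. It deliberately stops at raw differences, since the number of free parameters per model, which a criterion such as AIC would need, is not restated in this section.

```python
# Maximum log-likelihoods as reported in sections 3.1 and 3.3
log_lik = {
    "main": -175.6,
    "categorical": -199.0,
    "additive": -228.7,
    "neutral": -210.3,
}

best = max(log_lik, key=log_lik.get)

# Log-likelihood deficit of each model relative to the best one
delta = {model: round(log_lik[best] - ll, 1) for model, ll in log_lik.items()}

for model in sorted(delta, key=delta.get):
    print(f"{model:12s} {delta[model]:5.1f}")
# main 0.0, categorical 23.4, neutral 34.7, additive 53.1
```

Ordered by fit, the ranking is main, categorical, neutral and additive.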
3.4 Effects of assortative mating on unlinked neutral loci
Using the best-fit parameters of the main model, we ran additional simulations in which we included a third, neutral locus. This bi-allelic locus was unlinked to the loci associated with the mating trait and had no effect on mate choice or dispersal. We defined the focal allele as the allele with an initial frequency of 1 in the west and of 0 in the east. We compared this simulation to a simulation in which all three loci had no effect on the mating trait or dispersal (random mating) and essentially followed neutral diffusion according to our dispersal model (section 2.2.4). (Note that in the latter case, locus 1 and locus 3 have the same allelic distribution.)
The spatial distribution of alleles from the neutral locus 3 was affected by the mating trait loci, but this effect was very weak (Fig. 4 and online supplement section H.1). After 2,000 generations the frequency of the focal allele at locus 3 was 0.620 at the western boundary of the range and 0.398 in the far east in the main model with loci 1 and 2 under sexual selection. With random mating, the allele frequencies at locus 3 were slightly closer to equalization (at 0.5), with 0.597 in the west and 0.403 in the east. Hence, this model predicts that sexual selection does not substantially reduce effective migration at loci unlinked to those determining the mating trait. Even if assortative mating in this model entails that hybrids have reduced fitness, local reduction in gene flow at the mating trait loci affected other parts of the genome only marginally.
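The random-mating baseline, in which a neutral allele simply diffuses across the range, can be sketched with a one-dimensional stepping-stone model in Python. This is a generic illustration under stated assumptions and not the dispersal model of section 2.2.4: bins have equal population sizes, dispersal is to nearest neighbors only, the range edges are reflecting, and the number of bins and the migration rate are hypothetical.

```python
def diffuse(freqs, m, generations):
    """Neutral diffusion of an allele frequency along a chain of bins.

    Each generation a fraction m of every bin is exchanged with each
    neighbor; the range edges are reflecting.
    """
    n = len(freqs)
    for _ in range(generations):
        updated = []
        for i, p in enumerate(freqs):
            left = freqs[i - 1] if i > 0 else p        # reflecting edge
            right = freqs[i + 1] if i < n - 1 else p   # reflecting edge
            updated.append((1 - 2 * m) * p + m * left + m * right)
        freqs = updated
    return freqs


# Hypothetical setup: 100 bins, focal allele fixed in the western half
initial = [1.0] * 50 + [0.0] * 50
final = diffuse(initial, m=0.25, generations=2000)

mean_freq = sum(final) / len(final)
print(round(mean_freq, 6))            # 0.5: the overall frequency is conserved
print(final[0] > 0.5 > final[-1])     # True: the gradient has not vanished yet
```

In such a baseline the cline flattens steadily while the overall allele frequency stays constant, which is the behavior against which the effect of assortative mating on locus 3 is measured.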
4 Discussion
In this study, we explored the effect of preference learning of a genetically-determined mating trait on hybrid zone dynamics. Combining empirical data from the European crow system with computer simulations allowed us to estimate the strength and mode of mate choice as a function of individual genotype combinations. Further simulations were based on model variants investigating the mate choice function, the genetic architecture of the mating trait and the fitness costs induced by frequency dependent selection. Their results provided insight into the processes modulating the outcome of sexual selection. We discuss the results in the context of reproductive isolation. First, we focus on the mating trait loci themselves and then consider genome-wide implications.
4.1 Mating trait loci
Theory predicts that preference learning, such as self-referent phenotype matching or imprinting, readily enhances positive frequency-dependent sexual selection of locally abundant mating trait alleles (Yeh and Servedio, 2015). In the context of secondary gene flow between populations with diverged mating trait values, this process is expected to result in directional selection on each side of the hybrid zone pulling in opposite directions. For the entire population, this results in disruptive selection of mating trait values between the two populations (Servedio, 2016). Hence, sexual selection mediated by (learning-based) assortative mating may promote hybrid zone stability in the form of geographic clines (Irwin, 2020; Brodin and Haas, 2009) or as mosaics, depending on the dispersal kernel (M’Gonigle and FitzJohn, 2010). These theoretical predictions are in accordance with empirical observations. Geographically structured polymorphisms in phenotypes with relevance to mate choice are not uncommon in nature (McLean and Stuart-Fox, 2014), and learning of (self-) recognition cues or imprinting is widespread in the animal kingdom (Ten Cate and Vos, 1999; Bereczkei et al., 2004; Pfennig et al., 1983). Preference learning modulating mate choice thus features as a prevalent force in maintaining or promoting divergence with ongoing gene flow (Irwin and Price, 1999; Verzijden et al., 2012). This prediction applies particularly well to the context of hybrid zones where mating traits that diverged in isolation come into secondary contact (Randler, 2008). Consistent with these predictions, the main model, which best fit the empirical data in the crow system, could explain the maintenance of steep clines over thousands of years for both loci associated with the mating trait (Fig. 2). All our models, however, predicted that the hybrid zone will ultimately vanish, either as one color morph takes over, as in our best-fitting model, or as the clines flatten out more and more.
4.1.1 Mode of preference learning
The evolutionary outcome of sexual selection was contingent on the class of the mating preference function. Assuming self-reference for all seven possible phenotypes (Fig. 1) in the main model required little variation in the degree of assortative mating (1.7-fold difference) to maintain high allelic differentiation between the parental populations. A different outcome was predicted for mate choice based on categorizing crows into three classes of pure parental and hybrid forms, as has previously been suggested for the crow system (Brodin and Haas, 2006, 2009). In the categorical model, the lack of discrimination among hybrid phenotypes increased gene flow of alleles from both loci, with hybrids serving as a bridge for introgression between the pure parental forms (Irwin, 2020). Stronger avoidance (4.8-fold difference) of phenotypically dissimilar matings in the areas dominated by the parental genotypes was necessary to still obtain the empirically observed steep cline in genotypes and alleles of the mating trait loci. This result demonstrates that the details of the preference function and the genetic basis of the mating trait matter, even for a simple trait within a single sensory modality. This has potential implications for the evolution of imprinting. Theory predicts that imprinting can be favored by reinforcement (Yeh et al., 2018). Models explicitly addressing the interplay between the genetic architecture of the mating trait and the optimal resolution of preference learning for the trait seem worth exploring. The case of crows investigated here suggests the capacity for full-resolution learning of the phenotypic trait space. However, resolving the details of the learning process (self-reference, imprinting and respective imprinting sets) and the actual learning cues would require large-scale cross-fostering experiments.
4.1.2 Genetic architecture
Hybrid zone dynamics were also strongly contingent on the genetic architecture of the mating trait. Gene flow was most readily reduced under an epistatic architecture, which fit the data best. Assuming an additive genetic architecture, we found no parameter combination that would recover the cline steepness and offset of cline centers characteristic of the empirical data. Instead, the parental populations were rapidly homogenized by gene flow, as illustrated by flattening of the clines across large geographic distances. Brodin and Haas (2009) modeled the southern-Danish hybrid zone between carrion crows in the south and hooded crows in the north. They assumed an additive model of three unlinked loci for the genetic basis of the mating trait without any selection other than sexual selection induced by assortative mating. Their simulation runs led to stable phenotypic clines that resemble the cline observed in the field. For the stability of these clines it may, however, be crucial that the model assumptions included a continuous inflow of carrion crows from the south and of hooded crows from the north.
The difficulty in maintaining divergence with additivity of the mating trait is consistent with simulations by Irwin (2020) concluding that very strong assortative mating was required for hybrid zone maintenance (10-fold difference). In these simulations, an increasing number of loci required even stronger assortment. These findings highlight the importance of the genetic architecture underlying the mating trait. Tentatively, these results suggest that preference learning may be most relevant for evolution where single sensory modalities with a simple, at most oligogenic architecture (such as color (Cuthill et al., 2017)) are recruited for mate choice (Nadeau et al., 2007). Whether epistasis generally promotes divergence remains an open question and warrants further theoretical exploration. On the empirical side, genetic investigation of mating traits in systems where preference learning has been shown to play an important role for prezygotic isolation provides a promising avenue (Yang et al., 2019). This likewise applies to sexually selected traits that appear to accelerate speciation rates on macroevolutionary scales (Panhuis et al., 2001; Hugall and Stuart-Fox, 2012; Seddon et al., 2013).
The genetic architecture also had a major effect on the spatio-temporal dynamics of the hybrid zone. While cline centers remained at the initial line of contact in the additive model, they shifted in all models assuming an epistatic architecture. Hybrid zone movement is expected to be induced by dominance or asymmetries in allelic fitness (Brodin et al., 2013; Mallet, 1986; Secondi et al., 2006). To our knowledge it has never been explored in the context of sexual selection alone. The combination of epistasis in the mating trait and frequency-dependent fitness effects had interesting emergent properties. The best-fit main model predicted an eastward movement of the hybrid zone which reversed its direction after ∼1,200 generations. This shift was due to changes in the relative allele frequencies of the two interacting loci. The central European crow hybrid zone was mapped for the first time with great precision by Meise (1928). Re-examination of its location by Haas and Brodin (2005) 80 years (or ∼13 generations) later suggested a slight movement towards the east. While empirical sampling variance and sensitivity to model assumptions preclude a one-to-one comparison, both approaches agree that the crow hybrid zone does not conform to an equilibrium or quasi-equilibrium state. Instead, it has likely been dynamic to the present day and will, according to our model, ultimately lead to the extinction of one or the other color morph within the time-frame of a glacial cycle.
4.1.3 Releasing sexual selection
Last, we considered the effect of assortative mating alone without any induced fitness effects. A possible interpretation of this model is that there is a global, but no local carrying capacity, breeding sites are not limiting and the time to find mating partners has no fitness effect. As a consequence, hybrid back-crosses with a phenotype resembling the pure morphs tend to leave the hybrid zone. Even though this model is entirely neutral, it allows for migration of individuals in search of appropriate mating partners. Accordingly, the model predicted a lower population density in the hybrid zone center and higher densities towards the parental ranges. While the model overall poorly supported the empirical geographic distribution of genotypes, it has interesting conceptual implications. In the popular tension zone model, where hybrid zones are maintained at an equilibrium of migration and selection against hybrids, population density is preset by the local carrying capacity (Barton and Hewitt, 1985). Moving tension zones are predicted to be trapped in regions of low carrying capacity or biogeographic obstacles to migration (Barton, 1979; Barton and Hewitt, 1985). On the contrary, our model predicts variation in population density as an emergent property of assortative mating in the absence of any form of selection. Hence, reduced population densities, as observed or predicted for hybrid zones (Barton and Hewitt, 1981, 1985; Hewitt, 1988), need not necessarily result from ecological constraints, but may be an emergent property of mate choice, even in the absence of selection.
4.2 Relevance to speciation
The genic view of speciation emphasizes that reproductive isolation will initially be caused by a small number of loci (Wu, 2001; Wolf et al., 2010; Presgraves, 2010). Divergent selection reduces effective migration of alleles at these loci allowing for local allelic differentiation (Ravinet et al., 2017; Wolf and Ellegren, 2017). The mating trait loci considered above constitute such barrier loci subject to selection (direct barrier effects). Yet, with few barrier loci of moderate effect, reproductive isolation remains confined to the linked local genomic neighborhood. Gene flow of neutral genetic variants disassociated from the selected background by recombination remains high. Indirect barrier effects reducing gene flow genome-wide are only expected to unfold when a sufficient number of barrier loci come into linkage disequilibrium, depending on the strength of selection and the recombination rate (Barton, 1983; Bierne et al., 2011; Feder et al., 2014). Here, we explored this indirect barrier effect by introducing a third, unlinked locus serving as a proxy for genome-wide, neutral genetic variation. Adding this locus to the best-fit main model allowed us to quantify genome-wide reduction in gene flow elicited by frequency-dependent sexual selection on a learned mating trait. While gene flow was slightly reduced, the effect was overall very small. This is consistent with simulations by Irwin (2020), who likewise found that assortative mating had only minor effects on unlinked, neutral loci. These findings are also consistent with genome-wide data from the crow system. Poelstra et al. (2014) found that only a small number of narrow genomic islands related to the plumage phenotype exhibited resistance to gene flow. The remainder of the genome appears not to experience any barriers to gene flow (see also Vijay et al., 2016). More specifically, Knief et al. (2019) demonstrated divergent selection in the European hybrid zone only for a ∼2 Mb region on chromosome 18 and the gene NDP (loci 1 and 2, this study). Geographic clines were narrow for these loci (see Fig. 2), but stretched across several hundred kilometres for the remaining loci in the genome. Overall, these results are consistent with the findings of this study and suggest a plausible history of genome-wide admixture with the exception of few mating trait loci maintained by preference learning. Divergence in putative mating trait (and preference) loci against a homogeneous genome-wide background is not restricted to the crow system (Toews et al., 2016; Hench et al., 2019; Malinsky et al., 2015). The framework presented here thus likely applies more widely and is worth exploring in other organisms, tailored to the specifics of each system.
An important element of speciation is reinforcement of assortative mating. If hybrids have reduced fitness, even if only in the sense of sexual selection induced by assortative mating, there may be a selection pressure in favor of mating strategies that avoid producing hybrids (Servedio and Kirkpatrick, 1997; Felsenstein, 1981). This reinforcement of assortative mating might be an important factor for speciation (Barton, 2013; Butlin and Smadja, 2018). It would be interesting to include reinforcement in our theoretical model to analyze how it could affect the dynamics of the hybrid zone and whether it could even lead to speciation before one of the two color morphs becomes fixed. The extent of reinforcement of assortative mating as an evolutionary process, however, is contingent on the amount of variation in the strength of mating preferences in the crow populations and on the genomic architecture of mating preference behavior. To the best of our knowledge, both are unknown for crows, such that adding reinforcement to our theoretical model would be purely speculative or require extensive behavioral observations and population genomic analyses. As our model neglects the possibility of reinforcement and other eco-evolutionary aspects of the hybrid zone, we cannot rule out the possibility that speciation in carrion and hooded crows takes place. With our analyses, however, we show that the available data can be explained by a model that will not lead to speciation.
5 Acknowledgments
We thank the members of the Wolf lab and the Dobzhansky discussion group at LMU Munich for valuable feedback. For funding we thank the German Science Foundation DFG (grant ME 3134/6-2 to DM in Priority Program SPP 1590 “Probabilistic Structures in Evolution”), the European Research Council (ERCStG-336536 FuncSpecGen to JBWW) and LMU Munich (startup grant to JBWW).
Footnotes
metzler{at}bio.lmu.de, knief{at}bio.lmu.de, penalba{at}bio.lmu.de, j.wolf{at}bio.lmu.de
Updated some literature references. Added description of movie (external link). Minor corrections and clarifications in text.
https://raw.githubusercontent.com/statgenlmu/assortative_crows/master/movies/threeloci_6000.mp4