ABSTRACT
Proteins encoded by interacting mitochondrial and nuclear genes catalyze essential metabolic processes in eukaryotic cells. The correct functioning of such processes requires combinations of mitochondrial and nuclear alleles that work together (mitonuclear interactions) and the avoidance of mismatched combinations (mitonuclear incompatibilities). This interplay could have a major role during the early stages of population divergence. Here, we show that mitonuclear interactions maintain a deep mitochondrial divergence in the face of nuclear gene flow between two lineages of the songbird Eastern Yellow Robin (Eopsaltria australis) occupying contrasting climatic habitats. Using >60,000 SNPs, we explored patterns of nuclear gene differentiation and introgression along two sampling transects intersecting the deep mitochondrial divergence between lineages. We found a replicated pattern of low genome-wide differentiation contrasting with two prominent regions of high differentiation (genomic islands of divergence) in different nuclear backgrounds. The largest island of divergence (~15.4 Mb) showed a significant excess of nuclear-encoded genes with mitochondrial functions (N-mt genes), low genetic diversity and high levels of linkage disequilibrium. Thus, genetic differentiation between the two adjacent but climatically divergent lineages is mostly limited to the mitochondrial genome and a nuclear genomic region containing tightly linked N-mt genes that presumably experience reduced recombination. The second island of divergence mapped to the Z chromosome, suggesting that nuclear gene flow occurs primarily via male hybrids, in accordance with Haldane’s Rule. Our results are consistent with accumulating evidence that mitonuclear co-evolution could represent a key vehicle for climatic adaptation during population divergence.
INTRODUCTION
Studies of genome-wide variation of natural populations in the early stages of divergence have enhanced our understanding of the genetic basis of local adaptation, reproductive isolation and speciation (Seehausen et al. 2014; Smadja & Butlin 2011). Genomic analyses of closely related populations often reveal a pattern of heterogeneous levels of genetic differentiation, with most of the genome exhibiting low differentiation contrasting with areas of clustered loci of high differentiation, known as genomic islands of divergence (Harr 2006; Harrison & Larson 2016; Nosil et al. 2009; Turner et al. 2005). Islands of divergence often contain genes involved in genetic incompatibilities and local adaptation, signalling their direct role in the evolution of reproductive isolation (Nosil & Feder 2012; Payseur 2010; Wu 2001). Most genomic studies of population differentiation have focused on genes with roles limited to the nuclear genome (e.g. Jones et al. 2012; Malinsky et al. 2015; Marques et al. 2016; Poelstra et al. 2014; Soria-Carrasco et al. 2014). However, genetic interactions between mitochondrial and nuclear genes (i.e. mitonuclear interactions) are receiving increasing attention as key players in speciation (Burton & Barreto 2012; Burton et al. 2013; Dowling et al. 2008; Gershoni et al. 2009; Hill 2015; Lane 2011; Levin et al. 2014). Yet, evidence for the involvement of mitonuclear interactions during the early stages of divergence in natural populations remains rare (Bar-Yaacov et al. 2015; Boratyński et al. 2016; Gagnaire et al. 2012) and mostly limited to plant systems (i.e. cytoplasmic-nuclear; Barnard-Kubow et al. 2016; Case et al. 2016; Roux et al. 2016b; Sambatti et al. 2008).
In eukaryotes, the mitochondrion performs cellular respiration and regulates many aspects of cellular metabolic functioning and energy expenditure (Allen 2003). Such processes are dependent on interactions between genes of the mitochondrial genome and nuclear-encoded genes that have mitochondrial functions (N-mt genes). The products of mitochondrial and N-mt genes interact directly in the enzyme complexes of the Oxidative Phosphorylation System (OXPHOS), the primary driver of energy availability in the cell (Bar-Yaacov et al. 2012). Mitonuclear interactions also regulate the expression and assembly of OXPHOS complexes, among other critical cellular processes (Ballard & Pichaud 2014; Bar-Yaacov et al. 2012; Gershoni et al. 2009; Horan et al. 2013). While the mitochondrial genome (mitogenome) is a common target for natural selection with strong influence on organismal fitness, this effect is often expressed through interactions with N-mt genes, so fixation of adaptive or slightly deleterious mutations in the mitogenome requires compensatory changes in N-mt genes (Ballard & Pichaud 2014; Dobler et al. 2014; Dowling et al. 2008; Havird & Sloan 2016; Horan et al. 2013; James et al. 2016; Osada & Akashi 2012; Wolff et al. 2014). Accordingly, mitonuclear co-evolution should be enforced by strong natural selection, even though the two genomes differ in their modes of inheritance and in their recombination and mutation rates (Dowling et al. 2008; Wolff et al. 2014).
Recent multidisciplinary studies provide insights into the fitness consequences of mitonuclear interactions and the mechanisms by which natural selection can maintain the integrity of these interactions during population divergence (Burton et al. 2013; Hill 2015; Lane 2011). Experiments with cell lines and model organisms demonstrate that hybrids with disrupted mitonuclear interactions can show defects in key metabolic processes (e.g. low OXPHOS efficiency) and negative impacts on life-history traits (e.g. low viability and fertility; reviewed in Burton et al. 2013; Levin et al. 2014). Moreover, strong feedbacks between mitochondrial metabolism and environmental conditions, such as temperature, diet and aridity, can drive mitonuclear co-adaptation (Burton et al. 2013; Hill 2015; Lane 2011; Tieleman et al. 2009). Of particular importance is the coupling efficiency of the OXPHOS pathway, which regulates the balance between energy and heat production. For example, less-coupled systems can facilitate heat production in colder environments, and more-coupled systems can optimize energy production with less heat generation under warmer conditions and/or caloric restriction (Das 2006; Wallace 2005). Thus, mitonuclear matching needs to balance energy-coupling trade-offs under local environmental conditions, while avoiding negative effects for the cell, such as oxidative stress, reduced metabolism and apoptosis (Finkel 2003; Stier et al. 2014).
Divergence of mitochondrial and N-mt genes between populations that are not fully reproductively isolated represents an important challenge for organisms to maintain optimal metabolic functioning. Gene flow between differently adapted individuals can promote the assembly of defective mitonuclear combinations (Burton et al. 2013; Gershoni et al. 2009). To avoid low fitness from mitochondrial and N-mt mismatch (mitonuclear incompatibilities), selection should favour co-adapted mitonuclear interactions that provide optimal metabolic functioning in local environments (Burton et al. 2013; Lindtke & Buerkle 2015). Two main processes could promote optimal mitonuclear combinations. First, natural selection should disfavour globally inefficient combinations, while divergent selection should increase the frequencies of locally efficient combinations (Burton & Barreto 2012; Burton et al. 2013; Hill 2015). Second, genomic architecture that reduces chromosomal recombination should increase co-inheritance of co-adapted genes (Butlin 2005; Kirkpatrick & Barton 2006; Lindtke & Buerkle 2015; Ortiz-Barrientos et al. 2016; Yeaman & Whitlock 2011).
The endemic Australian songbird Eastern Yellow Robin, Eopsaltria australis (hereafter EYR), provides an excellent model to study mitonuclear co-evolution during ecological divergence with gene flow. EYR is characterized by a striking pattern of mitochondrial-nuclear spatial discordance: north-south structure of genome-wide nuclear DNA (nDNA) is geographically perpendicular to the climate-correlated inland-coastal distribution of two mitochondrial DNA (mtDNA) lineages (mito-A and mito-B; Fig. 1A; Pavlova et al. 2013). This pattern is consistent with early Pleistocene (~2 MYA) north-south population divergence followed by two mid-to-late Pleistocene mitochondrial introgression events: the introgression of the northern mtDNA lineage mito-A southwards through the arid and climatically variable inland range ~276 KYA, and the introgression of the southern mtDNA lineage mito-B northwards through the cooler and more climatically stable coastal range ~90 KYA (Fig. S1) (Morales et al. 2016). Non-neutral mitochondrial introgression is supported by evidence of divergent positive selection on five mitochondrial amino acids and signatures of strong selective sweeps in the mitogenome (Morales et al. 2015). Such introgression events could be associated with new mitonuclear interactions that generated inland-coastal nDNA divergence, with gene flow, in the southern population ~64 KYA; the northern population did not show similar divergence, potentially because selection on mitonuclear interactions was weaker or introgression was more recent (Morales et al. 2016). We propose that mitonuclear interactions were established during co-introgression of interacting mitochondrial and N-mt genes. The resulting inland-coastal population divergence at co-evolved mitochondrial and N-mt genes was maintained by selection for optimal mitonuclear interactions under local environmental conditions and against mitonuclear incompatibilities, despite ongoing nuclear gene flow (Fig. S1).
Here we examine genomic evidence of mitonuclear co-evolution in EYR along two independent geographic transects intersecting the inland-coastal mitochondrial divergence in northern and southern nuclear backgrounds (Fig. 1A). First, we tested the level and pattern of nuclear genomic differentiation and introgression between inland and coastal populations. For this we employed different methods to detect highly differentiated (outlier) loci. We mapped the results to a reference genome and investigated if outlier loci are arranged into genomic islands of divergence. Then, we evaluated if genomic islands of divergence are enriched with genes with known mitonuclear functions (N-mt genes) and characterized by low genetic diversity and high linkage disequilibrium, as would be expected from selective sweeps paralleling those previously observed in the mitogenome. Lastly, we investigated the potential functional differences in regulation of energy coupling between alternative alleles of N-mt genes, using newly-available 3-dimensional protein structure models and applying principles of protein biochemistry.
RESULTS
Mitolineages meet in a narrow contact zone of sharp environmental transition
A total of 418 EYR individuals sampled across the species’ range were categorized for mitochondrial lineage membership (two mitolineages: inland mito-A = 270 and coastal mito-B = 148; red and blue on Fig. 1A). Sampling efforts were focussed on two distant (~700 km) transects chosen to represent northern and southern nuclear backgrounds (squares and circles on Fig. 1A), each of which hosts the two mitolineages. In each transect, a narrow contact zone (< 20 km) between the mitolineages is located in a region of climatic transition (Fig. 2A). The correlation between mitolineage distribution and climatic variation is stronger in the south than in the north transect (Fig. 2B-C).
Repeated evolution of high genetic differentiation due to mitonuclear co-introgression
We obtained 60,444 SNPs by performing reduced-representation sequencing of the genome (DArTseq; Kilian et al. 2012). We obtained genotypes for 164 individuals of known mitolineage (mito-A = 100, mito-B = 64) with most of the samples located along the two transects (north = 52 and south = 101). A PCA using all non-outlier loci (59,638 SNPs; see below) and all 164 samples showed that genome-wide variation is structured along two geographic axes: north-south (4.7%) and inland-coastal (3.4%; Fig. 1B). As we aimed to understand the inland-coastal divergence, all the following analyses were performed only using samples within the two transects, grouped into four mitogroups: north mito-A = 23; north mito-B = 27; south mito-A = 70; south mito-B = 32. Genetic structure analyses indicated that most individuals in each transect could be assigned to inland or coastal genetic clusters with high posterior probability (Q > 0.8; Fig. 2A). Individuals with intermediate assignment scores (Q < 0.8) were between inland and coastal genetic clusters (Fig. 2A).
We explored the pattern of genetic differentiation throughout the nuclear genomes between mitogroups independently for each transect (i.e. two pairwise comparisons: north mito-A versus north mito-B and south mito-A versus south mito-B). We used three methods that differ in their approach and assumptions to identify loci with exceptionally high levels of genetic differentiation between mitogroups (i.e. outlier loci): fine-scale FST estimates (including only individuals sampled within 40 km of the centre of the contact zone), BayeScEnv and PCAdapt. Our FST analyses revealed that genome-wide genetic differentiation between inland and coastal mitogroups is very low in the north transect (mean FST = 0.022, sd = 0.083, 1% upper quantile = 0.425) and low in the south transect (mean FST = 0.046, sd = 0.095, 1% upper quantile = 0.505; Fig. S2). All three methods discovered hundreds of outlier loci in similar numbers for the two transects (across all methods: north transect = 439 and south transect = 357; overlap between all methods: north = 108 and south = 203; Table 1). Genetic differentiation for outlier loci was more than 12 times greater than the average genome-wide differentiation (north mean outlier FST = 0.44 and south mean outlier FST = 0.56). However, despite evidence for strong differentiation, no completely fixed alleles were observed between mitogroups.
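This section reports per-locus FST summaries but does not name the estimator. As an illustration only, the sketch below assumes Hudson's FST estimator for a biallelic SNP and computes the kind of summary given above (mean, standard deviation and 1% upper quantile). The allele frequencies are invented, and the quantile is a descriptive summary, not one of the three outlier-detection methods.

```python
import statistics

def hudson_fst(p1, p2, n1, n2):
    """Per-SNP Hudson FST between two mitogroups (assumed estimator).

    p1, p2: sample allele frequencies; n1, n2: number of sampled chromosomes
    (2 x individuals for diploid autosomal loci). Values slightly below zero
    can occur when true differentiation is close to zero.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0

def summarise_fst(fst_values):
    """Mean, standard deviation and 1% upper quantile of per-locus FST."""
    upper_1pct = statistics.quantiles(fst_values, n=100, method="inclusive")[-1]
    return statistics.mean(fst_values), statistics.stdev(fst_values), upper_1pct

# Hypothetical allele frequencies at four loci, 100 chromosomes per mitogroup.
loci = [(0.50, 0.48), (0.30, 0.35), (0.10, 0.90), (0.62, 0.60)]
fst = [hudson_fst(p1, p2, 100, 100) for p1, p2 in loci]
mean_fst, sd_fst, q99 = summarise_fst(fst)
```

In this toy example only the third locus (0.10 versus 0.90) is strongly differentiated, with FST of about 0.78; the others are near zero.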
We compared the genetic structure depicted with the genome-wide PCA (Fig. 1B) with a PCA using only outlier loci (again using all 164 individuals, to allow a direct comparison). In striking contrast to the genome-wide variation, the majority of the genetic variation in outlier loci was structured in the inland-coastal direction (PC1 45.2%) and north-south differentiation was completely absent (Fig. 1C). This result suggests that many of the nuclear outlier loci co-introgressed with mtDNA (Beck et al. 2015; Boratyński et al. 2016). We further explored the possibility of co-introgression by calculating correlations of allele frequencies for the outlier loci between north mito-A and south mito-A and between north mito-B and south mito-B, and comparing them to correlations between loci drawn at random from the rest of the genome (i.e. random expectation). Correlations for outlier loci (mito-A: north vs. south = 0.98 and mito-B: north vs. south = 0.99) were significantly higher than for sets of randomly chosen non-outlier loci (average correlation: mito-A: north vs. south = 0.74; mito-B: north vs. south = 0.78; P < 0.001; Fig. S3). This indicates that allele frequencies of outlier loci have a greater tendency than those of non-outlier loci to segregate in the same inland or coastal direction in both nuclear backgrounds, consistent with mitonuclear co-introgression (Fig. S1).
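The comparison with the random expectation can be sketched as a resampling test: compute the north versus south allele-frequency correlation (within one mitolineage) for the outlier loci, then repeat it for many equally sized random sets of non-outlier loci. The number of draws, the seed, the one-sided empirical P-value with a +1 correction and the synthetic data are all assumptions for illustration, not the settings used in this study.

```python
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def random_set_test(freq_north, freq_south, outlier_idx, n_draws=1000, seed=1):
    """Correlation of north vs south allele frequencies (same mitolineage) at
    outlier loci, compared with equally sized random sets of non-outlier loci.
    Returns (observed r, mean null r, one-sided empirical P)."""
    def r_for(idx):
        return pearson([freq_north[i] for i in idx], [freq_south[i] for i in idx])

    observed = r_for(outlier_idx)
    excluded = set(outlier_idx)
    background = [i for i in range(len(freq_north)) if i not in excluded]
    rng = random.Random(seed)
    null = [r_for(rng.sample(background, len(outlier_idx))) for _ in range(n_draws)]
    # +1 correction: the smallest attainable P is 1 / (n_draws + 1).
    p_value = (1 + sum(r >= observed for r in null)) / (1 + n_draws)
    return observed, sum(null) / n_draws, p_value

# Hypothetical data: 200 loci; the first 20 are outliers whose allele
# frequencies match between transects, the rest vary independently.
rng = random.Random(0)
north = [rng.random() for _ in range(200)]
south = [rng.random() for _ in range(200)]
outliers = list(range(20))
for i in outliers:
    south[i] = north[i]
observed_r, null_mean_r, p = random_set_test(north, south, outliers)
```

With real data the null correlation is well above zero (0.74 to 0.78 here) because genome-wide frequencies are themselves correlated; the test only asks whether outliers exceed that baseline.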
Heterogeneous genomic differentiation with mtDNA-linked islands of divergence
In order to investigate the genomic organization of genetic differentiation, we mapped the SNPs to the reference genome of the zebra finch, Taeniopygia guttata (Warren et al. 2010). We confidently mapped 35,030 of our SNPs (unique high-quality hits, see Methods) to the zebra finch genome. This revealed that genetic differentiation between mitogroups is highly heterogeneous in genomic positioning in both transects (Fig. 3; Fig. S4-S6). While outlier loci were present on most chromosomes, they were disproportionately located on two autosomes (1A and 4) and the Z sex-chromosome (Fig. 3). To look for significant genomic clusters of high differentiation between mitogroups in each transect (i.e. mtDNA-linked islands of divergence), we analysed the cumulative distribution function of q-values from the BayeScEnv analysis (see Methods) across the entire genome, using a Hidden Markov Model (HMM). HMM results indicated that chromosome 1A (for both transects) and chromosome Z (for the southern transect) contained one genomic island of divergence each (Fig. 3). The chromosome 1A mtDNA-linked island of divergence was ~15.4 Mb long (approximate zebra finch genomic coordinates: 44,200,000 – 59,600,000), and that of the Z chromosome was ~0.75 Mb long (coordinates: 2,000,000 – 2,750,000).
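The HMM analysis here models the cumulative distribution of BayeScEnv q-values, and its details are not given in this section. The simplified stand-in below instead assumes each locus has already been called outlier or non-outlier (for example with a hypothetical q-value cut-off) and decodes a two-state (background/island) HMM along a chromosome with the Viterbi algorithm. All transition and emission probabilities are hypothetical.

```python
import math

def viterbi_islands(is_outlier, p_stay=0.99, p_out_bg=0.02, p_out_island=0.5):
    """Most likely background (0) / island (1) state for each locus, with loci
    ordered by position along one chromosome. All probabilities are hypothetical."""
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]
    p_out = [p_out_bg, p_out_island]

    def log_emit(state, outlier):
        return math.log(p_out[state] if outlier else 1 - p_out[state])

    # Equal prior probability of starting in either state.
    score = [math.log(0.5) + log_emit(s, is_outlier[0]) for s in (0, 1)]
    backpointers = []
    for outlier in is_outlier[1:]:
        prev_best = [max((0, 1), key=lambda r: score[r] + log_trans[r][s]) for s in (0, 1)]
        score = [score[prev_best[s]] + log_trans[prev_best[s]][s] + log_emit(s, outlier)
                 for s in (0, 1)]
        backpointers.append(prev_best)
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for prev_best in reversed(backpointers):
        state = prev_best[state]
        path.append(state)
    return path[::-1]

def island_runs(path, positions):
    """Convert a state path into (first, last) positions of each island run."""
    runs, start, prev = [], None, None
    for state, pos in zip(path, positions):
        if state == 1 and start is None:
            start = pos
        elif state == 0 and start is not None:
            runs.append((start, prev))
            start = None
        prev = pos
    if start is not None:
        runs.append((start, prev))
    return runs

# Hypothetical chromosome: 80 loci 10 kb apart, with a block of mostly outliers.
pattern = "TTFTTTFTTTTFTTTTTFTT"
calls = [False] * 30 + [c == "T" for c in pattern] + [False] * 30
positions = [i * 10_000 for i in range(len(calls))]
islands = island_runs(viterbi_islands(calls), positions)
```

The high probability of staying in the same state is what lets isolated non-outliers inside the block remain part of one island rather than fragmenting it.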
Restricted genetic introgression at mtDNA-linked islands of divergence
We investigated if variable levels of introgression between mitogroups in each transect could explain the observed pattern of heterogeneous genomic differentiation (Harrison & Larson 2016). To approximate introgression, we used geographic cline analysis to estimate changes in allele frequencies between mitogroups as a function of geographic distance across each transect. We first subsampled the mapped SNPs by choosing one marker at random every 100 Kb across the genome (2494 + mtDNA = 2495 genome-wide markers per transect), which included 42 outlier loci. Under differential resistance to introgression we expected: (1) outlier loci, especially those within genomic islands of differentiation, to show strong clinal variation in allele frequencies reflecting restricted introgression and (2) the rest of the genome to show shallow clines or no clinal variation at all, reflecting moderate or unrestricted introgression, respectively. Among non-outlier loci, only 1.4% in the north transect and 4.6% in the south transect fitted a clinal model significantly better (ΔAIC > 2) than the neutral model of non-clinal variation. Among outlier loci, 50% in the north and 76% in the south fitted the clinal model; thus, outlier loci were much more likely to show clinal variation than the rest of the loci. Accordingly, clinal loci were significantly overrepresented in genomic islands of differentiation, especially on chromosome 1A (North = 14; t27 = −17.8; P < 0.001 and South = 31; t27 = −19.3; P < 0.001; Fig. 4).
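The ΔAIC comparison between clinal and non-clinal models can be illustrated with a small likelihood sketch. It assumes binomial sampling of allele counts at each sampling site, a four-parameter sigmoid cline (centre, width and two tail frequencies) fitted by a coarse grid search, and a one-parameter null model with a constant frequency. The model form, grids and data are assumptions, not the cline software or settings used in this study.

```python
import math

def sigmoid_cline(x, centre, width, p_left, p_right):
    """Hypothetical sigmoid cline: allele frequency at distance x (km) along a
    transect; width is the inverse of the maximum slope of the standardised cline."""
    return p_left + (p_right - p_left) / (1 + math.exp(-4 * (x - centre) / width))

def log_lik(freqs, counts, totals):
    """Binomial log-likelihood (constant term omitted; it cancels in ΔAIC)."""
    ll = 0.0
    for p, k, n in zip(freqs, counts, totals):
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep log() finite
        ll += k * math.log(p) + (n - k) * math.log(1 - p)
    return ll

def delta_aic(dist, counts, totals, centres, widths, tails):
    """AIC(null) - AIC(cline). The null has one parameter (a constant frequency,
    fitted exactly); the cline has four, fitted by a coarse grid search."""
    p0 = sum(counts) / sum(totals)
    aic_null = 2 * 1 - 2 * log_lik([p0] * len(dist), counts, totals)
    best = -math.inf
    for c in centres:
        for w in widths:
            shape = [sigmoid_cline(x, c, w, 0.0, 1.0) for x in dist]
            for pl in tails:
                for pr in tails:
                    freqs = [pl + (pr - pl) * s for s in shape]
                    best = max(best, log_lik(freqs, counts, totals))
    aic_cline = 2 * 4 - 2 * best
    return aic_null - aic_cline

# Hypothetical transect: 7 sites, 20 sampled chromosomes each.
dist = [0, 50, 100, 150, 200, 250, 300]
totals = [20] * 7
centres = range(0, 301, 15)
widths = [5, 10, 20, 40, 80, 160, 320]
tails = [i / 20 for i in range(21)]
clinal = delta_aic(dist, [3, 4, 3, 10, 17, 16, 17], totals, centres, widths, tails)
flat = delta_aic(dist, [10, 9, 11, 10, 10, 11, 9], totals, centres, widths, tails)
```

Here `clinal` is far above 2 and `flat` is below it. Because a grid search can only under-estimate the cline's maximum likelihood, this sketch is conservative about calling a locus clinal; a real analysis would use a proper optimiser or MCMC.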
Two cline parameters and their confidence intervals were calculated for every clinal locus: centre (the geographic location of the allele frequency change) and width (inversely related to the slope of the cline, so that wider clines are flatter). We compared the confidence intervals of nDNA clinal loci to those of mtDNA clines (grey bars in Fig. 4), to identify loci that could be subject to restricted introgression between mitogroups. For the cline centre, 55% of the nuclear loci overlapped with the mtDNA cline in the northern transect and 26% in the south. However, non-overlapping loci were offset on average by only ~50 km (Fig. 4). This result suggests that many of the clinal nuclear loci have a geographic location of allele frequency transition very similar to that of the mtDNA variation, and thus might be linked to reproductive isolation barriers between mitogroups. On the other hand, the majority of nuclear cline widths were considerably wider (i.e. flatter slopes with broad confidence intervals) than the mitochondrial clines, indicating that the majority, but not all, of the nuclear clinal loci experience some level of genetic introgression between mitogroups (Fig. 4). The greatest overlap in both cline centre and width between nuclear clinal loci and mitochondrial clinal variation occurred in the chromosome 1A mtDNA-linked island of divergence, suggesting a prominent role for this genomic region in reproductive isolation between mitogroups (Fig. 4). It is important to note, however, that the sampling for this study was not specifically designed for cline analysis, so these results should be taken with caution (Fig. S7).
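This section does not state the cline function. As an assumption, a commonly used sigmoid parameterisation makes the two parameters concrete: the centre $c$ is where the frequency lies midway between the two tail frequencies, and the width $w$ is the inverse of the maximum slope of the standardised cline, so a wider cline is a flatter, more gradual transition.

```latex
p(x) = p_L + \frac{p_R - p_L}{1 + \exp\left(-4\,(x - c)/w\right)},
\qquad
\left.\frac{d}{dx}\left[\frac{p(x) - p_L}{p_R - p_L}\right]\right|_{x = c} = \frac{1}{w}
```

Here $x$ is the distance along the transect and $p_L$, $p_R$ are the (hypothetically named) tail frequencies at the two ends. Under this form, overlapping confidence intervals for $c$ indicate coincident nuclear and mtDNA transitions, while a larger $w$ for a nuclear locus than for mtDNA indicates a shallower cline and more introgression.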
Overrepresentation of mitochondrial-nuclear (mitonuclear) genes in mtDNA-linked islands of divergence
We tested if mtDNA-linked islands of divergence are significantly enriched for nuclear-encoded genes with mitochondrial function (i.e. general N-mt genes), in particular for N-mt genes encoding supernumerary subunits and assembly factors for OXPHOS complexes (i.e. OXPHOS genes). The island of divergence on chromosome 1A had a significant excess of general N-mt genes (32 genes; P < 0.001; Fig. 5; Table 2) compared to randomly chosen genome segments of similar length. Of these 32 genes, seven are directly involved in cellular respiration, eight have regulatory functions such as transcription, translation and replication of mtDNA, and the remaining 17 perform other mitochondrial functions. The chromosome 1A island also had a significant overrepresentation of OXPHOS genes (four genes; P < 0.01; Fig. 5; Table 2). Three of these genes encode supernumerary subunits required for OXPHOS complex I function (NADH-coenzyme Q oxidoreductase; NDUFA6, NDUFA12, NDUFB2), and one encodes the assembly chaperone FMC1 of OXPHOS complex V (F1Fo-ATPase). In contrast, the island of divergence on the Z chromosome contained no OXPHOS genes and only one general N-mt gene (SLC25A46).
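The enrichment test against randomly chosen genome segments of similar length can be sketched as a resampling procedure. The uniform placement of segments, the number of draws and the toy genome (one gene every 5 Mb plus an artificial cluster of 30 genes on 1A) are assumptions; the study's gene annotations and resampling scheme may differ.

```python
import random

def count_in(positions, start, end):
    """Number of gene positions falling within [start, end)."""
    return sum(start <= g < end for g in positions)

def segment_enrichment(genes, chrom_lengths, island, n_draws=1000, seed=1):
    """Empirical P-value for an excess of genes (e.g. N-mt genes) in an island,
    compared with random segments of the same length placed uniformly at random.

    genes: {chromosome: [gene positions]}; chrom_lengths: {chromosome: length in bp};
    island: (chromosome, start, end).
    """
    chrom, start, end = island
    length = end - start
    observed = count_in(genes.get(chrom, []), start, end)
    # Weight chromosomes by their number of possible segment start positions.
    eligible = [c for c, size in chrom_lengths.items() if size >= length]
    weights = [chrom_lengths[c] - length + 1 for c in eligible]
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        c = rng.choices(eligible, weights=weights)[0]
        s = rng.randrange(chrom_lengths[c] - length + 1)
        if count_in(genes.get(c, []), s, s + length) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_draws + 1)

# Hypothetical genome: one gene every 5 Mb, plus 30 clustered genes on 1A.
chrom_lengths = {"1A": 70_000_000, "2": 150_000_000, "3": 110_000_000}
genes = {c: list(range(0, size, 5_000_000)) for c, size in chrom_lengths.items()}
genes["1A"] += [45_000_000 + i * 400_000 for i in range(30)]
obs, p = segment_enrichment(genes, chrom_lengths, ("1A", 44_200_000, 59_600_000))
obs_ctrl, p_ctrl = segment_enrichment(genes, chrom_lengths, ("2", 10_000_000, 25_400_000))
```

The clustered island gives a small P-value, while a control segment of the same length with only background gene density does not.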
We mapped the complex I OXPHOS gene products identified within the island of divergence on chromosome 1A to the recently published cryo-EM structure of bovine complex I (Zhu et al. 2016) (Fig. S8). This allowed us to hypothesize potential structural and mechanistic effects of complex I polymorphisms, as well as their potential links with previously identified amino acid replacements fixed by positive selection in the mitogenome (Morales et al. 2015). The three complex I OXPHOS subunits encoded within the chromosome 1A island (Table 2) bind at regions critical for energy transduction (Fiedorczuk et al. 2016; Zhu et al. 2016). NDUFA6 has an ancient role in stabilising the main interface between the hydrophobic and hydrophilic regions in complex I (Fiedorczuk et al. 2016; Ostergaard et al. 2011; Yip et al. 2011), whereas NDUFB2 directly interacts with the mitochondrion-encoded ion pump ND5 (Fiedorczuk et al. 2016). NDUFA12 binds a phosphopantetheine-containing ACP subunit (SDAP-α), suggesting it has a role in regulating OXPHOS activity in response to carbon metabolism, in addition to a potential role in stabilising the enzyme (Angerer et al. 2014; Fiedorczuk et al. 2016; Zhu et al. 2016).
Reduced recombination and selective sweeps in chromosome 1A mtDNA-linked island of divergence
We estimated the level of linkage disequilibrium (LD) between SNP loci for all pairs of markers along each chromosome. Genome-wide LD across all chromosomes decayed quickly and reached the average LD level at a physical distance of ~7.8 Kb between markers (Fig. 6A; Table 3; Fig. S9). Chromosome 1A had the highest level of LD of all chromosomes and a substantially slower LD decay, reaching average LD levels at a physical distance of ~140 Kb (Fig. 6A; Table 3; Fig. S9). Chromosome Z also showed high levels of LD and slow decay, as expected for a sex chromosome, which undergoes less recombination than autosomes (Fig. 6A; Table 3; Fig. S9).
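For unphased genotype data, one common approximation of pairwise LD is the squared correlation of genotype dosages (0, 1 or 2 copies of an allele). The sketch below assumes that approximation, pairs all markers on one chromosome, and reports the first distance bin whose mean r² drops to a chosen baseline (for example the genome-wide average), mirroring the "distance at which LD reaches the average level" reported above. The estimator, bin size, baseline and data are hypothetical choices.

```python
from itertools import combinations

def r2(g1, g2):
    """Squared Pearson correlation between genotype dosages (0/1/2) at two SNPs,
    a common approximation of LD for unphased data; 0 if a SNP is monomorphic."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    return 0.0 if v1 == 0 or v2 == 0 else cov * cov / (v1 * v2)

def pairwise_ld(markers):
    """All pairs of (position, dosages) markers on one chromosome -> (distance, r2)."""
    return [(abs(p2 - p1), r2(g1, g2))
            for (p1, g1), (p2, g2) in combinations(markers, 2)]

def ld_decay_distance(pairs, bin_size, baseline):
    """Lower edge of the first distance bin whose mean r2 is <= baseline,
    or None if LD never decays to the baseline."""
    bins = {}
    for distance, value in pairs:
        bins.setdefault(distance // bin_size, []).append(value)
    for b in sorted(bins):
        if sum(bins[b]) / len(bins[b]) <= baseline:
            return b * bin_size
    return None

# Hypothetical (distance in bp, r2) pairs binned every 5 kb.
example_pairs = [(1000, 0.6), (3000, 0.5), (6000, 0.3), (9000, 0.1), (12000, 0.08)]
decay_at = ld_decay_distance(example_pairs, 5000, baseline=0.12)  # 10000
```

Comparing the decay distance per chromosome against the genome-wide value is what distinguishes a slowly decaying chromosome such as 1A.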
Next, we wanted to better understand the pattern of LD levels across chromosome 1A. As a proxy for this, we calculated an index of LD (r²) between nuclear and mitochondrial alleles (Sloan et al. 2015). This is because two neighbouring nuclear markers in strong LD with the mitochondrial genome will necessarily be in strong LD with each other. We found that markers within the chromosome 1A mtDNA-linked island of divergence have exceptionally high levels of mitochondrial-nuclear LD (north: mean = 0.23, max = 0.64; south: mean = 0.24, max = 0.78), compared to markers outside the island (north: mean = 0.03, max = 0.41; south: mean = 0.03, max = 0.61; Fig. 6B). Linkage disequilibrium within the mtDNA-linked islands of divergence varies between neighbouring markers in both transects, suggesting that while some loci recombine often between mitogroups, others experience reduced recombination (Fig. 6C).
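The mitochondrial-nuclear LD index can be approximated, as an assumption, by the squared correlation between an individual's mitolineage (coded 0 for mito-A and 1 for mito-B) and its genotype dosage at a nuclear SNP; this is not necessarily the exact formulation of Sloan et al. (2015). The sketch then summarises the mean and maximum r² inside versus outside an island, as in the comparison above, using invented genotypes.

```python
def mitonuclear_r2(mito, dosage):
    """Squared Pearson correlation between mitolineage (0 = mito-A, 1 = mito-B)
    and nuclear genotype dosage (0/1/2) across individuals."""
    n = len(mito)
    mm, md = sum(mito) / n, sum(dosage) / n
    cov = sum((a - mm) * (b - md) for a, b in zip(mito, dosage))
    vm = sum((a - mm) ** 2 for a in mito)
    vd = sum((b - md) ** 2 for b in dosage)
    return 0.0 if vm == 0 or vd == 0 else cov * cov / (vm * vd)

def mean_and_max(values):
    return sum(values) / len(values), max(values)

def inside_vs_outside(mito, markers, island):
    """markers: list of (position, dosages) on one chromosome;
    island: (start, end). Returns ((mean, max) inside, (mean, max) outside)."""
    start, end = island
    inside = [mitonuclear_r2(mito, g) for pos, g in markers if start <= pos <= end]
    outside = [mitonuclear_r2(mito, g) for pos, g in markers if not start <= pos <= end]
    return mean_and_max(inside), mean_and_max(outside)

# Hypothetical data: 8 individuals, one marker tracking mitolineage, one not.
mito = [0, 0, 0, 0, 1, 1, 1, 1]
markers = [(50_000_000, [0, 0, 0, 0, 2, 2, 2, 2]),
           (10_000_000, [0, 2, 0, 2, 0, 2, 0, 2])]
inside_stats, outside_stats = inside_vs_outside(mito, markers, (44_200_000, 59_600_000))
```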
The fact that regions of high and low LD are interspersed throughout the chromosome 1A mtDNA-linked island of divergence suggests that this region contains several co-adapted N-mt genes that maintain epistatic interactions with the mitochondrial genome and with each other (Fig. 6C).
Strong pairwise and mitochondrial-nuclear LD in the chromosome 1A mtDNA-linked island of divergence also suggests that this region underwent a selective sweep. To further investigate this, we tested whether the islands of divergence also exhibit reduced genetic diversity by calculating the observed heterozygosity for each of the four mitogroups. Consistent with the expectation of a hard selective sweep, we observed low heterozygosity in the region of the mtDNA-linked islands of divergence (Fig. 7) (Maynard Smith & Haigh 1974). Together, low intra-lineage genetic diversity and high LD make a strong case for a selective sweep within the chromosome 1A mtDNA-linked island of divergence (Kim & Nielsen 2004), as previously inferred for the EYR mitochondrial genome (Morales et al. 2015).
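Observed heterozygosity per mitogroup can be sketched as the fraction of heterozygous calls among non-missing genotypes at each SNP, averaged within genomic windows so that a dip across the island becomes visible. The window size, the missing-data coding (`None`) and the toy data are assumptions for illustration; in practice the function would be applied separately to each of the four mitogroups.

```python
def observed_heterozygosity(genotypes):
    """Fraction of heterozygous calls (dosage 1) among non-missing genotypes
    (None = missing); NaN if no genotype was called."""
    called = [g for g in genotypes if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else float("nan")

def windowed_ho(markers, window):
    """markers: list of (position, genotypes for one mitogroup).
    Returns {window_start: mean per-marker Ho} for non-overlapping windows."""
    acc = {}
    for pos, genos in markers:
        acc.setdefault(pos // window * window, []).append(observed_heterozygosity(genos))
    return {w: sum(v) / len(v) for w, v in sorted(acc.items())}

# Hypothetical mitogroup genotypes at three SNPs, 5 Mb windows.
markers = [(1_000_000, [0, 1, 1, 2]),
           (2_000_000, [1, 1, 0, None]),
           (6_000_000, [0, 0, 0, 2])]
ho = windowed_ho(markers, 5_000_000)
```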
DISCUSSION
We used a high-density genome scan and replicated sampling to estimate genome-wide genetic differentiation and introgression between two parapatric lineages that undergo nuclear gene flow despite having two very different mitogenomes and occupying contrasting climates. Nuclear genomic differentiation and restricted introgression are concentrated in a ~15.4 Mb genomic region on chromosome 1A that has signatures of a selective sweep, consistent with evidence for selective sweeps in the mitochondrial genome (Morales et al. 2015). This mtDNA-linked island of divergence contains an overrepresentation of known nuclear-encoded genes with mitochondrial function (N-mt genes) that, together with mitochondrial genes, contribute to OXPHOS complexes and related metabolic functions. N-mt genes on chromosome 1A are maintained in a region of high LD that is thus inferred to experience reduced recombination. Therefore, we propose that mitonuclear divergence is maintained by strong selection and genomic architecture that favours advantageous combinations of mitochondrial and nuclear alleles (i.e. mitonuclear interactions) and that prevents the assembly of mismatched combinations (i.e. mitonuclear incompatibilities).
Mitochondrial-nuclear divergence in the face of nuclear gene flow
We presented replicated evidence for heterogeneous levels of genomic differentiation and introgression between two deeply divergent, parapatric EYR mitolineages experiencing nuclear gene flow. Nuclear genetic differentiation is concentrated in two mtDNA-linked islands of divergence, one on chromosome 1A and one on Z. To our knowledge, this is the first report of genomic islands of divergence implicated in mitonuclear co-evolution. Our explanation for the emergence of this pattern in the EYR agrees with the presumption that genomic islands of divergence experience reduced gene flow in comparison to the rest of the genome (i.e. divergence-with-gene-flow; Feder et al. 2012; Nosil et al. 2009; Smadja & Butlin 2011). This expectation considers that reduced gene flow can result from the differential effects of reproductive barriers across the genome.
Reproductive barriers can prevent individuals from hybridizing (prezygotic barriers) or, once hybridization happens, prevent hybrid individuals from further rounds of breeding (postzygotic barriers) (Coyne & Orr 2004). In EYR, evidence of nuclear gene flow between mitogroups in each transect suggests that hybridization does occur, but individuals with mixed nuclear and mitochondrial genetic backgrounds are found only at low densities within the contact zone of each transect. Thus, it is likely that mismatched mitonuclear combinations, in the form of mitonuclear genetic incompatibilities on EYR chromosome 1A, are selected against in hybrids, effectively reinforcing the inland-coastal divergence (Burton & Barreto 2012; Lindtke & Buerkle 2015). Genetic incompatibilities, however, are unlikely to be solely responsible for reproductive barriers in EYR. Climatic adaptation is another likely mechanism contributing to the accumulation of genetic incompatibilities and reproductive barriers (Qvarnström et al. 2016). Accordingly, intrinsic reproductive isolation in the form of genetic incompatibilities is commonly observed in systems adapted to contrasting environments, which themselves act as extrinsic reproductive barriers (Bernatchez et al. 2010; Keller & Seehausen 2012; Sobel & Chen 2014).
The intimate feedback between environmental conditions and mitochondrial metabolism offers a plausible mechanism for the maintenance of reproductive isolation by the joint effect of intrinsic and extrinsic barriers in the EYR (Arnqvist et al. 2010; Boratyński et al. 2016; Boratyński et al. 2014; McFarlane et al. 2016; Pereira et al. 2014; Sambatti et al. 2008). EYR has several hallmarks of adaptive evolution to local climates, so it is likely that environmental variation plays an important role in the evolution of reproductive barriers in this system. For example, despite EYR individuals being capable of travelling several km per day (Debus & Ford 2012), individuals with alternative mitochondrial types are rarely found across the sharp climatic divide between inland and coastal mitolineages (Fig. 1A). Moreover, we showed that mitochondrial and N-mt alleles segregate in the same inland versus coastal direction in north and south nuclear backgrounds, despite these backgrounds having diverged ~2 MYA (Morales et al. 2016), suggesting a replicated pattern of ecological divergence. To disentangle the effect of environment-based selection from that of genetic incompatibilities and their interactions, the fitness of individuals with different mitonuclear combinations needs to be assessed under contrasting climatic conditions and in common-garden experiments (Boratyński et al. 2016; McFarlane et al. 2016).
Genomic islands of divergence can also emerge during divergence-without-gene-flow (Wolf & Ellegren 2016). For example, islands of divergence could be the consequence rather than cause of differentiation accumulated during periods of reduced gene flow, that are subsequently maintained by differential rates of linkage, recombination, mutation or background selection (Cruickshank & Hahn 2014; Noor & Bennett 2009). In EYR, coalescent models indicate non-zero gene flow between populations that first diverged in north-south and then in inland-coast directions: hence prolonged periods of population isolation during EYR evolution are unlikely (Morales et al. 2016). Moreover, heterogeneous rates of recombination and mutation cannot fully explain the heterogeneous divergence observed in EYR because genomic islands of differentiation were restricted to two chromosomes rather than found scattered across the genome (Ellegren et al. 2012; Marques et al. 2016; Poelstra et al. 2014).
Background selection remains a possible explanation for the elevated differentiation at mtDNA-linked islands of divergence in EYR. Specifically for chromosome 1A, our evidence indicates that selective sweeps could have locally decreased genetic diversity and increased divergence at linked sites (Charlesworth et al. 1993; Maynard Smith & Haigh 1974). Complete genome sequences are needed to comprehensively explore mitonuclear differentiation and background selection, together with demographic models that account for heterogeneous levels of gene flow and recombination (Cruickshank & Hahn 2014; Roux et al. 2016a; Schrider et al. 2016).
Candidates for mitonuclear interactions include OXPHOS (complex I) and mitochondrial regulatory genes
We propose that N-mt genes within the EYR chromosome 1A island of divergence have functional roles in climate adaptation. Three EYR mitonuclear candidates (Table 2) encode supernumerary subunits of complex I, the primary input into the mitochondrial respiratory chain (Nicholls & Ferguson 2013). This builds on previous inferences of positive selection on three mitochondrially encoded amino acids in genes ND4 and ND4L (Morales et al. 2015), which form two of the four proton channels in the complex. The five candidates, namely the three nuclear subunits identified here and the two mitochondrial genes from Morales et al. (2015), occur in functionally related regions of complex I, with nuclear NDUFA6 and NDUFA12 and mitochondrial mt-ND4L contributing to the main coupling interface, and NDUFB2 and mt-ND4 occurring in the distal proton channels (Fig. S8). Thus, structural or mechanistic complementarity between these substitutions may have driven co-evolution of mitochondrial and N-mt genes. The efficiency of mitochondrial energy coupling is determined by how effectively complex I propagates conformational changes from the hydrophilic domain to the proton channels (Fiedorczuk et al. 2016; Zhu et al. 2016). Hence, it is reasonable to speculate that compatibility between the complex I variants will affect the amount of ATP synthesized and of heat generated by EYR mitochondria, thereby influencing environmental fitness and climate adaptation of the organism. Consistent with this, mitochondrial complex I genes account for the highest number of documented cases of positive selection across a wide variety of taxa (Garvin et al. 2014). Mutations observed in the complex V assembly factor FMC1, which is required for heat-shock adaptation in yeast (Lefebvre-Legendre et al. 2001; Schwimmer et al. 2005), also suggest links between energy coupling and climate adaptation.
Moreover, the chromosome 1A island of divergence contains N-mt genes involved in modulating mtDNA gene expression, notably the mitochondrial tyrosyl-tRNA synthetase gene (mt-TyrRS), whose cognate tRNA (tRNATyr) was previously shown to diverge between EYR mitolineages (Morales et al. 2015). In Drosophila, hybrids carrying incompatible combinations of mt-TyrRS and tRNATyr suffered delayed development and reduced fecundity, effects that were exacerbated at high temperatures (Hoekstra et al. 2013; Meiklejohn et al. 2013).
Genomic architecture: the evolution of a mitonuclear supergene?
The maintenance of matching mitonuclear combinations relies on a favourable genomic architecture that allows strong co-inheritance of interacting N-mt and mitochondrial genes (Lindtke & Buerkle 2015; Ortiz-Barrientos et al. 2016). Upon mitochondrial divergence, N-mt genes are expected to undergo compensatory evolution to maintain or restore mitochondrial functioning (Havird & Sloan 2016; Osada & Akashi 2012). In line with this expectation, we found that genes within the island of divergence on chromosome 1A underwent a selective sweep consistent with that previously identified in EYR’s mitogenome (Morales et al. 2015). Within the chromosome 1A island of divergence, several but not all genes are in high linkage disequilibrium, suggesting the formation of haplotype blocks of co-adapted N-mt genes available to be co-inherited with the mitochondrial genome.
Reduced recombination is an efficient mechanism for building a genomic architecture that protects the integrity of locally adapted suites of alleles (Kirkpatrick & Barton 2006; Ortiz-Barrientos et al. 2016; Yeaman & Whitlock 2011). The inferred location of the mtDNA-linked island of divergence on chromosome 1A overlaps the position of the chromosome's centromere in the zebra finch (Warren et al. 2010). Centromeric regions often experience reduced recombination and can contribute disproportionately to genetic divergence (Butlin 2005; Kawakami et al. 2014). Because the recombination landscape is highly conserved across birds (Singhal et al. 2015), suppressed recombination within EYR's chromosome 1A likely plays an important role in the species' observed mitonuclear divergence.
Based on these arguments and the present data, we propose that the joint action of reduced recombination and strong ecological selection likely maintains functional interactions between tightly linked, co-adapted N-mt genes. Specifically, we contend that EYR's chromosome 1A island of divergence behaves as a 'mitonuclear supergene' (Schwander et al. 2014; Thompson & Jiggins 2014). Because this study relies on genomic locations of EYR markers inferred from the zebra finch arrangement, directly characterizing the genomic architecture of EYR's chromosome 1A is crucial to understanding how the proposed supergene could have assembled over evolutionary time (Hooper 2016). For example, chromosomal rearrangements are commonly implicated in the formation of supergenes (Küpper et al. 2016; Tuttle et al. 2016), but no information on such rearrangements is yet available for EYR. Moreover, the functional and phenotypic consequences of alternative N-mt complexes and their interaction with alternative mitochondrial types remain to be tested.
Sex-chromosome evolution and male-mediated gene flow
The Z sex chromosome was also highly divergent between mitogroups in both transects, and contained a significant genomic island of divergence in the south, with an apparently similar but less statistically supported island in the north (Fig. 3). Sex chromosomes are hotspots of genetic incompatibilities and often play a major role in speciation (Charlesworth et al. 1987; Coyne & Orr 2004; Qvarnström & Bailey 2009).
Because the Z chromosome has lower recombination rates and a smaller effective population size than autosomes, it experiences faster evolutionary rates and has disproportionate fitness effects in hybrids, even in the absence of selection (Mank et al. 2010). Genetic incompatibilities have stronger effects in the heterogametic sex (in birds, ZW females) because recessive sex-linked alleles are expressed in the hemizygous state (Haldane 1922; reviewed in Edwards et al. 2005). Under Haldane's Rule, reduced gene flow through females relative to males could explain mitochondrial divergence with nuclear gene flow in EYR, despite females being the dispersive sex (Harrisson et al. 2012; Pavlova et al. 2013). Future field studies should test whether female hybrids have lower fitness than male hybrids (Beekman et al. 2014).
Given these evolutionary properties of sex chromosomes, it has been proposed that genes directly related to species recognition and hybrid incompatibilities could reside on the Z chromosome in birds (Qvarnström & Bailey 2009). Building on this, Hill and Johnson (2013) proposed that sexual selection could drive females to choose males with compatible mitonuclear genotypes, thereby avoiding the negative fitness effects of hybridization. Under this mitonuclear sexual selection hypothesis, the recognition mechanism would be most efficient if genes coding for a male trait related to mitonuclear metabolism (e.g. ornamentation or song) were Z-linked. A previous study found limited evidence for sexual dimorphism in colour, but detected small yet significant colour differences between inland and coastal individuals (Morales et al. in press). Future field experiments should test whether birds of different mitogroups mate assortatively.
METHODS
Samples and mitolineage identification
We determined the EYR mitolineage of 418 individuals (Nmito-A = 270; Nmito-B = 148; red and blue, respectively, in Fig. 1A) using ND2 sequences (including 100 from GenBank: accession numbers KC466740 to KC466839; Table S3). DNA was extracted with the DNeasy Blood and Tissue Kit (Qiagen, Germany). PCRs were performed following Pavlova et al. (2013), and the products were sequenced commercially (Macrogen, Korea).
Climatic variables
Two BIOCLIM variables (Hijmans et al. 2005), maximum temperature of the warmest month (BIOCLIM 5) and precipitation of the driest month (BIOCLIM 14), were the best predictors of EYR species distribution and of EYR mitolineage divergence (Morales et al. 2015; Pavlova et al. 2013). We used these variables to estimate the magnitude and significance of the correlation between the distribution of mitolineages and climatic variation with Pearson's correlation coefficient (r) in R (R Development Core Team 2014). BIOCLIM layers were manipulated with the R package raster (Hijmans 2014).
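Because mitolineage is a two-state variable, the correlation above is a Pearson correlation between a binary code and a continuous climate value (a point-biserial correlation). The minimal pure-Python sketch below assumes samples are coded 0 (mito-A) and 1 (mito-B) and that BIOCLIM values have already been extracted at each sample location. The function names and toy values are hypothetical; the study performed this step in R.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0 (undefined if |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical toy data: mitolineage code (0 = mito-A, 1 = mito-B) and BIOCLIM 5 per sample.
mitolineage = [0, 0, 1, 1]
bio5 = [10.0, 12.0, 20.0, 22.0]
r = pearson_r(mitolineage, bio5)
t = r_t_statistic(r, len(bio5))
```

The t statistic would then be compared with a t distribution with n - 2 degrees of freedom to obtain a p-value.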
Sequencing, genotyping and mapping
We genotyped samples using the reduced-representation approach implemented in DArTseq (Diversity Arrays Technology, Australia). Four methods of complexity reduction were tested in EYR (data not presented), considering both the size of the representation and the fraction of the genome selected for assays, and the PstI-NspI method was selected. DNA samples were processed in digestion/ligation reactions following Kilian et al. (2012), but the single PstI-compatible adaptor was replaced with two different adaptors corresponding to two different restriction enzyme (RE) overhangs. The PstI-compatible adaptor was designed to include the Illumina flowcell attachment sequence, the sequencing primer sequence and a variable-length barcode region, following Elshire et al. (2011). The reverse adaptor contained a flowcell attachment region and an NspI-compatible overhang sequence. Only fragments carrying both adaptors (PstI-NspI) were amplified, using the following PCR conditions: 94 °C for 1 min; 30 cycles of 94 °C for 20 s, 58 °C for 30 s and 72 °C for 45 s; and a final extension at 72 °C for 7 min. After PCR, equimolar amounts of amplification products from each sample were pooled and applied to cBot (Illumina, United States) bridge PCR, followed by 77 cycles of single-read sequencing on an Illumina HiSeq 2000.
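The complexity reduction works because only fragments with one PstI end and one NspI end can receive both adaptors and be amplified. The sketch below is a rough in silico illustration of that rule, not part of the DArT pipeline. It uses the known recognition sites (PstI CTGCA^G; NspI RCATG^Y), scans only top-strand cut positions, and ignores overhang geometry, size selection and PCR efficiency; the function names and toy sequence are hypothetical.

```python
import re

# Zero-width lookaheads so that overlapping recognition sites are all reported.
PSTI_SITE = re.compile(r"(?=CTGCAG)")        # PstI: CTGCA^G
NSPI_SITE = re.compile(r"(?=[AG]CATG[CT])")  # NspI: RCATG^Y (R = A/G, Y = C/T)
CUT_OFFSET = 5  # both enzymes cut after the 5th base of their site on the top strand

def cut_sites(seq):
    """Sorted (top-strand cut position, enzyme) pairs for an uppercase DNA string."""
    sites = [(m.start() + CUT_OFFSET, "PstI") for m in PSTI_SITE.finditer(seq)]
    sites += [(m.start() + CUT_OFFSET, "NspI") for m in NSPI_SITE.finditer(seq)]
    return sorted(sites)

def pst_nsp_fragments(seq, min_len=0, max_len=None):
    """Fragments bounded by one PstI cut and one NspI cut, i.e. those that could
    carry both adaptors. PstI-PstI and NspI-NspI fragments are discarded."""
    sites = cut_sites(seq)
    fragments = []
    for (start, enz1), (end, enz2) in zip(sites, sites[1:]):
        if {enz1, enz2} != {"PstI", "NspI"}:
            continue
        length = end - start
        if length >= min_len and (max_len is None or length <= max_len):
            fragments.append((start, end, enz1, enz2))
    return fragments

# Hypothetical toy sequence with one PstI site followed by one NspI site.
toy = "AAAA" + "CTGCAG" + "T" * 10 + "ACATGC" + "GGGG"
print(pst_nsp_fragments(toy))
```

Applied to a real genome, the number and length distribution of such fragments gives a rough idea of how strongly a given enzyme pair reduces genome complexity.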
Sequences generated from each lane were processed using proprietary DArT analytical pipelines. In the primary pipeline, poor-quality sequences were filtered out, with more stringent selection criteria applied to the barcode region than to the rest of the sequence. This made the assignment of sequences to specific samples in the 'barcode split' step highly reliable. Approximately 2,000,000 sequences per barcode/sample were identified and used in marker calling. Finally, identical sequences were collapsed and cleaned using DArT P/L's proprietary algorithm, which corrects low-quality bases in singleton tags using collapsed tags with multiple members as a template. The cleaned file was used in the secondary pipeline for SNP calling (DArTsoft14, unpublished). For single nucleotide polymorphism (SNP) calling, all tags from all libraries included in the analysis were clustered using DArT P/L's C++ algorithm at a threshold distance of 3. Tags that mapped to the mitochondrial genome were filtered out at this point. Clusters were then parsed into separate SNP loci using a range of technical parameters to balance read counts for allelic pairs. Additional selection criteria were added to the algorithm based on the analysis of approximately 1,000 controlled cross populations. Testing a range of tag-count parameters facilitated the separation of true allelic variants from paralogous sequences. In addition, SNP calls were confirmed in independent libraries and sequencing runs. Multiple samples were processed from DNA to allele calls as technical replicates, and scoring consistency was used as the main selection criterion for high-quality, low-error-rate markers. Individual SNP-calling quality was ensured by a high average read depth per locus (>12).
A total of 68,258 DArT-tags containing 97,070 SNP markers were retained (raw data are deposited in Dryad, doi: XXXX). Of these, 46,709 DArT-tags contained only one SNP, 15,741 contained two SNPs, and 5,808 contained more than two SNPs. For most analyses, we obtained a main SNP dataset by removing SNPs with more than 20% missing data (main dataset after filtering = 60,444 SNPs). Given the stringent filters for sequencing errors already incorporated in the DArT pipeline (including confirmation of SNPs in independent libraries and sequencing runs), we did not filter the main dataset by minor allele frequency (MAF); results excluding SNPs with MAF < 5% were qualitatively the same (data not shown). For methods that are sensitive to missing data and rare alleles, we obtained a second, reduced SNP dataset by filtering out SNPs with more than 10% missing data or MAF < 10% (reduced dataset = 27,912 SNPs). Both datasets were filtered independently for each transect, and most analyses used the main SNP dataset unless otherwise stated.
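The two filtering schemes can be expressed as simple per-SNP thresholds. The sketch below assumes genotypes are coded as 0/1/2 alternate-allele counts with None for a missing call. This layout differs from the actual DArT report format, and the function names and toy SNPs are hypothetical.

```python
def missing_rate(genotypes):
    """Fraction of individuals with no call (None)."""
    return sum(g is None for g in genotypes) / len(genotypes)

def minor_allele_freq(genotypes):
    """MAF from 0/1/2 alternate-allele dosages, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1 - alt)

def filter_snps(snps, max_missing, min_maf=0.0):
    """Indices of SNPs with missing rate <= max_missing and MAF >= min_maf."""
    return [i for i, g in enumerate(snps)
            if missing_rate(g) <= max_missing and minor_allele_freq(g) >= min_maf]

# Hypothetical toy SNPs for 10 individuals each.
snps = [
    [0, 1, 2, None, 0, 0, 0, 0, 0, 0],       # 10% missing, MAF ~0.17
    [0] * 9 + [1],                           # no missing data, MAF 0.05
    [None, None, None, 0, 1, 2, 0, 1, 2, 0], # 30% missing
]
main = filter_snps(snps, max_missing=0.20)                   # main dataset rule
reduced = filter_snps(snps, max_missing=0.10, min_maf=0.10)  # reduced dataset rule
```

Here the rare-allele SNP survives the main filter but not the reduced one, mirroring why the reduced dataset is smaller.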
The approximate genomic position of each SNP was obtained by mapping DArT-tags to the zebra finch (Taeniopygia guttata) reference genome taeGut3.2.4 (Warren et al. 2010) using BLASTn v.2.3.0 (Altschul et al. 1990; Camacho et al. 2009) with the parameter string: -evalue 1E-4 -word_size 11 -gapopen 5 -gapextend 2 -penalty -3 -reward 2. Only unique hits were considered. Despite ~40 Myr of divergence between EYR and the zebra finch, high conservation of gene order in Passeriformes (Derjusheva et al. 2004; Griffin et al. 2008) enabled such mapping. Moreover, the same SNPs mapped with equally high confidence to the flycatcher reference genome (results not shown; Ellegren et al. 2012). Of 42,503 DArT-tags in the main dataset, 23,560 mapped to a single location in the zebra finch genome, spread across all chromosomes except the microchromosomes 16, LG2 and LG5, and the female-specific W chromosome (absent from the zebra finch reference genome). An even distribution of DArT-tags across the genome was supported by a strong positive correlation between the number of tags and chromosome size (Pearson's r = 0.99; Fig. S10). For a summary of results for non-mapped loci, see Fig. S11.
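One way to apply the 'unique hits only' rule is to run blastn with tabular output and keep the queries that have exactly one reported alignment. In the sketch below, the parameter string is taken from the text and -query, -db, -out and -outfmt are standard blastn options. The file and database names, and the interpretation of 'unique' as a single reported row per tag, are assumptions.

```python
from collections import defaultdict

# Parameters as reported in the text; file and database names are hypothetical.
BLASTN_CMD = [
    "blastn", "-query", "dart_tags.fasta", "-db", "taeGut3.2.4",
    "-evalue", "1E-4", "-word_size", "11", "-gapopen", "5", "-gapextend", "2",
    "-penalty", "-3", "-reward", "2", "-outfmt", "6", "-out", "hits.tsv",
]

def unique_hits(tabular_lines):
    """Map each query to (subject, subject start) if it has exactly one hit row.

    Expects BLAST tabular (-outfmt 6) rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = defaultdict(list)
    for line in tabular_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        hits[fields[0]].append((fields[1], int(fields[8])))
    return {query: rows[0] for query, rows in hits.items() if len(rows) == 1}
```

Under this rule, a tag with several HSPs on the same chromosome is also discarded. A stricter or looser definition of uniqueness would change only the final filter.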
Analyses of genetic structure
Principal component analysis (PCA) was used to summarize genetic structure with the dudi.pca function in adegenet 2.0.1 (Jombart 2008). To identify fine-scale population structure across each transect, we assigned cluster membership to each individual using the admixture model with correlated allele frequencies implemented in STRUCTURE v.2.3.4 (Earl 2012; Jakobsson & Rosenberg 2007; Pritchard et al. 2000). STRUCTURE assumes that loci are in linkage equilibrium and Hardy-Weinberg equilibrium (HWE). Hence, we first used PLINK v.1.07 (Purcell et al. 2007) with a sliding window of 50 SNPs and a step size of 5 SNPs to identify clusters of loci in LD (r² ≥ 0.2) within each chromosome, and randomly selected one marker per cluster. Second, we filtered out loci departing from HWE with the HWE.test.genind function in the R package adegenet (Jombart & Ahmed 2011). The significance level of the HWE test (α = 0.05) was corrected for multiple tests with the B-Y False Discovery Rate method following Narum (2006). To reduce the Wahlund effect when testing for HWE, individuals were grouped into populations based on their geographic location (Fig. S12). After filtering out loci violating linkage and Hardy-Weinberg equilibria, 12,198 SNPs were retained. Each STRUCTURE analysis involved 25 independent Markov chains of 200,000 burn-in iterations and 140,000 recorded iterations, for one to five genetic clusters (K = 1 to 5). Convergence of alpha and the log-likelihood across chains was checked for every K with custom R scripts (github: XXXXX). Results were summarized, and the optimal number of clusters was estimated with the Evanno method (Evanno et al. 2005) in STRUCTURE HARVESTER v.0.6.94 (Earl 2012). Average individual Q-values (cluster membership) across replicates were obtained with CLUMPP v.1.1.2 (Jakobsson & Rosenberg 2007).
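Two computations in this workflow are compact enough to sketch: the Benjamini-Yekutieli (B-Y) correction of HWE p-values, and the Evanno ΔK statistic used to pick K. The sketch below implements the standard B-Y step-up procedure; the exact variant presented by Narum (2006) may differ in detail. The ΔK function follows the published Evanno et al. (2005) formula using mean log-likelihoods per K. STRUCTURE HARVESTER remains the reference implementation, and the toy log-likelihoods are hypothetical.

```python
import statistics

def benjamini_yekutieli(pvalues, alpha=0.05):
    """Standard B-Y step-up FDR procedure (valid under arbitrary dependence).

    Returns (adjusted p-values, reject flags) in the original order.
    """
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        value = min(1.0, pvalues[i] * m * c_m / rank)
        running_min = min(running_min, value)
        adjusted[i] = running_min
    return adjusted, [p <= alpha for p in adjusted]

def evanno_delta_k(loglik_by_k):
    """Evanno et al. (2005) Delta K from {K: [ln P(D) of each replicate run]}.

    Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)).
    Assumes consecutive K values; Delta K is undefined for the smallest and largest K.
    """
    ks = sorted(loglik_by_k)
    means = {k: statistics.mean(v) for k, v in loglik_by_k.items()}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / statistics.stdev(loglik_by_k[k])
    return delta
```

With K = 1 to 5, as in this study, ΔK can only be evaluated for K = 2 to 4, which is one reason the Evanno method cannot by itself support K = 1.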
Genetic differentiation between mitogroups and identification of outliers
We evaluated the level of differentiation between mitogroups in each transect to identify outlier loci, i.e. loci with extreme differentiation relative to the genome-wide average. To distinguish true outliers (e.g. markers subject to divergent selection) from false positives (e.g. markers affected by genetic drift and population history; Hoban et al. 2016), we used three methods that correct for demographic history and differ in their approach and assumptions. Each analysis was performed independently for each transect:
1) FST-outlier detection at fine spatial scales:
We measured per-marker genetic differentiation between mitogroups (FST) at very fine spatial scales, using only samples within a 40-km radius of the centre of the contact zone between mitogroups in each transect (north: mito-A = 11, mito-B = 20; south: mito-A = 20, mito-B = 20). EYR individuals disperse 2-25 km per generation (Debus & Ford 2012; N. Amos, unpublished analysis of Australian Bird and Bat Banding Scheme data). Our assumption for this test is therefore that, over short distances, the confounding effect of genetic drift is reduced and differentiation is more likely to reflect real divergence between mitogroups (true positives). We estimated Weir and Cockerham's FST with the diffCalc function of the R package diveRsity (Keenan et al. 2013; Weir & Cockerham 1984). Loci in the upper 1% quantile of FST values were considered outliers.
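For readers who want to see what the per-locus statistic involves, the sketch below implements the Weir and Cockerham (1984) θ estimator for a single biallelic SNP directly from the published variance-component formulas, followed by the top-1% outlier rule. It is not diveRsity's diffCalc code. The genotype-count input format and function names are assumptions, and Python's default quantile definition may differ slightly from R's.

```python
import statistics

def wc_fst_biallelic(pop_counts):
    """Weir & Cockerham (1984) theta for one biallelic SNP.

    pop_counts: list of (n_AA, n_Aa, n_aa) genotype counts, one tuple per population.
    Computed for allele A; for a biallelic locus the ratio is identical for allele a.
    """
    r = len(pop_counts)
    n = [sum(c) for c in pop_counts]
    p = [(2 * c[0] + c[1]) / (2 * ni) for c, ni in zip(pop_counts, n)]
    h = [c[1] / ni for c, ni in zip(pop_counts, n)]  # observed heterozygote frequency
    n_bar = sum(n) / r
    n_c = (r * n_bar - sum(ni ** 2 for ni in n) / (r * n_bar)) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / (r * n_bar)
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / (r * n_bar)
    pq = p_bar * (1 - p_bar)
    # Variance components: between populations (a), between individuals (b), within (c).
    a = (n_bar / n_c) * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    denom = a + b + c
    return a / denom if denom != 0 else float("nan")  # monomorphic loci give 0/0

def top_percent_outliers(fst_values, percent=1):
    """Indices of loci above the (100 - percent)th percentile of FST."""
    cutoff = statistics.quantiles(fst_values, n=100)[100 - percent - 1]
    return [i for i, f in enumerate(fst_values) if f > cutoff]
```

A fixed difference between two samples gives θ = 1, while identical samples give a value at or slightly below zero, as expected for this bias-corrected estimator.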
2) Correlations between nuclear loci and mitochondrial membership with BayeScEnv:
We used BayeScEnv to detect loci that depart from neutral expectations (outliers) based on their FST values and on associations between allele frequencies and environmental variables, while correcting for the confounding effects of population history. SNPs are assigned either to a neutral model or to one of two non-neutral models: the environmental correlation model and the locus-specific model (de Villemereuil & Gaggiotti 2015). We used the mitolineage membership of each location-based 'population' as a binary environmental variable (Coop et al. 2010), i.e. 1 for mito-A and -1 for mito-B (north: mito-A = 25, mito-B = 27; south: mito-A = 71, mito-B = 32). Samples were grouped into populations according to their geographic location (Fig. S12). Twenty pilot runs of 4,000 iterations were used, with a burn-in of 80,000 iterations and samples taken every 10 steps. Convergence of every run was confirmed using the R package coda (Plummer et al. 2006). Outlier loci were defined after correcting for multiple testing with a False Discovery Rate (FDR) threshold of 5%. We assigned equal prior probabilities to both non-neutral models, but only the environmental correlation model produced outliers (i.e. correlation with mitolineage membership).
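BayeScEnv reports q-values itself. The sketch below illustrates only the general logic that BayeScan-family programs use to turn posterior probabilities into q-values, assuming BayeScEnv follows the same approach. Each locus's q-value is the mean posterior error probability (1 - PP) of all loci ranked at least as strongly, i.e. the estimated FDR if that locus were the weakest one declared an outlier. The function names are hypothetical.

```python
def qvalues_from_posteriors(posterior_probs):
    """q-value per locus from posterior probabilities of the non-neutral model.

    Loci are ranked by decreasing posterior probability (ties broken by input
    order in this sketch); the q-value is the running mean of 1 - PP.
    """
    order = sorted(range(len(posterior_probs)),
                   key=lambda i: posterior_probs[i], reverse=True)
    q = [0.0] * len(posterior_probs)
    cumulative_pep = 0.0
    for rank, i in enumerate(order, start=1):
        cumulative_pep += 1.0 - posterior_probs[i]
        q[i] = cumulative_pep / rank
    return q

def outliers_at_fdr(posterior_probs, fdr=0.05):
    """Indices of loci whose q-value is at or below the FDR threshold."""
    q = qvalues_from_posteriors(posterior_probs)
    return [i for i, qi in enumerate(q) if qi <= fdr]
```

Because the q-value is a running mean of sorted error probabilities, it never decreases down the ranking, so a single threshold (5% here) yields a well-defined outlier set.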
3) Principal component analysis:
We estimated genetic differentiation across each transect without assuming any prior grouping, using PCAdapt. PCAdapt uses a hierarchical Bayesian model to describe population structure with latent factors (K, analogous to PCA axes) and identifies outlier loci that contribute disproportionately to each of the K factors (Duforet-Frebourg et al. 2016; Luu et al. 2016). Important differences from the other methods are that PCAdapt does not rely on FST estimates, does not require classifying individuals into populations, performs well across a range of demographic models, and is agnostic to environmental information (in our case, mitolineage membership). An initial inspection of 76 K factors revealed that, although the first seven factors explained most of the genetic variation, only K = 1 distinguished well between mitogroups in both transects (Fig. S13). Accordingly, we performed the analysis with K = 2 and extracted only those outliers that loaded more strongly on K = 1, i.e. those related to differentiation between mitogroups.
Identification of mtDNA-linked islands of divergence
Hidden Markov models (HMM) can identify genomic regions that contain contiguous loci of high differentiation without relying on arbitrary sliding-window sizes (Hofer et al. 2012). An HMM assumes that genetic differentiation along the genome switches between a small number of hidden states; it assigns each SNP to a state and identifies the transitions between states. We defined three hidden states of genetic differentiation: low, intermediate and high. Genomic islands of divergence are defined and statistically tested as clusters of contiguous SNPs that belong to the same state without intervening state transitions. We modelled genetic differentiation using the cumulative distribution function of q-values from the BayeScEnv analysis with mitochondrial membership (outlier detection method 2 above). We considered only chromosomal regions of high differentiation after multiple-testing correction with an FDR threshold of 1%. To avoid potential ascertainment biases related to rare variants, we used only SNPs with MAF > 10%, following Marques et al. (2016). HMM analyses are commonly performed per chromosome in genome resequencing studies; however, we had few markers for some chromosomes. To increase statistical power, we therefore modelled hidden-state changes across the entire genome. This decision did not bias our results because we did not identify any significant state transitions between chromosomes (Marques et al. 2016). The analysis was performed with the R package HiddenMarkov (Harte 2015) (GitHub: XXX).
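The state-assignment step can be illustrated with Viterbi decoding of a three-state HMM, followed by extraction of contiguous runs of the 'high' state as candidate islands. This is a minimal sketch, not the HiddenMarkov package. The Gaussian emissions, the fixed state means and standard deviation, and the stay probability are hypothetical; in practice these parameters would be estimated from the transformed q-values.

```python
import math

def viterbi_gaussian(obs, means, sd, p_stay=0.9):
    """Most likely hidden-state path for a Gaussian-emission HMM.

    All states share one emission sd; switching to any other state is equally likely.
    Returns one state index per observation.
    """
    k = len(means)
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (k - 1))
    log_init = math.log(1.0 / k)

    def log_emit(x, s):
        return -0.5 * ((x - means[s]) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    score = [log_init + log_emit(obs[0], s) for s in range(k)]
    back = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in range(k):
            # Best predecessor state for landing in state s at this position.
            best_prev = max(range(k),
                            key=lambda t: score[t] + (log_stay if t == s else log_move))
            trans = log_stay if best_prev == s else log_move
            new_score.append(score[best_prev] + trans + log_emit(x, s))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(k), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def runs_of_state(path, state):
    """(first, last) index pairs of contiguous runs of a given state."""
    runs, start = [], None
    for i, s in enumerate(path + [None]):  # None acts as an end sentinel
        if s == state and start is None:
            start = i
        elif s != state and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs

# Hypothetical transformed differentiation values along a stretch of genome.
obs = [0.1, 0.12, 0.08, 0.9, 0.92, 0.88, 0.1, 0.11]
path = viterbi_gaussian(obs, means=[0.1, 0.5, 0.9], sd=0.1)  # states: low, intermediate, high
islands = runs_of_state(path, 2)
```

Because switching states carries a cost, isolated high values tend to be absorbed into the surrounding state, whereas runs of consecutive high values are called as islands.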
Allelic frequency correlations
To test whether alleles at outlier loci were segregating in the same direction in both divergent nuclear genomic backgrounds, we correlated mitogroup allele frequencies between the north and south transects. For each locus, the frequency of one allele (drawn at random) was calculated in each of the four mitogroups. Loci were then pooled into groups of equal size: one group with 249 outlier loci (outliers identified in both transects) and 240 groups of 249 randomly selected non-outlier loci each (i.e. the random distribution). For each group we compared allele frequencies between transects within mitogroups, i.e. two comparisons: north mito-A versus south mito-A, and north mito-B versus south mito-B. Significant differences between the allelic correlations of outlier loci and the random distribution were assessed with a t-test in R. The entire procedure was implemented in a custom R script (GitHub: XXX).
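The procedure can be sketched as follows. Compute the between-transect correlation of one mitogroup's allele frequencies for the outlier group, repeat this for many random groups of non-outlier loci of the same size, and compare the random correlations with the outlier value. The text specifies only 't-test'. Treating it as a one-sample t-test of the random correlations against the outlier correlation is an assumption, as are the input layout (one frequency per locus per mitogroup) and all function names.

```python
import math
import random
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def one_sample_t(values, mu):
    """t statistic comparing the mean of `values` with a reference value `mu`."""
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(len(values)))

def transect_correlation(freq_north, freq_south, loci):
    """Correlation of one mitogroup's allele frequencies between transects over `loci`."""
    return pearson_r([freq_north[i] for i in loci], [freq_south[i] for i in loci])

def random_group_correlations(freq_north, freq_south, candidate_loci,
                              group_size, n_groups, seed=1):
    """Correlations for `n_groups` random groups of `group_size` non-outlier loci."""
    rng = random.Random(seed)
    return [transect_correlation(freq_north, freq_south,
                                 rng.sample(candidate_loci, group_size))
            for _ in range(n_groups)]
```

Example usage (hypothetical names): `r_out = transect_correlation(fA_north, fA_south, outlier_loci)`, `r_rand = random_group_correlations(fA_north, fA_south, non_outliers, 249, 240)`, then `one_sample_t(r_rand, r_out)`. A strongly negative t would indicate that outlier loci are more concordant between transects than random loci.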
Geographic cline analysis
We examined how allele frequencies change as a function of geographic distance across each transect using the R package hzar (Derryberry et al. 2014), which implements an MCMC approach to fit allele frequency data to cline models (Barton & Gale 1993). We used nuclear allele frequencies (from 0 to 1) and mitochondrial haplotypes (mito-A = 0, mito-B = 1) to fit the cline models. Because hzar is computationally intensive, we reduced the dataset by randomly selecting one SNP every 100 kb along each chromosome (2,494 SNPs + mtDNA = 2,495 data points per transect). This is unlikely to affect our results, because SNPs in close physical proximity would provide redundant information and the selected data points still cover the entire genome. For each transect, we projected the location of each individual onto a one-dimensional axis that captured the mitochondrial divergence, and calculated the distance of each sample from a common reference point: the westernmost sample in the north and the northernmost sample in the south (Fig. S7). We independently fitted three models, all of which estimate cline centre (i.e. the geographic location of the allele frequency change) and width (i.e. the inverse of the maximum slope of the cline) but differ in the fitting of cline tails (i.e. exponential decay curves). Model I fitted fixed cline parameters and no tails. Model II fitted variable cline parameters and no tails. Model III fitted variable cline parameters and two mirrored tails. Models were compared first against a neutral model (flat, no clinal variation) and then with each other, using the small-sample corrected Akaike information criterion (AICc); the best model was accepted only if its AICc was more than two units lower than that of the null model.
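The centre and width parameters and the AICc acceptance rule can be made concrete with a small sketch. It uses the standard tail-free sigmoid cline, in which width is the inverse of the maximum slope, and a generic AICc formula. It does not reproduce hzar's MCMC fitting or its exponential-tail models. The function names, the model labels and the `p_min`/`p_max` arguments are illustrative assumptions.

```python
import math

def sigmoid_cline(x, center, width, p_min=0.0, p_max=1.0):
    """Tail-free sigmoid cline: expected allele frequency at distance x.

    1 / (1 + exp(-4 (x - center) / width)) has slope 1/width at the centre,
    so width is the inverse of the maximum slope when p_min=0 and p_max=1.
    """
    z = -4.0 * (x - center) / width
    if z > 700:  # avoid math.exp overflow far below the centre
        return p_min
    return p_min + (p_max - p_min) / (1.0 + math.exp(z))

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for k parameters and n observations."""
    return 2 * k - 2 * log_lik + 2 * k * (k + 1) / (n - k - 1)

def accept_best_model(aicc_by_model, null_name="null", margin=2.0):
    """Lowest-AICc model if it beats the null by more than `margin` units, else the null."""
    best = min(aicc_by_model, key=aicc_by_model.get)
    if best != null_name and aicc_by_model[null_name] - aicc_by_model[best] > margin:
        return best
    return null_name
```

For example, `accept_best_model({"null": 100.0, "I": 97.5, "II": 99.0})` returns "I", whereas a best model only 1.5 units below the null would leave the locus classified as non-clinal.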
Functional significance of candidates for mitochondrial-nuclear interactions
We extracted gene IDs for all annotated genes of the zebra finch reference genome (accessed November 2015; Cunningham et al. 2015). From this list we counted genes with functional annotations for mitochondrial activity (i.e. general N-mt genes) and the subset encoding supernumerary subunits and assembly factors of OXPHOS complexes (i.e. OXPHOS genes). We performed the counts within each of the mtDNA-linked islands of divergence detected by the HMM, and within equal-sized regions sampled across the whole genome (i.e. the random distribution), with a custom R script (GitHub: XXXX). General N-mt genes were obtained from the zebra finch Ensembl database (GO:0005739) and OXPHOS genes from the zebra finch entry of the Kyoto Encyclopedia of Genes and Genomes (KEGG, accessed June 2015; GO:0006119; Kanehisa & Goto 2000). For each island of divergence, we tested whether the observed count was significantly higher than the random distribution with the t.test function in R.
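The enrichment test compares the number of annotated genes inside an island with counts from equal-sized windows placed at random across the genome. In the sketch below, genes are represented only by their start coordinates, windows are placed uniformly among all valid start positions, and the comparison is a one-sample t statistic. The placement scheme, input layout and function names are hypothetical stand-ins for the custom R script.

```python
import math
import random
import statistics

def count_genes(gene_starts, start, end):
    """Number of annotated genes whose start coordinate lies in [start, end)."""
    return sum(start <= g < end for g in gene_starts)

def random_window_counts(gene_starts_by_chrom, chrom_lengths, window, n_windows, seed=1):
    """Gene counts in equal-sized windows placed uniformly at random in the genome.

    Chromosomes are chosen in proportion to their number of valid window starts;
    chromosomes shorter than the window are skipped.
    """
    rng = random.Random(seed)
    eligible = [c for c, length in chrom_lengths.items() if length >= window]
    weights = [chrom_lengths[c] - window + 1 for c in eligible]
    counts = []
    for _ in range(n_windows):
        chrom = rng.choices(eligible, weights=weights)[0]
        start = rng.randrange(chrom_lengths[chrom] - window + 1)
        counts.append(count_genes(gene_starts_by_chrom.get(chrom, []), start, start + window))
    return counts

def one_sample_t(values, mu):
    """t statistic for the mean of `values` versus a reference value `mu`."""
    return (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(len(values)))
```

With `observed = count_genes(nmt_starts_1A, island_start, island_end)` (hypothetical names), a strongly negative `one_sample_t(random_counts, observed)` indicates that the island holds more N-mt genes than random windows of the same size.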
The products of the OXPHOS genes identified within the chromosome 1A mtDNA-linked island of divergence (see results) were mapped to the 4.2 Å resolution cryo-EM structure of bovine mitochondrial complex I (Zhu et al. 2016) and visualized using the molecular graphics program UCSF Chimera (Pettersen et al. 2004). We also mapped fixed amino acid replacements between mitolineages found to be under positive selection by Morales et al. (2015) following the same method.
Genetic diversity and Linkage Disequilibrium (LD)
We calculated observed heterozygosity as a proxy for per-locus genetic diversity using the basicStats function of the R package diveRsity (Keenan et al. 2013). We calculated pairwise LD per chromosome, and the rate of LD decay as a function of physical distance across the entire genome (all chromosomes together) and separately for the two chromosomes with mtDNA-linked islands of divergence, 1A and Z. For each of the four mitogroups, we first estimated LD (r²) between each pair of SNPs from the reduced SNP dataset within each chromosome with PLINK (Purcell et al. 2007). We then modelled the overall decay of LD with distance using the expectation introduced by Hill and Weir (1988):

$$E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n\,(2 + C)(11 + C)}\right]$$

where C is the population recombination parameter (4Ner) and n the sample size. We approximated C by fitting a nonlinear regression as implemented in Marroni et al. (2011).
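The Hill and Weir (1988) expectation can be fitted to observed r² values with a single free parameter. The sketch below assumes C = ρ·d, where d is the physical distance between SNPs, and estimates ρ by least squares. Instead of R's nls as used in Marroni et al. (2011), it substitutes a golden-section search on log10(ρ), which is valid here because the expected r² decreases monotonically with C. The search bounds and function names are assumptions.

```python
import math

def hill_weir_r2(c, n):
    """Expected r^2 given recombination parameter c and sample size n (Hill & Weir 1988)."""
    a = (10 + c) / ((2 + c) * (11 + c))
    b = 1 + ((3 + c) * (12 + 12 * c + c ** 2)) / (n * (2 + c) * (11 + c))
    return a * b

def fit_rho(distances, r2_values, n, log10_lo=-8.0, log10_hi=2.0, iters=200):
    """Least-squares rho (C = rho * distance) via golden-section search on log10(rho).

    Assumes the sum of squared errors is unimodal within the search bounds.
    """
    def sse(log_rho):
        rho = 10 ** log_rho
        return sum((r2 - hill_weir_r2(rho * d, n)) ** 2
                   for d, r2 in zip(distances, r2_values))

    inv_phi = (math.sqrt(5) - 1) / 2
    lo, hi = log10_lo, log10_hi
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = sse(x1), sse(x2)
    for _ in range(iters):
        if f1 < f2:   # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = sse(x1)
        else:         # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = sse(x2)
    return 10 ** ((lo + hi) / 2)
```

Once ρ is estimated, the fitted curve `hill_weir_r2(rho * d, n)` can be compared across mitogroups or chromosomes (e.g. 1A and Z versus the genome-wide curve) to summarize differences in the rate of LD decay.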
We estimated mitochondrial-nuclear LD following Sloan et al. (2015) with a custom Perl script (electronic supplementary material, file S1 from Sloan et al. 2015), using the reduced SNP dataset. This method calculates the association between nuclear and mitochondrial alleles by testing one randomly selected nuclear allele against mitolineage membership in each transect, and assigns statistical significance with Fisher's exact test. P-values were calculated by Monte Carlo simulation (1 × 10⁶ replicates) and adjusted with an FDR threshold of 5%.
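The test can be expressed as a 2×2 table of nuclear allele (present/absent) by mitolineage, evaluated with an exact two-sided Fisher test or with a Monte Carlo version that permutes mitolineage labels. The sketch below is a Python re-expression, not the Sloan et al. (2015) Perl script. The 0/1 nuclear coding, the 'A'/'B' labels, the (hits + 1)/(reps + 1) correction and the function names are assumptions.

```python
import math
import random

def table_from(nuclear, mito):
    """2x2 counts: rows = nuclear allele present (1) / absent (0); cols = mito-A / mito-B."""
    pairs = list(zip(nuclear, mito))
    a = sum(1 for x, m in pairs if x == 1 and m == "A")
    b = sum(1 for x, m in pairs if x == 1 and m == "B")
    c = sum(1 for x, m in pairs if x == 0 and m == "A")
    d = sum(1 for x, m in pairs if x == 0 and m == "B")
    return [[a, b], [c, d]]

def hypergeom_prob(table):
    """Probability of a 2x2 table given its margins (hypergeometric)."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    return math.comb(row1, a) * math.comb(n - row1, c) / math.comb(n, col1)

def fisher_exact_two_sided(table):
    """Sum of probabilities of all tables with the same margins that are no more likely."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_prob(table)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_prob([[x, row1 - x], [col1 - x, n - row1 - col1 + x]])
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return total

def monte_carlo_p(nuclear, mito, n_reps=10_000, seed=1):
    """Permute mitolineage labels (margins preserved) and count tables no more likely."""
    rng = random.Random(seed)
    p_obs = hypergeom_prob(table_from(nuclear, mito))
    labels = list(mito)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(labels)
        if hypergeom_prob(table_from(nuclear, labels)) <= p_obs * (1 + 1e-7):
            hits += 1
    return (hits + 1) / (n_reps + 1)
```

With many replicates, as in the 1 × 10⁶ used here, the Monte Carlo p-value converges on the exact two-sided Fisher p-value. The resulting p-values would then be FDR-adjusted across loci.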
ACKNOWLEDGMENTS
Funding was provided by the Australian Research Council Linkage Grant (LP0776322), the Holsworth Wildlife Research Endowment (2012001942) and Stuart Leslie Bird Research Award from BirdLife Australia. HM was funded by a Monash Graduate Scholarship (MGS), a Monash Faculty of Science Dean’s International Postgraduate Research Scholarship and a Monash Postgraduate Publication Award. Genomic analyses were undertaken at the Monash Sun Grid high-performance compute facility. Field samples were collected under scientific research permits issued by the Victorian Department of Environment and Primary Industries (numbers 10007165, 10005919 and 10005514) and New South Wales Office of Environment and Heritage (SL100886). We are grateful to Leo Joseph, Robert Palmer, Holly Sitters and Christine Connelly for providing genetic samples. Anders Gonçalves da Silva, David Marques and Victor Soria-Carrasco provided valuable inputs regarding data analysis. Jonci Wolf provided valuable assistance to understand functional properties of the mitonuclear candidates. Thanks to Scott Edwards, Mike Webster, Lynna Kvistad and Stephanie Falk for comments on early versions of the manuscript.