Genome-wide admixture is common across the Heliconius radiation

How frequent is gene flow between species? The pattern of evolution is typically portrayed as a phylogenetic tree, implying that speciation is a series of splits between lineages. Yet gene flow between good species is increasingly recognized as an important mechanism in the diversification of radiations, often spreading adaptive traits and leading to a complex pattern of phylogenetic incongruence. This process has thus far been studied in cases involving few species, or geographically restricted to spaces like islands, but not on the scale of a continental radiation. Previous studies have documented gene flow, adaptive introgression and hybrid speciation in a small subsection of the charismatic Neotropical butterflies Heliconius. Using genome-wide resequencing of 40 out of 45 species in the genus we demonstrate for the first time that admixture has played a role throughout the evolution of Heliconius and the sister genus Eueides. Modelling of phylogenetic networks based on 6848 orthologous autosomal genes (Maximum Pseudo-Likelihood Networks) or 5,483,419 high quality SNPs (Ancestral Recombination Graph) uncovers nine new cases of interspecific gene flow at up to half of the genome. However, f4 statistics of admixture show that the extent of the process has varied between subgenera. Evidence for introgression is found at all five loci controlling the colour and shape of the mimetic wing patterns, including in the putative hybrid species H. hecalesia, characterised by an unusual hindwing. Due to hybridization and incomplete coalescence during rapid speciation, individual gene trees show rampant discordance. Although reduced gene flow and faster coalescence are expected at the Z chromosome, we discover high levels of conflict between the 416 sex-linked loci. Despite this discordant pattern, both concatenation and multispecies coalescent approaches yield surprisingly consistent and fully supported genome-wide phylogenies. We conclude that the imposition of the bifurcating tree model without testing for interspecific gene flow may distort our perception of adaptive radiations and thus the ability to study trait evolution in a comparative framework.


INTRODUCTION
Interspecific hybridization and the resulting gene flow across porous species barriers are increasingly recognized as major processes in evolution, detectable at many taxonomic levels (Abbott et al. 2013;Abbott et al. 2016;Schumer et al. 2014;Feliner et al. 2017). This conceptual shift has been facilitated by whole-genome sequencing yielding precise quantitative estimates of introgression (Schrider et al. 2018). However, there are other factors that create the pattern of phylogenetic discordance in genomic data including retention of ancient polymorphism and population structure, incomplete lineage sorting, selection, gene shuffling and duplication. Phylogenomics has massively expanded the potential of comparative molecular biology to elucidate these phenomena in both ancient clades [e.g. birds (Jarvis et al. 2014); mammals (Song et al. 2012)] and recently diverged taxa [e.g.
The few genomic studies of recent radiations published to date confirm the prediction of high discordance among independent markers. Analyses of 16 Anopheles species and all 48 families of birds (Jarvis et al. 2014) found that no individual gene tree, whether based on protein-coding sequences or large sliding windows, is identical to the species tree topology. This phenomenon appears to be partially attributable to incomplete lineage sorting (ILS) in rapidly radiating groups exposed to novel ecological opportunities, including Darwin's finches speciating after the colonisation of the Galapagos archipelago (Lamichhaney et al. 2015), mammals in response to climatic conditions (Song et al. 2012), and birds after the K/T extinction event (Jarvis et al. 2014;Jarvis 2016).
Discordance due to gene flow between species that are not fully reproductively isolated has been detected in several cases, ranging from humans (Meyer et al. 2012;Sankararaman et al. 2016) to sunflowers (Renaut et al. 2014). In many such taxa genome-wide studies demonstrate both stochastic gene flow across the species barrier, as well as adaptive introgression, where natural selection acts to fix introgressed variants. Alleles at a major locus contributing to the variation in beak shape among Darwin's finches have introgressed between multiple species (Lamichhaney et al. 2015). Similarly, selection has driven a block of transcription factor binding sites from the ecological generalist Drosophila simulans to the specialist D. secchelia (Brand et al. 2013), and recent malaria prevention efforts have caused a resistance allele to sweep across Anopheles gambiae populations and into A.
colluzzi (Clarkson et al. 2014;Miles et al. 2017). Nonetheless, the extent and importance of hybridisation in fueling speciation remain hotly contested (Schumer et al. 2014;Feliner et al. 2017;Schumer et al. 2018). Furthermore, gene flow has been less frequently considered than ILS in phylogenetic studies of radiations. One reason is perhaps that modelling gene flow poses greater challenges to computational methods (Wen et al. 2018); for example, almost opposite conclusions were drawn from two methodologically different studies of the same assemblage of Xiphophorus swordtail fish (Cui et al. 2013;Kang et al. 2013 (Table 1). The natural propensity of Heliconius and the sister genus Eueides to produce hybrids in the wild (Mallet et al. 2007; Dasmahapatra et al. 2007) has generated an early interest in the genetic porosity of the species barriers (Beltrán et al. 2002;Bull et al. 2006). Further in-depth studies revealed that ecologically divergent species in the melpomene-cydno-silvaniform clade (MCS) can share variation at up to 40% of the genome in sympatry (Kronforst et al. 2006;Nadeau et al. 2012;Nadeau et al. 2013;Mérot et al. 2013;Kronforst et al. 2013;Martin et al. 2018). Loci responsible for aposematic and mimetic wing patterns are especially likely to be shared between species (Heliconius Genome Consortium 2012; Pardo-Diaz et al. 2012;Enciso et al. 2017;Jay et al. 2018), providing a source of genetic variation in a strongly selected trait Heliconius heurippa, also belonging to the MCS clade, remains the best documented case of homoploid hybrid speciation (Salazar et al. 2010;Schumer et al. 2014), in which the adaptive red wing pattern introgressed from H. melpomene into the H. cydno background contributes to reproductive isolation between species (Mavarez et al. 2006). Introgression at the cortex locus between H. cydno and H. melpomene facilitated the occupation of atypical mimetic niches in both species (Enciso et al. 2017). More surprisingly, the complex Dennis/Ray pattern found in H. melpomene and relatives, as well as in H. elevatus, is a product of bidirectional adaptive introgression between lineages up to 5 million years apart (Heliconius Genome Consortium 2012; Wallbank et al. 2016). Similarly, introgression of an inversion fixed in H. pardalinus triggered the evolution of polymorphic wing patterns in H. numata (Jay et al. 2018).
It is virtually unknown whether the phenomena documented in the relatively recently emerged (4.5-3.5MYA) (Kozak et al. 2015) MCS clade are typical of the genus as a whole. Museum collection data suggest that hybridisation happens most frequently among the MCS species (Mallet et al. 2007), which can also be hybridised in captivity (Gilbert 2003). However, many hybrids are also known between H. erato and H. himera, as well as within the genus Eueides (Mallet et al. 2007). Here we generate alignments for 7324 orthologous, protein-coding genes obtained from a comprehensive whole-genome resequencing data set of 40 of the 45 recognized Heliconius species, six of the 12 Euides species (the most closely related genus) and two outgroup species to investigate the prevalence of hybridisation in Heliconius, quantify its extent, and compare the processes producing discordance at several loci. We demonstrate varied amounts of phylogenetic incongruence across the genus, related to heterogenous levels of interspecific gene flow. Instances of hybridisation at adaptive loci and more broadly across the genome, are found across the radiation, although they are far more frequent in the relatives of H. melpomene.
Our study dissects the evolution of two genera across a continent to demonstrate that gene flow can be a ubiquitous and significant force in shaping an adaptive radiation, and to provide first estimates of the probability of such gene flow and adaptive introgression. We show that a misleadingly well-supported and resolved tree can be recovered despite incongruence, and that previously unknown, complex hybridisation event can thus be missed. We propose that phylogenetic networks should become de rigour tools in the study of adaptive radiations.

A genomic tree for the heliconiines
We first constructed a bifurcating phylogeny for our genome-wide data. Mapping genome-wide short read data from 144 individuals in 45 species (S1 Table) to the H. melpomene reference did not result in full coverage of the distantly related species (S2 Table), but we were able to recover 7264 high quality, orthologous CDS alignments with less than 4% missing data (S3 Table). Filtering for autosomal exome sites with biallelic, non-singleton SNPs without missing data produced a 122,913 bp supermatrix. This matrix generated a Maximum Likelihood tree that is resolved with full bootstrap support (S1 Figure), except for uncertain placement of the H. telesiphe/hortense/clysonymus clade (bootstrap support 62/100). This concatenation tree is not substantially different from the results obtained by either ASTRAL or MP-EST multispecies coalescent methods (Fig. 1), which model the species tree from the distribution of gene trees. Both MSC methods also yield well resolved phylogenies and, in the case of ASTRAL, completely supported topologies (Fig. 1). The MSC trees differ from each other at only two relatively recent splits, while the concatenation phylogeny differs from ASTRAL at three out of 56 ( Fig. 1).
At one level, these well resolved trees clarified some uncertainties from previous work (Beltran et al. 2007;Kozak et al. 2015), including placement of the H. hecuba and H. egeria groups, relations in the H. sapho clade, and the position of H. besckei. Nonetheless, similar to other large phylogenetic studies (Brawand et al. 2014;Jarvis et al. 2014;Fontaine et al. 2014), none of the individual gene trees showed exactly the same topology as the species tree. making us realise that the well-supported bifurcating trees do not fully represent the underlying signal in the genomes within this clade.

Genome-wide phylogenetic incongruence
Rapid radiations of closely related species are likely to show considerable phylogenetic incongruence due to incomplete lineage sorting and introgressive hybridization. This is especially likely to be the case in Heliconius, where population sizes are large (Flanagan et al. 2010), speciation events are clustered in time (Kozak et al. 2015), inter-specific hybrids are found in the wild (Mallet et al. 2007), and there is strong evidence for extensive introgression of closely related sympatric species Martin et al. 2018). Indeed, despite the apparently well-supported phylogeny, we found considerable phylogenetic incongruence across the genome in our data. For example, Robinson-Foulds pairwise distance was 0.745 out of 1.0 for the 6848 autosomal phylogenies and 0.699 for the 416 loci on the sex-linked Z chromosome, indicating that any two gene trees are likely to contain multiple differing partitions. Among the 56 nodes between species and major subspecies, less than a half (26)  Brower and Garzon-Orduña (2017) suggested that the phylogenetic incongruence among Heliconiini is an artifact of missing data. This is unlikely, as the 7324 nucleotide matrices recovered here are nearly complete (>96%) and only 0.87% of the sites were removed as incomplete or ambiguous (S3 Table). Phylogenetic studies using modern statistical methods are typically robust to far higher levels of missing data (Wiens & Morrill 2011;Roure et al. 2013).
It has been suggested that sex-linked markers might be more reliable than autosomes for phylogenetic reconstruction, due to faster coalescence at lower effective population size (¾ of the autosomal Ne at the Z), and reduced likelihood of hybridisation as a result of Haldane's Rule (Zhang et al. 2013;Fontaine et al. 2014). We found that there is indeed significantly greater concordance between species tree and the Z chromosome trees, as compared to autosomal trees (gene tree-species tree triplet distance: 10.3*10 6 vs 12.5*10 6 , Wilcoxon's test p=3*10 -11 ) (S3 Figure). Notably, many nodes within  et al. 2014) showing deeper coalescence and lower levels of phylogenetic discordance at the X chromosomes. A comparison of gene tree discordance levels, and coalescent trees between autosomes and the Z demonstrates that the sex chromosome in Heliconius is likely subject to slightly lower levels of gene flow (S4 Figure). Nonetheless, the Z-linked consensus did not resolve the relative positions of the major clades unequivocally (S2-S3 Figures) and there were roughly similar levels of conflict with the species tree between sex-linked and autosomal loci (S5 Figure). Thus, even the Z chromosome yields inconsistent the phylogenic estimates (S4 Figure). This may be a result of an ancient introgression of the entire chromsome, since with our larger species sampling we confirm previous demonstration of Z chromosome incongruity between the MCS clade and the Silvaniform group containing H. ethilla, H. hecale and H. pardalinus (S6 Figure) (Zhang et al. 2016).
Similarly, the whole mitochondrial phylogeny was strongly conflicted with the multispecies coalescent trees (Fig. 1), demonstrating that organellar history is unlikely to be a good proxy for phylogeny at the species level (Hurst & Jiggins 2005). Generally, the same branches tended to be short and unsupported in concatenation and coalescent analyses of mitochondrion-, Z-and autosome-linked markers ( Fig. 1, S1-S3 Figure, S6 Figure). This corroborates the idea that some of the diversification events among Heliconiini occurred nearly simultaneously and are unlikely to be resolved in a bifurcating tree.

Hybridisation between species has been common throughout the radiation of Heliconius
We next sought to document the extent and degree of discordance across the phylogeny, and distinguish between incomplete lineage sorting and gene flow. We did this in two ways, first modeling the extent of hybridization in the history of the radiation in the software PhyloNet, using the Maximum Pseudo-Likelihood network algorithm for distinguishing admixture from Incomplete Lineage Sorting based on gene tree topologies and branch lengths (Yu & Nakhleh 2015). Here, an inference of speciation history networks from 6848 gene trees revealed a pattern of gene sharing within all major clades of the radiation (Fig. 2). This was particularly true within the H. melpomene, H. cydno and Silvaniform clade (MCS), where our approach supports previous work documenting extensive admixture, including the hybrid speciation of Heliconius heurippa (Mavarez et al. 2006;Salazar et al. 2008;Salazar et a. 2010), admixture between H. cydno/timareta and subspecies of H. melpomene (Nadeau et al. 2013;Enciso et al. 2017), and the exchange between H. melpomene and H. elevatus (Heliconius Genome Consortium 2012; Wallbank et al. 2016). This result made us confident that outcomes of this analysis can be trusted.
In the small clades containing H. egeria, H. hecuba and H. aoede, we found similar evidence for multiple instances of admixture, particularly between all species in the H. hecuba quartet (Fig. 2B).
Gene flow is also apparent between the ancestors of modern clades, including a potential hybrid origin of H. aoede. This finding explain the unusual combinations of morphological and ecological traits in these species, which used to be classified in separate genera (Brown et al. 1981;Brower 1994) prior to molecular evidence (Beltran et al. 2007;Kozak et al. 2015). Finally, in the large group of H. erato (sensu Brown 1981)we detected admixtures between the triplet of (H. telesiphe,(clysonymus, hortense)) and both H. erato and H. sara clades ( Fig. 2A). Notably, there are fewer instances of reticulation than in all the other groups consistent with other evidence (Fig. 4, S9 Figure). The most likely network estimated for the genus Eueides contains an unusual reticulation pattern leading to none of the included taxa (S1 Figure). We suspected that this pattern may be a sampling artefact, as only half of the species are included with one individual each, and divergence from the reference genome makes the Eueides alignments less complete and reliable.
Although MPL networks are currently the most robust and computationally efficient tools to model gene flow in face of ILS (Wen et al. 2018), it is unknown if the estimates are sensitive to errors in gene tree estimation and missing data as in case of Eueides. To account for this possibility we also used the Ancestral Recombination Graph algorithm implemented in TreeMix (Pickrell & Pritchard 2012), which analyses genome-wide variation under a Gaussian drift model, interpreting deviations from expected differentiation as evidence for gene flow. When applied to the SNP supermatrix, TreeMix again detects known events in the MCS clade, confirming the robustness of the technique, even when applied at the relatively deep level of a genus (Fig 3).
The ARG shows gene flow from H. hecalesia into the telesiphe clade, and from the erato lineage into H. hecalesia (Fig. 3). This suggests a hybrid origin for Heliconius hecalesia ( Fig. 2A). In contrast, and somewhat surprisingly, we find no evidence for a hybrid origin of H. hermathena. This species, found in the white sand forests of Brazil, has a wing pattern resembling a combination of the zebra-patterned of H. charithonia and the red-banded H. erato hydara (Brown & Benson 1977;Mavárez & Linares 2008;Mavárez et al. 2006).
In the small clades including H. egeria, H. hecuba and H. aoede, we find evidence for pervasive admixture, particularly between all species in the hecuba quartet (Fig. 2B). In contrast, TreeMix detects only a single gene flow event from H. aoede into the egeria/hecuba clade (Fig. 3). Gene flow is also apparent between the ancestors of modern subgenera, including a potential hybrid origin of H.
aoede, which may explain the unusual trait combinations in these species that were sometimes classified in separate genera (Brown et al. 1981;Beltran et al. 2007;Kozak et al. 2015) (Fig. 2, 3).
The Treemix ARG and PhyloNet network approaches do not uncover an exactly identical set of introgression results, as the ARG does not detect the exchanges in and between the small H. egeria and H. hecuba clades, or some of the events previously documented between species of the Silvaniform group (Jay et al. 2018) and H. melpomene (Zhang et al. 2016). The discrepancies are expected between two widely different techniques, and the Treemix algorithm assumes that the underlying sequence of events was largely tree-like (Pickrell and Pritchard 2012). Although Treemix is designed for populations and less commonly used to examine divergent species (but see ), it infers a basic tree similar to that found by the phylogenetic approaches that do not model introgression ( Fig. 1).

Distributions of allele frequencies show that the extent of introgression between species varies
between clades within Heliconius. When all possible combinations of taxa are examined, most of the quartets in the MCS clade (22 species, 3309790 SNPs) show some admixture (f4<0; Z-score<-4.47 after Bonferroni correction for the number of lineages) (Fig 4). In contrast, most f4 values computed from the H. erato group (20 species, 3152880 SNPs) are not significant and thus indicate a relatively better fit to a simple bifurcating tree (Peter 2016). This summary statistic allows a crude comparison of levels of admixture across the genus, but it remains computationally difficult to examine gene flow between more than a handful of taxa using these approaches (Martin et al. 2015;Pease & Hahn 2015).

Adaptive introgression at the wing pattern loci
A characteristic aspect of Heliconius evolution is the adaptive introgression of five colour pattern loci, corresponding to a small number of protein-coding genes ( Table 2) (reviewed in Kronforst and Papa 2015). For many of these genes more recent work has identified intervals that are associated with specific aspects of color and pattern that are thought to contain regulatory regions responsible for differential expression (Table 2). Introgression driven by selection for Mullerian mimicry has been documented extensively (Salazar et al. 2010;Pardo-Diaz et al. 2012;Heliconius Genome Consortium 2012;Wallbank et al. 2016;Zhang et al. 2016;Jay et al. 2017), but only in the H. melpomene and Silvaniform (MCS) clade of Heliconius, which includes less than a third of all species. By comparing the topologies at the colour pattern loci for nearly all species in the genus against the overall species tree, we provide a uniform framework in which to gauge the amount of introgression across around these functionally important regions. Topologies around color pattern loci differed from the species tree (p<0.001, SH test) and in many cases, primarily at the optix and cortex loci, showed multiple departures from the expected topology (Table 1). The majority of the differences are found in the MCS clade, where introgression reaches considerable complexity across genomic and geographic regions (Wallbank et al. 2016;Enciso et al. 2017). Again, we verify our approach by recovering the known signals the optix and cortex loci (Table 1), but it is the first time that introgression is demonstrated in other genomic intervals.
In contrast, the H. hecalesia/clysonymus/hortense is the only case of wing pattern introgression found among the 19 species of the H. sara/erato/telesiphe subgenus. No pattern introgression is found among the three smaller clades (for example Fig. 6), despite evidence for gene flow in other regions ( Fig. 2, 3). Although we cannot rule out an influence of sampling bias, as substantially more individuals have been sequenced in the MCS group, our data nonetheless indicate that introgression around color pattern loci is more pervasive in this clade.
We find new evidence for gene flow at the cortex locus (Nadeau et al. 2016) which is responsible for the diverse white and yellow patterns across the genus (Sheppard et al. 1985;Kronforst & Papa 2015;. At a region previously highlighted (Nadeau et al. 2012 (Table 1) (Wallbank et al. 2016). A large number of genomic studies of interspecific gene flow have found introgressions of small genome regions driven by natural selection for critical adaptations, such as the hypoxia resistance EPAS1 haplotype (Denisovans → anatomically modern Tibetans) (Huerta-Sánchez et al. 2014), the Vgsc-1014F insecticide-resistance mutation (Anopheles gambiae → A. colluzzi) (Clarkson et al. 2014;Norris et al. 2015), or the ALX1 alleles determining diverse beak shapes among Darwin's finches (Geospiza) (Lamichhaney et al. 2015).
Whereas adaptive introgression has been localised to a single part of the genome in all but one other animals studied to date (but see Stryjewski & Sorenson 2017)

Mosaic genome in Heliconius hecalesia
We identify the first cases of gene flow among distantly related species in the H. erato clade, demonstrating that admixture has not been limited to the well-studied MCS group. We infer two instances of gene flow. into both the ancestor of H. hortense and H. clysonymus, and into H. hecalesia.
The position of the (H. telesiphe,(hortense,clysonymus)) triplet (THC) is the only unsupported branch in the genome-wide phylogeny ( Fig. 1; S2 Figure), and no specific placement is found in a majority of gene trees (IC=0; S3 Figure). This uncertainty is not a result of coalescent stochasticity, as both MPL networks and TreeMix account for ILS and still recover signals of admixture from both the clades of H. erato and H. sara into THC ( Fig. 2A, 3). H. hecalesia latter has especially complex ancestry, and although usually grouped with H. erato in simple trees ( Fig. 1) (Kozak et al. 2015), appears to be nearly equally diverged from H. erato, H. telesiphe and H. sara (Fig. 5). Results of network modelling are corroborated by the D statistic of admixture (Durand et al. 2011), which is highly positive and statistically significant for all tests where H. hecalesia is the recipient of admixture from the H. clysonymus and H. sara clades (Table 3). However, consistent with the pattern of genome-wide  (Table 3), raising the possibility that admixture took place between the ancestors of modern clades, leading to incongruence deep in the tree (S1 Figure).
For the first time we find evidence outside of the MSC clade for introgression of multiple colour pattern alleles, from H. hecalesia into the sympatric THC lineage, possibly due to selective pressure on mimetic appearance. H. clysonymus and H. hortense display the unusually wide red/orange bar on the hindwing, absent from their relative H. telesiphe, but shared with the distantly related H. hecalesia (Fig. 5). Based on the phenotype we predict that, at the key wing patterning loci (Table 2), sequences from these three species should cluster together to the exclusion of H. telesiphe.
Sequences from the three species indeed cluster exactly at the 360-380kbp interval of scaffold HE670865 (aLRT>0.95; p<0.001, SH test), which aligns to the specific region of the D locus controlling the red pattern in H. erato (Supple et al. 2013) (Fig. 6). This indicates that the alleles governing the red pattern in the three species are more similar than expected from the genome-wide phylogenies. As predicted if the sequences were introgressed after species divergence, the alleles from the three phenotypically similar butterflies appear younger (2.3-2.1 MA) in RelTime estimates than those of H. telesiphe (2.9MA), and much more recent than the original split between H. hecalesia and the THC clade (6.28-4.18 MA) (Kozak et al. 2015).
Whereas optix colours the band red , the shape of bands is typically Adaptive gene flow between the three species is plausible, as H. clysonymus x H. hecalesia and H. hortense x H. hecalesia hybrids are known from the wild (Mallet et al. 2007), and H. hecalesia is sympatric with the other two species in parts of its range (Rosser et al. 2012). The subtle changes in the wing morphology of H. hecalesia across its range appear driven primarily by sexually-dimorphic mimicry of species in the Ithomiinae, with a wider hindwing band found further away from the Equator (Brown & Benson 1975). It seems likely that the precise mimicry between the distantly related H. hecalesia and H. clysonymus/hortense has resulted from localised introgression of regulatory loci in sympatry, similar to that seen between H. melpomene/elevatus (Wallbank et al. 2016).

Neither concatenation nor coalescent trees adequately represent species history
There has been a marked shift over recent years away from phylogenetic methods that involve concatenation of data towards approaches that involve coalescent modelling. The proposed methods for inferring a species tree under the coalescent by modelling the incomplete sorting of loci represent a great improvement on the assumption that there is a common evolutionary history across all genome regions (Liu et al. 2010;Mirarab et al. 2014;Heled & Drummond 2010). Across the tree of life, from birds (Jarvis et al. 2014;Reddy et al. 2017) and mammals (Song et al. 2012) to fungi (Shen et al. 2016), treatment of individual gene trees under multispecies coalescent methods has yielded substantially different results to simple concatenation. However, both approaches still impose a bifurcating tree on the data and imply that speciation consists of a series of splits (Hahn & Nakhleh 2016). When applied to the adaptive radiation of Heliconius, both philosophies appear to produce results that fail to fully describe the evolutionary history of the species. Network modelling clearly demonstrates that introgression has happened throughout the evolution of the genus, and yet this process could easily be overlooked even with many state of the art phylogenetic methods. Appropriate reconstruction of speciation history needs to incorporate approaches that test for admixture more routinely (Long & Kubatko 2017;Pease & Hahn 2015). The main appeal of studying adaptive radiations is their power for analyzing trait evolution in a comparative framework, and a growing number of studies are looking at several Heliconius characters through this lens (Mallet et al. 2007;Briscoe et al. 2013;. Commonly, trait histories are best represented by exactly those regions that show high levels of introgression, and a comparative approach that used a single bifurcating species tree would give highly incomplete results. To accurately reflect the hidden uncertainties in the phylogeny of this exciting group, and to capture the potential of traits to be shared between species by genome-wide introgression, future work needs to utilize novel approaches that reflect the non-bifurcating reality of evolving genomes (Hahn & Nakhleh 2016;Bastide et al. 2017).  species of Heliconius, 6/12 Eueides and the monotypic Dryadula and Agraulis (S1 Table).

Samples and DNA sequencing
Sequencing was performed using the Illumina technology (S1 Table) with 100 bp paired-end reads, insert sizes of 250 -500 bp and read coverage from 12x to 110x. In case of 13 new samples, DNA was extracted with the DNeasy Blood and Tissue kit (Qiagen) from 30-50 µg of thorax tissue homogenised in buffer ATL using the TissueLyser (Qiagen); purified by digesting with RnaseA (Qiagen); and quantified on a Qubit v.1 spectrophotometer (LifeTechnologies). The Beijing Genomics Institute constructed and sequenced whole-genome libraries on a HiSeq 2500 with 500 bp insert size to ~50x coverage (S1 Table).

Read mapping and genotyping
Raw reads were checked using  Table). Alignments were sorted with Samtools

Exome alignments and gene trees
Protein-coding genes can be effectively treated as discrete markers for multilocus phylogenetics (Edwards et al. 2016) with lower level of homoplasy than the non-coding sequence (Brawand et al. 2014). We minimized paralogy by narrowing the gene set to 1:1:1 orthologs between H. melpomene, Danaus plexippus (Zhan et al. 2011) and Bombyx mori (Xia et al. 2004) identified by OrthoMCL (Li et al. 2003; Heliconius Genome Consortium 2012). 6848 autosomal gene alignments, excluding those found on the colour pattern scaffolds (Heliconius Genome Consortium 2012), and 416 Z-linked single-copy genes were extracted using the tabular "calls file" format [gene_fasta_from_reseq.py (Martin 2017)]. TrimAl v1.2 (Capella- Gutiérrez et al. 2009) was used to remove the taxa for which 50% of residues did not overlap with at least 50% of the other sequences, while uninformative fast-evolving regions were excluded by Block Mapping and Gathering with Entropy (BMGE) (Criscuolo & Gribaldo 2010). Individual ML gene trees were estimated in FastTree v2.1 (Price et al. 2010) with parametric aLRT nodal support (Anisimova & Gascuel 2006) .

Incongruence in the data
To assess the level of topological variation in the gene trees, the average normalised Robinson-Foulds distance (RF) (Robinson & Foulds 1981) between all pairs of phylogenies was computed in PAUP* v4 (Swofford 2002). The calculation was repeated after trimming to 57 individuals representing species or highly distinct subspecies (e.g. the three geographic clades of H. melpomene), thus eliminating the noise from lack of intraspecific resolution. The 57 samples were chosen based on the quality of the genotype calls (S1 Table). In addition, the average distance between all possible triples of taxa represented in full gene trees -a measure similar to the RF -was calculated in MP-EST (Liu et al. 2010). To identify incongruent nodes, 50% Majority Rule consensi were calculated from the trimmed gene trees and the relative support for branches was evaluated under the Internode Certainty criteria (IC/ICA/TCA), which compare the support for a branch and for all alternatives found in the distribution (Salichos & Rokas 2013;). The effect of poor resolution in some gene trees was accounted for by repeating the procedure with the 1000 most resolved and supported phylogenies. To quantify the gene-species tree disagreements, we computed what proportion of gene trees contain the quartets found in the ASTRAL-III phylogeny (-t1) (Sayyari & Mirarab 2016).

Species trees
Naïve "total evidence" phylogenies were estimated from concatenated genome-wide SNPs in RAxML v8 with 100 bootstrap replicates, GTR+Γ model and ascertainment bias correction (Stamatakis 2014). The history of the matriline was approximated from the whole-mitochondrial Two Multispecies Coalescent (MSC) approaches were used to estimate the species tree under the assumption of Incomplete Lineage Sorting. MP-EST v1.4 maximises a pseudo-likelihood function over the distribution of taxon triples extracted from gene tree topologies (Liu et al. 2010). 145 individual tips were assigned to species a priori (S1 Table), treating Western and Eastern clades of H. melpomene and H. erato as distinct (Nadeau et al. 2013;van Belleghem et al. 2017). Bootstrap support was evaluated by repeating 100 times with random samples of 500 gene trees. Since MP-EST may be misled by errors in gene tree reconstruction (Mirarab & Warnow 2015), we compared the results with ASTRAL-III, a fast quartet method that deals with polytomies and low support values in the input .

Modelling hybridisation
Following the identification of problematic nodes based on ICA and MSC trees, we applied three radically distinct approaches to disentangle ILS from hybridization (Hahn & Nakhleh 2016;Peter 2016;Scornavacca & Galtier 2016). First, we determined the Ancestral Recombination Graph (ARG) in TreeMix v1.13 by identifying the pairs of taxa sharing more than the expected proportion of allelic variation (Pickrell & Pritchard 2012). Relations between major lineages and species were inferred from allele frequency data computed in PLINK! v2 (Purcell 2009;Chang et al. 2015) after Linkage Disequilibrium prunning (Gaunt et al. 2007), removing sites with less than 95% complete data, and grouping SNPs into blocks of 100. Models were fitted with zero, 10 or 20 migration events, treating H. aoede as an arbitrary outgroup for rooting.
Second, we tested the fit of the data to the bifurcating tree ideal by calculating the f4 statistic (Reich et al. 2009), a difference in allele frequencies of two taxa pairs that can be interpreted as a measure of treeness (Peter 2016), similar to the D statistic (Durand et al. 2011). The f4 was calculated for all possible quartets of taxa in each clade using the fourpop routine in TreeMix on the SNP data described above, determining the statistical significance of the results by jacknifing. erato from Amazonia, and the outgroup was always H. melpomene. To test the possibility that the lack of resolution at one node in the larger phylogeny ( Fig. 1; S1 Figure) is caused by ancient admixture prior to the formation of current species, we conducted tests treating entire clades as taxa (Table 3).
The extent of similarity between species clusters was illustrated with PCAs of genome-wide variation, calculated for the H. erato/sara clade in the R package adegenet (Jombart & Ahmed 2011). A chronogram for the putatively introgressing B/D red pattern intervals (HE670865: 320-380 and 410-440 kbp) was estimated with RelTime.

Introgression at the colour pattern loci
Out of 4309 scaffolds, the five fragments containing introgressing loci associated with aposematic wing phenotypes (Table 2) were treated separately. Each scaffold alignment was partitioned into windows of 20 kbp, sliding by 10 kbp, discarding windows with less than 1000 bp of data [script sliPhy3.py (Martin 2017)]. The topology for every window was reconstructed with FastTree, tested for significant differences from the MP-EST species tree using the SH test (Shimodaira & Hasegawa 1989) in RAxML and inspected visually. In order to understand precisely how the Hmel1 reference corresponds to the specific red control loci of H. erato (Supple et al. 2013), the B/D scaffold HE670865 was aligned against the H. erato B/D BACs (Papa et al. 2008) with mLAGAN (Brudno et al. 2003).   The key loci are named differently in divergent species (Sheppard et al. 1985), but in all cases the approximate genomic location has been identified, good candidate genes have been identified and intervals containing functional variation have been localized (Reed et al., 2011;Martin et al., 2012;Nadeau et al. 2016;Wallbank et al., 2016;Van Belleghem et al. 2017
Most clades within Heliconius (as defined in Brown 1981 andKozak et al. 2015) were sampled thoroughly, except the inclusion of only one out of four species in the subgenus Neruda (see Chapter 2 for taxonomy). Among the 23 Heliconius species with multiple samples, the number of individuals ranged from two to 25, the number of geographically distinct populations was between one and 12, and the number of colour pattern races ranged from one to 14 (S1 Table).
The depth and number of mapped reads decreased predictably with increasing divergence from the reference (  Table). More distant taxa showed naturally higher variation from the reference, which simultaneously reduced the overall number of mapped sites. In total, 126,865,683 SNPs were identified, a number driven primarily by taxon-specific differences from the reference, but also by high variability in the densely sampled MCS group. For instance, private SNPs constituted 30% of the total and 11.5*10 6 private variants were discovered just in the H. melpomene samples. 5,483,419 SNPs were found in the exome. Overall, these findings are consistent with previous reports for the MCS group    Incongruence is higher at the autosomes than the Z sex chromosome. Mean proportion of gene tree quartets supporting all nodes as calculated in ASTRAL is plotted for all groups of Heliconius; Eueides; and the deep nodes that link them. Higher support values indicate less incongruence between individual gene trees.