Abstract
Previous phylogeographic studies of the lion (Panthera leo) have offered increased insight into the distribution of genetic diversity, as well as a revised taxonomy which now distinguishes between a northern (Panthera leo leo) and a southern (Panthera leo melanochaita) subspecies. However, existing whole range phylogeographic studies on lions focused on mitochondrial DNA and/or a limited set of microsatellites. The geographic extent of genetic lineages and their phylogenetic relationships remain uncertain, clouded by incomplete lineage sorting and sex-biased dispersal. In this study we present results of whole genome sequencing and subsequent variant calling in ten lions sampled throughout the geographic range. This resulted in the discovery of >150,000 Single Nucleotide Polymorphisms (SNPs), of which ~100,000 SNPs had sufficient coverage in at least half of the individuals. Phylogenetic analyses revealed the same basal split between northern and southern populations as had previously been reported using other genetic markers. Further, we designed a SNP panel, including 125 nuclear and 14 mitochondrial SNPs, which was subsequently tested on >200 lions from across their range. Results allow us to assign individuals to one of the four major clades: West & Central Africa, India, East Africa, or Southern Africa. This SNP panel can have important applications, not only for studying populations on a local geographic scale, but also for guiding conservation management of ex situ populations, and for tracing samples of unknown origin for forensic purposes. These genomic resources should not only contribute to our understanding of the evolutionary history of the lion, but may also play a crucial role in conservation efforts aimed at protecting the species in its full diversity.
Background
Recent developments in next generation sequencing (NGS) techniques allow for the application of massive parallel sequencing to non-model organisms [1, 2], such as the lion (Panthera leo). As a result, both the evolutionary history of a species and population histories can be reconstructed based on vastly expanded datasets [3, 4]. Apart from improved insight into patterns of biodiversity, this type of data can inform conservation efforts on how best to maintain this diversity [5, 6].
The lion (Panthera leo) has been the subject of several phylogeographic studies which have provided insights into the evolution and distribution of genetic diversity in the African populations (formerly subspecies Panthera leo leo) and its connection to the Asiatic population (formerly subspecies Panthera leo persica), located in the Gir forest, India. These studies included data from mitochondrial DNA (mtDNA) [7–16], autosomal DNA [12–14, 16] and subtype variation in lion Feline Immunodeficiency Virus (FIVPle) [13]. Studies using mitochondrial markers and/or complete mitogenomes describe a basal dichotomy, consisting of a northern group that includes populations from West and Central Africa as well as the Asiatic population (formerly subspecies P. leo persica), and a southern group with populations from East and Southern Africa [7, 10, 12, 15]. Another study focused on populations in Tanzania, but also included three populations from West and Central Africa, and confirmed this distinction based on Single Nucleotide Polymorphisms (SNPs) [16]. The distinction of lions in West and Central Africa was further corroborated by autosomal microsatellite data [14]. However, because of limited variation in the remaining Asiatic population as a result of past bottlenecks [13, 17], it is impossible to reliably infer the evolutionary relationship between African and Asiatic populations from this type of data. Nevertheless, these newly described evolutionary relationships led to a review of the taxonomic split between the African and Asiatic subspecies, resulting in a revision of the taxonomy which now recognizes a northern subspecies (Panthera leo leo) and a southern subspecies (Panthera leo melanochaita) [18].
However, the approaches used in the studies mentioned above have a number of limitations. For example, mtDNA cannot identify admixture and represents only a single locus, potentially misrepresenting phylogenetic relationships due to the stochastic nature of the coalescent [19, 20]. In addition, sex-biased dispersal and gene flow will alter patterns derived from mitochondrial versus nuclear data. Female lions exhibit strong philopatry whereas male lions are capable dispersers [21, 22], indicating that phylogeographic patterns based on mtDNA in lions may overestimate divergence between populations. Inference of phylogenetic relationships from microsatellite data, on the other hand, is problematic due to their high variability and their mutation pattern. Finally, most studies mentioned above were limited by sparse coverage of populations, notably in West and Central Africa. In particular, FIVPle prevalence is geographically restricted. Additional data from genome-wide markers, covering more lion populations and different conservation units, are necessary to overcome these restrictions. This will improve our understanding of the spatial distribution of diversity in the lion, as well as help guide future conservation efforts.
Here, we describe the discovery and phylogenetic analysis of genome-wide SNPs based on whole genomes and complete mitogenomes of ten lions, providing a comprehensive overview of the intraspecific genomic diversity. We further developed a SNP panel consisting of a subset of the discovered SNPs, which was then used for genotyping >200 samples from 14 lion range states, representing almost the entire current distribution of the lion. This resolves phylogeographic breaks at a finer spatial resolution, and may serve as a reference dataset for future studies. Finally, we discuss the applications and future directions of high-throughput genotyping for wildlife research and conservation that, hopefully, will contribute to future studies on lion genomics.
Materials & Methods
Sampling
Blood or tissue samples of ten lions, representing the main phylogeographic groups as identified in previous studies [7, 10, 13–15] (Figure 1: map, Supplemental Table 1), were collected and preserved in a buffer solution (0.15 M NaCl, 0.05 M Tris-HCl, 0.001 M EDTA, pH = 7.5) at −20°C. All individuals included were either free-ranging lions or captive lions with proper documentation of their breeding history and with no known occurrences of hybridization between aforementioned lineages. A sample from a leopard (Panthera pardus orientalis, captive) was included as an outgroup. The tiger genome [23] was used as a reference for mapping of the lion and leopard reads. All samples were collected in full compliance with specific legally required permits (CITES and permits related to national legislation in the countries of origin). Details on laboratory protocols, sequencing, assembly, SNP calling, and quality control are given in Supplemental Information 1.
Whole genome data and complete mitogenomes
Identified SNPs were attributed to a chromosome following the genomic architecture in the tiger [23] (Supplemental Table 2). For the nuclear SNPs, we applied 5 levels of filtering, in which we included 1) all SNPs, 2) SNPs called in at least three samples, 3) SNPs called in at least five samples, 4) SNPs called in at least eight samples, and 5) SNPs called in all samples (Supplemental Table 2). Positions with coverage <3 were replaced by an ambiguous nucleotide. Full mitogenomes were recovered as described in Supplemental Information 1 and [15].
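The masking and filtering steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (see Supplemental Information 1 for that); the function names, the toy SNP identifiers and the toy coverage values are hypothetical, and we assume that filter level 1 ("all SNPs") imposes no minimum number of called samples.

```python
from typing import Dict, List, Tuple

MIN_COVERAGE = 3   # calls with coverage below this are replaced by an ambiguous base
AMBIGUOUS = "N"

Call = Tuple[str, int]  # (called base, coverage) for one sample at one SNP


def mask_low_coverage(calls: List[Call], min_cov: int = MIN_COVERAGE) -> List[str]:
    """Replace every call with coverage < min_cov by the ambiguous symbol."""
    return [base if cov >= min_cov else AMBIGUOUS for base, cov in calls]


def n_called(genotypes: List[str]) -> int:
    """Number of samples with a non-ambiguous call at this SNP."""
    return sum(g != AMBIGUOUS for g in genotypes)


def filter_by_call_count(snps: Dict[str, List[str]], min_called: int) -> Dict[str, List[str]]:
    """Keep SNPs called in at least min_called samples."""
    return {name: g for name, g in snps.items() if n_called(g) >= min_called}


# Hypothetical toy data: three SNPs scored in ten samples.
raw = {
    "snp_a": [("A", 5)] * 6 + [("G", 4)] * 4,   # called in all ten samples
    "snp_b": [("C", 3)] * 5 + [("T", 2)] * 5,   # called in five samples
    "snp_c": [("G", 9)] * 2 + [("A", 1)] * 8,   # called in two samples
}
masked = {name: mask_low_coverage(calls) for name, calls in raw.items()}

# Filter levels 1-5 from the text, expressed as a minimum number of called samples.
FILTER_LEVELS = {1: 0, 2: 3, 3: 5, 4: 8, 5: 10}
snps_per_level = {
    level: len(filter_by_call_count(masked, k)) for level, k in FILTER_LEVELS.items()
}
print(snps_per_level)
```

Stricter levels trade the number of retained SNPs against the amount of missing data per SNP, which is the balance explored in Supplemental Table 2.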
Phylogenetic analyses were performed on the full mitogenomes and on the concatenated SNP datasets with varying levels of missing data with MrBayes v.3.1.2 [24, 25] and Garli [26] using parameters as determined by MrModeltest2 (v.2.3) [27]. MrBayes and Garli were run for one million generations and five million generations respectively, using a GTR substitution model with rate variation across sites set to equal. In addition, we ran SVDquartets [28], assuming multi-locus unlinked single-site data with 100 bootstrap replicates as implemented in PAUP* 4.0a164 [29]. For the mitogenome data, the coalescent process in the model was disregarded. Nodes receiving >95% PP in Bayesian analysis (MrBayes) and/or 0.7 bootstrap support in Maximum Likelihood (ML) analysis (Garli) and SVDquartets are considered to have significant support. Individual ancestry coefficients were estimated using sparse nonnegative matrix factorization algorithms in sNMF, as implemented in the R package LEA [30, 31], exploring K=1-10, using 20 replicates and 4 values for the alpha regularization parameter (1, 10, 100 and 1,000).
SNP panel data
In order to obtain better insight into the geographic locations of phylogeographic breaks, we made a selection of SNPs for inclusion in a SNP panel for genotyping more samples. We used the following criteria for the selection process: 1) minimum coverage of 20 for all lion samples combined, across 50 bp upstream and downstream of the SNP position, 2) maximum of one variable position in these 50 bp flanking regions, 3) high quality mapping of the flanking regions as identified by eye using IGV Genome Browser [32, 33], 4) SNPs evenly spread across all chromosomes, as implied by the chromosomal architecture in tiger (max. one SNP per scaffold), 5) preferably both homozygous genotypes and the heterozygous genotype present among the ten genotyped lions, otherwise both alleles identified. In addition, we included a total of 14 mitochondrial SNPs selected to represent each major branch within the mitochondrial phylogenetic tree (as described in [15]); the distinction between the northern and the southern subspecies is represented by two SNPs. The selected mtDNA SNPs had already been assessed in a wide range of populations [15], making it more likely that the selected SNPs are diagnostic throughout the lion range. Finally, we ensured that the SNPs were not located in any of the nuclear copies (numts) which are known to exist in cats [34–38]. After test runs and quality control (see Supplemental Information 1), we retained 125 nuclear SNPs and 14 mtDNA SNPs which were used for genotyping 211 lions of known origin from 14 lion range states, representing the entire geographic range of the species (Figure 1: map).
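The nuclear selection criteria listed above can be expressed as a simple filter. The following Python sketch is illustrative only and does not reproduce the selection procedure actually used: the `Candidate` record, its field names and the example values are hypothetical, the visual IGV check is represented by a manually set flag, and we assume that criterion 2 refers to variable positions in the flanks other than the target SNP itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    snp_id: str
    scaffold: str
    min_flank_coverage: int    # lowest combined coverage within 50 bp up/downstream
    flank_variants: int        # other variable positions within the flanks (assumption)
    passed_visual_check: bool  # outcome of manual inspection of the mapping
    genotypes: List[str]       # e.g. "AA", "AG", "GG"; "NN" for a missing call


def passes_hard_filters(c: Candidate) -> bool:
    """Criteria 1-3: coverage, flank variability and mapping quality."""
    return c.min_flank_coverage >= 20 and c.flank_variants <= 1 and c.passed_visual_check


def has_all_genotype_classes(c: Candidate) -> bool:
    """Criterion 5 (preferred): both homozygotes and the heterozygote observed."""
    called = {g for g in c.genotypes if g != "NN"}
    homozygous = {g for g in called if g[0] == g[1]}
    heterozygous = called - homozygous
    return len(homozygous) >= 2 and len(heterozygous) >= 1


def select_panel(candidates: List[Candidate]) -> List[Candidate]:
    """Criterion 4: keep at most one SNP per scaffold, preferring criterion 5."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        if not passes_hard_filters(c):
            continue
        current = best.get(c.scaffold)
        if current is None or (
            has_all_genotype_classes(c) and not has_all_genotype_classes(current)
        ):
            best[c.scaffold] = c
    return sorted(best.values(), key=lambda c: c.snp_id)


# Hypothetical candidates, not data from this study.
candidates = [
    Candidate("snp2", "s1", 30, 1, True, ["AA", "GG"]),
    Candidate("snp1", "s1", 25, 0, True, ["AA", "AG", "GG"]),
    Candidate("snp3", "s2", 15, 0, True, ["CC", "CT", "TT"]),   # coverage too low
    Candidate("snp4", "s3", 22, 2, True, ["AA", "AG", "GG"]),   # flanks too variable
    Candidate("snp5", "s4", 40, 0, False, ["AA", "AG", "GG"]),  # failed visual check
    Candidate("snp6", "s5", 20, 1, True, ["CC", "TT", "NN"]),
]
panel_ids = [c.snp_id for c in select_panel(candidates)]
print(panel_ids)
```

In this toy example, two candidates on scaffold s1 pass the hard filters, and the one showing all three genotype classes is retained.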
Resulting nuclear genotypes were analysed with STRUCTURE [39], using correlated allele frequencies and running the program for 5,000,000 generations, discarding the first 500,000 generations as burn-in. Five replicates were run for K=1 to K=7. The optimal value of K was assessed by using the DeltaK method as implemented in STRUCTURE Harvester [40]; CLUMPP [41] was used before generating the barplots of population assignment. To reduce the effect of missing data on the assignment results, STRUCTURE was then repeated including only the samples with >75% of the SNPs successfully called (N=171). MtDNA SNPs were used to assign a specific haplotype to each individual, matching with previously described lineages (see delineation of haplogroups in [15]): West Africa, Central Africa, North East Africa, East/Southern Africa, South West Africa, and India.
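For readers unfamiliar with the DeltaK statistic, the sketch below shows one common way of computing it from replicate log-likelihoods: the absolute second-order difference of the mean ln P(D) across successive values of K, divided by the standard deviation of ln P(D) among replicates at that K. This is our own minimal Python illustration and not the code of STRUCTURE Harvester; the function name `delta_k` and the log-likelihood values are hypothetical toy values, not results of this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """DeltaK for every K that has both neighbours K-1 and K+1 in the input.

    lnp maps each K to the ln P(D) values of its replicate runs.
    """
    means = {k: mean(values) for k, values in lnp.items()}
    result: Dict[int, float] = {}
    for k in sorted(lnp):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined for the smallest and largest K tested
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp[k])  # sample standard deviation among replicates
        result[k] = second_difference / sd if sd > 0 else float("inf")
    return result


# Hypothetical toy log-likelihoods for K = 1..5, three replicates each.
toy_lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-850.0, -854.0, -846.0],
    4: [-700.0, -701.0, -699.0],
    5: [-695.0, -700.0, -690.0],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both neighbouring values of K, it cannot be evaluated for the smallest or largest K tested (here K=1 and K=5; in the study, K=1 and K=7).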
A Principal Component Analysis (PCA) was performed in Genalex [42]. PCA was repeated, only retaining samples with >75% of the SNPs successfully called (171 individuals), and excluding the Asiatic population.
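As a worked illustration of the >75% call-rate threshold applied to the 125 nuclear SNPs, the short Python sketch below computes the call rate from the number of missing genotypes. The function names are hypothetical, and the strict inequality follows the wording ">75%" in the text.

```python
N_NUCLEAR_SNPS = 125    # size of the nuclear SNP panel (from the text)
CALL_RATE_THRESHOLD = 0.75


def call_rate(n_missing: int, n_snps: int = N_NUCLEAR_SNPS) -> float:
    """Fraction of panel SNPs with a successful call."""
    return (n_snps - n_missing) / n_snps


def passes_call_rate(n_missing: int, n_snps: int = N_NUCLEAR_SNPS,
                     threshold: float = CALL_RATE_THRESHOLD) -> bool:
    """True if strictly more than `threshold` of the SNPs were called."""
    return call_rate(n_missing, n_snps) > threshold


def max_missing_allowed(n_snps: int = N_NUCLEAR_SNPS,
                        threshold: float = CALL_RATE_THRESHOLD) -> int:
    """Largest number of missing SNPs that still passes the threshold."""
    return max(m for m in range(n_snps + 1) if passes_call_rate(m, n_snps, threshold))


print(max_missing_allowed())   # missing calls tolerated per sample
print(call_rate(11))           # call rate of a sample missing 11 SNPs
```

Under these assumptions, a sample is retained if it has at most 31 missing nuclear SNPs (94/125 = 0.752), whereas a sample with 32 missing SNPs (93/125 = 0.744) is excluded.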
Patterns of connective zones and barriers were investigated using Estimating Effective Migration Surfaces (EEMS) [43] for all individuals genotyped with the SNP panel. Three independent runs were performed, using 10 million generations and discarding the first 5 million generations as burn-in. We followed the authors’ suggestions for tuning the proposal variances, using 0.1 and 1 for mSeedsProposalS2 and mEffctProposalS2 respectively, and 1.5 and 0.015 for qSeedsProposalS2 and qEffctProposalS2 respectively.
Heterozygosity
The level of heterozygosity was assessed for each individual for which the whole genome had been sequenced, ignoring ambiguous nucleotides (i.e. positions which were not scored due to insufficient quality or coverage). SNP panel results were included if several individuals from the same population had been included. Results were then compared to known levels of heterozygosity based on earlier studies using microsatellite data from the same populations [12, 14].
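A minimal Python sketch of a per-individual heterozygosity estimate that ignores ambiguous positions is given below. The exact estimator used in this study is not specified here, so we assume the simplest definition: the proportion of heterozygous genotypes among the scored variable positions. The function name and the example genotypes are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks an ambiguous (unscored) position


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> Optional[float]:
    """Proportion of heterozygous calls among scored positions.

    Ambiguous positions (None) are excluded from both numerator and denominator.
    Returns None when no position was scored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    n_heterozygous = sum(allele_a != allele_b for allele_a, allele_b in called)
    return n_heterozygous / len(called)


# Hypothetical individual: five variable positions, one of them unscored.
example = [("A", "A"), ("A", "G"), None, ("C", "T"), ("G", "G")]
het = observed_heterozygosity(example)
print(het)
```

Excluding unscored positions from the denominator prevents missing data from directly lowering the estimate; it does not correct for heterozygous sites that are miscalled as homozygous at low coverage.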
Results
Sequencing
The sequencing runs yielded a total of 6.5 × 10^8 reads. Following quality control, a total of 5.9 × 10^8 reads (94.4%) were retained for subsequent alignment (Supplemental Table 1). Filtering of variable positions in lions yielded a total of 155,678 SNPs. Upon filtering for positions which had reliable callings (i.e. coverage ≥3), we retained 118,270 SNPs of which 98,952 SNPs were called in at least five individuals (see Supplemental Table 2 for results for different levels of missing data). Missing data ranged from 90% (Benin) to 10% (Kenya) for all variable positions. Results reported for downstream phylogenetic analysis are based on the lion-specific variable positions which could be called in at least half of the included samples (98,952 SNPs).
Whole genome data and complete mitogenomes
Phylogenetic analyses, based on 98,952 SNPs, show a well-supported dichotomy between the northern populations (Benin, Cameroon, DRC, India) and the southern ones (Somalia, Kenya, Zambia1, Zambia2, Republic of South Africa (RSA), Namibia) (Figure 1: left tree). Phylogenetic trees based on the mitochondrial genomes show the same basal dichotomy, with the Asiatic population nested within Central Africa (Figure 1: right tree), as was previously reported [15]. However, it must be noted that the individual from RSA contains a haplotype from Namibia, likely the result of historic translocation, as was previously described [15]. MrBayes, Garli and SVDquartets resulted largely in the same topology, although the three methods do not agree on the relationships of Zambia1, Zambia2, RSA and Namibia based on the nuclear SNPs. Clustering of these ten individuals, as inferred by sNMF, identifies the same basal split into a northern and a southern cluster (Figure 2: map and inserted barplot).
SNP panel data
A total of 211 samples from across the entire range of the lion were genotyped for 125 nuclear SNPs and 14 mitochondrial SNPs. Missing data for the nuclear SNPs ranged from 0 to 115 with a median value of 11; for the mitochondrial data, these ranged from 0 to 10 with a median value of 0 (Supplemental Table 3, Supplemental Figure 1). STRUCTURE suggests an optimal number of four clusters (Figure 2: graph) corresponding to West & Central Africa, India, East Africa, and Southern Africa (Figure 2: lower barplot). A lower peak can be detected for K=2, distinguishing roughly between the northern and the southern subspecies with India forming a strong separate cluster (Figure 2: upper barplot). Assignment values to clusters and to mitochondrial haplogroups are reported in Supplemental Table 4. In the PCA, African and Asiatic lions are distinguished as two clouds (Figure 3A and 3C). Removing the Asiatic population reveals more structure within African lions (Figure 3B), which becomes increasingly apparent when only samples with >75% of their SNPs successfully called (N=171) are included (Figure 3D). This results in two nearly distinct clouds representing the northern (West and Central Africa) and the southern (East and Southern Africa) subspecies.
EEMS infers corridors and barriers for dispersal and gene flow from the spatial decay of genetic similarity. Results show that the Central African rainforest is highlighted as a barrier (indicated with orange shading, Figure 4: upper row), whether or not the Asiatic population is included in the analysis. Barriers are further identified between East and Southern Africa, and across the Arabian peninsula. Diversity indices illustrate that the Asiatic population has a much lower genetic diversity than the African populations (Figure 4: bottom left). After repeating the analysis with only African populations, low genetic diversity is detected in West Africa and Southern Africa (Figure 4: bottom right).
Heterozygosity
Comparisons of levels of heterozygosity to previously published data [12, 14] (Supplemental Table 5) show that ranking between SNP data and microsatellite data is roughly congruent. However, for the estimates based on whole genome data, levels of heterozygosity are likely to be underestimated in the samples with low coverage (notably Benin and RSA).
Discussion
This study is the first to use whole genome sequencing in an attempt to describe the full genomic diversity over the lion’s range. Phylogenetic analyses show strong congruence with previously published patterns, confirming the basal dichotomy separating a northern and a southern subspecies in the lion. The use of a SNP panel provides further insight into phylogeographic patterns at a higher spatial resolution.
Whole genome data and complete mitogenomes
Although genome-wide SNPs can be a powerful marker, ascertainment bias as a result of the study’s design is a concern [44]. To reflect the full diversity of the species, it is necessary to base the SNP discovery on samples representing all lineages. MtDNA has proven to be a useful genetic marker for gaining insight into phylogeographic patterns, partly because of its shorter coalescence time compared to nuclear markers. However, this may lead to ‘oversplitting’, reflecting fully coalesced groups based on mtDNA data, but incomplete lineage sorting of nuclear DNA (nuDNA) alleles. Different types of markers used in lion phylogeographic studies show largely congruent results with some local discrepancies (e.g. widespread East/Southern Africa haplogroup not recovered from nuDNA data [14], admixture in Kruger area [15, 45]). Together, they provide useful criteria for selecting populations to be subjected to whole genome sequencing.
Based on the whole genome data, we explored different levels of missing data, balancing the number of SNPs and the number of samples with an accepted call at a given position. As a higher number of SNPs represents a denser sampling of the coalescent (e.g. see [46]), presented trees are based on SNPs which were present in at least 50% of the samples. Increasing the number of SNPs (and therefore also the amount of missing data), did not change the topology or support of the phylogenetic trees. Simulation studies and studies on empirical data have shown that concatenated SNP data are able to produce reliable trees reflecting the true topology, as long as enough genes are sampled [47, 48]. The underlying assumption is that there is enough phylogenetic signal in the data, and that discordant coalescent histories will effectively cancel each other out when all histories are considered together. The conclusion that our concatenated phylogenetic trees (MrBayes and Garli) produce reliable topologies is supported by the fact that observed patterns are congruent with results of SVDquartets, which does assume multi-locus unlinked data. An exception is the topology for Southern Africa where patterns of differentiation are likely to be affected by continuous gene flow and incomplete lineage sorting. Further, described topologies are in line with previously described mitochondrial trees [7, 15] with similar discordances as found in microsatellite datasets [14]. Notably, the wide-spread haplogroup labeled as East/Southern Africa (Supplemental Table 4, [15]), stretching from Kenya to Namibia, is not recovered from microsatellite or SNP data [14] (this study). The previously mentioned dichotomy between the northern subspecies, Panthera leo leo, and the southern subspecies Panthera leo melanochaita, is also supported by the assignment values of sNMF. 
The putative hybrid character of the individual from Benin is likely the result of the substantial amount of missing data, and does not reflect its evolutionary origin or admixture.
SNP panel data
The whole genome sequencing of ten individuals allowed identification of variable positions in the lion. Based on the ten whole genomes, we generated a SNP panel of 125 nuclear and 14 mitochondrial SNPs which allows cost-effective genotyping of larger numbers of samples. As part of quality control, i.e. to ensure reliable SNP callings, we explored different concentrations of the starting amount of DNA and the effect of using a Whole Genome Amplification (WGA) kit. In addition, we included a number of samples which were known to be of lower quality DNA (older, degraded samples). We observed an increase in missing values for samples of lower quality or quantity of DNA; however, we never observed changes in the called genotype (allelic drop-out). This illustrates that the SNP panel is producing reliable results, although the sensitivity of downstream analyses to high numbers of missing values needs to be taken into account.
The SNP panel was tested on 211 samples from across the range of the lion, with a special focus on the region where the ranges of the northern and southern subspecies overlap, i.e. Ethiopia and Kenya. This region is known to harbor haplotypes from strongly diverged lineages [15], and microsatellite data have suggested admixture [14]. Our results assign all tested lions to one of the four identified clusters: West & Central Africa, India (the two clades of the northern subspecies, Panthera leo leo), East Africa or Southern Africa (the two clades of the southern subspecies, Panthera leo melanochaita). Main regions of admixture are Ethiopia and Zambia, which was also found using microsatellite data [12, 14]. The high assignment values to West and Central Africa within the Kenyan population are likely to be an artefact of missing data in these individuals; this pattern disappears after repeating the run with only individuals with >75% called SNP data (Supplemental Figure 2). However, as lions from the northern part of Kenya are thought to represent a different evolutionary lineage, some admixture between lineages is plausible and merits further research. It must be noted that when K=2 the result is a split that resembles the distinction between the northern and the southern subspecies; however, due to the very low diversity in the Asiatic population, assignment values are driven to ~1 in the Asiatic lions. Samples from West and Central Africa are largely assigned to the northern cluster; however, they also show assignment to the cluster representing the southern subspecies. We wish to emphasize that this does not indicate region-wide hybridization between the two subspecies, nor does it imply that the populations with the highest assignment values constitute ancestral populations. 
Rather, this result is driven by the extremely low genetic diversity in the Asiatic population, comparable to the Africa/Asia split which is found based on STRUCTURE analysis or PCA of microsatellite data [14]. It is well known that STRUCTURE is sensitive to groups of closely related individuals, such as siblings, family groups, or in our case, an inbred population [49, 50] and that identification of ancestral populations may be an over-interpretation of the data, depending on demographic histories [51]. Also in this case, clustering of West and Central African lions with Asiatic lion decreases after running STRUCTURE with only individuals with >75% SNPs called (Supplemental Figure 2).
Comparison of the assignments based on STRUCTURE with those based on the mitochondrial SNPs shows a largely congruent pattern. Six individuals show an unexpected haplotype (i.e. a haplotype not previously documented from this country): Ethiopia13 – West African haplotype, Ethiopia26 – East/Southern African haplotype, Kenya49 – Central African haplotype, Tanzania1 – North East African haplotype, RSA10 – North East African haplotype, Namibia2 – North East African haplotype. However, it must be noted that all of these individuals have >35% missing nuDNA data (with Kenya49, Tanzania1, RSA10 and Namibia2 even missing >50% nuDNA data). In addition, Ethiopia26, RSA10 and Namibia2 also have >20% missing data in the mitochondrial SNPs, and therefore results should be interpreted with caution.
PCA illustrates the strong clustering of the Asiatic population, likely the result of low diversity from multiple bottlenecks (see above). After excluding the Asiatic population, three overlapping clouds are apparent, roughly indicating West & Central Africa, East Africa and Southern Africa. Including only samples with >75% of the SNPs successfully called, a clearer divide is visible between the northern and southern subspecies in the PCA (Figure 3D). The individual which has a color code from East Africa (red) but falls in the West and Central Africa cluster (green) in the analysis originates from Ethiopia, where both subspecies are known to overlap and admixture has been described before [14, 15]. The distinction between East and Southern Africa seems to be more gradual in PCA space, which is in line with the widespread haplogroup which occurs throughout almost the entire region.
Barriers to dispersal and evolutionary history
In order to put the phylogenetic patterns of the lion into an evolutionary perspective, it is worthwhile to explore current and historical barriers to lion dispersal. Paleoclimatic data show that cyclical contraction and expansion of vegetation zones, e.g. rain forest and desert, may have represented temporal barriers for lion dispersal [15]. The discrete genetic lineages recognizable in the mtDNA are likely to be the result of the restriction of suitable lion habitat to a number of refugia [15]. The pattern found in mtDNA data of the lion is congruent with that of other African savanna species [15, 52, 53] and predicted refugial areas based on climate models [54]. Faster coalescence times of mtDNA may have led to reciprocally monophyletic mtDNA clades in the lion, while isolation in refugia may not have lasted long enough for coalescence in autosomal markers [14, 15]. In addition, dispersal in lions is male-biased [21, 22], which may explain the more discrete phylogeographic pattern found in mtDNA data. This is reflected by the fact that we do not retrieve a North East Africa cluster or a South West Africa cluster based on nuclear SNPs, even though they are represented by diverged mitochondrial haplotypes. Interestingly, the discrepancy in population structure between nuclear and mitochondrial markers in East and Southern Africa, where a wide-spread mitochondrial haplogroup occurs from Kenya to Namibia and nuclear markers suggest a phylogeographic break around Zambia and Mozambique, is identical to the discrepancy found in giraffe [55]. Current barriers for gene flow seem to be mainly presented by the recent population disjunction in North Africa/Middle East and the longstanding barrier representing the Central African rain forest. 
Although the Rift valley has been mentioned as a potential barrier for gene flow in the lion [7–9, 11] and gene flow may be reduced in that region, co-occurrence of strongly diverged haplotypes and admixture detected in microsatellite and SNP data in Ethiopia indicates that the Rift valley does not pose a complete barrier for lion dispersal [14, 15].
Applications for wildlife research and conservation
The design of a SNP panel and a reference dataset of >200 lions from 14 lion range states has important applications for wildlife research and conservation. First, it allows us to distinguish phylogeographic breaks and overlap between lineages which can be used to study the evolutionary history of the species in more detail, e.g. at the national level. This may have important conservation implications and genomic data can be used to inform both population management and translocations. Secondly, it enables tracing of lion samples of unknown origin, such as confiscated material from illegal trade or poaching. Illegal trade in wildlife products is currently estimated to be the fifth largest illegal industry globally, and a major concern for conservationists [56–58]. Part of the trade is thought to cater to domestic markets, but growing evidence suggests an increase in illegal shipments to international markets [59]. Genetic toolkits can contribute to combatting wildlife crimes by identifying confiscated material, source populations and tracking trade routes [45, 60–63]. Thirdly, a SNP panel can be used to guide breeding efforts for ex situ conservation [45, 64]. Recently, this SNP panel was used to genotype a batch of lion samples from institutions linked to the European Association of Zoos and Aquaria (EAZA). Based on the resulting information, managers are currently deciding how many and which lineages to include in future breeding efforts.
Future perspectives
With the increase of genomic data and computational and technological developments, the field of conservation genomics is undergoing a major transition. Demographic histories and patterns of gene flow can be inferred in greater detail [65, 66]. Searching genomes for signatures of adaptation has been highlighted as a powerful tool for gauging the sensitivity of populations in a changing environment [67–70]. New developments allowing SNP genotyping from poor quality (e.g. non-invasively collected) samples will further contribute to the applicability of SNP genotyping in the field [71]. Finally, the development of mobile, hand-held sequencing devices shows great promise for rapid, real-time identification of samples [72]. Such tools represent tremendous potential for training and capacity building [73], especially for biodiverse countries with limited facilities to process samples, thereby making the field of wildlife research and conservation more democratized and inclusive.
This study is the first to report phylogeographic relationships between lion populations throughout their entire range based on ~100,000 SNPs derived from whole genome sequencing. We present a SNP panel, containing 125 nuclear and 14 mitochondrial SNPs which has been tested on >200 individuals from 14 lion range states, spanning most of the lion’s range. The results confirm a basal distinction between a northern and a southern subspecies that supports the recent revision of the lion’s taxonomy. The samples on which the panel was tested can serve as a reference database for future research and conservation efforts.
Authors’ contributions
L.D.B. performed analyses and wrote the manuscript, M.V. performed and J.F.J.L. supervised bioinformatics analyses, O.D.S. performed SNP genotyping, F.L., M.C., P.N.T., E.A.S., H.B., B.D.P. and P.A.W. supplied material and assisted with useful advice, H.H.d.I. and K.V. supervised analyses. All authors contributed to writing the manuscript.
Supplemental Files
Supplemental Information 1. Details on laboratory protocols, sequencing, assembly, SNP calling and quality control
Supplemental Figure 1. Distribution of missing data in 211 lions which were genotyped using 125 nuclear and 14 mitochondrial SNPs.
Supplemental Figure 2. Barplots indicating assignment values for K=2 and K=4 from a STRUCTURE run, including only samples with >75% of the nuclear SNPs successfully called (171 samples and 125 nuclear SNPs).
Supplemental Figure 3. Coverage plots for putative Y-chromosomal scaffolds from one leopard and ten lions. Coverage shown for five scaffolds that had been identified as having a Y-chromosomal origin in cat (Felis catus).
Supplemental Figure 4. Read quality derived from the first Illumina run, containing one leopard and two lion samples. Drop in quality scores for (A) Leopard, (B) Benin and (C) Kenya and quality scores after hard clipping of reads after 30 bp for (D) Leopard, (E) Benin and (F) Kenya.
Supplemental Figure 5. GC content distribution for two lion samples showing signs of bacterial contamination. GC content of raw reads of (A) Benin and (D) RSA, (B+E) reads filtered against main contaminants and (C+F) reads after filtering and aligned against the reference genome of the tiger.
Supplemental Tables 1-7 (see captions in Excel)
Acknowledgements
Samples were kindly provided by P. Henschel, Yohanna Saidu and the Nigerian National Park Service (Nigeria), S. Adam, R. Buij and B. Croes (Cameroon), N. Vanherle (CURESS, Chad), ICCN, Garamba NP (DRC), NABU (Ethiopia, Kafa BR), T. Jirmo (Kenya), South Africa National Parks (SANParks) (RSA), S. Miller, R. Groom, Save Valley Conservancy and DeBeers Venetia-Limpopo Nature Reserve/The Diamond Route (RSA and Zimbabwe), O. Aschenborn (Namibia), C.A.Driscoll (India), Safaripark Beekse Bergen (Hilvarenbeek, The Netherlands) (Somalia), Ouwehands Dierenpark (Rhenen, The Netherlands) (RSA), and Planckendael (Muizen, Belgium) (Leopard). We further thank N. Schidlo, R. Hennevelt, H. Buermans, Y. Ariyurek, and S. Greve-Onderwater for assisting in processing of the samples and BioIT for bioinformatics support.
The investigations were supported by the Division for Earth and Life Sciences (ALW) with financial aid from the Netherlands Organization for Scientific Research (NWO) (project no. 820.01.002).