Tests of hybridisation in Tetragonula stingless bees using multiple genetic markers

Discrepancies in mitochondrial and nuclear genetic data are often interpreted as evidence of hybridisation. We re-examined reports of hybridisation in three cryptic stingless bee species in the genus Tetragonula in South East Queensland, Australia (T. carbonaria, T. davenporti, and T. hockingsi). Previous studies on this group using microsatellite markers proposed that occasional hybrids are found. In contrast, we find that allele frequencies at neutral regions of the nuclear genome, both microsatellites and random snps, reliably separated the three species, and thus do not support hybridisation. We found no inter-species variation in PCR amplicons of the nuclear gene EF1alpha, but low and moderate species-specific polymorphisms in the nuclear gene Opsin and the mitochondrial 16S respectively, with no cases of mito-nuclear discordance at these genes. We confirm that nuclear divergence between these species is low, based on 10-26kb of non-coding sequence flanking EF1alpha and Opsin (0.7-1% pairwise difference between species). However, we find mitogenomes to be far more diverged than nuclear genomes (21.6-23.6% pairwise difference between species). Based on these comprehensive analyses of multiple marker types, we conclude that there is no ongoing gene flow in the Tetragonula species of South East Queensland, despite their high morphological similarity to one another and the low nuclear divergence among them. The mitogenomes and draft nuclear genomes provided for these species will be a resource for further molecular studies on this group, which are important pollinators in Australian natural and agroecosystems.


44
Allopatry is key to speciation, but species boundaries may be tested when populations 45 that diverged in allopatry come back into secondary contact through changes in habitat or T. davenporti we sampled these species in sympatry in South East Queensland. We compared 127 microsatellite data, using the same markers as previous studies, to a genotyping by sequencing 128 approach that yielded 2,049 snp markers with less than 5% missing data. We also look for 129 evidence of mitochondrial introgression, by considering two nuclear gene sequences and one 130 mitochondrial gene sequence of each species. We then use Illumina sequencing to reconstruct Rockhampton (Queensland), two hives of Tetragonula carbonaria from Newcastle (NSW), 142 and two from Sydney (NSW) (Fig. 1). Most T. carbonaria and T. hockingsi colonies were 143 sampled in the greater Brisbane region. We opened each hive and assigned it to species based 144 on brood nest architecture (spiral/layered comb or semi-comb). The three T. davenporti 145 colonies were sampled from the border area of Queensland and New South Wales and were 146 the same colonies used for past genetic work on this species (P. Davenport, pers. comm. 2019).

147
A further two T. carbonaria colonies and one T. hockingsi colony were sampled from the same 148 meliponary as the three T. davenporti colonies. We extracted DNA using a silica spin-column

173
The sequence data were demultiplexed, assembled, and snps called using STACKS 174 (Catchen et al. 2013). We then filtered the vcf using vcftools, with the data first being filtered 175 to a minor allele count of three (one heterozygote and one homozygote), which allows the 176 conservative removal of singleton snps that are likely to be errors, without discarding rare 177 alleles (Linck & Battey 2019). We set a minimum depth of 5 and kept only biallelic snps. We 178 filtered the data for missing data in three steps. First, any marker missing more than 50% data 179 was discarded (i.e. 50% of individuals were not genotyped at that marker), to remove the 180 markers most affected by missing data. Second, any individual that had missing data at more 181 than 50% of the markers was discarded, to remove the individuals that had bad quality 182 genotyping. Finally, any marker missing more than 5% data was discarded to produce a final 183 dataset with relatively little missing data (~3%).  To confirm whether common marker genes showed species-specific polymorphisms, 206 we sequenced two nuclear genes (EF1alpha and Opsin) and one mitochondrial gene (16S).

207
These genes were selected as they have previously been used for a phylogenetic study of 208 stingless bees (Rasmussen and Cameron, 2009    rather than a few specific loci (Fig. 2C). assignment to a species' cluster (Fig. 3). Notably, STRUCTURE results were sensitive to sibling-sampling and missing data. The same analyses using datasets comprising four workers 284 per colony (akin to resampling the same individual multiple times), assigned four individuals 285 (microsatellites) and two individuals (snps) at less than 90% posterior probability of belonging 286 to their species' cluster, thus identifying them as putative hybrids (Fig. 3) The mitochondrial gene 16S, and the nuclear gene Opsin, each grouped samples into 293 three haplotype clusters consistent with species assignment (Fig. 4) (Fig. 4). However, one colony that was labelled T. carbonaria during field collections returned 304 sequences and nuclear markers consistent with T. hockingsi and was assumed to be the result 305 of either labelling error, or a recent interspecific colony takeover. Further, one colony suspected of being T. davenporti was genetically assigned to the T. carbonaria cluster, highlighting the 307 difficulty of identifying them without genetics.

309
Mitogenomes and genomes 310 We were unable to fully resolve the AT-rich control region of the mitochondria, but we 311 did recover all mitochondrial genes. Our assemblies of the mitogenomes for these three species between NAD3 and NAD5.

319
Nuclear genomes returned a mean n50 of 14,954 across the three species (Table 1).

320
From this sequencing, we investigated further the lack of interspecific difference at EF1alpha. 321 We took the EF1alpha sequence and used it as a 'bait' to pull out the regions of the genome 322 surrounding the 768bp PCR product. We managed to select at least 10kb in each species that we sequenced. We thus confirmed that the pairwise difference across these three bee species 327 was indeed extremely low in this region of the nuclear genome (0.7-1%, Fig. 5), relative to that 328 detected in the mitochondrial genome. Similarly, we recovered 26kb around the Opsin gene, 329 which returned the same low nucleotide difference between species (0.7-1%).
of EF1alpha, resolved the evolutionary relationships between the Tetragonula species in the 332 same way, with T. carbonaria and T. davenporti diverging from each most recently, and T. 333 hockingsi as their sister group (Fig 5). assigned at close to 100% into a species group. When we subsampled this dataset to 20% 397 missing data, however, some samples appeared to be hybrids, indicating that STRUCTURE 398 analyses can be sensitive to such levels of missing data. Notably, in this test on the effect of 399 missing data, we subsampled our microsatellite data randomly across markers. In real datasets, 400 microsatellites tend to go missing in a much less random way, generally with one or two loci 401 most affected. The impact of missing microsatellite data is therefore likely to be even greater 402 in practice than in our simulation. STRUCTURE assignments using microsatellites were the 403 basis of hybrid identification in previous studies of Australian Tetragonula     belonging to each of the three hypothetical (K) populations. Analyses were conducted for two snp datasets, one with 1,935 snps and less than 5% missing data, and one with the same number of snps with 20% missing data, using one individual per hive (A). The same analyses were conducted on the two datasets based on seven microsatellite loci and one individual per hive, one with less than 5% missing data and one with 20% missing data (B). The analyses were also run with multiple individuals per hive, mostly four but sometimes fewer, to examine the effect of including closely related siblings (C).