Whole plastid genome-based phylogenomics supports an inner placement of the O. insectifera group rather than a basal position in the rapidly diversifying Ophrys genus (Orchidaceae)

Some lineages of the Orchid genus Ophrys exhibit among the highest diversification rates reported so far. As a consequence of a such intense and rapid evolution, the systematics and the taxonomy of this genus remains unclear. A hybrid assembly approach based-on long- and short-read genomic data allowed us to outperform classical methods to successfully assemble whole plastid genomes for two new Ophrys species: O. aymoninii and O. lutea. Along with three other previously Ophrys plastid genome sequences, we then reconstructed the first whole plastome-based molecular phylogeny including representatives of the three mains recognized Ophrys lineages. Our results support the placement of the O. insectifera clade as sister group of ‘non-basal Ophrys’ rather than a basal position. Our findings corroborate recent results obtained from genomic data (RAD-seq and transcriptomes) but contrast with previous ones. These results therefore confirm that molecular phylogenetic hypotheses based on a limited number of loci (e.g. nrITS, matK, rbcL) may have provided a biased picture of phylogenetic relationships within Ophrys and possibly other plant taxa.


Introduction
Among the most speciose family of flowering plants that orchids (Orchidaceae) form, some lineages of the genus Ophrys display among the highest diversification rates ever reported (Givnish et al. 2015;Breitkopf et al. 2015). The adaptive radiation that Ophrys experience is likely to be due to their unusual pollination strategy (by sexual swindle) that leads to high levels of specialisation of these plants to their insect pollinators and favour evolutionary divergence (see Baguette et al. 2020 for a recent review). The systematic relationships of such fast and importantly diversifying groups are difficult to infer for two reasons. Firstly, because recent divergent times often renders molecular signal of lineage delineation undetectable or at least ambiguous (incomplete lineage sorting) and because emerging species are still 3 particularly prone to introgressive hybridization and reticulate evolution.
The systematics and the taxonomy is particularly problematic in Ophrys for which different authors recognize a number of species ranging from 9 to 354 (see Bateman et al. 2018;Bateman 2018, Bertrand et al., 2021. In particular, contrasting results still make the phylogenetic position of the (three) main lineages (sometimes considered as subgenera) debated. Several molecular phylogenetic studies show that the genus Ophrys is basically subdivided in three main sub-lineages (sometimes considered as subgenera): a first clade formed by the Ophrys insectifera group (also defined as group A since the study of Devey et al. 2008 Bertrand et al. 2021). As Euophrys is a taxonomically incorrect term to refer to Ophrys groups, 'section Ophrys' should be used as a contrast to section Pseudophrys. However, because the section Ophrys form a paraphyletic group, we propose to use 'basal Ophrys' and 'non-basal Ophrys' instead of 'archaic Euophrys' and 'recent Euophrys' for clarity purpose.
Out of the studies that could not unambiguously resolve tree topology for the three main Ophrys lineages (e.g. Soliva et al. 2001;Tyteca and Baguette 2017) (Devey et al. 2008;Breitkopf et al. 2015, Zitoun et al. in prep). However, recent findings based on genomic data: SNPs derived from RAD-seq approaches  or transcriptomes (Piñeiro Fernández et al. 2019) rather support that the group insectifera (A) is directly related to 'non-basal Ophrys' (groups F to H) both of which being sister to the clade comprising 'basal Ophrys' + Pseudophrys (groups B to E) (T inner ).
In this study, we aim to reconstruct a phylogenomic hypothesis to test whether whole We relied on a hybrid approach to assemble the whole plastid genomes of the two Ophrys taxa mentioned above. In brief, this consists in a combination of long reads (here, Oxford Nanopore Technologies reads that can span repeated DNA regions known to be difficult to assemble) with the low error rate of short (paired-end) reads (here, Illumina reads).
Although relatively recent, such hybrid strategy was found to outperform classical approaches (as recently supported by Wang et al. 2018 andScheunert et al. 2020). To do so, we used the Unicyler pipeline (Wick et al. 2017) which is able to analyse reads from both platforms 5 simultaneously. Gene annotation and basic downstream analyses were then carried out as described in our former study (Bertrand et al. 2019, see also Appendix1).

Field sampling and sample processing
We samples were frozen and stored at -20°C until DNA extraction. We used a CTAB2X protocol to extract genomic DNA from the two specimens sampled (see Appendix 1 for details).

Long-read (Nanopore) sequencing
Four Nanopore sequencing libraries (two for each individual) were prepared with the SQK-LSK108 kit following the ONT "1D genomic DNA by ligation protocol" or the "1D gDNA long reads without BluePippin protocol" from 2730/3632 ng and 2964/2500 ng of unfragmented DNA for O. aymoninii and O. lutea, respectively (see Appendix 1 for detail).
Long read sequencing was carried out from four FLO-MIN106D R9 flowcells (two for each individual) on a MinION (Oxford Nanopore Technologies, Oxford, UK) in the lab using with the -discard_middle option turned on. We then used Nanofilt v.2.7.1 (https://pypi.python.org/pypi/NanoFilt) to filter out reads shorter than 5 kb and bases with quality < 9 on both sides of reads. To facilitate assembly, we then extracted plastid reads by mapping short-reads onto a multi-fasta file comprising the three whole plastid genome sequences published for Ophrys species. As in Wang et al. (2018), we duplicated and concatenated each of the three sequences and included them in the reference set to avoid losing reads corresponding to the region where genomes were circularized. We then extracted reads that mapped onto this dataset with Minimap2 v.2.17 (Li, 2018).

Short-read (Illumina) sequencing
Whole genomic libraries were prepared and sequenced in paired-end mode (2x150 bp, insert size: 350 bp) by Novogene Co., Ltd (HK) from 1.92 and 1.97 pg of DNA for O. aymoninii and O. lutea, respectively. Genomic DNA was extracted with the same protocol than the one used for long reads. Raw reads were trimmed with Trimmomatic v.039 (Bolger et al. 2014) and the resulting read quality was checked with FastQC v0.11.8 (Andrews et al. 2010). Plastid read extraction was carried out by mapping short-reads onto the Ophrys plastome dataset, as mentioned above, this time with bowtie2 v.2.3.4 (Langmead and Salzberg, 2012). As too high coverage is prone to disturb the assembly process, we subsampled the resulting read set to an expected coverage of 100X (i.e. by keeping 500,000 of both R1 and R2 reads assuming a plastid genome size of around 150 kb) before the assembly step.

Plastid genome reconstruction and gene annotation
Hybrid de novo assembly was performed with both long-and short-reads simultaneously using default settings in Unicycler v0.4.9b (Wick et al. 2017). Gene annotation and alignments were performed as in our previous study (Bertrand et al. 2019). Some genes (ndhA 7 to ndhK) exhibited significant differences in length and similarity, even between the closely related Ophyrs species considered here, probably as a result of pseudogenisation and were removed from the alignments for further analyses. In particular, O. sphegodes, O. aveyronensis and O. aymoninii presented truncation of most ndh genes and shared the loss of the partially duplicated gene of ycf1 and a truncation of ndhF gene as already reported by Roma et al. (2018), see also Appendix 4.

Phylogenetic reconstruction, concordance factors and tree topology tests
We used the Maximum Likelihood approach implemented in IQ-TREE v2.0.6 (Minh et al., 2020a) to reconstruct gene trees and species tree with 1,000 replicates (-B 1000) of Ultrafast Bootstrap Approximation (UFBoot) to assess nodes support. Species tree was constructed based on three data set: i) whole plastome, ii) genes and iii) CDS alignments. The genealogical concordance in the dataset was also quantified with gene concordance factor (gCF) and site concordance factor (sCF) (Minh et al., 2020b). In addition, tree topology tests implemented in IQ-TREE were performed to test whether the inferred species tree was rather consistent with the inner placement (T inner ) or the basal (T basal ) position of O. aymoninii. To further investigate which loci specifically supports one or the other topology and to what extent (by evaluating its phylogenetic signal), we then computed the difference in gene-wise Log-likelihood scores (ΔGLS, see Shen et al. 2017). For all phylogenetic analyses the plastid genome of another Orchidoideae species, Platanthera japonica (GenBank Accession no.: MG925368) was used as outgroup.

Characterization of the plastid genomes of O. aymoninii and O. lutea and comparison with previously published Ophrys plastid genomes
Following the approach described in our previous study (i.e. using short-reads (Illumina) and

Phylogenetic relationships within the genus Ophrys
We found an overwhelming support for an inner placement (T inner ) (Figure 1). The gene and site concordant factor metrics (gCF and sCF) do not contradict bootstrap values even though they were found to be lower (especially gCF). This may be explained by very short branch lengths and the very limited amount of information contained in each gene/CDS sequence. All the tree topology tests also reject the topology consistent with the basal placement of O. aymoninii when compared to the one consistent with its inner placement ( Table 2). The distribution of ΔGLS values (Figure 2 and Appendix 3) confirms that most of the genes also support this topology 9 and when they do not, they only weakly support the alternative one.
Altogether, our results are therefore congruent with the genomic-based findings recently reported by Bateman et al. (2018) andPiñeiro Fernandez et al. (2019) and contrast with several previous studies. Although being located in a single molecule, plastid regions have been shown to not necessarily behave as a single locus and experience certain forms of intra-and inter-molecular recombination (see Gonçalves et al., 2019;Walker et al., 2019). As most of the plastid genes are also known to encode important biological functions, they may display a sequence evolution patterns that deviate from the species tree topology, and even Ophrys aymoninii singular characteristics that further confirms that the O. insectifera forms a clearly distinct Ophrys lineage. As suggested by other authors for angiosperms, GLS show that the phylogenetic signal of genes such as matK slightly outperform rbcL but that other loci such as ycf1 (but also ycf2), rpoC2 (see Walker et al., 2019), rpoB and rpoC1 may be considered as good candidate for plastid-based phylogenetic analyses in Ophrys.