Genome sequence of the ornamental plant Digitalis purpurea reveals the molecular basis of flower color and morphology variation

Digitalis purpurea (foxglove) is a widely distributed ornamental plant and the producer of the biomedical compound digoxin. Here, we present a long read sequencing-based genome sequence of a red flowering D. purpurea plant and a corresponding prediction of gene models. The high assembly continuity is indicated by the N50 of 4.3 Mbp and the completeness is supported by discovery of about 96% complete BUSCO genes. This genomic resource paves the way for an in-depth investigation of the flower pigmentation of D. purpurea. Structural genes of the anthocyanin biosynthesis and the corresponding transcriptional regulators were identified. The comparison of red and white flowering plants revealed a large insertion in the anthocyanidin synthase gene in white flowering plants that most likely renders this gene non-functional and could explain the loss of anthocyanin pigmentation. In addition, the anthocyanin biosynthesis activator MYB5 shows a 18 bp deletion in white flowering plants that results in the loss of 6 amino acids in the protein. Furthermore, we found a large insertion in the DpTFL1/CEN gene to be responsible for the development of large terminal flowers.


Introduction
Digitalis purpurea, commonly known as foxglove, is a biennial plant in the Plantaginaceae family (Kreis, 2017).The species is native to Europe, but appears in many other parts of the world.D. purpurea has been used for centuries for its medicinal properties, particularly for treating heart conditions such as atrial fibrillation and congestive heart failure (Bucca, 2018;Carroll et al., 2023).The plant contains a variety of cardiac glycosides, with the most important being digoxin (Bucca, 2018).These compounds work by increasing the force and efficiency of heart contractions, which are important to treat heart failure and other cardiac conditions.D. purpurea and the close relative D. lanata are cultivated for the extraction of cardenolides (Kreis, 2017).Scientists have been studying the plant for many years to better understand the chemistry of the cardiac glycosides it produces and their effects on the human body.Many experiments explored the potential of tissue cultures to propagate plants and to improve the yield of medicinal compound production (Kreis, 2017).A cytochrome P450 CYP87A4 is crucial for the biosynthesis of digoxin (Carroll et al., 2023).The identification of all genes required for a biosynthesis process is the first step towards production in a heterologous system.
Besides its medicinal value, D. purpurea is also famous for beautiful flowers (Fig. 1a).It is a tall plant that can reach up to six feet in height and produces striking purple or pink flowers on tall spikes.Various Digitalis species display a plethora of diverse flower colors and shapes.While the plant is visually appealing, it is considered toxic due to the formation of specialized metabolites like the cardenolides.Nonetheless, D. purpurea is widespread as an ornamental plant.Contributing to the flower colors are anthocyanins, the product of one specific branch of the flavonoid biosynthesis (Fig. 1b).The substrate of the flavonoid biosynthesis is the aromatic amino acid phenylalanine, which is channeled through the general phenylpropanoid pathway and the flavonoid biosynthesis towards anthocyanins (Winkel-Shirley, 2001;Pucker & Selmar, 2022).Major branches of the flavonoid biosynthesis lead to colorful anthocyanins, colorless flavonols, colorless flavones, and proanthocyanidins which appear dark after oxidation.The anthocyanin biosynthesis shares several enzymatic steps with other pathways like the proanthocyanidin biosynthesis.Dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS), UDP-glucose dependent 3-O-anthocyanidin-glucosyltransferase (UFGT), and other decorating enzymes are often considered as the anthocyanin biosynthesis branch (Winkel-Shirley, 2001;Pucker et al., 2020) (Fig. 1c).A recent study revealed anthocyanin-related glutathione S-transferase (arGST) as an additional enzyme in the anthocyanin biosynthesis (Eichenberger et al., 2023).After biosynthesis of anthocyanins at the endoplasmic reticulum, a transport through the cytoplasm and import into the central vacuole is required for long term storage.This process involves several proteins including two tonoplast-localized transporters and a tonoplast-localized ATPase for generation of a proton gradient (Goodman et al., 2004;Zhao & Dixon, 2009;Pucker & Selmar, 2022).A previously postulated 'ligandin', that might protect anthocyanins during the transport through the cytoplasm (Mueller et al., 2000), was recently characterized as an enzyme in the cyanidin biosynthesis, which makes an involvement in the anthocyanin transport less likely (Eichenberger et al., 2023).Besides a direct transport through the cytoplasm, there are several studies that report an involvement of vesicles in the anthocyanin transport in different plant species (Nozue & Yasuda, 1985;Grotewold et al., 1998;Zhang et al., 2006;Poustka et al., 2007;Pucker & Selmar, 2022).
The pigmentation of a plant structure is not only determined by the activity of the anthocyanin biosynthesis, but also by the activity of pathways that compete for the same substrate.High activity of the flavonol biosynthesis can deplete the available substrate of the anthocyanin biosynthesis branch (Luo et al., 2016).Subtle differences in substrate specificity of the competing enzymes e.g.flavonol synthase (FLS) and DFR can determine the pattern of synthesized flavonoids (Johnson et al., 2001;Choudhary & Pucker, 2024).Differences in expression of the first committed genes DFR and FLS, respectively, appear as a decisive factor in controlling the activity of the anthocyanin and flavonol biosynthesis branch (Choudhary & Pucker, 2024).
Activity of different branches of the flavonoid biosynthesis is mostly controlled at the transcriptional level through regulation of gene expression by transcription factors (Weisshaar & Jenkins, 1998;Li, 2014;Xu et al., 2015;Lloyd et al., 2017).Decades of research turned the regulation of the flavonoid biosynthesis into a model system for research on cis-and trans-regulatory elements in eukaryotes.The anthocyanin biosynthesis genes are activated by a complex comprising three different proteins: a R2R3-MYB protein, a bHLH protein, and a WD40 protein (Ramsay & Glover, 2005;Gonzalez et al., 2008;Lloyd et al., 2017).Different subgroup 6 MYBs and subgroup IIIf bHLHs can be included in this complex leading to a diverse set of different complexes which differ in their ability to activate certain target genes (Feller et al., 2011;Albert et al., 2021).The WD40 component is often TTG1, but LWD1 and LWD2 can also participate in complex formation (Airoldi et al., 2019).An additional WRKY component (TTG2) has been suggested to specifically increase the expression of genes associated with the anthocyanin transport into the vacuole (Gonzalez et al., 2016;Lloyd et al., 2017).Anthocyanin biosynthesis-regulating MYBs have been reported in different MYB clades.PAP1 (MYB75)/PAP2 (MYB90)/PAP3 (MYB113)/PAP4 (MYB114) are well known anthocyanin regulators in Arabidopsis thaliana (Borevitz et al., 2000;Tohge et al., 2005;Gonzalez et al., 2008).MYB5 was identified as an anthocyanin regulator in Fragaria (Jiang et al., 2023), Vitis (Cavallini et al., 2014), Mimulus (Zheng et al., 2021), and Nelumbo (Sun et al., 2016).It was shown that MYB5 forms a MBW complex with LWR1/LWD1-like instead of TTG1 to activate the anthocyanin biosynthesis (Jiang et al., 2023).A study in Vaccinium suggests an involvement of the proanthocyanidin regulator MYB123 in the anthocyanin biosynthesis (Karppinen et al., 2021).While most anthocyanin regulating MYBs are activators, the incorporation of subgroup 4 MYBs into the MBW complex can result in an inhibitory function (Albert et al., 2014).
Pigmentation differences between plants of the same species are often due to genetic variations affecting the functionality or gene expression of the MYB and bHLH transcription factors (Quattrocchio et al., 1999;Streisfeld et al., 2013;Wheeler et al., 2022;Marin-Recinos & Pucker, 2024).This results in extremely low or no transcriptional activation of specific flavonoid biosynthesis genes under certain conditions.However, activation of the structural genes in the context of a different pathway is still possible (Marin-Recinos & Pucker, 2024).Loss of function or gene loss affecting biosynthesis genes or transporters involved in the biosynthesis pathway are rare, but can explain deeper loss of the anthocyanin pigmentation in the Caryophyllales and might be irreversible (Pucker et al., 2024).Changes in flower color often affect the attraction of pollinating animals and are thus subject to selection.For example, the transition of purple blue to orange-red in Ipomoea quamoclit L. probably facilitated the shift from bee pollination to hummingbird pollination (Zufall & Rausher, 2004).A study in Silene littorea discovered that a lack of anthocyanins exclusively in flower petals appeared as a stable polymorphism, but complete loss of anthocyanins in all plant parts was rare suggesting anthocyanins are the target of selection in photosynthetic tissues (Del Valle et al., 2019).
The flower pigmentation of D. purpurea is not uniform, but characterized by intensely pigmented dark spots on the bottom petals (Fig. 1c).Petal spots have been reported in other plant species including Gorteria diffusa (Thomas et al., 2009), Kohleria warszewiczii (Fairnie et al., 2022), Clarkia gracilis (Martins et al., 2013), Lilium leichtlinii (Wang et al., 2023), and Mimulus guttatus (Ding et al., 2020).Pigmentation spots on petals can be caused by the accumulation of high levels of anthocyanins (cyanidin derivatives) (Martins et al., 2013;Zhang et al., 2015).These spots can be surrounded by a white region characterized by high levels of flavonols, the products of a pathway competing with anthocyanin biosynthesis (Yuan et al., 2016).Formation of intensely pigmented spots can be explained by a system involving a local R2R3-MYB activator (NEGAN) and a lateral R3-MYB repressor (RTO) (Ding et al., 2020).NEGAN interacts with a bHLH and a WD40 partner (MBW complex) to activate RTO expression and RTO represses NEGAN expression by sequestering bHLH proteins required for the MBW complex (Ding et al., 2020).RTO proteins appear to travel through the endosomal system to other cells, where they act as repressors (Ding et al., 2020).Another study in C. gracilis reported an anthocyanin activating MYB (CgsMYB12) as the causal gene for the formation of a single large spot (Lin & Rausher, 2021) indicating that alternative mechanisms have evolved to control the formation of pigmentation spots.Numerous functions have been described for such spots including an attraction and guidance of pollinating insects to increase the number of visits and to make each event more successful (Sasaki & Takahashi, 2002;Medel et al., 2003;Lunau et al., 2006;Davies et al., 2012;Fairnie et al., 2022).The inflorescence of D. purpurea is usually constructed in the form of an indeterminate spike with zygomorphic tubular flowers on the side (Rudall, 2016).But in rather rare cases individuals develop large terminal flowers at the top of the spike, rendering the inflorescence determinate.These phenotypes show two types of flowers in the same plant: the normal zygomorphic flowers at the lower part of the spike and one actinomorphic structured flower at the top (Fig. 2).Keeble et al. investigated the heredity of this special trait and found it to be inherited recessively (Keeble et al., 1910).The characterization of the genetics behind the development of a terminal flower in D. purpurea was not investigated yet.The occurrence of terminal flowers in plants with otherwise indeterminate inflorescence architecture has been described in different species like Antirrhinum (Bradley et al., 1996) and Arabidopsis (Shannon & Meeks-Wagner, 1991).In Antirrhinum, a loss of function mutation in the CEN (CENTRORADIALIS) gene leads to the development of a terminal flower (Bradley et al., 1996).The functional CEN gene is expressed in the subapical meristem and disrupts the initiation of flower development (Bradley et al., 1996).Therefore, CEN is a flowering repressor.The Arabidopsis ortholog TERMINAL FLOWERING 1 (TFL1) plays a similar role (Bradley et al., 1997).The mutant tfl1 no longer represses flowering in the apical meristem, so it changes its fate and develops into flower primordia and subsequently into a flower (Shannon & Meeks-Wagner, 1991;Alvarez et al., 1992).Both genes encode a protein belonging to the family of phosphatidylethanolamine binding proteins.Therefore, they are not conventional regulators of the transcription or translation of any specific target genes (Pnueli et al., 2001).They have been shown to interact directly with other proteins so they might be part of a more elaborate signaling system (Pnueli et al., 2001).Here, we reported about the genome sequence of D. purpurea and the corresponding annotation.As a proof of concept, we demonstrate the power of this genome sequence for the discovery of specialized metabolism pathways by investigating the flavonoid biosynthesis.A comparison of white and red flowering individuals revealed molecular differences in the flavonoid biosynthesis genes that could explain the pigmentation loss in the white flowering plants.The molecular basis of large terminal flowers was discovered through comparative genomics based on this genome sequence.

Plant material and DNA extraction
Digitalis purpurea is a native species in Germany that is widespread at the borders of forests and in gardens.We collected seeds of a white and red flowering plant in a garden close to these coordinates 51.92351 N, 8.84441 E. Plants were cultivated under long day conditions (16h light, 8h darkness) at 20°C.LEDs (Niello ® , 300W) were used as the sole light source.Plants were incubated in the dark for two days immediately prior to the sampling of leaves for the DNA extraction to reduce polysaccharide content.Young leaves were harvested for the DNA extraction.The DNA extraction was performed with a CTAB-based protocol as previously described (Siadjeu et al., 2020).Quality and purity of the extracted DNA were assessed through NanoDrop measurement, agarose gel electrophoresis, and Qubit measurement.Short fragments (<25kb) were depleted with the Short Read Eliminator kit (Pacific Biosciences) (see (Wolff et al., 2023) for a detailed workflow).

Nanopore sequencing
The library preparation for sequencing was started with 1µg of high molecular weight DNA.The initial DNA repair was performed with two enzyme mixes of the NEBNext ® Companion Module Oxford Nanopore Technologies ® (ONT) Ligation Sequencing (E7180L) following the suppliers' instructions.All following steps were performed according to the SQK-LSK109 protocol (ONT).Sequencing was performed with MinIONs using R9.4.1 flow cells.Upon blockage of a substantial number of nanopores, a wash step was performed with the EXP-WSH004 following ONT's instructions to recover the blocked nanopores.Afterwards, a fresh library was loaded onto the flow cell.High accuracy basecalling was performed with guppy v6.4.6+ae70e8f (ONT) and minimap2 v2.24-r1122 (Li, 2018) on a graphical processor unit (GPU) in the de.NBI cloud.

Structural annotation
Paired-end RNA-seq reads (SRR128113,SRR128114,SRR128115,ERR2040595,Additional file 1) were retrieved from the Sequence Read Archive via fastq-dump (NCBI, 2020) and aligned to the assembled D. purpurea genome sequence with STAR v2.5.1b (Dobin et al., 2013;Dobin & Gingeras, 2015) in 2-pass-mode using a minimal similarity of 95% and a minimal alignment length of 90% as previously described (Haak et al., 2018) to provide hints for the gene prediction process.Gene models were predicted with BRAKER v3.0.2 (Gabriel et al., 2023) based on the Arabidopsis thaliana species parameters and GeneMark v4 (Brůna et al., 2020).BUSCO v3.0.2 (Simão et al., 2015;Manni et al., 2021) was used to assess the completeness of the genome sequence and the completeness of the gene prediction with BRAKER.The genomic BUSCO assessment command contained the arguments '--long --limit 10 -sp arabidopsis', 10 -3 as e-value cutoff for BLAST, and eudicotyledon_odb10 (embryophyta) as the reference.AUGUSTUS v3.2.2 (Stanke et al., 2006;Keller et al., 2011) was used for the gene prediction.HMMER v3.1b2 (Finn et al., 2011;Mistry et al., 2013;Eddy, 2023) was run to confirm orthology of candidates.The entire process was repeated after a species specific training for optimization of the gene prediction.When running BUSCO in proteome mode, the same configuration was used except for the gene prediction arguments.All predicted protein encoding genes were checked for expression evidence to differentiate between active genes and potentially inactive pseudogenes.A cutoff of 1 TPM in any of the analyzed RNA-seq datasets (Additional file 1) was defined to distinguish active genes from potentially inactive pseudogenes.

Functional annotation
Homologs in Arabidopsis thaliana were identified for the predicted peptide sequences to transfer annotation terms from this model organism to the D. purpurea sequences.Previously developed Python scripts (Pucker et al., 2017;Haak et al., 2018) were modified to enable the efficient identification of Reciprocal Best BLAST hits (RBHs) supplemented with best BLAST hits for query sequences that are not assigned to a RBH partner.BLASTp v2.13.0+ (Altschul et al., 1990(Altschul et al., , 1997) ) was run with an e-value cutoff of 0.0001.This approach allows an efficient identification of putative orthologs and forms the basis for the transfer of functional annotation details across species borders.Arabidopsis thaliana TAIR10/Araport11 annotation terms were automatically transferred to the predicted D. purpurea sequences.This workflow is implemented in the Python script construct_anno.py(Pucker & Iorizzo, 2023).Enzyme and transporter encoding genes of the flavonoid biosynthesis were identified using KIPEs v3.2.2 with the flavonoid biosynthesis data set v3 (Pucker et al., 2020;Rempel et al., 2023) with additional parameters listed in Additional file 2. Flavonoid biosynthesis controlling MYB transcription factors were annotated with the MYB_annotator v0.231 (Pucker, 2022).The following parameters were used for the MYB annotation: initial candidate identification via BLASTp v2.13.0+ (Altschul et al., 1990(Altschul et al., , 1997)), alignment construction with MAFFT v7.475 (Katoh & Standley, 2013), phylogenetic tree construction with FastTree v2.1.10( Price et al., 2010), minimal BLASTp alignment length of 75 amino acids, and 10 neighbors were used for the ingroup/outgroup classification (see Additional file 3 for details).The bHLH transcription factors were annotated using the bHLH_annotator v1.03 (Thoben & Pucker, 2023).The following parameters were used for the bHLH annotation: identification of initial candidates via BLASTp v2.13.0+ (Altschul et al., 1990(Altschul et al., , 1997)), alignment with muscle v5.1 (Edgar, 2022), phylogenetic tree construction with FastTree v2 (Price et al., 2010), a minimal BLASTp hit similarity of 40%, minimal BLASTp hit alignment length of 80 amino acids, and considering 10 neighbors for the ingroup/outgroup classification (see Additional file 4 for details).

Analysis of gene expression
Fastq-dump (NCBI, 2020) was applied to retrieve the FASTQ files of D. purpurea RNA-seq experiments from the Sequence Read Archive (SRA).Gene expression was quantified using kallisto v0.44.0 (Bray et al., 2016) to process all RNA-seq data sets (Additional file 1) based on the predicted coding sequences.Individual count tables were merged with a customized Python script and served as the basis for a visualization of gene expression as previously described (Pucker & Iorizzo, 2023;Choudhary & Pucker, 2024).

Identification of sequence variants associated with anthocyanin loss
Reads of two red flowering plants and two white flowering plants (Additional file 5) were aligned against the reference genome sequence of a red flowering plant with minimap v2.24-r1122 (Li, 2018).The resulting BAM files were indexed with samtools v1.13 (Li et al., 2009).Candidate genes associated with the flavonoid biosynthesis were manually inspected for systematic sequence differences via Integrative Genomics Viewer (IGV) v2.15.4 (Robinson et al., 2023).

Genotyping of a Digitalis purpurea population
A population of 89 plants was subjected to genotyping via PCR for the discovered mutations responsible for the white flower color or the changed flower morphology, respectively.The plants were first grown under long day conditions (16h light, 8h darkness) at 20°C.LEDs (Niello ® , 300W) were used as the sole light source during that time.After four months the plants were moved outside the plant cultivation chamber and placed into two beds located at 52.2804716735375 N and 10.548701669439103 E. DNA for subsequent genotyping PCR was extracted based on the GABI-Kat CTAB DNA preparation protocol (Rosso et al., 2003).This protocol, designed for A. thaliana, was adapted for the use on D. purpurea: Only one young leaf was harvested from each plant.After extraction the DNA samples were stored at -20 °C.
PCR was conducted with oligonucleotides flanking the ANS and the TFL1/CEN loci, respectively (Additional file 10, Additional file 11).Sizes of the PCR products were analyzed via agarose gel electrophoresis and revealed the genotypes.Initial phenotypes of the plants resulting from the ANS mutation were derived from the visual inspection of the leaves as D. purpurea with a functional ANS would show red colored petioles (Additional file 12).These phenotypes were later validated by adding information about the flower color.

Genome sequence assembly and structural annotation
A red flowering plant (DR1) was selected for sequencing and construction of a reference genome sequence (Pucker et al., 2023).The rationale behind this selection was the assumption that a red flowering plant should have all genes required for the biosynthesis of anthocyanins which are most likely responsible for the red flower pigmentation.For comparison, another red flowering plant (DR2) and two white flowering plants (DW1, DW2) were subjected to nanopore long read sequencing (Additional file 5).
Different assemblers were evaluated with respect to their performance in this study (Table 1).The assembly produced by NextDenovo2 was selected as the representative D. purpurea genome sequence due to the large assembly size of 940 Mbp, the high continuity (N50=4.3Mbp),and the high completeness of 96% of complete BUSCOs.Only the representative genome sequence DR1_v1 was subjected to a gene prediction process that initially revealed 329,325 protein encoding genes (DR1_v1a1; (Pucker et al., 2023).A completeness check revealed 93% complete BUSCO genes in this annotation.The presence of 44% duplicated BUSCO genes suggested that a large fraction of the assembly represents separate haplotypes.This was supported by a coverage analysis that found an average coverage of 20x for large parts of the assembly, while small parts showed a coverage of around 40x (Additional file 6).The analysis of transcript abundances based on RNA-seq data revealed 56,065 active genes (TPM > 1 in at least one sample), while 273,260 predicted genes showed close to zero evidence for expression in any of the studied samples (Additional file 7).
Additionally, the transcription factors controlling the anthocyanin biosynthesis genes in D. purpurea were investigated.MYB genes placed in the MYB5 lineage (g25977, g268730, g7955,g166017), in the MYB75/PAP1 lineage (g147607), and in the MYB123/TT2 lineage (g26269) were identified.Candidates for the bHLH partner in the anthocyanin biosynthesis-regulating MBW complex (bHLH42/TT8) were identified in D. purpurea: g55107.t1and g55108.t1.Additionally, orthologs of the WD40 genes TTG1 (g109077) and LWD1 (g183911,g251558), and for the bHLH MYC4 (g166063, g268685) were identified in a phylogenetic analysis.In summary, all genes expected for a complete anthocyanin biosynthesis pathway were successfully identified in the reference genome sequence DR1_v1.Additional support for candidate genes comes from a gene expression analysis that was conducted across a wide range of different samples (Fig. 4, Additional file 1).Anthocyanin biosynthesis genes are expected to be highly expressed in pigmented structures like flowers and in particular within the intensely pigmented purple spots on the lower flower petal.A consideration of the gene expression pattern enabled a reduction of the candidate gene set to those gene copies which appear active in the anthocyanin-pigment plant organs.Sequence variants underlying pigmentation differences All identified candidate genes associated with the anthocyanin biosynthesis were manually inspected in a read mapping comparing red flowering and white flowering plants.The most striking difference is an insertion of approximately 12 kb large into the ANS gene.This insertion is homozygous in the two white flowering plants, while the red flowering plants harbor an ANS allele without this insertion.Since the reference genome sequence DR1_v1 contains the 12 kb sequence, the automatic prediction of the ANS gene model is inaccurate.The alignment of RNA-seq reads indicates that ANS is not located at the end of this insertion and downstream of it as suggested by the gene model, but instead spans the entire insertion with one exon being located upstream (Fig. 5).
Genotyping of 89 plants for the ANS mutation marker identified 14 plants with the homozygous wildtype allele, 31 plants with a heterozygous genotype, and 44 plants with the homozygous mutant allele.Unfortunately, only a part of the population has flowered so far.For all flowering plants the presence or absence of anthocyanins in the plant has been accurately derived from the leaf stem by visual inspection as confirmed by the flower colors.For 14 plants, flower color phenotypes were identified.These phenotypes correlate perfectly with the plants' genotypes thus supporting ans as the causal locus in white flowering D. purpurea plants.Another systematic difference between the red group and the white group is located inside the MYB5 candidate g25977 (DpMYB5).This gene showed a homozygous 18 bp deletion in the sequencing reads of the white flowering plants (Fig. 6).At least some reads of the red flowering plants contain this 18 bp deletion thus suggesting that the mutation cannot be homozygous in the red flowering plants.The 18 bp deletion leads to the lack of the amino acids "KKVKET" at position 152 to 157 in the DpMYB5 protein.Structural analysis with alphafold2 revealed the location where the difference between the mutant and the wildtype protein is located.This loss of six amino acids does not affect the well conserved MYB domain, but might influence the transcriptional activation ability of this protein.Sequence variants leading to the formation of a large terminal flower In the search for the genetic basis of the formation of large terminal flowers the sequencing reads of one phenotypical plant were compared to three wildtype plants.Upon investigation of the Digitalis orthologs of the Arabidopsis TFL1 a 14337 bp insertion was found starting only 4 base pairs after the start codon of the gene and leading to a premature stop codon after 4 codons (Fig. 7).The insertion has a high similarity to the insertion in the ans gene described above.

Discussion
The highly continuous genome sequence of a red flowering D. purpurea and the corresponding annotation of protein encoding genes enabled the identification of genes associated with the anthocyanin biosynthesis.The original number of 273,260 predicted protein encoding genes was reduced to the more realistic number of 56,065 genes by considering only genes that show solid transcript evidence in any of the available RNA-seq datasets.Similar approaches have been discussed and applied before to reduce large numbers of predicted gene models based on transcription evidence to the most important ones (Liang et al., 2009;Dohm et al., 2014;McGrath et al., 2022).Given that structural annotation workflows do not provide perfectly accurate results, evaluation and filtering is recommended (Vuruputoor et al., 2023).Although RNA-seq samples from a range of different plant organs and treatments were included in this filtering based on transcript evidence, it is possible that transcripts of bona fide genes were not detected in the RNA-seq data leading to a mis-classification of the corresponding genes as non-expressed pseudogene.According to previous reports, only 60-70% of all encoded genes are typically expressed in a given sample (Salojärvi et al., 2017).Given that plant genomes harbor large numbers of genes that are activated in response to stress conditions, it seems obvious that many genes are only active under specific conditions.
Long read sequencing of a white flowering individual enabled the discovery of sequence variants that might explain the lack of pigmentation.ANS encodes the anthocyanidin synthase, a crucial enzyme for the anthocyanin biosynthesis, and shows a homozygous insertion of a large DNA fragment in the two white flowering plants, while the two red flowering plants show at least one allele without this insertion.The consequence of this insertion is most likely a loss of ANS activity that disrupts the anthocyanin biosynthesis.A loss of ANS activity fits all observations regarding the presence/absence of flavonoids in D. purpurea.We noticed that flavonols are produced in the white flowering plants (Additional file 9).There is an overlap regarding the enzymes required for flavonol and anthocyanin biosynthesis.CHS, CHI, F3H, and F3'H are shared between both pathways.The presence of flavonols indicates that the genes CHS, CHI, F3H, and F3'H are functional, because they are required for the flavonol biosynthesis (Forkmann et al., 1980;Ferrer et al., 1999;Jez et al., 2000;Brugliera et al., 2002;Saito et al., 2013).This observation reduces the set of remaining structural genes in the anthocyanin pathways to DFR, ANS, arGST, and UFGT.While we cannot completely rule out that DFR, arGST, or UFGT contribute to the loss of anthocyanin pigmentation, we have not identified evidence for systematic sequence variants in these genes.Several previous studies discovered loss-of-function mutations in ANS as the cause of anthocyanin pigmentation loss (Abrahams et al., 2003;Kim et al., 2004;Jiang et al., 2020).While the above described ans mutation could be a universal explanation for the lack of pigments in white D. purpurea flowers.Theoretically, it remains possible that different mechanisms are responsible for the lack of pigmentation in different white flowering D. purpurea individuals.
As transcription factors are often implicated in the loss of anthocyanin pigmentation (Marin-Recinos & Pucker, 2024), the identified candidate genes were checked for sequence variants associated with the anthocyanin loss.An ortholog of the previously described anthocyanin biosynthesis regulator MYB5 (Jiang et al., 2023) was discovered to show a 18 bp deletion in D. purpurea.While this sequence variant could disrupt the activator domain of this transcription factor, the low expression of this gene in flower samples suggest that DpMYB5 might only control the anthocyanin formation plant parts except for the flower.However, an analysis with genetic markers suggest that DpMYB5 is at least genetically linked to the locus responsible for the flower pigmentation.Expression of DpMYB5 in the leaves suggests that this MYB might trigger the anthocyanin biosynthesis in response to light stress.
Whether or not the absence of color and spot pigmentation in white flowering plants has any effect on pollination efficiency remains to be tested.However, there are many examples of plants where different visual cues in the flower lead to increased pollination efficiency (Owen & Bradshaw, 2011;Abid et al., 2022;Richter et al., 2023).The visual cues can consist of flower forms indicating landing sites for insects, simply pigmented flowers letting the flower stand out against the background or contrasting patterns indicating landing-position and -direction for insects on the flower (Johnson & Midgley, 1997;de Jager et al., 2017).This suggests the assumption that the pigmentation patterns in D. purpurea flowers contribute to increased attractiveness of the flower for pollinators.The contrasting spots on the lower petal possibly indicate a landing zone and the two UV fluorescent stripes at the base of the connation of the side petal with the lower petal may act as nectar guides (Additional file 9).
The present study focuses on explaining the completely white flowering phenotype (Fig. 1), but there are also cases of white flowers with red spots in D. purpurea.Similar patterns have been described before in other plant species (Ding et al., 2020).It seems plausible to assume that a similar local activator/lateral repressor system comprising NEGAN and RTO also underlies the pigmentation pattern in D. purpurea.Further investigations are required to test this hypothesis.
DpTFL1/CEN is an ortholog of AmCEN and AtTFL1, genes crucial for the development of an indeterminate inflorescence architecture.Loss-of-function mutations of these genes have been shown to lead to large terminal flowers (Bradley et al., 1996(Bradley et al., , 1997)).The system responsible for inflorescence architecture seems to be highly conserved among plants (Pnueli et al., 2001), suggesting a similar system in D. purpurea as well.Genotyping of a D. purpurea population for the DpTFL1/CEN locus revealed that only the homozygous mutant genotype leads to the development of a determinate inflorescence architecture, corroborating the findings of Keeble et al. (Keeble et al., 1910).

Declarations Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Fig. 2 :
Fig. 2: Three terminal flowers.(A): A purple terminal flower with purple spots originated from the Botanical Garden of TU Braunschweig (B and C): White terminal flowering plants from the garden of the Institute of Plant Biology, TU Braunschweig.

Fig. 5 :
Fig.5: Comparison of the ANS locus in red and white flowering D. purpurea plants revealed a large insertion that appears homozygous in the white flowering plants.The insertion is expected to be a loss-of-function mutation that results in the loss of ANS enzymatic activity and thus a block in the anthocyanin biosynthesis ultimately resulting in the white flower phenotype.

Fig. 6 :
Fig.6: Genomic region showing the deletion of 18 bp in the coding sequence of DpMYB5 (g25977) in long reads belonging to white flowering plants, while reads to red flowering plants show that this region is present.

Fig. 7 .
Fig. 7. Genomic region showing the insertion of 14 kb in the DpFL1/CEN (g55597) in the reads of the plant with a terminal flower (red box).The plants with indeterminate flowering architecture show at least some reads without the insert.

Table 1 :
Comparison of assembler performance for the analysis of Digitalis purpurea long read sequencing data of the red flowering plant DR1.