HIGH-RESOLUTION PROFILING OF BACTERIAL AND FUNGAL COMMUNITIES USING PANGENOME-INFORMED TAXON-SPECIFIC AMPLICONS AND LONG-READ SEQUENCING

High-throughput sequencing technologies have greatly advanced our understanding of microbiomes, but resolving microbial communities at species and strain levels remains challenging. In this study, we developed and validated a pipeline for designing, multiplexing, and sequencing highly polymorphic taxon-specific amplicons using PacBio circular consensus sequencing. We focused on the wheat microbiome as a proof-of-principle and demonstrate unprecedented resolution for the wheat-associated Pseudomonas microbiome and the ubiquitous fungal pathogen Zymoseptoria tritici. Our approach achieved an order of magnitude higher phylogenetic resolution compared to existing ribosomal amplicons. We show that the designed amplicons accurately capture species and strain diversity, outperforming full-length 16S and ITS amplicons. Furthermore, we tracked microbial communities in the wheat phyllosphere across time and space to establish fine-grained species- and strain-specific dynamics. To expand the utility of our approach, we generated pangenome-informed amplicon templates for additional key bacterial and fungal genera. The strain-level microbiome profiling enables the tracking of microbial community dynamics in complex environments and is applicable to diverse ecological niches. Overall, our work demonstrates how pangenome-informed amplicons overcome limitations in phylogenetic resolution to unravel microbial strain diversity and dynamics.


INTRODUCTION
we focus on the microbiome of wheat (Laino et al., 2015). The fungal microbiome of wheat is often dominated by the pathogenic fungus Zymoseptoria tritici causing septoria tritici blotch (Kerdraon et al., 2019; Barroso-Bergada et al., 2022). The bacterial microbiome is dominated by Gammaproteobacteria including the genus Pseudomonas both above and below ground (Chen et al., 2021; Kavamura et al., 2021). The fungal pathogen Z. tritici significantly impacts the composition of microbial communities associated with wheat (Kerdraon et al., 2019). For instance, Z. tritici suppresses the host immune system to facilitate colonization by strains of the P. syringae group (Seybold et al., 2020), while specific P. fluorescens strains inhibit the growth of Z. tritici (Levy, Eyal and Chet, 1988).

In this study, we introduce a pipeline to establish and validate a novel suite of highly multiplexed amplicons with over an order of magnitude higher phylogenetic resolution compared to existing ribosomal amplicons. As a proof-of-principle, we target the wheat-associated Pseudomonas microbiome and the major fungal pathogen Z. tritici. We achieve unprecedented species- and strain-level resolution for both groups in mixed samples. We highlight the substantial gains in phylogenetic resolution by tracking strains across the wheat canopy and over time.

PANGENOME-INFORMED DESIGN OF TAXON-SPECIFIC AMPLICONS
We developed new 3-kb amplicons to enhance the resolution of bacterial and fungal community profiling while maintaining universal amplification within the targeted group of organisms (Figure 1A). To identify highly polymorphic Pseudomonas amplicons, we constructed a comprehensive pangenome of 18 high-quality genomes representing all subgroups of the genus. From this analysis, we identified 28 core regions conserved in the Pseudomonas pangenome. These core regions served as candidate regions for primer development. For each core fragment, we designed all possible amplicons ranging from 2.7 to 3.2 kb, resulting in 224 amplicon candidates. Nucleotide diversity in aligned core regions was used as a metric to prioritize amplicon candidates.

We selected the ten most polymorphic amplicon candidates for PCR evaluation using both Pseudomonas reference strains and naturally colonized wheat leaf samples. Candidate primer sequences were adjusted based on alignment against all available sequences of Pseudomonas strains to maximize recovery of the Pseudomonas diversity. We allowed primer candidates to include up to five degenerate bases. To reduce the number of sequence variants to be considered in primer candidates, we reduced respectively. We pursued a parallel approach to identify primers suitable for amplifying intra-specific diversity of the fungal pathogen Z. tritici. We based our analysis on a recently established reference-quality global pangenome for the species (Badet et al., 2020). The two best performing amplicons were located on chromosome 9 and 13, respectively.

To evaluate the performance of the two Pseudomonas-specific and two Z. tritici-specific amplicons, we conducted tests using an extensive mock community of well-characterized laboratory strains. In Table 1).
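The prioritization of amplicon candidates by nucleotide diversity, described in the design paragraph above, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the window step size, and the handling of gaps and ambiguous bases are assumptions made for illustration, and the sketch ignores primer-site conservation entirely. It computes the average pairwise difference per site (π) for every window of the allowed length in an aligned core region and ranks the windows from most to least polymorphic.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise difference per compared site (pi) for aligned sequences.

    Assumption: for each pair, columns where either sequence carries a gap
    or an ambiguous base are skipped.
    """
    valid = set("ACGT")
    per_pair = []
    for a, b in combinations(seqs, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in valid and y in valid:
                compared += 1
                diffs += x != y
        if compared:
            per_pair.append(diffs / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def rank_candidates(alignment, min_len=2700, max_len=3200, step=100):
    """Enumerate windows of the allowed length and sort them by pi (descending).

    `step` is a hypothetical parameter used for both the window length
    increment and the start position increment.
    Returns a list of (pi, start, end) tuples with 0-based, end-exclusive
    alignment coordinates.
    """
    n_cols = len(alignment[0])
    candidates = []
    for length in range(min_len, max_len + 1, step):
        for start in range(0, n_cols - length + 1, step):
            window = [s[start:start + length] for s in alignment]
            candidates.append((nucleotide_diversity(window), start, start + length))
    return sorted(candidates, reverse=True)
```

In practice the top-ranked windows would still need flanking regions conserved enough for primer binding, which this sketch does not check.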

The wheat leaf samples revealed a diverse assembly of bacteria using the full-length 16S with Pseudomonas and Sphingomonas being the dominant genera (Figure 1C and Supplementary Figure S1).

The Pseudomonas-specific rpoD and transporter amplicons revealed a diverse assembly of species, primarily belonging to the P. fluorescens and P. syringae groups (Figure 1D

To assess the performance and detection limits of the new amplicons, we analyzed both defined mock communities and dilution series. For the Pseudomonas amplicons, we examined a panel of ten isolates representing the phylogenetic diversity of the genus (Supplementary Table 2). Our results showed that both the full-length 16S and the Pseudomonas-specific rpoD and transporter amplicons correctly distinguish all isolates (Figure 2A). All isolates consistently amplified for all three amplicons, except the P. putida Leaf59 isolate, which failed to amplify with the transporter amplicon. No genome sequence is available for verification, but amplification failure may be caused by primer mismatches. The rpoD and transporter amplicons exhibited clustering of the eight P. fluorescens group isolates according to their subclade as described by Flury et al. (2016). While the rpoD and transporter amplicons predominantly

Phylogenetic trees of amplified sequence variants (ASVs) from ten individual

To assess gains in phylogenetic resolution of the wheat microbiome, we compared the Pseudomonas and Z. tritici amplicons to 16S and ITS amplicons, respectively. Pseudomonas-specific amplicons revealed 933 and 538 ASVs at the rpoD and transporter locus, respectively. In contrast, the 16S amplicon revealed only 86 ASVs matching the genus Pseudomonas (Figure 3A and B, Supplementary Figure S3). Based on the relative number of reads per amplicon, this represents a three-fold increase in ASVs (2.7X for rpoD and 3.3X for the transporter) for the Pseudomonas-specific amplicons compared to the 16S (Figure 3B).

We assigned all ASVs to Pseudomonas species using 1071 available genomes from nine different groups for BLASTn analyses (Supplementary Table 4).
The Pseudomonas-specific amplicons produced significantly better matches for species assignment compared to 16S sequences (Figure 3C). To assess the consistency of species assignments, we extracted the smallest subtree containing >90% of the ASVs

For Z. tritici, we found 869 and 757 ASVs for the chromosome 9 and 13 amplicon, respectively, compared to 13 ASVs obtained by ITS (Figure 3G and H). Based on the total number of reads per amplicon, this represents a more than tenfold increase of ASVs for the Z. tritici-specific amplicons.

timepoints during the wheat growing season, from May (first node appearance) to July (prior to harvest). We sampled eight different wheat cultivars, each replicated in two plots. Furthermore, leaves were

Figure S5). We compared correlation values to null expectations based on permutations. We observed that 68% (Spearman) and 77% (Pearson) of the species correlations were higher than the 95% confidence interval of the null expectation.

we observed two distinct groups based on their abundance patterns (Figure 5B). The first group consisted of a small number of highly abundant strains persisting throughout the entire growing season.

In contrast, the second group consisted of numerous strains that were predominantly scarce and only detected at specific timepoints.

We demonstrate that limitations in phylogenetic resolution of microbial community profiling can be overcome using systematically designed taxon-specific 3-kb amplicons. We find that the new loci provide species- and strain-level insights into subsets of bacterial and fungal communities coexisting in the plant microbiome, exceeding full-length 16S or ITS amplicons by an order of magnitude.

The pangenome-informed design of amplicons for pseudomonads was optimized to capture diversity

To establish three sets of mock communities, we used ten different Pseudomonas strains and two Z. tritici isolates (Supplementary Table 2). The first set was composed of a ten-fold serial dilution series of a two-strain mixture up to 10⁻⁵. Specifically, we combined two Pseudomonas isolates (P. syringae Leaf129 and P. thivervalensis PITR2) and two Z. tritici isolates (1E4 and IR01_48b). The second set consisted of a ten-fold serial dilution series up to 10⁻⁵ diluting P. thivervalensis PITR2 and 1E4 in a leaf

Panseq loci as core if they were shared by >88% of the genomes (i.e. ≥16/18). We created a multiple sequence alignment of each core fragment using MUSCLE v.3 with default parameters (Edgar, 2004).

Leaves were lyophilized for 48h and weighed. Then, the complete leaves were homogenized using 0.5mm and 0.2mm zirconium beads in the Bead Ruptor bead mill homogenizer (OMNI) using the following settings: speed 5.00, number of cycles 2, time of cycle 1:00, time distance between cycles 1:00. DNA extraction was performed with automated magnetic-particle processing using the KingFisher Flex Purification Systems (Thermo Scientific). To enhance the extraction of fungal and bacterial DNA, lyticase and lysozyme were added to the first lysis step with PVP buffer. Specifically, per 10mg dry leaf mass, 3.9µl lyticase (200,000 U/mg, diluted to 6.5mg/ml), 3.9µl lysozyme (22,500 U/mg, diluted to 10mg/ml) and 98µl PVP lysis buffer were added, and samples were incubated at 55°C for 30min. Then, 3.9µl proteinase K (30 U/mg, diluted to 10mg/ml) per 10mg dry mass was added and samples were incubated at 55°C for 30min. From each sample, 150µl of clear lysate was transferred to an empty binding plate (KingFisher Flex, Thermo Scientific). For each sample, 360µl PN binding buffer and 30µl well-suspended Sbeadex beads were added.
Using the KingFisher Flex, the first washing step was performed using 400µl PN1 buffer per sample, then a second wash using 390µl buffer PN1 with 10µl RNase A (diluted to 10mg/µl in water) and a third wash using 400µl PN2 buffer. Each sample was eluted in 100µl nuclease-free water. The DNA concentration was measured using the Spark Microplate reader (Tecan). Then, DNA samples were diluted to 5ng/µl using the Liquid Handling Station (BRAND). PCR reactions were pipetted using the Mosquito HV liquid handling robot (SPT Labtech).
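The reagent volumes in the extraction protocol are given per 10mg dry leaf mass, and extracts are normalized to 5ng/µl before PCR. The following Python sketch shows the arithmetic under two assumptions that the text does not state explicitly: that reagent volumes scale linearly with dry mass, and that dilution follows the standard C1·V1 = C2·V2 relation. The function names and the `final_volume_ul` parameter are hypothetical.

```python
# Volumes (in µl) per 10 mg dry leaf mass, as listed in the protocol text.
PER_10_MG_UL = {
    "lyticase": 3.9,
    "lysozyme": 3.9,
    "pvp_lysis_buffer": 98.0,
    "proteinase_k": 3.9,
}


def lysis_volumes(dry_mass_mg):
    """Scale lysis reagent volumes to a sample's dry mass.

    Assumption: volumes scale linearly with dry leaf mass.
    """
    factor = dry_mass_mg / 10.0
    return {name: round(vol * factor, 2) for name, vol in PER_10_MG_UL.items()}


def dilution_to_target(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=5.0):
    """Return (dna_ul, water_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. `final_volume_ul` is a hypothetical parameter;
    the protocol text does not give the working volume.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul
```

For example, under these assumptions a 20mg sample would receive twice the listed volumes, and a 50ng/µl extract diluted into a 20µl working volume would need 2µl DNA and 18µl water.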

The first amplicon PCR reaction was performed in a 15µl reaction volume. Specifically, 7.5µl KAPA