The relationship between microbial biomass and disease in the Arabidopsis thaliana phyllosphere

A central goal in microbiome research is to learn what distinguishes a healthy from a dysbiotic microbial community. Shifts in diversity and taxonomic composition are important indicators of dysbiosis, but a full understanding also requires knowledge of absolute microbial biomass. Simultaneous information on both microbiome composition and the quantity of its components can provide insight into microbiome function and disease state. Here we use shotgun metagenomics to simultaneously assess microbiome composition and microbial load in the phyllosphere of wild populations of the plant Arabidopsis thaliana. We find that wild plants vary substantially in the load of colonizing microbes, and that high loads are typically associated with the proliferation of single taxa, with only a few putatively pathogenic taxa achieving high abundances in the field. Our results suggest (i) that the inside of a plant leaf is on average sparsely colonized with an estimated two bacterial genomes per plant genome and an order of magnitude fewer eukaryotic microbial genomes, and (ii) that higher levels of microbial biomass often indicate successful colonization by pathogens. Lastly, our results show that load is a significant explanatory variable for loss of estimated Shannon diversity in phyllosphere microbiomes, implying that reduced diversity may be a significant predictor of microbial dysbiosis in a plant leaf.

understanding microbiome function is therefore the development of methods that help us to infer indirectly which of the numerous taxa associated with a host also have a measurable impact on host health.
In both plants and animals, compositional microbiome analyses such as 16S rDNA amplicon sequencing have been integral to revealing the hundreds or even thousands of microbial species that co-occur within single tissues, and the responses of taxa to environmental differences [3][4][5] . While compositional analyses indicate changes in the number of microbial cells relative to each other, these data alone do not tell us whether a focal taxon changed in absolute abundance relative to the host. A proxy for what we term here the microbial load of a host is therefore the ratio of microbial to plant cells, and this may in turn be an important indicator of host health.
Indeed, assessing the inability of a host to control pathogen proliferation is a common metric for assessing disease state 6 , and the importance of evaluating microbial load is increasingly acknowledged in microbiome studies. For example, a recent study demonstrated that inferring correlations between human disease and microbial taxa is contingent on microbial load in the gut 7 . In probing the relationship between microbiome composition, microbial load, and Crohn's disease, the authors found that only when microbial load was taken into account was it possible to detect associations between specific taxa and disease state. There are also examples of low biomass microbes exerting a significant effect on host health 8 . Therefore, at least in humans and likely other mammals, information on microbial load is apparently informative for identifying novel disease agents and dysbiotic microbial communities.
In plants, microbial load has been shown to differ substantially between individuals of the same and different species. Maignien and colleagues 9 found bacterial load in naturally colonized Arabidopsis thaliana populations to vary by an order of magnitude even within a single population. Karasov and colleagues 2 similarly found that wild A. thaliana in the same local metapopulation differed by an order of magnitude in their estimated microbial loads, and discerned a possible association between microbial load and putative infection. The major driver of differences in microbial load was a pathogenic Pseudomonas taxon, which indicated that this taxon reduced the ability of the host to control bacterial growth in its leaves. These studies raise the possibility that microbial load could be used to identify-without prior information-taxa that are novel disease agents.
At present, we lack a baseline for how many microbes on average colonize the leaf of a wild plant. Only once this baseline is established (or at least approximated) can we determine its relationship to host health. With this information we can, for example, identify exceptional plants with abnormally high or low loads, and analyze these individuals to discover host, microbial and environmental factors that influence load.
In this study, we make use of data on the progression of infection in the laboratory to inform our understanding of microbiome colonization in wild plant populations. We first analyze the relationship between microbial load, composition, and diversity in the phyllosphere of wild A. thaliana populations in Germany and Sweden. The two geographic regions provide contrasts in both microbiome composition and load, and with higher loads being due primarily to the proliferation of putatively pathogenic taxa of both bacteria and oomycetes.
To relate these observations to disease progression, which has been extensively studied in the laboratory, we perform controlled infections of a bacterial and an oomycete pathogen, and monitor the microbial load and composition throughout disease progression. Using these controlled infections as standards, we find that in wild populations only a few taxa proliferate to a load higher than basal colonization in a laboratory-grown plant.
Importantly, these high-load taxa encompass the most common pathogens of A. thaliana, such as Pseudomonas sp. 10 and Hyaloperonospora arabidopsidis 11 . Based on our observations, we propose that high microbial biomass in the A. thaliana phyllosphere may be achieved nearly exclusively by pathogenic taxa. This prediction can now be tested by the systematic analysis of other A. thaliana populations.

Microbial load differs between regions, but high levels of microbial colonization are rare
We assessed microbial load variation in the endophytic compartments of the phyllosphere of wild A. thaliana plants. To ensure that our results do not simply reflect the idiosyncrasies of a single geographic region, we targeted two regions with contrasting climates and environments: Southwestern Germany and Sweden ( Figure   S1, Table S1). Within each region we surveyed several populations. With this information our aim was to determine (1) which microbial families contribute the most to microbial load, (2) whether the same families within a region are reproducibly associated with high load, and (3) whether there is a consistent relationship between load and microbial diversity.
To assess the level of colonization of a plant, one can count the microbial cells directly 7 . While accurate, this method is suitable only for fresh tissue that can be decomposed and sorted without destruction of microbial cell membranes. When fresh tissue is not available, one can use nucleic acid-based detection methods and assess DNA composition to estimate the ratio between microbial and host genomes. Quantitative polymerase chain reaction (qPCR) 7 , spike-ins of known taxa to extracted DNA 12 , and shotgun metagenomics are three established methods for determining the overall load of microbes in an environment. While every method of estimating microbial load has its limitations 13,14 , among approaches that can be easily scaled to a large number of samples, shotgun metagenomics has several advantages: it avoids many well-established biases in amplicon sequencing 15,16 , and it has the potential to simultaneously provide information on community composition and plant biomass.
We previously demonstrated that one can obtain quantitative measures of microbial colonization in wild plants by performing shotgun metagenomic sequencing of whole plant tissue, followed by determining the ratio of reads assigned as having either a microbial or plant origin 2 . We have further validated our approach by showing that compositional inferences agree well between this method and more conventional amplicon sequencing methods 17 . In the current study, to maximize reproducibility across samples, all samples were processed in the same manner. A strength of our metagenomic comparison method is that we compare the ratios of abundances across samples, an approach that can bypass several known biases in analyzing shifts in microbiome composition 14 .
Nonetheless, we recognize that a metagenomic assessment of microbial DNA is merely a proxy for microbial load, and not a direct measurement.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint Our previous work demonstrated that plants within the same population differ in microbial load 2 and that the presence of a specific taxonomic group from the genus Pseudomonas is correlated with high loads in Southwestern Germany. Using the same washing methods as before, we generated shotgun metagenomic data from washed leaves of A. thaliana collected from five natural populations in 18 Sweden 18 , and used previously published data from four populations in Southwestern Germany 2 . We mapped the shotgun metagenomic data to the A. thaliana Col-0 reference genome 19 , and classified unmapped reads using the kmer metagenomic mapping tool centrifuge 20 . As shown before, the coverage estimates are robust and largely insensitive to the A. thaliana reference genome used 17 . We then estimated the ratio of average coverage over microbial genomes to plant nuclear genomes. We estimated the average total load of a plant, per The bacterial, fungal, and oomycete families in Germany . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint  Figure  1a. Panels are colored according to region, according to population (nested in region), and according to load.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint with the highest average loads were Pseudomonadaceae, Ceratobasidiaceae, and Peronosporaceae, while in Sweden they were Burkholderiaceae, Pleosporaceae, and Peronosporaceae. Compared to Swedish plants, German plants had on average higher bacterial load (Wilcoxon rank-sum test p = 8x10 -7 ), and lower fungal load (Wilcoxon rank-sum test p = 0.006), but were not significantly different in oomycete load (Wilcoxon rank-sum test p = 0.2512).
Considering only bacterial families, the metagenomes from the two regions were compositionally distinct (permutational analysis of variance 23 mean partial r 2 = 0.27, p = 0.001). Indeed, 33/379 bacterial families differed in abundance by more than two-fold between the two regions (Benjamin-Hochberg 24 adjusted p-value < 0.05, Wald-test), with twenty three of those families more abundant in Germany than in Sweden. Compositional distinctness was true for populations in each region (mean partial r 2 = 0.18, p = 0.001). Load was also a significant explanatory variable for differences in microbiomes across regions (mean partial r 2 = 0.06, p = 0.008). Because the German and Swedish plants had been collected at different times and processed separately, we could not rule out that some of the observed differences between regions also reflected differences in sample processing.
In light of this possibility, we focused on the population for which we had the most plants sampled (Eyach in Germany, n = 86) and tested within this population the relationship between load and the Bray-Curtis measure of community dissimilarity. Within the Eyach population alone, differences in load were also significantly associated with differences in composition (permutational analysis of variance r 2 = 0.19, p = 0.001).

High microbial load is associated with proliferation of single taxa and reduced taxonomic diversity
Our previous work indicated that high microbial load in German populations was correlated with the presence of strains representing a single pathogenic taxon of Pseudomonas 2 . To test whether taxa other than Pseudomonas could be responsible for high load, we calculated the relationship between the bacterial loads of a plant leaf and the maximum relative abundance of a family observed per sample ( Figure 2b). There was a positive relationship between these two variables for bacteria (Beta regression Pseudo-R = 0.36, p < 4x10 -8 ) and the results were robust to the exclusion of plants with the highest loads (loads > 10).
Our results revealed that in the most heavily colonized plants, those with highest microbial load, a single taxonomic family of oomycetes or bacteria explained the majority of the load increase relative to less heavily colonized plants. A direct consequence of this relationship was a reduction in the estimated Shannon's Diversity Index in the leaves of plants with a high load ( Figure 2c). Importantly, the reduced bacterial Shannon diversity was not an artifact of reduced depth of sequencing of bacterial reads ( Figure S2). The reduced diversity could be the result of at least two distinct processes: (1) The most abundant family increases in abundance without an influence on the surrounding microbial presence, or (2) the family suppresses the surrounding microbiota while proliferating. We did not see a negative correlation between the load of Pseudomonadaceae and the load of any . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint other bacterial families ( Figure S2), indicating that the change in observed diversity is not due to the effect on the surrounding microbes and on species richness, but instead to the proliferation of Pseudomonadaceae.
That bacterial and oomycete families associated with high load contained well-known pathogens, such as P. syringae and Hyaloperonospora arabidopsidis (HpA), further suggesting that high load could be an indicator of plant disease state. We note that when plants were sampled, we had done so without regard to the presence or absence of visual signs of disease. In the field it is often difficult to assess disease state of A. thaliana plants, as there are numerous causes for reduced plant size and chlorosis, including non-optimal temperatures 25 and drought 26,27 . Consequently, for field-collected plants, unless we have a priori information on the presence of disease agents, we can rarely assess microbe-associated disease state and progression directly.

Microbial load correlates with disease under laboratory conditions
In contrast to disease assessment in the field, in standard laboratory infections it is relatively straightforward to assess disease state of A. thaliana by measuring macroscopic disease symptoms, size differences, and molecular markers of disease. For the field collected plants in this study, it was unclear how the colonization levels we observed in the field related to microbial colonization observed in the laboratory. We therefore wanted to make use of information from laboratory experiments to interpret the field data.
Molecular plant pathology, especially in A. thaliana, has a long history of assessing the pathogen colonization process through laboratory infections 28 . This progression is typically tracked by infecting plants with a low titer of pathogen, then following the growth of the pathogen via plating or quantification of microbial growth via microscopy 29 . We sought to create from laboratory infections the equivalent of standard curves that would inform our observations of wild plants. To this end, we performed controlled infections with a model prokaryotic pathogen (P. syringae) under standard laboratory conditions, followed by simultaneously measuring disease progression over time with traditional plant pathology metrics (colony forming units), qPCR measures of microbial abundance, and, as we had done in the field, metagenomic load quantification by whole-genome shotgun sequencing.
For the P. syringae analysis, we performed controlled infections in the laboratory of the A. thaliana reference accession Col-0 with two P. syringae strains: Pst DC3000:avrB, which is recognized by the host immune system of this accession 30 , and Pst DC3000:EV, which is not 31  replicates. With each sampling, we performed colony counting on extracts from leaf hole punches to directly measure live bacterial content, we performed qPCR to quantify the ratio of plant DNA to the 16S rDNA of . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint colonizing bacteria, and we performed shotgun metagenomics and taxonomic assignment to assess the ratio of plant metagenomic reads to Pseudomonadaceae metagenomic reads. Qualitatively, the three parallel measures of Pst DC3000 growth over time were remarkably similar, with an increase in bacterial abundance for the first three days after infection, followed by a plateau in microbial propagation ( Figure 3a). Comparisons of the log-transformed measurements of the data confirmed that qPCR and metagenomic load measurements were significantly correlated (r = 0.88, p < 2x10 -16 ) ( Figure S3), and that the correlation coefficient between metagenomic load measurements and live colony counting was higher than the correlation coefficient between qPCR measurements and colony counting (r = 0.61, p < 2x10 -16 ). This may be because the qPCR amplification method we used did not distinguish between the 16S rDNA amplicons of Pseudomonadaceae and those of other bacterial families, whereas the metagenomic classification did. Eukaryotic microbes are also frequent pathogens of A. thaliana and among the most common ones is HpA 36 . An obligate biotroph, HpA spreads between plants as spores 11 . Under humid conditions, the spores will develop hyphae, which penetrate the plant epidermis and subsequently the mesophyll (Figure 3), eventually form fruiting bodies. These fruiting bodies then sporulate, releasing the spores to infect other plant tissues. Assessing . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint the abundance of HpA via nucleic acid-based methods is complicated by the fact that its hyphae are multinucleated 37 . Consequently, one "cell" of HpA encapsulates more than one nuclear genome. For the HpA analysis, we took advantage of an HpA strain that we recently collected and an A. thaliana genotype related to those from which this HpA strain was isolated. We drip-inoculated the A. thaliana genotype H2081 with the HpA isolate 14OHML004 (both from the Midwestern USA), then took samples of plants over an eleven-day time window. Trypan blue staining 38 and metagenomic load were used as parallel metrics to assess the growth of HpA.
Sporulation commenced around day 5 of infection, as seen both in the Trypan blue staining and in the substantial increase in microbial load by day 11. Until sporulation was induced on day 5, the microbial load of HpA was low (approximately three HpA genomes per every 1,000 A. thaliana genomes). After induction, the load increased to an average of seven HpA genomes per every 100 A. thaliana genomes.
These results suggest that Pst DC3000, a prokaryotic microbe, can achieve microbial loads-on a chromosome-per-chromosome level-that are more than an order of magnitude higher than HpA, a eukaryotic microbe, can. This was consistent with what we observed in the field.

Pseudomonas load is negatively associated with plant growth
While it was intriguing to observe overall differences in microbial loads across plants, we also wanted to know whether increased microbial load is associated with decreased plant growth. Previous studies testing gradients of initial infection doses with a P. syringae pathogen demonstrated that higher initial infection titers led to more prominent symptoms and reduced plant fitness 39,40 . To determine whether this relationship holds for other combinations of Pseudomonas and A. thaliana genotypes, we replicated the infection experiments described in ref. 2 with pairs of one of two A. thaliana genotypes and one of three Pseudomonas strains, and simultaneously measured both plant and microbial growth ( Figure S4). Across all A. thaliana and Pseudomonas genotypes, proliferation of Pseudomonas was negatively associated with the size of A. thaliana host plants, and significantly so for 4/6 genotype x genotype combinations (Wilcoxon rank-sum test of each plant-microbe interaction, p < 0.05 after Benjamin-Hochberg correction 41 ). These results suggest that-at least for the Pseudomonas strains tested here-higher loads are associated with reduced plant health, and likely reduced plant fitness.

Few microbial taxa in wild populations exceed loads observed in incompatible infections
Having obtained information about the progression of infections with P. syringae that is either recognized by the host immune system (incompatible) or unrecognized (compatible), we were in a position to interpret the loads we had observed in the field. More specifically, we expected that this information would allow us to distinguish between compatible and incompatible interactions in the field.
In the laboratory, P. syringae abundance increased over the first three days after infection, with mean infection levels of compatible infections on day 2 exceeding those of incompatible infections on any day. We thus . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint considered the mean load achieved by DC3000:EV on day 2 to exceed the "resistance threshold" of the infection.
In the field, we found that 6% of plants contained a Pseudomonadaceae load that exceeded the maximum in cases where the plants in the laboratory could detect the colonizing microbe because it carried the avrB effector gene.
The second most common bacterial family in Germany comprises the Sphingomonadaceae, but there is no evidence in the literature that they can be pathogens, and their load did not exceed the resistance threshold in any of the plants analyzed (Figure 4). . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint As illustrated in Figure 4, in the bacterial microbiome from the field, only Pseudomonadaceae and Burkholderiaceae-a family containing phytogenic members such as species of the genus Ralstonia 42 -colonized plants to a level that surpassed the mean colonization level of the incompatible P. syringae interaction (DC3000:avrB) in the lab. Similarly, only Peronosporaceae and Albuginaceae achieved a load that exceeded the day 2 resistance threshold for oomycetes, when hyphae and spores were still rarely observed in samples. The remaining taxa in the microbiota of wild plants were not found above levels of laboratory plants infected with microbes that did not induce an immune response.
Note that while several of the same bacterial and oomycete families were found to contribute to microbial load across populations, it is likely that in different regions and plants, the actual species or strains were likely not the same, since strains of species, as well as their gene content and pathogenic capacity, are known to differ within and between populations 43 . The control plants inoculated with buffer showed increasing growth of background microbes over time ( Figure S5). We speculate that this was the result of the destructive and humidifying nature of syringe inoculation, which is likely to damage plant structural defenses and thereby enable colonization of surrounding microbes.

P. syringae and
Does the presence of Pst DC3000 influence the colonization of these background microbes? When we regressed the abundance of microbial family on the presence of Pseudomonadaceae, the presence of Pseudomonadaceae positively and significantly affected four of the ten most abundant families (p < 0.005). The family Sphingomonadaceae was positively correlated with the abundance of Pseudomonadaceae, a result that contrasts with previously observed antagonistic relationships between some strains in these families 1 , but replicates in an expanded dataset the findings of Regalado and colleagues 17 .
Similarly, analysis of HpA-infected plants revealed that HpA colonization was associated with increasing loads of all but one of the ten most abundant microbial families ( Figure S6, p < 0.005). These results are consistent with findings that eukaryotic microbes can be significant remodelers of the surrounding microbiome 44,45 . One must interpret the results with caution, however, because infected and uninfected plants were not co-housed . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint because we did not want to risk cross-contamination by the highly mobile HpA spores. Hence, we cannot definitively separate the effects of tray and treatment.
Nonetheless, that we observed changes in the background microbiota has important implications. First, co-infections with resident microbes may modify disease symptoms in laboratory experiments, and indeed be responsible for the well-known between-lab variation in infection outcomes. We therefore propose that it is advisable to monitor the level of background microbiota in such experiments. Second, controlled infections coupled with unbiased monitoring of the microbiota can be used to identify and subsequently isolate microbes that cannot colonize healthy plants, but that might enhance or inhibit disease caused by known pathogens.

Discussion
Hundreds of microbial families colonize A. thaliana leaves, and the colonizing microbiota differs between plant genotypes 46,47 and regions 5,47 . In this study we found that very few of these microbial families-not even 1 in a 100-ever proliferate to high biomass in planta, with fewer than 10% of plants being extensively colonized to levels equivalent to late-stage infections in the laboratory.
While composition and total load differed across regions, the identity of microbial families that successfully proliferated in planta was largely the same across A. thaliana populations. These exceptional families-Pseudomonadaceae, Peronosporaceae, and Albuginaceae-contained known A. thaliana pathogens that proliferated in both regions to loads higher than the average achieved in incompatible infections in the laboratory.
This extensive proliferation occurred only in a minority of plants, and we hypothesize that these plants were undergoing pathogenic infection. Importantly, we find that high load of a taxon was associated with two leaf microbiome summary statistics: differences in composition ( Figure 1) and reductions in alpha diversity ( Figure 2).
The relationship between the presence of a single prolific taxon and microbial diversity is so clear in our dataset that it should be possible to take low microbial diversity in the phyllosphere as an initial classifier for latent disease both in wild and cultivated plants. Follow-up work could examine the host for signs of compromised health and test the disease capacity of the identified microbe across a range of environments.
There are several explanations for the limited loads achieved by "background taxa" in our dataset. One possibility is that the background taxa are controlled by the plant immune system and it is only when a microbe successfully evades 48,49 or overcomes 50 the immune system that it can proliferate beyond the level of incompatible infection resistance. Whether these microbes cannot proliferate successfully in planta under any conditions, or are instead controlled by the plant immune system, remains an open question. In one example, Kniskern and colleagues 51 demonstrated that immune-deficient A. thaliana mutants planted in the field exhibit increased microbial loads, indicating host control of the larger microbial community. We now have the tools to probe this question with greater specificity to determine on a strain-specific level which microbes are controlled by the host immune system. . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint There is a growing body of evidence that commensal microbes interact with the plant immune system.
For example, specific strains of Sphingomonas, an abundant, putatively commensal genus in the phyllosphere, stimulate plant immune response 1 , and one can envision that their proliferation is kept in-check by elements of the plant immune system that also control bona fide pathogens 52,53 . In addition, many microbes can probably not easily proliferate in the nutrient-limited conditions of the phyllosphere. While this possibility has not been explicitly tested, in the root system it is well established through stable isotope profiling that only a fraction of the microbiome measurably proliferates at any point in time 54 . Lastly, there is a possibility that unknown biases in our extraction and sequencing methods could potentially skew the relative abundance of microbiota. While this is true of every extraction and sequencing method to date, metagenomic results are concordant with those of the 16S rDNA data 17 , and the prominence of Pseudomonadaceae, Peronosporaceae, Sphingomonadaceae, and Albuginaceae in wild phyllospheres has been observed repeatedly across experiments and methodologies 2,3,44,46 .
Our findings of low microbial load and a negative relationship between load and alpha diversity point to a paradigm in the leaf apoplast that differs from that in the human gut and perhaps that in the root. While a healthy human gut is colonized by large numbers of microbes interacting symbiotically with the host, and lower bacterial levels can even be associated with disease 7 , our results suggest that a healthy leaf apoplast likely does not house high numbers of microbes.  57 . The few cases in the leaf apoplast in which microbes have been found to be beneficial are largely due to their ability to protect the plant against pathogens 1 (but see the work by Mayak and colleagues 58 , who describe what may be an endophytic association). It seems plausible that it is only when a pathogenic microbe successfully evades the host immune responses that it can colonize to high titer in the phyllosphere.
Knowledge of microbial load may be particularly informative when one wants to identify microbes with an effect on plant fitness. There is an increasing appreciation of the role of microbe-derived small molecules in interactions with the plant 59 . Many of these small molecules are regulated by quorum sensing mechanisms 60,61 , which signal a certain biomass of the focal microbe. Indeed, achieving appreciable metabolite levels to influence host function may require high microbial load. Furthermore, because few microbes in the plant apoplast benefit the host, a reasonable hypothesis going forward is that a high microbial load is a signature of pathogenic (or prepathogenic) infection. We conclude that the methods we have developed and implemented can be used not only to monitor plants for pathogenic infection, but also to discover novel disease-causing microbes.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint

Field sample collections
Metagenomic samples from Germany were previously described and published in ref. 2 Table S1. Whole rosettes were collected from the field with tweezers and scissors sterilized between plant samples, washed in sterile water, then flash frozen on dry ice. The samples were then stored at -80°C until time of DNA extraction.

qPCR quantification
To assess bacterial abundance in infections, we amplified a fragment of bacterial 16S rDNA using a forward primer that begins priming at position 799 of the 16S rDNA locus 62 , and a reverse primer that begins priming at position 902 (ref. 63 ). (799F-AACMGGATTAGATACCCKG and 902R-GTCAATTCMTTTGAGTTTYARYC).
To assess plant abundance in the samples, we amplified a segment of the A. thaliana single-copy gene GIGANTEA

Metagenomic library analysis
Total DNA was extracted from rosettes that had been flash-frozen in liquid N2 by grinding plant tissue in microtubes filled with garnet rocks. Further metagenomic DNA purification and library preparation protocol was followed as in ref. 2  not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint reference genome 64 with bwa mem 19 using standard parameters. We have shown that metagenomic load can be reliably assessed with as few as 30,000 reads 17 . After removing samples that had fewer than 30,000 reads, all read pairs flagged as not mapping to A. thaliana were extracted with samtools 65 as the putatively "metagenomic" fraction. Average coverage of the A. thaliana genome was assessed by the samtools 'depth' command, taking the average over the five nuclear chromosomes (excluding mitochondria and chloroplast DNA). The remaining unmapped reads were then classified using the metagenomic classification tool centrifuge 20 (standard parameters). Taxonomic assignment at the family level was considered for downstream analyses.
The output read table from centrifuge was normalized using custom scripts (https://tkarasov.github.io/controlled_metagenomics) to assess for each sample the average coverage per each microbial family/the A. thaliana genome. Average genome size of a microbial family was estimated using information downloaded from https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/ on 9/21/2018 and an NCBI taxonomy generated by the 'tools/taxdmp2tree' script in the Ultimate edition of MEGAN 66 in September 2018. For families for which genome size data was not available, a default size of 3.87 Mb (the average bacterial genome size) was assigned for bacteria 67 , 8.97 Mb for fungi 68 , and 37 Mb for oomycetes 69 . Note that the assigned values for fungi and oomycete are the lower bounds for genome sizes for these taxa, hence this assignment is likely to inflate the estimate of abundance of genomes with less complete assemblies (i.e. typically less wellstudied organisms). If a genome assembly was smaller than 1 Mb, we deemed the assembly to be unreliable and assigned the same missing value for size. Less than 3% of families detected in centrifuge assignment had no genome size estimate. Note that all subsequent comparisons are based off of the ratio of microbial/plant abundance.
To compare composition and abundance between regions we acknowledged differences in sequencing depth between samples by transforming the genome coverage data to variance-stabilized estimates with the R package DESeq2 (ref. 22,70 ).
Because standard compositional comparisons such as those in 16S rDNA amplicon sequencing are constrained by their compositional nature, we compare ratios (with plant coverage as the denominator), which avoids bias 14  . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint

P. syringae infections
Seeds of the A. thaliana genotype Col-0 were frozen overnight at -80°C and bleach sterilized. The sterilized seeds were then planted on soil and grown with 8-hours of light per day at 23°C with a high humidity dome. At 42 days of age, the plants were infected with DC3000:EV, DC3000:avrB or control (10mM MgSO4) as described in the paragraph below. Plants were labeled and randomized across three flats.
P. syringae strain Pst DC3000 was used for all infections. The strain used for infections was transformed with either the plasmid pMH221 encoding kanamycin (KM) resistance only or pMH221:AvrB. DC3000 was first made electrocompetent via sucrose washes 73 and transformed with the constructs described. Successful transformants were selected on plates containing 50µg/mL kanamycin. The night prior to infection, a colony was inoculated into 1-5 mL of Luria broth (LB) containing 10 µg/mL KM, and the culture was grown overnight at 28°C. In the morning, the sample was diluted 1:10 in 5 mL of fresh LB and grown for 2 to 4 hours. The resulting culture was spun down at 3500 x g, and resuspended in 10mM MgSO4 and diluted to an OD600 of 0.0002. Four leaves were syringe-inoculated per plant. Six replicates per treatment per day were taken at five timepoints: the first was taken immediately upon infection, the next after one day, then two days, three days, four days and a final time-point six days post infection. For collection of infected or uninfected material, sterile forceps were used to remove four infected leaves per plant which were then flash-frozen immediately in liquid nitrogen and stored at -80°C. Leaves used for cfu counting were surface-sterilized in 70% EtOH for five seconds. Two hole punches per two sterilized leaves were taken using gelatin capsules (Kapselwelt product number 1002).

HpA infections
Seeds of the A. thaliana genotype H2081 belonging to the HPG1 haplogroup 74 were frozen overnight at -80°C and bleach sterilized. This genotype was chosen for its susceptibility to the focal HpA genotype. The sterilized seeds were then planted on soil and grown with 8-hours of light per day at 23°C with a high humidity dome. Progression of HpA infections was monitored using Trypan blue staining 29,38 . Briefly, Trypan blue is known to stain the external structures of HpA hyphae 29 . Leaves were removed from the rosette, heated for one hour . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint at 70°C in Trypan blue stain solution (10 ml ddH20, 10 ml phenol, 10 ml lactic acid, 10 ml glycerol, and 20 mg Trypan blue and water mixed in 1:2 ratio with 95% ethanol). Decolorization of stained leaves was achieved via soaking in chloral hydrate (2.5 g of chloral hydrate dissolved per 1ml ddH20) until leaves were largely translucent (from two days to two weeks). Leaves were then mounted in 60% glycerol for observation on a Zeiss AxioImager Z1.

Comparison of bacterial and plant growth via luminescence
Plant genotypes Eyach 15-2 and Col-0 were used. Seeds were stratified for twelve days at 4ºC and then were grown in long-day (16 h) at 23ºC. After 5 days, seedlings were transferred to 24-well plates with ½ strength MS medium (Duchefa M0255.0050) and 1% agar , one seedling per well. Six days after, i.e. when plants were 11-days old, they were infected with single bacterial strains.
Pseudomonas strains were transformed to express the lux operon via electroporation. pUC18-mini-Tn7T-Gm-lux was a gift from Herbert Schweizer (Addgene plasmid # 64963). Pseudomonas strains were grown overnight at 28ºC in Luria-Bertani (LB) medium with 100 ng/mL of nitrofurantoin, diluted the following morning 1:10 in 5 mL selective medium and grown for 3 additional hours. Bacteria were then pelleted at 3,500 g and brought to an OD600 of 0.01 in 10 mM MgSO4. 100 µL of this bacterial suspension were used to drip-inoculate plants, distributing the volume over the whole rosette. Plants were mock infected with 10 mM MgSO4 as control.
Plates were returned to the growth chamber, and three days after infection whole rosettes were cut for luminescence quantification.
For luminescence quantification whole rosettes were transferred to 96 deep-well plates (2.2 mL, Axygen), containing two 5 ± 0.03 mm glass beads (Roth) and 400 µL of 10 mM MgSO4, and ground for one minute at 20 m/s in a TissueLyser II (QIAGEN). Then, 10 mM MgSO4 was added to a final volume of 1 mL, and 200 µL were transferred to a 96-well Lumitrac white plate. Luminescence was measured in a multiplate reader (TECAN Infinite F200) with 2,000 ms of integration time. Each well was measured three times, and the mean was calculated for further analysis. The signal of MgSO4 blanks was subtracted from the samples' signal before analysis.
Data analysis was conducted in R 3.5.1. Luminescence signal was log-transformed, and the NA generated due to negative luminescence values (obtained after subtraction of blanks) were replaced by 0, to retain the meaning of absence of luminescence.

Image analysis
For plant growth quantification, plant plates were photographed before infection and seven days post infection, before imaging with a tripod-mounted Canon PowerShot G12 digital camera. Individual plants were extracted from whole-plate images. The number of green pixels was determined for each plant and used as a proxy for plant fresh mass. The segmentation of the plant from background was performed by applying thresholds in Lab . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint color space, followed by a series of morphological operations to remove noise and non-plant objects. Finally, a GrabCut-based postprocessing was applied and csv files with plant IDs and green pixel counts were created. The workflow was implemented in Python 3.6 and bash using OpenCV 3.1.0 and scikit-image 0.13.0 for image processing operations.

Cryogenic electron microscopy
A. thaliana plants Col-0 were syringe infected with Pst DC3000 (OD600=0.0002) as described above after 4 weeks of growing in soil. The leaf material was collected three days after infection and immediately fixed with 2.5% Glutaraldehyde in PBS-buffer. After 1-hour incubation at room temperature (approximately 25°C), the samples were transferred to 4°C for three days. Subsequently, four washes in PBS buffer were carried out over two days.
Immediately prior to visualization, samples were carefully dried with paper tissues and compressed into a metal holder. Freezing was performed inside a cryo loading chamber filled with liquid N2. Frozen leaves were shattered with a metal tool tapped onto the edges. Sample transfers were conducted with the Vacuum cryo transfer system VCT100.
All samples were imaged on a Zeiss LEO Crossbeam 1540 Scanning Electron Microscope with a VCT100 cryo load lock system. The samples were sublimated inside the microscope by increasing the temperature from -140°C to 90°C. Afterwards the samples were sputter coated with platinum in a cryo sputter coater Baltec SCD500. The image was recorded with an Everhart Thornley Detector at 114 X magnification and 3 kV beam voltage at -140°C temperature. metagenomic analysis. JB provided samples from Sweden. TLK and DW wrote the manuscript with help from all authors.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint Supplemental Information, Karasov et al. . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint Figure S1: Maps of collections locations in Germany and Sweden (a) Plants were collected from Germany and sequenced as described 2 . Plants were collected from Sweden in this study and the same procedure was followed as for the German plants for metagenomic extraction and sequencing. Related to Figure  1.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint Figure S4: Pseudomonas growth is associated with reduced A. thaliana growth The growth of three Pseudomonas isolates 2 and two A. thaliana genotypes were measured simultaneously in laboratory infections. Increased Pseudomonas growth is significantly associated with reduced A. thaliana growth across all Pseudomonas and A. thaliana genotypes. The graph shows the relationship between bacterial growth (measured in luminescence) and plant growth (measured in green pixels from image analysis). The black line represents the line of best fit from linear regression.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint Figure S5: P. syringae Pst DC3000 infection increases the abundance of other microbes Metagenomic load and composition were tracked over six days for (a) mock infections with buffer alone, (b) infections with a strain of Pst DC3000 recognized by the plant (Col-0) due to the expression of the effector AvrB, and (c) a strain of Pst DC3000 unrecognized by the plant (c). The ten most abundant microbial families are displayed.
. CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint . CC-BY 4.0 International license under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available The copyright holder for this preprint (which was this version posted November 2, 2019. ; https://doi.org/10.1101/828814 doi: bioRxiv preprint