Pathogen-driven coevolution across CBP60 plant immune regulator subfamilies confers resilience on the regulator module

Among eight Arabidopsis CaM-Binding Protein (CBP) 60 family members, AtCBP60g and AtSARD1 are partially functionally redundant, major positive immune regulators while AtCBP60a is a negative immune regulator. Phylogenetic analysis of CBP60 protein sequences of 247 diverse land plant species indicated that the immune regulator CBP60a, CBP60g, and SARD1 subfamilies diversified at divergence of Angiosperms and have been evolving very fast, suggesting strong selection pressure from pathogen effectors. We developed the Protein Evolution Analysis in Euclidean Space (PEAES) approach to investigate effects of this potential selection. We detected significant coevolutionary interactions across the immune regulator subfamilies specific to different Core Eudicot lineages, which are consistent with hypothetical coevolutionary mechanisms that protect the positive immune regulator function from being targeted by pathogen effectors. Thus, fast coevolution across the subfamilies with overlapping or opposing functions appears crucial to maintain resilience of the CBP60 immune regulator module against fast-evolving pathogen effectors.


INTRODUCTION
Pathogen effectors, typically proteinaceous, are delivered from pathogens into plant cells and manipulate plant functions to improve pathogen environments in planta, including compromising plant immune signaling [1]. The plant immune signaling network characterized in Arabidopsis has a remarkable level of resilience as different parts of the network have compensating functions that buffer the disabling impacts of mutations or pathogen effectors [2][3][4]. We are interested in how such network resilience evolved in biological systems. To gain insights into this topic, we have been studying evolution of important components of the immune signaling network. Members of the CaM-Binding Protein (CBP) 60 family are such network components [5][6][7][8].
Arabidopsis has eight CBP60 family members (AtCBP60a-g and AtSARD1) [6,8,9]. The CBP60 protein family was defined by the domain conserved among known CBP60 proteins (Pfam: "calmodulin_bind", PF07887). Despite the Pfam domain name, the actual CaM-binding domains are outside the conserved domain (Fig 1) [5,6,9]. Thus, we call the conserved domain the CBP60-conserved domain. AtCBP60g and AtSARD1 are major positive regulators of immunity controlling activation of many immune responses, including synthesis of the important immune hormone salicylic acid (SA) [5,6,8]. They are transcriptionally induced during immune responses. Their functions as positive immune regulators are partially redundant since an atcbp60g atsard1 double mutant has a more severe immune deficiency than either of the single mutants [6,8]. In contrast, AtCBP60a is a negative regulator of SA signaling and immunity [7].
In an atcbp60a mutant the basal level of SA is higher [7] and decrease of the post-induction level of AtCBP60g mRNA is slowed [10]. The functions of the other AtCBP60 members (AtCBP60bf) are unknown as mutants lacking any one of them did not show substantial changes in immunity or other obvious phenotypes [7]. AtCBP60g and AtSARD1 are DNA-binding proteins [8,11]. Since the DNA-binding activity resides within the CBP60-conserved domain, it is likely that all CBP60 proteins are DNA-binding proteins. Phylogenetic analysis of AtCBP60 members with a CBP60 from the moss Physcomitrella patens as the outgroup suggested that the immune regulators AtCBP60a, AtCBP60g, and AtSARD1 form one clade and AtCBP60b-f form a separate clade [6].
We applied phylogenetic analysis to the CBP60 protein family members in land plants and found that the immune-related clade neofunctionalized from the highly conserved, prototypical group at or immediately before divergence of Seed Plants. The immune regulator CBP60a, CBP60g, and SARD1 subfamilies diversified within the immune-related clade at or immediately before the emergence of Angiosperms. The immune regulator subfamilies have been evolving very fast, which suggests strong selection imposed by fast-evolving pathogen effectors. We developed an analytical platform, Protein Evolution Analysis in Euclidean Space (PEAES), based on physical-chemical characteristics of amino acids. Using PEAES, we detected significant coevolutionary interactions among the three immune regulator subfamilies in the CBP60-conserved domain in separate Core Eudicot species linages. The observed coevolutionary interactions were consistent with two coevolutionary mechanism hypotheses. First, two positive regulators evolve to be as different as possible, which makes it difficult for a single pathogen effector to target both at once. Second, the negative regulator evolves to stay similar to the positive regulators at the sites where the positive regulators cannot be very different. If a pathogen effector targets both positive and negative regulators, the negative impact of the effector on immunity due to targeting the positive regulators is tempered by its targeting of the negative regulators. Our discoveries suggest that functionally related protein subfamilies  The immune-related clade diversified at or immediately before the Gymnosperm divergence and has been evolving very fast. The phylogenetic tree of 1024 CBP60 protein sequences from 247 land plant species was inferred using RaXML [48] based on ClustalW alignment [43] and visualized using iTOL [45]. Colored dot at the leaf of the tree represents the land plant group (from Liverwort to Angiosperm) the CBP60 member originated from according to the color code for the phylogenetic clade. The inner ring with the yellow (low) to red (high) color range shows the CaM-binding score predicted by CaMELS [13]. The middle ring shows the CBP60 subfamilies according to AtCBP60 names. The outer ring shows the prototypical group (salmon) and the immune-related clade (blue). The AtCBP60 leaf positions are indicated in the outermost layer. The branch length scale of 0.1 (substitution per site) is shown at the top left.
influence the evolution of each other so that high levels of functional resilience can be achieved under strong and rapidly varying selection imposed by fast-evolving pathogen effectors.

RESULTS
The immune-related CBP60 clade neofunctionalized from the prototypical CBP60 group.
Most plant species had at least one member of the highly conserved, "prototypical" CBP60 group that includes AtCBP60b-f (salmon color arc of the outermost ring in Fig 2). The tree topology indicates that a major clade diversified from the prototypical group at or immediately before the divergence of Gymnosperms (blue arc of the outermost ring in Fig 2). This clade further diversified into the three immune regulator CBP60a, CBP60g, and SARD1 subfamilies (pink, cyan, and green arcs in the second outermost ring in Fig 2) at or immediately before the divergence of Angiosperms. Hence, we call this clade the immune-related clade. It is evident from the branch length that the immune regulator subfamilies have been evolving much faster than the prototypical group. The very fast evolution of the positive immune regulator subfamilies is consistent with the notion that they are subject to selection to evade targeting by fast-evolving pathogen effectors. A fungal effector is known to target AtCBP60g [11]. It is conceivable that multiple effectors target these major positive immune regulator subfamily members at a variety of sites.
None of the six Basal Angiosperm species had SARD1 subfamily members. This could be due to SARD1 deletion in Basal Angiosperms or to SARD1 subfamily diversification from the CBP60g subfamily after Basal Angiosperm divergence. A likelihood-ratio test of the tree topologies for these two possible cases favored the tree structure with SARD1 deletion in Basal Angiosperms (p < 0.01; Figs S2A and S2B). We identified multiple SARD1 subfamily members in recently-published genome sequences of a Basal Angiosperm water lily order [12] (Fig S2C).
This new information strongly supports our conclusion that diversification of the three immune regulator subfamilies occurred at or immediately before divergence of Angiosperms.

The common ancestor of the immune regulator subfamilies appears to have lost CaMbinding ability
AtCBP60a-f have CaM-binding domains (CaMBD) in their C-terminal regions (C-CaMBD), AtCBP60g has one in its N-terminal region (N-CaMBD), but AtSARD1 lacks CaM-binding ability [5,6,9] (Fig 1). We investigated when the C-CaMBD domain was lost during evolution of the immune regulator subfamilies using CaMELS [13], which was the only algorithm among those we tested that correctly predicted C-CaMBD and N-CaMBD among AtCBP60 members.
Generally, C-CaMBD was conserved among CBP60 proteins, except for some specific clades, including the immune regulator subfamilies (Fig 2, innermost ring). C-CaMBD in AtCBP60a and N-CaMBD in AtCBP60g appear to have been reacquired relatively recently. These observations suggest that CaM binding activity may not be essential for the CBP60 immune regulator functions, or alternatively, loss of C-CaMBD may have coincided with evolution of a CaM-binding adaptor protein that works together with the immune regulators. The latter could explain reacquisition of CaMBD in some of the immune regulators.
There were two forms of C-CaMBD loss (Fig 3). One form is deletion of the C-terminal region corresponding to C-CaMBD, such as in the SARD1 subfamily members (Fig 3A). The other form is mutations in C-CaMBD with weakly to moderately alignable sequences remaining in the the C-terminal region. [5]. A secondary structural property of amphipathic helix may be important for CaMBDs [14]. When the CaM-binding prediction scores at the C-CaMBD region were compared to the α-helix length and the hydrophobic moment (as an indication of potential amphipathic α-helix), a long α-helix appeared important while its amphipathicity did not ( Fig   3B). Although we cannot exclude a possible high false negative rate in CaMELS predictions, mutations that led to loss of CaMBD might have disrupted the CaMBDs through disruption of alpha-helices.

Protein Evolution Analysis in a Euclidean Space (PEAES)
Phylogenetic analysis of sequences using standard evolution models assumes stationary, reversible, and homogeneous evolution [15]. These conditions are clearly violated when evolution is under strong selection. We postulated that when a site is under strong selection, the Contributions of the length of alpha-helix and its hydrophobic moment to the CaMbinding prediction score according to the CaMELS algorithm [13]. amino acid at the site is selected based mainly on physical-chemical characteristics of amino acids and is relatively independent of the lineage ancestry when the lineage divergence is sufficiently old. Thus, we decided to use a metric of physical-chemical characteristics of amino acids to evaluate relatedness of selection at each site. Venkatarajan and Braun described physical-chemical characteristics of 20 amino acids by five-dimensional Euclidean coordinates after applying linear dimensionality reduction to 237 physical-chemical properties [16]. We added one more dimension to include absence of an amino acid -a gap in an alignment (Table   S3). This physical-chemical description of amino acids in a six-dimensional Euclidean space allows geometrical tracking of evolution at each site in a protein sequence alignment. For example, the pairwise Euclidean distance between amino acids can be defined as the amino acid dissimilarity measure (Table S4). We call this geometric analysis platform Protein Evolution Analysis in a Euclidean Space (PEAES).
In the subsequent study, we focused on CBP60 members from Core Eudicots because genome sequences (vs. transcriptome sequences) of many diverse species are available among Core Eudicots. To reduce the influence of conservation due to recently shared ancestry, a single species from each of 12 taxonomic orders of Core Eudicots was chosen for the analysis (Fig S3).
To simplify the analysis, when one species has more than one member per subfamily, only one member was selected. This subfamily member selection should not introduce strong bias: in all such cases, the members in a single subfamily were paralogs diversified after divergence of the corresponding taxonomic orders. We also conducted the subsequent analyses with an alternative set of selected CBP60 members and obtained similar observations (indicated as "the alternative set" for the corresponding supplemental figures), confirming the absence of strong bias due to CBP60 member selection.
We focused on the CBP60-conserved domain because it is impossible to obtain a reliable multiple sequence alignment across the immune regulator subfamilies outside the conserved domain ( Fig S4). The CBP60-conserved domain sequence that is highly conserved in the prototypical group contains 293 aa. To align the domain sequences from the immune regulator subfamilies together, three gaps were added, resulting in the multiple sequence alignment consisting of 296 sites (Fig S5). The site position within this 296-site alignment is used subsequently.
With PEAES, the measure of diversity at each site in the aligned sequences was defined as the mean of all pairwise distances among amino acids at that site. The CBP60bcd subfamily represents the prototypical group. It is evident that while many sites are highly conserved in the CBP60bcd subfamily, many sites in each immune regulator subfamily, particularly the CBP60g subfamily, are highly diverse. The means of the diversity measure across the sites are 0.075, 0.178, 0.291, and 0.231 for the CBP60bcd, CBP60a, CBP60g, and SARD1 subfamilies, respectively (dashed horizontal lines in Fig 4). The numbers of highly diverse sites (site diversity measure > 0.4417) are 5, 33, 81, and 58, while the numbers of subfamily-specific highly conserved sites (site diversity measure < 0.0764) are 134, 60, 13, and 33 for the CBP60bcd, CBP60a, CBP60g, and SARD1 subfamilies, respectively. These observations further demonstrate that the immune regulator subfamilies have been evolving very fast compared to the prototypical group. Fig 4. The CBP60-conserved domains of the immune regulator subfamilies are highly diverse among 12 diverse Core Eudicot species. The mean pairwise PEAES distance for each site across the species is shown as a line plot in each panel. One CBP60 member per subfamily was selected for each species. Highly conserved sites (pairwise distance mean ≤ 0.0764) are shown as segments at the bottom in each plot. The segments in black are the sites highly conserved among all CBP60 members of the Core Eudicot species. The subfamily-specific sites are shown in orange, salmon, cyan, and green for the CBP60bcd, CBP60a, CBP60g, and SARD1 subfamilies, respectively. Short segments represent subfamily-conserved gaps in the alignment. Highly diverse sites (pairwise distance mean > 0.4417) are shown at the top in each plot as segments colored according to the subfamily-color code.
Highly non-random, lineage-specific coevolutionary interactions across the immune regulator subfamilies are prevalent.
As they share a common ancestor as recently as immediately before divergence of Angiosperms and they have overlapping or opposing functions in immune signaling, the immune regulators CBP60a, CBP60g, and SARD1 likely form a regulatory module in the immune signaling network. We investigated the possibility that coevolution of the regulatory module components confers resilience against negative impacts of pathogen effectors targeting the module components. There may be pressure for the partially functionally-redundant, positive immune regulators, CBP60g and SARD1, to be as dissimilar as possible at critical sites, which would reduce the probability that they are both targeted by a single pathogen effector (Hypothesis 1). On the other hand, if both the negative regulator CBP60a and one of the positive regulators are targeted by a single pathogen effector, simultaneous impairment of the positive and negative regulators would moderate the negative impact of the effector on immunity [7]. If this is the case, the negative immune regulator CBP60a may be under pressure to remain similar to one or both of the positive regulators at critical sites (Hypothesis 2). Such selection imposed by pathogen effectors could be strong. In addition, the direction of selection and the sites under selection could change over short time periods within each plant species lineage. This is because pathogen effectors evolve very fast and multiple effectors from multiple pathogens may target the regulatory module of a single plant species. To detect coevolutionary interactions under rapidly changing selection, the analysis used must be specific to the site and the species lineage.
We defined the pairwise distance rank (PDR) of a single site as the rank of the pairwise PEAES distance between the members of two subfamilies of a particular species lineage among all permutations of the species lineages for the member of one of the two subfamilies (Fig 5). To allow a species lineage-specific analysis, two hypotheses about the coevolutionary interactions among the three immune regulator subfamilies were derived from Hypotheses 1 and 2 described Fig 5. Pairwise distance ranks (PDRs) based on the PEAES distance metric. First, for each of the 296 amino acid sites in the multiple sequence alignment of the CBP60-conserved domain, a pairwise PEAES-distance matrix was made between members of two subfamilies (CBP60g and SARD1 in the figure as an example) in the 12 Core Eudicot species. Second, in the pairwise PEAES-distance matrix at a particular site (Site #x), for each species (Carrot in the figure as an example) the pairwise distance between the members of the two subfamilies of the species (red square in the matrix) was compared to the pairwise distance between the member of one subfamily of this species (Carrot) and the member of the other subfamily of a different species (pink squares). The rank of the within-species comparison was determined among these 23 pairwise distances. This rank (21.0 for Carrot) is called the pairwise distance rank (PDR) at each site in each species. above.
If a site is not dissimilar between CBP60g and SARD1, CBP60a should have an amino acid similar to the amino acid in CBP60g or SARD1 to confer a protection effect (Hypothesis 3). On the other hand, at a site where CBP60a cannot provide protection through similarity to CBP60g or SARD1 (e.g., because it would impair CBP60a function), CBP60g and SARD1 should be dissimilar to avoid both proteins being targeted by a single effector (Hypothesis 4). To test these hypotheses, the PDR between CBP60g (or SARD1) and CBP60a was plotted against the PDR between CBP60g and SARD1 for all sites of a single species. The plot was divided into four quadrants at the median rank of 12 on both axes ( Fig 6A). Hypotheses 3 and 4 predict that the 1 st and 3 rd quadrants will be significantly enriched with sites. Enrichment in the 3 rd quadrant is consistent with Hypothesis 3, and enrichment in the 1 st quadrant is consistent with Hypothesis 4.
The significance of the site enrichment in the 1 st and 3 rd quadrants over the 2 nd and 4 th quadrants was tested by Fisher's exact test (2-sided).

DISCUSSION
One type of protein family evolution analysis focuses on emergence of conserved domain structures [17]. However, emergence of particular conserved domains may not provide information about evolution of particular biological processes and functions underlying them since some domains, such as DNA-binding domains, have versatile molecular functions that can be easily repurposed for different biological processes (e.g., [18]). Since the CBP60-conserved domains of AtCBP60g and AtSARD1 have sequence-specific DNA-binding activity [8,11],  [21], AtEDS1, AtPAD4, and AtSAG101 [22], and AtNPR1, AtNPR3, and AtNPR4 [23,24]. It will be interesting to apply the PEAES-PDR approach to these protein subfamilies to test whether their evolutionary patterns are also consistent with these hypotheses about coevolution mechanisms that lead to increased resilience against pathogen effectors of the regulatory modules.
The results of the PEAES-PDR analysis can also generate specific, mechanistic hypotheses.
First, Qin et al. reported that AtCBP60g is targeted by the fungal effector VdSCP41 [11].
Although AtSARD1 can coimmunoprecipitate VdSCP41 after overexpression in Arabidopsis protoplasts, this AtSARD1-VdSCP41 interaction appears to be much weaker than the AtCBP60g-VdSCP41 interaction ( Figure 3A, Figure 4C, and Figure 3-figure supplement 2A in [11]). These observations may suggest that AtSARD1 mostly evades being targeted by VdSCP41. VdSCP41 binds a C-terminal region of AtCBP60g (corresponding to the region Cterminal to site 130 of the CBP60-conserved domain) [11]. Our PEAES-PDR analysis detected multiple AtCBP60g sites C-terminal to site 130 potentially protected by dissimilarity between AtCBP60g and AtSARD1 (blue and magenta color bars on the line for Arabidopsis, CBP60g in Fig 7). If VdSCP41 or a very similar effector is a major driver for the dissimilarity between AtCBP60g and AtSARD1, substituting the amino acids at these sites in AtCBP60g to those at the sites in AtSARD1 may substantially weaken the interaction between AtCBP60g and VdSCP41.
Second, the sites indicated by yellow-colored bars in Fig 7 (i.e., sites in the 2 nd quadrant in Fig 6A) are those that are not protected either by dissimilarity between two positive regulators or by similarity between a positive and the negative regulators. In some cases, the yellow-colored sites coincide in both CBP60g-and SARD1-common comparisons in a single species. One possibility is that such sites may not have been targeted by any effectors in a recent lineage history, and hence there is no signature of protective selection. Another possibility is that they may be protected by some other molecular mechanisms, such as immunity mediated by specific resistance (R) proteins [25]. A guard-type R protein detects some biochemical change caused by a pathogen effector in the cognate host protein and triggers immune responses. If there is protection by an R protein, keeping the amino acids at the yellow-colored sites similar between CBP60 and SARD1 may be important for function of the R protein-based protection mechanism.
Identification of proteins that interact with CBP60g and SARD1 whose interactions are sensitive to substitutions of these amino acids at the yellow-colored sites to those at the same sites in other species may lead to discovery of other protection mechanisms.
We demonstrated that coevolutionary interactions across the immune regulator CBP60 subfamilies of overlapping or opposing functions are consistent with the mechanistic hypotheses for increased resilience of this particular regulatory module in the immune signaling network. It is conceivable that this concept can be more generalized among immune signaling network components. Some pathogen effectors target multiple components of the immune signaling network [1], suggesting existence of many other multiple-targeting effectors. At the same time, immune signaling network components are highly interconnected [26], and it is common for them to have overlapping or opposing effects on immunity. These two conditions, multiply targeting pathogen effectors and potential functional interactions among the effector targets, are exactly the conditions that appear to have been driving coevolution for increased resilience of the CBP60 immune regulator module. We developed a PEAES-PDR approach, which was powerful in detecting this type of coevolutionary interaction among alignable sequences and in generating specific and mechanistic hypotheses. Expanding coevolutionary interaction investigations to unalignable sequences of multiply targeted proteins will further advance our understanding of how the immune signaling network has been evolving to maintain or improve resilience in a strongly-selecting and rapidly-changing pathogenic landscape. This will be an exciting future direction in the study of biological network evolution.

The CBP60 protein sequence set
A protein sequence database consisting of sequences from 271 species was constructed. The sequence data sources [27][28][29][30][31][32][33][34][35][36][37][38][39][40][41][42] are listed in Table S1. The database was searched by BLASTP using each of the Arabidopsis CBP60 members as a query and the union of the subject sequences that yielded bit scores higher than the Arabidopsis CBP60 that yielded the lowest bit score was identified as the CBP60 protein sequence set, consisting of 1432 sequences. Thus, the diversity of the CBP60 member set was controlled by the level of diversity among AtCBP60 members.
We call this member identification procedure a Furthest Member Selection (FMS) approach.
This FMS-based CBP60 protein sequence set did not include some distantly related CBP60 homologs (Text S1). We left them out of the set because they do not allow functional interpretations of CBP60 evolution since our knowledge of CBP60 functions is limited to the Arabidopsis members and their orthologs. The CBP60 protein sequence set was subjected to quality controls to remove possible products of assembly or annotation errors and to limit sequence representation to one species per genus for balanced overall representation. These quality controls resulted in the final set consisting of 1024 sequences from 247 species (Table   S2). To ease identification of plant species, their common names or genus names are used.

Phylogenetic analysis of CBP60 protein sequences
Generally, ClustalW [43], a Maximum Likelihood (ML) method [44], and iTOL [45] were used for multiple sequence alignment, tree inference, and tree visualization, respectively. ML was also used for inference of most probable ancestor sequences at branch points in trees.

MEGA7
[46] was used for ClustalW and ML with the default parameters, except for 80% site coverage in ML (unless stated otherwise). We used the Most Recent Common Ancestor (MRCA)-Anchored, Clade-by-Clade Tree Inference (MACCTI) approach, in which the aligned sequence region used for tree inference is locally adjusted. Briefly, an overview tree for relationships among all major clades and MRCAs on the main CBP60 lineages was inferred using only the sequences with alignable C-terminal regions, the sequences of each clade and its flanking MRCAs on the main CBP60 lineages were used to infer the subtree for the clade, the MRCAs of the clades were used to infer the backbone tree showing the relationships among the clade MRCAs, and the clade MRCAs in the backbone tree were replaced with the corresponding clade trees to reconstruct the entire tree. Since species phylogeny, duplication, and deletion (SPDD rule) generally explained our tree of 1024 CBP60 sequences at high levels, we concluded that the tree was sufficiently accurate to support the analysis presented here.

CaM-binding site prediction
CaMELS [13] was used for CaM-binding site prediction. The DECIPHER R Bioconductor package [47] was used to predict alpha-helical propensity. A Python script, hydrophobic_moment.py (https://gist.github.com/JoaoRodrigues/568c845915aea3efa3578babfd72423c) was used for calculation of the hydrophobic moment.

PEAES-PDR analysis
PEAES is a way to describe the physical-chemical characteristics of 20 amino acids and a gap in a 6-dimensional Euclidean space. A gap dimension was added to the 5-dimensional descriptions of physical-chemical characteristics of 20 proteinogenic amino acids by [16] (Table   S3). Thus, the dissimilarity between two amino acids (or between an amino acid and a gap) is defined by the Euclidean distance between them in the 6-dimensional space (Table S4). Twelve Eudicot species with high quality genome sequences were selected to represent 12 diverse taxonomic orders (Fig S3). One sequence for each of the CBP60a, CBP60bcd, CBP60g, and SARD1 subfamilies was selected for each species (48 sequences). Since some species had more than one paralog, two sets of 48 sequences were made by selecting different paralogs from these species. Each of these two sets was analyzed separately. The CBP60-conserved domains of 48 sequences were aligned by ClustalW [43], and then the multiple sequence alignment was manually edited to remove insertional polymorphisms represented by only one or two sequences and to make the alignments consistent between the two sets. The alignments for the two sets are provided as supplemental files. To quantify the level of diversity at each site, the mean pairwise PEAES distance was calculated for each site for each subfamily (Fig 4). PDR was calculated at each site of the CBP60-conserved domain multiple sequence alignment between two CBP60 subfamilies for each of the 12 Eudicot species based on the PEAES metric according to the procedure depicted in Fig 5. If there are no species lineage-specific coevolutionary interactions between two CBP60 subfamilies, the mean expectation for PDR is the median rank of 12 among 23 pairwise PEAES distances. Thus, when PDR values for the corresponding sites in two subfamily pairs are plotted against each other, the random expectation is that the sites would distribute equally among the four quadrants of the plot divided at the median ranks of 12 ( Fig   6A). Hypotheses 3 and 4 predicted enrichment of the sites in quadrants 1 and 3. The p-value associated with the odds ratio for this enrichment was calculated by Fisher's exact test for each species for particular two-subfamily pairs. The p-value was corrected for multiple tests by Benjamini-Hochberg's False Discovery Rate across the species in each combination of twosubfamily pairs. Detailed methods and discoveries made based on the phylogenetic relationships but not

SUPPLEMENTAL INFORMATION
Text S1. Examples of CBP60 homologs that were not included in this study Text S2. Detailed methods and the features of the CBP60 protein sequence phylogeny that are not discussed in the main text.  The three immune regulator CBP60 subfamilies diversified at or immediately before the emergence of Angiosperms. Two tree topology models, diversification of the CBP60g and SARD1 subfamilies at or before divergence of Basal Angiosperms (A) or after (B), were tested with CBP60 members from Basal Angiosperms, Magnoliids, and Chloranthales. Tree structure (A) was favored by SH-test (p < 0.01) [48]. Liverwort 0018s0017 was included as an outgroup.
The CBP60 sequence set used in (A) and (B) were supplemented with CBP60 sequences from    Table S1. The list of 271 land plant species used in the study, including the nomenclature used in the text ("Database name"). Table S2. The list of 1024 QC-ed CBP60 sequences ("Final set 1024" tab) and the list of 409 QC-omitted sequences ("QC fail" tab). "CBP60.final.tree.nwk": The Newick format tree file for the CBP60 sequence phylogenetic tree in Figs 2 and S1. "CBP60.4subfam.12species.fas": The FASTA multiple sequence alignment file for Fig S5.