Response to Tcherkez and Farquhar: Rubisco adaptation is more limited by phylogenetic constraint than by catalytic trade-off

Rubisco is the primary entry point for carbon into the biosphere. It has been widely proposed that rubisco is highly constrained by catalytic trade-offs due to correlations between the enzyme’s kinetic traits across species. In previous work, we have shown that these correlations, and thus the strength of catalytic trade-offs, have been over-estimated due to the presence of phylogenetic signal in the kinetic trait data (Bouvier et al., 2021). We demonstrated that only the canonical trade-offs between the Michaelis constant for CO2 and carboxylase turnover, and between the Michaelis constants for CO2 and O2, were robust to phylogenetic effects. We further demonstrated that phylogenetic constraints have limited rubisco adaptation to a greater extent than the combined action of catalytic trade-offs. Recently, however, our claims have been contested by Tcherkez and Farquhar (2021), who have argued that the phylogenetic signal we detect in rubisco kinetic traits is an artefact of species sampling, the use of rbcL-based trees for phylogenetic inference, laboratory-to-laboratory variability in kinetic measurements, and homoplasy of the C4 trait. In the present article, we respond to these criticisms on a point-by-point basis and conclusively show that all are either incorrect or invalid. As such, we stand by our original conclusions. Specifically, the magnitude of rubisco catalytic trade-offs has been overestimated in previous analyses due to phylogenetic biases, and rubisco kinetic evolution has in fact been more limited by phylogenetic constraint.

bioRxiv preprint, this version posted January 11, 2023; https://doi.org/10.1101/2023.01.07.523088. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Introduction

[...] the molecular basis of trait variation. In enzymes, for example, single point mutations can have dramatic consequences for catalytic function (Cleton-Jansen et al., 1991; Johnson et al., 2001; Villar et al., 1997) whilst divergent sequences can maintain similar biochemical properties (Espadaler et al., 2008). As such, studying phylogenetic signal can be instructive for interpreting patterns of biological diversity and their underpinning evolutionary processes, as well as for mapping the sequence-function landscape to better understand genotype-phenotype interactions.

Aside from the fundamental interest in examining the phylogenetic basis of trait data, quantification of phylogenetic signal is also of importance in comparative biology. This is because the shared ancestry of all taxa, as determined by the hierarchical tree of life, causes a "serious statistical problem" (Felsenstein, 1985) in cross-species analyses. Specifically, this problem arises because biological datapoints are not independent observations, whereas independence is a core assumption of conventional statistical methods (table 1). Instead, the phylogenetic non-independence between taxa means that the trait variation captured by a given dataset is an artefact of species sampling, [...]

The necessity to consider phylogenetic context when analyzing interspecies data has been widely adopted in many fields of biology, including ecology (e.g., Wu et al., 2021), animal behavior (e.g., Balasubramaniam et al., 2012), anthropology (e.g., Lukas, Towner and Borgerhoff Mulder, 2021) and conservation science (e.g., Fritz and Purvis, 2010). In contrast, in other sub-disciplines of biology, especially those which study phenomena at the cell or molecular level, phylogenetic statistical practices are less commonplace. Thus, it is likely that many studies which have compared quantitative data in cellular or molecular datasets, particularly those which do so between species, have been affected by the presence of phylogenetic signal.

In a recent study, we performed a cross-species analysis of the kinetic traits of the enzyme rubisco (ribulose-1,5-bisphosphate [RuBP] carboxylase/oxygenase). Specifically, we set out to examine the constraints which have limited the enzyme's adaptation whilst correctly accounting for the phylogenetic signal arising from the non-independence of the kinetic measurements (Bouvier et al., 2021). Rubisco presents an interesting subject for such investigation because, despite being the principal carbon-fixing enzyme in the biosphere (Field et al., 1998; Tabita et al., 2008), it is widely considered to be poorly optimised: it exhibits a modest rate of CO2 turnover (Badger et al., 1998; Bar-Even et al., 2011) and catalyses a mostly counterproductive reaction with O2 (Bowes et al., 1971; Chollet, 1977; Sharkey, 2020). The initial hypothesis put forward to explain why an enzyme of such paramount biological importance appears poorly adapted for its role in primary CO2 fixation was pioneered by two comparative studies, which reported severe antagonistic correlations between rubisco kinetic traits across species and proposed that these were caused by chemical constraints on the catalytic mechanism of the enzyme (Savir et al., 2010; Tcherkez et al., 2006). However, these studies, as well as all subsequent analyses which have investigated rubisco kinetic trait correlations (Flamholz et al., 2019; Iñiguez et al., 2020; Young et al., 2016), failed to consider the phylogenetic context of the species being analysed and the fact that all existing rubisco are related by evolution from a common ancestor. As such, it is possible that these correlations suffered from Felsenstein's "serious statistical problem" (Felsenstein, 1985), meaning that they are potentially an artefact of the presence of phylogenetic signal in rubisco kinetics and the non-independence of species on the phylogenetic tree.

In Bouvier et al. (2021), we found that all of rubisco's kinetic traits exhibit strong phylogenetic signal (table 2). This signal was observed when analysing the kinetic data across the tree of life, including among C3 angiosperms, among all angiosperms, and among all photosynthetic organisms for which kinetic data were available (table 2). Given this observed non-independence of the kinetic data, we re-evaluated the kinetic trait correlations between species using phylogenetic least squares regression and found that all were attenuated compared to previously published values (Figure 2A-2C) (Bouvier et al., 2021). Thus, phylogenetic non-independence had caused catalytic trade-offs to be over-estimated in previous analyses. Despite this, we showed that there was nevertheless still moderate antagonism between the Michaelis constant for CO2 (KC) and carboxylase turnover (kcatC) (variance explained = 21-37%), and between the Michaelis constants for CO2 and O2 (KO) (variance explained = 9-19%), although all other catalytic trade-offs were negligible or non-significant (Figure 2B and 2C) (Bouvier et al., 2021). Following this, we demonstrated that phylogenetic constraints explained more variation in rubisco kinetics (variance explained = 30-61%), and have thus had a larger impact on limiting enzyme adaptation, compared to the combined action of all catalytic trade-offs (variance explained = 6-9%) (Bouvier et al., 2021). In summary, therefore, although rubisco catalytic trade-offs exist, the strength of these trade-offs is weaker than previously thought and represents a minor component in limiting rubisco adaptation compared to phylogenetic constraints.

Although the above work sheds light on our understanding of the constraints that have shaped rubisco adaptation, the validity of the results presented in Bouvier et al. (2021) has been brought into question in a recent opinion piece (Tcherkez and Farquhar, 2021). In the present study, we address and refute the criticisms made by Tcherkez and Farquhar. In doing so, we confirm that the data we previously presented are valid and robust and thus our original conclusions are unaltered. Specifically, although catalytic trade-offs are present in rubisco as determined by chemistry, the extent of these constraints on enzyme adaptation is overestimated unless phylogenetically appropriate methods are used. Instead, phylogenetic constraints have provided a more severe limitation on rubisco optimisation.

Kinetic and phylogenetic data

As the basis of the analysis in the present study, the same rubisco kinetic dataset was used as described in Bouvier et al. (2021 [...] Moreover, consistent with the analysis in Bouvier et al. (2021), the same phylogenetic trees of the studied species were used (unless explicitly stated otherwise) as previously generated from the coding sequences of the rubisco large subunit (rbcL) gene.

To assess whether rubisco in different angiosperms displays similar kinetics as a consequence of phylogenetic relationship, the presence and magnitude of phylogenetic signal in rubisco kinetic traits were assessed using the statistical tools previously applied in Bouvier et al. (2021), including Pagel's lambda (Pagel, 1999), Blomberg's K (Blomberg et al., 2003), Blomberg's K* (Blomberg et al., 2003), Moran's I (Gittleman and Kot, 1990) and Abouheif's Cmean (Abouheif, 1999). For an overview of the inherent differences between these methods, see Bouvier et al. (2021); for more extensive discussion, see Münkemüller et al. (2012). In brief, signal strength for each metric typically varies between 0 (absence of phylogenetic signal) and 1 (strong phylogenetic signal), and we assess the presence or absence of phylogenetic signal in each trait as the majority result across these methods (i.e., the consensus result in ≥ 3 out of the 5 methods tested).
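The majority rule described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the dictionary keys and the p-values are hypothetical and are not taken from the analysis. The second function computes what the consensus false positive rate would be if the five tests were statistically independent, which is an assumption that real signal metrics computed on the same tree and trait do not satisfy.

```python
from math import comb

def consensus_signal(p_values, alpha=0.05, min_methods=3):
    """Return (signal_present, significant_methods) under a majority rule.

    p_values maps a method name to its p-value for one trait
    (hypothetical input format, for illustration only).
    """
    significant = [name for name, p in p_values.items() if p <= alpha]
    return len(significant) >= min_methods, significant

def independent_consensus_rate(alpha=0.05, n_methods=5, min_methods=3):
    """Binomial tail: chance that >= min_methods of n_methods tests are
    significant under the null, ASSUMING the tests are independent."""
    return sum(
        comb(n_methods, k) * alpha**k * (1 - alpha) ** (n_methods - k)
        for k in range(min_methods, n_methods + 1)
    )

# Hypothetical p-values for a single trait (not real results).
example = {"lambda": 0.001, "K": 0.02, "K*": 0.03, "I": 0.20, "Cmean": 0.60}
present, methods = consensus_signal(example)  # three of five are significant
rate = independent_consensus_rate()           # about 0.0012 for alpha = 0.05
```

Because the five metrics are correlated in practice, the realised consensus error rate is expected to lie somewhere between this independent-test value and the single-test threshold.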
A phylogenetic signal cannot be generated with a randomly simulated arbitrary trait

The first criticism of our work made by Tcherkez and Farquhar (2021) states that a phylogenetic signal may be generated using an arbitrary trait that is randomly distributed across species. However, the presence of phylogenetic signal in Bouvier et al. (2021) was assessed using a combination of robust statistical methods based on both non-parametric permutation tests (Blomberg's K, Blomberg's K*, Moran's I and Abouheif's Cmean) and likelihood ratio tests (in the utilized implementation of Pagel's lambda). In brief, permutation tests evaluate the null hypothesis of absence of signal by comparing the estimated test statistic for a given dataset to the distribution of values of this statistic obtained when randomly re-shuffling the data among species on the tree across a set number of replicates (n = 10,000 in Bouvier et al. (2021)). Conversely, likelihood ratio tests evaluate the null hypothesis by comparing the log likelihood of the null hypothesis with that of the alternative hypothesis using the statistic -2[log likelihood (null hypothesis) - log likelihood (alternative hypothesis)]. In both cases, if the computed p-value satisfies the given threshold of significance, the null hypothesis is rejected, and phylogenetic signal is deemed to be statistically significant. Specifically, in Bouvier et al. (2021), the majority of p-values reported for phylogenetic signal in rubisco kinetic traits were below p = 0.001 across the suite of detection methods used (table 2). This means that in these cases, if no phylogenetic signal were present, there would be less than a 0.1% chance of observing a similar or stronger phylogenetic signal.
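To make the logic of the two kinds of test concrete, the following minimal Python sketch implements a permutation test using Moran's I and a p-value for the likelihood ratio statistic given above. It is not the implementation used in Bouvier et al. (2021). The 12-tip, two-clade proximity matrix and the trait values are hypothetical, and the likelihood ratio p-value assumes that the statistic follows a chi-squared distribution with one degree of freedom, which is a common convention rather than something stated in the text.

```python
import math
import random

def morans_i(values, weights):
    """Moran's I of `values` under a symmetric proximity matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_sum = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    return (n / w_sum) * cross / sum(d * d for d in dev)

def permutation_p_value(values, weights, n_perm=10_000, seed=0):
    """Shuffle trait values across tips and count how often the shuffled
    statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if morans_i(shuffled, weights) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def lrt_p_value(loglik_null, loglik_alt):
    """-2[logL(null) - logL(alt)], with a p-value ASSUMING chi-squared, 1 df."""
    stat = -2.0 * (loglik_null - loglik_alt)
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Hypothetical toy data: two clades of six tips; proximity is 1 within a clade.
CLADE = [0] * 6 + [1] * 6
WEIGHTS = [[1.0 if i != j and CLADE[i] == CLADE[j] else 0.0 for j in range(12)]
           for i in range(12)]
TRAIT = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
```

Calling `permutation_p_value(TRAIT, WEIGHTS)` on this toy trait, which tracks clade membership closely, yields a small p-value because very few random re-assignments of the values reproduce such strong clustering.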
Thus, to put it plainly, it is not true that an arbitrary trait can spuriously generate a phylogenetic signal in a reliable manner. In fact, by definition, an arbitrary trait would produce a similar signal in fewer than 1 in 1,000 simulations. Nevertheless, although these statistical methods are powerful and perform well under a range of scenarios, we reaffirm below using two independent analyses that an arbitrary trait cannot reliably produce a phylogenetic signal. Thus, the above statement on this matter made by Tcherkez and Farquhar (2021) is false.

First, to confirm the validity of our implemented phylogenetic tests and demonstrate that arbitrary traits cannot produce a statistically significant phylogenetic signal, we subjected 100,000 replicates of a randomized assignment of kinetic trait data across species on the rubisco tree to phylogenetic signal analysis (i.e., recapitulating the methods of the permutation-based signal tests manually). Here, when using a significance threshold of p ≤ 0.05, interrogation of these randomized data revealed that a significant signal could be detected in only 5% of the re-shuffled kinetic trait data for the majority of signal detection methods (i.e., exactly matching the proportion that would be predicted to occur by chance), except Pagel's lambda, which exhibited a considerably lower significance rate (Supplemental File 1, table S1). This result was similarly observed in the analysis across all angiosperms, as well as across the subset of C3 angiosperms (Supplemental File 1, table S1). In total, this resulted in a type I (false positive) error rate in the randomized data of between 1.1 and 1.7% when taking the consensus result across at least 3 out of 5 phylogenetic signal detection methods (Supplemental File 1, table S1).

To further demonstrate that an arbitrary trait cannot generate a phylogenetic signal, 100,000 datasets were also randomly simulated de novo and subjected to phylogenetic signal interrogation as above. To achieve this, we used the rubisco phylogenetic tree in Bouvier et al. (2021) and replaced the kinetic data for each species with randomly generated data. For this purpose, the mean and standard deviation of the real kinetic traits were used to guide the simulated data such that 100,000 arbitrary variables were produced with the same distributional properties as each trait. Analogous to above, when using a significance threshold of p ≤ 0.05, we observe a phylogenetic signal in 5% of all arbitrary traits, both in the analysis across all angiosperms and across the subset of C3 angiosperms, for the majority of detection methods (i.e., again corresponding to that which would be expected by chance) (Supplemental File 1, table S2). Accordingly, as above, the type I false positive rate was observed at between 1.4 and 1.6% when considering the majority result in at least 3 phylogenetic signal detection methods (Supplemental [...]

Thus, in summary, although it is possible to generate an artefactual phylogenetic signal using a randomly simulated trait, this scenario is exceptionally unlikely. Moreover, the strength of the phylogenetic signal observed among all simulated arbitrary variables is low (table 2; Supplemental File 1, tables S1, S2 and S3) and considerably weaker than that reported in rubisco kinetic traits (Bouvier et al., 2021).
For example, when considering Pagel's lambda, the metric used to correct for phylogenetic effects in our downstream phylogenetic least squares regression analysis (Bouvier et al., 2021), on average fewer than 1 in 1,000 of the re-shuffled data or the randomly simulated data produced a signal as strong as or stronger than was observed for the real rubisco kinetic data (excluding KO, which had no phylogenetic signal in the analysis across all angiosperms) (Figure [...] (2021), the probability that randomly distributed data would produce a significant phylogenetic signal is 1%, and the probability that these randomly distributed data would produce a signal comparable to or stronger than that observed for real rubisco kinetics is less than 0.1%. This raises questions about the method used by Tcherkez and Farquhar (2021) to compute the significant phylogenetic signal in their randomly simulated trait, given that no formal description or raw data was provided.

A phylogenetic signal is a universal feature of arbitrary traits distributed according to Brownian motion

In their Opinion article, Tcherkez and Farquhar (2021) further claimed that the phylogenetic signal in rubisco kinetics is artefactual because they can also observe the presence of a signal in a simulated trait that has been modelled to evolve on the rubisco tree by Brownian motion. However, this assertion demonstrates a misunderstanding of the basic principles of phylogenetic data interrogation. To explain, in the context of comparative biology, Brownian motion is a widely used model of evolution in which a trait is simulated to evolve stochastically in both direction and magnitude through time. Specifically, in this model, traits evolve by accruing incremental changes along the branches of a phylogenetic tree (i.e., from the last common ancestor at the root to all extant species at the terminal tips) by sampling at each node from a random distribution of possible changes with zero mean and finite constant variance. In this way, Brownian motion is assumed to represent a tractable model to approximate how trait evolution might occur in the real world under a wide range of scenarios. This is because simulated variables acquire several inherent properties which parallel those typically observed during the evolution of many real biological traits. (1) Trait variation among extant species conforms to a normal distribution with a mean equal to the ancestral trait value (i.e., trait evolution is non-directional). (2) Trait variation among extant species increases as a function of the evolutionary time encapsulated by the tree (i.e., trait evolution is additive). (3) Trait variation among extant species co-varies as a function of shared ancestry (i.e., traits are strongly phylogenetically structured).

Given this latter property, it follows that phylogenetic signal will be a universal feature of all traits simulated on a tree using Brownian motion, irrespective of the underlying phylogenetic tree on which they are simulated. To illustrate this point, we have assessed the presence of phylogenetic signal in 100 arbitrary Brownian motion (BM) traits simulated to have evolved on each of 100,000 simulated trees and demonstrate that a strong significant phylogenetic signal is observed in ~100% of cases (Supplemental File 1, table S4). This result is recapitulated if the number of terminal tips on the simulated trees is set to equal either the full number of angiosperms in the rubisco dataset, or the reduced number which perform C3 photosynthesis (Supplemental File 1, table S4). In summary, therefore, the presence of phylogenetic signal in traits simulated by Brownian motion on a tree is ubiquitous and expected. Thus, the fact that Tcherkez and Farquhar (2021) found strong phylogenetic signal in an arbitrary trait that was guaranteed to have strong phylogenetic signal due to the manner in which it was simulated has no direct relevance to the analysis of phylogenetic signal in rubisco kinetic traits. It does not contradict the finding that rubisco kinetic traits have phylogenetic signal, nor does it negate the requirement to account for this phylogenetic signal in downstream statistical analysis of the kinetic trait data.
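The third property of Brownian motion, that tip values co-vary in proportion to shared ancestry, can be checked with a small simulation. The sketch below is illustrative and does not reproduce the simulations reported in Supplemental File 1. The four-tip tree, its branch lengths and the rate parameter `sigma2` are hypothetical, and the increment along each branch is assumed to be normally distributed with variance equal to `sigma2` multiplied by the branch length, which is the standard parameterisation of the model.

```python
import random

# Hypothetical tree ((A:1,B:1):2,(C:1,D:1):2) as (name, branch_length, children).
TREE = ("root", 0.0, [
    ("AB", 2.0, [("A", 1.0, []), ("B", 1.0, [])]),
    ("CD", 2.0, [("C", 1.0, []), ("D", 1.0, [])]),
])

def simulate_bm(node, parent_value, sigma2, rng, tips):
    """Evolve a trait from the root to the tips, adding a zero-mean normal
    increment with variance sigma2 * branch_length on every branch."""
    name, length, children = node
    value = parent_value + rng.gauss(0.0, (sigma2 * length) ** 0.5)
    if not children:
        tips[name] = value
    for child in children:
        simulate_bm(child, value, sigma2, rng, tips)
    return tips

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def tip_covariances(n_reps=20_000, sigma2=1.0, seed=1):
    """Estimate variances and covariances of tip values across replicates."""
    rng = random.Random(seed)
    draws = {tip: [] for tip in "ABCD"}
    for _ in range(n_reps):
        for tip, value in simulate_bm(TREE, 0.0, sigma2, rng, {}).items():
            draws[tip].append(value)
    return {
        "var_A": covariance(draws["A"], draws["A"]),    # expected: root-to-tip path, 3
        "cov_AB": covariance(draws["A"], draws["B"]),   # expected: shared branch, 2
        "cov_AC": covariance(draws["A"], draws["C"]),   # expected: no shared branch, 0
    }
```

Under these assumptions the sister tips A and B, which share two of their three units of branch length, co-vary strongly, whereas A and C, which diverge at the root, do not. Any metric that measures the association between trait similarity and relatedness will therefore detect signal in such a trait by construction.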
The "serious statistical problem" that exists in rubisco kinetics is not caused by random effects in the data and needs to be correctly considered when computing trait correlations on a phylogenetic tree.

Phylogenetic signal is not an artefact of biases in species sampling

Another criticism made by Tcherkez and Farquhar (2021) asserts that the presence of phylogenetic signal in rubisco kinetic traits is caused by biases in species sampling. Specifically, they argue that because groups of closely related species are present in the rubisco dataset, the phylogenetic signal in rubisco kinetic traits has been overestimated due to the effect of short branch lengths separating kinetically similar rubisco. However, as above, this argument demonstrates a fundamental confusion about the basic principles of phylogenetic data interrogation. For instance, phylogenetic signal detection methods work by computing the pairwise co-variation between trait similarity and phylogenetic distance across species. If nearly interchangeable rubisco kinetics are present in a dataset between closely related sister species, this would be accounted for in the phylogenetic context of their recent ancestry and short branch lengths, not overestimated by this effect (Wiens et al., 2008). In fact, the central premise of phylogenetic comparative methods is that they are robust to species sampling, whereas non-phylogenetic methods are not.

To illustrate this point, we have removed the species hypothesized to be problematic by Tcherkez and Farquhar (2021) from the analysis of all angiosperms and C3 angiosperms (including plants in the Oryza clade, as well as those belonging to both the Aegilops and closely related Triticum clades, which together share a convoluted evolutionary history (Petersen et al., 2006)) and have shown that an analogous phylogenetic signal is observed in this analysis (Supplemental File 1, table S5) compared to the original analysis in which these species were present (table 2). This conclusion holds whether all angiosperms or only C3 angiosperms are considered (Supplemental File 1, table S5; table 2). In summary, therefore, biases in species sampling, including short branches separating nearly identical sister species and overrepresentation of certain groups, are not responsible for the phylogenetic signal in rubisco kinetic traits. Indeed, it has been shown that short branches have the inverse effect of causing phylogenetic signal to be underestimated, due to difficulties in resolving the correct phylogenetic history of species (Wiens et al., 2008).
Phylogenetic signal in rubisco kinetics is not an artefact of using rbcL-based trees

A further issue raised by Tcherkez and Farquhar (2021) is associated with the use of the rbcL gene to reconstruct the phylogenetic tree of species in our analysis. Specifically, they contend that tree inference based on rbcL sequence similarity across species may have caused inflation of phylogenetic signal due to overfitting. This is because mutations in rbcL which contribute to kinetic variation among species would also affect tree topology. However, we specifically addressed this possibility in our original study (Bouvier et al., 2021) and showed that it is not an issue, as we were able to replicate our results when using a phylogeny generated from only those rbcL codon positions which contained synonymous substitutions across species (i.e., where columns in the alignment containing non-synonymous substitutions that could affect enzyme kinetics were removed) (Bouvier et al., 2021). Given that the use of this tree did not impact our analysis compared to that based on the complete rbcL tree, the phylogenetic signal in rubisco kinetics was confirmed to not be attributable to overfitting (Bouvier et al., 2021). As such, the full alignment rbcL tree was deemed appropriate for our analyses given that it has a long history in phylogenetic inference of species relationships (APG, 1998, 2016; Gielly and Taberlet, 1994) and more accurately reflects the evolutionary history of rubisco compared to the partial rbcL tree inferred from only synonymous substitutions.

Owing to the limited availability of publicly sequenced nuclear (n = 26) or chloroplast (n = 8) genomes for species in the kinetic dataset, it was not possible to use whole genome phylogenomic approaches to infer the tree of species in our analysis. Nevertheless, to provide further reassurance in the present study that the phylogenetic signal in rubisco kinetics is not a consequence of the use of rbcL for phylogenetic inference, we repeated our analysis using phylogenies inferred from the coding sequences of several other chloroplast-encoded genes which are frequently utilized in systematics for species classification (Bohs and Olmstead, 1997; Ferguson, 1998; Hilu and Liang, 1997; Koch et al., 2001; Savolainen et al., 2000; Wolf, 1997; Yen and Olmstead, 2000) and, importantly, are unrelated to the kinetic traits of interest.
Specifically, the genes employed to infer these respective alternative trees include the ATP synthase beta subunit (atpB), the NADH dehydrogenase F (ndhF) subunit and maturase K (matK), the last of which was specifically argued by Tcherkez and Farquhar (2021) to reflect the evolutionary relationships of the C4 species in our analysis more accurately than those inferred by rbcL. Analogous measurements of phylogenetic signal to those reported in our original paper are observed when these alternative gene trees are used (Supplemental File 1, table S6; table 2). In summary, therefore, the presence of phylogenetic signal in rubisco kinetics is not an artefact of using rbcL-based trees to infer phylogeny.

Phylogenetic signal is not an artefact of method-to-method variability in trait measurements

A separate concern raised by Tcherkez and Farquhar (2021) is related to the fact that our kinetic data were compiled from numerous sources in the literature. Specifically, in the analysis of angiosperms, the kinetic dataset was compiled across ten independent studies, including (Galmés et al., 2014; Kubien et al., 2008; Occhialini et al., 2016; Orr et al., 2016; Prins et al., 2016; Savir et al., 2010; Sharwood et al., 2016; von Caemmerer et al., 1994; Whitney et al., 2011; Zhu et al., 1998 [...]

Although we agree that methodologically induced systematic biases are a genuine concern in the analysis of any metadata, we have previously taken every effort to prevent these effects from influencing our analysis. For example, the rubisco meta-dataset originally assembled by [...]

Nevertheless, to confirm the robustness of our results to inter-study biases in measurements, we have repeated our analysis here using data derived from a single study in the meta-dataset so that there can be no influence of method-to-method or laboratory-to-laboratory variability. For this purpose, the subset of rubisco characterized in Orr et al. (2016) was analyzed, as it had sufficient species sampling for a robust statistical analysis. Phylogenetic interrogation of this dataset revealed a similar magnitude and significance of phylogenetic signal in rubisco (Supplemental File 1, table S7) compared to that which was originally reported from the meta-dataset (table 2). This same outcome is observed irrespective of whether all angiosperms, or exclusively C3 angiosperms, were analysed (Supplemental File 1, table S7; table 2). Therefore, phylogenetic signal in rubisco kinetics is not an artefact of method-to-method or laboratory-to-laboratory variability in trait measurements.
In addition to the general concerns with our meta-analysis that are addressed above, a more specific criticism of method-to-method biases in our dataset was associated with the Limonium tribe. In particular, Tcherkez and Farquhar (2021) [...] (Supplemental File 1, table S9; table 2). In combination, therefore, phylogenetic signal in rubisco kinetic traits is genuine and is an artefact neither of the sampled species nor of the data being compiled from different sources.

Phylogenetic signal is not an artefact of C3 vs C4 photosynthesis
[...] the enzyme across the tree of life. Specifically, given the widespread kinetic differences associated with adaptation to intracellular CO2 levels in plants with C3 (higher CO2 specificity) and C4 metabolisms (higher CO2 turnover) (Bouvier et al., 2021), Tcherkez and Farquhar (2021) claim that C3 vs. C4 kinetic differences are responsible for the observed phylogenetic signal.

However, the above hypothesis put forward by Tcherkez and Farquhar (2021) [...] with transition to C4 photosynthesis in discrete pockets on the phylogenetic tree will produce hotspots of local phylogenetic signal, this effect will reduce the estimated global phylogenetic signal which we measure. This is because the convergence and homoplasy of the C4 trait weakens the average statistical dependence between trait similarity and phylogenetic relatedness among the group of sampled species as a whole (Hansen and Martins, 1996; Kamilar and Cooper, 2013). [...] other C3-C4 intermediary species reduces the signal strength relative to when only C3 angiosperms are considered. Moreover, this result is also observed in the analysis presented by Tcherkez and Farquhar (2021), although this effect is misinterpreted by these authors. Specifically, using kcatC as an example, Tcherkez and Farquhar (2021) show that C4 angiosperms are associated with local phylogenetic signal when all species are considered; however, they detect no global phylogenetic signal in this analysis (Figure 2 of Tcherkez and Farquhar, 2021). In contrast, when C4 and other intermediary C3-C4 and C4-like species are ignored, Tcherkez and Farquhar (2021) show that the local phylogenetic signal associated with these groups is lost and that global phylogenetic signal accordingly becomes significant at short phylogenetic distances among the remaining species (Figure 2 of Tcherkez and Farquhar, 2021). This result demonstrates that although C3 vs. C4 kinetic differences drive local phylogenetic signal, they are not responsible for (and in fact weaken) the global phylogenetic signal which is of importance to the discussion of Bouvier et al. (2021). This was already discussed in our original study (Bouvier et al., 2021). Thus, the contention of Tcherkez and Farquhar (2021) that C4-mediated kinetic changes are responsible for the observed phylogenetic signal across the whole tree is nonsensical, given that this signal is detected when analysing only C3 angiosperms (as originally reported in Bouvier et al., 2021), and that the strength of the signal in the analysis of C3 angiosperms is greater than when both C3 and C4 angiosperms are considered together.
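The argument above, that convergent shifts in scattered lineages weaken rather than create global signal, can be illustrated with a toy calculation. The sketch below is not part of the original analysis: the eight species, their trait values and the tree distances (1 for sisters, 2 within a major clade, 3 between major clades) are all invented, and the Mantel-style Pearson correlation between pairwise tree distance and pairwise trait difference is only a simple stand-in for global signal, not one of the statistics used in this work.

```python
from itertools import combinations
from math import sqrt

# Hypothetical species: four sister pairs (A-D) in two major clades (A+B, C+D).
SPECIES = ["A1", "A2", "B1", "B2", "C1", "C2", "D1", "D2"]
MAJOR_CLADE = {"A": 0, "B": 0, "C": 1, "D": 1}


def tree_distance(a, b):
    """Assumed toy distances: 1 = sisters, 2 = same major clade, 3 = otherwise."""
    if a[0] == b[0]:
        return 1.0
    return 2.0 if MAJOR_CLADE[a[0]] == MAJOR_CLADE[b[0]] else 3.0


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def mantel_r(traits):
    """Correlate pairwise tree distance with pairwise absolute trait difference."""
    pairs = list(combinations(SPECIES, 2))
    phylo = [tree_distance(a, b) for a, b in pairs]
    trait = [abs(traits[a] - traits[b]) for a, b in pairs]
    return pearson(phylo, trait)


# Invented trait values in which close relatives resemble one another.
c3_only = {"A1": 1.0, "A2": 1.1, "B1": 1.5, "B2": 1.6,
           "C1": 3.0, "C2": 3.1, "D1": 3.5, "D2": 3.6}

# Convergent shift: one species in each major clade moves to the same value.
with_convergence = dict(c3_only, A1=6.0, C1=6.0)

r_before = mantel_r(c3_only)
r_after = mantel_r(with_convergence)
print(f"without convergence: r = {r_before:.2f}")
print(f"with convergence:    r = {r_after:.2f}")
```

In this toy case the two convergent species produce a local pocket of similarity, yet the association between relatedness and trait similarity across all species drops sharply, which is the direction of effect described in the text.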

Rubisco (ribulose-1,5-bisphosphate [RuBP] carboxylase/oxygenase) is the primary entry point for carbon into the biosphere (Field et al., 1998; Tabita et al., 2008) and, accordingly, is the source of almost all organic carbon which has ever existed. As such, gaining an appreciation of how rubisco has evolved is of fundamental importance to our understanding of life on our planet. Interestingly, however, rubisco has long presented an evolutionary paradox. This is because, despite the central importance of rubisco in underpinning host autotroph metabolism and ultimately the global food chain, the enzyme is considered by many to be an inefficient catalyst under the low CO2, high O2 conditions of the present day. This is because rubisco has a modest turnover rate for its primary CO2 reaction (Badger et al., 1998; Bar-Even et al., 2011) and also catalyses a costly secondary reaction with O2 that culminates in the loss of previously fixed carbon (Bowes et al., 1971; Chollet, 1977; Sharkey, 2020). Although an updated examination has suggested that rubisco's kinetics are overall perhaps not as bad as often assumed (Bathellier et al., 2018), these puzzling catalytic limitations of the enzyme have nevertheless caused rubisco to be an enigma and the source of intensive and sustained research interest for over 40 years.

Until our phylogenetically resolved investigation of rubisco kinetic evolution (Bouvier et al., 2021), the principal hypothesis put forward to explain the above "rubisco paradox" proposed that severe catalytic trade-offs exist between the kinetic traits of the enzyme.
Specifically, this theory was pioneered by two cross-species analyses (Savir et al., 2010; Tcherkez et al., 2006) which reported strong antagonistic correlations between rubisco kinetic traits and posited that these correlations were caused by unavoidable chemical constraints on its catalytic mechanism (despite a limited understanding of this chemical mechanism). In this way, these kinetic trait interdependencies were understood to provide a ceiling on rubisco optimisation by limiting the adaptive capacity of its individual kinetic traits (Savir et al., 2010; Tcherkez et al., 2006). However, both of the analyses which first described the rubisco kinetic trait trade-offs (Savir et al., 2010; Tcherkez et al., 2006), in addition to all subsequent analyses which have re-investigated these trade-offs among different taxa (Flamholz et al., 2019; Iñiguez et al., 2020; Young et al., 2016), failed to account for the phylogenetic context of the sampled species. This is a serious statistical problem because species (and the rubisco they encode) cannot be treated as independent observations in comparative studies owing to the shared ancestry of all organisms on the hierarchical tree of life (Felsenstein, 1985). Thus, the failure to account for phylogenetic non-independence of rubisco data in previous studies has violated one of the core assumptions of the conventional statistical methods which were applied (i.e., independence of residuals; table 1; Figure 1), meaning that the results drawn from these analyses may be, wholly or in part, artefactual.

To assess whether the methodological flaws of previous cross-species analyses have led to mistaken inferences about rubisco, in Bouvier et al. (2021) we re-investigated the kinetic evolution of this enzyme in a phylogenetically resolved manner.
We discovered for the first time that rubisco kinetic traits exhibit strong and significant phylogenetic signal (Bouvier et al., 2021). The presence of this signal means that kinetic trait values are statistically similar across enzymes as a function of their shared evolutionary history (i.e., the kinetic traits exhibit strong phylogenetic non-independence). Accordingly, we re-evaluated the correlations between rubisco kinetic traits when accounting for this phylogenetic signal and found that only canonical trade-offs between the Michaelis constant for CO2 (KC) and carboxylase turnover (kcatC), and between the Michaelis constants for CO2 and O2 (KO), were robust to phylogenetic effects (Figure 2A-2C) (Bouvier et al., 2021). However, the strength of these trade-offs was weak and considerably attenuated relative to earlier estimates (Figure 2A-2C) (Bouvier et al., 2021). Therefore, although catalytic trade-offs exist, these are not absolute, and have been previously overestimated due to phylogenetic biases in the kinetic data (Figure 2A-2C). Instead, kinetic traits have been able to evolve largely independently of one another during the diversification of rubisco. Moreover, when we further investigated the constraints on the enzyme, we found that phylogenetic constraints explained more variation in rubisco kinetics, and have thus had a larger impact on limiting enzyme adaptation, than the combined action of all catalytic trade-offs (Bouvier et al., 2021).
In summary, therefore, although rubisco catalytic trade-offs exist as determined by chemistry, these represent a less serious constraint on rubisco kinetic adaptation than previously assumed. Instead, phylogenetic constraints, likely caused by slow molecular evolution in rubisco (Bouvier et al., 2022) and more general constraints on the molecular evolution of chloroplast encoded genes (Robbins and Kelly, 2022), have presented a more significant barrier to improved rubisco catalytic efficiency.

In Bouvier et al. (2021) we made every effort to carefully describe the method by which we computed phylogenetic signal in rubisco kinetic data. In addition, we took care to discuss what this phylogenetic signal means, as well as to explain in detail how the presence of this signal has resulted in overestimated kinetic trait correlation coefficients in previous analyses. However, our results have been criticised by Tcherkez and Farquhar (2021). Specifically, these authors have argued that the phylogenetic signal we detect in rubisco kinetics is a consequence of computational artefacts due to a combination of biases associated with species sampling (including the use of near identical sister species as well as the overrepresentation of certain groups), the use of rbcL-based trees for phylogenetic inference, method-to-method and laboratory-to-laboratory variability among kinetic measurements in the rubisco meta-dataset, and finally, homoplasy of the C4 trait across sampled species. On this matter, Tcherkez and Farquhar (2021) argue that our phylogenetic signal in rubisco kinetics is not valid because they can also generate a signal using our method with both a randomly simulated arbitrary trait and a trait distributed according to Brownian motion. In the present response, we critically review each of these claims on a point-by-point basis by re-visiting and extending our original analysis. In doing so, we provide unequivocal evidence that all of the criticisms raised by Tcherkez and Farquhar (2021) were either incorrect, arose from a fundamental misunderstanding of basic phylogenetic concepts, or were already addressed in our original manuscript and proven to be unimportant. As such, we reaffirm that our results in Bouvier et al. (2021) are robust. Thus, our original conclusions, as summarized above, are unaltered.

To avoid any confusion on the points made by Tcherkez and Farquhar (2021), it is worthwhile discussing further the subject of C3 vs. C4 photosynthesis in the context of measuring phylogenetic signal in rubisco kinetics. As described by our previous results (Bouvier et al., 2021) and those of others (Capó-Bauçà et al., 2022; Cummins, 2021; Iñiguez et al., 2020), we agree that kinetic adaptations in rubisco occur in conjunction with the emergence of CO2-concentrating mechanisms. For example, rubisco evolves to become faster (increased carboxylase turnover) and less specific (decreased specificity and CO2 affinity) during the transition from C3 to C4 photosynthesis (Bouvier et al., 2021). Nevertheless, although co-evolution between rubisco and C4 photosynthesis is apparent, as underpinned by changes in the gaseous micro-environment of the enzyme, this is not responsible for the phylogenetic signal in rubisco kinetic traits as proposed by Tcherkez and Farquhar (2021).
Indeed, rubisco kinetic adaptation associated with the convergent evolution of C4 photosynthesis in discrete clusters of species on the phylogenetic tree in fact has the opposite effect of degrading the pairwise statistical association between phylogenetic distance and kinetic distance across all sampled rubisco as a whole. Thus, although differences in prevailing gaseous conditions around rubisco exist between C3 and C4 species and underpin differences in the trajectory of rubisco adaptation, this does not drive (and in fact weakens) the observed phylogenetic signal in rubisco kinetics.

Although the present work is focussed on angiosperms, it should be noted that we have previously observed identical results in rubisco across the tree of life (table 2) (Bouvier et al., 2021). Thus, rubisco evolution is only weakly constrained by catalytic trade-offs and is instead more limited by phylogenetic constraint.
We propose that this phylogenetic constraint arises from a combination of a high degree of purifying selection (Robbins and Kelly, 2022), the requirement for high levels of transcript and protein abundance (Kelly, 2018; Robbins and Kelly, 2022; Seward and Kelly, 2018), the requirement for maintaining complementarity to a wide array of molecular chaperones which assist in protein folding and assembly (e.g., Raf1, Raf2, RbcX, BSD2, Cpn60/Cpn20) and metabolic regulation (e.g., rubisco activase) (Aigner et al., 2017; Carmo-Silva et al., 2015), and finally, the need to preserve overall protein stability within the molecular activity-stability trade-offs (Cummins et al., 2018; Durão et al., 2015; Studer et al., 2014). These factors combined contribute to the exceedingly slow rate of molecular evolution in rbcL (Bouvier et al., 2022), which presents a major barrier to rubisco optimisation.

Prior to our phylogenetically resolved investigation of rubisco evolution (Bouvier et al., 2021), all comparative studies which measured correlations between rubisco kinetic traits did so without accounting for the phylogenetic non-independence of the data. We described how this omission has led to the mistaken inference that the biochemical landscape of rubisco is severely constrained by catalytic trade-offs (Bouvier et al., 2021). In the present study, we revisit these analyses to address a series of criticisms of our work that were put forward in a recent opinion article (Tcherkez and Farquhar, 2021). Here, we demonstrate that all of these criticisms were either misguided or incorrect, and thus our original conclusions are unaltered. Namely, strong phylogenetic signal exists in rubisco kinetic traits and causes an overestimation of catalytic trade-offs unless correctly accounted for. In actual fact, phylogenetic constraints have limited rubisco kinetic adaptation to a greater extent than the combined action of catalytic trade-offs. These phylogenetic constraints are caused by multiple evolutionary factors which act to limit rubisco molecular sequence evolution (Bouvier et al., 2022) and which, combined, help to explain why the enzyme is poorly efficient under present-day conditions but better adapted to the former high CO2, low O2 atmosphere in which it evolved. These conclusions agree with recent studies which have also emphasized that kinetic trait correlations are generally too weak to support the rubisco trade-off model (Cummins et al., 2018, 2019; Flamholz et al., 2019; Galmés et al., 2014), as well as with experimental results from rubisco engineering efforts which have successfully produced enzyme variants that deviate from proposed catalytic trade-offs (Wilson et al., 2018; Zhou and Whitney, 2019).

Kinetic data
The rubisco kinetic dataset used in this work was compiled across various sources in the primary literature (Bouvier et al., 2021; Flamholz et al., 2019). These data include experimentally determined measurements of wild-type rubisco assayed at pH 7.8-8.0 and 25 °C for CO2/O2 specificity (SC/O), maximum carboxylase turnover rate per active site (kcatC), and the respective Michaelis constants (i.e., the substrate concentration at the half-saturated catalysed rate) for both CO2 (KC) and O2 (KO) substrates. In addition, measurements of the Michaelis constant for CO2 in 20.95% O2 ambient air (KCair) were also available as previously derived from KC and KO using the [...] normalizing between SC/O measurements that were determined using an oxygen electrode assay (Parry et al., 1989) and those determined using the high precision gas-phase-controlled 3H-RuBP-fixation assay (Kane et al., 1994) by using wheat as an internal standard in both. In addition, this dataset was also previously modified to average across duplicate entries of species (including synonyms) between studies (Bouvier et al., 2021). It should be noted that in the present study, only data for the 137 angiosperm species with complete measurements of SC/O, kcatC, KC, KCair and KO were considered, given that this is the sole taxonomic group on which Tcherkez and Farquhar (2021) focused. Further, as before, all kinetic traits were log transformed prior to analysis to satisfy the distributional assumptions of the statistical tests being performed.

Phylogenetic tree inference based on rbcL and other gene sequences
All phylogenetic trees inferred from the coding sequence of the rubisco large subunit (rbcL) were obtained from our previous work (Bouvier et al., 2021). These include the separate rbcL trees which describe the evolutionary history of all angiosperms and of the subset of angiosperms which perform C3 photosynthesis. Phylogenetic trees based on other chloroplast encoded genes, including maturaseK (matK), ATP synthase beta subunit (atpB), and the NADH dehydrogenase F (ndhF) subunit, were generated in the present study following the exact method previously described (Bouvier et al., 2021). Specifically, coding sequences of matK, atpB, and ndhF were obtained from NCBI (https://www.ncbi.nlm.nih.gov/) for as many of the angiosperm species in the kinetic dataset as possible. Multiple sequence alignments were generated for each respective gene using MAFFT L-INS-i (Katoh and Standley, 2013), and alignments were processed to remove any partial, chimeric, or erroneously annotated sequences. Finally, bootstrapped maximum-likelihood phylogenetic trees were inferred from each non-gapped sequence alignment by IQ-TREE (Nguyen et al., 2015) using the ultrafast bootstrapping method with 1,000 replicates and the Shimodaira-Hasegawa approximate-likelihood ratio branch test, with the best fitting model of sequence evolution chosen automatically. All trees were rooted manually using Dendroscope (Huson and Scornavacca, 2012). For each of these inferred trees based on matK, atpB, and ndhF gene sequences, a corresponding tree containing only C3 species was obtained using the function drop.tip from the 'ape' package (Paradis et al., 2004; Paradis and Schliep, 2019) in the R environment.

Due to the differences in phylogenetic information available for tree building from the alignments of matK, atpB, and ndhF genes as compared to the alignment based on the rbcL gene, additional polytomies which include species with terminal zero-length branches were detected in these alternative chloroplast gene trees (due to species possessing 100% sequence identity for the gene in question).
However, it was not appropriate to condense these zero-branch-length nodes into single datapoints as previously performed in the analysis based on the rbcL trees (Bouvier et al., 2021). This is because these other chloroplast genes are unrelated to rubisco kinetic traits. Thus, averaging kinetic data across zero-branch-length species on these other trees is incorrect in the context of the downstream phylogenetic analysis being performed, given that these nodes are known to possess differences in their rubisco sequences which are not captured by the tree. Therefore, to circumvent this issue and maximize the number of species taken forward for analysis, zero-length terminal branches in matK, atpB, and ndhF trees were arbitrarily resolved into fully bifurcating trees by using the function multi2di in the 'ape' R package (Paradis et al., 2004; Paradis and Schliep, 2019) and increasing terminal branch length values by a minimal value of 0.000001. The total number of angiosperms and the total number of C3 angiosperms encapsulated by each of these alternative gene trees can be found in Supplemental File 1, table S6. These alternative trees are provided in Newick format in Supplemental File 2.

Generation of randomly simulated arbitrary traits
To evaluate whether randomly simulated data could produce an artefactual phylogenetic signal, two separate approaches were employed. First, 100,000 random permutations of the data were performed manually such that in each permutation the kinetic trait data for the species set were subject to a Fisher-Yates shuffle. Each permutation was subject to phylogenetic signal analysis as described below, and the proportion of permutations that obtained an equivalent or larger phylogenetic signal to that observed in the real kinetic trait data was recorded. Second, 100,000 stochastically simulated trait datasets were generated by randomly sampling from normal distributions inferred from the mean and standard deviation of each real rubisco kinetic trait. To achieve this, the rnorm function was employed from the 'compositions' package (van den Boogaart and Tolosana-Delgado, 2008) in the R environment. Each simulation was subject to phylogenetic signal analysis as described below, and the proportion of simulated trait datasets that obtained an equivalent or larger phylogenetic signal to that observed in the real kinetic data was recorded.

Simulation of kinetic traits on phylogenetic trees by Brownian motion
To show that a phylogenetic signal is a universal feature of all traits which are distributed according to Brownian motion on any underlying phylogenetic tree, 100,000 bifurcating trees were randomly simulated using the rtree function from the 'ape' package (Paradis et al., 2004; Paradis and Schliep, 2019) in R. This process was repeated for trees containing the same number of terminal tips as the full set of angiosperms in the rubisco kinetic dataset and the same number of tips as the subset of C3 angiosperms, respectively. Next, for each tree, 100 quantitative traits were simulated to evolve under a standard Brownian motion model using the function fastBM in the R package 'phytools' (Revell, 2012). In total, this resulted in 10,000,000 unique Brownian motion trait simulations for the analysis of all angiosperms and C3 angiosperms, respectively. Each simulated trait was subject to phylogenetic signal analysis using the methods outlined below.
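For readers unfamiliar with this kind of simulation, the Brownian motion procedure described in this section can be sketched in a few lines of Python. This is an illustration only and not the code used in this work, which relied on rtree from 'ape' and fastBM from 'phytools' in R. The function names random_tree and simulate_bm, the uniform branch lengths, the rate sigma2, the seed and the choice of 20 tips are all assumptions made for the example.

```python
import random


def random_tree(n_tips, rng):
    """Random bifurcating tree (n_tips >= 2) built by recursively splitting tips.

    A node is a tuple (name, branch_length, children); tips have no children.
    Branch lengths are drawn uniformly from [0, 1), an assumption of this sketch.
    """
    counter = [0]

    def build(n):
        length = rng.random()
        if n == 1:
            counter[0] += 1
            return (f"t{counter[0]}", length, [])
        n_left = rng.randint(1, n - 1)  # size of the left daughter clade
        return (None, length, [build(n_left), build(n - n_left)])

    return build(n_tips)


def simulate_bm(tree, rng, sigma2=1.0, root_value=0.0):
    """Simulate one trait under Brownian motion and return {tip_name: value}."""
    tips = {}

    def walk(node, parent_value):
        name, length, children = node
        # The change along a branch is Normal(0, sigma2 * branch_length).
        value = parent_value + rng.gauss(0.0, (sigma2 * length) ** 0.5)
        if not children:
            tips[name] = value
        for child in children:
            walk(child, value)

    # The branch above the root is ignored; evolution starts at root_value.
    for child in tree[2]:
        walk(child, root_value)
    return tips


rng = random.Random(2023)    # fixed seed so the sketch is reproducible
tree = random_tree(20, rng)  # 20 tips is an arbitrary illustrative size
traits = [simulate_bm(tree, rng) for _ in range(100)]  # 100 traits on one tree
```

Because each descendant inherits its ancestor's value plus an independent increment, tips that share more of their path from the root end up with more similar values on average, which is why traits simulated in this way are expected to carry phylogenetic signal.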
Phylogenetic signal analysis
The magnitude and significance of phylogenetic signal in a given trait on a given underlying phylogenetic tree were computed using five independent detection methods, namely Pagel's lambda (Pagel, 1999), Moran's I (Gittleman and Kot, 1990), Abouheif's Cmean (Abouheif, 1999) and Blomberg's K and K* (Blomberg et al., 2003). Specifically, these tests work by assessing the statistical association between trait and phylogenetic similarity among all pairwise combinations of species on the tree. Implementation of these phylogenetic signal detection tests was performed in the R environment as previously described (Bouvier et al., 2021).
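To make the idea of a signal statistic and its permutation null concrete, the following Python sketch computes Moran's I on a toy dataset and compares it against Fisher-Yates shuffles of the trait values. It is a minimal illustration under stated assumptions, not the implementation used in this work (which was performed in R): the eight tips, their trait values, the tree distances, the inverse-distance weighting and the 2,000 permutations are all invented for the example.

```python
import random


def morans_i(values, weights):
    """Moran's I for trait values given a symmetric weight matrix (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n) if i != j)
    total_w = sum(weights[i][j] for i in range(n) for j in range(n) if i != j)
    return (n / total_w) * num / sum(d * d for d in dev)


def fisher_yates(values, rng):
    """Return a shuffled copy of values using the Fisher-Yates algorithm."""
    out = list(values)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # choose among positions 0..i inclusive
        out[i], out[j] = out[j], out[i]
    return out


def permutation_test(values, weights, n_perm, rng):
    """Return observed I and the proportion of shuffles with I >= observed."""
    observed = morans_i(values, weights)
    hits = sum(morans_i(fisher_yates(values, rng), weights) >= observed - 1e-12
               for _ in range(n_perm))
    return observed, hits / n_perm


# Toy input (invented): tips A1, A2, B1, B2, C1, C2, D1, D2 in that order.
pair = ["A", "A", "B", "B", "C", "C", "D", "D"]
major = {"A": 0, "B": 0, "C": 1, "D": 1}


def dist(a, b):
    """Assumed tree distances: 1 = sisters, 2 = same major clade, 3 = otherwise."""
    if a == b:
        return 1.0
    return 2.0 if major[a] == major[b] else 3.0


# Phylogenetic proximity taken as inverse distance (an assumption of this sketch).
weights = [[0.0 if i == j else 1.0 / dist(pair[i], pair[j]) for j in range(8)]
           for i in range(8)]
trait_values = [1.0, 1.1, 1.5, 1.6, 3.0, 3.1, 3.5, 3.6]

observed, p_value = permutation_test(trait_values, weights, 2000, random.Random(1))
print(f"Moran's I = {observed:.3f}, permutation p = {p_value:.4f}")
```

Shuffling breaks any link between trait values and tree position while preserving the trait distribution, so the reported proportion corresponds to the quantity described for the permutation approach above. A parametric null of the kind also described above could be obtained by replacing the shuffle with independent draws from a normal distribution fitted to the trait.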