Extensive transfer of membrane lipid biosynthetic genes between Archaea and Bacteria

The divergence between Bacteria and Archaea may represent the deepest split in the tree of life. One of the key differences between the two domains is their membrane lipids, which are synthesised by distinct biosynthetic pathways with non-homologous enzymes. This 'lipid divide' has important implications for the early evolution of cells, and motivates the hypothesis that the last universal common ancestor (LUCA) may have lacked a modern cell membrane. However, we still know surprisingly little about the natural diversity of prokaryotic lipids in modern environments, or the evolutionary origins of the genes that produce them. In particular, the discovery of environmental lipids, such as glycerol dialkyl glycerol tetraethers with a mixture of classically archaeal and bacterial features, suggests that the 'lipid divide' may be less clear-cut than previously assumed. Here, we investigated the distribution and evolutionary history of membrane lipid biosynthesis genes across the two domains. Our analyses reveal extensive inter-domain horizontal transfer of core lipid biosynthetic genes, and suggest that many modern Bacteria and Archaea have the capability to biosynthesize membrane lipids of the opposite "type". Gene tree rooting further suggests that the canonical archaeal pathway could be older than the bacterial pathway, and could have been present in LUCA.

biosynthetic pathways (Fig. 1). This so-called 'lipid divide' (Koga 2011) Williams et al. 2017; Adam et al. 2018). An alternative view is that LUCA may have had a fully modern, ion-tight membrane, which was heterochiral with respect to et al. 2000), one possibility is that these mixed biochemical properties reflect biosynthetic pathways of mixed bacterial and archaeal origin. This prompted us to investigate the distribution and phylogeny of phospholipid biosynthesis enzymes across the two domains and evaluate the evidence for inter-domain horizontal gene transfer. Our analysis focused on the core enzymes that establish membrane lipid stereochemistry and attach the two carbon chains to the glycerol-phosphate backbone (Figure 1), as the histories of these enzymes are key to understanding the evolution of membrane stereochemistry and biosynthesis.

Results and Discussion

Extensive inter-domain lateral transfer of core phospholipid biosynthesis genes

We performed BLASTp searches for the enzymes of the canonical archaeal and bacterial lipid biosynthesis pathways (Figure 1) against all archaeal and bacterial genomes in the NCBI nr database. Our BLAST searches revealed homologues for all of the core phospholipid biosynthesis genes of both pathways in both prokaryotic domains, with the exception of the bacterial enzymes PlsB and PlsX, which we did not find in Archaea. Orthologues of the canonical archaeal genes are particularly widespread in many bacterial lineages (Table 1). Of 48 bacterial phyla, 6 have at least one sequence identified as an orthologue of each of the three archaeal genes ( Glp and GlpK appear in Crenarchaeota and Korarchaeota, while GpsA appears only in a single crenarchaeote (Thermofilum). GpsA is also found in two of the 11 DPANN phyla surveyed (Woesearchaeota and GW2011), while GlpK is found in one phylum (Woesearchaeota) and Glp is found in none.
Within the Asgardarchaeota superphylum, no orthologues for GpsA are found, and only one of the four phyla (Lokiarchaeota) has Glp or GlpK. PlsC and PlsY are more restricted, being found

Early origins of archaeal-type membrane lipid biosynthesis genes in Bacteria

To investigate the evolutionary histories of these genes, we inferred Bayesian single-gene phylogenies from the amino acid sequences using the relationships within the group are poorly resolved. Interestingly, the majority of the bacterial G1PDH orthologues do not appear to be recent horizontal acquisitions from Archaea, but instead form a deep-branching clan resolved as sister to the archaeal lineage. The root position that receives the highest posterior support in the relaxed molecular clock analysis is that between the archaeal and bacterial clans, with a marginal posterior probability of 0.68 (Supplementary Table 1). This is substantially higher than the next most probable position, which places the root within the Bacteria with a posterior of 0.1. When rooted using MAD, the same root position is recovered with a marginal posterior probability of 0.62, also substantially higher than the next most probable root position (0.1). Rooting single-gene trees can prove difficult, and this uncertainty is captured in the low root probabilities inferred using both the molecular clock and MAD methods. However, these analyses can be used to exclude the root from some regions of the trees with a degree of certainty. In the case of G1PDH, a post-LUCA origin of the gene would predict a root on the archaeal stem or within the Archaea. In our analyses, no such root position has a significant probability (i.e. PP > 0.05), and therefore the root is highly unlikely to be within the Archaea. The

This root position is consistent with two scenarios that we cannot distinguish based on the available data. One possibility is an early transfer of G1PDH from stem Archaea into Bacteria, either into the bacterial stem lineage with subsequent loss in later lineages, or into the ancestor of Actinobacteria topology implies an ancestral duplication followed by sorting out of the paralogues and multiple transfers into Bacteria. To improve resolution among the deeper branches of the tree, we inferred an additional phylogeny focusing just on the larger of the two paralogues (Supplementary Figure 3). The root within this paralogous sub-tree fell between reciprocally monophyletic archaeal and bacterial clades (PP = 0.8, much higher than the next most likely root, within the Bacteria, with PP = 0.07), suggesting that the gene duplication at the base of the GGGPS tree pre-dates LUCA. In sum, our phylogenetic analyses of archaeal lipid biosynthesis genes suggest that GGGPS and DGGGPS were already present in LUCA, with G1PDH either present in LUCA or evolving along the archaeal stem. They also provide evidence for repeated, independent inter-domain transfer of these genes from Archaea to Bacteria throughout the evolutionary history of life.
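The root-exclusion argument used in this subsection (a region of the tree is ruled out when no root position inside it has PP > 0.05) can be made concrete with a small sketch. This is an illustration only, assuming root posteriors have already been exported as a mapping from branch label to (region, PP); the function name, region labels and most numbers are hypothetical. Only 0.68 and 0.10 echo the G1PDH values quoted above. The summed PP per region is reported alongside the per-branch check, because many individually small posteriors could in principle add up to non-negligible support.

```python
from collections import defaultdict


def region_support(root_pp, threshold=0.05):
    """Group marginal root posteriors (PP) by tree region.

    root_pp maps a branch label to a (region, PP) pair. For each region this
    returns the summed PP and whether any single branch exceeds `threshold`.
    """
    summary = defaultdict(lambda: {"total_pp": 0.0, "significant": False})
    for region, pp in root_pp.values():
        entry = summary[region]
        entry["total_pp"] += pp
        if pp > threshold:
            entry["significant"] = True
    return dict(summary)


# Hypothetical root posteriors for illustration only: 0.68 and 0.10 echo the
# G1PDH values quoted in the text; the remaining numbers are invented.
example = {
    "archaeal/bacterial split": ("between_domains", 0.68),
    "bacterial branch 1": ("within_bacteria", 0.10),
    "bacterial branch 2": ("within_bacteria", 0.08),
    "archaeal stem": ("archaeal_stem_or_within", 0.03),
    "archaeal branch 1": ("archaeal_stem_or_within", 0.01),
}

summary = region_support(example)
for region, info in sorted(summary.items()):
    # A region with no branch above the threshold is the kind treated as excluded.
    status = "significant" if info["significant"] else "excluded"
    print(region, round(info["total_pp"], 2), status)
```

In this toy input the archaeal stem and crown together carry PP 0.04, with no single branch above 0.05, which is the pattern the text uses to argue against a post-LUCA origin within the Archaea.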

Transfers of bacterial membrane lipid genes into Archaea

In contrast to our analyses of proteins of the classical archaeal pathway, phylogenies unclear; yet, support for root positions outside of the Bacteria was never obtained. This is consistent with the hypothesis that the core bacterial pathway first evolved after the bacterial lineage diverged from LUCA.

GpsA and Glp are two genes that code for glycerol-3-phosphate dehydrogenase (G3PDH), which establishes phospholipid stereochemistry in Bacteria. The deep relationships between the archaeal and bacterial sequences in the GpsA tree are poorly resolved (Fig. 3a), but are better resolved for Glp (Fig. 3b). The root position in both trees is poorly resolved for both rooting methods (Supplementary Table 1 for Glp. The tree inferred for GlpK (the gene that codes for glycerol kinase, which can synthesise G3P from glycerol) (Fig. 4a) shows a similar pattern to the phylogenies of GpsA and Glp. Again, the root positions have low posterior support (0.47 and 0.34 for the molecular clock and MAD, respectively). However, in each case, there is evidence of multiple recent transfers from Bacteria to Archaea, as we recover several distinct bacterial and archaeal clades with moderate to high support (0.8-1). The main archaeal recipients of these genes are Euryarchaeota, which is consistent with reports of bacterial-like fatty acid esters in this group (Gattinger et al. 2002), and which may suggest the occurrence of an earlier transfer into the stem lineage of this clade. The tree topology also supports a number of more recent transfer events into various archaeal lineages. (with the next most likely, also within the Bacteria, being 0.1). The PlsY (Fig 4c)

Comparisons with outgroup rooting

The most widely used approach for rooting trees is to place the root on the branch leading to a pre-defined outgroup (Penny 1976), but this can be challenging for ancient genes for which closely related outgroups are lacking (Gouy et al. 2015 A potential concern when using distantly related sequences to root a tree is that the long branch leading to the outgroup can induce errors in the ingroup topologies due to artefacts of this type than the profile mixture models used here (Lartillot et al. 2007).

To investigate whether the differences in root inference between our analyses and those of Yokobori et al. (2016) might be the result of LBA, we performed outgroup rooting analyses on G1PDH, GpsA and Glp, augmenting our datasets with a subsample of the outgroups used by Yokobori et al. and using the same models used to infer the unrooted trees (LG+C60 in each case). The resulting trees (Supplementary Figures 10-12) show different topologies when compared to the unrooted trees (Supplementary Figures 16, 19-20). This suggests that the long-branch outgroup may be distorting the ingroup topology.
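The text does not state how ingroup topologies with and without the outgroup were compared. One standard way to quantify such distortion is the Robinson-Foulds (RF) distance: prune the outgroup, then count the bipartitions found in one tree but not the other. The sketch below is a swapped-in illustration, not the authors' procedure; it assumes trees are given as nested Python tuples of leaf names, and all taxon labels are hypothetical. Restricting each split to the shared taxa is equivalent to pruning the extra leaves.

```python
def clades(tree):
    """Return (leaf set, list of leaf sets of every internal node) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaf_sets, found = [], []
    for child in tree:
        leaves, sub = clades(child)
        leaf_sets.append(leaves)
        found.extend(sub)
    here = frozenset().union(*leaf_sets)
    found.append(here)
    return here, found


def bipartitions(tree, keep=None):
    """Non-trivial unrooted splits, restricted to the taxa in `keep` (default: all)."""
    taxa, found = clades(tree)
    keep = frozenset(keep) if keep is not None else taxa
    ref = min(keep)  # orient every split by the side that lacks this taxon
    result = set()
    for clade in found:
        side = clade & keep
        if ref in side:
            side = keep - side
        if 1 < len(side) < len(keep) - 1:  # drop trivial (single-leaf) splits
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b, keep):
    """Number of splits present in exactly one tree, over the shared taxa `keep`."""
    return len(bipartitions(tree_a, keep) ^ bipartitions(tree_b, keep))


# Hypothetical ingroup taxa A-E and outgroup taxa O1, O2.
keep = {"A", "B", "C", "D", "E"}
unrooted = ((("A", "B"), "C"), ("D", "E"))
with_outgroup = (((("A", "C"), "B"), ("D", "E")), ("O1", "O2"))
print(robinson_foulds(unrooted, with_outgroup, keep))
```

Here adding the outgroup moves C next to A, which changes one ingroup split and so gives an RF distance of 2; identical ingroup topologies would give 0.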

We also performed model testing in IQ-Tree and compared the fit of the chosen 422 models to the models used by Yokobori et al. (see Material and Methods below).
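The Materials and Methods state that substitution models were chosen by BIC score in IQ-Tree. As a reminder of what that criterion rewards, the sketch below computes BIC = k ln(n) - 2 ln(L) and picks the lowest-scoring model. It is illustrative only: the model names, log-likelihoods and parameter counts are invented, and taking n as the number of alignment sites is an assumption of this sketch rather than a description of IQ-Tree internals.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return (name, BIC) for the candidate with the lowest BIC.

    candidates maps a model name to (maximised log-likelihood, free parameters).
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    winner = min(scores, key=scores.get)
    return winner, scores[winner]


# Invented numbers for a hypothetical 300-site alignment; not results from this study.
candidates = {
    "model_A": (-10000.0, 10),
    "model_B": (-9900.0, 70),
    "model_C": (-9950.0, 20),
}
name, score = best_model(candidates, n_sites=300)
print(name, round(score, 1))
```

In this toy case model_B has the highest likelihood, but its extra parameters are penalised enough that model_C is selected.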

LG+C60 was selected for both G1PDH and Glp, while LG+C50 was selected for GpsA (Supplementary Figure 24). The results of these analyses indicate that the empirical profile mixture models which we have used here fit each of these alignments significantly better than the single-matrix models of Yokobori et al. (Supplementary Table 2). However, even analyses under the best-fitting available models show distortion of the ingroup topology upon addition of the outgroup (Supplementary Figures 10-12, 24), when compared to the unrooted topologies (Supplementary Figures 16, 19-20). In each case, we found the root in a different place from those recovered by Yokobori et al. In the G1PDH tree, we find Bacteria, specifically Firmicutes, to be most basal, rather than the Crenarchaeota found by Yokobori et al. In the case of GpsA, Yokobori et al. did not find compelling support for an origin in LUCA, but they did recover one archaeal lineage (the Euryarchaeota) at the base of the ingroup tree with low support (bootstrap 48). While our GpsA tree is also poorly resolved, we do not find evidence to support the basal position of the archaeal lineages, and therefore for the presence of GpsA in LUCA. For Glp, which Yokobori et al. trace back to LUCA due to the basal position of the archaeal sequences, the outgroup sequences did not form a monophyletic group, and were instead distributed throughout the tree (Supplementary Figure 11). Thus, analyses under the best-fitting available models did not support the presence of bacterial lipid biosynthesis genes in LUCA. Further, the distortion of the ingroup topologies suggests that these outgroups may not be suitable for root inference, at least given current data and methods.
Origin of eukaryotic lipid biosynthesis genes

Phylogenetics and comparative genomics suggest that eukaryotes arose from a symbiosis between an archaeal host cell and a bacterial endosymbiont that evolved The origin of bacterial-type membranes in eukaryotes is therefore an important evolutionary question. As noted above, multiple explanations have been proposed for the origin of eukaryotic membrane lipids (Woese et al. 1990 We additionally find a PlsC orthologue in Heimdallarchaeota, and PlsC and PlsY orthologues in Heimdallarchaeota and Thorarchaeota (Table 1).

468
To evaluate this hypothesis, we expanded our datasets for GpsA, Glp and PlsC with a representative set of eukaryotic homologues. The resulting trees are poorly resolved (Supplementary Figures 13-15), but do not support a specific relationship between the eukaryotic sequences and any archaeal lineages, and so do not provide any compelling support for an origin of eukaryotic lipids via the archaeal host cell. organisms that possess them. Therefore, the capability to synthesise both types of membranes may be more widespread than has been appreciated hitherto. However, gene presence is not sufficient to establish membrane composition, as these genes might be involved in other cellular processes. As in B. subtilis, experiments would be needed to test these predictions in any particular case. Crucially, the evidence that these genes undergo horizontal transfer, both early in evolution and more recently, provides a potential mechanism for the remarkable diversity of membrane lipids, and especially ether lipids, in diverse environmental settings (Schouten et al. 2001 It is possible that a transition to the bacterial type was driven by the lower energetic cost of making and repurposing fatty acid ester lipids, although we know of no published experimental data on these relative biosynthetic costs. Alternatively, the bacterial-type membrane lipids comprise a variety of fatty acyl moieties, varying in chain length, unsaturation, degree of branching and cyclisation, and these could impart a degree of flexibility and adaptability that provides a marginal benefit in dynamic mesophilic environments. If so, that advantage could translate to bacterial ether lipids that are also widespread in non-extreme settings and also characterised by a variety of alkyl forms (Pancost et al. 2001 Our results indicate that inter-domain transfer of membrane lipid biosynthesis genes is widespread, providing a potential mechanism for understanding the variety of lipids with mixed characteristics that occur in the environment. Unfortunately, very little is currently known about the stereochemical diversity of environmental lipids; we are aware of only one study (Weijers et al. 2006) that has investigated this for a class of lipids of mixed character, the brGDGTs, which exhibit bacterial-type stereochemistry. Our work suggests that stereochemical diversity, just like other putative features of the lipid divide, should be re-investigated. Overall, and taken together with evidence from natural and experimental settings for the stability of mixed BLASTp searches to find these sequences in the remaining genomes. For PlsB and PlsX, we searched for the respective terms in the gene database on the NCBI website, and upon finding well-verified occurrences, performed BLASTp searches to find the corresponding amino acid sequences in the remaining genomes. We then used BLASTp to look for bacterial orthologues of the archaeal enzymes and vice versa. We selected sequences that had an E-value of less than 1e-7 and at least 50% coverage. Accession numbers for sequences used are provided in Supplementary Table 3.

Phylogenetics

The sequences were aligned in MAFFT (Katoh et al. 2002) using the --auto option and trimmed in BMGE (Criscuolo and Gribaldo 2010) using the BLOSUM30 model, which is most suitable for anciently diverged genes. To construct gene trees from our amino acid sequences, we first selected the best-fitting substitution model for each gene according to its BIC score using the model selection tool in IQ-Tree (Nguyen et al. LG+C60 was used for G1PDH, GpsA, Glp and GlpK.
LG+C40 was used for GGGPS. A discretised Gamma distribution (Yang 1994