A comparative and functional genomics analysis of the genus Romboutsia provides insight into adaptation to an intestinal lifestyle

Cultivation-independent surveys have shown that the recently described genus Romboutsia within the family Peptostreptococcaceae is more diverse than previously acknowledged. The majority of Romboutsia-associated 16S rRNA gene sequences have an intestinal origin, but the specific roles that Romboutsia species play in the digestive tract are largely unknown. The complete genomes of the human intestinal isolate Romboutsia hominis FRIFI^T (DSM 28814) and the soil isolate Romboutsia lituseburensis A25K^T (DSM 797) were sequenced. The common traits of this recently defined genus were evaluated by comparative genome analysis of these two strains together with the previously elucidated genome of the type species Romboutsia ilealis CRIB^T. These analyses showed that the genus Romboutsia covers a broad range of metabolic capabilities with respect to carbohydrate utilization, fermentation of single amino acids, anaerobic respiration and metabolic end products. The main differences between strains were found in their abilities to utilize specific carbohydrates, to synthesize vitamins and other cofactors, and to assimilate nitrogen. In addition, differences were found with respect to bile metabolism and motility-related gene clusters.

Introduction

[…] metabolism of the Romboutsia strains, the three annotated genomes were supplied to Pathway Tools v18 (46), and a limited amount of manual curation was performed to remove obvious false positives. Next, the pathway databases were exported via the built-in Lisp interface and the exported data were merged. A reaction was considered part of the core metabolism if it was present in all three databases; otherwise it was considered part of the pan metabolism. Both parts were then reimported separately into Pathway Tools and combined for further analyses.

Genes were matched to the list of essential and non-essential sporulation-related genes compiled by Galperin et al. (47) using two approaches. First, the protein-coding genes of Bacillus subtilis subsp. subtilis 168 were annotated with InterProScan, and the B. subtilis sporulation-related proteins were matched to proteins encoded by the three Romboutsia genomes if they contained at least 50 % of the same domains. Where multiple matches were possible, the match with the highest domain similarity was chosen. The matches were manually curated, and arbitrary proteins and false hits were excluded. For every protein without a domain-based match, the best bidirectional BLAST hit (e-value cut-off of 0.0001) was used instead. Second, the genome of R. ilealis CRIB^T was manually curated for putative sporulation-related genes. Where the other Romboutsia genomes had no match for a protein that had a manually curated hit in R. ilealis CRIB^T, the best bidirectional hit was assigned. The genomes were also checked manually for further missing essential sporulation- and germination-related genes as defined by Galperin et al. (47). Functional curation was performed with the assistance of the B. subtilis wiki (http://subtiwiki.uni-goettingen.de/).
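The core/pan split described above amounts to set operations over reaction identifiers. The sketch below is illustrative only: it assumes that each exported Pathway Tools database has already been reduced to a Python set of reaction IDs (the Lisp export itself is not shown), and the strain keys and reaction IDs are hypothetical placeholders rather than data from this study.

```python
from functools import reduce


def split_core_pan(reactions_by_strain: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split reactions into core (in every database) and pan (in some, but not all, databases).

    Following the rule in the text, "pan" here means reactions NOT shared by all strains.
    """
    if not reactions_by_strain:
        raise ValueError("need at least one reaction set")
    sets = list(reactions_by_strain.values())
    core = reduce(set.intersection, sets)  # present in all databases
    union = reduce(set.union, sets)        # present in at least one database
    return core, union - core


# Hypothetical reaction sets, one per exported database.
reactions = {
    "strain_1": {"RXN-A", "RXN-B", "RXN-C"},
    "strain_2": {"RXN-A", "RXN-B", "RXN-D"},
    "strain_3": {"RXN-A", "RXN-B", "RXN-D", "RXN-E"},
}
core, pan = split_core_pan(reactions)
print(sorted(core))  # ['RXN-A', 'RXN-B']
print(sorted(pan))   # ['RXN-C', 'RXN-D', 'RXN-E']
```

Returning the two parts as separate sets mirrors the approach of reimporting the core and pan reactions separately before combining them.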

To gain more insight into the metabolic and functional capabilities of members of the genus Romboutsia within an intestinal environment, we set out to determine the genome sequence of a Romboutsia strain of human intestinal origin. To this end, the genome of R. hominis FRIFI^T, isolated from ileostoma effluent of a human adult, was sequenced (17). For comparison, we also aimed to determine the genome sequence of an isolate from another habitat, and therefore obtained the soil isolate R. lituseburensis A25K^T from the German Collection of Microorganisms and Cell Cultures (DSMZ, Braunschweig, Germany). Here we present the genome sequences of both organisms, together with an evaluation of the common traits of this recently defined genus based on a comparative genome analysis that includes the recently elucidated genome of the type species R. ilealis CRIB^T (1, 48). The genomes of R. hominis FRIFI^T and R. lituseburensis A25K^T (raw data and annotated assembly) have been deposited at the European Nucleotide Archive under project numbers PRJEB7106 and PRJEB7306, respectively.

To investigate the relationships between these isolates and their closest relatives, a neighbour-joining tree based on the 16S rRNA gene was constructed with a representative copy of the 16S rRNA gene of the type strains of the three species R. ilealis, R. lituseburensis and R. hominis (Fig. 2).

The numbers of genes associated with general COG functional categories are shown in Table 2. The biggest differences between the two genomes were found among the genes not assigned to any COG category. Among the COG categories, the most noticeable difference was observed in category J (translation, ribosomal structure and biogenesis), where the difference exceeded 1 %. Most other categories are present in comparable proportions, even though the two organisms were isolated from different habitats.
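Because the two genomes differ in size, COG categories are more meaningfully compared as proportions than as raw gene counts. The sketch below assumes that the "difference" for a category is the difference in its percentage share of each genome's COG-assigned genes, in percentage points; the category counts are invented for illustration and are not the values in Table 2.

```python
def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-category gene counts into percentages of the genome's assigned genes."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def share_differences(genome_a: dict[str, int], genome_b: dict[str, int]) -> dict[str, float]:
    """Percentage-point difference (a minus b) for every category seen in either genome."""
    shares_a, shares_b = category_shares(genome_a), category_shares(genome_b)
    categories = sorted(set(shares_a) | set(shares_b))
    return {c: shares_a.get(c, 0.0) - shares_b.get(c, 0.0) for c in categories}


# Hypothetical counts: genome_2 has more genes in every category, yet a smaller share in J.
genome_1 = {"J": 180, "E": 300, "G": 220, "K": 300}
genome_2 = {"J": 225, "E": 465, "G": 345, "K": 465}

diffs = share_differences(genome_1, genome_2)
largest = max(diffs, key=lambda c: abs(diffs[c]))
print(largest, round(diffs[largest], 2))  # J 3.0
```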

Impact of high number of ribosomal operons on sequencing efforts
Gaps in whole-genome assemblies are usually located in repetitive regions, including ribosomal operons, which can occur multiple times in a genome. For the Romboutsia genomes, too, the high number of rRNA operons complicated genome assembly. The assembly of R. hominis FRIFI^T contains one gap, situated in a long stretch of ribosomal operons. The assembly of the R. lituseburensis A25K^T chromosome contains eleven gaps, six of which result from scaffolding against a reference. Nine of the eleven gaps are situated within or next to rRNA operons or tRNA clusters. A total of 16 copies of the 16S rRNA gene were identified in the genome of R. hominis FRIFI^T, one of the highest copy numbers reported for the 16S rRNA gene in prokaryotes to date. For R. lituseburensis A25K^T, the total number of 16S rRNA genes could not be estimated accurately because some of the rRNA genes are situated next to assembly gaps, but at least 15 rRNA operons appear to be present. Pairwise comparison of the 16S rRNA sequences within the genome of strain FRIFI^T showed an average sequence identity of 99.3 %, with a lowest identity between individual copies of 98.4 %. Sequence divergence in the 16S rRNA gene is not uncommon within individual prokaryotic genomes (51, 52). However, for R. hominis FRIFI^T the divergence is located in the V1-V2 region of the 16S rRNA gene, one of the regions commonly used for sequence-based bacterial community analyses (53). In this region the average sequence identity was only 96.5 % and the lowest identity only 92.3 %. As a consequence, when sequences are clustered by identity into operational taxonomic units (OTUs), the different copies of the 16S rRNA gene of R. hominis FRIFI^T end up in different OTUs, even at a threshold of 97 % identity, resulting in an overestimation of Romboutsia phylotype diversity. In comparison, the genome of the type species of the genus, R. ilealis CRIB^T, also contains 14 copies of the 16S rRNA gene, yet these show little sequence variation (>99 % sequence identity).
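The effect of within-genome 16S rRNA copy divergence on OTU counts can be illustrated with a toy example. The sketch below uses a simplified greedy centroid clustering on pre-aligned, equal-length fragments; real OTU-picking tools additionally sort reads by abundance and compute identity from pairwise alignments, and the sequences and helper names here are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_otus(seqs: dict[str, str], threshold: float = 97.0) -> list[list[str]]:
    """Greedy centroid clustering: a sequence joins the first OTU whose centroid it matches at >= threshold."""
    otus: list[list[str]] = []  # each OTU is a list of IDs; the first ID is its centroid
    for seq_id, seq in seqs.items():
        for otu in otus:
            if percent_identity(seqs[otu[0]], seq) >= threshold:
                otu.append(seq_id)
                break
        else:  # no existing centroid was similar enough: start a new OTU
            otus.append([seq_id])
    return otus


def mutate(seq: str, positions) -> str:
    """Toy helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical 100-bp fragments standing in for three 16S rRNA gene copies of one genome.
base = "ACGT" * 25
copies = {
    "copy_1": base,
    "copy_2": mutate(base, [10]),              # 1 substitution  -> 99 % identical to copy_1
    "copy_3": mutate(base, range(0, 50, 10)),  # 5 substitutions -> 95 % identical to copy_1
}

print(greedy_otus(copies, threshold=97.0))  # [['copy_1', 'copy_2'], ['copy_3']]
print(greedy_otus(copies, threshold=94.0))  # [['copy_1', 'copy_2', 'copy_3']]
```

At the 97 % threshold a single genome contributes two OTUs, which is the kind of diversity overestimation described above.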

Comparison of the genome of Romboutsia lituseburensis to other genomes of this species

The genome sequence of R. lituseburensis A25K^T was compared to the genome sequence of the same strain (Bioproject PRJEB16174) that was sequenced by the JGI and became publicly available during the course of this project (54). This comparison revealed only minor sequence differences. Both genome sequences, including the plasmid sequences, are nearly identical (99.9 %), with most differences arising from the gaps within our assembly or from contig ends (~500 bp of each contig) within the JGI assembly. One difference was observed within the repetitive gene RLITU_1618, which was assembled shorter in the JGI genome. The surface antigen-encoding gene RLITU_0237 was also assembled shorter in the JGI assembly. Duplicated lysine, serine and arginine tRNA genes (location 1433719-1434325) were absent from the JGI assembly, probably owing to misassembly of this complex region, which cannot be resolved with Illumina short reads. Furthermore, JGI contig GI:1086420641 appears to be assembled differently: the rRNA region present in this contig was not connected to the protein-coding sequences in our assembly; instead, the two were located at different places within the genome. We were unable to locate the first 8 kb of JGI contig GI:1086420759 within our assembly. The remaining parts of this contig match an area following an assembly gap, so it cannot be excluded that this region was missed in our assembly. The only unplaced contig within our assembly was also nearly completely contained in a single contig within the JGI assembly, and therefore did not help to resolve this situation. Overall, most of the observed differences appear to have technical causes rather than reflecting genuine differences between the two genome sequences.
The genome sequences of the two newly sequenced strains were compared to that of the type strain of the type species of the genus, Romboutsia ilealis CRIB^T (48). The number of protein-coding genes per genome varied considerably between the strains, ranging from 2359 to 3658 (Table 1). The number of putative homologous genes among the three Romboutsia genomes was determined via best bidirectional hits at the amino acid level (Fig. 3). In total, 1522 genes were shared by all three strains; this core genome accounts for 42 % to 64 % of the total gene count of the individual genomes, providing a first insight into the genomic heterogeneity within the genus. The larger the genome, the more unique genes it contained, ranging from 19 % to 34 % of the total gene count.
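Best bidirectional hits and a resulting three-way core can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: hits are assumed to be (query, subject, e-value, bit score) tuples such as those parsed from BLAST+ tabular output, the 0.0001 e-value cut-off is borrowed from the Methods, and a gene is counted as core only if its three pairwise best bidirectional hits are mutually consistent. All gene and genome identifiers are hypothetical.

```python
def best_hits(hits, evalue_cutoff: float = 1e-4) -> dict[str, str]:
    """Map each query to its highest-scoring subject among hits that pass the e-value cut-off.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best: dict[str, tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > evalue_cutoff:
            continue  # discard weak hits before choosing the best one
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional(ab: dict[str, str], ba: dict[str, str]) -> dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit AND a is b's best hit."""
    return {a: b for a, b in ab.items() if ba.get(b) == a}


def core_triples(hits) -> list[tuple[str, str, str]]:
    """Genes of genome A whose best bidirectional hits in B and C are also each other's BBH."""
    best = {pair: best_hits(h) for pair, h in hits.items()}
    ab = bidirectional(best["A", "B"], best["B", "A"])
    ac = bidirectional(best["A", "C"], best["C", "A"])
    bc = bidirectional(best["B", "C"], best["C", "B"])
    return sorted(
        (a, ab[a], ac[a]) for a in ab if a in ac and bc.get(ab[a]) == ac[a]
    )


# Hypothetical hits: (query, subject, e-value, bit score), keyed by (query genome, subject genome).
hits = {
    ("A", "B"): [("a1", "b1", 1e-50, 200.0), ("a2", "b2", 1e-40, 180.0),
                 ("a3", "b2", 1e-10, 60.0), ("a3", "b3", 1e-3, 75.0)],  # a3 -> b3 fails the cut-off
    ("B", "A"): [("b1", "a1", 1e-50, 200.0), ("b2", "a2", 1e-40, 180.0), ("b3", "a3", 1e-5, 40.0)],
    ("A", "C"): [("a1", "c1", 1e-45, 190.0), ("a2", "c2", 1e-20, 90.0)],
    ("C", "A"): [("c1", "a1", 1e-45, 190.0), ("c2", "a3", 1e-25, 100.0), ("c2", "a2", 1e-20, 90.0)],
    ("B", "C"): [("b1", "c1", 1e-48, 195.0), ("b2", "c2", 1e-30, 120.0)],
    ("C", "B"): [("c1", "b1", 1e-48, 195.0), ("c2", "b2", 1e-30, 120.0)],
}

core = core_triples(hits)
print(core)  # [('a1', 'b1', 'c1')]
for genome, n_genes in {"A": 3, "B": 3, "C": 2}.items():
    print(genome, f"{100 * len(core) / n_genes:.1f} % of genes in the core")
```

As in the text, the same core count translates into a larger percentage for a smaller genome.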

Fig. 3. […] genomes. The circles are colour-coded by the Romboutsia strain they represent: blue, R. ilealis CRIB^T; green, R. lituseburensis A25K^T; red, R. hominis FRIFI^T. The total number of genes and the number of unique genes are also indicated for each genome. The area of each circle is proportional to the number of genes.

The comparative genome analysis showed a general conservation of genomic structure within the genus Romboutsia around the replication start site, whereas synteny appears to be lost towards the replication end point. For most pairwise comparisons, synteny was lost about a quarter of the genome away from the start site in both the up- and downstream directions, so that roughly half of each genome is syntenic. Breaks in synteny appear to be related to specific recombination events. For example, compared with the other genomes, synteny is absent in R. ilealis CRIB^T at the insertion site of a prophage, whereas the regions both up- and downstream are syntenic. At another location in the genome of R. ilealis CRIB^T, synteny is lost owing to phage-related genes found around the tmRNA gene, which has been reported to be a common insertion site for phages (55). The position of the tmRNA gene itself is roughly the same in all three genomes, but no synteny could be observed in its vicinity. Strain- or species-specific gene clusters, such as the CRISPR-Cas system or the fucose degradation pathway present in R. ilealis CRIB^T, appear to be located more towards the less conserved replication end point. One conserved element in the less conserved region is part of the butyrate fermentation pathway, which is absent in R. ilealis CRIB^T and inverted between R. hominis FRIFI^T and R. lituseburensis A25K^T. Some significant deletion events appear to have occurred, since they can be observed in the conserved areas of the genomes. For example, the pili-encoding gene cluster, which is found in the genome of R.
lituseburensis A25K^T close to the replication start site, is absent from the genomes of R. ilealis CRIB^T and R. hominis FRIFI^T, except for a gene encoding a twitching motility protein. Another example is the vitamin B12 biosynthesis cluster, which in all strains is also located close to the replication start site. While this cluster is complete in R. lituseburensis A25K^T and R. hominis FRIFI^T, only remnants of it are present in R. ilealis CRIB^T, owing to a deletion of nine genes that prevents the biosynthesis of cob(II)yrinate a,c-diamide. This cluster is also situated next to an rRNA operon, and only in R. ilealis CRIB^T does this operon carry an inserted integrase.

Core and pan metabolism of the genus Romboutsia

An overview of the core metabolism of the Romboutsia strains and of strain-specific metabolic features is provided in Fig. 4. All three Romboutsia strains can ferment carbohydrates via glycolysis and possess the non-oxidative pentose phosphate pathway. Moreover, the genomes predict that all strains can synthesize (and degrade) all nucleotides, cell wall components, fatty acids and siroheme. In addition, all three Romboutsia strains are predicted to produce only a limited, non-identical set of amino acids. In turn, however, they are able to ferment numerous amino acids. Furthermore, various pathways for nitrogen assimilation were predicted, as well as a pathway for production of the quorum-sensing compound autoinducer-2 (AI-2). Some of the metabolic highlights are discussed in the following paragraphs.

[…] lituseburensis A25K^T during growth on glucose, including formate, acetate and a small amount of lactate (17). The pathways leading to formate, acetate and lactate production, which have previously been described for R.
ilealis CRIB^T (48), were also found in the two other Romboutsia strains, suggesting that all strains are indeed able to produce formate, acetate and lactate. Butyrate (and iso-valerate) production has been observed for R. lituseburensis A25K^T during in vitro growth on undefined medium components such as beef extract, peptone and casitone (but not on yeast extract). The addition of a carbohydrate (e.g. glucose) redirected the fermentation pathways towards other end products such as formate (data not shown). Two pathways leading to butyrate synthesis, the acetyl-CoA and the lysine pathways, could be predicted from the genome of R. lituseburensis A25K^T. The pathways are co-located in the genome, suggesting that the acetoacetyl-CoA formed during lysine fermentation can be used directly as substrate in the acetyl-CoA pathway for additional energy conservation (56, 57). A lysine-specific permease was also predicted in the genome, suggesting that exogenous lysine can serve as an energy source for this strain. Since an acetyl-CoA acetyltransferase was also found in the gene cluster, a fully functional carbohydrate-driven acetyl-CoA pathway is expected. For the final step in butyrate production, a phosphate butyryltransferase/butyrate kinase (buk) gene cluster was identified in the genome. Similar gene clusters (although with some gene inversions) were found in the genome of R. hominis FRIFI^T as well, but butyrate production has not (yet) been observed for this strain (17).

In the genomes of both R. hominis FRIFI^T and R. lituseburensis A25K^T, a reductive pathway for glycerol metabolism was predicted, comprising a glycerol dehydratase and a 1,3-propanediol dehydrogenase (58). This suggests that these strains are able to ferment glycerol and produce 1,3-propanediol as one of the fermentation end products. Production of 1,3-propanediol has indeed been reported for R.
lituseburensis (22). Furthermore, the oxidative pathway for glycerol metabolism, including glycerol dehydrogenase and dihydroxyacetone kinase, could also be identified in the genome of R. lituseburensis A25K^T, suggesting that this strain should be able to use glycerol as sole carbon and energy source. Growth on glycerol has indeed been observed for both R. hominis FRIFI^T and R. lituseburensis A25K^T (17), although the responsible genes could not be predicted in R. hominis FRIFI^T.

The genomes of all three Romboutsia strains studied here contain genes encoding enzymes of the Wood-Ljungdahl pathway. A formate dehydrogenase was predicted for all strains except R. ilealis CRIB^T. The presence of formate dehydrogenase together with a complete Wood-Ljungdahl pathway classifies these two strains as potential acetogens, i.e. microbes that can grow autotrophically using CO2 and H2 as carbon and energy sources. This would provide them with metabolic flexibility in addition to heterotrophic growth on organic compounds. The role of acetogens in the intestinal tract is not well studied; they have been proposed to play an important role in hydrogen disposal, in addition to methanogens and sulfate reducers (59, 60).

The genomes of all three Romboutsia strains contain genes predicted to encode a sulfite reductase of the AsrABC type. Inducible sulfite reductases are directly linked to the regeneration of NAD+, which plays a role in energy conservation and growth, as well as to the detoxification of sulfite (61). R. hominis FRIFI^T, however, appears to lack the formate/nitrite transporter family protein found in the vicinity of the predicted sulfite reductase in the other strains. In Clostridioides difficile (previously known as Clostridium difficile), a similar protein was characterized as a hydrosulfide ion channel that exports the toxic metabolites from the cell (62).
Genes coding for a complete membrane-bound electron transport system, similar to the Rnf system identified in microbes such as Clostridium tetani, Clostridium ljungdahlii and C. difficile, were identified in the genomes of both R. hominis FRIFI^T and R. lituseburensis A25K^T. In these species, the system is suggested to generate a proton gradient for energy conservation in microbes that lack cytochromes. In C. tetani, the system is proposed to play a role in the electron flow from reduced ferredoxin via NADH to the NADH-consuming dehydrogenase of the butyrate synthesis pathway (63). In addition, the Rnf system is proposed to be used by C. ljungdahlii during autotrophic growth (64). In the genome of R. ilealis CRIB^T only remnants of an Rnf electron transport system could be found, which might be a result of genome reduction, since no complete butyrate synthesis pathway or acetogenic pathway was found either.
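For reference, the formate dehydrogenase step that feeds CO2 into the methyl branch of the Wood-Ljungdahl pathway, and the overall reaction that makes autotrophic growth of acetogens possible, can be written as follows. These are standard textbook stoichiometries, not measurements for the Romboutsia strains, and [H] denotes a generic pair of reducing equivalents.

```latex
\begin{align*}
\mathrm{CO_2} + 2\,[\mathrm{H}] &\rightarrow \mathrm{HCOOH} && \text{(formate dehydrogenase, methyl branch)}\\
4\,\mathrm{H_2} + 2\,\mathrm{CO_2} &\rightarrow \mathrm{CH_3COOH} + 2\,\mathrm{H_2O} && \text{(net autotrophic acetogenesis)}
\end{align*}
```

The second equation balances with 2 C, 8 H and 4 O on each side, and shows that each acetate formed consumes four H2, which is why acetogens are considered potential hydrogen sinks in the gut.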

Fermentation of individual amino acids
Species belonging to the class Clostridia are known for their capacity to ferment amino acids. Of the three Romboutsia strains, R. lituseburensis A25K^T appears to be the most versatile in this respect. All three Romboutsia strains are predicted to be able to ferment L-histidine via glutamate using a histidine ammonia-lyase. In addition, fermentation of L-threonine by an L-threonine dehydratase, resulting in propionate production, was predicted; this has been described for R. lituseburensis (20). Fermentation of L-serine to pyruvate using an L-serine dehydratase was predicted for all three strains as well. As mentioned above, R. hominis FRIFI^T and R. lituseburensis A25K^T are predicted to be able to ferment L-lysine. In addition, R. lituseburensis A25K^T is predicted to ferment glycine using the glycine reductase pathway found in other related species, including C. difficile (65, 66). A corresponding complex has also been identified in R. hominis FRIFI^T, but it is likely to be non-functional due to the loss of two of its three subunits. Furthermore, the ability to ferment L-arginine (using an arginine deiminase) and L-glutamate (using a Na+-dependent glutaconyl-CoA decarboxylase) was predicted for R. lituseburensis A25K^T as well. A glutamate decarboxylase was predicted only for R. hominis FRIFI^T, suggesting that this strain alone can decarboxylate glutamate to 4-aminobutyrate (GABA).
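Predictions such as the likely non-functional glycine reductase complex in R. hominis FRIFI^T rest on a simple presence/absence rule: a pathway or complex is considered complete only if all required components are found. A minimal sketch of that rule follows; the component labels and gene sets are hypothetical placeholders, and real annotation-based predictions also have to weigh gene fragments, pseudogenes and non-homologous replacements, which this check ignores.

```python
def pathway_status(required: set[str], present: set[str]) -> str:
    """Return 'complete', 'partial' or 'absent' for a pathway given the genes found in a genome."""
    found = required & present
    if found == required:
        return "complete"
    return "partial" if found else "absent"


# Hypothetical three-component complex, analogous to glycine reductase in the text.
COMPLEX = {"component_A", "component_B", "component_C"}

genomes = {
    "genome_1": {"component_A", "component_B", "component_C", "other_gene"},
    "genome_2": {"component_A", "other_gene"},  # two of three components lost
    "genome_3": {"other_gene"},
}
for name, genes in genomes.items():
    print(name, pathway_status(COMPLEX, genes))
# genome_1 complete
# genome_2 partial
# genome_3 absent
```

Treating "partial" as likely non-functional is a judgement call that still needs experimental confirmation.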

Amino acid and vitamin requirements
Pathways for (de novo) synthesis of amino acids were identified in the three Romboutsia strains (Table 3). All three strains show similar dependencies on exogenous amino acid sources. Based on genome predictions, R. lituseburensis A25K^T is able to synthesize lysine from aspartate, whereas the last enzyme of this pathway is missing from the genomes of R. hominis FRIFI^T and R. ilealis CRIB^T. In addition, R. hominis FRIFI^T and R. lituseburensis A25K^T are predicted to synthesize alanine from aspartate and glycine from threonine. Common to all three organisms is that the prephenate dehydratase for the biosynthesis of phenylalanine and tyrosine is missing, although all other enzymes for the biosynthesis of chorismate and for its further conversion to both amino acids are present.

The urease gene cluster previously identified in R. ilealis CRIB^T (48) could not be identified in the two other Romboutsia strains. However, a nitrogenase-encoding gene cluster was identified in the genomes of R. hominis FRIFI^T and R. lituseburensis A25K^T, suggesting that these two strains are able to fix N2.

All three strains encode one or several oligopeptide transporters of the OPT family (67). In addition, two oligopeptide transport systems (Opp and App) (68, 69) were predicted in R. hominis FRIFI^T and R. lituseburensis A25K^T (strain FRIFI^T lacks OppA), whereas they were absent in R. ilealis CRIB^T. Based on the predicted amino acid dependencies, these Romboutsia strains appear to be adapted to an environment rich in amino acids and peptides.

The metabolic capabilities of the three Romboutsia species are comparable with respect to the production of certain vitamins and other cofactors (Fig. 4).
None of them is predicted to be able to synthesize vitamin B6, lipoic acid or pantothenate, but all are likely able to produce siroheme from glutamate and CoA from pantothenate. As previously described for R. ilealis CRIB^T (48), the pathway for de novo folate biosynthesis via the pABA branch is present; however, a gene encoding dihydrofolate reductase could not be identified in any of the three Romboutsia strains. Since this enzyme is essential in both the de novo and the salvage pathways of tetrahydrofolate, it is nevertheless highly likely that it is encoded in the genomes. The biosynthetic capabilities of R. lituseburensis A25K^T and R. hominis FRIFI^T are broader than those of R. ilealis CRIB^T, as both are predicted to produce biotin, thiamin and vitamin B12. The gene clusters for biotin and thiamin biosynthesis are located in the more variable regions of the genomes, as discussed above, and the vitamin B12 biosynthesis pathway is incomplete in R. ilealis CRIB^T due to a deletion, as mentioned earlier. Only R. lituseburensis A25K^T is predicted to have the capacity to produce riboflavin de novo. Furthermore, R. lituseburensis A25K^T, the only non-host-derived organism in this comparison, is also the only strain predicted to synthesize NAD de novo.

[…] In addition to the possible bile salt hydrolase (BSH), a gene encoding a bile acid 7α-dehydratase could be identified in R. hominis FRIFI^T. This enzyme is part of the multi-step 7α/β-dehydroxylation pathway that transforms primary bile acids into secondary bile acids. So far, this pathway has been found exclusively in a small number of anaerobic intestinal bacteria, all belonging to the Firmicutes (72, 74). The presence of this pathway enables microbes to use primary bile acids as electron acceptors, allowing for increased ATP formation and growth.
High levels of secondary bile acids are associated with host diseases such as cholesterol gallstone disease and cancers of the GI tract (75, 76). However, evidence that bacteria capable of 7α-dehydroxylation are directly involved in the pathogenesis of these diseases is still limited. The pathway has been studied extensively in the human isolate Clostridium scindens VPI 12708 (formerly known as Eubacterium sp. strain VPI 12708 (77)). In addition, 7α-dehydroxylation activity has been reported for Clostridium hiranonis (78), Paeniclostridium sordellii (previously known as Clostridium sordellii) (79) and other close relatives of the genus Romboutsia (74). Extensive characterization of the 7α/β-dehydroxylation pathway in C. scindens VPI 12708 has demonstrated that the genes involved are encoded by a large bile acid-inducible (bai) operon (72). For R. hominis FRIFI^T, several other genes showing homology to genes of the bai operon were identified in the vicinity of the bile acid 7α-dehydratase gene, but some other (essential) genes appear to be missing. Gene presence/absence therefore does not allow us to predict whether R. hominis FRIFI^T has 7α/β-dehydroxylation activity; this has to be confirmed experimentally.

The class Clostridia contains some well-known pathogens, including C. difficile, C. botulinum and C. perfringens, for which several toxins have been characterized in depth (80). No homologues of the genes coding for the toxins of C. difficile (toxin A, toxin B, binary toxin) or C. botulinum could be found in the genomes of the three Romboutsia strains. The genome of R. ilealis CRIB^T encodes a predicted protein that was annotated as a putative septicolysin (CRIB_2392), since it shares 56 % identity with a protein characterized as an oxygen-labile hemolysin in Clostridium septicum (81). However, the exact role of septicolysin in potential pathogenesis is not known, and no homologues were found in other related species. A homologue of the alpha toxin of Clostridium perfringens (80, 82), a phospholipase C involved in the aetiology of gas gangrene caused by C. perfringens (83), was found by BLAST search in the genomes of R. hominis FRIFI^T and R. lituseburensis A25K^T (49.4-54.3 % identity at the amino acid level). In addition, R. lituseburensis A25K^T is predicted to contain a protein homologous to perfringolysin O (theta toxin) of C. perfringens, a thiol-activated cytolysin that forms large homo-oligomeric pore complexes in cholesterol-containing membranes and is also involved in the aetiology of gas gangrene. By BLAST search, similar proteins could also be found in the genomes of P. sordellii and Paraclostridium bifermentans (previously known as Clostridium bifermentans), which are close relatives of the genus Romboutsia. Many homologous enzymes produced by other bacteria do not share the toxigenic properties of the C. perfringens proteins (83). For example, the phospholipase C proteins produced by P. bifermentans and P. sordellii were found to have significantly less haemolytic activity than the homologous protein of C.
survive environmental challenges such as nutrient limitation. These endospores are resistant to 553 extreme exposures (e.g. high temperatures, freezing, radiation and agents such as antibiotics and 554 most detergents) that would kill vegetative cells. The ability to form endospores was also studied for 555 the three Romboutsia strains. R. lituseburensis A25K T readily forms mature spores, especially during 556 growth in Duncan-Strong medium and Cooked meat medium, that both contain large quantities of 557 proteose peptone, and spore formation was observed in almost every cell (data not shown). 558 Previously, the endospore forming capabilities of R. ilealis CRIB T and R. hominis FRIFI T have been 559 studied (90). Using different media and incubation conditions it was observed that the process of 560 sporulation appears to be initiated, however, no free mature spores could be observed. 561 The whole process of sporulation and subsequent spore germination involves the expression 562 of hundreds of genes in a highly regulated manner. At a molecular level the process is best understood 563 in the model organism Bacillus subtilis (98). For species belonging to the class Clostridia, the process 564 of sporulation is mainly studied in microbes in which sporulation has been shown to play a big role in 565 other processes such as virulence (C. perfringens, C. difficile, C. botulinum and C. tetani) or solvent 566 production (Clostridium acetobutylicum). Studying these microbes has made it clear that there are 567 significant differences in the sporulation and germination process in species belonging to the class 568 Clostridia compared to members of the Bacilli (47, 99, 100). The B. subtilis proteins involved in the 569 early stages of sporulation, i.e. onset (stage I), commitment and asymmetric cell division (stage II), and 570 engulfment (stage III), are largely conserved in Clostridia species. However, many of the proteins that 571 play a role in later stages, i.e. 
cortex formation (stage IV), spore coat maturation (stage V), mother cell 572 lysis and spore release (stage VII), appear to be less conserved. For example, limited spore outer layer 573 conservation was observed in C. difficile compared to B. subtilis (101). Comparative genomic based-574 studies have tried to define the minimal set of genes essential for sporulation in clostridial species, 575 but this has appeared to be challenging (47,100,102). In all spore-formers, initiation of sporulation is 576 controlled by the transcription factor Spo0A, a highly conserved master regulator of sporulation. 577 Phosphorylation of Spo0A leads to the activation of a tightly regulated cascade involving several sigma 578 factors that regulate the further expression of a multitude of genes involved in sporulation. There are, 579 however, significant differences in the regulation of the sporulation pathway between different 580 clostridial species (103) of which we do not completely understand the impact on sporulation itself, 581 highlighting that there is still a big gap in our knowledge on the complex process of sporulation. 582 The genomes of the three Romboutsia genomes were mined for homologues of sporulation 583 specific genes according to the publication of Galperin et al. (47). All three Romboutsia strains have 584 similar sets of sporulation-related genes, with R. ilealis CRIB T having the least number of genes (147 585 genes) and R. lituseburensis A25K T having the most (183 genes) ( Table S1). The only protein that is 586 deemed essential for sporulation, but which was only found in the genome of R. lituseburensis A25K T , 587 is the Stage V sporulation protein S which has been implicated to increase sporulation (104). For R. 588 lituseburensis A25K T , the sporulation regulator Spo0E was predicted to be absent, due to a point 589 mutation in the start codon of the corresponding gene, changing it to an alternative start codon. 
This 590 regulator is suggested to be involved in the prevention of sporulation under certain circumstances 591 (105); impact of the point mutation on the presence/absence of the protein and on regulation of 592 sporulation in R. lituseburensis A25K T will have to be determined. Interestingly, the stage V sporulation 593 proteins AA and AB, encoded by spoVAA and spoVAB, that are essential for sporulation in Bacilli since 594 mutants lead to the production of immature spores (106), are absent in sporulating Clostridium 595 species, but are present in all three Romboutsia strains. Furthermore, R. lituseburensis A25K T is the 596 only strain that contains the sps operon that has been shown to be involved in spore surface adhesion 597 (107). Absence of this operon in B. subtilis resulted in defective germination, and more hydrophobic 598 and adhesive spores, however, given that these proteins are also absent in nearly all clostridial species, 599 their role in sporulation and germination in the Romboutsia strains still has to be determined. As also 600 noted by Galperin et al. (47), there are other species that have been demonstrated to be spore-601 forming but which also lack some of the genes that are deemed to be essential, e.g. spoIIB, spoIIM, 602 and other proteins from the second sporulation stage in Lysinibacillus sphaericus C3-41v (47). In 603 comparison, it is interesting to note that the genome of C. hiranonis, a close relative of the genus 604 Romboutsia (and C. difficile), appears to contain only 21 of the essential sporulation genes, missing 605 for example most of the proteins related to the second and third stage of sporulation, while C. 606 hiranonis is known to form spores ((108), and own observations). Altogether, based on gene 607 presence/absence it is not possible to predict whether these Romboutsia strains are indeed able to 608 successfully complete the process of sporulation and release endospores. 
Moreover, an asporogenous phenotype could result from the absence or mutation of a single gene.

The initiation of sporulation remains an active topic of research. Accessory gene regulator (agr)-dependent quorum sensing, and thus most likely cell density, has been proposed to play an important role in efficient sporulation (109). For C. difficile, however, quorum sensing has been shown not to play a role in the initiation of sporulation, and a more direct link between nutrient availability and sporulation was recently suggested (110). The uptake of peptides by the Opp and App oligopeptide transport systems appears to prevent initiation of sporulation in nutrient-rich environments (69). Both transport systems are absent from R. ilealis CRIBT but present in the other two Romboutsia strains.

During sporulation, a number of species produce inclusion bodies and granules that are visible by phase-contrast and electron microscopy. This is also the case for R. lituseburensis A25KT, in which electron-translucent bodies are visible in TEM images (Fig. 1), similar to the carbohydrate or polyhydroxybutyrate inclusions observed in, for example, Clostridium pasteurianum (111), C. acetobutylicum (112) and C. botulinum (113). The development of these inclusion bodies appears to coincide with the initiation of sporulation. Based on this observation, we speculate that by accumulating an intracellular carbon and energy source, these microbes ensure that they can complete sporulation with only limited dependence on external carbon and energy sources.
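The presence/absence comparisons made in this section (genes shared by all three strains versus genes restricted to one strain, or missing from one) reduce to simple set operations, analogous to the core/pan distinction used for the metabolic reconstruction in the Methods. The minimal Python sketch below assumes a hypothetical dictionary of per-strain gene sets and encodes only the genes explicitly named in the text; the labels `opp`, `app` and `sps` are hypothetical shorthand for whole transport systems or operons, and a real comparison would use the full gene list in Table S1.

```python
from functools import reduce

# Illustrative presence/absence calls, restricted to the genes named in the
# text. The labels "opp", "app" and "sps" are hypothetical shorthand for whole
# transport systems/operons, not single locus tags.
PRESENCE = {
    "R. ilealis CRIBT": {"spoVAA", "spoVAB"},
    "R. hominis FRIFIT": {"spoVAA", "spoVAB", "opp", "app"},
    "R. lituseburensis A25KT": {"spoVAA", "spoVAB", "spoVS", "sps", "opp", "app"},
}


def core_genes(presence):
    """Genes detected in every strain (intersection of all sets)."""
    return reduce(set.intersection, presence.values())


def unique_genes(presence):
    """For each strain, genes detected in that strain and in no other."""
    unique = {}
    for strain, genes in presence.items():
        # Union of the gene sets of all other strains.
        others = set().union(*(g for s, g in presence.items() if s != strain))
        unique[strain] = genes - others
    return unique


def missing_genes(presence):
    """For each strain, genes detected in some other strain but not in this one."""
    pan = set().union(*presence.values())
    return {strain: pan - genes for strain, genes in presence.items()}


if __name__ == "__main__":
    print("core:", sorted(core_genes(PRESENCE)))
    for strain, genes in unique_genes(PRESENCE).items():
        print(f"unique to {strain}:", sorted(genes))
    for strain, genes in missing_genes(PRESENCE).items():
        print(f"missing from {strain}:", sorted(genes))
```

As stressed above, such a tabulation only records which homologues were detected; it cannot show whether a strain actually completes sporulation.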

Based on the comparative genome analysis presented here, we conclude that the investigated genomes of the genus Romboutsia encode a versatile array of metabolic capabilities with respect to carbohydrate utilization, fermentation of single amino acids, anaerobic respiration and metabolic end products. A relative genome reduction is observed in the isolates of intestinal origin. In addition, the presence of bile-converting enzymes and of pathways related to host-derived carbohydrates points towards adaptation to life in the (small) intestine of mammalian hosts. Unique properties were found for each Romboutsia strain. However, since only one genome is currently available for each species, it is impossible to determine unequivocally which properties apply to the species as a whole and which are strain-specific. Isolation and genome sequencing of additional strains from diverse environments is needed to provide a more in-depth view of the metabolic capabilities at both the species and the genus level, and to reveal specific properties that relate to adaptation to an intestinal lifestyle.

Availability of data and material

All data have been deposited in the European Nucleotide Archive under project numbers PRJEB7106 and PRJEB7306.

the TEM picture of R. lituseburensis A25KT. We would also like to thank Jasper Koehorst for help with the annotation, Bart Nijsse for assisting with the different software packages, and Jesse van Dam for help with the Pathway Tools lisp interface. In addition, we would like to thank William Trimble from the Argonne National Laboratory for helpful discussions concerning genome assembly with MiSeq data sets.