Molecular Characterization of Fischerella uthpalarensis, the first subsection V cyanobiont from a tropical Azolla species containing dual nitrogenases

Several theories have been proposed about Azolla cyanobionts, which are known for vigorous nitrogen fixation, for vertical transmission, and for helping to transform a greenhouse planet into an icehouse one ~49 Mya. One such theory posits the existence of two cyanobionts, named Major and Minor. We show here the identity of a possible minor cyanobiont of Azolla, named Fischerella uthpalarensis. A likely cyanobiont with straight or curved, truly branched filaments was isolated. Seven gene fragments, namely 16S rDNA (forward and reverse), RNA polymerase, ITS1 region (forward and reverse), nifD and GroEL, were used to identify the isolated cyanobiont. The best match from the BLASTn search tool was found for the RNA polymerase beta subunit (rpoC) gene fragment, which showed 99.54% identity with 55% coverage to Fischerella muscicola. Phylogenetic inferences from the rpoC locus and the GroEL protein sequence suggest a likely Fischerella genus identity. Furthermore, VnfDG and VnfN fragments were also amplified by PCR and sequenced to demonstrate that this cyanobiont has alternate nitrogenase genes in addition to the molybdenum counterpart, providing an advantage in lifestyle. We encountered a high level of synonymous substitutions at the DNA level that was not reflected in the protein sequences of the VnfDG and VnfN gene products, which may be due to codon heterogeneity. We also propose that codon usage in F. uthpalarensis is atypical owing to the likely acquisition of the V-nitrogenase operon in a presumed recent horizontal gene transfer (HGT) event. The cyanobiont from this study shows a higher preference for AT over GC at the VnfDG composite locus, again hinting at a symbiotic lifestyle.


Introduction
Azolla is a genus of free-floating aquatic ferns (pteridophytes) belonging to the family Azollaceae. Azolla pinnata belongs to section Rhizosperma. Azolla fronds harbor a mutualistic symbiosis with a filamentous, heterocyst-forming, nitrogen-fixing cyanobacterium most commonly named Nostoc azollae. The cyanobacterium is found within a narrow region of the extracellular ovoid cavity, which is ensheathed by a mucilaginous network [1], [2] (Qiu and Yu, 2003; Pereira and Vasconcelos, 2014). The relationship between Azolla and the cyanobacterium is believed to have originated millions of years ago as a permanent plant-cyanobacterium symbiosis. Azolla has also been studied for its use in agriculture as a nitrogen biofertilizer [3], [4] (Wagner, 1997; Pabby et al., 2003). The genome of Azolla filiculoides was sequenced in 2018; this fern species may be useful as a substitute for urea and as a solution to climate change [5] (Li et al., 2018). The theory of an exclusive cyanobiont living in Azolla has now been overtaken by the "two strain" hypothesis, whereby major and minor contenders, both of the genus Nostoc or Anabaena, result in a subsection IV cyanobacterium [2], [6], [7] (Zimmerman et al., 1989; Gebhardt and Nierzwicki-Bauer, 1991; Pereira and Vasconcelos, 2014). It is estimated that the cyanobionts make up only 1% of the biomass of Azolla fronds [8] (Rai et al., 2000). Lectins produced by plants, which enable recognition by the cyanobiont at lectin interfaces, are heterogeneous in Azolla species [9] (McCowen et al., 1986), suggesting to us that there are distinctive, varied and specific cyanobionts for different Azolla species. This points to distinctive entry points for diverse cyanobionts and a range of host specificities [10] (Papaefthimiou et al., 2008), of which all contenders are perhaps yet to be discovered.
A few studies have led to successful isolation of the secondary cyanobiont [6], [7], [11]-[13] (Ladha and Watanabe, 1982; Zimmerman et al., 1989; Gebhardt and Nierzwicki-Bauer, 1991; Rajaniemi et al., 2005; Sood et al., 2008). Separation of the cyanobiont from Azolla is important for its identification using molecular biology tools. The identity of the cyanobiont of Azolla spp. still remains uncertain [2] (Pereira and Vasconcelos, 2014); aside from the major-minor hypothesis of two Nostoc/Anabaena strains, there may be another genus living symbiotically inside Azolla [2], [14], [15] (Caudales et al., 1992; Ekman et al., 2008; Pereira and Vasconcelos, 2014). It is widely accepted that the major cyanobiont is Nostoc azollae; however, the identity of a secondary cyanobiont has been unclear until this study.
Cyanobacteria and cyanobionts have been reported to possess three types of nitrogenases: the most common is the molybdenum-dependent nitrogenase, the second most common is the vanadium-dependent nitrogenase, and the rarest is the iron-based nitrogenase [16] (Hu et al., 2012). So far, vanadium nitrogenase-dependent microorganisms have been discovered in wood chips, soils, and mangrove sediments, which shows that they have the potential to arise from diverse environments [17] (McRose et al., 2017). Outside of the genus Azotobacter, there are also a handful of other soil bacteria that fix nitrogen with the help of vanadium, namely the genera Methylocystis, Phaeospirillum, Rhodomicrobium, Rhodopseudomonas, Methanosarcina and Tolumonas. It has been calculated that alternative (V) nitrogenases account for 14-21% of the diversity of nitrogenases and contribute 24% of nitrogen fixation, which extends their importance to rice-growing soils [17] (McRose et al., 2017). In cyanobacterial symbioses, namely the genera Peltigera (lichen), Anthoceros and Blasia, the latter two being bryophytes, biology appears to favor the presence of a vanadium nitrogenase in addition to the molybdenum counterpart [18] (Nelson, 2019).
The higher levels of vanadium in the soil appear to have little effect on the preferential expression of the molybdenum counterpart. In the genus Azotobacter, vanadium is transported inside using siderophores such as V-azotochelin only when the cells are conditioned to low molybdenum concentrations, which suggests that the internalization of vanadium is coupled to the preferential activity of the V-dependent nitrogenase. Such internalization is dependent on "vanadophores", a subset of siderophores that are able to transport vanadium internally [19] (Rehder, 2008). It has been demonstrated that vanadium-dependent nitrogenase systems use ABC transporters for the internalization of soluble and bioavailable vanadate in microorganisms such as Anabaena variabilis [20] (Pratte et al., 2006).
In this study, we report evidence for the first subsection V cyanobiont in an endosymbiotic relationship with a plant (likely to be Azolla microphylla). The sequence of experiments performed in this study includes: the isolation of the cyanobiont, detailing the technical replicates and negative controls used to authenticate the isolation procedure; morphological investigation of the cyanobiont using optical microscopy; and the sequencing of five separate loci, followed by BLASTn analyses of fragments against the NCBI non-redundant nucleotide database. We have also reported in a recent paper that this cyanobacterium is likely to possess an "alternate" vanadium nitrogenase in addition to the molybdenum counterpart [21] (Pushpakumara and Gunawardana, 2018), for which we present genomic evidence here.

Culture of the Azolla Cyanobiont in a Free-living State
The isolation and cultivation of the cyanobiont of Azolla plants in a free-living state was performed according to a previous protocol [21] (Pushpakumara and Gunawardana, 2018). The protocol is explained in brief in the following paragraph. The isolation procedure was carried out inside a standard laminar flow cabinet. The working surface and walls of the laminar flow cabinet were cleaned using 70% ethanol before starting the isolation procedure.
Fifteen grams of fresh fern tissue was weighed and washed in running tap water for 10 minutes. The ferns were then surface-sterilized in 10% Clorox for two minutes, followed by a one-minute rinse in sterile 0.01 M HCl. The ferns were then rinsed twice for one minute in sterile distilled water. Following the disinfection, ferns were homogenized in 50 mL BG110/8 and filtered, and the filtrate was centrifuged at 500 × g for 5 minutes. The resulting cell pellet was washed in 2 mL BG110/8 and 500 µL was transferred to 10 mL BG110/8 in a McCartney bottle. The medium was incubated under natural conditions, with an approximately 12-hour day and 12-hour night cycle, for two weeks with intermittent shaking by hand. After two weeks of incubation, 5 mL of culture was transferred to 10 mL BG110/8 and incubated under the same conditions for another two weeks. After two weeks, 10 mL of culture medium was introduced to an increased volume (20 mL) of BG110/8 in a conical flask and incubated on a gyratory shaker at 150 rpm close to a window to receive sunlight during daytime. After 1 week of incubation, the culture was exposed to full-strength N-free BG110: 3 mL of full-strength N-free BG110 was introduced at 5-day intervals until there was significant biomass growth.

PCR amplification and sequencing.
PCR primers and conditions for the PCR reactions of this study are provided in Tables S1 and S2, respectively, in the supplementary materials. Amplified PCR products were analyzed using 1.5% agarose gel electrophoresis. The size of the band was confirmed using a 1 kb ladder (Promega). Sequencing was performed bidirectionally using the respective forward and reverse primers at Macrogen (Korea).

Sequence Alignments and Phylogenetic Analyses
Sequence alignments were performed using ClustalW Omega server for both DNA and protein sequences [ Pereira and Vasconcelos, 2014). Scientists believe that it has either lost autonomy or will during epochs to come. Only a few gene families have been retained by the eroded genome, while the functionality of genes responsible for glycolysis and nutrient uptake has been lost [27] (Ran et al., 2010). The genomic erosion of the major cyanobiont has made its independent culture difficult, if not impossible. The degree of gene loss has led to a poor outlook for culturing the major cyanobiont as a monoculture. It is noteworthy that the major cyanobiont is vertically transmitted, unlike other plant endosymbionts, which are horizontally transmitted [28] (Zheng et al., 2009). Therefore, there is no infection or colonization of the host; instead, the cyanobiont is inherited from generation to generation.
We initially investigated the cultivation of a cyanobiont that past literature has called "Minor", in contrast to the uncultivable "Major" cyanobiont. For convenience, we call this isolate "Minor" solely because of its cultivable status, and to distinguish it from the major partner, which we know as Nostoc azollae. We are not claiming that this isolate is exclusively the minor cyanobiont; we use the name only as an easy identifier for a cultivable microorganism. It is noteworthy that many studies of the Anabaena/Nostoc/Trichormus-Azolla symbioses have attempted to separate the two perceived organisms and culture the cultivable cyanobacterium in a free-living state.
Although isolation methods have been successful, there has been disagreement over the appropriate surface sterilization procedures used for Azolla, which requires closer investigation [2] (Pereira and Vasconcelos, 2014). In the present study, a surface sterilization procedure was used on 15 g (fresh weight) of fern tissue, which was initially washed under running tap water for 10 min. Ferns were then surface-sterilized in 10% Clorox for two minutes and rinsed for 1 min in sterile 0.01 M HCl. The ferns were then rinsed twice in sterile distilled water for 1 min. We also performed three control experiments (technical replicates), in which 5 to 6 fronds were transferred to a McCartney bottle containing 10 mL of one-eighth-strength N-free BG110 medium (N-free BG110/8) under sterile conditions. The bottle was closed and hand-shaken intermittently for 15 minutes. Fronds were then carefully removed using aseptic technique. The bottle containing 10 mL N-free BG110/8 was then incubated under natural conditions (approximately 12 hours day and 12 hours night cycle) for two weeks and hand-shaken 2-3 times per day. A green-colored growth was observed after 2 weeks in the surface-sterilized and crushed fronds, while there was no growth in the negative controls [21] (Pushpakumara and Gunawardana, 2018). This indicates that the isolate we describe later as a subsection V cyanobiont is not a transient epiphytic colonizer, but a permanent endosymbiont. To date, only a single occurrence of a Fischerella species has been reported in the leaf compartment, which was found in a phyllosphere in the rain forests of Costa Rica [29] (Finsinger et al., 2008).
A key question is why no subsection V species have previously been reported from the leaves of Azolla species.
The level of care devoted to the isolation process in this study and the strength of the disinfection procedure support the authenticity of the present findings. Furthermore, negative controls failed to yield a detectable culture. A recent study showed the use of a two-step approach to collect other minor bacterial inhabitants from the leaf cavity of Azolla (1% bleach for 40 seconds and four rinses of water) [30] (Dijkhuizen et al., 2018). The present study details a more comprehensive approach consisting of four steps: water, Clorox, HCl, then water. The success in isolating the cyanobiont from 3 technical replicates provided strong evidence that the cyanobiont is a permanent resident that is abundant in the leaf cavity.
We grew the isolated cyanobiont and extracted DNA from the culture. From the genomic DNA template, the 16S rDNA (1432-1439 bp), RNA polymerase C (861 bp), ITS region (variable), nifD (338 bp) and GroEL (1485 bp) gene fragments were amplified for identification and taxonomic purposes. 16S rDNA is the gold standard for eubacterial taxonomy, and we assessed this locus using the NCBI nucleotide database. It is noteworthy that cyanobionts are the only nitrogen fixers found within
According to bacterial taxonomic rules, >97% identity shared between 16S rDNA sequences is sufficient for the classification of a bacterium at the species level [33] (Tindall et al., 2010). Further, >95% is the upper boundary for estimating a genus; the lower boundary is >90%, although the lower boundary is not frequently used for taxonomic purposes [34], [35] (Janda and Abbott, 2007; Sabat et al., 2017). Initially, the nifD DNA fragment and cell morphology studies were used to determine the species of the cultured cyanobiont. The nifD contig derived from the forward and reverse sequences shared >92% sequence identity with Fischerella sp. UTEX 1903 (Table S3).
The occurrence of polymorphisms varies between 16S rDNA and nifD genes; it is estimated that polymorphisms in nifD genes are 6-fold more frequent than in 16S rDNA genes, which makes them more variable and prone to mutate in related sequences [36] (Qian et al., 2003). A 396 bp rpoB sequence has previously been shown to discriminate between Mycobacterial species that are 91.8% similar in sequence, suggesting that the nifD gene fragment (338 bp) is just as strong in phylogenetic resolution for discriminating between members of subsection IV and V cyanobacteria [37] (Devulder et al., 2005).
The 16S rDNA sequence obtained from post-amplification sequencing pointed to a Fischerella muscicola relation, with the query sequences in forward (B23S primer) and reverse (pA primer) aligning with 61%-65% coverage to a similar locus in Fischerella muscicola when BLASTn was used to determine close relatives. The identities for 16S rDNA were 95%-97%, which was sufficient for genus-level identification but not for a species-level match. In fact, the >800 bp partial fragment of rpoC gave a 99.54% identity (at 55% coverage) with the same free-living microorganism, again suggesting that the likely cyanobiont was closest in taxonomy to the genus Fischerella. However, the sequencing data from all five loci (seven reads in total) were insufficient for the identification of the cyanobiont at the species level. The genome of Fischerella muscicola, isolated from a paddy field in India in 1951, has been sequenced and is likely to present the closest identity to the cyanobiont we have isolated.
RNA polymerase genes (rpoB and rpoC) are implicated in superior growth rates in E. coli, and consequently can be subjected to growth-maximizing mutations.
While the closest match of the rpoC contig assembled in this study was Fischerella muscicola, a considerable portion of the sequenced 861 bp fragment was not covered by the alignment. This segment (1-349 bp) did not align with any sequence in the NCBI nucleotide database. It has been demonstrated that large-scale genome rearrangement and variation in genes and gene content are key factors in cyanobiont genomes containing vanadium-dependent nitrogenases. When a Maximum Likelihood phylogenetic tree was built using rpoC gene sequences from 22 cyanobacteria, rooted on the subsection IV strain Nostoc sp. LP11, we could clearly establish the evolutionary relationship between the cyanobiont of this study and Fischerella muscicola, as they formed a monophyletic clade with 100% bootstrap support (Figure 1
The internal transcribed spacer (ITS) region is found between the 16S and 23S rRNA sequences and can be easily amplified using primers that bind to the conserved flanking sequences. The ITS region can be strongly variable, which makes such regions good candidates for phylogenetic inferences at higher taxonomic levels. The region sequenced by the ITSCYA236F primer encompasses a region that is 240 bp in length within the locus AF105134 of Fischerella muscicola, representing a region of tRNA-Ile and tRNA-Ala genes (incomplete sequences), internal transcribed spacer (partial sequence) and 23S ribosomal RNA (partial sequence). The region sequenced by the ITSCYA236F reverse primer is representative of a similar fragment: tRNA-Ile and tRNA-Ala genes (incomplete sequences), internal transcribed spacer (partial sequence), and 23S ribosomal RNA (partial sequence), spanning positions 9 to 250 of the AF105134 fragment (460 bp total sequence).
Cyanobacteria possess three kinds of 16S-23S ITS regions, the most prevalent being those that contain the two tRNA genes, which have been detected in Nostoc, Anabaena and many other Nostocales cyanobacteria; this is perhaps consistent with the findings of this study. We also assessed phylogeny from GroEL proteins, since they are listed among the loci and products suitable for phylogenetic inference (Figure 2
(Gugger and Hoffman, 2004). It can also be seen that cell division occurs in many planes, as in the genus Fischerella, while forming V-shaped junctions with a "keystone or bridging cell" at the center that anchors three-way branching (Figure 3). On the other hand, other Fischerella species (e.g. F. muscicola) possess T-shaped junctions that branch at perpendicular angles [40] (Gugger and Hoffman, 2004). To summarize, sequencing and morphology results suggest that a novel subsection V cyanobiont in a plant symbiosis, the first of its kind, has been isolated in this study.
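
The 16S rDNA identity thresholds cited in this section can be written down as a small lookup. The following is a minimal Python sketch and was not part of the original analysis; the function name `rank_from_16s_identity` and the returned labels are hypothetical, and the cut-offs (>97%, >95%, >90%) are simply those quoted in the text.

```python
def rank_from_16s_identity(percent_identity: float) -> str:
    """Return the taxonomic rank supported by a 16S rDNA percent identity.

    Cut-offs are the ones quoted in the text (an assumption of this sketch):
    >97% species, >95% genus (upper boundary), >90% genus (lower boundary).
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > 97.0:
        return "species"
    if percent_identity > 95.0:
        return "genus"
    if percent_identity > 90.0:
        return "genus (lower boundary)"
    return "unresolved"


if __name__ == "__main__":
    # Illustrative values only, chosen inside the 95%-97% range reported above.
    for identity in (96.0, 97.5, 92.0):
        print(identity, rank_from_16s_identity(identity))
```

Under these cut-offs, any identity inside the 95%-97% window reported for the 16S rDNA reads supports a genus-level but not a species-level assignment, which is the conclusion drawn in the text.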

3.2: Dual Nitrogenases of Fischerella uthpalarensis
We showed in two previous articles that the minor cyanobiont inside a likely tropical Azolla microphylla has a likely vanadium-dependent nitrogenase in addition to the molybdenum counterpart [21], [41] (Atugoda et al., 2018; Pushpakumara and Gunawardana, 2019). We show here, using both PCR amplification and sequencing of two loci, that an alternate vanadium nitrogenase is encoded by the cyanobiont genome. The PCR products from vnfDG (a single fusion gene made of vnfD and vnfG) and vnfN matched the expected amplicon sizes, while sequencing showed that they encoded parts of an alternate vanadium nitrogenase (Figure S1(A) and (B)). We have also amplified nifH and nifD genes from the cyanobiont's genome (Figure S1(C)), showing that both nitrogenases are present.
It has been demonstrated that the cyanobionts of Peltigera lichens and of the bryophyte genera Blasia and Anthoceros have vanadium-dependent nitrogenase-encoding capacities in their genomes, and our study adds another cyanobiont to the growing list of cyanobionts proven to encode vanadium nitrogenases [18] (Nelson, 2019).
When the vnfN gene fragment sequenced in this study was searched using the BLASTn tool, the best alignment was found in Nostocales cyanobacterium HT-58-2 at only 65% sequence coverage and 88.34% sequence identity, while the second-best alignment was found with the Peltigera membranacea (LA31632_AccXBB013(IINH)) cyanobiont, at 65% coverage and 85.88% identity (Table S4). There were no candidates from the genus Fischerella in the list of homologous gene fragments when the DNA sequence was used for BLASTn searches.
In contrast to the BLASTn results, when the vnfN fragment was translated in all six reading frames, the longest translated sequence gave a protein product that had 95% sequence coverage and 99% sequence identity to the VnfN protein from Fischerella muscicola when searched using the BLASTp search tool (Table S4).
A similar pattern was seen for the bidirectionally sequenced amplicon produced by the VnfDG2F and VnfDG4R primers, which showed 82.32% identity with 82% coverage to the nearest match at the DNA level, while the translation product of the longest ORF showed 98.48% sequence identity to Fischerella muscicola (Table S4). The higher deviation of both the vnfN and vnfDG amplicons from the nearest neighbor at the DNA level was not reflected in the protein sequence queries, which yielded a near-perfect match (98-99% identity); i.e. synonymous substitutions are commonplace in the sequence. The explanation for this observation may involve several possibilities, including genome instability, differences in generation time (how fast recombination occurs, giving rise to mutations), codon usage heterogeneity, DNA-repair systems and host-supplied proteins that may influence the types of mutations [42] (Lopez et al., 2019).
The rice plant (Figure 4) in particular is known to be surrounded by a few vanadium-dependent N-fixing microorganisms. Both Anabaena azotica FACHB-118 and Anabaena sp. CH1, which are associated with rice paddies, contain an alternate nitrogenase [43]. Furthermore, bacteria of the genus Azotobacter are widely cultured from paddy fields, and such bacteria employ V-dependent nitrogenases for nitrogen fixation. Vanadium can be found in four oxidation states that are relatively stable in aqueous environments, namely V²⁺, V³⁺, VO²⁺ and VO₂⁺ (the oxidation states 2, 3, 4, and 5, respectively).
Vanadium is approximately 1-100 fold more abundant in the soil than molybdenum. Molybdenum is found on average at 1-10 ppm in Asian paddy soils, while it has been measured at 20-30 ppm in a subset of Asian soils. The mean molybdenum content in tropical paddy soils is within the bracket of 2.3-3.3 ppm, of which the lowest values were found in Sri Lanka, West Malaysia and Cambodia [44] (Domingo and Kyuma, 1983). In contrast, a separate study measured vanadium content at between 3 and 500 ppm [45] (Swaine, 1955; Bowen, 1979). The analysis of vanadium in tropical Asian soils gives a mean value of 166 ppm, which is within the above range [44] (Domingo and Kyuma, 1983). Whether such a skewed ratio between vanadium and molybdenum contributes to the abundance of V-nitrogenase-armed microorganisms in paddy-cultivated areas remains to be tested empirically. Chemically, vanadium is redox-sensitive and therefore prone to be transformed to a reduced redox state in flooded environments such as paddy fields. A higher solubility of vanadium is likely to occur under most flooding conditions, because vanadate(V)-sorbing iron complexes are attenuated in their dissolution capacities and because vanadate(V) is reduced to vanadyl(IV), which forms complexes with organic matter in irrigated rice fields [46] (Gustafsson, 2019). Therefore, the availability of vanadium increases with irrigation or flooding, which may be the reason for the richness of V-dependent microorganisms in paddy fields.
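
The six-frame translation step described earlier in this section, in which the longest translated stretch of an amplicon is taken forward to a BLASTp search, can be sketched as follows. This is a minimal Python illustration and not the tool used in the study; all function names are hypothetical, and the sketch assumes the standard genetic code, whose codon-to-amino-acid assignments are shared with the bacterial translation table 11. Codons containing ambiguous bases are translated as "X".

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Translate both strands in all three offsets (frames +1..+3, -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_open_stretch(dna: str) -> tuple:
    """Return (frame, peptide) for the longest stop-free translated stretch."""
    best = ("", "")
    for frame, protein in six_frame_translations(dna).items():
        for segment in protein.split("*"):
            if len(segment) > len(best[1]):
                best = (frame, segment)
    return best


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not data from this study.
    print(six_frame_translations("ATGTGTTGCTAA"))
    print(longest_open_stretch("ATGTGTTGCTAA"))
```

Because a partial amplicon need not contain a start codon, the sketch looks for the longest stop-free stretch rather than a complete ATG-to-stop ORF. The resulting peptide is what would be submitted as a protein query, which is where synonymous DNA differences stop affecting the match.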

3.3: Molecular clues as to Horizontal Gene Transfer and Symbiosis
We draw upon "alien/atypical" codon usage to gather evidence of horizontal gene transfer events. It is hypothesized that recently horizontally acquired genes have "alien/atypical" codon usage and undergo an amelioration period from donor to recipient lifestyles following the transfer event [42] [47] (Ermolaeva, 2001). Furthermore, highly expressed genes such as nitrogenases are dependent on the availability and abundance of the correct tRNA species for translation of mRNA into a fully-fledged protein [48] (Wang et al., 2011). Highly expressed genes such as VnfDG are under stronger natural selection for highly efficient codons.
We looked at the functionally important cysteines (six in the partial fragment) within the protein sequence of the VnfDG gene product. In analyzing codon usage heterogeneity (Table S5), we found evidence of higher utilization of the TGT codon for cysteine (over the TGC counterpart), which was atypical in relation to the common usage of the two codon types (Figure S2(A) and (B); Table S5). Such a phenomenon was not observed in the full-length VnfDG gene of free-living Anabaena variabilis, where TGC>TGT in coding for cysteines (data not shown). In fact, three TGT codons (out of a total of four) encoding cysteines in the F. uthpalarensis VnfDG partial sequence are found as either TGC (2) or CGT (1) in the model organism for alternate nitrogenases, A. variabilis (Figure S2(B)).
Cysteines are crucial from a functional perspective and were not available at the origin of life. Cysteines are equipped with thiol/sulfhydryl groups, have a propensity to form disulfide bonds, are found as highly conserved residues in protein sequences, form clusters in close proximity, possess high metal-binding affinities, and remain a subject of debate regarding their hydrophobic or non-hydrophobic nature [49] (Poole, 2015).
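
The TGT versus TGC comparison described above amounts to counting in-frame cysteine codons in a coding sequence. The following is a minimal Python sketch of that count, not the procedure used to produce Table S5; the function names `codon_counts` and `tgt_fraction` are hypothetical, and the sketch assumes the input is already in the correct reading frame.

```python
from collections import Counter


def codon_counts(cds: str, codons=("TGT", "TGC")) -> dict:
    """Count the selected in-frame codons in a coding sequence.

    The sequence is assumed to start in frame (an assumption of this sketch);
    trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return {codon: counts[codon] for codon in codons}


def tgt_fraction(cds: str):
    """Return the fraction of cysteine codons that are TGT, or None if no cysteine codons occur."""
    counts = codon_counts(cds)
    total = counts["TGT"] + counts["TGC"]
    return counts["TGT"] / total if total else None


if __name__ == "__main__":
    # Toy sequence for illustration only; it is not the VnfDG fragment.
    toy = "ATGTGTTGCTGTTAA"
    print(codon_counts(toy), tgt_fraction(toy))
```

Running the same count on a partial fragment and on a reference gene from a free-living relative gives two fractions that can be compared directly. With only a handful of cysteine codons per fragment, such a comparison is descriptive rather than statistical.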
Our observations suggest that widespread DNA-level synonymous substitutions can be attributed to codon heterogeneity stemming from different codon usage strategies in Fischerella uthpalarensis. We propose that two cysteines in the VnfDG gene product of F. uthpalarensis stem from important mutations (Figure S2(B)). When the DIANNA web server was used to predict the function of Cys-98, the result suggested a half-cystine, a residue type known to form disulfide bonds with downstream or upstream residues (Table S5).
We suggest that the newly acquired nature of the alternate nitrogenase-encoding operon could mean that the codon usage preferences are yet to be reassigned within the new host, and consequently can be designated as atypical or alien.
We propose here that Fischerella uthpalarensis provides a possible "symbiotic" system to study, owing to the availability of the genome of the likely parent strain, Fischerella muscicola, and to the cultivability of the microorganism from fronds of Azolla plants
Table S3: The locus, primers and the BLAST data obtained using the BLASTn search tool.
Table S4: BLAST searches of sequenced DNA of selected genes encoding vanadium-dependent protein products and their translated counterparts.
Table S5: Prediction of half-cystine, free cysteines and ligand-bound cysteines in the partial VnfDG gene product of F. uthpalarensis.
Funding: There is no funding to report since this study was an undergraduate project.

Data Availability Statement: The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.