ABSTRACT
During evolution, microorganisms exposed to high UV doses have developed a complex battery of physiological and molecular mechanisms to cope with UV stress and damage, recently termed the UV-resistome. As a precedent, the UV-resistome was analyzed at the genomic and proteomic level in poly-extremophilic bacteria isolated from High-Altitude Andean Lakes. In this work, we go further by exploring the impact of UV-B radiation on microbiomes from different geographic regions across the globe. The abundance of photoprotection and DNA repair pathways in each microbiome was used to configure the worldwide UV-metaresistome. Metagenomics combined with georeferenced climate data indicated that the higher the UV-B dose received by a microbiome, the higher the abundance and diversity of its UV-metaresistome genes. In contrast, a substantial depletion of microbial diversity was observed in more highly irradiated environments. A positive correlation between the abundance of cryptochrome/photolyase family (CPF) genes and radiation intensity/photoperiod was detected. CPF genes were investigated in the global dataset and were present in the most abundant organisms of communities facing significant exposure to sunlight. Three additional clades not identified so far and a widespread occurrence of cryptochrome-related genes are reported.
INTRODUCTION
Radiation, although essential for the existence of complex life, can become adverse when its natural dose is altered. In the past decades, a rise in biologically harmful UV radiation (UV) at the Earth’s surface became evident, together with its detrimental effect on all life forms. This rise was a consequence of the drastic reduction of stratospheric ozone caused by increased concentrations of chlorofluorocarbons (CFCs) (Aucamp, 2007) and other halon gases in the upper atmosphere (Russell III et al., 1996).
The UV radiation reaching the ground comprises only a small proportion of global radiation, about 6–7 % of UV-A (320–400 nm) and less than 1.0 % of UV-B (280–315 nm) (Hu et al., 2008). Biological damage is wavelength dependent: UV-A causes only indirect damage to DNA, proteins, and lipids through reactive oxygen intermediates. On the other hand, UV-B and UV-C (100 to 280 nm) cause both indirect and direct damage to DNA because of the strong absorption by the DNA molecule at wavelengths below 320 nm (Mitchell and Karentz, 1993). Thus, UV-A and UV-B are the biologically relevant bands, as their effects on living organisms can have an ecological impact. Numerous studies have reported the effects of UV-B on plants (Searles et al., 2001; Xiong and Day, 2001; Robinson et al., 2005; Ruhland et al., 2005; Yan et al., 2012), animals (Robson et al., 2005; Bao et al., 2014) and microorganisms (Zaller et al., 2002; Avery et al., 2004; Rinnan et al., 2005; Piccini et al., 2009). Increased UV-B is known to alter photosynthetic capacity and reduce biomass production in plants (Xiong and Day, 2001; Jansen et al., 2010). In marine environments, UV-B drives the dynamics of larval krill, affects the coral–algal symbiosis and alters bacterial assemblages, impacting higher trophic levels (Häder et al., 2011). UV-B has the potential to cause negative effects on bacterioplankton, since bacteria possess simple haploid genomes and their small size precludes effective cellular shading and reduces the benefits of protective pigmentation (Garcia-Pichel, 1994). These organisms play a central role in the cycling of nutrients, as bacteria may account for up to 90% of the cellular DNA in aquatic environments. In fact, they constitute a fundamental link in carbon flow, so any influence of UV on bacteria in the environment will impact the rest of the ecosystem food chain (Karentz, 1995; Joux et al., 1999).
At soil level, UV-B influences decomposition processes by altering microbial diversity (Ballaré et al., 2011), thus affecting crops.
During evolution, microorganisms exposed to high UV doses have developed a range of strategies to cope with UV stress and damage. Recently, the need was pointed out to understand the integral UV response as a complex battery of physiological and molecular mechanisms known as the UV-resistome (Kurth et al., 2015; Portero et al., 2019). The UV-resistome thus depends on the expression of a diverse set of genes devoted to evading or repairing the damage provoked directly or indirectly by UV. Ideally, it encompasses the following subsystems: (1) UV avoidance and protection strategies; (2) UV or other stress sensors with their corresponding response regulators; (3) damage tolerance and oxidative stress response; and (4) DNA damage repair. As a precedent, the UV-resistome was analyzed at the genomic and proteomic level in the poly-extremophile Acinetobacter sp. Ver3, a gammaproteobacterium isolated from High-Altitude Andean Lakes (HAAL). Genes from most of the described subsystems were identified and supported the outstanding UV-resistance profile observed for this strain (Kurth et al., 2015). In a follow-up work, the UV-B resistome subsystems for avoidance, protection strategies and DNA damage repair were explored for Ver3 as well as for two HAAL Gram-positive bacteria: Exiguobacterium sp. S17 and Nesterenkonia sp. Act20 (Portero et al., 2019).
An example of UV avoidance and protection strategies can be found in motile biomass producers and consumers, which use vertical migration in aquatic ecosystems to avoid excessive radiation. Sessile (attached) organisms rely on habitat selection to limit solar exposure. On the other hand, different taxonomic groups have developed a number of photoprotective substances such as melanins, mycosporines, mycosporine-like amino acids (MAAs), scytonemin, carotenoids, phycobiliproteins, ectoine and several other UVR-absorbing compounds of as yet unknown chemical structure (Häder et al., 2011). In most cases, screening pigments are not effective in bacteria because of their small size, which would require excessively high concentrations of these compounds to provide sufficient absorption. In Exiguobacterium sp. S17 and Nesterenkonia sp. Act20, pigment extraction indicated the presence of carotenoid-like compounds in cells, suggesting an antioxidative defense or protective role for them (Portero et al., 2019).
A great number of specific and highly conserved DNA repair mechanisms have evolved against DNA damage, including photoreactivation (PR), nucleotide and base excision repair (NER and BER), mismatch repair (MMR) and homologous recombination (HR). In addition, damage tolerance (dimer bypass), the SOS response, checkpoint activation, and programmed cell death (PCD) or apoptosis act efficiently against DNA lesions, ensuring genomic integrity.
Of particular importance is PR, executed by photoreactivating enzymes known as “photolyases”. Photolyases are monomeric proteins of 53–66 kDa that contain flavin adenine dinucleotide (FAD) as cofactor and antenna pigments such as deazaflavin or methenyltetrahydrofolate derivatives (Weber, 2005). Photolyases are considered among the earliest solutions of nature to the threat of DNA damage by high UV-irradiation intensity (Sancar, 2000, 2003). These enzymes target the most abundant products formed by UV-B, which are cyclobutane pyrimidine dimers (CPD). Such CPD lesions bring polymerases to a standstill, eventually leading to cell death. Photolyases bind tightly to CPDs in the dark and can be activated by different wavelengths, such as UV-A and photosynthetically active radiation (400 to 700 nm). The activation causes the splitting of the two C–C bonds in the CPD unit and results in the re-formation of the two separate pyrimidine bases (Weber, 2005). Photolyases, together with the structurally closely related cryptochromes (Cry), form a divergent family of photoactive proteins present in all three domains of life, called the cryptochrome/photolyase family (CPF). However, cryptochromes have no photolyase activity and function as signaling molecules regulating diverse biological responses such as entrainment of circadian rhythms in plants and animals (Roenneberg and Merrow, 2005; Harmer, 2009). In all HAAL models for the UV-resistome a nearly full set of family members was found, including CPD photolyases supplemented by iron–sulfur bacterial cryptochromes and photolyases (FeS-BCP), plus a cryptochrome DASH in Exiguobacterium sp. S17 (Albarracín et al., 2012, 2014; Kurth et al., 2015; Portero et al., 2019).
Investigations at the metagenome level of the effects of sunlight on microbial communities are mostly confined to photoreceptors such as microbial rhodopsins (Pushkarev and Béjà, 2016; Pushkarev et al., 2018) and LOV domains (Pathak et al., 2012). Singh et al. also reported abundances for several light-related genes in microbiomes from different environments (Singh et al., 2009). However, no work has investigated the proposed components of the so-called UV-resistome in microbiomes from a broad range of UV intensities. In this work, we aim to characterize the occurrence and diversity of molecular components from two subsystems of the UV-resistome (pigment protection and DNA repair). Using a metagenomic approach, we studied microbiomes exposed to different UV-B intensities and photoperiods. This contribution will aid the understanding of the effects that climate change and loss of atmospheric ozone can have on microbial communities in the short term. In addition, we focus this work on microbial communities from extreme environments, which are models for early life sciences and astrobiology and important sources of novel extremoenzymes/extremolytes that offer promising biotechnological applications.
RESULTS
Comparison of UV-B intensity profiles between worldwide microbiomes
An essential aspect of our analysis was the linkage of metagenomic data with georeferenced UV-B values obtained from the glUV datasets. The maps processed with QGIS (Fig. 1) indicated the existence of three groups according to their UV-exposure regimes: high (UVHigh), mid (UVMid) and low (UVLow) exposure. In the first group, Lake Diamante red biofilm (DM), Socompa stromatolites (SS) and Tibetan Plateau sediment (TB) were exposed to UV-B intensities of 9677, 9536 and 8885 J/m2/day, respectively. In contrast, the UVLow microbiomes from Olkiluoto Island groundwater (OK), Lake Montjoie (MT) and Greenland cryoconite (CR) were linked to low UV-B intensities of 77, 719 and 1759 J/m2/day, respectively. In between, the microbiomes of the Amazon River (AM), Lake Rauer (RA) and Dewar Creek hot spring (HT), with intensities of 5630, 2289 and 2281 J/m2/day, respectively, grouped together.
These intensity values were as expected considering the sampling dates: the samples of the UVHigh group were taken during summer, when insolation is at its maximum in the high-altitude environments of the Argentinean Puna and the Tibetan Plateau (February and August, respectively) and the incidence of UV-B in both regions is the highest on the planet. On the other hand, the UVLow samples were taken during August, November and February, which correspond to the autumn and winter seasons in the northern hemisphere, darker than their counterparts in the southern hemisphere. In turn, the UVMid group comprised the AM microbiome, which belongs to a tropical environment, while RA and HT correspond to extreme environments (cold and hot, respectively) at high latitudes.
An interesting fact not quite evident from the UV datasets is revealed by the day length calculated for each georeferenced metagenome (https://www.suncalc.org). Two microbiomes coincided with extreme photoperiods: the OK microbiome, with the shortest photoperiod in the study (only 7.8 hours), and RA, which was sampled when Antarctica was going through its longest period of sunlight in the year, a complete day under the sun. It is important to remark that we assumed that the environments had been exposed to conditions similar to the indicated solar irradiance for a considerable period of time before sampling. Consequently, the diversity of genes and species will reflect the ecological pressure of radiation on the environment (among other factors), as the geographic conditions (latitude, altitude, and orography) did not change considerably in the sampled regions for decades. In addition, the intensity of solar irradiation over each microbiome will also depend on its on-site spatial disposition, being maximal for samples taken from soils or shallow water of lakes, springs, oceans and streams but much lower in sediments, groundwater or deep water.
Using the solar irradiation data, we assessed the effect of UV-B on microbiomes exposed to different intensities and durations of insolation. The response variables evaluated were (1) microbial diversity; (2) content and diversity of CPF genes; (3) abundance and diversity of other DNA repair mechanisms, such as NER, BER, MMR and homologous recombination; and (4) abundance of genes related to photoprotective compounds such as carotenoids and ectoine.
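The grouping described above can be sketched in a few lines. The doses are those reported in the text; the two numeric cutoffs (`LOW_MAX`, `HIGH_MIN`) are illustrative assumptions only, since the groups in this study were defined from the maps and no numeric thresholds are stated.

```python
# UV-B daily doses (J/m2/day) as reported in the text for each microbiome.
UVB_DOSE = {
    "DM": 9677, "SS": 9536, "TB": 8885,   # reported as UVHigh
    "AM": 5630, "RA": 2289, "HT": 2281,   # reported as UVMid
    "CR": 1759, "MT": 719, "OK": 77,      # reported as UVLow
}

# ASSUMPTION: hypothetical cutoffs chosen only so that the sketch reproduces
# the reported grouping; the study does not state numeric thresholds.
LOW_MAX = 2000
HIGH_MIN = 8000


def uv_group(dose):
    """Return the exposure group label for a UV-B dose in J/m2/day."""
    if dose >= HIGH_MIN:
        return "UVHigh"
    if dose > LOW_MAX:
        return "UVMid"
    return "UVLow"


groups = {site: uv_group(dose) for site, dose in UVB_DOSE.items()}
```

Any pair of cutoffs falling in the gaps between the reported doses (1759 to 2281 and 5630 to 8885 J/m2/day) would give the same three groups.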
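As a rough illustration of how photoperiod follows from latitude and date, the sketch below uses the standard sunrise equation with a simple cosine approximation of solar declination. This is not the method used by suncalc.org: it ignores atmospheric refraction and the solar disc size, so values may differ from those reported in the text. The function name and its parameters are hypothetical.

```python
import math


def day_length_hours(latitude_deg, day_of_year):
    """Approximate day length (hours) from latitude and day of year.

    ASSUMPTION: simple geometric model (no refraction, point-like sun),
    with declination approximated by a cosine of the day of year.
    """
    # Solar declination in radians: about -23.44 degrees at the December solstice.
    decl = math.radians(-23.44) * math.cos(2 * math.pi * (day_of_year + 10) / 365)
    lat = math.radians(latitude_deg)
    # Sunrise equation: cos(hour angle at sunrise) = -tan(lat) * tan(decl).
    x = -math.tan(lat) * math.tan(decl)
    if x <= -1:
        return 24.0  # sun never sets (polar day)
    if x >= 1:
        return 0.0   # sun never rises (polar night)
    # The sun moves 15 degrees of hour angle per hour.
    return 2 * math.degrees(math.acos(x)) / 15
```

Under this model, a site beyond the polar circle near the local summer solstice returns 24 hours, which matches the "complete day under the sun" situation described for RA.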
Microbial diversity
The microbial diversity of each microbiome was evaluated using species richness and Shannon’s diversity index. Interestingly, both indices decreased as the UV-B irradiation on the microbiome's environment increased (Fig. 2). In microbiomes exposed to intensities below 4000 J/m2/day, species richness oscillated within a wide range (11-103), while the SS microbiome, exposed to an intensity above 8000 J/m2/day, had only 21 species. The Shannon index indicated poor diversity in the three microbiomes of the UVHigh group. All UVHigh samples and HT (a UVMid sample) presented the lowest indices of the whole study.
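For reference, the two diversity measures can be computed as in the minimal sketch below. It assumes the Shannon index is taken with the natural logarithm, H = -sum(p_i ln p_i), which the text does not specify; the function names are hypothetical and the input is any list of per-species abundances.

```python
import math


def species_richness(abundances):
    """Number of species with a non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with p_i > 0.

    ASSUMPTION: natural logarithm; the log base is not stated in the text.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


# Hypothetical communities with equal richness but different evenness.
even_community = [25, 25, 25, 25]
dominated_community = [97, 1, 1, 1]
```

With equal richness, the community dominated by one species gets the lower index, which is the pattern the Shannon index captures when a few UV-resistant taxa become dominant.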
Quantitative analysis of the UV-metaresistome
In order to evaluate the abundance of UV resistance genes in the complete set of microbiomes, a database of such genes was built using UniProt sequences linked to KEGG orthology numbers, which allowed us to assemble the metabolic pathways. The resulting abundances for each gene and metabolic pathway were analyzed (Tables S1-S7) together with the abundance of each DNA repair/photoprotection strategy (Fig. 3).
Two microbial communities of the UVHigh group, TB and SS, showed the largest number of functions together with the highest abundance of the whole set of functions (UV-metaresistome). The case of the DM microbiome is interesting because, despite coming from the most exposed environment, it presented the lowest diversity of resistance strategies. However, this result may be due to technical and taxonomic biases. The archaeon Halorubrum sp., which is the dominant genus detected by the taxonomic analysis performed on the DM microbiome (Fig. 7), lacks the type of polymerase proposed by KEGG in the NER and BER pathways. KEGG prokaryote models for NER and BER establish DNA polymerase I (K02335) as responsible for the final gap filling, but this enzyme does not seem to be present in the genus, which probably uses another one. This is evidenced by an abundance of the gene (K02335) one or two orders of magnitude lower in DM than in the rest of the microbiomes (Tables S1 and S3). Moreover, a quick BLAST search of K02335-linked Swiss-Prot sequences against Halorubrum sp. genomes from the NCBI did not show any significant alignment. As the NER and BER pathways require DNA polymerase I (K02335) to be considered present in our metagenomic analysis, they are not visible in the DM stacked bar. In the case of the homologous recombination pathway, we also used as a reference the prokaryotic set of genes proposed by KEGG, while it is known that Archaea use a different one (Seitz et al., 1998; Rzechorzek et al., 2014). Finally, it has already been stated that MMR is absent in the phylum Euryarchaeota (Anderson et al., 2009; Grasso and Tell, 2014), which includes Halorubrum sp.
SS was the community that had the whole set of molecular resistance pathways, even those poorly represented or absent in others. The abundance of ectoine biosynthesis (180 e-02 RPKG) is noteworthy, representing 20.29% of the SS metaresistome. This function has only a limited presence in RA (17 e-02 RPKG), with a contribution of 2.44%. Ectoine is a compound synthesized by bacteria in response to high salinity, which helps in the hydration of proteins and cell membranes and is also used as an active component of some sunscreens. Its presence in SS is consistent with the hypersaline nature of the microbiome's environment. A significant presence of mismatch repair (MMR) was also noticed in the SS microbiome, with an abundance of 47 e-02 RPKG, representing 5.37% of the metaresistome. This DNA repair mechanism has little or no presence in the rest of the microbiomes.
The distribution of CPF genes in the three irradiation groups followed an ecologically significant pattern: they were prominent in UVHigh and UVMid microbiomes, while insignificant or absent in communities with low or no exposure (OK and GU, respectively). In DM, CPF genes were especially relevant. Their abundance was 249 e-02 RPKG, greater than in the rest of the microbiomes. Furthermore, their contribution to the DM metaresistome was 68.40%, the highest of the whole study, followed by RA with 34.14%.
The relationship between CPF abundance and the intensity and duration of insolation was studied (Figure 4.a), showing an upward trend of CPF abundance as UV-B irradiation increases. RA, a UVMid microbiome, deviates completely from the trend due to an outstanding abundance of this class of genes. As mentioned above, RA has the longest photoperiod of the study, so we set out to verify whether there is a relationship between CPF abundance and photoperiod. Figure 4.b shows that there is an upward trend of CPF abundance as the photoperiod increases, with RA quite in line with the trend. Thus, both factors, intensity and photoperiod, could be influencing the abundance of CPF genes in the microbial communities.
Analysis of assembled CPF genes
Alignment editing and subsequent phylogenetic analysis resulted in a tree with 214 CPF sequences and numerous clades (Fig. 5), most of which correspond to subfamilies described in previous works: CPD photolyases of classes I, II and III; DASH cryptochromes, which include single-strand photolyases (Selby and Sancar, 2006); the FeS-BCP group, which consists of proteins bearing an iron-sulfur cluster, either prokaryotic [6-4] photolyases or cryptochromes (Graf et al., 2015); and thermostable CPD photolyases, which have FMN as antenna cofactor (Ueda et al., 2005) and form a group of unstable position in the phylogeny (Portero et al., 2019).
In addition, this work reports three additional clades not identified so far, named unidentified I, II and III (UI, UII and UIII), with 41%, 88% and 96% bootstrap support, respectively. In total, these groups represent 24.76% of the global sequences. Fig. 6 shows the distribution of each subfamily by microbiome. Interestingly, clades UI and UII have most of their components coming from a single microbiome: 85.7% of UI is constituted by sequences from the RA microbiome, while 87.5% of UII corresponds to MT sequences. The UI subfamily is the only one present in DM and is dominant in RA, the two communities with the highest peaks of CPF abundance. Furthermore, the UI and UII clades are taxonomically homogeneous, with their sequences being classified as Halobacteria and Actinobacteria, respectively. On the other hand, UIII seems to be a clade formed by sequences of different taxonomic origins, including the phyla Proteobacteria, Bacteroidetes and Verrucomicrobia.
The widespread presence of the cryptochrome-containing subfamilies Cry-DASH and FeS-BCPs among the microbiomes is clear (Fig. 6). Although the relative abundance of the FeS-BCP clade remained similar among the microbiomes, the abundance of Cry-DASH followed a pattern of affinity for light, being abundant in the SS and TB communities (UVHigh), AM (UVMid) and RA (longest photoperiod).
Taxonomic identification of each gene was carried out using the BLAST algorithm. CPF sequences were classified by genus and paired with the same information previously obtained for each microbiome through MetaPhlAn. CPF sequences whose best homologue by BLAST had an identity lower than 70% were placed in an unclassified category. Both the genus diversity of the whole microbiome (top bar) and that of the set of CPF sequences recovered from that microbiome (lower bar) are shown in Figure 7. The classification rate of the CPF genes was generally high (>60%) in the samples DM, SS, TB, AM, RA and GU but lower in the rest. The lowest percentage of classified sequences was registered for MT, with barely 23% of the sequences being assigned to a genus. This may be because the community itself has a poorly referenced diversity (top bar). Despite this, MT had the greatest number of genera (12) assigned to CPF, followed by SS and RA (11 each).
By pairing the information of both bars we evaluated the relevance of CPF in each community. The most abundant taxa in the DM, SS, RA and HT communities possess CPF genes. In AM, only Ralstonia and Vibrio display CPF genes; together, both genera represent 35% of the community. In the case of CR, only Hymenobacter and Agromyces possess these genes, together representing 7% of the community. Pedobacter was the only taxon in TB with CPF genes and it represents less than 4% of the community. Finally, Escherichia, which has an abundance of just 0.44%, was the only CPF contributor in GU. In neither MT nor OK were matches found between the MetaPhlAn and BLAST classifications.
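The identity cutoff described above can be illustrated with the following sketch. It assumes that each CPF sequence has already been reduced to its best BLAST hit as a (genus, percent identity) pair; the function names, sequence identifiers and identity values are hypothetical and are not data from this study.

```python
IDENTITY_CUTOFF = 70.0  # percent identity below which a sequence is left unclassified


def assign_genera(best_hits, cutoff=IDENTITY_CUTOFF):
    """Map each sequence id to a genus, or to 'unclassified'.

    best_hits: dict of sequence id -> (genus, percent_identity), or None
    when no hit was found (ASSUMED input layout, for illustration only).
    """
    assigned = {}
    for seq_id, hit in best_hits.items():
        if hit is None:
            assigned[seq_id] = "unclassified"
            continue
        genus, identity = hit
        assigned[seq_id] = genus if identity >= cutoff else "unclassified"
    return assigned


def classification_rate(assigned):
    """Percentage of sequences assigned to a genus."""
    if not assigned:
        return 0.0
    classified = sum(1 for g in assigned.values() if g != "unclassified")
    return 100.0 * classified / len(assigned)


# Hypothetical best hits, not taken from the study.
example_hits = {
    "cpf_1": ("Halorubrum", 92.4),
    "cpf_2": ("Pedobacter", 65.0),
    "cpf_3": None,
    "cpf_4": ("Vibrio", 70.0),
}
example_assigned = assign_genera(example_hits)
```

In this toy example two of the four sequences pass the cutoff, so the classification rate is 50%.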
DISCUSSION
UV-B is a recognized driver of ecological processes, regulating numerous biological patterns. The availability of a new and specific set of global UV-B surfaces (glUV), containing monthly mean UV-B data and six derived UV-B surfaces, allowed us to study the effect of UV exposure on selected environments worldwide. This is the first work in which UV-B was considered as an ecological variable in a sequence-based metagenomic study of microbial communities.
It is known that UV-B negatively affects microbial diversity (Ballaré et al., 2011), which concurs with our findings in this work. Species richness (Fig. 2A) was quite variable at low radiation intensities, probably due to the different environmental conditions that can limit or promote species diversity in each community before UV-B intensity plays a critical role. Once a certain threshold has been exceeded (above 6000 J/m2/day) and UV-B becomes a limiting factor, the number of species decreases due to selective pressure on those that do not possess efficient molecular mechanisms to defend themselves from UV-B and are unable to adapt to the new assemblage of species. A similar situation was observed when applying the Shannon index (Fig. 2B), which incorporates the evenness of species abundances in addition to its ability to detect rare species. The Shannon index, although variable in UVLow microbiomes, consistently decreases in UVHigh ones. This suggests that the microbes with full capacity to defend themselves against UV-B become dominant, relegating others to a less substantial place in the community.
By performing quantitative metagenomics we evaluated the diversity and abundance of each DNA repair/photoprotection pathway in all the communities (Fig. 3). SS proved to be the most diverse, as it contains the complete set of pathways with a nearly even distribution of their abundances. The SS microbiome corresponds to non-lithified modern stromatolites growing at the shore of the remote volcanic Lake Socompa at 3570 m a.s.l. in the Puna (Argentinean Andes) (Farías et al., 2011), in a desert area that withstands the highest doses of global solar radiation on Earth (Piacentini et al., 2003; Albarracín et al., 2015). These complex communities, with microbial diversity stratified along physico-chemical gradients (Toneatti et al., 2017), developed under the pressure of extreme environmental factors similar to the ones present in Early Earth’s atmosphere. The bacteria isolated from these stromatolites are poly-extremophiles able to resist severe stress conditions, including UV-B, heavy metals, salinity and arsenic.
The SS microbiome, which is the most complex and diverse of those belonging to the UVHigh group, is also complex and diverse in its way of defending itself against UV-B. SS maintains a relatively high percentage of gene copies dedicated to the synthesis of protective compounds such as carotenoids and ectoine. Ectoine is an amino acid derivative produced mainly by aerobic, chemoheterotrophic and halophilic bacteria in response to osmotic stress and unfavorable environmental conditions. Ectoine is also known for its practical use in enzyme stabilization, human skin protection and anti-inflammatory treatment, and for its inhibitory effects in neurodegenerative diseases (Bownik and Stępniewska, 2016). It stabilizes cell membranes, enzymes, and nucleic acids at extreme temperatures or high salt concentrations in some bacteria (Smiatek et al., 2012). Furthermore, it absorbs UV-A radiation, preserves DNA from breaking down in various cell types and reduces oxidative stress caused by UV (Buenger and Driller, 2004; Botta et al., 2008; Sajjad et al., 2018). It is remarkable that SS was unique in its significant presence of ectoine synthase, a fact compatible with the stressful environment in which it was sampled, including the high salt concentration of the nearby hypersaline Lake Socompa (17%) and high exposure to UV irradiation, leading to high osmotic and oxidative stress, respectively. Further study of how ectoine contributes to the functioning of these microbial communities in extreme environments could shed light on new biotechnological applications.
The significant abundance of MMR in SS can be related to several factors. MMR proteins provide DNA stability under the stress of environmental fluctuations, which in the case of Lake Socompa may involve temperature, arsenic, hypersalinity and UV. It has become clear that MMR is the major process influencing evolution in DNA (Džidić et al., 2003). It controls mutation rates and interspecies recombination. Inhibition of MMR allows in vivo recombination between diverged DNA fragments and raises mutation rates (10^2 to 10^3 fold), creating de novo variation, which is the genesis of new biodiversity and novel biosynthetic compounds. Strains can easily fix beneficial mutations by reacquiring MMR wild-type alleles through horizontal gene transfer, favored by their hyper-recombination phenotype. The abundance of this pathway in the SS community may be beneficial, as cycles of loss and reacquisition of the mut genes can provide the high mutation rates needed to evolve rapidly and adapt under fluctuating ecological circumstances.
TB, DM and RA also show signs of genomic adaptation to radiation. TB was the third most exposed microbiome and showed a high abundance of gene copies assigned to 6 of the 7 pathways examined. DM is a different case since, in spite of being the first in radiation intensity, it has the lowest number of defense strategies. However, this result may be due to taxonomic and technical biases. As mentioned above, the community is dominated by the archaeon Halorubrum sp., which uses a different polymerase from that proposed by KEGG for the prokaryotic NER and BER metabolic pathways. This made both pathways invisible in our search, although it is very likely that both exist because they are important DNA repair systems in Archaea (Grasso and Tell, 2014; Grogan, 2015). Additionally, the HR pathway was not detected because the respective archaeal genes were not used. On the other hand, MMR is a type of DNA repair that has not been reported for the phylum Euryarchaeota, which includes Halorubrum sp. We believe that the same analysis performed in this work, carried out with the appropriate archaeal genes, should unveil an abundant and diverse UV-metaresistome in DM compared with other archaea-dominated metagenomes of lower insolation.
The relevance of CPF in the ecology of microbial communities becomes more evident when its trend along gradients of UV-B intensity and photoperiod is studied (Fig. 4). Our study reveals a rising trend of CPF abundance in microbial communities as their environments receive greater radiation or as day length extends. This is likely due to an increase in DNA damage caused by UV-B, mainly in the form of pyrimidine dimers. In such a situation, populations could increase the expression of CPF by increasing the gene copy number. In addition, those species of the community that lack CPF in their genomes would be replaced by species that contain these genes. In either case, the overall increase in CPF copy abundance would indicate that the community improves its defense capabilities against UV-B using the highly efficient mechanism of photoreactivation or by modulating enzymatic mechanisms triggered by light sensing through cryptochromes.
The CPF group comprises mostly photolyases with different specializations, together with cryptochromes whose functions are largely unknown. The family has been divided into different subgroups considering phylogeny, kind of chromophore, specialization, host organism, structure, etc. After assembling the reads and aligning the sequences corresponding to CPF, we were able to classify our sequences into the main subgroups discovered to date and we also detected the presence of other, unknown subgroups (Fig. 5). It is interesting that the sequences belonging to these unknown groups together comprise approximately a quarter of the total, thus offering a large pool of candidates for new functions, specializations or molecular specificities.
As already mentioned, the only photolyase present in DM belongs to clade UI (Fig. 6), which is apparently novel and also present in RA. DM and RA are the samples with the highest radiation intensity and the longest photoperiod, respectively. Apparently, Halorubrum sp., the sole organism detected by our analysis in DM, could cope with UV-B mainly using this photoreceptor without any other complement from the CPF family. We propose this gene as a candidate for further studies of photoreactivation, characterization and molecular structure analysis in order to shed light on this portion of the CPF phylogenetic tree, which has not been studied yet.
The Cry-DASH class is present in SS, TB, AM and RA, while it is of lower relative abundance or absent in samples less exposed to light. It has been reported in previous works that these cryptochromes are in fact photolyases with affinity for single-stranded DNA and in some cases RNA (Selby and Sancar, 2006). It is possible that Cry-DASH plays a role complementary to that of standard photolyases, which have affinity for double-stranded DNA, contributing to the global increase of photoreactivation in these microbiomes. The FeS-BCP group showed a remarkable behavior, being present in almost all the communities. Previously, it has been suggested that this class complements other photolyases by performing the function of a [6-4] photolyase (Zhang et al., 2013; Graf et al., 2015), thereby avoiding or decreasing the use of the inefficient NER system. Our work gives more support to this claim by showing its ubiquitous and abundant presence in microbial communities.
Another interesting fact is that CPF genes, and thus photoreactivation, are usually present in the most abundant organisms of the UVHigh microbiomes (Fig. 7). In addition, SS and RA show a high number of genera in which photoreactivation is present. This suggests that photoreactivation may act as a successful system for ensuring survival and predominance of taxa in UV-stressed environments. This contrasts with the less insolated samples, where natural selection does not act in favor of photoreactivation and promotes neither abundance nor diversity of individuals carrying these genes.
CONCLUSIONS
This worldwide study showed that high UV exposure of a given microbiome correlates with low microbial diversity and a high abundance of specific UV-resistance molecular components. Thus, a greater representation of certain components of the UV-resistome was observed at higher radiation intensities. A downward trend in microbial diversity and an upward trend in CPF abundance were consistently observed as UV-B irradiation increased. Likewise, cryptochrome-like genes were found to be abundant in the most exposed microbiomes, indicating a role complementary to that of standard photolyases. We also observed that CPF genes are more likely to be present in the most abundant organisms of the UVHigh microbiomes, suggesting that they are an evolutionarily important factor for survival and dominance in highly irradiated environments. In addition, this work reported three novel CPF clades not identified in previous analyses.
Finally, metagenomics proved to be an excellent tool for revealing an important correlation between the UV exposure of microbiomes and the diversification of their resistance mechanisms, together with an increase in gene copy number and gene diversity. Additional methods such as metatranscriptomics and metaproteomics should be implemented in order to unveil the molecular dynamics of the proposed UV-metaresistomes.
EXPERIMENTAL PROCEDURES
UVB data processing and metagenome selection
Monthly mean UVB glUV datasets (Beckmann et al., 2014) were used as guides to choose samples on the basis of their sample information. The glUV datasets from the Helmholtz Centre for Environmental Research are daily measurements summarized into monthly mean UVB Erythemal Daily Dose values and averaged across all years in the period 2004-2013. These layers cover global land and marine areas and have a spatial resolution of 15 arc-minutes. They are in the latitude/longitude coordinate reference system and the datum is WGS84. The datasets were processed using QGIS (www.qgis.org) and worldwide colored maps were generated for each month. The maps were employed to select potential samples from different places around the world according to a visual criterion of irradiance. The sample information provided by the database (“biosample” linked section of each SRA entry), specifically collection date and geographic location, is crucial here, as it allows the monthly mean irradiation value for each microbial community at the time of sampling to be determined.
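As an illustration of how a sample's coordinates relate to a layer with 15 arc-minute resolution, the following minimal Python sketch maps a WGS84 latitude/longitude pair to a grid cell. It is not the procedure used in this work, where maps were inspected visually in QGIS. The function name grid_cell and the grid layout (row 0 at 90° N, column 0 at 180° W) are assumptions made for the example.

```python
CELL_DEG = 15 / 60               # 15 arc-minutes expressed in degrees (0.25)
N_ROWS = int(180 / CELL_DEG)     # 720 rows from 90 N down to 90 S
N_COLS = int(360 / CELL_DEG)     # 1440 columns from 180 W to 180 E


def grid_cell(lat, lon):
    """Return (row, col) of the cell containing a WGS84 coordinate.

    Assumed layout (hypothetical): row 0 starts at 90 N and
    column 0 starts at 180 W.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinate outside the valid WGS84 range")
    # Clamp so that lat = -90 or lon = 180 falls in the last cell.
    row = min(int((90 - lat) / CELL_DEG), N_ROWS - 1)
    col = min(int((lon + 180) / CELL_DEG), N_COLS - 1)
    return row, col
```

With the cell index and the month taken from the collection date, the monthly mean value of a sample could be read from the corresponding layer.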
The metagenomes selected from NCBI using biosample information were the following, each one with its respective SRA entry: Lake Diamante (ERR1824222) (Rascovan et al., 2016), Socompa stromatolite (SRR3341855) (Kurth et al., 2017), Tibetan Plateau sediment (SRR3322106) (Chen et al., 2016), Amazon River (SRR1790676) (Satinsky et al., 2015), Lake Rauer (SRR6129205) (Tschitschko et al., 2018), Dewar Creek hot spring (SRR5580900), Greenland cryoconite (SRR5275901) (Hauptmann et al., 2017), Lake Montjoie (SRR5818193) (Tran et al., 2018), Olkiluoto Island groundwater (SRR6976411) and human gut (SRR6517782) (Ye et al., 2018). The human gut metagenome was assumed to represent a UVB-free environment and served as a negative control.
General overview of the metagenomes
DNA from all microbial communities was obtained and shotgun sequenced with Illumina technology, except DM, which was sequenced with a 454 GS FLX Titanium instrument. Quality filtering and merging yielded between 0.64 and 13.66 Gb for further analysis (Table 1). Only the TB and AM datasets showed a high percentage of merging with FLASH (Magoč and Salzberg, 2011); however, the low-merged dataset from OK was also used for downstream steps, as its final size was reasonable (within the range mentioned above). MEGAHIT (Liu et al., 2015) was used for assembly since it can deal with large and complex datasets in a time- and cost-efficient manner. Protein prediction with Prodigal (Hyatt et al., 2010) over contigs of >999 bp yielded 1000-1500 proteins/Mb, which is congruent with the high coding density expected for microbial species.
Quality control and assembly
Each metagenome was pre-processed and assembled individually using several bioinformatic tools. Adapters were removed from Illumina raw reads using the fastq-mcf tool of ea-utils v1.04.676 (Aronesty, 2011). This step was not needed for the Lake Diamante metagenome, as it was sequenced with 454 technology. Quality filtering and trimming were performed with the same program using parameters l>50 and q>20. The program kneaddata v0.6.1 (https://bitbucket.org/biobakery/kneaddata/wiki/Home) was used with the --bypass-trim option to remove contaminant human sequences. Paired-end reads were merged with FLASH v1.2.11 (Magoč and Salzberg, 2011) in order to recover longer unpaired reads. Paired-end reads with a low percentage of merging were left in paired state for assembly. Assembly of filtered reads was performed with MEGAHIT v1.1.2 (Liu et al., 2015). Assembled contigs were annotated with Prodigal v2.6.2 (Hyatt et al., 2010), which outputted translated protein sequences.
To perform quantitative metagenomics using the reads directly, the paired-end reads that remained with a low percentage of merging were concatenated (forward with reverse) using a script available on GitHub (https://github.com/LangilleLab/microbiome_helper/blob/master/concat_paired_end.pl).
Reference database building
A reference protein database was built from the main functional orthologues of the main components of the UV-resistome subsystems. The KEGG Orthology numbers corresponding to our pathways of interest were linked with the UniProt database, and the results were then filtered successively by EC number, known molecular function or biological process, UniRef90 clusters, and sequence length >100. The single, concatenated and merged reads were aligned to the aforementioned database with PALADIN, which outputted read counts. The counts were normalized to Reads per Kilobase per Genome (RPKG) using the following formula: (Read counts ÷ Number of genomes) ÷ Gene length.
The protein families and pathways chosen fall in two UV-resistome subsystems: DNA repair and photoprotection. At the DNA repair level these were the photolyase/cryptochrome family (K06876, K01669), base excision repair (K01247, K03649, K10563, K10773, K10800, K01246, K03648, K01151, K01142, K02335, K01972), nucleotide excision repair (K03701, K03702, K03703, K02335, K01972), mismatch repair (K03555, K03572, K03573, K01141, K10857, K03601, K07462, K02337, K01972), and homologous recombination repair (K03629, K03584, K03553, K02337, K02335, K03550, K03551, K01159), while at the photoprotection level we screened for carotenoid biosynthesis (K15745, K14595, K09847, K09844, K09845, K09846, K10027, K10208, K02292, K10209, K10210, K10211, K02294, K09836, K02294, K22502, K14746, K02291, K09835, K14605, K09839) and ectoine biosynthesis (K06720). Finally, the fasta headers of each file were labeled with the name of the respective gene and the files were then merged into a single one.
Estimation of gene abundance, pathway abundance and diversity
To search for the genes and quantify their abundance in the different metagenomes, the preprocessed unpaired reads were aligned against the UV-resistome reference database created before. The protein alignment was performed with PALADIN v1.3.1 (Westbrook et al., 2017), which outputs a table with alignment counts for each gene. Alignments were filtered by maximum quality = 60 and the abundances were normalized to reads (or counts) per kilobase per genome (RPKG), after first estimating the number of genomes in each metagenome with MicrobeCensus v1.1.0 (Nayfach and Pollard, 2015).
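The RPKG normalization can be written as a short function. The following Python sketch is only an illustration of the formula, not the script used in this work. The function name rpkg and the assumption that gene length is supplied in base pairs and converted to kilobases are ours, and the numbers in the usage note are invented.

```python
def rpkg(read_count, gene_length_bp, n_genomes):
    """Reads per kilobase per genome.

    read_count     -- reads aligned to the gene
    gene_length_bp -- gene length in base pairs (assumed unit)
    n_genomes      -- estimated number of genomes in the metagenome
    """
    if gene_length_bp <= 0 or n_genomes <= 0:
        raise ValueError("gene length and genome number must be positive")
    gene_length_kb = gene_length_bp / 1000
    # (Read counts / Number of genomes) / Gene length
    return (read_count / n_genomes) / gene_length_kb
```

For example, 300 reads aligned to a 1500 bp gene in a metagenome estimated to contain 100 genomes would give rpkg(300, 1500, 100) = 2.0.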
The abundance of the complete pathways was quantified as follows. First, the pathways were divided into component reactions or steps according to KEGG pathway maps, and the abundances of the genes that catalyze the same step were added in order to compute the abundance of each step separately. Then, the step with the minimum value was selected as the representative value for the whole pathway (Tables S1-7). In the case of the carotenoid biosynthesis pathway, all the abundances were added globally, as each gene can produce a different carotenoid variant.
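The pathway-level calculation described above (sum per step, then minimum across steps) can be sketched in Python as follows. This is a minimal illustration rather than the code used in this work. The function names are hypothetical, and the grouping of KO numbers into steps and the abundance values in the usage note are invented for the example.

```python
def pathway_abundance(steps, gene_rpkg):
    """Return (pathway value, per-step totals).

    steps     -- dict mapping a step name to the list of KO numbers
                 whose genes catalyze that step
    gene_rpkg -- dict mapping a KO number to its RPKG abundance
    """
    if not steps:
        raise ValueError("a pathway needs at least one step")
    # Genes catalyzing the same step are summed; missing genes count as 0.
    step_totals = {
        step: sum(gene_rpkg.get(ko, 0.0) for ko in kos)
        for step, kos in steps.items()
    }
    # The least abundant step represents the whole pathway.
    return min(step_totals.values()), step_totals


def summed_abundance(kos, gene_rpkg):
    """Global sum of abundances, as applied to carotenoid biosynthesis."""
    return sum(gene_rpkg.get(ko, 0.0) for ko in kos)
```

With a hypothetical grouping such as {"recognition": ["K03701", "K03702"], "incision": ["K03703"], "ligation": ["K01972"]} and invented abundances of 2.0, 1.5, 1.0 and 0.5 for those genes, the step totals would be 3.5, 1.0 and 0.5, and the pathway value would be 0.5.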
Diversity analysis of each metagenome was performed with MetaPhlAn v2.7.7 (Truong et al., 2015) with the --bt2_ps parameter set to ‘sensitive’. MetaPhlAn profiles the composition of microbial communities from metagenomic shotgun sequencing data with species-level resolution. It relies on ∼1M unique clade-specific marker genes identified from ∼17,000 reference genomes (∼13,500 bacterial and archaeal, ∼3,500 viral, and ∼110 eukaryotic). The Shannon diversity index was obtained with the vegan 2.5-2 R package.
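For reference, the Shannon index H = -Σ p_i ln(p_i) can be computed from a taxonomic abundance profile as in the minimal Python sketch below. It is an illustration only, since the values in this work were obtained with the vegan R package. The natural logarithm is assumed here, which is the default of vegan's diversity function, and the function name shannon_index is ours.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    abundances -- relative or absolute abundances of the taxa in one sample
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for a in abundances:
        if a > 0:  # taxa with zero abundance contribute nothing
            p = a / total
            h -= p * math.log(p)
    return h
```

A community with four equally abundant taxa gives ln(4), whereas a community dominated by a single taxon gives 0, which is the direction of change expected when diversity is depleted.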
Analysis of the annotated genes
The protein sequences were aligned using Diamond v0.9.22 software (Buchfink et al., 2014) against the CPF genes of the reference database built earlier. Alignment parameters were 50% identity, 70% query coverage and e-value < 10^-5. The sequence headers of the files retrieved from each metagenome were modified to carry the name of the respective metagenome. Next, all files were merged into a single one and filtered by sequence length > 400 residues.
The filtered sequences were used for a phylogenetic analysis. The phylogenetic tree was built with FastME v2.0 (Lefort et al., 2015) using the Jones–Taylor–Thornton rate matrix (Jones et al., 1992) with 1000 bootstrap replicates. A consensus of the 1000 resulting trees was selected for further processing and visualization using iTOL (Letunic and Bork, 2016). Additionally, a taxonomic identification of the sequences was performed through the BLAST web server (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
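The selection criteria above can be expressed as a simple filter over alignment records. The following Python sketch is hypothetical and does not reproduce the commands used in this work. The Hit record, its field names, and the treatment of the identity and coverage thresholds as inclusive are assumptions made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One protein alignment (hypothetical record layout)."""
    query: str      # predicted protein identifier
    subject: str    # CPF reference sequence identifier
    pident: float   # percent identity
    qcov: float     # percent query coverage
    evalue: float   # alignment e-value
    qlen: int       # query length in residues


def passes(hit, min_ident=50.0, min_qcov=70.0, max_evalue=1e-5, min_len=400):
    """Apply the identity, coverage, e-value and length criteria."""
    return (
        hit.pident >= min_ident      # assumed inclusive
        and hit.qcov >= min_qcov     # assumed inclusive
        and hit.evalue < max_evalue
        and hit.qlen > min_len
    )


def select_hits(hits):
    """Keep only the hits that satisfy every criterion."""
    return [h for h in hits if passes(h)]
```

Applying all criteria in one function is a choice made for brevity; in the procedure described above the length filter was applied after the files were merged.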
Supplementary table S1. Estimation of the Base Excision Repair pathway abundance.
Supplementary table S2. Estimation of the Cryptochrome/Photolyase Family abundance.
Supplementary table S3. Estimation of the Nucleotide Excision Repair pathway abundance.
Supplementary table S4. Estimation of the Mismatch Repair Pathway abundance.
Supplementary table S5. Estimation of the Homologous Repair pathway abundance.
Supplementary table S6. Estimation of Carotenoid Biosynthesis pathway abundance.
Supplementary table S7. Estimation of the Ectoine Biosynthesis pathway abundance.
ACKNOWLEDGEMENTS
VHA and MEF are staff researchers from the National Research Council (CONICET) in Argentina. DA is the recipient of a doctoral fellowship from CONICET. The authors have produced this manuscript in spite of the delays in funding execution from National Agencies in Argentina, mainly FONCyT (PICT 2013 2991) and CONICET (PIP 2015 0519, PIO-UNCA and PICT V 3825-2016), and the drastic devaluation of the Argentinean currency which began in 2016.