DNA barcoding analysis of more than 1000 marine yeast isolates reveals previously unrecorded species

Marine habitats are among the least explored niches for yeast population and diversity. The aim of the present study is to create a comprehensive DNA barcode library for marine derived yeast species. We sequenced the ITS gene for 1017 isolates belonging to 157 marine derived yeast species in 55 genera, 28 families, 14 orders and 8 classes of 2 phyla (viz., Ascomycota and Basidiomycota), of which 13 yeast species were barcoded for the first time. The isolates included yeast species of both terrestrial and marine endemic origin. Due to the large volume of sequencing trace files, the variable length of extracted sequences, and the lack of reference sequences in public databases, taxonomic validation of the sequences was difficult. The majority (62.24%) of the sequences were between 600 and 649 base pairs long. K2P intra-species distance analysis performed for selected groups yielded an average of 0.33%, well below the previously proposed yeast barcode gap. ITS gene tree based identification conducted for selected species in Ascomycota and Basidiomycota precisely clustered the same species into one group. Approximately 60% of the yeast species identified in this study were previously unrecorded from the marine environment, of which 16.5% were recognised as human pathogens. Apart from releasing the barcode data in GenBank, provisions were made to access the entire dataset along with meta-data in the Barcode of Life database. This research constitutes the largest dataset to date of marine yeast isolates and their barcodes.


Introduction
Fungi have been well studied in terrestrial and freshwater environments, as opposed to marine habitats (Gulis et al., 2009;Raja et al., 2018). Yeasts are fungi that do not form sexual states on or inside a fruiting body and whose growth is mainly the result of fission or budding (Kurtzman and Fell, 2015). Yeasts form a polyphyletic group (Kutty and Philip, 2008) whose unicellular growth differentiates them from filamentous fungi. Yeast that requires seawater for its growth is described as marine yeast (Chi, 2012). Marine yeasts, which are facultative or obligate in marine environments (Jones et al., 2013), are involved in numerous significant marine ecosystem processes such as nutrient cycling, decomposition of plant material and parasitism of marine animals (Jones and Pang, 2012). Marine yeast could be parasitic, mutualistic or saprophytic and may therefore be associated with various marine invertebrates including crabs, clams, mussels, prawns and oysters, or with other substrates (de Araujo et al., 1995;Kosawa da Costa et al., 1991;Pagnocca et al., 1989). Typically, the yeast species most commonly isolated from a natural ecosystem are also those most commonly isolated from the organisms that live in it (de Araujo et al., 1995).
Marine yeasts are diverse and largely classified within two major phyla, viz., Ascomycota and Basidiomycota (Boekhout et al. 2011). Most of the marine derived yeast species belonging to Ascomycetes are of terrestrial origin with widespread phylogenetic diversity (Fell, 2012). Various difficulties have existed in marine yeast nomenclature and taxonomy since its first isolation (Fischer and Brebeck, 1894), which were recently overcome using molecular taxonomy (Fell, 2012). DNA barcoding has simplified the recognition of biological species using short DNA fragment sequencing and analysis (Hebert et al., 2003). One of the visions behind DNA barcoding was the simple identification of biological species by non-experts for the advancement of biological and medical research.
Although no definition of species is universally applicable (Wheeler and Meier, 2000), in particular for species that do not reproduce obligately by sexual means, such as fungi, the identification of species is a key step in various biological fields such as ecology, agriculture, biotechnology and medicine, where it is used to describe biological interactions, for example in biodiversity assessment, bioremediation and pathology (de Queiroz, 2007).
The internal transcribed spacer (ITS) gene has been recognised as a DNA barcode that successfully delineates fungal species (Schoch et al., 2012;Velmurugan et al., 2013;Vu et al., 2019), and ITS works even better for classification of yeast species than of filamentous fungi (Vu et al., 2016). Lack of validated data for yeast research is considered to be a drawback (Bidartondo, 2008). The CBS-KNAW Microbial Biological Resource Centre (www.cbs.knaw.nl) comprises the largest number of validated yeast species collected from all types of natural and human (medical strains) environments, which were represented in the recently published comprehensive dataset of yeast DNA barcodes (Vu et al., 2016). In this study we investigated marine environments such as mangrove swamps and continental shelf sediments in the northern parts of the Indian Ocean for marine yeast diversity using DNA barcodes.
We aim to synthesise and publish a sizable number of marine derived yeast DNA barcodes with validated data in public databases such as GenBank and BOLD. Besides evaluating such databases for identification of yeast species, we expect that even after the production of a broad yeast barcode dataset (8669 barcodes for 1351 yeast species) (Vu et al., 2016), marine habitats will still harbour several yeast species that have yet to be barcoded. The goal of the study was to collect as many marine derived yeast cultures as possible for a comprehensive synthesis of the DNA barcode library. Studies exploring large scale marine environments for culturable yeast species are rare.

2.1.Study area and sample collection
Between Nov 2008 and Jan 2013, sediment samples were extensively collected from two separate ecosystems, viz., 1) inter-tidal sediment flats under mangrove trees along India's coastline, and 2) continental shelf sediments off India's southeast coast (Table S1). The depths of sediment sampling were measured using the multi-beam echo sounder built into FORV-Sagar Sampada (capable of measuring up to 1000 m depth). The cruise collected a total of 96 sediment samples. Salinity of continental shelf sediments ranged from 33 to 35 ppt. All culture media were prepared using the ambient seawater (AS) obtained from the respective collection sites. AS was filtered through a 0.22 µm cellulose filter membrane (Millipore) and autoclaved before being used to prepare culture media.

2.2.Isolation of yeast-like cells
Two types of media were used for enrichment. One or both of the media (both for most samples collected after 2010) were used for enrichment of yeast cells before plate culture. Briefly, after homogenization of the sediment samples in the collection container, yeast cells were enriched by adding one gram (g) of sediment to 100 ml of Yeast/Malt extract (YM) broth (3 g malt extract, 3 g yeast extract, 10 g dextrose, and 5 g peptone, in 1 L sterile-AS) and/or GPY broth (2% glucose, 1% peptone, 0.5% yeast extract in 100 ml sterile-AS) supplemented with an antibiotic cocktail (300 mg/L penicillin, 300 mg/L streptomycin, 250 mg/L sodium propionate, and 0.02% chloramphenicol) to inhibit bacterial growth.
We used both enrichment media for sediments sampled after 2010 to increase the number of yeast species isolated. The enrichment broth with sediment samples, held in 250 mL Erlenmeyer flasks, was shaken on a rotary shaker at 150 rpm and incubated for 2-3 days at 17-20 ºC (temperatures >20 ºC were found to accelerate filamentous fungal growth in plate cultures).
Autoclaved AS was used as a control. After incubation, 100 µl to 1000 µl of the corresponding broth culture (depending on the turbidity of the broth) was spread in triplicate over YM and/or GPY agar plates (same composition as the broths, with the addition of 1.5% agar). Culture media were autoclaved twice (at 100 ºC for 30 min) on two consecutive days to reduce mould contamination (Gadanho and Sampaio, 2005). The remaining broth was stored at 4 ºC under an overlay of mineral oil for future use, in case the incubated plates produced no colonies or were overgrown by filamentous fungi. The inoculated plates were incubated for 10-20 days or until colonies appeared, and were monitored every 24 hours. To promote the full recovery of yeast-like species, including slow growing colonies, a prolonged incubation period with concurrent removal of fast growing colonies was adopted. Care was also taken to stop the incubation when there was a high probability of mould overgrowth.
Microscopic analysis of yeast-like colonies started after a minimum of 5 days of incubation, using methylene blue staining (to check for a single cell type and to ensure that no bacterial cells were associated). Colonies were purified by streaking onto fresh agar (YM or GPY) plates (to prevent overgrowth by other colonies or by rapidly growing moulds). When there were <50 yeast colonies on a plate, or when mould was entirely absent or scarce, one representative of each colony morphotype per sample was streaked twice on the corresponding agar plates for purification. In other cases (i.e., when there were >50 yeast colonies or dense mould growth on the agar plates), a representative number of colonies (40-70 colonies) was randomly selected and purified twice.
Enrichment cultures for continental shelf sediments were done as described above in the shipboard microbiology laboratory of FORV-Sagar Sampada. After two rounds of sub-culturing of yeast-like cells, the isolates were grown for 2-3 days on a shaker (at 120 rpm) in 1 mL of GPY broth (in duplicate) prepared in 2 mL microfuge tubes. The culture in one tube was used for DNA isolation and the other was cryopreserved (culture volume increased to 1.5 mL using 2% GPY broth with 10% glycerine) in the Marine Microbial Culture Facility, Centre of Advanced Study in Marine Biology, Annamalai University. A total of 1398 colonies were selected for molecular analysis (916 colonies from mangrove sediments, and 482 colonies from continental shelf sediments). Each purified culture was numbered under the acronym DBMY (DNA Barcoding Marine Yeast). The key features described in Yarrow (1998) and Kurtzman et al. (2010) were used for macro- and micro-morphological identification.

2.3.DNA extraction, polymerase chain reaction and DNA sequencing
To recover the yeast-like cells grown in the 2 mL microfuge tubes, the tubes were centrifuged at 8000 × g for 5 minutes. Following the manufacturer's instructions, DNA was extracted from the pellets using the GeneiPure Yeast DNA preparation kit (GeNei) or the DNeasy blood and tissue kit (Qiagen). The DNA was eluted in the elution buffer (provided with the kit) and stored at -20 ºC. The ITS primers ITS1 (5'-TCCGTAGGTGAACCTGCGG) and ITS4 (5'-TCCTCCGCTTATTGATATGC) (White et al., 1990) were used for amplification.
The primers target a DNA fragment containing the partial 18S ribosomal RNA gene, the complete sequences of internal transcribed spacer 1, the 5.8S ribosomal RNA gene and internal transcribed spacer 2, and the partial 28S ribosomal RNA gene. PCR was performed on a thermal cycler 130045GB (GeNei) under the following conditions: 4 min at 94 ºC, followed by 30 cycles of 30 s at 94 ºC, 40 s at 48 ºC for annealing and 90 s at 72 ºC, with a final extension at 72 ºC for 7 minutes. PCR amplicons were separated by 1.5% agarose gel electrophoresis. Amplicons were sequenced bidirectionally using the commercial sequencing services of Macrogen (South Korea) or Bioserve Biotechnologies Pvt. Ltd. (India).

2.4.DNA sequence analysis
Following sequencing, the forward and reverse sequences were assembled using BioEdit ver. 5.0 (Hall, 1999). Sequences were aligned using CLUSTAL X (Larkin et al., 2007) and manually adjusted in MEGA X (Kumar et al., 2018). DNA sequences were compared with GenBank sequences using BLAST algorithms (Altschul et al., 1997), and a cut-off species threshold of 98.41% (Vu et al., 2016) was used for delimitation of yeast species. The species that were barcoded for the first time (i.e., when similarity fell below the species threshold, <98%) were confirmed by double checking the BLAST similarity values and by searching for ITS gene sequences of the species in GenBank.
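The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the workflow actually used in the study; the function name, the labels and the isolate identifiers are hypothetical, and only the 98.41% cut-off is taken from the text.

```python
# Species threshold (percent identity) cited in the text from Vu et al. (2016).
SPECIES_THRESHOLD = 98.41


def classify_hit(percent_identity, threshold=SPECIES_THRESHOLD):
    """Label a top BLAST hit by comparing its percent identity to the threshold.

    The labels are illustrative; a hit below the threshold would still need
    the manual GenBank check described in the text.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= threshold:
        return "conspecific with reference"
    return "below species threshold"


# Hypothetical isolates and identity values, for demonstration only.
example_hits = {"DBMY-A": 99.80, "DBMY-B": 97.20}
for isolate, identity in example_hits.items():
    print(isolate, identity, classify_hit(identity))
```

A hit labelled "below species threshold" is only a candidate; the text confirms first-time barcodes by a separate search for ITS sequences of the species in GenBank.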
Using the reference sequences extracted from GenBank, the Neighbor-Joining method (Saitou and Nei, 1987) was used for tree based yeast species identification. In the bootstrap test, the percentage of replicate trees in which the associated taxa clustered together (100 replicates) (Felsenstein, 1985) is indicated as circles next to the branches. The tree is drawn to scale, with branch lengths in the same units as those used to infer the phylogenetic tree from evolutionary distances. The evolutionary distances have been computed using the Kimura 2-parameter method (Kimura, 1980) and are in the units of the number of base substitutions per site. All positions containing gaps and missing data were eliminated (complete deletion option). Evolutionary analyses were conducted in MEGA X (Kumar et al., 2018). The NJ trees were manipulated in interactive Tree of Life (iTOL) database (Letunic and Bork, 2019) for better representation.
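For readers unfamiliar with the Kimura 2-parameter (K2P) model, the distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The following is a minimal sketch of that formula, not the MEGA X implementation; the function name is hypothetical. Sites with gaps or ambiguous bases are skipped, which for a single pair is equivalent to the complete deletion option (in a multi-sequence alignment, complete deletion removes a column if any sequence has a gap there).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps, missing data and ambiguity codes
        compared += 1
        if a == b:
            continue
        both_purines = a in PURINES and b in PURINES
        both_pyrimidines = a in PYRIMIDINES and b in PYRIMIDINES
        if both_purines or both_pyrimidines:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


# Toy example: one transition (A<->G) in ten compared sites.
print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC") * 100, 2), "%")
```

Multiplying the result by 100 gives the percentage form used when comparing distances with a species cut-off.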
DNA sequences generated in the present study were released to GenBank and can be accessed through accession numbers KJ706221-KJ707237. The entire dataset produced in this study can also be accessed through the Barcode of Life database under the project title "DNA barcoding marine yeast" with the tag "DBMY", or through the digital object identifier http://dx.doi.org/10.5883/DS-MYIC.
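As a quick consistency check, the accession range above can be expanded programmatically; it spans exactly as many accessions as there are barcodes reported in this study. The sketch below assumes accessions of the form letters followed by digits, and the helper name is hypothetical.

```python
import re


def expand_accession_range(first, last):
    """List every accession between first and last (inclusive).

    Assumes both accessions share the same letter prefix and differ only in
    the numeric part, e.g. KJ706221 and KJ707237.
    """
    pattern = re.compile(r"^([A-Z]+)(\d+)$")
    m_first, m_last = pattern.match(first), pattern.match(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share a prefix and end in digits")
    prefix, width = m_first.group(1), len(m_first.group(2))
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("last accession precedes first accession")
    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


accessions = expand_accession_range("KJ706221", "KJ707237")
print(len(accessions))  # 1017, matching the number of barcodes reported
```

Such a list could be passed to any batch retrieval tool; the retrieval step itself is not shown here.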

3.2.Character based identification and BLAST analysis
The length of the ITS sequences recovered varied from 542 bp to 891 bp (Fig. 1). The majority of the sequences (62.24%; n=633 sequences belonging to 81 species) were between 600 and 649 bp. Lengths in the shortest range of 552-599 bp were recovered for only 6 sequences.
All remaining recovered sequences were 600 bp or longer. The list of 1017 barcodes with their respective species match is given in Table S2, which contains details of the GenBank reference sequence (percentage of similarity, accession number and taxonomy).
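A length distribution of this kind can be tabulated by grouping sequence lengths into 50 bp classes such as 600-649 bp. The sketch below shows one way to do this; the function name and the example lengths are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def bin_lengths(lengths, bin_width=50):
    """Count sequence lengths per size class, e.g. (600, 649) for 50 bp bins."""
    counts = Counter()
    for length in lengths:
        lower = (length // bin_width) * bin_width
        counts[(lower, lower + bin_width - 1)] += 1
    return dict(sorted(counts.items()))


# Hypothetical lengths for demonstration only.
example_lengths = [542, 600, 649, 650, 891]
for (lower, upper), count in bin_lengths(example_lengths).items():
    share = 100.0 * count / len(example_lengths)
    print(f"{lower}-{upper} bp: {count} sequences ({share:.2f}%)")
```

With the study's figures, the share of the 600-649 bp class corresponds to 633 of 1017 sequences, i.e. 62.24%.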
BLAST analysis revealed that 13 yeast species were barcoded for the first time (Table 2); individual searches for the ITS gene of those species in GenBank did not yield any results, confirming that those species were barcoded for the first time. Although Candida is a speciose genus that has been widely reported and barcoded, the ITS gene sequences of Candida carvajalis (n=15), C. duobushaemulonii (n=6) and C. haemulonis (n=6) were barcoded for the first time, as they were absent from the reference database until now. Even though a previous extensive study barcoded 1351 yeast species, producing 8669 barcodes (Vu et al., 2016), the 13 species mentioned above were not included in that collection, as the previous study did not explore marine environments. Nucleotide diversity values were directly proportional to the K2P distances (Fig. 2).
NJ trees were used to evaluate the tree based identification. We used a maximum of 3 sequences generated in this study, belonging to 5 selected species, against the corresponding available GenBank reference sequences to create the NJ tree of Cryptococcus spp. All selected species in the genus Cryptococcus clustered precisely with their corresponding reference sequences in one clade (Fig. 3). This indicates the efficacy of ITS gene sequences in delineating yeast species.
The overall mean K2P pairwise distance within the genus Cryptococcus was 0.7%, which is well within the proposed yeast species cut-off (1.59%) (Vu et al., 2016).
Tree based identification of Colacogloea spp. revealed that each species clustered in one clade along with its corresponding GenBank reference sequence (Fig. 4). The overall Kimura 2-parameter distance was 1.4%, which is within the cut-off value proposed by Vu et al. (2016).

New occurrences of marine derived yeast species
Only 25.5% (n=40) of the marine yeast species isolated in this study had previously been isolated from the marine environment. For example, Sterigmatomyces halophilus is known to improve immunity in marine fish (Reyes-Becerril et al., 2017). Yarrowia lipolytica is known for its crude oil degradation capability (Hassanshahian et al., 2012) and for its dimorphic growth, especially when isolated from oil polluted seawater (Zinjarde et al., 1998). Another species isolated in this study, Candida oceani, first isolated from hydrothermal vents in the Atlantic (Burgaud et al., 2011), was noted for its ability to withstand high hydrostatic pressure (Burgaud et al., 2015). Bandonia marina, reclassified from Candida marina (Liu et al., 2015) and first isolated from the marine environment (Van Uden and Zobell, 1962), was also recognized for its hydrocarbonoclastic potential (Itah and Essien, 2005), and was previously isolated from tar balls obtained from the northwest coastal waters of India (Shinde et al., 2017). Other species (~58%; n=91) recorded in this study had been reported to occur in soil, plants or animals (including insects), and their presence in the marine environment was previously unrecorded. Approximately 16.6% of the yeast species reported were potential human pathogens (n=26 species).

Yeast species previously unrecorded from marine environment
Approximately 60% of the yeast species (n=94) reported in this study were previously unrecorded in the marine environment. Since it would be exhaustive to describe the previous source of occurrences of 94 yeast species, we present a few examples below.
Cystobasidiopsis lactophilus, reclassified from Sporobolomyces lactophilus (Wang et al., 2015), was previously isolated from the phyllosphere of coniferous trees (Nakase et al., 1990). Oberwinklerozyma yarrowii, reclassified from Rhodotorula silvestris (Wang et al., 2015) and previously known to occur in forest soils (Mašínová et al., 2018), is recorded here for the first time in mangrove sediments. River run-off could be an important transport medium for this terrestrial species to reach mangrove sediments. Slooffia cresolica, reclassified from Rhodotorula cresolica (Wang et al., 2015), was considered to be part of the soil microbiome (Middelhoven and Spaaij, 1997), and its abundance was also correlated with soils with high oil contamination (Csutak et al., 2005). The continental shelf sediments from which these strains were isolated in the present study were previously known for high hydrocarbon levels (Lyla et al., 2012). Similarly, Candida catenulata, previously isolated from hydrocarbon polluted sites (Habibi et al., 2017;Babaei et al., 2018), was also isolated in the present study.
Trigonosporomyces hylophilus, reclassified from Candida hylophila (Wang et al., 2015), was first isolated from tree associated beetles (van der Walt et al., 1971), and its isolation from mangrove sediments in this study suggests its potential occurrence in insects of the mangrove habitat. Udeniozyma ferulica and Vonarxula javanica, reclassified from Rhodotorula ferulica and Rhodotorula javanica, respectively (Wang et al., 2015), were known to occur in polluted river waters (Sampaio and Van Uden, 1991). River run-off could be the reason these species occur in mangrove and continental shelf sediments.
The fact that Debaryomyces mycophilus was first isolated from wood lice (Thanh et al., 2002) opens the possibility that this species may also occur in insects of the mangrove habitat, as the genetic material of insects can be obtained and studied from the sediment of their habitat (Thomsen et al., 2009). Debaryomyces pseudopolymorphus has been extensively involved in wine fermentation and associated processes (Potgieter, 2004;Villena et al., 2006;Arevalo-Villena et al., 2007). The isolation of D. pseudopolymorphus from the mangrove environment is new, and its function there is unknown. Scheffersomyces shehatae, known to occur in degrading wood (Kordowska-Wiater et al., 2017) and in wood digesting insects (Suh et al., 2013), is also commonly used for the production of bio-ethanol (Tanimura et al., 2015;Kordowska-Wiater et al., 2017). The association of S. shehatae with mangroves and their associated insects could be further explored. Colacogloea falcatus, reclassified from Sporobolomyces falcatus (Wang et al., 2015), was first isolated from dead plant leaves (Nakase et al., 1987). It has also been isolated from the plant phyllosphere (Nakase et al., 2003;Takashima and Nakase, 2000) and from acidic soils (Delavat et al., 2013). Its presence in mangrove and continental shelf sediments was unknown until now.
Pichia guilliermondii occurs widely, for example as a plant endophyte (Zhao et al., 2010), in citrus fruit flora (Arras et al., 1998), in association with beetles (Suh et al., 1998), and in sewage sludge (de Silóniz et al., 2002). Its isolation in this study may therefore be linked to multiple sources. Kazachstania aerobia, first isolated from plants (Magalhaes et al., 2011) and later recognised as a plant associated yeast (Lu et al., 2004), was unknown to occur in mangrove related habitats until now. Sporobolomyces koalae, first isolated from koalas (Satoh and Makimura, 2008) and found in other animals such as horses (Fomina et al., 2016), was unknown to occur in marine related habitats until now.

Conclusion
There could be ~3.8 million unknown fungal species (Hawksworth and Lücking 2017), and environmental selection pressure plays a crucial role in the evolution of new species (Handelsman, 2004;Hibbett, 2016). The present study is the first of its kind in exploring marine environments on a large scale for culturable species of yeast. As a result, 1017 barcodes of 157 marine yeast species were produced, of which 91 barcodes of 13 species were produced for the first time. This study recorded terrestrial yeast species introduced into the marine environment (e.g., Cystobasidiopsis lactophilus, Oberwinklerozyma yarrowii) and marine endemic species whose occurrence was restricted to specific marine ecosystems (e.g., Bandonia marina, Candida oceani). The DNA barcodes have been published via the GenBank and BOLD databases for public use, which will also improve yeast species barcode coverage and taxonomy in the public databases. These DNA barcodes can also help identify and estimate marine yeast diversity from environmental samples, as many metagenomic diversity studies suffer from the lack of a local species barcode library (Hawksworth, 2001;Handelsman, 2004;Hibbett, 2016).
Our next challenge will be to explore the biochemical and industrial potential of the isolated strains, venturing into new marine environments with continuous expansion of the barcode databases. This may be the largest DNA barcode dataset for culturable marine yeast species. The yeast barcode data produced may be used to explore the taxonomic distribution of specific physiological traits (e.g., thermotolerance) and of species of climatic and pathological significance (Robert et al. 2015). Correlation of the yeast barcode data with other traits, such as the ability to produce various metabolites and industrial products of biotechnological significance (for example, antibiotics), would be a valuable resource for yeast researchers willing to apply DNA barcoding technology beyond taxonomic and identification applications. The research was partially supported by Chiang Mai University.