Identifying seaweed species of Chlorophyta, Ochrophyta and Rhodophyta using DNA barcodes

Strengthening the DNA barcode database is important for species-level identification, and such reference data were lacking for seaweeds. We made an effort to collect and barcode seaweeds occurring along the Southeast coast of India. We barcoded 31 seaweed species belonging to 21 genera, 14 families and 12 orders of 3 phyla (viz., Chlorophyta, Ochrophyta and Rhodophyta). We found that 10 species in 3 phyla and 2 genera (Anthophycus and Chnoospora) of Ochrophyta were barcoded for the first time. Pairwise genetic distance calculated using the Kimura 2-parameter (K2P) model, nucleotide diversity and Tajima's test statistics showed the highest values among the species of Chlorophyta. The overall K2P distance was 0.36. The present study revealed the potential of rbcL gene sequences in identifying all 3 phyla of seaweeds. We also found that the present barcode reference libraries (GenBank and BOLD) were insufficient for seaweed identification, and that more effort is needed to strengthen the local seaweed barcode library to benefit rapidly developing fields such as environmental DNA barcoding. We also show that the constructed barcode library could aid industrial experts involved in seaweed bio-resource exploration, as well as taxonomic and non-taxonomic researchers working on climate, agriculture and epigenetics, in precise seaweed identification. Since the rise of modern high-throughput sequencing technologies is significantly altering bio-monitoring applications and surveys, reference datasets such as ours will become essential in ecosystem health assessment and monitoring.


Introduction
Seaweeds are marine macroalgae that inhabit the littoral zone and are significant in terms of marine ecology and economics (Dhargalkar and Pereira, 2005). Seaweeds are taxonomically distributed in 3 major phyla: 1) Ochrophyta, commonly called brown algae because of the xanthophyll pigment 'fucoxanthin'; 2) Chlorophyta, commonly called green algae because of the dominant chlorophyll pigments 'a' and 'b' and minor xanthophyll pigments; and 3) Rhodophyta, commonly called red algae because of the pigments phycocyanin and phycoerythrin (O'Sullivan et al., 2010). Globally, more than 4000, 1500 and 900 species of Rhodophyta, Ochrophyta and Chlorophyta, respectively, have been documented (Dawes, 1998). Chlorophyta and Rhodophyta are dominant in tropical and subtropical waters, whereas Ochrophyta dominate cold temperate waters (Khan and Satam, 2003). Seaweeds play a vital role as a source of pharmacologically active compounds, as 30% of marine-derived pharmaceutical formulations are from seaweeds (Blunt et al., 2007). From a marine biodiversity point of view, the dynamics of seaweed diversity are of major concern, as seaweeds constitute 40% of all marine invasive species documented so far.
Identification of seaweeds based on morphological characters is difficult, as most genera are known for their diverse morphotypes. For example, Durvillaea antarctica forms distinct morphotypes in response to local hydrodynamic conditions (Méndez et al., 2019).
Convergent evolution has also occurred in numerous distantly related macroalgae, resulting in similar forms that complicate morphology-based identification. For example, the stiff stipes that define kelp forests are the result of 5 separate evolutionary events (Starko et al., 2019).
DNA barcoding involves sequencing a gene fragment from precisely identified specimens to form a database, and facilitates species identification (even by non-experts) simply by comparing the same gene fragment sequenced from unidentified specimens (Hebert et al., 2003; Mitchell, 2008). DNA barcoding has proved to be an efficient technique for monitoring marine macro-algae (Kucera et al., 2008; Lee and Kim, 2015; Montes et al., 2017). The CBOL (Consortium for the Barcode of Life) has proposed rbcL (RuBisCO large subunit) and matK (maturase K) as DNA barcodes for plants (Hollingsworth et al., 2009). The absence of matK in green algae (except Charophytes (Sanders et al., 2003)) makes it inappropriate for barcoding Chlorophyta (Caulerpa sp.). Hence there is an urgent need to evaluate a universal barcode gene for all 3 phyla (Ochrophyta, Rhodophyta and Chlorophyta) of seaweeds. rbcL has proven to be a potential barcode gene for the identification of natural, aquacultured and processed seaweeds (Saunders and Moore, 2013; Gu et al., 2020). The rbcL gene has also proven its ability in unmasking overlooked seaweed diversity (Saunders and McDevit, 2013) and in identifying cryptic invasive (Saunders, 2009) and exotic (Mineur et al., 2012; Montes et al., 2017) species, besides discovering new seaweed species (Griffith et al., 2017).
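The comparison step described above can be illustrated with a minimal sketch. It is not the identification engine of GenBank or BOLD; the function names, the toy sequences, the placeholder species labels and the 2% distance threshold are all assumptions made for illustration only.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def identify(query, reference, max_distance=0.02):
    """Return (species, distance) for the closest reference barcode.

    If even the closest reference is farther than max_distance (a
    hypothetical cut-off), the species is reported as None, which
    mimics a query with no adequate match in the reference library.
    """
    species, dist = min(
        ((name, p_distance(query, seq)) for name, seq in reference.items()),
        key=lambda item: item[1],
    )
    return (species, dist) if dist <= max_distance else (None, dist)


# Toy reference library: made-up aligned fragments, not real rbcL data.
toy_library = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "ACGTTCGAACGTTCGAACGT",
}

if __name__ == "__main__":
    print(identify("ACGTACGTACGTACGTACGT", toy_library))
    print(identify("TTTTTTTTTTTTTTTTTTTT", toy_library))
```

A query that falls outside the threshold corresponds to the situation in which the reference library lacks a barcode for the species in hand.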
Documenting seaweed DNA barcodes for precise identification is important, as seaweeds respond to variable water currents with flexible morphology and strength (Sirison and Burnett, 2019).
A comprehensive reference genetic library of seaweed DNA barcodes, coupled with taxonomic and geo-referenced data, is required for the precise use of DNA barcoding technology. Since the significance of biotic indices in climate change studies is increasing (Brodie et al., 2017) and the number of new seaweed taxa being discovered is on the rise, creating a barcode library for extant seaweed species is the need of the hour, as it would facilitate the identification of existing species and the discovery of new ones. Such libraries could also facilitate precise identification of seaweeds for commercial, ecological and legislative purposes.
Strengthening the local barcode reference database is of paramount importance for accurate species identification using reference databases (Hleap et al., 2020), and such a local database is currently lacking for seaweeds. Though peninsular India contains more than 860 seaweed species (Kannan and Thangaradjou, 2007; Jha et al., 2009; Ganesan et al., 2019; Mantri et al., 2019), DNA barcoding efforts have been limited to sequencing one (Bast et al., 2014a, b; Bast et al., 2015; Bast et al., 2016a, b) or two species (Mahendran and Saravanan, 2014) at a time. For the first time, we sampled and barcoded a wide range of seaweed species belonging to all three phyla, viz., Ochrophyta, Rhodophyta and Chlorophyta. The main objectives of the study were to test the efficacy of DNA barcode reference libraries in precisely identifying Indian seaweeds and to build a barcode reference library for Indian seaweeds.

Sample collection and identification
(Table 1). The seaweeds, along with their holdfasts, were collected during low tide in the intertidal and sub-tidal regions, where the vegetation was discontinuous and occurred in patches. The samples were washed thoroughly with seawater to remove sand and other debris (marine soil debris, attached shells, mollusks, adhering debris and associated biota). A second washing was done with deionised water to get rid of excess salts (Sivasankari et al., 2006). After washing, the samples were transported to the lab in sterile zip-lock bags under cold conditions and kept at -20 °C until further use. The samples were thawed in artificial seawater in the lab. The morphology of all the seaweed samples was carefully examined. The morphological and anatomical characteristics of the collected seaweeds were observed, and the seaweeds were identified using microscopic and macroscopic comparative analyses of characters such as frond size, leaf shape, leaf border, vesicles (air bladders) and receptacles.
The typical characteristics taken into account included internal structure, color, size and shape, which were compared with the existing photographs and data previously published for this region (Dinabandhu, 2010). The seaweeds were also identified based on the taxonomic keys described by Srinivasan (1969, 1973), Rao (1987), Chennubhotla et al. (1987), Ganesapandian and Kumaraguru (2008) and Jha et al. (2009), and with the catalogue of benthic marine algae of the Indian Ocean (Silva et al., 1996). A total of 31 species were identified, and the voucher specimens were named from DBIS1 to DBIS31 (DBIS: DNA Barcoding of Indian Seaweeds). The same name tags were used for molecular analysis.
The metadata, including pictures and systematic positions of the identified species, can be accessed in the Barcode of Life Database (BOLD; www.boldsystems.org) under the project title "DNA barcoding of Indian Seaweeds" or using the unique tag "DBISW".

Sequence analysis
For a few samples, PCR amplification and DNA sequencing reactions were repeated either to improve the length of the DNA sequences recovered or the quality of the final chromatogram.
Only good quality sequences (with precise base calling and 90% of the total length) were included in the study. All sequence chromatograms were manually double checked for quality using Chromas Pro version 2.6.6. Forward and reverse chromatograms were aligned using BioEdit (Hall, 1999). The sequences were aligned in Clustal X ver. 2.0.6 (Thompson et al., 1997), and Molecular Evolutionary Genetics Analysis (MEGA) ver. X was used for phylogenetic and pair-wise distance analysis (Kumar et al., 2018). The pair-wise distances between sequences were calculated using the Kimura 2-parameter (K2P) distance model (Kimura, 1980). The neighbor-joining (NJ) tree was redrawn using Interactive Tree Of Life (iTOL) (Letunic and Bork, 2019) for better representation of tree-based identification. All three codon positions and non-coding positions were included, and all alignment positions containing gaps and missing data were eliminated from the analysis. MEGA X was also used to calculate nucleotide diversity and to conduct Tajima's neutrality test (Tajima, 1989; Nei and Kumar, 2000). The rbcL sequences produced in the present study are publicly available in GenBank and can be accessed through accession numbers MT478065-MT478095.
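For readers unfamiliar with the distance measures named above, the following is a minimal sketch of the K2P distance, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences, together with nucleotide diversity taken as the mean pairwise proportion of differing sites. It is not MEGA X code; the function names and the toy sequences are assumptions for illustration, and the complete-deletion step only approximates the gap handling described in the text.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def complete_deletion(seqs):
    """Drop every alignment column with a gap or ambiguous base in any sequence."""
    seqs = [s.upper() for s in seqs]
    keep = [i for i in range(len(seqs[0])) if all(s[i] in VALID for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two gap-free aligned sequences.

    math.log raises ValueError when the sequences are so divergent that
    the logarithm's argument is not positive (saturation).
    """
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    p = transitions / len(a)
    q = transversions / len(a)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def p_distance(a, b):
    """Uncorrected proportion of differing sites."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment (made-up fragments, not real rbcL data).
    toy = complete_deletion(["ACGTACGTAC", "GCGTACGTAA", "ACGTAC-TAC"])
    for a, b in combinations(toy, 2):
        print(a, b, round(k2p_distance(a, b), 4))
    print("pi =", round(nucleotide_diversity(toy), 4))
```

Because the K2P model corrects for multiple substitutions at the same site, its value is never smaller than the uncorrected p-distance for the same pair of sequences.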

DNA barcoding Ochrophyta and its utility
Among the 9 species of Ochrophyta barcoded in the present study, Chnoospora implexa is well known for its anti-microbial properties (Seker et al., 2019; Rani et al., 2020).
Dictyota dichotoma, in turn, is known for hosting a rich mycobiota (Pasqualetti et al., 2020) and for its anti-cancer potential (El-Shaibany et al., 2020). In recent years, the genetic characterization of D. dichotoma under acidified ocean conditions has received special attention (Porzio et al., 2020). Padina tetrastromatica and P. tetrastromatica barcoded from the same sampling location are known for high mineral content (Vasuki et al., 2020) and antiviral (HIV) activity (Subramaniam et al., 2020), respectively. Hence generating barcodes for these species would help non-taxonomist experts from the mineral and pharmaceutical industries with species identification. Turbinaria spp. are known to contain diverse pharmacologically active formulations, and extracts of T. ornata, barcoded in the present study, are known to synthesise silver nano-particles (Renuka et al., 2020) and nano-materials (Govindaraju et al., 2020). DNA barcoding of T. ornata is significant, as the species exhibits dynamic morphological characteristics depending on the strength of ambient water currents (Sirison and Burnett, 2019).
Sargassum spp., known for delivering a wide range of natural and pharmaceutical formulations, significantly influence coastal ecosystems (Nguyen and Boo, 2020), and species barcoded in the present study such as S. polycystum form thick mat-like growths that profoundly influence coastal waters (for example, along the Vietnam coast (Nguyen and Boo, 2020)). S. tenerrimum, barcoded in the present study, was previously known for its application as a catalyst in biochar production (Kumar et al., 2020) and in lead absorption (Tukarambai and Venakateswarlu, 2020). DNA barcoding of S. swartzii, which is expected to play an important role in studies of future ocean temperature rise (Graba-Landry et al., 2020), will facilitate easy identification by climate researchers. DNA barcoding gains further importance for seaweed species such as Sargassum linearifolium, which forms its own ecosystem by hosting diverse invertebrate communities (Lanham et al., 2015) and exhibits high intraspecific variability and adaptive morphological changes (Stelling Wood et al., 2020). Sargassum polycystum barcoded from the present sampling area is also known for its wide range of phytochemical constituents (Murugaiyan, 2020). Hence the 9 Ochrophyta barcodes produced in the present study will be useful to taxonomic and non-taxonomic experts, to the pharmaceutical, mineral, naturopathic and nano-technology industries, and to researchers in climate science, agriculture and epigenetics.

DNA barcoding Rhodophyta and its utility
Acanthophora spicifera, barcoded in the present study, is known for its branched, long cylindrical thallus (Nassar, 2012) and is a preferred habitat for brachyuran (Granado et al., 2020) and horseshoe crab species (Butler et al., 2020) whose occurrence is seasonal (for example, on the South-eastern Brazil coast (Lula Leite et al., 2020)). A. spicifera sampled from the present study area is also known to accumulate cadmium in its tissues, which is biomagnified in its animal inhabitants (Ganesan et al., 2020). Hence generating DNA barcodes for such species will aid effective environmental monitoring. Gracilaria corticata barcoded from the present sampling area is known for its high sulphate and mineral content, which is currently utilized by the food industry. G. corticata extracts play a vital role in zinc removal (Heidari et al., 2020) and in zinc oxide (Nasab et al., 2020) and silver nano-particle synthesis (Rajivgandhi et al., 2020). Extracts of Gracilaria foliifera, barcoded in the present study, are known for mosquito larvicidal activity (Bibi et al., 2020).

Gracilaria salicornia, barcoded in the present study, is an invasive species of Hawaiian coastal waters (Hamel and Smith, 2020) and a less preferred substratum for microbial grazing (Tan et al., 2020). It is also known that when G. salicornia and Acanthophora spicifera co-occur, epiphytic micro-algal assemblages prefer A. spicifera over G. salicornia (Beringuela et al., 2020). Hence the DNA barcodes of G. salicornia generated in the present study will be useful for exploring invasive potential and for macroalgal identification in studies of microbial-macroalgal interactions. Gracilariopsis longissima is an actively cultured seafood (Bermejo et al., 2020) and a source of bioactive compounds (Susanto et al., 2019). Hydropuntia edulis is known for its high content of UV-absorbing compounds (Tanaka et al., 2020). Hence the generated barcodes will be useful for taxonomic non-experts in the food, pharmaceutical and cosmetic industries. However, it is worth mentioning that the density of H. edulis in the present sampling area is declining alarmingly due to unsustainable harvesting, usually for local food-grade agar production (Rao et al., 2006; Ganesan et al., 2011).
Hypnea musciformis, barcoded in the present study, is known for bio-preservative compounds that improve the shelf life of seafood (Arulkumar et al., 2020). Previous studies have shown that the biochemical composition (protein, carbohydrate and lipids) of Hypnea valentiae varies seasonally in the sampled area (Murugaiyan and Sivakumar, 2020). Further studies could explore the possibility of linking DNA barcodes to inter- and intra-species biochemical constituents of seaweeds. Jania rubens, which hosts diverse amphipod species (Kh. Gabr et al., 2020), is also known for its diverse haplotypes (Harvey et al., 2020) and bioactive resources (Rashad and El-Chaghaby, 2020). Further studies could explore whether DNA barcodes can effectively delineate the various haplotypes of J. rubens.
In species like Kappaphycus alvarezii, barcoded in the present study and also occurring on the Brazilian coast (Nogueira et al., 2020), various haplotypes are known to contain variable compositions of anti-oxidants (Araújo et al., 2020). The species is also actively used for nutrient removal in integrated fish culture systems (Kambey et al., 2020). DNA barcodes of Palisada perforata produced in the present study can also be used to identify the same species occurring on the Egyptian (Kh. Gabr et al., 2020) and Persian coasts (Abdollahi et al., 2020), as DNA barcoding works universally. The carrageenans of Sarconema filiforme, barcoded in the present study, are known for anti-inflammatory and prebiotic activity (du Preez et al., 2020), and the species is anthropogenically threatened on the Tanzanian coast (Kayombo et al., 2020). Hence the generated barcodes will aid environmental monitoring to ensure the continued perpetuation of this species. The DNA barcodes of Rhodophyta generated here will be useful for species identification by non-experts in the seafood (and its by-products), pharmaceutical, cosmetic and nano-technological industries, and in environmental monitoring to explore the presence of invasive species and the optimal perpetuation of threatened seaweed species, besides the morphological variability exhibited by Rhodophyta (Rodríguez and Otaíza, 2020).

DNA barcoding Chlorophyta
DNA barcodes of Caulerpa chemnitzia could be useful for researchers in the phytochemical industries. C. chemnitzia from the present study area is known for its phytochemical and bactericidal activity and for high quantities of terpenoids, tannins and phenolic resins (Krishnamoorthy et al., 2015). C. chemnitzia is also known to occur in Bangladesh coastal waters (Bay of Bengal) (Abdullah et al., 2020). Recently, sulfated polysaccharides of Caulerpa racemosa were positively evaluated for anti-inflammatory activity (Ribeiro et al., 2020). Extracts of C. racemosa have also been used as sunscreens (Ersalina et al., 2020) and as a source of anti-bacterial compounds (Belkacemi et al., 2020). C. racemosa collected from the same sampling area as the present study is also known for bio-diesel production (Balu et al., 2020). Hence the DNA barcodes of C. racemosa will be useful to researchers in the pharmaceutical, cosmetic and bio-diesel industries for seaweed species identification. The edible species Caulerpa scalpelliformis, barcoded from the same sampling area, is also known to bio-accumulate various heavy metals (Rajaram et al., 2020), which increases the chances of exploring this species in bio-remediation applications.
Chaetomorpha antennina, barcoded in the present study, is known for its fast growth driven by rapid nutrient uptake (Imchen and Ezaz, 2018). Correlating DNA barcodes with ambient seawater conditions could aid in finding the optimum species for given ecosystems in the near future. Ulva lactuca, barcoded in the present study, is a good bio-indicator of trace metal contamination (Bonanno et al., 2020) and is known for rapid absorption of cadmium (El-Sheekh et al., 2020). Recently it has been used as a model organism in energy budget studies for managing diverse ecological conditions (Lavaud et al., 2020). Hence the DNA barcodes of U. lactuca would be of immense use in environmental monitoring. Ulva reticulata, barcoded in the present study, is known for its high carbon sequestration potential (Sathakit et al., 2020), and its biomass has been used for bio-diesel and bio-ethanol production (Osman et al., 2020).

Tree based identification
The NJ tree precisely clustered members of the 3 different phyla into 3 different groups (Fig. 1). Members of Ochrophyta, Chlorophyta and Rhodophyta are grouped in orange, green and red clades, respectively. Sequences retrieved from GenBank are indicated by "accession-number_species-name" in the tree (for example, "JX069175 Kappaphycus alvarezii"), whereas the sequences generated in the present study are indicated without accession numbers and are distinguished by orange colour fonts. Though previous studies identified rbcL as a potential candidate for species delineation in Chlorophyta (Kazi et al., 2013) and as less effective than mitochondrial genes in Rhodophyta (Ale et al., 2020; Siddiqui et al., 2020), the present study, through tree-based identification, reveals rbcL as a potential candidate for barcoding seaweeds belonging to all 3 phyla. Alshehri et al. (2019), with limited sampling of seaweeds (n = 8 species) from all 3 phyla, have also shown that rbcL could aid precise species identification.

Genetic distance analysis
Kimura 2-parametric (K2P) (or uncorrected p) distance in all three phyla was positively correlated with nucleotide diversity (π) and Tajima's statistics (D

Conclusion
We have compiled comprehensive barcode data for 31 seaweed species (with 12 species barcoded for the first time) occurring along the Southeast coast of India. The present barcode reference libraries are insufficient for marine macro-algal identification of Indian species, and more effort in DNA barcoding the local species is necessary to facilitate environmental monitoring. Building a comprehensive local barcode reference library could contribute to resolving macroalgal taxonomy and systematics and to addressing biogeographic questions pertaining to the invasion of non-indigenous species. This could also result in the development and application of cost-effective and better biodiversity monitoring projects, which could contribute to the EU Directives and UN conventions. Hence strengthening the local barcode libraries by barcoding all species (>800 seaweed species have been documented from India) could facilitate cost-effective biodiversity surveys and effective environmental barcoding programmes in the near future. The generated barcodes will be useful for various industrial (pharmaceutical, fuel, seafood, cosmetic and nano-technological) and research (climate change, species distribution) applications. However, regular monitoring of seaweeds in the marine environment will be necessary to evaluate ecological shifts due to climate change (Bringloe et al., 2019). The rise of modern high-throughput sequencing technologies will significantly alter bio-monitoring applications and surveys in the near future (Fonseca et al., 2010; Hajibabaei et al., 2011; Leray et al., 2015). As a result, reference datasets such as ours will become essential for assessing the health of, and monitoring, various aquatic environments using seaweed barcodes. Further studies could increase the number of barcodes per species from the same and different geographical locations to shed light on phylogeographic signals and trace the origin of drifting seaweeds (Guillemin et al., 2020).