Extracellular chemosymbiont populations in a shallow-water thyasirid clam potentially shaped by priority effect

Chemosymbiosis is a highly successful strategy that evolved in several animal groups, best known for dominating chemosynthetic ecosystems such as deep-sea hot vents and hydrocarbon seeps but also found in other systems such as reducing sediments in shallow water. The symbiont population structure can be determined by the host genetic inheritance, geographic partitions, and environmental factors, among others if the symbionts are acquired horizontally from the environment. Here, we suggest that the earliest colonies can also influence the episymbiont population, known as the “priority effect”, using the thyasirid cleftclam Thyasira tokunagai. This species is abundant in reducing sediments in Yellow Sea, hosting an extracellular sulfide-oxidizing symbiont (genus Sedimenticola) in the pouch-like structure in the gill. From samples taken across the whole Yellow Sea, complete symbiont genomes revealed only two dominant strains which were also verified by phylogenomic analysis. The two strains share key functional genes but exhibit a single-base difference in the 16S rDNA. We found mutually exclusive communities of these two strains in the 30 individual hosts studied, typically dominated or even monopolized by one strain. Inconsistent phylogenetic relationships between hosts and symbiont indicate the genetic heterogeneity of the holobiont, and the mean fixation index FST of each symbiont population within sampling sites showed a lack of correlation with geographic distance and environmental factors, suggesting another driving force. We deduce that the likely scenario is 1) the dominant symbiont strain is determined during initial acquisition; 2) the propagation of this initial strain as the host develops; 3) the continuous but restricted recruitment of the other strain during the adult stage. Overall, our results highlight the importance of “priority effect” in shaping the chemosymbiont population structures even in the horizontally transmitted episymbiont in a whole marginal sea area.


Introduction
The close association between eukaryotes and microorganisms, or symbiosis, has been ubiquitously reported across various host groups and both terrestrial and aquatic ecosystems [1,2].Symbionts play a crucial role in helping their host in nutritional acquisition and pathogen defense, in turn gain advantages in the development, local adaptation, speciation, and evolution [3,4].A key example of symbiotic mutualism is between invertebrate animals and chemosynthetic bacteria found in chemosynthesis-based marine ecosystems, best known for deep-sea hydrothermal vents and hydrocarbon seeps in the absence of sunlight.The bacteria oxidize reducing substances (e.g., hydrogen gas H 2 , hydrogen sulfide H 2 S, and methane CH 4 ) and produce organic matter for their hosts [5], such as siboglinid tubeworms such as Riftia [6][7][8], abyssochrysoidean and peltospirid snails [9][10][11], bathymodioline mussels [12,13], vesicomyid and thyasirid clams [14,15], forming the base of diverse and flourishing communities.
Most organisms host just one specific symbiont lineage, indicating strong selection and cooperation within the holobiont.Different hosts in the same environment often host phylogetically distinct symbionts, even when the symbionts are functionally similar [11,15].The chemosymbionts are mostly methane-or sulfide-oxidizing bacteria (MOB and SOB); Gammaproteobacteria is the most common while Alphaproteobacteria or Epsilonproteobacteria have also been reported [16,17].Some hosts are capable of hosting multiple clades of symbionts, for instance, pliocardiine clams in Vesicomyidae exclusively host two clades of SOB (genera Ruthia and Vesicomyococius) [18,19], and a similar pattern has also been reported in vestimentiferan tubeworms and most lucinid clams [20,21].The peltopsirid snail Gigantopelta aegis has dual symbiosis [11], while highly diverse symbiotic communities are known from the mussel Bathymodiolus azoricus [22,23].
Chemoysmbiosis is often obligatory for the host's development due to their heavy energy dependence on symbionts, typically with reduced digestive systems compared to their asymbiotic relatives [24][25][26].The mode of symbiont acquisition is variable.For instance, stringent vertical acquisition has been proposed in pliocardiine clams, evidenced by the presence of symbionts in the gonad and larvae, and the overall congruent coupled phylogeny between host and symbiont [27][28][29], though some phylogenetic incongruencies in species such as Turneroconcha magnifica and Calyptogena fausta suggest occasional hybrid or horizontal acquisition across different host taxa [19].Most other hosts, however, use horizontal transfer where they acquire symbionts from the environmental pool post-settlement.Horizontal acquisition is typified by a more diverse symbiont community, with the absence of symbionts in gonads and at the larval stages, and also the inconsistent or even disordered phylogeny between host and symbiont seen in bathymodioline mussels [12,30,31], lucinid clams [32], and Alviniconcha snails [33].A mixed mode of the above two acquisition modes has been proposed in solemyid clams and the peltospirid snail Chrysomallon squamiferum [34,35].
It has been hypothesized that locally adapted symbiont strains colonize the host symbiotic organ, which may aid hosts to better survive in its immediate environment and allow for success in a wider range of habitat conditions.This assumes that there are relatively homogeneous communities of symbionts at a local scale but heterogeneous among sites.Previous studies demonstrated that the symbiont community in horizontally acquired symbiosis is linked to biogeographic distances, environment, and host genetics [35].The geographic difference explains most of the symbiont composition in the mussel inhabiting hydrothermal vents across the Mid-Atlantic Ridge [36].In this case, 16 strains can co-existing in the gill tissue of the host, while the host genetic influence and geographic division play important roles in structuring the symbiont population [37].Similarly, the mixed acquisition of symbiont strains promotes the local adaptation and evolutionary success of Chrysomallon squamiferum snails in different vents [9].Moreover, the genetic diversity of horizontally acquired symbionts in a single host is reported as varying with the acquisition period in the life span, reflecting intra-host homogeneity and interhost heterogeneity.The high pairwise F ST values in Bathymodiolus brooksi mussels indicated the uptake of environmental bacteria in a restricted process and the communities in them might be determined by the earlier colonization via self-infection [38].However, for these reported studies, these symbionts are exclusively endosymbiotic within host cell's cytoplasm, it is unknown whether the episymbionts population structures are shaped similarly or by more complicated factors.
The episymbionts live on the surface of the epithelial area, meaning they are more able to communicate with other bacteria and the surrounding environment.For instance, the shrimp Rimicaris exoculate could acquire epibiotic bacteria along the life cycle [39].Here, our study focuses on the process of episymbiont population structure in chemosynthetic holobionts, using the thyasirid cleftclam Thyasira tokunagai in the Yellow Sea as a case study.Thyasira tokunagai is a member of the widely distributed T. gouldii species complex commonly found in reducing sediments.Through various analyses, including examining the chemosymbiotic basis, the 16S rRNA gene of symbionts, transmission electron microscopy (TEM) and fluorescence in situ hybridization (FISH) analyses on the symbiotic mode, and the carbon fixation rate via radioactive carbon tracing, we gained insights into the mechanisms that facilitate these interactions.By sequencing 30 individuals from nine populations across the whole Yellow Sea, we discovered that despite the homogeneity of the thyasirid clam, it hosts two dominant strains with a single base difference in the 16S rRNA gene.This allowed us to decipher the symbiont's pan-and core-genomic features, the symbiont, the strain-level population structures of the symbiont via single nucleotide polymorphisms (SNPs) analyses, population structure via full mitogenome sequencing of the host, and the gene expression.Our findings shed light on the major internal factors that shape intra-host structure for episymbionts and elucidate the potential host-symbiont interactions in the chemosymbiotic marine invertebrates.

Sampling description
Thyasira tokunagai (Fig. 1a) were collected from a total of twelve sites in the Yellow Sea between 42-72 m depth from three cruises on-board the R/V Lanhai 101 in August 2020, April 2021, and October 2021, with the site details and environmental parameters shown in Table S1 andTable S2.A 0.1 m 2 box corer was employed to collect the surface sediments (~2 cm).The T. tokunagai were manually picked out from the sediments via a 0.5 mm aperture mesh once they were on board and then were properly fixed immediately according to the morphology.Sample storage details can be found in supplementary information.

Nucleic acid extraction, library construction, and sequencing
The soft tissues of T. tokunagai were used for DNA extraction using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) following the company's protocol.Meanwhile, DNA was also extracted from approximately 0.5g of ambient surface sediments (wet weight) by using the PowerSoil DNA Isolation Kit (Qiagen, Hilden, Germany).NanoDrop Lite (Thermo Scientific, USA) and 1% agarose gel electrophoresis were used to check the DNA quantity and quality, respectively.The mitochondrial cytochrome c oxidase subunit I (COI) gene fragment was amplified by the universal degenerated primers [40].Similarly, the full-length 16S rRNA gene of bacteria from sediments and T. tokunagai gill were also amplified by the primers 27F and 1492R [41].The PCR product of the COI gene was sequenced by BGI (Qingdao, China) from both ends using Sanger sequencing.The PCR product of the 16S rRNA gene was sequenced on the platform of the PacBio RS II System by Novogene (Beijing, China) with CCS mode.For the Metagenome sequencing, DNA libraries were constructed using NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer's recommendations.For oxford nanopore technologies (ONT) sequencing, the DNA of whole tissue was extracted using QIAGEN Genomic-tip 100/G (QIAGEN, Germany) according to the manufacturer's instructions, and the library was generated using SQK-LSK109 (Oxford Nanopore Technologies, UK) following the manufacturer's recommendations.
Total RNA was extracted using TRIzol reagent (Invitrogen, CA, USA) with the guidance of the manufacturer's protocol.RNA integrity and quantity were measured using the Bioanalyzer 5400 system (Agilent Technologies, CA, USA).cDNA was obtained by removing the prokaryotic and eukaryotic ribosomal RNA from the total RNA for the construction of a meta-transcriptomic library.The nucleic acid for metagenome and meta-transcriptome sequencing was subjected to Illumina NovaSeq 6000 sequencing at Novogene (Beijing, China) with paired-end mode and a read length of 150bp, and the ONT library was subjected to PromethION platform at Novogene (Beijing, China).All sequencing details are found in Table S3 Mitogenome assembly, annotation, and population structure analysis Raw reads were trimmed to remove low-quality sequences and adapters using Trimmomatic v.0.39 [42] with the following parameters: ILLUMINACLIP: TruSeq3-PE-2.fa:2:30:10,LEADING:20, TRAILING:20, SLIDINGWINDOW:4:15, MINLEN:100.NOVOPlasty v.4.3.1 [43] were employed to construct the mitochondrial genomes with default parameter (160M reads randomly selected per sample).Assembled mitogenomes were annotated on the MITOS web server [44] with the default setting except "the genetic code: 5 invertebrates", and further checked manually.Afterwards, 13 protein-coding genes (PCGs) of 30 individuals were aligned separately using MAFFT v.7.515 [45] with the default parameter.Population structure analysis was performed by STRUCTURE v.2.3.4 [46] with the settings of "K: from 2 to 7, 2,000,000 iterations and 10% of burnin".The most optimal K was determined using Structure Harvester [47] web server with the delta K method.

Stable isotope analysis
The soft tissues of three T. tokunagai were freeze-dried for two hours at -60 and ground into powder.The 4 × 6 tin cup packaging samples were used to weigh approximately 0.1 mg of powder sample and then subjected to the isotope ratio mass spectrometer (IRMS, Thermo Fisher Scientific Inc., Waltham, USA) for stable isotope determination in Third Institute of Oceanography (Xiamen, China).The carbon isotope abundance ratio of international standard VPDB (Vienna Peedee Bellemnite) was used to calculate the δ 13 C value, with an analysis accuracy of the δ 13 C value of ± 0.2 ‰.Similarly, the nitrogen isotope abundance ratio of nitrogen in the air was used to calculate the δ 15 N value with an analysis accuracy of ± 0.25 ‰.

DIC assimilation rate measurement based on radioactive carbon analysis
Radiotracer assays were used to determine the rate of dissolved inorganic carbon (DIC) assimilation by introducing a 14 C-labeled DIC tracer to the homogenized gill in a solution of MMJHS medium [48] and quantifying the amount of 14 C incorporated into total organic carbon (TOC) [49].Four replicates of gill tissue samples (three experimental samples and one negative control) were filled into 10 ml serum vials without headspace and sealed with sterile PTFE septa and aluminum caps.After that, 100 µL of 14 C-DIC solution (~4 × 10 4 Becquerel, Bq) was injected into each serum vial through the stopper by displacing the same volume of water.Before injecting the 14 C-DIC tracer, the microorganisms of negative controls were removed by adding 0.5 mL of 100% trichloroacetic acid (TCA).All samples were incubated in the dark at 28 for 72 h, and the microorganisms of the experimental group were removed with the addition of 0.5 mL TCA and filtered onto 0.2 μm GSWP membranes (polyethersulfone, Millipore) after incubation.The filters were rinsed with 35 ‰ sodium chloride (NaCl) solution [50] and transferred into 7 mL scintillation vials containing a 6 mL scintillation cocktail (Ultima Gold ™ Cocktail, PerkinElmer).The radioactivity of the filters was determined using a Tri-Carb 3110TR liquid scintillation counter [51].The turnover rate constant (k, 1/day) of DIC was calculated using equation (1), and the assimilation rate (Ass-rate, µmol/L / day) of DIC was calculated using equation ( 2): Here, DPM-14 C-POC is the radioactivity on the filter, DPM-14 C-DIC is the total activity of the added DIC tracer, t is the incubation time (day), and [DIC] is the DIC concentration (µmol/L) in the sample.

Transmission electron microscope
Pre-fixed dissected gill tissue was firstly washed with 0.1 M PBS three times for 15 min each time and fixed with 1 % osmic acid after being washed again with 0.1 M PBS three times for 15 min.Then, the tissue was dehydrated increasingly in a methanol series (50 %, 70%, 90 % once, and 100 % three times) for 15 min each and embedded in Epon 812 resin.Afterward, the embedded blocks were cured at different temperatures (37 , 45 , and 60 ), lasting for 1 day at each temperature.A Reichert ULTRACUT Ultrathin slicer (Austria) was used to slice the ultrathin slides at 70 nm thick.Slides were stained using the uranium acetate lead citrate double staining method that uranium acetate and lead citrate staining for 15 min each.Images were captured by a JEM 1200-EX (Japan) Transmission Electron Microscope at an accelerating voltage of 80 kV.
These predicted genes of pangenome were searched against the NCBI NR database by using BLASTp in DIAMOND v.2.0.15.153 [65] with an E-value cut-off of 1e -5 .The results further were used for Gene Ontology annotation by Blast2GO v.6.0 [66].Meanwhile, Clusters of Orthologous Group 2020 (COG2020) [67] was adopted to classify the functional groups of genes in the pangenome.The genes of the pangenome were annotated using BlastKOALA [62] by searching against the KEGG database.
To decompose the diversity of symbiont at the strain resolution, StrainPanDA [64] was adopted based on the newly constructed pangenome of 30 MAGs and the full set of clean reads.In contrast, the former software STRONG [68] infers strain composition based on core COGs but it could not handle large-number datasets (over 10M reads in our case) due to its memoryconsuming issue.In this work, STRONG was also employed for comparison, with less than 10M clean reads in each sample were used as input and the following parameter: bayespaths: nb_strains: 16, nmf_runs: 10, min_orf_number_to_merge_bins: 18.To verify the correlation of relative abundance of two symbiont strains and inferred from the software above, the 16S rRNA gene of Ca.Sedimenticola endothyasira oligotype G (G type) was mapped to 30 metagenomic datasets using Bowtie v.2.3.5 [69], and the coverage of 16S rRNA gene of Ca.Sedimenticola endothyasira of each dataset was checked by Integrative Genomics Viewer v.2.4.1 (IGV) [70].To check whether the diversity of strains was caused by recombination, Rhometa v.1.0.3 [71] was employed to estimate the population recombination rate with the input of clean reads.
In addition, to find the taxonomy of two strains of symbiont in chemosynthetic sulfur-oxidizing bacteria (SOB), a total of 57 symbiont genomes assigned by GTDB-tk v.2.1.1 [59] were downloaded for phylogenomic analysis with Bathymodiolus azoricus symbiont (MOB) as an outgroup.A pipeline of phylogenomic analysis was modified from the published works [72,73].

SNP calling
To understand the symbiotic population structure in T. tokunagai in the Yellow Sea, the single nucleotide polymorphisms (SNPs) analysis was adopted to find the nucleotide diversity within (π within ) and between host (π between ), with the assumption of each host as a symbiotic population.
The whole genome sequencing reads were aligned to the pangenome (as a reference genome) using Bowtie v.2.3.5 [69].A pipeline of Genome Analysis Toolkit (GATK) v.4.2.6.1 [74] was used to call SNPs in the symbionts with a standard pipeline.In detail, the .bamfiles were formatted using GATK AddOrReplaceReadGroups and GATK SortSam.Then, the duplicated 30 .vcffiles) was counted using GATK CountVariants.All .vcfwere merged using GATK CombineGVCFs for plotting the PCA of SNPs.To understand the population structure at the SNP level tentatively, plink v.1.9[75] was used for principal component analysis (PCA).The SNPs of core genes were extracted for downstream analyses.The per-gene nucleotide diversity (π) within single host individuals (π within ), between host individuals (π between ), and the fixation index (F ST ) were calculated following the codes (https://github.com/deropi/BathyBrooksiSymbionts/tree/master/Population_structure_analyses)from Ansorge et al [37].PCA of the pairwise per-gene π between and π within of core genes supported by PERMANOVA on pairwise Bray-Curtis dissimilarities with 9999 permutations using PAST version 4.03 [76].In addition, to understand the influence of geographic distance and environmental factors (i.e.depth, temperature, and dissolved oxygen) on the symbiont population formation, the correlation between the mean F ST of the symbiont population within each sampling site, and geographic distance and environmental factors, meanwhile, we performed the redundancy analysis (RDA) analysis in vegan R package based on the symbiont allele frequencies [AD divided by DP], with environmental factors, geographic distance and oligotype followed the pipeline [36].

Differences between two symbiont oligotype in genome-level and gene expression
To find the difference between two oligotypes (i.e.G and A type), based on the pangenome presence-absence result, the specific genes only can be found in the oligotype A or G were determined at the gene level.Meanwhile, the difference also can be checked in the gene expression of host and symbiont with the following steps: clean reads were mapped to pangenome using Bowtie v.2.3.5 [69] and then were quantified using Samtools v.1.9[77], resulting in the symbiont-derived reads and symbiont-free reads.All the eukaryotic reads were subjected to Trinity v.2.13.2 [78] for de novo assembling a transcriptional profile in the host.The section in transcripts with coding potential was predicted using TransDecoder v.5.5.0 (https://github.com/TransDecoder/TransDecoder).CD-HIT v.4.8.1 [79] was used to remove the redundant sequence with the setting of "c = 0.8".The quality of transcripts was checked using BUSCO v.5.4.4 [80].Salmon v.1.9.0 [81] was performed to quantify the gene expression levels, and the normalized gene expression matrix (TMM) obtained was further utilized in the PCA analysis normalized by log-transformed as well as further added one to uncover the gene expression pattern of each individual.In addition, the longest CDS sequences were annotated by searching against the NR, KEGG, and GO databases above.For symbiont gene expression, the pangenome was used as a reference to quantify the gene expression level with the rest analyses the same as the part in the host.

Results and Discussion
The undifferentiated population of Thyasira tokunagai and its chemosymbiotic capacity Compared to other benthic fauna and environmental samples from the Yellow Sea (Table S4), Thyasira tokunagai exhibited the lowest δ 15 N value (-0.23 ± 0.22 ‰, n = 3), suggesting a potential autotrophic supplement on its nitrogen source (Fig. 1b and Table S4).The δ 13 C value of T. tokunagai (-20.52 ± 0.43 ‰, n = 3) was higher than that of particulate organic matter and phytoplankton, but lower than other benthic fauna.These results indicate that T. tokunagai relies heavily on the chemosymbiont for nutrition.The assimilation rates of inorganic carbon in T.
tokunagai were quantified using radioactive 14 C-labeled bicarbonate (Table S5), which were 0.32, 0.29, and 0.55 µmol DIC per day per individual (average: 0.39 μmol DIC/day).According to the species' average density (~75 samples/m 2 ) based sampling from our three research cruises, the carbon fixation flux in the T. tokunagai population was estimated to be 0.35 milligram (mg) Carbon every square meter per day in the Yellow Sea.Considering that T. tokunagai is a dominant species in the Yellow Sea and Japan Sea [82][83][84][85], and its close relative Thyasira cf.
gouldii is a pan-arctic species widely discovered in the Atlantic Ocean [86], the role of DIC fixation by thyasirids in the T. gouldii complex cannot be neglected (Supplementary Note 1), and chemosynthesis potentially contributes to ocean carbon budgets [87].
STRUCTURE assigned the individuals into two lineages using the alignment of 13 proteincoding genes (PCGs) of 30 mitogenomes based on the delta K (K = 2; Fig. 1c), suggesting the individuals in the Yellow Sea originated from the same ancestor.Meanwhile, the pairwise comparisons of the mitochondrial cox1 gene revealed an average of 99.84 % similarity (Table S6 and Fig. S1b) and the haplotype network (Fig. S1c and Table S7) was of a 'star-burst' type, although these specimens collected from nine distinct sampling sites.The haplotype network showed that nearly all haplotypes lack a clear geographical affinity, indicating that there is no significant differentiation among the nine sampling populations, showing frequent gene exchange and a panmixia for T. tokunagai in the Yellow Sea.
A species (Candidatus Sedimenticola endothyasira) in the genus Sedimenticola is the dominant clade among the bacterial community in the gills of T. tokunagai based on the full-length 16S rRNA gene amplicon sequencing, accounting for over 80% except for two individuals (i.e., 70.6% in NYS5_3 and 67.7% in NYS7_4).This Sedimenticola symbiont could not be found in the ambient surface sediment (Fig. 2a and Fig. S2), Notably, there was only a single base pair difference (G vs A) at the 563 rd position of the top two ASVs in this clade with 46.28 % and 44.55 % (average percent), respectively (Fig. 2b), which was further verified by 16S rDNA clone library and Sanger sequencing (Supplementary Note 2).The phylogenetic tree of the 16S rRNA gene of symbiont confirmed that its closest relative was a symbiont of Thyasira cf.goudlii (Fig. S3).Combining ONT and Illumina sequencing, a total of 30 high-quality circular genomes (i.e. MAGs) were assembled, with completeness > 99.23%, contamination rate < 0.34%, and size of 4.5Mb (Table S9).The average nucleotide identity (ANI) of these 30 MAGs ranged from 98.90 to 99.94% (Fig. 4c and Table S10).Based on the 563 rd base pair of the 16S rRNA gene, they could be categorized into two groups, with 12 in Ca.Sedimenticola endothyasira oligotype G (with G in the 563 rd position) and the rest 18 for Ca.Sedimenticola endothyasira oligotype A (with A in the same position).Results from phylogenomic analysis based on 1745 single-copy core genes of these 30 MAGs showed the two distinct phylogenetic clusters corresponding well to the two oligotype defined by 16S amplicon sequencing (Fig. S5).We also found that those within same oligotype group had a higher ANI identity (average of between two oligotypes group: 99.16%; average within oligotype G group: 99.77%; average within oligotype A group: 99.58%).Collectively, we show the dominant Sedimenticola symbiont encapsulating two distinct strains but with very high genomic similarity.The second clade Spirochaeta_2 account for 3.69 ± 4.36%.The co-existence of Sedimenticola and Spirochaeta_2 was also reported in a lucinid clam [88].
The presence of the symbiont was examined by FISH analysis, showing the Sedimenticola symboint was concentrated in the bacteriocytes located at the middle part of the gill filament apart from the ciliated filament tip (Fig. 2c), and symbiont was enveloped by cell membranes.
No signal was detected based on negative control (Fig. S6).In addition, TEM observation (Fig. 2d) of gill tissue showed that symbionts were localized in extracellular pouch-like structures among the microvilli, together with FISH results, implying an episymbiotic mode with the gill in T. tokunagai where bacteria are maintained outside of the host cytoplasm [89].

Two strains without major metabolic differences co-existing in a single host individual
Regarding symbiotic diversity at the strain level, T. tokunagai, there were two representative strains classified including Ca. Sedimenticola endothyasira oligotype G and Ca.Sedimenticola endothyasira oligotype A and their abundance was quantified in StrainPanDA (Fig. 4a).Notably, there was a positive correlation between the percentage of G type from StrainPanDA and the percentage of metagenomic reads of G base pair at the 563 rd position in 16S rRNA gene (R 2 = 0.969, p < 0.001; Fig. S7c).Congruent result was also observed in STRONG, another software for strain decomposition but with limitations in data input and the identification of representative three strains (Fig. S7a), these results showed that a positive correlation between STRONG and StrainPanDA (R 2 = 0.991, p < 0.001; Fig. S7b).The placement of these two representative strains in genus Sedimenticola were also verified by phylogenetic reconstruction at the genomic level (Fig. 4b and Fig. S4).
The metabolic potential in the symbiont of T. tokunagai was highly conserved (Fig. 3b).It encodes the full enzymes in carbon fixation and utilization, including the reductive pentose phosphate cycle (Calvin cycle), glycolysis/gluconeogenesis, tricarboxylic acid cycle (TCA), and oxidative phosphorylation, enabling the symbiont assimilating DIC in the former part.In the Calvin cycle, the ribulose-bisphosphate carboxylase large chain (K01601) was found in both oligotypes.The complete dissimilatory nitrate reduction pathway enables the symbiont to proceed with respiration under an anaerobic or hypoxic environment but hinders it from producing ammonia due to the lack of the nitrite reductase (NADH) large subunit (K00362).A complete dissimilatory sulfate reduction pathway and a partial SOX system were also found, mainly containing soxA, B, X, Y, and, Z, but soxCD was lacking.The incomplete assimilatory sulfate reduction pathway was detected, which contained sat and cysC.We also found the genes and enzymes related to hydrogen oxidation containing hoxF, U, Y, H, hybC, and hyaB.The symbiont encodes ABC transporters and PTS pathways, indicating the capacity of heterotrophy.
The bacterial chemotaxis and flagellar assembly pathways were found.Additionally, the symbiont has a relatively complete capacity for the biosynthesis capacity of amino acids (20), and vitamins and cofactors (6) (Table S11), suggesting a capacity for chemosymbiosis.
Based on the previous results from phylogenetic and SNP analyses, we looked into the potential differences in the two oligotypes of the T. tokunagai symbiont (i.e., oligotype A and oligotype G).
At the coding-gene level, we compared the two oligotypes groups based on the pangenome information, showing that there were 11 specific genes identified for oligotype A group and a different 11 for the oligotype G group (Table S12).Among those, the oligotype A specific genes were related to chemotaxis protein, ammonium transporter, and alpha/beta fold hydrolase, while those specifically belonging to oligotype G were related to cytochrome c, phosphodiesterase and ArsJ-associated glyceraldehyde-3-phosphate dehydrogenase.As for gene expression, we quantified and annotated the host and symbiont transcripts (Table S13 and Table S14), indicating there was no congruent pattern with the different oligotypes shown in PCA, no matter at the host (Fig. 6a) or symbiont levels (Fig. 6b).Moreover, no key functional difference in the top 50 of the highest expressed chemosymbiosis-related genes could be found, in both the host (Fig. 6c) and symbiont (Fig. 6d).Taken together, the differences between these two major oligotypes might be limited to gene sequence variants instead of gene functions.

Horizontal acquisition of symbionts with strong selection
Our results revealed distinct bacterial communities in the gills of T. tokunagai compared to their surrounding sediments, and the bacterial communities in sediment from the Yellow Sea were roughly consistent with a previous study [90].Though the lack of symbionts detected in the sediments may superficially appear to suggest a vertical transmission mode where the symbiont is passed down from parents to the offspring, it is not uncommon for horizontally transferred symbionts to be undetected from environmental samples [91][92][93].Previous studies reported that thyasirid clams use their super extensile foot to 'mine' sulfide; and the magnetosome in its symbionts [94,95] imply that they might be derived from the specific niche of sediments, which may not be fully covered in the present study.The metabolic potential of the symbiont genomes is suggestive of horizontal transmission.Vertical transmission typically leads to the loss of essential genes (e.g., flagellar and chemotaxis genes) in the symbiont genome required for a freeliving lifestyle, known as genomic reduction [25].The genomic sizes of the T. tokunagai symbiont MAGs were 4.5Mb (Fig. 3a), much larger than that typical of vertically transmitted symbionts which mostly exhibit genome sizes less than 2Mb [2,14,96].Similar to previous metagenomic research of T. cf.gouldii [97], a set of genes encoding flagella and chemotaxis were identified in both oligotypes of the T. tokunagai symbiont, indicating that they have the potential to survive outside their host.The presence of the TCA cycle in their genomes also suggests that they can use extraneous carbon sources from the environment directly during a non-symbiotic or free-living stage.It was also evidenced by the inconsistency of phylogeny between host mitochondria and the corresponding symbionts (Fig. S5).In addition, the sampling site of T. tokunagai covered most of the extent of the Yellow Sea, with the distance from north to south extending over 500 km.Besides, there were only two strains observed in 30 individuals.
Therefore, we interpret that T. tokunagai most likely acquired the Sedimenticola symbionts from the environment via horizontal transmission, controlled by a highly selective mechanism between the host and symbionts.

Heterogeneous symbiotic populations among host individuals
The quality of pangenome is strongly correlated with the assembly level of MAGs.The use of fragmented MAGs might result in some incorrect comparisons among strains.Here, a comprehensive pangenome of T. tokunagai symbionts was constructed from 30 circular MAGs of each host, which consisted of 3256 core genes (79% in 4106 on average), 2008 accessory genes, and 371 specific genes (Fig. S8a; Table S15 and Table S16).Comparatively, previous studies retrieved lower portions of genes as core genes, such as 40-53% in bathymodioline mussels and 62.6%-68.8% in the peltospirid snail Chrysomallon squamiferum [9,37].Most of the elements participating in key processes of chemosymbiosis and fundamental metabolism were detected in the core genes, such as the TCA cycle, the Calvin cycle for carbon fixation, glycolysis, sulfide-oxidization, biosynthesis of amino acids and cofactors, sugar transfer, chemotaxis, and flagellar assembly (Fig. 3b).The accessory genes and specific genes were annotated by the search against the COG database (Fig. S8b and Table S14), of which several COG categories were mainly involved: signal transduction mechanisms (T); energy production and conversion (C); cell wall/membrane/envelope biogenesis (M); translation, ribosomal structure and biogenesis (J); replication, recombination, and repair (L); amino acid transport and metabolism (E).
Furthermore, with the 3256 core genes as the reference sequences, we deduced the structure of symbiont populations at the SNP level.In total, 101,688~199,595 SNPs were detected from 30 populations with the setting of "ploidy = 2" (Fig. 4a and Table S17).The shared SNPs of oligotype A (SNP density: 26.62~39.75SNPs/Kb) dominant groups account for 0.032% of total SNPs in the third position (Fig. S9a), while the shared SNPs of oligotype G dominant groups (SNP density: 22.28~43.72SNPs/Kb) account for 0.025% in the first position (Fig. S9b).PCA with a PERMANOVA test based on pairwise Bray-Curtis dissimilarities based on the total SNPs, showed the symbiotic community in T. tokunagai was shaped following their dominant oligotype rather than sampling sites (PERMANOVA test, F = 782.3,p = 0.22, Fig. 5a).The pairwise host individuals belonging to the same oligotype groups have a relatively low F ST value and a relatively high ANI value with the oligotype group clustered closely (Fig. 4c), further confirming that the symbiont communities were divided into two groups (i.e.oligotype G and oligotype A) at the SNP level, which was consistent with the aforementioned 16S rRNA gene sequencing and phylogenomic tree results.The same oligotype group has similar levels of intra-host π (average oligotype A: 1.23; average oligotype G: 1.51) and F ST within the same oligotype individuals (average oligotype A: 0.0023, average oligotype G: 0.0021; Table S18).Meanwhile, within the same oligotype (i.e.either within oligotype A symbionts or within oligotype G symbionts), the inter-host π is larger than the intra-host π (PERMANOVA test, F = 83.47,p < 0.0001, Fig. 5e), suggesting there is more difference between individuals compared to within a single individual.
In addition, the population recombination rate of thirty metagenome datasets was estimated, with the undetectable event of population recombination for distinct symbiont strains within single individuals, which further suggested that the highly diverse community at the strain level is likely to be ascribed to the accumulation of mutation.

Priority effect on the communities of episymbiont in thyasirid clam
The population structure of horizontally transmitted symbionts could be shaped by a series of factors, including host genetics, environmental factors, and geographic factors [98].Based on the symbiont population variant data, we revealed almost no correlation between the mean fixation index F ST of pairwise symbiont population from each sampling site and environmental factors (i.e.water depth: r s = -0.200;p = 0.61; temperature: r s = 0.467; p = 0.21; dissolved oxygen: r s = -0.550;p = 0.13; Fig. 5c).Similarly, the correlation between the mean F ST of pairwise symbiont population from each sampling site and geographic subdivision (r s = -0.085;p = 0.83; Fig. 5c) was performed.The redundancy analysis (RDA) showed that the variable oligotype contributed the dominant variation (42.5%) in the differentiated population, and more than half of it (23.1%)was regarded as the sole contributor (Fig. 5d).By contrast, only 3% and 0.1% variation were ascribed to the geographic distance and environmental factors as the sole factor.Overall, the analyses between the symbiont population versus the host genetics (as represented by mitogenome genotyping), environmental factors (depth, temperature, and DO), and geographic factors (distance) have shown a lack of correlation between them and symbiont population structure, indicating some other factors could shape the symbiont population structure.
Here, we suggest that the population structure of two highly similar symbionts in T. tokunagai is likely controlled by the earliest type of symbionts colonized in the gill (e.g., oligotype A or oligotype G), reported in the literature as the "priority effect" [38,99].Our result showed that T.
tokunagai tended to harbor either one of two oligotype (G or A) instead of a mixed mode between the two (Fig. 2a and Fig. 4a).Nonetheless, the average nucleotide identity between two oligotype' genomes was 98.71%, above the proposed threshold of inter-species variation of prokaryotes (95%) [60].They shared most metabolic pathways, which indicated they were almost indistinguishable from each other.In contrast to the symbiont population of the peltospirid snail Chrysomallon squamiferum [9] and Bathymodiolus mussels [37], the F ST and intra-host both revealed the same oligotype have a lower difference at the SNPs level (average of intra-host π:1.44; average of intra-host π:2.50; average of F ST within oligotype: 0.0022; average of F ST between two oligotypes: 0.0035), indicating the presence of two strains at the population level.Given that the environmental pool of symbionts (two oligotypes: A and G) is a homogeneous [100] and all the microbes share a similar chance to be acquired by newly settling T. tokunagai host, there should be a maximum frequency around the equivalent value (e.g., 50% in terms of oligotype G or A).Surprisingly, we detected the potential anti-normal distribution in the ratio of A:G, with only 2 in 30 samples (Fig. 5b), suggesting the restricted process in the first colonization, as reported in the mussel Bathymodiolus brooksi [38].It is possible that the thyasirid only acquires symbionts in a very short span after settling to the seafloor, though this remains unclear.As mentioned before, there was a high density of T. tokunagai in the Yellow Sea within the aggregated reducing habitats.Symbionts would be released from dead individuals, which could provide a source of symbionts for acquisition.We also cannot reject the possibility of acquiring symbionts contained within water currents.Therefore, assuming horizontal transmission (discussed above), we propose the plausible scenario where: 1) the dominant strain of Sedimenticola in T. tokunagai is determined during the initial symbiont acquisition stage just after larval settlement and early gill development; 2) the propagation of the initial type of symbiont as the T. tokunagai develops into adulthood; 3) the continuous but restricted recruitment of the other type of symbionts during the adult stage.

Conclusion
In this study, we characterized chemosymbiosis in the thyasirid clam Thyasira tokunagai sampled from the Yellow Sea.We show that the symbiont was likely mixotrophic, localised in a pouch-like structure delimited by microvilli on the gill, capable of carbon fixation verified by radioactive isotope tracing, and horizontally transmitted from the environment.With just two functionally equivalent strains and just one dominate in each individual, the thyasirid becomes a perfect model to study the factors determining the episymbiotic communities in chemosynthetic holobionts.We employed population genetics to compare the symbiotic diversity among 30 individuals, where host inherited genes, geographic distance, and environmental factors could not reasonably explain the differences in the composition of the two strains among individuals.
We deduced that the formation of the symbiont population in the thyasirid clam is possibly governed by the " priority effect." These findings are meaningful in understanding the determination of microbial community assembly in other holobionts and how the association between host and microorganism forms in chemosymbiosis.
analysis.HW contributed to the 16S rDNA sequence analysis.SM estimated the DIC assimilation rate of the symbiont.CC identified the samples.YL, XL, MS, CC, YL, XL, GZ, WZ, and JS contributed to the writing and editing of the manuscript.

Fig. 1
Fig. 1 Chemosynthetic capacity and population structure analysis of Thyasira tokunagai in the Yellow Sea.(a) Shell and external anatomy of Thyasira tokunagai from the Yellow Sea.(g, gill; f, foot; d, digestive diverticula; m, mantle; aa, anterior adductor; pa: posterior adductor), and a total of twelve sampling sites in the Yellow Sea during three research cruises in August 2020, April 2021, and October 2021.(NYS: North Yellow Sea; SYS: South Yellow Sea) (b) The stable isotopic niche of Thyasira tokunagai in the macrobenthic community of the Yellow Sea.The red solid circle represents Thyasira tokunagai.(c) Host population structure analysis based on the 13 protein coding genes of the mitogenome of 30 individuals from nine sampling sites in the Yellow Sea.

Fig. 2
Fig. 2 Bacterial community composition and symbiont distribution in the gill of Thyasira tokunagai revealed by 16S rRNA gene, fluorescence in situ hybridization, and transmission electron microscope.(a) Genus-level relative abundance based on the full-length 16S rRNA gene of the bacterial community in the gill of Thyasira tokunagai.All ASVs other than showed ASV in gill and sediment were merged under 'Others'.(b) Two ASVs belonging to Sedimenticola genus and their (i.e.Ca.Sedimenticola endothyasira oligotype G and Ca.Sedimenticola endothyasira oligotype A) minor differences are one base at the 563 rd position.(c) Fluorescence in situ hybridization of Thyasira tokunagai showing the dominant bacteria in the gill tissue.All cell nuclei were stained with DAPI, the cell membrane was stained by concanavalin-A, and symbionts were hybridized by the specific probe of Ca.Sedimenticola endothyasira.(d) The morphology and position of the symbiont were observed by transmission electron microscope.(MV: microvilli; S: symbiont).

Fig. 3 Fig. 4
Fig. 3 Circle genomes and metabolic capacity of symbiont.(a) A circular genome was reconstructed by the combination of ONT and Illumina reads.The genome size was 4.5M with a GC content of 52.16%.(b) Metabolic pathway related to chemosymbiosis of symbiont wasconstructed based on the pangenome information.Here, Ca.Sedimenticola endothyasira is highly conserved with a free-living status, which encodes the full enzymes in carbon fixation and utilization and has the potential to utilize sulfides and hydrogen as energy sources.

Fig. 5
Fig. 5 Population structure analysis at the SNP level uncovered the difference of symbiont population.(a) PCA based on the total SNPs revealed symbiont population differentiation related to oligotype (G or A type) instead of sampling sites (two oligotypes: pseudo-F = 782.3and p-value = 0.22).(b) Frequency of Ca.Sedimenticola endothyasira oligotype A (A type) percentage from StrainPanDA in 30 samples.Here, we divided into five groups, and the fitting curve was the non-normal distributions curve with p-value < 0.05.(c) The correlation between the mean F ST of the symbiont population within each sampling site, and geographic distance and environmental factors (containing water depth, temperature and dissolved oxygen), showing that the formation of symbiont correlation were weakly influenced by geographic distance and environmental factors.(d) Variation partitioning of explanatory variables in RDA.Here, a total of 3 responding variables included: geographic distance, environmental factor and symbiont oligotype.Values less than 0 are hidden.*** = p < 0.001; ** = p < 0.01.(e) Principal component analysis (PCA) of intra-host (filled dots) nucleotide diversity (π values) and pairwise inter-host (empty dots) nucleotide.PCA was supported by PERMANOVA on pairwise Bray-Curtis dissimilarities with 9999 permutations.The GA type represents the pairwise π F ST between two oligotypes.Others are the same as above.The bubble size represents the different π values, showing that each symbiont population have different nucleotide diversity (within-host π and between-host π: pseudo-F = 83.47 and p-value < 0.0001).

Fig. 6 2 Ca.
Fig. 6 Gene expression difference of the two oligotypes individuals with the top 50 TPM related to chemosymbiosis.(a) PCA supported by the normalized gene expression data of the host.(b) PCA supported by the normalized gene expression data of symbiont.(c, d) Comparative gene expression differences related to chemosymbiosis were shown with the top 50 transcripts per kilobase million (TPM) of host and symbiont, respectively.TPM was normalized by the zscore method.G type individuals were represented by the blue color, while A type individuals were represented by the yellow color.