Genomes of two indigenous clams Anomalocardia flexuosa (Linnaeus, 1767) and Meretrix

Abstract


Background and Summary
Clam is the common name for several kinds of bivalve molluscs. The family Veneridae contains more than 700 described living species of bivalves, or clams, most of which are edible and exploited as food in different cultures around the world, including in America, Asia and Europe (Huber, 2010) 1. Clam digging, the harvesting of clams from below the surface of tidal sand or mud flats, also has a long history in many places, including Hong Kong. In the last century, clam digging in Hong Kong was mainly confined to villagers and recreational collectors using hand tools on beaches during low tides, either for consumption or as a source of income. Nevertheless, clam digging has grown increasingly popular in recent years, which threatens clam populations and disturbs benthic biodiversity in some areas (Griffiths et al., 2006; So et al., 2021) 2,3. Unlike many other places, where sustainable clam digging practices such as limiting the number of clams taken and/or temporarily closing clamming sites are in place, Hong Kong currently has no such practices, owing to the lack of information on the population structure of its clams. For the clams commonly found in Hong Kong, such as Anomalocardia and Meretrix species, the two genera frequently collected by local clam diggers (So et al., 2021) 3, genomic resources are currently lacking, which hinders our understanding of their connectivity across geographical locations.
Here, using PacBio HiFi long reads and Omni-C sequencing data, we present two chromosome-level genomes of clams common in Hong Kong, Anomalocardia flexuosa and Meretrix petechialis. Together with transcriptome data from various tissues, we produce high-quality predicted gene models for the two species. These genome assemblies and transcriptome data provide valuable genomic resources for understanding genetic diversity and connectivity, and support future population genomics research aimed at conserving local clam species and assessing the sustainability of clam digging activities.

Sample collection and high molecular weight DNA extraction
A. flexuosa and M. petechialis samples were collected at Shui Hau, Lantau Island, Hong Kong (22°13'14.2"N 113°55'09.0"E) on 6th July 2023 and at Yi O, Lantau Island, Hong Kong (22°13'58.4"N 113°51'02.0"E) on 28th August 2022, respectively. Approximately 300 mg of adductor muscle was used for high molecular weight (HMW) DNA extraction for each species. For A. flexuosa, the tissue was first ground into powder in liquid nitrogen, and HMW DNA was then isolated with the NucleoBond HMW DNA kit (Macherey-Nagel), following the manufacturer's protocol. For M. petechialis, HMW DNA was extracted using the MagAttract HMW DNA Kit (Qiagen), following the manufacturer's instructions. The DNA samples were eluted in 120 µL of elution buffer (PacBio Cat. No. 101-633-500) and quality-checked with the Qubit® Fluorometer, the NanoDrop One Spectrophotometer, and overnight pulsed-field gel electrophoresis.

PacBio library preparation and long-read sequencing
Prior to library preparation, approximately 5 µg of HMW DNA from each of A. flexuosa and M. petechialis, in 120 µL of elution buffer, was transferred to a g-tube (Covaris Cat. No. 520079) for DNA shearing with 6 passes of centrifugation at 1,990 x g for 2 min. The fragment size of the sheared DNA was assessed by overnight pulsed-field gel electrophoresis. A SMRTbell library was constructed for each sample using the SMRTbell® prep kit 3.0 (PacBio Cat. No. 102-141-700), following the manufacturer's instructions. The Qubit® Fluorometer and overnight pulsed-field gel electrophoresis were used to examine the quantity and quality of the SMRTbell libraries. Subsequently, the Sequel® II binding kit 3.2 (PacBio Cat. No. 102-194-100) was used for the final library preparation, comprising primer annealing, polymerase binding and the addition of the internal DNA control. The two libraries were loaded at an on-plate concentration of 50-90 pM in diffusion loading mode. Sequencing was performed on the PacBio Sequel IIe system with a 30-hour movie to generate HiFi reads for each sample, using one SMRT cell per species. In total, 21.83 Gb and 30.58 Gb of HiFi reads were obtained for A. flexuosa and M. petechialis, with average read lengths of 8,017 bp and 10,729 bp and coverages of 20X and 29X, respectively (Table 1).
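The relationship between the HiFi yield, mean read length and coverage figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the read counts are approximations, and the genome size it reports is back-calculated from the stated yield and (rounded) coverage rather than taken from the assembly statistics.

```python
# Illustrative arithmetic only; function and variable names are hypothetical.
# Inputs are the HiFi yield, mean read length and coverage quoted in the text.

def approx_read_count(yield_bp: float, mean_read_len_bp: float) -> float:
    """Approximate number of reads = total bases / mean read length."""
    return yield_bp / mean_read_len_bp


def implied_genome_size_bp(yield_bp: float, coverage: float) -> float:
    """Coverage = total bases / genome size, so genome size = bases / coverage."""
    return yield_bp / coverage


hifi = {
    "A. flexuosa": {"yield_bp": 21.83e9, "mean_len_bp": 8017, "coverage": 20},
    "M. petechialis": {"yield_bp": 30.58e9, "mean_len_bp": 10729, "coverage": 29},
}

summary = {}
for species, d in hifi.items():
    summary[species] = {
        "reads_millions": approx_read_count(d["yield_bp"], d["mean_len_bp"]) / 1e6,
        "genome_gb": implied_genome_size_bp(d["yield_bp"], d["coverage"]) / 1e9,
    }

for species, s in summary.items():
    print(f"{species}: ~{s['reads_millions']:.2f} M reads, "
          f"implied genome size ~{s['genome_gb']:.2f} Gb")
```

Because the coverage values are rounded to whole numbers, the implied genome sizes (a little over 1 Gb for both species) should be read as rough estimates only.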

Omni-C library preparation and sequencing
An Omni-C library was prepared for each of A. flexuosa and M. petechialis using the Dovetail® Omni-C® Library Preparation Kit (Dovetail Cat. No. 21005), following the manufacturer's instructions. For each sample, approximately 50 mg of flash-frozen, powdered tissue was crosslinked with formaldehyde in 1 mL of 1X PBS, followed by nuclease digestion. The lysates were assessed with the Qubit® Fluorometer and TapeStation D5000 ScreenTape before proceeding with the library preparation protocol. After a final quality check with the Qubit® Fluorometer and TapeStation D5000 ScreenTape, the Omni-C libraries were sent to Novogene Co. Ltd for sequencing on an Illumina HiSeq-PE150 platform, which generated 60.4 Gb and 56.6 Gb of Omni-C data for A. flexuosa and M. petechialis, respectively (Table 1).

Transcriptome sequencing
Total RNA was isolated from various tissues, including foot and adductor muscle, mantle, digestive gland, gill and gonad for A. flexuosa, and foot, digestive gland, gill and gonad for M. petechialis, using the mirVana™ miRNA Isolation Kit (Ambion) according to the manufacturer's protocol. The RNA samples were quality-checked with the NanoDrop One Spectrophotometer and gel electrophoresis. Samples that passed quality control were sent to Novogene Co. Ltd for poly(A)-selected RNA sequencing library construction and 150 bp paired-end sequencing. A total of 31.7 Gb and 23.1 Gb of transcriptome data were obtained from the different tissue types of A. flexuosa and M. petechialis, respectively (Table 1).

Repetitive elements annotation
Transposable elements (TEs) in the two genome assemblies were annotated as previously described (Baril et al., 2024) 17 using the automated Earl Grey TE annotation pipeline (version 1.2, https://github.com/TobyBaril/EarlGrey) with "-r eukarya" to search the initial mask of known elements and otherwise default parameters. Briefly, the pipeline first identified known TEs from Dfam with RBRM (release 3.2) and RepBase (v20181026). De novo TEs were then identified, and consensus boundaries were extended using an automated "BLAST, Extract, Extend" process with 5 iterations and 1000 flanking bases added in each round. Redundant sequences were removed from the consensus library before the genome assembly was annotated with the combined known and de novo TE libraries. Annotations were then processed to remove overlaps and to defragment them prior to final TE quantification. A total of 338.3 Mb and 427.2 Mb of repeat content was annotated in the genomes of A. flexuosa and M. petechialis, accounting for 31.05% and 40.85% of the assemblies, respectively (Figure 2; Table 4). Among the classified TEs, LINE, DNA and Rolling Circle elements make up the largest proportions (Figure 2; Table 4).
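The repeat totals and percentages reported above jointly imply an assembly size for each species, which gives a quick consistency check on the numbers. The following is a minimal sketch of that arithmetic; the function and variable names are hypothetical, and the implied sizes are derived values, not figures taken from the assembly statistics.

```python
# Illustrative arithmetic only; names are hypothetical.
# Inputs are the repeat content (Mb) and repeat percentage quoted in the text.

def implied_assembly_mb(repeat_mb: float, repeat_percent: float) -> float:
    """Assembly size implied by repeat content: size = repeat_mb / fraction."""
    return repeat_mb / (repeat_percent / 100.0)


repeats = {
    "A. flexuosa": (338.3, 31.05),
    "M. petechialis": (427.2, 40.85),
}

implied = {
    species: implied_assembly_mb(repeat_mb, pct)
    for species, (repeat_mb, pct) in repeats.items()
}

for species, size_mb in implied.items():
    repeat_mb, _ = repeats[species]
    non_repeat_mb = size_mb - repeat_mb
    print(f"{species}: implied assembly ~{size_mb:.1f} Mb, "
          f"non-repeat ~{non_repeat_mb:.1f} Mb")
```

Under these inputs both assemblies come out at a little over 1 Gb, with M. petechialis carrying the larger repeat fraction.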

Syntenic analyses
Macrosynteny analysis using the JCVI utility libraries 18 revealed a 1-to-1 relationship between the 19 pseudochromosomes of A. flexuosa and those of M. petechialis (Figure 3), indicating conserved chromosome architecture between the two species.
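What a "1-to-1 relationship" means can be made concrete with a small check on syntenic anchor pairs. The sketch below is not the JCVI workflow and does not use its API: it assumes, hypothetically, that anchors have already been reduced to (chromosome in species A, chromosome in species B) tuples, and the toy data and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical helper functions and toy data; not part of the JCVI libraries.

def dominant_partner(anchor_pairs):
    """Map each left-hand chromosome to the right-hand chromosome sharing most anchors."""
    counts = defaultdict(Counter)
    for left, right in anchor_pairs:
        counts[left][right] += 1
    return {left: c.most_common(1)[0][0] for left, c in counts.items()}


def is_one_to_one(anchor_pairs):
    """True if dominant partners are reciprocal and cover all chromosomes on both sides."""
    pairs = list(anchor_pairs)
    a_to_b = dominant_partner(pairs)
    b_to_a = dominant_partner([(b, a) for a, b in pairs])
    reciprocal = all(b_to_a.get(b) == a for a, b in a_to_b.items())
    return reciprocal and len(a_to_b) == len(b_to_a)


# Toy example: three chromosome pairs plus one stray anchor.
conserved = ([("A1", "B1")] * 5 + [("A2", "B2")] * 4
             + [("A3", "B3")] * 6 + [("A1", "B2")])
# Toy counter-example: A1 and A2 both map mainly to B1.
fused = [("A1", "B1")] * 5 + [("A2", "B1")] * 4 + [("A3", "B3")] * 6

print(is_one_to_one(conserved))
print(is_one_to_one(fused))
```

Using the dominant partner rather than requiring every anchor to agree tolerates a small number of stray anchors, which is a design choice of this sketch rather than something stated in the text.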

Data Records
The raw reads generated in this study, including

Figure 1. A) Pictures of A. flexuosa (left) and M. petechialis (right); B) Statistics of the genome

Figure 3. Macrosynteny plot of the 19 pseudochromosomes between A. flexuosa and M.

Figure 4. Genome assembly quality control (QC) and contaminants detection for A. flexuosa

Table 1. Genome and transcriptome sequencing data