Little skate genome exposes the gene regulatory mechanisms underlying the evolution of vertebrate locomotion

The little skate Leucoraja erinacea, a cartilaginous fish, displays pelvic fin driven walking-like behaviors using genetic programs and neuronal subtypes similar to those of land vertebrates. However, mechanistic studies on little skate motor circuit development have been limited, due to a lack of high-quality reference genome. Here, we generated an assembly of the little skate genome, containing precise gene annotation and structures, which allowed post-genome analysis of spinal motor neurons (MNs) essential for locomotion. Through interspecies comparison of mouse, skate and chicken MN transcriptomes, shared and divergent MN expression profiles were identified. Conserved MN genes were enriched for early-stage nervous system development. Comparison of accessible chromatin regions between mouse and skate MNs revealed conservation of the potential regulators with divergent transcription factor (TF) networks through which expression of MN genes is differentially regulated. TF networks in little skate MNs are much simpler than those in mouse MNs, suggesting a more fine-grained control of gene expression operates in mouse MNs. These findings suggest conserved and divergent mechanisms controlling MN development system of vertebrates during evolution and the contribution of intricate gene regulatory networks in the emergence of sophisticated motor system in tetrapods.


38
The little skate Leucoraja erinacea is a marine species of jawed vertebrate which shared a 39 common ancestor with tetrapods 473 MYA 1 . Yet, it displays a walking-like behavior which resembles  Putative CTCF motifs are predicted (described in materials and methods) and indicated by red bars and 101 arrowheads and the ones conserved by purple bars and arrowheads. The exons in the genome is denoted 102 6 by orange bars. The sequence identities of alignment blocks are indicated by the intensity of yellow 103 color.

104
A scaffold N50 of 61.3Mb of our little skate genome assembly was comparable to a recently 105 published high-quality thorny skate genome assembly 13 (62.1Mb; Fig. 1B). The number of protein-106 coding genes (17,230) and transcripts (17,952) predicted in the new genome assembly was similar to 107 the number in the thorny skate genome (Fig. 1B). As a qualitative assessment of our genome assembly, 108 we compared an existing sequence of the HoxA cluster constructed using a BAC clone 21 (FJ944024), to 109 our assembled genome. Coding sequences in our assembled HoxA cluster aligned well with the fully 110 sequenced BAC clone (Fig. S5), highlighting the reliability of the new assembly. We observed complete 111 loss of HoxC cluster in the little skate genome which was also shown in the thorny skate genome 6,13,22 112 and relatively high repeat contents nearby genomic regions in the scaffold41 (s41), which contains the 113 largest number of the HoxC neighbor genes (Fig. S6). Although a residual fragment of HoxC cluster 114 was reported in the shark genomes 17 , cloudy cat shark, bamboo shark and whale shark genomes are not 115 visualized here due to an absence of gene annotation files. 116 Comparing the new little skate genome with the one of thorny skate, we found high level of 117 conservation between the species in terms of BUSCO gene contents (Fig. 1C). Interestingly, we also 118 observed that the BUSCO gene synteny of little skate showed more similar patterns with chicken and 119 mammals than with zebrafish ( Fig. 1C). In addition to gene contents, we also compared putative CTCF 120 binding sites, which are known to regulate expression of HoxA genes during development 23,24 (Fig. 1D). 121 In tetrapods, HoxA genes are expressed in spinal cord MNs and specify their subtype identities 7, 25, 26 , 122 and similarly in little skate, HoxA genes are expressed in spinal cord MN subtypes 4 . Among previously 123 reported CTCF sites in mouse 24 , one located between Hoxa7 and Hoxa9, and between Hoxa10 and 124 Hoxa11 were conserved across diverse species and were also observed in little skate HoxA cluster 125 despite the phylogenetic distance (Fig. 1D). On the other hand, the CTCF binding sites near Hoxa13 126 upstream and between Hoxa5 and Hoxa6 which are conserved in other species were not found in   Overall, we observed a greater number of mouse MN genes highly expressed in pec-MNs (pec-MN vs. 158 tail-SC fold change (FC) ≥ 2; 17 genes) compared with tail-SC (FC ≥ 2; 9 genes; Fig. 2B). On the other 159 hand, a greater number of interneuron markers was observed in tail-SC (FC ≥ 2; 29 genes), which is 160 composed predominantly of interneuron cell-types, than in pec-MNs (FC ≥ 2; 8 genes).

161
The evolution of genetic programs in MNs was tested by unbiasedly comparing highly 162 expressed genes in pec-MNs of skate with the ones from MNs of mouse and chicken, two well-studied 163 tetrapod species. Genes highly expressed by MNs in mouse were identified by comparing the previously 164 reported transcriptome of mouse MNs at embryonic day I 13.5 to sensory neurons (SNs) at e14.5 28 .

9
Chick embryonic transcriptome data was generated, and the genes highly expressed in chick MNs were 166 identified by comparing the transcriptome of brachial MNs (br-MNs) with that of brachial spinal cord 167 at HH25 29 . Comparing the DEGs, MN genes specifically lost in chick and skate were investigated using 168 the mouse genome as the reference. Several MN genes were specifically lost in skate, while no MN 169 genes were found to be lost in chick (Table. S1). Moreover, comparison of orthologous MN DEGs 170 revealed shared and species-specific expression (Fig. 3A). Among the shared DEGs, 40, 23, and 62 171 genes were identified in the skate and mouse pair, skate-chick pair and mouse-chick pair, respectively.

172
The 5 genes that were commonly found in all mouse, skate and chick MN DEGs include Foxp1, Ret,173 Slc5a7, Dlgap2 and Cacng2. Interestingly, the majority of the MN DEGs was found in a species-specific 174 manner (Fig. 3A); 191, 1,734 and 413 DEGs were exclusively found in skate, mouse and chick MNs, 175 respectively. Gene ontology (GO) analysis revealed that neurodevelopment-related terms constituted 176 large proportions among the shared DEGs (Fig. 3B). Among the species-specific MN DEGs, the GO 177 terms associated with neurodevelopment were also identified along with many terms related to cell 178 morphogenesis and transport. Species-specific DEGs in mouse MNs and chick br-MNs include more 179 terms related to neurodevelopment than skate-specific MN DEGs (Fig. 3C).   The genome-wide distribution analysis of ACRs showed that many ACRs were found in intron 204 and intergenic regions for all ACR types (intron, 35.0%; intergenic region, 45.1%; Fig. 4B); however, 205 normalizing each region by its length, the density of ACRs was found to be highest in promoters and proportion of the genome. After observing that the ACR density is higher near the TSS, the correlation 210 between chromatin accessibility in the promoter region and the gene expression level was investigated.

211
As a result, we found that the genes with greater depth of ATAC-seq in their promoter region are 212 generally expressed at higher levels than those genes with closed chromatin form in both fin-MNs and 213 tail-SC (Fig. S9), indicating that ACRs in the promoter region are likely to be associated with gene 214 activation.

215
Using the ACR data, enrichment test of putative TF binding sites was performed for pec-MNs 216 and tail-SC DEGs to reveal potential regulatory mechanisms (Fig. 4D). Overall, 18, 3 and 35 TFs were    for post-genome analysis. Therefore, improving the quality of the little skate reference genome is a first 294 step towards a more complete analysis of gene regulatory networks. To improve the quality of the little 295 skate genome, whole genome sequencing data including PacBio long reads and Illumina short reads 296 were generated. Even though this study used fewer resources relative to that of more recent sequencing 297 pipelines, which use multiple sequencing technologies including chromosomal conformation capture 298 and physical maps 13 , this study could assemble a reference genome which is highly contiguous and 299 offers reliable coding and regulatory region data. This was accomplished by using the assembly of a 300 closely-related species-thorny skate-as a reference during scaffolding process. However, it is 301 important to note that the contig N50 of this little skate genome is still lower than that of the thorny 302 skate assembly. Also, the higher number of scaffolds and the presence of little skate contigs which had 303 not been localized to pseudo-chromosomes of thorny skate may indicate that little skate and thorny 304 skate genomes contain a few structural disagreements.

305
Despite this limitation, the new reference genome allowed for a more reliable RNA-seq 306 analysis. Through re-analyzing the previously generated RNA-seq data 4 , we found genes which are 307 consistent with the previous immunohistochemical analyses 4 . In addition, we also observed that the 308 expression patterns of MN and interneuron markers in mouse 27 are highly similar in skate, supporting 309 previous conclusions. A greater number of DEGs were found through re-analysis of RNA-seq data using 310 the new reference genome (Fig. 2), which reflects the limitation of the previous analyses using the 311 zebrafish transcriptome. Aligning the new gene annotation data against the previous de novo 312 transcriptome, and investigating the distribution of alignment length, we also found that the previous 313 de novo transcriptome contains many fragmented genes (Fig. S4A). The previous transcripts generally 314 cover only a small fraction (<20%) of the transcripts predicted in our genome assembly while the new 315 transcripts cover close to 100% of the previous transcriptome. An example case is illustrated in the 316 Foxp1 gene, a well-known limb/fin MN fate determinant (Fig. S4B). This improved DEG analysis 317 therefore offers a more complete view of gene expression profiles of MNs in little skate.  (Fig. 5C).

325
During the evolution of motor behaviors, a more complex nervous system likely emerged to 326 control more sophisticated limb movements, such as hand dexterity. Little skate displays walking-like 327 behavior using only around 10 muscles in the pelvic fin, while mammals can use up to 50 muscles in 328 the limb. How the system evolved to generate a more intricate nervous system remains an open question.

329
One mechanism to achieve this might be to increase the number of regulatory modules that control gene ACRs as in non-neuronal genes as opposed to mouse with significantly increased neuronal gene ACRs 335 (Fig. S10). While birds display very complex behaviors, its genome contains short intergenic regions 36 .

336
In our study, we propose another way of achieving complex nervous system, through more intricate TF 337 networks. As illustrated in the Fig. 5C, a greater number of shared TFs binds to their downstream TFs 338 in mouse MNs than in skate pec-MNs, allowing an intricate control of gene expression and thus the 339 more complex nervous system of mouse compared to that of skate even though the hypothesis remains 340 to be validated in additional organisms.

341
In summary, the comparative analysis of the current study discovered evidence of species-342 specific changes apart from the conservation. Divergent expression of MN genes (Fig. 3A)     RNA-seq data processing and differential gene expression analysis 463 Quality of RNA-seq raw reads were checked with FastQC (v. 0.11.9). Next, the raw data was   To account for different gene length in multiple species, RPKM normalization was used. P-value of 478 0.05 and the fold change of 1.5 were used to identify MN DEGs in the multiple species RNA-seq data.

479
To control for MN expression, tail-SC in skate was used to calculate Z-score in skate while mouse SNs 480 and chick br-SC were used for mouse and chick, respectively. The visualization of the Z-score 481 transformed gene expression was done via R "ComplexHeatmap" package. The functional enrichment 482 of the candidate genes was investigated using biological process terms of Panther 60 . 483 24 484 ATAC-seq data processing and differential accessibility analysis 485 For chromatin accessibility assay, ATAC-seq data was used. Similar to RNA-seq data, the 486 quality of the raw sequence data was investigated with FastQC (v. 0.11.9) followed by trimming of 487 sequences with low quality and NexteraPE adapter sequences using trimmomatic-0.39 55 with following analysis was performed with "annotatePeaks.pl" of HOMER software 62 using the collection of homer 508 vertebrate TFs and the binding motifs of Hox gene families available in HOCOMOCO database v11 63 .