Gene expression patterns in chordate embryonic development suggest partial applicability of Haeckel’s postulates

The relationship between embryonic development and evolution historically investigated based on embryo morphology, could now be reassessed using mRNA expression endophenotype. Here, we investigated the applicability of von Baer’s and Haeckel’s arguments at mRNA expression level by comparing the developmental changes among nine evolutionarily distinct species: from oyster to mouse. In agreement with models based on von Baer’s postulates, up to 36% of mRNA expression indicated nearly linear conservation of species’ developmental programs. By contrast, 5-15% of developmental expression profiles, enriched in neural genes, displayed an alignment pattern compatible with the terminal edition paradigm proposed by Haeckel. Thus, the development-evolution relationship based on mRNA expression agrees with early concepts based on embryo morphology and demonstrates that the corresponding patterns coexist in chordate development.


Introduction 37
The general relationship between ontogeny and phylogeny, or development and pairwise alignment, PM genes did not align uniformly to all shortened mouse 141 developmental intervals but tended to peak at a particular fragment. For instance, 142 oyster PM genes aligned best to the mouse developmental fragments ending at stages 143 3-4, while amphioxus PM genes -to the fragments ending at stages 5-6 ( Figure 2B,C 144 and Figure 2C-figure supplement 7). Overall, PM genes in species evolutionarily 145 8 proximal to the mouse tended to align to increasingly longer truncated mouse 146 developmental sets ( Figure 2D). This phylogenetic ordering of ontogenetic patterns 147 was not caused by the alignment procedure, which was not biased to any 148 developmental stage, and robust to the use of evolutionarily old genes present in each 149 of the nine investigated species (Supplementary file 3: Figure S1). At the same time, 150 this phenomenon matches the phylogeny-ontogeny relationship among multiple 151 species proposed by Haeckel, even though we did not include this aspect in PM model 152 formulation. By contrast, repeating the alignment procedure using species with more 153 ancient body plans, such as amphioxus or ciona, instead of the mouse, did not yield a 154 consistent significant excess of genes following PM predictions (Supplementary file 4: 155 Figure S2). 156

Visualization of developmental expression patterns 175
We next investigated developmental expression trajectories of the 244 orthologous 176 genes classified as development-related in all nine species using the nearly linear, 177 species' alignment ( Figure 3A). These genes were grouped into six clusters based on 178 their developmental profiles in the unsupervised clustering analysis ( Figure 3B). 179 Remarkably, 66% of the genes fell within a single cluster representing a decreasing 180 expression pattern conserved across all nine species (CL9.1) ( Figure 3C). By contrast, 181 the expression trajectories in the other clusters differed more among species, with the 182 extent of differences being directly proportional to corresponding phylogenetic 183 distances ( Figure 3E and Figure 3E-figure supplement 10). Accordingly, CL9.1 genes 184 substantially overlapped with CM genes (one-sided Fisher's exact test, odds ratio = 3, 185 p < 0.0001) and were shown in the same biological processes as CM genes, including 186 spliceosome and RNA processing (Supplementary file 7: Table S4).

10
The same analysis conducted using 2,038 development-related orthologs shared 188 among six vertebrate species revealed a CL9.1-like developmental pattern represented 189 by a single cluster (CL6.1) ( Figure 3D). Genes in clusters CL9.1 and CL6.1 190 overlapped significantly and were enriched in similar GO terms ( Figure 3F and 191 Supplementary file 8:   Table S1). To quantify gene expression, we mapped RNA-seq 264 reads to the corresponding genome (Supplementary file 1: Table S1) using Tophat 265 (v2), allowing up to three mismatches and indels, except for ciona (Ci). In the case of 266 ciona, we mapped reads to the genome with up to five mismatches and indels as the 267 ciona RNA-seq data and the genome data have slightly lower quality compared to the 268

rest. 269
To define the expressed genes, we required the maximal expression across all 270 development stages exceeding 1 FPKM. For each stage, we calculated expression as 271 the mean of expression of the replicated samples. More than 70% of coding genes 272 annotated for a given species were reliably detected, except for amphioxus (41%) 273 (Supplementary file 10: Table S6). 274

Identification of development-related genes 275
We defined developmental alterations of gene expression levels using polynomial For each gene, we chose the best regression model with the developmental stage (by 278 15 rank) as predictor and expression level as a response with Benjamini-Hochberg 279 corrected p < 0.05. The genes that fit a significant regression model were termed as 280 development-related genes ( Figure 1D -Source data 1). 281

Identification of stage-associated genes 282
We defined genes preferentially expressed at a particular developmental stage as 283 stage-associated genes in each species following the method described in ( Table  289 S2). 290

Alignment of mouse developmental stage sets to developmental profiles of other species 291
To assess the transcriptome similarity in the course of developmental stages between 292 mouse (Mm) and other species, we used stage-associated and 1:1 orthologous genes 293 to align the developmental stages between mouse and other species followed by the Paring score = -log10 (Bonferroni corrected P-values) 300 The paring score was used to quantify the significance of the overlap on each pairwise 301 stage comparison between the mouse and another species. From the paring score, we 302 identified the relationship between mouse and other species ( Figure 1E). To check the 303 stability of this relationship, we repeated the comparison 500 times with randomly 304 assigned stage-associated genes for each stage of each species using the same 305 procedure. 306 307 According to the above scheme, we further assigned the corresponding stage 308 alignment between mouse and other species using the Needleman-Wunsch algorithm 309 with the gap penalty equal to one. To estimate the stability of alignment based on the 310 mean of the replicates, we randomly chose one individual per stage, aligned two 311 species 500 times, and calculated the frequency for each pair of alignment. The 312 resulting frequency was presented by the thickness of the line in Figure 3A.

Overlap of CM/PM genes between species 365
To calculate the significance of the overlap of CM or PM genes between species, we 366 sampled the same number of CM or PM genes in each species from all 367 development-related genes 1,000 times and recalculated the number of overlapping 368 genes to obtain the empirical distribution. The overlap significance p-value was 369 calculated as the proportion of the values, which were as larger or greater than the 370 actual overlapping gene number. Given all pairs of species involved, a 371 Bonferroni-corrected p-value of less than 0.05 was used as the cutoff of the overlap 372 significance (Supplementary file 5: Table S3). 373

Amino acid conservation of CM/PM gene 374
We compared the evolutionary conservation between proteins encoded by CM and 375 PM genes using the Ka/Ks metric (Yang & Nielsen, 2000). Since in this study, we are 376 not studying Ka/Ks variations across the evolutionary branches -we focused on the 377 comparisons of PM versus CM genes within each individual species, we conducted 378 this analysis mainly in a pairwise manner between a given species and mouse. Firstly, 379 20 we extracted the coding sequences of each gene based on the corresponding 380 annotation information and then translated the sequences into amino acid sequences 381 using the function "translate" implemented in the Bioconductor package "Biotrings". 382 The longest protein sequence was selected as the gene protein sequence. Next, we 383 performed protein sequence alignments between non-mouse species and the mouse 384 using the function "pairwiseAlignment" in "Biostrings", with the scoring matrix set as 385 sequences mentioned above convert to nucleotide levels. We next utilized the 393 CODEML application in PAML to calculate the Ka/Ks for given cross-species 394 ortholog. The significance of the difference between PM and CM genes in each 395 species was assessed using one-sided Wilcoxon rank-sum test ( Figure 2G-figure  396 supplement 9). 397

Gene expression interpretation to mouse developmental stage 398
To compare the similarity of the expression profiles across developmental stages of 399 nine species, we used the predicted developmental stage alignment presented in Figure  400 21 3A to create a unified alignment of eight species to the mouse developmental curves. 401 To do so, we mapped 33 stages cumulatively interpolated from eight species to the full 402 mouse developmental curve fitted using cubic smoothing spline with ten degrees of 403 freedom. We then compared gene expression curves among nine species based on 404 z-transformed expression of each gene interpolated at these 33 stage points. 405

Clustering of gene expression profiles in six vertebrate and nine chordate species 406
To investigate the expression pattern diversity in nine or six species, we used 407 hierarchical clustering (hclust function in R) of z-transformed gene expression 408 trajectories aligned among species with (1 -rho) as the distance measure, where rho is 409 the Spearman correlation coefficient. We chose k equal six, as optimal, based on visual 410 inspection of clusters obtained using different k values. 411

Functional annotation of developmental expression patterns 412
To check the functions of genes in each cluster, we applied Gene Ontology (GO) and 413 Benjamini-Hochberg correction on "molecular function", "biological process", and 423 "cellular component" was applied independently. GO terms with BH corrected p < 424 0.05 were reported. We applied no multiple test correction to chordate' gene cluster 425 enrichment analysis due to low statistical power of the dataset and used a more 426 stringent nominal p-value cutoff of p < 0.001. 427 For the pathway enrichment test, the reference genes were the same as for 428 GO-based analysis. We used hypergeometric test (phyper in R) to assess the 429 enrichment in each KEGG pathway. Bonferroni correction was applied for genes in 430 vertebrate clusters. Pathways with corrected p < 0.05 were reported as significantly 431 enriched. No correction was applied to genes from chordate clusters due to low 432 statistical power, and the nominal pathway enrichment p-value was set to p < 0.05. This 433 relaxed cutoff was used to assess the potential overlap between enriched functions 434 found in vertebrate and chordate clusters (Supplementary file 8: Table S5). 435
s ta g e 2 s ta g e 5 s ta g e 9 s ta g e 1 1 s ta g e 1 3 s ta g e 1 7 s ta g e 1 9 s ta g e 2 1 s ta g e 2 3 s ta g e 2 6 s ta g e 2 8 s ta g e 3 1 s ta g e 3 7 s ta g e 4 3 s ta g e 4 8 s ta g e 6 1 s ta g e 6 6 s ta g e 2 s ta g e 5 s ta g e 9 s ta g e 1 1 s ta g e 1 3 s ta g e 1 7 s ta g e 1 9 s ta g e 2 1 s ta g e 2 3 s ta g e 2 6 s ta g e 2 8 s ta g e 3 1 s ta g e 3 7 s ta g e 4 3 s ta g e 4 8 s ta g e 6 1 s ta g e 6 6 2 c e ll 6 c e ll m o ru la b la s to c y s t E 7 .5 E 8 .5 E 9 .0 E 9 .5 E 1 0 .5 E 1 1 .5 E 1 2 .5 E 1 3 .5 E 1 4 .5 E 1 5 .5 E 1 6 .5 E 1 7 .5 E 1 8 .5     Figure TK5  TK7  TK9  TK11  TK13  TK14  TK15  TK17  TK21  TK23  TK25   Significance of the difference between the two distributions was assessed using 718 one-sided Wilcoxon rank-sum test. ***, p < 0.0005, **, p < 0.005, *, p < 0.05, +, p < 719 0.5, #, p > 0.5. In contrast to results displayed in Figure 2G and Figure 2G - figure  720 supplement 8, the Ka/Ks ratios here were estimated using multiple species alignment. 721 Numbers within bars show the number of housekeeping genes in each group. The 777 significance of the differences between gene groups was assessed using one-sided 778 Fisher's exact test. ***, p < 0.0005, **, p < 0.005, *, p < 0.05, +, p < 0.5.