A single-cell transcriptional timelapse of mouse embryonic development, from gastrula to pup

The house mouse, Mus musculus, is an exceptional model system, combining genetic tractability with close homology to human biology. Gestation in mouse development lasts just under three weeks, a period during which its genome orchestrates the astonishing transformation of a single cell zygote into a free-living pup composed of >500 million cells. Towards a global framework for exploring mammalian development, we applied single cell combinatorial indexing (sci-*) to profile the transcriptional states of 12.4 million nuclei from 83 precisely staged embryos spanning late gastrulation (embryonic day 8 or E8) to birth (postnatal day 0 or P0), with 2-hr temporal resolution during somitogenesis, 6-hr resolution through to birth, and 20-min resolution during the immediate postpartum period. From these data (E8 to P0), we annotate dozens of trajectories and hundreds of cell types and perform deeper analyses of the unfolding of the posterior embryo during somitogenesis as well as the ontogenesis of the kidney, mesenchyme, retina, and early neurons. Finally, we leverage the depth and temporal resolution of these whole embryo snapshots, together with other published data, to construct and curate a rooted tree of cell type relationships that spans mouse development from zygote to pup. Throughout this tree, we systematically nominate sets of transcription factors (TFs) and other genes as candidate drivers of the in vivo differentiation of hundreds of mammalian cell types. Remarkably, the most dramatic shifts in transcriptional state are observed in a restricted set of cell types in the hours immediately following birth, and presumably underlie the massive changes in physiology that must accompany the successful transition of a placental mammal to extrauterine life.

. Higher quality sci-RNA-seq3 data as generated by an optimized protocol. a, Histograms of log 2 (UMI count) per single nucleus for each of 15 sci-RNA-seq3 experiments. For the 14 newly performed experiments (run_13 to run_26), upper (blue line) and lower (red line) thresholds used for quality filtering correspond to the mean plus 2 standard deviations and mean minus 1 standard deviation of log2-scaled values, respectively, after excluding cells with >85% of reads mapping to exonic regions (except for the lower bound of 500, which was manually assigned for run_25), are shown with vertical lines. The data of run_4, which was reported previously 11 , was subjected to the same thresholds used in the original study, i.e. the mean +/-2 standard deviations of log2-scaled values (blue and red vertical lines, respectively), after excluding cells with >85% of reads mapping to exonic regions. Run_23_A & B were from the same sci-RNA-seq3 experiment, but with nuclei which were sequenced separately. b, Although most of the embryos from the same approximate stage (e.g. E14.0-E14.75) were included in the same sci-RNA-seq3 experiment (Supplementary Table 1), we profiled extra nuclei in some experiments for a handful of timepoints to ensure sufficient coverage. Here we sought to leverage those instances to check for potential batch effects across experiments. For this, on the embedding learned from all of the data, we asked whether these cells' profiles are more similar to cells from the same experiment or, alternatively, cells from the same time window. Left: for a random subset of cells from E14.75 which were profiled in experiment run_22 (primarily E17.0-E17.75), we performed a k-nearest neighbors (kNN, k = 10) approach in the global 3D UMAP to find the nearest neighboring cells either from the same experiment (red) or the same time window (E14.0-E14.75) but different experiment (blue). The percentages of the nearest neighboring cells from the two groups for individual cells are presented in the histogram. Right: for a random subset of cells from E13.5 & E13.75 which were profiled in experiment run_19 (primarily E10.5-E11), we performed a k-nearest neighbors (kNN, k = 10) approach in the global 3D UMAP to find the nearest neighboring cells either from the same experiment (red) or the same time window (E13.0-E13.75) but a different experiment (blue). The percentages of the nearest neighboring cells from the two groups for individual cells are presented in the histogram. In both examples, we observe that nearest neighbors are overwhelmingly cells from a different experiment (but the same time window), rather than cells from the same experiment (but a different time window).

48
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023. ; . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023. ;

Supplementary Figure 4. Cell type annotations.
For each of the 26 major cell clusters, we performed subclustering and then annotated each of 190 subclusters using at least two literature-nominated marker genes per cell type label (Supplementary Table 5).

50
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023.

51
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023. ; Supplementary Figure 6. Transcriptional heterogeneity in the posterior embryo during the early somitogenesis. a, The same UMAP as in Fig. 2h, colored by gene expression of marker genes which appear specific to the subpopulation of notochord cluster that is Noto+, including posterior Hox genes (Hoxc6, Hoxc8, Hoxa10), and genes involved in Notch signaling (Hes7), Wnt signaling (Wnt3) and mesodermal differentiation (Tbx6). b, Cell proportions falling into the ciliated nodal cell cluster for embryos with different somite counts. c, The same UMAP as in Fig. 2h, colored by gene expression of marker genes which appear specific to the subpopulation of the notochord Noto-and more strongly Shh+, including Sox10, Bmp3, Nrg1, and Erbb4. d, The same UMAP as in Fig. 2j, colored by gene expression of marker genes which appear specific to the posterior gut endoderm, including T, Hoxa7, Hoxb8, Hoxd13, and Hoxc9.

52
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023. ; Supplementary Figure 7. Transcriptional heterogeneity in renal development. a, The same UMAP as in Fig.  3a, colored by expression of marker genes which appear specific to anterior intermediate mesoderm (Pax2+,Pax8+,Sim1+,Lhx1+,Ret+), posterior intermediate mesoderm (Pax2+,Pax8+,Gdnf1+,Wt1+,Osr1+,Hoxc6+), ureteric bud (Ret+, Wnt11+) or metanephric mesenchyme (Wt1+, Six2+, Eya1+). References for marker genes are provided in Supplementary Table 5. b, The predicted absolute number (log2 scale) of cells of each renal cell type at each timepoint. The predicted absolute number was calculated by the product of its sampling fraction in the overall embryo and the predicted total number of cells in the whole embryo at the corresponding timepoint (Fig. 1e). For each row, the first timepoint with at least 10 cells assigned that cell type annotation is labeled, and all observations prior to that timepoint are discarded. c, The same UMAP as in Fig. 3a, colored by expression of marker genes which appear specific to podocytes (Nphs1+, Nphs2+), proximal tubule cells (Slc27a2+, Lrp2+), ascending loop of Henle (Umod+, Slc12a1+), distal convoluted tubule (Slc12a3+, Pvalb+), collecting duct intercalated cells (Atp6v1g3+, Atp6v0d2+) or collecting duct principal cells (Aqp2+, Hsd11b2+). References for marker genes are provided in Supplementary Table 5. d, The same UMAP as Fig. 3a is shown three times, with colors highlighting cells from before E18.75 (left), E18.75 (middle), or P0 (right). Dotted cycles highlight cells which appear to correspond to the proximal tubule. e, The same UMAP as in Fig. 3a, colored by expression of marker genes which appear specific to the ureteric bud tip (Wnt11+, Ret+, Etv5+) or stalk (Wnt7b+, Tacstd2+) 54 . Ureteric bud tip and stalk are highlighted by blue and red circles, respectively. f, The same UMAP as in Fig. 3a, colored by expression of marker genes which appear specific to connecting tubule cells (Aqp2+, Aqp3+, Aqp4-) or collecting duct cells (Aqp2+, Aqp3+, Aqp4+, Aqp5+) 64 .

Supplementary Figure 8. Transcriptional heterogeneity in mesenchyme. The same UMAP as in
The copyright holder for this preprint (which this version posted April 5, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 Supplementary Figure 9. The emergence of mesenchymal subtypes from the patterned mesoderm. a, The predicted absolute number (log2 scale) of cells of each mesoderm cell type at each somite count. The predicted absolute number was calculated by the product of its sampling fraction in the overall embryo and the predicted total number of cells in the whole embryo at the corresponding timepoint. Because cell numbers were only predicted for the broader bins (Fig. 1e), rather than individual somite counts, these were used for roughly corresponding sets (0-12 somite stage: E8.5; 14-15 somite stage: E8.75; 16-18 somite stage: E9.0; 20-23 somite stage: E9.25; 24-26 somite stage: E9.5; 27-31 somite stage: E9.75; 32-34 somite stage: E10.0). For each row, the first somite count with at least 10 cells assigned that cell type annotation is labeled, and all observations prior to that somite count are discarded. b, Re-embedded 2D UMAP of 110,753 cells from the selected cell types of mesoderm (clusters 1-12 as listed in panel a) from 5-20 somite stage embryos. c, The same UMAP as in panel b, but with inferred progenitor cells colored by derivative cell type with the highest mutual nearest neighbors (MNN) pairing score. d, Normalized MNN pairing score between mesodermal territories (column) and their inferred derivative cell types (row) from 5-20 somite stage embryos. The selected cell populations are first embedded into 30 dimensional PCA space, and then for individual derivative cell types, MNN pairs (k = 10 used for k-NN) between their earliest 500 cells (in absolute time) and cells from mesodermal territories are identified. e, Re-embedded 2D UMAP of 275,000 cells from the selected cell types of mesoderm (clusters 1-12 as listed in panel a) from 26-34 somite stage embryos. f, The same UMAP as in panel e, but with inferred progenitor cells colored by derivative cell type with the highest MNN pairing score. g, Normalized MNN pairing score between mesodermal territories (column) and their inferred derivative cell types (row) from 26-34 somite stage embryos. The selected cell populations are first embedded into 30 dimensional PCA space, and then for individual derivative cell types, MNN pairs (k = 10 used for k-NN) between their earliest 500 cells (in absolute time) and cells from mesodermal territories are identified. Figure 11. Marker gene expression for different neuroectodermal territories. The same UMAP as in Fig. 5a, colored by gene expression of marker genes for each neuroectodermal territory. References for marker genes are provided in Supplementary Table 5. 57 . CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Supplementary
The copyright holder for this preprint (which this version posted April 5, 2023. ;https://doi.org/10.1101https://doi.org/10. /2023 Supplementary Figure 17. Rapid shifts in transcriptional state occur in a restricted subset of cell types upon birth, and differ between vaginally and C-section delivered pups. a, The number of nuclei profiled for each animal shown in Fig. 7c. A small number of nuclei from additional fetal samples from the original set of experiments were also profiled for quality control (samples 10-12). b, 2D UMAP visualization of the birth-series dataset (n = 962,697 cells). Colors correspond to 26 major cell cluster annotations (Fig. 1e). Two major cell clusters (the primitive erythroid and testis & adrenal major cell clusters) shown in the original dataset but missed here are highlighted in gray. Primitive erythroid cells are not present at these timepoints and testis & adrenal cells are collapsed to the epithelial cells major cell cluster due to their low numbers. c, Re-embedded 2D UMAP of 19,696 cells of the adipocyte major cell cluster. d, Re-embedded 2D UMAP of 7,986 cells of the lung & airway major cell cluster. e, For these three major cell clusters, we co-embedded cells from three vaginally delivered pups (samples 1-3 in Fig. 7c) and six pups delivered by C-section (samples 4-9 in Fig. 7c), followed by subsetting a uniform number of cells per sample. For cells from each of the three vaginally delivered pups, we calculated the number of their 10 nearest neighbors in the PCA embedding (n = 30 dimensions) from other samples. f, Re-embedded 2D UMAP of cells from these three major cell clusters, based on cells from three vaginally delivered pups and six pups delivered by C-section. For each row, the same UMAP is shown multiple times, with colors highlighting cells from individual pups (or two pups, in the case of the 0-min C-section timepoint).

63
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023. ; https://doi.org/10.1101/2023.04.05.535726 doi: bioRxiv preprint Supplementary Figure 18. Quantitatively estimating cell number for individual mouse embryos as a function of developmental stage. a, Based on the experimentally estimated cell numbers of the 12 embryos (ranging from E8.5 to P0), we applied polynomial regression (degree = 3) to fix a curve across embryos between the embryonic day and log2-scaled cell number. P0 was treated as E19.5 in the model. b, The estimated "doubling time" of the total cell number in a whole mouse embryo are plotted as a function of timepoints. The timepoints with the longest (E17.0) and shortest (E8.5) estimated "doubling times" are highlighted.

64
. CC-BY 4.0 International license available under a was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which this version posted April 5, 2023. ; https://doi.org/10.1101/2023.04.05.535726 doi: bioRxiv preprint