Abstract
Human limbs emerge during the fourth post-conception week as mesenchymal buds which develop into fully-formed limbs over the subsequent months. Limb development is orchestrated by numerous temporally and spatially restricted gene expression programmes, making congenital alterations in phenotype common. Decades of work with model organisms have outlined the fundamental processes underlying vertebrate limb development, but an in-depth characterisation of this process in humans has yet to be performed. Here we detail the development of the human embryonic limb across space and time, using both single-cell and spatial transcriptomics. We demonstrate extensive diversification of cells, progressing from a restricted number of multipotent progenitors to myriad mature cell states, and identify several novel cell populations, including perineural fibroblasts and multiple distinct mesenchymal states. We uncover two waves of human muscle development, each characterised by different cell states regulated by separate gene expression programmes. We identify musculin (MSC) as a key transcriptional repressor maintaining muscle stem cell identity and validate this by performing MSC knockdown in human embryonic myoblasts, which results in significant upregulation of late myogenic genes. Spatially mapping the cell types of the limb across a range of gestational ages demonstrates a clear anatomical segregation between genes linked to brachydactyly and polysyndactyly, and uncovers two transcriptionally and spatially distinct populations of the progress zone, which we term “outer” and “transitional” layers. The latter exhibits a transcriptomic profile similar to that of the chondrocyte lineage, but lacks the key chondrogenic transcription factors SOX5, 6 & 9. Finally, we perform scRNA-seq on murine embryonic limbs to facilitate cross-species developmental comparison at single-cell resolution, finding substantial homology between the two species.
Introduction
Human limb buds emerge by the end of the 4th post conceptional week (PCW4) and develop to form arms and legs during the first trimester. By studying model organisms such as the mouse and chick, it is known that development of the limb bud begins in the form of two major components. The multipotent parietal lateral plate mesodermal (LPM) cells condense into the skeletal system and skeletal muscle progenitor (SkMP) cells migrate from the paraxial mesoderm to the limb field, forming muscle1,2. These multipotent progenitors are encapsulated within a layer of ectoderm, a subset of which (termed the apical ectodermal ridge/AER) governs mesenchymal cell proliferation and aids in the establishment of the limb axes through fibroblast growth factor (FGF) signalling3. As the limb bud elongates, these AER signals no longer diffuse to the most proximal mesenchymal cells and differentiation begins4. Throughout the remainder of the first trimester in humans, the limbs continue to mature in a proximal-distal manner, such that by PCW8 the anatomies of the stylopod, zeugopod and autopod are firmly established. This maturation is tightly controlled by a complex system of temporally and spatially restricted gene expression programmes5–7. As with any complex system, small perturbations in even a single programme can result in profound changes to the structure and function of the limb8. Indeed, approximately 1 in 500 humans are born with congenital limb malformations9,10.
Although model organisms have provided key insights into cell fates and morphogenesis that are translatable to human development and disease, at present it remains unclear how precisely these models recapitulate human development. Furthermore, the lack of complementary spatial information in such studies precludes the assembly of a comprehensive tissue catalogue that provides a global view of human limb development in space and time. Encouragingly, the Human Developmental Cell Atlas community has recently applied cell-atlasing technologies such as single cell and spatial transcriptomics to several tissues to give novel insights into development and disease11–15. The application of these techniques to human embryonic and fetal tissue therefore holds much promise in furthering our understanding of the developing human limb16,17.
In this study, we performed droplet-based single-cell transcriptomic sequencing (scRNA-seq) and spatial transcriptomic sequencing to reconstruct an integrated landscape of the human hindlimb during first trimester development. We then performed scRNA-seq on murine embryonic limbs in order to compare the process of limb development across species at this level of resolution. Our results detail the development of the human limb in space and time at high resolution and genomic breadth, identifying fifty-five cell types from 114,000 captured single cells, and spatially mapping these across four timepoints to shed new light on the dynamic process of limb maturation. In addition, our spatial transcriptomics data gives insights into the key patterning and morphogenic pathways in the nascent and maturing limb, with a focus on genes associated with limb malformation.
Finally, our integrated analysis of human and murine limb development across corresponding time periods reveals extensive homology between a classical model organism and the human, underlining the importance and utility of such models in understanding human disease and development. Our study provides a unique resource for the developmental biology community, and can be freely accessed at https://limb-dev.cellgeni.sanger.ac.uk/.
Results
Cellular heterogeneity of the developing limb in space and time
To track the contribution of the different lineages in the developing limb, we collected single-cell embryonic limb profiles from PCW5 to PCW9 (Fig. 1a). This time window covers the early limb bud-forming stages as well as later stages of limb maturation (Fig. 1a). In total, we analysed 114,047 single cells that passed quality control filters (Extended Data Fig. 1a). After removing the cell cycle expression module by regression and applying batch correction (see Methods), we identified 55 cell types and states (Fig. 1b; see Extended Data Fig. 1b and Extended Data Table 1 for marker genes).
32 of these cell states represent cells derived from the LPM. They contain mesenchymal, chondrocyte, osteoblast, fibroblast and smooth muscle cell states involved in the maturation of cartilage, bone and other connective tissues, consistent with previous investigations of the cellular makeup of the limb18. In addition to these LPM-derived cells, a further eight states form a complete lineage of muscle cells that migrate as PAX3+ progenitors from the somite. These go on to differentiate in the limb to form myoprogenitors and myotubes.
Other non-LPM cell states include four states of primitive and definitive erythrocytes, two types of myeloid cells, three types of vascular endothelial cells and three types of neural crest-derived cells. Finally, we identified three epithelial cell states, among which were the AER cells at the distal rim of the limb bud that express SP8 and WNT6 (Extended Data Fig. 2a,b). Examining the relative abundance of each of these cell states across different gestational ages revealed how the cellular landscape of the developing limb changes over time. Within each of the aforementioned lineages, a clear pattern emerged whereby progenitor states were chiefly isolated from PCW5 & 6, with more differentiated cell states emerging thereafter (Extended Data Fig. 2c,d).
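The relative-abundance comparison described above can be illustrated with a minimal sketch. It assumes that each cell carries a gestational-stage label and an annotated cell state; the function name and the toy counts are hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict


def state_fractions_by_stage(cells):
    """Return {stage: {cell_state: fraction of that stage's cells}}.

    cells: iterable of (stage, cell_state) pairs, one pair per cell.
    """
    counts = defaultdict(Counter)
    for stage, state in cells:
        counts[stage][state] += 1

    fractions = {}
    for stage, counter in counts.items():
        total = sum(counter.values())
        fractions[stage] = {state: n / total for state, n in counter.items()}
    return fractions


# Toy input with invented counts, for illustration only.
toy_cells = (
    [("PCW5", "Mesenchyme")] * 3
    + [("PCW5", "Chondroblast")] * 1
    + [("PCW8", "Mesenchyme")] * 1
    + [("PCW8", "Chondrocyte")] * 3
)
fractions = state_fractions_by_stage(toy_cells)
print(fractions["PCW5"]["Mesenchyme"])  # 0.75
```

Normalising within each stage, rather than across the whole dataset, keeps the comparison insensitive to differences in the number of cells captured per sample.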
To further dissect the cellular heterogeneity with spatial context, and to build on limb patterning principles established in model organisms, we performed spatial transcriptomic experiments for limb samples from PCW5 and PCW8. Using 10X Genomics Visium chips, we were able to generate transcriptome profiles capturing on average between 1,000 and 5,000 genes per voxel (Extended Data Fig. 1c). We then applied the cell2location package19 to transfer cell state labels from our single-cell atlas to deconvolute Visium voxels (see Methods). The resulting cell composition map of Visium slides at each time point demarcated the tissue section into distinct histological regions (Fig. 1c, d). Interestingly, in the PCW5.6 samples, a clear zonal segregation of progenitor cell types was observed, dividing the progress zone into two layers that we name the “outer” and “transitional” progress zones. The outer progress zone cells (PZC) are located at the distal periphery of the limb bud. Encased within them are the transitional PZC together with SOX9-expressing chondroblasts of the developing autopod (Fig. 1c). This novel spatial distinction was accompanied by subtle transcriptomic differences, with the outer progress zone specifically expressing a number of genes implicated in digit patterning, including LHX2 and TFAP2B. Mutations in the latter cause Char syndrome, a feature of which is postaxial polydactyly20. The transitional progress zone specifically expresses IRX1, a key gene in digit formation that establishes the boundary between chondrogenic and non-chondrogenic tissue21,22. By performing differential expression testing between outer and transitional PZC, we were able to characterise gene modules that define each population (Extended Data Fig. 3a).
We then calculated the expression score for each of these modules among the other major lineages, revealing that the transitional PZC module was upregulated in the chondrocyte lineage, with no other clear upregulation of either module in any other lineage (Extended Data Fig. 3a, b). This suggests that these cells may be a transitional state between undifferentiated progress zone cells and committed chondroblasts. Indeed, the genes that define the transitional progress zone largely relate to skeletal system development (Extended Data Fig. 3c).
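A module expression score of the kind described above can be sketched as follows. This is a simplified illustration, not the scoring procedure used in the study: it assumes the score for a cell is the mean expression of the module genes minus the mean expression of a set of background genes. The background genes (ACTB, GAPDH) and all expression values are hypothetical.

```python
from statistics import mean


def module_score(cell_expr, module_genes, background_genes):
    """Mean module-gene expression minus mean background-gene expression.

    cell_expr: dict mapping gene symbol -> normalised expression for one cell.
    Genes absent from cell_expr are treated as not expressed (0.0).
    """
    module_mean = mean(cell_expr.get(g, 0.0) for g in module_genes)
    background_mean = mean(cell_expr.get(g, 0.0) for g in background_genes)
    return module_mean - background_mean


# Marker genes named in the text; the background set is an assumption.
TRANSITIONAL_MODULE = ["IRX1"]
OUTER_MODULE = ["LHX2", "TFAP2B"]
BACKGROUND = ["ACTB", "GAPDH"]

# Invented expression values for a single chondrocyte-lineage cell.
chondro_cell = {"IRX1": 2.0, "LHX2": 0.0, "TFAP2B": 0.0, "ACTB": 1.0, "GAPDH": 1.0}

transitional_score = module_score(chondro_cell, TRANSITIONAL_MODULE, BACKGROUND)
outer_score = module_score(chondro_cell, OUTER_MODULE, BACKGROUND)
print(transitional_score, outer_score)  # 1.0 -1.0
```

Subtracting a background mean makes scores comparable between cells with different overall expression levels; averaging these per-cell scores within each lineage would give the lineage-level comparison described in the text.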
In addition to the PZC, prehypertrophic chondrocytes (PHC) expressing Indian Hedgehog (IHH) localised to the mid-diaphysis of the forming tibia and the metatarsals. At the proximal limit of the sample, both MEIS2-expressing proximal mesenchymal cells (PrMes) and CITED1+ mesenchymal cells (Mes2) were observed, in keeping with the early stages of limb development (Fig. 1c).
For analysis of the PCW8.1 sample, we developed an inline pipeline to align and merge multiple Visium sections (see Methods). This allowed us to analyse the entire lower limb as one structure (Fig. 1d; Extended Data Fig. 4). In the osteochondral lineage, articular chondrocytes were located at the articular surfaces of the developing knee, ankle, metatarso-phalangeal and interphalangeal joints, while osteoblasts closely matched the mid-diaphyseal bone collar of the tibia and femur. The perichondrial cells from which they differentiate matched a comparable region, though they extended along the full length of the tibia and femur (Fig. 1d), a finding confirmed by immunofluorescence staining for RUNX2 and THBS2 (Extended Data Fig. 5a). Prehypertrophic chondrocytes again matched the mid-diaphysis of the tibia, and analysis of gene expression revealed some collagen-X expression in this region, in keeping with chondrocyte progression to hypertrophy (Fig. 1d). Additionally, we were able to capture glial cells expressing myelin genes (Extended Data Fig. 1c), and an accompanying FOXS1-expressing fibroblast subtype (here named “perineural fibroblast”) enriched in the periphery of the sciatic nerve in the posterior compartment of the thigh and its tibial division in the deep posterior compartment of the leg (Fig. 1d, Extended Data Fig. 5b-d). (We were not able to capture neurons in our single-cell data, most likely due to the distant location of their cell bodies within the spinal ganglia and anterior horn of the spinal cord.)
Interestingly, cell states with related (but not identical) transcriptomic profiles did not necessarily occupy the same location, which we were able to quantify based on our cell2location deconvolution analysis of Visium and scRNA-seq data. This revealed that within the fibroblast lineage, three clusters were co-located with KRT15-expressing basal cells and SFN-expressing cells of the periderm23, suggesting a role in the dermal lineage and prompting their annotation as dermal fibroblasts (DermFiB) and their precursors (F10+DermFiBP & HOXC5+DermFiBP) (Extended Data Fig. 5e). A further fibroblast cluster expressing ADH1B (ADH+FiB) co-localised with muscle cells, with no equivalent population found in the dermal region (Extended Data Fig. 5f-h).
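One simple way to express co-location numerically is to correlate the estimated abundance of two cell states across voxels. The sketch below is an assumption about how such a measure could be computed; the text does not specify the metric used, and the per-voxel abundances are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length abundance vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical deconvolved abundances over four voxels
# (two dermal voxels followed by two muscle voxels).
abundance = {
    "DermFiB": [5.0, 4.0, 0.0, 0.0],
    "Basal": [4.0, 5.0, 0.0, 1.0],
    "Muscle": [0.0, 0.0, 5.0, 4.0],
}

r_dermal = pearson(abundance["DermFiB"], abundance["Basal"])
r_muscle = pearson(abundance["DermFiB"], abundance["Muscle"])
print(r_dermal > 0, r_muscle < 0)  # True True
```

A strongly positive coefficient indicates that two states tend to occupy the same voxels, while a negative coefficient indicates spatial segregation.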
Similarly, we were able to spatially resolve two clusters with subtle transcriptomic differences within the tenocyte lineage. Both clusters expressed the classical tendon markers scleraxis (SCX) and tenomodulin (TNMD). One population expressed increased biglycan (BGN) and keratocan (KERA), molecules which play a role in the organisation of the extracellular matrix, while the other expressed higher levels of pro-glucagon (GCG), which is important in metabolism. Analysis with cell2location matched the former cluster to the long flexor tendons of the foot, as well as the hamstrings, quadriceps & patellar tendons around the knee joint. The latter cluster, however, matched to the perimysium, the sheath of connective tissue that surrounds a bundle of muscle fibres (Fig. 1d; Extended Data Fig. 5i,j). We therefore annotated these clusters as tenocyte (Teno) and perimysium, respectively.
Overall, these findings provide new insights into the subtle transcriptomic differences within cell compartments including the muscle, tendon, bone and stromal lineages. This integrated analysis serves as an example of how spatial transcriptomic methodologies can improve our understanding of tissue architecture and locate cell states themselves within the context of developmental dynamics of an anatomical structure such as the whole limb.
Patterning, morphogenesis and developmental disorders in the limb
During organogenesis of the limb, individual cell identities are, in part, determined by their relative position within the limb bud. This developmental patterning is controlled by a complex system of temporally and spatially restricted gene expression programmes. For example, key aspects of proximal-distal patterning are controlled by the AER24,25. In contrast, anterior-posterior axis specification is chiefly controlled by the zone of polarising activity (ZPA) through SHH signalling26–28. Within the autopod, precise regulation of digit formation is mediated through interdigital tissue apoptosis29–31. We utilised Visium spatial transcriptomic data to explore the locations of transcripts of all these classic pattern-forming genes on the same tissue section, finding notable consistency with previous in situ hybridisation experiments in the mouse (Extended Data Fig. 6a-e). This included several key genes known to govern proximal identity, including MEIS1 & 2, PBX1 and IRX332–35, as well as genes regulating limb outgrowth and distal morphogenesis such as WNT5A, GREM1, ETV4 and SALL136–39. Similarly, classical mammalian anterior-posterior (AP) genes were captured, including HAND1, PAX9, ALX4 and ZIC3 (anterior) and HAND2, SHH, PTCH1 and GLI1 (posterior)26,40–46.
The homeobox (HOX) genes are a group of 39 genes split into four groups termed “clusters”, each of which is located on a separate chromosome. During limb development, genes in the A & D clusters act in concert with the aforementioned axis-determining genes to dictate limb patterning in mammals47. In mice, these genes are expressed in two waves. The first wave occurs in the nascent limb bud, with the expression of 5’ groups in both clusters exhibiting a posterior prevalence. During the second wave of expression, this posterior prevalence is lost in the A cluster, persisting only in the D cluster48,49. Our spatial transcriptomic data captured the expression patterns of the A and D clusters early in the 6th post-conception week (Extended Data Fig. 6f). As expected, their expression matches the second wave of expression in mice, with a loss of posterior prevalence in the HOXA cluster and its maintenance in the HOXD cluster. For both clusters, an increase in group number corresponded to more distally restricted expression, with group 13 genes limited to the most distal part of the limb bud. An exception to this was HOXA11 expression, which showed no overlap with HOXA13, in keeping with the expression pattern of these two genes during the second HOX wave in mice. This mutual exclusivity in expression domain is thought to be due to HOXA13/D13-dependent activation of an enhancer that drives antisense transcription of HOXA11 in pentadactyl limbs50. Indeed, our data revealed a clear switch to the antisense transcript in the distal limb bud, although this switch did occur more proximally than the limit of HOXA13/D13 expression (Extended Data Fig. 6f). This suggests that in humans, additional mechanisms may be involved in the switch to antisense transcription.
In order to investigate gene expression patterns during digit formation, we obtained coronal sections through a PCW6.2 foot plate to reveal the five forming digits together with the intervening interdigital space (IDS; Fig. 2a). We then manually annotated digital and interdigital voxels based on H&E histology. Differential expression testing between digital and interdigital regions across two adjacent sections demonstrated an enrichment of classical survival-promoting genes in the digital regions, such as IRX1 & 2, while genes involved in interdigital cell death, such as MSX1 & 2, LHX2 and BMP7, were upregulated in the IDS (Fig. 2b). Similarly, interdigital regions showed an enrichment of molecules involved in the retinoic acid (RA) pathway, such as retinol binding protein (RBP) 4 and Signalling Receptor And Transporter Of Retinol (STRA) 6. Conversely, the RA metabolising enzyme CYP26B1 was profoundly upregulated in the digital regions. These findings underline the importance of RA in triggering interdigital cell death in the hand and foot plate51.
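The comparison between annotated digital and interdigital voxels can be illustrated with a minimal effect-size calculation. This sketch computes only a log2 fold change with a pseudocount; it is not the differential expression test used in the study, which would also require a statistical test and multiple-testing correction. The per-voxel values are invented.

```python
from math import log2
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group_a over group_b.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


# Hypothetical expression per voxel: {gene: (digital voxels, interdigital voxels)}.
toy_expression = {
    "CYP26B1": ([7, 7], [1, 1]),
    "MSX1": ([1, 1], [3, 3]),
}

lfc = {
    gene: log2_fold_change(digital, interdigital)
    for gene, (digital, interdigital) in toy_expression.items()
}
print(lfc)  # {'CYP26B1': 2.0, 'MSX1': -1.0}
```

Positive values indicate enrichment in digital voxels and negative values indicate enrichment in the IDS, matching the direction of the CYP26B1 and MSX1 results reported above.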
In addition to these known players in digit formation, we also identified several genes which likely play novel roles in this process. For instance, Wnt inhibitory factor (WIF) 1 and its downstream target DKK1 were significantly upregulated in the IDS. These proteins act in a coordinated manner to trigger apoptosis through activation of p21 and p53 and inhibiting c-MYC and BCL2, suggesting a role in the regression of the interdigital tissue of the hand and foot plates52. In addition, there was profound upregulation of the monocyte chemoattractant CCL2 in the IDS (Fig. 2b), in keeping with programmed cell death in this region. Interestingly, we detected only a weak presence of macrophages in the IDS, with the majority of macrophage signatures mapping to vasculature-associated regions (Extended Data Fig. 6g,h). Similar to an aortic macrophage population uncovered in recent studies53,54, the majority (56.6%) of these macrophages are TREM2+, though we cannot exclude the possibility that rather than being a vascular population, these macrophages are in the process of migrating to become tissue-resident elsewhere55. Finally, we histologically annotated each digit in the PCW6.2 foot plate to search for genes that vary with digit identity (Extended Data Fig. 6i). We identified four genes that were upregulated in the great toe including ID2 and ZNF503, both of which are known to have anterior expression domains in the limb, as well as the regulator of cell proliferation PLK2 and the cancer-associated gene LEMD-156–59. HOXD11 was downregulated in the great toe, in keeping with its posterior prevalence. We found no differentially expressed genes in the remaining digits.
We next cross-referenced the list of digit-IDS differentially expressed genes against a list of 2300 single gene health conditions. We found that genes involved in several types of isolated (or non-syndromic) brachydactyly (BD) were significantly upregulated in the digital tissue (Fig. 2c). These included IHH (Type A1 BD), BMPR1B (Type A2) and NOG (Type B2)60. Several genes in which variants produce complex syndromes that include brachydactyly as part of their phenotype were also upregulated in digital tissue. These included COL11A2 (Oto-Spondylo-Mega-Epiphyseal Dysplasia / OSMED), SOX9 (Cook’s syndrome) and FGFR3 (Achondroplasia)61,62. Conversely, genes altered in syndromes with syndactyly as part of their phenotype were significantly upregulated in the IDS. These included DLX5 (split hand-foot malformation), MYCN (Feingold Syndrome type 1), BMP4 (Microphthalmia VI) and TWIST1 (Saethre-Chotzen Syndrome)63–66. We then searched the EMAP eMouse Atlas Project (http://www.emouseatlas.org) for spatial expression data of these genes in the mouse, and found markedly similar patterns for those with available data67 (Extended Data Fig. 7). Similarly, where murine models of the aforementioned heritable conditions exist, their phenotype is broadly comparable to the human (Extended Data Table 2)60,62,65,68–83.
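The cross-referencing step amounts to intersecting the differentially expressed gene lists with a gene-to-condition lookup. The sketch below uses only the gene-condition pairs named in the text as a stand-in for the full list of single gene conditions; the function name and the toy gene sets are hypothetical.

```python
# Gene -> condition pairs taken from the text (a small subset of the full list).
CONDITION_BY_GENE = {
    "IHH": "Brachydactyly type A1",
    "BMPR1B": "Brachydactyly type A2",
    "NOG": "Brachydactyly type B2",
    "DLX5": "Split hand-foot malformation",
    "MYCN": "Feingold syndrome type 1",
    "TWIST1": "Saethre-Chotzen syndrome",
}


def annotate_with_conditions(de_genes, condition_by_gene):
    """Return {gene: condition} for DE genes present in the condition lookup."""
    return {
        gene: condition_by_gene[gene]
        for gene in sorted(de_genes)
        if gene in condition_by_gene
    }


# Toy DE gene sets; WIF1 and CYP26B1 have no entry in the lookup above.
digital_up = {"IHH", "BMPR1B", "NOG", "CYP26B1"}
ids_up = {"DLX5", "MYCN", "TWIST1", "WIF1"}

digital_hits = annotate_with_conditions(digital_up, CONDITION_BY_GENE)
ids_hits = annotate_with_conditions(ids_up, CONDITION_BY_GENE)
print(list(digital_hits))  # ['BMPR1B', 'IHH', 'NOG']
print(list(ids_hits))      # ['DLX5', 'MYCN', 'TWIST1']
```

Keeping the two regions separate preserves the anatomical contrast reported above, with brachydactyly-associated genes in digital tissue and syndactyly-associated genes in the IDS.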
Our spatial atlas provides a valuable reference of gene expression under homeostatic conditions for comparison with genetic variations for which phenotypes may begin to penetrate during embryonic development.
Regulation of cell fate decisions of mesenchymal-derived lineages
Our single-cell and spatial atlases revealed a high diversity of mesenchymal-derived cell types and states. In order to better understand which transcriptional mechanisms may control their specification, we inferred cell-fate trajectories in the 32 mesenchyme-associated states by combining diffusion maps, partition-based graph abstraction (PAGA) and force-directed graph (FDG) embedding (see Methods). We combined this with transcription factor (TF) network inference using the SCENIC package to identify distinct modules of active TF networks associated with progression through each lineage84.
As expected, the global embedding resembled a ‘spoke-hub’ system, whereby multipotent mesenchymal cells are embedded centrally, with committed cell types radiating outward as they begin to express classical cell-type specific marker genes (Fig. 3a, b). The central hub of mesenchymal cells consisted of five clusters with subtle differences in their transcriptome. A first population, here named Mesenchyme 1 (Mes1), expressed PITX2, a key initiator of hindlimb bud formation that is known to play a role in left-right identity and global limb patterning85. Proximal mesenchymal cells (PrMes) expressing the regulator of proximal identity, MEIS2, were also captured. Two further clusters (Mes2, Mes3) of mesenchyme expressed CITED1, a molecule which localises to the proximal domain of the murine limb bud and plays an unclear role in limb development86. In addition to CITED1, the Mes3 cluster also expressed the homeodomain protein HOPX. This regulatory molecule is thought to suppress adipogenesis in bone marrow stromal cells, and its role in the nascent limb bud has not to our knowledge been characterised to date87. The Mes4 cluster exhibited similar overall expression patterns to the other mesenchymal cells, with the addition of low levels of PRAC1, a molecule identified as maintaining a prostate gland stem cell niche but with no known role in limb development88. In addition to these populations, we also identified a subpopulation of mesenchymal cells expressing ISL1 (ISL1+Mes). These cells represent a mesenchymal niche within the nascent hindlimb bud which contributes to the posterior elements of the limb89. Finally, a cluster bridging mesenchymal cells and those in the chondrocyte lineage, but lacking expression of classical marker genes of either, was identified. This transitional state possibly represents cells forming mesenchymal condensations (MesCond) at the core of the limb, prior to commitment to the chondral lineage.
Examining the abundance of cell types by gestational age revealed how cellular heterogeneity within the mesenchymal compartment evolves during limb development (Fig. 3c). During PCW5, the majority of the cells captured were mesenchymal progenitors. This was particularly notable at PCW5.1 and 5.4, where mesenchyme accounted for 85% and 65% of all cells respectively. The relative abundance of mesenchymal cells in the limb declined thereafter, with almost none present at PCW8 & 9. A multitude of TF networks were predicted to be active in these progenitor populations (Fig. 3d; Extended Data Table 3). ALX4 and ISL1, genes known to dictate limb bud establishment, were both expressed across mesenchymal progenitor populations42,89,90. Similarly, MEOX2 and ALX1 were present in several populations, reflecting their role in limb patterning and morphogenesis91,92. ZMAT4 and ZFHX3 showed similarly diffuse activity, and the role of these TFs has not been clearly characterised in limb development. ZFHX3 has been implicated in demarcating the developing perichondrium and periosteum in the chicken, though interestingly showed no activity in these cell types in our data93. ZMAT4 has previously been shown to be upregulated in bone marrow mesenchymal cells when compared to limbal epithelial cells and mesenchymal cells, though no specific role in limb development has been identified94. The SHH regulator GATA6, another key patterning TF, was restricted to Mes1 and PrMes95. SHOX2 was active in the proximal mesenchyme, in keeping with its function of tissue specification in the proximal limb96. Finally, cells of the progress zone showed activation of a distinct module of TFs, including LHX2 and LHX9, as previously described in the mouse97.
The chondrocyte lineage increased in number steadily over time, accounting for 25% of the cells captured at PCW5.6, increasing to 50% at PCW7.2. Within this lineage, a shift from progenitor to more mature cell types was observed during the period studied, with uncommitted osteochondral progenitors (OCP) and immature chondroblasts giving way to maturing, resting and prehypertrophic chondrocytes (Fig. 3c). This progression was accompanied by changes in regulon activity (Fig. 3d). For example, SHOX activity was highly specific for OCPs. This TF has been widely implicated in secondary ossification through its interactions with the master regulators of chondrogenesis - SOX5, 6 and 998. Altered SHOX expression has been found in syndromes of altered skeletal growth, including Turner syndrome and idiopathic short and tall stature99. Its role in primary ossification is not well characterised, and its potential role in driving mesenchymal progenitors to form OCPs could shed further light on the mechanisms underlying such syndromes.
The switch from OCP to chondroblasts (and subsequent chondrocytes) was associated with the activation of SOX5,6 and 9, with the latter localising to chondrocyte condensations at PCW 5.6 and the developing tibia, fibula and digits at PCW6.2 (Fig. 3d,e). This trend was observed for other known regulators of chondrogenesis, including THRB and NKX3-2100,101. Interestingly, FOXJ1 was predicted to have similar activity in the chondrocyte lineage. In addition to its established role in ciliation, this TF has been shown to regulate dental enamel development 102. Furthermore, like SOX5 and 9, it is regulated by IRX1; a TF which specifies the digits and establishes the boundary between chondrogenic and non-chondrogenic tissue in the developing chick limb21. Several known regulators of chondrocyte hypertrophy were specific to PHCs, including Osterix (SP7), RORA, DLX3 and RUNX3, with the latter localising to the tibial diaphysis at PCW 6.2103–106. Finally, RUNX2 was, as expected, predicted to be active in osteoblasts and the perichondrial cells from which they are derived, with expression again localising to the tibial diaphysis.
Our experiments also captured the cells of the interzone: mesenchymal cells that reside at the sites of future synovial joints and give rise to their constituent parts. This cluster expressed the classical interzone marker GDF5 and emerged at the end of PCW5, giving rise to articular chondrocytes expressing lubricin (PRG4) at PCW8 (Fig. 3b,c). Intriguingly, the articular chondrocytes were not predicted to exhibit SOX5/6/9 activity, but instead possessed a separate TF programme focused on inhibiting classical osteochondral transcripts (Fig. 3d). For example, ELF3 has been shown to inhibit SOX9 in cultured human chondrocytes, whilst TFAP2C has demonstrated the same effect in human colorectal cancer cell lines107,108. In murine models, the glucocorticoid receptor NR3C1 inhibits the osteoblastic transcripts COL1A1 and Osteocalcin (BGLAP)109,110. The role of SOX9 in articular cartilage development and homeostasis is uncertain, with inducible loss in mice resulting in no degenerative change at the joint postnatally111,112.
Tendon progenitors expressing high levels of SCX but low TNMD emerged during PCW5 before declining in number and being replaced by tenocytes and perimysium expressing high levels of TNMD from PCW7 onward (Fig. 3b, c). Several TFs that control tenogenesis were predicted to be active in these cell types, including SIX1 and MKX113,114. Additionally, HEYL activity was elevated in tenocytes (and to a lesser degree in perimysium) when compared to tendon progenitors (Fig. 3d). The role of this TF in tenogenesis has not been characterised; however, it has been shown to suppress the expression of MYOD1 in muscle stem cells under increased loading conditions115. Our data suggest this mechanism may extend to tenocyte specification in the embryonic limb. TEF also showed upregulation in these cell types. This gene is known to play a role in extracellular matrix regulation in cardiac tissue, and may therefore play a similar role in the developing tendon116. Interestingly, the homeobox transcription factor HLX was specific to perimysium. This TF plays a role in the development of many organs including the liver, diaphragm, bowel, spleen and myeloid lineage, but no function in tendon development has been described117.
Finally, different fibroblast and smooth muscle populations within the limb exhibited clearly distinct TF activities (Extended Data Fig. 8a,b). Dermal fibroblasts showed activity in known regulators of this lineage, including FOSL2, PRDM1, SP5 and HOXC5118–120. Perineural fibroblasts showed activity in FOXS1, a molecule previously associated with sensory nerves121. Smooth muscle cells (SmMC) and their precursors (SmMP) both showed activity in GATA6, which is thought to regulate their synthetic function 122. In addition, SmMC showed activity in several additional TFs with known roles in smooth muscle maturation, such as HEY2 and HES6123,124.
Regulation of embryonic and fetal myogenesis
Limb muscle originates from the dermomyotome in the somite1,2. Classically, its formation begins with delamination and migration from the somite regulated by PAX3 and co-regulators such as LBX1 and MEOX2, followed by two subsequent waves of myogenesis: embryonic and fetal125. During embryonic myogenesis, a portion of PAX3+ embryonic skeletal muscle progenitors are destined to differentiate and fuse into multinucleated myotubes. These primary fibers act as the scaffold for the formation of secondary fibers derived from PAX7+ fetal skeletal muscle progenitors, which are themselves derived from PAX3+ muscle progenitors126–128.
To dissect these limb muscle developmental trajectories in detail from our human data, we took cells from the eight muscle states and re-embedded them using diffusion mapping combined with PAGA and FDG. Three distinct trajectories, each originating in PAX3+ skeletal muscle progenitors (PAX3+ SkMP), emerged (Fig. 4a). The first trajectory (labelled 1st Myogenesis) starts from PAX3+ SkMP and progresses through an embryonic myoblast state (MyoB1) followed by an early embryonic myocyte state (MyoC1), and finally arrives at mature embryonic myocytes. This trajectory is in keeping with embryonic myogenesis. Along the second trajectory, the PAX3+ SkMP lead to PAX3+PAX7+ cells, followed by a heterogeneous pool of PAX7+ SkMP cells that are mostly MyoD negative (Fig. 4a,b). This represents a developmental path that generates progenitors for subsequent muscle formation and regeneration. The final trajectory (labelled 2nd Myogenesis) connects cell states that express PAX7 first to fetal myoblasts (MyoB2), then early fetal myocytes (MyoC2) and finally mature fetal myocytes.
Comparing these myogenic pathways, we noticed that PAX3 expression gradually phases out along the embryonic myogenic pathway, while it is almost absent in the fetal myogenic pathway (Fig. 4b). This is consistent with a previous study that captured Pax3+ Myog+ cells in the mouse limb129. Interestingly, ID2 and ID3, which are known to attenuate myogenic regulatory factors130,131, are also more highly expressed in embryonic than in fetal myogenesis, which may imply different upstream regulatory networks. Additional genes such as FST, RGS4, NEFM and SAMD11 were found to mark the first myogenic pathway, while KIF19, TNFSF13B, KRT31 and RGR mark the second (Fig. 4b). Notably, keratin genes have been found to facilitate sarcomere organisation132.
Next, we performed SCENIC analysis to search for transcription factors driving each myogenic stage. A large number of stage-specific transcription factors were identified (Fig. 4d; Extended Data Table 4), including known muscle regulators such as MSX1, PAX3, PAX7, PITX2, SIX2, MYOD1, MYOG and LBX1 (Extended Data Fig. 8c,d). Indeed, PAX3 had a higher activity score in cell states during embryonic myogenesis, while PAX7 had higher scores during fetal myogenesis. Although PITX2 was reported to be transcribed at similar levels during embryonic limb myogenesis133, we observed a higher activity score and abundance during embryonic myogenesis than fetal myogenesis (Extended Data Fig. 8c), possibly related to its different regulatory roles134. Its related family member, PITX1, shows overlapping activity hotspots based on analyses using SCENIC. Interestingly, although PITX1 is known as a hindlimb-specific transcription factor, we find it expressed in both forelimb and hindlimb muscle cells (Fig. 4e), including a fraction of PAX3+ cells as early as PCW5 (Fig. 4f), suggesting a potential regulatory role in embryonic myogenesis.
Complementary to the SCENIC analyses focusing on activators, we also investigated several transcriptional repressors such as MSC (also known as Musculin, ABF-1 or MyoR), TCF21 (Capsulin), and the ID, HES and HEY protein families. We observed specific expression of MSC, HES1 and HEY1 in PAX7+ skeletal progenitors. The most prominent repressor, MSC, is a bHLH transcription factor that has been shown to inhibit MyoD’s ability to activate myogenesis in 10T1/2 fibroblasts135 and rhabdomyosarcoma cells136. In addition, in C2C12 murine myoblasts, MSC facilitates Notch’s inhibition of myogenesis (although it appears to exhibit functional redundancy in this role)137. To test whether human MSC also plays a role in repressing PAX7+ skeletal muscle progenitor maturation, in addition to its widely accepted role in B-cell development138,139, we knocked down MSC in primary human embryonic limb myoblasts. Our RT-qPCR results showed profound upregulation of late myocyte genes (Fig. 4g). This suggests that MSC is key to maintaining limb muscle progenitor identity.
Spatially resolved microenvironments exhibit distinct patterns of cell-cell communication
In order to investigate communication between cell types, we utilised the CellPhoneDB Python package to identify stage-specific ligand-receptor interactions by cell type in the developing limb140,141. This output was then filtered to reveal signalling pathways between co-located populations of cells at the histological level, determined in an unbiased way using cell2location factor analysis (Fig. 5a).
In the early (PCW5.6) limb bud, NOTCH signalling was predicted to occur in its distal posterior aspect through the canonical ligand Jagged (JAG)-1 (Fig. 5b). This interaction occurs between adjacent cells, with JAG1 bound to the cell rather than being secreted, triggering proteolytic cleavage of the intracellular domain of NOTCH receptors with varying activity depending on the NOTCH receptor involved142–144. JAG1 is induced by SHH in the posterior distal limb bud, with its anterior expression inhibited by GLI3R41,145. Our spatial transcriptomic dataset confirms this expression pattern in the early limb bud, with several voxels containing both JAG1 and NOTCH family transcripts (Fig. 5b; white asterisks). In addition, HES1, a downstream target of NOTCH, was expressed in these voxels, supporting the predicted activity of this signalling pathway. This novel finding sheds further light on the mechanisms controlling limb morphogenesis and has implications for conditions where this signalling axis is disrupted, such as the posterior digit absence characteristic of Adams-Oliver syndrome and the 5th finger clinodactyly of Alagille syndrome146,147.
In the limb bud samples at PCW6.2, we captured weak but reproducible signals of FGF8 in the AER epithelial cells, while FGF10 was detected in the adjacent mesenchyme (Fig. 5c, d). It is known that FGF8 and FGF10 are expressed in the adjacent ectoderm and mesoderm respectively and form a feedback loop through FGFR2 that is essential for limb induction148. Indeed, FGFR2 expression overlapped with that of FGF8 and FGF10 (Fig. 5d) and showed higher abundance in the AER than the progress zone, consistent with previous findings148. Interestingly, our single-cell and spatial atlases also suggest that FGFR2 has the lowest expression in the outer progress zone, which lies between the AER and the transitional progress zone (Fig. 5c,d), indicating a potential repression mechanism in the outer progress zone. As expected, FGFR2 was also expressed in the osteochondral lineage (Fig. 5e). The importance of this receptor in skeletal development is highlighted by the limb phenotypes observed in the FGFR2-related craniosynostoses, such as radiohumeral synostosis, arachnodactyly and bowed long bones149. FGF8, together with FGF5 and FGF9, was also expressed in regions with high expression of myogenic proteins, including their receptor FGFR4 (Fig. 5d-f; Extended Data Fig. 9a,b). This is in agreement with its previously reported role in activating MyoD expression in the chicken limb bud150.
The distribution of other FGF family members in the nascent limb was in keeping with observations in model organisms (Fig. 5e). FGFR1 showed a broad expression pattern across many cell populations, including skeletal muscle, bone and tendon. This receptor plays multiple roles in limb development, and its inactivation in the mouse results in truncated limbs with three digits, a deformity most marked in the hindlimb151. FGFR3 was expressed in intermediate and prehypertrophic chondrocytes, reflecting its role as a negative regulator of endochondral ossification152. Our spatial transcriptomic data further confirm the cell-specific spatial distribution of FGFs and their corresponding receptors (Fig. 5f).
Homology and divergence between human and murine limb development
Limb development has long been studied in model organisms, while assays directly performed on human samples are less common. To explore differences between mice and humans, and to understand evolutionary principles, we collected 13 mouse limb samples for scRNA-seq, and combined our newly generated data with 18 high-quality (Extended Data Fig. 10a) limb datasets from three previously published studies133,153,154 to build a comprehensive mouse embryonic limb atlas (Extended Data Fig. 10b, c). To compare the mouse and human transcriptomes, we used the diagonal alignment algorithm MultiMAP155 to align single cells based on matched orthologs, while also considering information from non-orthologous genes. The resulting integrative atlas (Fig. 6a-c) with aligned cell-type clusters shows a highly conserved cell composition between human and mouse (Fig. 6b, Extended Data Fig. 10d).
However, compositional differences are also observed in a small number of cell types. As expected, the mouse limb dataset has a greater abundance of PAX3+ SkMP and Early LPMC cells (Fig. 6d, brown circle; Extended Data Fig. 10d), which are enriched in early embryonic limb development, primarily because more early (before E12) mouse samples were collected given the limited access to early embryonic human samples (before PCW5). In addition, mouse limbs contained a higher percentage of epithelial cells and immune cells (Fig. 6d, orange circle), possibly due to faster maturation of the epidermal and early immune system in the mouse. Consistent with this, a recent comparison of mouse and chicken limbs at the single-cell level found that gene modules of epithelial and immune cells exhibit higher evolutionary turnover156. Interestingly, whilst more cells of the PAX3+ SkMP cluster in mouse are PAX3+ MSC- (3448) than PAX3+ MSC+ (592), the reverse is true for human (116 PAX3+ MSC- cells vs 1532 PAX3+ MSC+ cells), indicating differences in the transcription factor repertoire between mouse and human PAX3+ SkMP.
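The cell counts quoted for the PAX3+ SkMP cluster can be turned into proportions with a few lines of arithmetic. The sketch below uses only the four counts given in the text; the dictionary keys and function name are illustrative and not part of the original analysis.

```python
# Cell counts quoted in the text for the PAX3+ SkMP cluster.
counts = {
    "mouse": {"MSC_neg": 3448, "MSC_pos": 592},
    "human": {"MSC_neg": 116, "MSC_pos": 1532},
}


def msc_positive_fraction(species_counts):
    """Fraction of PAX3+ SkMP cells that are also MSC+."""
    total = species_counts["MSC_neg"] + species_counts["MSC_pos"]
    return species_counts["MSC_pos"] / total


fractions = {species: msc_positive_fraction(c) for species, c in counts.items()}
for species, fraction in fractions.items():
    print(f"{species}: {fraction:.1%} of PAX3+ SkMP cells are MSC+")
```

On these counts, roughly 15% of mouse and 93% of human PAX3+ SkMP cells are MSC+.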
To systematically compare pattern formation between mouse and human limbs, we dissected forelimbs and hindlimbs from a human embryo and a mouse counterpart, each separated into proximal, middle and distal segments to compare with our first trimester human samples. This allowed us to address the differences between forelimb and hindlimb along the proximo-distal axis at matched time points in human and mouse development.
Overall, mice and humans demonstrate highly similar cell-type compositions along the P-D axis. In both human and mouse forelimbs, proximal mesenchymal cells are enriched towards the proximal end, while progress zone cells (TransPZC and OuterPZC) are highly enriched in the distal part as expected (Fig. 6e). Additionally, interzone cells are enriched in the middle segment, where we intentionally included the joints. The same is true for the hindlimb. Comparison of forelimbs and hindlimbs demonstrated that both humans and mice show minimal differences in terms of cell type composition (Fig. 6e). This suggests that the composition of cell types of the developing limb is highly conserved between humans and mice even when pinpointing the broad anatomical regions. To perform a more stringent comparison, we took cells from the thirty-two LPM-derived states to compare ortholog expression signatures between proximal and distal segments in mouse and human. Both species recapitulate known P-D biased genes such as MEIS1 (proximal) and HOXD13 (distal) and known forelimb/hindlimb biased genes such as TBX5 (forelimb) and TBX4 (hindlimb) (Extended Data Fig. 10e). Overall, we show that the gene neighbourhoods controlling forelimb/hindlimb identity and P-D axis formation are highly conserved in evolution.
Discussion
Our developmental limb atlas combines single-cell RNA and spatial transcriptomic analyses of embryonic limb cells from multiple time points in the first trimester in order to form the first detailed characterisation of human limb development across space and time. We identify fifty-five cell states within eight tissue lineages in the developing limb and place them into anatomical context, building on existing knowledge of cellular heterogeneity gained from model organisms133. Our spatial data also reveals the expression of key regulators of limb axis identity, including the homeobox genes, in the nascent human limb.
In addition to recapitulating model organism biology, our atlas enables the identification of novel cell states. We identify a population of perineural fibroblasts surrounding the sciatic nerve and its tibial division, and we confirm their location using immunostaining. We also characterise several populations of mesenchymal cells, each defined by the expression of marker genes that in many cases play unclear roles in limb formation and should spur further investigation. The scale and resolution of our atlas also enable the construction of a refined model of cell states and regulators in primary and secondary limb myogenesis, two partially overlapping, parallel processes marked by different panels of regulators, with the identification and validation of MSC as a key player in muscle stem cell maintenance.
Our atlas also leverages spatial data by placing subtly distinct single cell clusters into their anatomical context, shedding light on their true identity. In particular, two clusters of cells with subtly different transcriptomes mapped to the progress zone in two distinct bands, which we term “outer” and “transitional” layers. The gene module of the more proximal transitional layer showed some similarity to that of chondroblasts, suggesting it may be beginning to differentiate towards these committed cell types. Similarly, two clusters in the tendon lineage map to the tendon and perimysium, giving insight into the subtle differences between these related tissues. Furthermore, through histological annotation of the developing autopod, we connected physiological gene expression patterns to single gene health conditions that involve altered hand phenotype, demonstrating the clinical relevance of developmental cell atlas projects. We further maximised the utility of this study by presenting an integrated cross-species atlas with unified annotations as a resource for the developmental biology community that we expect will strengthen future studies of limb development and disease that utilise murine models.
Whilst the combination of single-cell and spatial transcriptomics is an established method for tissue atlasing, we recognise the challenges of combining different technologies. For example, our single-cell data captured large numbers of chondrocytes, including prehypertrophic chondrocytes which mapped to the mid-diaphysis of the forming bones. Interestingly, analysis of spatial gene expression revealed some collagen-X expression in these regions; a marker gene for mature, hypertrophic chondrocytes. Our scRNA-seq experiments did not capture any cells expressing collagen-X, suggesting that permeabilisation and RNA capture with Visium in this case was a superior method for profiling matrix-rich tissues such as mature cartilage. We expect these technical considerations to feed forward into future atlasing endeavours involving cartilage, bone and other dense tissues.
Methods
Human tissue sample collection
First trimester human embryonic tissue was collected from elective termination of pregnancy procedures at Addenbrooke’s Hospital, Cambridge, UK under full ethical approval (REC-96/085; for scRNA-seq and Visium), or at Guangzhou Women and Children’s Medical Center, China under the license of ZSSOM-2019-075 approved by the human research ethics committee of Sun Yat-sen University (for experimental validation). Written, informed consent was given for tissue collection by the patient. Embryonic age (post conception weeks, PCW) was estimated using the independent measurement of the crown rump length (CRL), using the formula PCW (days) = 0.9022 × CRL (mm) + 27.372.
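As a minimal sketch of the age estimate, the formula above can be written as a small function. The function names are hypothetical, and the conversion from days to weeks by dividing by 7 is an assumption; the text does not state how the decimal PCW labels (for example PCW5.6) were derived from the day count.

```python
def post_conception_days(crl_mm: float) -> float:
    """Estimated post-conception age in days from crown rump length (mm),
    using the linear formula quoted in the text."""
    return 0.9022 * crl_mm + 27.372


def post_conception_weeks(crl_mm: float) -> float:
    """Age in weeks; dividing days by 7 is an assumption of this sketch."""
    return post_conception_days(crl_mm) / 7.0


# Hypothetical example: an embryo with a CRL of 10 mm.
example_days = post_conception_days(10.0)
example_weeks = post_conception_weeks(10.0)
print(f"CRL 10 mm: {example_days:.1f} days, {example_weeks:.2f} weeks")
```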
Human tissue processing and scRNA-seq data generation
Embryonic hindlimbs were dissected from the trunk under a microscope using sterile microsurgical instruments. Four samples (a hindlimb and a forelimb from both PCW5.6 and 6.1) were then further dissected into proximal, middle and distal thirds prior to dissociation. For the PCW5.1 sample, no further dissection was performed and the limb bud was dissociated as a whole. For all other samples, the limb was dissected into proximal and distal halves prior to dissociation.
Dissected tissues were mechanically chopped into a mash and then digested in Liberase TH solution (Roche, 05401135001, 50 μg/ml) at 37°C for 30-40 min until no tissue pieces were visible. Digested tissues were filtered through 40 μm cell strainers followed by centrifugation at 750g for 5 min at 4°C. Cell pellets were resuspended with 2% FBS in PBS if the embryos were younger than PCW8; otherwise, red blood cell lysis (eBioscience, 00-4300) was performed. The single-cell suspensions derived from each sample were then loaded onto separate channels of a Chromium 10x Genomics single cell 3′ version 2 library chip as per the manufacturer’s protocol (10x Genomics; PN-120233). cDNA sequencing libraries were prepared as per the manufacturer’s protocol and sequenced using an Illumina HiSeq 4000 with 2×150bp paired-end reads.
Mouse tissue sample collection and scRNA-seq data generation
Timed pregnant C57BL/6J wild type mice were ordered from Jackson Laboratories. Embryos were collected at E12.5, E13.5 and E16.5. Only right side forelimbs and hindlimbs were used in this study: n=5 at the E12.5 timepoint, n=5 at E13.5 and n=2 at E16.5. Hindlimbs and forelimbs were pooled separately in ice cold HBSS (Gibco, 14175-095), and dissected into proximal, mid and distal limb regions, which were again separately pooled in 200 μl of HBSS placed in a drop in the centre of a 6 cm culture plate. Tissues were then minced with a razor blade, and incubated with an addition of 120 μl of diluted DNAse solution (Roche, 04716728001) at 37°C for 15 minutes. DNAse solution: 1 ml UltraPure water (Invitrogen, 10977-015), 110 μl 10X DNAse buffer, and 70 μl DNAse stock solution. 2 ml of diluted Liberase TH (Roche, 05401151001) was then added to the plate, and the minced tissue suspension was pipetted into a 15 ml conical centrifuge tube. The culture plate was rinsed with 2 ml, and again with 1 ml, of fresh Liberase TH, which was serially collected and added to the cell suspension. The suspension was incubated at 37°C for 15 minutes, triturated with a P1000 tip, and incubated for an additional 15 minutes at 37°C. Liberase TH solution: 50X stock was prepared by adding 2 ml PBS to 5 mg of Liberase TH. Working solution was made by adding 100 μl 50X stock to 4.9 ml PBS. After a final gentle trituration of the tissue with a P1000 tip, the suspension was spun at 380g in a swinging bucket rotor at 4°C for 5 minutes. After removing the supernatant, cells were resuspended in 5 ml of 2% fetal bovine serum in PBS, and filtered through a pre-wetted 40 μm filter (Falcon, 352340). After spinning again at 380g at 4°C for 5 minutes, the supernatant was removed and cells were resuspended in 200 μl 2% FBS in PBS. A small aliquot was diluted 1:10 in 2% FBS/PBS and mixed with an equal volume of Trypan Blue for counting on a hemocytometer.
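The Liberase TH recipe above implies a working concentration that can be checked with simple arithmetic. The sketch below assumes that the 5 mg dissolves fully without changing the 2 ml volume; the variable and function names are illustrative.

```python
def concentration_ug_per_ml(mass_mg: float, volume_ml: float) -> float:
    """Concentration in micrograms per millilitre."""
    return mass_mg * 1000.0 / volume_ml


# 50X stock: 5 mg of Liberase TH in 2 ml PBS.
stock_ug_per_ml = concentration_ug_per_ml(5.0, 2.0)

# Working solution: 100 ul of stock added to 4900 ul (4.9 ml) of PBS.
stock_volume_ul = 100
pbs_volume_ul = 4900
dilution_factor = (stock_volume_ul + pbs_volume_ul) / stock_volume_ul

working_ug_per_ml = stock_ug_per_ml / dilution_factor
print(f"stock: {stock_ug_per_ml:.0f} ug/ml, "
      f"dilution: {dilution_factor:.0f}-fold, "
      f"working: {working_ug_per_ml:.0f} ug/ml")
```

Under these assumptions the working solution is 50 μg/ml, the same concentration quoted for the human tissue digestion.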
The full suspension was diluted to 1.2 million cells/ml for processing on the 10x Genomics Chromium Controller, with a target of 8000 cells/library. Libraries were processed according to the manufacturer’s protocol, using the v3 Chromium reagents.
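The counting and dilution steps can be sketched as follows. This assumes a standard hemocytometer in which one large square holds 0.1 μl (hence the factor of 10^4), reads the 1:10 dilution as one part in ten total, and ignores the small volume removed for counting. The count of 30 cells per square is a hypothetical value, not data from this study.

```python
def cells_per_ml(mean_count_per_square: float, dilution_factor: float,
                 chamber_factor: float = 1e4) -> float:
    """Concentration of the undiluted suspension from a hemocytometer count."""
    return mean_count_per_square * dilution_factor * chamber_factor


def diluent_to_add_ul(stock_cells_per_ml: float, stock_volume_ul: float,
                      target_cells_per_ml: float) -> float:
    """Volume of buffer to add so the suspension reaches the target concentration."""
    if stock_cells_per_ml < target_cells_per_ml:
        raise ValueError("suspension is already below the target concentration")
    final_volume_ul = stock_volume_ul * stock_cells_per_ml / target_cells_per_ml
    return final_volume_ul - stock_volume_ul


# 1:10 dilution in buffer, then an equal volume of Trypan Blue (a further 2-fold).
counting_dilution = 10 * 2

stock_concentration = cells_per_ml(30, counting_dilution)  # hypothetical count
extra_buffer_ul = diluent_to_add_ul(stock_concentration, 200.0, 1.2e6)
print(f"{stock_concentration:.2e} cells/ml; add {extra_buffer_ul:.0f} ul buffer")
```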
Visium spatial transcriptomic experiments of human tissue
Whole embryonic limb samples at PCW6-8 were embedded in OCT within cryo wells and flash-frozen using an isopentane and dry ice slurry. Ten-micron thick cryosections were then cut in the desired plane, transferred onto Visium slides, stained with haematoxylin and eosin, and imaged at 20X magnification on a Hamamatsu Nanozoomer 2.0 HT Brightfield.
These slides were then further processed according to the 10X Genomics Visium protocol, using a permeabilisation time of 18 minutes for the PCW6 samples and 24 minutes for older samples. Images were exported as tiled tiffs for analysis. Dual-indexed libraries were prepared as in the 10X Genomics protocol, pooled at 2.25 nM and sequenced at 4 samples per Illumina NovaSeq SP flow cell with read lengths 28bp R1, 10bp i7 index, 10bp i5 index, 90bp R2.
Digit region analysis of Visium data
For differential gene expression testing, regions of interest were annotated based on H&E histology, and significant feature analysis performed using the 10X Genomics Loupe Browser 4.1.0, selecting for locally distinguishing features between the two regions. Results were visualised using the violin plot function in Loupe.
Alignment, quantification and quality control of human scRNA-seq data
Droplet-based (10X) sequencing data were aligned and quantified using the Cell Ranger Single-Cell Software Suite (v.2.1.1, 10X Genomics) against the Cell Ranger hg38 reference genome refdata-cellranger-GRCh38-3.0.0, available at: http://cf.10xgenomics.com/supp/cell-exp/refdata-cellranger-GRCh38-3.0.0.tar.gz. The following quality control steps were performed: (i) cells that expressed fewer than 200 genes (low quality) or more than 10,000 genes (potential doublets) were excluded; (ii) genes expressed by fewer than 5 cells were removed; (iii) cells in which over 10% of unique molecular identifiers (UMIs) were derived from the mitochondrial genome were removed.
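The three quality control rules can be illustrated on a toy count matrix. This is a plain-Python sketch rather than the pipeline used in the study: the order in which the filters are applied (cells first, then genes), the "MT-" prefix used to recognise mitochondrial genes, and the toy data are all assumptions, and the thresholds are lowered in the example so that the tiny matrix exercises every rule.

```python
def qc_filter(counts, gene_names, min_genes=200, max_genes=10_000,
              min_cells=5, max_mito_fraction=0.10, mito_prefix="MT-"):
    """Return indices of cells and genes that pass QC.

    counts: list of per-cell lists of UMI counts (cells x genes).
    """
    mito_idx = [j for j, g in enumerate(gene_names) if g.startswith(mito_prefix)]
    kept_cells = []
    for i, row in enumerate(counts):
        n_genes = sum(1 for c in row if c > 0)
        total = sum(row)
        mito_fraction = sum(row[j] for j in mito_idx) / total if total else 0.0
        # (i) too few genes (low quality) or too many (potential doublet)
        if n_genes < min_genes or n_genes > max_genes:
            continue
        # (iii) too large a share of UMIs from mitochondrial genes
        if mito_fraction > max_mito_fraction:
            continue
        kept_cells.append(i)
    # (ii) keep genes detected in at least `min_cells` of the retained cells
    kept_genes = [
        j for j in range(len(gene_names))
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


toy_genes = ["MT-CO1", "A", "B", "C"]
toy_counts = [
    [0, 5, 3, 0],  # passes
    [9, 1, 0, 0],  # 90% mitochondrial UMIs
    [1, 4, 4, 4],  # too many genes for the toy upper bound
    [0, 2, 0, 0],  # too few genes
    [0, 1, 1, 1],  # passes
]
toy_cells_kept, toy_genes_kept = qc_filter(
    toy_counts, toy_genes, min_genes=2, max_genes=3, min_cells=2
)
print(toy_cells_kept, [toy_genes[j] for j in toy_genes_kept])
```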
Alignment and quantification of human Visium data
Raw FASTQ files and histology images were processed, aligned and quantified by sample using the Space Ranger software v.1.0.0, which uses STAR v.2.5.1b52 for genome alignment, against the Cell Ranger hg38 reference genome refdata-cellranger-GRCh38-3.0.0, available at: http://cf.10xgenomics.com/supp/cell-exp/refdata-cellranger-GRCh38-3.0.0.tar.gz.
Alignment, quantification and quality control of mouse scRNA-seq data
Droplet-based (10x) sequencing data were aligned and quantified using the Cell Ranger Single-Cell Software Suite (v.3.0.2, 10x Genomics) against the Cell Ranger mm10 reference genome refdata-gex-mm10-2020-A, available at: https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-mm10-2020-A.tar.gz. The following quality control steps were performed: (i) cells that expressed fewer than 200 genes (low quality) were excluded; (ii) genes expressed by fewer than 5 cells were removed; (iii) cells in which over 10% of unique molecular identifiers (UMIs) were derived from the mitochondrial genome were removed.
Doublet detection of human scRNA-seq data
Doublets were detected with an approach adapted from a previous study157. In the first step of the process, each 10X lane was processed independently using Scrublet to obtain per-cell doublet scores. In the second step, the standard Scanpy processing pipeline was performed up to the clustering stage, using default parameters. Each cluster was subsequently clustered again separately, yielding an over-clustered manifold, and each of the resulting clusters had its Scrublet scores replaced by the median of the observed values. The resulting scores were assessed for statistical significance, with P values computed using a right-tailed test from a normal distribution centred on the score median and a median absolute deviation (MAD)-derived standard deviation estimate. The MAD was computed from above-median values to circumvent zero truncation. The P values were corrected for false discovery rate with the Benjamini–Hochberg procedure, and a significance threshold of 0.1 was imposed. Cells with significant corrected P values were called as doublets and removed.
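The statistical part of this procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the factor 1.4826 used to convert the MAD into a standard deviation estimate, the choice to test every cell's cluster-median score, and the toy scores are all assumptions, and the Scrublet and clustering steps are taken as given inputs.

```python
import math
from statistics import median


def cluster_median_scores(scores, clusters):
    """Replace each cell's score by the median score of its (over-)cluster."""
    by_cluster = {}
    for score, cluster in zip(scores, clusters):
        by_cluster.setdefault(cluster, []).append(score)
    medians = {c: median(v) for c, v in by_cluster.items()}
    return [medians[c] for c in clusters]


def right_tailed_pvalues(values):
    """Right-tailed normal P values centred on the median, with a standard
    deviation estimated from the MAD of above-median values only."""
    centre = median(values)
    above = [v - centre for v in values if v > centre]
    if not above:
        # No spread above the median: call nothing significant.
        return [1.0] * len(values)
    sigma = 1.4826 * median(above)  # assumed MAD-to-SD scaling
    return [0.5 * math.erfc((v - centre) / (sigma * math.sqrt(2.0)))
            for v in values]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_doublets(scores, clusters, alpha=0.1):
    smoothed = cluster_median_scores(scores, clusters)
    adjusted = benjamini_hochberg(right_tailed_pvalues(smoothed))
    return [p < alpha for p in adjusted]


# Toy example: four ordinary clusters and one small high-scoring cluster.
toy_scores = [0.05] * 6 + [0.06] * 6 + [0.07] * 6 + [0.08] * 6 + [0.85, 0.95]
toy_clusters = ["a"] * 6 + ["b"] * 6 + ["c"] * 6 + ["d"] * 6 + ["e"] * 2
toy_flags = call_doublets(toy_scores, toy_clusters)
print(sum(toy_flags), "cells flagged as doublets")
```

Because scores are smoothed to the cluster median before testing, cells are flagged cluster by cluster rather than individually.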
Data preprocessing and integration of human scRNA-seq data
Preprocessing included data normalisation (pp.normalize_per_cell with 10,000 counts per cell after normalisation), log-transformation (pp.log1p), highly variable gene (HVG) detection (pp.highly_variable_genes with batch key (sample identities)), data feature scaling (pp.scale), cell cycle regression (tl.score_genes_cell_cycle and pp.regress_out), and principal component analysis (PCA) (tl.pca with 100 components) performed using the Python package Scanpy (v.1.6.0). bbknn (v.1.3.11) was used to correct for batch effects between sample identities with the following parameters (n_pcs = 70, neighbors_within_batch = 3, trim = 200, metric = “euclidean”).
Data preprocessing and integration of mouse scRNA-seq data
Preprocessing included data normalisation (pp.normalize_per_cell with 10,000 counts per cell after normalisation), log-transformation (pp.log1p), highly variable gene (HVG) detection (pp.highly_variable_genes, selecting for highly correlated ones as described in 133) per batch and merging, data feature scaling (pp.scale), cell cycle regression (tl.score_genes_cell_cycle and pp.regress_out), and principal component analysis (PCA) (tl.pca with 100 components) performed using the Python package Scanpy (v.1.8.2). bbknn (v.1.5.1) was used to correct for batch effects between sample identities with the following parameters (n_pcs = 100, metric = “euclidean”).
Clustering and annotation of human scRNA-seq data
We first performed dimension reduction using Uniform Manifold Approximation and Projection (UMAP) (scanpy tl.umap with default parameters) on 1903 highly variable genes. Next, we applied Leiden graph-based clustering (scanpy tl.leiden with default parameters) to perform unsupervised cell classification. To make sure all expected Leiden clusters could clearly be mapped onto their UMAP embedding coordinates, we performed partition-based graph abstraction (PAGA) (tl.paga with the Leiden clusters) and reran UMAP with the initial position from PAGA. Cluster cell identity was assigned by manual annotation using known marker genes and DEGs computed using a self-designed framework. A detailed description of the method for DEG detection is available on GitHub (https://github.com/ZhangHongbo-Lab/DEAPLOG).
Deconvolution of human Visium data using cell2location
To map cell types identified by scRNA-seq in the profiled spatial transcriptomics slides, we used the cell2location method19. In brief, this involved first training a negative binomial regression model to estimate reference transcriptomic profiles for all the cell types identified with scRNA-seq in the developing limb. Next, lowly expressed genes were excluded as per recommendations for use of cell2location. We then estimated the abundance of cell types in the spatial transcriptomics slides using the reference transcriptomic profiles of different cell types. To identify microenvironments of co-localising cell types, we used the non-negative matrix factorisation (NMF) implementation in scikit-learn, utilising the wrapper in the cell2location package158. A cell type was considered part of a microenvironment if the fraction of that cell type in said environment was over 0.2.
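The 0.2 membership rule can be sketched as below. The interpretation is an assumption: here each cell type's NMF loadings are normalised to sum to 1 across factors, and a cell type joins every microenvironment (factor) in which its share exceeds the threshold. The loadings are hypothetical values, and the function is illustrative rather than part of the cell2location API.

```python
def microenvironment_members(loadings, threshold=0.2):
    """Assign cell types to NMF factors (microenvironments).

    loadings: {cell_type: [loading in factor 0, factor 1, ...]}
    Returns {factor index: [cell types whose share of loading exceeds threshold]}.
    """
    n_factors = len(next(iter(loadings.values())))
    members = {k: [] for k in range(n_factors)}
    for cell_type, row in loadings.items():
        total = sum(row)
        if total == 0:
            continue  # cell type absent from every factor
        for k, value in enumerate(row):
            if value / total > threshold:
                members[k].append(cell_type)
    return members


# Hypothetical loadings for three cell types across three factors.
toy_loadings = {
    "AER": [9.0, 1.0, 0.0],
    "OuterPZC": [6.0, 3.0, 1.0],
    "MyoB1": [0.5, 0.5, 9.0],
}
toy_members = microenvironment_members(toy_loadings)
print(toy_members)
```

Under this reading a cell type can belong to more than one microenvironment, as OuterPZC does in the toy example.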
Alignment and merging of multiple Visium sections
In order to analyse the whole PCW 8.1 human hindlimb, we took three consecutive ten micron sections from different regions and placed them on different capture areas of the same Visium LP slide. The first section spanned the distal femur, knee joint and proximal tibia (sample C42A1), the second the proximal thigh (C42B1) and the third the distal tibia, ankle and foot (C42C1).
The images from these 3 Visium capture areas were then aligned using the TrakEM2 plugin (Fiji)159. Following affine transformations of C42B1 and C42C1 to C42A1, the transformation matrices were exported to an in-house pipeline (/22-03-29-visium-stitch/limb_reconst.ipynb) for complementary alignment of the spot positions from the SpaceRanger output to the reconstructed space. In addition, for regions of overlapping spots we arbitrarily chose to keep the spots from the centre portion (see supp), and the same choice was made for the reconstructed image in order to maintain the one-to-one relationship between the image and the genetic profile. Next, we merged the 3 library files using an in-house pipeline (22-03-29-visium-stitch/N1-join-limb.ipynb) and matched the reconstructed image to the uniform AnnData object.
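Mapping spot positions into the reconstructed space amounts to applying a 2D affine transformation to each coordinate. The sketch below shows only that step; the 2×3 matrix layout, the function name and the example values are assumptions and do not reproduce the in-house notebooks or the export format of the alignment software.

```python
def apply_affine(points, matrix):
    """Apply a 2D affine transform to (x, y) points.

    matrix: ((a, b, tx), (c, d, ty)), mapping
    (x, y) -> (a*x + b*y + tx, c*x + d*y + ty).
    """
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Hypothetical transform: rotate by 90 degrees, then translate by (100, 50).
toy_matrix = ((0, -1, 100), (1, 0, 50))
toy_spots = [(10, 20), (0, 0)]
toy_aligned = apply_affine(toy_spots, toy_matrix)
print(toy_aligned)
```

Applying the same matrix to both the image and the spot coordinates keeps the two in register, which is the property the stitching step relies on.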
Trajectory analysis of human scRNA-seq data
Developmental trajectories were inferred by combining diffusion maps (DM), PAGA and force-directed graph (FDG). The first step of this process was to perform nonlinear dimensionality reduction using DM (scanpy tl.diffmap with 15 components) and recompute the neighborhood graph (scanpy pp.neighbors) based on the 15 components of DM. In the second step, PAGA (scanpy tl.paga) was performed to generate an abstracted graph of partitions. Finally, FDG was performed with the initial position from PAGA (scanpy tl.draw_graph) to visualise the developmental trajectories.
Cell-cell communication analysis of human scRNA-seq data
Cell–cell communication analysis was performed using CellPhoneDB (v.2.1.4) for each dataset at the same stage of development. The stage-matched Visium data were used to validate the spatial distance and expression pattern of significant (P < 0.05) ligand-receptor interactions.
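The combination of a significance threshold with a co-location requirement can be sketched as a simple filter. The record layout below is hypothetical and is not the CellPhoneDB output format; the cell-type pairings, P values and microenvironment sets are made-up values for illustration.

```python
def colocated_significant(interactions, microenvironments, alpha=0.05):
    """Keep significant interactions whose two cell types share a microenvironment.

    interactions: list of dicts with keys 'pair', 'sender', 'receiver', 'pvalue'.
    microenvironments: list of sets of co-located cell types.
    """
    kept = []
    for hit in interactions:
        if hit["pvalue"] >= alpha:
            continue
        if any(hit["sender"] in env and hit["receiver"] in env
               for env in microenvironments):
            kept.append(hit)
    return kept


# Hypothetical inputs.
toy_envs = [{"AER", "OuterPZC"}, {"MyoB1", "MyoC1"}]
toy_hits = [
    {"pair": "FGF8_FGFR2", "sender": "AER", "receiver": "OuterPZC", "pvalue": 0.01},
    {"pair": "FGF8_FGFR2", "sender": "AER", "receiver": "MyoB1", "pvalue": 0.01},
    {"pair": "JAG1_NOTCH1", "sender": "MyoB1", "receiver": "MyoC1", "pvalue": 0.20},
]
toy_kept = colocated_significant(toy_hits, toy_envs)
print([(h["pair"], h["sender"], h["receiver"]) for h in toy_kept])
```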
Enrichment analysis of transcription factors (TFs)
Transcription factor network inference was performed as previously described160 using the pySCENIC Python package (v.0.10.3). For the input data, we filtered out the genes that were expressed in fewer than 10 percent of the cells in each cell cluster. Then, we performed the standard procedure including deriving co-expression modules (pyscenic grn), finding enriched motifs (pyscenic ctx) and quantifying activity (pyscenic aucell).
Integration of human and mouse scRNA-seq data
Processed human and mouse data were merged using an outer join of the orthologs. The matched dataset was then integrated by MultiMAP using the MultiMAP_Integration() function, using separately pre-calculated PCs and the union set of mouse and human highly variable genes. Downstream clustering and embedding were performed as usual and cell-type annotation was based on marker genes. Cell-type composition of proximal, middle and distal segments of the same limb was visualised using the plotly.express.scatter_ternary() function. To capture the differential expression of sparsely captured genes, the odds ratio of the percentages of non-zero cells between groups of cells was used to select for proximal/distal or fore/hind biased genes with a cutoff at 30-fold and 3-fold respectively.
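The odds-ratio selection can be sketched as follows. The exact definition used in the study is not given in the text, so this is one plausible reading: the odds of a gene being detected (non-zero) in one group divided by the odds in the other. The clipping constant that avoids division by zero and the toy expression vectors are assumptions; the gene names are reused from the text only as labels.

```python
def nonzero_fraction(values):
    """Fraction of cells with a non-zero count for a gene."""
    return sum(1 for v in values if v > 0) / len(values)


def odds_ratio(frac_a, frac_b, eps=1e-3):
    """Odds ratio of detection between two groups; fractions are clipped
    to [eps, 1 - eps] to avoid division by zero (an assumption)."""
    a = min(max(frac_a, eps), 1.0 - eps)
    b = min(max(frac_b, eps), 1.0 - eps)
    return (a / (1.0 - a)) / (b / (1.0 - b))


def biased_genes(expr_a, expr_b, cutoff, labels=("a", "b")):
    """Label genes whose detection odds differ by at least `cutoff`-fold."""
    result = {}
    for gene in expr_a:
        ratio = odds_ratio(nonzero_fraction(expr_a[gene]),
                           nonzero_fraction(expr_b[gene]))
        if ratio >= cutoff:
            result[gene] = labels[0]
        elif ratio <= 1.0 / cutoff:
            result[gene] = labels[1]
    return result


# Hypothetical counts for ten proximal and ten distal cells per gene.
toy_proximal = {
    "MEIS1": [1, 2, 1, 3, 1, 1, 2, 1, 0, 0],
    "HOXD13": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "ACTB": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
}
toy_distal = {
    "MEIS1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "HOXD13": [2, 1, 1, 3, 1, 1, 1, 2, 1, 0],
    "ACTB": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
}
toy_biased = biased_genes(toy_proximal, toy_distal, cutoff=30,
                          labels=("proximal", "distal"))
print(toy_biased)
```

Working on detection fractions rather than mean expression makes the comparison less sensitive to the few cells in which a sparsely captured gene happens to have high counts.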
Immunohistochemistry
The lower limbs were post-fixed in 4% PFA for 24 h at 4°C followed by paraffin embedding. After dewaxing, 4 μm thick sections were boiled in 0.01 M citrate buffer (pH 6.0). Immunofluorescent staining was then carried out as described previously161. Primary antibodies for RUNX2 (1:50, Santa Cruz, sc-390715), THBS2 (1:100, Thermo Fisher, PA5-76418), COL2A1 (1:200, Santa Cruz, sc-52658), PITX1 (1:30, Abcam, Ab244308), PAX3 (1:1, DSHB, AB_528426 supernatant), ALDH1A3 (1:50, Proteintech, 25167-1-AP) and MYH3 (1:3, DSHB, AB_528358 supernatant) were incubated overnight at 4°C. After washing, sections were incubated with appropriate secondary antibodies Alexa Fluor 488 goat anti-mouse IgG1 (Invitrogen, A-21121), Alexa Fluor 647 goat anti-mouse IgG2b (Invitrogen, A-21242), Alexa Fluor 488 goat anti-mouse IgG (H+L) (Invitrogen, A-11029) and Alexa Fluor 546 goat anti-rabbit IgG (H+L) (Invitrogen, A-11035) at room temperature for 1 h, and were mounted using FluorSave Reagent (Calbiochem, 345789). For 3,3′-diaminobenzidine (DAB) staining, we used the Streptavidin-Peroxidase broad spectrum kit (Bioss, SP-0022) and DAB solution (ZSGB-BIO, ZLI-9017) following the manufacturers’ instructions. Primary antibodies PI16 (1:500, Sigma-Aldrich, HPA043763), FGF19 (1:500, Affinity, DF2651) and NEFH (1:1000, Cell Signaling, 2836) were applied. Single-plane images were acquired using an inverted microscope (Leica, DMi8).
MSC knockdown in human primary myoblasts
Isolation of human primary myoblast cells
The thighs from human embryos were processed as described162, except that the dissociated cells were not treated with erythrocyte lysis solution, and were incubated with anti-human CD31 (eBioscience, 12-0319-41), CD45 (eBioscience, 12-0459-41) and CD184 (eBioscience, 17-9999-41) antibodies for cell sorting. CD31-CD45-CD184+ cells sorted by fluorescence-activated cell sorting (FACS; BD Influx) were cultured in complete growth medium (DMEM supplemented with 20% FCS and 1% penicillin/streptomycin (Gibco, 15140122)).
siRNA transfection
Human primary myoblasts were seeded into a 6-well plate one night before transfection. When the cell density reached approximately 50% confluence, small interfering RNA (siRNA) oligos against MSC (si-MSC) and negative control (NC) oligos were transfected using Lipofectamine 3000 reagent (Invitrogen, L3000015) at a final concentration of 37.5 nM. After incubation for 16 h, the growth medium was replaced with differentiation medium containing 2% horse serum and 1% penicillin/streptomycin in DMEM. After culturing for an additional 6-8 h, the cells were collected for RNA extraction. Initially three siRNA oligos (Bioneer, 9242-1, 2, 3) were tested, and the third one, with sense sequence 5′-GAAGUUUCCGCAGCCAACA-3′, was used in this study.
RNA extraction and quantitative PCR (qPCR)
Total cellular RNA was extracted with the EZ-press RNA purification kit (EZBioscience, B0004D), and cDNA was synthesised using the PrimeScript RT Master Mix Kit (TaKaRa, RR036A). qPCR was performed using PerfectStart™ Green qPCR SuperMix (TransGen Biotech, AQ601) on a real-time PCR detection system (Roche, LightCycler 480 II). RPLP0 served as the internal control, and relative expression (fold change) was calculated using the formula 2^(−ΔΔCt). The following primers (5′-3′) were used:
RPLP0 forward: ATGCAGCAGATCCGCATGT, reverse: TTGCGCATCATGGTGTTCTT;
MSC forward: CAGGAGGACCGCTATGAGAA, reverse: GCGGTGGTTCCACATAGTCT;
MYOG forward: AGTGCCATCCAGTACATCGAGC, reverse: AGGCGCTGTGAGAGCTGCATTC;
MYH2 forward: GGAGGACAAAGTCAACACCCTG, reverse: GCCCTTTCTAGGTCCATGCGAA;
MYH3 forward: CTGGAGGATGAATGCTCAGAGC, reverse: CCCAGAGAGTTCCTCAGTAAGG;
MYH4 forward: CGGGAGGTTCACACAAAAGTCATA, reverse: CCTTGATATACAGGACAGTGACAA;
TNNT1 forward: AACGCGAACGTCAGGCTAAGCT, reverse: CTTGACCAGGTAGCCGCCAAAA.
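The 2^(−ΔΔCt) calculation referred to above can be sketched as follows. This is a minimal illustration of the standard formula, assuming replicate Ct values are averaged before subtraction. The function name and the Ct values are hypothetical and are not data from this study. Here the "reference" gene would be RPLP0, "treated" would be si-MSC and "control" would be NC.

```python
from statistics import mean


def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^(-ddCt) method."""
    # dCt: target Ct normalised to the reference gene within each condition
    d_ct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # ddCt: treated condition relative to the control condition
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical replicate Ct values, for illustration only
example = fold_change(
    ct_target_treated=[24.0, 24.0, 24.0],
    ct_ref_treated=[18.0, 18.0, 18.0],
    ct_target_control=[26.0, 26.0, 26.0],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(example)  # 4.0
```

In this toy example the target gene crosses threshold two cycles earlier after treatment while the reference is unchanged, so ΔΔCt = −2 and the fold change is 2^2 = 4. The formula assumes roughly 100% amplification efficiency for both primer pairs.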
Extended data figures and legend
Data Availability
All of our newly generated raw data are publicly available on ArrayExpress (mouse scRNA-seq, E-MTAB-10514; human Visium, E-MTAB-10367; human scRNA-seq, E-MTAB-8813). Previously published raw data can be found on the ENCODE portal (ENCSR713GIS) and GEO (GSE137335 and GSE142425). Processed data can be downloaded and visualised at our data portal (https://limb-dev.cellgeni.sanger.ac.uk).
Code availability
All in-house code is available on GitHub (https://github.com/Teichlab/limbcellatlas/).
Author contributions
S.A.T. and H.Z. supervised the project; S.A.T. initiated and designed the project; X.H. and Y.F. carried out human tissue collection; B.W. carried out mouse tissue collection; B.W., L.M., L.B., R.E. and E.F. performed scRNA-seq; E.T. performed Visium spatial experiments; S.W., K.R. and E.T. performed in situ staining and functional experiments; H.Y., C.L. and H.Z. provided experimental support. B.Z., P.H. and J.E.L. analysed sequencing data and generated figures. V.K., K.P., M.P. and N.Y. provided computational support. M.S. and B.J.W. contributed to interpretation of the results. B.Z., P.H., J.E.L., S.W., H.Z. and S.A.T. wrote the manuscript. All authors contributed to the discussion and editing of the manuscript.
Competing interests
In the past three years, S.A.T. has consulted for or been a member of scientific advisory boards at Roche, Qiagen, Genentech, Biogen, GlaxoSmithKline and ForeSite Labs. The remaining authors declare no competing interests.
Acknowledgements
We thank Jana Lalakova for illustrating human limb syndromes. We thank Ken To for proofreading the manuscript. We thank Matt Thomson’s lab for help with mouse 10X loading. We thank members of the Teichmann lab, Zhang lab, Marioni lab, Haniffa lab and Behjati lab for discussion and feedback. This work was supported by the National Key Research and Development Program (grant 2019YFA0801703), National Natural Science Foundation of China (grant 31871370), Science and Technology Program of Guangzhou (grant 202002030429), and Advanced Medical Technology Center, The First Affiliated Hospital, Zhongshan School of Medicine, Sun Yat-sen University (to H.Z.); China Postdoctoral Science Foundation (grant 2021M700936), and Natural Science Foundation of Guangdong (grant 2019A1515011342) (to S.W.). P.H. holds a non-stipendiary research fellowship at St Edmund’s College, University of Cambridge. J.E.L. is funded by the Wellcome Trust under the Clinical PhD Programme.