Morphogenetic coupling leads to pattern emergence in the presomitic mesoderm

Pattern formation in development has been principally studied in tissues that are not undergoing extensive cellular rearrangement. However, in most developmental contexts, gene expression domains emerge as cells re-arrange their spatial positions within the tissue, providing an additional, and seldom explored, level of complexity to the process of pattern formation in vivo. To investigate this issue, we addressed the regulation of TBox expression in the presomitic mesoderm (PSM) as this tissue develops in zebrafish embryos. Here, cells must differentiate in a manner that leads to well-defined spatial gene expression domains along the tissue while undergoing rapid movements to generate axial length. We find that in vivo, mesoderm progenitors undergo TBox differentiation over a broad range of time scales while in vitro their differentiation is simultaneous. By reverse-engineering a gene regulatory network (GRN) to recapitulate TBox gene expression, we were able to predict the populationlevel differentiation dynamics observed in culture, but not in vivo. In order to address this discrepancy in differentiation dynamics we developed a ‘Live Modelling’ framework that allowed us to simulate the GRN on 3D tracking

plored, level of complexity to the process of pattern formation in vivo. To investigate this issue, we addressed the regulation of TBox expression in the presomitic mesoderm (PSM) as this tissue develops in zebrafish embryos. Here, cells must differentiate in a manner that leads to well-defined spatial gene expression domains along the tissue while undergoing rapid movements to generate axial length. We find that in vivo, mesoderm progenitors undergo TBox differentiation over a broad range of time scales while in vitro their differentiation is simultaneous. By reverse-engineering a gene regulatory network (GRN) to recapitulate TBox gene expression, we were able to predict the populationlevel differentiation dynamics observed in culture, but not in vivo. In order to address this discrepancy in differentiation dynamics we developed a 'Live Modelling' framework that allowed us to simulate the GRN on 3D tracking data generated from large-scale time-lapse imaging datasets of the developing PSM. Once the network was simulated on a realistic representation of the cells' morphogenetic context, the model was able to recapitulate the range of differentiation time scales observed in vivo, and revealed that these were necessary for TBox gene expression patterns to emerge correctly at the level of the tissue. This work thus highlights a previously unappreciated role for cell movement as a driver of pattern formation in development.
As an embryo takes shape, two processes must occur in perfect harmony: cell fate decision making and cell movements that drive morphogenesis. At the tissue level, these two processes appear ordered and predictable leading to the emergence of well-defined gene expression domains throughout the embryo. While the role played by gene regulatory networks (GRNs) and morphogens in patterning the embryo has been extensively studied in multiple contexts (1), much less is known about the role of cell movements during this process. Here, we aim to determine how extracellular signals, cell-intrinsic gene-regulatory networks and cell movements act together to generate stable TBox expression domains along the zebrafish presomitic mesoderm (PSM). Within this tissue, cells transition from a tbxta positive progenitor state, upregulate tbx16 as they become specified to the pre-somitic mesoderm, and then switch to a tbx24 postive state as they enter the PSM prior to somitogenesis. Both FGF and Wnt signalling are known regulators of TBox gene expression within the zebrafish PSM (2)(3)(4). Response of the TBox GRN to Wnt and FGF inputs must be tightly coordinated in space, as PSM maturation ultimately directly links with oscillations of Notch signal activity which are known to be important for somitogenesis (5). To assess how TBox gene expression dynamics, Wnt/FGF signal pathway activity, and cell movements act together during PSM maturation we first obtained quantitative measures of each of these components.
To quantify the pattern of TBox expression across the PSM, we performed in situ Hybridisation Chain Reaction (HCR) stains at the 20 somite stage for tbxta, tbx16 and tbx24 mRNA, that show the form spatially well-defined domains along the PSM ( Figure 1A-B; D-G). In order to only capture the PSM expression of tbxta, the notochord and notochord progenitors were masked prior to analysis to exclude them from the measurements ( Figure S1A-D). The normalised mean expression of each of the three TBox transcription factors was plotted on a normalised posterior to anterior axis, between 0 at the posterior end of the PSM and 1 at the posterior boundary of the most recently formed somite (Fig. 1C). Expression was normalised across multiple embryos between 0 and 1, with 0 representing the lowest mean expression, and 1 the highest mean expression. This describes a transition from an initial domain of tbxta/tbx16 co-expression in the PSM, to only tbx16, followed by only tbx24 expression in the anterior-most end of the tissue.
We next sought to quantify the spatial distribution of Wnt and FGF signalling activity, as these signals have been demonstrated to have a role in spatiotemporal regulation of somite formation. Wnt/TCF signalling was spatially quantified using a 7XTCF promoter fusion to a GFP reporter (6), with HCR performed against the gfp mRNA to give an immediate readout of the reporter activity ( Figure 1H). FGF activity was quantified using an antibody against diphosphorylated ERK, as a readout of FGF ligand binding the FGF receptor ( Figure 1I). These signalling profiles across the PSM were plotted along the same normalised axes as the TBox transcription factors and show a posterior to anterior gradient of Wnt, followed by an overlapping domain of FGF ( Figure 1J) While these profiles describe the gene expression domains' spatial arrangements, it is important to note that the rate of cell rearrangement varies significantly along the PSM. In particular, previous studies have characterised this tissue as undergoing a fluid-to-solid transition close to the tbx16/tbx24 expression boundary (7,8), where cell rearrangements quickly go from being extensive to practically non-existent. As a consequence of extensive cell mixing in the posterior progenitor zone, individual cells must somehow control their differentiation dynamics to match the timing of entry into the anterior PSM. To directly visualise the position at which the fluid-tosolid transition occurs, a number of photolabels were placed in the posterior, mid and anterior PSM using the photoconvertible nuclear protein, KikumeGR (9-11) ( Figure 1K-M). Over the (Continues on the next page) Figure 1: (From the previous page) Using HCR, the domains of gene expression for the three TBox transcription factors (A) Tbxta and (B) Tbx16 and Tbx24 (also known as Tbx6) were identified. (C) These were quantified along a normalised PSM length where 0 is posteriormost PSM and 1 is the posterior boundary of the most recently formed somite. Normalised expression is taken by mean intensity of each HCR signal divided by the maximum mean value from multiple embryos. Line plotted is the mean fitted curve taken from multiple fixed samples. (D-G) Zoom in of multiplex HCR showing all three TBox transcription factors in the posterior as a 3D rendering of a confocal image and 2µm slice. Wnt signalling was quantified using (H) a 7XTCF::GFP reporter with HCR used against gfp mRNA to give an immediate readout of transcription. (I) FGF signalling was quantified using an antibody against diphosphorylated ERK and (J) plotted on the same normalised axis. Using (K) small round labels placed along the length of the PSM using a photoconvertible nuclear protein, Kikume-GR, the degree of label spread (L-M) can be observed over time. Most spread is observed in the posterior label, with the least spread in the mid-and anterior label. (N) labels placed in the most posterior region of the PSM demonstrate that (O-P) the label spreads significantly over 90 minutes with the (P) first cell entering a newly formed somite in 3 hours. Plots in C and J show trend line compiled from 9 embryos. Labels in K-M representative of 2 labelled embryos. Labels in N-P representative of 4 embryos. course of two hours, the posterior label becomes fully intermixed with unlabelled cells, whereas the middle and anterior labels remain unmixed, showing that these labelled cells travelled anteriorly as a single cohort (Supplementary Movie 1). Labels such as these can also me used to infer the minimum length of time taken by a mesoderm progenitor to differentiate into mature mesoderm and form a somite. We placed smaller posterior labels in the progenitor zone of a 20 somite stage zebrafish embryo and measured how long before the first labelled cell reached a somite. The fastest cell from these posterior labels reproducibly entered the most recently formed somite in three hours, while the rest of the cells remain spread throughout the PSM ( Figure 1N-P) revealing that progenitors differentiate into mature PSM over a broad range of time scales. We assume that the longest time period over which a cell can take to differentiate is at least six hours as a small proportion of labelled cells are still located in the progenitor domain many hours after labelling. Together, these labels provide the range of timescales over which a cell can transition through the PSM (Figure 2A).
If we assume that the fastest cell traverses the PSM at constant speed, taking the most direct route from posterior to anterior in three hours ( Figure 2B), we can translate space into time and approximate its gene expression and signalling dynamics from the previously obtained quantitative gene expression and signalling profiles ( Figure 1C; 1J) where the normalised PSM length on the x-axis now represents three hours of development. Our labelling experiments have shown, in agreement with previous studies, that cells spend varying amounts of time in the progenitor region, and progress towards the somites at a relatively constant pace once they have entered the PSM. To account for the slower differentiation dynamics of cells which remain longer within the posterior tailbud, we assume that they experience the high Wnt, low FGF and express high tbxta/tbx16 levels while they are in the tailbud ( Figure 2D). Using these generalised gene expression and dynamics, we set out to reverse engineer a GRN model to address the question of how spatio-temporal coordination of cell fate decisions may be regulated in this system.
We formulated a GRN model of TBox transcription factors and their external signals Wnt and FGF in terms of a dynamical system (See Materials and Methods). Gene products of the three TBox genes considered (tbxta, tbx16 and tbx24) constitute the network's nodes and are represented by the state variables of the system, while the interactions between the genes and from the external signals (Wnt and FGF) are represented by model parameters ( Figure 2C).
As we have seen, mesoderm progenitors take anywhere between three and six hours to make their way into a somite from the tailbud, expressing a stereotypical progression of TBox genes ( Figure 1A-C; Figure 2A-B). A suitable GRN will therefore be able to tune the onset of TBox differentiation according to differing lengths of time spent in the tailbud. We used a Markov Chain Monte Carlo (MCMC) algorithm to infer candidate networks which recapitulate the range of TBox gene expression dynamics observed in vivo. Out of 100 independent optimisation runs, 56 networks were able to recapitulate the expression dynamics of the full spectrum of differentiation timescales observed in progenitor cells ( Figure 2D). All 56 networks share the sign of more than half of their parameters (13/24), and pairwise comparison between the networks shows that on average they differ by three interactions. Although the networks do not form clear distinct clusters ( Figure S2) they resemble each other in topology. We decided to select a network for further study by filtering based on the sign of three network interactions that are very well-established in the literature: the activation of tbx24 by Tbx16 (2,12), the activation of tbx16 by FGF (3,13) and the repression of tbx24 by Wnt (12) ( Figure 2C, Table S1).
In order to understand how the inferred network controls the onset of differentiation in response to Wnt and FGF signalling dynamics we turned to phase space analysis. The network has been formulated as a non-autonomous dynamical system to accommodate the signalling dynamics, which causes the underlying phase portraits to also be dynamic. We use instantaneous phase portraits (14,15) at every time point to characterise how the changing dynamical regime shapes the cells' trajectories (Supplementary Movie 2-5). We find that regardless of the time spent in the progenitor state (tailbud) the progression of dynamical regimes is always the same: cells start at a high tbxta/tbx16 stable steady state, then a bifurcation at Wnt = 0.98 and FGF = 0.22 annihilates this steady state, releasing the trajectory which makes its way towards low tbxta/tbx16 values, before converging towards a newly appeared stable steady state at high tbx24 values. According to the model, the variation in differentiation timescales between different cells stems from the different amounts of time spent at the initial stable steady state, which is sustained by the initial concentrations of (high) Wnt and (low) FGF. However, the model also predicts that this steady state will disappear, and therefore that differentiation will begin, upon exposure of the cells to small amounts of FGF (FGF = 0.36) while still in a high Wnt signalling environment (Wnt = 0.98), suggesting that the observed signalling dynamics are not strictly required for the correct up-regulation of tbx24 ( Figure 2E) once a relatively low threshold of FGF has been met. This threshold in the embryo is already encountered by cells at 11% posterioranterior (the tip of the tailbud being 0% and the most posterior somite boundary 100%) position.
Note that a 11% posterior-anterior position is well embedded in the tailbud region.
The observation that posterior progenitor cells should differentiate into tbx24 positive paraxial mesoderm in the absence of exposure to the full dynamics of Wnt and FGF signals is surprising given the known role of FGF in regulating PSM development and maturation. To test this prediction experimentally, we made use of an in vitro culture method that has previously been shown so support the autonomous oscillations of the segmentation clock (16). We explant the posterior 30% of the progenitor zone at the 18 somite stage and cultured them as single cells in a fully defined L15 medium without any signal supplementation ( Figure 3E-F). Our assumption  (From the previous page.) (C) Using these timescales and patterns of gene expression in space, a reverse engineered gene regulatory network was produced. This model was demonstrated to reproduce (D) the gene expression pattern across a range of timescales between the fastest and slowest assumed time scales. The signalling dynamic of (E) FGF is shown to be essential for initiating the differentiation of progenitors to tbx24 postitive, with only a small increase in FGF, up to 36% of the maximum in vivo, required to initiate differentiation is that once removed from their signalling environments, cells' Wnt and FGF levels will remain constant, and that, in line with model predictions, only those cells that have already experienced the FGF threshold will differentiate. Those cells below the threshold of FGF will remain in a progenitor state. To determine the activity of Wnt and FGF pathways in dissociated cells we first performed qPCR against axin2 and sprouty4 ( Figure 3A Figure  3H) qPCR profiles for both tbx16 and tbx24, particularly over the first four hours post dissociation with more discrepancy at six hours, possibly due to increased cell death. If we now turn to look at single cell expression dynamics in the populations of in silico-dissected tailbud progenitors, the population-level model predicts, in agreement with the model of the generalised cell, that a bi-modal distribution will emerge reflecting the fact that some cells differentiate (upregulate tbx24 and down-regulatetbx16), namely those that have been exposed to FGF levels above the threshold, while others don't (do not up-regulate tbx24 and maintain tbx16), namely those that haven't been exposed to the minimum FGF threshold, and that those that do differentiate will do so synchronously ( Figure 3I; 3K). This same synchronous differentiation was We hypothesised that the cell movements driving PSM elongation might themselves regulate when cells surpass the minimal FGF threshold required for differentiation -thereby creating a feedback between tissue morphogenesis and gene regulatory networks from which pattern at the tissue level could emerge. To explore this idea, we next aimed to determine how our GRN model would predict Tbox gene expression when simulated in the real morphogenetic context of PSM developement. We generated 3D cell tracking datasets in which the movements of individual cells within PSM are followed for three hours ( Figure 4A). Using multi-photon imaging, the tracks of single cells could be generated using automatic nuclei tracking, with a track ac-   In addition, we propose that the fluid-to-solid transition (8) might be responsible for the observed heterogeneity in the progenitor domain, causing a minority of differentiated cells to be transiently pushed backwards as the PSM solidifies. Therefore, tissue morphogenesis acts as a direct regulator of pattern formation, both in terms of enabling coordinated differentiation towards the anterior, as well as generating the observed gene expression heterogeneity within posterior progenitors. This heterogeneity might play a role in the maintenance of uncommitted progenitors, or as our work seems to suggest, might not be functional.
Across the presomitic mesoderm (PSM), a posterior to anterior gradient of FGF has been demonstrated to coordinate the spatiotemporal organisation of PSM cells into somites in the anterior, by interactions with Notch Signalling (16,(18)(19)(20)(21). In addition to this, FGF has also been demonstrated to regulate the early EMT process which generates the mesenchymal population of progenitor cells within the posterior of the PSM. This mesenchyme has also been demonstrated to be responsible for elongating the embryonic axis (3,7). Finally FGF signalling has been demonstrated to regulate cell fate decisions of mesodermal progenitor cells (3,12). FGF is therefore able to regulate cell movements, tissue morphogenesis and also cell fate decisions, however how one signal is able to be accurately interpreted in such a variety of ways within a single tissue remains unclear. Our model proposes that the function of FGF in regulating PSM differentiation may be limited only to the posterior most region of the PSM, perhaps allowing for a spatial compartmentalisation of its function to allow for multiple different cell responses to a single signalling profile. Alternatively, single cell dynamic profiles of FGF pathway activity may be essential for providing more precise interpretation to control multiple aspects of cell function (22) . Further work is required to incorporate precise spatial manipulation of FGF activity across the PSM, within the context of simultaneously monitoring the impact on cell movements through the use of Live Modelling approaches.
The observation that cells in vitro differentiate simultaneously is highly reminiscent of the differentiation of stem cells when responding to external signal sources in culture. Here too, cells often undergo direct differentiation, without experiencing the range of dynamic differentiation profiles that we have seen are required to generate gene expression patterns within a 3D morphogenetic context. Our results provide insight into the manner in which the cell movements of morphogenesis themselves control the temporal regulation of differentiation, via displacing cells relative to the sources or sinks of extracellular signals. Such a mechanism represents an example of downward causation (23) (information flow from the tissue to the single cell level i.e. in the global rate of tissue morphogenesis and accompanying cell rearrangements) and illustrates its central role patterning the embryo. Further investigation will be key in order to improve our understanding of how gene expression patterns emerge within developing multi-cellular systems, and will help identify which control parameters, at various levels of biological organisation, must be modified and how in order to reliably bioengineer organoids for therapeutic applications.

In Situ Hybridisation Chain Reaction (HCR)
Embryos were raised to the required stage then fixed in 4% PFA in DEPC treated PBS without calcium and magnesium at 4 • C overnight. Embryos were then stained using HCR following the standard zebrafish protocol found in (26). Probes, fluorescent hairpins and buffers were all purchased from Molecular Instruments. After staining, samples were counterstained with DAPI and mounted under 80% glycerol.

Immunohistochemistry
Embryos were raised to the required stage then fixed in 4% PFA in DEPC treated PBS without calcium and magnesium at 4 • C overnight. Embryos were then blocked in 3% goat serum in 0.25% Triton, 1% DMSO, in PBS for one hour at room temperature.

Imaging and Image Analysis
Samples were imaged using a Zeiss LSM700 inverted confocal at 12 bit, 20X or 40X magnification, with an image resolution of 512x512 pixels. Single cell HCR was imaged using a Nikon Ti inverted widefield microscope at 63X magnification.
Image analysis of confocal images was done using the line drawing tool on Fiji (27,28) set to a width of 50 pixels. Lines were drawn following the curve of the embryo, through the centre of the PSM from posterior PSM to the posterior most clear somite boundary. Profiles were normalised to the length of the PSM and signal intensity as individual embryos by dividing the measured value by the maximum value of that embryo.
Nuclear segmentation of whole embryos stained using HCR was conducted using a tight mask applied around the DAPI stain using Imaris (Bitplane) with a surface detail of 0.5µm.
Touching surfaces were split using a seed size of 4µm. Values were exported as X, Y, Z coordinates relative to the posteriormost tip of the PSM where X, Y, Z were equal to (0, 0, 0). The PSM was then segmented by hand by deleting nuclear surfaces outside of the PSM, including notochord, spinal cord, anterior somites and ectoderm. Only the PSM closest to the imaging objective, therefore of highest imaging quality was measured with the distal PSM also removed.
Intensity mean values of each transcription factor HCR signal were exported and normalised between 0 and 1 by dividing each cell's mean signal intensity by the maximum measured within that sample, per gene. PSM length was normalised individually between 0 and 1 by division of the position in X by the maximum X value measured in each embryo.
Single cell image analysis was conducted using Imaris (Bitplane) by generating loose surface masks around the DAPI stain to capture the full nuclear region and a small region of cytoplasm. Surface masks were then filtered to remove any masks where two cells joined together or small surfaces caused by background noise, or fragmented apoptotic nuclei. The intensity sum of each channel was measured and normalised by the area of the surface, as surface area and transcript intensity had been demonstrated to correlate. Expression level was then normalised between 0 and 1 using the maximum value measured for each gene, in each experiment.
Live imaging datasets of the developing PSM was created using a TriM Scope II Upright 2-photon scanning fluorescence microscope equipped Insight DeepSee dual-line laser (tunable 710-1300 nm fixed 1040 nm line). Embryo was imaged with a 25X 1.05 NA water dipping objective. Time step and frame number as per figure legend. Embryos laterally in low melting agarose with the entire tail cut free to permit normal development (29).

Model formulation
We formulated the Tbox gene regulatory network using a dynamical systems formulation. The models aim is to recapitulate the dynamics of Tbox gene expression for any cell, or rather a general cell, in the developing zebrafish PSM. We use a connectionist model formulation previosly used to model other developmental patterning processes (30).
The mRNA concentrations encoded by the Tbox genes tbxta, tbx16 and tbx24 are represented by the state variables of the dynamical system. For each gene, the concentration of its associated mRNA a at time t is given by g a (t). mRNA concentration over time is governed by the following system of three coupled ordinary differential equations: where R a and λ a respectively represent the rates of mRNA production and decay. φ is a sigmoid regulation-expression function used to represent the cooperative, saturating, coarse-grained kinetics of transcriptional regulation and introduces non-linearities into the model that enable it to exhibit complex behaviours: where G = {tbxta, tbx16 , tbx24 } is the set of Tbox genes while S = {Wnt, FGF} is the set of external regulatory inputs provided by the Wnt and FGF signalling environments. The concentrations of the external regulators g s are interpolated from quantified spatial mRNA expression data ( Figure 1J) and translated into time as explained in the main text to used as dynamic inputs to the model. Changing Wnt and FGF concentrations over time renders the parameter term s∈S E sa g s (t) time-dependent and therefore the model non-autonomous (14,15).
The interconnectivity matrices W and E house the parameters representing the regulatory interactions among the TBox genes, and from Wnt and FGF to the Tbox genes, respectively.
Matrix elements w ba and e sa are the parameters representing the effect of regulator b or s on target gene a. These can be positive (representing an activation from b or s onto a), negative (repression), or close to zero (no interaction). h a is a threshold parameter representing the basal activity of gene a, which acknowledges the presence of regulators absent from our model.
Model parameters are detailed in Table S1.

Model fitting and selection
We reverse-engineered values for parameters R a , λ a , W , E, and h a by fitting the model to the inferred gene expression dynamics of a cell that took 6 hours to enter a somite from an initial progenitor state. Priors were set on 4 of the parameters: parameters representing autoactivations must be positive as should be the parameter representing the regulation of tbxta from Wnt, to accommodate the ample evidence supporting the nature of this interaction in the literature (4). No other constraints were imposed on the parameters.
Markov Chain Monte Carlo was used to infer the parameter values that best recapitulated the experimental data. We used the emcee package for Python and ran the fitting algorithm independently 100 times. Each run returned an estimated 'best' a parameter set. The 100 parameter values were then simulated, compared to the data and re-simulated for cells that took 3, 4 and 5 hours respectively to undergo differentiation across the PSM from a progenitor state.
Of the 100 initial networks, 56 recapitulated accurately the gene expression dynamics all the different scenarios.
We selected a network for our analysis by further imposing the sign of three interactions which are strongly supported by the literature. These are the activation of tbx24 by Tbx16 (2,12), the activation of tbx16 by FGF (3,13) and finally the inhibition of tbx24 by Wnt (12) . These interactions weren't initially imposed as priors in order to not further bias the fitting, but were introduced at this stage to help with model selection to avoid choosing a network with interactions that might be knowingly wrong in sign. It is well known that networks with different topologies are often able to generate the same dynamics (31), making it difficult to differentiate between models without the aid of further evidence.
Out of these 56 networks, only one satisfied these three conditions imposed a posteriori: network 11. In order to investigate how similar and therefore representative C11 is compared to the other 56 networks we used clustering of the 56 networks ( Figure S2) demonstrating that they do not cluster well, and in fact are all broadly similar to one another.

Model analysis
The gene regulatory network model was formulated as a system of coupled ordinary equations.
As such, it is amenable be analysed using the established tools and concepts from dynamical systems theory which we use to elucidate the mechanisms driving the observed dynamics (31) In addition, our system is rendered non-autonomous by the changing signalling Wnt and FGF environments, which we accommodate by calculating instantaneous phase portraits (14,15).

Live modelling framework
In brief, our live-modelling framework simulates the gene regulatory network previously in-