## ABSTRACT

Economic efficiency has been a popular explanation for how networks organize themselves within the developing nervous system. However, the precise nature of the economic negotiations governing this self-organization remains unclear. We approach this problem by combining high-density microelectrode array (HD-MEA) recordings, which allow for detailed characterization of the ongoing extracellular electrical activity of individual neurons *in vitro*, with a generative modeling approach capable of simulating network formation. The best fitting model uses a homophilic generative wiring principle in which neurons form connections to other neurons with similar connectivity patterns to themselves. This homophily-based mechanism for neuronal network emergence accounts for a wide range of observations that are described, but not sufficiently explained, by traditional analyses of network topology. Using rodent and human monolayer and organoid cultures, we show that homophilic generative mechanisms account for the topology of emerging cellular functional connectivity, representing an important wiring principle and determining factor of neuronal network formation *in vitro*.

## INTRODUCTION

During mammalian brain development, neuronal networks show remarkable self-organization that gives rise to complex topological properties, including a greater-than-random clustering and modular community structure^{1,2}, hierarchies^{3}, heavy-tailed connectivity distributions^{4,5}, and richly interconnected hubs^{6,7}. These distinctive characteristics likely endow neuronal networks with robustness and the capability to support dynamic functional computations^{8,9}.

Neuronal network development can be characterized across multiple scales^{10,11}. At the cellular level, neurons form computational units within circuits. Here, the role of individual neurons can be determined by a combination of factors, such as their laminar location, connectivity, neurochemical sensitivities and morphology^{12}. During embryonic development, a series of spatiotemporally defined, genetically hardwired programs regulate the expression of cell-type specific recognition molecules to initiate axonal and dendritic outgrowth, which ultimately leads to the formation of synapses^{13–15}. Although there is now a large body of evidence on the mechanisms of specific guidance cues during circuit formation^{16}, linking this knowledge to the emergence of complex topological features remains challenging.

At the whole-brain level in humans^{17}, connectivity between brain regions can be inferred via diffusion tensor imaging (DTI)^{18,19}, as myelinated axonal connections, or functional magnetic resonance imaging (fMRI)^{20}, as correlated patterns of activity. Inferring connectomes from fetal brains *in utero*^{21}, or from preterm infants^{22}, has confirmed the early presence of organizational hallmarks, such as hubs, a rich-club architecture and a modular small-world organization. Building on such architectures, studies demonstrate that the later functional role and organization of brain regions is shaped by their inter-regional connectivity and that a region’s inputs during development influence its functional specialization^{23}. This principle allows brain regions to undergo a spatially organized functional shift, for example, from distinct sensory and motor systems to more integrated connections with association cortices, likely supporting an acceleration in cognitive development^{24}.

There is growing evidence that some key organizational properties of nervous systems are conserved across scales, and, in some cases, across species^{10,11,25–29}. Nervous systems both at the macro- and micro-scale, for example, have been shown to entail a canonical pattern of small-worldness^{30,31}, a rich-club topology^{32–34} and a modular structure^{1,2,35}. These complex organizational hallmarks allow for functional hierarchies, in which distinct segregated modules perform specialized local computations reflecting basic representational features of incoming signals, while their intermediary nodes integrate those signals to code for a more complex representation of the incoming signal^{36}.

One prominent explanation for these consistent organizational hallmarks is that they reflect the economics of forming and maintaining connections^{37–39}. Given finite available resources, trade-offs between incurred costs (e.g. material, metabolic) and functional value have to be made continually by all distributed units to ensure optimal network function. In this view, ideas such as Peters’ rule^{40}, which suggests that synaptic contacts simply occur if neurons are close enough in space and if their axons and dendrites overlap, are not sufficient to explain the existence of a connection^{41,42}. Although spatial embedding clearly has an impact on network architecture^{43}, costly features, such as long-range connectivity hubs, likely exist because they confer some additive functional value beyond the cost of their formation and maintenance^{37}. These principles apply not only to the connectivity between brain regions, but also to the cellular and subcellular level^{9,17,44}.

If an economic trade-off represents an important principle that guides network development, then *what are the specific mechanisms that determine the outcome of that trade-off*? Furthermore, are these mechanisms conserved across spatial scales? Advances in generative network models (GNMs) provide a formal way of testing competing mechanistic accounts of network formation^{45–57}. These computational models simulate the probabilistic formation of networks over time under specific mathematical rules. For example, recent work at the whole-brain level has shown, in a large dataset of neurodiverse children, that structural inter-regional connectivity can be simulated with a GNM^{46} which uses a simple economic wiring equation balancing connection costs with topological value^{45–47}. These two components together define the probability of connections forming iteratively over time. However, the extent to which such models reflect underlying biological processes remains unclear due to the indirect nature of *in vivo* imaging.

An alternative approach to studying neurodevelopmental mechanisms is to culture cellular networks *in vitro*. This allows for the tracking and modeling of the very early events of network formation. High-density microelectrode arrays (HD-MEAs) provide a *direct* window into the cellular processes that occur as individual neurons form connections, and in time, complex functional connectivity networks^{58}. Today’s HD-MEAs contain large numbers of densely packed electrodes, capable of longitudinal measurements of the extracellular electrical activity of developing neuronal networks at very high temporal and spatial resolution^{59,60}. The spontaneous electrical activity, inherent to developing neurons *in vitro*, resembles the patterns of activity that accompany key neurodevelopmental processes occurring during prenatal and early postnatal development *in vivo*, including the refinement of synaptic connectivity^{61}.

In the current study, we translate prior GNM research at the level of inter-regional connectivity^{45–52} to the microscale (functional connectivity between single neurons) to test whether generative wiring principles are recapitulated across spatial scales. Using HD-MEAs, we acquired electrophysiological network recordings of the spontaneous extracellular neuronal activity from populations of developing primary cells (PCs) derived from dissociated embryonic rodent cortices, three different lines of human induced pluripotent stem cell (iPSC)-derived neurons and more mature sliced human embryonic stem cell (hESC)-derived cerebral organoids (hCOs). Facilitated by the high spatial recording resolution of the HD-MEAs used (center-to-center electrode distance of 17.5 μm; >3000 electrodes/mm^{2}), we resolved the activity of several thousand individual neurons and tracked the development of emerging functional connectivity over several weeks.

Using large-scale HD-MEA datasets acquired from developing rodent and human neuronal networks, we comprehensively compared the performance of different GNMs, and probed whether they can account for emerging neuronal network topology over development. We explicitly examine the effect of neuronal plating density on topology and – by introducing a novel way of considering the local hallmarks of regional connectivity (coined *topological fingerprints*) – test whether GNMs are capable of recapitulating the local organizational properties of the observed biological neuronal networks. Finally, by blocking GABA_{A} receptors, we probe whether GABAergic signaling impacts wiring stochasticity in the network. We argue that homophilic wiring mechanisms may reflect a fundamental principle shaping the development of neuronal networks *in vitro*, and are likely to extend across scales and species in which local structures are refined via activity-dependent interactions over time.

## RESULTS

### Tracking developing neuronal networks at single-cell resolution

The datasets of this study comprised recordings of rodent and human neuronal networks that were plated and maintained using established protocols^{60,62}. Our primary analysis focused on primary rat embryonic day (E) 18/19 cortical cultures (PCs), which were plated at two different plating densities (sparse PC networks: 50,000 cells per well, n=6 cultures; dense PC networks: 100,000 cells per well, n=12 cultures) and used to follow neuronal network development across several weeks *in vitro*. We also analyzed human induced pluripotent stem cell (iPSC)-derived neuron/astrocyte co-cultures, containing predominantly glutamatergic, dopaminergic and motor neurons (plated at a density of 100,000 cells per array), and sliced human cerebral organoids (hCOs; n=6 slices). For the complete details of the datasets analyzed in the study, see **Methods; Cell lines and plating protocol**; a summary of all the data used is provided in **Supplementary Table 1**.

High-density microelectrode arrays^{59} (HD-MEAs, 3.85 × 2.10 mm^{2} sensing area; 26,400 electrodes; 17.5 μm center-to-center electrode pitch) were used to record the emerging spontaneous neuronal activity of neuronal cultures and to track developing rodent PC neuronal networks at single-cell resolution. We acquired whole-array activity scans to localize the neurons (**Figure 1b**), and then selected up to 1024 readout electrodes, configured into 4×4 electrode high-density blocks (**Figure 1d, e**), at the respective recording start points (i.e., days *in vitro* (DIV) 7 for the sparse PC networks and DIV14 for the dense PC networks). Recordings with the same electrode configuration were also acquired at the consecutive recording time points and concatenated for spike sorting^{63}. Spike-sorted network data allowed us to assign the extracellular electrical activity to individual neurons and to follow single neurons across development (**Figure 1e**; see also **Methods; Spike-sorting and post-processing**). The developed approach allowed us to follow single neurons over several weeks *in vitro* (on average 243±37 units for the sparse PC networks and 125±29 units for the densely plated PC networks; mean±S.D.; **Supplemental Figure 1**). In line with previous work, we found that PC neuronal networks developed robust network burst activity (**Figure 1c, g**) and that the firing rate of tracked units increased significantly in the first weeks of development (repeated-measures analysis of variance (RANOVA): F(3,12)=7.02, p=5.62×10^{−3}; n=6 sparse PC networks; **Figure 1f**).

To infer functional connectivity among neurons statistically and to characterize neuronal network development of tracked neurons over time, we computed the Spike Time Tiling Coefficient (STTC)^{64} – a robust measure of pairwise correlations between spike trains (**Figure 1h-j**, see **Methods; Functional connectivity inference**). Overall STTC increased with development (F(3,12)=11.82, p=6.77×10^{−4}, 50k PC cultures; **Figure 1i**), as did the network density of inferred functional connectivity graphs (F(3,12)=11.08, p=8.97×10^{−4}, n=6 sparse PC networks; **Supplementary Figure 2a**). The probability of inferred STTC connections decayed with inter-neuronal distance (**Supplemental Figure 2b**).

### Generative network models of functional neuronal networks in vitro

Following functional connectivity inference, we set out to describe the topology of these networks using graph theory, which provides a mathematical framework for capturing the topological properties of each node within the network, and the network as a whole. In **Figure 2a** we highlight three common topological measures (nodal degree, clustering, and betweenness centrality) and one geometric measure (edge length) that we will use throughout the current paper (see **Methods; Network statistics** for more details^{7}). In **Figure 2b** we show how these statistics allowed us to compute the node-wise statistics for each functional connectivity graph, and to establish the distribution of different statistics across the network.
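The four node-wise statistics named above can be sketched for a binary, undirected graph as follows. This is a minimal standard-library illustration, not the analysis code of this study: the data layout (`adj` as a dictionary of neighbour sets, `coords` as a dictionary of positions) is an assumption, and edge length is taken here as the summed Euclidean length of a node's connections, which may differ from the definition in the Methods.

```python
from collections import deque
from math import dist


def node_statistics(adj, coords):
    """Degree, clustering, betweenness and summed edge length for every node.

    adj:    dict mapping node -> set of neighbours (undirected, binary graph)
    coords: dict mapping node -> (x, y) position
    """
    nodes = list(adj)
    degree = {v: len(adj[v]) for v in nodes}

    # Clustering: fraction of a node's neighbour pairs that are themselves connected.
    clustering = {}
    for v in nodes:
        nb = list(adj[v])
        k = len(nb)
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        clustering[v] = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0

    # Betweenness centrality via Brandes' algorithm (unweighted shortest paths).
    betweenness = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0.0)
        sigma[s] = 1.0
        depth = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in depth:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints in an undirected graph.
    betweenness = {v: b / 2.0 for v, b in betweenness.items()}

    edge_length = {v: sum(dist(coords[v], coords[w]) for w in adj[v]) for v in nodes}
    return degree, clustering, betweenness, edge_length
```

Collecting each returned dictionary's values across nodes gives the per-network distributions that are later compared between observed and simulated graphs.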

Although graph theoretic measures provide a way to mathematically formalize the topology of networks, they do not provide an explanation as to what topological attachment principles may have shaped network development. To do this, we tested a number of candidate wiring rules that may best explain the self-organization of cellular-level functional connectivity graphs over time. This was done using a generative network model, which was previously used to probe whole-brain network organization^{45–49}. Generative network models develop *in silico* networks according to an economic trade-off, in which new connections are iteratively formed depending on both the modeled costs and values (**Figure 2c**). The generative algorithm is expressed as a simple wiring equation^{36–38}, which is updated over time:

$$P_{i,j} \propto (D_{i,j})^{\eta} (K_{i,j})^{\gamma} \qquad (1)$$

where the *D*_{i,j} term represents the “costs” incurred between neurons, modeled as the Euclidean distance between tracked neurons *i* and *j* (**Supplementary Figure 3**). The *K*_{i,j} term represents how single neurons *i* and *j* “value” each other, given by an arbitrary topological relationship which is postulated *a priori* (also termed the “wiring rule”, given mathematically in **Supplementary Table 2**). *P*_{i,j} reflects the probability of forming a fixed binary connection between neurons *i* and *j*. This is proportional to the parametrized multiplication of costs and values. Two wiring parameters, η and γ, respectively parameterize the cost and value terms, calibrating the relative influence of the two terms over time. We detail the generative network algorithms used in **Methods; Generative network model**. By iterating through different *K*_{i,j} terms and wiring parameter combinations, we can formally assess how variations in the generative model give rise to synthetic networks that are statistically similar to those experimentally observed (**Figure 2d**). To assess this similarity, the first test comes in the form of an energy equation^{36–38}, which computes the Kolmogorov-Smirnov (KS) distance between the observed and simulated distributions of individual network statistics. It then takes the maximum of the four KS statistics considered so that, for any one simulation, no KS statistic is greater than the energy:

$$\mathrm{Energy} = \max(KS_{k}, KS_{c}, KS_{b}, KS_{d}) \qquad (2)$$

where *KS* is the Kolmogorov-Smirnov statistic between the observed and simulated networks at the particular η and γ combination used to construct the simulation, defined by the network degree *k*, clustering coefficient *c*, betweenness centrality *b* and Euclidean edge length *d*. These four measures are critical statistical properties of realistic networks, have been used in prior GNM research to benchmark model fits^{45–47,56}, and have featured within well-documented simulated network models^{65–67}.
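To make the two quantities described above concrete, the sketch below computes normalized connection probabilities from a cost matrix and a value matrix, with the probability proportional to the distance raised to η times the value raised to γ, and then computes the energy as the largest two-sample KS statistic across the four network statistics. It is a hedged illustration only: the function names, the small offset `eps` added to the value term, and the dictionary keys `"k"`, `"c"`, `"b"`, `"d"` are assumptions made for this example, and the algorithm actually used is described in the Methods.

```python
from bisect import bisect_right


def wiring_probabilities(D, K, eta, gamma, connected=frozenset(), eps=1e-5):
    """Normalised P_ij over unconnected pairs i < j, with P_ij ~ D_ij^eta * K_ij^gamma.

    D, K:      square nested lists of distances (costs) and topological values
    connected: set of (i, j) pairs, i < j, that already hold a connection
    eps:       illustrative offset (an assumption) keeping the value term positive
               so that negative gamma does not divide by zero
    """
    n = len(D)
    scores = {}
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) not in connected:
                scores[(i, j)] = (D[i][j] ** eta) * ((K[i][j] + eps) ** gamma)
    total = sum(scores.values())
    return {pair: s / total for pair, s in scores.items()}


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    values = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in values
    )


def energy(observed, simulated):
    """Maximum KS statistic over degree (k), clustering (c), betweenness (b), edge length (d)."""
    return max(ks_statistic(observed[name], simulated[name]) for name in ("k", "c", "b", "d"))
```

In a full simulation, one connection would be drawn from these probabilities, the value matrix recomputed on the updated graph, and the step repeated until the simulated network has as many connections as the observed one; a negative η then penalizes long connections.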

For each empirical network, we simulated 20,000 networks across a wide parameter space (with η and γ limits −10 to 10) across 13 wiring rules (**Supplementary Table 2**), for each network across all available time points. In the present study, we used this wide parameter space as there is little prior work guiding our choice of parameters; we also did not select a seed network. A Voronoi tessellation procedure^{47} was used as the parameter selection algorithm (see **Methods; Parameter selection** for details).

### Homophilic wiring principles underpin developing rodent neuronal networks in vitro

Previous studies employing generative models of human macroscopic structural brain organization have shown that generative rules based on homophilic attachment mechanisms can achieve very good model fits^{45–47,49} (although see^{50}). *But what do homophily-based wiring rules essentially entail?* Homophily-based wiring prioritizes the wiring of nodes preferentially to those with overlapping connectivity patterns (e.g. via neighborhoods or connectivity profiles). For example, under a matching generative model^{46,47}, if two nodes have a large proportion of their connections with the same nodes, they will have a correspondingly high matching score because they have similar connectivity profiles. This matching score is *homophilic*, because the measure is defined in terms of similarity (the Greek *homós*, “same”) and preference (*philia*, “liking”). To test what generative models can best simulate microscale connectivity, we applied the generative procedure to inferred cellular functional connectivity graphs.
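A minimal sketch of a matching score between two nodes is given below, written as the overlap of their neighborhoods (excluding each other) divided by the union of those neighborhoods. This normalized-overlap form is an assumption for illustration; the exact expression used for the matching rule in this study is the one listed in Supplementary Table 2.

```python
def matching_index(adj, i, j):
    """Normalised overlap between the neighbourhoods of nodes i and j.

    adj maps each node to the set of its neighbours (undirected, binary graph).
    The nodes i and j themselves are excluded, so a direct i-j connection
    does not contribute to their similarity.
    """
    ni = adj[i] - {j}
    nj = adj[j] - {i}
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


# Nodes 0 and 1 are not connected to each other but share both of their neighbours.
example = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
```

With this toy graph, `matching_index(example, 0, 1)` is 1.0 because the two nodes have identical connectivity profiles, whereas `matching_index(example, 0, 4)` is 0.5 because only one of the two distinct neighbors is shared.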

We first investigate the sparse (50,000 cells per well) PC rodent networks at days *in vitro* (DIV) 7, 10, 12 and 14. As previously shown (**Figure 1**), PC rodent networks underwent significant developmental changes during this time period. Yet, despite large topological changes (**Supplementary Figure 2**), generative models utilizing the homophilic attraction principle as their generative mechanism produce networks with the lowest energy, compared to all other rules, consistently beyond 10 days (**Figure 3a**, ANOVA DIV7 p=0.943, DIV10 p=5.18×10^{−3}, DIV12 p=1.72×10^{−12}, DIV14 p=6.32×10^{−11}; all pairwise comparisons after DIV10 with homophily, Cohen’s *d*=0.747 reflecting a large effect size; **Supplementary Table 3**). The single best performing homophily model, according to the energy equation, was the ‘matching model’ (see **Supplementary Table 2** for detail), which generates network topology according to the overlapping connectivity between nodes (**Supplementary Figure 4**). In **Supplementary Figure 5** we provide a null distribution of energy values that were fitted to density- and size-matched random graphs, to show this finding is specific to the topology of the network at DIV14.

The matching generative model, beyond providing the lowest energy values (i.e. very good model fits compared to other models), also produced synthetic networks whose aggregate nodal distributions were statistically indistinguishable from the experimentally observed networks (**Figure 3d**). We formally demonstrated this using a Monte-Carlo bootstrapping procedure^{68} which directly compares the statistics produced by the well-performing simulations with the observations (degree, p_{rank}=0.410; clustering, p_{rank}=0.505; betweenness, p_{rank}=0.800; edge length, p_{rank}=0.075; for detail of this procedure, see **Methods; Cost functions**). In **Supplementary Figure 6** we provide the same bootstrapping analysis, but for each of the best performing generative models of each model class (spatial, clustering average and degree average models). Of note, the best performing non-homophily model was the degree average model, which also produced statistically indistinguishable results when compared to the experimental observations.

Next, we asked *how well generative models approximate the time-course/trajectories of neuronal network formation*. An advantage of the generative modeling approach is that it allows one to decompose the developmental trajectory. Indeed, if networks are developing according to a homophilic attachment principle, then the statistical properties of those simulated trajectories should vary in accordance with our longitudinal observations. To test this, we computed and compared the trajectories of two global network measures of segregation (modularity index Q^{2}) and integration (global efficiency^{69}) across the time-course of sparse rodent PC network development. Of note, neither of these measures was included in the energy equation (**Equation 2**). Measures of segregation and integration capture important aspects of how efficiently information can be processed across the network^{70}. Next, we selected the best fitting model at DIV14, and decomposed the simulated trajectories up to that point. This allowed us to test whether these simulated trajectories were consistent with the earlier longitudinal observations at DIV7, 10 and 12.

To do this, we compared each of the longitudinal observations (DIV7, DIV10, DIV12 and DIV14) to the simulation at the corresponding time-point of the DIV14 developing simulation (i.e. DIV7, 50%; DIV10, 71%; DIV12, 86% and DIV14, 100%). **Figures 3e, f** show the developmental trajectories for modularity and global efficiency, both unspecified in the energy equation, along with the overlaid observed time-points. Simulations using the homophily generative model clearly captured the same developmental trend for modularity (decreasing over time) and efficiency (increasing over time) and accounted for a substantial amount of variance in both metrics (modularity: R^{2}=50.9%, r=0.713, p=9.19×10^{−5}; efficiency: R^{2}=79.8%, r=0.893, p=4.36×10^{−9}).
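The percentages quoted above follow from dividing each recording day by the final day of the simulation (7/14, 10/14, 12/14 and 14/14). The short sketch below reproduces this arithmetic and, under the assumption that progress through a simulation is measured in connections added, maps each time point to a step of the simulated trajectory. The function names and the edge count of 400 in the usage note are purely illustrative.

```python
def simulation_percentages(divs, final_div):
    """Percentage of the final-DIV simulation that corresponds to each recording day."""
    return {div: round(100 * div / final_div) for div in divs}


def simulation_checkpoints(divs, final_div, n_edges_final):
    """Number of simulated connections at which each recording day is compared.

    Assumes (for illustration) that simulation progress is counted in added connections.
    """
    return {div: round(n_edges_final * div / final_div) for div in divs}


percentages = simulation_percentages([7, 10, 12, 14], final_div=14)
```

For a simulation that ends with 400 connections, `simulation_checkpoints([7, 10, 12, 14], 14, 400)` would compare DIV7 against the simulated network after 200 connections and DIV14 against the complete simulation.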

### Effect of plating density on topology and generative model outcome

So far, we quantified functional connectivity graphs derived from sparse PC neuronal cultures (50,000 cells plated per well). Despite some research on the effect of plating density on the emergence of population activity *in vitro*^{71}, synaptic strength and connectivity^{72}, there is currently no consensus as to how different plating densities affect neuronal topology. As one critical element of the generative network model is the geometric spacing between neurons, we next probed whether our findings in sparse cultures generalize to networks at higher neuronal plating densities. We, therefore, recorded a second independent dataset of more densely plated rodent PCs (100,000 neurons per well, n=12; see **Methods; Cell lines and plating protocol**) in the exact same way as outlined for the first PC rodent dataset and directly compared both densities at DIV14.

Interestingly, we found that key topological properties of the networks differed as a function of plating density, such that the sparser plated PC networks showed lower network density (Mann-Whitney U test; p=9.70×10^{−3}), efficiency (p=0.0245), edge lengths (p=1.30×10^{−3}), and matching (p=0.0320), but greater small-worldness (p=1.30×10^{−3}) as compared to the dense PC networks (**Figure 4a** and **Supplementary Figure 7**). Despite these changes across several topological metrics, the global correlational structure of these statistics remained stable (**Figure 4b**).

Given the topological differences between the sparse and dense plating densities, we then asked whether these also translated into significant changes in the energy values among the 13 tested generative network models. In **Figure 4c** we show that the pattern of model energies is unaffected by plating density, with homophily rules again producing the lowest energy (p=1.96×10^{−21}, all comparisons to homophily Cohen’s *d*=1.59). In **Supplementary Figure 8**, we show this broken down by each individual model, in addition to showing that this remains the case when considering greater numbers of well-performing parameter combinations. All statistical comparisons, for each time-point in the dense PC networks, are presented in **Supplementary Table 4**, showing stability (not only at DIV14, but also at DIV21 and 28). In **Figure 4d** we show the energy landscape for both plating densities, which, again, are very similar.

### Topological fingerprints arise from homophilic mechanisms in developing neuronal networks in vitro

The results presented so far show that homophily-based generative models produce synthetic networks which are statistically similar to observed functional rodent PC networks. However, this similarity depends upon the maximum *KS* distance of the four topological statistics as defined in the energy equation. Crucially, this means that while experimentally observed and simulated network statistical distributions mirror each other at the *global* level, the *topological fingerprint* (TF) of these network statistics could differ. That is, nodes within simulated and observed networks could have different *local* relationships to one another, because node-wise local organizational properties are not captured per se by the existing energy equation. Local organizational properties have previously been investigated in terms of how well generative models can recapitulate the locations of organizational features, such as hub-nodes, in the *C. elegans* connectome^{73} or MRI-inferred human brain networks^{46,49,50}.

For example, consider the topological relationship between central and peripheral nodes commonly found in the classical representation of a brain network. Nodes which score highly in centrality measures (e.g. *betweenness centrality* – which determines how many shortest paths pass through a node – as shown by the red node in **Figure 5a, left**) tend *not* to sit within segregated modules, meaning it is common that they concurrently score low in measures of segregation (e.g. clustering coefficients, in which neighbors connect to each other) – and *vice versa* for peripheral nodes (as shown by the green node in **Figure 5a, left**). This means that when correlating measures of centrality with measures of clustering across a network, the correlation tends to be negligible or negative^{36} (**Figure 5a, right**).

To assess the ability of generative models to capture these types of local relationships in settings with no anatomical reference space (as neurons are randomly distributed on the HD-MEAs), we provide a very simple cost function, here termed *topological fingerprint dissimilarity* (*TF*_{dissimilarity}). The *TF*_{dissimilarity} quantifies the ability of *in silico* network simulations to recapitulate observed local hallmarks of organization, and is defined as:

$$TF_{dissimilarity} = \sqrt{\sum_{i}\sum_{j}\left(TF_{observed,\,i,j} - TF_{simulated,\,i,j}\right)^{2}} \qquad (3)$$

*TF* is defined by the n-by-n correlation matrix of n local network statistics for the observed network (*TF*_{observed}) and its corresponding (simulated) network (*TF*_{simulated}). The *TF*_{dissimilarity} is thus equivalent to the Euclidean norm^{74} of the difference between observed and simulated topological correlation matrices. Here, we use six common measures of topology to compute the *TF* matrix (see **Methods; Cost functions** for detail).
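The definition above can be sketched in a few lines: each network's TF is the correlation matrix between its node-wise statistics, and the dissimilarity is the Euclidean norm of the element-wise difference between the two matrices. The sketch assumes Pearson correlations and plain nested lists; the function names are illustrative and the six statistics actually used are specified in the Methods.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def topological_fingerprint(stats):
    """n-by-n correlation matrix of n local statistics.

    stats is a list of n sequences, each holding one value per node of the network.
    """
    n = len(stats)
    return [[pearson(stats[a], stats[b]) for b in range(n)] for a in range(n)]


def tf_dissimilarity(tf_observed, tf_simulated):
    """Euclidean norm of the difference between two TF matrices."""
    return sqrt(
        sum(
            (o - s) ** 2
            for row_o, row_s in zip(tf_observed, tf_simulated)
            for o, s in zip(row_o, row_s)
        )
    )
```

Because each TF matrix has one row and column per statistic rather than per node, observed and simulated networks can be compared without any node-to-node correspondence, which suits cultures that lack an anatomical reference space.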

If homophily is a plausible attachment mechanism by which single neurons together form networks, we should expect homophily-based GNMs to produce networks with a local topological structure resembling the observed data. To probe this (dis)similarity, we calculated the *TF*_{dissimilarity} between each experimentally inferred functional connectivity graph and the best performing simulated network (according to the energy equation), for each of the 13 generative models, across all recording time points (**Figure 5**).

**Figure 5a** provides a schematic of how *topological fingerprints* (TFs) were constructed. Results demonstrate that synthetic networks generated with homophilic attachment rules provide the lowest *TF*_{dissimilarity} from DIV10 onwards (**Figure 5b**). Homophilic rules resulted in the statistically smallest *TF*_{dissimilarity} (e.g. at DIV14; homophilic rules compared to degree rules: p=3.48×10^{−4}, Cohen’s *d*=1.55; homophilic compared to clustering rules: p=3.80×10^{−9}, Cohen’s *d*=2.33). Of note, homophily and spatial rules could be distinguished significantly at DIV12 (p=0.0106) but not at other time-points (e.g. at DIV14; p=0.157). All statistical findings are provided in **Supplementary Table 5**. Replicate analyses in the dense PC rodent dataset at DIV14, 21 and 28 provided almost identical results (see **Supplementary Table 6** and **Supplementary Figure 9**). The top panel in **Figure 5c** shows the experimentally observed *TF* matrix (averaged over n=6 sparse PC networks); the lower panel shows average TF matrices for the matching, clustering-average, degree-average and spatial generative models. Depicted are the best performing models within their generative rule class.

Our results highlight the importance of assessing GNM simulation performance both in terms of overall global topology (*energy*) and in terms of the local topology generated (*topological fingerprint, TF*). We find that homophily models concurrently outperform the other models on both fronts (**Figure 5d**, see **Supplementary Figure 10** for a replication analysis in the dense rodent PC dataset).

### Effect of GABA_{A} receptor antagonism on generative network wiring

GABAergic interneurons act as network hubs^{75–77} regulating the synchronization of spontaneous activity that is critical for formation of connections and plasticity throughout development as well as proper brain function later on^{75,77–79}. Here, we examined whether the perturbation of GABA_{A} receptors would alter formation of functional connectivity, and thereby the outcome of the generative models. To examine this, we cultured sparse rodent PC cultures (50,000 cells per well) under chronic application of gabazine, a selective GABA_{A} receptor antagonist (see **Methods; Pharmacological experiments**).

If activity-dependent mechanisms, and spontaneous activity more generally, serve to form functional networks according to wiring principles such as homophily, functional GABAergic interneurons may therefore be key to the implementation of homophily throughout development. By chronically blocking GABA_{A} receptors, relative to the control cultures, we hypothesized that either homophily would subsequently fail to be implemented, or its implementation would change, leading to differences in network topology. The first hypothesis would be supported by higher energy levels of homophilic rules in gabazine cultures relative to control homophilic rules or compared to other generative rules. The latter hypothesis would be supported by changes in the wiring parameters of a well-fitting homophilic model.

In **Figure 6a** we show representative population activity plots for control and gabazine-treated cultures. Whilst activity levels of control cultures were comparable to previous studies^{80}, chronic gabazine application led to a reduction of the overall size and density of networks, as well as multifaceted changes in their spiking patterns (**Supplementary Figure 11a**). This included an increase in burst rate (Mann-Whitney U, p=0.0238; see **Methods; Firing rate and burst statistics**) and a decrease in the variability in interburst intervals (IBIs) as shown by a reduction in the coefficient of variation (CV) of IBIs (p=0.0238; **Supplementary Figure 11b**). This is consistent with the notion that GABA_{A} receptors regulate spiking activity and synchrony between neurons in the network, which was disrupted by GABA_{A} receptor blockade. The global topology of their subsequently inferred functional connectivity graphs was also affected by gabazine (**Supplementary Figure 11c**). For example, control cultures exhibit more connections (p=0.0238) and greater small-worldness (p=0.0238).

Crucially, despite these alterations in cellular activity and network topology, homophily generative attachment rules remained the best fitting models for both gabazine and control cultures, showing no statistical difference (p=0.450; **Figure 6b** and **Supplementary Figure 12a**). Despite this similarity in generative model outcome, we find that in gabazine cultures, the γ parameter (which varies the extent to which homophily influences the probability score) is significantly decreased (p=2.16×10^{−3}, Cohen’s *d*=2.27; **Figure 6c, Supplementary Figure 12b-c**). This finding supports our second hypothesis, suggesting that, while there is a conservation of generative mechanisms despite chronic GABA_{A} receptor antagonism by gabazine, wiring parameters are changed in a direction that weakens the influence of homophilic attachment mechanisms on the development of topology.

Next, we sought to investigate the direct effect of this decreased γ parameter on simulated development. As network development is based on dynamically updating stochastic processes, wiring parameters influence the extent to which outcomes are determined. This is because as wiring parameters trend towards zero, the probability score distributions (calculated dynamically within the wiring equation, *P*_{ij}) flatten, meaning that the subsequent trajectory of developmental connectivity becomes less certain (see **Methods; Generative probability distributions**). In **Figure 6d** we show empirically that GABA_{A} antagonism leads to a flattening in probabilistic wiring (mean comparisons over development; p=1.94×10^{−55}, Cohen’s *d*=0.452; **Supplementary Figure 12d-e**) leading to a necessarily more aberrant, random, neuronal network topology. This is because network wiring becomes more evenly distributed across the network between larger numbers of possible neurons with less specificity, rather than being specific to a smaller number of candidate neurons that are deemed particularly valuable to wire with.
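
The flattening effect can be illustrated with a minimal sketch. It assumes the multiplicative form of the wiring equation used in prior GNM work, *P*_{ij} ∝ (*D*_{ij})^{η}(*K*_{ij})^{γ}, and holds the distance term fixed so that only γ varies. The homophily values, the function names and the use of Shannon entropy as a flatness measure are illustrative assumptions, not the analysis used for **Figure 6d**.

```python
import math
import random


def wiring_probabilities(k_values, gamma):
    """Normalise K**gamma into a probability distribution over candidate edges."""
    scores = [k ** gamma for k in k_values]
    total = sum(scores)
    return [s / total for s in scores]


def entropy_bits(p):
    """Shannon entropy in bits; larger values mean a flatter distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


rng = random.Random(0)
# Hypothetical homophily values K_ij for 200 candidate edges (illustrative only).
k_values = [rng.uniform(0.05, 1.0) for _ in range(200)]

for gamma in (2.0, 1.0, 0.5, 0.0):
    p = wiring_probabilities(k_values, gamma)
    print(f"gamma={gamma}: entropy={entropy_bits(p):.3f} bits, max P={max(p):.4f}")
```

As γ approaches zero, every candidate edge receives the same score, so the entropy approaches its maximum (log2 of the number of candidate edges) and no candidate is preferred over another.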

### Generative wiring principles are preserved in human neuronal cultures in vitro

Finally, we probed whether homophilic wiring principles can be generalized across neuronal networks derived from other species and cultures composed predominantly of specific cell types (see below). In order to test this, we first applied GNMs to human iPSC-derived neuron/astrocyte co-cultures and analyzed these networks at DIV28, a time point at which cultures reach a state of relative maturity^{60}. Note that these data were not tracked, and electrodes were selected based on the activity scan at DIV28. The human dataset comprised glutamatergic neurons (GNs, n=8), motor neurons (MNs, n=7), and dopaminergic neuronal cultures (DNs, n=6). In addition, we also studied slice cultures derived from 4-month-old human embryonic stem cell-derived cerebral organoids (hCOs, n=6 slices; **Figure 6a**; see **Methods; Human induced pluripotent stem cell-derived neuronal cultures** and **Methods; Human cerebral organoid slice cultures**). Previous studies have indicated that hCOs develop functional networks with increasing complexity as early as 90 days *in vitro*^{81,82}.

**Figure 7a** provides an overview of the human and rodent neuronal electrophysiological data. Following a t-distributed stochastic neighbor embedding (tSNE) analysis, we find that networks can be clustered according to their overall spike train dynamics (i.e. autocorrelograms derived from the aggregated spike train activity of each network) and group according to the respective cell lines. Representative examples of the observed differences in population activities across different human neuron cultures are depicted in **Figure 7d**. **Figure 7b** shows immunohistochemical stainings of a DIV21 human iPSC-derived DN culture expressing neuronal and astrocytic markers (MAP2, GFAP, and TH) and **Figure 7c** shows stainings for hCO slices (Tau, NeuN, and GFAP), both derived from control experiments. As for the rodent PC cultures, we constructed functional connectivity graphs for all human neuron cultures, and assessed key connectivity metrics and topology (**Figure 7e-f**). Human iPSC-derived neuronal networks did not differ significantly in average STTC (ANOVA, p=0.0912); however, we found a significant difference in network density (p=1.02×10^{−4}) and topological metrics, such as the small-world index (p=1.51×10^{−4}).

Despite differences in observed network burst dynamics (**Figure 7a, d**), we find that homophily achieves the best model fits across all human iPSC lines, followed by degree wiring rules, clustering, and spatial wiring rules (**Figure 7h**, left; all statistical findings are provided in **Supplementary Tables 7 and 8**). The comparison between homophily and the next best model class (degree) remains statistically significant with moderate to large effect sizes across all cell lines: GNs (p=1.16×10^{−5}, Cohen’s *d*=0.881), MNs (p=9.48×10^{−5}, Cohen’s *d*=0.821), and DNs (p=0.0219, Cohen’s *d*=0.642). There was also a significant difference between all generative models for the hCOs (**Figure 7f**, right; p=6.88×10^{−5}), but no significant difference between model classes (compared to homophily, all p>0.159). Still, both homophily models achieved the lowest energy in hCO networks (neighbors median energy = 0.230; matching median energy = 0.238), followed by the degree-average (median energy = 0.245) and degree-maximum (median energy = 0.245) models. The spatial model performed worst (median energy = 0.400). In **Supplementary Figure 13** we provide a depiction of each generative model’s performance for all monolayer and organoid cultures.

In summary, among the models we have examined, generative network models based on homophilic wiring mechanisms provide the best candidate explanation for the formation of *in vitro* functional connectivity graphs derived from human neuronal cultures. While results in more mature monolayer networks mirror the findings in rodent PCs, results in hCOs are as yet inconclusive, likely due to the observed variability in organoid functional connectivity.

## DISCUSSION

In the current study we applied HD-MEA large-scale electrophysiological recordings to track and characterize single-unit functional connectivity as neural networks develop *in vitro*. Moreover, we systematically tested which candidate topological attachment mechanisms could explain this developing self-organization, using generative network modeling to create *in silico* models of network formation. Across multiple different cell types (e.g. glutamatergic, dopaminergic, and motor neurons), two different species (rodent and human neurons), two plating densities, and various time scales, we show that the homophilic attachment principle^{45} provides a good explanation for the formation of complex networks *in vitro*.

In line with previous work, we found that functional connectivity increased with development, and that developing neuronal networks *in vitro* exhibited canonical characteristics of complex network architecture^{58,83}. A central aim of our work was to examine which attachment rules would best account for this developing topology. At the macroscopic level, a consistent result in whole-brain connectivity research has been that homophily-based generative models best recapitulate the topology of human brain connectomes^{45–49}. Here, we show that the same is true for the functional connectivity graphs of developing neuronal networks *in vitro*. Crucially, by combining multiple ways of estimating model fits, we show that homophilic wiring principles can not only account for the distributions of topological statistics at the *global* level, but also for the complex *local* organization of these statistics, as quantified by their *topological fingerprints*.

*Why does homophily offer a dominant account of topology, beyond alternative models?* One possible explanation is that homophily is, by definition, a preference for similarity to *oneself*, implying that it is a *locally-knowable* computation undertaken by distributed nodes. This is likely critical, as any generative mechanism by which complex neurobiological networks develop is likely to emerge from the interactions between their local components over time^{23} – without any central mechanism aiming to optimize global network properties. Instead, the network likely arises as a function of the sensed inputs available to each node, such as via spatial proximity or its communicable topology^{84} as investigated in the present study. Importantly, this communication may extend to other means of (local or non-local) cell-to-cell interactions^{85}, such as via para-, juxta- or endocrinological signaling, that together guide neural circuit construction *in vivo*^{86,87}.

A second potential explanation is that a homophily heuristic – much like in social networks^{88,89} – enables each part of the network to interact with its local environment without requiring inordinate computational resources. Indeed, homophily has been shown to provide an efficient trade-off capable of producing navigable small-world networks^{90,91}. Under this view, as limits to *local-knowledge* and *computational capacity* hold for any interacting developing system, homophily becomes a generative heuristic for any sufficiently large network. Notably, this resonates with accounts of Hebbian learning^{92,93} and spike-timing dependent plasticity (STDP)^{94} whereby neurons wire with each other as a function of *similarity*^{95} to themselves (e.g. concurrent or temporally precedent neuronal firing, respectively) provided that neurons are sufficiently close in space.

The application of generative modeling allows us to use graph theory and *in silico* simulations as a lingua franca to probe micro-connectomic self-organization^{10}. Comparative studies have examined economic accounts of connectomic organization across different species^{17} – such as in the worm *C. elegans*^{32,73,96}, larval zebrafish^{29}, mouse^{97}, macaque^{56} and human connectome^{45–52}. For example, Nicosia, *et al.* ^{73} modeled the growth of *C. elegans* using the known birth times of its somatic neurons, finding that as the body of the animal progressively elongates, the cost of longer-distance connections becomes increasingly penalized. In humans, Oldham, *et al.* ^{49} incorporated known early changes in brain macroscopic geometry and other physiological measures of homophily (e.g. correlated gene expression) to improve an additive generative model’s network embedding (also see^{50}). These works have highlighted the benefit of incorporating organism-specific developmental changes within a growth model that can simulate developmental outcomes. Our present work shows that homophilic generative models *per se* are appropriate growth models for *in vitro* neuronal networks, as they were capable of recapitulating key statistical properties at both the local and global level. However, as noted in prior GNM studies^{46,49}, a significant future advance will come from weighted generative network models capable of recapitulating weighted topological architectures. Such an approach would not only allow for the tuning of connection weights over developmental time – a clear principle of network maturation^{96} – but also enable further study of how developing network topology, genetics^{49,50} and information processing^{56,98} together explain neuronal network organization across scales.

A key advantage of our *in vitro* neuronal network modeling approach is that it can be perturbed under varying experimental conditions, testing how specific cellular mechanisms may change the outcome of generative models in a predictable fashion. In line with previous findings^{79}, the inhibition of GABA_{A} receptors led to increased neuronal synchronization^{99}. Within our wiring equation, the two wiring parameters η and γ influence to what extent costs and topological homophily, respectively, shape wiring probabilities. Importantly, this wiring is probabilistic – the greater in magnitude the wiring parameters, the more that networks are “determined” by their economic negotiations. In the gabazine condition, the homophily generative model provided an equivalently good model fit, suggesting a conservation of the predominant generative mechanism. However, gabazine did significantly push the homophily γ parameter closer to zero. One interpretation is that increased synchronicity leads to decreased specificity in neuronal wiring preferences, simply because synchronicity leads to decreased differences between neurons in terms of their activity. As such, a decreased γ within a well-fitting homophily generative model simply reflects a decreased capacity for specific homophilic wiring. This explanation would suggest that probabilistic network wiring becomes less determined in any condition that decreases specificity between neurons. Interestingly, previous GNM work at the whole-brain scale has shown that lower magnitude wiring parameters are associated with poorer cognitive scores^{46}, age^{46,47} and a diagnosis of Schizophrenia^{45,48}. This may suggest convergent evidence for how developmental randomness, intrinsic to how developing parts interact with each other, may influence functional outcomes^{100}. 
A remaining challenge in the field is to directly parse the extent to which stochasticity versus specific economic trade-offs may, together or independently, influence network outcomes under different conditions. Future methodological work should explore how various multiplicative^{45–47} or additive^{49} GNMs may be used to understand these shaping factors.

Importantly, it has to be noted that there is currently no consensus as to how functional connectivity can be inferred from the spontaneous activity of neurons developing *in vitro*^{101}. In the present study, we utilized the spike time tiling coefficient (STTC^{64}) which was developed to improve on some of the limitations of traditional metrics for the coupling between neurons, such as the correlation index^{102}. Nevertheless, future studies should probe whether statistically more sophisticated inference methods for functional or effective connectivity between neurons, such as transfer entropy-based methods^{103,104}, converge on the same GNM attachment rules. Ideally, these methods should be able to account for the burst activity observed *in vitro*^{105,106} and address potential confounds by temporal autocorrelations^{11}.

In conclusion, we find that the complex topology of developing rodent and human neuronal networks *in vitro* can be best simulated by a simple homophily generative model, where neurons aim to maximize locally-shared connectivity within an economic context. With this, and prior research at the macroscopic level in mind, we suggest that homophily wiring rules provide an adequate isomorphic explanation for any decentralized, locally-computing, developing system.

## METHODS

### High-density microelectrode arrays

Two types of CMOS-based high-density microelectrode array (HD-MEA) recording systems, produced by MaxWell Biosystems (Zurich, Switzerland), were used in the present study^{59,107}. The first was the single-well HD-MEA MaxOne, which consists of 26,400 low-noise electrodes with a center-to-center electrode pitch of 17.5 μm, arranged in a 120 x 220 electrode array structure. This HD-MEA can record simultaneously from a total of 1024 (user-selected) readout-channels at 20 kHz; for more technical details see previous studies^{59,107}. The second recording system was the multi-well HD-MEA MaxTwo (MaxWell Biosystems, Zurich, Switzerland), comprising the same number of electrodes and readout-channels and electrode specifications as MaxOne for each well. With this system it is possible to simultaneously record from six wells at a time and at a sampling rate of 10 kHz. To decrease the impedance and to improve the signal-to-noise ratio (SNR), electrodes were coated with platinum black^{59}.

### Rodent primary cortical neuronal cultures

Before plating, HD-MEAs were sterilized in 70% ethanol for 30 minutes and rinsed three times with sterile water. To enhance cell adhesion, the electrode area of all HD-MEAs was treated with poly-D-lysine (PDL, 20 μL, 0.1 mg mL^{−1}; A3890401, Gibco, ThermoFisher Scientific, Waltham, USA) for 1 hour at room temperature and then rinsed three times with sterile water. Next, 10 μL Geltrex (A1569601, Gibco, 0.16 mg mL^{−1}) was pipetted on each array and again left for about one hour at room temperature. For the main analysis of the paper, we used rodent primary cortical (PC) neurons prepared as previously described^{43}. Briefly, cortices of embryonic day (E) 18/19 Wistar rats were dissociated in trypsin with 0.25% EDTA (Gibco), washed after 20 min of digestion in plating medium (see below), and triturated. Following cell counting with a hemocytometer, either 50,000 cells (sparse plating condition) or 100,000 cells were seeded on each array, and afterwards placed in a cell culture incubator for 30 min at 37°C/5% CO_{2}. Next, plating medium was added carefully to each well. The plating medium contained 450 mL Neurobasal (Invitrogen, Carlsbad, CA, United States), 50 mL horse serum (HyClone, Thermo Fisher Scientific), 1.25 mL Glutamax (Invitrogen), and 10 mL B-27 (Invitrogen). After two days, half of the plating medium was exchanged with growth medium containing 450 mL D-MEM (Invitrogen), 50 mL horse serum (HyClone), 1.25 mL Glutamax (Invitrogen) and 5 mL sodium pyruvate (Invitrogen). Across all experiments, the medium was then exchanged twice a week, at least one day before the recording sessions. All animal experiments were approved by the veterinary office of the Kanton Basel-Stadt, and carried out according to Swiss federal laws on animal welfare. A summary of the data used is provided in **Supplementary Table 1**.

### Human induced pluripotent stem cell-derived neuronal cultures

Three different human iPSC-derived neuronal cell lines were included in the study: iCell DopaNeurons, iCell Motor Neurons and iCell GlutaNeurons, all commercially available from FUJIFILM Cellular Dynamics International (FCDI, Madison, USA). All neural cells were co-cultured with human iCell Astrocytes (FCDI, see above). Cell plating: Cell plating medium consisted of 95 mL of BrainPhys Neuronal Medium (STEMCELL Technologies, Vancouver, Canada), 2 mL of iCell Neuronal Supplement B (FCDI), 1 mL iCell Nervous System Supplement (FCDI), 1 mL N-2 Supplement (100X, Gibco), 0.1 mL laminin (1 mg/mL, Sigma-Aldrich) and 1 mL Penicillin-Streptomycin (100X, Gibco). Neurons and astrocytes were thawed in a 37°C water bath for 3 minutes. The cells were then transferred to 50 mL centrifuge tubes, and 8 mL plating medium (at room temperature) was carefully added. Cell suspensions were centrifuged at 380 x g (1600 RPM) for 5 minutes, and the supernatant was aspirated. Cell pellets were then resuspended in plating medium and combined to achieve a final concentration of 10,000 neurons and 2,000 astrocytes per μL. Finally, 100,000 neurons and 20,000 astrocytes were seeded per HD-MEA by adding 10 μL of the prepared solution, after removing the Geltrex droplet. After incubating the cells for one hour at 37°C/5% CO_{2}, another 0.6 mL (small well MaxOne) / 1.2 mL (large well MaxOne) of plating medium was added. Half of the medium was changed twice a week.

### Human cerebral organoid slice cultures

Human embryonic stem cell (hESC)-derived cerebral organoids (hCOs) were generated from a commercially available hESC stem cell line (Takara Bio, Osaka, Japan), using the STEMdiff cerebral organoid kit (STEMCELL Technologies) following the manufacturer’s instructions. Slices were obtained from 120-day old hCOs. Single organoids were first transferred from maturation medium to ice-cold BrainPhys (STEMCELL Technologies) using cut 1000 μl pipette tips. Next, cross-sectional 500-μm-thick slices were cut from hCOs using a sterile razor blade and collected in petri dishes filled with BrainPhys medium at room temperature. Before the plating, HD-MEAs were sterilized in 70% ethanol for 30 minutes and rinsed 3 times with distilled water. To improve tissue adhesion, arrays were coated with 0.05% (v/v) poly(ethyleneimine) (Sigma-Aldrich) in borate buffer (pH 8.5, Thermo Fisher Scientific) for 30 minutes at room temperature, rinsed with distilled water, and left to dry. To attach hCOs on HD-MEAs, we applied a thin layer of Matrigel (Corning) to the center of the HD-MEA and then transferred individual organoid slices to the coated HD-MEAs. After positioning the tissue, we placed a tissue “harp” on top of the organoid slice and applied several drops of recording medium (STEMCELL Technologies, #05793) around the organoid. HD-MEAs were then covered with a lid and placed in a humidified incubator at 37°C, 5% CO_{2}/95% air for 30 minutes, before adding more medium to a final volume of 2 ml per chip. Half of the recording medium was changed every 2-3 days.

### Immunohistochemistry

Rodent PC neurons were stained as previously described^{42}. Briefly, PC neurons were fixed using 4% paraformaldehyde solution (ThermoFisher, #FB001). Samples were permeabilized and blocked using a PBS 10X (ThermoFisher, #AM9625) solution containing 10% normal donkey serum (NDS) (Jackson ImmunoResearch, West Grove, USA, #017-000-001), 1% bovine serum albumin (BSA) (Sigma-Aldrich, #05482), 0.02% Na-Az (Sigma-Aldrich, #S2002) and 0.5% Triton X (Sigma-Aldrich, #93443). Permeabilization facilitated antibody access to intracellular antigens, while blocking prevented non-specific binding of antibodies to neurons. Primary and secondary antibodies were diluted in a PBS solution containing 3% NDS, 1% BSA, 0.02% Na-Az and 0.5% Triton X. The antibodies used are listed in **Supplementary Table 9**. Note that immunohistochemistry was performed on control PC cultures prepared as previously outlined^{62}.

Human iPSC-derived neurons were fixed using 8% PFA solution (#15714S, Electron Microscopy Sciences) and blocked for 1 hour at room temperature (RT) in blocking buffer containing 10% normal donkey serum (NDS) (Jackson ImmunoResearch, West Grove, USA, #017-000-001), 1% bovine serum albumin (BSA) (#05482, Sigma-Aldrich), and 0.2% Triton X (Sigma-Aldrich, #93443) in PBS (ThermoFisher Scientific, #AM9625). Primary antibodies (**Supplementary Table 9**) were diluted in a blocking buffer and incubated overnight at 4°C. Samples were washed three times with 1% BSA in PBS and incubated with the secondary antibody (**Supplementary Table 9**) diluted in blocking buffer for 1 hour at RT. After three additional washes with PBS, DAPI was added for 2 min at RT (1:10000). Images were acquired using the Opera Phenix Plus High-Content Screening System (cat. HH14001000, PerkinElmer, Waltham, MA, USA).

hCOs were fixed using 4% paraformaldehyde (PFA) for 4 hours at room temperature, washed with PBS and immersed in 30% sucrose solution at 4 °C overnight. PFA-fixed organoids were embedded in OCT compound (Sakura Finetek, Alphen aan den Rijn, Netherlands, #4583) and stored at −80 °C. 10 μm sections were cut on a cryostat and collected on Superfrost plus slides (Thermo Scientific, #22-037-246). For immunohistochemistry, sections were permeabilized in 0.1% Triton X-100 and blocked with animal-free blocker (Vector Laboratories, Burlingame, CA, USA, #SP-5030-250). Slides were incubated with primary antibodies for 1 hour at room temperature. Sections were washed in PBS and further incubated with secondary antibodies for 1 hour at room temperature. After washing with PBS, sections were incubated with PureBlu DAPI (Bio-Rad, Hercules, CA, USA, #1351303) for 3 minutes and mounted with ProLong Gold antifade mounting medium (Thermo Scientific, #P36930). Fluorescence images were acquired with a SP8 confocal microscope (Leica, Wetzlar, Germany). The primary and secondary antibodies used for hCO stainings are listed in **Supplementary Table 9**.

### Scanning electron microscope imaging

Fresh tissue samples were fixed in 2.5% glutaraldehyde solution (Sigma-Aldrich, St. Louis, USA) overnight. After fixation, the samples were dehydrated in an ascending acetone series (50%, 70%, 80%, 90%, 95%, 100%), and critical point dried (CPD; Quorum Technologies, West Sussex, UK), using CO_{2} as the substitution fluid. The procedure is generally suited for SEM preparation and ensures that surface structures of animal tissue samples are preserved in their natural state, i.e. without shrinkage, distortion or dissolution. After CPD, specimens were carefully mounted on aluminum stubs using double sticky carbon-coated tabs as adhesive (Plano, Wetzlar, Germany). Thereafter, they were coated with gold-palladium in a sputter device for 45 seconds (Bio-Rad SC 510, Munich, Germany). SEM analyses were carried out with a Zeiss Digital Scanning Electron Microscope (SUPRA 40 VP, Oberkochen, Germany) in SE2 mode at 5-10 kV.

### Electrophysiological recordings

In order to track the development of functional connectivity of *in vitro* neuronal networks on HD-MEAs, we performed weekly recordings, starting one week after plating. In order to select a network recording configuration, we performed whole-array activity scans, i.e., series of 1-minute long high-density recordings, covering all 26,400 electrodes of the HD-MEA, using the MaxLab Live software (MaxWell Biosystems). To select recording electrodes, we estimated the multi-unit activity for each electrode using an online sliding window threshold-crossing spike-detection algorithm (window length: 1,024 samples; detection threshold: 4.5 × the root mean squared error (RMSE) of the noise of the 300-3000 Hz band-pass filtered signal). After the activity scan, we selected up to 1024 readout-electrodes, based on the detected average activity and a ranking of the inferred, average amplitude values. Additional high-density network recordings, consisting of 4 x 4 electrode blocks (17.5 μm pitch), were acquired for the tracking experiments (see below). The duration of the HD-MEA network recordings was about 30 minutes; an overview of the different datasets is provided in **Supplementary Table 1**. The PC neuronal network and the hCO data were acquired on MaxTwo multi-well plates (MaxWell Biosystems); the human iPSC-derived neurons (glutamatergic, motor and dopaminergic neurons) were recorded on single-well MaxOne HD-MEAs (MaxWell Biosystems).

### Pharmacological experiments

Pharmacological experiments with the GABA_{A} receptor blocker gabazine (SR 95531 hydrobromide, Sigma-Aldrich, #104104509), were performed on sparse (50,000 per well) primary cortical (PC) neuronal cultures. Three cultures were treated with 1 μM gabazine one day after plating and tracked until DIV14; media+gabazine exchanges were performed 2-3 times per week.

### Spike-sorting and post-processing

All HD-MEA network recordings underwent an initial quality control to assess the overall noise level and signal stability of each recording. Next, we used the software package Kilosort 2 (KS2)^{63} to spike sort data, applying default parameters. For the developmental tracking analyses, we concatenated all recordings (i.e. DIV7, 10, 12, and 14 for the PC cultures at 50k plating density, and DIV14, 21 and 28 for the PC cultures plated at 100k per well). After spike sorting, we inferred array-wide spike-triggered averages (STAs) for all units labeled as ‘good’ by KS2. Next, we calculated the spatial similarity between all detected units/STAs to minimize the influence of potential cluster splits that might have occurred during spike sorting of bursty spontaneous activity. The spatial similarity among the inferred array-wide templates was probed by the normalized pairwise maximum cross-correlation: units/STAs that showed a similarity *r* >0.75 and had at least 5 electrodes in common underwent an iterative elimination process using a simple clustering heuristic^{108}. Please see **Supplementary Table 1** for a summary of the data sets used in this study, and **Supplemental Figure 1** for the number of trackable units for both datasets.

### Firing rate and burst statistics

Firing rates across each neuronal unit were calculated as the total number of spikes per unit time (in seconds) in the entire recording. Array values were calculated as the mean across all active units (firing rates >0.01 Hz). Burst rates were calculated using a maximum interspike-interval (ISI) method^{109} based on the ISI between every Nth spike (ISI_{N})^{110}. The ISI_{N} threshold for determining the onset/offset of bursting activity was determined by finding the local trough in the bimodal logISI distribution (see **Supplemental Figure 11b**). The two peaks, at short ISIs and long ISIs represent more high frequency bursting and regular activity, respectively. The coefficient of variation (CV) of interburst intervals (IBIs) was calculated as the standard deviation of IBIs relative to the mean IBI in a given neuronal unit; the array value was the mean of this across all neuronal units.
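
The firing rate and IBI variability definitions above can be sketched as follows. This is a minimal illustration in plain Python rather than the analysis code used in the study. It assumes that burst onset times have already been detected, that IBIs are measured onset-to-onset, and that the population standard deviation is used; the function names are hypothetical.

```python
import statistics


def firing_rate(spike_times, duration_s):
    """Total number of spikes divided by the recording duration (Hz)."""
    return len(spike_times) / duration_s


def array_firing_rate(units, duration_s, min_rate_hz=0.01):
    """Mean firing rate across active units (firing rate > min_rate_hz)."""
    rates = [firing_rate(u, duration_s) for u in units]
    active = [r for r in rates if r > min_rate_hz]
    return statistics.fmean(active) if active else float("nan")


def cv_of_ibis(burst_onsets):
    """Coefficient of variation of interburst intervals for one unit.

    Assumption: IBIs are the differences between consecutive burst onsets.
    """
    onsets = sorted(burst_onsets)
    ibis = [b - a for a, b in zip(onsets, onsets[1:])]
    if len(ibis) < 2:
        return float("nan")
    return statistics.pstdev(ibis) / statistics.fmean(ibis)


# Example: one active unit (0.5 Hz) and one unit below the activity threshold.
units = [[0.5 * i for i in range(100)], [5.0]]
print(array_firing_rate(units, duration_s=200.0))
print(cv_of_ibis([0, 10, 20, 30]))  # perfectly regular bursting
```

A perfectly regular burst train has a CV of zero, so a reduction in the CV of IBIs corresponds to more regular bursting.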

### Functional connectivity inference

To detect pairwise correlations in spike trains, here referred to as functional connectivity, we computed the spike time tiling coefficient (STTC)^{64}. The STTC aims to mitigate potential confounding in basic correlation indices introduced by different firing rates, by quantifying the proportion of spikes in one train which fall within ±Δt (the synchronicity window) of another. It is given by:

$$\mathrm{STTC} = \frac{1}{2}\left(\frac{P_{A} - T_{B}}{1 - P_{A}T_{B}} + \frac{P_{B} - T_{A}}{1 - P_{B}T_{A}}\right)$$

where T_{A} is the proportion of total recording time which lies within ±Δt of any spike from A (T_{B} is calculated similarly). P_{A} is the proportion of spikes from A which lies within ±Δt of any spike from B (P_{B} is calculated similarly). The synchronicity window, Δt, is the only free parameter in the STTC calculation. In the present study, we used a Δt=10 ms. A visualization of the STTC calculation is provided in **Figure 1h**; STTC was calculated using publicly available Matlab code^{81}. We used permutation-based testing to determine the significance of connections. For a given neuronal unit’s spike train, spike times were randomly jittered by ±10 ms to create a surrogate spike train, using code provided by the Neural Complexity and Criticality Toolbox^{111}. This was repeated for each neuronal unit for 1000 permutations. To calculate significance of pairwise functional connectivity, experimentally inferred STTC values were compared to the distribution of surrogate STTC values. A significance value of p < 0.01 was used as a cutoff to binarize functional connectivity matrices for all network analyses throughout the manuscript; only units with firing rates >0.01 Hz were considered.
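
To make the definitions of T and P concrete, the following plain-Python sketch computes the STTC for two spike trains and a jitter-based p-value. It is not the Matlab code or toolbox referenced above. The uniform jitter distribution, the one-sided p-value with a +1 correction, the convention of setting a term to 1 when its denominator vanishes, and all function names are assumptions made for illustration.

```python
import bisect
import random


def tiled_fraction(spikes, dt, duration):
    """T: fraction of [0, duration] within +/-dt of any spike (spikes sorted)."""
    covered, last_end = 0.0, 0.0
    for s in spikes:
        start = max(s - dt, last_end, 0.0)  # skip time already tiled
        end = min(s + dt, duration)
        if end > start:
            covered += end - start
            last_end = end
    return covered / duration


def coincident_fraction(a, b, dt):
    """P: fraction of spikes in a within +/-dt of any spike in b (b sorted)."""
    if not a or not b:
        return 0.0
    hits = 0
    for s in a:
        i = bisect.bisect_left(b, s)
        near = [abs(s - b[j]) for j in (i - 1, i) if 0 <= j < len(b)]
        hits += min(near) <= dt
    return hits / len(a)


def sttc(a, b, dt, duration):
    """Spike time tiling coefficient between spike trains a and b (seconds)."""
    a, b = sorted(a), sorted(b)
    ta, tb = tiled_fraction(a, dt, duration), tiled_fraction(b, dt, duration)
    pa, pb = coincident_fraction(a, b, dt), coincident_fraction(b, a, dt)

    def term(p, t):
        # Assumed convention: the term equals 1 when its denominator vanishes.
        return 1.0 if p * t == 1.0 else (p - t) / (1.0 - p * t)

    return 0.5 * (term(pa, tb) + term(pb, ta))


def jitter_p_value(a, b, dt, duration, jitter=0.010, n_perm=1000, seed=0):
    """Compare the observed STTC with surrogates in which train a is jittered."""
    rng = random.Random(seed)
    observed = sttc(a, b, dt, duration)
    exceed = 0
    for _ in range(n_perm):
        surrogate = [min(max(s + rng.uniform(-jitter, jitter), 0.0), duration)
                     for s in a]
        exceed += sttc(surrogate, b, dt, duration) >= observed
    return observed, (exceed + 1) / (n_perm + 1)


# Identical trains are perfectly coupled; disjoint trains are slightly negative.
print(sttc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], dt=0.010, duration=10.0))
print(sttc([1.0, 2.0, 3.0], [5.0, 6.0, 7.0], dt=0.010, duration=10.0))
```

Under this scheme, a pair of units would be retained as an edge in the binarized functional connectivity matrix when the returned p-value falls below 0.01.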

### Network statistics

In **Figure 2a** we provide a visualization of key graph theoretical metrics relevant for this study. Here we provide both a written and mathematical definition for each measure used. Each statistic was calculated using the Brain Connectivity Toolbox^{112}:

#### Degree

The degree is the number of edges connected to a node. The degree of node *i* is given by:

$$k_{i} = \sum_{j} a_{i,j}$$

where the sum runs over all nodes *j* and *a*_{i,j} is the connection status between *i* and *j*. *a*_{i,j} = 1 when link *i,j* exists (when *i* and *j* are neighbors); *a*_{i,j} = 0 otherwise (*a*_{i,i} = 0 for all *i*).

#### Clustering coefficient

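The clustering coefficient is the fraction of a node’s neighbors that are neighbors of each other. The clustering coefficient for node *i* is given by:

$$c_{i} = \frac{2t_{i}}{k_{i}(k_{i} - 1)}, \qquad t_{i} = \frac{1}{2}\sum_{j,h} a_{i,j}\,a_{i,h}\,a_{j,h}$$

where *c*_{i} is the clustering coefficient of node *i* (*c*_{i} = 0 for *k*_{i} < 2), *k*_{i} is the degree of node *i*, and *t*_{i} is the number of triangles around node *i*.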

#### Betweenness centrality

The betweenness centrality is the fraction of all shortest paths in the network that contain a given node. Nodes with high values of betweenness centrality therefore participate in a large number of shortest paths. The betweenness centrality for node *i* is given by:

$$b_{i} = \frac{1}{(n-1)(n-2)} \sum_{\substack{h \neq j \\ h \neq i,\; j \neq i}} \frac{\rho_{hj}(i)}{\rho_{hj}}$$

where *n* is the number of nodes, *ρ*_{hj} is the number of shortest paths between *h* and *j*, and *ρ*_{hj}(*i*) is the number of shortest paths between *h* and *j* that pass through *i*.

#### Edge length

The edge length of a node is the total length of the edges connected to it. It is given by:

$$d_{i} = \sum_{j} a_{i,j}\,d_{i,j}$$

where *a*_{i,j} is the connection status between *i* and *j*, and *d*_{i,j} is the Euclidean distance between *i* and *j*. The Euclidean distances of functional connectivity graphs inferred in the present study are depicted in **Supplementary Figure 3**.

#### Global efficiency

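The global efficiency is the average inverse shortest path length. It is given by:

$$E = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{l_{i,j}}$$

where *n* is the number of nodes and *l*_{i,j} is the shortest path length between nodes *i* and *j*.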
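
As a minimal sketch of this definition (not the Brain Connectivity Toolbox implementation), global efficiency can be computed for a binary undirected adjacency matrix with a breadth-first search. Unreachable node pairs contribute zero, which is an assumption consistent with treating their path length as infinite; the function name is hypothetical.

```python
from collections import deque


def global_efficiency(adj):
    """Average inverse shortest path length over all ordered node pairs."""
    n = len(adj)
    total = 0.0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Unreachable nodes are absent from dist and so add nothing.
        total += sum(1.0 / d for d in dist.values() if d > 0)
    return total / (n * (n - 1))


# Path graph 0-1-2: two pairs at distance 1 and one pair at distance 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(global_efficiency(path))
```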

#### Matching

The matching index computes the proportion of overlap in the connectivity between two nodes. It is given by:

$$m_{i,j} = \frac{|N_{i/j} \cap N_{j/i}|}{|N_{i/j} \cup N_{j/i}|}$$

where *N*_{i/j} refers to neighbors of the node *i* excluding node *j*. Where global measures of matching have been used, we averaged across the upper triangle of the computed matching matrix.
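
The matching index can be sketched in a few lines of plain Python. The example assumes a binary, undirected adjacency matrix and sets the index to zero when neither node has any other neighbor; both the function names and that convention are illustrative assumptions.

```python
def matching_matrix(adj):
    """Pairwise matching index: shared neighbours over all neighbours,
    excluding the two nodes being compared."""
    n = len(adj)
    nbrs = [{j for j in range(n) if adj[i][j] and j != i} for i in range(n)]
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ni, nj = nbrs[i] - {j}, nbrs[j] - {i}
            union = ni | nj
            if union:  # assumed convention: 0 when the union is empty
                m[i][j] = m[j][i] = len(ni & nj) / len(union)
    return m


def global_matching(adj):
    """Mean of the upper triangle of the matching matrix."""
    m = matching_matrix(adj)
    n = len(m)
    upper = [m[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(upper) / len(upper)


# Nodes 0 and 1 are not connected to each other but share neighbours 2 and 3.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
m = matching_matrix(adj)
print(m[0][1], m[0][2], global_matching(adj))
```

In this toy graph, nodes 0 and 1 have identical neighborhoods and therefore a matching index of 1, even though they are not directly connected, which is the sense in which matching captures homophily.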

#### Small-worldness

Small-worldness refers to a graph property where most nodes are not neighbors of one another, but the neighbors of nodes are likely to be neighbors of each other. This means that most nodes can be reached from every other node in a small number of steps. It is given by:

$$\sigma = \frac{c / c_{rand}}{l / l_{rand}}$$

where *c* and *c*_{rand} are the clustering coefficients, and *l* and *l*_{rand} are the characteristic path lengths, of the tested network and of a random network with the same size and density as the empirical network, respectively. Networks are generally considered small-world networks at σ > 1. In our work, we computed the random network statistics as the mean across a distribution of n=1000 random networks. The characteristic path length is given by:

$$l = \frac{1}{n} \sum_{i} \frac{\sum_{j \neq i} l_{i,j}}{n - 1}$$

where *n* is the number of nodes and *l*_{i,j} is the shortest path length between nodes *i* and *j*.
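The small-worldness calculation can be sketched as follows in Python. Several choices here are assumptions made for the example rather than details taken from the study: random networks are drawn by placing the same number of edges uniformly at random, path lengths are averaged over reachable pairs only (so that a disconnected random graph does not yield an infinite value), and only 100 random networks are used to keep the example fast.

```python
import random
from collections import deque


def mean_clustering(adj):
    """Average nodal clustering coefficient (c_i = 0 when k_i < 2)."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        k = len(nbrs)
        if k >= 2:
            links = sum(adj[a][b] for a in nbrs for b in nbrs)  # ordered pairs
            total += links / (k * (k - 1))
    return total / n


def char_path_length(adj):
    """Mean shortest path length over reachable ordered node pairs."""
    n = len(adj)
    total, count = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


def random_graph(n, n_edges, rng):
    """Random graph with the same number of nodes and edges (same density)."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    adj = [[0] * n for _ in range(n)]
    for i, j in rng.sample(pairs, n_edges):
        adj[i][j] = adj[j][i] = 1
    return adj


def small_worldness(adj, n_random=100, seed=0):
    rng = random.Random(seed)
    n = len(adj)
    n_edges = sum(map(sum, adj)) // 2
    nulls = [random_graph(n, n_edges, rng) for _ in range(n_random)]
    c_rand = sum(mean_clustering(g) for g in nulls) / n_random
    l_rand = sum(char_path_length(g) for g in nulls) / n_random
    return (mean_clustering(adj) / c_rand) / (char_path_length(adj) / l_rand)


# Hypothetical test network: a ring lattice where each node links to its
# two nearest neighbors on either side (highly clustered by construction).
n = 20
lattice = [[0] * n for _ in range(n)]
for i in range(n):
    for step in (1, 2):
        j = (i + step) % n
        lattice[i][j] = lattice[j][i] = 1
sigma = small_worldness(lattice)
```

Other null models (for example, degree-preserving rewiring) would give different reference values; the uniform random graph is used here only because it matches the "same size and density" description most directly.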

#### Modularity

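The modularity statistic, Q, quantifies the extent to which the network can be subdivided into clearly delineated groups:

$$Q = \frac{1}{2L} \sum_{i,j} \left[ a_{i,j} - \frac{k_i k_j}{2L} \right] \delta_{m_i, m_j}$$

where *L* is the number of edges in the network, *m*_{i} is the module containing node *i*, and δ_{m_i,m_j} = 1 if *m*_{i} = *m*_{j}, and 0 otherwise.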
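For a given assignment of nodes to modules, Q can be evaluated directly from the adjacency matrix. The Python sketch below assumes a binary undirected graph and the standard Newman form of modularity with resolution 1; it only scores a supplied partition and does not perform the Louvain optimization itself. All names are illustrative.

```python
def modularity(adj, modules):
    """Q = (1/2L) * sum_ij [a_ij - k_i k_j / 2L] * delta(m_i, m_j)."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_l = sum(k)  # every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if modules[i] == modules[j]:
                q += adj[i][j] - k[i] * k[j] / two_l
    return q / two_l


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # one module per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one module
```

Placing every node in a single module always gives Q = 0, which makes it a convenient sanity check.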

#### Participation coefficient

The participation coefficient measures the diversity of the intermodular connections of individual nodes. Community allocation was determined via the Louvain algorithm with a resolution parameter γ = 1, which seeks a subdivision of the network that maximizes the number of within-group edges and minimizes the number of between-group edges.

### Generative network modeling

The generative network model can be expressed as a simple wiring equation^{45–47}, in which wiring probabilities are computed iteratively by trading off the cost of forming a connection against the value of that connection, expressed as a network topology term. Connections are added iteratively according to these wiring probabilities. The wiring equation is provided in **Equation 1**. The *D*_{i,j} term represents the “costs” incurred between neurons, modeled as the Euclidean distance between tracked units (**Supplementary Figure 3**). The *K*_{i,j} term represents how neurons “value” each other, given by an arbitrary topological relationship which is postulated *a priori* (also termed the “wiring rule”, given mathematically in **Supplementary Table 2**). *P*_{i,j} reflects the probability of forming a fixed binary connection at the current time step. The simulation continues until the simulated network has the same number of connections as the observed network. The *D*_{i,j} term remains constant during the simulation, while the *K*_{i,j} term, and therefore also the *P*_{i,j} term, updates at each time step. Since networks were sometimes fragmented at early developmental time points, we restricted calculations of topological metrics to the giant component of the network, i.e. the largest connected part of the network.
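To make the iterative procedure concrete, the following Python sketch grows a network one connection at a time. It rests on several assumptions that are not stated in this section: the wiring probability takes the power-law form *P*_{i,j} ∝ (*D*_{i,j})^η (*K*_{i,j})^γ, the wiring rule is the matching index, the network starts empty, and a small constant is added to *K*_{i,j} so that node pairs with zero matching can still connect. The function names, coordinates and parameter values are hypothetical, and this is not the MATLAB code used in the study.

```python
import math
import random


def matching(adj, i, j):
    """Matching index between i and j, excluding their mutual connection."""
    n = len(adj)
    ni = {h for h in range(n) if adj[i][h] and h != j}
    nj = {h for h in range(n) if adj[j][h] and h != i}
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


def simulate(coords, n_edges, eta, gamma, seed=0, eps=1e-5):
    """Add n_edges connections, sampling each with weight D^eta * (K + eps)^gamma."""
    rng = random.Random(seed)
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(n_edges):
        free = [(i, j) for i, j in pairs if not adj[i][j]]
        # D is fixed throughout; K is recomputed from the current network.
        weights = [
            math.dist(coords[i], coords[j]) ** eta
            * (matching(adj, i, j) + eps) ** gamma
            for i, j in free
        ]
        i, j = rng.choices(free, weights=weights, k=1)[0]
        adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical example: nine units on a 3 x 3 grid, twelve connections,
# a distance penalty (eta < 0) and a preference for homophily (gamma > 0).
coords = [(float(x), float(y)) for x in range(3) for y in range(3)]
net = simulate(coords, n_edges=12, eta=-2.0, gamma=0.5)
```

Recomputing every *K*_{i,j} after each added connection is the simplest way to mirror the description above; an efficient implementation would update only the entries affected by the new edge.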

### Cost functions

In the present study, we make a distinction between simulated networks which mirror the statistical distributions of observed networks and those which mirror the topological organization of those statistics. The former can be assessed via a previously used energy equation^{46,47}, whereby the model fit is given by the “worst” of the four *KS* distances assessed, as given by **Equation 2**. *KS* is the Kolmogorov-Smirnov statistic between the observed and simulated networks at the particular η and γ combination used to construct the simulation, computed for the network degree *k*, clustering coefficient *c*, betweenness centrality *b* and Euclidean edge length *d*. Notably, the KS distance between two vectors considers only their statistical *distributions*.
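The energy described above, the largest of the four KS distances, can be sketched in Python as follows. The two-sample KS statistic is implemented directly as the maximum gap between empirical cumulative distributions; the dictionary keys and the toy values are assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        gap = max(gap, abs(fx - fy))
    return gap


def energy(observed, simulated):
    """Worst (largest) KS distance across the supplied nodal statistics."""
    return max(ks_statistic(observed[name], simulated[name]) for name in observed)


# Hypothetical nodal statistics for an observed and a simulated network:
# degree k, clustering c, betweenness b and edge length d.
observed = {"k": [1, 2, 2, 3], "c": [0.0, 0.5, 0.5, 1.0],
            "b": [0.0, 0.1, 0.2, 0.3], "d": [1.0, 2.0, 3.0, 4.0]}
simulated = {"k": [1, 2, 2, 3], "c": [0.0, 0.5, 0.5, 1.0],
             "b": [0.0, 0.1, 0.2, 0.3], "d": [5.0, 6.0, 7.0, 8.0]}
e = energy(observed, simulated)
```

Because only the maximum is kept, a simulation that matches three statistics perfectly but misses the fourth entirely, as in this toy case, still receives the worst possible energy.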

In **Supplementary Figure 6**, we further assess the ability of the best performing generative models in each class (spatial, matching, clustering average and degree average) to recapitulate the network statistics included in the energy equation, as well as two measures outside it (local efficiency and participation coefficient). We did this via a Monte-Carlo bootstrapping procedure^{68}. First, for each model considered and each rodent 50k PC culture, we took the top n=99 performing simulations and computed each of the six local statistics, shown in **Supplementary Figure 6** as cumulative density plots. For each statistic, we computed a KS statistic between the observed local statistics distribution and an average of the statistics of the 99 simulations. We then undertook 99 individual leave-one-out iterations in which we replaced a single simulation of the 99 with the observed distribution. For each of the 99 permutations, we computed the same statistic, forming a null distribution. We then calculated a p_{rank} by ranking how close the original observed statistic was to the mean of this computed null distribution (i.e. how close the observation was to the middle of the null). This was computed for each culture and statistic, for each of the considered generative models. We then quoted the median p_{rank} across cultures.

Later in the study, we provide an alternative but simple cost function which does not assess distributions of statistics, but instead assesses the *topological fingerprint dissimilarity* of these network statistics. The *topological fingerprint (TF)* matrix is calculated as a Pearson’s correlation matrix between each pair-wise combination of the local statistics. In our study, we used six common network statistics to form this correlation matrix, however, in principle, these can be extended to any number or range of local statistical measures. The construction of the *TF* is visualized in **Figure 5a**. The *TFdissimilarity* is then calculated as the Euclidean norm^{74} of the difference between the observed and simulated *TF* matrices. This is given in **Equation 3**.
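The construction of the *TF* matrix and the *TFdissimilarity* can be sketched in Python as below. The sketch assumes that each local statistic is a vector with one value per node and that no statistic is constant across nodes (otherwise the Pearson correlation is undefined); the function names and toy values are illustrative, and only two statistics are used to keep the example small.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fingerprint(stats):
    """TF matrix: correlation between every pair of nodal statistics."""
    return [[pearson(a, b) for b in stats] for a in stats]


def tf_dissimilarity(tf_obs, tf_sim):
    """Euclidean norm of the element-wise difference between two TF matrices."""
    return math.sqrt(sum(
        (o - s) ** 2
        for row_o, row_s in zip(tf_obs, tf_sim)
        for o, s in zip(row_o, row_s)
    ))


# Hypothetical nodal statistics (one list per statistic, one value per node).
observed_stats = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]   # positively related
simulated_stats = [[1.0, 2.0, 3.0], [6.0, 4.0, 2.0]]  # negatively related
tf_obs = fingerprint(observed_stats)
tf_sim = fingerprint(simulated_stats)
dissimilarity = tf_dissimilarity(tf_obs, tf_sim)
```

The two toy networks have identical marginal values for the first statistic, yet their fingerprints differ maximally in the off-diagonal entries, which is the kind of discrepancy a distribution-based measure such as the KS distance cannot detect.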

### Parameter selection

We optimized η and γ using a Voronoi tessellation procedure as used in prior work^{47}. This procedure works by first randomly sampling the parameter space and evaluating the model fits of the resulting simulated networks via the energy equation. As there is little prior literature that can be used to guide the present study, we considered a wide range of parameter values, with both η and γ in the range −10 to 10. Following an initial search of 4000 parameter combinations in this space, we performed a Voronoi tessellation, which partitions the space into two-dimensional cells. We then preferentially sampled from cells with better model fits according to the energy equation (see ref.^{47} for further detail). Preference was computed with a severity of α = 2, which determines the extent to which cell performance led to preferential sampling in the next step. This procedure was repeated a further four times, leading to a total of 20,000 simulations being run for each considered network across the 13 generative rules as described in **Supplementary Table 1**.

### Generative probability distributions

In **Figure 6d**, we show the mean probability score (*P*_{ij}) distributions within the generative models fit to gabazine and control networks. This was calculated by measuring *P*_{ij} across all node pairs *i* and *j* in the network at 1% intervals of the simulation, before plotting the average distribution of *P*_{ij} across these timesteps. In **Supplementary Figure 12d-e**, we show each of the individual probability distributions that were averaged to provide the comparisons in **Figure 6d**. Note that a flattening of the probability score distribution means that there are more edges with higher probabilities of being connected, leading to decreased specificity of future wiring. This flattening effect is equivalent to the network outcomes being more random, and is a direct result of the homophily γ parameter having a decreased magnitude (as shown in **Figure 6c, right**).

### Code availability

Results were generated using code written in Matlab 2020b. All code is available at https://github.com/DanAkarca/MEA_generative_models

### Data availability

All data used in this study, along with documentation detailing each dataset, is openly available at https://zenodo.org/record/6109414#.Yid27y-l2J8

## AUTHOR CONTRIBUTIONS

DA, AWED, DEA & MS conceived the project and wrote the manuscript. DA, AWED and MS contributed to all analyses provided in the manuscript. MS ran the processing of all neuronal data, including spike-sorting and functional connectivity inference. DA computed the generative network models, topological analyses of networks and topological fingerprints. AWED computed cellular firing and bursting analyses. MS, PJH, SR & MF recorded all neuronal data provided in the study. OP, SBM, SE and PEV provided computational and physiology overview, including STTC and gabazine expertise. AH & MS provided the engineering overview, particularly relating to HD-MEA recording. CW and MT cultured and derived human cerebral organoids.

## COMPETING INTERESTS

SR is employed at MaxWell Biosystems AG, which commercializes HD-MEA technology.

## ACKNOWLEDGEMENTS

This work was supported by the European Union through the European Research Council (ERC) Advanced Grant 694829 ‘neuroXscales’ and the corresponding proof-of-concept Grant 875609 ‘HD-Neu-Screen’, by the two Cantons of Basel through a Personalized Medicine project (PMB-01-18), granted by ETH Zurich, the Innosuisse Project 25933.2 PFLS-LS, the Swiss National Science Foundation under contract 205320_188910 / 1 and a Swiss Data Science Center project grant (C18-10). Danyal Akarca and Alexander Dunn are supported by the Medical Research Council Doctoral Training Programme. Danyal Akarca is supported by the Cambridge Trust Vice Chancellor’s Award Scholarship. Duncan Astle is supported by Medical Research Council Program Grant MC-A0606-5PQ41. Both Duncan Astle and Danyal Akarca are supported by The James S. McDonnell Foundation Opportunity Award. Congwei Wang and Marco Terrigno are supported by the Roche postdoctoral fellowship program. Petra Vertes is a fellow of MQ:Transforming Mental Health (MQF17_24).

We thank Dr Martin Oeggerli for contributing the serial section electron microscopy image (**Figure 1a**), and the IT department at the MRC Cognition and Brain Sciences Unit, Cambridge, as well as the HPC team at ETH Zurich, for assistance with high performance computing.
