## Abstract

The pattern of synaptic connections among neurons defines the circuit structure, which constrains the computations that a circuit can perform. The strength of synaptic connections is costly to measure yet important for accurate circuit modeling. It has been shown that synaptic surface area correlates with synaptic strength, yet in the emerging field of connectomics, most studies rely instead on the counts of synaptic contacts between two neurons. Here we quantified the relationship between synaptic count and synaptic area as measured from volume electron microscopy of the larval *Drosophila* central nervous system. We found that the total synaptic surface area, summed across all synaptic contacts from one presynaptic neuron to a postsynaptic one, can be accurately predicted solely from the number of synaptic contacts, for a variety of neurotransmitters. Our findings support the use of synaptic counts for approximating synaptic strength when modeling neural circuits.

## Introduction

The wiring diagram of a neuronal circuit can be represented as a directed graph whose nodes are individual neurons. Edges represent every synaptic contact between the two nodes: the edge weight captures physiological connection strength. When reconstructed from volume electron microscopy, the edge weight is often derived from the number of synaptic contacts between two neurons: more contacts is assumed to imply a stronger connection. This approach has been used to build, on the basis of synaptic connectivity, models of neuronal circuit function. Such models are capable of predicting animal behavior with some accuracy [1, 2, 3, 4, 5, 6, 7].

The number of synaptic contacts of a neuron changes over time, due to both development and plasticity [8, 9, 10, 4, 11]. As the neuronal arbor grows, its absolute number of synaptic contacts increases, but the fraction corresponding to specific partner neurons remains constant [10, 4]. There-fore, synaptic input fractions may be preferred over synaptic counts to compute the edge weights that approximate connection strengths, to enable comparisons across cell types and developmental stages.

However, neither the synaptic count nor the synaptic input fraction can be guaranteed to accurately predict connection strength. Two identical edge weights, as derived from either synaptic counts or fractions, could correspond to two different physiological connection strengths. It could be the case that some input cell types make few large synaptic contacts, and others make many smaller ones, onto the same target postsynaptic neuron. Additionally, molecular and biophysical properties of both cells and the synapse itself could invalidate both counts and fractions for deriving edge weights.

Physiologically, synaptic strength has been measured in two ways. We can derive a joint measurement for all synaptic contacts between two neurons by exciting the presynaptic neuron and recording from the postsynaptic one. Alternatively, we can count the number of vesicles that are releasing neurotransmitter at individual synaptic clefts [12, 13, 14].

The best measurement of connection strength would be paired electrophysiological recordings for all possible pairs of neurons. However, paired recordings for all possible neuron pairs is infeasible for large circuits in practice. Firstly, there is the combinatorial explosion leading to prohibitive costs. Secondly, dissected brains have limited viability *ex vivo*: only a small number of trials would be possible with any single preparation. Finally, weak connections could be hard to detect, despite being physiologically important for e.g. subthreshold potentials [15, 16, 17], or integration across many simultaneous inputs [18, 19].

One step towards evaluating synaptic strength is the quantification of synaptic surface area, which is feasible in volume electron microscopy. Presynaptic active zones are visible in EM primarily due to electron-dense docked vesicles and their binding machinery, which represents the readily releasable pool (RRP) of neurotransmitter quanta [20]. The size of the visible active zone, therefore, correlates with the size of the RRP, which in turn relates to synaptic neurotransmitter release probability [12], and therefore connection strength [21]. Similarly, postsynaptic sites are visible due to the high density of structural proteins, protein kinases and phosphates [22] involved in neurotransmitter reception and recycling: the larger the site, the greater the probability of signal reception.

The cost of measuring synaptic surface areas is linear to the number of synaptic contacts for each graph edge, and could potentially be estimated by sampling only a subset of synaptic contacts. Here, we measured surface areas for all synaptic contacts of a number of connections between different cell types in the central nervous system of *Drosophila* larvae, using volume electron microscopy.

Our goal was two-fold. First, to find out whether synaptic surface areas are similar within and across cell types. Second, to study the correlation between synaptic contact number and area. We found that synaptic surface area measurements are similar across the systems and synapse types studied, and that there exists a strong correlation between the number of synaptic contacts and the total synaptic surface area per connection, which may be generalisable.

Therefore, our findings support the use of synaptic counts towards predicting connection strengths in a wiring diagram, dispensing with the labor-intensive measurements of synaptic surface areas, in *Drosophila* larvae. Therefore, any microscopy modality sufficient to resolve synaptic puncta (including light microscopy; e.g. [23, 9, 24], and more recently expansion microscopy, e.g. [25]) can serve as the basis for predicting circuit graph edge weights towards computational modeling of neuronal circuit function.

## Results & Discussion

### Edge types

*Drosophila* synapses are largely one-to-many [26], and therefore the best measure of synaptic area for a single contact between two neurons is that of the postsynaptic membrane. We measured the area of 540 such contacts in 3D using a serial section transmission electron microscopy (ssTEM) image stack. These represent 80 graph edges (unique presynaptic-postsynaptic neuron partnerships). The postsynaptic neurons are first-order projection neurons belonging to two disparate sensory systems. In each system, one set of inhibitory and one set of excitatory inputs were measured bilaterally, giving 4 total edge types with left/right replication:

Olfactory projection neurons (

*PN*s) in the brain- Local inhibitory neurons designated “Broad D1” and “D2” (

*broad*)- Excitatory olfactory receptor neurons (

*ORN*s) onto the same pairs of olfactory PNs

First-order mechano/nociceptive projection neurons “Basin” 1 through 4 (

*Basin*s) in the first abdominal segment (a1)- Local inhibitory neurons designated “Drunken”, “Griddle”, and “Ladder” (

*LN*s)- Stretch receptor neurons from the chordotonal organs designated “lch5”, “v’ch”, and “vch” (

*cho*)

The neuronal arbors of all neurons had previously been reconstructed and reviewed by expert annotators [27, 2], but only as skeletonised representations, with point annotations for synaptic contacts. This approach currently represents the most feasible strategy for manual retrieval of both contact number and contact fraction. These are shown along with the neuronal morphology in Figure 1.

In the olfactory system, *broad* neurons mediate both intra- and inter-glomerular lateral inhibition [27] by synapsing onto a broad range of cell types, including ORNs and PNs. ORNs, on the other hand, primarily innervate a single partner PN (making up over 50% of its dendritic inputs in some cases), although they have some off-target edges with low contact number and fraction.

In the chordotonal system, Basin projection neurons integrate inputs from a wide array of input neurons across several modalities [2], including the mechanosensory chordotonal neurons. They mediate both short- and long-loop behavioural responses, and are repeated segmentally. Behavioural choice is in part mediated by local inhibitory neurons [3], including the LNs discussed here.

### Distribution of synaptic surface area per edge type

For every one of the 540 synaptic contacts we measured the synaptic surface area (fig. 2 Ai, Aii; see methods). For each edge type, we independently plotted the histograms of log-synaptic surface area for all individual morphological synapses belonging to that edge type, with overlaid fitted normal distributions (fig. 2 Bi, Bii). We chose to plot the log-surface rather than the plain surface because log-normal distributions are common in models of biological growth, where larger specimens can grow faster in absolute terms given that they have proportionally more resources than smaller specimens [28, 29].

We tested pair-wise whether the distributions of log-surface area per edge type are significantly different from each other (fig. 2 Biii), and found that two of the pair-wise comparison are very significantly different (to p<0.001), whereas none of the others were significant. The question remains as to whether the effect size of the pair-wise statistical comparisons is biologically meaningful.

### Correlation between synaptic input counts and synaptic surface area

To analyse whether the number of synaptic contacts can reliably predict the total synaptic surface area of an edge, we computed a linear regression between synaptic counts per edge and the sum of synaptic surface area per edge (fig. 3 A). At the synaptic level, the assumption of homoscedasticity (homogeneity of variance) holds, but when aggregated at the graph edge level, the variance of the area is proportional to the count. The account for this heteroscedasticity in the aggregated model, the regression was weighted by the inverse of the synaptic count. When considering all edges jointly, ignoring edge type, we found an *R*^{2} of 0.905 (fig. 3 C), with visually few outliers, and a very good intercept at zero.

To analyse the influence of the edge type, we independently computed weighted linear regression for all edges of each edge type (fig. 3 B). We found that all independent regressions presented a similarly high *R*^{2} values, indicating that correlation is strong within each edge type as well as for the joint.

While the joint regression appears very similar to each independent regression by edge type, there could be significant differences by edge type that would prevent generalisation across edge types. We then analysed the contribution of edge type to predicting total synaptic area per edge from the count of synaptic contacts per edge (the contact number per edge).

### Variability across edge types

If across edge types, an edge’s total synaptic area correlated with its synaptic count, we could generalise and predict with confidence the former from the latter, independently of edge type. If not, and edge type instead had some predictive power over the synapse size, then every edge type’s synaptic areas would need to be sampled before predicting total synaptic area for any edge of that type. If the predictive power of edge type is small, then synapses of different edge types can be treated as belonging to the same distribution: edge type could be ignored. Therefore, we could confidently predict total surface area for an edge independently of the edge type.

Figure 2 shows a large variance in the areas of individual synapses (the *α* = 0.1 confidence interval spans approximately an order of magnitude within each edge type). The difference in means between the different edge types appears to have a small effect size in comparison to the intra-type variation (2 Bi), even where that difference is, strictly speaking, statistically significant (2 Biii). However, when inferring contact area from contact number in a graph edge, the edge type may still have predictive power. We here examine whether this is the case.

Bayesian hierarchical modelling allows us to incorporate our understanding of the sources of variation (e.g. potentially the edge type) into the model of a stochastic process.

Each graph edge from presynaptic neuron *i* to postsynaptic neuron *j* has *n* contacts. Each contact *k* in the edge has an area *A*_{(i,j,k)} therefore each edge has a total area . It is assumed that the area of a single morphological synapse is the result of some stochastic process which depends on which edge it belongs to, and which type that edge is.

Here we enumerate the four levels of the hierarchical model:

This approach, with a hierarchical model (fig. 4 A), gives us a single cohesive model which encodes the different sources of variance, the edge type being one of them. It allows us to use the same model when different amounts of information are available, and adjust our confidence in predicted values accordingly; a characteristic that makes the model well-suited for our data which has unequal sample sizes across edge types. The simple model (when we measure synaptic surface areas but ignoring the edge type or the partner identity) gives us a low-confidence prediction of the size of a newly-annotated synapse which we suspect to belong to this general distribution (fig. 4 B). However, the prediction may change, and the confidence increase, if we know which edge type the synapse belongs to–finding whether this is true is the goal of this analysis. The confidence can increase again when we identify its presynaptic and postsynaptic neurons (the partner identity). In other words, within the Bayesian framework, the more prior information available, the better our prediction.

Furthermore, being able to parameterise the variance of the distribution parameters allows us to draw conclusions about how different the distributions are across edges and edge types. Tightly grouped scale_{0} parameters within an edge type (low variance of ) would suggest that edges within that group have similar distributions. In this case, synaptic area measurements in one edge could be used to approximate the areas of synapses in other edges of the same type. A tight grouping of scale_{1} parameters (low variance of ) would imply that the difference between edge types is small, and that synapse size information generalises across edge types.

Here we show how adding edge type information does not meaningfully improve the confidence in predicting total synapse area from contact number for an edge. Figure 4 B shows the confidence intervals for predicting total edge contact area given edge contact number, ignoring edge type information. Figure 4 C shows, for each edge type, the change in confidence interval when information about that edge type is introduced to the prediction. The increase in prediction confidence (narrowing of the confidence interval) is small, suggesting synapse size information from one edge type generalises fairly well to other types, for the edge types studied here.

## Conclusion

Our data shows that, within the studied connections in the *Drosophila* larval nervous system, the total synaptic surface area can be predicted solely from the number of morphological synaptic contacts from a presynaptic neuron to a postsynaptic one, independently of the participating neuronal cell types.

The diversity of edge types included in our study (sensory axons onto excitatory interneurons, and inhibitory interneurons onto excitatory interneurons), together with the observed strong correlation between contact counts and surface areas, suggest that we should expect a similarly strong correlation in other edge types in the central nervous system at the same *Drosophila* larval life stage.

Whether the observed correlation holds true through changes in neuronal arbor dimensions is an open question. In development, we know that the number of morphological synaptic contacts in a connection (i.e. an edge) grows by a factor of 5 from early first instar to late third, while the distribution of the number of partners per polyadic synapse remains constant [10], and the associated behavior doesn’t change [31]. In addition, synaptic input fractions are preserved from first to third instar despite a 5-fold increase in cable and in number of morphological synaptic contacts [10]. Similarly, in the adult *Drosophila* fly, an increase of 50% in the amount of dendritic cable and number of morphological synaptic contacts doesn’t alter the synaptic input fractions, or the density of synaptic contacts per unit of cable, or some key biophysical properties of the neuron [4]. Our data, together with these reports on the strong preservation of both structural and functional circuit properties across neuronal arbor growth, suggests that we ought to expect the observed strong correlation between number of contacts and surface areas to persist throughout larval development as well as in situations where arbors have different dimensions.

Our findings have implications for the study of the relationship between structure and function, in particular towards making functional inferences from morphological data. Following the chain of reasoning that synaptic strength correlates with vesicle release probability [32], which in turn correlates with the number of docked vesicles [21, 12], which correlates with the surface area of the EM-dense active zone, our findings suggest that inferring synaptic strength directly from the number of morphological contacts is grounded and generalizable across different edge types. Therefore our findings are consistent with the formulation of computational circuit models directly from morphological synaptic counts per edge, as has been done in previous studies that had implicitly assumed a correlation between counts and synaptic strength [2, 33, 3, 19, 7].

A full-fledged validation of our findings will require the comprehensive measurement of all synaptic surface areas of all neurons of a central nervous system. Recent and upcoming automated methods for segmenting synaptic surface areas [34], aided by improvements in volume EM such as isotropic imaging with FIBSEM [35], will make such a study tractable in the near term for small model organisms such as *Drosophila*.

Our findings have further implications towards speeding up reconstruction of wiring diagrams. Namely, measuring synaptic surface areas may no longer be a requirement. Therefore, the performance trade-off could be tipped towards imaging speed at the expense of resolution, lowering acquisition costs, and therefore enabling the detection of synaptic puncta from light microscopy [9, 25, 23] or low-resolution EM [35], which, as per our data, could be used to infer synaptic strength towards the functional analysis of neural circuits.

## Methods

### Imaging

The data volume used was the whole central nervous system described in [2]. The specimen was a 6 hour old female *Drosophila melanogaster*, in the L1 larval stage. It is comprised of 4850 sections, each 50*nm* thick, cut with a Diatome diamond knife. Each section was imaged at 3.8 × 3.8*nm* resolution using an FEI Spirit TEM. The images were montaged and registered using the nonlinear elastic method described in [36].

### Neuronal and synapse morphologies

The neurons analyzed in this study were all published in [27] (olfactory sensory neurons ORNs, olfactory PNs and GABAergic Broad LNs) and in [3] (chordotonal somatosensory neurons, Basin neurons, and synaptically connected GABAergic LNs including Drunken, Ladder and Griddle LNs). Neuronal morphologies and connectivity were reconstructed collaboratively using CATMAID [37] with the procedures described in [38].

To reconstruct a neuronal arbor, point annotations (“skeleton nodes”) are placed in each *z* section of a neurite. Synaptic contacts are identified by the presence of a thick black active zone, presynaptic specialisations such as vesicles and a T-bar, and evidence of postsynaptic specialisations, across several sections. In the CATMAID software, a point annotation (“connector node”) is placed on the presynaptic side of the synaptic cleft; directed edges are then drawn from the nearest presynaptic skeleton node to the connector node, and from the connector node to a skeleton node in each postsynaptic neuron.

### Synaptic area annotation

Using catpy, a Python interface to CATMAID’s REST API [39], the locations of synaptic contacts between neurons of interest were extracted, and an axis-aligned cuboid of image data including the connector node and the pre and post-synaptic skeleton nodes of interest was downloaded. This was stored in an HDF5 [40] file whose schema extends that used in the CREMI challenge [41]. The location of the relevant skeleton and connector nodes are also stored in the file, to be used as a guide.

In *Drosophila*, most synapses are polyadic (one-to-many). Therefore, the best measurement for the synaptic contact area from presynaptic neuron *i* to postsynaptic neuron *j* is the area of the postsynaptic membrane. Using BigCAT, a BigDataViewer-based [42] volumetric annotation tool, the postsynaptic membrane (identified by evidence of postsynaptic specialisations, and adjacency to a synaptic cleft which was itself adjacent to presynaptic specialisations) was annotated with a thin line in each *z* section in which it appeared. These annotations are given unique IDs, and the association between the ID and the nodes it is associated with is also stored in the CREMI file.

These annotations were then normalised *post hoc*. Using scikit-image [43] v0.14, the line drawn in each *z* section was skeletonised [44]. The skeletonised line was converted into a string of coordinates in pixel space, which was smoothed with a Gaussian kernel (*σ* = 3). Its length in pixel units was taken and multiplied by the *xy* resolution (3.8*nm*). This 2D length is then multiplied by the *z* resolution (50*nm*) to give an approximation of the synaptic surface represented by the membrane visible in this section, in *nm*^{2}. The area of a synapse is approximated by the sum of such areas for a single contact across *z* sections.

By extension, the contact area of a graph edge between neurons *i* and *j* is the sum of synaptic areas for all contacts between *i* and *j*.

### Hierarchical modelling

The gamma distribution was selected as it is a versatile continuous probability distribution with a low number of parameters and a positive domain. The priors for each of the 3 shape parameters were uniform between 0.001 and 1000. The prior for parameter scale_{2} was uniform between 1 and 1000000.

The modeling was performed using the R language [45] and the JAGS [46] sampler. The sampler was run for 100000 iterations, and thinned by 2, with a 1000-iteration burn-in.

### Other software

The majority of analysis was performed using Python [47] version 3.7, numpy [48] v1.16, scipy [49] v1.1, and pandas [50] v0.23. Figures 1, 2, and 3 were generated using matplotlib [51] v3.1 and FigureFirst [52]. Figure 4 was generated using the R language [45] and ggplot2 [53]. Figures were assembled using Inkscape [54] v0.94.

## Acknowledgements

C.L.B. thanks the joint graduate program between HHMI Janelia Research Campus and the University of Cambridge, and William Schafer at the MRC LMB for being the Cambridge host; and thanks Susan Jones at the Department of Physiology, Development and Neuroscience of the University of Cambridge, and Maryrose Franko and Erik Snapp at HHMI Janelia, for their support during his graduate studies. The authors thank Arlo Sheridan in Jan Funke’s lab at HHMI Janelia, and Andrew Champion and Tom Kazimiers for their support with CATMAID software engineering. This work was funded by HHMI Janelia.

## Footnotes

cb619{at}cam.ac.uk, dbb31{at}cam.ac.uk, acardona{at}mrc-lmb.cam.ac.uk

## References

- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵