An anatomical substrate of credit assignment in reinforcement learning

Learning turns experience into better decisions. A key problem in learning is credit assignment—knowing how to change parameters, such as synaptic weights deep within a neural network, in order to improve behavioral performance. Artificial intelligence owes its recent bloom largely to the error-backpropagation algorithm1, which estimates the contribution of every synapse to output errors and allows rapid weight adjustment. Biological systems, however, lack an obvious mechanism to backpropagate errors. Here we show, by combining high-throughput volume electron microscopy2 and automated connectomic analysis3–5, that the synaptic architecture of songbird basal ganglia supports local credit assignment using a variant of the node perturbation algorithm proposed in a model of songbird reinforcement learning6, 7. We find that key predictions of the model hold true: first, cortical axons that encode exploratory motor variability terminate predominantly on dendritic shafts of striatal spiny neurons, while cortical axons that encode song timing terminate almost exclusively on spines. Second, synapse pairs that share a presynaptic cortical timing axon and a postsynaptic spiny dendrite are substantially more similar in size than expected, indicating Hebbian plasticity8, 9. Combined with numerical simulations, these findings provide strong evidence for a biologically plausible credit assignment mechanism6.

Learning turns experience into better decisions. A key problem in learning is credit assignment-knowing how to change parameters, such as synaptic weights deep within a neural network, in order to improve behavioral performance. Artificial intelligence owes its recent bloom largely to the error-backpropagation algorithm 1 , which estimates the contribution of every synapse to output errors and allows rapid weight adjustment. Biological systems, however, lack an obvious mechanism to backpropagate errors. Here we show, by combining high-throughput volume electron microscopy 2 and automated connectomic analysis [3][4][5] , that the synaptic architecture of songbird basal ganglia supports local credit assignment using a variant of the node perturbation algorithm proposed in a model of songbird reinforcement learning 6,7 . We find that key predictions of the model hold true: first, cortical axons that encode exploratory motor variability terminate predominantly on dendritic shafts of striatal spiny neurons, while cortical axons that encode song timing terminate almost exclusively on spines. Second, synapse pairs that share a presynaptic cortical timing axon and a postsynaptic spiny dendrite are substantially more similar in size than expected, indicating Hebbian plasticity 8,9 . Combined with numerical simulations, these findings provide strong evidence for a biologically plausible credit assignment mechanism 6 . Neural circuits that control decisions and actions are recurrently connected and involve many network layers from sensory inputs to motor output. Yet, as we learn, some mechanism specifies precisely which synapses, out of trillions, are to be modified and in what way. The backpropagation algorithm 10 is powerful because it directly calculates, based on the network architecture, the derivative of output errors with respect to every synaptic weight, providing an efficient method to update synaptic strengths. However, it remains unclear whether backpropagation or its variants are biologically implemented, or even plausible [11][12][13] . An alternative approach to implement gradient-based learning is weight-or node-perturbation 14,15 , in which the activity of a specific synapse or neuron is stochastically varied to determine its contribution to the output. Here we use a connectomic approach to study the biological implementation of stochastic gradient descent, which requires as-yet unknown circuit structures to inject variability, correlate variability with reward signals, and correctly assign credit to relevant synapses.
Node perturbation is conceptually similar to behavioral trial-and-error reinforcement learning (RL) 16,17 . In the vertebrate brain RL is thought to involve the basal ganglia 18 , where a multitude of sensory and other context and state signals converge with action and outcome signals to determine which actions in which state lead to the best outcomes. In the songbird, the basal ganglia circuit dedicated to song learning 19 , Area X, receives synaptic input from two cortical areas 20 : the vocal variability-generating area LMAN (lateral magnocellular nucleus of the anterior neostriatum) and a timing generating area, HVC (proper noun), which controls vocal output at each moment in the song. Area X also receives information about song performance via dopaminergic (DA) axons originating in the ventral tegmental area (VTA) 21 (Fig. 1a). It has been proposed that medium spiny neurons (MSNs) in Area X integrate signals from HVC, LMAN, and VTA to detect which song variations (from LMAN) at which times in the song (from HVC) lead to improved song performance (from VTA). The model posits that convergence of these factors drives plastic changes in HVC synapses within the basal ganglia such that HVC can improve song performance at each moment in subsequent song renditions 6 .
At a synaptic level, precise temporal coincidence of HVC and LMAN input onto MSN dendrites is hypothesized to form a transient biochemical eligibility trace 22 at the HVC-MSN synapse. This tags the synapse as a candidate for strengthening, conditional on a subsequent DA signal, indicating improved song performance 15,[23][24][25] , similar to three-factor Hebbian learning rules 26,27 . Spinous synapses are well suited to carry out the proposed functions of the HVC-MSN connection: they are electrically and biochemically compartmentalized 28 , can function as coincidence detectors 29 , exhibit robust synaptic plasticity 30 , and have been proposed as loci of eligibility traces 22 . The model posits that, in contrast to HVC inputs, LMAN synapses are neither plastic, nor carry an eligibility trace, but rather signal the occurrence of actions globally across MSN dendrites-a function more suitable to shaft synapses. Altogether, these observations lead to the specific predictions ( Fig. 1a) that HVC-MSN synapses are plastic and preferentially terminate on spines, while LMAN-MSN synapses preferentially terminate on dendritic shafts 24 . Fig. 1: Anatomy of the zebra finch song system and Area X connectome analysis. a , The brain regions and their connections involved in zebra finch song learning and generation; solid lines indicate mono-synaptic connections; broken lines, multi-synaptic. b , Automated reconstruction pipeline (FFN, flood filling network; CNN, convolutional neural network; RF, random forest classifier). c , Area X data set used in this study. d , UMAP projection of a 67-dimensional neurite feature space into 2d (each neurite is one dot, n=7,120). Dots with the same color belong to the same cell type as assigned by a neural network morphology classifier 5 . Also shown are modulatory (light red), subthalamic-nucleus-like (green), pallidal-like (yellow), inhibitory (inter) neuron-like neurites (turquoise). e, f, g, Typical neurites from putative corticostriatal axon clusters HVC, LMAN, and a MSN. Scale bar for HVC and LMAN 10 µm, for MSN 20 µm. To examine the anatomical fine structure of inputs to MSN dendrites and test these predictions, we acquired a data set (approximately cubical, with 100 µm edge length) by serial block-face electron microscopy (SBEM) 2 from Area X of an adult male zebra finch. The neural circuitry contained in this volume (~2.4 m axonal and~0.5 m dendritic path length; 253,657 synapses; 137,918 excitatory, 115,739 inhibitory) was then reconstructed using fully automated methods for neurite segmentation, synapse identification and morphology classification [3][4][5] (Fig. 1b, c). The results presented are based on a hybrid reconstruction, with the machine output augmented by some (<1000 hours) manual proof-reading. However, later inspection revealed that the conclusions are the same for analyses carried out on fully automated reconstructions, without manual neurite proofreading (see Supplementary Note 1 and Extended Data Fig. 1). This speaks to dramatic progress in automating connectomic analysis [3][4][5] , and stands in contrast to the recent situation in which significant amounts of manual proofreading were required to address meaningful biological questions 9,31,32 .
To interpret the synaptic architecture associated with the MSN dendrite, we next needed to determine the cell types of the neurites involved. We used a neural network-based morphology classifier 5 trained on a set of 240 neurites, manually classified on the basis of known morphological characteristics (Extended Data Table 1). These characteristics included the rate of branching, which is especially low for HVC axons (Fig. 1e), the presence of a dense terminal plexus, typical of LMAN axons 33,34 (Fig. 1f), the density of spines on dendrites, to identify MSNs 35 (Fig. 1g). Cross-validation (Methods) of classifier performance (F1-scores of 0.91, 0.95, 0.93 for HVC, LMAN and MSN, respectively) indicates a misclassification rate below 10%, which also largely agrees with an unsupervised low dimensional embedding ( Fig. 1d; Supplementary Note 2, Extended Data Fig. 2

and 3).
We start by analysing the nature of HVC and LMAN synaptic contacts on MSN dendrites (Fig. 2a). Together these inputs contribute 94% of all excitatory synapses onto MSNs. HVC synapses dominate those from LMAN about sixfold by number, and sevenfold by area, a ratio much larger than the 1-to-1 ratio between the numbers of axons (about 10,000) entering from each region 36,37 , but smaller than the 12-to-1 ratio between the total axon path lengths (~90 cm vs.~7.7 cm, in the data set; Fig. 2b). Consistent with the model's prediction, most (86.7%) of the synaptic area of HVC axons onto MSNs was found on spines, while the corresponding fraction was less than half (44.7%) for LMAN axons. Similar results were obtained for sets of manually classified HVC and LMAN axons (87.6% and 38%; 30 neurite fragments each). In contrast, the synaptic areas of HVC and LMAN axons terminating on MSN dendritic shafts were 11.6% and 52.3%, respectively. The fractions of synaptic contact area going to an MSN postsynaptic compartment other than a spine or shaft were negligible (1.4% and 2.5% on somata, and 0.3% and 0.5% on spine necks, for HVC and LMAN, respectively). We found no evidence for distinct subpopulations of HVC or LMAN axons with different spines/shaft ratios; a similar analysis also revealed no distinct subpopulations of MSN dendrites on this basis (Fig. 2c,d).
Given the very different roles for HVC and LMAN synapses postulated in the model, we wondered if these synapses also differ in their distribution of synaptic areas. HVC synapses onto spines were larger than those from LMAN (mean=0.35 µm 2 vs 0.22 µm 2 , median=0.23 µm 2 vs 0.15 µm 2 , n=81,684 and n=8,946, for HVC and LMAN, respectively), while the reverse was true for synapses onto MSN shafts (mean=0.30 µm 2 vs 0.35 µm 2 , median=0.18 µm 2 vs 0.25 µm 2 , n=13,018 and n=6,668, for HVC and LMAN, respectively). These differences were highly significant (two-sided Mann Whitney U tests; all p-values < 10 -50 ). The size distribution of each of the four synapse classes (HVC-spine, HVC-shaft, LMAN-spine, LMAN-shaft) was much better fit by a mixture of two log-normal components than by a single log-normal distribution (Fig. 2e, Extended Data Fig. 5), reminiscent of a similar finding of bimodal synapse sizes in the mammalian cortex 38 . In fact, the larger size of HVC spine synapses, compared to those from LMAN, is consistent with their hypothesized role in driving temporally specific MSN activity 39 and appears to be largely explained by a more robust tendency of HVC synapses to occupy the larger synapse state (HVC-spine: 81% µ S = 0.18 µm 2 ; 19% µ L =0.71 µm 2 ; LMAN-spine: 89% µ S = 0.14 µm 2 ; 11% µ L = 0.54 µm 2 ). The converse was true for shaft synapses where it was the LMAN inputs that were more likely to occupy the larger of the two states (HVC: 77% µ S = 0.14 µm 2 and 23% µ L = 0.59 µm 2 ; LMAN: 67% µ S = 0.17 µm 2 and 33% µ L = 0.5 µm 2 ). In the proposed learning model, LMAN inputs generate a dendrite-wide depolarization to enable detection of HVC and LMAN coincidence. Such broad depolarization might require multiple LMAN contacts spread across the target compartment. We examined the incidence of multi-synaptic connections from HVC and LMAN axons (only fragments > 50 µm long) onto MSN dendrites (Fig. 3a). On average, LMAN axons dedicated 13% of their synaptic area to multi-synaptic connections, while for HVC axons this was only 5%. In both cases, this number is likely to be an underestimate due to a substantial number of reentering axons that result from the limited extent of the data set. About 65% of all LMAN axon fragments, but only 26% of HVC fragments made more than one synapse on the same MSN (Fig. 3b), which is consistent with the more focal distributions of LMAN arbors 34 . For both HVC and LMAN axons, multiple synaptic contacts had an average pairwise distance of about 10 µm, which is small compared to expected electrotonic lengths.
It has been argued that Hebbian learning should lead to a correlation in the size of synapses that share both pre-and post-synaptic neurons (dual-connection pairs), and that when such correlations are found they constitute evidence for Hebbian learning 8,40,41 . In our data set, the dual-connection pairs (n=1950) between HVC axons and MSN spines showed a significant degree of such size correlations. In particular, the distribution of normalized intra-pair synaptic-area differences was significantly shifted toward smaller values compared to randomized controls ( Fig. 3c-e; one-sided Mann Whitney U tests; for all controls p-values < 10 -5 ; see Methods, Extended Data Table 3 and Extended Data Fig. 4). This suggests that HVC-to-MSN-spine synapses, which comprise the vast majority of spinous inputs to MSNs, are capable of Hebbian-type plasticity, consistent with the proposed learning model. For LMAN-to-MSN-spine synapses (n = 443 pairs) a marginally significant effect was observed (shuffle control p-value = 0.045; rand control: p = 0.025; sA_dD control: p = 0.098), while no significant differences from randomized control pairs was seen for shaft synapses of both axon types (all p-values > 0.05; HVC n = 128; LMAN n = 215). Notably, while the enrichment of similarly sized pairs observed in Area X spine synapses is smaller than that observed in hippocampus 8,41 , it is comparable to the enrichment reported for neocortex 9,38,40 . Multi-synaptic connections of MSN dendrites a , MSN dendrite (purple) that receives two spine-head synapses from the same HVC axon (blue). Also shown in cross sections (right). Scale bars: 1 µm. b , CDF showing the fraction of multi-synaptic area for HVC (blue) and LMAN (red) axons. c, area of the first vs. area of the 2nd synapse (and vice versa, i.e. symmetrized across the diagonal) for all HVC-on-MSN-spine dual pairs. d , Histogram of observed normalized size difference values (cv) for dually connected HVC-MSN spine synapse pairs relative to values for different sets of control synapse pairs. Three different control sets are used: sA_dD, both synapses sampled from the same axon but connecting to different dendrites; rand, control pairs drawn randomly from all HVC-MSN spine synapses; shuffle, control pairs drawn randomly from all dually connected pairs of HVC-MSN synapses, without replacement. Note the excess of pairs with small differences and suppression of pairs with large differences compared to controls. Error bars are of histogram counts. e, Excess size similarity of HVC-MSN spine ± √n synapse relative to controls (percentage mean cv change) for synapse pairs of different mean size ranges. Error bars are s.e.m. (Methods).
To quantitatively assess the ability of the observed neural architecture to correctly implement credit assignment in the context of vocal learning, we constructed a numerical simulation of the model consistent with our anatomical findings. Closely related to the concept of parallel node perturbation 7,15 , the local learning rule incorporates non-plastic LMAN shaft synapses, and plastic HVC spine synapses. The latter are strengthened by coincident activation of HVC (state) and LMAN (action) inputs, followed by a delayed reward (Fig. 4a). Numerical simulations show that the model learns to imitate a simple song template after a few thousand iterations (Fig. 4b), consistent with typical learning rates in juvenile zebra finches 42 . Furthermore, our simulations revealed that the network exhibits a high temporal precision of learning that exceeds that of the reward signal (~100ms) 21 (Fig. 4c), consistent with the reported ~10 ms resolution of vocal learning in the songbird 43 (Fig. 4d). Connectomic analysis of brain circuitry has shown promise as a method to investigate synaptic plasticity 8,40 . Using highly automated dense reconstruction, we have extended this approach to analyze the ultrastructure of synaptic interactions of inputs that carry computationally distinct and learning-relevant signals. In the context of a detailed model of a cortical-basal ganglia circuit that implements reinforcement learning, our data reveal the existence of a significant asymmetry between two types of cortico-striatal (CS) connections. The LMAN input resembles mammalian pyramidal-tract CS projections and carries an efference copy of a motor command signal, while the HVC input resembles mammalian intratelencephalic CS projections and carries a state (timing) signal 44,45 . The predominance of HVC inputs onto dendritic spines and LMAN inputs onto shafts supports a model of basal ganglia learning in which action efference copy inputs modulate or gate plasticity of state inputs to MSNs. In this view, MSN spines detect and transiently maintain a memory of recently active action-state pairings.
Our data provide insight into a plausible biological implementation of node perturbation 15 , a form of stochastic gradient descent learning that performs synaptic credit assignment without backpropagation. Conceptually, our model is a variant that could be called "remote node-perturbation", since action feedback connections transmit network output variations back to the relevant synapses for credit assignment (Fig. 4e). We have identified a candidate ultrastructural motif for this mechanism: distinct excitatory synapse types interacting on a dendrite to inject variability, to correlate variability with reward signals, and to correctly assign credit to relevant synapses that potentiate improved performance. The generality of such a credit assignment strategy and whether it is anatomically manifested similarly in other brain areas, namely the cortex or the cerebellum, remains an interesting topic for future theoretical and empirical work.

Code availability
All analysis code will be made available upon publication on GitHub.

Sample preparation
An adult male zebra finch (> 120 days post hatching, obtained from the MPI of Ornithology, Seewiesen, Germany) was transcardially perfused with cacodylate buffer (CB) at room temperature (RT) that contained 0.07 M sodium cacodylate (Serva, Germany), 0.14 M sucrose (Sigma-Aldrich), 2 mM CaCl 2 (Sigma-Aldrich), 2% PFA (Serva) and 2% GA (Serva) to retain the extracellular space 46 . We cut the brain sagittally into 200 µm thick slices on a vibratome (Leica VT1000S), found the slice with the largest cross section of Area X (identified by its round shape and location in the section) and excised a piece of about 300 x 300 µm centered in Area X. We next immersed the sample in a sequence of aqueous solutions in 2 ml Eppendorf tubes. First 2% OsO 4 (Serva) reduced with 2.5 % potassium hexacyanoferrate(II) (Sigma-Aldrich) for 2 h at RT, followed by 1% thiocarbohydrazide (Sigma-Aldrich) at 58°C for 1h, then 2% OsO 4 at RT for 2h, 1.5% uranyl acetate (Serva) in H 2 O at 53°C for 2h and 0.02M lead-aspartate (Sigma-Aldrich) at 53°C for 2h. The sample was rinsed three times with CB after the first OsO 4 step and once with double-distilled water after each of the other staining steps. We next dehydrated the samples using chilled ethanol-water mixtures (70 %, 100 %, 100 %, 100 %, ethanol (Electron Microscopy Sciences), 10, 15, 10, 10 min) followed by propylene oxide (100%, 100%) (Sigma-Aldrich), infiltrated first with a 50/50 propylene oxide/epoxy mixture, then with 100% epoxy. The epoxy was 812-replacement (hard mixture, Serva). After infiltration, samples were cured (60°C for 48h), trimmed, and smoothed (Leica Ultracut microtome). All animal experiments were approved by the Regierungspräsidium Karlsruhe and were carried out in accordance with the laws of the German federal government.

SBEM data-set generation
The sample was imaged and sectioned in a eld-emission scanning electron microscope (SEM, Zeiss UltraPlus) equipped with a custom diamond knife-based serial-blockface ultramicrotome 2 . We used a landing energy of 1.6 kV, a beam current of 1 nA, a scan rate of 3.3 MHz, a lateral pixel size of 9 x 9 nm and a cutting thickness of 20 nm. The acquired data set (single image tile 10,240 x 10,240 pixels) was initially aligned by translation only, followed by elastic alignment (A. Pope, Google Research. The aligned stack contained 10,664 × 10,914 × 5701 voxels with 8 bit intensity values (~664 GB). No contrast normalization was performed.

Neuron reconstruction and proofreading
Neurite reconstruction was carried out by Flood Filling Network (FFN) segmentation of the entire (114 µm x 98 µm x 96 µm) data set. Broadly, this was done in two stages: an initial over-segmentation (base segmentation) was created to ensure the near-absence of false mergers; next, segments in the base segmentation were then agglomerated into more complete units. Several steps were taken to improve oversegmentation in the initial stage (i.e. to further reduce the occurrence of false mergers), and to improve object continuity in the second stage (i.e. to reduce the occurence of false splits). In addition to the base segmentation described in Januszewski et al. 3 , we created an independent new oversegmentation that was then combined with the original base segmentation using oversegmentation consensus 3 . The new oversegmentation started with the preexisting base segmentation as its initial state. We then used FFN inference with the following adjustments to create new neurite fragments:(a) we increased the "disconnected seed threshold" parameter from 0 to 0.15, which resulted in the assignment of previously unclaimed voxels in the interior of small-diameter axons (36.4B voxels filled); (b) we used a different snapshot of FFN weights ("checkpoint") in areas with lower data quality (19.8B voxels filled); (c) we recreated objects adjacent to voxels classified as myelin by the tissue type classifier in an FFN inference run without the myelin mask (6.9B voxels filled); and (d) in areas adjacent to detected irregularities we recreated objects in an FFN inference run with FOV movement restriction with the section-to-section shift threshold relaxed from 4 to 6 voxels (1.8B voxels filled). The steps a)-d) were carried out sequentially, and the result of every step was combined with that of the previous step with oversegmentation consensus. To even further reduce the number of false mergers, we took the seed point for every segment in the original base segmentation, and used FFN inference at 2x reduced in-plane resolution to create a segmentation with no encumbrance by any other object (note that in this process the base-set objects can shrink as well as grow). The results were upsampled and combined with the base segmentation, again through oversegmentation consensus.
We then ran FFN agglomeration as described previously 3 . For every decision point involving at least one segment added in (a)-(d) (see paragraph above), we ran FFN agglomeration a second time, this time with inference settings matching the conditions (a)-(d) under which the segment was created. To detect remaining splits we skeletonized the agglomerated objects using TEASAR 47 . For each skeleton node with only one adjacent edge we determined whether it was a true neurite endpoint by running FFN segmentation within an empty (no pre-existing segments) (201,201,101)-voxel subvolume centered on the seed placed at the skeleton node. This was done with both the main FFN checkpoint and the one used in condition (b) above. We considered any base segments to be "recovered" if at least 60% of their voxels were overlapped by the predicted object map (POM) generated by the FFN in this procedure, and thresholded at 0.5. We then merged segments (A, B) in case of symmetric recovery, i.e. when the subvolume segmentation described above, seeded from a node located in A recovered B, and vice versa. Finally, we identified orphan fragments (defined as not reaching the surface of the data set) and iteratively (6 times) connected them to the partner fragment with the highest Jaccard score as computed in FFN agglomeration for the fragment pair 3 . Only pairs for which at most 15% of the voxels of one of the objects changed POM values from >0.8 to <0.5 during FFN agglomeration inference were considered in this step.
The number of orphaned neurite fragments was further reduced by manually inspecting their ends. If a split error was detected, a skeleton tracing was initiated. The annotator was provided with a seed location and an initial tracing direction but not the actual segmentation. Tracing was terminated when the annotator decided that a true neurite ending had been reached or when the newly traced skeleton reached another fragment with > 10 µm skeleton length. The thus created connectors (n=36,154 in 911 hours of tracing) were then used to combine the appropriate fragments. This reduced the number of fragments from 231,207 to 212,656. 13 of those contained at least one erroneous merger, identified by the presence of at least two somata. All connectors contained in those (n=434) were then removed, resulting in 213,076 fragments. This proofreading step was performed using a Python plugin for KNOSSOS ( www.knossostool.org ).
To further reduce the number of false mergers, one of us (MJ) inspected 36,653 classified neurites using Neuroglancer, identified merge errors by visual evaluation of object shape plausibility, and manually removed agglomeration graph edges causing those mergers. In total, less than 10 hours were spent for this additional proofreading step, which resulted in the removal of 846 edges and the elimination of 840 merge errors.
Unless otherwise mentioned, manual proofreading was performed by student assistants or outsourced (ariadne-service gmbh).

Synapse detection
In a first step, mitochondria, vesicle clouds, synaptic junctions and synapse type were predicted in the data set using two convolutional neural networks 4 . For each boundary voxel of the FFN segmentation, all other neurite IDs were collected in a 6 x 6 x 3 subvolume and the most frequent ID was denoted the partner neurite ID for this boundary voxel. All voxels that were both part of such a neurite-neurite contact and classified as a synaptic junction were clustered (connected components with a maximum distance of 250 nm) and each cluster considered as a putative synapse object. Mitochondria and vesicle clouds were assigned to the cells with which they shared the highest voxel overlap.
In a second step, every candidate synapse object was assigned a probability by a random forest (RF) classifier trained on manually labeled candidate synapse objects (n=436; non-synaptic: 238, synaptic: 198). Candidate synapses below probability 0.5 were eliminated. This classifier used 11 hand-designed features extracted from the pre-and postsynaptic neuron (collected within a sphere of 4 µm) and from the candidate synapse itself, in particular, the synapse size (in voxels), the volume ratios between contact site and synaptic junction bounding box, synaptic junction and contact site, as well as the numbers and voxel counts of pre-and post-synaptic mitochondria and vesicle clouds. The synaptic area of identified synapses was estimated as its surface mesh area (which includes pre-and post-synaptic areas) divided by two. Meshes were computed using marching cubes applied to the masked synapse voxels after binary closing with seven iterations.
We used cellular morphology neural networks (CMN) 5 to recognize and differentiate cellular substructures. The two models, one for spines 5 and one for axons (axon, terminal bouton, en-passant bouton, dendrite, soma; trained on 17 reconstructions; window size of 40.96 µm x 20.48 µm x 20.48 µm and 1024 by 512 pixels; 3 projection windows per location; rendering locations were the set of vertices downsampled by a third of the maximum window size), classified the cell surface based on 2D-projections of the neurite and its mapped organelles (mitochondria, vesicle clouds) and synaptic junctions. Surface labels were mapped to the synapse directly by k-nearest-neighbors in case of spines (k=50). In the axon (dendrite, soma and bouton) case the surface labels were first mapped to the skeleton nodes by k-nearest-neighbors (k=50) and then averaged using a sliding window approach, which collected the labels of skeleton nodes within 10 µm traversal length along the skeleton in every direction starting from the source node. The axon (dendrite, soma, bouton) label of each synaptic partner was taken from the respective skeleton node closest to the synapse.

Cell-type classification
Using the CMN pipeline, we also detected astrocyte fragments, and removed these from all subsequent analyses, as described in 5 .
The cell types of the remaining FFN reconstructions were then identified with the same approach, with a CMN trained on manually labeled neurites (JK), 30 per class, in total n=240 (HVC, LMAN, MSN, pallidal, subthalamic nucleus like, dopaminergic, cholinergic, interneuron; the dopaminergic and cholinergic classes were combined for the classification used in this manuscript). We modified the CMN from 5 by extending the latent representation returned by the last convolution layer with the synaptic area ratio between symmetric and all synapses in the neurite before it was passed to the first fully connected layer. The performance was assessed using ten-fold cross-validation 48 .
To cluster the neurites for an unsupervised cell type identification, independent of the supervised approach based on the CMN-pipeline, we designed three sets of neurite compartment specific features (soma, axon and dendrite). These consisted of 62 neurite-specific morphology features (e.g., caliber variation, synaptic density, mitochondrial density), as well as 15 features that were computed based on its synaptic connectivity: e.g., for axons, we computed for every of its outgoing synapse a feature of the postsynaptic dendrite, and then used the synaptic area to compute a weighted average of those postsynaptic properties. The full set of features is described in Extended Data Table 1. Each neurite corresponds to a point in this 62 dimensional space which was then projected onto a plane using either the Uniform Manifold Approximation and Projection, UMAP 49 or the t-distributed stochastic neighbor algorithm (see Supplementary Note 2) 50 (Python sklearn.manifold.TSNE) with a random seed of 0. The pairwise distance of two neurites A and B, was calculated by averaging the Euclidean distance of the separate feature vectors of the three compartments (axon, dendrite, soma). In case a compartment did not exist in A or B (compartments with less than 15 synapses associated), it did not contribute to the averaging. The embeddings were visually similar over a range of perplexities (tested from 10 to 320, Extended Data Fig. 2a,b), a parameter which controls in t-SNE how many neighbors should be considered for the embedding, and the actual number of neighbors in UMAP, also a controllable parameter (tested from 10 to 320). After visually inspecting neurites from different clusters and confirming that they agree largely with our supervised results ( Fig. 1d; UMAP parameters: n_neighbors=15, min_dist=0.25), we applied Hierarchical density based clustering (HDBSCAN) 51 to the down-projected feature space (UMAP parameters: n_neighbors=40, min_distance=0.0; HDBSCAN parameters: min_cluster_size=100, min_samples of 5) to assign cluster labels to each neurite (HDBSCAN clustering on the full feature dimensions led to no meaningful results). Agreement with the supervised CMN classifier was calculated by first identifying for each CMN celltype class the unsupervised cluster label with the greatest overlap that was then compared to the supervised CMN cell type of the same neurite and used to calculate per class F1-scores, shown in Supplementary Note 2.
Synapse-size distribution analysis of HVC and LMAN axons Synaptic size (contact area) was computed as described in the 'Synapse detection' section of the methods.
Synapse sizes were log-10 transformed and the distribution was fit to a mixture of two truncated (lower boundary of 0.01 µm 2 , no upper boundary) normal distributions (Python scipy.stats.truncnorm) by minimizing the negative log-likelihood function using Sequential Least Squares Programming (Python scipy.optimize.fmin_slsqp). Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC) comparisons were performed by fitting n-component (n ranging from 1 to 10) normal distributions using Python's sklearn.mixture.GMM to the log-10 transformed synapse sizes.
We computed the fraction of excitatory synaptic area onto different post-synaptic compartments of MSNs. These fractions were computed separately for synapses arising from HVC, LMAN and STN-like axons as follows. We divided the total synaptic area from one axon class formed onto one MSN post-synaptic compartment class (spine head, neck, dendritic shaft or soma; determined by the morphology classifier 5 ) by the sum of the synaptic area from that axon class onto all MSN compartments. For example, the data set-wide spine fraction for HVC onto MSNs was computed as follows: In addition to computing these quantities for the entire data set, we also performed the same analysis restricted to individual axons (Fig. 2c) or MSN dendrites (only for neurites with > 50 synapses) (Fig. 2d). This was done to estimate the distribution of targeted MSN postsynaptic compartments over individual neurites.

Synapse pair size correlation analysis of HVC and LMAN synapses
To identify signatures of Hebbian plasticity we computed several measures of size correlation in synapse pairs with the same pre-and postsynaptic neurites (Fig. 3, Extended Data Fig. 4, Extended Data Table 1). Specifically, 1) we analyzed normalized pair synapse size differences (coefficient of variation), 2) calculated the non-parametric Spearman rank correlation, 3) performed an analysis of variance (ANOVA) and 4) performed a comparison of the variances along the first two principal components. 1) We computed the normalized size difference of the two synapse sizes in a pair (minimum Euclidean distance of 1 µm) as with being the larger synapse size in µm 2 40 . v c = √2 s1+s2 s1−s2 1 s As control we sampled 200,000 pairs randomly from different populations of synapses: synapse pairs that shared the same axon (sA_dD), and random synapse pairs (rand), but all restricted pre-synaptically to either HVC or LMAN axons and post-synaptically to MSN dendrites. The shuffle control pair population was established by performing 1000 random pairings (without replacement) of the observed pair synapse sizes (Fig. 3d,e and Extended Data Fig. 4a-c). Mann Whitney U tests were performed on the cv distributions to assess statistical significance (Python's scipy.stats.mannwhitneyu). 2) Spearman rank correlation was calculated using Python's scipy.stats.spearmanr on mirrored pair sizes to account for the arbitrary order of synapses in a pair, i.e. for each pair we considered (s1, s2) and (s2, s1). 3) We further tested for correlated synapse pair sizes using a one-way ANOVA of log10-transformed synapse sizes, and a Kruskal-Wallis test of non-transformed synapse sizes. In both cases, the synapse sizes of a pre-post synapse pair were considered a group, which has the advantage that no artificial ordering has to be introduced, as in the case of the Spearman correlation coefficient. If synapse sizes in pre-post pairs were not independently drawn from the same size distribution, due to a size correlation, the ANOVA test would reject the null hypothesis of no difference in mean between the groups. The same argument applies to the rank based Kruskal-Wallis test (Extended Data Table 3). 4) We projected the mirrored pair synapse sizes on the (1,1) diagonal (s.d.1) and the (-1,1) diagonal (s.d.2) and then calculated the ratio of the standard deviations (s.d.1/s.d.2) along these axes in comparison to the shuffle control (Extended Data Fig. 4e).

Computational model of learning in Area X
A numerical simulation of the basal ganglia circuit underlying songbird vocal learning, focusing on HVC, LMAN, and MSN interactions, was implemented as a firing rate model in Python 52 . The total song duration was 300 ms (corresponding to a short typical zebra finch song), and the simulation was carried out with 1 ms temporal resolution. The model was implemented with 300 sparsely firing MSNs 39 , each receiving temporally specific input from a single dedicated HVC neuron that is active at a single 7ms interval in the song (delta-function peak smoothed by a Gaussian kernel with , corresponding to a ms σ = 3 full-width-at-half-maximum of~7 ms; activity of adjacent HVC neurons staggered by 1 ms). During singing, the HVC neurons form a continuous sequence 53,54 . The LMAN-MSN synapse is assumed to be non-plastic (one axon shared across all MSNs. LMAN activity is generated as random fluctuations between 0 and 1 (smoothed by a 3 ms box kernel). LMAN inputs to MSNs do not activate the MSN, but rather serve to gate plasticity at HVC-MSN synapses. Each HVC-MSN synapse has an eligibility trace defined as Supplementary Note 1 -Analysis of automatically reconstructed connectome To assess the degree to which the proofreading of the automated reconstructions affected our conclusions, we created a fully automated connectome without manual healing of splits (reconnects) or false mergers. The resulting segmentation contained slightly more neurites (n=46,033 vs. 41,324), due to a higher split rate. This was not offset by the additional false mergers, and had a marginal effect on the overall neurite length distributions (Extended Data Fig. 1a). We recreated key analyses regarding the model predictions, and found virtually no differences (Extended Data Fig. 1 b-d). While our results demonstrate the dramatic progress of automated reconstruction methods for EM data sets, we stress that manual proofreading still improved the quality of the data set. Other connectomic studies requiring more complete circuit-level analysis might still benefit from manual neurite proofreading. Fig. 1: Analysis of connectome without manually corrected reconstruction errors.

Extended Data
For panels b-f, analyses from panels in main figures are re-computed for data set without manual proofreading. a , Left: Distribution of neurite skeleton path lengths for the manually proofread connectome. Right: without manual proofreading. b , UMAP embeddings of data set neurites, as in Fig. 1d. c , Synapse size distribution of as in Fig. 2e, fitted by two slightly truncated log-normal distributions. d , Normalized histograms of the fraction of synaptic area with spine head synapses for different axon classes, corresponding to Fig. 1c,d. e , Observed pair cv values for dually connected HVC-on-MSN spine synapse pairs minus cv values for different sets of control synapse pairs, as in Fig. 3d. Error bars are of histogram bin. f , Change of HVC-on-MSN spine synapse size similarity ± √n relative to controls as in Fig. 3e. Error bars show relative s.e.m..