## Abstract

A major obstacle to understanding neural coding and computation is the fact that experimental recordings typically sample only a small fraction of the neurons in a circuit. Measured neural properties are skewed by interactions between recorded neurons and the “hidden” portion of the network. To properly interpret neural data and determine how biological structure gives rise to neural circuit function, we thus need a better understanding of the relationships between measured effective neural properties and the true underlying physiological properties. Here, we focus on how the effective spatiotemporal dynamics of the synaptic interactions between neurons are reshaped by coupling to unobserved neurons. We find that the effective interactions from a pre-synaptic neuron *r′* to a post-synaptic neuron *r* can be decomposed into a sum of the true interaction from *r′* to *r* plus corrections from every directed path from *r′* to *r* through unobserved neurons. Importantly, the resulting formula reveals when the hidden units have—or do not have—major effects on reshaping the interactions among observed neurons. As a particular example of interest, we derive a formula for the impact of hidden units in random networks with “strong” coupling—connection weights that scale with 1/√*N*, where *N* is the network size, precisely the scaling observed in recent experiments. With this quantitative relationship between measured and true interactions, we can study how network properties shape effective interactions, which properties are relevant for neural computations, and how to manipulate effective interactions.

## Introduction

Establishing relationships between a network’s architecture and its function is a fundamental problem in neuroscience and network science in general. Not only is the architecture of a neural circuit intimately related to its function, but pathologies in wiring between neurons are believed to contribute significantly to circuit dysfunction [1–15].

A major obstacle to uncovering structure-function relationships is the fact that most experiments can only directly observe small fractions of an active network. State-of-the-art methods for determining connections between neurons in living networks infer them by fitting statistical models to neural spiking data [16–24]. However, the fact that we cannot observe all neurons in a network means that the statistically inferred connections are “effective” connections, representing some dynamical relationship between the activity of nodes but not necessarily a true physical connection [24–32]. Intuitively, reverberations through the network must contribute to these effective interactions; our goal in this work is to formalize this intuition and establish a quantitative relationship between measured effective interactions and the true synaptic interactions between neurons. With such a relationship in hand we can study the effective interactions generated by different choices of synaptic properties and circuit architectures, allowing us to not only improve interpretation of experimental measurements but also probe how circuit structure is tied to function.

The intuitive relationship between measured effective interactions and true interactions is illustrated schematically in Fig. 1. Fig. 1A shows that in a fully-sampled network the directed interactions between neurons—here, the change in membrane potential of the post-synaptic neuron after it receives a spike from the pre-synaptic neuron—can be measured directly, as observation of the complete population means different inputs to a neuron are not conflated. However, as shown in Fig. 1B, the vastly more realistic scenario is that the recorded neurons are part of a larger network in which many neurons are unobserved or “hidden.” The response of the post-synaptic neuron 2 to a spike from pre-synaptic neuron 1 is a combination of both the direct response to neuron 1’s input as well as input from the hidden network driven by neuron 1’s spiking. Thus, the measured membrane response of neuron 2 due to a spike fired by neuron 1—which we term the “effective interaction” from neuron 1 to 2—may be quite different from the true interaction. It is well-known that circuit connections between recorded neurons, as drawn in Fig. 1C, are at best effective circuits that encapsulate the effects of unobserved neurons, but are not necessarily indicative of the true circuit architecture. The formalized relationship we will establish in the Results is given in Fig. 2.

Even once we establish a relationship between the effective and true connections, we will in general not be able to use measurements of effective interactions to extrapolate back to a unique set of true connections; at best, we may be able to characterize some of the statistical properties of the full network. The obstacle is that several different networks—different both in terms of architecture and intrinsic neural properties—may give rise to the same network behaviors, a theme of much focus in the neuroscience literature [33–38]. That is, inferring the connections and intrinsic neural properties in a full network from activity recordings from a subset of neurons is in general an ill-posed problem, possessing several degenerate solutions. Several statistical inference methods have been constructed to attempt to infer the presence of, and connections to, hidden neurons [27, 39–41]; the subset of the degenerate solutions that each of these methods finds will depend on the particular assumptions of the inference method (e.g., the regularization penalties applied). As an example, we demonstrate two small circuit motifs that give rise to nearly identical effective interactions, despite crucial differences between the circuits.

Understanding the effect of hidden neurons on small circuit motifs is only a small part of the hidden neuron puzzle, and a full understanding necessitates scaling up to large circuits containing many different motifs. Having an analytic relationship between true and effective interactions greatly facilitates such analyses by directly studying the structure of the relationship itself, rather than trying to extract insight indirectly through simulations. In particular, in going to large networks we focus on the degree to which hidden neurons skew measured interactions (Fig. 5), and how we can predict the features of effective interactions we expect to measure when recording from only a subset of neurons in a network with hypothesized true interactions (Fig. 6).

Establishing a theoretical relationship between measured and “true” interactions will thus enable us to study how one can alter the network properties to reshape the effective interactions, and will be of immediate importance not only for interpreting experimental measurements of synaptic interactions, but for elucidating their role in neural coding. Moreover, understanding how to shape effective interactions between neurons may yield new avenues for altering, in a principled way, the computations performed by a network, which could have applications for treating neurological diseases caused in part by pathological synaptic interactions.

## Results

### Overview

Our goal is to derive a relationship between the effective synaptic interactions between recorded neurons and the true synaptic interactions that would be obtained if the network were fully observed. This makes explicit how the synaptic interactions between neurons are modified by unobserved neurons in the network, and under what conditions these modifications are—or are not—significant. We derive this result first, using a probabilistic model of network activity in which all properties are known. We then build intuition by applying our result to two simple networks: a 3-neuron feedforward-inhibition circuit in which we are able to qualitatively reproduce measurements by Pouille and Scanziani [42], and a 4-neuron circuit that demonstrates how degeneracies in hidden networks are handled within our framework.

To extend our intuition to larger networks, we then study the effective interactions that would be observed in sparse random networks with *N* cells and strong synaptic weights that scale as 1/√*N* [43–46], as has been recently observed experimentally [47]. We show how unobserved neurons significantly reshape the effective synaptic interactions away from the ground-truth interactions. This is not the case with “classical” synaptic scaling, in which synaptic strengths are inversely proportional to the number of inputs they receive (assumed 𝒪 (*N*)), as we will also show. (The case of classical scaling has also been studied previously using a different approach in [48–51]).

### Model

We model the full network of *N* neurons as a nonlinear Hawkes process [52], commonly known as a “Generalized linear (point process) model” in neuroscience, and broadly used to fit neural activity data [16–23, 53]. Here we use it as a generative model for network activity, as it approximates common spiking models such as leaky integrate and fire systems driven by noisy inputs [54, 55], and is equivalent to current-based leaky integrate-and-fire models with soft-threshold (stochastic) spiking dynamics (see Methods).

To derive an approximate model for an observed subset of the network, we partition the network into recorded neurons (labeled by indices *r*) and hidden neurons (labeled by indices *h*). Each recorded neuron has an instantaneous firing rate *λ*_{r}(*t*) such that the probability that the neuron fires within a small time window [*t, t* + *dt*] is *λ*_{r}(*t*)*dt*, when conditioned on the inputs to the neuron. The instantaneous firing rate in our model is

$$\lambda_r(t) = \lambda_0\,\phi\Big(\mu_r + \sum_{r'} \left(J_{r,r'} \ast \dot{n}_{r'}\right)(t) + \sum_{h} \left(J_{r,h} \ast \dot{n}_{h}\right)(t)\Big), \tag{1}$$

where *λ*_{0} is a characteristic firing rate, *ϕ*(*x*) is a non-negative, continuous function, *μ*_{r} is a tonic drive that sets the baseline firing rate of the neuron, and $(J_{i,j} \ast \dot{n}_j)(t)$ is the convolution of the synaptic interaction (or “spike filter”) *J*_{i,j}(*t*) with the spike train $\dot{n}_j(t)$ *from* pre-synaptic neuron *j* *to* post-synaptic neuron *i*. In this work we take the tonic drive to be constant in time, and focus on the steady-state network activity in response to this drive. We consider interactions of the form *J*_{i,j}(*t*) = *𝒥*_{i,j}*g*_{j}(*t*), where the temporal waveforms *g*_{j}(*t*) are normalized such that $\int dt\, g_j(t) = 1$ for all neurons *j*. Because of this normalization, the weight *𝒥*_{i,j} carries units of time. We include self-couplings *J*_{i,i}(*t*) not to represent autapses, but to account for intrinsic neural properties such as refractory periods (𝒥_{i,i} *<* 0) or burstiness (𝒥_{i,i} *>* 0). The firing rates for the hidden neurons follow the same expression with indices *h* and *r* interchanged.
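To make the generative model concrete, the dynamics above can be simulated directly in discrete time. The following minimal sketch is illustrative only: the three-neuron wiring, all parameter values, the exponential nonlinearity *ϕ*(*x*) = *e*^{x}, and the exponential waveform *g*(*t*) = *βe*^{−βt} are assumptions chosen for demonstration, not values from this paper. Spikes are drawn as Bernoulli events with per-bin probability *λ*_{i}(*t*)*dt*:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network: N = 3 neurons, exponential waveform g(t) = beta*exp(-beta*t),
# normalized so its time integral is 1 (weights then carry units of time).
N, dt, T = 3, 1e-3, 20.0          # neurons, time step (s), duration (s)
lam0, beta = 10.0, 50.0           # base rate (Hz), inverse synaptic timescale (1/s)
mu = np.full(N, -1.0)             # tonic drives
Jw = np.array([[0.0,  0.0,  0.0],  # weight matrix Jw[i, j]: neuron j -> neuron i
               [0.05, 0.0, -0.1],
               [0.05, 0.0,  0.0]])

n_steps = int(T / dt)
s = np.zeros(N)                    # g-filtered spike trains, one per presynaptic neuron
spikes = np.zeros((n_steps, N), dtype=int)
for t in range(n_steps):
    drive = mu + Jw @ s                       # total synaptic input to each neuron
    lam = lam0 * np.exp(drive)                # exponential nonlinearity phi(x) = e^x
    spikes[t] = rng.random(N) < lam * dt      # Bernoulli approximation in a small bin
    s += dt * (-beta * s) + beta * spikes[t]  # update filtered presynaptic activity

rates = spikes.mean(axis=0) / dt
print(rates)   # per-neuron empirical firing rates (Hz)
```

Neuron 0 receives no synaptic input in this toy wiring, so its rate should sit near the baseline *λ*_{0}*e*^{μ₀} ≈ 3.7 Hz.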

We seek to describe the dynamics of the recorded neurons entirely in terms of their own set of spiking histories, eliminating the dependence on the activity of the hidden neurons. This demands calculating the effective membrane response of the recorded neurons by averaging out the activity of the hidden neurons, *conditioned on the activity of the recorded neurons*. In practice this is intractable to perform exactly [56–58]. Here, we use a mean field approximation to calculate the mean input from the hidden neurons (again, conditioned on the activity of the recorded neurons). The value of deriving such a relationship analytically, as opposed to simply numerically determining the effective interactions, is that the resulting expression will give us insight into how the effective interactions decompose into contributions of different network features, how tuning particular features shapes the effective interactions, and conditions under which we expect hidden units to skew our measurements of connectivity in large partially observed networks.

As shown in detail in the Methods, the instantaneous firing rates of the recorded neurons can then be approximated as

$$\lambda_r(t) \approx \lambda_0\,\phi\Big(\mu_r^{\rm eff} + \sum_{r'} \left(J^{\rm eff}_{r,r'} \ast \dot{n}_{r'}\right)(t)\Big).$$

The effective baselines, $\mu_r^{\rm eff} = \mu_r + \sum_h \mathcal{J}_{r,h}\nu_h$, are simply modulated by the net input to the neuron from the hidden network, so we do not focus on them here. The effective coupling filters are given in the frequency domain by

$$\hat{J}^{\rm eff}_{r,r'}(\omega) = \hat{J}_{r,r'}(\omega) + \sum_{h,h'} \hat{J}_{r,h}(\omega)\,\hat{\Gamma}_{h,h'}(\omega)\,\hat{J}_{h',r'}(\omega). \tag{2}$$

These results hold for any pair of recorded neurons *r′* and *r*, and any choice of network parameters for which the mean field steady state of the hidden network exists. Here, the *ν*_{h} are the steady-state mean firing rates of the hidden neurons and $\hat{\Gamma}_{h,h'}(\omega)$ is the linear response function of the hidden network to perturbations in the *input*. That is, Γ_{h,h′} (*t* - *t′*) is the linear response of hidden neuron *h* at time *t* due to a perturbation to the input of neuron *h′* at time *t′*, and incorporates the effects of *h′* propagating its signal to *h* through other hidden neurons, as demonstrated graphically in Fig. 2. Both *ν*_{h} and $\hat{\Gamma}_{h,h'}(\omega)$ are calculated *in the absence of the recorded neurons*. In deriving these results, we have neglected both fluctuations around the mean input from the hidden neurons, as well as higher order filtering of the recorded neuron spikes. For details on the derivations and justification of approximations, see the Methods and Supporting Information (SI).
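Numerically, the hidden-network correction can be evaluated frequency-by-frequency with a single matrix inversion, using the hidden network's linear response $(I - \mathrm{diag}(\gamma)\,\hat{J}_{\rm hid})^{-1}\mathrm{diag}(\gamma)$. The sketch below is a hypothetical helper, not code from this paper; the exponential waveform, the three-neuron relay example (neuron 2 hidden), and all parameter values are assumptions:

```python
import numpy as np

# Effective coupling filters between recorded neurons, evaluated per frequency:
#   Jeff_hat[r, r'](w) = J_hat[r, r'](w)
#       + sum_{h, h'} J_hat[r, h](w) * Gamma_hat[h, h'](w) * J_hat[h', r'](w),
# with Gamma_hat(w) = (I - diag(gamma) @ J_hh_hat(w))^{-1} @ diag(gamma),
# the hidden network's linear response to input perturbations.

def effective_coupling(J, gamma_h, rec, hid, omegas, beta=50.0):
    """Assumes J[i, j](t) = J[i, j] * g(t) with g(t) = beta*exp(-beta*t),
    whose Fourier transform is beta / (beta + i*w)."""
    Jeff = np.empty((len(omegas), len(rec), len(rec)), dtype=complex)
    for k, w in enumerate(omegas):
        g_hat = beta / (beta + 1j * w)
        Jh = J * g_hat                        # J_hat(w) for every pair
        G = np.linalg.inv(np.eye(len(hid)) - np.diag(gamma_h) @ Jh[np.ix_(hid, hid)])
        G = G @ np.diag(gamma_h)              # hidden linear response Gamma_hat(w)
        Jeff[k] = Jh[np.ix_(rec, rec)] + Jh[np.ix_(rec, hid)] @ G @ Jh[np.ix_(hid, rec)]
    return Jeff

# 3-neuron example: record neurons 0 and 1, hide neuron 2 (a pure relay 0 -> 2 -> 1).
J = np.zeros((3, 3))
J[2, 0], J[1, 2] = 0.1, -0.2                  # weights carry units of time
Jeff = effective_coupling(J, gamma_h=np.array([5.0]), rec=[0, 1], hid=[2],
                          omegas=np.array([0.0]))
print(Jeff[0].real)   # at w = 0: effective 0 -> 1 weight is 0.1 * 5 * (-0.2) = -0.1
```

At zero frequency this reduces to the integrated effective weights: the relay path contributes the product of its edge and node factors, while pairs with no connecting path (here 1 → 0) remain exactly zero.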

The effective coupling filters $J^{\rm eff}_{r,r'}(t)$ are what we would—in principle—measure experimentally if we observe only a subset of a network, for example by the pairwise recordings shown schematically in Fig. 1. For larger sets of recorded neurons, interactions between neurons are typically inferred using statistical methods, an extremely nontrivial task [16–23, 27, 39, 40], and details of the fitting procedure could potentially further skew the inferred interactions away from what would be measured by controlled pairwise recordings. We will put aside these complications here, and assume we have access to an inference procedure that allows us to measure the $J^{\rm eff}_{r,r'}(t)$ without error, so that we may focus on their properties and relationship to the ground-truth coupling filters.

### Structure of effective coupling filters

The ground-truth coupling filters are modified by a correction term, $\sum_{h,h'} \hat{J}_{r,h}(\omega)\,\hat{\Gamma}_{h,h'}(\omega)\,\hat{J}_{h',r'}(\omega)$. The linear response function $\hat{\Gamma}_{h,h'}(\omega)$ admits a series representation in terms of paths through the network by which neuron *r′* is able to send a signal to neuron *r via hidden neurons only*.

We may write down a set of “Feynmanesque” graphical rules for explicitly calculating terms in this series [52]. First, we define the gain, $\gamma_h \equiv \lambda_0 \phi'\big(\mu_h + \sum_{h'} \mathcal{J}_{h,h'}\nu_{h'}\big)$. The contribution of each term can then be written down using the following rules, shown graphically in Fig. 2: *i*) for the edge connecting recorded neuron *r′* to a hidden neuron *h*_{i}, assign a factor $\hat{J}_{h_i,r'}(\omega)$; *ii*) for each node corresponding to a hidden neuron *h*_{i}, assign a factor $\gamma_{h_i}$; *iii*) for each edge connecting hidden neurons *h*_{i} ≠ *h*_{j}, assign a factor $\hat{J}_{h_j,h_i}(\omega)$; and *iv*) for the edge connecting hidden neuron *h*_{j} to recorded neuron *r*, assign a factor $\hat{J}_{r,h_j}(\omega)$. All factors for each path are multiplied together, and all paths are then summed over.

The graphical expansion is reminiscent of recent works expanding correlation functions of linear models of network spiking in terms of network “motifs” [59–61]. Computationally, this expression is practical for calculating the effective interactions in small networks involving only a few hidden neurons (as in the next section), but is generally unwieldy for large networks. In practice, the linear response matrix can be calculated directly by numerical matrix inversion and an inverse Fourier transform back into the time domain. The utility of the path-length series is the intuitive understanding of the origin of contributions to the effective coupling filters and our ability to analytically analyze the strength of contributions from each path. For example, one immediate insight the path decomposition offers is that neurons only develop effective interactions between one another if there is a path by which one neuron can send a signal to the other.
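The equivalence between the path-length series and direct matrix inversion can be checked numerically: summing over paths with *k* intermediate hidden edges is a geometric series that converges to the inverted linear response whenever the hidden network is stable. A minimal sketch at a single frequency, with an assumed toy hidden network (all sizes and values illustrative):

```python
import numpy as np

# Path series for the hidden-network linear response at one frequency (w = 0):
#   Gamma = gamma + gamma J gamma + gamma J gamma J gamma + ...
# i.e. sum_k (diag(gamma) J)^k diag(gamma), which converges to
# (I - diag(gamma) J)^{-1} diag(gamma) when the spectral radius of
# diag(gamma) J is below 1 (a stability condition on the hidden network).

rng = np.random.default_rng(1)
nH = 4
J_hh = 0.05 * rng.standard_normal((nH, nH))  # hidden-hidden weights (units of time)
gamma = np.diag(rng.uniform(1.0, 2.0, nH))   # hidden-neuron gains

Gamma_exact = np.linalg.inv(np.eye(nH) - gamma @ J_hh) @ gamma

Gamma_series = np.zeros((nH, nH))
term = np.eye(nH)
for k in range(50):              # paths with k intermediate hidden-hidden edges
    Gamma_series += term @ gamma
    term = term @ (gamma @ J_hh)

print(np.max(np.abs(Gamma_exact - Gamma_series)))   # should be ~0
```

The same geometric-series structure underlies the statement in the text: a pair of neurons develops an effective interaction only if at least one path, of some length *k*, connects them through the hidden network.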

### Feedforward inhibition and degeneracy of hidden networks in small circuits

#### Effective interactions & emergent timescales in a small circuit

To build intuition for our result and compare to a well-known circuit phenomenon, we apply our Eq. (2) to a 3-neuron feedforward inhibition circuit, like that studied by Pouille and Scanziani [42]. Feedforward inhibition can sharpen the temporal precision of neural coding by narrowing the “window of opportunity” in which a neuron is likely to fire. For example, in the circuit shown in Fig. 3A, excitatory neuron 1 projects to both neurons 2 and 3, and 3 projects to 2. Neuron 1 drives both 2 and 3 to fire more, while neuron 3 is inhibitory and will counteract the drive neuron 2 receives from 1. The window of opportunity can be understood by looking at the effective interaction between neurons 1 and 2, treating neuron 3 as hidden. We use our path expansion (Fig. 2) to quickly write down the effective interaction we expect to measure in the frequency domain,

$$\hat{J}^{\rm eff}_{2,1}(\omega) = \hat{J}_{2,1}(\omega) + \hat{J}_{2,3}(\omega)\,\frac{\gamma_3}{1 - \gamma_3 \hat{J}_{3,3}(\omega)}\,\hat{J}_{3,1}(\omega). \tag{3}$$

The corresponding true synaptic interactions and resulting effective interaction are shown in Fig. 3B. The effective interaction matches qualitatively the observed membrane changes measured by Pouille and Scanziani [42], and shows a narrow window after neuron 2 receives a spike in which the membrane potential is depolarized and neuron 2 is more likely to fire. Following this brief window, the membrane potential is hyperpolarized and the cell is less likely to fire until it receives more excitatory input.

The effective interaction from neuron 1 to 2 in this simple circuit also displays several features that emerge in more complex circuits. Firstly, although the true interactions are either excitatory (positive) or inhibitory (negative), the effective interaction has a mixed character, being initially excitatory (due to excitatory inputs from neuron 1 arriving first through the monosynaptic pathway), but then becoming inhibitory (due to inhibitory input arriving from the disynaptic pathway).

Secondly, emergent timescales develop due to reverberations between hidden neurons with bi-directional connections, represented as loops between neurons in our circuit schematics (e.g., between neurons 3 and 4 in Fig. 4). This includes self-history interactions such as refractoriness, schematically represented by loops like the 3 *→* 3 loop shown in Fig. 3, corresponding to the factor $1/(1 - \gamma_3 \hat{J}_{3,3}(\omega))$. In the particular example shown in Fig. 3, in which we use a self-history interaction *J*_{33}(*τ*) = *𝒥*_{33}*β*_{33} exp(–*β*_{33}*τ*), a new timescale $(\beta_{33}(1 - \gamma_3 \mathcal{J}_{33}))^{-1}$ develops. Other choices of interactions can generate more complicated emergent timescales and temporal dynamics, including oscillations. For example, in the 4-neuron circuit discussed below (Fig. 4), the choice *J*_{3,4}(*τ*) = *J*_{4,3}(*τ*) = –|*𝒥*|*α*^{2}*τe*^{-ατ} yields effective interactions with new decay and oscillatory timescales equal to (*α*(1 – *λ*_{0}|*𝒥*|))^{−1} and (*αλ*_{0}|*𝒥*|)^{−1}. In the larger networks we consider in the next section, inter-neuron interactions must scale with network size in order to maintain network stability. Because emergent timescales depend on the synaptic strengths of hidden neurons, we typically expect emergent timescales generated by loops between hidden neurons to be negligible in large random networks. However, because the magnitudes of the self-history interaction strengths need not scale with network size, they may generate emergent timescales large enough to be detected.
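The biphasic shape of the feedforward-inhibition filter can be reproduced by evaluating the frequency-domain expression for the 3-neuron motif on a grid and inverse-FFTing back to the time domain. In this sketch all parameter values are illustrative assumptions, with exponential waveforms *g*(*t*) = *βe*^{−βt} for every synapse:

```python
import numpy as np

# Effective interaction for the feedforward-inhibition motif, evaluated as
#   Jeff_21(w) = J21(w) + J23(w) * gamma3 / (1 - gamma3*J33(w)) * J31(w),
# then inverse-FFTed to the time domain.

beta = 50.0                                # inverse synaptic timescale (1/s)
g = lambda w: beta / (beta + 1j * w)       # Fourier transform of g(t) = beta*e^{-beta*t}
J21, J23, J31, J33, gamma3 = 0.05, -0.2, 0.1, -0.05, 5.0   # illustrative values

n, dt = 2 ** 14, 1e-4
w = 2 * np.pi * np.fft.fftfreq(n, d=dt)
Jeff_hat = J21 * g(w) + J23 * g(w) * gamma3 / (1 - gamma3 * J33 * g(w)) * J31 * g(w)
Jeff_t = np.fft.ifft(Jeff_hat).real / dt   # effective filter on the time grid

# Excitation arrives first (monosynaptic path), inhibition later (disynaptic path).
print(Jeff_t[1] > 0, Jeff_t.min() < 0)
```

The integral of the resulting filter equals its zero-frequency value, $\mathcal{J}_{21} + \mathcal{J}_{23}\gamma_3\mathcal{J}_{31}/(1-\gamma_3\mathcal{J}_{33})$, here negative: net inhibition despite the initial excitatory window.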

It is worth noting explicitly that only the interaction from neuron 1 to 2 has been modified by the presence of the hidden neuron 3, for the particular wiring diagram shown in Fig. 3. The self-history interactions of both neurons 1 and 2, as well as the interaction from neuron 2 to 1 (zero in this case), are unmodified. The reason the hidden neuron did not modify these interactions is that the only link neuron 3 makes is from 1 to 2. There is no path by which neuron 1 can send a signal back to itself, hence its self-interaction is unmodified, nor is there a path by which neuron 2 can send signals to neuron 3 or on to neuron 1, hence neuron 2’s self-history interaction and its interaction onto neuron 1 are unmodified.

#### Degeneracy of hidden networks giving rise to effective interactions

It is well known that different networks may produce the same observed circuit phenomena [33–38]. To illustrate that our approach may be used to identify degenerate solutions in which more than one network underlies observed effective interactions, we construct a 4-neuron circuit that produces a quantitatively similar effective interaction between the recorded neurons 1 and 2, shown in Fig. 4. Specifically, in this circuit we have removed neuron 3’s self-history interaction and introduced a second inhibitory hidden network that receives excitatory input from neuron 1 and provides inhibitory input to neuron 3. By tuning the interaction strengths we are able to produce the desired effective interaction. This demonstrates that intrinsic neural properties such as refractoriness can trade off against inputs from other hidden neurons, making it difficult to distinguish the two cases from one another (or from a potential infinity of other circuits that could have produced this interaction; for example, a qualitatively similar interaction is produced in the *N* = 1000 network in which only three neurons are recorded, shown below in Fig. 6). Statistical inference methods may favor one of the possible underlying choices of complete network consistent with a measured set of effective interactions, suggesting there may be some sense of a “best” solution. However, the particular “best” network will depend on many factors, including the amount and fidelity of data recorded, regularization choices, and how well the fitted model generalizes to new data (i.e., how “close” the fitted model is to the generative model). If these factors are favorable, then with enough data the slight quantitative differences between the effective interactions produced by different hidden networks (including higher order effective interactions, which we assume to be negligible here; see SI) could help distinguish between them.
However, the amount of data required to perform this discrimination and validate the result may be impractically large [35, 62–64]. It is thus worth studying the structure of the observed effective interactions directly in search of possible signatures that elucidate the statistical properties of the complete network.

#### Strongly coupled large networks

Constructing networks that produce particular effective interactions is tractable for small circuits, but much more difficult for larger circuits composed of many circuit motifs. Not only can combinations of different circuit motifs interact in unexpected ways, one must also take care to ensure the resulting network is both active and stable—i.e., that firing will neither die out nor skyrocket to the maximum rate. Stability in networks is often implemented either by building networks with classical (or “weak”) synapses whose strength scales inversely with the number of inputs they receive, assumed here to be proportional to network size, and hence *𝒥*_{i,j} *∼* 1*/N*, or by building balanced networks in which excitatory and inhibitory synaptic strengths balance out, on average, and scale as 1/√*N* [43, 47]. In both cases the synapses tend to be small in value in large networks, but are compensated for by large numbers of incoming connections. In the case of 1*/N* scaling, neurons are driven primarily by the mean of their inputs, while in “strong” 1/√*N* balanced networks neurons are driven primarily by fluctuations in their inputs.

Our goal is to understand how the interplay between the presence of hidden neurons and different synaptic scaling or network architectures shapes effective interactions. Previous work has studied the hidden-neuron problem in the weak coupling limit [48–51] using a different approach to relate inferred synaptic parameters to true parameters; here we use our approach to study the strong coupling limit, theoretically predicted to be an important feature that supports computations in networks in a balanced regime [43–46]. Moreover, experiments in cultured neural tissue have been found to be more consistent with the 1/√*N* scaling than 1*/N* [47], indicating that it may have intrinsic physiological importance.

We analytically determine how significantly effective interaction strengths are skewed away from the true interaction strengths as a function of both the number of observed neurons and typical synaptic strength. We consider several simple networks ubiquitous in neural modeling: an Erdős–Rényi (ER) network with “mixed synapses” (i.e., a neuron may have both positive and negative synaptic weights); a balanced ER network with Dale’s law imposed (a neuron’s synapses are all the same sign); and a Watts-Strogatz (WS) small world network with mixed synapses. Each network has *N* neurons and connection sparsity *p* (only 100*p*% of connections are non-zero). Connections in ER networks are chosen randomly and independently, while connections in the WS network are determined by randomly rewiring a fraction *β* of the connections in a (*pN*)^{th}-nearest-neighbor ring network. In each network *N*_{rec} neurons are recorded randomly.

For simplicity we take the baselines of all neurons to be equal, *μ*_{i} = *μ*_{0} (such that in the absence of synaptic input the probability that a neuron fires in a short time window Δ*t* is *λ*_{0}Δ*t* exp(*μ*_{0})). We choose the rate nonlinearity to be exponential, *ϕ*(*x*) = *e*^{x}; this is the “canonical” choice of nonlinearity often used when fitting this model to data [16–18, 20, 65]. We will further assume exp(*μ*_{0}) ≪ 1, so that we may use this as a small control parameter. For *i* ≠ *j*, the non-zero synaptic weights between neurons *𝒥*_{i,j} are independently drawn from a normal distribution with zero mean and standard deviation *J*_{0}/(*pN*)^{a}, where *J*_{0} controls the overall strength of the weights and *a* = 1 or 1/2, corresponding to “weak” and “strong” coupling. For simplicity, we do not consider intrinsic self-coupling effects in this part of the analysis, i.e., we take *𝒥*_{i,i} = 0 for all neurons *i*. For the Dale’s law network, the overall distribution of synaptic weights follows the same normal distribution as the mixed synapse networks, but the signs of the weights correspond to whether the pre-synaptic neuron is excitatory or inhibitory. Neurons are randomly chosen to be excitatory and inhibitory, the average number of each type being equal so that the network is balanced. Numerical values of all parameters are given in Table 1 in the Methods.

We seek to assess how the presence of hidden neurons can shape measured network interactions. We first focus on the typical strength of the effective interactions as a function of both the fraction of neurons recorded, *f* = *N*_{rec}*/N*, and the strength of the synaptic weights *J*_{0}. We quantify the strength of the effective interactions by defining the effective synaptic weights $\mathcal{J}^{\rm eff}_{r,r'} \equiv \int d\tau\, J^{\rm eff}_{r,r'}(\tau)$; c.f. $\mathcal{J}_{r,r'} = \int d\tau\, J_{r,r'}(\tau)$ for the true synaptic weights. We then study the sample statistics of the difference, $\Delta\mathcal{J}_{r,r'} \equiv \mathcal{J}^{\rm eff}_{r,r'} - \mathcal{J}_{r,r'}$, averaged across both subsets of recorded neurons and network instantiations, to estimate the typical contribution of hidden neurons to the measured interactions. The mean of the synaptic weights is near zero (because the weights are normally distributed with zero mean in the mixed synapse networks and due to balance of excitatory and inhibitory neurons in the Dale’s law network), so we focus on the root-mean-square of $\Delta\mathcal{J}_{r,r'}$. This measure is a conservative estimate of changes in strength, as $J^{\rm eff}_{r,r'}(\tau)$ may have both positive and negative components that partially cancel when integrated over time, unlike *J*_{r,r′}(*τ*). An alternative measure we could have chosen that avoids potential cancellations is $\int d\tau\, |J^{\rm eff}_{r,r'}(\tau) - J_{r,r'}(\tau)|$, i.e., the integrated absolute difference between effective and true interactions. However, this will depend on our specific choices of waveform *g*(*τ*), whereas $\Delta\mathcal{J}_{r,r'}$ does not, due to our normalization $\int d\tau\, g_j(\tau) = 1$. As $|\int d\tau\, f(\tau)| \le \int d\tau\, |f(\tau)|$ for any *f*(*τ*), we can consider our choice of $|\Delta\mathcal{J}_{r,r'}|$ as a lower bound on the strength change that the integrated absolute difference would quantify.
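A small Monte Carlo sketch of this comparison can be run at zero frequency, where the hidden-path correction to the integrated weights reduces to a matrix product. Here a single fixed gain stands in for the mean-field gains of the hidden neurons, and all parameter values are illustrative assumptions rather than the values used in the paper's figures:

```python
import numpy as np

# rms size of the hidden-neuron correction to the integrated weights:
# at zero frequency, dJ = J_rh (I - gbar*J_hh)^{-1} gbar * J_hr, so
# rms(dJ) / rms(J_rr) estimates how strongly hidden neurons skew measurements.

rng = np.random.default_rng(2)

def rms_skew(N=400, f=0.2, p=0.2, J0=0.5, a=0.5, gbar=1.0):
    Nrec = int(f * N)
    mask = rng.random((N, N)) < p                 # ER connectivity, sparsity p
    J = mask * rng.normal(0.0, J0 / (p * N) ** a, (N, N))
    np.fill_diagonal(J, 0.0)                      # no self-couplings
    r, h = np.arange(Nrec), np.arange(Nrec, N)
    G = np.linalg.inv(np.eye(N - Nrec) - gbar * J[np.ix_(h, h)]) * gbar
    dJ = J[np.ix_(r, h)] @ G @ J[np.ix_(h, r)]    # hidden-path correction
    return np.sqrt((dJ ** 2).mean()) / np.sqrt((J[np.ix_(r, r)] ** 2).mean())

print(rms_skew(a=0.5), rms_skew(a=1.0))   # strong vs. weak scaling
```

With these assumed parameters the strong-coupling (*a* = 1/2) correction is an order-one fraction of the true weight scale, while the weak-coupling (*a* = 1) correction is roughly an order of magnitude smaller, mirroring the contrast in Fig. 5.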

Numerical evaluations of the population statistics for all three network types are shown as solid curves in Fig. 5, for both strong coupling and weak coupling. All three networks yield qualitatively similar results. The vertical axes measure the root-mean-square deviation between the true synaptic weights *𝒥*_{r,r′} and the corresponding effective synaptic weights $\mathcal{J}^{\rm eff}_{r,r'}$, normalized by the root-mean-square of the true weights *𝒥*_{r,r′}. Thus, a ratio of 0.5 corresponds to a 50% root-mean-square difference in effective versus true synaptic strength. We measure these ratios as a function of both the fraction of neurons recorded (horizontal axis) and the parameter *J*_{0} (labeled curves).

There are two striking effects. First, deviations are nearly negligible for 1*/N* scaling of connections. Thus, for large networks with synapses that scale inversely with the system size, vast numbers of hidden neurons combine to have negligible effect on effective couplings. This is in marked contrast to the case when coupling is strong (1/√*N* scaling), when hidden neurons have a pronounced impact (*𝒪*(1)). This is particularly the case when *f* ≪ 1—the typical experimental case in which the hidden neurons outnumber observed ones by orders of magnitude—or when *J*_{0} ≲ 1.0, for which typical deviations become half the magnitude of the true couplings themselves (upper blue line). For *J*_{0} ≳ 1.0, the network activity is unstable for an exponential nonlinearity.

To gain analytical insight into these numerical results, we calculate the standard deviation $\sigma[\Delta\mathcal{J}_{r,r'}]$, normalized by *σ*[*𝒥*_{r,r′}], for contributions from paths up to length-3, focusing on the case of the ER network with mixed synapses (the Dale’s law and WS networks are more complicated, as the moments of the synaptic weights depend on the identity of the neurons). For strong coupling we find

$$\frac{\sigma[\Delta\mathcal{J}_{r,r'}]}{\sigma[\mathcal{J}_{r,r'}]} \approx \bar{\gamma} J_0 \sqrt{(1-f)\left(1 + \bar{\gamma}^2 J_0^2 (1-f)\right)}, \tag{4}$$

where $\bar{\gamma} \approx \lambda_0 e^{\mu_0}$ is the mean-field gain of the hidden neurons, corresponding to the black dashed curves in Fig. 5, left. Eq. (4) is a truncation of a series in powers of $\bar{\gamma} J_0 \sqrt{1-f}$, where *f* = *N*_{rec}*/N* is the fraction of recorded neurons. The most important feature of this series is the fact that it only depends on the *fraction* of recorded neurons *f*, not the absolute number, *N*. Contributions from long paths remain finite, even as *N → ∞*. In contrast, the corresponding expression for $\sigma[\Delta\mathcal{J}_{r,r'}]/\sigma[\mathcal{J}_{r,r'}]$ in the case of weak 1*/N* coupling is a series in powers of $\bar{\gamma} J_0 \sqrt{(1-f)/(pN)}$, so that contributions from long paths are negligible in large networks *N ≫* 1. (See [65] for derivation and results for *N* = 100.) Deviations of Eq. (4) from the numerical solutions in Fig. 5 indicate that contributions from truncated terms are not negligible when *f ≪* 1. As these terms correspond to paths of length-4 or more, this shows that long chains through the network contribute significantly to shaping effective interactions.

The above analysis demonstrates that the strength of the effective interactions can deviate from that of the true direct interactions by as much as 50%. However, changes in strength do not give us the full picture—we must also investigate how the temporal dynamics of the effective interactions change. To illustrate how hidden units can skew temporal dynamics, in Fig. 6 we plot the effective vs. true interactions between *N*_{rec} = 3 neurons in an *N* = 1000 neuron network. Because the three network types considered in Fig. 5 yield qualitatively similar results, we focus on the Erdős–Rényi network with mixed synapses.

Four of the true interactions between the neurons shown in Fig. 6 are non-zero. Of these, three exhibit only slight differences between the true and effective interactions: two have slightly longer decay timescales than their true counterparts, while one has a slightly shorter timescale, indicating that the contribution of the hidden network to these interactions was either small or cancelled out. However, the effective interaction from neuron 2 to neuron 3 differs significantly from the true interaction, becoming initially excitatory but switching to inhibitory after a short time, as in our earlier example of feedforward inhibition. This indicates that neuron 2 must drive a cascade of neurons that ultimately provides inhibitory input to neuron 3.

Contrasting the true and effective interactions shown in Fig. 6 highlights many of the ways in which hidden neurons skew the temporal properties of measured interactions. An immediately obvious difference is that although the true synaptic connections in the network are sparse, the effective interactions are not. This is a generic feature of the effective interaction matrix: in order for an effective interaction from a neuron *r′* to *r* to be identically zero, there cannot be any paths through the network by which *r′* can send a signal to *r*. In a random network the probability that there are no paths connecting two nodes tends to zero as the network size *N* grows large. Note that this includes paths by which each neuron can send a signal back to itself, hence the neurons develop effective self-interactions, even though the true self-interactions are zero in these particular simulations.

## Discussion

We have derived a quantitative relationship between “ground-truth” synaptic interactions and the effective interactions (interpreted here as post-synaptic membrane responses) that unobserved neurons generate between subsets of observed neurons. This relationship, Eq. (2) and Fig. 2, provides a foundation for studying how different network architectures and neural properties shape the effective interactions between subsets of observed neurons. Our approach can also be used to study higher-order effective interactions between 3 or more neurons, and can be systematically extended to account for corrections to our mean-field approximations and investigate effective noise generated by hidden neurons (using field theoretic techniques from [52], see SI), as well as time-dependent external drives or steady-states.

Here, as first explorations, we focused on the effective interactions corresponding to linear membrane responses. We first demonstrated that our approach applied to small feedforward inhibitory circuits yields effective interactions that capture the role of inhibition in shortening the time window for spiking, and are qualitatively similar to experimentally observed measurements [42]. Moreover, we used this example to demonstrate explicitly that different hidden networks can give rise to the same effective interactions between neurons. We then showed that the influence of hidden neurons can remain significant even in large networks in which the typical synaptic strengths scale with network size. In particular, when the synaptic weights scale as , the relative influence of hidden neurons depends only on the fraction of neurons recorded. Together with theoretical and experimental evidence for this scaling in cortical slices [43–47], this suggests that neural interactions inferred from cortical activity data may differ markedly from the true interactions and connectivity.

The issue of degeneracy in complex biological systems and networks has been discussed at length in the literature, in the context of both inherent degeneracies—multiple different network architectures can produce the same qualitative behaviors [33, 36–38]—and degeneracies in our model descriptions—many models may reproduce experimental observations, demanding sometimes arbitrary criteria for selecting one model over another. All have implications for how successfully one can infer unobserved network properties. One kind of model degeneracy, “sloppiness” [34, 63], describes models whose behavior is sensitive to changes in only a relatively small number of directions in parameter space. Many models of biological systems have been shown to be sloppy [34]; this could account for experimentally observed networks that are quite different in composition but produce remarkably similar behaviors. Sloppiness suggests that rather than trying to infer all properties of a hidden network, there may be certain parameter combinations that are much more important to the overall network operation, and could potentially be inferred from subsampled observations.

Another perspective on model degeneracy comes from the concepts of “universality” that occur in random matrix theory [66, 67] and Renormalization Group methods of statistical physics [62]. Many bulk properties of matrices (e.g., the distribution of eigenvalues) whose entries are combinations of random variables, such as our , are universal in that they depend on only a few key details of the distribution that individual elements are drawn from [68]. Similarly, one of the central results of the Renormalization Group shows that models with drastically disparate features may yield the same coarse-grained model structure when many degrees of freedom are averaged out, as in our case of approximately averaging out hidden neurons. Different distributions (in the case of random matrix theory) or different models (in the case of the Renormalization Group) that yield the same bulk properties or coarse-grained models are said to be in the same “universality class.” Measuring particular quantities under a range of experimental conditions (e.g., different stimuli) may be able to reveal which universality class an experimental system belongs to and eliminate models belonging to other universality classes as candidate generating models of the data, but these measurements cannot distinguish between models within a universality class. As our network of subsampled neurons can be thought of as a model in which the hidden network has been approximately averaged over, this means we can potentially use Eq. (2) to rule out sets of models of the hidden network that are inconsistent with measured sets of effective interactions (e.g., hidden networks with given network *statistics*), even if we are unable to uniquely pin down the true hidden network (i.e., the exact or even approximate *configuration* of network parameters drawn from those statistical distributions).

The fact that many different hidden networks may yield the same set of effective interactions suggests that the effective interactions themselves may yield direct insight into a circuit’s functions. For instance, many circuits consist of principal neurons that transmit the results of circuit computation to downstream circuitry, but often do not make direct connections with one another, instead interacting through (predominantly inhibitory) intermediaries called interneurons. From the point of view of a downstream circuit, the principal neurons are “recorded” and the interneurons are “hidden.” A potential reason for this general arrangement is that direct synaptic interactions alone are insufficient to produce the membrane responses required to perform the circuit’s computations, and the network of interneurons reshapes the membrane responses of projection neurons into effective interactions that can perform the desired computations—it may thus be that the effective interactions should be of primary interest, not necessarily the (possibly degenerate choices of) physiological synaptic interactions. For example, in the feedforward inhibitory circuits of Figs. 3 and 4, the roles of the hidden inhibitory neurons may simply be to act as interneurons that reshape the interaction between the excitatory projection neurons 1 and 2, and the choice of which particular circuit motif is implemented in a real network is determined by other physiological constraints, not only computational requirements.

One of the greatest achievements in systems neuroscience would be the ability to perform targeted modifications to a large neural circuit and *selectively* alter its suite of computations. This would have powerful applications, not only for studying a circuit’s native computations but also for repurposing circuits or repairing damaged circuitry (due, e.g., to disease). If the computational roles of circuits are indeed most sensitive to the effective interactions between principal neurons, this suggests we can exploit potential degeneracies in the interneuron architecture and intrinsic properties to find *some* circuit that achieves a desired computation, even if it is not a physiologically natural circuit. Our main result relating effective and true interactions, Eq. (2), provides a foundation for future work investigating how to identify sets of circuits that perform a desired set of computations. We have shown in this work that it can be done for small circuits (Figs. 3 and 4), and that the effective interactions in large random networks can be significantly skewed away from the true interactions when synaptic weights scale as , as observed in experiments [47]. This holds promise for identifying principled approaches to tuning or controlling neural interactions, such as by using neuromodulators to adjust interneuron properties or inserting artificial or synthetic circuit implants into neural tissue to act as “hidden” neurons. If successful, this could contribute to the long term goal of using such interventions to aid in reshaping the effective synaptic interactions between diseased neurons, and thereby restore healthy circuit behaviors.

## Methods

### Model definition and details

The firing rate of a neuron *i* in the full network is given by
where *λ*_{0} is a characteristic rate, *ϕ*(*x*) ≥ 0 is a nonlinear function, *μ*_{i} (potentially a function of some external stimulus *θ*) is a time-independent tonic drive that sets the baseline firing rate of the neuron in the absence of input from other neurons, is an external input current, and *J*_{ij}(*t – t′*) is a coupling filter that filters spikes fired by presynaptic neuron *j* at time *t′*, incident on post-synaptic neuron *i*. We will take for simplicity in this work, focusing on the activity of the network due to the tonic drives *μ*_{i} (which could still be interpreted as external tonic inputs, so the activity of the network need not be interpreted as spontaneous activity).

While we need not attach a mechanistic interpretation to these filters, a convenient interpretation is that the nonlinear Hawkes model approximates the stochastic dynamics of a leaky integrate-and-fire network model driven by noisy inputs [54, 55]. In fact, the nonlinear Hawkes model is equivalent to a current-based integrate-and-fire model in which the deterministic spiking rule (a spike fires when a neuron’s membrane potential reaches a threshold value *V*_{th}) is replaced by a stochastic spiking rule (the higher a neuron’s membrane potential, the higher the probability the neuron will fire a spike). (It can also be mapped directly to a conductance-based model in special cases [69]). For completeness, we present the mapping from a leaky integrate-and-fire model with stochastic spiking to Eq. (5) in the Supporting Information (SI).

### Derivation of effective baselines and coupling filters

To study how hidden neurons affect the inferred properties of recorded neurons, we partition the network into “recorded” neurons, labeled by indices *r* (with sub- or superscripts to differentiate different recorded neurons, e.g., *r* and *r′* or *r*_{1} and *r*_{2}) and “hidden” neurons labeled by indices *h* (with sub- or superscripts). The rates of these two groups are thus
To simplify notation, we write . If we seek to describe the firing of the recorded neurons only in terms of their own spiking history, input from hidden neurons effectively acts like noise with some mean amount of input. We thus begin by splitting the hidden input to the recorded neurons into two terms, the mean plus fluctuations around the mean:
where denotes the mean activity of the hidden neurons conditioned on the activity of the recorded units, and *ξ*_{r}(*t*) are the fluctuations, i.e., . Note that *ξ*_{r}(*t*) is also conditional on the activity of the recorded units.

By construction, the mean of the fluctuations is identically zero, while the cross-correlations can be expressed as
where is the cross-correlation between hidden neurons *h*_{1} and *h*_{2} (conditioned on the spiking of recorded neurons). If the autocorrelation of the fluctuations (*r* = *r′*) is small compared to the mean input to the recorded neurons , or if *J*_{r,h} is small, then we may neglect these fluctuations and focus only on the effects that the mean input has on the recorded subnetwork. At the level of the mean field theory approximation we make in this work, the spike-train correlations are zero. One can calculate corrections to mean field theory (see SI) to estimate the size of this noise; however, even when this noise is non-negligible it can be treated as a separate input to the recorded neurons, and hence will not alter the form of the effective couplings between neurons. Averaging out the effective noise, however, will generate new interactions between neurons; we leave investigation of this issue for future work.

In order to calculate how hidden input shapes the activity of recorded neurons, we need to calculate the mean . This mean input is difficult to calculate in general, especially when conditioned on the activity of the recorded neurons. In principle, the mean can be calculated as
This is not a tractable calculation. Taylor series expanding the nonlinearity *ϕ*(*x*) reveals that the mean will depend on *all* higher cumulants of the hidden unit spike trains, which cannot in general be calculated explicitly. Instead, we again appeal to the fact that in a large, sufficiently connected network, we expect fluctuations to be small, as long as the network is not near a critical point. In this case, we may make a mean field approximation, which amounts to solving the self-consistent equation
In general, this equation must be solved numerically. Unfortunately, the conditional dependence on the activity of the recorded neurons presents a problem, as in principle we must solve this equation for *all possible patterns of recorded unit activity*. Instead, we note that the mean hidden neuron firing rate is a *functional* of the filtered recorded input , so we can expand it as a functional Taylor series around some reference filtered activity ,
Within our mean field approximation, the Taylor coefficients are simply the response functions of the network — i.e., the zeroth order coefficient is the mean firing rate of the neurons in the reference state , the first order coefficient is the linear response function of the network, the second order coefficient is a nonlinear response function, and so on.

There are two natural choices for the reference state . The first is simply the state of zero recorded unit activity, while the second is the mean activity of the recorded neurons. The zero-activity case conforms to the choice of nonlinear Hawkes models used in practice. Choosing the mean activity as the reference state may be more appropriate if the recorded neurons have high firing rates, but requires adjusting the form of the nonlinear Hawkes model so that firing rates are modulated by filtering the *deviations* of spikes from the mean firing rate, rather than filtering the spikes themselves. Here, we focus on the zero-activity reference state. We present the formulation for the mean field reference state in the SI.

For the zero-activity reference state , the conditional mean is
The mean inputs are the mean field approximations to the firing rates of the hidden neurons in the absence of the recorded neurons. Defining , these firing rates are given by
in writing this equation we have assumed that the steady-state mean field firing rates are time-independent, and hence the convolution *J*_{h,h′} * *v*_{h′} = *𝒥*_{h,h′}*v*_{h′}, where *𝒥*_{h,h′} = ∫ *dt J*_{h,h′}(*t*) is the time-integrated coupling filter. This assumption will generally be valid in at least some parameter regime of the network, but there can be cases where it breaks down, such as if the nonlinearity *ϕ*(*x*) is bounded, in which case a transition to chaotic firing rates *v*_{h}(*t*) may exist (cf. [70]). The mean field equations for the *v*_{h} are a system of transcendental equations that in general cannot be solved exactly. In practice we will solve the equations numerically, but we can develop a series expansion for the solutions (see below).
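As an illustration, these self-consistent equations can be solved numerically by damped fixed-point iteration. The minimal sketch below (pure Python; the toy network, parameter values, and function names are our own, not taken from any accompanying code) assumes the exponential nonlinearity and a small, inhibition-dominated hidden network so that a stable fixed point exists:

```python
import math

def mean_field_rates(J, mu, lam0=1.0, phi=math.exp, iters=2000, eta=0.1):
    """Damped fixed-point iteration for v_h = lam0 * phi(mu_h + sum_k J[h][k] * v_k)."""
    n = len(mu)
    v = [0.0] * n
    for _ in range(iters):
        v_new = [lam0 * phi(mu[h] + sum(J[h][k] * v[k] for k in range(n)))
                 for h in range(n)]
        # damped update improves stability of the iteration
        v = [(1 - eta) * vo + eta * vn for vo, vn in zip(v, v_new)]
    return v

# toy inhibition-dominated hidden network, so a stable fixed point exists
J_hidden = [[-0.5, -0.2],
            [-0.3, -0.4]]
mu_hidden = [0.5, 0.3]
v_mf = mean_field_rates(J_hidden, mu_hidden)
```

The damping factor `eta` is a standard numerical convenience, not part of the model; any scheme that converges to the same self-consistent solution is equivalent.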

The next term in the series expansion is the linear response function of the hidden unit network, , given by the solution to the integral equation
The “gain” *γ*_{h} is defined by
where *ϕ′*(*x*) is the derivative of the nonlinearity with respect to its argument.

For time-independent drives *μ*_{r} and steady states *v*_{h} (and hence *γ*_{h}), we may solve for Γ_{h,h}_{′} (*t – t′*) by first converting to the frequency domain and then performing a matrix inverse:
where .
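As a concrete sketch of this frequency-domain inversion, the snippet below evaluates the hidden-network linear response at a single frequency for a toy two-neuron hidden network. It assumes the standard form Γ̂(*ω*) = [𝕀 − diag(*γ*_{h})**Ĵ**(*ω*)]^{−1} diag(*γ*_{h}) (the precise expression in the text is not reproduced here) together with the alpha-function coupling filters used later in the Methods; all names and parameter values are illustrative:

```python
def Jhat(Jint, alpha, omega):
    """Alpha-function filter J(t) = Jint * alpha**2 * t * exp(-alpha*t) in the frequency domain."""
    return Jint * alpha ** 2 / (alpha + 1j * omega) ** 2

def hidden_linear_response(Jw, gam):
    """Solve Gamma(w) = [I - diag(gam) Jw(w)]^{-1} diag(gam) for a 2x2 hidden network."""
    a = 1 - gam[0] * Jw[0][0]
    b = -gam[0] * Jw[0][1]
    c = -gam[1] * Jw[1][0]
    d = 1 - gam[1] * Jw[1][1]
    det = a * d - b * c  # complex 2x2 determinant
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # right-multiply the inverse by diag(gam)
    return [[inv[i][j] * gam[j] for j in range(2)] for i in range(2)]

# evaluate at one frequency for a toy hidden pair
w, alpha = 1.0, 10.0
Jw = [[Jhat(0.0, alpha, w), Jhat(-0.3, alpha, w)],
      [Jhat(0.2, alpha, w), Jhat(-0.4, alpha, w)]]
gam = [0.8, 1.2]
G = hidden_linear_response(Jw, gam)
```

For larger hidden networks the explicit 2×2 inverse would be replaced by a general linear solve at each frequency.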

If the zero and first order Taylor series coefficients in our expansion of are the dominant terms— i.e., if we may neglect higher order terms in this expansion—then we may approximate the instantaneous firing rates of the recorded neurons by where are the effective baselines of the recorded neurons and are the effective coupling filters in the frequency domain, as given in the main text. In addition to neglecting the higher order spike filtering terms here, we have also neglected fluctuations around the mean input from the hidden network. These fluctuations are zero within our mean field approximation, but we could in principle calculate corrections to the mean field predictions using the techniques of [52]; we do so to estimate the size of the effective noise correlations in the SI.

In the main text, we decompose our expression for into contributions from all paths that a signal can travel from neuron *r′* to *r*. To arrive at this interpretation, we note that we can expand in a series over paths through the hidden network. To start, we note that if for some matrix norm ∥ ·∥, then the matrix [𝕀 *-* **V**(*ω*)]^{−1} admits a convergent series expansion [71]
where ** is a matrix product and ****. We can write an element of the matrix product out as
inserting yields
This expression can be interpreted in terms of summing over paths through the network of hidden neurons that join two observed neurons: the are represented by edges from neuron *h*_{j} to *h*_{i}, and the *γ*_{h_i} are represented by the nodes. In this expansion, we allow edges from one neuron back to itself, meaning we include paths in which signals loop back around to the same neuron arbitrarily many times before the signal is propagated further. However, such loops can be easily factored, contributing a factor . We thus remove the need to consider self-loops in our rules for calculating effective coupling filters by assigning a factor *γ*_{h}/(1 – *γ*_{h}*J*_{h,h}(*ω*)) to each node, as discussed in the main text and depicted in Fig. 2. (The contribution of the self-feedback loops can be derived rigorously; see the SI for the full derivation.)
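The convergent series can be checked numerically. The minimal sketch below (illustrative names; any matrix with spectral radius below 1 works) accumulates powers of a small matrix *V* and verifies that the partial sums approach [𝕀 − *V*]^{−1}, with the (*i*, *j*) entry of *V*^{k} summing the weights of all length-*k* paths from *j* to *i*:

```python
def matmul(A, B):
    """Plain dense matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_inverse(V, terms=60):
    """Approximate [I - V]^{-1} by the truncated series I + V + V^2 + ...
    The (i, j) entry of V^k sums all length-k paths from node j to node i."""
    n = len(V)
    S = [[float(i == j) for j in range(n)] for i in range(n)]  # k = 0 term: identity
    P = [row[:] for row in S]                                   # running power V^k
    for _ in range(terms):
        P = matmul(P, V)
        S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return S

V = [[0.1, 0.3],
     [-0.2, 0.0]]   # spectral radius well below 1, so the series converges
S = neumann_inverse(V)
```

Truncating the sum at a finite number of `terms` corresponds exactly to keeping only paths up to that length, which is the sense in which Eq. (4) in the main text is a truncation.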

Although we have worked here in the frequency domain, our formalism adapts straightforwardly to handle time-dependent inputs; however, among the consequences of this explicit time-dependence are that the mean field rates *v*_{h}(*t*) are not only time-dependent, but solutions of a system of nonlinear integral equations, and hence more challenging to solve. Furthermore, quantities like the linear response of the hidden network, Γ_{h,h′}(*t, t′*), will depend on both absolute times *t* and *t′*, rather than just their difference, *t – t′*, and hence we must also (numerically) solve for Γ_{h,h′}(*t, t′*) directly in the time domain. We leave these challenges for future work.

### Model network architectures

Our main result, Eq. (2), is valid for general network architectures with arbitrary weighted synaptic connections, so long as the hidden subset of the network has stable dynamics when the recorded neurons are removed. An example for which our method must be modified would be a network in which all or the majority of the hidden neurons are excitatory, as the hidden network is unlikely to be stable when the inhibitory recorded neurons are disconnected. Similarly, we find that synaptic weight distributions with undefined moments will generally cause the network activity to be unstable. For example, *𝒥*_{i,j} drawn from a Cauchy distribution generally yield unstable network dynamics unless the weights are scaled inversely with a large power of the network size *N*.

#### Specific networks—common features

The specific network architectures we study in the main text share several features in common: all are sparse networks with sparsity *p* (i.e., only a fraction *p* of connections are non-zero) and non-zero synaptic weight strengths drawn independently from a random distribution with zero population mean and population standard deviation *J*_{0}/(*pN*)^{a}; the overall standard deviation of weights, accounting for the expected 1 – *p* fraction of zero weights is . The parameter *a* determines whether the synaptic strengths are “strong” (*a* = 1/2) or “weak” (*a* = 1). In most of our analytical results we only need the mean and variances of the weights, so we do not need to specify the exact distribution. In simulations, we use a normal distribution. The reason for scaling the weights as 1/(*pN*)^{a}, as opposed to just 1*/N* ^{a}, is that the mean incoming degree of connections is *p*(*N –* 1) *≈ pN* for large networks; this scaling thus controls for the typical magnitude of incoming synaptic currents.

For strongly coupled networks, the combined effect of sparsity and synaptic weight distribution yields an overall standard deviation of . Because the sparsity parameter *p* cancels out, it does not matter if we consider *p* to be fixed or *k*_{0} = *pN* to be fixed—both cases are equivalent. However, this is not the case if we scale *𝒥*_{i,j} by 1*/k*_{0}, as the overall standard deviation would then be , which only corresponds to the weak-coupling limit if *p* is fixed. If *k*_{0} is fixed, the standard deviation would scale as .
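A minimal sketch of this construction (pure Python; function and parameter names are ours) draws a sparse weight matrix with the stated scaling. Each entry is non-zero with probability *p* and drawn from a normal distribution with standard deviation *J*_{0}/(*pN*)^{a}, so for *a* = 1/2 the overall variance of an entry is *p* · *J*_{0}²/(*pN*) = *J*_{0}²/*N*:

```python
import random

def sample_weights(N, p, J0, a, seed=0):
    """Sparse random weights: each entry is nonzero with probability p and drawn
    from N(0, sd^2) with sd = J0 / (p*N)**a ("strong" for a = 0.5, "weak" for a = 1)."""
    rng = random.Random(seed)
    sd = J0 / (p * N) ** a
    return [[rng.gauss(0.0, sd) if rng.random() < p else 0.0 for _ in range(N)]
            for _ in range(N)]

# strong coupling: overall entry variance should be about J0^2 / N
W = sample_weights(N=400, p=0.2, J0=1.0, a=0.5)
```

This is the mixed-synapse case, with no sign constraint; each neuron's outgoing weights are fully independent.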

It is worth noting that the determination of “weak” versus “strong” coupling depends not only on the power of *N* with which synaptic weights scale, but also on the network architecture and correlation structure of the weights *𝒥*_{i,j}. For example, for an all-to-all connected matrix with symmetric rank-1 synaptic weights of the form *𝒥*_{i,j} = *ζ*_{i}*ζ*_{j}, where the *ζ*_{i} are independently distributed normal random variates, the standard deviation of *each ζ* must scale as 1/*N*^{1/2} in order for hidden paths to generate *𝒪*(1) contributions to effective interactions, such that *𝒥*_{i,j} scales as 1*/N* but the coupling is still strong.

#### Specific networks—differences in architecture and synaptic constraints

Beyond the common features outlined above, we perform our analysis of the distribution of effective synaptic interaction strengths for three network architectures commonly studied in network models. These architectures are not intended to be realistic representations of neuronal network structures, but to capture basic features of network architecture and therefore give insight into the basic features of the effective interaction networks.

##### Erdős-Rényi + mixed synapses

The first network we consider (and the one we perform most of our later analyses on as well) is an Erdős-Rényi random network architecture with “mixed synapses.” That is, each connection between neurons is chosen randomly with probability *p*. By “mixed synapses” we mean that each neuron’s outgoing synaptic weights are chosen completely independently; i.e., in this network there are no excitatory or inhibitory neurons—each neuron may make both excitatory and inhibitory connections. The corresponding analysis is shown in Fig. 5A.

##### Erdős-Rényi + Dale’s law imposed

Real neurons appear to split into separate excitatory and inhibitory classes, a dichotomy known as “Dale’s law” (or alternatively, “Dale’s principle,” to highlight that it is not really a law of nature). Neurons in a network that obeys this law will have coupling filters *J*_{i,j}(*t*) that are strictly positive for excitatory neurons and strictly negative for inhibitory neurons. This constraint complicates analytic calculations slightly, as the moments of the synaptic weights now depend on the identity of the neuron, and more care must be taken in calculating expected values or population averages. We instead impose this constraint numerically to generate the results shown in Fig. 5B. The trends are the same as in the network with mixed synapses, with the resulting ratios being slightly reduced.
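One simple way to impose Dale’s law numerically (a sketch of our own, not necessarily the exact procedure used for Fig. 5B) is to fix a sign for each presynaptic neuron and draw weight magnitudes from a half-normal distribution. Note that the half-normal magnitudes shift the per-neuron mean weight away from zero, consistent with the point above that the moments now depend on neuron identity:

```python
import random

def dale_weights(N, p, J0, a, frac_exc=0.8, seed=1):
    """Sparse weights obeying Dale's law: every outgoing weight of neuron j carries
    the sign of neuron j (excitatory or inhibitory), with half-normal magnitudes."""
    rng = random.Random(seed)
    sd = J0 / (p * N) ** a
    # first frac_exc of the neurons are excitatory, the rest inhibitory
    sign = [1.0 if j < int(frac_exc * N) else -1.0 for j in range(N)]
    return [[sign[j] * abs(rng.gauss(0.0, sd)) if rng.random() < p else 0.0
             for j in range(N)] for i in range(N)]

W = dale_weights(N=100, p=0.3, J0=1.0, a=0.5)
```

The excitatory fraction `frac_exc` is an illustrative parameter; the balance between the two classes is what determines whether the hidden subnetwork remains stable, as discussed below.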

As a technical point, because our analysis requires calculation of the mean field firing rates of the hidden network in the absence of the recorded neurons, random sampling of the network may, by chance, yield hidden networks with an imbalance of excitatory neurons, for which the mean field firing rates of the hidden network may diverge for our choice of exponential nonlinearity. This is the origin of the relatively larger error bars in Fig. 5B: fewer random samplings yielded stable hidden networks for which the computation could be performed. One way this artifact can be prevented is by choosing a nonlinearity that saturates, such as *ϕ*(*x*) = *c*/(1 + exp(–*x*)), which prevents the mean-field firing rates from diverging and yields stable network activity (see Fig. 8). Another is to choose a different reference state of network activity around which we perform our expansion of , such as the mean field state discussed in the SI.

##### Watts-Strogatz network + mixed synapses

Finally, although Erdős-Rényi networks are relatively easy to analyze analytically, and are ubiquitous in many influential computational and theoretical studies, real world networks typically have more structure. Therefore, we also consider a network architecture with more structure, a Watts-Strogatz (small world) network. A Watts-Strogatz network is generated by starting with a *K*-nearest neighbor network (such that the fraction of non-zero connections each neuron makes is *p* = *K*/(*N* – 1)) and rewiring a fraction *β* of those connections. The limit *β* = 0 remains a *K*-nearest neighbor network, while *β →* 1 yields an Erdős-Rényi network. We generated the adjacency matrices of the Watts-Strogatz networks using code available in [72]. Here we consider only a Watts-Strogatz network with mixed synapses; a network with spatial structure and Dale’s law will become sensitive to both the distribution of excitatory and inhibitory neurons in the network as well as the way in which the neurons are sampled, an investigation we leave for future work. The results for the Watts-Strogatz network with mixed synapses are shown in Fig. 5C, and are qualitatively similar to the Erdős-Rényi network with mixed synapses.

Because all three network types we considered yield qualitatively similar results, for the remainder of our analyses, we focus on the Erdős-Rényi + mixed synapses network for simplicity in both simulations and analytical calculations.

Parameter values used to generate our networks are given in Table 1.

### Choice of nonlinearity *ϕ*(*x*)

The nonlinear function *ϕ*(*x*) sets the instantaneous firing rate for the neurons in our model. Our main analytical results (e.g., Eq. (2)) hold for an arbitrary choice of *ϕ*(*x*). Where specific choices are required in order to perform simulations, we used *ϕ*(*x*) = max(*x,* 0) for the results presented in Figs. 3 and 4 and *ϕ*(*x*) = exp(*x*) otherwise. The rectified linear choice is convenient for small networks, as high-order derivatives are zero, which eliminates corresponding high-order “loop corrections” to mean field theory [52]. The exponential function is the “canonical” choice of nonlinearity for the nonlinear Hawkes process [16–18, 20]. The exponential has particularly nice theoretical properties, but is also convenient for fitting the nonlinear Hawkes model to data, as the log-likelihood function of the model simplifies considerably and is convex (though some similar families of nonlinearities also yield convex log-likelihood functions).

An important property that both choices of nonlinearity possess is that they are unbounded. This property is necessary to *guarantee* that a neuron spikes given enough input. A bounded nonlinearity imposes a maximum firing rate, and neurons cannot be forced to spike reliably by providing a large bolus of input. The downside of an unbounded nonlinearity is that it is possible for the average firing rates to diverge, and the network never reaches a steady state. For example, in a purely excitatory network (all *𝒥*_{i,j}*≥* 0) with an exponential nonlinearity, neural firing will run away without a sufficiently strong self-refractory coupling to suppress the firing rate. This will not occur with a bounded nonlinearity, as excitation can only drive neurons to fire at some maximum but finite rate.
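This runaway can be seen in a toy one-neuron rate iteration (an illustrative sketch of our own; parameter values are arbitrary): with an exponential nonlinearity and self-excitation the rate diverges, while a bounded sigmoid settles below its ceiling:

```python
import math

def iterate_rate(phi, J, mu, steps=50):
    """Iterate the self-consistency v <- phi(mu + J*v) for a single self-excitatory unit."""
    v = 0.0
    for _ in range(steps):
        x = mu + J * v
        if x > 700.0:            # exp(x) would overflow a float: treat as runaway firing
            return float('inf')
        v = phi(x)
    return v

exp_rate = iterate_rate(math.exp, J=1.0, mu=0.5)                              # unbounded
sig_rate = iterate_rate(lambda x: 2.0 / (1.0 + math.exp(-x)), J=1.0, mu=0.5)  # bounded
```

The unbounded case blows up within a few iterations, whereas the sigmoid case converges to a rate strictly below its maximum of 2, mirroring the stability discussion above.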

This can be a problem in simulations of networks obeying Dale’s law. For unbounded nonlinearities, the mean field theory for the hidden network occasionally does not exist due to an imbalance of excitatory and inhibitory neurons caused by our random selection of recorded neurons. However, the Dale’s law network is stable if the nonlinearity is bounded. We demonstrate this below in Figs. 7 and 8, comparing simulations of the effective interaction statistics in Erdős-Rényi networks with and without Dale’s law for a sigmoidal nonlinearity *ϕ*(*x*) = 2/(1 + *e*^{-x}).

Another consequence of unbounded nonlinearities is that the mean firing rates either remain finite or diverge. Bounded nonlinearities, on the other hand, may allow for the possibility of a transition to chaotic dynamics in the mean-field firing rate dynamics (cf. the results of [70]).

### Specific choices of network properties used to generate figures

#### Feedforward-inhibitory circuit model details

##### 3 neuron circuit (Fig. 3)

Using our graphical rules (Fig. 2), we calculated the effective interaction from neuron 1 to 2 for the circuit shown in Fig. 3A, giving Eq. 3. In principle, our mean field approximation would not be expected to hold for such a small circuit; in particular, loop corrections [52] to our calculation of the rate *v*_{3} and associated gain *γ*_{3} might be significant. However, as loop corrections depend on derivatives of the nonlinearity *ϕ*(*x*), we can minimize these errors by choosing *ϕ*(*x*) = max(*x,* 0), for which *ϕ′*(*x*) = Θ (*x*), the Heaviside step function. Accordingly, we can solve for *v*_{3} = *λ*_{0}*μ*_{3}/(1 *- λ*_{0}*𝒥*_{33}) and *γ*_{3} = *λ*_{0} for this particular network.
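The fixed point can be verified numerically. The short sketch below iterates the rate equation for neuron 3 with the rectified-linear nonlinearity (parameter values are arbitrary, subject to 1 − *λ*_{0}*𝒥*_{33} > 0) and compares against the closed form *v*_{3} = *λ*_{0}*μ*_{3}/(1 − *λ*_{0}*𝒥*_{33}):

```python
# fixed-point check of v3 = lam0*mu3/(1 - lam0*J33) for phi(x) = max(x, 0)
lam0, mu3, J33 = 1.0, 0.5, 0.4        # arbitrary values with 1 - lam0*J33 > 0
v3 = 0.0
for _ in range(200):
    v3 = lam0 * max(mu3 + J33 * v3, 0.0)   # v3 <- lam0 * phi(mu3 + J33 * v3)
closed_form = lam0 * mu3 / (1.0 - lam0 * J33)
```

As long as the drive keeps the argument of max positive, the iteration is a linear contraction with ratio *λ*_{0}*𝒥*_{33}, so it converges geometrically to the closed-form rate.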

To generate the plots shown in Fig. 3C, we take the inter-neuron couplings to have the form and the self-history couplings to have the form *J*_{i,i}(*τ*) = *𝒥*_{i,i}*β*_{i,i}*e*^{-βi,iτ}.

Using Mathematica to perform the inverse Fourier transform, we obtain an explicit expression for the effective interaction,
In order for the inverse Fourier transform to converge and result in a causal function, we require that 1 - *λ*_{0}𝒥_{33} *>* 0.

Parameter values used to generate the plots in Fig. 3C are given in Table 2.

##### 4 neuron circuit (Fig. 4)

As for the 3-neuron circuit, we can use our graphical rules (Fig. 2) to calculate the effective interaction for our 4-neuron circuit (Fig. 4A) in the frequency domain: in going to the last equality we have separated the terms out into contributions from each of the paths, in order, shown in Fig. 4B.

To generate the plots in Fig. 4C, we choose *ϕ*(*x*) = max(*x,* 0), which gives *γ*_{i} = *λ*_{0}, as in Fig. 3C, and interaction filters for the direct interaction and *J*_{i,j}(*τ*) = 𝒥_{ij}*α*^{2}*τ e*^{-ατ} for all other interactions shown—i.e., all other interactions have the same decay time *α*^{−1} for simplicity.

Inverting the Fourier transform using Mathematica yields
In order for this result to converge, we require *|𝒥*_{34}*||𝒥*_{43}*| <* 1. Splitting this result up into the contributions to each plot in Fig. 4C, using the specific parameter choices *λ*_{0} = 1 and *𝒥*_{34} = *𝒥*_{43} *≡ 𝒥*, gives
Parameter values used to generate the plots in Fig. 4C are given in Table 3.

##### Large networks

To generate the results in Fig. 6 in the main text, we choose the coupling filters to be *J*_{i,j}(*t*) = 𝒥_{i,j}*α*^{2}*te*^{-αt}, for *i ≠ j*, which has Fourier transform
using the Fourier convention
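The elided transform and convention presumably take the following form (a sketch, inferred from the causality requirement stated later in this document that poles of the effective filters must lie in the upper-half *ω* plane):

```latex
\hat{J}_{i,j}(\omega) = \frac{\mathcal{J}_{i,j}\,\alpha^{2}}{(\alpha + i\omega)^{2}},
\qquad
\hat{f}(\omega) \equiv \int_{-\infty}^{\infty} dt\; e^{-i\omega t}\, f(t).
```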
The weight matrix 𝒥 is generated as described in “Model network architectures,” choosing *J*_{0} = 1.0. We partition this network into recorded and hidden subsets. For a network of *N* neurons, we choose neurons 1 to *N*_{rec} to be recorded and the remainder to be hidden; hence we define (using 1-based indexing; subtract 1 from each index for 0-based counting)
and

We numerically calculate the linear response matrix by evaluating
where and diag is an *N*_{hid} *× N*_{hid} diagonal matrix with elements *γ*_{h}.

The effective coupling filter in the frequency domain can then be evaluated pointwise at a desired set of frequencies *ω* by matrix multiplication,
We then return to the time domain by inverse Fourier transforming the result, achieved by treating **Ĵ**^{eff}(*ω*) as an *N*_{rec} *× N*_{rec} *× N*_{freq} array (where *N*_{freq} is the number of frequencies at which we evaluate the effective coupling) and multiplying along the frequency dimension by an *N*_{freq} *× N*_{time} matrix **E** with elements *E*_{ω,t} = exp(*iωt*)Δ*ω*/(2*π*), for *N*_{time} sufficiently small time bins of size Δ*t* = 0.1*/α*, with *α* = 10, as listed in Table 4.
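As a minimal numerical sketch of this pipeline (all function and variable names here are our own; we assume the standard hidden-path formula Ĵ^eff(ω) = Ĵ_rr(ω) + Ĵ_rh(ω) Γ̂(ω) Ĵ_hr(ω) with Γ̂(ω) = [I − diag(γ_h) Ĵ_hh(ω)]⁻¹ diag(γ_h), all filters sharing one alpha-function time course):

```python
import numpy as np

def alpha_filter_ft(omega, alpha=10.0):
    # Fourier transform of alpha^2 * t * exp(-alpha*t) * Theta(t),
    # under the (assumed) convention f_hat(w) = int dt exp(-i w t) f(t)
    return alpha**2 / (alpha + 1j * omega) ** 2

def effective_coupling_freq(J, gamma_h, n_rec, omega, alpha=10.0):
    # Sketch of the assumed formulas:
    #   Gamma(w) = [I - diag(gamma_h) J_hh(w)]^{-1} diag(gamma_h)
    #   J_eff(w) = J_rr(w) + J_rh(w) @ Gamma(w) @ J_hr(w)
    n = J.shape[0]
    rec, hid = slice(0, n_rec), slice(n_rec, n)
    Jw = J * alpha_filter_ft(omega, alpha)   # all filters share one decay time
    G = np.diag(gamma_h)
    Gamma = np.linalg.solve(np.eye(n - n_rec) - G @ Jw[hid, hid], G)
    return Jw[rec, rec] + Jw[rec, hid] @ Gamma @ Jw[hid, rec]

# Return to the time domain by summing over a frequency grid:
#   J_eff(t) ~ sum_w exp(i w t) J_eff(w) dw / (2 pi)
omegas = np.linspace(-200.0, 200.0, 2001)
times = np.arange(0.0, 1.0, 0.01)
E = np.exp(1j * np.outer(omegas, times)) * (omegas[1] - omegas[0]) / (2 * np.pi)
```

Multiplying the *N*_{rec} × *N*_{rec} × *N*_{freq} array of `effective_coupling_freq` values along its frequency axis by `E` then yields the time-domain filters.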

To generate Fig. 5, we focus on the zero-frequency component of *Ĵ*^{eff} (*ω*), which is also equal to the time integral of **J**^{eff} (*t*). As in the main text, we label the elements of this component , which is equal to
We do not need to simulate the full network to study the statistics of . We only need to generate samples of the matrix 𝒥 and evaluate . This is where the choice of an Erdős–Rényi network that is not restricted to obey Dale’s law becomes convenient. Because the weights 𝒥_{i,j} are *i.i.d.* and the sign of each weight is random, population averages are equivalent to expected values; i.e., the sample mean
and sample variance
will tend to the expected values and for large networks. We have explicitly removed the diagonal elements from these averages because these elements will have slightly different statistics from the off-diagonal elements due to the fact that all ground-truth self-couplings are set to zero, 𝒥_{r,r} = 0. This allows us to compare the population variance, plotted in Fig. 5 (after normalization by the population variance of the true off-diagonal weights), to the expected variance calculated analytically below.

The error bars in Fig. 5 are generated by first drawing a single sample of true weights 𝒥, and then taking 100 random subsets of *N*_{rec} *∈* {10, 110, 210, 310, 410, 510, 610, 710, 810, 910, 999} recorded neurons. For this analysis, random subsets were generated by permuting the indices of the full weight matrix 𝒥 and taking the last *N*_{rec} neurons to be recorded. For each random subset of the network we calculate the population statistics. The standard error of, for example, the population variance across subsets gives an estimate of the error. However, if we only use a single sample of the network architecture and weights 𝒥_{i,j}, this estimate may depend on the particular instantiation of the network. To average over the effects of global network architecture, we draw a total of 10 network architecture samples, and average a second time over these samples to obtain our final estimates of the population variance of . We note that for an Erdős–Rényi network with mixed synapses, this second stage of averaging is in principle unnecessary: for a large enough network, random subsets of a single large network are statistically identical to random subsets drawn from several samples of full Erdős–Rényi networks. However, this does not hold for networks with more structure, such as the Watts–Strogatz or Dale’s law networks we also considered, for which the second stage of averaging over the global network architecture is necessary.
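The subset-resampling procedure can be sketched as follows (a hypothetical implementation with our own helper names; we assume a zero-frequency effective coupling of the form J_rr + J_rh [I − γ J_hh]⁻¹ γ J_hr with a uniform gain γ):

```python
import numpy as np

rng = np.random.default_rng(0)

def er_weights(n, p=0.2, j0=1.0, a=0.5):
    # Erdos-Renyi weights: nonzero with probability p, drawn from
    # N(0, j0^2 / (p * n^{2a})) so that var over all pairs is j0^2 / n^{2a};
    # no self-coupling. (Assumed setup; a = 1/2 is strong coupling.)
    mask = rng.random((n, n)) < p
    J = mask * rng.normal(0.0, j0 / np.sqrt(p) / n**a, size=(n, n))
    np.fill_diagonal(J, 0.0)
    return J

def zero_freq_effective(J, rec_idx, gamma=0.1):
    # Assumed formula: J_eff(0) = J_rr + J_rh [I - gamma J_hh]^{-1} gamma J_hr
    hid_idx = np.setdiff1d(np.arange(J.shape[0]), rec_idx)
    Jhh = J[np.ix_(hid_idx, hid_idx)]
    Gamma = gamma * np.linalg.inv(np.eye(len(hid_idx)) - gamma * Jhh)
    return (J[np.ix_(rec_idx, rec_idx)]
            + J[np.ix_(rec_idx, hid_idx)] @ Gamma @ J[np.ix_(hid_idx, rec_idx)])

def offdiag_var(M):
    # population variance over off-diagonal elements only
    return M[~np.eye(M.shape[0], dtype=bool)].var()
```

Looping `zero_freq_effective` over permuted recorded subsets, and over several draws of `er_weights`, gives the variance ratios and error bars described above.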

### Series approximation for the mean field firing rates for the case of exponential nonlinearity *ϕ*(*x*) = *e*^{x}

The mean field firing rates for the hidden neurons are given by
where we focus specifically on the case of exponential nonlinearity *ϕ*(*x*) = exp(*x*). For this choice of nonlinearity, *γ*_{h} = *v*_{h}, so we do not need to calculate a separate series for the gains.

This system of transcendental equations generally cannot be solved analytically. However, for exp(*μ*_{h}) *≪* 1 we can derive, recursively, a series expansion for the firing rates. We first consider the case *μ*_{h} = *μ*_{0} for all hidden neurons *h*. Let *ϵ* = exp(*μ*_{0}). We may then write

Plugging this into the mean field equation,

Thus, matching powers of *λ*_{0}*ϵ* on the left and right hand sides, we find and
for *ℓ >* 0.

For *ℓ* = 1, the sum in *m* truncates at *m* = 1 (as is zero for *m > ℓ*, as all indices are positive). Thus,
With this we have calculated the firing rates to *𝒪*(*ϵ*^{4}).
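The series can be checked against a direct fixed-point solution of the mean-field equation. A sketch, assuming the equation takes the form v_h = λ₀ exp(μ_h + Σ_h′ 𝒥_{h,h′} v_h′) for the exponential nonlinearity (helper names are ours):

```python
import numpy as np

def mean_field_rates(J, mu, lam0=1.0, tol=1e-12, max_iter=10000):
    # Fixed-point iteration for the (assumed) mean-field equation
    #   v_h = lam0 * exp(mu_h + sum_h' J[h, h'] v_h'),
    # convergent when exp(mu) << 1 and coupling is not too strong.
    v = lam0 * np.exp(mu)                # zeroth-order seed, v ~ lam0 * eps
    for _ in range(max_iter):
        v_new = lam0 * np.exp(mu + J @ v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("fixed point did not converge")
```

Comparing this numerical fixed point against the truncated series gives a direct check of the 𝒪(ϵ⁴) truncation.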

The analysis can be straightforwardly extended to the case of heterogeneous *μ*_{h}, though it becomes more tedious to compute terms in the (now multivariate) series. Assuming *ϵ*_{h} *≡* exp(*μ*_{h}) *≪* 1 for all *h*, to *𝒪*(*ϵ*^{3}) we find

### Variance of the effective coupling to second order in *N*_{rec}*/N* & fourth order in *λ*_{0}*J*_{0}*e*^{μ0} (exponential nonlinearity)

To estimate the strength of the hidden paths, we would like to calculate the variance of the effective coupling and compare it to the variance of the direct couplings *𝒥*_{r,r′}, where and , as in the main text.

We assume that the synaptic weights *𝒥*_{i,j} are independently and identically distributed with zero mean and variance for *i ≠ j*, where *a* = 1 corresponds to weak coupling and *a* = 1/2 corresponds to strong coupling. We assume no self-couplings, *𝒥*_{i,i} = 0 for all neurons *i*. The overall factor of *p* in var[*𝒥*] comes from the sparsity of the network. For example, for normally distributed non-zero weights with variance , the probability density of each connection in the network is

Because the *𝒥*_{i,j} are *i.i.d.*, the mean of :
where we used the fact that depends only on the hidden neuron couplings *𝒥*_{h,h′}, which are independent of the couplings to the recorded neurons, *𝒥*_{r,h} and *𝒥*_{h′,r′}. This holds for any pair of neurons (*r, r′*), including *r* = *r′*, because of the assumption of no self-coupling.

The variance of is thus equal to the mean of its square, for *r ≠ r′*,
In this derivation, we used the fact that , which holds because the synaptic weights are uncorrelated. We now need to compute . This is intractable in general, so we will resort to calculating it in a series expansion in powers of *ϵ ≡* exp(*μ*_{0}) for the exponential nonlinearity model. Our result will also turn out to be an expansion in powers of *J*_{0} and 1 *− f ≡ N*_{hid}*/N*.

The lowest order approximation is obtained by approximating *v*_{h} *≈ λ*_{0}*ϵ* and Γ_{h,h′} *≈ v*_{h}*δ*_{h,h′}, yielding
This result varies linearly with *f*, while numerical evaluation of the variance shows obvious curvature for *f ≪* 1 and *J*_{0} ≲ 1, so we need to go to higher order. This becomes tedious very quickly, so we will only work to *𝒪*(*ϵ*^{4}) (it turns out that the *𝒪*(*ϵ*^{3}) corrections vanish).

We calculate using a recursive strategy, though we could also use the path-length series expression for , keeping terms up to fourth order in *ϵ*. We begin with the expression
and plug it into itself until we obtain an expression to a desired order in *ϵ*. In doing so, we note that *v*_{h} *∼ 𝒪*(*ϵ*), so we will first work to fourth order in *v*_{h}, and then plug in the series for *v*_{h} in powers of *ϵ*.

We begin with
The third order term vanished because we assume no self-couplings. We have obtained to fourth order in *v*_{h}; now we need to plug in the series expression for *v*_{h} to obtain the series in powers of *λ*_{0}*ϵ*. We will do this order by order in *v*_{h}. The easiest terms are the fourth order terms, as

The second order term is where . We need the average . The third-order term will vanish upon averaging, and using the fact that synaptic weights are independent (giving the factor) and self-couplings are zero (giving the factor). We thus obtain

The first fourth order term in , will vanish upon averaging because matching indices requires *h′′* = *h* = *h′* and we assume no self-couplings. The second fourth order term is , which averages to var[*𝒥*](1 *−* *δ*_{h,h′}), where the factor of (1 *−* *δ*_{h,h′}) again accounts for the fact that this term does not contribute when *h* = *h′* due to no self-couplings. We thus arrive at
Putting everything together,
For weak coupling, this tends to 1 in the *N ≫* 1 limit, as , for fixed fraction of observed neurons *f* = *N*_{rec}*/N*. For strong coupling, , which is constant as *N → ∞*, and hence
where we have used little-*o* notation to denote that there are higher order terms dominated by (*λ*_{0}*J*_{0}*ϵ*)^{4}(1 *−* *f*)^{2}. With this expression, we have improved on our approximation of the relative variance of the effective coupling to the true coupling; however, the neglected higher order terms still become significant as *f →* 0 and *J*_{0} *→* 1, indicating that hidden paths have a significant impact when synaptic strengths are moderately strong and only a small fraction of the neurons has been observed.

Because the synaptic weights *𝒥*_{i,j} are independent, we may rewrite Eq. (8) as
or, in terms of the ratio of standard deviations,
where we used the approximation for *x* small.

In the main text, we plotted results for *N* = 1000 total neurons (Fig. 5A). For strongly coupled networks, the results should depend only on the fraction of observed neurons, *f* = *N*_{rec}*/N*, while for weak coupling the results do depend on the absolute number *N*. To demonstrate this, in Fig. 9 we remake Fig. 5 for *N* = 100 neurons. We see that the strongly coupled results are not significantly altered, whereas the weakly coupled results show stronger deviations, as for weak coupling the deviations decrease with increasing *N*.

## Supporting Information

### Mapping a leaky integrate-and-fire network with stochastic spiking to the non-linear Hawkes model

As claimed in the Methods, we now show explicitly how to map a current-based leaky integrate-and-fire network model with stochastic spiking rules onto the nonlinear Hawkes model we use in this work. Suppose each neuron’s membrane potential obeys the differential equation
where *τ*_{m} is the membrane time constant of the neuron, *ε*_{L} is its reversal potential, is an external current input (converted to a voltage by multiplying by the membrane resistance), and
are the synaptic currents flowing into neuron *i* from presynaptic neurons *j*, where is the spike train from presynaptic neuron *j* at time *t′* and is a spike filter. For notational convenience we also include the self-history coupling in this term, though it has a physiologically different origin, representing refractory effects that reset a neuron’s voltage after it spikes, rather than having a hard reset. Similarly, rather than having a hard firing threshold, we assume that neurons spike stochastically with an instantaneous rate
where *λ*_{0} sets a baseline firing rate, *ϕ*(*·*) *≥* 0 is a nonlinear function of the membrane voltage, *ε*_{th} is a “soft” threshold value, *ε*_{s} sets the steepness of the nonlinearity, and *V*_{i}(*t*) is the membrane voltage given in Eq. (9). We term this the instantaneous firing rate because it is equal to the trial-averaged spike trains , conditioned on the inputs to the neuron. The value *ε*_{th} is a soft threshold because, while the neuron is likely to fire when *V*_{i}(*t*) reaches *ε*_{th}, it may fire at higher or lower voltages. In this work we assume that the probability that the number of spikes neuron *i* fires in a small time window Δ*t* around time *t*, given its input history, is
however, we could have chosen many other point processes with instantaneous rates *λ*_{i}(*t*). Note that while spiking is stochastic, a neuron is guaranteed to fire at times when its instantaneous rate *λ*_{i}(*t*) diverges, so there is some sense of deterministic output retained.

We can formally solve the membrane equation (9), giving

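The elided formal solution presumably takes the standard convolution form for a current-based leaky integrator (a sketch; the symbols 𝓘ᵉˣᵗ for the voltage-converted external input, g_{ij} for the spike filter, and ṅ_j for the spike trains are our stand-ins for the elided notation):

```latex
V_i(t) = \varepsilon_L + \int_{-\infty}^{t} \frac{dt'}{\tau_m}\, e^{-(t-t')/\tau_m}
\left[ \mathcal{I}^{\mathrm{ext}}_i(t') + \sum_j \int dt''\, g_{ij}(t' - t'')\, \dot{n}_j(t'') \right].
```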
We now define
and
we arrive at this last definition by changing integration order
and then changing variables to *y* = *t - t′′*. With these definitions,
i.e., we have shown that the soft-threshold leaky integrate-and-fire model is equivalent to a nonlinear Hawkes model, Eq. (5). Because the argument of the rate is now expressed entirely in terms of the spiking of the neurons, and not the membrane voltage, we need only simulate the spiking activity of the network; i.e., we do not need to keep track of the membrane voltages and can simply use Eq. (5).

Lastly, we note that membrane potential dynamics are more appropriately described by changes to a neuron’s membrane conductance, rather than current inputs [54]. If we insert conductance-based synaptic inputs, such as
into Eq. (9), the voltage equation is still formally solvable, but the rates *λ*_{i}(*t*) will no longer be of the form of Eq. (5), except in approximate limits or if special conditions are met [69]. We leave a more detailed investigation of conductance-based models—including those with nonlinear voltage dependence—for future work.

### Complete derivation of the contribution of self-cycles to nodes in Fig. 2

In the Methods section of the full text, we heuristically argued that loops from a neuron back to itself in the series expansion of , could be explicitly summed into a factor contributed by each node *h*. This factor can be derived directly; we do so here.

Let us decompose the matrix into a diagonal piece and an off-diagonal piece, . Then,

We assumed that is invertible, which requires that there is no element for which for all *ω*. Assuming this is the case, the inverse of the matrix is trivial to calculate, as it is diagonal:
The matrix can be expressed as a series, as before:
Hence, inserting the contribution from the factor that we pulled out, and the factor *γ*_{h′} that left-multiplies to give , we have
This is the same as our previous expression, with and the sum over hidden units restricted such that self-loops are removed (*h*_{i} *≠ h*_{i+1}), proving the result described informally above. We note again that this puts restrictions on the allowed size of self-interactions: the zeros of must lie in the upper-half plane of the complex *ω* plane in order for the filters to be causal and physically meaningful, given our Fourier sign convention.

The complete expression for the correction term is thus

This is the exact mathematical expression underlying the graphical rules given in Fig. 2.
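Based on the description above, the summed expression presumably takes the form (a sketch; the dressed-gain notation γ̃_h is ours, and the constraint h_i ≠ h_{i+1} removes self-loops):

```latex
\hat{J}^{\mathrm{eff}}_{r,r'}(\omega) = \hat{J}_{r,r'}(\omega)
+ \sum_{\ell=1}^{\infty} \sum_{\substack{h_1,\dots,h_\ell \\ h_i \neq h_{i+1}}}
\hat{J}_{r,h_1}(\omega)\,\tilde{\gamma}_{h_1}(\omega)\,\hat{J}_{h_1,h_2}(\omega)\cdots
\tilde{\gamma}_{h_\ell}(\omega)\,\hat{J}_{h_\ell,r'}(\omega),
\qquad
\tilde{\gamma}_h(\omega) \equiv \frac{\gamma_h}{1 - \gamma_h \hat{J}_{h,h}(\omega)}.
```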

### Second order nonlinear response function

Higher order terms in the series expansion represent nonlinear response functions. We do not focus on these terms in this work, assuming instead that we can truncate this series expansion at linear order. We will, however, estimate the error incurred by this truncation by calculating the second order response function, which we label . Rather than differentiate our formal solution for the linear response, we differentiate the implicit form, yielding an integral equation
where we have defined
*γ*_{h} without the superscript is the gain defined previously, *γ*_{h} = *λ*_{0}*ϕ′*(*μ*_{h} +∑_{h′} 𝒥_{h,h′} *v*_{h′}). Rearranging,
Inverting the operator on the left hand side yields the input linear response function (upon introducing the factor of 1 = *γ*_{h′}*/γ*_{h′} on the right hand side), giving the solution
Because Γ_{h,h′}(*t − t′*) is proportional to *γ*_{h}, the second order nonlinear response function is proportional to . For an exponential nonlinearity, , and the second order response function is of the same order as the linear response (but the overall contribution to network statistics is not of the same order; see below). For a rectified linear nonlinearity (as in Figs. 3 and 4), , and the second-order response vanishes.

The effective quadratic interaction from two recorded neurons and to neuron *r* is thus
where we have defined the quadratic spike interaction to be

### Estimating the error from neglecting higher order spike filtering (exponential nonlinearity)

In the main text we calculate corrections to the baseline and linear spike filters, neglecting higher-order spike filtering and fluctuations around the mean input to the recorded neurons. In the Methods we validated this result numerically; here we derive an analytic estimate of the order of the error we incur by neglecting these terms. We will do so within mean field theory (where the noise fluctuations contribute zero error, as they do not enter the mean field approximation). Specifically, we will assume that the quadratic spike filtering term is small, and calculate the corresponding correction to our mean field approximation of the firing rates when this term is completely neglected. If we take as our approximation of the recorded neurons’ instantaneous firing rates
then the mean field approximation of the firing rates is
where we have used the fact that the average firing rates are independent of time, and replaced and with their time integrals, denoted by and . The parameter *b* is just a book-keeping parameter.

To calculate the lowest order correction to the linear filtering approximation (*b →* 0), we write , treating *b* formally as a small parameter. The linear firing rate is given by
For the quadratically-modified firing rates, keeping terms only to linear order in *b*,
Collecting powers of *b* and rearranging,
Because , the expansion parameters we have been using, the lowest order approximation for is
To evaluate the coefficient , we may use the fact that *γ*_{h} = *v*_{h} and, to leading order, *v*_{h} *∼ λ*_{0}*ϵ* and , giving
and hence
To lowest order the error term is
For *𝒥*_{i,j} *i.i.d.*, the population average should converge to the expected value, which is zero because the *𝒥*_{i,j} have mean zero. We can calculate the root-mean-squared error (RMSE) by looking at the variance:
In principle, we should take care to separate out the *r*_{1} *≠ r*_{2} and *r*_{1} = *r*_{2} terms from the sum, as the latter will contribute a factor , which we have not specified yet (though one could calculate it for specific choices, such as the normal distribution that we use for most of our numerical analyses). However, both and will scale as , so we will neglect constant factors and simply use this scaling to arrive at the result
If we take *N → ∞* with *N*_{rec} = *f N* and *N*_{hid} = (1 *- f*)*N* for *f* fixed, the RMSE scales as
For *a* = 1 (weak coupling), the error falls off quite quickly, as *N*^{−3/2}, while it is independent of *N* for *a* = 1/2 (strong coupling). However, the error does still scale with the fraction of observed neurons, as . This demonstrates that the typical error incurred by neglecting the nonlinear filtering is small both when most neurons have been observed (*f* ≲ 1) and when very few neurons have been observed (*f ≳* 0). While it may at first seem surprising that the error is small when very few neurons have been observed, the result does make intuitive sense: when a very small fraction of the network is observed, we can treat the unobserved portion of the network as a “reservoir” or “bath.” Feedback from the observed neurons into the reservoir has a comparatively small effect, so we can get away with neglecting feedback between the observed and unobserved partitions of the network. However, when the number of observed neurons is comparable to the number of unobserved neurons, neither partition can be treated as a reservoir, and feedback between the two is substantial. Neglecting the higher order spike filter terms may be inaccurate in this case.

### Tree-level calculation of the effective noise correlations

In our approximation of the model for the recorded neurons, we also neglected fluctuations of the hidden neuron input around its mean. We should therefore check how strong this noise is. At the level of a mean-field approximation it may be neglected, so we will need to go to a tree-level approximation to calculate it. (The means and response functions are not modified at tree level.)

The noise is defined by
It has zero mean (by construction), conditioned on the activity of the recorded units — i.e., the “noise” receives feedback from the recorded neurons. We can evaluate the cross-correlation function of this noise, conditioned on the recorded unit activity. This is given by
where
is the cross-correlation function of the spikes (the superscript *c* denotes ‘cumulant’ or ‘connected’ correlation function to distinguish it from the moments without the superscript). At the level of mean field theory
and thus the cross-correlation function is zero. We can go beyond mean field theory and calculate the tree-level contribution to the correlation functions using the field theory diagrammatic techniques of [52]. We will do so for the reference state of zero recorded-unit activity, , as we expect this to be the leading order contribution to the correlation function. As we are interested primarily in the typical magnitude of the noise compared to the interaction terms, we will work only to leading order in *ϵ* = exp(*μ*_{0}) for the exponential nonlinearity network. We find
where Δ_{h,h′}(*ω*) *≈ δ*_{h,h′} + *𝒪*(*ϵ*) is the linear response to perturbations of the *output* of a neuron’s rate. It is related to Γ_{h,h′}(*ω*) by Γ_{h,h′}(*ω*) = Δ_{h,h′}(*ω*)*γ*_{h′}, where *γ*_{h} = *v*_{h} for *ϕ*(*x*) = *e*^{x}. The overall noise cross-correlation function is then approximately
If *r ≠ r′*, the expected noise cross-correlation, averaged over the synaptic weights *J*_{i,j}, is zero. If *r* = *r′*, the expected value is non-zero. The expected noise auto-correlation function is then
For the specific case of *g*(*t*) = *α*^{2}*te*^{-αt}Θ(*t*), we have
For weak coupling (*a* = 1), the expected autocorrelation function falls off with network size as 1*/N*, while for strong coupling (*a* = 1/2), it scales with the fraction of observed neurons *f* but is independent of the absolute network size. The overall *λ*_{0}*ϵ* scaling puts the magnitude of the autocorrelation function on par with contributions from hidden paths through a single hidden neuron, each of which contributes a factor of *λ*_{0}*ϵ* to the correction to the coupling filters. Based on our results shown in Fig. 5, which suggest that contributions from long paths through hidden neurons are significant when the fraction of observed neurons *f* is small and *J*_{0} ≲ 1, we expect that network noise will also be significant in these regimes. This will not modify the results presented in the main paper, however; it simply means that this noise should be retained in the rate of our approximate model,

### Validating the mean field approximation and linear conditional rate approximation via direct simulations of network activity (exponential nonlinearity)

The results presented in the main text are based on analytical calculations or numerical analyses using analytically derived formulas. For example, the statistics of , are calculated based on our expression , where can be calculated by solving the matrix equation . Generating these results does not require simulating the full network, so we check here that our approximations indeed agree with the results of full network simulations.

We check the validity of two main results: 1) that mean field theory is an accurate approximation for the parameters we consider, and 2) that our truncation of the conditional hidden firing rates at linear order in is valid.

We first discuss some basic details of the simulation. The simulation code we use is a modification of the code used in [52], written by Gabe Ocker; refer to this paper for full details of the simulation.

The main changes we made are considering exponential nonlinearities and synaptic weights drawn from normal or lognormal distributions.

As in [52] and the main text, we choose the coupling filters to follow an alpha function
The Heaviside step function Θ(*t*) enforces causality of the filter, using the convention Θ(0) = 0. All neurons have the same time constant 1*/α*.

To efficiently simulate this network the code computes the synaptic variable not by direct convolution but by solving the inhomogeneous system of differential equations, setting *x*(*t*) = *s*(*t*) and ,
The instantaneous firing rates of the neurons can in this way be computed quickly in time steps of a specified size Δ*t*. The number of spikes *n*_{i} that neuron *i* fires in the *t*^{th} time bin is drawn from a Poisson distribution with mean . An initial transient period of spiking, before the network achieves a steady state, is discarded.
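The simulation loop can be sketched as follows (our own minimal reimplementation, not the actual code of [52]; we assume an exponential nonlinearity and realize the alpha-function synapse α²te^{−αt} as a pair of coupled exponential ODEs, so no explicit convolution is needed):

```python
import numpy as np

def simulate(J, mu, lam0=1.0, alpha=10.0, dt=0.01, steps=200000, seed=0):
    # Sketch of the simulation loop described above (assumed details):
    # Poisson spike counts per bin with rate lam0 * exp(mu + J @ s),
    # where s is the alpha-function-filtered spike train, obtained from
    #   ds/dt = -alpha*s + x,   dx/dt = -alpha*x + alpha^2 * spike train,
    # whose impulse response is alpha^2 * t * exp(-alpha*t).
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    s = np.zeros(n)          # synaptic variables (filtered spike trains)
    x = np.zeros(n)          # auxiliary variable of the two-ODE system
    counts = np.zeros(n)
    for _ in range(steps):
        rate = lam0 * np.exp(mu + J @ s)        # exponential nonlinearity
        spikes = rng.poisson(rate * dt)          # spikes per bin
        counts += spikes
        s += dt * (-alpha * s + x)               # forward-Euler updates
        x += -dt * alpha * x + alpha**2 * spikes  # each spike kicks x by alpha^2
    return counts / (steps * dt)                 # empirical firing rates
```

After discarding an initial transient, the returned empirical rates can be compared against the mean field rates as in Fig. 10.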

The parameters we use in our simulations of the full network are given in Table 4.

### Validating the mean field approximation

To confirm that the mean field approximation is valid, we compare the spike rates measured empirically from simulations of the network activity to the calculated mean field rates. The empirical rates are measured as
calculated after discarding the initial transient period of firing, for any neuron *i* (recorded or hidden).

The steady-state mean field firing rates are the solutions of the transcendental equation
The only difference between this equation and the equation for *v*_{h} is that the neuron indices are not restricted to hidden units; i.e., the *v*_{h} are the mean field rates for the hidden neurons *only* (recorded neurons removed entirely), whereas the are the mean field rates for the entire network. If the mean field approximation is valid, the empirical rates should be approximately equal to the mean field rates, so a scatter plot of ^{MFT} versus ^{emp} should roughly lie along the identity line. We test this for a network in the strong coupling limit for four values of *J*_{0}: 0.25, 0.5, 0.75, and 1.0. We expect *J*_{0} = 1.0 to be close to the stability threshold of the model based on a linearized analysis [73, 74]; i.e., for *J*_{0} ≳ 1.0 there may not be a steady state, so this may be where we expect the mean field approximation to break down. As seen in Fig. 10, the mean field approximation appears to hold well even up to *J*_{0} = 1.0, though there are some slight deviations for neurons with large rates.

### Verifying the linearized conditional mean approximation

Having verified that the mean field approximation is valid, we now seek to check our linearized approximation of the firing rates of the hidden neurons *conditioned on the activity of the recorded neurons*, . That is, we calculated above that
the *…* correspond to higher order spike filtering terms that we have neglected in our analyses, assuming them to be small. In an earlier calculation above, we estimated that the error incurred by neglecting higher order spike filtering is of the order , but we would like to confirm the negligibility of the higher order coupling through simulations.

To do so, we compare the empirical firing rates of the designated “hidden” neurons obtained from simulations of the full network models with the linear-expansion approximation of the firing rates of the hidden neurons conditioned on the recorded neurons, averaged over recorded neuron activity to give , where, as usual, the zero-frequency component of the linear response function of the hidden neurons is calculated in the absence of recorded neurons.

If we make a scatter plot of this against the empirical estimates of the hidden neurons, , the data points will lie along the identity line if our approximation is valid. If the data deviates from the identity line, it indicates that the neglected higher-order filtering terms contribute substantially to the firing rates of the neurons. It is possible that the zeroth order rate approximation, *v*_{h}, would be sufficient to describe the data, so we compare the empirical rates to both *v*_{h} and .

As in the mean field approximation test, we focused on a strongly coupled network with *J*_{0} = 0.25, 0.5, 0.75, and 1.0. Above, we analytically estimated the error, predicting that it is small for both small and large fractions of recorded neurons and largest when *N*_{rec} *∼ N*_{hid}; we therefore check both *N*_{rec} = 100 neurons out of *N* = 1000 (*f* = 0.1) in Fig. 11 and *N*_{rec} = 500 neurons out of *N* = 1000 (*f* = 0.5) in Fig. 12.

For each value of *J*_{0}, we present two plots: the empirical rates versus the mean field rates *v*_{h} in the absence of recorded neurons (the zeroth order approximation; Figs. 11 and 12, top row), and the empirical rates versus the linear approximation (the first order approximation; Figs. 11 and 12, bottom row). We find that in both cases the data are centered around the identity line, but the spread grows with *J*_{0} for the zeroth order approximation, while it remains quite tight for the first order approximation up to *J*_{0} = 1.0, validating our neglect of the higher order spike filtering terms. We also confirm that *N*_{rec} = 500 yields worse agreement than *N*_{rec} = 100, though the agreement between and is still reasonable.

### Full mean-field reference state

For most of our analyses, we have been expanding the conditional firing rates of the hidden neurons around a reference state of zero activity of the recorded neurons. The quantities *v*_{h}, *γ*_{h}, , and so on, are thus calculated using a network in which the recorded neurons have been removed. We have demonstrated that this approximation is valid for the networks considered in this paper. However, this approximation may break down in networks in which the recorded neurons spike at high rates. In this case, we may need another reference state to expand the conditional rates around. A natural choice of reference state in this case would be the mean firing rates of the neurons. We will set up this expansion here.

The mean firing rates of the neurons are intractable to calculate exactly, so we will estimate them by the mean field rates, an approximation that we expect to be accurate in the high-firing rate regime.

The mean field equations for the full network are

We can then expand around , truncating at linear order to obtain
where is the input linear response of the *full network*, including the recorded neurons.

We can then approximate the instantaneous firing rates of the recorded neurons by
note that we introduced so that we could write the instantaneous firing rate not as a function of filtered spike trains but as a function of filtered deviations from the mean firing rate. Importantly, although it appears that only the baseline differs from the zero-activity reference case while the coupling is the same, the linear response function is *not* the same as in the zero-reference case, and hence the correction to the coupling is slightly different. The solutions look similar, but the linear response functions now incorporate the effects of the recorded units as well. In particular, satisfies the equation
where is the gain of neuron *i* accounting for the entire network,
Thus, in Fourier space
where is an *N × N* matrix – i.e., it contains the couplings and firing rates of *all* neurons, recorded and hidden. Hence, while this looks formally similar to the result we obtained in the zero activity state, the inclusion of recorded neurons modifies our rules for calculating contributions to the effective coupling filters. In particular, involves contributions from paths through both hidden and recorded neurons, unlike the zero-activity reference case, which involved contributions only from paths through hidden neurons. The reason for this, of course, is that the reference state depends on the entire network, not just the hidden neurons. The difference between the cases matters only at higher orders in our expansion — paths of length *𝓁* = 4 or greater. We can see this by writing out the first few terms in the path length expansion of the effective coupling,
for conciseness, we have assumed zero self-coupling (*Ĵ*_{i,i}(*ω*) = 0), but it can be restored by setting .

We see that the first few terms of the expansion are the same as the zero-activity reference case, with the exception that the are the gains for the entire network, not just the hidden network absent the recorded neurons. It is only the *𝓁* = 4 term at which contributions to the linear response functions involving paths through any neuron *j*, recorded or hidden, start to appear. Because we typically expect the amplitude of these terms to be small, we anticipate expanding around the mean field reference state will yield similar results to the expansion around the zero-activity reference state presented in the main paper.

## Acknowledgments

We thank Gabe Ocker for providing the nonlinear Hawkes network simulation code that we modified to perform the full network simulations in this work, Tyler Kekona for work on an early version of a related project, and Ben Lansdell and Christof Koch for helpful feedback. Support provided by the Sackler Scholar Program in Integrative Biophysics (BAWB), CRCNS grant DMS-1208027 (ESB, FR), NIH grant EY11850 and the Howard Hughes Medical Institute (http://www.hhmi.org) (FR). This work was partially based on work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216 (MAB). MAB and ESB wish to thank the Allen Institute for Brain Science founders, Paul G. Allen and Jody Allen, for their vision, encouragement, and support.

## Footnotes

\* bradenb{at}uw.edu