## Abstract

In many experiments, neuroscientists tightly control behavior, record many trials, and obtain trial-averaged firing rates from hundreds of neurons in circuits containing billions of behaviorally relevant neurons. Dimensionality reduction methods reveal a striking simplicity underlying such multi-neuronal data: they can be reduced to a low-dimensional space, and the resulting neural trajectories in this space yield a remarkably insightful dynamical portrait of circuit computation. This simplicity raises profound and timely conceptual questions. What are its origins and its implications for the complexity of neural dynamics? How would the situation change if we recorded more neurons? When, if at all, can we trust dynamical portraits obtained from measuring an infinitesimal fraction of task relevant neurons? We present a theory that answers these questions, and test it using physiological recordings from reaching monkeys. This theory reveals conceptual insights into how task complexity governs both neural dimensionality and accurate recovery of dynamic portraits, thereby providing quantitative guidelines for future large-scale experimental design.

## 1 Introduction

In this work, we aim to address a major conceptual elephant residing within almost all studies in modern systems neurophysiology. Namely, how can we record on the order of hundreds of neurons in regions deep within the brain, far from the sensory and motor peripheries, like mammalian hippocampus, or pre-frontal, parietal, or motor cortices, and obtain scientifically interpretable results that relate neural activity to behavior and cognition? Our apparent success at this endeavor seems absolutely remarkable, considering such circuits mediating complex sensory, motor and cognitive behaviors contain *O*(10^{6}) to *O*(10^{9}) neurons [Shepherd, 2004] - 4 to 7 orders of magnitude more than we currently record. Or alternatively, we could be completely misleading ourselves: perhaps we should not trust scientific conclusions drawn from statistical analyses of so few neurons, as such conclusions might become qualitatively different as we record more. Without an adequate theory of neural measurement, it is impossible to *quantitatively* adjudicate where systems neuroscience currently stands between these two extreme scenarios of success and failure.

One potential solution is an experimental one: simply wait until we can record more neurons. Indeed, exciting advances in recording technology over the last several decades have led to a type of Moore’s law in neuroscience: an exponential growth in the number of neurons we can simultaneously record, with a doubling rate of 7.4 years since the 1960s [Stevenson and Kording, 2011]. Important efforts like the BRAIN Initiative promise to ensure such growth in the future. However, if we simply extrapolate the doubling rate achieved over the last 50 years, we will require about 100 to 200 years to record 4 to 7 orders of magnitude more neurons. Thus, for the near future, it is highly likely that measurements of neural dynamics at single cell, single spike-time resolution in mammalian circuits controlling complex behaviors will remain in the highly sub-sampled measurement regime. Therefore we need a theory of neural measurement that addresses a fundamental question: how and when do statistical analyses applied to an infinitesimally small subset of neurons reflect the collective dynamics of the much larger, unobserved circuit they are embedded in?

Here we provide the beginnings of such a theory, one that is quantitatively powerful enough to (a) formulate this question with mathematical precision, (b) make well defined, testable predictions that guide the interpretation of past experiments, and (c) provide a theoretical framework to guide the design of future large scale recording experiments. We focus in this work on an extremely commonly used experimental design in which neuroscientists repeat a given behavioral or cognitive task over many trials, and record the trial averaged neural dynamics of many neurons. An advantage of this design, which has promoted its widespread usage, is that the neurons need not be simultaneously recorded. The resulting trial-averaged firing rate dynamics can be thought of as a collection of neural trajectories exploring a high dimensional neural space, with dimensionality equal to the number of recorded neurons (see e.g. Fig 1 below for a conceptual overview). They reflect a fundamental description of the state space dynamics of the neural circuit during cognition and behavior. Almost always, such trajectories are analyzed via dimensionality reduction (see [Cunningham and Yu, 2014] for a review), and almost ubiquitously, a large fraction of variance in these trajectories lives in a much lower dimensional space.

The resulting neural trajectories in the low dimensional space often provide a remarkably insightful dynamical portrait of circuit computation during the task in a way that is inaccessible through the analysis of individual neurons [Briggman et al., 2006]. For example, curvature in the geometry of these dynamical portraits recovered from macaque prefrontal cortex by [Mante et al., 2013] revealed a novel computational mechanism for contextually-dependent integration of sensory evidence. Similarly, dimensionality reduction by [Machens et al., 2010] uncovered dynamical portraits which revealed that macaque somatosensory cortices compute both stimulus frequency and time in a functionally but not anatomically separable manner in a tactile discrimination task. Dynamical portraits obtained by [Mazor and Laurent, 2005] revealed that neural transients in insect olfactory areas rapidly computed odor identity long before the circuit settled to a steady state. And an analysis of neural dynamics in macaque parietal cortex showed that the dynamical portraits were largely one-dimensional, revealing an emergent circuit mechanism for robust timing in attentional switching and decision making [Ganguli et al., 2008a]. Also, the low-dimensional activity patterns found in primary motor cortex place causal constraints on learning itself [Sadtler et al., 2014].

Given the importance of these dynamical portraits as a first window into circuit computation, it is important to ask whether we can trust them despite recording so few neurons. For example, would their geometry change if we recorded more neurons? How about their dimensionality? The ubiquitous low dimensionality of neural recordings suggests an underlying simplicity to neural dynamics; what is its origin? How does the number of neurons we need to record to accurately recover dynamical portraits scale with the complexity of the task, and with properties of the neural dynamics? Indeed, which minimal properties of neural dynamics are important to know in order to formulate and answer this last question?

Our theory provides a complete answer to these questions within the context of trial averaged experimental design. Central to our theory are two main conceptual advances. The first is the introduction of neural task complexity (NTC), a mathematically well defined quantity that takes into account both the complexity of the task, and the smoothness of neural trajectories across task parameters. Intuitively, the NTC measures the volume of the manifold of task parameters, in units of the length scales over which neural trajectories vary across task parameters, and it will be small if tasks are very simple and neural trajectories are very smooth. We prove that this measure upper bounds the dimensionality of neural state space dynamics. This theorem has important implications for systems neuroscience: it is likely that the ubiquitous low dimensionality of measured neural state space dynamics is due to a small NTC. In any such scenario, simply recording many more neurons than the NTC, while repeating the same simple task will not lead to richer, higher dimensional datasets; indeed data dimensionality will become independent of the number of recorded neurons. One would have to move to more complex tasks to obtain more complex, higher dimensional dynamical portraits of circuit computation.

The second conceptual advance is a novel theoretical link between the act of neural measurement and a technique for dimensionality reduction known as random projections. This link allows us to prove that, as long as neural trajectories are sufficiently randomly oriented in state space, we need only record a number of neurons proportional to the product of the number of task parameters and the *logarithm* of the NTC. This theorem again has significant implications for systems neuroscience. Indeed, it quantitatively adjudicates between the two extremes of success or failure raised above, fortunately, in the direction of success: it is highly likely that low dimensional dynamical portraits recovered from past experiments are reasonably accurate despite recording so few neurons, because those tasks were so simple, leading to a small NTC. Moreover, as we begin to move to more complex tasks, this theorem provides rigorous guidance for how many more neurons we will need to record in order to accurately recover the resulting more complex, higher dimensional dynamical portraits of circuit computation.

Below, we build up our theory step by step. We first review the process of recovering state space dynamical portraits through dimensionality reduction in neuroscience. We then introduce the notion of NTC, and illustrate how it provides an upper bound on neural dimensionality. Then we review the notion of random projections, and illustrate how the NTC of an experiment *also* determines how many neurons we must record to accurately obtain dynamical portraits. Along the way, we extract a series of experimentally testable predictions, which we confirm in neural recordings from the motor and premotor cortices of monkeys performing reaches to multiple directions. We end in the discussion with an intuitive summary of our theory and its implications for the future of large scale recordings in systems neuroscience.

## 2 Recovery of neural state space dynamics through dimensionality reduction

Imagine an experiment in which a neuroscientist records trial averaged patterns of neural activity from a set of *M* neurons across time. We denote by **x**_{i}(*t*) the trial averaged firing rate of neuron *i* at time *t*. These data are often visualized by superimposing the firing rates of each neuron across time (Fig. 1A). Alternatively, these data can be thought of as a neural trajectory in an *M* dimensional space (Fig. 1B). At each time, the measured state of the neural circuit consists of the instantaneous pattern of activity across *M* neurons, which corresponds to a point in *M* dimensional space, where each dimension, or axis in the space corresponds to the firing rate of one neuron. As time progresses, this point moves, tracing out the neural trajectory.

It is difficult to directly understand or visualize this trajectory, as it evolves in such a high-dimensional ambient space. Here dimensionality reduction is often used to simplify the picture. The main idea behind many linear dimensionality reduction methods is to decompose the entire set of dynamic neural activity patterns across neurons, unfolding over time, into a time dependent linear combination of a fixed set of static patterns across neurons (Fig. 1C). The hope is that the data can be dramatically simplified if a linear combination of a small number of static basis patterns are sufficient to account for a large fraction of variance in the data across time. When this is the case, the neural state space dynamics can unfold over time in a much lower dimensional pattern space, whose coordinates consist of how much each static pattern is present in the neural population (Fig. 1D).

Mathematically, the decomposition illustrated in Fig. 1C can be written as **x**_{i}(*t*) = Σ_{α=1}^{D} *c*_{α}(*t*) *u*_{i}^{α}, where each *M* dimensional vector **u**^{α} is a static basis pattern across neurons, each *c*_{α}(*t*) is the amplitude of that pattern across time (Fig. 1C), and *D* denotes the number of patterns or dimensions retained. Principal components analysis (PCA) is a simple way to obtain such a decomposition (see Supplementary Material for a review). PCA yields a sequence of basis patterns **u**^{α}, *α* = 1, …, *M*, each accounting for a different amount of variance *µ*_{α} in the neural population. The patterns can be ordered in decreasing amount of variance explained, so that *µ*_{1} ≥ *µ*_{2} ≥ … ≥ *µ*_{M}. By retaining the top *D* patterns, one achieves a fraction of variance explained given by χ_{D}^{2} = (Σ_{α=1}^{D} *µ*_{α}) / *µ*_{tot}, where *µ*_{tot} = Σ_{α=1}^{M} *µ*_{α} is the total variance in the neural population. Dimensionality reduction is considered successful if a small number of patterns *D*, relative to the number of recorded neurons *M*, accounts for a large fraction of variance explained in the neural state space dynamics.
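As a concrete, hypothetical illustration of this decomposition (our own sketch, not the paper's code or data), the following simulates trial-averaged rates that are mixtures of a few smooth latent time courses and recovers the variance-explained curve via PCA; all names and parameter values here are assumptions for illustration:

```python
# Sketch: M simulated "neurons" whose rates are random mixtures of
# D_true = 3 smooth temporal patterns; a few principal components then
# capture essentially all the variance.
import numpy as np

rng = np.random.default_rng(0)
M, T_bins, D_true = 50, 200, 3
t = np.linspace(0.0, 1.0, T_bins)

# Three smooth latent amplitudes c_alpha(t)
c = np.stack([np.sin(2 * np.pi * t), np.cos(4 * np.pi * t), t ** 2])
U = rng.standard_normal((M, D_true))      # static basis patterns u^alpha
X = U @ c                                 # rates: x_i(t) = sum_a u_i^a c_a(t)

Xc = X - X.mean(axis=1, keepdims=True)    # center each neuron across time
C = Xc @ Xc.T / T_bins                    # M x M covariance matrix
mu = np.sort(np.linalg.eigvalsh(C))[::-1] # PCA variances mu_1 >= ... >= mu_M

frac = mu.cumsum() / mu.sum()             # fraction of variance in top D PCs
print(round(frac[D_true - 1], 6))         # top 3 PCs explain all variance -> 1.0
```

Here the data are rank 3 by construction, so the variance-explained curve saturates at *D* = 3; real recordings saturate more gradually.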

How well does dimensionality reduction perform in practice in neurophysiology data? We have performed a meta-analysis (Fig. 2) of a diverse set of 20 experiments spanning a variety of model organisms (macaques, rodents, and insects), brain regions (hippocampal, prefrontal, parietal, somatosensory, motor, premotor, visual, olfactory and brainstem areas), and tasks (memory, decision making, sensory detection and motor control). This meta-analysis reveals that dimensionality reduction as a method for simplifying neural population recordings performs exceedingly well. Indeed it reflects one of the most salient aspects of systems neurophysiology to have emerged over the last couple of decades: namely that neuronal recordings are often far lower dimensional than the number of recorded neurons. Moreover, in each of these works, the associated low dimensional dynamical portraits provide insights into relations between neural population activity and behavior. Despite this almost ubiquitous simplicity found in neural population recordings, prior to this work, we are unaware of any simple, experimentally testable theory that can quantitatively explain the dimensionality and accuracy of these recovered dynamical portraits.

## 3 Neural Task Complexity and Dimensionality

We now begin to describe such a theory. Central to our theory is the notion of neural task complexity (NTC), which both upper bounds the dimensionality of state space dynamics and quantifies how many neurons are required to accurately recover these dynamics. Here we first consider the dimensionality of the dynamics, and later we consider their accuracy. To introduce the NTC intuitively, imagine how many dimensions a single neural trajectory could possibly explore. Consider, for concreteness, the trial averaged neural population activity while a monkey is performing a simple reach to a target (Fig. 3AB). This average reach lasts a finite amount of time *T*, which for example could be about 600 ms. The corresponding neural trajectory (Fig. 3C) can explore neural space for this much time, but it cannot change direction infinitely fast. The population response is limited by an autocorrelation time *τ* (see supplementary methods for details). Roughly, one has to wait an amount of time *τ* before the neural population’s activity pattern changes appreciably (Fig. 3B) and the neural trajectory can therefore bend to explore another dimension (Fig. 3C). This implies that the maximal number of dimensions the state space dynamics can possibly explore is proportional to *T/τ*. Of course the constant of proportionality is crucial, and our theory, applicable to the reaching data described below, computes this constant to be √(2/*π*) (see supplementary material for a proof and a definition of *τ*), yielding, for this experiment, an NTC_{time} = √(2/*π*) *T/τ*.
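The scaling of dimensionality with *T/τ* can be illustrated numerically. The sketch below (ours, not the paper's) models the population response as a Gaussian process with a squared-exponential autocorrelation of length scale *τ* — an assumed stand-in for the paper's precise definition of *τ* — and computes the participation-ratio dimensionality, which indeed stays below a modest constant times *T/τ*:

```python
# Sketch (assumed model): participation ratio of a smooth random
# trajectory with autocorrelation time tau, recorded for duration T.
import numpy as np

T, tau, n = 10.0, 1.0, 500
t = np.linspace(0.0, T, n)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / tau ** 2)  # temporal kernel

mu = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # PCA spectrum (up to discretization)
PR = mu.sum() ** 2 / (mu ** 2).sum()            # participation-ratio dimensionality

print(PR, T / tau)  # PR is of order T/tau, and below it
```

The participation ratio is scale invariant, so the discretization of the kernel does not affect the ratio; with these parameters PR comes out around 6 against *T/τ* = 10.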

Now most tasks have more than just time as a parameter. Consider a slightly more complex experiment in which a monkey reaches to 8 different targets (Fig. 3D). Now the manifold of trial averaged task parameters is a cylinder, parameterized by time *t* into the reach and reach angle *θ* (Fig. 3E). Since for each time *t* and angle *θ* there is a corresponding trial averaged neural activity pattern across neurons **x**_{i}(*θ, t*), the neural state space dynamics is fundamentally an embedding of this task manifold into neural space, yielding a curved, intrinsically two dimensional neural manifold that could potentially explore many more than two dimensions in firing rate space by curving in different ways (Fig. 3F). How many dimensions could it possibly explore? The same argument that we made for time into a reach at a fixed angle also holds for reaching across all angles at a fixed time into the reach. The total extent of angle is 2*π*. Moreover, the neural population response cannot vary infinitely fast across angle; it has a spatial autocorrelation length ∆. Intuitively, this means that the two patterns of activity **x**_{i}(*θ*_{1}*, t*) and **x**_{i}(*θ*_{2}*, t*) across neurons at two different reach angles *θ*_{1} and *θ*_{2} at the same time *t* will not be appreciably different unless |*θ*_{1} − *θ*_{2}| > ∆. Roughly, one can think of ∆ as the average width of single neuron tuning curves across reach angle.

Thus, just as in the argument for time, because the total angle to be explored is limited to 2*π*, and patterns are largely similar across angular separations less than ∆, the maximal number of dimensions a single circle around the neural manifold at any fixed time could explore is proportional to 2*π*/∆, where the proportionality constant is again √(2/*π*). Now intuitively, the number of dimensions the full neural manifold could explore across both time and angle would be maximized if these explorations were independent of each other. Then the maximal dimensionality would be the product of NTC_{time} = √(2/*π*) *T/τ* and NTC_{angle} = √(2/*π*) 2*π*/∆ (see supplementary material for a proof), yielding an NTC = (2/*π*)(*T/τ*)(2*π*/∆).

More generally, consider a task that has *K* task parameters indexed by *k* = 1, …, *K*, each of which varies over a range *L*_{k}, and for which neural activity patterns have a correlation length *λ*_{k}. Then the NTC is

NTC = *C* ∏_{k=1}^{K} *L*_{k}/*λ*_{k}.    (1)

For example, in the special case of reaches to all angles we have considered so far, we have *K* = 2, *L*_{1} = *T*, *λ*_{1} = *τ*, *L*_{2} = 2*π*, *λ*_{2} = ∆, and NTC = *C* (*T/τ*)(2*π*/∆). In the general case, our theory (see supplementary material) provides a precise method to define the autocorrelation lengths *λ*_{k}, in a manner consistent with the intuition that a correlation length measures how far one has to move along the behavioral manifold to obtain an appreciably different pattern of activity across neurons in the neural manifold (Fig. 3EF). Moreover, our theory determines the constant of proportionality *C*, and provides a proof that the neural dimensionality *D*, measured by the participation ratio of the PCA eigenvalue spectrum (see methods), is less than the minimum of the number of recorded neurons *M* and the NTC:

*D* ≤ min(*M*, NTC).    (2)
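Eq. (1) can be evaluated directly for the reach task. In the sketch below we assume a per-parameter constant of √(2/π), i.e. *C* = (2/π)^{K/2}; this value is our inference from the monkey H and G numbers reported in Section 4, not a constant stated in this excerpt, so treat it as an assumption:

```python
# Sketch: evaluating NTC = C * prod_k L_k / lambda_k for the
# eight-direction reach task, with C = (2/pi)^(K/2) assumed.
import math

def ntc(ranges_and_lengths):
    """NTC for task parameter ranges L_k and correlation lengths lambda_k."""
    K = len(ranges_and_lengths)
    prod = 1.0
    for L, lam in ranges_and_lengths:
        prod *= L / lam
    return (2.0 / math.pi) ** (K / 2.0) * prod

# Monkey H: T = 600 ms, tau = 126 ms, angular range 2*pi, Delta = 1.82 rad
print(round(ntc([(600.0, 126.0), (2 * math.pi, 1.82)]), 1))  # -> 10.5
# Monkey G: tau = 146 ms, Delta = 1.91 rad
print(round(ntc([(600.0, 146.0), (2 * math.pi, 1.91)]), 1))  # -> 8.6
```

Under this assumed constant, the formula reproduces the NTC values of 10.5 and 8.6 reported for the two monkeys in Section 4.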

Also, in the supplementary material, we consider a much simpler scenario in which there are a finite set of *P* neural activity patterns, for example in response to a finite set of *P* stimuli. There, the NTC is simply *P*, and we compute analytically how measured dimensionality *D* increases with number of recorded neurons *M*, and how it eventually asymptotes to the NTC if there are no further constraints on the neural representation. In the following, however, we focus on the much more interesting case of neural manifolds in (1).
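A quick simulation of this finite-pattern case (our own illustration, with arbitrary parameters) shows measured dimensionality rising with the number of recorded neurons *M* and saturating near the NTC = *P*:

```python
# Sketch: P random activity patterns across N total neurons; the
# participation-ratio dimensionality of M recorded neurons grows with M
# and asymptotes to P.
import numpy as np

rng = np.random.default_rng(1)
N, P = 5000, 10
patterns = rng.standard_normal((N, P))   # P patterns across N neurons

def pr_dim(X):
    """Participation ratio of the (uncentered) pattern covariance."""
    mu = np.clip(np.linalg.eigvalsh(X.T @ X), 0.0, None)
    return mu.sum() ** 2 / (mu ** 2).sum()

dims = {}
for M in (5, 20, 100, 2000):
    dims[M] = pr_dim(patterns[rng.choice(N, size=M, replace=False)])
print(dims)  # dimensionality rises with M and saturates near P = 10
```

Note that the participation ratio can never exceed the rank min(*M*, *P*), which is the discrete analogue of theorem (2).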

We note that the NTC in (1) takes into account two very distinct pieces of information. First, the numerator only knows about the task design; indeed it is simply the volume of the manifold of task parameters (e.g. Fig. 3E). The denominator on the other hand requires knowledge of the smoothness of the neural manifold; indeed it is the volume of an autocorrelation cell over which population neural activity does not change appreciably across *all K* task parameters. Thus the theorem (2) captures the simple intuition that neural manifolds of limited volume and curvature (e.g. Fig. 3CF) cannot explore that many dimensions (though they could definitely explore fewer than the NTC). However, as we see below the precise theorem goes far beyond this simple intuition, as it provides a quantitative framework to guide the interpretation of past experiments and design future ones.

## 4 A Dimensionality Frontier in Motor Cortical Data

To illustrate the interpretative power of the NTC, we re-examined the measured dimensionality of neural activity from the motor and premotor cortices of two monkeys, H and G, recorded in [Yu et al., 2007], during an eight-direction center-out reach task, as in Fig. 3D (see also, Methods). The dimensionality of the entire dataset, i.e. the number of dimensions explored by the neural manifold in Fig. 3F, was 7.1 for monkey H and 4.6 for monkey G. This number is far less than the number of recorded single units, which were 109 and 42 for monkeys H and G respectively. So a natural question is, how can we explain this order of magnitude discrepancy between the number of recorded units and the neural dimensionality, and would the dimensionality at least increase if we recorded more units? In essence, what is the origin of the simplicity implied by the low dimensionality of the neural recordings?

Here, our theorem (2) can provide conceptual guidance. As illustrated in Fig. 4A, our theorem implies that experiments in systems neurophysiology can live within three qualitatively distinct experimental regimes, each with its own unique predictions. In the first regime (i), the dimensionality *D* is close to the number of recorded neurons *M* but far less than the NTC. This scenario suggests one may not be recording enough neurons, and that the dimensionality and accuracy of dynamical portraits may increase when recording more neurons. In the second regime (ii), the dimensionality is close to the NTC but far below the number of neurons. This suggests that the task is very simple, and that the neural dynamics are very smooth. Recording more neurons would not lead to richer, higher dimensional trial averaged dynamics; the only way to obtain richer dynamics, at least as measured by dimensionality, is to move to a more complex task. Finally, and perhaps most interestingly, in the third regime (iii), dimensionality may be far less than both the NTC and the number of recorded neurons. Then, and only then, can one say that the dimensionality of neural state space dynamics is constrained by neural circuit properties above and beyond the constraints imposed by the simplicity of the task and the smoothness of the dynamics alone.

Returning to the motor cortical data, it is clear that scenario (i) is ruled out in this experiment. But without the definition and computation of the NTC, one cannot distinguish between scenarios (ii) and (iii). We computed the spatial and temporal autocorrelation lengths of neural activity across reach angle and time, and found them to be ∆ = 1.82 radians and *τ* = 126 ms in monkey H, and ∆ = 1.91 radians and *τ* = 146 ms in monkey G. Given that the average reach duration is *T* = 600 ms in both monkeys, the NTC is 10.5 for monkey H and 8.6 for monkey G. Comparing these NTCs to the dimensionalities *D* = 7.1 for monkey H and *D* = 4.6 for monkey G, and to the numbers of recorded neurons, *M* = 109 for monkey H and *M* = 42 for monkey G (see Fig. 4B), we see that this experiment is deep within experimental regime (ii) in Fig. 4A.

This deduction implies several striking predictions for motor cortical dynamics. First, assume the unrecorded neurons are statistically homogeneous with the recorded neurons, so that the population smoothness parameters *τ* and ∆ would not change much as we added more neurons to our measured population. Then if we were to record more neurons, even roughly all 500 million neurons in macaque motor cortex, the dimensionality of the neural manifold in each monkey would not exceed 10.5 and 8.6 respectively.

Equivalently, if we were to drop a significant fraction of neurons from the population, the dimensionality would remain essentially the same. In essence, dimensionality would be largely *independent* of the number of recorded neurons. The second prediction is that if we were to vary the NTC, by varying the task, then this would have a significant impact on dimensionality: it would be proportional to the NTC.

We confirm both of these predictions in Fig. 4CD. In the given dataset we cannot increase the NTC further, but we can reduce it by restricting our attention to subsets of reach extents and angles; in essence, we explore restricted one-dimensional slices of the full neural manifold in Fig. 3F as follows. First, in Fig. 4C, we explore different random time intervals at different fixed angles, and plot the dimensionality explored by the segment of neural trajectory against the duration *T* of the trajectory divided by its autocorrelation time *τ*. Moreover, we vary the number of recorded neurons we keep in the analysis. Second, in Fig. 4D, we pick different times and plot the number of dimensions explored by the neural manifold (now a circle) across all angles at each chosen time against 2*π*/∆, where ∆ is the smoothness parameter of the neural circle at that time, again varying the number of neurons in the analysis. As can be clearly seen, in both monkeys the predictions of the theory in experimental regime (ii) in Fig. 4A are confirmed: dimensionality is largely independent of the number of recorded neurons, and it hugs closely the dimensionality frontier set by the NTC.
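The prediction that dimensionality becomes independent of the number of recorded neurons can also be illustrated on synthetic data. The sketch below (an assumed model, not the monkey data) draws a smooth random trajectory across *N* simulated neurons and shows that the participation ratio barely changes once the number of sampled neurons exceeds the trajectory's dimensionality:

```python
# Sketch: subsampling neurons from a smooth random trajectory. Once
# M exceeds the trajectory's dimensionality, measured dimensionality
# is nearly independent of M and sits below T/tau.
import numpy as np

rng = np.random.default_rng(2)
N, n, T, tau = 1000, 300, 10.0, 1.0
t = np.linspace(0.0, T, n)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / tau ** 2)
w, V = np.linalg.eigh(K)
L = V * np.sqrt(np.clip(w, 0.0, None))
X = rng.standard_normal((N, n)) @ L.T     # each row: one neuron's smooth rate

def pr_dim(A):
    """Participation-ratio dimensionality via the temporal Gram matrix."""
    mu = np.clip(np.linalg.eigvalsh(A.T @ A), 0.0, None)
    return mu.sum() ** 2 / (mu ** 2).sum()

dims = {M: pr_dim(X[rng.choice(N, size=M, replace=False)]) for M in (10, 50, 1000)}
print(dims)  # rises at small M, then plateaus well below T/tau = 10
```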

Overall, these results suggest a conceptual revision in the way we may wish to think about neural complexity as measured by its dimensionality. Essentially, neural state space dynamics should not be thought of as inherently simple just because its measured dimensionality is much less than the number of recorded neurons. Instead, by properly comparing dimensionality to neural task complexity, we find neural state space dynamics in motor cortex is as complex and as high-dimensional as possible given basic task constraints and neural smoothness constraints. In essence, the neural state space dynamics represented in Fig. 3F is curving as much as possible within its speed limits set by spatiotemporal autocorrelation lengths, in order to control reaching movements.

We note that theorem (2) is not circular; i.e., it is not tautologically true that every possible measured neural state space dynamics, assuming enough neurons are recorded, will have dimensionality close to the NTC. In the supplementary material, we exhibit an analytical example of a very fast neural circuit, with a small temporal autocorrelation *τ*, recorded for a long time *T*, that nevertheless has dimensionality *D* much less than its NTC ∝ *T/τ*, because the connectivity is designed to amplify activity in a small number of dimensions and attenuate activity in all others, similar to the way non-normal networks have been proposed to play a functional role in sequence memory [Ganguli et al., 2008b]. Finally, what kind of neural dynamics would have a maximal dimensionality, equal to its NTC? As we show in the supplementary material, a random smooth manifold, with no other structure beyond smoothness, has such maximal dimensionality.
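A minimal stand-in for this kind of counterexample (ours, not the supplementary's actual non-normal circuit): a fast trajectory (*τ* ≪ *T*) whose activity is nevertheless confined to two latent dimensions has dimensionality near 2, far below *T/τ*:

```python
# Sketch (assumed model): fast dynamics (small tau, long T) confined
# to 2 latent dimensions by construction, mimicking a circuit that
# amplifies a few dimensions and attenuates the rest.
import numpy as np

rng = np.random.default_rng(3)
T, tau, n, N = 100.0, 1.0, 400, 200
t = np.linspace(0.0, T, n)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / tau ** 2)
w, V = np.linalg.eigh(K)
latents = rng.standard_normal((2, n)) @ (V * np.sqrt(np.clip(w, 0.0, None))).T
X = rng.standard_normal((N, 2)) @ latents   # activity confined to 2 dimensions

mu = np.clip(np.linalg.eigvalsh(X @ X.T), 0.0, None)
D = mu.sum() ** 2 / (mu ** 2).sum()
print(D, T / tau)                           # D is near 2, while T/tau = 100
```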

## 5 Beyond dimensionality: accurate recovery of the geometry of dynamical portraits

The above theory reveals a simple sufficient condition under which the dimensionality of dynamical portraits would remain unchanged if we recorded more neurons: namely, if the number of recorded neurons exceeds the NTC, and the unrecorded neurons are statistically similar to the recorded neurons, so as not to change population smoothness estimates. But importantly, even if we obtain the dimensionality of neural state space dynamics correctly by simply recording more neurons than the NTC, our theory so far provides no guarantee that we obtain their geometry correctly. Here we address a fundamental question: how many recorded neurons are sufficient to obtain the correct dynamical portrait of circuit computation at a given level of precision? By definition, the correct dynamical portrait is what we would obtain from dimensionality reduction applied to recordings of all the neurons in the behaviorally relevant brain region in question. Importantly, how does the sufficient number of recorded neurons scale with the complexity of the task, the desired precision, the total number of neurons in the brain region, and other properties of neural dynamics? And, interestingly, which minimal aspects of neural dynamics must we know in order to compute this number?

To introduce our theory, it is useful to ask, when, intuitively, would recordings from a subset of neurons yield the same dynamical portrait as recordings from all the neurons in a circuit? The simplest visualizable example is a circuit of 3 neurons, where we can only measure 2 of them (Fig. 5A). Suppose the set of neural activity patterns encountered throughout the experiment consists of a single neural trajectory, that does not curve too much, and is somewhat randomly oriented relative to the single neuron axes (or equivalently, neural activity patterns at all times are distributed across neurons). Then the act of subsampling 2 neurons out of 3 is like looking at the shadow, or projection of this neural trajectory onto a coordinate subspace. Intuitively, it is clear that the geometry of the shadow will be similar to the geometry of the neural trajectory in the full circuit, no matter which 2 neurons are recorded. On the other hand, if the manifold is not randomly oriented with respect to single neuron axes, so that neural activity patterns may be sparse (Fig. 5B), then the shadow of the full neural trajectory onto the subspace of recorded neurons will not preserve its geometry across all subsets of recorded neurons. The challenge of course, is to make these intuitive arguments quantitatively precise enough to guide experimental design.
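The 3-neuron intuition of Fig. 5AB can be made concrete with a toy computation (ours, not the paper's): we project a 3D trajectory onto 2 of the 3 "neuron" axes and measure the worst-case fractional distortion of pairwise distances:

```python
# Sketch: a trajectory spread across all three neuron axes distorts
# modestly when one neuron is dropped; a trajectory localized to the
# unrecorded neuron collapses entirely.
import numpy as np
from itertools import combinations

def worst_distortion(full, proj):
    """Worst fractional error of pairwise distances after dropping a neuron."""
    worst = 0.0
    for i, j in combinations(range(full.shape[1]), 2):
        d_full = np.linalg.norm(full[:, i] - full[:, j])
        if d_full < 1e-9:
            continue                    # identical points carry no information
        d_proj = np.linalg.norm(proj[:, i] - proj[:, j])
        worst = max(worst, abs(d_proj - d_full) / d_full)
    return worst

t = np.linspace(0.0, 2.0 * np.pi, 100)
# Trajectory oriented evenly across all three neuron axes (Fig. 5A regime)
a = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)
b = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)
spread = np.outer(a, np.cos(t)) + np.outer(b, np.sin(t))
# Trajectory localized to neurons 2 and 3; neuron 3 goes unrecorded (Fig. 5B)
sparse = np.vstack([np.zeros_like(t), np.cos(t), np.sin(t)])

print(worst_distortion(spread, spread[:2]))   # modest distortion (~0.42)
print(worst_distortion(sparse, sparse[:2]))   # geometry collapses (-> 1.0)
```

For the spread trajectory the worst distortion is 1 − 1/√3 ≈ 0.42, whereas the sparse trajectory maps distinct states onto identical shadows, losing the geometry entirely.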

To develop our theory of neural measurement, let us first assume the optimistic scenario in Fig. 5A, and pursue and test its consequences. If, in general, the neural data manifold (i.e., Fig. 3F) is randomly oriented with respect to the single neuron axes, then a measurement we can currently do, namely recording from *M* randomly chosen neurons (Fig. 5C, top), becomes equivalent to a measurement we do not yet do, namely recording from *M* random linear combinations of all neurons in the circuit (Fig. 5C, bottom). The former corresponds to projecting the neural manifold onto a coordinate subspace as in Fig. 5A, while the latter corresponds to projecting it onto a randomly oriented *M* dimensional subspace. If the neural manifold is randomly oriented to begin with, the nature of the geometric distortion incurred by the shadow, relative to the full manifold, is the same in either case.

This perspective allows us to then invoke a well-developed theory of how smooth manifolds in a high dimensional ambient space become geometrically distorted under a random projection (RP) down to a lower dimensional subspace [Baraniuk and Wakin, 2007, Clarkson, 2008]. The measure of geometric distortion *ϵ* is the worst case fractional error in Euclidean distances between all pairs of points on the manifold, measured in the subspace, relative to the full ambient space (see Methods). The theory states that, to achieve a desired fixed level of distortion *ϵ* with high probability (> 0.95 in our analyses below) over the choice of random projection, the number of projections *M* should exceed a function of the distortion *ϵ*, the manifold’s intrinsic dimension *K* (1 for trajectories, 2 for surfaces, etc.), volume *V*, and curvature *C*, and the number of ambient dimensions *N*. In particular, the theory states that

*M* ≥ (1/ϵ²) [*c*_{1} *K* log(*CV*) + *c*_{2} *K* log(1/ϵ) + *c*_{3} log *N*],

where *c*_{1}, *c*_{2}, and *c*_{3} are fixed constants, is sufficient. Thus, intuitively, manifolds with low intrinsic dimensionality that do not curve much and have limited volume do not require that many measurements to preserve their geometry. Intriguingly, the number of ambient dimensions has a very weak effect; the number of required measurements grows only logarithmically with it. This is exceedingly fortunate, since in a neuroscience context the number of ambient dimensions *N* is the total number of neurons in the relevant brain region, which could be very large. The logarithm thus ensures that this large number alone does not impose a requirement to make a prohibitively large number of measurements.
Translating the rest of this formula to a neuroscience context, *K* is simply the number of task parameters, and *CV*, or curvature times volume, is qualitatively related to the NTC in (1); the numerator is the volume of the manifold in task space, and the reciprocal of correlation length is like curvature (a short correlation length implies high curvature). Making this qualitative translation, the theory of neural measurement as a random projection suggests that as long as the number of recorded neurons obeys

$$M \geq \frac{c_1 K \log\left(c_2\, N \cdot \mathrm{NTC} / \epsilon\right) + c_3}{\epsilon^2}, \qquad (3)$$

then we can obtain dynamical portraits with fractional error *ϵ*, with high probability over the choice of a random subset of recorded neurons. Remarkably, this predicts that the number of recorded neurons need only scale logarithmically with the NTC to maintain a fixed precision.
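The distance-preservation property underlying this scaling can be illustrated with a small numerical sketch. This is not the paper's analysis code; the sizes, the Gaussian smoothing kernel, and the values of *M* below are illustrative choices. We build a smooth trajectory in a high dimensional space, project it onto random subspaces of increasing dimension *M*, and measure the worst case fractional error in pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_trajectory(N, T, tau):
    """A random smooth trajectory: T time points in N dimensions,
    obtained by Gaussian-smoothing white noise along time."""
    pad = 3 * tau
    white = rng.standard_normal((N, T + 2 * pad))
    t = np.arange(-pad, pad + 1)
    kernel = np.exp(-t**2 / (2.0 * tau**2))
    kernel /= kernel.sum()
    smooth = np.array([np.convolve(row, kernel, mode="same") for row in white])
    return smooth[:, pad:pad + T]

def max_distortion(X, Y):
    """Worst-case fractional error in pairwise distances between
    corresponding columns of X (full space) and Y (projected)."""
    i, j = np.triu_indices(X.shape[1], k=1)
    d_full = np.linalg.norm(X[:, i] - X[:, j], axis=0)
    d_proj = np.linalg.norm(Y[:, i] - Y[:, j], axis=0)
    return float(np.max(np.abs(d_proj / d_full - 1.0)))

N, T, tau = 500, 100, 10
X = smooth_trajectory(N, T, tau)

distortions = []
for M in (10, 40, 160):
    P = rng.standard_normal((M, N)) / np.sqrt(M)  # unbiased random projection
    distortions.append(max_distortion(X, P @ X))
    print(f"M = {M:4d}  worst-case distortion = {distortions[-1]:.2f}")
```

The distortion shrinks as *M* grows even though *M* ≪ *N*: it is the trajectory's low intrinsic dimension and smoothness, not the ambient dimension, that set the required number of projections.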

Thus this theory makes a striking prediction that we can test in neural data: for a fixed number of task parameters *K*, and a fixed number of total neurons *N*, if we vary the number of recorded neurons *M* and the NTC, and compute the worst case fractional error *ϵ* in the recovered dynamical portraits relative to what we would get if we recorded all *N* neurons, then the iso-contours of constant distortion will be straight lines in a plane formed by *M* and the *logarithm* of the NTC. Of course we cannot currently record all *N* neurons in motor cortex, so we simply treat the dynamical portraits obtained in each monkey from all recorded neurons as the ground truth: i.e. we take *N* = 109 in monkey H and *N* = 42 in monkey G, as the total number of neurons, and we subsample a smaller number of neurons *M* from this pool. Also, we focus on the case *K* = 1, as the neural manifold in Fig. 3F is sampled smoothly only in time, and not angle; the reaches were done at only 8 discrete angles. Therefore we vary the NTC exactly as in Fig. 4C, by choosing random intervals of neural trajectories of varying durations *T* at each angle. For each interval duration *T*, which in this restricted context we can think of as simply proportional to the NTC, we use data from a random subset of *M* neurons, and compute the distortion *ϵ*(*M, T*) in the resulting dynamical portraits relative to the assumed ground-truth portrait obtained from all *N* recorded neurons. The theory above in Eq. (3) predicts exactly the same scaling in this scenario, with NTC replaced by time *T*.

Examples of the effects of different distortions *ϵ*, obtained by sampling different sets of *M* neurons, are shown in Fig. 5D for dynamic portraits and Fig. 5E for individual PCs. More generally, for each *M* and *T*, we conducted 200 trials and plotted the 95th percentile of the resultant distribution of distortion *ϵ* as a heat map in Fig. 5FG (i.e. 95% of trials had distortion less than what is reported). In panel F, we measured *M* random projections, or *M* random linear combinations of all *N* recorded neurons, for varying intervals of duration *T*, as the subsampled dataset, corresponding to the hypothetical experiment in Fig. 5C, bottom. It is clear that the iso-contours of constant distortion *ϵ* are well fit by straight lines in a plane formed by *M* and the logarithm of time *T*. This is a completely expected result, as this analysis is simply a numerical verification of an already proven theory. However, it forms a quantitative baseline for comparison in panel G, where we repeat the same analysis as in panel F, except we record random subsamples of *M* neurons, as in experiment 5C, top. We obtain a qualitatively similar result as in panel F, which is remarkable, since this analysis is no longer a simple numerical verification of a mathematical theory. Rather, it is a stringent test of the very assumption that the neural manifold in Fig. 3F is randomly oriented enough with respect to single neuron axes so that random projections form a good theoretical model for the traditional measurement process of randomly sampling a subset of neurons. In essence it is a test of the assumption that the neural manifold is more like Fig. 5A than Fig. 5B, so that the two experiments in Fig. 5C yield similar geometric distortions in dynamical portraits as a function of recorded neurons *M* and neural task complexity NTC.
In particular, the striking scaling of recorded neurons *M* with the logarithm of the NTC to maintain fixed precision in recovered dynamical portraits, predicted by the random projection theory of neural measurement, is verified. Moreover, we quantitatively compare the discrepancy between the two measurement scenarios in Fig. 5H, by creating a scatter plot of how many randomly sampled neurons versus random projections it takes to get the same distortion *ϵ* across all possible neural trajectory durations *T*, or equivalently, NTC’s. Even at a quantitative level, the data points are close to the unity line, relative to the total number of recorded neurons, suggesting that for this dataset, random projection theory is an impressively good model for the neural measurement process.
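The logic of this comparison can be sketched numerically. The following is an illustrative toy, not the actual analysis: all sizes, the latent dimension *K* = 5, and the number of random draws are arbitrary choices. A low dimensional trajectory is embedded in a randomly oriented subspace, and the 95th-percentile worst case distortion from recording *M* random coordinates (neurons) is compared against that from *M* random linear combinations:

```python
import numpy as np

rng = np.random.default_rng(1)

N, T, K = 300, 60, 5
# K-dim latent trajectory embedded in a random K-dim subspace of R^N, so the
# manifold is randomly oriented w.r.t. single-neuron axes (Fig. 5A regime).
basis, _ = np.linalg.qr(rng.standard_normal((N, K)))
latents = np.cumsum(rng.standard_normal((K, T)), axis=1)
X = basis @ latents

i, j = np.triu_indices(T, k=1)
d_full = np.linalg.norm(X[:, i] - X[:, j], axis=0)

def percentile_distortion(measure, M, n_draws=100, q=95):
    """q-th percentile, over random draws of the measurement, of the
    worst-case pairwise distance distortion (with sqrt(N/M) rescaling)."""
    worst = []
    for _ in range(n_draws):
        Y = measure(M)
        d_sub = np.linalg.norm(Y[:, i] - Y[:, j], axis=0)
        worst.append(np.max(np.abs(np.sqrt(N / M) * d_sub / d_full - 1.0)))
    return float(np.percentile(worst, q))

def subsample(M):          # record M randomly chosen neurons
    return X[rng.choice(N, size=M, replace=False)]

def random_projection(M):  # record M random linear combinations of all neurons
    return (rng.standard_normal((M, N)) / np.sqrt(N)) @ X

results = {M: (percentile_distortion(subsample, M),
               percentile_distortion(random_projection, M)) for M in (20, 80)}
for M, (eps_sub, eps_rp) in results.items():
    print(f"M = {M:3d}  subsample: {eps_sub:.2f}  projection: {eps_rp:.2f}")
```

In this randomly oriented regime the two distortions behave comparably and both fall as *M* grows, mirroring the closeness of the points in Fig. 5H to the unity line.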

In the Supplementary Material, we study how these results are modified as neural activity patterns become more sparse and aligned with single neuron axes (i.e. less extreme versions of Fig. 5B). Remarkably, the linear scaling of number of neurons with the logarithm of NTC at fixed error is preserved, albeit with a higher slope and intercept. By comparing the neural data to simulated data with different levels of sparsity, we find that the neural data is indeed close to randomly aligned with respect to single neuron axes, as suggested by the closeness of the points in 5H to the unity line.

## 6 Discussion

### 6.1 An intuitive summary of our theory

Overall, we have generated a quantitative theory of trial averaged neural dimensionality, dynamics, and measurement that can impact both the interpretation of past experiments, and the design of future ones. Our theory provides both quantitative and conceptual insights into the underlying nature of two major order of magnitude discrepancies dominating almost all experiments in systems neuroscience: (1) the dimensionality of neural state space dynamics is often orders of magnitude smaller than the number of recorded neurons (e.g. Fig. 2), and (2) the number of recorded neurons is orders of magnitude smaller than the total number of relevant neurons in a circuit, yet we nevertheless claim to make scientific conclusions from such infinitesimally small numbers of recorded neurons. This latter discrepancy is indeed troubling, as it calls into question whether or not systems neuroscience has been a success or a failure, even within the relatively circumscribed goal of correctly recovering trial-averaged neural state space dynamics in such an undersampled measurement regime. To address this fundamental ambiguity, our theory identifies and weaves together diverse aspects of experimental design and neural dynamics, including the number of recorded neurons, the total number of neurons in a relevant circuit, the number of task parameters, the volume of the manifold of task parameters, and the smoothness of neural dynamics, into quantitative scaling laws determining bounds on the dimensionality and accuracy of neural state space dynamics recovered from large scale recordings.

In particular, we address both order of magnitude discrepancies by taking a geometric viewpoint in which trial-averaged neural data is fundamentally an embedding of a task manifold into neural firing rate space (Fig 3EF), yielding a neural state space dynamical portrait of circuit computation during the task. We explain the first order of magnitude discrepancy by carefully considering how the complexity of the task, as measured by the volume of the task manifold, and the smoothness of neural dynamics, as measured by a product of neural population correlation lengths across each task parameter, can conspire to constrain the maximal number of linear dimensions the neural state space dynamics can possibly explore. We define a mathematical measure, which we call neural task complexity (NTC), which, up to a constant, is simply the ratio of the volume of the task manifold and the product of neural population correlation lengths (Eq. (1)) and we prove (see Supplementary material) that this measure forms an upper bound on the dimensionality of neural state space dynamics (Eq. (2)). We further show in neural data from the motor cortex of reaching monkeys, that the NTC is much smaller than the number of recorded neurons, while the dimensionality is only slightly smaller than the NTC (Fig. 4). Thus the simplicity of the center out reach task and the smoothness of motor cortical activity, are by themselves sufficient to explain the low dimensionality of the dynamics relative to the number of recorded neurons. A natural hypothesis is that for a wide variety of tasks, neural dimensionality is much smaller than the number of recorded neurons because the task is simple and the neural population dynamics is smooth, leading to a small NTC. In such scenarios (experimental regime (ii) in Fig. 4A), only by moving to more complex tasks, not by recording more neurons, would we obtain richer higher dimensional trial averaged state space dynamics.

We address the second, more troubling, order of magnitude discrepancy by making a novel conceptual link between the time-honored electrophysiology tradition of recording infinitesimally small subsets of neurons in much larger circuits, and the theory of random projections, corresponding in this context to recording small numbers of random linear combinations of *all* neurons in the circuit. In scenarios where the neural state space dynamics is sufficiently randomly oriented with respect to single neuron axes (e.g. Fig. 5A) these two different measurement scenarios (Fig. 5C) yield similar predictions for the accuracy with which the dynamics are recovered as a function of the number of recorded neurons, the total number of neurons in the circuit, the volume of the task manifold, and the smoothness of the neural dynamics. A major consequence of the random projection theory of neural measurement is that the worst case fractional error in the geometry of recovered neural state space dynamics increases only with the *logarithm* of the total number of neurons in the circuit (Eq. (3)). This remarkable property of random projections goes hand in hand with why systems neuroscience need not be daunted by so many unrecorded neurons: we are protected from their potentially detrimental effect on the error of state space dynamics recovery by a strongly compressive logarithm. Moreover, the error grows linearly with the number of task parameters, and only again *logarithmically* with the NTC, while it decreases linearly in the number of recorded neurons (Eq. (3)). Thus recording a modest number of neurons can protect us against errors due to the complexity of the task, and lack of smoothness in neural dynamics.

This theory then resolves the ambiguity of whether systems neuroscience has achieved success or has failed in correctly recovering neural state space dynamics. Indeed it may well be the case that in a wide variety of experiments, we have indeed been successful, as we have been doing simple tasks with a small number of task parameters and NTC, and recorded in circuits with distributed patterns of activity, making random projections relevant as a model of the measurement process. Under these conditions, we have shown there is a modest requirement on the number of recorded neurons to achieve success. Our work thereby places many previous works on dimensionality reduction in neuroscience on much firmer theoretical foundations. Having summarized our theory, below we discuss its implications and its relations to other aspects of neuroscience.

### 6.2 Dimensionality is to neural task complexity as information is to entropy

To better understand the NTC, and its relation to dimensionality, it is useful to consider an analogy between our results and applications of information theory in neuroscience [Rieke et al., 1996]. Indeed, mutual information has often been used to characterize the fidelity with which sensory circuits represent the external world. However, suppose that one reported that the mutual information rate between the sensory signal and the neural response were 90 bits per second, as it is in the fly H1 neuron [Strong et al., 1998]. This number by itself would be difficult to interpret. However, just as dimensionality is upper bounded by the neural task complexity, mutual information *I* is upper bounded by entropy *H*, i.e. *I* ≤ *H*. Thus if one measured the entropy rate of the response spike train to be 180 bits per second [Strong et al., 1998], then by comparing the mutual information to the entropy one could make a remarkable conclusion, namely that the neural code is highly efficient: the fidelity with which the response codes for the stimulus is within a factor of 2 of the fundamental limit set by the entropy of the neural response.

Similarly, the observation that the dimensionality of recordings of 109 neurons in Monkey H in Fig. 4 is 7.1 is, by itself, difficult to interpret. However, if one computed the NTC to be 10.5, then by comparing dimensionality to the NTC, one could make another remarkable conclusion, namely that motor cortex is exploring almost as many dimensions as possible given the limited extent, or volume, of behavioral states allowed by the task, and the limited speeds with which neural population dynamics can co-vary across behavioral states. Thus just as entropy, as an upper bound on mutual information, allows us to measure the fidelity of the neural code on an absolute scale from 0 to 1 through the ratio of information to entropy, the NTC, as an upper bound on dimensionality, allows us to measure the complexity of neural state space dynamics on an absolute scale from 0 to 1, through the ratio of dimensionality to NTC. When this ratio is 1, neural dynamics is as complex, or high dimensional, as possible, given task and smoothness constraints.

### 6.3 Towards a predictive theory of experimental design

It may seem that neural task complexity could not be useful for guiding the design of future experiments, as its very computation requires knowing the smoothness of neural data, which would not have yet been collected. However, this smoothness can be easily estimated based on knowledge of previous experiments. As an illustration, consider how one might obtain an estimate of how many neurons one would need to record in order to accurately recover neural state space dynamics during a more complex reaching task in 3 dimensions. For concreteness, suppose a monkey has to reach to all points on a sphere of fixed radius centered at the shoulder of the reaching arm. The manifold of trial averaged task parameters is specified by time *t* into the reach, which varies from 0 to *T* ms, and the azimuthal and altitudinal angles *φ* and *θ*, each of which ranges from 0 to 2*π*. Now let us assume the smoothness of neural population dynamics across time will be close to the average of what we observed for 2 dimensional reaches (*τ* = 126 and 146 ms in monkeys H and G, for an average of *τ* ≈ 136 ms). Also let us assume reaches will take on average *T* = 600 ms, as they did in the case of two dimensional reaches. Then we obtain the estimate *T/τ* ≈ 600*/*136 ≈ 4.4. Now again, let us assume that both azimuthal (∆_{φ}) and altitudinal (∆_{θ}) neural correlation lengths would be the average of the angular correlation length of two dimensional reaches (∆ = 1.82 and 1.91 radians in monkeys H and G), yielding 2*π/*∆_{φ} ≈ 2*π/*∆_{θ} ≈ 2*π/*1.87 ≈ 3.4. Then the NTC, according to Eq. (1), is proportional to the product of these 3 numbers, where in 3 dimensions, the constant of proportionality could be taken to be 1*/*2. This product yields an estimate of NTC ≈ (1*/*2) × 4.4 × 3.4 × 3.4 ≈ 25.
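This back-of-the-envelope arithmetic can be written out explicitly. The averaged correlation lengths below come from the two monkeys' values quoted above; the *O*(1) proportionality constant of 1/2 is an assumption chosen for illustration, since Eq. (1) fixes the NTC only up to such a constant:

```python
import math

# Averages of the two monkeys' correlation lengths quoted in the text.
tau = (126.0 + 146.0) / 2      # temporal correlation length, ms
delta = (1.82 + 1.91) / 2      # angular correlation length, radians
T = 600.0                      # assumed average reach duration, ms

time_factor = T / tau               # task extent / correlation length, in time
angle_factor = 2 * math.pi / delta  # same, for each of the two angles
prefactor = 0.5                     # assumed O(1) constant in Eq. (1)

ntc_estimate = prefactor * time_factor * angle_factor**2
print(f"T/tau ~ {time_factor:.1f}, 2*pi/Delta ~ {angle_factor:.1f}, "
      f"NTC ~ {ntc_estimate:.0f}")   # -> T/tau ~ 4.4, 2*pi/Delta ~ 3.4, NTC ~ 25
```

The resulting estimate of roughly 25 is the dimensionality ceiling used in the predictions that follow.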

If we trust this estimate, then this simple computation, coupled with our theorems proven above, allows us to make some predictions that can guide experimental design. For example, the theorem in Eq. (2) implies that no matter how many neurons we record in monkey motor and premotor cortices, the dimensionality of the trial averaged state space dynamics will not exceed 25. Moreover, the theorem in Eq. (3) tells us that if we wish to recover this state space dynamics to within fractional error *ϵ* = 0.2, relative to what we would obtain if we recorded all task relevant neurons, then we should record at least *M* = 300 neurons (see Supplementary Material). Now of course, we may not wish to trust this estimate, because we may have mis-estimated the neural correlation lengths. To be safer, we could deliberately underestimate the correlation lengths, and thereby obtain a safe overestimate of the NTC and of the requisite number of neurons to record. But overall, in this fashion, by combining an estimate of a likely NTC in future experiments with the new theorems in this work, we can obtain back of the envelope estimates of the dimensionality and accuracy of recovered state space dynamics, as neuroscience moves forward to unravel the neural basis for more complex behavioral and cognitive acts.

### 6.4 Departures from our assumption of statistical homogeneity

A critical assumption in using our theory to guide future experiments is that the set of unrecorded neurons is statistically similar to the set of recorded neurons, so that the denominator of the NTC in Eq. (1) will not change much as we fix the task and record more neurons. There are several important ways that this assumption could be violated. For example, there could be strong spatial topography in the neural code of the relevant circuit, so that as we expand our electrode array to record more neurons, the new neurons might have fundamentally different coding properties. Also, we may wish to record from multiple task relevant brain regions simultaneously, in which case our theory would have to apply to each brain region individually. Moreover, there may be multiple cell types in the relevant brain region. Unfortunately, most electrophysiology recordings do not give us access to cell type information (though spike width can sometimes serve as a proxy for the excitatory/inhibitory distinction). Thus the recovered neural state space dynamics reflects the combined action of all cell types. However, if we had access to cell type information, we may wish to define state space dynamical variables for each cell type. Then the theory would apply to each cell type alone. However, if cells of different types are strongly coupled, it is not clear that the collective dynamics of the circuit should be well explained by reduced degrees of freedom, or state, that are in one-to-one correspondence with cell types. This is an important empirical issue for further studies.

In essence, our theory applies to a spatially well mixed, statistically homogeneous, localized brain region whose dynamics is relevant to the task. Fortunately, a wide variety of phylogenetically newer brain regions that evolved to learn new connectivities to solve tasks that evolution itself could not anticipate, for example prefrontal, parietal, pre-motor and motor cortices, and even older hippocampal circuits, exhibit precisely these kinds of mixed representations, in which the coding properties of individual neurons exhibit no discernible spatial topography, and almost every neuron codes for a mixture of multiple task parameters (e.g. [Machens et al., 2010, Mante et al., 2013, Rigotti et al., 2013, Raposo et al., 2014]). These are precisely the properties that make the neural manifold randomly oriented with respect to single neuron axes, and therefore make our random projection theory of neural measurement relevant, and the recovery of state space dynamics relatively easy despite subsampling.

But ironically, these very same properties make the goal of understanding what each and every individual neuron in the circuit does a seemingly difficult and questionable endeavor. While this endeavor has indeed traditionally been the putative gold standard of understanding, perhaps instilled in systems neuroscience by the tremendous success of Hubel and Wiesel in discovering single cell orientation tuning in primary visual cortex [Hubel and Wiesel, 1959], it is unclear that it will continue to be a profitable path going forward, especially in recently evolved brain regions where mixed representations dominate. But fortunately, the path of moving away from understanding single neurons to recovering collective state space dynamics, is a promising route forward, and indeed one that has firmer theoretical justification now, even in the face of extreme neural subsampling.

### 6.5 A why question: the neuroscientist and the single neuron

We have shown that when neural representations are distributed, or randomly enough oriented with respect to single neuron axes (Fig. 5A), so that random projections constitute a good model of the neural measurement process (Fig. 5C), then the life of the neuroscientist studying neural circuits becomes much easier: he or she can dramatically subsample neurons, yet still recover global neural state space dynamics with reasonable accuracy. However, neural systems evolved on earth long before neuroscientists arrived to study them. Thus no direct selection pressures could have possibly driven neural systems to self-organize in ways amenable to easy understanding by neuroscientists. So one could then ask a teleological question: why did neural systems organize themselves this way?

One possible answer lies in an analogy between the neuroscientist and the single neuron, whose goals may be inadvertently aligned. Just as a neuroscientist needs to read the state of a cortical neural circuit by sampling *O*(100) randomly chosen neurons, a downstream cortical neuron needs to compute a function of the state of the upstream circuit while listening to *O*(10,000) neurons. Intuitively, if neural activity patterns are low dimensional enough, and distributed enough across neurons, then the single neuron will be able to do this. Indeed, a few works have studied constraints on neural representations in the face of limited network connectivity. For example, [Valiant, 2005] showed that the sparser neural connectivity is, the more distributed neural representations need to be, in order for neural systems to form arbitrary associations between concepts. Also [Sussillo and Abbott, 2012] showed that if neural representations in a circuit are low dimensional and randomly oriented with respect to single neuron axes, then a neuron that subsamples that circuit can compute any function of the circuit's state that is computable by a neuron that can listen to all neurons in the circuit. And finally, [Kim et al., 2012] showed that the hippocampal system appears to perform a random projection, transforming a sparse CA1 representation of space into a dense subicular representation of space, yielding the ability to communicate the output of hippocampal computations to the rest of the brain using very few efferent axons.

These considerations point to an answer to our teleological question: in essence, our success as neuroscientists in the accurate recovery of neural state space dynamics under extreme subsampling may be an exceedingly fortunate corollary of evolutionary pressures for single neurons to communicate and compute accurately under the constraints of limited degree network connectivity.

### 6.6 Beyond the trial average: towards a theory of single trials

A natural question is how this theory would extend to the situation of single trial analyses. Several new phenomena can arise in this situation. First, in any single trial, there will be trial to trial variability, so that neural activity patterns may lie near, but not on, the trial averaged neural manifold, illustrated for example in Fig. 3F. The strength of this trial to trial variability can be characterized by a single neuron SNR, and it can impact the performance of various single trial analyses. Second, on each and every trial, there may be fluctuations in internal states of the brain, potentially reflecting cognitive variables like attention, or other cognitive phenomena, that are uncontrolled by the task. Such fluctuations would average out in the trial averaged manifold, but across individual trials would manifest as structured variability around the manifold. It would be essential to theoretically understand methods to extract these latent cognitive variables from the statistics of structured variability. Third, the trial averaged neural manifold may have such a large volume, especially in a complex task, that a finite number of trials *P* may not suffice to cover it. One would like to know, then, the minimum number of training trials *P* required to successfully decode behavioral or cognitive variables on subsequent, held-out, single trials. Moreover, how would this minimum number of trials scale with properties of the trial averaged manifold, obtained only in the limit of very large numbers of trials?

We have already begun to undertake a study of these and other questions. Our preliminary results, some of which were stated in [Gao and Ganguli, 2015], suggest that the basic theory of trial averaged neural data analysis forms an essential springboard for addressing theories of single-trial data analysis. For example, in the case of the last question above, we find that the number of training trials *P* must scale with the NTC of the trial averaged neural manifold, in order for subsequent single trial decoding to be successful. Moreover, we have analyzed theoretically the recovery of internal states reflected in the spontaneous activity of large model neural circuits, while subsampling only *M* neurons for a finite recording time *T*, yielding a dimensionless ratio *T/τ* of recording time to single neuron correlation time *τ*. We find that the dynamics of these internal states can be accurately recovered as long as (a) both *M* and *T/τ* exceed the intrinsic dimensionality explored by the manifold of latent circuit states, and (b) the square-root of the product of *M* and *T/τ* exceeds a threshold set by both this dimensionality and the single neuron SNR [Gao and Ganguli, 2015]. In turn the dimensionality of the manifold of latent circuit states is upper bounded by its NTC, so the NTC of a latent neural manifold determines the viability of single trial analyses, just as it does in the recovery of neural manifolds explicitly associated with externally measured task parameters. And finally, one may be tempted to conjecture that, due to finite SNR, single trial decoding performance may grow without bound as the number of recorded neurons increases - a result that would be qualitatively different from the trial averaged theory, which suggests that only modest numbers of neurons are required to accurately recover neural state space dynamics. However, there are several reasons to believe that such a qualitative discrepancy may not bear out.
For example, neural noise may be embedded in the same direction as the signal, resulting in information limiting correlations [Moreno-Bote et al., 2014]. Moreover, empirically, in single trial decoding in the brain machine interface community, decoding performance already achieves a point of diminishing returns at even modest numbers of recorded neurons. The precise theoretical reasons for this remain an object of future study.

But overall, our initial results in a theory of single-trial analyses, to be presented elsewhere, suggest that the theory of trial-averaged neural dimensionality, dynamics and measurement, presented here, not only provides interpretive power for past experiments, and guides the design of future trial averaged experiments, but also provides a fundamental theoretical building block for expansion of the theory to single trial analyses. In essence, this work provides the beginnings of a theoretical framework for thinking about how and when statistical analyses applied to a subset of recorded neurons correctly reflect the dynamics of a much larger, unobserved neural circuit, an absolutely fundamental question in modern systems neuroscience. A proper, rigorous theoretical understanding of this question will be essential as neuroscience moves forward to elucidate the neural circuit basis for even more complex behavioral and cognitive acts, using even larger scale neural recordings.

## 7 Materials and Methods

### 7.1 Dimensionality Measure

Our measure of dimensionality is derived from the eigen-spectrum of the neuronal covariance matrix. This matrix underlies PCA, and indicates how pairs of neurons covary across time and task parameters (see Supplementary material). The eigenvalues of this matrix, *µ*_{1} ≥ *µ*_{2} ≥ … ≥ *µ*_{M}, reflect neural population variance in each eigen-direction in firing rate space. The participation ratio (PR),

$$\mathrm{PR} = \frac{\left(\sum_{\alpha=1}^{M} \mu_\alpha\right)^2}{\sum_{\alpha=1}^{M} \mu_\alpha^2},$$

is a natural continuous measure of dimensionality. Intuitively, if all variance is concentrated in one dimension, so that *µ*_{α} = 0 for *α* ≥ 2, then PR = 1. Alternatively, if the variance is evenly spread across all *M* dimensions, so that *µ*_{1} = *µ*_{2} = … = *µ*_{M}, then PR = *M*. For other PCA eigenspectra, the PR sensibly interpolates between these two regimes, and for a wide variety of uneven spectra, the PR corresponds to the number of dimensions required to explain about 80% of the total population variance (see Supplementary material).
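The participation ratio admits a one-line implementation (a sketch; the function and variable names are ours), whose behavior in the two limiting regimes described above is easy to verify:

```python
import numpy as np

def participation_ratio(eigenvalues):
    """PR = (sum_a mu_a)^2 / sum_a mu_a^2 for PCA eigenvalues mu_a >= 0."""
    mu = np.asarray(eigenvalues, dtype=float)
    return float(mu.sum() ** 2 / np.sum(mu ** 2))

M = 50
print(participation_ratio(np.r_[1.0, np.zeros(M - 1)]))  # all variance in one dim -> 1.0
print(participation_ratio(np.ones(M)))                   # variance evenly spread -> 50.0
print(participation_ratio(1.0 / np.arange(1, M + 1)))    # uneven spectrum -> in between
```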

### 7.2 Preprocessing of the Motor Cortical Dataset

We use multi-electrode array recordings from the PMd and M1 areas of two monkeys (H and G) as they performed an eight-direction center-out delayed reach task [Yu et al., 2007]. There are between 145 and 148 trials in monkey H's dataset and between 219 and 222 trials in monkey G's dataset for each of the eight reach directions. Neural activity from each trial is time aligned to hand movement onset (the time of 15% maximal hand velocity), and restricted to a time window from −250 ms to 350 ms around movement onset. Each spike train is smoothed with a 20 ms Gaussian kernel, and averaged with trials of the same reach angle to obtain the trial-averaged population firing rates for the eight conditions. To homogenize the activity levels between neurons of different firing rates, and to highlight variability in the data resulting from task conditions, we further applied the square-root transform to the population firing rates [Thacker and Bromiley, 2001], and subtracted their cross-condition average [Churchland et al., 2012].
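The preprocessing steps above can be sketched as follows. This is a schematic under assumed data shapes (one spike-count array of shape (trials, neurons, time bins) per reach condition, in 1 ms bins); the original pipeline may differ in detail:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def preprocess(spikes_by_condition, sigma_ms=20.0):
    """spikes_by_condition: one (n_trials, n_neurons, n_bins) array per reach
    angle, in 1 ms bins. Returns a (n_conditions, n_neurons, n_bins) array of
    preprocessed trial-averaged rates."""
    rates = []
    for spikes in spikes_by_condition:
        # Smooth each spike train with a Gaussian kernel along time.
        smoothed = gaussian_filter1d(spikes.astype(float), sigma=sigma_ms, axis=-1)
        # Average over trials of the same condition, then square-root transform.
        rates.append(np.sqrt(smoothed.mean(axis=0)))
    rates = np.stack(rates)                            # (conditions, neurons, bins)
    return rates - rates.mean(axis=0, keepdims=True)   # subtract cross-condition mean

# Toy check: two constant-rate conditions.
out = preprocess([np.ones((3, 4, 100)), 4.0 * np.ones((3, 4, 100))])
print(out.shape, out[0, 0, 0], out[1, 0, 0])
```

In the toy check, constant rates of 1 and 4 become 1 and 2 after the square-root transform, so after cross-condition mean subtraction the two conditions sit at −0.5 and +0.5.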

### 7.3 Distortion Measure

To quantify the geometric distortion of dynamic portraits incurred from projecting them down from the *N* dimensional space of all neurons to the *M* dimensional subspace of recorded neurons (or *M* dimensional random subspace for a random projection), we adopt the pairwise distance distortion measure widely employed in the theory of random projections. Let **P** be the *M*-by-*N* linear projection operator that maps points from the full *N*-dimensional neural space into the *M*-dimensional subspace. For any pair of neural activity patterns **x**^{i} and **x**^{j} in the full *N*-dimensional space, the pairwise distance distortion induced by **P** is defined as

$$d_{ij}(\mathbf{P}) = \left| \sqrt{\frac{N}{M}} \, \frac{\lVert \mathbf{P}\mathbf{x}^{i} - \mathbf{P}\mathbf{x}^{j} \rVert}{\lVert \mathbf{x}^{i} - \mathbf{x}^{j} \rVert} - 1 \right|,$$

where the ratio $\sqrt{N/M}$ compensates for the global distortion introduced simply by the reduction in dimensionality, and ║**v**║ denotes the Euclidean length of a vector **v**. A distortion of 0 indicates that the pairwise distance is the same both before and after the projection (up to an overall scale). The worst case distortion over all pairs of points (*i, j*) on the neural manifold is given by

$$d^{\max}(\mathbf{P}) = \max_{i \neq j} \; d_{ij}(\mathbf{P}).$$

Since under either random projection or random sampling, **P** is a random mapping, *d*^{max}(**P**) is a random variable. We characterize the distortion by the 95th percentile of the distribution of this random variable, i.e. that *ϵ* for which

$$\mathrm{Prob}\left[ d^{\max}(\mathbf{P}) \leq \epsilon \right] = 0.95.$$

Thus with high probability (95%), over the random set of *M* measurements, the worst case distortion over all pairs of points on the neural manifold, will not exceed *ϵ*. In Fig. 5, for each value of *M* and *T*, we estimated *ϵ* by computing *d*^{max}(**P**) 200 times for different random choices of **P**, and set *ϵ* to be the 95th percentile of this empirical distribution of distortions.

## 8 Acknowledgements

We thank S. Lahiri for input on the dimensionality upper bound proofs. This work was supported by the Stanford Graduate Fellowship (P.G. and E.T.), the Mind, Brain and Computation Trainee Program (NSF IGERT 0801700, P.G. and E.T.), the Ruth L. Kirschstein National Research Service Award Predoctoral Fellowship (E.T.), an NIH Director’s Pioneer Award (8DPIHD075623), and the Burroughs Wellcome, Simons, McKnight, and James S. McDonnell foundations.

## Appendix

### Analytical expressions for fraction of variance explained

For the exponential temporal correlation function, the Fourier transform evaluates to,

Taking advantage of the ordering of the Fourier transform, we evaluate the integral,

The minimum *ω** given the fraction parameter *r* is then,

In the limit *τ* ≫ 1, we ignore higher-order terms to obtain the clean final expression,

For the Gaussian temporal correlation function, we state without proof that the Fourier transform is in fact nicely ordered. To compute the fraction of variance explained, we switch the order of the summation over *t* and the integration over *ω* to obtain,

With the optimal *ω** ≈ 2 inverf(*r*)*/τ*, we then have the final expression,
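The stated approximation *ω** ≈ 2 inverf(*r*)*/τ* is straightforward to evaluate numerically; the following sketch (function name ours) uses SciPy's `erfinv`.

```python
from scipy.special import erfinv

def omega_star_gaussian(r, tau):
    """Approximate frequency cutoff capturing a fraction r of the variance
    for a Gaussian temporal correlation of width tau, per the appendix."""
    return 2.0 * erfinv(r) / tau

# Example: cutoff needed to capture 95% of the variance at tau = 100
print(omega_star_gaussian(0.95, 100.0))
```

The cutoff grows with the desired variance fraction *r* and falls inversely with the correlation timescale *τ*, consistent with smoother (longer-*τ*) trajectories requiring less bandwidth.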

## Footnotes

* sganguli@stanford.edu

## References

- [Achlioptas, 2003].
- [Assisi et al., 2007].
- [Baraniuk and Wakin, 2007].
- [Bathellier et al., 2008].
- [Briggman et al., 2006].
- [Bromberg-Martin et al., 2010].
- [Chapin and Nicolelis, 1999].
- [Churchland et al., 2012].
- [Clarkson, 2008].
- [Cunningham and Yu, 2014].
- [Dasgupta and Gupta, 2003].
- [Ganguli et al., 2008a].
- [Ganguli et al., 2008b].
- [Gao and Ganguli, 2015].
- [Gray, 1972].
- [Haddad et al., 2010].
- [Hegdé and Van Essen, 2004].
- [Hubel and Wiesel, 1959].
- [Indyk and Motwani, 1998].
- [Johnson and Lindenstrauss, 1984].
- [Kim et al., 2012].
- [Li et al., 2006].
- [Machens et al., 2010].
- [Mante et al., 2013].
- [Marchenko and Pastur, 1967].
- [Matsumoto et al., 2005].
- [Mazor and Laurent, 2005].
- [Moreno-Bote et al., 2014].
- [Narayanan and Laubach, 2009].
- [Paz et al., 2005].
- [Peyrache et al., 2009].
- [Raman et al., 2010].
- [Raposo et al., 2014].
- [Reike et al., 1996].
- [Rigotti et al., 2013].
- [Sadtler et al., 2014].
- [Sasaki et al., 2007].
- [Shepherd, 2004].
- [Stevenson and Kording, 2011].
- [Stopfer et al., 2003].
- [Strang and Aarikka, 1986].
- [Strong et al., 1998].
- [Sussillo and Abbott, 2012].
- [Thacker and Bromiley, 2001].
- [Valiant, 2005].
- [Warden and Miller, 2010].
- [Yu et al., 2007].