## Abstract

Neural datasets often contain measurements of neural activity across multiple trials of a repeated stimulus or behavior. An important problem in the analysis of such datasets is to characterize systematic aspects of neural activity that carry information about the repeated stimulus or behavior of interest, which can be considered “signal”, and to separate them from the trial-to-trial fluctuations in activity that are not time-locked to the stimulus, which for purposes of such analyses can be considered “noise”. Gaussian Process factor models provide a powerful tool for identifying shared structure in high-dimensional neural data. However, they have not yet been adapted to the problem of characterizing signal and noise in multi-trial datasets. Here we address this shortcoming by proposing “signal-noise” Poisson-spiking Gaussian Process Factor Analysis (SNP-GPFA), a flexible latent variable model that resolves signal and noise latent structure in neural population spiking activity. To learn the parameters of our model, we introduce a Fourier-domain black box variational inference method that quickly identifies smooth latent structure. The resulting model reliably uncovers latent signal and trial-to-trial noise-related fluctuations in large-scale recordings. We use this model to show that predominantly, noise fluctuations perturb neural activity within a subspace orthogonal to signal activity, suggesting that trial-by-trial noise does not interfere with signal representations. Finally, we extend the model to capture statistical dependencies across brain regions in multi-region data. We show that in mouse visual cortex, models with shared noise across brain regions out-perform models with independent per-region noise.

## 1 Introduction

Recent advances in electrophysiological and calcium fluorescence imaging technologies have enabled the collection of increasingly high-dimensional neural datasets. Making sense of such datasets will rely on the development of flexible statistical methods for extracting relevant structure. Gaussian process factor models provide one powerful tool for identifying low-dimensional latent structure from high-dimensional neural response data. These models seek to characterize neural time-series data in terms of a small number of smoothly evolving latent variables, and have been successfully used to characterize neural representations in a variety of contexts [1, 2, 3, 4, 5, 6].

Standard Gaussian Process factor analysis (GPFA) uses a Gaussian process prior to impose smoothness on inferred latent variables, but do not explicitly consider stimulus or task conditions. However, neural data often exist in the form of repeated trials, whereby the same condition is presented to an animal multiple times. These repeated presentations give rise to neural activity that varies across trials around some time-varying “signal” component that is typically estimated using the peri-stimulus time histogram (PSTH). Understanding this signal, and its relationship to trial-to-trial variability, is of central importance to the models of coding in the nervous system [7, 8], yet latent factor models have not been developed to explicitly study this question. Here we address this shortcoming by developing an extension to Gaussian process factor analysis with Poisson spiking (P-GPFA) which we call signal and noise P-GPFA (SNP-GPFA). This model incorporates both signal and independent per-trial components that vary across trials. We refer to these latter components as “noise”, in the sense that they are not time-locked to the repeated stimulus, though they may well reflect other signals unrelated to the experimental stimulus of interest.

In both P-GPFA and SNP-GPFA models, because the Gaussian process is not a conjugate prior for a Poisson observation model, posterior inference is intractable in closed form. Variational inference methods have become increasingly common for applications of Gaussian processes [9, 10, 5]. They achieve tractability by approximating the posterior distribution *p*_{θ}(x|y)with a well-behaved variational distribution *q*_{ϕ}(x|y) [11]. For P-GPFA and SNP-GPFA, because the calculation of the expectation under *q*_{ϕ}(x|y) of the joint distribution *p*_{θ}(**x**, **y**) is intractable, we use a ‘black-box’ approach, which works via sampling of the joint distribution [12].

However, black-box variational inference approaches for Gaussian Process Factor models with long time-series can be computationally cumbersome. Therefore, we introduce a variant of black-box variational inference which uses a Fourier-transformed latent representation that factorizes across Fourier modes. This procedure diagonalizes the Gaussian Process (GP) covariance, avoiding a large matrix inversion during inference, thereby providing speed and computational improvements. We demonstrate the inference technique is fast and flexible in a simpler P-GPFA framework, and then use it to learn the SNP-GPFA model quickly and efficiently.

The SNP-GPFA model recovers separate signal and noise subspaces, which allows us to answer a number of scientific questions regarding these facets of neural activity. Here, we address two scientific questions with SNP-GPFA. **1)** We characterize the overlap between signal and noise subspaces, and **2)** We characterize the extent to which noise is shared across cortical region using multi-region neural recordings.

For the first, the alignment of subspaces that reflect different aspects of neural activity has been explored in other contexts [6, 13], as well as the characterization of the subspace of neural noise [14]. Previous work suggests that signal and noise subspaces may be orthogonal [14], and such orthogonal representations may not limit neural information [15]. Our model directly addresses this question. Using SNP-GPFA on primate data we find that there is indeed more noise activity orthogonal to the signal subspace than in the signal subspace, particularly when a visual stimulus is present. This suggests that trial-by-trial variability does not interfere with stimulus encoding.

To address the second scientific question, we include SNP-GPFA analyses on simultaneously recorded visual regions in rodent cortex to ask if trial-varying activity is shared or independent across cortical regions. We compare performance of SNP-GPFA models that varying in their number of shared and independent noise latents across cortical regions. We find that the model that has shared noise latents performs best on cross-validation measures, suggesting trial-by-trial variability has shared structure across cortical regions.

## 2 Poisson Gaussian Process Factor Analysis (P-GPFA)

We begin by introducing the Poisson-GPFA model, which has been used previously to identify continuous latent states from population spike train recordings [10, 16, 6]. The observations of our model are spike-train data, represented by the neurons-by-time matrix **Y** ∈ ℕ^{N×T}.

We seek to learn a P-dimensional latent variable that linearly maps to the data via a loadings matrix , followed by some nonlinear function *f* and Poisson observations.

Our choice of non-linear function *f* is the softplus *f*(*x*) = log(1 + exp(*x*)).

Each latent *x*_{j}(*t*) (*j* ∈ {1 … *P*}, *t* ∈ {1, 2 … *T*}) evolves according to a Gaussian process, , with covariance matrix **K**(*θ*) defined by a squared exponential kernel [**K**(*θ*)]_{tt′} = *ρ* exp(−|**z**(*t*) − **z**(*t*′)|^{2}/2*ℓ*^{2}), where hyperparameters *θ* = {*ℓ, ρ*} include a length scale *ℓ* controlling smoothness and a marginal variance *ρ* controlling magnitude.

Given that the marginal likelihood of this model, *p*(**Y|W**) = *∫ p*(**Y|W**, **X**)*p*(**X**|*θ*)*d***X** is not available in closed-form, it is common to use a variational inference approach to learn the parameters of such models [16, 10]. Recall that variational inference seeks to maximize an evidence lower bound (ELBO) using a variational distribution [11]. Here, because the expectation term in the ELBO, , cannot be calculated analytically, we employ a ‘black box approach’ which uses Monte-Carlo samples to estimate the expectation term [17]. This inference method is called black-box variational inference (BBVI).

### 2.1 Fourier-domain black-box variational inference

BBVI can be computationally cumbersome. Therefore, to learn the P-GPFA and SNP-GPFA models, we introduce a novel inference method which performs BBVI over a Fourier-represented latent space, which increases both inference tractability and speed. Factorizing and learning the time series in the Fourier domain, rather than the time domain, allows us to take advantage of computational savings conferred by a diagonal covariance matrix while overcoming problems of uncorrelated timepoints which is typical when the variational distribution is factorized over time [18].

Our motivation for this approach is that the GP prior over *x*_{j}(*t*) describes a stationary process, as its covariance only depends on pairwise distances. This allows us to diagonalize the covariance **K** by the Fourier transform (Fig 1**A**). Here, the covariance matrix **K** is diagonalized by where **B** is the orthonormal discrete Fourier transform matrix with , . The diagonalized kernel is represented as where is the frequency-domain variance and represents an adjusted frequency of the GP kernel.

Inference can be conducted completely in the Fourier domain, precluding the need to invert the prior covariance **K**. The joint likelihood is expressed as
where represents the Fourier-transformed latents and is a length-*T* vector of ones, and *i* ∈ {1, 2 … *N*} denotes neuron index. The diagonalized representation demonstrably speeds up computational time (Fig 1 **B**). Moreover, the inversion of the time-domain **K** can present tractability challenges due to computer precision [19], however, the inversion of is trivial so long as the vector along the diagonal, , does not contain values that are too small. When small values are present, we regularize by adding a small constant value (10^{−7}).

This Fourier-represented GP has additional computational advantages, including methods to prune unnecessary Fourier coefficients that do not substantially contribute to explaining variability in **Y**. Pruning frequencies constrains the number of coefficients in the Fourier representation to a much smaller number than would be necessary in the time-domain. For a more detailed treatment, see [20]. For our purposes, pruning of the Fourier representation has the consequence of pruning the variational distribution, increasing inference efficiency. Fourier BBVI also preserves time-correlations while permitting a diagonal variational covariance. Ultimately, this Fourier-domain BBVI method can be viewed as an alternative to many other methods that work to make Gaussian processes computationally efficient, such as inducing points and sparse GP approximations [21, 22, 23].

We use our Fourier-domain BBVI to learn the Fourier latents via direct optimization of the variational distribution , factor loading parameters **W**, and hyperparameters *ℓ*. (Note there is an invariance between hyperparameter *ρ*, and the loadings matrix **W**, so we need not directly learn *ρ* in this model.) The speed up from optimization with Fourier domain BBVI can be realized most starkly in the domain where the time-series is very long. Figure 1**B** compares Fourier-domain inference to time-domain inference for a Poisson observation GPFA model with a single latent (i.e. *P* = 1, *T* = 1500) and *N* = 10 neurons. Inference is sped up by conversion to the Fourier domain as the bottleneck in time-domain inference is the inversion of a 1500 × 1500 covariance matrix **K**. By additionally specifying a minimum frequency, sufficiently small frequencies are pruned and the variational distribution and prior covariance can be cut from 1500 values to 62. This provides an additional substantial speed advantage of approximately an order of magnitude. It is important to note that the speed-up of our method depends on the specifics of the number of neurons, latents, latent length, and pruning. For subsequent analyses in this paper, the speed-up of BBVI due to the Fourier-domain implementation is anywhere from 20-70%.

We verify our Fourier-BBVI inference method using simulated neural spikes drawn from a Poisson distribution with four GP latent dimensions. Our inference approach is able to reconstruct rates of individual neurons as well as latent structure (Fig 2). The inference procedure also identifies correct latent dimensionality, as the ELBO is maximal at the true number of latent dimensions (Fig 2**C**).

The non-conjugacy of P-GPFA (and SNP-GPFA), and thus the reason we need to use the sophisticated inference of Fourier-BBVI, is due to the fact that the observations are Poisson, as opposed to Gaussian. This is an important choice as Poisson observations better describe neural data. We show, using data from rodent visual cortex (Fig 2**D**), the cross-validated mean squared error of the inferred spike rate to smoothed spike rate from held-out trials. The model with Poisson observations performs significantly better that the GPFA model with Gaussian observations. Others have noted similar advantages to Poisson observation factor models for neural data in other settings [10]. For this reason we wish to use a Poisson observation characterization for our SNP-GPFA model.

## 3 SNP-GPFA

To isolate noise and signal subspaces in the P-GPFA framework, we introduce a model that includes separate noise and signal latent structure (SNP-GPFA). We assess the model on two neural data sets. One contains multi-neuron spiking activity from 65 neurons recorded in primate V1 during passive viewing of a drifting sinusoidal grating stimulus, with 72 different orientations for *D* = 35 repeated trials. The second consists of spiking activity from 67 neurons from two regions of rodent visual cortex, recorded during passive viewing of *D* = 20 repeats of a 32-second sinusoidal grating stimulus. Gratings had 8 different orientations which persisted for 4 seconds each. For more information, see [24, 25] and the supplemental materials.

The SNP-GPFA model describes neural activity on trial *j* as
where *P* signal latents are drawn from a “signal” Gaussian process, with covariance **K**_{s} and concatenated to form , which are shared across trials. On each trial, *Q* independent noise latents are drawn from a “noise” Gaussian process, with covariance **K**_{n}, forming . Loading weights **W**_{s}^{⊤} and **W**_{n}^{⊤} parametrize a mapping from the dimensionality of the signal space *P* or the noise space *Q* to the full *N*-dimensional neural response space. Thus, **W**_{s} is of size *P × N* and **W**_{n} is *Q × N*. Covariance matrices **K**_{s} and **K**_{n} are constructed by evaluating a radial basis covariance at all pairs of time points in a trial. The SNP-GPFA model is outlined schematically in Fig 3**A**. For clarity, we visualize the model with only one signal and noise latent dimension, but the dimensions of the two spaces can be arbitary.

To perform inference for the SNP-GPFA model, we develop a variational approach similar to that for P-GPFA. We use a variational distribution *q*_{ϕ} for the latents, parametrized as a fully-independent multivariate normal distribution of dimension where , which corresponds to the number of Fourier coefficients needed to represent the signal. We determine by assuming a minimum length scale of 10 (*ℓ* ≥ 10), which substantially shrinks the number of Fourier coefficients required to represent the latent signal and noise processes (from 321 to 44 dimenions for rodent data, and from 511 to 108 dimensions for primate data). This choice is appropriate, as we typically we do not learn length scales smaller than this value.

We fit the SNP-GPFA model to data, and found that the the signal component successfully captures the PSTH, or mean response across trials (Fig. 3**B**). Importantly, the model also identifies a noise component that accurately predicts trial-by-trial spiking variability. Figure 3**C** shows 3 example trials for the top neuron in 3**B**. The per-trial rate deviations, given by a **w**_{n,i}^{⊤}**X**_{n}, where **w**_{n,i} is an isolated row of **W**_{n}, accurately capture per-trial spiking deviations. This can be easily seen in trial three, where a sharp burst at the end of the trial is captured by the noise component of the model.

### 3.1 Learning dimensionality

To identify the dimensionality of signal and noise latents, we used a cross-validation known as co-smoothing training [26]. We first trained on a subset of randomly selected trials (10 for rodent data, 20 for primate data) and used Fourier-BBVI to learn **W**_{s}^{⊤}, **W**_{n}^{⊤} and **X**^{s}. To test the accuracy of the learned parameters (**W**_{s}^{⊤}, **W**_{n}^{⊤}, *ℓ*) and signal latents, we withhold a small random selection of neurons and then learn the noise latents **X**^{n} over the held-out trials. We evaluate the cross-validated log-likelihood of the held out neurons using these new noise latents and the inferred structure from the initial trials. We perfomed five-fold cross validation and averaged over folds. For additional information about the data preprocessing and cross-validation, see supplemental materials.

We find the signal dimensionality by first leaving out the noise component, and increasing the number of signal dimensions until there is a decrease in CV performance. We then incrementally increase the number of noise dimensions until CV performance decreases. Interestingly, noise components increased CV performance suggesting population-level structure in trial-to-trial variability. For the primate data, this procedure identified 5 signal dimensions and 6 noise dimensions. (See section 3.3 for details of dimensionality for the multi-region rodent data.)

### 3.2 Visualizing signal and noise subspaces

We visualize the resulting signal and noise subspaces for neural population data recorded from primates. For this experiment, a drifting gratings stimulus was present for the first half of the of a 2.5 second trial (see [24] and the supplemental materials for more details). For a particular stimulus orientation (0°), we show the first 3 PCs of the five dimensional signal subspace. Here, we only show three PC dimensions for clarity. Note that during the stimulus presentation, there is a strong sinusoidal component to the latent neural activity. However, after the stimulus presentation period, this structure is no longer present. This latent signal structure in our SNP-GPFA model is nearly identical to the results of P-GPFA with no noise component, and to the plot in [10], which also uses traditional P-GPFA run on the same data. However, unlike P-GPFA, SNP-GPFA additionally extracts a noise subspace (Fig 4 B). This subspace has no obvious structure and does not include the same sinusoidal component in the first half of the trial. As expected, plotting the first noise latent PC across three example trials, there is no obvious pattern to the noise deviations, reflecting idiosyncratic variations in population firing rates across trials (Fig 4 C).

An important question that arises with this model is whether or not the noise subspace overlaps with the signal subspace. Overlap of these subspaces implies that trial-to-trial variability in the noise components can corrupt the population response along the signal dimensions, thereby interfering with representation of the signal. Previous work [15] suggests that noise only interferes with the signal reprsentation if it lies in a direction defined by the derivatives of neural tuning curves, and recent work suggests that noise and signal subspaces may indeed be nearly orthogonal [14, 27, 28]. Because our model contains separate latent components for signal and noise, we can explicitly compare the relative angle between these subspaces. More specifically, we look to assess how strongly the pure-noise component of neural activity projects into the signal subspace. Under the SNP-GPFA model, the noise-subspace component of neural activity is **Y**_{n} = **W**_{n}^{⊤}**X**_{n}. To assess overlap with the signal subspace, we compute the singular value decomposition of the signal-component loading weights, **W**_{s} = **USV**^{⊤}, which provides a basis for the signal subspace via the columns of **V**. The portion of variance of the noise within the signal subspace for each time point is then given by , where **Y**_{n,t} is the *t*^{th} column of **Y**_{n}, or the L2 norm along the six noise dimensions. The portion of variance orthogonal to the signal subspace is thus simply .

Figure 4**D** shows the resulting L2-norm time-series both within and orthogonal to the signal subspace. This is for a single example trial for the same single stimulus as in **A-C** (orientation of 0°). To visualize the fractional variance into and out of the signal subspace, we normalize each trace by the total variance at each time point, (Fig 4 **E**). For this trial, the noise activity tends to mostly lie in the subspace orthogonal to the stimulus activity. However, this is not true when the overall noise variance is high. At these moments, the noise exists mostly in the signal subspace. Figure 4**F** shows the fractional noise variance orthogonal to the signal subspace averaged over all trials for the orientation of 0° (black line) and all trials and orientations (grey line). It is primarily during the stimulus presentation time that the noise activity is preferentially orthogonal to the stimulus subspace. When there is no stimulus, after the halfway point of the trial, there is a slight preference for the noise activity to lie in the signal subspace.

### 3.3 Shared and independent noise in multi-region data

The rodent dataset we examined contained data from two simultaneously-recorded visual cortical regions, an upstream area “V1” and a downstream area “AL”. The SNP-GFPA model can therefore be extended to allow for a characterization of shared variability across these regions. For simplicity, let’s consider two versions of the model: (1) a “shared-noise” model, which is the SNP-GPFA model applied to both regions simultaneously (see eq 4); or (2) an independent noise model, which includes a block-diagonalization of **W**_{n} into a V1 component **W**_{n}^{V1} and AL component **W**_{n}^{AL}. The independent noise model describing neural activity for trial *j* is thus:
where are noise latents that map exclusively to V1 activity, and are noise latents that map exclusively to AL activity. By contrast, departures from a block-diagonal structure in the loadings matrix reflects the degree to which latent variability is shared across brain regions.

We first validate the approach on simulated data. We generate two datasets, one with a full **W**_{n} matrix, and this other with block-diagonal structure. We perform inference on each of these datasets, one using the original model outlined in eq 4, and the other using the block diagonal structure in eq 5. We show that the model with its corresponding generated data exhibits higher CV performance (Fig 5 **A,B**). For additional details see supplement.

On the multi-region rodent data, we compare the SNP-GPFA model with a block-diagonal **W**_{n} to models where **W**_{n} has an increasing number of shared noise latents. We determine a six dimensional signal component. For the noise dimensionality, we start with a complete block representation (eq 5) of 5 V1 dimensions and 4 AL dimensions, which contain no shared components. We then compare CV performance of this model to ones where increasing numbers of noise latents are shared between the regions (Fig 5**C**). For information regarding how we select the proper number of noise and signal dimensions in this framework, see supplemental materials. We find that models with at least two shared noise dimensions perform better than models with one or fewer shared noise dimensions. Means and standard error shown over five-fold CV (Fig 5). This suggests that there is trial-varying structure in neural population activity that is shared across cortical regions.

## 4 Conclusion

We have introduced a Gaussian process factor analytic model for spike train data that extracts separate signal and noise latent structure from trial-structured data. To learn this model we employ a novel inference method based on black-box variational inference in the Fourier domain, which allows for faster and more stable inference by diagonalizing the posterior covariance and pruning unnecessary frequencies. The resulting SNP-GPFA model is able to extract signal latents that characterize population PSTHs in real neural data, and noise latents that capture trial-to-trial variability that is shared across neurons. We then use the results of the model fit to answer scientific questions about trial-based neural data. We find that noise activity tends to project primarily in a subspace orthogonal to signal activity, especially when a stimulus is present, suggesting an optimal type of neural encoding [15]. We additionally use our model on multi-region rodent data and compare performance where noise is shared between cortical regions as opposed to being independent to each region. We find that noise models with shared structure better describes neural data suggesting cortical variability is shared across cortical regions. Overall, the model is a promising method for understanding the relationship between stimulus-locked and trial-varying neural activity at the population level. We believe there is a great number of additional scientific questions that the SNP-GPFA model can help answer, including determining how signal and noise representations relate to behavior, and further exploring how block-diagonal loadings matrices may partition signal and noise latent representations in multi-region data. We provide downloadable code for the community to use the SNP-GPFA model on their own trial-based neural data.

### Broader Impact

Here, we propose a new model for neuroscientists to uncover latent structure in trial-based neural population data. Trial-based neural recordings with identical stimuli are ubiquitous in neuroscience research. However, trial-by-trial variability in neural activity is not well understood. More broadly, it is unclear in general what the function of neural noise is in the brain. Our model works on neural population data to separate out neural noise latent representations from stimulus-locked representations. It additionally uses a novel inference technique that is rapid and stable. Here, we provide a general, easy-to-use tool for neuroscientists, and we hope others are encouraged to employ it to understand trial-based neural information in their own experimental set-up. We provide code for download. We do not foresee any negative consequences to society resulting from this work.

## Acknowledgements

SLK was supported by NIH grant F32MH115445-03, MCA and JWP were supported by grants from the Simons Collaboration on the Global Brain (SCGB AWD543027), and a U19 NIH-NINDS BRAIN Initiative Award (5U19NS104648). JWP was also supported by the NIH BRAIN initiative (NS104899 and R01EB026946). YY and SLS were supported by the NIH (R01EY024294 and R01NS091335), the NSF (1450824 and 1707287) the Simons Foundation (SCGB 325407) and the McKnight Foundation.