Abstract
The ability to identify interpretable, low-dimensional features that capture the dynamics of large-scale neural recordings is a major challenge in neuroscience. Repeated temporal patterns (sequences) are not succinctly captured by traditional dimensionality reduction techniques, so neural data is often aligned to behavioral task references. We describe a task-independent, unsupervised method, which we call seqNMF, that provides a framework for extracting sequences from high-dimensional datasets and assessing their significance in held-out data. We test seqNMF on simulated datasets under a variety of noise conditions, and also on several neural datasets. In a hippocampal dataset, seqNMF identifies neural sequences that match those calculated manually by reference to behavioral events. In a songbird dataset, seqNMF discovers abnormal motor sequences in birds that lack stereotyped songs. Thus, by identifying temporal structure directly from neural data, seqNMF enables dissection of complex neural circuits in the absence of reliable temporal references from stimuli or behavioral outputs.
Introduction
The ability to detect and analyze temporal sequences embedded in a complex sensory stream is an essential cognitive function, and as such is a necessary capability of neuronal circuits in the brain [13, 28, 5, 26], as well as artificial intelligence systems [14, 57]. The detection and characterization of temporal structure in signals is also useful for the analysis of many forms of physical and biological data. In neuroscience, recent advances in technology for electrophysiological and optical measurements of neural activity have enabled the simultaneous recording of hundreds or thousands of neurons [9, 33, 52, 29], in which neuronal dynamics are often structured in sparse sequences [23, 24, 44, 19, 45]. Such sequences can be identified by averaging across multiple trials, but only in cases where an animal receives a temporally precise sensory stimulus, or generates a sufficiently stereotyped motor output.
However, it could be useful to extract sequences on a moment-to-moment basis (without averaging), for example to study internal neuronal dynamics in the brain during learning, sleep, or diseased states. In these applications, it is not possible to use external timing references, and sequences must be extracted directly from the neuronal data. A traditional unsupervised approach for directly extracting structure in neuronal data is dimensionality reduction. Intuitively, sequences may be thought of as low dimensional, and yet dimensionality reduction techniques such as PCA and NMF do not work for sequences, because those methods only model synchronous patterns of activity.
Alternative approaches that search for repeating neural patterns require surprisingly challenging statistical analysis [7, 41, 49]. While progress has been made in analyzing non-synchronous sequential patterns using statistical models that capture cross-correlations between pairs of neurons [51, 21, 53, 59, 22], such methods may not have statistical power to scale to patterns that include many (more than a few dozen) neurons, may require long periods (≥ 10^5 timebins) of stationary data, and may have challenges in dealing with (non-sequential) background activity. For a review highlighting features and limitations of these methods see [49]. Here we took an alternative, matrix factorization-based, approach that aims to extract sequences. We reasoned that this approach would complement existing methods by providing a more holistic and potentially simpler description of neural firing dynamics.
One promising method for the unsupervised detection of temporal patterns is convolutional non-negative matrix factorization (convNMF) [56, 55] (Figure 1), which has been applied to the analysis of audio signals such as speech [43, 55, 61], as well as neural signals [47]. ConvNMF identifies exemplar patterns (factors) in conjunction with the times and amplitudes of pattern occurrences. This strategy eliminates the need to average activity aligned to any external behavioral references. While convNMF produces excellent reconstructions of the data, it does not automatically produce the minimal number of factors required. Indeed, if the number of factors in the convNMF model is greater than the true number of sequences, the algorithm returns overly complex and redundant factorizations. These redundant factorizations are different each time the algorithm is run, producing inconsistent results [47]. Notably, there is nothing in the convNMF algorithm that favors the minimal factorization, as would be favored by the principle of ‘Occam’s Razor’.
Here we describe a modification of the convNMF algorithm that suppresses redundant factors, biasing the results toward factorizations with a minimal number of factors. This is achieved by adding a penalty term to the convNMF cost function. Unlike other common approaches such as sparsity regularization [65, 43, 50, 47] that constrain the make-up of each factor, our regularization penalizes the correlations between factors that result from redundant factorizations. We build on earlier applications of soft-orthogonality constraints to NMF [10] to capture the types of temporally offset correlations that may occur in the convolutional case.
Our algorithm, which we call seqNMF, produces minimal and consistent factorizations in synthetic data, including under a variety of noise conditions, with high similarity to ground-truth sequences. We further tested seqNMF on hippocampal spiking data in which neural sequences have previously been described. Finally, we use seqNMF to extract sequences in a functional calcium imaging dataset recorded in vocal/motor cortex of untutored songbirds that sing pathologically variable songs. We found that repeatable neural sequences are activated in an atypical and overlapping fashion, suggesting potential neural mechanisms for this pathological song variability.
Results
Matrix factorization framework for unsupervised discovery of features in neural data
Matrix factorization underlies many well known unsupervised learning algorithms [60] with applications to neuroscience [15], including principal component analysis (PCA) [46], non-negative matrix factorization (NMF) [34], dictionary learning, and k-means clustering. We start with a data matrix, X, containing the activity of N neurons at T times. If the neurons exhibit a single repeated pattern of synchronous activity, the entire data matrix can be reconstructed using a column vector w representing the neural pattern, and a row vector h representing the times and amplitudes at which that pattern occurs (temporal loadings). In this case, the data matrix X is mathematically reconstructed as the outer product of w and h. If multiple patterns are present in the data, then each pattern can be reconstructed by a separate outer product, where the reconstructions are summed to approximate the entire data matrix (Figure 1A) as follows:

X_nt ≈ Σ_k W_nk H_kt = (WH)_nt

where X_nt is the (n, t)th element of matrix X. Here, in order to store K different patterns, W is an N×K matrix containing the K exemplar patterns in its columns, and H is a K×T matrix containing the K timecourses in its rows.
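This reconstruction can be sketched in a few lines of NumPy; the matrix sizes and random factors below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, T, K = 50, 200, 3      # neurons, timebins, patterns (illustrative sizes)
W = rng.random((N, K))    # each column is one exemplar pattern
H = rng.random((K, T))    # each row is one set of temporal loadings

# Sum of K outer products: X_hat[n, t] = sum over k of W[n, k] * H[k, t]
X_hat = sum(np.outer(W[:, k], H[k, :]) for k in range(K))

# The sum of outer products is the same reconstruction as a single matrix product
assert np.allclose(X_hat, W @ H)
```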
Given a data matrix with unknown patterns, the goal of these unsupervised learning algorithms is to discover a small set of patterns (W) and a corresponding set of temporal loading vectors (H) that approximate the data. In the case that the number of patterns (K) is sufficiently small (less than N and T), this corresponds to a dimensionality reduction, whereby the data is expressed in more compact form. NMF additionally requires that W and H contain only non-negative numbers. The discovery of unknown factors is often accomplished by minimizing the following cost function, which measures (using the Frobenius norm) the element-by-element sum of all squared errors between a reconstruction X̃ = WH and the original data matrix X:

(W*, H*) = argmin_{W,H} ||X̃ − X||_F²

The factors W* and H* that minimize this cost function produce an optimal reconstruction X̃ = W*H*. While this general strategy works well for extracting synchronous activity, it is unsuitable for discovering temporally extended patterns—first, because each element in a sequence must be represented by a different factor, and second, because NMF assumes that the columns of the data matrix are independent ‘samples’ of the data, so permutations in time have no effect on the factorization of a given dataset. It is therefore necessary to adopt a different strategy for temporally extended features.
Convolutional non-negative matrix factorization (convNMF)
Convolutional NMF (convNMF) [56, 55] extends NMF to provide a framework for extracting temporal patterns and sequences from data. While classical NMF represents each pattern as a single vector (Figure 1A), convNMF explicitly represents an exemplar pattern of neural activity over a brief period of time; the pattern is stored as an N × L matrix, where each column (indexed by ℓ = 1 to L) indicates the activity of neurons at a different timelag within the pattern (Figure 1B; we call this matrix pattern w1 by analogy with NMF). The times at which this pattern/sequence occurs are stored using the timeseries vector h1, as for NMF. The reconstruction is produced by convolving the N × L pattern with the timeseries h1 (Figure 1B).
If the dataset contains multiple patterns, each pattern is captured by a different N × L matrix and a different associated timeseries vector h. A collection of K different patterns can be compiled together into an N × K × L array (also known as a tensor) W and a corresponding K × T timeseries matrix H. Analogous to NMF, convNMF generates a reconstruction of the data as a sum of K convolutions between each neural activity pattern (W) and its corresponding temporal loadings (H):

X̃ = W ✻ H, i.e., X̃_nt = Σ_{k=1..K} Σ_{ℓ=1..L} W_nkℓ H_k(t−ℓ+1)

where the tensor/matrix convolution operator ✻ (notation summary, Table 1) reduces to matrix multiplication in the L = 1 case, which is equivalent to standard NMF. The quality of this reconstruction can be measured using the same cost function shown in Equation 3, and W and H may be found iteratively using multiplicative gradient descent updates similar to those used for standard NMF [34, 56, 55].
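As a sketch (not the authors' implementation), the convolutional reconstruction can be written as a sum of L time-shifted matrix products:

```python
import numpy as np

def convnmf_reconstruct(W, H):
    """Reconstruct X_hat[n, t] = sum over k, l of W[n, k, l] * H[k, t - l],
    where W is an N x K x L pattern tensor and H is a K x T loading matrix."""
    N, K, L = W.shape
    T = H.shape[1]
    X_hat = np.zeros((N, T))
    for l in range(L):
        H_shift = np.zeros_like(H)
        H_shift[:, l:] = H[:, :T - l]   # H delayed by l timebins, zero-padded
        X_hat += W[:, :, l] @ H_shift
    return X_hat
```

With L = 1 the loop reduces to a single matrix product, recovering standard NMF.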
While convNMF can perform extremely well at reconstructing sequential structure, it can be challenging to use when the number of sequences in the data is not known [47]. In this case, a reasonable strategy would be to choose K at least as large as the number of sequences that one might expect in the data. However, if K is greater than the actual number of sequences, convNMF will identify more significant factors than are minimally required. This is because each sequence in the data may be approximated equally well by a single sequential pattern or by a linear combination of multiple partial patterns. A related problem is that running convNMF from different random initial conditions produces inconsistent results, finding different combinations of partial patterns on each run [47]. These inconsistency errors fall into three main categories (Figure 1C):
Type 1: Two or more factors are used to reconstruct the same instances of a sequence.
Type 2: Two or more factors are used to reconstruct temporally different parts of the same sequence, for instance the first half and the second half.
Type 3: Identical factors are used to reconstruct different instances of a sequence.
Together, these inconsistency errors manifest as strong correlations between different redundant factors, as seen in the similarity of their temporal loadings (H) and/or their exemplar activity patterns (W).
SeqNMF: A constrained convolutional non-negative matrix factorization
Regularization is a common technique in optimization that allows the incorporation of constraints or additional information with the goal of improving generalization performance or simplifying solutions to resolve degeneracies [25]. To reduce the occurrence of redundant factors (and inconsistent factorizations) in convNMF, we sought a principled way of penalizing the correlations between factors by introducing a penalty term, R, into the convNMF cost function, giving an objective of the following form:

||X̃ − X||_F² + λR
In this section, we will motivate a novel cost function that effectively minimizes the number of factors by penalizing spatial and temporal correlations between different factors. We will build up the full cost function by addressing, one at a time, the types of correlations generated by each type of error.
Regularization has previously been used in NMF to address the problem of duplicated factors, which, similar to Type 1 errors above, present as correlations between the H’s [10]. Such correlations are measured by computing the correlation matrix HH^⊤, which contains the correlations between the temporal loadings of every pair of factors. The regularization may be implemented using the penalty term R = ||HH^⊤||_{1,i≠j}, where the norm ||·||_{1,i≠j} sums the absolute value of every matrix entry except those along the diagonal (notation summary, Table 1), so that correlations between different factors are penalized, while the requisite correlation of each factor with itself is not. Thus, during the minimization process, similar factors compete, and a larger amplitude factor drives down the H of a correlated smaller factor. The parameter λ controls the magnitude of the penalty term R.
In convNMF, a penalty term based on HH^⊤ yields an effective method to prevent errors of Type 1, because it penalizes the associated zero-lag correlations. However, it does not prevent errors of the other types, which exhibit different types of correlations. For example, Type 2 errors result in correlated temporal loadings that have a small temporal offset and thus are not detected by HH^⊤. One simple way to address this problem is to smooth the H’s in the penalty term with a square window of length 2L − 1 using the smoothing matrix S (S_ij = 1 when |i − j| < L and otherwise S_ij = 0). The resulting penalty, R = ||HSH^⊤||_{1,i≠j}, allows factors with small temporal offsets to compete, effectively preventing errors of Types 1 and 2.
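A minimal NumPy sketch of this smoothed penalty (the function name is ours, not from the paper):

```python
import numpy as np

def smoothed_H_penalty(H, L):
    """||H S H^T||_{1, i != j}: sum of absolute off-diagonal entries of the
    temporally smoothed correlation matrix, with S[i, j] = 1 iff |i - j| < L."""
    T = H.shape[1]
    idx = np.arange(T)
    S = (np.abs(idx[:, None] - idx[None, :]) < L).astype(float)
    C = H @ S @ H.T
    return np.abs(C).sum() - np.abs(np.diag(C)).sum()
```

Two factors whose loadings are offset by fewer than L timebins now incur a cost, whereas the unsmoothed HH^⊤ penalty would miss them.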
Unfortunately this penalty does not prevent errors of Type 3, in which redundant factors with highly similar patterns in W are used to explain different instances of the same sequence. Such factors have temporal loadings that are segregated in time, and thus have low correlations, to which the cost term ||HSH^⊤||_{1,i≠j} is insensitive. One way to resolve errors of Type 3 might be to include an additional cost term that penalizes the similarity of the factor patterns in W. A challenge with this approach is that, in the convNMF framework, there is no constraint on temporal translations of the sequence within W. For example, if two redundant factors contain identical sequences that are simply offset by one time bin (in the L dimension), then these patterns would have zero correlation. Such offsets might be accounted for by smoothing the W matrices in time before computing the correlation (Table 3), analogous to the smoothing in ||HSH^⊤||_{1,i≠j}. The general approach of adding an additional cost term for W correlations has the disadvantage that it requires setting an extra parameter, namely the λ associated with this cost.
Thus, we chose an alternative approach to resolve errors of Type 3 that simultaneously detects correlations in W and H using a single correlation cost term. We note that, for Type 3 errors, redundant W patterns have a high degree of overlap with the data at the same times, even though their temporal loadings are segregated at different times. To introduce competition between these factors, we first compute, for each pattern in W, its overlap with the data at each time t. This quantity is captured in symbolic form by W^⊤ ✻ X (see Table 1). We then compute the pairwise correlation between the temporal loading of each factor and the overlap of every other factor with the data. The correlation cost sums up these correlations across all pairs of factors, implemented as follows:

R = ||(W^⊤ ✻ X) S H^⊤||_{1,i≠j}
When incorporated into the update rules, this causes any factor that has a high overlap with the data to suppress the temporal loadings (H) of any other factors active at that time. Thus, factors compete to explain each feature of the data, favoring solutions that use a minimal set of factors to give a good reconstruction. We refer to this minimal set as an efficient factorization. The resulting global cost function is:

||X̃ − X||_F² + λ||(W^⊤ ✻ X) S H^⊤||_{1,i≠j}
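A sketch of this correlation cost (our naming; the transpose convolution computes each pattern's overlap with the data at every time):

```python
import numpy as np

def pattern_overlap(W, X):
    """(W^T conv X)[k, t] = sum over n, l of W[n, k, l] * X[n, t + l]."""
    N, K, L = W.shape
    T = X.shape[1]
    WX = np.zeros((K, T))
    for l in range(L):
        X_shift = np.zeros_like(X)
        X_shift[:, :T - l] = X[:, l:]   # data advanced by l timebins
        WX += W[:, :, l].T @ X_shift
    return WX

def correlation_cost(W, H, X):
    """||(W^T conv X) S H^T||_{1, i != j}, smoothing window of width 2L - 1."""
    L, T = W.shape[2], H.shape[1]
    idx = np.arange(T)
    S = (np.abs(idx[:, None] - idx[None, :]) < L).astype(float)
    C = pattern_overlap(W, X) @ S @ H.T
    return np.abs(C).sum() - np.abs(np.diag(C)).sum()
```

With a single factor (K = 1) there are no off-diagonal entries, so the cost is exactly zero, matching the intuition that only correlations between different factors are penalized.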
The update rules for W and H are based on the derivatives of this global cost function, leading to a simple modification of the standard multiplicative update rules used for NMF and convNMF [34, 56, 55] (Table 3). Note that the addition of this correlation cost term does not formally constitute regularization, because it also includes a contribution from the data matrix X, rather than just the model variables W and H.
Below, we test the performance of this penalty based on correlations between factors. We will later consider different approaches to adding penalties to the convNMF cost function, including an L1 norm penalty. We will also examine a parameter sweep of the number of factors (K), as well as additional penalties to bias the tradeoff between temporal or pattern correlations.
Testing the performance of seqNMF on simulated sequences
To compare the performance of seqNMF to unregularized convNMF, we simulated neural sequences of a sort commonly encountered in neuronal data (Figure 2A). The simulated data were used to test several aspects of the seqNMF algorithm: convergence, consistency of factorizations, the ability of the algorithm to discover the correct number of sequences in the data, and robustness to noise. As an initial pass, simulated datasets were constructed by placing three ground-truth sequences at random non-overlapping times. Each sequence ensemble consisted of 10 neurons evenly spaced throughout a duration of 30 timesteps. The resulting data matrix had a total duration of 15000 timesteps and contained on average 60±6 instances of each of the three sequences. The seqNMF algorithm was run for 1000 iterations and reliably converged to a stable asymptotic value of root-mean-squared error (RMSE) (Figure 2B). RMSE reached a value within 10% of the asymptote within 100 iterations.
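A toy generator in the spirit of these simulations (parameter defaults mirror the text; the grid-based placement that guarantees non-overlap is our simplification):

```python
import numpy as np

def make_sequences(n_seq=3, n_per=10, L=30, T=15000, n_events=60, seed=0):
    """Binary N x T matrix containing n_seq ground-truth sequences, each of
    n_per neurons evenly spaced over L timesteps, at non-overlapping times."""
    rng = np.random.default_rng(seed)
    X = np.zeros((n_seq * n_per, T))
    lags = np.linspace(0, L - 1, n_per).astype(int)
    grid = np.arange(0, T - L, L)                       # non-overlapping slots
    starts = rng.choice(grid, size=n_seq * n_events, replace=False)
    for s in range(n_seq):
        for t0 in starts[s * n_events:(s + 1) * n_events]:
            X[s * n_per + np.arange(n_per), t0 + lags] = 1
    return X
```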
Consistency of seqNMF factorization
We set out to determine if seqNMF exhibits the desirable property of consistency—namely whether it returns similar sequences each time it is run on the same dataset using different random initializations of W and H. Consistency was assessed as the extent to which there is a good one-to-one match between factors across different runs (Methods 10). Due to the inefficiencies outlined in Figure 1C, with K larger than the true number of sequences, convNMF yielded low consistency scores typically ranging from 0.2 to 0.4 on a scale from zero to one (Figure 2C, orange). In contrast, seqNMF factorizations were nearly identical across different fits of noiseless data, producing consistency scores that were always higher than any we measured for convNMF, and typically (>80% of the time) higher than 0.99 (Figure 2C, gray). Both convNMF and seqNMF had near-perfect reconstruction error for all combinations of K and L that exceed the number and duration of sequences in the data (not shown). However, convNMF exhibited low consistency scores, a problem that was further exacerbated for larger values of K. In contrast, seqNMF exhibited high consistency scores across a wide range of values of both K and L.
We also tested the consistency of seqNMF factorizations for the case in which a population of neurons is active in multiple different sequences. Such neurons that are shared across different sequences have been observed in several neuronal datasets [44, 45, 24]. For one test, we constructed two sequences in which shared neurons were active at a common pattern of latencies in both sequences; in another test, shared neurons were active in a different pattern of latencies in each sequence. In both tests, seqNMF achieved near-perfect reconstruction error, and consistency was similar to the case with no shared neurons (Figure 2E, F).
Validating the statistical significance of extracted sequences
To assess statistical significance, one can apply seqNMF to a subset of the data and measure whether the extracted sequences appear in held-out data substantially more than sequences drawn from a null model. We measured the appearance of sequences in held-out data by computing their overlap with the held-out data, W^⊤ ✻ X. The overlap is high at timepoints at which the sequence occurs (relative to other timepoints). For a sequence that matches ground truth in synthetic data, this distribution of overlap values exhibits a heavy tail, indicating the presence of large outliers that correspond to times where the extracted sequence appears in held-out data. In contrast, a candidate sequence that does not reliably occur in the held-out data produces a distribution of overlaps that appears more symmetric (Figure S1).
While there are many ways of detecting outliers and quantifying “heavy-tailedness” of a distribution, we use the skewness (the standardized third moment) as a simple measure. In particular, we generate null distributions by circularly shifting the pattern matrices W along the time-lag dimension (see Methods 10) and compare the skewness of these distributions to the skewness of the distribution produced by the unshifted W.
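One plausible sketch of this test; details such as independent per-neuron shifts and the size of the null ensemble are our assumptions, not specifics from the paper:

```python
import numpy as np

def skewness(x):
    """Standardized third moment."""
    x = np.asarray(x, float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

def overlap_skewness(Wk, X):
    """Skewness of one pattern's overlap with data X; Wk is N x L."""
    N, L = Wk.shape
    T = X.shape[1]
    o = np.zeros(T)
    for l in range(L):
        o[:T - l] += Wk[:, l] @ X[:, l:]
    return skewness(o)

def null_skewness(Wk, X, n_null=100, seed=0):
    """Null distribution from circularly shifting each neuron's row of the
    pattern along the lag dimension, which destroys the sequence structure."""
    rng = np.random.default_rng(seed)
    N, L = Wk.shape
    null = []
    for _ in range(n_null):
        W_shift = np.stack([np.roll(Wk[n], rng.integers(L)) for n in range(N)])
        null.append(overlap_skewness(W_shift, X))
    return np.array(null)
```

A factor would then be deemed significant when its observed skewness exceeds, say, the 95th percentile of the null distribution.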
Runs of seqNMF on simulated and real data have revealed that the algorithm produces two types of factors that can be immediately ruled out as candidate sequences: 1) empty factors with zero amplitude in all neurons at all lags, and 2) factors that have amplitude in only one neuron. The latter case occurs often in datasets where one neuron is substantially more active than the others, and thus accounts for a large amount of variance in the data. SeqNMF also occasionally generates factors that appear to capture one moment in the test data, especially in short datasets, where this can account for a substantial fraction of the data variance. Such sequences are easily identified as non-significant when tested on held-out data using the skewness test.
Note that if λ is set too small, seqNMF will produce multiple redundant factors to explain one sequence in the data. In this case, each redundant candidate sequence will pass the significance test outlined here. We will address below a procedure for choosing λ and methods for determining the number of sequences.
SeqNMF extracts the correct number of sequences in noise-free synthetic data
A successful factorization should contain the same number of significant factors as exist sequences in the data, at least in datasets for which the number of sequences is unambiguous. To compare the ability of seqNMF and convNMF to recover the true number of patterns in a dataset, we generated simulated noise-free data containing between 1 and 10 different sequences. We then ran many independent fits of these data, using both seqNMF and convNMF, and measured the number of significant factors. We found that convNMF overestimates the number of sequences in the data, returning K significant factors on nearly every run. In contrast, seqNMF tends to return a number of significant factors that closely matches the actual number of sequences (Figure 2D).
Robustness to noisy data
SeqNMF was able to correctly extract sequences even in data corrupted by noise of types commonly found in neural data. We consider four common types of noise: participation noise, in which individual neurons participate probabilistically in instances of a sequence; additive noise, in which neuronal events occur randomly outside of normal sequence patterns; temporal jitter, in which the timing of individual neurons is shifted relative to their typical time in a sequence; and finally, temporal warping, in which each instance of the sequence occurs at a different randomly selected speed. To test the robustness of seqNMF to each of these noise conditions, we factorized data containing three neural sequences at a variety of noise levels. The value of λ was chosen using methods described in the next section. SeqNMF proved relatively robust to all four noise types, as measured by quantifying the similarity between seqNMF factors and ground-truth sequences (Methods section 10, Figure 3). For low noise conditions, seqNMF produced factors that were highly similar to ground truth; this similarity gracefully declined as noise increased. Visualization of the extracted factors revealed that they tend to qualitatively match ground-truth sequences even in the presence of high noise (Figure 3). Together, these findings suggest that seqNMF is suitable for extracting sequence patterns from neural data with realistic forms of noise.
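Simplified versions of three of these corruptions, for binary data (per-event participation noise and whole-row jitter are coarse stand-ins for the per-sequence-instance and per-event versions described above):

```python
import numpy as np

def additive_noise(X, rate, seed=0):
    """Insert spurious events at random positions outside the sequences."""
    rng = np.random.default_rng(seed)
    return np.clip(X + (rng.random(X.shape) < rate), 0, 1)

def participation_noise(X, p, seed=0):
    """Keep each event independently with probability p."""
    rng = np.random.default_rng(seed)
    return X * (rng.random(X.shape) < p)

def jitter(X, max_shift, seed=0):
    """Circularly shift each neuron's row by a random offset."""
    rng = np.random.default_rng(seed)
    return np.stack([np.roll(row, rng.integers(-max_shift, max_shift + 1))
                     for row in X])
```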
We also tested the performance of seqNMF as a function of dataset size. To do so, we generated data of different sizes containing different numbers of instances of the underlying ground-truth sequences, ranging from 1 to 20. For intermediate levels of additive noise, we found that 3 examples of each sequence were sufficient for seqNMF to correctly extract factors with similarity scores within 10% of asymptotic performance (Figure S3).
Method for choosing an appropriate value of λ
Here we present procedures for guiding the choice of λ in seqNMF that address two goals of regularization: to simplify the solution space of ill-posed problems and to reduce overfitting. The choice of λ controls a trade-off between reconstruction accuracy and the efficiency/consistency of the resulting factorizations (Figure 4). The goal is to reconstruct only the repeating temporal patterns in the data and to do so with an efficient, maximally uncorrelated set of factors. We will first describe a procedure that balances a measure of correlation between factors with reconstruction error. We then describe a procedure based on cross-validation in held-out data. Both of these procedures are validated under a variety of noise conditions using simulated data for which the ground truth factors are known.
In the first procedure, we measure the effect of λ on both reconstruction error and correlation cost in synthetic datasets containing three sequences (Figure 4). For any given factorization, the reconstruction error is ||X̃ − X||_F², and the efficiency may be estimated using the correlation cost term ||(W^⊤ ✻ X) S H^⊤||_{1,i≠j}. SeqNMF was run with many random initializations over a range of λ spanning six orders of magnitude. For small λ, the behavior of seqNMF approaches that of convNMF, producing a large number of redundant factors with high correlation cost. In this regime, correlation cost saturates at a large value and reconstruction error saturates at a minimum value (Figure 4A). At the opposite extreme, in the limit of large λ, seqNMF returns a single significant factor with zero correlation cost, because all other factors have been suppressed to zero amplitude. In this limit, the single factor is unable to reconstruct multi-sequence data, resulting in large reconstruction error. Between these extremes, there exists a region in which increasing λ produces a rapidly increasing reconstruction error and a rapidly decreasing correlation cost. Following the intuition that the optimal choice of λ for seqNMF would lie in this cross-over region where the costs are balanced, we set out to quantitatively identify, for known synthetic sequences, the optimal λ at which seqNMF has the highest probability of recovering the correct number of significant factors, and at which these factors most closely match the ground-truth sequences.
The following procedure was implemented: for a given dataset, seqNMF is run several times at a range of values of λ, and the saturating values of reconstruction cost and correlation cost are recorded (at the largest and smallest values of λ). Costs are normalized to vary between 0 and 1, and the value of λ at which the reconstruction and correlation cost curves intersect is determined (Figure 4B). This intersection point, λ0, then serves as a precise reference by which to determine the correct choice of λ. We then separately calibrated the reference λ0 to the λ’s that performed well in synthetic datasets, with and without noise, for which the ground truth is known. This analysis revealed that values of λ between 2λ0 and 5λ0 performed well across different noise types and levels (Figure 4B,C). For additive noise, performance was better when λ was chosen near λ0, while for other noise types, performance was better at higher λ (≈ 5λ0). For all of the data shown in Figure 3, we chose λ = 2λ0. Figure S2 shows how choosing λ = λ0 for additive noise and λ = 5λ0 for the other noise types yields slightly improved performance. Note that the procedure for choosing λ does not need to be run on every dataset analyzed; rather, it is needed only when seqNMF is applied to a new type of data for which a reasonable range of λ is not already known. Similar ranges of λ appeared to work for datasets with different numbers of ground-truth sequences—for the datasets used in Figure 2D, a range of λ between 0.001 and 0.01 returned the correct number of sequences at least 90% of the time for datasets containing between 1 and 10 sequences (Figure S4). Furthermore, this method for choosing λ also works on datasets containing sequences with shared neurons (Figure S5).
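The crossover point λ0 can be located by normalizing the two saturating cost curves and interpolating where they intersect. This is a sketch under our assumptions: costs are recorded on an ascending λ grid, reconstruction error rises with λ, and correlation cost falls.

```python
import numpy as np

def find_lambda0(lambdas, recon_costs, corr_costs):
    """Return the lambda at which the normalized reconstruction-error and
    correlation-cost curves cross, interpolating in log(lambda)."""
    r = np.asarray(recon_costs, float)
    c = np.asarray(corr_costs, float)
    r = (r - r.min()) / (r.max() - r.min())   # normalize both curves to [0, 1]
    c = (c - c.min()) / (c.max() - c.min())
    d = r - c                                  # negative before the crossing
    i = int(np.argmax(d > 0))                  # first grid point past it
    x = np.log10(lambdas)
    t = -d[i - 1] / (d[i] - d[i - 1])          # linear interpolation fraction
    return 10 ** (x[i - 1] + t * (x[i] - x[i - 1]))
```

The working λ is then taken as a multiple of this reference, e.g. 2λ0.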
Our second method for choosing λ directly tests generalization error by randomly holding out a small subset of elements in the data matrix [64, 6] (Figure S6). This held-out set is only used to test the performance of seqNMF, but is not used for fitting. At high values of λ, seqNMF extracts only one factor, which exhibits similar reconstruction error on training data and held-out test data. At low values of λ, seqNMF extracts a large number of factors, yielding better reconstruction error on the training data, but the performance of these factors on held-out data is often far worse, corresponding to overfitting. At intermediate values of λ, within the optimal range described above, there was often a minimum in the reconstruction error on held-out data (test error). This corresponds to the classical approach for choosing regularization strength using cross-validation. In some datasets, the minimum in test error can be subtle or nonexistent, so we instead identify the λ corresponding to the rapid divergence between training error and test error (Figure S6C). In many of our test datasets, this divergence point agrees with the ground-truth and with the procedure described above based on the crossover between correlation cost and reconstruction cost. One caution in using the cross-validation method to choose an optimal λ is that it fails on synthetic datasets that have zero or very low noise (because of a lack of overfitting), as well as in datasets with temporal warping. More broadly, difficulties using cross-validation to choose λ may reflect that the primary function of the seqNMF penalty term is to reduce factor correlations and redundancies, not to minimize over-fitting.
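A sketch of the element-wise hold-out bookkeeping (in a real fit, the masked entries would also be excluded from the training cost; the function names are ours):

```python
import numpy as np

def holdout_mask(shape, frac=0.1, seed=0):
    """Boolean mask marking a random ~frac of matrix elements as held out."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < frac

def split_rmse(X, X_hat, mask):
    """RMSE on training (unmasked) and test (masked) elements separately."""
    err = (X - X_hat) ** 2
    train = np.sqrt(err[~mask].mean())
    test = np.sqrt(err[mask].mean())
    return train, test
```

Overfitting then shows up as a test RMSE that diverges from the training RMSE as λ shrinks.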
Can we choose K rather than choosing λ?
A goal of the seqNMF correlation cost term is to limit the factorization to a small number of non-redundant factors. An alternative approach may be to directly constrain the number of factors (K) in the convNMF algorithm without regularization. If the number of underlying sequences in the data is unambiguous and is precisely known, as for the simulated datasets described above, then this approach works well, yielding factorizations close to ground truth sequences. We have found that the number of underlying sequences can sometimes be estimated by running convNMF for all reasonable values of K and selecting the value that yields the best cross-validated performance on held-out data. This method works reasonably well for simulated datasets with participation noise, additive noise, or temporal jitter over a range of noise levels that might be expected in real neural data. In some cases, there is a clear minimum in the test error at the correct K. In other cases there is a distinguishing feature such as a kink or a plateau in the test error as a function of K that could potentially be used to estimate the correct number of sequences (Figure S7). Notably, this method fails to identify the number of underlying sequences in the case of temporal warping—an issue to which we will return in the next section.
Strategies for dealing with ambiguous cases
In some datasets, there is not a unique answer for the desired factorization of sequences. A common example of such ambiguity arises when neurons are shared between different sequences, as is shown in Figure 5A and B. In this case, there are two ensembles of neurons (1 and 2) that participate in two different types of events. In one event type, ensemble 1 is active alone, while in the other event type, ensemble 1 is coactive with ensemble 2. There are two different reasonable factorizations of these data. In one factorization the two different ensembles are separated into two different factors, while in the other factorization the two different event types are separated into two different factors. We refer to these as ‘parts-based’ and ‘events-based’, respectively. Note that these different factorizations may correspond to different intuitions about underlying mechanisms. ‘Parts-based’ factorizations will be particularly useful for clustering neurons into ensembles, and ‘events-based’ factorizations will be particularly useful for correlating neural events with behavior.
We have found that seqNMF and convNMF can produce either type of factorization, depending on initial conditions and the structure of shared neurons in the data. It may therefore be useful to explicitly control the tendency to produce these different factorizations by the addition of penalties on either W or H correlations. Note that in the ‘events-based’ factorization, the Hs are orthogonal (uncorrelated) while the Ws have high overlap; in the ‘parts-based’ factorization, the Ws are orthogonal while the Hs are strongly correlated. Note that these correlations in W or H are unavoidable in the presence of shared neurons and the presence of such correlations does not indicate a redundant factorization. Update rules to implement penalties on correlations in W or H are provided in Table 3 with derivations in Appendix 1. Figure S9 shows examples of using these penalties on the songbird dataset described in Figure 7.
Another type of ambiguity arises from the presence of systematic variations in the amplitude or timing of neuronal participation in a sequence. A notable example of this is data with temporal warping. In the case of high λ, seqNMF extracts a single factor for the underlying ground truth sequence. In contrast, at lower λ seqNMF extracts multiple factors for the underlying ground truth sequence, corresponding to slower and faster variations of the sequence, effectively tiling the space of warped sequences at a finer granularity depending on the strength of the penalty (λ). Note that each of these factorizations corresponds to a reasonable interpretation, in the context of seqNMF, for the same underlying time-warping process. Different neural datasets may require estimating warping with different degrees of precision, depending on the behavior being studied, leading to different reasonable choices of λ.
Another case requiring a choice between different reasonable levels of λ occurs when a sequence exhibits two variants in which, for example, two subensembles of neurons participate with different amplitudes in different instances of the sequence. Depending on the desired level of granularity, controlled by the choice of λ, this dataset could be factorized either as a single sequence or as two sequences. Any example in which a sequence has multiple close variants, either in the timing or activity of different neurons, can lead to this type of ambiguity. Depending on what type of factorization is desired, a different value of λ might be preferable. In real datasets, it can be useful to explore the factorization for different values of λ between λ0 and 10λ0. There may often be a range of λ that give rise to different reasonable factorizations. Note that high λ risks missing sequences, especially sequences that occur rarely or include only a small number of neurons, and low λ may give rise to redundant factors.
Addition of a sparsity penalty to seqNMF or convNMF
Sparsity regularization is a widely used strategy for achieving more interpretable and generalizable results across a variety of algorithms and datasets [65], including convNMF [43, 50]. In some of our datasets, we found it useful to include L1 regularization for sparsity. The multiplicative update rules in the presence of L1 regularization are included in Table 3, and as part of our code package. Sparsity on the matrices W and H may be particularly useful in cases when sequences are repeated rhythmically (Figure S8). For example, the addition of a sparsity regularizer on the W update will bias the W exemplars to include only a single repetition of the repeated sequence, while the addition of a sparsity regularizer on H will bias the W exemplars to include multiple repetitions of the repeated sequence. This gives one fine control over how much structure in the signal to pack into W versus H. Like the ambiguities described above, these are both valid interpretations of the data, but each may be more useful in different contexts.
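To illustrate how an L1 penalty enters a multiplicative update, here is a sketch for plain (non-convolutional) NMF; because the penalty's gradient is constant, it simply adds a constant to the denominator of the update, and the convNMF versions in Table 3 follow the same pattern. All names here are our own:

```python
import numpy as np

def nmf_l1_step(X, W, H, alpha=0.1, eps=1e-10):
    """One multiplicative NMF update with an L1 (sparsity) penalty on H.

    The L1 term alpha * ||H||_1 has constant gradient alpha, so it appears
    only as an added constant in the denominator of the H update.
    """
    H = H * (W.T @ X) / (W.T @ W @ H + alpha + eps)
    W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```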
Application of seqNMF to hippocampal sequences
To test the ability of seqNMF to discover patterns in electrophysiological data, we analyzed spiking activity in datasets of simultaneously recorded hippocampal neurons acquired in the Buzsaki lab and available from a public repository (https://crcns.org/data-sets/hc) [2, 1]. The data were acquired in two rats as part of published studies describing sequences in the hippocampus [45, 16]. In these experiments, rats were trained to alternate between left and right turns in a T-maze to earn a water reward. Between alternations, the rats ran on a running wheel during an imposed delay period lasting either 10 or 20 seconds. By averaging spiking activity during the delay period, the authors reported long temporal sequences of neural activity spanning the delay. In some rats, the same sequence occurred on left and right trials, while in other rats, different sequences were active in the delay period during the different trial types.
Without reference to the behavioral landmarks, seqNMF was able to extract sequences in both datasets. The automated method described above was used to choose λ (Figure 6). In Rat 1, with λ = 2λ0, most runs of seqNMF extracted a single significant factor, corresponding to a sequence active throughout the running wheel delay period and immediately after, when the rat runs up the stem of the maze (Figure 6B). Some runs of seqNMF extracted two factors, splitting the delay period sequence and the maze stem sequence; this is a reasonable interpretation of the data, and likely results from variability in the relative timing of running wheel and maze stem traversal. At somewhat lower values of λ, seqNMF more often split these sequences into two factors. At even lower values of λ, seqNMF produced more significant factors. Such higher granularity factorizations may correspond to real variants of the sequences, as they generalize to held-out data (Figure S7J).
In Rat 2, at λ = 1.5λ0, three significant factors were typically identified (Figure 6C). The first two correspond to distinct sequences active for the duration of the delay period on alternating trials. The third sequence was active immediately following each of the alternating sequences, corresponding to the time at which the animal exits the wheel and runs up the stem of the maze. Taken together, these results suggest that seqNMF can detect multiple neural sequences without the use of any behavioral landmarks. Having validated this functionality in both simulated data and previously published neural sequences, we then applied seqNMF to find structure in a novel dataset, in which the ground truth is unknown and difficult to ascertain using previous methods.
Application of seqNMF to abnormal sequence development in avian motor cortex
We applied seqNMF to analyze new functional imaging data recorded in songbird HVC during singing. Normal adult birds sing a highly stereotyped song, making it possible to detect sequences by averaging neural activity aligned to the song. Using this approach, it has been shown that HVC neurons generate precisely timed sequences that tile each song syllable [23, 48, 37]. In contrast to adult birds, young birds sing highly variable babbling vocalizations, known as subsong, for which HVC is not necessary [3]. The emergence of sequences in HVC occurs gradually over development, as the song matures from subsong to adult song [44].
Songbirds learn their song by imitation and must hear a tutor to develop normal adult vocalizations. Birds isolated from a tutor sing highly variable and abnormal songs as adults [18]. Such ‘isolate’ birds provide an opportunity to study how the absence of normal auditory experience leads to pathological vocal/motor development. However, the high variability of pathological ‘isolate’ song makes it difficult to identify neural sequences using the standard approach of aligning neural activity to vocal output.
Using seqNMF, we were able to identify repeating neural sequences in isolate songbirds (Figure 7A). At the chosen λ (Figure 7B), seqNMF typically extracts three significant sequences (Figure 7C). Similarly, our masked cross-validation test indicated good convNMF performance at K = 3, with overfitting starting at K = 4 (Figure S7I). The extracted sequences include some deployed during syllables of abnormally long and variable durations (Figure 7D-F).
In addition, the extracted sequences exhibit properties not observed in normal adult birds. We see an example of two distinct sequences that sometimes, but not always, co-occur (Figure 7). We observe that a short sequence occurs alone on some syllable renditions, while on other syllable renditions, a second longer sequence is generated simultaneously. This probabilistic overlap of different sequences is highly atypical in normal adult birds [23, 36, 48, 37]. Furthermore, this pattern of neural activity is associated with abnormal variations in syllable structure—in this case resulting in a longer variant of the syllable when both sequences co-occur. This acoustic variation is a characteristic pathology of isolate song [18]. Thus, even though we observe HVC generating some sequences in the absence of a tutor, it appears that these sequences are deployed in a highly abnormal fashion.
Application of seqNMF to a behavioral dataset: song spectrograms
Although we have focused on the application of seqNMF to neural activity data, this method naturally extends to other types of high-dimensional datasets, including behavioral data with applications to neuroscience. The neural mechanisms underlying song production and learning in songbirds are an area of active research. However, the identification and labeling of song syllables in acoustic recordings is challenging, particularly in young birds where song syllables are highly variable. Because automatic segmentation and clustering often fail, song syllables are still routinely labelled by hand [44]. We tested whether seqNMF, applied to a spectrographic representation of zebra finch vocalizations, is able to extract meaningful features in behavioral data. SeqNMF correctly identified repeated acoustic patterns in juvenile songs, placing each distinct syllable type into a different factor (Figure 8). The resulting classifications agree with previously published hand-labeled syllable types [44]. A similar approach could be applied to other behavioral data, for example movement data or human speech, and could facilitate the study of neural mechanisms underlying even earlier and more variable stages of learning. Indeed, convNMF was originally developed for application to spectrograms [56]; notably it has been suggested that auditory cortex may use similar computations to represent and parse natural song statistics [40].
Discussion
As neuroscientists strive to record larger datasets, there is a need for rigorous tools to reveal underlying structure in high-dimensional data [20, 54, 11, 8]. In particular, sequential structure is increasingly regarded as a fundamental property of neuronal circuits [23, 24, 44, 45], but standardized statistical approaches for extracting such structure have not been widely adopted or agreed upon.
Here, we explored a simple matrix factorization-based approach to identify neural sequences [47]. The convNMF model elegantly captures sequential structure in an unsupervised manner [56, 55]. However, in datasets where the number of sequences is not known, convNMF may return redundant, inefficient, or inconsistent factorizations. In order to resolve these challenges, we introduced a new regularization (penalty) term to encourage the model to identify sparse and non-redundant sequential firing patterns. Furthermore, we carefully explored the robustness of this method to noise and developed procedures for choosing hyperparameters (K and λ) based on cross-validation and assessing the significance of identified sequences based on shuffled null distributions. Our results show that seqNMF is highly robust to many forms of noise. For example, even when (synthetic) neurons participate probabilistically in sequences at a rate of 50%, the model typically identifies factors with greater than 80% similarity to the ground truth (Figure 3A). Additionally, seqNMF performs well even with limited data, successfully extracting sequences that only appear a handful of times in a noisy data stream (Figure S3).
Prior investigations of neural sequences have relied on manual alignment of neural activity to behavioral events, such as animal position for the case of hippocampal and cortical sequences [24, 45], or syllable onset for the case of songbird vocalizations [23]. This approach is not ideally suited for the case of highly variable behaviors, such as in early learning and development [44]. For example, the analysis of neural activity in singing juvenile birds has been challenging because of the difficulty in identifying distinct syllable types on which to perform the temporal alignment. This problem would also apply to isolate songbirds because of the pathologically variable nature of their vocalizations. By applying seqNMF, we were able to identify neural sequences without reference to song syllables, enabling future work into the neural basis of singing in isolate birds.
As in many data analysis scenarios, a variety of statistical approaches may be brought to bear on finding sequences in neural data. A classic method is to construct cross-correlogram plots, showing spike time correlations between pairs of neurons at various time lags. However, other forms of spike rate covariation, such as trial-to-trial gain modulation, can produce spurious peaks in this measure [7]; recent work has developed statistical corrections for these effects [51]. After significant pairwise correlations are identified, one can heuristically piece together pairs of neurons with significant interactions into a sequence. This bottom-up approach may be better than seqNMF at detecting sequences involving small numbers of neurons if such microsequences contribute only a small amount of variance in the overall dataset. On the other hand, this bottom-up approach may fail to identify long sequences with high participation noise or jitter in each neuron [49]. One can think of seqNMF as a complementary top-down approach, which performs very well in the high-noise regime since it learns a template sequence at the level of the full population that is robust to noise at the level of individual units.
Statistical models with a dynamical component, such as Hidden Markov Models (HMMs) [38], linear dynamical systems [30], and models with switching dynamics [35], can also capture sequential firing patterns. These methods will typically require many hidden states or latent dimensions to capture sequences, similar to PCA and NMF which require many components to recover sequences. However, since dynamical models are much more constrained than PCA or NMF, they can yield more interpretable results. For example, visualizing the transition matrix of an HMM can provide insight into the order in which hidden states of the model are visited, mapping onto different sequences that manifest in population activity [38]. One advantage of this approach is that it can model sequences that occasionally end prematurely, while seqNMF will always produce the full sequence. On the other hand, this pattern completion property makes seqNMF robust to participation noise and jitter. In contrast, a standard HMM must pass through each hidden state to model a sequence, and therefore will have trouble whenever one of these hidden states is skipped. Thus, we expect HMMs (or related models) and seqNMF to exhibit complementary strengths and weaknesses.
Another contribution of our work is a natural framework in which to bias factorizations towards parts-based versus events-based solutions. While existing computational work has focused on neural sequences that do not have ensembles of shared neurons, such shared populations have been observed during song learning [44], demonstrating that neural sequences in real biological data can substantially overlap. Such shared sequences can lead to different reasonable factorizations of the data that may correspond to different interpretations of underlying mechanisms. For example, we found that neural sequences in HVC of isolated songbirds are well-described by both parts- or events-based factorizations (Figure S9), each of which could correspond to a different biophysical model of sequence generation. This capacity for a combinatorial description of overlapping sequences distinguishes convNMF and seqNMF from clustering methods [22, 39] and methods based on hypothesis testing [49, 51], which seek to identify full snapshots of repeated population firing patterns rather than parts- or events-based representations. Another difference between these methods and seqNMF, particularly when using an events-based factorization, is its ability to model different amplitudes in the sequences by changing the magnitude of the event loadings in H.
More generally, a key strength of seqNMF is that it can be easily tuned to the requirements and goals of a particular analysis. In addition to changing between a parts- and events-based factorization, one can tune the overall sparsity in the model by classic L1 regularization. Future work could incorporate outlier detection into the objective function as has been done in other matrix factorization models [42]. One could also incorporate additional parameters to model changes in neural sequences across trials or days during development or learning of a new behavior, similar to extensions of PCA and NMF to multi-trial data [63]. Thus, adding convolutional structure to factorization-based models of neural data represents a rich opportunity for future developments in statistical methodology.
Despite limiting ourselves to a relatively simple model for the purposes of this paper, we extracted biological insights that were difficult to achieve by other methods in practical experimental datasets. Overall, seqNMF can extract neural sequences from large-scale population recordings without reference to stereotyped behavior or rigid sensory stimuli, enabling the dissection of neural circuit activity during rich and variable animal behaviors.
Author contributions
ELM, AHB, AHW, MSG and MSF conceived the project, based on previous discussions of MSG, MSF and ELM. ELM, AHB and MSF designed and tested the seqNMF regularizers, the method for validating the significance of sequences in a held-out dataset, and the method for choosing λ. ELM, AHB, AHW, and MSF designed and tested the method for measuring RMSE on a masked test set. ELM and AHB wrote the algorithm and demo code. ELM and NID collected the imaging data in singing birds. ELM and SG analyzed imaging data. All authors contributed to writing the manuscript.
Methods and Materials
Table of key resources
Key resources, and references for how to access them, are listed in Table 2.
Contact for resource sharing
Further requests should be directed to Michale Fee (fee{at}mit.edu).
Software and data availability
Our seqNMF MATLAB code is publicly available as a GitHub repository, along with some of our data for demonstration:
https://github.com/FeeLab/seqNMF
The repository includes the seqNMF function, as well as helper functions for selecting λ, testing the significance of factors, plotting, and other functions. It also includes a demo script that goes through an example of how to select λ for a new dataset, test for significance of factors, plot the seqNMF factorization, switch between parts-based and events-based factorizations, and calculate cross-validated performance on a masked test set.
We plan to post more of our data publicly on the CRCNS data-sharing platform.
Generating simulated data
We simulated neural datasets containing between 1 and 10 distinct neural sequences in the presence of various noise conditions. Each neural sequence was made up of 10 consecutively active neurons, each separated by three timebins. The binary activity matrix was convolved with an exponential kernel (τ = 10 timebins) to resemble neural calcium imaging activity.
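A minimal NumPy sketch of this simulation (parameter names and defaults are illustrative; the released MATLAB code is the reference):

```python
import numpy as np

def make_sequence_data(n_seqs=2, neurons_per_seq=10, gap=3, n_events=20,
                       T=5000, tau=10.0, L_kernel=50, seed=0):
    """Simulate calcium-like data containing repeated neural sequences.

    Each sequence activates `neurons_per_seq` neurons consecutively,
    `gap` timebins apart; the binary raster is then convolved with an
    exponential kernel (time constant `tau`) to mimic calcium imaging.
    """
    rng = np.random.default_rng(seed)
    N = n_seqs * neurons_per_seq
    X = np.zeros((N, T))
    seq_len = neurons_per_seq * gap
    for s in range(n_seqs):
        onsets = rng.choice(T - seq_len - L_kernel, size=n_events, replace=False)
        for t0 in onsets:
            for j in range(neurons_per_seq):
                X[s * neurons_per_seq + j, t0 + j * gap] = 1.0
    kernel = np.exp(-np.arange(L_kernel) / tau)   # exponential calcium kernel
    return np.apply_along_axis(lambda r: np.convolve(r, kernel)[:T], 1, X)
```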
SeqNMF algorithm details
Our algorithm for seqNMF (convNMF with additional regularization to promote efficient factorizations) is a direct extension of the multiplicative update convNMF algorithm [56], and draws on previous work regularizing NMF to encourage factor orthogonality [10].
The uniqueness and consistency of traditional NMF has been better studied than convNMF, but in special cases, NMF has a unique solution comprised of sparse, ‘parts-based’ features that can be consistently identified by known algorithms [17, 4]. However, this ideal scenario does not hold in many practical settings. In these cases, NMF is sensitive to initialization, resulting in potentially inconsistent features. This problem can be addressed by introducing additional constraints or regularization terms that encourage the model to extract particular, e.g. sparse or approximately orthogonal, features [27, 31]. Both theoretical work and empirical observations suggest that these modifications result in more consistently identified features [58, 31].
For seqNMF, we added to the convNMF cost function a term that promotes competition between overlapping factors, resulting in the following cost function:

‖X̃ − X‖²_F + λ‖(W⊤ ⊛ X) S H⊤‖_{1,i≠j}     (8)

where X̃ is the convNMF reconstruction of the data, S is a T × T smoothing matrix (S_ij = 1 when |i − j| < L, and zero otherwise, implementing smoothing over a 2L − 1 timebin window), and ‖·‖_{1,i≠j} denotes the sum of the absolute values of all off-diagonal entries.
We derived the following multiplicative update rules for W and H (Appendix 1):

H ← H × (W⊤ ⊛ X) / (W⊤ ⊛ X̃ + λ(1 − I)(W⊤ ⊛ X)S)

W··ℓ ← W··ℓ × (X (H→ℓ)⊤) / (X̃ (H→ℓ)⊤ + λ(X←ℓ) S H⊤(1 − I))

Where the division and × are element-wise. The operator (·)→ℓ shifts a matrix in the → direction by ℓ timebins, i.e. a delay by ℓ timebins, and (·)←ℓ shifts a matrix in the ← direction by ℓ timebins (notation summary, Table 1). Note that multiplication with the K × K matrix (1 − I) effectively implements factor competition because it places in the kth row a sum across all other factors. These update rules are derived in Appendix 1 by taking the derivative of the cost function in Equation 8.
In addition to the multiplicative updates outlined in Table 3, we also renormalize so rows of H have unit norm; shift factors to be centered in time such that the center of mass of each W pattern occurs in the middle; and in the final iteration run one additional step of unregularized convNMF to prioritize the cost of reconstruction error over the regularization (Algorithm 1). This final step is done to correct a minor suppression in the amplitude of some peaks in H that may occur within 2L timebins of neighboring sequences.
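For concreteness, the core reconstruction and penalized updates can be sketched in NumPy. This is a simplified illustration that omits the renormalization, centering, and final unregularized step; function names are ours, and the released MATLAB package is the reference implementation:

```python
import numpy as np

def shift(M, l):
    """Shift columns of M by l timebins (l > 0: right/delay, l < 0: left), zero-padded."""
    out = np.zeros_like(M)
    if l == 0:
        out[:] = M
    elif l > 0:
        out[:, l:] = M[:, :-l]
    else:
        out[:, :l] = M[:, -l:]
    return out

def reconstruct(W, H):
    """ConvNMF reconstruction: X_tilde = sum_l W[:, :, l] @ (H delayed by l)."""
    return sum(W[:, :, l] @ shift(H, l) for l in range(W.shape[2]))

def seqnmf_update(X, W, H, lam, eps=1e-10):
    """One multiplicative seqNMF update of H, then W (simplified sketch)."""
    N, K, L = W.shape
    T = X.shape[1]
    # Smoothing matrix S: ones within a 2L - 1 timebin band around the diagonal.
    S = (np.abs(np.subtract.outer(np.arange(T), np.arange(T))) < L).astype(float)
    comp = 1.0 - np.eye(K)                # (1 - I): sums over the *other* factors

    Xhat = reconstruct(W, H)
    WtX = sum(W[:, :, l].T @ shift(X, -l) for l in range(L))     # W^T (*) X
    WtXhat = sum(W[:, :, l].T @ shift(Xhat, -l) for l in range(L))
    H = H * WtX / (WtXhat + lam * comp @ WtX @ S + eps)

    Xhat = reconstruct(W, H)
    for l in range(L):
        Hl = shift(H, l)
        num = X @ Hl.T
        den = Xhat @ Hl.T + lam * shift(X, -l) @ S @ H.T @ comp + eps
        W[:, :, l] = W[:, :, l] * num / den
    return W, H
```

With lam = 0 this reduces to plain convNMF multiplicative updates.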
Calculating consistency
The consistency between two factorizations measures the extent to which it is possible to create a one-to-one match between factors in factorization A and factors in factorization B. Specifically, given two factorizations (WA, HA) and (WB, HB), consistency is measured with the following procedure:
For each factor number k, compute the part of the reconstruction explained by this factor in each factorization, X̃^A_k and X̃^B_k.
Reshape X̃^A_k and X̃^B_k into vectors containing all the elements of each matrix, then compute C, a K × K correlation matrix where C_ij is the correlation between the vectorized X̃^A_i and X̃^B_j.
Permute the factors greedily so factor 1 is the best matched pair of factors, factor 2 is the best matched pair of the remaining factors, etc. The quality of the match is measured by the correlation between the reconstructions computed using just each factor individually.
Measure consistency as the ratio of the power (sum of squared matrix elements) contained on the diagonal of the permuted C matrix to the total power in C.
Thus, two factorizations are perfectly consistent when there exists a permutation of factor numbers for which there is a one-to-one match between what parts of the reconstruction are explained by each factor.
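The procedure above can be sketched as follows (a NumPy illustration with our own function names; the greedy matching follows the steps listed above):

```python
import numpy as np

def shift_right(v, l):
    """Delay a 1-D timecourse by l timebins (zero-padded)."""
    return np.concatenate([np.zeros(l), v[:v.size - l]])

def per_factor_recon(W, H, k):
    """Part of the reconstruction explained by factor k alone."""
    return sum(np.outer(W[:, k, l], shift_right(H[k], l)) for l in range(W.shape[2]))

def consistency(WA, HA, WB, HB):
    """Ratio of diagonal power to total power in the greedily permuted
    correlation matrix between per-factor reconstructions of A and B."""
    K = WA.shape[1]
    vecs_A = [per_factor_recon(WA, HA, k).ravel() for k in range(K)]
    vecs_B = [per_factor_recon(WB, HB, k).ravel() for k in range(K)]
    C = np.nan_to_num(np.array([[np.corrcoef(a, b)[0, 1] for b in vecs_B]
                                for a in vecs_A]))
    # Greedy matching: best pair first, then best among the remaining, etc.
    perm = np.full(K, -1)
    free = set(range(K))
    for idx in np.argsort(-np.abs(C), axis=None):
        i, j = divmod(int(idx), K)
        if perm[i] < 0 and j in free:
            perm[i] = j
            free.discard(j)
    Cp = C[:, perm]
    return float(np.sum(np.diag(Cp) ** 2) / np.sum(Cp ** 2))
```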
Testing the significance of each factor on held-out data
In order to test whether a factor is significantly present in held-out data, we measure the distribution across timebins of the overlaps of the factor with the held-out data, and compare the skewness of this distribution to the null case (Figure S1). Overlap with the data is measured as W⊤ ⊛ X, so this quantity will be high at timepoints when the sequence occurs, producing a distribution of overlaps with high skew. In contrast, a distribution of overlaps exhibiting low skew indicates a sequence is not present in the data, since there are few timepoints of particularly high overlap. We estimate what skew levels would appear by chance by constructing null factors where temporal relationships between neurons have been eliminated. To create such null factors, we start from the real factors, then circularly shift the timecourse of each neuron by a random amount between 0 and L. We measure the skew of the overlap distributions for each null factor, and ask whether the skew we measured for the real factor is significant at p-value α, that is, if it exceeds the (1 − α) percentile of the null skews. Note the required Bonferroni correction for K comparisons when testing K factors.
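A sketch of this significance test (our own NumPy illustration; the null construction via circular shifts and the Bonferroni correction follow the description above):

```python
import numpy as np

def overlaps(W, X):
    """Overlap of each factor with the data at each timebin (W^T (*) X)."""
    N, K, L = W.shape
    T = X.shape[1]
    A = np.zeros((K, T))
    for l in range(L):
        A[:, :T - l] += W[:, :, l].T @ X[:, l:]
    return A

def skewness(A):
    """Skew of each row's distribution across timebins."""
    m = A.mean(axis=1, keepdims=True)
    s = A.std(axis=1, keepdims=True) + 1e-12
    return (((A - m) / s) ** 3).mean(axis=1)

def factor_significance(W, X, n_null=100, alpha=0.05, seed=0):
    """Compare each factor's overlap skewness to null factors in which each
    neuron's timecourse is circularly shifted by a random amount in [0, L)."""
    rng = np.random.default_rng(seed)
    N, K, L = W.shape
    real_skew = skewness(overlaps(W, X))
    null_skews = np.zeros((n_null, K))
    for i in range(n_null):
        Wnull = W.copy()
        for n in range(N):
            for k in range(K):
                Wnull[n, k] = np.roll(W[n, k], rng.integers(0, L))
        null_skews[i] = skewness(overlaps(Wnull, X))
    # Bonferroni-corrected (1 - alpha / K) percentile of the null distribution
    thresh = np.quantile(null_skews, 1 - alpha / K, axis=0)
    return real_skew > thresh
```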
Choosing appropriate parameters for a new dataset
Choice of appropriate parameters (λ, K and L) will depend on the data type (sequence length, number, and density; amount of noise; etc.).
In practice, we find that results are relatively robust to the choice of parameters. When K or L is set larger than necessary, seqNMF tends to simply leave the unnecessary factors or time bins empty. For λ, the goal is to find the ‘sweet spot’ (Figure 4) to explain as much data as possible while still producing sensible factorizations, that is, minimally correlated factors, with low values of the correlation cost term. Our software package includes demo code for determining the best parameters for a new type of data, using the following strategy:
Start with K slightly larger than the number of sequences anticipated in the data
Start with L slightly longer than the maximum expected factor length
Run seqNMF for a range of λ’s, and for each λ measure the reconstruction error and the factor competition regularization term
Choose a λ slightly above the crossover point λ0
Decrease K if desired, as otherwise some factors will be consistently empty
Decrease L if desired, as otherwise some time bins will consistently be empty
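Assuming the reconstruction and correlation costs have been recorded across a grid of λ values, locating the crossover λ0 (and a choice slightly above it) can be sketched as follows; the function name and the factor of 2 are illustrative:

```python
import numpy as np

def crossover_lambda(lambdas, recon_cost, corr_cost):
    """Normalize the two cost curves to [0, 1] and find lambda_0, the first
    lambda at which the (rising) reconstruction cost overtakes the (falling)
    correlation cost; return lambda_0 and a choice slightly above it."""
    lambdas = np.asarray(lambdas, dtype=float)
    r = np.asarray(recon_cost, dtype=float)
    c = np.asarray(corr_cost, dtype=float)
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)
    c = (c - c.min()) / (c.max() - c.min() + 1e-12)
    i = int(np.argmax((r - c) > 0))      # first lambda past the crossover
    lam0 = lambdas[i]
    return lam0, 2.0 * lam0              # e.g. run seqNMF at ~2 * lambda_0
```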
In some applications, achieving the desired accuracy may depend on choosing a λ that allows some inconsistency. It is possible to deal with this remaining inconsistency by comparing factors produced by different random initializations, and only considering factors that arise from several different initializations, a strategy that has been previously applied to standard convNMF on neural data [47].
During validation of our procedure for choosing λ, we compared factorizations to ground truth sequences as shown in Figure 4. To find the optimal λ we used the product of two curves. The first curve was obtained by calculating the fraction of fits in which the true number of sequences was recovered as a function of λ. The second curve was obtained by calculating similarity to ground truth as a function of λ. Similarity to ground truth is measured as the consistency between the factorization and the noiseless sequences used to generate the data. The product of these two curves was smoothed using a three-sample boxcar sliding window, and the width was found as the values of λ on either side of the peak value that correspond most closely to the half-max points of the curve.
Measuring performance on noisy data by comparing seqNMF sequences to ground-truth sequences
We wanted to measure the ability of seqNMF to recover ground-truth sequences even when the sequences are obstructed by noise. Our noisy data consisted of two ground-truth sequences, obstructed by a variety of noise types. We first took the top seqNMF factor, and made a reconstruction with only this factor. We then measured the correlation between this reconstruction and reconstructions generated from each of the ground-truth factors, and chose the best match. Next, we measured the correlation between the remaining ground-truth reconstruction and the second seqNMF factor. The mean of these two correlations was used as a measure of similarity between the seqNMF factorization and the ground-truth (noiseless) sequences.
Testing generalization of factorization to randomly held-out (masked) data entries
The data matrix X was divided into training data and test data by randomly selecting 5 or 10% of matrix entries to hold out. Specifically, the objective function (equation 5, in the Results section) was modified to:

argmin_{W,H} ‖M × (X̃ − X)‖²_F

where × indicates elementwise multiplication (Hadamard product) and M is a binary matrix with 5 or 10% of the entries randomly selected to be zero (held-out test set) and the remaining 95 or 90% set to one (training set). To search for a solution, we reformulate this optimization problem as:

argmin_{W,H,Z} ‖X̃ − Z‖²_F  subject to  M × Z = M × X

where we have introduced a new optimization variable Z, which can be thought of as a surrogate dataset that is equal to the ground truth data only on the training set. The goal is now to minimize the difference between the model estimate, X̃, and the surrogate, Z, while constraining Z to equal X at unmasked elements (where M_ij = 1) and allowing Z to be freely chosen at masked elements (where M_ij = 0). Clearly, at masked elements, the best choice is to make Z equal to the current model estimate, as this minimizes the cost function without violating the constraint. This leads to the following update rules, which are applied cyclically to update Z, W, and H:

Z ← M × X + (1 − M) × X̃

H ← H × (W⊤ ⊛ Z) / (W⊤ ⊛ X̃)

W··ℓ ← W··ℓ × (Z (H→ℓ)⊤) / (X̃ (H→ℓ)⊤)
The measure used for testing generalization performance was RMSE. For the testing phase, RMSE was computed from the difference between X̃ and the data matrix X only for held-out entries.
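For illustration, here is the masked cross-validation scheme sketched for plain (non-convolutional) NMF; the surrogate Z update is the same in the convolutional case, and all names are our own:

```python
import numpy as np

def masked_nmf(X, K, mask_frac=0.1, n_iter=300, seed=0, eps=1e-10):
    """Fit NMF on randomly held-out (masked) entries via a surrogate Z that
    equals X on training entries and the model estimate on test entries;
    return RMSE on the held-out entries only."""
    rng = np.random.default_rng(seed)
    N, T = X.shape
    M = (rng.random((N, T)) > mask_frac).astype(float)   # 1 = training entry
    W = rng.random((N, K)) + 0.1
    H = rng.random((K, T)) + 0.1
    for _ in range(n_iter):
        Xhat = W @ H
        Z = M * X + (1.0 - M) * Xhat     # surrogate dataset
        H = H * (W.T @ Z) / (W.T @ Xhat + eps)
        Xhat = W @ H
        Z = M * X + (1.0 - M) * Xhat
        W = W * (Z @ H.T) / (Xhat @ H.T + eps)
    Xhat = W @ H
    test = M == 0
    return float(np.sqrt(np.mean((Xhat[test] - X[test]) ** 2)))
```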
Algorithm speed
In practice, our algorithm converges rapidly: fewer than 100 iterations on a typical 150 neuron by 10,000 time point data matrix. Typically, 100 iterations on such data take less than 30 seconds on a standard PC. However, applications to much larger datasets may require faster performance. In these cases, we recommend running seqNMF on smaller subsets of the dataset, perhaps by incorporating seqNMF regularization into an online version of convNMF [62], and/or parallelizing the algorithm by running it on shorter datasets and merging/recombining factors that are common across these shorter runs (finding common factors by e.g. [47]).
Notes on data preprocessing
While seqNMF is generally quite robust, proper preprocessing of the data can be important to obtaining reasonable factorizations on real neural data. A key principle is that, in minimizing the reconstruction error, seqNMF is most strongly influenced by parts of the data that exhibit high variance. This can be problematic if the regions of interest in the data have relatively low amplitude. For example, high firing rate neurons may be prioritized over those with lower firing rate. As an alternative to subtracting the mean firing rate of each neuron, which would introduce negative values, neurons could be normalized divisively or by subtracting off an NMF reconstruction fit using a method that forces a non-negative residual [32]. Additionally, variations in behavioral state may lead to seqNMF factorizations that prioritize regions of the data with high variance and neglect other regions. It may be possible to mitigate these effects by normalizing data, or by restricting analysis to particular subsets of the data, either by time or by neuron.
Hippocampus data
The hippocampal data we used were collected in the Buzsáki lab [2, 1], and are publicly available on the Collaborative Research in Computational Neuroscience (CRCNS) data sharing website. The dataset we refer to as ‘Rat 1’ is in the hc-5 dataset, and the dataset we refer to as ‘Rat 2’ is in the hc-3 dataset. Before running seqNMF, we processed the data by convolving the raw spike trains with a Gaussian kernel of standard deviation 100 ms.
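The smoothing step can be sketched as follows (a minimal numpy version of Gaussian smoothing; the bin width is an assumption, since the text does not state it):

```python
import numpy as np

def smooth_spikes(spikes, bin_ms, sigma_ms=100.0, truncate=4.0):
    """Convolve each binned spike train (row of `spikes`) with a Gaussian
    kernel of standard deviation `sigma_ms` (100 ms, as in the text).
    The kernel is normalized to unit area, so total spike count is preserved."""
    sigma = sigma_ms / bin_ms                      # kernel width in timebins
    radius = int(truncate * sigma + 0.5)           # truncate the Gaussian tails
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    return np.array([np.convolve(row, kernel, mode="same") for row in spikes])
```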
Animal care and use
We used male zebra finches (Taeniopygia guttata) from the MIT zebra finch breeding facility (Cambridge, MA). Animal care and experiments were carried out in accordance with NIH guidelines, and reviewed and approved by the Massachusetts Institute of Technology Committee on Animal Care (protocol 0715-071-18).
In order to prevent exposure to a tutor song, birds were foster-raised by female birds, which do not sing, starting on or before post-hatch day 15. For experiments, birds were housed singly in custom-made sound isolation chambers.
Calcium imaging
The calcium indicator GCaMP6f was expressed in HVC by intracranial injection of the viral vector AAV9.CAG.GCaMP6f.WPRE.SV40 [9]. In the same surgery, a cranial window was made using a GRIN (gradient index) lens (1 mm diameter, 4 mm length, Inscopix). After at least one week, to allow for sufficient viral expression, recordings were made using the Inscopix nVista miniature fluorescence microscope.
Neuronal activity traces were extracted from raw fluorescence movies using the CNMF_E algorithm, a constrained non-negative matrix factorization algorithm specialized for microendoscope data by including a local background model to remove activity from out-of-focus cells [66].
We performed several preprocessing steps before applying seqNMF to the functional calcium traces extracted by CNMF_E. First, we estimated burst times from the raw traces by deconvolving them under an AR-2 process model. The deconvolution parameters (time constants and noise floor) were estimated for each neuron using the CNMF_E code package [66]. Some neurons exhibited larger peaks than others, likely due to differing expression levels of the calcium indicator. Since seqNMF would prioritize the neurons with the most power, we renormalized by dividing the signal from each neuron by the sum of the maximum value of that row and the 95th percentile of the signal across all neurons. In this way, neurons with larger peaks were given some priority, but not dramatically more than neurons with weaker signals.
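The renormalization described above can be sketched as follows (hypothetical function name; `traces` is the neurons × time matrix of deconvolved activity):

```python
import numpy as np

def renormalize_traces(traces):
    """Divide each neuron's trace by (that row's maximum + the 95th
    percentile of the signal across all neurons), so strong neurons keep
    some priority without overwhelming weaker ones."""
    row_max = traces.max(axis=1, keepdims=True)
    global_p95 = np.percentile(traces, 95)
    return traces / (row_max + global_p95)
```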
Acknowledgements
This work was supported by a grant from the Simons Collaboration for the Global Brain, the National Institutes of Health (NIH) [grant number R01 DC009183] and the G. Harold & Leila Y. Mathers Charitable Foundation. ELM received support through the NDSEG Fellowship program. AHB received support through NIH training grant 5T32EB019940-03. MSG received support from the NIH [grant number U19NS104648]. AHW received support from the U.S. Department of Energy Computational Science Graduate Fellowship (CSGF) program. Thanks to Pengcheng Zhou for advice on his CNMF_E calcium data cell extraction algorithm. Thanks to Wiktor Młynarski for helpful convNMF discussions. Thanks to Michael Stetner, Galen Lynch, Nhat Le, Dezhe Jin, Edward Nieh, Adam Charles and Jane Van Velden for comments on the manuscript and on our code package. Special thanks to the 2017 Methods in Computational Neuroscience course [supported by NIH grant R25 MH062204 and Simons Foundation] at the Woods Hole Marine Biology Lab, where this collaboration was started.
Appendix 1 Deriving multiplicative update rules
Standard gradient descent methods for minimizing a cost function must be adapted when solutions are constrained to be non-negative, since gradient descent steps may result in negative values. Lee and Seung invented an elegant and widely-used algorithm for non-negative gradient descent that avoids negative values by performing multiplicative updates [34]. They derive these multiplicative updates by choosing an adaptive learning rate that makes additive terms cancel from standard gradient descent on the cost function. We will reproduce their derivation here, and detail how to extend it to the convolutional case [56] and apply several forms of regularization [43, 50, 10]. See Table 3 for a compilation of cost functions, derivatives and multiplicative updates for NMF and convNMF under several different regularization conditions.
Standard NMF
NMF performs the factorization \(X \approx \widetilde{X} = WH\). NMF factorizations seek to solve the following problem:

\[
\min_{W,H \geq 0} \left\lVert X - WH \right\rVert_F^2
\]

This problem is convex in W and H separately, but not jointly, so a local minimum is found by alternating W and H updates. Note that:

\[
\frac{\partial}{\partial W} \left\lVert X - WH \right\rVert_F^2 = 2\left( WHH^\top - XH^\top \right)
\qquad
\frac{\partial}{\partial H} \left\lVert X - WH \right\rVert_F^2 = 2\left( W^\top WH - W^\top X \right)
\]

Thus, gradient descent steps for W and H are:

\[
W \leftarrow W - \eta_W \times 2\left( WHH^\top - XH^\top \right)
\qquad
H \leftarrow H - \eta_H \times 2\left( W^\top WH - W^\top X \right)
\]

To arrive at multiplicative updates, Lee and Seung [34] set:

\[
\eta_W = \frac{W}{2\, WHH^\top}
\qquad
\eta_H = \frac{H}{2\, W^\top WH}
\]

Thus, the gradient descent updates become multiplicative:

\[
W \leftarrow W \times \frac{XH^\top}{WHH^\top}
\qquad
H \leftarrow H \times \frac{W^\top X}{W^\top WH}
\]

where the division and × are element-wise.
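These multiplicative updates can be written compactly in numpy (a minimal sketch; the small `eps` added to the denominators for numerical safety is our assumption, not part of the original derivation):

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=1000, eps=1e-9, seed=0):
    """Lee & Seung multiplicative updates for min ||X - WH||_F^2 with
    W, H >= 0. Each step multiplies elementwise by a ratio of
    non-negative matrices, so W and H remain non-negative."""
    rng = np.random.default_rng(seed)
    N, T = X.shape
    W = rng.random((N, K))
    H = rng.random((K, T))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```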
Standard convNMF
Convolutional NMF factorizes data as \(X \approx \widetilde{X} = W \circledast H = \sum_{\ell=1}^{L} W_{\cdot \cdot \ell}\, \overrightarrow{H}^{\,\ell}\). convNMF factorizations seek to solve the following problem:

\[
\min_{W,H \geq 0} \left\lVert X - \widetilde{X} \right\rVert_F^2
\]

The derivation above for standard NMF can be applied for each ℓ, yielding the following update rules for convNMF [56]:

\[
W_{\cdot \cdot \ell} \leftarrow W_{\cdot \cdot \ell} \times \frac{X \left( \overrightarrow{H}^{\,\ell} \right)^\top}{\widetilde{X} \left( \overrightarrow{H}^{\,\ell} \right)^\top}
\qquad
H \leftarrow H \times \frac{\sum_\ell \left( W_{\cdot \cdot \ell} \right)^\top \overleftarrow{X}^{\,\ell}}{\sum_\ell \left( W_{\cdot \cdot \ell} \right)^\top \overleftarrow{\widetilde{X}}^{\,\ell}}
\]

where the operator \(\overrightarrow{(\cdot)}^{\,\ell}\) shifts a matrix to the right by ℓ timebins, i.e. a delay by ℓ timebins, and \(\overleftarrow{(\cdot)}^{\,\ell}\) shifts a matrix to the left by ℓ timebins (Table 1). Note that NMF is a special case of convNMF where L = 1.
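The shift operators and the convolutional reconstruction can be sketched as follows (a minimal sketch; the N × K × L array layout for W is our assumption):

```python
import numpy as np

def shift_right(H, l):
    """The right-shift (delay) operator: move H by l timebins,
    zero-padding on the left."""
    if l == 0:
        return H
    out = np.zeros_like(H)
    out[:, l:] = H[:, :-l]
    return out

def conv_reconstruct(W, H):
    """convNMF estimate: Xhat = sum over l of W[:, :, l] @ shift_right(H, l),
    where W is N x K x L and H is K x T."""
    return sum(W[:, :, l] @ shift_right(H, l) for l in range(W.shape[2]))
```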
Incorporating regularization terms
Suppose we want to regularize by adding a new term, R, to the cost function:

\[
\min_{W,H \geq 0} \left\lVert X - WH \right\rVert_F^2 + R
\]

Using a trick similar to that of Lee and Seung, we choose η_W and η_H to arrive at simple multiplicative updates. Below is the standard NMF case, which generalizes directly to the convNMF case.

Note that:

\[
\frac{\partial}{\partial W} \left( \left\lVert X - WH \right\rVert_F^2 + R \right) = 2\left( WHH^\top - XH^\top \right) + \frac{\partial R}{\partial W}
\qquad
\frac{\partial}{\partial H} \left( \left\lVert X - WH \right\rVert_F^2 + R \right) = 2\left( W^\top WH - W^\top X \right) + \frac{\partial R}{\partial H}
\]

We set:

\[
\eta_W = \frac{W}{2\, WHH^\top + \frac{\partial R}{\partial W}}
\qquad
\eta_H = \frac{H}{2\, W^\top WH + \frac{\partial R}{\partial H}}
\]

Thus, the gradient descent updates become multiplicative:

\[
W \leftarrow W \times \frac{2\, XH^\top}{2\, WHH^\top + \frac{\partial R}{\partial W}}
\qquad
H \leftarrow H \times \frac{2\, W^\top X}{2\, W^\top WH + \frac{\partial R}{\partial H}}
\]

where the division and × are element-wise.
This framework enables flexible incorporation of different types of regularization or penalty terms into the multiplicative NMF update algorithm, and extends naturally to the convolutional case. See Table 3 for examples of several regularization terms, including L1 sparsity [43, 50] and soft orthogonality [10], as well as the terms we introduce here to combat the types of inefficiencies and cross-correlations we identified in convolutional NMF, namely, smoothed orthogonality for H and W, and smoothed cross-factor orthogonality, the primary seqNMF regularization term. For the seqNMF regularization term,

\[
R = \lambda \left\lVert \left( W^\top \circledast X \right) S H^\top \right\rVert_{1, i \neq j}
\]

where S is the smoothing matrix defined in the Results section, the multiplicative update rules (absorbing constant factors into λ) are:

\[
W_{\cdot \cdot \ell} \leftarrow W_{\cdot \cdot \ell} \times \frac{X \left( \overrightarrow{H}^{\,\ell} \right)^\top}{\widetilde{X} \left( \overrightarrow{H}^{\,\ell} \right)^\top + \lambda\, \overleftarrow{X}^{\,\ell} S H^\top (1 - I)}
\]

\[
H \leftarrow H \times \frac{W^\top \circledast X}{W^\top \circledast \widetilde{X} + \lambda (1 - I) \left( W^\top \circledast X \right) S}
\]

where the division and × are element-wise. Note that multiplication with the K × K matrix (1 − I) effectively implements factor competition, because it places in the kth row a sum across all other factors.
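The competition effect of (1 − I) is easy to see in isolation (illustrative only; `C` here stands for a K × T matrix of smoothed factor-data correlations):

```python
import numpy as np

def competition_term(C):
    """Multiply a K x T matrix by (1 - I): row k of the result is the sum
    of all OTHER factors' rows, so each factor is penalized wherever its
    competitors overlap with the data."""
    K = C.shape[0]
    return (np.ones((K, K)) - np.eye(K)) @ C
```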
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].