Decomposition of retinal ganglion cell electrical images for cell type and functional inference

Identifying neuronal cell types and their biophysical properties based on their extracellular electrical features is a major challenge for experimental neuroscience and the development of high-resolution brain-machine interfaces. One example is identification of retinal ganglion cell (RGC) types and their visual response properties, which is fundamental for developing future electronic implants that can restore vision. The electrical image (EI) of a RGC, or the mean spatio-temporal voltage footprint of its recorded spikes on a high-density electrode array, contains substantial information about its anatomical, morphological, and functional properties. However, the analysis of these properties is complex because of the high-dimensional nature of the EI. We present a novel optimization-based algorithm to decompose the electrical image into a low-dimensional, biophysically-based representation: the temporally-shifted superposition of three learned basis waveforms corresponding to spike waveforms produced in the somatic, dendritic, and axonal cellular compartments. Large-scale multi-electrode recordings from the macaque retina were used to test the effectiveness of the decomposition. The decomposition accurately localized the somatic and dendritic compartments of the cell. The imputed dendritic fields of RGCs correctly predicted the location and shape of their visual receptive fields. The inferred waveform amplitudes and shapes accurately identified the four major primate RGC types (ON and OFF midget and parasol cells), a substantial advance. Together, these findings may contribute to more accurate inference of RGC types and their original light responses in the degenerated retina, with possible implications for other electrical imaging applications.

1 Fitting the EI decomposition

1.1 Definitions

• The matrix $X \in \mathbb{R}^{T \times N}$ represents the EI of a neuron. This matrix contains N observed voltage waveforms, one for each of the N recording electrodes. Each observed waveform has temporal length T.
• The decomposition is parameterized by the following variables:
  - $B \in \mathbb{R}^{C \times T}$, the basis waveforms matrix, consisting of C distinct basis waveforms, each with temporal length T.
  - $A \in \mathbb{R}^{C \times N}_{\ge 0}$, the learned amplitudes (weights) on each of the basis waveforms for every recording electrode. Every element in A is required to be non-negative, because the amplitudes are interpreted as the contributions of physical compartments of the neuron to the overall recorded signal.
  - $\tau \in \mathbb{Z}^{C \times N}$, the integer-valued time shifts applied to each basis waveform for every recording electrode.

1.2 Decomposition fitting algorithm
The EI decomposition is fitted by minimizing the mean square error between the recorded waveforms of the EI and the waveforms reconstructed from the decomposition parameters, with additional regularization terms to induce sparsity in the learned weights and to weakly constrain the compartment basis waveform shapes. Let $B^{(\tau_n)}$ denote temporal shifting of each respective waveform in B by the number of samples specified by column n of the matrix τ, $A_{:,n}$ denote column n of the matrix A, $X_{:,n}$ denote column n of the matrix X, and $B_{c,:}$ denote row c of the matrix B.
The full optimization problem is
$$\arg\min_{B,\; A \ge 0,\; \tau} \;\; \frac{1}{2}\sum_{n=1}^{N} \left\| B^{(\tau_n)} A_{:,n} - X_{:,n} \right\|_2^2 \;+\; \lambda_L \sum_{n=1}^{N} r(A_{:,n}) \;+\; \lambda_p \sum_{c=1}^{C} \frac{1}{2}\left(B_{c,:} - \mu_c\right) \Sigma_c^{-1} \left(B_{c,:} - \mu_c\right)^{\top} \tag{1}$$
The first term in the objective, $\frac{1}{2}\sum_{n=1}^{N} \| B^{(\tau_n)} A_{:,n} - X_{:,n} \|_2^2$, is a least squares error term, ensuring that the decomposition accurately represents the EI. The second term, $\lambda_L \sum_{n=1}^{N} r(A_{:,n})$, is a sparsity regularizer applied to the learned weights, with associated strength hyperparameter $\lambda_L$. $r(A_{:,n})$ is specifically chosen as an $L_{2,1}$ group-sparsity regularizer, with groups {soma, dendrite} and {axon}, to allow the somatic and dendritic components to appear simultaneously on the same electrode. The final term, $\lambda_p \sum_{c=1}^{C} \frac{1}{2}(B_{c,:} - \mu_c)\Sigma_c^{-1}(B_{c,:} - \mu_c)^{\top}$, is a Gaussian prior regularizer used to control the shapes of each of the learned basis waveforms, with associated strength hyperparameter $\lambda_p$, as well as hyperparameters $\mu_c$ and $\Sigma_c$ specifying the means and covariances for each basis waveform.
The overall optimization problem (1) is not jointly convex in the decomposition model parameters {B, A, τ}. This is both because τ is integer-valued, and because of the simultaneous estimation of B and A. Due to this non-convexity, the decomposition is fitted using a novel iterative algorithm for shifted semi-non-negative matrix factorization. The algorithm alternates between two steps: (1) an amplitude and time shift optimization step, holding the compartment basis waveforms fixed; and (2) a basis waveform shape fitting step, using the previously-inferred amplitudes and time shifts to learn the waveform shapes. Pseudocode for the overall fitting algorithm is provided in Algorithm 1, and mathematical details for each of the sub-steps are provided subsequently.
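For concreteness, the outer alternating loop can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the two sub-steps are passed in as callables, and the names `amplitude_time_step` and `waveform_shape_step` are hypothetical stand-ins for the procedures described below.

```python
import numpy as np

def fit_ei_decomposition(X, B_init, amplitude_time_step, waveform_shape_step,
                         n_outer=10):
    """Outer alternating loop of the fitting algorithm (cf. Algorithm 1).

    The two sub-steps are supplied as callables:
      amplitude_time_step(X, B)      -> (A, tau)   # Section 1.3
      waveform_shape_step(X, A, tau) -> B          # Section 1.4
    """
    B = np.asarray(B_init).copy()
    A = tau = None
    for _ in range(n_outer):
        # Step 1: fit amplitudes and time shifts, holding basis waveforms fixed
        A, tau = amplitude_time_step(X, B)
        # Step 2: refit basis waveform shapes using the inferred A and tau
        B = waveform_shape_step(X, A, tau)
    return B, A, tau
```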

1.3 Amplitudes and time shifts optimization
The amplitude and time shift optimization step holds the compartment basis waveforms B fixed, and uses a combination of search and convex minimization to simultaneously fit the amplitudes A and time shifts τ on every electrode. The overall objective for this step is constructed by discarding all terms that depend solely on B from the objective of problem (1), leaving the optimization problem
$$\arg\min_{A \ge 0,\; \tau} \;\; \frac{1}{2}\sum_{n=1}^{N} \left\| B^{(\tau_n)} A_{:,n} - X_{:,n} \right\|_2^2 + \lambda_L \sum_{n=1}^{N} r(A_{:,n}) \tag{2}$$
Though problem (2) is still not jointly convex in A and τ, its structure is considerably simpler than that of the full problem (1). In particular, problem (2) is separable by electrode and can be solved independently for each electrode. In addition, if the time shifts τ are held fixed, each single-electrode problem simplifies into a regularized non-negative linear least squares problem,
$$\arg\min_{A_{:,n} \ge 0} \;\; \frac{1}{2}\left\| B^{(\tau_n)} A_{:,n} - X_{:,n} \right\|_2^2 + \lambda_L\, r(A_{:,n}) \tag{3}$$
which is a constrained convex minimization problem. Leveraging this property, the amplitudes and time shifts optimization is performed using a coarse-to-fine search over candidate time shifts, solving problem (3) with convex minimization for each candidate time shift, and then selecting the combination of $\{A_{:,n}, \tau_n\}$ that minimizes the objective. Because the optimization is solved separately for each recording electrode, this approach avoids combinatorial explosion over the possible combinations of time shifts. Each convex minimization problem is solved using FISTA [1], an accelerated proximal gradient descent algorithm, using a modified form of the constrained optimization formulation of the $L_{2,1}$-regularized problem described in [2] to support the additional non-negativity constraint on the learned weights. A detailed description as well as a mathematical proof for this formulation is provided in Section 2.2. Pseudocode for the amplitudes and time shifts optimization is presented in Algorithm 2. Notably, the optimization problems within both the coarse and fine searches can be easily parallelized for improved runtime.
Algorithm 2 Amplitudes and time shifts coarse-to-fine search
1: Hyperparameters: $\lambda_L$
2: procedure Amplitude-Time-Search(X, B)
3:  for each electrode n do
4:   $A^{\star}_{:,n}, \tau^{\star}_n$ ← empty values
5:   for $\tau'_n$ ∈ coarse grid of possible time shifts do ▷ Coarse search over τ
6:    $A'_{:,n}$ ← arg min of problem (3) with time shifts $\tau'_n$
7:    if the objective improved then $A^{\star}_{:,n}, \tau^{\star}_n \leftarrow A'_{:,n}, \tau'_n$ end if
8:   end for
9:   repeat lines 5–8 over a fine grid centered at $\tau^{\star}_n$ ▷ Fine search over τ
10:  end for
11:  return $A^{\star}, \tau^{\star}$
12: end procedure
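As a concrete illustration of the per-electrode search, the following Python sketch scores candidate shifts with plain non-negative least squares (`scipy.optimize.nnls`). It simplifies the method in two labeled ways: the $L_{2,1}$ penalty of problem (3) is dropped, and a single shared shift is searched rather than per-compartment shifts.

```python
import numpy as np
from scipy.optimize import nnls

def amplitude_time_search_electrode(x_n, B, coarse_shifts, fine_radius=2):
    """Coarse-to-fine time shift search for one electrode (cf. Algorithm 2).

    x_n: (T,) observed EI waveform on electrode n; B: (C, T) basis waveforms.
    """
    def solve(shift):
        B_shift = np.roll(B, shift, axis=1)    # circularly shift the waveforms
        a, resid = nnls(B_shift.T, x_n)        # min_{a >= 0} ||B_shift^T a - x_n||_2
        return a, resid

    best = min(coarse_shifts, key=lambda s: solve(s)[1])             # coarse pass
    fine_grid = range(best - fine_radius, best + fine_radius + 1)
    best = min(fine_grid, key=lambda s: solve(s)[1])                 # fine pass
    return solve(best)[0], best
```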

1.4 Compartment basis waveform shape optimization
The compartment basis waveform shape optimization step holds the amplitudes A and time shifts τ fixed, and solves the convex quadratic minimization problem
$$\arg\min_{B} \;\; \frac{1}{2}\sum_{n=1}^{N} \left\| B^{(\tau_n)} A_{:,n} - X_{:,n} \right\|_2^2 + \lambda_p \sum_{c=1}^{C} \frac{1}{2}\left(B_{c,:} - \mu_c\right)\Sigma_c^{-1}\left(B_{c,:} - \mu_c\right)^{\top} \tag{4}$$
for the basis waveform shapes B. Construction of the matrix $B^{(\tau_n)}$ requires temporal shifting of the basis waveforms comprising B, complicating computer representations of problem (4). Because a circular shift of t samples in the time domain can be expressed simply in the Fourier domain as multiplication by the complex phase shift $e^{-i\omega t}$, the difficulty of representing time shifts is alleviated by solving the problem in the Fourier domain. Letting $\tilde B$ denote the discrete Fourier transform of the basis waveforms, $\tilde X$ the discrete Fourier transform of the observed EI, ⊙ Hadamard (element-wise) matrix multiplication, $P^{(\tau_n)}$ the complex-valued Fourier phase shift matrix corresponding to temporally shifting the compartment basis waveforms by the amounts specified in $\tau_n$, and G the stacked real-imaginary discrete Fourier transform analysis matrix (see Section 3.2 for definition), problem (4) is expressed in the Fourier domain as
$$\arg\min_{\tilde B} \;\; \frac{1}{2}\sum_{n=1}^{N} \left\| \left(\tilde B \odot P^{(\tau_n)}\right) A_{:,n} - \tilde X_{:,n} \right\|_2^2 + \lambda_p \sum_{c=1}^{C} \frac{1}{2}\left(z_c - G\mu_c\right)^{\top} \left(G \Sigma_c G^{\top}\right)^{-1} \left(z_c - G\mu_c\right) \tag{5}$$
where $z_c = G B_{c,:}^{\top}$ denotes the stacked real and imaginary Fourier coefficients of basis waveform c. This formulation is equivalent to the original time domain problem (4) by Parseval's relation.
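The phase shift identity underlying this reformulation is easy to check numerically. A minimal NumPy snippet (illustrative, not the authors' code) verifies that a circular shift in the time domain matches multiplication by $e^{-i\omega t}$ in the Fourier domain:

```python
import numpy as np

T, t = 64, 5
b = np.random.default_rng(0).standard_normal(T)

shifted_time = np.roll(b, t)                    # circular shift in the time domain
omega = 2 * np.pi * np.fft.rfftfreq(T)          # angular frequency of each DFT bin
shifted_freq = np.fft.irfft(np.fft.rfft(b) * np.exp(-1j * omega * t), n=T)

assert np.allclose(shifted_time, shifted_freq)  # the two routes agree
```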
As problem (5) corresponds to minimization of a convex quadratic, it is solved by constructing and solving a system of linear equations derived from the first-order conditions of optimality. A complete derivation of the coefficients of this linear system is provided in Section 3.
2 Non-negative $L_{2,1}$-regularized minimization

The EI decomposition fitting procedure solves problem (3) as a non-negative $L_{2,1}$-regularized linear least squares problem. Because the $L_{2,1}$ regularizer is not smooth, this problem cannot be solved directly with gradient descent. Following [2], we reformulate the problem into a different constrained convex minimization problem that can be easily solved using proximal gradient descent. The derivation of the proximal operator for the reformulated problem builds heavily on the original derivation of the unconstrained $L_{2,1}$-regularized proximal operator performed in [2], and so we repeat that derivation first.

2.1 Unconstrained $L_{2,1}$-regularized minimization
The derivation of the proximal algorithm solving the unconstrained $L_{2,1}$-regularized convex minimization problem (i.e. without the non-negativity constraint) is taken from [2]. The problem has the form
$$\arg\min_{W} \;\; L(W) + \lambda \left\|W\right\|_{2,1} \tag{6}$$
where L(W) is a smooth convex loss function in the variable W. This problem cannot be solved using vanilla gradient descent, because the $L_{2,1}$ penalty term $\|W\|_{2,1}$ is not differentiable everywhere (e.g. at the origin). Reference [2] replaces this unconstrained non-smooth problem with the constrained smooth optimization problem
$$\min_{W,\, t} \;\; L(W) + \lambda\sum_{i=1}^{G} t_i \quad \text{s.t.} \quad (W_i, t_i) \in D \;\; \forall i \in \{1, \dots, G\} \tag{7}$$
where $D \triangleq \{(w, t) : \|w\|_2 \le t\}$ is the ice-cream (second-order) cone and G is the number of groups in the $L_{2,1}$ group-sparsity regularizer. By inspection, the full feasible set $D^G$ is convex, since it is the Cartesian product of ice-cream cones. The objective in problem (7) is smooth and convex, and thus problem (7) is a constrained smooth convex minimization problem. This problem can be solved using proximal gradient descent or one of its accelerated variants (e.g. FISTA [1]), provided that projection onto the convex set $D^G$ can be performed efficiently. This projection problem, projecting an arbitrary (U, v) onto $D^G$, is expressed as
$$\pi_{D^G}(U, v) \triangleq \arg\min_{(U', v')\,:\,(u'_i, v'_i) \in D\ \forall i} \;\; \frac{1}{2}\left\|U' - U\right\|_F^2 + \frac{1}{2}\left\|v' - v\right\|_2^2 \tag{8}$$
Equation (8) can be separated by group, resulting in
$$\sum_{i=1}^{G} \left[ \frac{1}{2}\left\|u'_i - u_i\right\|_2^2 + \frac{1}{2}\left(v'_i - v_i\right)^2 \right] \tag{9}$$
Assuming that none of the variables in W are shared between groups, problem (9) is solved by separately solving each of the one-group problems
$$\pi_D(w_i, t_i) \triangleq \arg\min_{(u_i, v_i) \in D} \;\; \frac{1}{2}\left\|u_i - w_i\right\|_2^2 + \frac{1}{2}\left(v_i - t_i\right)^2 \tag{10}$$
Consider the case where $w_i$ is one-dimensional. A drawing of the corresponding two-dimensional $(w_i, t_i)$ coordinate space illustrating the three possible cases for projection is provided in Figure 1. The first case, $|w_i| > |t_i|$, corresponds to the gray shaded region of Figure 1. This is outside the feasible region, and the projection corresponds to the orthogonal projection onto the edge of the ice-cream cone. The second case, $|w_i| \le t_i$, corresponds to the green shaded region of Figure 1. This is already feasible, so nothing needs to be done. The third case, $|w_i| \le -t_i$, corresponds to the blue shaded region of Figure 1. This is outside the feasible region, and the solution is to project onto the origin. Formally, these rules can be expressed as
$$u_i^{\star} = \begin{cases} w_i & |w_i| \le t_i \\ 0 & |w_i| \le -t_i \\ \frac{|w_i| + t_i}{2}\,\mathrm{sign}(w_i) & \text{otherwise} \end{cases} \qquad v_i^{\star} = \begin{cases} t_i & |w_i| \le t_i \\ 0 & |w_i| \le -t_i \\ \frac{|w_i| + t_i}{2} & \text{otherwise} \end{cases}$$
Because the cone is rotationally symmetric about the t axis, the solution to the one-dimensional projection problem can be generalized to N dimensions as
$$u_i^{\star} = \begin{cases} w_i & \|w_i\|_2 \le t_i \\ 0 & \|w_i\|_2 \le -t_i \\ \frac{\|w_i\|_2 + t_i}{2}\,\frac{w_i}{\|w_i\|_2} & \text{otherwise} \end{cases} \tag{11}$$
$$v_i^{\star} = \begin{cases} t_i & \|w_i\|_2 \le t_i \\ 0 & \|w_i\|_2 \le -t_i \\ \frac{\|w_i\|_2 + t_i}{2} & \text{otherwise} \end{cases} \tag{12}$$

Claim. Equations (11) and (12) solve the optimization problem (10).
Proof. The proof is performed by verifying that (11) and (12) satisfy the following condition of optimality for constrained smooth convex minimization problems: for the convex minimization problem $\arg\min_{x \in F}\{L(x)\}$ with smooth convex objective L(x) and convex feasible set F, the point $x^{\star} \in F$ is optimal if and only if $\langle x - x^{\star},\, L'(x^{\star}) \rangle \ge 0$ for all $x \in F$. The gradient of the objective in (10) with respect to $(u_i, v_i)$ is $(u_i - w_i,\; v_i - t_i)$. Consider each of the three possible cases. In case (2), $\|w_i\|_2 \le t_i$, the candidate solution $(u_i^{\star}, v_i^{\star}) = (w_i, t_i)$ is itself feasible and the gradient vanishes there, so the optimality condition holds trivially. In case (1), $\|w_i\|_2 > |t_i|$, the candidate solution is $u_i^{\star} = \frac{\|w_i\|_2 + t_i}{2}\frac{w_i}{\|w_i\|_2}$ and $v_i^{\star} = \frac{\|w_i\|_2 + t_i}{2}$, so the gradient at $(u_i^{\star}, v_i^{\star})$ is $\left(\frac{t_i - \|w_i\|_2}{2}\frac{w_i}{\|w_i\|_2},\; \frac{\|w_i\|_2 - t_i}{2}\right)$. Plugging this into the left hand side of the optimality condition for an arbitrary feasible $(u_i, v_i) \in D$, and using $\langle u_i^{\star}, w_i \rangle / \|w_i\|_2 = v_i^{\star}$, we get
$$\frac{\|w_i\|_2 - t_i}{2}\left(v_i - \frac{\langle u_i, w_i\rangle}{\|w_i\|_2}\right) \;\ge\; \frac{\|w_i\|_2 - t_i}{2}\left(v_i - \|u_i\|_2\right) \;\ge\; 0$$
where the first inequality is due to Cauchy-Schwarz, and the second inequality is due to $\|u_i\|_2 \le v_i$, which results from the constraint $(u_i, v_i) \in D$. Thus case (1) of (11) and (12) is proven.
In case (3), $\|w_i\|_2 \le -t_i$, the candidate solution is $(u_i^{\star}, v_i^{\star}) = (0, 0)$, and the gradient there is $(-w_i, -t_i)$. For an arbitrary feasible $(u_i, v_i) \in D$,
$$\langle (u_i, v_i) - (0, 0),\; (-w_i, -t_i) \rangle = -\langle u_i, w_i \rangle - v_i t_i \;\ge\; -\|u_i\|_2\|w_i\|_2 - v_i t_i \;\ge\; v_i t_i - v_i t_i = 0$$
where the first inequality is due to the Cauchy-Schwarz inequality, and the second inequality comes from the conditions $\|w_i\|_2 \le -t_i$ and $t_i \le 0$. Thus case (3) of equations (11) and (12) is proven.
Thus the claim is proven.
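Equations (11) and (12) translate directly into a few lines of Python; the following is an illustrative implementation, not the authors' code:

```python
import numpy as np

def project_cone(w, t):
    """Project (w, t) onto the ice-cream cone D = {(u, v): ||u||_2 <= v},
    implementing equations (11) and (12)."""
    norm_w = np.linalg.norm(w)
    if norm_w <= t:                       # case (2): already feasible
        return w, t
    if norm_w <= -t:                      # case (3): project onto the origin
        return np.zeros_like(w), 0.0
    scale = (norm_w + t) / 2.0            # case (1): project onto the boundary
    return scale * w / norm_w, scale
```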
The overall FISTA algorithm solving the unconstrained $L_{2,1}$-regularized problem as a constrained convex minimization problem is given in Algorithm 3. Note that $f_{\gamma, W_i, t_i}(W_{i+1}, t_{i+1})$ in the backtracking line search is defined as the quadratic majorization of the smooth objective at $(W_i, t_i)$,
$$f_{\gamma, W_i, t_i}(W_{i+1}, t_{i+1}) \triangleq L(W_i) + \left\langle \nabla L(W_i),\, W_{i+1} - W_i \right\rangle + \frac{1}{2\gamma}\left\|W_{i+1} - W_i\right\|_F^2 + \frac{1}{2\gamma}\left\|t_{i+1} - t_i\right\|_2^2 + \lambda\sum_{g=1}^{G}(t_{i+1})_g \tag{13}$$
and $\pi_{D^G}$ corresponds to the projection operator from equation (8).
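A minimal sketch of such a FISTA loop is given below, assuming a fixed step size in place of the backtracking line search and treating the rows of W as the groups; `project_cone` is the function defined above, and all names are illustrative rather than the authors' API.

```python
import numpy as np

def fista_l21(grad_L, W0, t0, lam, step, n_iters=200):
    """Minimal FISTA sketch for the reformulated problem (7):
    minimize L(W) + lam * sum(t) subject to (W_g, t_g) in D for every group g.
    grad_L returns the gradient of the smooth loss L at W."""
    W, t = W0.copy(), t0.copy()
    Wy, ty = W.copy(), t.copy()          # extrapolated (momentum) point
    theta = 1.0
    for _ in range(n_iters):
        # gradient step on the smooth objective L(W) + lam * sum(t)
        W_new = Wy - step * grad_L(Wy)
        t_new = ty - step * lam * np.ones_like(ty)
        # project each group onto the ice-cream cone D
        for g in range(W_new.shape[0]):
            W_new[g], t_new[g] = project_cone(W_new[g], t_new[g])
        # Nesterov momentum update
        theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta**2)) / 2.0
        Wy = W_new + ((theta - 1.0) / theta_next) * (W_new - W)
        ty = t_new + ((theta - 1.0) / theta_next) * (t_new - t)
        W, t, theta = W_new, t_new, theta_next
    return W, t
```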

2.2 $L_{2,1}$-regularized minimization with non-negativity constraint
This derivation is performed by the authors.
For the EI decomposition, the learned compartment amplitudes are required to be non-negative, corresponding to the constrained minimization problem
$$\arg\min_{W \ge 0} \;\; L(W) + \lambda\left\|W\right\|_{2,1} \tag{14}$$
where $W \ge 0$ constrains all variables in W to be non-negative, L(W) is a smooth convex function of W, and the $L_{2,1}$ penalty $\|W\|_{2,1}$ is not smooth. Like before, problem (14) is reformulated as a smooth constrained optimization problem
$$\arg\min_{(W, t) \in C} \;\; L(W) + \lambda\sum_{i=1}^{G} t_i \tag{15}$$
where C is the set of $(W, t)$ satisfying $\|W_i\|_2 \le t_i$ and $W_i \ge 0$ for all $i \in \{1, \dots, G\}$, and G is the number of groups in the group-sparsity regularizer. C is a convex set, since it is the Cartesian product of intersections of convex sets (in particular, an ice-cream cone and the non-negative orthant), which must themselves be convex. The projection onto the convex set C is given by the optimization problem
$$\pi_C(W, t) \triangleq \arg\min_{(U, v) \in C} \;\; \frac{1}{2}\left\|U - W\right\|_F^2 + \frac{1}{2}\left\|v - t\right\|_2^2 \tag{16}$$
As before, the objective in (16) is separated by group, yielding
$$\sum_{i=1}^{G} \left[ \frac{1}{2}\left\|u_i - w_i\right\|_2^2 + \frac{1}{2}\left(v_i - t_i\right)^2 \right] \tag{17}$$
Assuming that no variables are shared between groups, problem (17) can be solved separately for each group, yielding
$$\arg\min_{(u_i, v_i)\,:\,\|u_i\|_2 \le v_i,\; u_i \ge 0} \;\; \frac{1}{2}\left\|u_i - w_i\right\|_2^2 + \frac{1}{2}\left(v_i - t_i\right)^2 \tag{18}$$
Consider the case where $w_i$ is one-dimensional. A drawing of the two-dimensional $(w_i, t_i)$ coordinate space is provided in Figure 2 to illustrate the four possible cases for projection:
1. $w_i \ge |t_i|$, corresponding to the gray shaded region of Figure 2. In this case, the solution is to orthogonally project $(w_i, t_i)$ onto the cone boundary line $w_i = t_i$.
2. $0 \le w_i \le t_i$, corresponding to the green shaded region of Figure 2. In this case, the point $(w_i, t_i)$ is already feasible, and so nothing needs to be done.
3. $t_i < 0$, $w_i < -t_i$, corresponding to the blue shaded region of Figure 2. In this case, the projection is performed by mapping $(w_i, t_i)$ onto the origin.
4. $w_i < 0$, $t_i > 0$, corresponding to the red shaded region of Figure 2. In this case, the projection is performed by mapping $(w_i, t_i)$ to $(0, t_i)$.
In mathematical terms, this corresponds to
$$u_i^{\star} = \begin{cases} \frac{w_i + t_i}{2} & w_i \ge |t_i| \\ w_i & 0 \le w_i \le t_i \\ 0 & t_i < 0,\; w_i < -t_i \\ 0 & w_i < 0,\; t_i > 0 \end{cases} \tag{19}$$
and
$$v_i^{\star} = \begin{cases} \frac{w_i + t_i}{2} & w_i \ge |t_i| \\ t_i & 0 \le w_i \le t_i \\ 0 & t_i < 0,\; w_i < -t_i \\ t_i & w_i < 0,\; t_i > 0 \end{cases} \tag{20}$$
Unlike the unconstrained $L_{2,1}$ case, the projection is no longer rotationally symmetric about the $t_i$-axis because of the non-negativity constraint. Thus, generalizing the above to the N-dimensional case requires accounting for the possibility that some but not all of the components of $w_i$ lie outside the non-negative orthant.
The situation where some but not all of the components of $w_i$ lie outside the non-negative orthant is now examined in greater detail. Define $J \triangleq \{j : (w_i)_j < 0\}$, the index set of the negative-valued components of $w_i$. The single-group objective function from (18) is then split into
$$\frac{1}{2}\sum_{j \in J} \left[(u_i)_j - (w_i)_j\right]^2 + \frac{1}{2}\sum_{j \notin J} \left[(u_i)_j - (w_i)_j\right]^2 + \frac{1}{2}\left(v_i - t_i\right)^2 \tag{21}$$
First, consider only the variables $\{(u_i)_j : j \in J\}$ (corresponding to the negative-valued components of $w_i$ only). These variables are included only in the first term of (21). No matter the value of $t_i$, the optimal value for each of these variables is exactly 0, corresponding to the lower $w_i < 0$ half of Figure 2. Intuitively, this is because all nonzero feasible values of $(u_i)_j$ are positive, and thus would only increase the value of $[(u_i)_j - (w_i)_j]^2$. Then, consider the remaining variables in the problem, $v_i$ and $\{(u_i)_j : j \notin J\}$. These variables lie in the upper $w_i \ge 0$ half of Figure 2, and thus, for these particular variables, the projection operator $\pi_D(w, t)$ previously developed in Section 2.1 for the unconstrained regularized problem can be applied to solve for $v_i$ and $\{(u_i)_j : j \notin J\}$. Letting max denote the element-wise maximum operator, the projection operator for the non-negative regularized problem can be expressed concisely as
$$\pi_C(w_i, t_i) = \pi_D\left(\max(w_i, 0),\; t_i\right) \tag{22}$$
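Equation (22) translates into a one-line wrapper around the projection from Section 2.1. A small illustrative sketch, reusing the `project_cone` function defined above:

```python
import numpy as np

def project_nonneg_cone(w, t):
    """Projection onto C = {(u, v): u >= 0, ||u||_2 <= v} via equation (22):
    clamp the negative components of w to zero, then apply the ordinary
    ice-cream-cone projection pi_D (project_cone, defined earlier)."""
    return project_cone(np.maximum(w, 0.0), t)

# Example: w = (-5, 1), t = 0 clamps to (0, 1), then projects onto the cone
# boundary, giving u = (0, 0.5), v = 0.5.
u, v = project_nonneg_cone(np.array([-5.0, 1.0]), 0.0)
```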

The overall FISTA algorithm solving the non-negative $L_{2,1}$-regularized problem is given in Algorithm 4. Like before, $f_{\gamma, W_i, t_i}(W_{i+1}, t_{i+1})$ in the backtracking line search is defined as in equation (13); the only change relative to Algorithm 3 is that the projection step uses $\pi_C$ from equation (22) in place of $\pi_{D^G}$.
3 Solving waveform shapes linear system in Fourier domain

This section describes the construction of the linear system of equations associated with the first-order conditions of optimality for the waveform shapes fitting step. This optimization is performed in the Fourier domain to simplify accounting for the basis waveform temporal shifts, and thus the coefficients of the linear system are found by differentiating the mean square error loss function with respect to the real and imaginary components of the basis waveforms.
The construction of the linear system is rather tedious; for simplicity, we first find the linear coefficients for the unregularized waveform shapes problem (without the Gaussian shape prior), and then derive the modifications needed to solve the regularized waveform shapes problem (with the Gaussian shape prior).

Solving for basis waveforms without the Gaussian shape prior
Let $\tilde B$ denote the discrete Fourier transform (DFT) of the basis waveforms B, $\tilde X_{:,l}$ the DFT of the $l$th observed data waveform, $\tilde P^{(l)}$ the complex-valued Fourier phase shift matrix corresponding to the previously-estimated time shifts for the $l$th data waveform, $A_{:,l}$ the previously-estimated amplitudes for the $l$th data waveform, and ⊙ Hadamard element-wise matrix multiplication. The least squares optimization problem in the frequency domain over all observed data waveforms for a given cell is
$$\arg\min_{\tilde B} \;\; \sum_{l=1}^{N} \frac{1}{2}\left\| \left(\tilde B \odot \tilde P^{(l)}\right) A_{:,l} - \tilde X_{:,l} \right\|_2^2$$
This is an unconstrained convex minimization problem, and has first-order condition of optimality
$$\nabla_{\tilde B}\, \sum_{l=1}^{N} \frac{1}{2}\left\| \left(\tilde B \odot \tilde P^{(l)}\right) A_{:,l} - \tilde X_{:,l} \right\|_2^2 = 0$$
Interchanging summation and the gradient operator leaves
$$\sum_{l=1}^{N} \nabla_{\tilde B}\, \frac{1}{2}\left\| \left(\tilde B \odot \tilde P^{(l)}\right) A_{:,l} - \tilde X_{:,l} \right\|_2^2 = 0$$
Let the subscript Re correspond to the real component of a complex number (e.g. $\tilde y_{\mathrm{Re}} = \mathrm{Re}\{\tilde y\}$), and let the subscript Im correspond to the imaginary component (e.g. $\tilde y_{\mathrm{Im}} = \mathrm{Im}\{\tilde y\}$). Furthermore, define $L_l \triangleq \frac{1}{2}\|(\tilde B \odot \tilde P^{(l)}) A_{:,l} - \tilde X_{:,l}\|_2^2$ as the contribution of data waveform l to the overall mean square error. Taking the derivative of $L_l$ with respect to $(\tilde B_{\mathrm{Re}})_{f,k}$, the real component of the $f$th Fourier coefficient of the $k$th basis waveform, results in
$$\frac{\partial L_l}{\partial (\tilde B_{\mathrm{Re}})_{f,k}} = A_{k,l}\, \mathrm{Re}\left\{ \overline{\tilde P^{(l)}_{k,f}} \left[ \left(\tilde B \odot \tilde P^{(l)}\right) A_{:,l} - \tilde X_{:,l} \right]_f \right\} \tag{32}$$
Similarly, taking the derivative of $L_l$ with respect to $(\tilde B_{\mathrm{Im}})_{f,k}$, the imaginary component of the $f$th Fourier coefficient of the $k$th basis waveform, results in
$$\frac{\partial L_l}{\partial (\tilde B_{\mathrm{Im}})_{f,k}} = A_{k,l}\, \mathrm{Im}\left\{ \overline{\tilde P^{(l)}_{k,f}} \left[ \left(\tilde B \odot \tilde P^{(l)}\right) A_{:,l} - \tilde X_{:,l} \right]_f \right\} \tag{33}$$

Solving for basis waveforms with the Gaussian shape prior

The Gaussian shape prior penalizes deviations of each basis waveform from a prior mean shape. In the time domain, the prior penalty for basis waveform j is
$$P_j \triangleq \frac{1}{2}\left(b_j - \mu_j\right)^{\top} \Sigma_j^{-1} \left(b_j - \mu_j\right)$$
where $\mu_j$ is the prior mean waveform of the specific compartment (e.g. a typical somatic waveform), and $\Sigma_j$ is a temporal covariance matrix. The entries of the temporal covariance matrix can be specified using a 1D Gaussian process kernel, for example, an exponentiated quadratic kernel $k(t_1, t_2) = \sigma^2 \exp\left(-\frac{(t_1 - t_2)^2}{2\ell^2}\right)$. Like in Section 3.2.1, the waveform shape optimization with the Gaussian prior is performed in the Fourier domain, and thus the prior regularization penalty must be translated to the Fourier domain. Define F as the DFT analysis matrix for computing the one-dimensional Fourier transform of real-valued input. F is a complex-valued $(\lfloor T/2 \rfloor + 1) \times T$ matrix, where T is the number of samples in the time domain for the basis waveforms. We further define $F_{\mathrm{Re}} \triangleq \mathrm{Re}\{F\}$ and $F_{\mathrm{Im}} \triangleq \mathrm{Im}\{F\}$ as the real and imaginary components of the matrix F, respectively, with the identically-zero rows of $F_{\mathrm{Im}}$ removed. $F_{\mathrm{Re}}$ is a real-valued $(\lfloor T/2 \rfloor + 1) \times T$ matrix, and $F_{\mathrm{Im}}$ is a real-valued $\lfloor (T-1)/2 \rfloor \times T$ matrix. Note that splitting F into real and imaginary components is only useful in the present case because the signal b is real-valued, since otherwise separating the real and imaginary components of the Fourier transform would not be linear operations.
Because the basis waveforms b are real-valued, the real component of the basis waveform Fourier transform is $\tilde b_{\mathrm{Re}} = F_{\mathrm{Re}}\, b$, and the imaginary component is $\tilde b_{\mathrm{Im}} = F_{\mathrm{Im}}\, b$. Define the real-valued vector $z \triangleq [\tilde b_{\mathrm{Re}}^{\top}\ \ \tilde b_{\mathrm{Im}}^{\top}]^{\top}$ and the stacked real-imaginary DFT analysis matrix $G \triangleq [F_{\mathrm{Re}}^{\top}\ \ F_{\mathrm{Im}}^{\top}]^{\top}$, so that $z = Gb$. Because $z = Gb$ is a linear transformation of the Gaussian random variable b, z is also a jointly Gaussian random variable, and is distributed according to
$$z \sim \mathcal{N}\left(G\mu,\; G \Sigma G^{\top}\right)$$
which has pdf
$$p(z) \propto \exp\left( -\frac{1}{2}\left(z - G\mu\right)^{\top} \left(G \Sigma G^{\top}\right)^{-1} \left(z - G\mu\right) \right)$$
The Gaussian prior penalty term associated with basis waveform j, $P_j$, is then rewritten as
$$P_j = \frac{1}{2}\left(z_j - G\mu_j\right)^{\top} \left(G \Sigma_j G^{\top}\right)^{-1} \left(z_j - G\mu_j\right)$$
and therefore the overall waveform shapes objective function, consisting of the sum of the mean square error terms and the Gaussian prior penalty terms, is
$$\sum_{l=1}^{N} \frac{1}{2}\left\| \left(\tilde B \odot \tilde P^{(l)}\right) A_{:,l} - \tilde X_{:,l} \right\|_2^2 + \lambda_p \sum_{j=1}^{C} \frac{1}{2}\left(z_j - G\mu_j\right)^{\top} \left(G \Sigma_j G^{\top}\right)^{-1} \left(z_j - G\mu_j\right)$$
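The structure of G is easy to verify numerically. The following NumPy snippet (an illustration, not the authors' code) builds F by transforming the identity matrix, drops the identically-zero rows of $F_{\mathrm{Im}}$, and checks that G is a square, full-rank T × T matrix:

```python
import numpy as np

T = 16
F = np.fft.rfft(np.eye(T), axis=0)           # DFT analysis matrix: F @ b == rfft(b)
F_re = F.real                                 # (T//2 + 1) x T real-part rows
F_im = F.imag[1:(T + 1) // 2]                 # drop identically-zero DC/Nyquist rows
G = np.vstack([F_re, F_im])                   # stacked analysis matrix: z = G @ b

b = np.random.default_rng(1).standard_normal(T)
assert np.allclose((G @ b)[: T // 2 + 1], np.fft.rfft(b).real)
assert np.allclose((G @ b)[T // 2 + 1 :], np.fft.rfft(b).imag[1:(T + 1) // 2])
assert G.shape == (T, T) and np.linalg.matrix_rank(G) == T   # G is invertible
```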

Constructing linear equations from the first-order conditions
The linear system of equations from Section 3.2.1 must be updated to include the Gaussian prior penalty. Specifically, the contributions from the gradient of the Gaussian prior penalty must be incorporated into the first-order conditions of optimality. The gradient of the Gaussian prior penalty with respect to the basis waveform Fourier coefficients $z_k$ is
$$\nabla_{z_k}\left(\lambda_p P_k\right) = \lambda_p \left(G \Sigma_k G^{\top}\right)^{-1} \left(z_k - G\mu_k\right) \tag{44}$$
A notational difference regarding the indexing of the basis waveform Fourier coefficients must be resolved in order to combine the gradients in equation (44) with the first-order condition equations (32) and (33). Specifically, $(\tilde B_{\mathrm{Re}})_{f,k} = \{z_k\}_{(\mathrm{Re}, f)}$ and $(\tilde B_{\mathrm{Im}})_{f,k} = \{z_k\}_{(\mathrm{Im}, f)}$, where the subscripts (Re, f) and (Im, f) represent the indices of the entries in $z_k$ corresponding to the real and imaginary components of the Fourier coefficient at frequency f, respectively. Similarly, let $\{(G\Sigma_k G^{\top})^{-1}\}_{(\mathrm{Re}, f), (\mathrm{Im}, g)}$ denote the entry at the (Re, f) row and (Im, g) column of the matrix $(G\Sigma_k G^{\top})^{-1}$, i.e. the entry at the $m$th row and $n$th column, where m is the index of the entry in $z_k$ corresponding to the real component of the Fourier coefficient at frequency f, and n is the index of the entry in $z_k$ corresponding to the imaginary component of the Fourier coefficient at frequency g.
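In effect, the prior gradient (44) adds $\lambda_p (G\Sigma_k G^{\top})^{-1}$ to the coefficients of the linear system and $\lambda_p (G\Sigma_k G^{\top})^{-1} G\mu_k$ to its right-hand side. A generic sketch of this modification follows, where Q and c are hypothetical placeholders for the quadratic and linear data-term coefficients assembled from equations (32) and (33):

```python
import numpy as np

def solve_with_gaussian_prior(Q, c, S_inv, G_mu, lam_p):
    """Stationarity of 0.5 * z^T Q z - c^T z
    + 0.5 * lam_p * (z - G_mu)^T S_inv (z - G_mu)
    gives the linear system (Q + lam_p * S_inv) z = c + lam_p * S_inv @ G_mu.
    Here S_inv stands for (G Sigma_k G^T)^{-1}; all names are illustrative."""
    return np.linalg.solve(Q + lam_p * S_inv, c + lam_p * S_inv @ G_mu)
```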

Figure 1: Projection onto the 1D ice-cream cone. There are three possible cases: (1) $|w_i| > |t_i|$, corresponding to the gray shaded region; (2) $|w_i| \le t_i$, corresponding to the green shaded feasible region; (3) $|w_i| \le -t_i$, corresponding to the blue shaded region. In cases (1) and (3), $(w_i, t_i)$ lies outside the feasible region, and the associated dotted red lines and points correspond to the minimum norm projections onto the feasible set.

Figure 2: Projection onto the 1D non-negative subset of the ice-cream cone. There are now four cases: (1) $w_i \ge |t_i|$, corresponding to the gray shaded region; (2) $0 \le w_i \le t_i$, corresponding to the green shaded region; (3) $t_i < 0$, $w_i < -t_i$, corresponding to the blue shaded region; and (4) $w_i < 0$, $t_i > 0$, corresponding to the red shaded region. In cases (1), (3), and (4), $(w_i, t_i)$ lies outside the feasible region, and the associated red dotted lines and points correspond to the minimum norm projections onto the feasible set.