Omitted Variable Bias in GLMs of Neural Spiking Activity

Generalized linear models (GLMs) have a wide range of applications in systems neuroscience describing the encoding of stimulus and behavioral variables, as well as the dynamics of single neurons. However, in any given experiment, many variables that have an impact on neural activity are not observed or not modeled. Here we demonstrate, in both theory and practice, how these omitted variables can result in biased parameter estimates for the effects that are included. In three case studies, we estimate tuning functions for common experiments in motor cortex, hippocampus, and visual cortex. We find that including traditionally omitted variables changes estimates of the original parameters and that modulation originally attributed to one variable is reduced after new variables are included. In GLMs describing single-neuron dynamics, we then demonstrate how postspike history effects can also be biased by omitted variables. Here we find that omitted variable bias can lead to mistaken conclusions about the stability of single-neuron firing. Omitted variable bias can appear in any model with confounders—where omitted variables modulate neural activity and the effects of the omitted variables covary with the included effects. Understanding how and to what extent omitted variable bias affects parameter estimates is likely to be important for interpreting the parameters and predictions of many neural encoding models.


Regression models have been widely used in systems neuroscience to explain how external stimulus and task variables, as well as internal state variables, may relate to observed neural activity (Brown et al., 2003; Kass et al., 2005). However, in many cases, the full set of variables that explain the activity of the observed neurons is not observed or is not even known. It is important to recognize that, in these cases, omitted variables can cause the parameter estimates for the effects that are included in a regression model to be biased. That is, parameter estimates for the modeled effects would be different if other, omitted variables were to be included in the model (Box, 1966). In experiments from behaving animals (Niell and Stryker, 2010), there is growing evidence that neural activity is affected by many more variables than are typically considered relevant (Kandler et al., 2017; Stringer et al., 2018). At the same time, although it has long been a concern in statistics (Pearl, 2009) and has received some attention in other fields (Greenland, 1989; Clarke, 2005), omitted variable bias, as a general problem, appears underappreciated in systems neuroscience. Here we demonstrate why systematically considering omitted variable bias may be important in neural data analysis and examine how omitted variable bias can affect one popular framework for describing neural spiking activity: the generalized linear model (GLM) with Poisson observations.

In general, regression methods aim to estimate variations in a response variable as a function of other variables or covariates. When the goal of modeling is to maximize prediction accuracy, such as with brain-machine interfaces, interpreting the model parameters may not be a high priority. However, in many other cases, parameter estimates are, at least to some extent, interpreted and analyzed.
For instance, tuning curves or receptive fields may be measured and compared under different stimulus or task conditions or before and after a manipulation. In fully controlled experiments where the covariates are assigned at random, estimated coefficients can often be interpreted as estimates of causal effects (Gelman and Hill, 2007). However, for many cases in neuroscience, it may be difficult or impossible to completely control or randomize all the relevant variables.

In modeling neural activity, omitted variable bias can appear in any situation where neurons are modulated by omitted variables and the omitted variables (often called confounders) are not independent from the variables included in the model, that is, the ones whose effects we are trying to estimate. Minimizing the influence of confounding variables is a major part of most experimental design (Rust and Movshon, 2005), and the statistical effects of confounding variables are well understood (Wasserman, 2004). However, when the goal of modeling is description or explanation (Shmueli, 2010), the effects of these omitted variables are frequently neglected. To give a concrete example, imagine an idealized neuron in primary motor cortex (M1) whose firing, unlike typical M1 neurons (Georgopoulos et al., 1982), is not at all modulated by reach direction but, instead, is modulated by reach speed (Fig 1). In a typical experimental setting, an animal's reach directions are randomized, but reach speed cannot be randomized or tightly controlled. If the average speed differs across reach directions, such a hypothetical neuron will appear to be tuned to reach direction, despite not being directly affected by direction. First, fitting a typical tuning curve for reach direction, we would infer that such a neuron has a clear preferred direction and non-zero modulation depth.
On the other hand, if we then fit a second model that included both reach direction and speed, we would infer that the neuron is modulated by speed alone, and it would be apparent that the original preferred direction and modulation depth estimates were biased due to the omitted variable.

In adding additional variables, previous studies have largely focused on the fact that including previously omitted variables improves model accuracy or the fact that neural activity is often influenced by a host of task variables. In M1, for instance, including speed improves model accuracy (Moran and Schwartz, 1999), but the presence of many correlated motor variables (e.g. kinematics, end-point forces, muscle activity) makes it difficult to interpret how neurons represent movement overall (Humphrey et al., 1970; Omrani et al., 2017). Here, instead of focusing on the advantages or complexities of models with many variables, we focus on the well-known, but under-discussed, fact that the parameters describing the original effects change as additional variables are included. The hypothetical M1 neuron above points to a more general question about regression models of neural activity: what happens when we cannot or do not include variables that are relevant to the process that we are modeling?

Here we first evaluate the statistical problem of omitted variable bias in the canonical generalized linear model with Poisson observations. Then, as a case study, we examine how speed affects estimates of direction tuning of neurons in primary motor cortex, as well as two other case studies where the spike counts are modeled as a function of external variables: orientation tuning in primary visual cortex (V1) and place tuning in the hippocampus (HC).
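The hypothetical speed-tuned M1 neuron described above can be sketched in a short simulation. All parameter values here are illustrative assumptions, not the recorded data analyzed below; the point is only that a neuron driven solely by speed shows apparent direction tuning whenever average speed covaries with direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical M1 neuron (illustrative assumption): firing depends ONLY on
# reach speed, never on reach direction.
n_trials = 20000
direction = rng.integers(0, 8, n_trials) * (2 * np.pi / 8)   # 8 randomized targets
# Average speed differs systematically across reach directions.
mean_speed = 10 + 3 * np.cos(direction - np.pi / 4)
speed = rng.gamma(shape=20, scale=mean_speed / 20)

# Speed-only rate model: lambda = exp(b0 + b_speed * log(speed))
rate = np.exp(0.5 + 0.8 * np.log(speed))
counts = rng.poisson(rate)

# "Tuning curve" for direction: mean spike count per target.
targets = np.unique(direction)
tuning = np.array([counts[direction == d].mean() for d in targets])

# The neuron appears direction-tuned (non-zero modulation depth), purely
# because speed covaries with direction.
modulation = (tuning.max() - tuning.min()) / tuning.mean()
print(tuning.round(2), round(modulation, 2))
```

Fitting a direction-only tuning curve to these counts would assign this neuron a clear preferred direction, even though direction never enters the generative model.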
In each of these case studies we find that commonly omitted variables (speed in M1, population activity in V1, and speed and heading in HC) can bias the estimated effects of commonly included variables (reach direction in M1, stimulus orientation/direction in V1, and place in HC). Across all three case studies, including the omitted variables reduces the estimated modulation due to typical tuning effects. We also illustrate how omitted variable bias can affect generalized linear models of spike dynamics where a post-spike history filter aims to describe refractoriness and bursting (Truccolo et al., 2005). The goal of these models is typically to differentiate aspects of spike dynamics that are due to the neuron's own properties (e.g. membrane time constant, resting potential, after-hyperpolarization currents) from those due to input to the neuron from other sources (Brillinger and Segundo, 1979; Paninski, 2004). In this setting, the input to the neuron is typically not directly observed, but is approximated by stimulus or behavioral covariates, local field potential, or the activity of other neurons. Here we show that omitting the input can lead to large biases in post-spike history filters, and that including omitted variables describing the input can change the interpretation and stability of the estimated history effects.

GLMs have been used in many settings to disentangle the effects of multiple, possibly correlated, stimulus or task variables (Fernandes et al., 2014), and they appear to be more robust to non-spherical covariate distributions than other methods, such as spike-triggered averaging (Paninski, 2004; Pillow, 2007). Although these advantages are important, GLMs are not immune to bias.
Here we show how the possibility of omitted variable bias, in particular, should encourage researchers to be cautious in their interpretation of model parameters, even in cases where a GLM achieves high predictive accuracy (Shmueli, 2010).

Here we introduce the problem of omitted variable bias and examine differences between omitted variable bias in linear models and the canonical Poisson GLM. We then consider three tuning curve estimation problems, estimating direction tuning in primary motor cortex, place tuning in hippocampus, and orientation tuning in primary visual cortex, and show how omitted variables in each of these three cases can alter parameter estimates. Finally, we consider a GLM that aims to describe the dynamics of post-spike history and show how omitted inputs can bias the estimated history effects and qualitatively change model stability.

Omitted Variable Bias in Linear Regression and Canonical Poisson GLMs
When relevant variables are not included in a regression model, the estimated effects for the variables that are included can be biased (Box, 1966). Omitted variable biases can cause the parameters describing the effects of the original variables to be over- or under-estimated, and model fits can change qualitatively when omitted variables are included (Fig 1).

To understand the problem of omitted variable bias it will be helpful to briefly review the well-known case of multiple linear regression, where the bias can be described analytically (Box, 1966). In the linear setting, consider the generative model

y = Xβ + X_h β_h + ε

where observations y are a linear combination of observed variables X and omitted variables X_h, plus normally-distributed i.i.d. noise ε ~ N(0, σ²I). For simplicity, we ignore the intercept term, but in the analysis that follows it may also be considered as part of X. If we then fit the (mis-specified) model without X_h using maximum likelihood estimation (equivalent to the ordinary least squares solution, in this case) the estimated parameters will be

β̂ = (XᵀX)⁻¹Xᵀy = β + (XᵀX)⁻¹XᵀX_h β_h + (XᵀX)⁻¹Xᵀε

where the last term denotes the effect of the noise, and the bias (XᵀX)⁻¹XᵀX_h β_h will, generally, be non-zero. There will be no bias only in the cases where the omitted variables do not affect the observations (β_h = 0) or when the omitted variables and observed variables are not collinear (XᵀX_h = 0). Note that (XᵀX)⁻¹XᵀX_h is the matrix of regression coefficients for the omitted variables using the observed variables as predictors. For linear regression, the omitted variable bias thus depends on both the extent to which the omitted variables affect the observations (β_h) and the extent to which the omitted variables can be (linearly) predicted from the observed variables.

Although there is a closed-form solution for the omitted variable bias for linear regression, the generalized linear setting is not as tractable (Gail et al., 1984; Clogg et al., 1992; Drake and McQuarrie, 1995).
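The closed-form bias above is easy to verify numerically. The following sketch, with arbitrary illustrative coefficients, checks that the OLS error of the mis-specified model matches (XᵀX)⁻¹XᵀX_h β_h up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# Generative model: y = X b + Xh bh + noise (illustrative values assumed).
n = 5000
X = rng.normal(size=(n, 2))
Xh = 0.6 * X[:, :1] + rng.normal(size=(n, 1))   # omitted variable, collinear with X
b, bh = np.array([1.0, -0.5]), np.array([2.0])
y = X @ b + Xh @ bh + rng.normal(size=n)

# OLS fit of the mis-specified model (Xh omitted)
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Closed-form omitted variable bias: (X'X)^-1 X'Xh bh
bias = np.linalg.solve(X.T @ X, X.T @ Xh @ bh)
print(b_hat - b)   # estimation error of the mis-specified fit
print(bias)        # agrees with the error, up to the noise term
```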
We will consider the case of a canonical Poisson GLM, in particular, where

y ~ Poisson(λ),  λ = exp(Xβ + X_h β_h).

In the more general case, GLMs have

E[y] = f⁻¹(Xβ + X_h β_h)

where f⁻¹(⋅) is the inverse link function, and y is distributed following an exponential family distribution (McCullagh and Nelder, 1989). For a canonical GLM the log-likelihood takes the form

ℒ(β, β_h) ∝ Σᵢ [ yᵢ(xᵢβ + x_h,ᵢ β_h) − G(xᵢβ + x_h,ᵢ β_h) ]

where the nonlinear function G(⋅) depends on both the link function and the noise model. For canonical GLMs, this log-likelihood is concave and the maximum likelihood estimate β̂ satisfies ∂ℒ/∂β |β̂ = 0. The exact form of G(⋅) will depend on the model, but for linear regression G(z) is proportional to z²/2, and for canonical (log-link) Poisson regression G(⋅) = exp(⋅).

Now, with omitted variables, instead of maximizing the correct log-likelihood, we maximize instead

ℒ(β) ∝ Σᵢ [ yᵢ xᵢβ − G(xᵢβ) ].

For the omitted variable bias in β̂ to be 0, we need both gradients ∂ℒ/∂β to vanish at the same value of β. Although neither MLE has a closed-form solution, this condition implies that, if there is no bias due to the omitted variables,

Σᵢ xᵢᵀ G′(xᵢβ + x_h,ᵢ β_h) = Σᵢ xᵢᵀ G′(xᵢβ)

where G′(⋅) is the derivative of G(⋅). For linear regression this equality reduces to the OLS form derived above, and for canonical Poisson regression we have

Xᵀ exp(Xβ + X_h β_h) = Xᵀ exp(Xβ).

This equality is satisfied when observations are not modulated by the omitted variables (β_h = 0) or, more generally, when the effect of the omitted variables δλ = exp(Xβ + X_h β_h) − exp(Xβ) is orthogonal to the included variables X. Note that with linear regression, XᵀX_h = 0 implies that the estimates will not be biased, but here this is not the case unless β_h = 0 as well. Due to the structure of the canonical Poisson GLM, omitted variable bias can thus occur even in a properly randomized, controlled experiment (Gail et al., 1984).

It is important to note that the maximum likelihood estimates themselves are consistent.
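The difference from the linear case can be checked directly. In this toy example (coefficients are illustrative assumptions), the included and omitted covariates are exactly uncorrelated, so the linear-regression bias vanishes, yet the Poisson condition Xᵀδλ = 0 fails:

```python
import numpy as np

rng = np.random.default_rng(2)

# x and x_h are uncorrelated (x' x_h ~ 0), which guarantees zero bias in
# linear regression -- but x_h is still statistically dependent on x.
n = 100000
x = rng.choice([-1.0, 0.0, 1.0], size=n)
xh = x**2                      # E[x * xh] = 0, yet xh is determined by x
b, bh = 1.0, 1.0

# Change in rate caused by the omitted variable:
# dlam = exp(x b + xh bh) - exp(x b)
dlam = np.exp(b * x + bh * xh) - np.exp(b * x)

print(np.mean(x * xh))    # ~0: the linear-regression bias term vanishes
print(np.mean(x * dlam))  # clearly nonzero: the Poisson no-bias condition fails
```

Because the Poisson rate is multiplicative in its covariates, the omitted effect δλ couples to x even when the two covariates are uncorrelated; only β_h = 0 (or exact orthogonality of δλ and X) removes the bias.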
That is, the estimators converge (in probability) to their true values when the generative model is correct. The bias here is a result of the model being mis-specified. This mis-specification affects the location of the maximum and, also, the shape of the likelihood. Optimization methods, such as Newton's method, will typically contain omitted variable bias in each parameter update. For canonical Poisson regression, for instance, the updates take the form

β̂⁽ᵏ⁺¹⁾ = β̂⁽ᵏ⁾ + (XᵀWX)⁻¹Xᵀ(y − λ)

at iteration k, where the weight matrix W is diagonal with entries Wᵢᵢ = λᵢ and (XᵀWX)⁻¹ is the Fisher scoring matrix (inverse Hessian of the log-likelihood) at the current estimate β̂⁽ᵏ⁾. Since the mis-specified model will use λ = exp(Xβ) instead of exp(Xβ + X_h β_h), both the weight matrix and the gradient Xᵀ(y − λ) will be biased at each step of the optimization (except when δλ = 0). Traditional standard errors for the MLE will also typically be influenced by omitted variables.
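The Fisher-scoring update above takes only a few lines to implement, and fitting a simulated example (data-generating values are illustrative assumptions) shows the biased slope directly:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_poisson_irls(X, y, n_iter=50):
    """Fisher scoring / IRLS for a canonical (log-link) Poisson GLM."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lam = np.exp(np.clip(X @ beta, -20, 20))   # clip for numerical safety
        H = X.T @ (lam[:, None] * X)               # X' W X, with W = diag(lam)
        beta = beta + np.linalg.solve(H, X.T @ (y - lam))
    return beta

# Simulated data: included covariate x, correlated omitted covariate xh.
n = 50000
x = rng.normal(size=n)
xh = 0.7 * x + rng.normal(size=n)
b0, b, bh = -1.0, 0.4, 0.4
y = rng.poisson(np.exp(b0 + b * x + bh * xh))

ones = np.ones(n)
beta_full = fit_poisson_irls(np.column_stack([ones, x, xh]), y)
beta_mis  = fit_poisson_irls(np.column_stack([ones, x]), y)

print(beta_full.round(2))  # recovers (b0, b, bh)
print(beta_mis.round(2))   # the slope for x absorbs part of the omitted effect
```

Here, because xh | x is Gaussian with mean 0.7x, the mis-specified slope converges to b + 0.7·bh rather than b, so the omitted-variable bias is visible even with this simple one-covariate model.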

(Fig 1 caption fragment: dashed line denotes the effect of x when x_h is fixed.)

Omitted Variable Bias in Tuning Curve Estimation
When fitting tuning curve models to spike count data, omitted variable bias can cause preferred stimuli and modulation depths to be misestimated and can even lead to completely illusory tuning (Fig 1). To illustrate how omitted variable bias affects GLMs of neural spiking, not just in theory, but in practice, we consider three case studies where we fit typical tuning curve models that omit potentially relevant variables along with augmented models that include these additional variables. We first consider modeling spike counts across trials and on slow (>100ms) timescales. Here we assess 1) the tuning of neurons in motor cortex to reach direction, with speed as a potential omitted variable, 2) the tuning of neurons in hippocampus to position, with both speed and head-direction as potential omitted variables, and 3) the tuning of neurons in visual cortex to the direction of motion of a sine-wave grating, with population activity as a potential omitted variable. In each of these case studies, we show how the omitted variables are not independent from the commonly included variables and how neural responses are modulated by the omitted variables. These two properties, together, can lead to omitted variable biases.

In our first case study, we model data recorded from primary motor cortex (M1) of a macaque monkey performing a center-out, planar reaching task. In this task, speed differs systematically across reach directions (Fig 2A), with average speed differing by as much as 35±3% (for the targets at 45 and 225 deg relative to right, Fig 2B). To model neural responses, we first fit a traditional tuning curve model (Georgopoulos et al., 1982; Amirikian and Georgopulos, 2000), where the predicted responses depend only on target direction.
Here we use a circular, cubic B-spline basis (5 equally spaced knots) to allow for deviations from sinusoidal firing, but, in most cases, the responses of the n=81 neurons in this experiment are well described by cosine-like tuning curves with clear modulation for reach direction. We then fit a second model that includes effects from movement speed. Here we use covariates based on the (log-transformed) speed and allow the direction and speed effects to interact (Fig 2C, bottom). Together, the fact that direction and speed are not independent, along with the fact that neural responses appear to be modulated by speed, could lead to biased parameter estimates for the model where speed is omitted.

Comparing the models with and without omitted variables we find that, averaged across the population, there are only minimal shifts in the preferred direction (3±2 deg) when speed is included in the traditional tuning curve model, and there do not appear to be large, systematic shifts in the population distribution of PDs (Kuiper's test, p>0.1). At the same time, there is substantial variability between neurons in the size of the PD-shift (circular SD 32±5 deg). Across the population, modulation depth (measured using the standard deviation of the tuning curve) decreases slightly on average (3±2%), and the size of the modulation change also varies substantially between individual neurons (SD of changes 18±3%). An example neuron in Fig 2C (bottom), for instance, has a modulation decrease of 9±5%, and the preferred direction changes 4±9 deg when speed is included in the model (standard error from bootstrapping).

In our second case study, in hippocampus, the fact that the omitted variables may covary with position and the fact that neurons appear to be modulated by the omitted variables suggest that there may be omitted variable bias. Here, in one recording during an open field foraging task, we find that the average speed (Fig 3A) and heading (Fig 3B) differ extensively as a function of position.
Within a given neuron's place field, the distributions of speed and heading may be very different from their marginal distributions. Across the population of n=68 place cells (selected from 117 simultaneously recorded neurons, see Methods), average in-field speed was between 80-135% of the average overall speed (5.5cm/s), and the animal's heading can be either more or less variable in-field (circular SD 57-80 deg) compared to overall (75 deg).

As previous studies have shown, we also find that neural responses are modulated by speed and head direction. Responses due to place, speed, and heading are shown for one example neuron in Fig 3. This neuron shows a stereotypical place-dependent response (Fig 3B), but splitting the observations by speed (Fig 3C, top) or heading (Fig 3C, bottom) by quartiles/quadrants reveals that there is also tuning to these variables. The neuron appears to increase its firing with increasing speed and responds most strongly when the rat is facing the left. These dependencies are well fit by the full model, where the firing rate depends not just on position but also on the (log-transformed) speed and the heading (Fig 3D, bottom). For the example place cell shown here, the location of the place field does not change substantially when the omitted variables are included (Fig 3E). However, the modulation (SD of the rate map) decreases by 27%. That is, 27% of the apparent modulation due to position, when it is modeled alone, can be explained by speed and heading effects.

Across the population of place cells, there was no clear, systematic difference in the place field locations, but the modulation (SD of the rate map) decreases by 9±1% on average when speed and heading are included. Individual neurons showed substantial variability in their modulation differences (population SD 10±1%).
As in M1, including the omitted variables increased spike prediction accuracy: the average cross-validated (10-fold) pseudo-R² was .29±.02 for the original model and .31±.02 for the model including speed and heading. This difference seems small, since there is large variability in pseudo-R² values across the population, but the average increase in pseudo-R² was 11±3% (Fig 5). Given that neurons appear to be modulated by speed and heading, it is unsurprising that including these variables improves model fit. However, as before, it is important to note that this modulation can lead to biases in the place field estimates for the model with only position.

In our third case study, we examine responses in primary visual cortex (V1) to the direction of motion of sine-wave gratings, with the population rate as a potential omitted variable: if the population rate also varies with stimulus direction, then there is the potential for omitted variable bias. Here we assess neural activity from n=90 simultaneously recorded neurons across many (2400) repeated trials with 12 different movement directions. We find that there is high trial-to-trial variability in the population rate (Fig 4A), and the average firing across all neurons does differ across stimulus directions, by up to ~50%. For this recording, the most extreme differences were between the 180 deg stimulus, where the average rate across the population was 3.4±0.1Hz, and the 60 deg stimulus, where the average rate was 6.3±0.1Hz (Fig 4B). By adding the (log-transformed) population rate as a covariate to a more typical model of direction tuning, we find that population activity may lead to omitted variable bias in models of direction tuning alone.

As in the case studies above, there do not appear to be any consistent or systematic effects on the preferred stimulus direction at the population level (Kuiper's test, p=0.1). However, the modulation depth (measured using the SD of the tuning curve) decreases substantially (15±2%) when population rate is included in the model, and there is again high variability across neurons (SD 20±2%).
In this case, model accuracy increases substantially when the omitted variable is included. The cross-validated (10-fold) pseudo-R² is .26±.02 for the original model and .43±.02 for the model including population activity, with an average increase of 164±31% (Fig 5).

Unlike in M1, where the effect of speed was highly diverse for different neurons, in this case study the effect of the population rate is largely consistent. Higher population rates are associated with higher firing rates, and, for most neurons, the effect of the population rate is stronger in the preferred direction(s), consistent with a multiplicative effect. Note that here we do not include the neuron whose rate we are modeling in the calculation of the population rate. However, using the population rate as an omitted variable requires some interpretation. The population rate will certainly be affected by the tuning of the, relatively small, sample of neurons that we observe. If we have a disproportionate number of neurons tuned to a specific preferred direction, the population rate in those directions will be higher. This suggests that in a different recording, the covariation between the stimulus and the population rate could very likely be different. However, it appears that the omitted variable biases in this case are mostly driven by noise correlations, where neural activity is correlated on single trials even within the same stimulus condition, rather than stimulus correlations, where neural activity is correlated due to similar tuning. When we shuffle the data within each stimulus condition (removing noise correlations), the average change in the modulation depth is -1±2% (SD 18±3%), and the effect of the omitted variable becomes negligible.

Omitted Variable Bias in the Estimation of Post-Spike History Effects
In addition to modeling spike counts over trials or on relatively slow (>100ms) behavioral timescales, GLMs are also often used to describe detailed, single-trial spike dynamics on fast (<10ms) timescales. One common covariate used in these types of models is a post-spike history effect, where the probability of spiking at a given time depends on the recent history of spiking. Modeling these effects allows us to describe refractoriness, bursting (Paninski, 2004; Truccolo et al., 2005), and a whole host of other dynamics (Weber and Pillow, 2017). Conceptually, the goal of these models is to disentangle the sources of rate variation based only on observations of a neuron's spiking, with history effects, ideally, reflecting intrinsic biophysics. However, since the full synaptic input is typically not known with extracellular spike recordings, there is potential for omitted variable biases.

To illustrate the potential pitfalls of omitting the input to a neuron, consider using the GLM to capture single neuron dynamics in the complete absence of external covariates

λ(t) = exp(β₀ + h(t))

where the rate λ(t) is determined by a baseline parameter β₀ along with a filtered version of the neuron's past spiking, h(t) = Σ_{τ>0} β(τ) n(t − τ), with n(t) denoting the spike counts. This is a perfectly acceptable model of intrinsic dynamics, but for most spike data that we observe this isolated neuron model may not provide a realistic description of a neuron receiving thousands of time-varying synaptic inputs. If we fit this model to data where the input to the neuron did vary over time,

λ(t) = exp(β₀ + h(t) + x_h(t) β_h)

then the history filter in the first model will attempt to capture variation in spiking due to the time-varying input, in addition to any intrinsic dynamics. For example, when x_h is periodic, the estimated history filters of the original model will attempt to capture this periodic structure (Fig 6A-B).
Just as in the tuning curve examples above, the fact that history effects covary with the input and the fact that the input modulates the neuron's firing leads to omitted variable bias. When the input is omitted from the model, the biased history effects simply provide the best (maximum likelihood) explanation of the observed spiking (Fig 6C).
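A compact simulation in the spirit of this example (parameters are illustrative assumptions, not the paper's simulation settings) makes the bias concrete: a purely inhibitory history filter plus a periodic omitted input, fit with and without the input as a covariate:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated GLM neuron: purely inhibitory post-spike filter + periodic input.
T, L = 100000, 60                                    # time bins, history lags
lags = np.arange(1, L + 1)
h_true = -3.0 * np.exp(-lags / 3.0)                  # refractory-like filter
xh = 1.5 * np.sin(2 * np.pi * np.arange(T) / 50.0)   # omitted periodic drive
b0 = -3.0

spk = np.zeros(T)
for t in range(T):
    past = spk[max(0, t - L):t][::-1]                # most recent bin first
    lam = np.exp(b0 + h_true[:len(past)] @ past + xh[t])
    spk[t] = rng.poisson(lam)

def irls(X, y, n_iter=30):
    """Fisher scoring for a canonical Poisson GLM."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        lam = np.exp(np.clip(X @ beta, -20, 20))
        beta += np.linalg.solve(X.T @ (lam[:, None] * X), X.T @ (y - lam))
    return beta

# Design matrix of lagged spiking (plus intercept)
lagmat = np.column_stack([np.r_[np.zeros(j), spk[:-j]] for j in lags])
X_mis  = np.column_stack([np.ones(T), lagmat])       # input omitted
X_full = np.column_stack([np.ones(T), lagmat, xh])   # input included

h_mis  = irls(X_mis, spk)[1:L + 1]
h_full = irls(X_full, spk)[1:L + 1]

# The history-only filter absorbs the 50-bin periodicity of the omitted
# input; the full model recovers the true (purely inhibitory) filter.
print(np.linalg.norm(h_mis - h_true), np.linalg.norm(h_full - h_true))
```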

These examples with strong, periodic input are not necessarily biologically realistic, but they make it apparent how the post-spike history can be biased by omitted input variables. In vivo, neurons instead appear to be in a high-conductance state, where membrane potential fluctuations have approximately 1/f power spectra (Destexhe et al., 2001, 2003). When these naturalistic input statistics are used to drive the GLM, omitted variable bias can occur as well. Here we simulate a GLM receiving 1/f^α noise input with α = 0 (white noise), 1, and 2 (Fig 7). For white noise input, the MLE accurately recovers the simulated post-spike history filter when the input is omitted from the model, but when α = 1 or 2 the estimates become increasingly biased (Fig 7A,C). With the full model, where the input is included as a covariate, the history is recovered accurately no matter what the input statistics are. Just as in the periodic case, however, these different input statistics alter the auto-correlation, and, when the input is omitted from the model, the maximum likelihood history filter simply aims to capture these patterns.

In GLMs for single-neuron dynamics, one effect of omitted variable bias is that it may lead us to misinterpret how stable a neuron's dynamics are. Even if the true history filter only reduces the neuron's firing rate following a spike (as in Fig 7C), the estimated filter can be biased upwards when the input is omitted. If we were to simulate the activity of this neuron based on the biased filter, the bias could cause the neuron's rate to diverge if the rate becomes high enough. To assess the stability of the estimated post-spike history effects quantitatively, here we make use of a quasi-renewal approximation introduced in Gerhard et al. (2017).
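One standard way to generate 1/f^α input for simulations of this kind is to shape white noise in the frequency domain (a sketch under assumed conventions; the paper's exact simulation settings are described in its Methods):

```python
import numpy as np

rng = np.random.default_rng(5)

def powerlaw_noise(T, alpha, rng):
    """Generate noise with a ~1/f^alpha power spectrum (alpha=0: white noise)."""
    freqs = np.fft.rfftfreq(T, d=1.0)
    spectrum = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
    scale = np.ones_like(freqs)
    scale[1:] = freqs[1:] ** (-alpha / 2.0)   # amplitude ~ f^(-alpha/2)
    scale[0] = 0.0                            # zero out the DC component
    x = np.fft.irfft(spectrum * scale, n=T)
    return x / x.std()                        # normalize to unit variance

x_white = powerlaw_noise(100000, 0.0, rng)
x_pink  = powerlaw_noise(100000, 1.0, rng)
x_brown = powerlaw_noise(100000, 2.0, rng)

# Larger alpha -> more power at low frequencies -> slower fluctuations,
# visible as a larger lag-1 autocorrelation.
ac1 = [np.corrcoef(x[:-1], x[1:])[0, 1] for x in (x_white, x_pink, x_brown)]
print(np.round(ac1, 3))
```

The increasing autocorrelation with α is exactly what lets a history-only model absorb the omitted input: slow input fluctuations create long-timescale structure in the spike train that the filter must explain.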
Given a history filter, this approach finds an approximate transfer function describing the neuron's future firing rate (output) given its recent (input) firing rate (see Methods). For all estimated models, the transfer function has a stable fixed point near the neuron's baseline firing rate. When the true input is omitted and α > 0, the estimated history filters also have an unstable fixed point, where the neuron's firing rate will diverge if the rate exceeds this point (Gerhard et al., 2017). Here we find that omitted variable bias leads to apparent fragility (Fig 7B,D). The stable region shrinks as α increases, and even when the true dynamics are strictly stable (as in Fig 7C,D), omitted variable bias can lead us to mistakenly conclude that the neuron has fragile dynamics.

With most extracellular spike recordings, the synaptic input that the neuron receives is unknown. However, there may also be omitted variable bias when history effects are estimated from real data. In this case, the input to a neuron can be approximated by stimulus or behavioral variables, local field potentials, or the activity of other neurons. Just as in the simulations above, including or omitting these variables can then alter the estimated history effects, even though they are not as directly related to spiking as the synaptic input itself. Here we consider total population spiking activity as a proxy for synaptic input and consider how including population activity alters the history filters when compared to a model of history alone.

We examine two datasets: spontaneous activity from primary visual cortex of an anesthetized monkey with n=62 simultaneously recorded neurons, and activity from dorsal hippocampus of a sleeping rat with n=39 simultaneously recorded neurons. To model population covariates, we sum the spiking of all neurons, excepting the one whose spiking we aim to predict, and low-pass filter the signal (see Methods).
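The leave-one-out population covariate described here can be sketched as follows. The spike data, kernel shape, and smoothing width below are placeholder assumptions; the actual filter used is specified in the paper's Methods:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy spike-count matrix: n_neurons x T time bins (simulated placeholder data).
n_neurons, T = 30, 5000
spikes = rng.poisson(0.05, size=(n_neurons, T))

# Gaussian low-pass kernel (width in bins is an assumed value).
width = 25
t = np.arange(-3 * width, 3 * width + 1)
kernel = np.exp(-t**2 / (2.0 * width**2))
kernel /= kernel.sum()

def population_rate(spikes, i):
    """Leave-one-out population rate: sum all neurons except neuron i, smooth."""
    total = spikes.sum(axis=0) - spikes[i]
    return np.convolve(total, kernel, mode="same")

pop0 = population_rate(spikes, 0)
# pop0 can now be (log-transformed and) added as a GLM covariate for neuron 0.
print(pop0.shape, round(pop0.mean(), 2))
```

Excluding the modeled neuron from the sum avoids trivially predicting a neuron's spikes from its own activity, which would confound the population covariate with the post-spike history terms.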
Similar to previous results (Okun et al., 2015), we find that, since neurons often have correlated fluctuations in their spiking (Fig 8A,D), the population rate is a good predictor for single neuron activity. Moreover, when we add population covariates to a GLM with post-spike history effects, the history filter changes.

In the V1 dataset, the post-spike gain decreases by 7.8±0.5% on average when population covariates are included, and by 14.9±0.8% when considering only the first 50ms after a spike (Fig 8B). The effects of adding population covariates are less pronounced in the hippocampal dataset: the post-spike gain decreases by 2.5±0.3% on average, and by 9.5±1.2% when considering only the first 50ms after the spike (Fig 8E). Based on the quasi-renewal approximation, all neurons in both the V1 and hippocampal datasets have fragile transfer functions, where there is a stable fixed point (near the neuron's average firing rate) and an unstable fixed point where the neuron's rate diverges if the input becomes too strong. For V1, the average upper limit of the stable region is 80±3Hz for the models with history only and 143±7Hz for the models with population covariates (Fig 8C). In the hippocampal data, the average upper limit of the stable region is 38±6Hz for the models with history only and 75±13Hz for the models with population covariates (Fig 8F). Each neuron is, thus, apparently more stable after the population covariates are included.

As in the case studies using tuning curves, adding covariates also improves spike prediction accuracy. In the V1 dataset, the average log likelihood ratio relative to a homogeneous Poisson model is 2.2±0.3 bits/s for the history model and 3.3±0.3 bits/s for the model with population covariates. In hippocampus, the log likelihood ratio is 0.9±0.3 bits/s for the history model and 2.0±0.5 bits/s for the model with population covariates.
The larger effects in V1 are likely explained by the fact that the population rate is predictive for many more neurons there than in the hippocampal data. In the hippocampus, only 26% of the neurons show an increase of over 0.5 bits/s when the population covariates are included, compared to 85% of neurons in V1. Altogether, these results demonstrate how omitted variable bias could affect estimates of post-spike history filters in vivo. In both datasets we find that when population covariates are included in the GLM, spike prediction accuracy increases, post-spike gain decreases, and apparent stability increases.

Discussion
When the goal of modeling is causal inference or understanding of biological mechanisms, the potential for biases due to omitted variables is often clear. The statistical effects of confounders (Wasserman, 2004), as well as the limits that they place on neuroscientific understanding, are widely appreciated (Jonas and Kording, 2017; Krakauer et al., 2017; Yoshihara and Yoshihara, 2018). However, when the goal of modeling is to create an abstract explanation or summary of observed neural activity, the fact that omitted variables can bias these explanations is not always acknowledged. Here we have illustrated the potential for omitted variable bias in two types of commonly used GLMs for neural spiking activity: tuning curve models using spike counts across trials and models that capture single-neuron dynamics with a post-spike history filter. In each model, adding a previously omitted variable, as expected, improved spike prediction accuracy. However, what we emphasize here is that, when omitted variables were included, the estimates of the original parameters changed. For three case studies using tuning curves, we found that adding a traditionally omitted variable reduced the modulation attributed to the originally included variables. In models of single-neuron dynamics, adding omitted variables led to decreased post-spike gain and greater apparent stability. Importantly, omitted variable bias can arise in GLMs in any situation where an omitted variable affects neural activity and the effect of the omitted variable is not independent of the included variables.

The case studies here are not unique, and many studies have described how adding additional variables to a tuning curve or single-neuron model can improve prediction accuracy.
In M1, in addition to movement speed, joint angles, muscle activity, end-point force, and many other variables also appear to modulate neural responses (Fetz, 1992; Kalaska, 2009). In addition to speed and head direction in the hippocampus, theta-band LFP, sharp-wave ripples, and environmental features, such as borders, appear to modulate neural activity (Hartley et al., 2014). And in V1, there is growing evidence that population activity (Lin et al., 2015) and non-visual information (Ghazanfar and Schroeder, 2006) modulate neural responses. In each of these systems, neural responses are affected by many, many factors. Responding to many task variables may even be functional, allowing downstream neurons to more effectively discriminate inputs (Fusi et al., 2016). In any case, it seems clear that our models do not yet capture the full complexity of neural responses (Carandini et al., 2005). By omitting relevant variables, current models are likely to be not just less accurate but also biased.

Parameter bias may be problematic in and of itself. However, omitted variable bias may also have an important effect on generalization performance. As noted in (Box, 1966), in a new context, the effect of the omitted variables and the relationship between the omitted and included variables may be different. Since the parameters of the included variables are biased, this change can reduce generalization accuracy. This phenomenon may explain, to some extent, why tuning models fit in one condition often do not generalize to others (Graf et al., 2011; Oby et al., 2013). For models of single-neuron dynamics, omitted variable bias can also have a negative effect on the accuracy of simulations. Previous work has shown that simulating a GLM with post-spike filters estimated from data often results in unstable, diverging simulations.
Although several methods for stabilizing these simulations have recently been developed (Gerhard et al., 2017; Hocker and Park, 2017), one, perhaps primary, reason for this instability may be that the post-spike filters are biased due to omitted synaptic input. Since estimated post-spike filters may reflect not just intrinsic neuronal properties but also the statistics of the input, interpreting and comparing post-spike filters may be difficult. Different history parameters may differ due to intrinsic biophysics (Tripathy et al., 2013) or due to differing input, and resolving this ambiguity will likely involve more accurately accounting for the input itself (Kim and Shinomoto, 2012).

The possibility of omitted variable bias does not mean that estimated parameters, predictions, and simulations from simplified models are useless, but it may mean that we need to be cautious in interpreting these models and their outputs. When reporting the results of regression, in addition to avoiding describing associations with causal language, it may be generally useful to discuss known and potential confounds. Previous studies have already identified several specific cases of omitted variable bias where careful interpretation is necessary. For instance, omitted common input can bias estimates of interactions between neurons (Brody, 1999), and omitted history effects can bias receptive field estimates (Pillow and Simoncelli, 2003).

There is a well-known aphorism from George E. P. Box that, "All models are wrong, but some are useful." The lengthier version of this quip is, "All models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind." GLMs are certainly useful descriptions of neural activity.
They are computationally tractable, can disentangle the relative influence of multiple covariates, and often provide the core components for Bayesian decoders. Here we emphasize, however, one ubiquitous circumstance in systems neuroscience where the "approximate nature" of the models should be "borne in mind." Namely, omitted variables can bias estimates of the included effects.

Methods

Neural Data
All data analyzed here were previously recorded and shared by other researchers through the Collaborative Research in Computational Neuroscience (CRCNS) Data Sharing Initiative (crcns.org).

Data from primary motor cortex are from CRCNS dataset DREAM-Stevenson_2011 (Walker and Kording, 2013). These data were recorded using a 100-electrode Utah array (Blackrock Microsystems; 400 µm spacing, 1.5 mm length) chronically implanted in the arm area of primary motor cortex of an adult macaque monkey. The monkey made center-out reaches in a 20x20 cm workspace while seated in a primate chair, grasping a two-link manipulandum in the horizontal plane (arm roughly in a sagittal plane). Each trial for the center-out task began with a hold period at a center target (0.3-0.5 s). After a go cue, subjects had 1.25 s to reach one of eight peripheral targets and then held this outer target for at least 0.2-0.5 s. Each success was rewarded with juice, and feedback (1-2 cm diameter) about arm position was displayed onscreen as a circular cursor. Spike sorting was performed by manual cluster cutting using an offline sorter (Plexon, Inc), with waveforms classified as single- or multi-unit based on action potential shape and minimum ISIs greater than 1.6 ms (yielding n=81 single units). Here we model tuning curves using spike counts between 150 ms before to 350 ms after the speed reached its half-max. Average movement speed for each trial was calculated from 0-250 ms after the speed reached its half-max (290 trials in total). For details of the surgery, recording, and spike sorting see (Stevenson et al., 2011).

For the hippocampal data, where we model spike history effects, the spike trains were binned at 1 ms and the recording length was 27 min. Here we model all neurons with firing rates >0.5 Hz (n=39). For the open field data, where we model place tuning, spike trains were binned at 250 ms and the recording length was 93 min.
Place cells (n=68) were selected based on having an overall firing rate <5 Hz (to rule out interneurons), a peak firing rate >2 Hz, and a contiguous set of pixels (after smoothing with an isotropic Gaussian, σ = 8 cm) of at least 200 cm² where the firing rate was above 10% of the peak rate. For details of the surgery, recording, and spike sorting see (Diba and Buzsaki, 2008; Mizuseki et al., 2009).

Data from primary visual cortex are from CRCNS dataset PVC-11 (Kohn and Smith, 2016). Here we use spontaneous activity during a gray screen (Monkey 1) and responses to drifting sine-wave gratings (Monkey 1), both from an anesthetized (sufentanil, 4-18 µg/kg/hr) adult monkey (Macaca fascicularis). Recordings were made in parafoveal V1 (receptive fields within 5 degrees of the fovea) using a 96-channel multi-electrode array (Blackrock Microsystems; 400 µm electrode spacing, 1 mm depth). After automatic spike sorting and manual cluster adjustment, 87 and 106 units were recorded during spontaneous activity and grating presentation, respectively. Only neurons with waveform SNR>2 and firing rates >1 Hz were analyzed: n=62 for the spontaneous data and n=90 for the grating data. For the spontaneous activity we bin spike counts at 1 ms, and the recording length was 20 min. For the drifting grating data, we analyzed spike counts from 200 ms to 1.2 s after stimulus onset on each trial (12 directions, 2400 trials total). Gratings had a spatial frequency of 1.3 cyc/deg, a temporal frequency of 6.25 Hz, and a size of 8-10 deg (to cover the receptive fields of all recorded neurons) and were presented for 1.25 s with a 1.5 s inter-trial interval between stimuli. For surgical, stimulus, and preprocessing details see (Smith and Kohn, 2008; Kelly et al., 2010).

Tuning Curve Models
For the M1 data we use a circular, cubic B-spline basis with 5 equally spaced knots

$$\lambda(\theta) = \exp\Big(\beta_0 + \sum_i \beta_i f_i(\theta)\Big)$$

where the $f_i(\cdot)$ are the splines that depend on the reach direction $\theta$, weighted by the parameters $\beta_i$, and the parameter $\beta_0$ defines a baseline firing rate. To include the effect of speed, we then add three covariates

$$\lambda(\theta, s) = \exp\Big(\beta_0 + \sum_i \beta_i f_i(\theta) + \gamma_1 s + \gamma_2 s\cos(\theta) + \gamma_3 s\sin(\theta)\Big)$$

where $s$ indicates the speed, and the parameters $\gamma$ allow for a multiplicative speed effect as well as possible cosine-tuned speed × direction interactions as in (Moran and Schwartz, 1999).

For place fields in hippocampus we use isotropic Gaussian radial basis functions $f_i(\cdot)$, equally spaced (30 cm) on a 6x6 square lattice, with a standard deviation of 30 cm

$$\lambda(\mathbf{x}) = \exp\Big(\beta_0 + \sum_i \beta_i f_i(\mathbf{x})\Big)$$
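Models of this form can be fit by maximizing the Poisson log-likelihood. The sketch below uses a single cosine/sine pair in place of the paper's B-spline basis, and simulated rather than recorded data, purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(beta, X, y):
    """Negative Poisson log-likelihood for rate = exp(X @ beta),
    dropping the constant log(y!) term."""
    eta = X @ beta
    return np.sum(np.exp(eta)) - y @ eta

def fit_poisson_glm(X, y):
    # Analytic gradient: X^T (exp(X beta) - y)
    grad = lambda b, X, y: X.T @ (np.exp(X @ b) - y)
    res = minimize(neg_log_lik, np.zeros(X.shape[1]),
                   args=(X, y), jac=grad, method="BFGS")
    return res.x

def design(theta, s):
    """Baseline, direction tuning (cosine basis stands in for the
    paper's B-splines), and the three speed covariates."""
    return np.column_stack([
        np.ones_like(theta),           # beta_0: baseline
        np.cos(theta), np.sin(theta),  # direction tuning
        s,                             # multiplicative speed effect
        s * np.cos(theta),             # speed x direction interactions
        s * np.sin(theta),
    ])

# Simulated trials with known parameters (illustrative values)
rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 500)
s = rng.uniform(5, 30, 500)
true_beta = np.array([1.0, 0.8, -0.3, 0.02, 0.01, 0.0])
y = rng.poisson(np.exp(design(theta, s) @ true_beta))
beta_hat = fit_poisson_glm(design(theta, s), y)
```

Since the Poisson log-likelihood with an exponential link is concave, BFGS from a zero initialization reliably finds the maximum-likelihood estimate.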

Stability analysis
Here we make use of a stability analysis proposed in (Gerhard et al., 2017). Briefly, we use a quasi-renewal approximation of the conditional intensity by considering the effect of the most recent spike, at time $t'$, and averaging over possible spike histories preceding this spike

$$\lambda_0(t, t') = \exp\big(\beta_0 + h(t - t')\big)\,\big\langle \exp(h * S) \big\rangle_{S(t < t')}$$

where $h(\cdot)$ is the post-spike history filter and $S$ represents the history of spiking. By assuming that $S$ is generated from a homogeneous Poisson process with firing rate $\lambda_0$, the second term can be approximated by

$$\big\langle \exp(h * S) \big\rangle_{S(t < t')} \approx \exp\Big(\lambda_0 \int \big(e^{h(s)} - 1\big)\, ds\Big)$$

Given this approximation, we can then estimate the inter-spike interval distribution as we would for a true renewal process, and the steady-state distribution of inter-spike intervals is given by

$$P_0(\tau) = \lambda_0(\tau)\exp\Big(-\int_0^{\tau} \lambda_0(u)\, du\Big)$$

To assess stability, we can then examine how the predicted steady-state firing rate, given by the inverse of the mean inter-spike interval, depends on the assumed rate of the homogeneous Poisson process $\lambda_0$. In particular, when the predicted rate equals $\lambda_0$, the quasi-renewal model has a fixed point. To allow for external input, we incorporate the average effect of the covariates into the conditional intensity approximation

$$\lambda_0(t, t') = \exp\big(\beta_0 + h(t - t') + \langle x\beta \rangle\big)\,\big\langle \exp(h * S) \big\rangle_{S(t < t')}$$
$$\lambda_0(t, t') = \exp\big(\beta_0 + h(t - t')\big)\,\big\langle \exp(x\beta) \big\rangle\,\big\langle \exp(h * S) \big\rangle_{S(t < t')}$$

Note that, in general, adding inputs will only change the stability of the model to the extent that these covariates change the estimate of $h$.
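The fixed-point analysis can be sketched numerically. In the code below, the exponential history filter, baseline rate, and 500 ms integration window are illustrative assumptions, not values from the paper; fixed points are located where the quasi-renewal map crosses the identity line:

```python
import numpy as np

def steady_state_rate(beta0, h, dt, lam0):
    """Quasi-renewal map: predicted steady-state rate (Hz) given an
    assumed homogeneous Poisson input rate lam0 (Hz).

    h : history filter sampled on a grid of width dt (s), covering
        time since the most recent spike.
    """
    # Average effect of spiking before the most recent spike:
    # lam0 * integral of (exp(h) - 1)
    c = lam0 * np.sum(np.exp(h) - 1) * dt
    lam = np.exp(beta0 + h + c)            # intensity vs. time since last spike
    # Renewal ISI density: p(tau) = lam(tau) * exp(-int_0^tau lam du)
    surv = np.exp(-np.cumsum(lam) * dt)
    p = lam * surv
    p /= np.sum(p) * dt                    # renormalize on the finite window
    mean_isi = np.sum(np.arange(len(h)) * dt * p) * dt
    return 1.0 / mean_isi

dt = 0.001                                 # 1 ms grid
t = np.arange(0, 0.5, dt)                  # time since last spike
h = -3.0 * np.exp(-t / 0.01)               # refractory-like filter (illustrative)
beta0 = np.log(10.0)                       # ~10 Hz baseline (illustrative)

# Scan assumed rates; fixed points are where mapped == rates
rates = np.linspace(1, 100, 200)
mapped = np.array([steady_state_rate(beta0, h, dt, r) for r in rates])
```

Sign changes of `mapped - rates` locate the fixed points; the slope of the map at a crossing determines whether that fixed point is stable.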