ABSTRACT
Adaptive information processing, comprised of local computations and their efficient routing, is crucial for flexible brain function. Spatial attention is a quintessential example of this adaptive process. It is critical for recognizing and interacting with behaviorally relevant objects in a cluttered environment1. Object recognition is mediated by ensembles of computational units distributed across the ventral visual hierarchy. How the deployment of spatial attention aids these hierarchical computations is unclear. Based on pairwise correlation analysis, two key mechanisms have been proposed: First is an improvement in the efficacy of unique information directed from one encoding stage to another, suggested by evidence along the visual hierarchy2-6. Based on the theoretical results that even weak correlated variability can substantially limit the encoding capacity of a neuronal pool7, a second proposal is an improvement in the sensory information capacity of an encoding stage through a reduction in shared fluctuations8,9. However, pairwise analyses capture both unique and shared components of fluctuations, and therefore cannot disambiguate the proposed mechanisms. To test these proposals, it is crucial to estimate the attentional modulation of unique information flow across and shared information within the stages of the visual hierarchy. We investigated this in the multi-stage laminar network of visual area V4, an area strongly modulated by attention10-12. Using network-based statistical modeling, we estimated the strength of inter-layer information flow by measuring statistical dependencies that reflect how the cortical layers uniquely drive each other’s neural activity. We quantified their modulation across attention conditions (attend-in vs. attend-away) in a change detection task and found that deployment of attention indeed strengthened unique dependencies between the input and superficial layers. 
Using the partial information decomposition framework13, we estimated the modulation of shared dependencies and found that they are reduced within laminar populations, specifically within the putative excitatory subpopulations. Surprisingly, we found a strengthening of unique dependencies within the laminar populations, a finding not previously predicted. Crucially, these modulation patterns were also observed across behavioral outcomes (hit vs. miss) that are thought to be mediated by endogenous state fluctuations14-16. By “decomposing” the modulation of dependency components, and in combination with prior theoretical work7, our results suggest the following computational model of optimal sensory states, attained either by task demands or by endogenous fluctuations in brain state: enhanced information flow between encoding stages and improved information capacity within them.
MAIN TEXT
Visual cortex has a laminar organization, and both sensory computations and information flow patterns are layer specific, forming the building blocks of the ventral visual hierarchy17-21. We hypothesized that the deployment of spatial attention strengthens unique inter-layer information transfer between, and weakens shared information within, the input and superficial layers, both of which are crucial nodes of information flow along the ventral visual hierarchy. To test this, our primary goal was to quantify unique statistical spiking dependencies between the populations of each layer, which requires characterizing the joint spiking activity of the laminar ensemble. Cortical ensembles are highly interconnected, and this can be a source of both shared and unique dependencies between any pair of ensemble members (Fig 1a, S1a-b). Pairwise measures (such as correlations) reflect both unique and shared sources of dependencies (Fig 1b,c), whereas network-based methods primarily extract non-shared sources of dependencies (Fig 1d). Within the Partial Information Decomposition13 framework, we used a combination of network-based probabilistic graphical models (PGM)22 and pairwise regression models to analyze the neurophysiological data and “decompose” the modulation of these dependencies by spatial attention (Fig 1e, see SI Methods). Fitting our data with Dynamic Bayesian Networks (DBN)23-28, a class of PGMs, allowed unbiased discovery of multi-timescale directed and cyclical dependencies that are Granger causal29, without making assumptions about their specific nature (linear/nonlinear), direction, or latency (Fig 2a, Fig S2). Directed dependencies are expected to reflect the unique information flow in our data. We estimated the probability of discovered dependencies (referred to henceforth as the weight of a graph edge) and used it as a measure of their strength30 (Fig 2a,b).
We used a time-shuffled estimate of the edge weights to determine the statistical significance of the discovered dependencies (Fig 2a). We refer to this approach as multi-timescale weighted Dynamic Bayesian Networks (MTwDBN). When tested on a ground-truth synthetic network with recurrent connections, the MTwDBN approach robustly recovered the dependency structure (Fig 2c-e) and performed significantly better than existing methods, especially when the populations were sparsely sampled (Fig 2f), as is the case in neural recordings. Since the dependencies are discovered without the assumption of independence between predictors, edge weights in MTwDBN can be interpreted as an accurate depiction of the structure and strength of unique dependencies in a highly interconnected network (Fig 2g).
We applied the MTwDBN analysis to laminar recordings in visual area V4 (an area strongly modulated by attention8,11,31-33) of non-human primates performing an attention-demanding orientation change detection task (Fig 3a,b; see Methods and Fig S3). We first analyzed the ensemble of layer-wise populations (Fig 3c) and quantified the net attentional modulation of dependencies across different timescales (Fig 3d). At longer timescales (> 60 ms lag)33, attention weakened pairwise dependencies, in agreement with previous findings34,35, yet strengthened unique dependencies. This allowed us to infer (Fig 1e) that it is the shared component of dependencies that is weakened by attention (Fig 3d), thereby providing direct evidence for a previously hypothesized mechanism of perceptual improvement by attention34,35. Directed dependencies between layers, specifically those between the input and superficial layers, an important link in feedforward processing, were strengthened by attention (Fig 3e). On the other hand, shared dependencies within the input and superficial layers were weakened by attention (Fig 3f). Surprisingly, unique dependencies within the layers were strengthened by attention at most timescales (Fig 3g). Taken together, these results provide the first direct evidence that attention improves unique dependencies both within and across stages of the ventral visual hierarchy, while weakening shared dependencies within these stages (Fig 3h).
Neuronal cell-classes in the cortex contribute differentially to information processing31,32,36. To test if the above model holds when we allow discovery of cell-class specific dependencies, we next analyzed an ensemble of broad- and narrow-spiking layer-wise populations (Fig 3i) and quantified the net attentional modulation of dependencies across different timescales (Fig 3j). The pattern of net modulation of unique and shared dependencies in this ensemble largely mirrored that discovered in the layer-wise aggregated ensemble, specifically at the longer timescales. The same was true for the modulation of unique dependencies between layers and shared dependencies within layers (consistent across both animals) (Fig 3k,l). Theoretical predictions7, including our hypotheses, regarding effects of shared fluctuations on information representation apply primarily to excitatory neurons. When we quantified the attention modulation of shared dependencies within layers in a cell-class specific manner, we found that the broad-spiking population (putative excitatory neurons) showed a robust weakening of shared dependencies within layers (consistent across animals) (Fig 3l). On the other hand, the narrow-spiking population (putative inhibitory neurons) showed a distinct pattern, one that was dominated by a strengthening of shared dependencies within layers. Consistent with the findings in the aggregated ensemble, unique dependencies within the layers were strengthened by attention at most timescales (Fig 3m). Taken together, these results provide the first direct evidence that attention specifically weakens shared dependencies in the projection populations within encoding stages, in addition to robustly improving unique dependencies both within and across stages of the ventral visual hierarchy (Fig 3n).
To test if the pattern of inter- and intra-layer dependency modulation that we observed is a signature of brain states that are optimal for perceptual behavior, we analyzed the laminar ensemble activity for a subset of trials within the attend-in condition in which the animal was equally likely to correctly detect (Hit) or fail to detect (Miss) the orientation change (Fig 4a,b). Controlling for task and stimulus conditions, these behavioral fluctuations are thought to arise from endogenous fluctuations such as attentive sampling and arousal changes14-16,37,38. The pattern of net modulation of unique and shared dependencies across behavioral outcome (Fig 4c) largely mirrored that discovered across attentive states (Fig 3d) at the longer timescales. The same was true for the modulation of unique dependencies between layers and dependencies within layers (Fig 4d), suggesting that the pattern of enhanced inter-layer unique dependencies and weakened intra-layer shared dependencies is a hallmark of brain states that are optimal for perceptual performance (Fig 4e). Combined with prior theoretical work7, our results suggest a model of enhanced inter-layer communication and improved intra-layer information capacity during optimal states that are either imposed by task demands or are attained via endogenous fluctuations in brain state.
Our approach demonstrates that the information flow structure in a neural ensemble is robustly described by Dynamic Bayesian Network modeling that is adapted to include multiple lags spanning timescales of interest. This unbiased approach allows us to dive deeper into the causal functional interactions (beyond correlations) in a multivariate system with unknown nonlinear dependencies. Further, the results demonstrate that weighting dependencies using confidence measures provides a more accurate information flow structure that can be utilized to investigate how this structure is modulated by changes in brain and behavioral states. Finally, a novel combination of this approach with pairwise models allows decomposition of changes in unique and shared components of dependencies. Since directed unique dependencies suggest causality and shared ones suggest common inputs, this allows us to gain insights into underlying mechanisms. In contrast, prior correlational studies have quantified net dependency modulation, and thus the underlying mechanisms remained hypothetical2,9,34.
While approaches such as generalized linear models (GLMs) have been very useful in providing improved phenomenological models of individual neural responses to sensory stimulation in early sensory circuits39, they are not suitable for dependency structure learning in a highly recurrent cortical ensemble. Their inherent assumption of independence between predictor variables can lead to the discovery of spurious dependencies due to shared inputs, resulting in a dense and likely inaccurate structure. Since structure learning in DBNs involves determining conditional independence by solving an optimization problem that penalizes density, our approach is ideal for generating a sparse and hence interpretable unique dependency structure in multivariate data.
This class of DBN models does not identify functional consequences of dependencies between populations, such as enhancement or suppression of target activity. Future work in this direction will further enhance the functional interpretability of discovered dependencies. Nonetheless, these models are effective in elucidating structure and modulation of information flow in a multivariate system such as the laminar cortical network.
In general, discovery of dependency structure using our approach is sensitive to the inclusion of relevant neural variables. It is indeed possible that a gross classification of subpopulations as broad and narrow in our study fuses distinct but relevant neural variables. On the other hand, our three and six population laminar DBNs yield consistent patterns regarding modulation of dependencies, suggesting an inclusion of relevant variables for the purposes of this study. It is important to note that while statistical dependencies discovered using our approach imply directed functional connectivity, these do not necessarily imply direct anatomical connectivity: an edge in the structure could reflect indirect anatomical connectivity.
MTwDBN based models provide a quantitative description of information flow pattern in an ensemble. In this study, they provide the first structural description of the attention modulation of dependencies in cortical space and time in a compartmentalized network, such as the laminar cortical network. They allow us to quantify the unique contributions of activity history and network interactions to information processing in such a neural ensemble. We expect this framework to extend to neuronal ensembles in other parts of the nervous system, and to play an important role in revealing flexible information processing principles in the brain.
METHODS SUMMARY
Synthetic Data
Dataset for Figure 1 was constructed by simulating a probabilistic spiking network with causal and shared connectivity among 8 neural variables. Dataset for evaluating the MTwDBN pipeline (Fig 2) was generated by simulating a network of stochastically spiking excitatory and inhibitory neurons.
Electrophysiological Data
16-channel laminar probes were used to record data from visual area V4 of 2 macaque monkeys as they performed an attention-demanding task over 30 sessions in total31. The probe recorded single units with overlapping receptive fields across the depth of the cortex. Single units were classified using peak-to-trough duration.
Behavior
Monkeys were trained to perform an attention-demanding change detection task31.
Fitting
DBN graph structure was fitted to 200 ms of visual stimulus flashes. Data were prepared using 15 ms bins to fit a DBN with either 21 nodes (3 laminar populations, 7 lags) or 42 nodes (3 laminae x 2 waveform types, 7 lags). Bootstrap methods were used to generate edge weights (confidence measures) for edges discovered from the data.
FIGURE CAPTIONS
Figure 1 Pairwise- and network-model parameters as a function of unique vs. shared dependencies. a A synthetic ensemble of eight neural variables with two kinds of dependencies – unique or shared – between seven source variables (black) and one target variable (cyan). All interactions are excitatory. Strength of dependencies is determined by model parameters Punique and Pshared (see SI Methods). b Normalized total mutual information, measured by uncertainty coefficient, as a function of the sum of model parameters (Punique, Pshared) that varied unique and shared components of mutual information in a monotonic way (Fig S1c-d). c Coefficients of pairwise model (univariate logistic regression) as a function of Punique and Pshared. White arrow provides a visual guide for direction of highest change in coefficients. d Coefficients of network model (multivariate LASSO regression) as a function of Punique and Pshared. White arrow provides a visual guide for direction of highest change in coefficients. e Schema for utilizing pairwise and network methods for the estimation of total (black) and unique (green) information modulation respectively, and to infer the modulation of shared information (purple). Shaded blocks indicate indeterminate modulation direction of shared information in the network.
Figure 2 DBN based estimation of multi-timescale weighted unique dependency. a Analysis flow for multi-timescale weighted DBN (MTwDBN) model fitting. b Edge weight of MTwDBN as a function of connection weight in a 2-population simulated network using the pipeline in a. c Spiking activity of 6 subpopulations in a simulated laminar network. Connectivity is visualized in the overlaid schematic. d Directed dependencies (edges in the graph) in the simulated network in c, estimated using MTwDBN fitting. e Summary graph of dependencies across all timescales from d. f F score (harmonic mean of precision and recall of dependency structure) as a function of % of population observed. F score was estimated for shuffle corrected weighted DAGs (MTwDBN), weighted DAGs with fixed threshold (FT), or unweighted DAGs. g DBN decoder accuracy with different sizes of MTwDBN DAGs. Graph edges for the decoder were sampled from the learned structure either in an unbiased fashion or biased with the edge weights.
Figure 3 Modulation of dependencies in a V4 laminar network across attention conditions. a Experimental protocol: Paired Gabor stimuli with different contrasts (see Methods); one stimulus was presented inside the receptive fields (RFs) of the recorded neurons and the other at an equally eccentric location across the vertical meridian. Attention was cued either to the neurons’ RFs (“IN”) or to the location in the contralateral visual hemifield (“AWAY”). Orientation of one of the two stimuli changed at a random time. Monkey was rewarded for detecting the change by making a saccade to the corresponding location. Task difficulty was controlled by the magnitude of orientation change. b Recording approach: laminar recordings in visual area V4. c Neural populations used for MTwDBN analysis. Current source density analysis identified different layers (superficial, input, deep), and isolated single units were assigned to one of these layers (see SI Methods). d Top: MTwDBN-based modulation (green) of all unique dependencies between the laminar populations. Modulation of the same dependencies as estimated by logistic regression (black). Bottom: PID framework-based estimated modulation sign of shared dependencies (see schema in Fig 1e). Thicker line along the time axis indicates the timescales of attentional modulation in prior studies34,35. e Sign of average modulation of unique dependencies between layers. f Sign of average modulation of shared dependencies within layers. g Sign of average modulation of unique dependencies within layers. h Summary of dependency modulation pattern. i Neural populations used for laminar MTwDBN analysis. Isolated single units were classified as broad and narrow based on peak-to-trough duration in their average spike shape (see SI Methods). j-n Same as d-h, but for populations shown in i. I: input layer; S: superficial layer; M1, M2: subject-wise dependencies; broad, narrow: cell-class specific dependencies. 
See Fig S4 for modulation indices in e-g, k-n.
Figure 4 Modulation of dependencies in a V4 laminar network across behavioral outcomes at perceptual threshold. a Example session showing performance as a function of task difficulty. (orange box: threshold orientation change at which the animal was equally likely to correctly detect (Hit) or fail to detect (Miss) the change). b Laminar populations used for multi-lag analysis. c Modulation magnitude (top) and sign (bottom) of all unique (green) and total (black) laminar dependencies in b across Hits and Misses at perceptual threshold. Estimated modulation sign of shared dependencies (bottom, see Fig 1e). d Modulation sign of between (BL) and within (WL) layer dependencies. See Fig S5 for modulation indices. e Summary of laminar dependency modulation pattern.
Figure S1 Information decomposition in a multi-variate system
a Partial information decomposition (PID) framework based definition of types of information that multiple sources can have about a target.
b Application of pairwise and network-based statistical models for approximate information decomposition in a multivariate system. Panel on the right illustrates interpretation of the modulation of these dependencies using the PID framework.
c, d Reduction in the proportion of total entropy (or information fraction) as a function of parameters (Punique, Pshared) that control unique and shared information in the model. For the sake of computational efficiency, information fraction estimation as a function of Pshared (d) was performed using a subset of variables in the simulated network.
Figure S2 Additional MTwDBN performance metric for synthetic data
Recall, precision and F score measures for weighted DAGs in Fig 2d as a function of number of lags.
Figure S3 Laminar recordings in area V4
a Stacked contour plot showing spatial receptive fields (RFs) along the laminar probe from an example session. The RFs are well aligned, indicating perpendicular penetration down a cortical column. Zero depth represents the center of the input layer as estimated with current source density (CSD) analysis. b CSD displayed as a colored map. The x-axis represents time from stimulus onset; the y-axis represents cortical depth. The CSD map has been spatially smoothed for visualization. c An example trial showing single-unit activity across the cortical depth in the attend-in condition. The time axis is referenced to the appearance of the fixation spot. Spikes (vertical ticks) in each recording channel come from the single unit with the highest spike rate in this trial. The gray boxes depict stimulus presentation epochs. In this particular trial, 8 sample stimuli with different contrasts were presented, followed by a target stimulus flash with an orientation change that the monkey responded to correctly. Spike waveforms for two example narrow (cyan) and broad (orange) single units are shown.
Figure S4 Modulation of dependencies in a V4 laminar network across attention conditions. MTwDBN-based modulation (green) of all unique dependencies between the laminar populations. Modulation of the same dependencies as estimated by logistic regression (black). Open circles indicate no significant edges for a given time lag. Schematic at the top of every column depicts the populations used for identifying laminar dependency structure within (left) and between (right) cortical layers. broad, narrow: cell-class specific populations identified by spike shape (see SI Methods).
Figure S5 Modulation of dependencies in a V4 laminar network across behavioral outcomes at perceptual threshold. MTwDBN-based modulation (green) of all unique dependencies between the laminar populations. Modulation of the same dependencies as estimated by logistic regression (black). Schematic at the top of every column depicts the populations used for identifying laminar dependency structure within (left) and between (right) cortical layers.
AUTHOR CONTRIBUTIONS
AD and MPJ conceptualized the project. ASN collected the electrophysiological data. AD and MPJ generated synthetic data. AGS and AD analyzed the data. MPJ supervised the project. AGS, AD, ASN and MPJ wrote the manuscript.
Extended Methods
Partial Information Decomposition
Information theory does not provide a complete description of the informational relationships between variables in a system composed of three or more variables. The information I(T; S1, S2) that two ‘source’ variables S1 and S2 hold about a third ‘target’ variable T decomposes into four parts: (i) U(T; S1|S2), the unique information that only S1 (out of S1 and S2) holds about T; (ii) U(T; S2|S1), the unique information that only S2 holds about T; (iii) R(T; S1, S2), the redundant information that both S1 and S2 hold about T; and (iv) S(T; S1, S2), the synergistic information about T that only arises from knowing both S1 and S2 (see Fig S1a). The set of quantities {U(T; S1|S2), U(T; S2|S1), R(T; S1, S2), S(T; S1, S2)} is called a ‘partial information decomposition’, whose components are related as follows: I(T; S1, S2) = U(T; S1|S2) + U(T; S2|S1) + R(T; S1, S2) + S(T; S1, S2).
Thus, the Partial Information Decomposition (PID)1 framework characterizes the mutual information between variables by decomposing it into unique, shared, and synergistic components.
In a multi-neuronal network, unique (and synergistic) components of the mutual information between neural variables due to causal interactions (unique and synergistic) are captured by directed statistical dependencies. On the other hand, shared components of mutual information between neural variables correspond to shared neuronal inputs, and the sign of their modulation can be estimated from the modulation of unique and total mutual information (Fig S1b).
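To illustrate why dependencies must be assessed jointly rather than pairwise, consider a toy XOR system: each source alone carries no mutual information about the target, yet together the sources determine it fully (purely synergistic information). The following minimal numpy check is illustrative only and not part of the original analysis:

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information I(X;Y) in bits for discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

rng = np.random.default_rng(0)
s1 = rng.integers(0, 2, 10000)
s2 = rng.integers(0, 2, 10000)
t = s1 ^ s2                             # XOR target: purely synergistic

mi_single = mutual_info(s1, t)          # ~0 bits: each source alone is uninformative
mi_joint = mutual_info(2 * s1 + s2, t)  # ~1 bit: the joint sources determine T
```

A pairwise measure applied to (s1, t) or (s2, t) would report no dependency at all; only a joint treatment of the sources recovers the relationship.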
Information Decomposition with Pairwise and Network Models
a. Generation of synthetic data
A synthetic network was generated in which unique and shared dependencies between variables were controlled. Eight variables were randomly initialized to values of 0 or 1. One variable was assigned as the target, the other seven as sources. One source was designated as the unique source. Samples were generated in the following manner: 1) the value of all variables was set to 1 with probability Pshared and to 0 with probability 1 − Pshared and 2) the values of the target and unique source were set to 1 with probability Punique and to 0 with probability 1 − Punique. This created a shared dependency between all eight variables and a single unique dependency between the target and the unique source. Two thousand (2000) samples each were generated with Pshared, Punique ∈ {0.1, 0.2, 0.3, 0.4}. The total normalized mutual information between the unique source and the target variable was calculated by the uncertainty coefficient:
U(target | unique source) = I(target; unique source) / H(target), where H is entropy and I is mutual information. Total entropy of the target and unique source variables, H(target, unique source), was additionally calculated for Punique ∈ {0.1, 0.2, 0.3, 0.4} and Pshared ∈ {0.1, 0.4}. Values were normalized to the entropy for (Punique, Pshared) = (0.1, 0.1) to represent the change in total entropy (information fraction). Similarly, total entropy was calculated for 100 random subsets of three source variables plus the target variable for Punique ∈ {0.1, 0.4} and Pshared ∈ {0.1, 0.2, 0.3, 0.4} and normalized to the entropy for (Punique, Pshared) = (0.1, 0.1).
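As a minimal illustration of the uncertainty coefficient U(T|S) = I(T;S)/H(T), here is a numpy sketch applied to a hypothetical binary source/target pair (not the simulated network above):

```python
import numpy as np

def entropy(*vars_):
    """Joint Shannon entropy (bits) of one or more discrete 1-D arrays."""
    joint = np.stack(vars_, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def uncertainty_coefficient(target, source):
    """Normalized mutual information U(T|S) = I(T;S) / H(T)."""
    mi = entropy(target) + entropy(source) - entropy(target, source)
    return mi / entropy(target)

# hypothetical pair: target copies the source, with 20% of values flipped
rng = np.random.default_rng(1)
s = rng.integers(0, 2, 5000)
flip = rng.random(5000) < 0.2
t = np.where(flip, 1 - s, s)
u = uncertainty_coefficient(t, s)  # between 0 (independent) and 1 (deterministic)
```

The coefficient is 0 for independent variables and 1 when the target is fully determined by the source.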
b. Pairwise models
Univariate logistic regression was performed between the unique source and target variable using the statsmodels python package1 for all values of Pshared and Punique.
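The original fits used statsmodels; the following is a self-contained numpy sketch of the same kind of univariate logistic regression, fitted by gradient ascent on the log-likelihood and applied to a hypothetical binary source/target pair (all data and probabilities here are illustrative):

```python
import numpy as np

def logistic_fit(x, y, lr=0.5, n_iter=2000):
    """Univariate logistic regression y ~ sigmoid(b0 + b1*x),
    fitted by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        g0 = np.mean(y - p)            # gradient w.r.t. intercept
        g1 = np.mean((y - p) * x)      # gradient w.r.t. slope
        b0, b1 = b0 + lr * g0, b1 + lr * g1
    return b0, b1

# hypothetical dependency: P(target=1) is 0.8 when source=1, 0.3 otherwise
rng = np.random.default_rng(2)
src = rng.integers(0, 2, 4000).astype(float)
tgt = (rng.random(4000) < np.where(src == 1, 0.8, 0.3)).astype(float)
b0, b1 = logistic_fit(src, tgt)  # b1 > 0 reflects the pairwise dependency
```

The slope b1 estimates the log-odds difference between the two source states, which is how pairwise dependency strength is read out from such a model.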
Multi-timescale Weighted Dynamic Bayesian Network (MTwDBN) Analysis Pipeline
a. Fitting Dynamic Bayesian Network models
A Dynamic Bayesian Network (DBN) framework was used to learn dependencies between neural populations5. The pgmpy python package6 (https://github.com/pgmpy/pgmpy) and custom-written python code were used to fit all DBN models. Each binned-and-sliced data table was first bootstrapped B times. For each session/condition, a hill-climb tabu search with a history window of 7 was performed 120 times, each from a unique random starting graph, to find a suitable fitting directed acyclic graph (DAG)7. The Akaike information criterion (AIC) was used8,9 as the scoring metric in the tabu search. The variables associated with the latest time slice (0 ms) were termed the effect variables; all others were termed potential cause variables. The search was restricted to DAGs in which edges could only be incident on an effect variable. Edges between effect variables were allowed as they may capture dependencies at timescales shorter than our chosen bin widths; however, such edges have no causal interpretation and were excluded from further analysis. The resulting DAGs were termed unweighted, as their edges (dependencies) are described in a present/absent manner. Of the 120 starting points, only the DAG with the highest AIC score was used for further analysis. This resulted in B unweighted DAGs per session/condition.
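The actual pipeline relies on pgmpy's tabu search; as a conceptual sketch of score-based structure learning, the example below greedily selects parents for a single effect variable using an AIC-style score (log-likelihood minus parameter count, maximized). This simplified stand-in and its data are entirely hypothetical:

```python
import numpy as np

def cond_loglik(child, parents):
    """Log-likelihood of a categorical model of `child` given the joint
    configuration of `parents`, plus the number of free parameters."""
    if parents:
        cfg = np.stack(parents, axis=1)
        _, inv = np.unique(cfg, axis=0, return_inverse=True)
        groups = [child[inv == i] for i in range(inv.max() + 1)]
    else:
        groups = [child]
    states = np.unique(child)
    ll, k = 0.0, 0
    for g in groups:
        for s in states:
            c = np.sum(g == s)
            if c > 0:
                ll += c * np.log(c / len(g))
        k += len(states) - 1
    return ll, k

def greedy_parents(child, candidates):
    """Greedy hill climb over parent sets, scoring with an AIC-style
    criterion; a simplified stand-in for the tabu search described above."""
    chosen, rest = [], list(range(len(candidates)))
    ll, k = cond_loglik(child, [])
    best = ll - k
    improved = True
    while improved and rest:
        improved, pick = False, None
        for i in rest:
            ll, k = cond_loglik(child, [candidates[j] for j in chosen + [i]])
            if ll - k > best:
                best, pick, improved = ll - k, i, True
        if improved:
            chosen.append(pick)
            rest.remove(pick)
    return chosen

# hypothetical demo: the child depends on candidate 0 but not candidate 1
rng = np.random.default_rng(4)
c0, c1 = rng.integers(0, 2, 3000), rng.integers(0, 2, 3000)
child = np.where(rng.random(3000) < 0.9, c0, 1 - c0)
chosen = greedy_parents(child, [c0, c1])  # candidate 0 is selected first
```

The AIC penalty is what keeps the recovered structure sparse: a candidate parent is admitted only if its likelihood gain exceeds the cost of its extra parameters.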
b. Estimation of weighted DAG
The B DAGs from each session/condition were used to estimate weighted DAGs10. A pool of 100 weighted DAGs was estimated by taking 100 bootstrap samples from the B unweighted DAGs and averaging: the weight ∈ (0,1) for each edge corresponds to the proportion of DAGs in a bootstrap sample in which the edge was present. This resulted in 100 weighted DAGs for each session/condition.
c. Testing significance of DAG edges
To test for significance of the discovered edges, the 100 weighted DAGs were compared to 100 control DAGs that were generated in the same manner as above but with time-shuffled data. The binned-and-sliced data tables were shuffled as follows: for each row, permute the data from each of the 7 time slices which arose from the same population. Finally, permute each column (corresponding to each population/time slice combination). To test for significance, a distribution of weights was generated by combining weights for a given edge across all sessions (conditions treated separately). The distribution of edge weights from unshuffled data were compared to the distribution of edge weights from the shuffled data using a one-sided Mann Whitney U Test. If the distribution from unshuffled data had significantly higher values than from shuffled data (p<0.05), the edge was marked as significant for that condition and said to have survived time shuffling.
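The significance step above can be sketched as follows. This is a stdlib/numpy implementation of the one-sided Mann-Whitney U test using the normal approximation (the original analysis may have used a library routine), applied to hypothetical edge-weight distributions from real and time-shuffled data:

```python
import numpy as np
from math import erfc, sqrt

def mann_whitney_greater(a, b):
    """One-sided Mann-Whitney U test via the normal approximation:
    p-value for the alternative that values in `a` tend to exceed `b`."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    allv = np.concatenate([a, b])
    order = allv.argsort()
    ranks = np.empty(len(allv))
    ranks[order] = np.arange(1, len(allv) + 1)
    for v in np.unique(allv):            # midranks for tied values
        tied = allv == v
        ranks[tied] = ranks[tied].mean()
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))       # one-sided p-value

# hypothetical edge-weight distributions: data vs. time-shuffled control
rng = np.random.default_rng(3)
real = rng.beta(8, 2, 100)               # high edge weights in real data
shuffled = rng.beta(2, 8, 100)           # low edge weights after shuffling
p_edge = mann_whitney_greater(real, shuffled)
significant = p_edge < 0.05              # edge survives time shuffling
```

An edge is retained only when its unshuffled weight distribution sits significantly above the shuffled one, mirroring the criterion described above.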
MTwDBN Validation
a. Synthetic Neural Network Models
Synthetic neural network models were constructed using stochastic spiking neurons11,12. Individual neurons in the model were treated as coupled, continuous-time, two-state (active and quiescent) Markov processes. The active state represents a neuron firing an action potential and its accompanying refractory period, whereas the quiescent state represents a neuron at rest. The transition probability for the i-th neuron to decay from the active to the quiescent state in time dt was Pi(active → quiescent) = αi dt, where αi represented the decay rate of the active state of the neuron. Parameter αi sets the upper bound on the firing rate of the stochastically spiking neuron, akin to a refractory period. The transition probability for the i-th neuron to change from the quiescent to the active state (i.e., spike) was Pi(quiescent → active) = βiG(Si) dt 11. This caused the firing probability to be a function of the input, with βi as its peak value. Parameter Si was the total synaptic input to neuron i, given as Si(t) = Ni(t) + Ii(t), where Ni was the net input from other neurons in the local network and Ii was the net external input to the neuron. The network input was Ni(t) = ∑j wij Aj(t), where wij are the weights of the synapses. The activity variable Aj(t) was set to one if the j-th neuron was active at time t and zero otherwise. The model neurons had no intrinsic capacity to oscillate because the inter-spike interval was the sum of two independent exponential random variables with parameters αi and βiG(Si), respectively.
The model parameters were chosen as follows: Excitatory (E) and inhibitory (I) neurons in the network were differentiated based on two model parameters: αE = 0.0075 ms; αI = 0.4 ms; and βE = 1, βI = 2.
1. Two-Population Model
The model consisted of two excitatory neurons (each representing a distinct population). There was a single synaptic connection from the first neuron to the second neuron. Ten such two- population models were constructed with synaptic weights ∈ (0.5, 1.0, …, 9.0, 9.5). 1000 trials of spiking data each 100 ms long were simulated for each model.
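A minimal discrete-time sketch of such a two-population simulation is shown below. The sigmoidal gain function G and the constant external drive are illustrative assumptions (neither is specified in this summary); the α and β values follow the excitatory parameters listed above:

```python
import numpy as np

def simulate_pair(w, t_max=100.0, dt=1.0, alpha=0.0075, beta=1.0, seed=0):
    """Two coupled two-state (quiescent/active) stochastic neurons;
    neuron 0 drives neuron 1 with synaptic weight w. The gain G and the
    external drive used here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    G = lambda s: 1.0 / (1.0 + np.exp(-(s - 2.0)))   # assumed sigmoidal gain
    active = [False, False]
    spikes = [[], []]
    for step in range(int(t_max / dt)):
        # total synaptic input S_i = N_i + I_i, with external drive I_i = 1
        s_in = [1.0, 1.0 + w * float(active[0])]
        for i in range(2):
            if active[i]:
                if rng.random() < alpha * dt:             # active -> quiescent
                    active[i] = False
            elif rng.random() < beta * G(s_in[i]) * dt:   # quiescent -> active
                active[i] = True                          # transition marks a spike
                spikes[i].append(step * dt)
    return spikes

spike_trains = simulate_pair(w=9.0)  # one 100 ms trial at the largest weight
```

Repeating such trials across the synaptic-weight grid yields the spiking datasets whose dependency structure the MTwDBN pipeline is asked to recover.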
2. Six-population Laminar Model
Laminar models consisted of 45 neurons in total, 15 in each cortical layer (superficial, input, deep). Each layer contained 10 excitatory and 5 inhibitory neurons, giving a total of 6 populations (3 layers x 2 neuron types). The network topology for synaptic connectivity is depicted in Fig 2c. The weights wij for synaptic connections are 1 for the inter-laminar connections, 1.5 for intra-laminar connections where the presynaptic unit is an E unit, and -2 for intra-laminar connections where the presynaptic unit is an I unit. 1000 trials of spiking data, each 2000 ms long, were simulated using this model.
b. Preprocessing for MTwDBN analysis
Data from single units were grouped by population in each model simulation. The “multi-unit” spiking activity of these populations was used for the analysis. Before aggregating the activities of single units into populations, d % (d being a pre-selected number, see Effect of sub-sampling in synthetic laminar model) of the single units in each population were dropped (not used for analysis) from the laminar model data. The data from each trial were discretized into 1.2 ms bins, with each bin coded as 0, 1, or 2 to denote no spikes, one spike, or multiple spikes. The data were lagged 2 times to give 3 time slices (2.4, 1.2, 0 ms). The data from all trials were concatenated together to generate a single data table with 6 or 18 columns (2 or 6 populations × 3 time slices). The binned and discretized spiking activity of a single population in a single time slice was viewed as a variable in either the pairwise regression or the DBN framework. Data with the structure produced by this preprocessing step are termed binned-and-sliced data tables.
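One way this preprocessing could be implemented is sketched below for the two-population model (1.2 ms bins, 2 lags, 3 time slices). The column ordering within a row is an assumption of the sketch, not specified by the text.

```python
import numpy as np

def bin_and_slice(spike_counts, n_lags=2):
    """Build a binned-and-sliced data table.
    spike_counts: array (n_trials, n_bins, n_pops) of per-bin spike counts.
    Returns one row per (trial, time point), one column per (population, slice)."""
    # Discretize each bin: 0 = no spikes, 1 = one spike, 2 = multiple spikes
    disc = np.minimum(spike_counts, 2)
    n_trials, n_bins, n_pops = disc.shape
    rows = []
    for trial in disc:
        for t in range(n_lags, n_bins):
            # stack the current bin and the n_lags preceding bins
            # (slice order here: lag 0, lag 1, lag 2 — an assumed convention)
            rows.append(trial[t - n_lags:t + 1][::-1].ravel())
    return np.array(rows)

# Illustrative input: 1000 trials of 100 ms, ~83 bins of 1.2 ms, 2 populations
counts = np.random.default_rng(0).poisson(0.5, size=(1000, 83, 2))
table = bin_and_slice(counts, n_lags=2)   # 6 columns (2 populations x 3 slices)
```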
c. Analysis: Relation between synaptic weights and DAG edge weights in two-population models
Binned-and-sliced data tables from each two-population model were passed separately through the MTwDBN pipeline (B = 200) to generate 100 weighted DAGs each. Weights were obtained by averaging across the 200 unweighted DAGs. The 100 weighted DAGs were used to compute 95% confidence intervals for these weights.
d. Analysis: Effect of sub-sampling on recovering ground-truth in synthetic laminar model
To assess the effect of sub-sampling neural data on MTwDBN outputs, d % (d ∈ {0, 20, 40, 60, 80}) of the neurons in each of the 6 populations were dropped from the laminar model before preprocessing (see Preprocessing for MTwDBN analysis), resulting in five binned-and-sliced data tables. Each data table was passed through the MTwDBN pipeline (B = 50), generating 50 unweighted DAGs and 100 weighted DAGs. Edges were considered significant either by surviving time shuffling (MTwDBN) or by passing a fixed threshold of 0.5 (weighted + FT). To measure how well they conformed to the ground-truth connectivity, an F-score was calculated for the models (MTwDBN, weighted + FT, unweighted) fitted to a given subsampled dataset. We used a framework5 in which edges are counted regardless of the time lag at which they appear. The F-score is defined as F = 2RP/(R + P), with R = C/(C + M) and P = C/(C + I), where C = the number of connections inferred by the DBN that are in the ground truth, M = the number of connections in the ground truth not inferred by the DBN, and I = the number of connections inferred by the DBN that are not in the ground truth. R and P denote recall and precision, respectively.
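The F-score over lag-collapsed edge sets can be computed directly from the counts C, M, and I. A minimal sketch, with edges represented as (cause population, effect population) pairs:

```python
def f_score(inferred, ground_truth):
    """F-score between inferred and ground-truth edge sets,
    counting edges regardless of time lag."""
    inferred, ground_truth = set(inferred), set(ground_truth)
    C = len(inferred & ground_truth)     # correctly inferred connections
    M = len(ground_truth - inferred)     # missed connections
    I = len(inferred - ground_truth)     # spurious connections
    recall = C / (C + M) if C + M else 0.0
    precision = C / (C + I) if C + I else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. ground truth {A->B, B->C}, inferred {A->B, C->A}:
# C = 1, M = 1, I = 1, so R = P = 0.5 and F = 0.5
score = f_score([("A", "B"), ("C", "A")], [("A", "B"), ("B", "C")])
```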
e. Analysis: Effect of number of time lags on recovering ground-truth in synthetic laminar model
To assess the impact of including different numbers of time lags on recovering ground-truth dependencies, the synthetic laminar data were preprocessed (d = 0) with the number of lags ∈ {3, 4, 5, 6, 7}. Each binned-and-sliced data table was run independently through the MTwDBN pipeline (B = 50), and F-scores, recall, and precision were calculated.
f. Analysis: Validating Predictive Power of Edge Weights in Synthetic Laminar Model
To validate whether the edge weights carry information about the population dependencies beyond that in the unweighted edges, the binned-and-sliced data table from the synthetic laminar network (d = 0, no neurons dropped) was divided row-wise into two data tables, datatrain (95% of the data) and datatest (5% of the data). datatrain was bootstrapped 10 times, and each bootstrap was separately passed through the MTwDBN pipeline (B = 50) to generate 10 × 100 weighted DAGs. The set of edges in the weighted graph that survived time-shuffling is denoted Eset. Subsampled unweighted DAGs containing n edges (6 or 13) were obtained by sampling without replacement from Eset, either uniformly (UW, unweighted) or using the DAG edge weights as sampling weights (W, weighted). Models UW and W were used for prediction by fitting parameters to 12,000 samples from datatest by maximum likelihood estimation13. Of the remaining samples in datatest, 4,000 were used for validation. For each sample in the validation set, one of the effect variables was chosen at random, and both UW and W were used to predict the value of this effect variable from the cause variables. The predictions of the two models were compared to the true value of the effect variable using the M score14. This was repeated for 100 variations of UW and W (created by varying the random seed used to sample edges), and the resulting 100 M scores were compared using a two-tailed paired t-test for both n = 6 and n = 13. The W model was considered to outperform UW in prediction if MW > MUW and p < 0.05.
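The two edge-sampling schemes (UW vs. W) can be sketched as follows; the edge names and weights are illustrative placeholders, not values from the study.

```python
import numpy as np

def sample_edges(edges, weights, n, weighted, rng):
    """Draw n edges from Eset without replacement: uniformly (UW) or with
    the DAG edge weights as sampling probabilities (W)."""
    p = np.asarray(weights, float) / np.sum(weights) if weighted else None
    idx = rng.choice(len(edges), size=n, replace=False, p=p)
    return [edges[i] for i in idx]

# e.g. 13 surviving edges with hypothetical weights
rng = np.random.default_rng(0)
eset = [("pop%d" % i, "pop%d" % (i + 1)) for i in range(13)]
w = rng.uniform(0.2, 1.0, size=13)

uw_model = sample_edges(eset, w, 6, weighted=False, rng=rng)  # UW: uniform
w_model = sample_edges(eset, w, 6, weighted=True, rng=rng)    # W: weight-biased
```

The W scheme preferentially retains high-weight edges, which is what lets the comparison test whether the weights carry predictive information beyond edge identity alone.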
Attention Data
a. Behavioral task
Well-isolated single units were recorded from area V4 of two rhesus macaques during an attention-demanding orientation change detection task. The task design and experimental procedures are described in detail in a previous study15,16. While the monkey maintained fixation, two oriented Gabor stimuli were flashed on for 200 ms and off for variable intervals (randomly chosen between 200 and 400 ms). The contrast of each stimulus was randomly chosen from a uniform distribution over 6 contrasts (c = 10%, 18%, 26%, 34%, 42%, and 50%). One of the stimuli was located in the receptive field overlap region (Attend In) and the other at an equally eccentric location across the vertical meridian (Attend Away). At the beginning of a block of trials, we presented instruction trials in which the monkey was spatially cued to covertly attend to one of the two stimulus locations. One of the two stimuli changed in orientation at an unpredictable time (minimum 1 s, maximum 5 s, mean 3 s). The monkey was rewarded for making a saccade to the location of the orientation change. 95% of the orientation changes occurred at the cued location, and 5% occurred at the uncued location (foil trials). We observed impaired performance and slower reaction times on foil trials, suggesting that the monkey was indeed using the spatial cue to perform the task. Task difficulty was controlled by varying the degree of orientation change (randomly chosen from 1°, 2°, 3°, 4°, 6°, 8°, 10°, and 12°). If no change occurred before 5 s, the monkey was rewarded for holding fixation (catch trial, 13% of trials).
b. Electrophysiological recording
While the monkeys performed the attention task, we used artificial dura chambers to facilitate the insertion of 16-channel linear array electrodes (“laminar probes”, Plexon V-Probe) into cortical sites near the center of the prelunate gyrus. Neuronal signals were recorded, filtered, and stored using the Multichannel Acquisition Processor system (Plexon). Neuronal signals were classified as either isolated single units or multi-unit clusters using the Plexon Offline Sorter. For the data collected with the linear arrays, we used current source density analysis, based on the second derivative of the flash-triggered LFPs, to identify the superficial (layers 1-3), input (layer 4), and deep (layers 5 and 6) layers of the cortex15,17. Cell bodies of single units with bi-phasic action potential waveforms were assigned to the layer in which the electrode channel was situated during recording. Units with tri-phasic or other waveform shapes were excluded from the analyses. Units with a peak-to-trough duration greater than 225 μs were classified as broad-spiking putative excitatory neurons; units with a peak-to-trough duration less than 225 μs were classified as narrow-spiking putative inhibitory neurons. Extracellular data were collected over 32 sessions (23 in monkey A, 9 in monkey C), yielding 337 single units in total. Unit yield per session was considerably higher in monkey C than in monkey A, resulting in roughly equal contributions of the two monkeys to the population data.
c. Data Selection
All analyses in this study were performed on spiking data from an interval 60-260 ms after stimulus onset, excluding stimulus presentations containing orientation changes. Only single units whose spike waveforms were successfully classified as broad or narrow and whose layer identity could be successfully discerned were used in the analysis. There were 29 sessions with one or more such units. For layer-wise analyses, only sessions with at least one unit from each layer were included (18 sessions). For broad- and narrow-spiking layer-wise (layer-class) analyses, only sessions with at least one unit from two or more populations were included (27 sessions). Pairwise and network-based dependency analyses were performed on each session separately.
d. Analysis: Attention Conditions
Data from Attend In and Attend Away trials were analyzed independently. For each attention condition, only data from trials where the animal successfully detected the orientation change or from catch trials where the animal maintained fixation were used.
e. Analysis: Behavioral Performance
We fit the behavioral data with a logistic function and defined the threshold condition as the orientation change that was closest to the 50% threshold of the fitted psychometric function for that session. This subset of trials from within the attend-in condition in which the animal was equally likely to correctly detect (Hit) or fail to detect (Miss) the orientation change was used for our analysis. Data from Hit and Miss trials were analyzed independently.
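The threshold-condition selection can be sketched as below. The particular logistic parameterization (with a guess-rate floor) and the per-session hit rates are assumptions for illustration; the paper does not specify its exact functional form.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, x0, k, gamma):
    """Logistic psychometric function; gamma is a guess-rate floor (assumed form)."""
    return gamma + (1 - gamma) / (1 + np.exp(-k * (x - x0)))

# Hypothetical per-session data: orientation changes tested and observed hit rates
oris = np.array([1, 2, 3, 4, 6, 8, 10, 12], dtype=float)
hits = np.array([0.10, 0.20, 0.35, 0.50, 0.70, 0.85, 0.92, 0.95])

params, _ = curve_fit(psychometric, oris, hits, p0=[4.0, 1.0, 0.05])

# Threshold condition: the tested orientation change closest to the 50% point
threshold = oris[np.argmin(np.abs(psychometric(oris, *params) - 0.5))]
```

Trials at this threshold orientation change are, by construction, the ones where hits and misses are roughly equally likely, which is what permits the Hit vs. Miss comparison at matched stimulus conditions.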
f. Preprocessing for MTwDBN analysis
Single units were grouped according to neocortical layer (superficial/input/deep) for layer-wise (3 populations) analyses and additionally by spike waveform (narrow/broad) for layer-class (3 × 2 populations) analyses. The “multi-unit” spiking activity of these populations was used for the analysis. The data from each stimulus presentation (60-260 ms after stimulus onset) were discretized into 15 ms bins and lagged 6 times to give 7 time slices (-90, -75, -60, -45, -30, -15, 0 ms). The spiking activity of each population in each bin was discretized to 1 or 0 to denote the presence or absence of spikes. The data from all stimulus presentations in a session/condition combination were concatenated together. Data tables had 21 columns for layer-wise analyses (3 layers × 7 time slices) and 42 columns for layer-class analyses (6 layer-class populations × 7 time slices). The binned and discretized spiking activity of a single population in a single time slice was viewed as a variable in either the pairwise regression or the DBN framework. Data from each session/condition were preprocessed separately. To keep analyses consistent across the conditions being compared (Attend-In vs. Away or Hit vs. Miss), the size of the bootstraps was set to the maximum number of rows of the two conditions.
g. Estimation of Condition Modulated Edges
Binned-and-sliced data tables from each session/condition were independently passed through the MTwDBN pipeline (B = 200) to generate 100 weighted DAGs each. Estimation of condition-modulated edges was performed only for edges that survived time shuffling in at least one of the conditions being compared (Attend In or Away, Hit or Miss). For each session in which both cause and effect populations were recorded, 5000 pairs of weighted DAGs were bootstrapped from the two conditions. For each pair, attention modulation indices (AMI) and hit modulation indices (HMI) were calculated as AMI = (wIn − wAway)/(wIn + wAway) and HMI = (wHit − wMiss)/(wHit + wMiss), where w denotes the weight of the edge in the corresponding condition.
This resulted in 5000 modulation values for each edge that survived time-shuffling and each recording session where both cause and effect populations were recorded.
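The bootstrap over condition pairs can be sketched as follows, assuming the modulation index takes the standard contrast form (w1 − w2)/(w1 + w2); the edge weights used here are illustrative placeholders.

```python
import numpy as np

def modulation_indices(w_cond1, w_cond2, n_boot=5000, seed=0):
    """Bootstrap modulation indices (AMI or HMI) for one edge.
    w_cond1, w_cond2: edge weights from the weighted DAGs of each condition.
    Returns n_boot values of (w1 - w2) / (w1 + w2) over resampled pairs."""
    rng = np.random.default_rng(seed)
    w1 = rng.choice(w_cond1, size=n_boot, replace=True)
    w2 = rng.choice(w_cond2, size=n_boot, replace=True)
    return (w1 - w2) / (w1 + w2)

# e.g. one edge's weights across 100 weighted DAGs per condition (made up)
ami = modulation_indices(
    np.random.default_rng(1).uniform(0.5, 0.9, 100),   # Attend In
    np.random.default_rng(2).uniform(0.4, 0.8, 100),   # Attend Away
)
```

Because the edge weights are positive, each index is bounded in (-1, 1), with positive values indicating a stronger dependency in the first condition.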
h. Pairwise Dependency Analysis
Total dependency weights between neocortical populations were estimated using univariate logistic regression. This analysis was performed only on edges that survived time-shuffling in the MTwDBN analysis. For each session in which both cause and effect populations were recorded, 5000 samples were bootstrapped from the binned-and-sliced data tables (see Preprocessing for MTwDBN analysis) for each condition separately. To keep analyses consistent across Attend-In/Away and Hit/Miss, the size of the bootstraps was set to the maximum number of rows of the two conditions being compared. Logistic regression was performed separately from each cause variable to an effect variable using the statsmodels Python package with the BFGS solver1. The absolute value of the β1 coefficient was treated as the total dependency weight, to mirror the unsigned dependencies discovered using DBNs. This resulted in 5000 pairs of β1 values across the conditions being compared. For each pair, attention modulation indices (AMI) and hit modulation indices (HMI) were calculated as AMI = (|β1,In| − |β1,Away|)/(|β1,In| + |β1,Away|) and HMI = (|β1,Hit| − |β1,Miss|)/(|β1,Hit| + |β1,Miss|).
This resulted in 5000 modulation values for each edge that survived time-shuffling in the MTwDBN analysis and each recording session where both cause and effect populations were recorded.
i. Calculating Confidence Intervals of Modulation Indices
Modulation indices (either attention or hit) were grouped according to time lag, and additionally according to whether they were between layers (all layers, or input ⇔ superficial only) or within layers (all layers, input only, superficial only). For layer-class analyses, they could be further grouped according to broad or narrow waveforms. Because of the large sample sizes generated from bootstrapping, standard hypothesis testing would deem all modulations significantly different from zero (p ≈ 0). Therefore, an estimation statistics approach was used to estimate confidence intervals of the modulation indices. The mean modulation index and bias-corrected and accelerated (BCa) bootstrap 95% confidence intervals were calculated in Python (scipy.stats.bootstrap, modified to allow setting the size of each bootstrap). To estimate confidence intervals more conservatively given the large sample sizes, the number of resamples and the size of each bootstrap were each set to 5000.
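The BCa interval computation can be sketched with the stock scipy.stats.bootstrap (the authors' modification for controlling per-resample size is not reproduced here); the modulation-index values are illustrative placeholders.

```python
import numpy as np
from scipy.stats import bootstrap

# Hypothetical modulation indices for one group (lag x edge-type grouping)
rng = np.random.default_rng(0)
mod_indices = rng.normal(0.05, 0.2, size=1000)

# 95% BCa confidence interval for the mean modulation index
res = bootstrap(
    (mod_indices,),
    np.mean,
    confidence_level=0.95,
    n_resamples=5000,
    method="BCa",
    random_state=rng,
)
ci_low, ci_high = res.confidence_interval
```

An interval that excludes zero is then read as evidence of condition modulation, in place of a p-value-based test.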
ACKNOWLEDGEMENTS
This research was supported by NIH R00 EY025026, NIH R21 MH126072 and SFARI 875855 to MPJ, NARSAD Young Investigator Grant, Ziegler Foundation Grant, Yale Orthwein Scholar Funds, NIH R21 MH126072 and SFARI 875855 to ASN, NSF GRFP fellowship to AGS and by NEI core grant for vision research P30 EY026878 to Yale University. We thank Steve Chang and Weikang Shi for helpful comments on the manuscript.
Footnotes
↵† senior author