Practical Bayesian Inference in Neuroscience: Or How I Learned To Stop Worrying and Embrace the Distribution

Typical statistical practices in the biological sciences have been increasingly called into question due to difficulties in replicating a growing number of studies, many of which are confounded by the design constraints of null hypothesis significance testing and the difficulty of interpreting p-values. Bayesian inference, representing a fundamentally different approach to hypothesis testing, is receiving renewed interest as a potential alternative or complement to traditional null hypothesis significance testing due to its ease of interpretation and explicit declaration of prior assumptions. Bayesian models are more mathematically complex than equivalent frequentist approaches, which has historically limited their application to simplified analysis cases. However, the advent of probability distribution sampling tools, together with exponential increases in computational power, now allows for quick and robust inference under virtually any distribution of data. Here we present a practical tutorial on the use of Bayesian inference in the context of neuroscientific studies. We start with an intuitive discussion of Bayes' rule and inference, followed by the formulation of Bayesian-based regression and ANOVA models using data from a variety of neuroscientific studies. We show how Bayesian inference leads to easily interpretable analysis of data while providing an open-source toolbox to facilitate the use of Bayesian tools.


Materials and Methods
Bayesian inference was performed on a range of data typical of neuroscience experiments. Regression models, ANOVA models, and group comparisons were performed on single-unit activity recorded from inferior colliculus (IC) neurons in response to auditory stimuli in young and aged rats (Palombi et al., 2001; Simon et al., 2004; C.F et al., 2012; Herrmann et al., 2017). Random-effects regression models were performed on single units recorded in the auditory cortex (A1) using high-density recording arrays in response to infrared neural stimulation (Izzo et al., 2007; Cayce et al., 2011, 2014; Coventry et al., 2023) of the medial geniculate body (MGB). To underscore that meaningful Bayesian inference does not require cluster computing or extensive computational resources, all computations were performed on an MSI GS-66 laptop with an Intel i7 processor and an Nvidia RTX2070 GPU. Our inference programs are CPU-bound, not requiring any GPU resources. Computations can be performed on most modern CPUs, but accelerate with more CPU threads and cores and with parallelization on GPUs. All surgical procedures used in this study were approved by [redacted for double-blind review].

Disruption of Temporal Processing in the Inferior Colliculus Due to Aging
The inferior colliculus (IC) is the major integrative center of the auditory pathway, receiving excitatory inputs from the ventral and dorsal cochlear nuclei, excitatory and inhibitory inputs from the lateral and medial superior olivary complexes (Kelly and Caspary, 2005), and inhibitory inputs from the superior paraolivary nucleus and the dorsal and ventral nuclei of the lateral lemniscus (Cant and Benson, 2006; Loftus et al., 2010). The IC encodes auditory information through hierarchical processing of synaptic inputs within local IC circuitry (Caspary et al., 2002; Rabang et al., 2012; Grimsley et al., 2013; Coventry et al., 2017). Age-related changes in auditory processing primarily arise as deficits in temporal processing (Frisina and Frisina, 1997; Parthasarathy et al., 2010; Parthasarathy and Bartlett, 2012; Herrmann et al., 2017). This dataset is composed of single unit responses recorded from young (age ≤ 6 months) and aged (age ≥ 22 months) Fischer 344 rats. Auditory brainstem responses were recorded from animal subjects a few days prior to surgery to ensure hearing thresholds were typical of the rodent's age. Single unit recordings were performed in a 9'x9' double-walled, electrically isolated anechoic chamber (Industrial Acoustics Corporation). Animals were initially anesthetized via an intramuscular bolus injection of a ketamine (VetaKet, 60-80 mg/kg) and medetomidine (0.1-0.2 mg/kg) mixture. Oxygen was maintained via a manifold, and pulse rate and blood oxygenation were monitored through pulse oximetry. Supplemental doses of ketamine/medetomidine (20 mg/kg ketamine, 0.05 mg/kg medetomidine) were administered intramuscularly as required to maintain a surgical plane of anesthesia. An incision was made down the midline and the skull exposed. The periosteum was resected and a stainless steel headpost was secured anterior to bregma via 3 stainless steel bone screws. A craniectomy was made above the inferior colliculus (-8.5 anterior/posterior, 1 mm medial/lateral from bregma). A single tungsten electrode was advanced dorsally towards the central nucleus of the inferior colliculus (ICC), during which bandpass noise (200 ms, center frequencies 1-36 kHz in five steps per octave, 0.5 octave bandwidth) was delivered. The ICC was identified based on short-latency driven responses to bandpass noise search stimuli with ascending tonotopy and narrowly tuned responses to pure tones of varying frequencies. Once neurons were identified, responses to 5-10 repetitions of sinusoidal amplitude-modulated tones (750 ms tone length, modulation depth between -30 and 0 dB) were recorded using a preamplifying headstage (RA4PA, Tucker-Davis Technologies) and discretized at a sampling rate of 24.41 kHz (RZ-5, TDT). Sinusoidal amplitude-modulated stimuli were defined as:

s(t) = A[1 + m·sin(2πf_m·t + φ)]·n(t)

where m is the modulation depth ranging between 0.032 and 1 (-30 to 0 dB), f_m the modulation frequency, φ the reference phase of the modulator, A the scaling factor for stimulus sound level, and n(t) the broadband noise stimulus. Single units were filtered between 0.3 and 5 kHz. Offline spike sorting was performed using OpenExplorer (TDT).
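The sinusoidal amplitude-modulation formula above can be sketched in a few lines of Python. This is an illustrative sketch, not the experimental stimulus code: the white Gaussian noise carrier and default duration/sampling rate stand in for the actual broadband noise and hardware parameters.

```python
import math
import random

def sam_stimulus(m, f_m, phi=0.0, A=1.0, dur_s=0.75, fs=24414):
    """s(t) = A * [1 + m*sin(2*pi*f_m*t + phi)] * n(t)

    m    : modulation depth (0.032-1, i.e. -30 to 0 dB, where dB = 20*log10(m))
    f_m  : modulation frequency in Hz
    phi  : reference phase of the modulator (radians)
    A    : scaling factor for stimulus sound level
    n(t) : broadband noise carrier -- white Gaussian noise here as a stand-in
    """
    n_samples = int(dur_s * fs)
    return [A * (1.0 + m * math.sin(2.0 * math.pi * f_m * t / fs + phi))
            * random.gauss(0.0, 1.0)
            for t in range(n_samples)]

# Example: a 750 ms stimulus at roughly -6 dB modulation depth (m = 0.5), 10 Hz modulator
stim = sam_stimulus(m=0.5, f_m=10.0)
```

Note that the modulation depth in dB follows from 20·log10(m), so the -30 to 0 dB range quoted above corresponds to m between about 0.032 and 1.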

Thalamocortical Infrared Neural Stimulation
Infrared neural stimulation (INS) is an optical technique that uses coherent infrared light to stimulate nerves and neurons without genetic modification of the target or direct contact with tissue, and offers more spatially constrained activation than electrical stimulation (Wells et al., 2005; Izzo et al., 2007; Cayce et al., 2011, 2014; Coventry et al., 2020, 2023). In this study, rats were chronically implanted in A1 with 16-channel planar Utah-style arrays (TDT, Alachua FL) and with stimulating optrodes in the medial geniculate body of the auditory thalamus (Thor Labs, Newton NJ).
Rodents were initially anesthetized with a bolus injection of a ketamine (80 mg/kg) and medetomidine (0.2 mg/kg) cocktail. Oxygen was maintained via a manifold, and pulse rate and blood oxygenation were monitored through pulse oximetry. Supplemental doses of ketamine/medetomidine (20 mg/kg ketamine, 0.05 mg/kg medetomidine) were administered intramuscularly as required to maintain a surgical plane of anesthesia. An incision was made down the midline and the skull exposed. The periosteum was removed via blunt dissection and 3 stainless steel bone screws were placed in the skull for headcap stability. An additional titanium bone screw was placed in the skull to serve as a chronic ground and reference point for the recording electrodes. Craniectomies were made above the medial geniculate body (-6 anterior/posterior, -3.5 medial/lateral from bregma) and auditory cortex (-6 anterior/posterior, -5 medial/lateral from bregma). A fiber optic stimulating optrode was placed at the midpoint of the MGB (-6 dorsal/ventral from dura) and affixed to the skull using UV-curable dental acrylic (MidWest Dental). A 16-channel planar recording array was putatively placed in layers 3/4 of auditory cortex, with placement confirmed by short-latency, high-amplitude multiunit activity elicited by bandpass noise (200 ms, center frequencies 1-36 kHz in five steps per octave, 0.5 octave bandwidth) test stimuli. Recording electrodes were sealed onto the headcap. Animals were allowed to recover for 72 hours prior to the beginning of the recording regime. All recordings were performed in a 9'x9' electrically isolated anechoic chamber. During recording periods, animals received an intramuscular injection of medetomidine (0.2 mg/kg) for sedation. Optical stimuli were delivered from a 1907 nm diode laser (INSight open-source optical stimulation system) coupled to the optrode with a 200 μm, 0.22 NA fiber (Thor Labs FG200LCC). Laser stimuli were controlled via an RX-7 stimulator (TDT) and consisted of train stimuli with pulse widths between 0.2-10 ms, interstimulus intervals between 0.2-100 ms, and energies per pulse between 0-4 mJ. Applied laser energies were randomized to limit effects of neural adaptation, with 30-60 repetitions per pulse width/interstimulus interval combination. Signals from recording electrodes were amplified via a Medusa 32-channel preamplifier, discretized and sampled at 24.414 kHz with an RZ-2 biosignal processor, and visualized using OpenEx software (TDT). Action potentials were extracted from raw waveforms via real-time digital band-pass filtering with cutoff frequencies of 300-5000 Hz. Single units were extracted offline via superparamagnetic clustering in WaveClus (Quiroga et al., 2004). Studies were performed to assess the dose-response profiles of optically-based deep brain stimulation over the span of several months. As each electrode recorded diverse populations of neurons, which are potentially subject to change due to electrode healing-in, age of the device, and adaptation to the stimulus, a within-subjects, repeated-measures regression model was warranted. Bayesian hierarchical regressions can easily accommodate complex models such as these.

An Example of Bayesian Analysis Reporting Guidelines
The Bayesian Analysis Reporting Guidelines (BARG) (Kruschke, 2021) were initially proposed to promote transparent and reproducible reporting of Bayesian statistics. While initially devised for the social and psychological sciences, we adapted the BARG to suit neuroscientific data.

Bayesian Model Descriptions and Sensitivity Analyses
This report follows the guidelines for reporting of Bayesian analysis (BARG) (Kruschke, 2021), consisting of:

• Necessary software and source code directory (Steps 2A, 6)

This section describes the computational tools used for statistical analyses, including CPU and GPU use. For example: Bayesian modeling was performed using Python 3.6.8 on a Razer Blade 15 laptop with an Intel Core i7 processor (6 cores) and an Nvidia RTX2070 GPU. Models were implemented in PyMC version 4.11.5 (Salvatier et al., 2016), a probabilistic programming module in the Python environment. All source code is available at this paper's github repository (Link to software). All source data is available at this article's open science framework repository (Link to Data).

Goals of the Analyses
This section serves to establish the goals of the analyses, give a brief description of the statistical models used, and validate the Bayesian approaches.

BARG: Preamble
The goal of the Bayesian regression analyses is to infer a linear relationship between inferior colliculus single unit firing rates and the depth of sinusoidal amplitude-modulated stimuli. While this is normally established using frequentist linear regression methods, Bayesian approaches allow for flexible and explicit model descriptions which provide rich, descriptive inference and quantification of uncertainty in measurements of single unit activity. Inference is completed using direct probability measures on posterior distributions as opposed to less intuitive and difficult to interpret p-values. Bayesian approaches are also data driven and allow previous knowledge to be encoded as prior distributions (see section 1.3). The regression model utilized is:

FR = α + β·m + ε

where FR is the mean evoked firing rate, calculated from recorded peristimulus time histograms. The parameter β quantifies the effect of modulation depth (m) on evoked firing rates. The α parameter describes the model intercept and quantifies subthreshold spontaneous activity, and ε quantifies model error.
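To make the regression model concrete, the sketch below fits the same linear form FR = α + β·m + ε by a simple grid approximation of the posterior on simulated data. This is not the PyMC/NUTS pipeline used in the paper: the true parameter values are made up, σ is held fixed, and a flat prior over the grid is used for brevity (the paper's models use normal priors).

```python
import random

# Simulated data from the assumed generative model FR = alpha + beta*m + eps
# (alpha_true, beta_true, sigma are illustrative values, not experimental estimates)
random.seed(1)
alpha_true, beta_true, sigma = 5.0, 20.0, 2.0
depths = [i / 10 for i in range(1, 11)]                 # modulation depths m
fr = [alpha_true + beta_true * m + random.gauss(0.0, sigma) for m in depths]

def log_lik(a, b):
    """Normal log-likelihood of firing rates for intercept a and slope b (sigma known)."""
    return sum(-0.5 * ((y - (a + b * m)) / sigma) ** 2 for y, m in zip(fr, depths))

# Grid approximation of the posterior; with a flat prior over the grid,
# the posterior is proportional to the likelihood
grid_a = [i / 10 for i in range(0, 101)]                # alpha in [0, 10]
grid_b = [i / 10 for i in range(100, 301)]              # beta in [10, 30]
log_post = {(a, b): log_lik(a, b) for a in grid_a for b in grid_b}
a_map, b_map = max(log_post, key=log_post.get)          # MAP estimate on the grid
```

Grid approximation becomes infeasible as the number of parameters grows, which is why the full analyses rely on MCMC sampling instead.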

Priors and the rationale for prior choice are described in this section
There is significant data detailing inferior colliculus responses to SAM stimuli from our lab and the auditory neuroscience community writ large (citations redacted for double-blind review). However, the role of modulation depth in IC firing rates is understudied. As observations of single unit IC activity tend toward normal distributions, normal likelihood and prior distributions were chosen. Normal distributions also have the advantage of being only moderately informative, refraining from undue influence of the prior on the posterior and allowing the data to "speak for themselves."

Posterior Decision Rules
This section details the decision rules used in inference (ROPE+HDI, Bayes factors, etc).
Inference was performed on posterior distributions with credible regions (analogous to frequentist confidence intervals) defined as the highest density interval (HDI) containing 95% of the posterior density around the maximum a posteriori (MAP) parameter estimate, which represents the most probable value of the coefficient. MAP estimates are analogous to the maximum likelihood estimates found in frequentist approaches. This allows parameter uncertainty to be quantified as the variance observed in posterior parameter distributions, with narrow HDIs representing more certain estimates. It is customary to define a region of practical equivalence (ROPE) if prior information dictates that incremental parameter changes are effectively the same. As we lack prior knowledge to inform the choice of a ROPE, we take the agnostic approach that any change seen is worth investigating, and thus ROPEs are not presented. An effect was deemed significant if its 95% HDI did not overlap with 0, in line with proposed decision rules typical of Bayesian inference (Kruschke, 2011, 2014).
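The HDI decision rule can be made concrete with a short routine that finds the narrowest interval containing 95% of posterior draws. This is a sketch assuming a unimodal posterior (for which the narrowest interval equals the HDI), not the ArviZ implementation used in practice; the stand-in posterior draws are simulated.

```python
import math
import random

def hdi(samples, cred_mass=0.95):
    """Return the narrowest interval containing cred_mass of the samples;
    for a unimodal posterior this equals the highest density interval."""
    s = sorted(samples)
    n_in = int(math.ceil(cred_mass * len(s)))
    # Slide a window of n_in consecutive sorted samples and keep the narrowest
    width, i = min((s[j + n_in - 1] - s[j], j) for j in range(len(s) - n_in + 1))
    return s[i], s[i + n_in - 1]

# Decision rule from the text: an effect is credible if its 95% HDI excludes 0
random.seed(0)
draws = [random.gauss(2.0, 0.5) for _ in range(16000)]  # stand-in posterior draws
lo, hi = hdi(draws)
excludes_zero = lo > 0 or hi < 0   # True here: the posterior mass sits well above 0
```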

Final Model
This section details the final model after prior and posterior sensitivity analyses; it is helpful to include a descriptive figure of the inference model. Posterior predictive checks and sensitivity analyses were performed to titrate the best performing models as measured against observed data (Section 3). The final regression model is schematized in figure S1. Final models included deterministic nodes at the outputs of prior nodes to prevent NUTS from becoming stuck in regions of the sampling space which are difficult to explore¹.

BARG: Step 3A, C
This section details the methods and results of any model sensitivity analyses. As an example, model sensitivity analyses from the hierarchical linear regression are given below.
To evaluate the dependence of the Bayesian hierarchical linear regression on hyperprior and prior parameters, leave-one-out (LOO) cross-validation was performed (Gelman et al., 2014). A series of models was evaluated, with model variances varied to test the sensitivity of each model. Initial data analyses suggested that natural-log transformations of the dependent variable (firing rate) produced distributions which are better modeled as normal distributions. To this end, the hierarchical models under test were as follows:

Table S1: Regression models under test
For each model, the variance hyperprior was varied to assess the impact of prior parameters on posterior predictions. Prior classes were defined as: informative (variance ≤ 1), moderately informative (variance = 5), and weakly informative (variance ≥ 10). The primary metric for model comparison was the expected log pointwise predictive density (ELPD), defined as (Vehtari et al., 2017):

elpd = Σ_{i=1}^{n} ∫ p_t(ỹ_i) log p(ỹ_i | y) dỹ_i
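In practice the integral over the unknown true distribution is replaced by a sum of log predictive densities evaluated at the observed points. The sketch below computes the simpler in-sample log pointwise predictive density from posterior draws, assuming a normal likelihood with known σ; full LOO (as computed by ArviZ) additionally re-weights each point as if it had been left out, which this sketch omits. The function name `lppd` is ours.

```python
import math

def lppd(y, theta_draws, sigma=1.0):
    """Log pointwise predictive density: sum_i log((1/S) * sum_s p(y_i | theta_s)).
    LOO-CV replaces the full posterior with the posterior excluding y_i for each i;
    for brevity this sketch uses the full posterior (an in-sample approximation)."""
    total = 0.0
    norm = sigma * math.sqrt(2.0 * math.pi)
    for yi in y:
        # Average the normal likelihood of y_i over all posterior draws theta_s
        dens = [math.exp(-0.5 * ((yi - th) / sigma) ** 2) / norm for th in theta_draws]
        total += math.log(sum(dens) / len(dens))
    return total

# Higher values indicate better pointwise predictive fit, as with ELPD
score = lppd([1.2, 0.8, 1.1], [1.0, 1.05, 0.95])
```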

Posterior Predictive Checks
This section describes any posterior predictive checks performed and a description of the posterior predictive decision rule.
A key advantage of Bayesian inference is the ability to directly and explicitly compare model fits to observed data. During model development, posterior predictive checks were performed by sampling from the posterior distribution (16,000 draws). Kernel density estimates of posterior predictive draws were compared to kernel densities of observed data. Goodness of fit was quantified using the Bayesian p-value (Gelman et al., 2021). Like the frequentist p-value, the Bayesian p-value is a measure of discrepancy, quantifying the probability that posterior predictive draws are more extreme than observed data. The Bayesian p-value is defined as:

p_B = (1/S) Σ_{s=1}^{S} I(ỹ^(s) ≥ y)

where I is the indicator function, ỹ^(s) are draws from the posterior predictive distribution, and y is the observed data. As with the posterior distribution, the posterior predictive distribution and Bayesian p-values were estimated using NUTS. The closer the Bayesian p-value is to 0.5, the better the data sampled from the posterior distribute around the observed data.
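Once posterior predictive draws are in hand, the Bayesian p-value reduces to counting draws at least as extreme as the observed statistic. A minimal sketch follows; the function name and simulated draws are ours, not the paper's code.

```python
import random

def bayesian_p(t_obs, t_rep):
    """p_B = (1/S) * sum_s I(T(y_rep_s) >= T(y_obs)): the fraction of posterior
    predictive draws whose test statistic is at least as extreme as observed."""
    return sum(1 for t in t_rep if t >= t_obs) / len(t_rep)

# For a well-calibrated model, replicated statistics scatter evenly around the
# observed statistic and p_B lands near 0.5
random.seed(2)
reps = [random.gauss(10.0, 1.0) for _ in range(16000)]  # simulated T(y_rep) draws
p_b = bayesian_p(10.0, reps)
```

Values near 0 or 1 instead indicate a systematic discrepancy between the model's predictions and the observed data.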
Posterior predictive fits and Bayesian p-values suggest excellent posterior predictive fits for the hierarchical linear and multinomial regression models (Main article, Fig 4).

Prior and Posterior Trace plots
This section presents prior and posterior trace plots, which are useful for diagnosing model fits.

BARG: Step 2B, C
Critical to the performance of HMC/MCMC sampling is the convergence of sampling traces. Output trace plots display the chain of sampled values and the resulting kernel density estimates of sampled prior distributions. All sampled traces showed no divergences, suggesting that sampling was "well behaved" and provided accurate and effective sampling of the distribution. The Gelman-Rubin statistic R̂, quantifying within- and between-chain variance and correlation, indicated good MCMC convergence for all sampled traces. For clarity, traces are available on the open science framework, with traces for the posterior presented in Figure S3. Traces were checked for characteristic sampling behavior (Hoffman and Gelman, 2011), with no pathological traces found in the models.

Table S2: LOO model comparison results for the Bayesian hierarchical regression models. Var: prior variance parameter; log: log predictor and predicted variable model; semilog: semilog predictor model; ST: Student-t likelihood models; N: normal likelihood models.

In the ELPD definition, p_t(ỹ_i) are unknown distributions representing the true data-generating function, used in estimating the true posterior predictive function p(ỹ|y) from observed data y. Estimates of these distributions are obtained via cross-validation during the LOO analysis. In general, higher values of ELPD result from higher out-of-sample predictive fit, indicative of a better model. Weight values generated by LOO cross-validation were also analyzed; these predict the probability of each model given the observed data. Finally, we report the standard error of the ELPD estimate (SE) and the difference between the model with the highest ELPD and every other model (dSE), with the dSE of the top model set to 0.00 by definition. All LOO calculations were performed post hoc with the Python package ArviZ, a plugin for PyMC.

Furthermore, traces of sampled prior parameters in the regression models suggest effective sampling of the posterior distribution (Fig S3), and the Gelman-Rubin statistic, quantifying within- and between-chain variance and correlation, was below the conventional convergence threshold, indicative of convergence of marginal posterior parameter values (Gelman and Rubin, 1992).