Abstract
Nervous systems have evolved to combine environmental information with internal state to select and generate adaptive behavioral sequences. To better understand these computations and their implementation in neural circuits, natural behavior must be carefully measured and quantified. Here, we collect high spatial resolution video of single zebrafish larvae swimming in a naturalistic environment and develop models of their action selection across exploration and hunting. Zebrafish larvae swim in punctuated bouts separated by longer periods of rest called interbout intervals. We take advantage of this structure by categorizing bouts into discrete types and representing their behavior as labeled sequences of bout-types emitted over time. We then construct probabilistic models – specifically, marked renewal processes – to evaluate how bout-types and interbout intervals are selected by the fish as a function of its internal hunger state, behavioral history, and the locations and properties of nearby prey. Finally, we evaluate the models by their predictive likelihood and their ability to generate realistic trajectories of virtual fish swimming through simulated environments. Our simulations capture multiple timescales of structure in larval zebrafish behavior and expose many ways in which hunger state influences their action selection to promote food seeking during hunger and safety during satiety.
Introduction
Methods to quantify freely moving animal behavior have seen explosive growth in recent years. This growth has been fueled by rapid improvements in the quality and accessibility of scientific cameras, pose estimation algorithms, and behavioral models. A modern behavioral analysis pipeline1 commonly involves: (1) acquiring video of an animal behaving, (2) compressing this data into a low-dimensional time series representing the animal’s posture (or postural dynamics) in each video frame, and (3) using computational tools to annotate each frame with a discrete behavioral state label (e.g. “head-grooming”, “rearing”). While each of these steps poses significant challenges, variations of this pipeline have been used to discover large sets of stereotyped actions generated by worms2,3,4,5,6, flies7,8,9, fish10,11,12,13,14, and mice15 behaving in controlled lab environments. Importantly, these procedures produce statistical summaries of behavior that facilitate comparison of nervous system function across animals or animal groups.
The larval zebrafish is an important animal species for investigating the development16,17,18, structure19,20, and function21,22,23 of vertebrate nervous systems. Further, larvae are convenient to use in behavioral studies because of their simple body plan, temporally discrete behavior, and stereotyped locomotor repertoire. Pose estimation24,25,26,27,28 is an active area of machine learning research and many solutions have been developed to estimate pose from video, even for animals with complicated and flexible bodies such as locusts, fruit flies, and humans. Larval zebrafish, however, have a simpler body shape with fewer degrees of freedom, reducing the complexity of this problem. Once extracted, animal pose data can be temporally segmented into a sequence of minimal and stereotyped behavioral elements (referred to as “movemes”29, “actions”29, “motifs”15, or “syllables”15). This parsing problem becomes challenging for animals that move continuously or display many behaviors, but several effective approaches have been described1. However, since larval zebrafish swim in punctuated bouts, temporal segmentation of their behavior into bout and interbout epochs is straightforward. Together, the anatomy and movement patterns of larval zebrafish simplify behavioral analyses and allow each swim bout to be represented as a point in a high dimensional posture or kinematic parameter space. Several studies have taken advantage of these properties to analyze and categorize swim bouts, and the most comprehensive14 effort to date identified 13 basic types of swims used during hunting30,31, taxis behaviors32,33,34,35, escape maneuvers36,37,38, social interactions39, and spontaneous40 swimming in the light and dark.
While locomotor patterns produced by larval zebrafish have been well studied, much less is understood about the complex generative processes underlying the animal’s natural action selection. For example, when a zebrafish larva swimming through a natural environment selects and generates a swim bout from its locomotor repertoire, what variables and computations shape this decision? How does this decision depend on external inputs like the locations, sizes, shapes, and motion patterns of objects in the local environment? How does this decision depend on internal inputs like short- or long-term behavioral history, or hunger state? To work toward addressing these questions, we use a moving camera system to collect high spatial resolution video of larvae swimming in a large arena with abundant prey. This approach allows the fish to explore and hunt throughout a vast environment as we monitor fine postural features (e.g. tail shape, eye positions) together with microscopic details of its surroundings. Importantly, since zebrafish larvae frequently move their eyes to hunt, stabilize gaze, navigate, and surveil for threats, continuous eye position measurements can provide rich information about the animal’s behavioral state. The use of a moving camera system also eliminates a trade-off between spatial resolution and arena size, allowing us to observe up to hundreds of consecutive swim bouts without interference from arena boundaries. By contrast, fixed camera systems can require small enclosures that physically limit the space of possible behavioral sequences and bias action selection by promoting thigmotactic41 behaviors. Here, we avoid these issues to generate behavioral data that should more closely approximate larval zebrafish behavior in the wild.
We next use this data to construct probabilistic behavioral models – building on point process models with a long history in computational neuroscience42,43,44,45 – to evaluate how internal and external inputs shape larval zebrafish behavior. By sampling from these models, we can simulate behavioral trajectories that capture salient features of larval zebrafish behavioral dynamics spanning multiple timescales, from reactions to prey (<100 milliseconds), across stretches of hunting and exploration (seconds to minutes), and throughout changing hunger state (minutes to hours).
Results
Acquiring Behavioral Data with BEAST
To collect high resolution video (13 µm per pixel; 60 fps) of larval zebrafish swimming in a large arena, we built a moving camera rig called BEAST (for Behavioral Evaluation Across SpaceTime, 1A, Supp Video 1). This setup is similar to previously described rigs46, but the camera moves on a motorized gantry to remain positioned above the fish. We observed 7-8 days-post-fertilization nacre−/− zebrafish larvae (n=130) one at a time. To evaluate the influence of hunger on their behavior, each fish was either given abundant paramecia (fed group, n=73) or deprived of food for 2.5-5 hours (starved group, n=57) prior to observation. Previous studies have shown that brief food deprivation is sufficient to robustly increase larval zebrafish food intake47, 48 and to increase their likelihood to approach, rather than avoid, prey-like visual stimuli49, 50. We therefore expect the fed and starved fish groups to display different behavioral patterns and aim to construct behavioral models to quantify and characterize these effects.
We place each fish in the arena and repeatedly recruit it to the center to initiate up to 18 observational trials. We guide the fish to the center by projecting inwardly drifting concentric gratings onto a screen embedded in the tank bottom, thereby leveraging the fish’s optomotor response35. Once at the center, a static natural scene replaces the gratings and the fish is recorded for up to 3 minutes or until it reaches the arena edge or tracking fails. Swim paths from 3 representative trials are shown (1B) and the fish’s heading direction throughout the indicated portion (1B, purple line) of one trial is plotted (1C). Swim bouts can be seen as brief fluctuations in heading direction over time and the timing of bout and interbout epochs is determined from this signal (1D, Methods). We translate and rotate each image frame to register the video to the reference frame of the fish, as shown for the indicated 200 millisecond swim bout (1E). We encode fish posture (S1) in every image frame by estimating the vergence angle31 of each eye and the shape of the tail46 (1F). Larvae can accelerate rapidly (e.g. during escape swims), which can cause the online tracking system to fail. Also, offline pose estimation is occasionally compromised due to motion blur (during very high speed swims) or body roll (causing one eye to occlude the other in the image). We retain only video segments in which all postural features can be accurately extracted in every image frame for further analysis (see Methods).
The processed dataset contains 40 hours of behavioral data parsed into bout and interbout epochs (4002 video segments) (1G). Across all swim bouts (n = 200,559), the change in heading angle per bout is narrowly and symmetrically distributed (1H). The arena contains abundant prey (mostly paramecia and some rotifers) and the fish tend to hunt prey located near the water surface. Zebrafish larvae converge their eyes during hunts to pursue prey with binocular vision. While not hunting, larvae keep their eyes more diverged, increasing their visual coverage of the environment which should improve their ability to detect threats. These animals should therefore experience a natural trade-off between seeking food and seeking safety, and these opposing states can be seen in the bimodal distribution of eye position measurements during interbout intervals (1I). We maintain a large circular field of view centered around the fish head (1J) with sufficient image sharpness to extract positions, sizes, shapes, and motion patterns of objects near the water surface (1K, Methods) with which the fish may interact. We later use this information to construct compressed representations of environmental state to predict the type of the next swim bout.
Bout Categorization and Regulation of Action Selection by Hunger State
Hunger state has a large influence on larval zebrafish behavior and this can be seen by comparing eye position histograms (similar to 1I) across fed and starved fish groups within the first 10 minutes of testing (2A). The fraction of interbout intervals during which eyes are converged (threshold: mean vergence angle = 24°) increased by 183% from 0.124 in fed fish to 0.351 in starved fish in this time period. Fed fish also display wider eye divergence during non-hunting behavior compared to starved fish (2A, fed: peak at 12.5°; starved: peak at 15.5°; solid vertical lines). These observations indicate that hunger promotes increased food seeking behavior while satiety promotes wider eye divergence.
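The 183% figure above is a relative change between the two convergence fractions. As a quick check of that arithmetic, the following Python sketch recomputes it from the two reported values (the helper name `percent_increase` is ours, introduced only for illustration).

```python
def percent_increase(old, new):
    """Relative change from old to new, expressed in percent."""
    return 100.0 * (new - old) / old

# Fraction of interbout intervals with converged eyes, as reported in the text.
fed_fraction = 0.124
starved_fraction = 0.351

print(round(percent_increase(fed_fraction, starved_fraction)))  # 183
```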
To quantify the structure of larval zebrafish behavioral sequences and compare this structure across fed and starved groups, we first aim to categorize all observed swim bouts into discrete types. To that end, each swim bout is represented as a 10-frame (167 ms) postural sequence beginning at bout initiation (2B). This gives a 220-dimensional representation (2 eye and 20 tail measurements per frame) of the postural dynamics associated with each swim bout. We next perform dimensionality reduction to embed these 220-D observations in a 2-D space with t-distributed stochastic neighbor embedding (tSNE51, 52) (2C) and use density-based clustering8, 53 to isolate 5 major classes of swim bouts (S2-S4). These classes consist of hunting bouts (here called J-turn, pursuit, abort, strike) and non-hunting bouts (here called exploratory). Zebrafish larvae typically initiate hunts by converging their eyes and orienting toward prey with a J-turn31, and reduce the distance to prey while maintaining eye convergence with pursuit54 bouts (also called approach14 swims). Larvae typically end hunts with eye divergence, either during a strike (also called capture14 swim) or during hunt termination with an abort55. We find starved fish upregulate use of all hunting bouts, with strikes upregulated most and aborts least (2D). Bout-class selection probabilities of fed and starved fish gradually converge over 40 minutes (S5), presumably as the hunger state of each group shifts from their opposing initial conditions (high hunger or high satiety) toward an intermediate state near nutrient equilibrium.
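To make the 220-dimensional bout representation concrete, here is a minimal Python sketch that flattens 10 frames of 2 eye and 20 tail measurements into a single vector. The function name `bout_vector` and the ordering of measurements within the vector are assumptions made for illustration; the embedding and clustering steps are not reproduced here.

```python
N_FRAMES = 10  # frames per bout representation (about 167 ms at 60 fps)
N_EYE = 2      # one vergence angle per eye
N_TAIL = 20    # tail-shape measurements per frame

def bout_vector(eye_frames, tail_frames):
    """Concatenate per-frame eye and tail measurements into one flat vector.

    The frame-major ordering used here is an assumption, not the paper's layout.
    """
    if len(eye_frames) != N_FRAMES or len(tail_frames) != N_FRAMES:
        raise ValueError("expected 10 frames per bout")
    vec = []
    for eyes, tail in zip(eye_frames, tail_frames):
        if len(eyes) != N_EYE or len(tail) != N_TAIL:
            raise ValueError("expected 2 eye and 20 tail measurements per frame")
        vec.extend(eyes)
        vec.extend(tail)
    return vec

# Placeholder all-zero posture, used only to show the resulting dimensionality.
eye = [[0.0, 0.0] for _ in range(N_FRAMES)]
tail = [[0.0] * N_TAIL for _ in range(N_FRAMES)]
print(len(bout_vector(eye, tail)))  # 220
```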
To increase the granularity of the swim bout categories and improve the sensitivity of our analyses, we further subdivide the 3 largest bout-classes to yield a total of 10 exploring bout-types and 8 hunting bout-types (2E). Each of these 18 bout-types is composed of nearly equal numbers of leftward and rightward bouts. We use 2 scalar kinematic measurements to subdivide swim bout-classes: |Δheading| and |Δtail-shape| (S6). |Δheading| is the magnitude of heading angle change per bout, and |Δtail-shape| is the sum of the magnitudes of frame-to-frame changes in tail shape across each 10-frame bout representation, a postural measurement that correlates with distance traveled per bout and presumably energy expenditure (see Methods). We evenly split J-turns into 2 types (j1, j2) by |Δheading|. Pursuits are more abundant, so we split first by |Δheading| and then again by |Δtail-shape| to yield 4 types (p1-4). Exploratory bouts are split into 3 groups by |Δheading| and again into 3 groups each by |Δtail-shape| to yield 9 types (e1-9). A small subset of exploratory bouts (called e0) seems to correspond to orofacial and/or pectoral fin movements occurring in the absence of tail motion. e0 bouts occur with eyes diverged, include actions like suction, jaw movement, swallowing, and prey expulsion, and are the most likely bout-type to follow strike. e0 events were isolated prior to splitting exploratory types e1-9. Since e0 bouts are near threshold for bout detection, we expect this category to also include some erroneously detected bouts due to measurement noise and temporal segmentation errors. Detailed kinematic summaries of all bout-types are found in S6-S7. With labels assigned to all swim bouts, the distributions of interbout intervals preceding and following each of these 36 bout-types can be compared (2F, S8). It is apparent that larvae select longer interbout intervals during exploration (i.e. while not hunting), and select shorter interbout intervals during hunts.
Note also that distributions of interbout intervals preceding (or following) left and right versions of each bout-type are nearly equivalent, demonstrating the extent to which left-right symmetry organizes larval zebrafish behavior at the population level.
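The two-stage subdivision of exploratory bouts into e1-9 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: we assume equal-count (rank-based) groups, an absolute-difference measure for |Δtail-shape|, and a numbering in which e1-3 are the lowest-|Δtail-shape| bouts and e3, e6, e9 the highest-|Δheading| bouts. The helper names are ours.

```python
def abs_dtail_shape(tail_frames):
    """Sum of absolute frame-to-frame changes in tail shape (assumed L1 magnitude)."""
    total = 0.0
    for prev, curr in zip(tail_frames, tail_frames[1:]):
        total += sum(abs(c - p) for p, c in zip(prev, curr))
    return total

def rank_bins(values, n_bins):
    """Assign each value to one of n_bins equal-count bins by rank (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def label_exploratory(abs_dheading, abs_dtail):
    """Split by |dheading| into 3 groups, then each group by |dtail-shape| into 3."""
    heading_bin = rank_bins(abs_dheading, 3)
    labels = [None] * len(abs_dheading)
    for h in range(3):
        members = [i for i, b in enumerate(heading_bin) if b == h]
        tail_bin = rank_bins([abs_dtail[i] for i in members], 3)
        for i, t in zip(members, tail_bin):
            labels[i] = f"e{3 * t + h + 1}"  # assumed numbering convention
    return labels

# Toy values chosen only to exercise the code; they are not measurements.
dheading = [1, 2, 3, 10, 11, 12, 40, 41, 42]
dtail = [0.1, 0.5, 0.9, 0.2, 0.6, 1.0, 0.3, 0.7, 1.1]
print(label_exploratory(dheading, dtail))
# ['e1', 'e4', 'e7', 'e2', 'e5', 'e8', 'e3', 'e6', 'e9']
```

The second split is computed within each |Δheading| group rather than on the pooled data, so each of the 9 types receives roughly the same number of bouts even if the two measurements are correlated.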
By comparing bout-type abundances (2G) and interbout interval durations (2H) across fed and starved fish groups in the first 40 minutes of testing, we find that hunger influences larval zebrafish bout-type selection in previously unreported ways. Fed fish upregulate selection of low-energy exploratory bouts (e1-3) relative to starved fish, with selection of the lowest-displacement forward swim e1 increased by 107%. Starved fish are more likely to start hunts, especially through high-angle J-turns (j1 up 35%, j2 up 70%), and increase use of pursuits (up 71%), aborts (up 34%), and strikes (up 96%). Hunger state also affects interbout interval selection, especially following exploratory bouts, with fed fish selecting longer interbout intervals. This effect is most pronounced following low-energy exploratory bouts, with the mean duration of interbout intervals following bout-types e1-3 increased by 101 ms in fed fish relative to starved fish. Also, during interbout intervals preceding exploring bouts, fed fish maintain wider eye divergence than starved fish (2I). Taken together, these results indicate that satiation promotes multiple strategies that should increase protection against predation. Fed fish might decrease their visibility to predators by upregulating low-energy exploratory bouts and spending more time at rest (i.e. longer interbout intervals). Fed fish might also be better able to detect predators by maintaining wider eye divergence while exploring (low-energy exploratory bouts have highest eye divergence, see S7). By contrast, starved fish swim more frequently and cover more distance per bout while exploring. Starved fish should also increase their food intake by starting more hunts and increasing the likelihood that hunts end with strike.
In these experiments, zebrafish larvae alternate between modes of exploring and hunting, with each mode defined by distinct behavioral dynamics. This mode-switching is highlighted in an example bout sequence containing a successful hunt (2J, Supplementary Video 2). The trajectory of the fish throughout this hunt is reconstructed (2K) and plotted together with 999 other hunts that also end in strike (2L). We define a complete hunt as a bout sequence that begins with a J-turn, ends with an abort or strike, and is padded with only pursuits (for hunts longer than 2 bouts). The full dataset contains 7230 complete hunts (19.6% end in strike). To better quantify the behavioral sequences observed in this study, we next construct probabilistic models to predict the timing and type of swim bouts.
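The definition of a complete hunt given above translates directly into a scan over a labeled bout sequence. The Python sketch below assumes bouts have already been mapped to their bout-class names (for example, j1 and j2 both to "J-turn"); the function name and the toy sequence are ours and carry no data from the study.

```python
def find_complete_hunts(classes):
    """Return (start, end) index pairs for complete hunts in a bout-class sequence.

    A complete hunt begins with a J-turn, ends with an abort or strike,
    and contains only pursuits in between.
    """
    hunts = []
    start = None
    for i, label in enumerate(classes):
        if label == "J-turn":
            start = i          # a new J-turn (re)starts a candidate hunt
        elif label == "pursuit":
            pass               # pursuits may pad an open candidate
        elif label in ("abort", "strike"):
            if start is not None:
                hunts.append((start, i))
            start = None
        else:
            start = None       # any other bout breaks the candidate
    return hunts

# Toy sequence for illustration only.
sequence = ["exploratory", "J-turn", "pursuit", "pursuit", "strike",
            "exploratory", "J-turn", "exploratory", "J-turn", "abort",
            "pursuit", "abort"]
hunts = find_complete_hunts(sequence)
print(hunts)  # [(1, 4), (8, 9)]

# Fraction of complete hunts that end in strike, for the toy sequence.
strike_fraction = sum(sequence[end] == "strike" for _, end in hunts) / len(hunts)
print(strike_fraction)  # 0.5
```

In real data the scan would be run separately within each video segment so that a hunt is never stitched across a tracking gap.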
Constructing Probabilistic Models to Predict Interbout Intervals
We model the data as a marked renewal process45, 56, a stochastic process that generates a sequence of discrete events in time, each characterized by an associated “mark” (3A). Marked renewal processes are statistical models that specify the conditional distribution of the time and type of the next event in a sequence given the history of preceding events. First, we consider bout timing. Our key question for model construction is, “what features of the event history carry predictive information about the timing of the next event?” To this end, we choose five interpretable features to represent behavioral history across multiple timescales (3B). On the shortest timescale, we model the interbout interval (in) as a function of preceding bout-type (bn−1) and preceding interbout interval (in−1). On an intermediate timescale, we use hunt dwell-time (thunt) and explore dwell-time (texplore) features to encode how long the fish has been dwelling in either a hunting or exploring mode just prior to in (see Methods). On the longest timescale, we encode how long the fish has been in the tank prior to in (tank-time, ttank), which can capture how fish behavior changes with hunger state. By comparing models composed from different features, we can learn how past actions predict future behavior.
We found that starved fish select shorter interbout intervals than fed fish, but how else do patterns of bout timing differ across fish groups? To interpret how feeding state influences the functional relationship between behavioral history and interbout interval selection, we consider 2 forms for each predictive feature: pooled and split. In the pooled form, data from fed and starved fish are pooled together to fit one set of weights relating that predictive feature to interbout in. In the split form, a separate set of feature weights is fit for each fish group. We use a generalized linear model57 (GLM) with an exponential inverse link function to generate a probability distribution over in (3C, Technical Appendix). Briefly, the dot product of a basis function representation (S9) of the feature input with the corresponding feature weights is computed. This value is passed through an inverse link function to give the mean of a probability distribution over in. Since the interbout intervals are measured in units of frames elapsed, we considered 3 types of probability distributions over nonnegative counts: geometric, Poisson, and negative binomial (NB). We find the data to be too complex for modeling with geometric or Poisson distributions, which are parameterized by just their mean. Instead, we can fit the data better with the more flexible NB distribution, which is parameterized by its mean and variance. This requires an additional set of feature weights to estimate the variance of the in distribution.
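The following Python sketch illustrates the negative binomial GLM described above: a dot product of basis values with weights, an exponential inverse link for the mean, and a second weight set for the variance. The exact variance parameterization is specified in the Technical Appendix, which is not reproduced here, so the form used below (variance = mean + exp(·), which keeps the variance above the mean) is an assumption, as are all function names and the example numbers.

```python
import math

def dot(x, w):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, w))

def nb_logpmf(k, mean, var):
    """Log-probability of count k under a negative binomial with the given
    mean and variance (requires var > mean)."""
    r = mean * mean / (var - mean)   # shape parameter
    p = mean / var                   # success probability
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))

def glm_nb_params(basis, w_mean, w_excess):
    """Map basis-function values to an NB mean and variance.

    The mean uses the exponential inverse link; the excess-variance term
    is an assumed parameterization chosen so that var > mean always holds.
    """
    mean = math.exp(dot(basis, w_mean))
    var = mean + math.exp(dot(basis, w_excess))
    return mean, var

# Hypothetical basis values and weights, for illustration only.
mean, var = glm_nb_params([1.0, 0.5], [3.0, 0.2], [4.0, 0.1])
log_prob_30_frames = nb_logpmf(30, mean, var)  # an interval of 30 frames (0.5 s at 60 fps)
```

Setting the variance equal to the mean would recover the Poisson limit, which is why the NB family can only widen the predicted distribution relative to a Poisson model with the same mean.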
In 3D-F, we visualize results from fitting pooled and split forms of models composed from each intrinsic feature listed in 3B. For each model, we select an appropriate prior variance on the weights through an empirical Bayes58 hyperparameter selection method (Technical Appendix), and for features that take a scalar value as input (those in 3F), we also choose the number of basis functions. We compute the marginal log likelihood (MLL) of each model to select hyper-parameters and to choose a preferred feature form (pooled or split, indicated with gold star). In modeling the interbout interval as a function of preceding bout-type, a separate in distribution is produced for each of the 18 possible values for bn−1 (with 2 example values shown, 3D). The NB distribution fits observed data better than geometric or Poisson, so we use the NB throughout the rest of the paper. With a split form, the preceding bout-type feature captures subtle bout-type specific differences in interbout interval durations across fed and starved fish groups (3E). For each remaining feature, we report the predictive mean of the NB distribution for fed and starved groups across a range of possible input values (3F). We include an additional set of split-bias weights in each model in 3F to capture the overall mean and variance of interbout intervals for each fish group (2 extra free parameters per fish group). This design choice allows models composed from pooled features (3F, top row) to capture shifts in mean and variance of interbout intervals across fish groups without using the more complex split features (3F, bottom row).
After fitting the interbout models, we next visualize their predictive output to see how behavioral history influences future interbout interval durations selected by the fish. On a short timescale, we find consecutive interbout interval durations are autocorrelated (3F, column 1). On an intermediate timescale, interbout intervals get shorter as hunt sequences get longer (3F, column 2). By contrast, as exploring sequences get longer, interbout intervals also get longer (3F, column 3). After accounting for the shift in mean, the relationship between these 3 features and interval in is similar across fish groups. By contrast, fed and starved fish display opposing patterns of interbout interval selection on the longest timescale (3F, column 4). Starved fish initially hunt more, producing shorter interbout intervals. Fed fish initially explore more, producing longer interbout intervals. As their hunger states equilibrate, interbout intervals selected by fed and starved fish become more similar. These opposing behavioral patterns require the split form of the ttank feature to be modeled appropriately.
Constructing Probabilistic Models to Predict Bout-Types
The second component of the marked renewal process is a model of how the next bout-type is selected depending on behavioral history, including the interval immediately preceding it. We use the previously introduced intrinsic features (from Figure 3) and include also 4 extrinsic features (4A) to model how locations (vloc), sizes (vsize), and relative velocities (vx, vy) of objects in the fish’s local environment relate to bout-type selection. We use locations of putative prey objects to construct an 868-dimensional image (vloc) encoding their final positions prior to initiation of the next swim bout. We modify vloc to produce the other extrinsic features by scaling the intensity of each represented object (S9), with an example vsize input shown (4A). Similar to the interbout models, we take the dot product of a basis function representation of a feature input with its corresponding weights to produce a vector of bout-type “activations”, ψ (4B). This vector ψ is passed through a softmax function to generate a valid probability distribution, π, over all 36 possible bout-types (Technical Appendix). As before, we perform hyper-parameter selection for all features (S10) and select a preferred form (pooled or split). We also include a set of split-bias weights to account for differences in baseline bout-type abundances across fed and starved fish groups (36 extra free parameters per fish group). This again allows simpler pooled models to account for differences in bout-type abundances across groups, and we choose pooled forms for all bout-predicting features except ttank.
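The mapping from activations ψ to bout-type probabilities π described above can be sketched in a few lines of Python. The weight layout (one weight vector per bout-type) and the function names are assumptions for illustration; the softmax itself is the standard definition.

```python
import math

N_BOUT_TYPES = 36  # 18 bout-types, each in a leftward and rightward version

def softmax(psi):
    """Convert activations into probabilities that sum to 1.

    Subtracting the maximum first avoids overflow without changing the result.
    """
    m = max(psi)
    exps = [math.exp(a - m) for a in psi]
    z = sum(exps)
    return [e / z for e in exps]

def bout_type_probs(basis, weights):
    """weights[k] is the (assumed) weight vector for bout-type k."""
    psi = [sum(b * w for b, w in zip(basis, w_k)) for w_k in weights]
    return softmax(psi)

# With all-zero weights every bout-type is equally likely.
uniform = bout_type_probs([1.0, 0.3], [[0.0, 0.0]] * N_BOUT_TYPES)
print(len(uniform), round(sum(uniform), 12))  # 36 1.0
```

Because the softmax is invariant to adding a constant to every activation, only differences between activations matter, which is why the activation plots are best read by comparing bout-types with one another rather than by absolute value.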
The tendency of larval zebrafish to switch between modes of exploring and hunting can be seen in the block structure of bout transition probabilities captured by the preceding bout-type feature (4C-D). While exploring, larvae are likely to link consecutive exploratory bouts of similar energy (note increased transition probability along diagonal in 4C). This bout transition pattern may permit larvae to maintain speed while exploring, especially in combination with autocorrelated interbout intervals. It is known that larval zebrafish can adjust their swimming speed to match optic flow stimuli59, and it seems they employ speed control mechanisms during natural exploratory behavior as well. Larvae enter hunting mode with a transition to J-turn, after which they are likely to emit some pursuit bouts before an abort or strike. Larval zebrafish behavior is symmetric at the population level, so we construct symmetric features to simplify and improve models. We exploit symmetry in the preceding bout-type feature by considering ipsilateral (4C) and contralateral (4D) bout transition probabilities. It has been shown that zebrafish larvae tend to link bouts in the same direction while exploring a featureless environment, and that neural circuits in the anterior rhombencephalic turning region (ARTR) mediate switching between leftward and rightward states40. We too see that larvae tend to link exploring bouts through ipsilateral bout transitions (4C) rather than contralateral transitions (4D). This pattern extends also to hunting, with nearly every transition being more likely to occur ipsilaterally, except for transitions into abort (4D, arrow), during which fish are likely to switch left-right state. 
While bout transition probabilities are different across fed and starved fish, the transition structure is similar enough that the conservative model selection approach favors the simpler pooled-bn−1 form (with 648 feature weights) over the split-bn−1 form (with 1296 feature weights). For comparison, see the split-bn−1 model (S11).
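To illustrate the left-right symmetric treatment of the preceding bout-type feature, the Python sketch below estimates empirical transition probabilities in which each next bout is coded as ipsilateral or contralateral to the preceding one. This is a counting illustration, not the fitted GLM; the function name and toy sequence are ours. One reading consistent with the 648 pooled weights is 18 preceding bout-types × 36 outcomes (18 next bout-types × ipsilateral or contralateral), but that correspondence is our assumption.

```python
from collections import Counter, defaultdict

def symmetric_transition_probs(bouts):
    """Estimate P(next type, ipsi/contra | previous type).

    bouts is a list of (bout_type, side) pairs with side "L" or "R",
    taken from a single continuous video segment.
    """
    counts = defaultdict(Counter)
    for (prev_type, prev_side), (next_type, next_side) in zip(bouts, bouts[1:]):
        relation = "ipsi" if next_side == prev_side else "contra"
        counts[prev_type][(next_type, relation)] += 1
    probs = {}
    for prev_type, c in counts.items():
        total = sum(c.values())
        probs[prev_type] = {outcome: n / total for outcome, n in c.items()}
    return probs

# Toy sequence for illustration only.
bouts = [("e1", "L"), ("e1", "L"), ("e2", "R"),
         ("e1", "R"), ("e1", "R"), ("j1", "L")]
probs = symmetric_transition_probs(bouts)
print(probs["e1"][("e1", "ipsi")])    # 0.5
print(probs["e1"][("j1", "contra")])  # 0.25
```

Pooling leftward and rightward versions of each preceding bout in this way halves the number of conditioning states, so each probability is estimated from roughly twice as many transitions.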
Prey object locations also influence bout-type selection, especially during hunts, and pooled-vloc weights associated with each hunting bout-type are shown (4E). Since vloc inputs are high-dimensional, we compress them by summing vloc pixels within spatial bins (as shown). Objects located in red spatial bins increase the probability that the corresponding bout-type will be selected next by the fish. Larvae typically select a J-turn to orient toward a laterally located object (~15-60° relative to heading), and the magnitude of heading angle change depends on prey location30 (compare j1, j2). Energetic pursuits (p3-4) are more likely when prey are further away, while larger |Δheading| pursuits (p2, p4) are more likely when prey are located more laterally. Prey locations also influence how hunts will end, with strike becoming most likely with an object located almost directly in front of the mouth. By contrast, aborts are only weakly related to prey location and may be selected by the fish to terminate unsuccessful hunts. The vloc weights associated with exploring bouts are more spatially uniform with near-zero or negative weights (not shown), increasing the likelihood of exploring bout selection when prey are absent. Taken together, prey locations relate to larval zebrafish bout-type selection in expected ways, and these basic relationships are captured by the pooled-vloc feature.
The generative process underlying bout-type selection depends nonlinearly on the preceding interbout interval, and we capture this dependency with the pooled-in model (4F). The activations (ψ) of 5 separate bout-types across the range of in input values are shown (4F, left panel), with larger activations indicating higher bout-type probability. Full bout-type probability distributions (π) evaluated at 3 example values of in are also shown (4F, panels i-iii), with probabilities across fed and starved fish averaged for display. At very short preceding interbout interval values (in < 0.25 sec), activations of p3, strike, and abort have different dynamics, indicating how timing is intricately involved in bout-type generation during hunts. As in extends from 0 to 0.25 seconds, strikes become less likely, aborts become more likely, and p3 probability peaks near 150 ms before decreasing again. As in reaches 0.5 seconds (4F, panel ii), exploring bouts become more probable. As in reaches 1 second (4F, panel iii), low-energy e1-3 and high-|Δheading| e3, e6, e9 bouts become most likely.
Longer timescale bout-type dependencies are captured with thunt and texplore features (4G-H). As hunts extend, aborts become less likely and strikes become more likely (4G, left panel). Pursuits are always likely when larvae are in hunting mode (i.e. thunt is non-zero), but the probability mass shifts toward the short straight p1 bout as hunts get longer (4G, panel iii) and larvae have approached a target. In these experiments, larvae spend most of their time exploring, and bout-type probabilities shift slightly as larvae spend more time in exploring mode. As explore dwell-time increases, larvae become more likely to select the slow straight e1 bout (4H) and also become more locked into the exploring behavioral state. The probability of emitting a J-turn decreases ~46% as texplore increases from 5 bouts (4H, panel i) to 40 bouts (4H, panel iii). This effect may be partially explained by decreased food density as fish navigate toward the arena edge. On the longest timescale, the ttank feature captures slow fluctuations in bout-type probabilities over the course of observation. As with the models of bout timing, the ttank feature must be split to capture opposing behavioral trends of fed and starved fish on this timescale. Separate bout-type probability distributions for fed and starved fish are shown (4I, panels i-iii) and bout-type selection becomes similar across fish groups as their hunger states converge at ttank = 40 minutes (4I, panel iii).
Comparing and Combining Behavioral Models
Having constructed several single-feature models of larval zebrafish behavior, we next aim to compare their quality. To this end, we compute the marginal log likelihood (MLL) of each model and report improvement over the simplest baseline model (pooled-bias). For interbout models (5A), the baseline NB pooled-bias model has 2 parameters (1 for mean, 1 for variance). For model comparison, we show MLL for pooled and split forms of every model introduced in Figure 3. We include also bn−2 as well as in−2, in−3, in−4, and in−5 to see how predictive information decays over time. We find that preceding bout-type is best able to predict in, but that preceding interbout interval is a close second, and that more distant behavioral history also provides useful information to model the interbout interval. In general, MLL is similar across pooled and split forms for each feature, except for ttank. We interpret this to suggest that any mild gains in predictive performance through use of split features may be offset by their increased complexity and decreased number of training examples per split weight. We therefore select the pooled form for all features except those indicated with an asterisk.
While preceding bout-type is the best predictor of in, can we build stronger models by combining features? More generally, to what extent do the representations of behavioral history encode non-redundant information about interbout in? We approach these questions by combining all pairs of features (in their selected forms) to produce 45 paired interbout GLMs (5B). For each paired model, we compute MLL (as in 5A) and report improvement over the stronger of its 2 feature components. Paired models that show large improvement should combine features that provide some unique information about interbout in to improve model accuracy. We find that preceding bout-type and preceding interbout interval combine to produce the strongest paired interbout model (white circle, 5B). This paired [bn−1, in−1] model improves over bn−1 alone by 0.05 nats per interbout, or 29% relative to baseline. By contrast, the paired [bn−1, texplore] model (white square, 5B) improves over bn−1 by just 0.002 nats per interbout, or 1.4%. Since the models can improve by combining features, we combine all features from 5A to construct a combo interbout model (5C). We add features one at a time via greedy stepwise selection, adding the feature that increases MLL most at each step. Model quality improves and eventually saturates during combo model construction (note plateau in MLL).
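The greedy stepwise procedure can be illustrated with a minimal sketch. It assumes a scoring function that returns a higher-is-better number for any feature subset; here a toy function with made-up gains stands in for fitting a GLM and computing its MLL. The feature names and gain values are hypothetical and are not results from this study.

```python
def greedy_forward_selection(features, score):
    """Add one feature per step, always the one that raises `score` most.

    `score` maps a tuple of feature names to a number (higher is better).
    In a real analysis it would fit a model and return its MLL.
    Returns the selection order and the score after each step.
    """
    selected, history = [], []
    remaining = list(features)
    while remaining:
        best = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(best)
        remaining.remove(best)
        history.append(score(tuple(selected)))
    return selected, history


# Toy stand-in for model fitting (hypothetical numbers): each feature has
# a made-up gain, and gains are discounted as the model grows, which
# mimics a plateau in model quality.
TOY_GAIN = {"b_prev": 0.17, "i_prev": 0.12, "t_hunt": 0.05, "t_tank": 0.01}


def toy_score(subset):
    gains = sorted((TOY_GAIN[f] for f in subset), reverse=True)
    return sum(g / (rank + 1) for rank, g in enumerate(gains))


order, history = greedy_forward_selection(TOY_GAIN, toy_score)
```

With the toy gains, the score rises quickly over the first steps and then flattens, which is the qualitative shape expected when later features carry mostly redundant information.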
We next repeat this procedure for the bout-type models (5D-F). We also include results from GLMs composed from extra extrinsic features listed in 4A (vsize, vx, and vy). We find that preceding bout-type is by far the best predictor of bout-type bn, followed by hunt dwell-time and explore dwell-time, and then preceding interbout interval. As constructed, the GLMs fail to fully capture the complex relationship between the state of the fish’s local environment and its bout-type selection. This problem is challenging for several reasons. First, identified environmental objects include both prey (e.g. paramecia, rotifers) and non-prey (e.g. dust, algae), which differentially influence larval zebrafish behavior. Second, environmental objects are abundant in these experiments (mean # of identified objects per bout = 12). This complicates visual scenes encountered by the fish and also our representations of environmental state. Third, the locations, sizes, shapes, and motion patterns of nearby objects are likely to interact in complex ways to influence larval zebrafish action selection. To improve our understanding of how sensory input relates to bout-type selection, we construct feed-forward neural networks that take the extrinsic features as input (S12) and combine them nonlinearly to form a prediction. We find this neural network model improves substantially over the vloc GLM (see S12), with predictions of all bout-types improving on held-out data, especially hunting bouts. However, bout-type bn is still far better predicted by preceding bout-type. This result indicates that more sophisticated modeling approaches may be necessary to more effectively predict future larval zebrafish behavior from naturalistic environmental data, but also that bout-type selection depends strongly on the animal’s very short-term behavioral history1.
The strongest paired bout model (white circle, 5E) again combines preceding bout-type with preceding interbout interval, even though preceding interbout interval is just the 4th strongest individual feature. This paired [bn−1, in] model improves over bn−1 alone by 0.09 nats per bout, or 14%. By contrast, the paired [bn−1, texplore] model (white square, 5E) improves over bn−1 by just 0.005 nats per bout, or 0.8%. This result again indicates that distinct features of the fish’s short-term behavioral history (i.e. preceding bout-type and preceding interbout interval) encode non-redundant information about its future action selection. As before, we construct a combo bout model by iteratively adding features (5F). The strongest 3-component bout model ([bn−1, in, vloc]) adds information about prey locations to the strongest paired bout model, combining sources of internal and external information over a timescale of a second or less. Model quality again saturates during construction of the combo bout model.
We next take a closer look at the quality of the combo interbout and bout models. To confirm the NB distribution was a proper choice, we compare the NB combo interbout model (from 5C) to similar combo interbout models constructed instead with either Poisson or geometric distributions (5G). While it requires more free parameters, the NB model clearly outperforms the others on held-out data. For the bout models, we have so far considered overall quality by assessing how well bout-types can be predicted in general. Here we show predictive performance of the combo bout model for each individual bout-type (5H), as measured by the F1 score60 of each one-vs-rest classifier. We compare this performance to that of the baseline pooled-bias bout model (5H, white lines). This shows us which bout-types are easiest to predict (e.g. pursuits, aborts, strikes, e0), and which are most challenging (e.g. e4-6, j2). In addition, we reproduce this analysis for each of the single feature bout models described in Figure 4, and also for the strongest paired and 3-component bout models (S13). We find the combo bout model distributes probability mass over similar bout-types (5I), which should be expected if the generative processes involved in the production of similar bout-types are also similar.
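For readers unfamiliar with the metric, a one-vs-rest F1 score can be computed per bout-type as in the sketch below. It assumes each bout has a single true label and a single predicted label; how predicted labels are derived from model probabilities is not specified here, and the example labels are illustrative only.

```python
def f1_per_class(y_true, y_pred):
    """One-vs-rest F1 for every label seen in y_true or y_pred.

    F1 = 2*TP / (2*TP + FP + FN); a class with no true or predicted
    instances gets 0.0 by convention in this sketch.
    """
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Illustrative labels only (not data from the study).
true_labels = ["p1", "p1", "e1", "e1", "strike"]
pred_labels = ["p1", "e1", "e1", "e1", "strike"]
scores = f1_per_class(true_labels, pred_labels)
```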
Simulating Behavioral Trajectories of Fed and Starved Fish
A strong test of a behavioral model is to evaluate its ability to generate realistic behavior in novel contexts. To that end, we alternate sampling from the combo bout and interbout models (here called a combo renewal process) to move a virtual fish through an artificial environment with abundant prey (6A, Supplementary Videos 3-4). We simulated behavioral trajectories of 50 fed and 50 starved fish and found that simulations capture multiple timescales of structure in larval zebrafish behavior. Prey object locations influence hunting bout selection in expected ways (6B), and fed fish select longer interbout intervals (6C), as expected. The effects of hunger can be seen by comparing bout transition probabilities of fed and starved fish in simulations (6D). For example, fed fish are more likely to transition to low-energy exploring bouts, while starved fish are more likely to transition to high-energy exploring bouts. Starved fish also adjust their behavior in several ways to increase food intake. They are more likely to transition to J-turns (especially j2) and to extend hunt sequences by linking pursuits55. Starved fish are also less likely to transition to abort and more likely to transition to strike than fed fish.
The combo renewal process simulations also capture longer timescale behavioral dependencies observed in our experiments. With real fish, we find that hunts ending in strike are much longer than those ending in abort (6E, left panel). This trend is absent in simulations generated from a first-order Markov renewal process (6E, middle panel) in which interbout intervals and bout-types depend on just preceding bout-type (bn−1). By contrast, the combo renewal process simulations better recover this higher-order hunting structure (6E, right panel). We also generate more realistic exploring behavior with the combo renewal process. Upon entering exploring mode, larvae are initially likely to exit, but get more locked into this mode with time (see 4H). Accordingly, we see an overabundance of short and also long exploring sequences in combo renewal process simulations compared to simpler simulations (6F-G). Finally, the simulations capture slowly changing differences in bout-type selection probabilities across fed and starved fish as their hunger states equilibrate over 40 minutes (6H).
Discussion
Behavior is the principal output of the nervous system and, like the nervous system, it is complex and high-dimensional29, 61. To make studying the brain more manageable, animal behavior is often constrained during neurobehavioral experiments. This reductionist approach has many benefits: it simplifies behavioral description, reduces experimental variability, and improves interpretability of neural data. However, an important frontier in neuroscience is to better understand how brains function in natural conditions62. The NIH BRAIN Initiative63 has identified study of the “Brain In Action”64 as a priority research area, stating that “[a] critical step ahead is to study more complex behavioral tasks and to use more sophisticated methods of quantifying behavioral, environmental, and internal state influences on individuals.”65 Continued development of such tools should allow for improved quantification of ethologically relevant behaviors occurring in naturalistic environments. Importantly, these tools should capture the dynamics of minimal behavioral elements, scale to big datasets, and be compatible with modern techniques to record neural population activity. Here, we describe such an approach for the study of larval zebrafish behavior.
In this study, we first acquired high resolution behavioral and environmental data of larval zebrafish exploring and hunting in an unrestrictive naturalistic environment. We then used dimensionality reduction and density-based clustering to categorize individual swim bouts into discrete types. Finally, we constructed a probabilistic generative model (specifically, a marked renewal process) that predicts future larval zebrafish behavior by combining information about hunger state, multiple timescales of behavioral history, and the positions and properties of potential prey in its local environment. This approach offers a means to summarize larval zebrafish behavioral patterns, to validate and compare models composed from separate predictive features, and to simulate realistic behavioral trajectories through virtual environments. By constructing a behavioral model of how state influences action, we can make predictions about what types of signals must be present in the neural system driving this behavior.
Our study generates testable hypotheses about how neural mechanisms might give rise to observed behavioral patterns on multiple timescales. On a short timescale, we see that larvae are likely to link consecutive exploring bouts through ipsilateral transitions, but also tend to begin hunts ipsilaterally. It has been shown that a reciprocally connected neural circuit in the anterior hindbrain (the ARTR) alternates between “leftward” and “rightward” states to mediate temporal correlations in turn direction during exploration40. However, how the ARTR or related circuits may bias reactions to prey stimuli is unknown. We predict the ARTR or related networks may asymmetrically modulate premotor systems (e.g. reticulospinal neurons66, hunting command neurons67, or premotor tectal assemblies68) such that, with the ARTR in a leftward state, leftward J-turns are generated preferentially to rightward J-turns. Alternatively, ARTR-state-dependent modulation could also occur further upstream in the sensorimotor hierarchy, for example, through asymmetric modulation of left- and right-hemisphere retinorecipient areas that process prey stimuli (e.g. optic tectum, AF7)68, 69. In this case, identical prey stimuli presented to the left or right eye may not be equally salient. Instead, output from the left and right retinas might be processed asymmetrically to promote ipsilateral transitions from exploring to hunting.
Following hunt initiation, several behavioral patterns interact to influence hunt outcome. As hunts extend, larvae select shorter interbout intervals, pursuits become finer, and abort probability decreases while strike probability increases. By contrast, as the time since the previous bout increases, abort probability increases, pursuit probability rises and falls, and strike probability decreases. We posit these time-sensitive hunting patterns depend on reciprocal connectivity between the nucleus isthmi (NI) and the tectum and pretectum55. It has been shown that the NI becomes active following hunt initiation and that NI ablation leads to specific deficits in hunt sequence maintenance, thereby increasing abort probability. Henriques et al.55 propose a model for NI-mediated feedback facilitation of (pre)tectal prey responses to facilitate extension of hunt sequences. This model can help explain why abort probability decreases and strike probability increases as hunts extend, though how bout-type selection during hunts depends so intricately on preceding interbout interval duration is not well understood. It is possible that tectal prey representations attenuate as the interbout interval extends beyond a few hundred milliseconds, potentially through phasic NI feedback that ramps and then decreases as each interbout extends. Alternatively, perhaps premotor populations49,67,68 involved in pursuit and strike generation become increasingly inhibited as interbouts get longer, increasing hunt termination probability. Since larvae tend to abort hunts through a contralateral bout transition, we suspect abort generation may frequently coincide with a change in ARTR state. Such a mechanism might facilitate switches in spatial attention from one hemifield to the other, thereby inhibiting return70 to a previously pursued object.
In our study, we observe many behavioral differences across fed and starved fish that equilibrate over tens of minutes. With respect to hunting, we see that starved fish are more likely to initiate and extend hunts. While engaged in a hunt sequence, starved fish also upregulate transitions to strike and downregulate transitions to abort relative to fed fish. It has been shown that food deprivation modulates larval zebrafish tectal processing of prey-like and predator-like visual stimuli such that food-deprived larvae are more likely to perceive and approach small moving visual objects50. Specifically, hunger induces recruitment of additional prey-responsive tectal neurons and neuroendocrine and serotonergic signaling mediates this effect50. We posit this mechanism may also increase tectal input to NI, thereby increasing NI-mediated feedback to facilitate hunt sequence extension and increased strike probability in starved fish. While not yet tested, direct hunger-state modulation of tectal-projecting NI neurons is also plausible. Other studies show that lateral hypothalamic neuron activity correlates positively with feeding rate71 and that lateral hypothalamic neurons respond to both sensory and consummatory food cues72. This brain region is likely critically involved in sustaining increased hunting over tens of minutes through modulating visual responses to prey and/or facilitating hunting (pre)motor circuits.
While the above mechanisms can explain why fed fish initiate fewer hunts in these experiments, we hypothesize satiety signals affect several other circuits to induce additional changes in exploring behavior. Fed fish select longer interbout intervals, lower-energy exploring bouts, and maintain wider eye divergence preceding all exploring bout-types. To coordinate these behavioral patterns, satiety cues may separately modulate midbrain nuclei involved in regulating bout timing73, nMLF neurons involved in regulating swim bout duration and tail-beat frequency59, and oculomotor centers involved in controlling eye position74,75,76. It is clear that feeding state coordinates a complex array of behavioral modifications, likely through modulation of many circuits distributed across the larval zebrafish brain.
There are many avenues to extend this work. In future studies of naturalistic larval zebrafish behavior, moving camera systems could employ faster camera frame rates, shorter exposure durations, and better tracking algorithms to improve raw behavioral data. These modifications will allow for higher resolution pose estimation (e.g. by including pectoral fin dynamics, pitch and roll estimates, and tail half-beat analysis14), facilitate more comprehensive bout-type classification, and yield significantly longer continuous behavioral sequences. Richer datasets will enable future models to extract nuanced environmental dependencies, like prolonged attention to single prey amongst many distractors during long hunt sequences (Bolton et al., manuscript in preparation). Such models might also combine behavioral history information with raw environmental video77 to sharpen behavioral predictions. Future models may simultaneously infer discrete behavioral states and their dynamics8,15,78,79, though the non-Markovian dependencies on past behavior present new challenges80. Likewise, there are many other internal state variables that could govern action selection, and future models could seek to infer these latent states80 rather than using proxy covariates like tank-time. Behavioral states and dynamics differ slightly from one individual larva to the next, and long-term behavioral recordings combined with hierarchical models81, 82 will allow us to study how these behavioral differences emerge and change throughout early development. In addition, the contribution of particular neural populations in generating naturalistic behavioral patterns may be probed by combining these behavioral models with experiments to activate, inhibit, or ablate specific neural cell-types in observed fish. Finally, our approach to modeling naturalistic larval zebrafish behavior may be combined with new technologies to record large neural populations in freely swimming fish83,84,85,86,87. 
Combining behavioral models with such recordings will present opportunities to construct joint models of neural activity and natural behavior, providing important tools to study the brain in action88, 89.
Methods and Materials
Animal Care
All fish were 7-8 days post-fertilization (dpf) Nacre−/− zebrafish raised at 27 °C. Fish were given abundant live paramecia as food beginning at 5 dpf. On test day, fish from the fed group remained in their Petri dish with abundant paramecia while fish from the starved group were placed in clean water for 2-5 hours prior to testing. Testing was performed between 10 AM and 6 PM with 4-6 fish usually tested per day.
BEAST Design
The gantry was acquired from CNC Router Parts (CRP4848 4ft × 4ft CNC Router Kit; www.cncrouterparts.com) and was modified to run upside-down on top of a support structure constructed from aluminum T-slotted framing available through 80/20 Inc (www.8020.net). Three electric brushless servo motors (CPM-MCPV-3432P-ELN ClearPath Integrated Servo Motors) and 3 Amp DC Power Supply (E3PS12-75) were acquired from Teknic (www.teknic.com). The camera (EoSens 3CL) was acquired from Mikrotron (www.mikrotron.de) with a frame-grabber from National Instruments (NI PCIe-1433; www.ni.com). The camera lens was acquired from Nikon (AF-S VR Micro-Nikkor 105mm f/2.8G IF-ED; www.nikonusa.com). A long-pass infrared filter was placed over the lens (62 mm Hoya R72; www.hoyafilter.com) to block light from the projector and collect transmitted light from an array of 16 IR-LED security dome lights (850 nm Wide Angle Dome Illuminators) positioned on the air table below the fish tank. The projector was acquired from Optoma (Optoma GT1080; www.optoma.com) and mounted on the side of the air table to project onto a diffusive screen (Rosco Cinegel 3026 Tough White Diffusion (216); www.stagelightingstore.com) embedded in the bottom of the plexiglass tank.
Data Acquisition
The walls of the observation arena were assembled with light gray LEGO blocks to confine the fish to a water volume of 300 × 300 × 4 mm. Approximately 15 ml of water containing a high density of live paramecia were added near the center of the arena prior to testing each fish. This paramecia stock also contained some rotifers and algae particles. For testing, single fish were transferred to the arena where inwardly drifting concentric gratings were projected to bring the fish to the arena center. Zebrafish larvae tend to turn and swim in the direction of perceived whole-field motion, a reflexive behavior called the optomotor response, and we leverage this response to relocate the fish. Once the overhead camera detected the arrival of the fish, the first observational trial was initiated and the drifting gratings were replaced by a static color image of small pebbles, a natural image with reasonably high spatial contrast. Next, the camera moved automatically on the gantry to maintain position above the fish and capture video with a frame-rate of 60Hz and 2 millisecond exposure duration per frame. The fish was tracked for 3 minutes or until it reached the edge of the arena or tracking failed. The fish could evade the tracking camera with a high-speed escape maneuver, but these events were fairly rare. At the end of each trial, the camera returned to the arena center, video data was transferred from memory to hard disk, and concentric gratings were once again used to bring the fish back to the arena center to initiate another trial (up to 18 trials per fish). If the fish did not return to the center within 10-15 minutes, the experiment was terminated. The tracking algorithm was written to keep the darkest pixel in the image (usually contained within one of the eyes of the fish) within a small bounding box located at the image center. 
If the darkest pixel was located outside this bounding box, a command was sent to the motors to reposition the camera to the location of that darkest pixel. In this way, the camera moved smoothly from point to point to follow the fish, using the “Pulse Burst Positioning Mode” setting for the ClearPath motors. We ran the ClearPath motors with Teknic’s jerk-limiting RAS technology engaged to generate smooth motion trajectories and minimize vibration during point-to-point movement. Experiments were run using PsychToolBox in Matlab.
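The tracking rule described above can be sketched as follows. This is an illustrative reconstruction, not the acquisition code: the image is assumed to be a 2-D list of intensities, the bounding box is assumed to be a square of hypothetical half-width `box_half_width` pixels around the image center, and the returned offset is in pixels rather than motor units.

```python
def darkest_pixel(image):
    """Return (row, col) of the minimum-intensity pixel in a 2-D list."""
    best = None
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if best is None or value < best[0]:
                best = (value, r, c)
    return best[1], best[2]


def camera_correction(image, box_half_width):
    """Pixel offset the camera should move by, or (0, 0) if none is needed.

    No move is issued while the darkest pixel stays inside a square
    bounding box centered on the image (assumed box shape).
    """
    center_r, center_c = len(image) // 2, len(image[0]) // 2
    r, c = darkest_pixel(image)
    dr, dc = r - center_r, c - center_c
    if abs(dr) <= box_half_width and abs(dc) <= box_half_width:
        return (0, 0)
    return (dr, dc)
```

Converting the pixel offset into a motor command would additionally require the image scale (mm per pixel) and the motor step size, neither of which is modeled here.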
Image Registration and Fish Pose Estimation
In every image frame, connected component pixel regions corresponding to the left eye, right eye, and swim bladder were identified. The fish head center was defined as the average position of the centers of these 3 regions of interest. Heading direction was defined as the direction of the vector from the swim bladder center to the midpoint between the two eye centers. This information was used to translate and rotate each image for subsequent pose estimation and environment analysis. One common issue with pose estimation was caused by body roll of the fish, usually during an attempt to strike at a prey object, in which the fish would roll (rotate around its rostro-caudal axis) enough for one eye to occlude the other from the view of the overhead camera. Rather than estimate eye vergence angles in these situations, these image frames were excluded. Another common issue was caused by high-speed maneuvers by the fish during which a 2 millisecond exposure was insufficient to capture a suitably sharp image, thus causing either image registration or pose estimation algorithms to be compromised. We included only image frames in which all postural features could be accurately extracted for further analysis. Video segments containing problematic frames were split into separate video segments.
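The head center and heading definitions can be written out as a short sketch. It assumes the three region centers are already available as (x, y) pairs and uses a standard mathematical convention (counter-clockwise angles, y pointing up); image coordinates with y pointing down would flip the sign of the angle.

```python
import math


def head_center(left_eye, right_eye, swim_bladder):
    """Average position of the three region centers."""
    pts = (left_eye, right_eye, swim_bladder)
    return (sum(p[0] for p in pts) / 3.0, sum(p[1] for p in pts) / 3.0)


def heading_deg(left_eye, right_eye, swim_bladder):
    """Direction (degrees) from the swim bladder to the eye midpoint."""
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    mid_y = (left_eye[1] + right_eye[1]) / 2.0
    return math.degrees(math.atan2(mid_y - swim_bladder[1],
                                   mid_x - swim_bladder[0]))
```

For example, eyes at (1, 1) and (1, -1) with the swim bladder at (-2, 0) give a head center at the origin and a heading along the positive x axis.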
Temporal Segmentation of Bout and Interbout Epochs
Bout and interbout epochs were identified by taking the absolute value of the frame to frame difference in heading angle and thresholding this time-series at 0.7 degrees. This binary signal was then dilated (radius = 2 elements) and eroded (radius = 1 element) with built-in Matlab functions (imdilate, imerode) to merge bout fragments and expand bout epochs to include one extra frame at the beginning and end of each bout. These operations set the minimal duration of both bout and interbout epochs to 3 frames (50 ms).
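A minimal sketch of this segmentation on a 1-D heading trace is given below. It re-implements binary dilation and erosion with flat windows in plain Python rather than calling Matlab's imdilate and imerode, and it assumes the border handling matches theirs (out-of-range samples never trigger dilation and never block erosion). Heading wrap-around at ±180 degrees is handled explicitly, which is an added assumption.

```python
def heading_changes(heading_deg):
    """Absolute frame-to-frame heading change in degrees, wrap-safe."""
    out = []
    for a, b in zip(heading_deg, heading_deg[1:]):
        d = (b - a + 180.0) % 360.0 - 180.0
        out.append(abs(d))
    return out


def dilate(mask, radius):
    """True wherever any sample within `radius` is True."""
    n = len(mask)
    return [any(mask[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def erode(mask, radius):
    """True only where every in-range sample within `radius` is True."""
    n = len(mask)
    return [all(mask[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def segment_bouts(heading_deg, threshold_deg=0.7):
    """Boolean mask over frame transitions: True = bout, False = interbout."""
    moving = [d > threshold_deg for d in heading_changes(heading_deg)]
    return erode(dilate(moving, 2), 1)
```

A single supra-threshold transition grows to 5 samples under dilation and shrinks to 3 under erosion, which reproduces the 3-frame minimum bout duration stated above.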
Bout Summary Statistics
Δheading per bout: The change in heading angle per bout was calculated by averaging the heading direction of the fish over all frames in the preceding interbout epoch and subtracting this value from the average heading direction through all frames in the following interbout epoch. Positive values are assigned to leftward bouts.
|Δheading| per bout: The absolute value of Δheading per bout.
distance traveled per bout: The change in head position per bout was calculated by averaging the position of the fish head in the arena over all frames in the preceding interbout epoch (starting position) and also for the frames in the following interbout epoch (ending position). The distance between these two points is the distance traveled per bout.
|Δtail-shape| per bout: This non-negative 1-dimensional quantity summarizes how much the tail changes shape during the 10 frames used to represent each swim bout (as in 2B). To compute this quantity, let the 10-frame tail angle measurements be placed in an array T with shape 20 × 10. First, the magnitudes of frame to frame changes in each tail segment angle are computed to give a new array with shape 20 × 9. |Δtail-shape| is the sum of the absolute values of these 180 array elements. This value is divided by 180 to give units: radians per segment per frame. In Matlab syntax, this can be computed as: sum(sum(abs(diff(T, 1, 2)))) / 180.
Several additional summary statistics are used to describe the fish eyes during each bout. These metrics are computed from the 10-frame representations of each bout and are described in S7.
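The Δheading and |Δtail-shape| statistics defined in this section can be sketched in a few lines. Two assumptions are made that the text does not state: heading angles are averaged with a circular mean so that values near ±180 degrees do not cancel, and the tail array is passed as 20 rows (segments) of 10 frame values each, in radians.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of a list of angles, in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def delta_heading(pre_deg, post_deg):
    """Mean heading after the bout minus mean heading before it,
    wrapped to [-180, 180). Positive means leftward if heading angles
    increase counter-clockwise (assumed convention)."""
    d = circular_mean_deg(post_deg) - circular_mean_deg(pre_deg)
    return (d + 180.0) % 360.0 - 180.0


def delta_tail_shape(T):
    """Mean absolute frame-to-frame change in tail segment angle.

    T has one row per tail segment and one column per frame, so a
    20 x 10 input yields 20 x 9 = 180 differences.
    """
    total = 0.0
    for segment in T:
        for a, b in zip(segment, segment[1:]):
            total += abs(b - a)
    return total / (len(T) * (len(T[0]) - 1))
```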
tSNE Input
As described in 2B, each swim bout is represented as a 220-dimensional vector encoding the posture of the fish through 10 image frames beginning at bout initiation with 20 tail vectors and 2 eye vergence angles per frame (see also S1). All rightward bouts (Δheading < 0) were mirror reflected prior to running tSNE by swapping the left and right eye measurements and multiplying all tail angle measurements by −1. Because tail measurements are tenfold more abundant than eye measurements and we sought to emphasize eye data in our clustering, we decreased the relative magnitude of the tail measurements by encoding tail measurements in radians and encoding eye measurements in degrees. In practice, we found that we could identify roughly the same bout-classes across a fairly wide range of scaling factors used to emphasize eye measurements relative to tail measurements. While it is common to preprocess data with PCA prior to running tSNE, we achieved qualitatively similar embedding results with and without this step, so we simply ran tSNE on the 220-D inputs.
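As an illustration of this encoding, the sketch below assembles one 220-D vector and mirror-reflects rightward bouts. The ordering of entries within the vector (20 tail angles followed by left and right eye vergence, frame by frame) is a hypothetical layout chosen for the example; only the dimensions, the units, and the mirroring rule come from the description above.

```python
N_FRAMES, N_TAIL = 10, 20


def bout_vector(tail_rad, left_eye_deg, right_eye_deg, delta_heading_deg):
    """Build a 220-D bout representation.

    tail_rad: 10 frames, each a list of 20 tail angles in radians.
    left_eye_deg, right_eye_deg: 10 vergence angles each, in degrees.
    Rightward bouts (delta_heading_deg < 0) are mirror reflected by
    negating tail angles and swapping the two eyes.
    """
    mirror = delta_heading_deg < 0
    vec = []
    for f in range(N_FRAMES):
        tail = tail_rad[f]
        left, right = left_eye_deg[f], right_eye_deg[f]
        if mirror:
            tail = [-a for a in tail]
            left, right = right, left
        vec.extend(tail)
        vec.extend([left, right])
    return vec
```

Keeping tail angles in radians and eye angles in degrees gives the eye entries a larger numeric range, which is how the relative emphasis on eye data arises under a Euclidean distance metric.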
Each of the 200,559 swim bouts is represented as a 220-D vector as described above and embedded in a 2-D space with tSNE. We implement tSNE with Barnes-Hut approximations with CUDA to decrease tSNE runtime (https://github.com/georgedimitriadis/t_sne_bhcuda). Euclidean distance was used as a distance metric. Following embedding, 5 bout classes were identified with the routines described in S2. The 3 largest clusters were then further subdivided by kinematic variables, |Δheading| and |Δtail-shape|, as described in S6 and above.
Identifying and Encoding Environmental Information
As described in 1K, objects near the water surface preceding each swim bout are identified with image processing routines. Following image translation and rotation, a stack of the 6 images preceding bout initiation are cropped as in 1K. High spatial contrast objects are identified in Matlab by filtering each image in this stack with a Laplacian of Gaussian 2-D filter (size = 13 × 13 pixels, sigma = 1.6). Contiguous 3-D object volumes in this image stack are smoothed with 3-D dilation and erosion. The average position of each object in the image frame preceding bout initiation was used to encode object location with the extrinsic feature vloc. The number of pixels assigned to each object in this final frame was used to encode object size with the extrinsic feature vsize. The velocity of each object was computed by calculating the distance between the center of mass of each object in the 6th image frame preceding bout initiation (t-minus 100 ms) and the 1st frame preceding bout initiation (t-minus 17 ms) and dividing by the time elapsed. If an object was not properly segmented through all 6 frames, the velocity was calculated from fewer frames (with a minimum of 3 frames). The velocity vectors for each object were used to encode x-velocity and y-velocity in the extrinsic features vx and vy. While the image resolution is sufficient to extract additional features such as the orientation, eccentricity, and detailed shape of each object, we have not yet included this information in our predictive models.
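The velocity estimate can be sketched as below. The handling of partially segmented objects is an assumption: this version uses the earliest and latest frames in which the object was found and requires at least 3 segmented frames, returning None otherwise. Velocities are in pixels per second at the 60 Hz frame rate.

```python
FRAME_RATE_HZ = 60.0


def object_velocity(centroids, min_frames=3):
    """Velocity (vx, vy) in pixels per second, or None.

    centroids: one entry per frame, oldest first; each entry is an
    (x, y) center of mass, or None if the object was not segmented
    in that frame.
    """
    valid = [(i, c) for i, c in enumerate(centroids) if c is not None]
    if len(valid) < min_frames:
        return None
    (i0, (x0, y0)), (i1, (x1, y1)) = valid[0], valid[-1]
    dt = (i1 - i0) / FRAME_RATE_HZ  # elapsed time in seconds
    return ((x1 - x0) / dt, (y1 - y0) / dt)
```

With all 6 frames available, the elapsed time spans 5 frame intervals (about 83 ms), matching the interval between t-minus 100 ms and t-minus 17 ms.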
Intrinsic Features to Predict Interbout Intervals and Bout-types
preceding interbout interval
the duration of the preceding interbout interval in seconds. For models to predict interbout interval duration (in), this is the duration of interbout in−1. For models to predict bout-type (bn), this is the duration of interbout in.
preceding bout-type
the category (i.e. bout-type) of the preceding swim bout. There are 18 bout-types, each of which is composed of left and right versions, giving 36 categories in total.
hunt dwell-time
the integer number of observed preceding consecutive hunting bouts (i.e. J-turn, pursuit, abort, strike). As hunt sequences extend, this value increases.
explore dwell-time
the integer number of observed preceding consecutive exploring bouts. For all predicted interbout intervals and bout-types, either hunt dwell-time or explore dwell-time will be zero, with the other being a positive integer. Only the contiguous bout sequence containing the predicted interbout interval (in) or bout-type (bn) is used to define these feature values.
tank-time
the amount of time (in minutes) elapsed since the first trial was initiated for that fish.
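The two dwell-time features can be computed from a labeled bout sequence as in the sketch below. The label strings are hypothetical (hunting bouts are assumed to be named "abort", "strike", or to begin with "j" or "p"), and the sketch treats one contiguous bout sequence at a time, so the first bout of a sequence has no feature values.

```python
HUNT_LABELS = {"abort", "strike"}  # hypothetical label names


def is_hunting(label):
    """J-turns, pursuits, aborts, and strikes count as hunting bouts."""
    return label in HUNT_LABELS or label[0] in ("j", "p")


def dwell_times(labels):
    """For each bout after the first, return (hunt_dwell, explore_dwell)
    counted over the consecutive bouts immediately preceding it."""
    out = []
    hunt = explore = 0
    for prev in labels[:-1]:
        if is_hunting(prev):
            hunt, explore = hunt + 1, 0
        else:
            hunt, explore = 0, explore + 1
        out.append((hunt, explore))
    return out


sequence = ["e1", "e2", "j1", "p1", "p2", "strike", "e0"]
features = dwell_times(sequence)
```

Exactly one of the two counters is non-zero at every step, consistent with the definition above.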
Extrinsic Features to Predict Bout-types
vloc: locations of potential prey in the local environment.
vsize: sizes of potential prey in the local environment.
vx: relative x-velocities of potential prey in the local environment.
vy: relative y-velocities of potential prey in the local environment.
See S9-S10 for more information on intrinsic and extrinsic feature encoding.
Simulations
For our combo renewal process simulations (Figure 6, Supplementary Videos 3-4), we simulated an environment with prey objects that move as biased random walking particles. Each particle has a constant size and speed. These particles influence bout-type selection as their locations, sizes, and relative velocities are encoded with the extrinsic features vloc, vsize, vx, and vy. We simulated 50 fed and 50 starved fish for 40 minutes each (in 20 two-minute trials). Similar to our real experiments, prey are distributed with a centro-peripheral gradient, with the highest density of prey located near where trials begin. To simulate a behavioral sequence, an interbout interval duration is selected by randomly sampling from an interbout interval distribution generated from the combo interbout interval model (described in 5C). Next, a bout-type is selected by randomly sampling from a bout-type probability distribution generated from the combo bout model (described in 5F). Upon selection of a bout-type, we move the fish through its virtual world along the bout-type specific trajectories described in S14. Since the combo bout model depends on the environment in addition to behavioral history and hunger state, the virtual prey objects influence the fish’s behavioral trajectory. We call this combined bout and interbout model a combo renewal process. For comparison, we simulate 50 fed and 50 starved fish with a simpler model in which interbout intervals and bout-types depend only on preceding bout-type (bn−1). In these simulations, interbout interval durations are sampled from a probability distribution generated by the selected form of the preceding bout-type feature (split-bn−1), and bout-types are sampled from a bout-type probability distribution generated from the selected form of the preceding bout-type feature (pooled-bn−1). This model is referred to as a first-order Markov renewal process, and has no environmental dependencies. 
For both the combo renewal process simulations and the first-order Markov renewal process simulations, model weights are set at the maximum a posteriori (MAP) estimate of the GLM weights from the trained models.
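The sampling loop described above can be illustrated with a minimal sketch of a first-order Markov renewal process. This is a toy under stated assumptions, not the code used in the study: the function names, the toy value K = 4, the Poisson interval distribution, and the weight layout (`w_interval`, `w_bout`) are all hypothetical choices made for illustration.

```python
import numpy as np

K = 4  # toy number of bout types (the paper uses K = 36)

def interval_mean(prev_bout, w_interval):
    """Exponential inverse link on a bias plus a one-hot preceding-bout feature."""
    return np.exp(w_interval[0] + w_interval[1 + prev_bout])

def bout_probs(prev_bout, w_bout):
    """Softmax over bout types; row prev_bout of w_bout holds the activations."""
    a = w_bout[prev_bout] - np.max(w_bout[prev_bout])  # stabilize the exponentials
    p = np.exp(a)
    return p / p.sum()

def simulate(n_bouts, w_interval, w_bout, rng, first_bout=0):
    """Alternate between sampling an interbout interval and a bout type."""
    prev, bouts, intervals = first_bout, [], []
    for _ in range(n_bouts):
        # 1) sample the interval that precedes the next bout (assumed Poisson here)
        intervals.append(rng.poisson(interval_mean(prev, w_interval)))
        # 2) sample the next bout type given the preceding bout type
        prev = int(rng.choice(K, p=bout_probs(prev, w_bout)))
        bouts.append(prev)
    return np.array(bouts), np.array(intervals)

# Usage with hypothetical all-zero weights (uniform bout types, mean interval 1).
bouts, intervals = simulate(50, np.zeros(1 + K), np.zeros((K, K)),
                            np.random.default_rng(0))
```

A combo renewal process would follow the same alternation, with the interval and bout-type distributions additionally conditioned on hunger state, longer behavioral history, and the encoded prey features.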
Data Availability
All data and modeling code will be made available upon final acceptance of the manuscript.
Competing Interests
The authors declare no competing financial interests.
Author Contributions
R.J. and F.E. conceived the project. T.P. and R.J. designed and constructed the BEAST. R.J. and C.W. conceived the basic experimental design. R.J. wrote the data acquisition code. R.J., C.W., and E.S. collected the data. R.J. processed and prepared all data for use in modeling studies. S.L. conceived and implemented all models with input from R.J. A.M. advised on neural network models. K.H. made the animations in Supplementary Videos 1 and 3. R.J. wrote the paper with input from all authors. S.L., C.W., K.H., and F.E. edited the paper. S.L. wrote the Technical Appendix.
Supplementary Video Legends
Figure SV1: Data acquisition with the BEAST (animated). This is an animation of the BEAST rig used to acquire data for this study. The larval zebrafish swims in an expansive arena (300 × 300 × 4 mm water volume) and is tracked with a camera moving automatically overhead. The tank rests on an air table to isolate the fish from the vibration of the moving gantry. The camera has an infrared filter and collects light from the array of IR-LEDs positioned on the air table. The diffusive screen embedded in the tank bottom scatters the infrared light and helps to create even lighting for behavioral imaging. A projector is used to project inwardly drifting gratings (not shown) onto the screen in the tank bottom to bring the fish to the center of the arena, where the camera waits to detect the arrival of the fish. Once at the center, the projector delivers a static color image of pebbles (pictured in the video).
Figure SV2: Aligned behavioral video containing a hunt sequence. This video shows the behavioral sequence described in 2J-K. All behavioral video in this study is acquired at 60 frames per second, but this video is shown at half-speed (30 frames per second). The video contains 439 image frames (7.32 seconds in real time). The fish emits 2 exploring bouts, and then transitions into a 13-bout hunt sequence ending with a successful strike to capture a prey. The fish initiates the hunt by orienting toward a prey object with a high-|Δheading| leftward J-turn (j2-L). However, upon orienting toward this prey object, another prey arrives in the same location, and the fish appears to switch to a separate isolated prey target using a high-|Δheading|, high-|Δtail-shape| pursuit bout (p4-L). The fish then pursues this moving prey object, closing the distance to the target and aligning its heading direction with the heading direction of the target. During prey capture, the prey can be seen entering the mouth of the fish and then moving around inside the mouth before it is swallowed. Swallowing of the prey coincides with emission of an e0 bout. These subtle movements highlight a challenge in parsing larval zebrafish behavior into bouts and interbouts since swallowing behavior and other orofacial and pectoral fin movements are near our threshold for bout detection. In the right panel of the video, the 20 tail vectors used to define tail-shape are shown in each image frame, and the vergence angle of each eye is indicated by coloring the outline of each eye with the “cool” colormap from Matlab, ranging from blue (diverged) to magenta (converged). Prey objects are extracted from the movie for display with image processing routines similar to those used to encode environmental state preceding each swim bout.
Figure SV3: Animation of a simulated behavioral trajectory. The behavioral trajectory shown in 6A is reproduced here in an animation. This excerpt was originally 411 frames (6.85 seconds of simulated time), but is slowed down 3x for display in this video. This behavior is generated by sampling from our combo renewal process model (see Methods). In this sequence, the virtual fish exits exploring mode to hunt a moving prey object, ending this virtual hunt with a strike toward the prey. This example captures many salient features of naturalistic larval zebrafish behavior observed in our study. That being said, we searched our simulated data to locate a behavioral sequence for display, and this example is not necessarily representative of an average simulated hunt. Also, when the fish strikes at the virtual prey object in this movie, we remove that prey from the video for display purposes. However, in our simulations, the environment runs in open-loop and prey are not affected by the behavior of the virtual fish.
Figure SV4: Visualizing action selection in combo renewal process simulations. Here we show a full 2-minute behavioral trial simulated from our combo renewal process model. This video is slowed down 4x relative to simulated time. The behavioral sequence shown in 6A is taken from this simulated trial (beginning with bout b161 at 6:14 in the video). Fish head position (colored circle) and heading direction (rightward) are held constant, and the virtual environment translates and rotates as the fish moves during swim bouts. Virtual prey are shown as black circles and the corresponding representation of prey locations with the vloc feature is shown (upper right panel). For simulation (see 6A, Methods, Technical Appendix for more detail), an interbout interval duration is randomly sampled from a probability distribution generated from the combo interbout model (bottom right in video). The sampled interbout interval duration is indicated with a red bar, and the distribution from which it was sampled is shown in black. A gray bar moves rightward to indicate the passage of time during interbout intervals. As time passes, we update the bout-type probability distribution generated from our combo bout model (bottom left of video), evaluated for the currently displayed value of the ongoing interbout interval (this is the preceding interbout interval on which the next bout-type partially depends). In this way, the change in bout-type probabilities with increasing interbout interval can be observed. Once the sampled time for the interbout interval has elapsed, a bout-type is randomly sampled from the bout-type probability distribution (indicated with a star). The virtual fish then moves through its environment along the designated trajectory for that bout-type (see S14).
Technical Appendix
This appendix describes the details of our mathematical notation, feature encoding, construction of the marked renewal process model, procedures for fitting the model, methods for obtaining posterior credible intervals of key parameters, and methods for hyperparameter selection.
Notation
The data consist of many bout sequences, as described above. Let $b_{s,n} \in \{1, \ldots, K\}$ denote the bout type of the n-th bout in the s-th sequence, and let $i_{s,n}$ denote the preceding interbout interval. Let $N_s$ denote the number of bouts in the s-th sequence and $S$ denote the total number of sequences. The set $\mathcal{D}_s = \{(i_{s,n}, b_{s,n})\}_{n=1}^{N_s}$ is the sequence of bouts and interbout intervals in the s-th sequence, and $\mathcal{D} = \{\mathcal{D}_s\}_{s=1}^{S}$ is the set of all sequences’ data.
The features are functions of the preceding bouts and interbout intervals and of the current environmental input. Let $h_{s,n} = \{(i_{s,m}, b_{s,m})\}_{m=1}^{n-1}$ denote the history of bouts and interbout intervals preceding the n-th bout in sequence s (see footnote 1). These histories will form the input to the interbout interval and bout type GLMs. Let $v_{s,n}$ denote the set of environmental inputs to the n-th bout in the s-th sequence, and let $v_s = \{v_{s,n}\}_{n=1}^{N_s}$ denote the sequence of environmental inputs for each bout in the sequence.
Feature functions $\phi_f$ map behavioral history and environmental inputs to a vector representation of length $D_f$ that can be input to a GLM. Here, f denotes the type of feature, like previous bout type or preceding interval, and each feature type may have its own dimensionality. The feature functions we consider only use one of the inputs at a time, though more complex feature functions could be used in future work. Next we specify how these feature functions are computed.
Feature Encoding
For clarity, we omit the subscript s from the following definitions as the feature functions are not sequence-dependent. The first feature is a constant bias, $\phi_{\mathrm{bias}} = 1$. This allows for baseline probabilities of different bout types and a baseline mean and dispersion of the intervals.
Discrete features like the preceding bout type are represented with a one-hot encoding. If the preceding bout type is $b_{n-1} = k$, we have
$$\phi_{b_{n-1}}(h_n, v_n) = \mathbf{e}_k,$$
where $\mathbf{e}_k$ is the length-$K$ vector with a one in position $k$ and zeros elsewhere.
Scalar features like $i_n$, $t_{\mathrm{hunt}}$, $t_{\mathrm{explore}}$, and $t_{\mathrm{tank}}$ are encoded with a set of basis functions. Let $x(h_n, v_n)$ denote a generic function that maps behavioral history and environmental input to a scalar value in the set $\mathcal{X} \subseteq \mathbb{R}$. For example, the preceding interval $i_n$ is a function of the behavioral history that outputs a non-negative scalar value. We use a data-driven approach to determine a set of basis functions for representing these scalar values in a way that allows the GLMs to learn nonlinear dependencies. First, we compute the empirical cumulative distribution function (CDF) $F(x)$. We approximate this function by computing the histogram of $x$ values on a fine grid (typically with 100 equispaced bins). To construct a basis that has higher precision around the most common values, we map $x$ through the CDF to obtain a point $F(x) \in [0, 1]$, and then we evaluate a set of $J$ orthogonal basis functions at that point to obtain a feature vector. Formally, we have,
$$\phi_f(h_n, v_n) = \big[\psi_1(F(x)), \ldots, \psi_J(F(x))\big]^\top, \qquad x = x(h_n, v_n).$$
The basis functions $\psi_j$ are obtained by orthogonalizing a set of radial basis functions $\tilde{\psi}_j$, where,
$$\tilde{\psi}_j(u) = \exp\left\{-\frac{(u - \mu_j)^2}{2\sigma^2}\right\}$$
for $u \in [0, 1]$. We evenly space the means $\mu_1, \ldots, \mu_J$ over $[0, 1]$ and use a common standard deviation $\sigma$. Finally, we perform a QR factorization to obtain the orthonormal basis on the interval [0, 1]. As discussed below, each feature type has a different value of $J$, as chosen via an empirical Bayes procedure. Figure S9 shows the bases computed on a dense grid of points in $\mathcal{X}$ for a variety of features.
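As an illustration of the scalar feature encoding, the following sketch builds a histogram-based empirical CDF, orthogonalizes evenly spaced Gaussian radial basis functions with a QR factorization, and evaluates the resulting basis at CDF-warped feature values. The function names, the interpolation step, and the values `n_basis=5` and `sigma=0.25` are assumptions made for this example rather than settings taken from the study.

```python
import numpy as np

def empirical_cdf(values, n_bins=100):
    """Approximate the CDF from a histogram on a fine equispaced grid."""
    counts, edges = np.histogram(values, bins=n_bins)
    cdf = np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])
    return edges, cdf  # both have n_bins + 1 entries

def orthonormal_rbf_basis(u_grid, n_basis, sigma):
    """Evenly spaced Gaussian bumps on [0, 1], orthonormalized by QR."""
    means = np.linspace(0.0, 1.0, n_basis)
    rbf = np.exp(-0.5 * ((u_grid[:, None] - means[None, :]) / sigma) ** 2)
    q, _ = np.linalg.qr(rbf)  # columns of q are orthonormal on the grid
    return q

def encode(x, edges, cdf, u_grid, basis):
    """Warp x through the CDF, then interpolate each basis function at that point."""
    u = np.interp(np.atleast_1d(x), edges, cdf)
    cols = [np.interp(u, u_grid, basis[:, j]) for j in range(basis.shape[1])]
    return np.column_stack(cols)  # shape (len(x), n_basis)

# Usage on synthetic, exponentially distributed "intervals".
rng = np.random.default_rng(0)
values = rng.exponential(size=1000)
edges, cdf = empirical_cdf(values)
u_grid = np.linspace(0.0, 1.0, 200)
basis = orthonormal_rbf_basis(u_grid, n_basis=5, sigma=0.25)
phi = encode(values[:10], edges, cdf, u_grid, basis)
```

Because the warping places equal probability mass between equally spaced points on [0, 1], evenly spaced bumps in the warped space are narrower, in the original units, where the data are dense.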
Finally, we have a set of environmental features including the location, size, x-velocity, and y-velocity of the paramecia in the fish’s field of view. Each of these features is a scalar value for each paramecium in the field of view. We summarize the activity of all paramecia by summing the feature values for all paramecia within a spatial bin. For small enough bins, this representation entails little loss of information, but future work could consider more sophisticated feature functions. The spatial bins are constructed by dividing the angular and radial dimensions of the fish’s field of view into bins. A bin that spans $\Delta\theta$ radians in the angular dimension and covers the interval $[r, r + \Delta r)$ in the radial dimension has area $\tfrac{1}{2}\Delta\theta\,[(r + \Delta r)^2 - r^2]$. This area grows with the distance from the fish, $r$. To account for this difference in area, we normalize the summed feature values by the area to get a “feature density” within each bin. As with the temporal bases, we choose the number of angular and radial bins via the empirical Bayes procedure described below.
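The binning and area normalization can be sketched as follows. The function name and the bin edges in the usage example are hypothetical; the area of each bin is the area of an annular sector.

```python
import numpy as np

def feature_density(r, theta, values, r_edges, theta_edges):
    """Sum per-prey feature values in polar bins and divide by bin area."""
    sums, _, _ = np.histogram2d(r, theta, bins=[r_edges, theta_edges],
                                weights=values)
    dtheta = np.diff(theta_edges)
    # Area of an annular sector: 0.5 * dtheta * (r_outer^2 - r_inner^2).
    ring = 0.5 * (r_edges[1:] ** 2 - r_edges[:-1] ** 2)
    area = ring[:, None] * dtheta[None, :]
    return sums / area  # shape (n_radial_bins, n_angular_bins)

# Usage: a single prey with feature value 2 at r = 1.5, theta = 0.1 rad.
density = feature_density(
    r=np.array([1.5]), theta=np.array([0.1]), values=np.array([2.0]),
    r_edges=np.array([1.0, 2.0, 3.0]),
    theta_edges=np.array([-np.pi / 2, 0.0, np.pi / 2]),
)
```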
Constructing the Marked Renewal Process
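The marked renewal process specifies a conditional probability density over a sequence of bouts and interbout intervals given model weights $\mathbf{w}$. By the chain rule, this density factorizes into a product of conditional probabilities of each interval and bout type given all preceding bouts and intervals, and, by assumption, the n-th bout depends on the environment only through the current environmental input. Formally,
$$p(\mathcal{D}_s \mid v_s, \mathbf{w}) = \prod_{n=1}^{N_s} p(i_{s,n} \mid h_{s,n}, v_{s,n}, \mathbf{w}) \; p(b_{s,n} \mid h_{s,n}, i_{s,n}, v_{s,n}, \mathbf{w}), \tag{4}$$
where $\mathcal{D}_s$ is the s-th sequence of intervals $i_{s,n}$ and bout types $b_{s,n}$, $h_{s,n}$ is the history preceding the n-th bout, and $v_{s,n}$ is the current environmental input. We assume that all sequences share the same set of weights and are conditionally independent given the weights so that $p(\mathcal{D} \mid v, \mathbf{w}) = \prod_{s=1}^{S} p(\mathcal{D}_s \mid v_s, \mathbf{w})$ (see footnote 2).
We use generalized linear models (GLMs) for both the intervals and the bout types. In a GLM, we first compute a weighted combination of the features via the “activation functions.” The output of the activation functions is then passed through a fixed nonlinear function to obtain the parameters of the conditional distribution on intervals or bout types, as appropriate. Typically, there is a separate activation function, and hence a separate set of weights, for each parameter in the final conditional distribution.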
Interbout Interval GLMs
First consider a Poisson interval model parameterized by its mean $\mu_n$. The mean is obtained from the bout history and environmental input as follows. Let $\mathcal{F}$ denote the set of features input to the GLM. The activation function for the mean is given by,
$$a_\mu(h_n, v_n) = \sum_{f \in \mathcal{F}} \mathbf{w}_{\mu,f}^\top \, \phi_f(h_n, v_n),$$
where $\mathbf{w}_{\mu,f} \in \mathbb{R}^{D_f}$ is a weight vector in the set of weights $\mathbf{w}$, and it is specific to this parameter (µ) and feature (f). It is the same dimensionality as the output of the feature function $\phi_f$. Thus, the output of the activation function is a real-valued scalar. Finally, the output of the activation function is passed through an inverse link function (here, an exponential function) to obtain the mean,
$$\mu_n = \exp\{a_\mu(h_n, v_n)\}.$$
Multi-parameter conditional distributions, like the negative binomial distribution parameterized by a mean and dispersion, require two separate activation functions and sets of weights. Table 1 lists the various forms of generalized linear models we used to model interbout intervals.
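A minimal sketch of the Poisson interval GLM follows, assuming features and weights are stored in dictionaries keyed by feature name. The names `poisson_mean`, `poisson_log_likelihood`, and the feature keys are hypothetical and chosen only for this example.

```python
import numpy as np
from math import lgamma

def poisson_mean(features, weights):
    """Sum the per-feature linear terms, then apply the exponential inverse link."""
    activation = sum(weights[f] @ features[f] for f in features)
    return np.exp(activation)

def poisson_log_likelihood(interval, mean):
    """Log probability of an integer-valued interval under a Poisson distribution."""
    return interval * np.log(mean) - mean - lgamma(interval + 1)

# Usage: a bias feature plus a one-hot preceding-bout feature (three toy types).
features = {"bias": np.array([1.0]), "prev_bout": np.array([0.0, 1.0, 0.0])}
weights = {"bias": np.array([np.log(2.0)]), "prev_bout": np.zeros(3)}
mu = poisson_mean(features, weights)  # mean interval of about 2 time steps
```

A two-parameter model such as the negative binomial would use a second activation function of the same form, with its own weights, for the dispersion.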
Bout Type GLMs
We modeled the conditional distribution of the next bout type in a similar fashion. The bout type follows a categorical distribution parameterized by $\pi \in \Delta_K$, a non-negative vector of length $K$ that sums to one; i.e. a probability distribution over the $K = 36$ bout types. We modeled this probability vector with a GLM with a softmax inverse link function,
$$\pi_k = \frac{\exp\{a_k(h_n, i_n, v_n)\}}{\sum_{k'=1}^{K} \exp\{a_{k'}(h_n, i_n, v_n)\}},$$
where $a_k$ is an activation function that specifies the relative likelihood of the next bout type being of type $k$.
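The softmax bout-type GLM can be sketched in the same style. Here each feature is assumed to have a weight matrix with one row per bout type; the function name, the dictionary layout, and the toy value K = 4 are illustrative assumptions.

```python
import numpy as np

def bout_type_probs(features, weights):
    """Categorical probabilities from a softmax over per-type activations.

    weights[f] has shape (K, D_f) and features[f] has shape (D_f,).
    """
    activations = sum(weights[f] @ features[f] for f in features)
    activations = activations - activations.max()  # numerical stability
    p = np.exp(activations)
    return p / p.sum()

# Usage with K = 4 toy bout types: the bias weights favor bout type 0.
features = {"bias": np.array([1.0]), "prev_bout": np.array([0.0, 0.0, 1.0, 0.0])}
weights = {"bias": np.array([[1.0], [0.0], [0.0], [0.0]]),
           "prev_bout": np.zeros((4, 4))}
pi = bout_type_probs(features, weights)
```

Subtracting the maximum activation before exponentiating leaves the probabilities unchanged, since the softmax is invariant to adding a constant to all activations.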
Weight Sharing
We assessed between-group differences and lateral biases by tying the weights for some features and comparing model performance. For example, to assess the lateral symmetry we compared the standard bout-type GLM described above to a “symmetric” GLM in which the weights for left and right bout types are constrained to be the same. More specifically, if $k$ and $k'$ correspond to the left and right versions of a bout type, like a particular J-turn, we constrain $\mathbf{w}_{k,f} = \mathbf{w}_{k',f}$ for symmetric features $f$.
Enforcing left/right symmetry is slightly more challenging for the weights of the preceding bout type feature. In that case, we can think of the collection of weights as a $K \times K$ “transition” matrix. If the bout types are sorted so that the first half correspond to left and the second half correspond to right, we tie the weights of the upper-left and lower-right quadrants (i.e. the ipsilateral transition weights), as well as those in the upper-right and lower-left quadrants (i.e. the contralateral transition weights). Likewise, we enforce symmetry in the environmental input weights by tying the weights of each hemisphere of the field of view and its ipsi- or contralateral bout types.
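One way to realize the quadrant tying is to parameterize the transition weights by two free blocks and assemble the full matrix from them, as in the sketch below. The function name and the toy size are hypothetical, and the sketch assumes bout types are ordered with all left types first and the matching right types second.

```python
import numpy as np

def symmetric_transition_weights(ipsi, contra):
    """Assemble a K x K weight matrix from two (K/2) x (K/2) free blocks.

    Upper-left and lower-right quadrants share the ipsilateral block;
    upper-right and lower-left quadrants share the contralateral block.
    """
    return np.block([[ipsi, contra], [contra, ipsi]])

# Usage with K = 6 toy bout types (3 left, 3 right).
rng = np.random.default_rng(0)
half = 3
W = symmetric_transition_weights(rng.normal(size=(half, half)),
                                 rng.normal(size=(half, half)))

# Swapping the left and right labels leaves the tied matrix unchanged.
swap = np.concatenate([np.arange(half, 2 * half), np.arange(half)])
W_mirrored = W[np.ix_(swap, swap)]
```

With this construction only half as many transition weights are free, and the mirror invariance holds by design rather than being imposed as a penalty.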
Prior Distributions
We introduce a Gaussian prior on the model weights in order to regularize against overfitting, particularly with high-dimensional features. For each parameter θ and feature f we have,
$$\mathbf{w}_{\theta,f} \sim \mathcal{N}\big(\mathbf{0}, \, \sigma_{\theta,f}^2 I\big). \tag{8}$$
We learn the variance $\sigma_{\theta,f}^2$ via the empirical Bayes procedure described below. We considered Laplace priors, which are akin to $\ell_1$ regularization, as well, but initial explorations showed minimal differences from the Gaussian distribution.
Fitting the Model
We fit the model with maximum a posteriori (MAP) estimation. We computed the log joint probability density,
$$\mathcal{L}(\mathbf{w}) = \log p(\mathcal{D} \mid v, \mathbf{w}) + \log p(\mathbf{w}),$$
where the first term is the log likelihood from (4) and the second is the prior from (8). We used Autograd (https://github.com/HIPS/autograd/) to compute its gradients with respect to the model weights $\mathbf{w}$. We optimized the log joint probability with Newton’s method, using the conjugate gradient method to solve the linear system in the Hessian matrix $\nabla_{\mathbf{w}}^2 \mathcal{L}(\mathbf{w})$, as implemented in SciPy.
Once the MAP estimate of the weights, $\hat{\mathbf{w}}$, was obtained, we computed a Laplace approximation to the posterior distribution,
$$p(\mathbf{w} \mid \mathcal{D}, v) \approx \mathcal{N}\big(\mathbf{w} \mid \hat{\mathbf{w}}, \Sigma\big), \tag{12}$$
where the inverse covariance matrix is the negative Hessian at the mode,
$$\Sigma^{-1} = -\nabla_{\mathbf{w}}^2 \mathcal{L}(\mathbf{w}) \big|_{\mathbf{w} = \hat{\mathbf{w}}}.$$
(Note that we have committed a slight abuse of notation by treating $\mathbf{w}$ as a vector rather than a set. To be precise, take $\mathbf{w}$ to be the concatenation of the vectorized weights.)
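The fitting procedure can be sketched on a toy Poisson regression with a Gaussian prior. To keep the example self-contained, the gradient and Hessian are written by hand instead of being computed with Autograd, and SciPy's `minimize` with `method="Newton-CG"` is assumed to stand in for the Newton and conjugate gradient solver; the function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_joint(w, X, y, prior_var):
    """Negative log joint: Poisson log likelihood (up to a constant) plus Gaussian prior."""
    a = X @ w
    return -(np.sum(y * a - np.exp(a)) - 0.5 * np.sum(w ** 2) / prior_var)

def grad(w, X, y, prior_var):
    return -(X.T @ (y - np.exp(X @ w)) - w / prior_var)

def hess(w, X, y, prior_var):
    # Hessian of the negative log joint, i.e. the posterior precision at w.
    return X.T @ (np.exp(X @ w)[:, None] * X) + np.eye(len(w)) / prior_var

def fit_map_laplace(X, y, prior_var=1.0):
    """Return the MAP weights and the Laplace posterior covariance."""
    result = minimize(neg_log_joint, np.zeros(X.shape[1]), args=(X, y, prior_var),
                      jac=grad, hess=hess, method="Newton-CG")
    cov = np.linalg.inv(hess(result.x, X, y, prior_var))
    return result.x, cov

# Usage on synthetic data with known weights.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3])))
w_map, cov = fit_map_laplace(X, y, prior_var=10.0)
```

The covariance returned here is the inverse of the negative Hessian of the log joint at the mode, which is the quantity used for credible intervals and marginal likelihood estimates.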
Posterior Credible Intervals
We obtain approximate posterior credible intervals from the posterior covariance matrix $\Sigma$. Consider a particular parameter θ and feature f. Let $\hat{\mathbf{w}}_{\theta,f}$ denote its posterior mean and $\Sigma_{\theta,f}$ denote its posterior covariance. (The covariance of this single parameter and feature’s weights is just one diagonal block of the complete covariance matrix $\Sigma$.) The 95% posterior credible interval for the d-th entry in the weight vector is approximately $\hat{w}_{\theta,f,d} \pm 1.96 \, [\Sigma_{\theta,f}]_{d,d}^{1/2}$. For scalar features represented by basis function expansions, we obtain the credible intervals like those shown in Figure 3F by first constructing a matrix of basis function evaluations $\Phi_f$ where the t-th row is $\phi_f$ evaluated at the t-th percentile of the feature values. Then we compute the posterior credible interval in feature space as
$$\Phi_f \hat{\mathbf{w}}_{\theta,f} \pm 1.96 \, \mathrm{diag}\big(\Phi_f \Sigma_{\theta,f} \Phi_f^\top\big)^{1/2}.$$
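Both kinds of interval follow from the Gaussian approximation to the posterior, as the sketch below illustrates. The function names and the multiplier `z = 1.96` (the usual normal quantile for a 95% interval) are assumptions of this example.

```python
import numpy as np

def weight_intervals(w_mean, cov, z=1.96):
    """Per-weight credible intervals from the diagonal of the covariance block."""
    half_width = z * np.sqrt(np.diag(cov))
    return w_mean - half_width, w_mean + half_width

def function_intervals(Phi, w_mean, cov, z=1.96):
    """Credible band for the weighted basis expansion Phi @ w.

    Row t of Phi holds the basis functions evaluated at the t-th grid point.
    """
    mean = Phi @ w_mean
    # Variance of each row's linear combination: diag(Phi cov Phi^T).
    var = np.einsum("td,de,te->t", Phi, cov, Phi)
    half_width = z * np.sqrt(var)
    return mean - half_width, mean + half_width

# Usage with a toy two-weight posterior.
w_mean = np.array([0.0, 1.0])
cov = np.diag([1.0, 4.0])
Phi = np.array([[1.0, 0.0], [1.0, 1.0]])
lo_w, hi_w = weight_intervals(w_mean, cov)
lo_f, hi_f = function_intervals(Phi, w_mean, cov)
```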
Empirical Bayes Hyperparameter Selection
Since the posterior density (12) integrates to one over $\mathbf{w}$, the Laplace approximation also yields an approximation to the marginal log likelihood,
$$\log p(\mathcal{D} \mid v) \approx \mathcal{L}(\hat{\mathbf{w}}) + \frac{D}{2} \log 2\pi + \frac{1}{2} \log |\Sigma|, \tag{14}$$
where $\mathcal{L}(\hat{\mathbf{w}})$ is the log joint probability density evaluated at the MAP estimate $\hat{\mathbf{w}}$ and $D$ is the total number of weights.
The marginal likelihood is the probability of the data, integrating out the weights under their prior distribution. As seen in (14), it increases with the log joint probability density at the mode $\hat{\mathbf{w}}$, but it decreases as the curvature at the mode grows and the posterior covariance $\Sigma$ shrinks to zero. That is, the marginal likelihood balances posterior probability density around the mode with posterior uncertainty. A model with high marginal likelihood should assign high probability to the data over a large set of weights. This balance makes the marginal likelihood a natural measure of model fitness.
We select the hyperparameters (specifically, the prior variances and the number of basis functions or spatial bins J) by comparing the marginal likelihood estimates over a grid of values for each single-feature model. For each feature, we scan over a grid of hyperparameter values, and for each hyperparameter setting we fit a GLM with only that feature and the bias. We approximate the marginal likelihood using (14) and select the hyperparameter setting that achieves the maximum value over the grid. We then use these hyperparameter settings on composite models that combine multiple features. While the optimal single-feature hyperparameters are not necessarily optimal for the full model, we expect the selection to be biased toward over-parameterized models with more basis functions and larger prior variances. Erring in this direction is more conservative, in that it allows the full models greater flexibility, even if it is not necessary.
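The grid search can be illustrated on a linear-Gaussian toy model, chosen here because the Laplace approximation to its marginal likelihood is exact and the result can be checked in closed form. This swaps in a simpler likelihood than the GLMs above; the function names, the unit noise variance, and the grid values are assumptions of the sketch.

```python
import numpy as np

def log_evidence(X, y, prior_var):
    """Laplace estimate of log p(y | X) for y = X w + unit-variance Gaussian noise."""
    n, d = X.shape
    precision = X.T @ X + np.eye(d) / prior_var   # negative Hessian of the log joint
    w_map = np.linalg.solve(precision, X.T @ y)   # mode of the log joint
    resid = y - X @ w_map
    log_lik = -0.5 * resid @ resid - 0.5 * n * np.log(2 * np.pi)
    log_prior = (-0.5 * w_map @ w_map / prior_var
                 - 0.5 * d * np.log(2 * np.pi * prior_var))
    _, logdet_precision = np.linalg.slogdet(precision)
    # log joint at the mode + (d/2) log 2*pi + (1/2) log|Sigma|, with Sigma = precision^-1
    return log_lik + log_prior + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet_precision

def select_prior_variance(X, y, grid):
    """Pick the prior variance with the highest approximate marginal likelihood."""
    scores = [log_evidence(X, y, v) for v in grid]
    return grid[int(np.argmax(scores))], scores

# Usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
best_var, scores = select_prior_variance(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
```

The same loop applies to the number of basis functions or spatial bins: each grid setting is fit separately and scored by its approximate marginal likelihood.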
Neural Network Models of Environment Dependence
We also explored artificial neural network models to learn more complicated environment dependencies in the bout type GLMs. For these models, we treated $\phi_{v_*}$, where * denotes one of loc, size, x, or y, as the input to a multilayer feedforward network whose last layer is a softmax, as above, to output a probability distribution over the $K = 36$ bout types. Again, we use weight sharing to enforce lateral symmetry in the input-output function, but here the weights must be shared within each layer. Since the training objective is no longer convex, we instead train the model with the Adam optimizer [90] using mini-batches of randomly selected sequences. We use cross-validated log likelihood to determine the number of layers and hidden units. The Laplace approximation is not appropriate for this multimodal log probability, so we instead compare models on the basis of the log likelihood they assign to test data.
Acknowledgements
Research was funded by NIH grant U19NS104653 and Simons Foundation grant SCGB-325207. S.L. was supported by a Simons Collaboration on the Global Brain postdoctoral fellowship (SCGB-418011) and the Siebel Scholarship. We thank Ed Soucy, Joel Greenwood, and Adam Bercu of the Harvard Center for Brain Science Neuroengineering Core for technical support. We thank Andrew Bolton for helpful discussions on pose estimation and prey quantification, George Dimitriadis and Adam Kampff for assistance with GPU implementation of tSNE, and Matthew Johnson for helpful modeling discussions.
Footnotes
1 For bout models, the history also includes $i_{s,n}$, the interval preceding the n-th bout.
2 The first bout and interbout interval $(b_{s,1}, i_{s,1})$ require some care as, at that point, there is no history to condition on. Likewise, the last interval is truncated if tracking is lost. We do not explicitly model these effects; instead, we assume the sequences are stationary and let the initial distribution be the empirical distribution of bouts and intervals, and we drop the last interval from our model. Since the vast majority of bouts are not at the start or end of the sequence, these assumptions have little effect on our inferred model parameters.