Abstract
Our understanding of how vision functions as primates actively navigate the real world is remarkably sparse. Because most data have been limited to chaired and typically head-restrained animals, the synergistic interactions of the different motor actions and plans inherent to active sensing – e.g. eye, head, posture, and body movements – with visual perception are largely unknown. To address this considerable gap in knowledge, we developed an innovative wireless head-mounted eye tracking system called CEREBRO for small mammals, such as marmoset monkeys. Our system performs Chair-free Eye-Recording using Backpack mounted micROcontrollers. Because eye illumination and environmental lighting change continuously in natural contexts, we developed a segmentation artificial neural network to perform robust pupil tracking under these conditions. Leveraging this innovative system to investigate active vision, we demonstrate that although freely-moving marmosets exhibit frequent compensatory eye movements equivalent to those of other primates, including humans, the predictability of the visual system is enhanced when animals are freely-moving relative to when they are head-fixed. Moreover, despite increases in eye/head motion during locomotion, gaze stabilization actually improved relative to periods when the monkeys were stationary. Rather than impairing vision, the dynamics of gaze stabilization in freely-moving primates have been optimized over evolution to enable active sensing during natural exploration.
Main
Primate vision has been the subject of intense study for many decades and is arguably the most well understood neural system in the simian brain (Felleman and Van Essen 1991; A. J. Parker and Newsome 1998). And yet, our understanding of primate vision is incomplete. Like all sensory systems, primate vision evolved to overcome the challenges of actively moving, exploring and engaging with objects, individuals and the environment from different perspectives (P. R. L. Parker et al. 2022; Miller et al. 2022; Ngo et al. 2022). While the processes of visual encoding have been extensively studied in head-restrained subjects observing stimuli presented on a screen, details of how vision functions as primates actively move through and explore the real world are remarkably limited. The tacit expectation is that the visual processes discovered in head-fixed preparations reflect the core of sensory encoding and will generalize to freely-moving animals once mechanisms to stabilize gaze are considered. As in all vertebrates, the head and eyes coordinate their respective movements in primates to enable a stable percept of visual inputs (Guitton 1992; Martinez-Trujillo, Wang, and Crawford 2003; Angelaki and Hess 2005; Angelaki and Cullen 2008). Because data on this complement of mechanisms have been limited to chaired animals unable to locomote, the synergistic effects of different motor actions – e.g. eye, head, posture, and body movements – on primate visual perception and cognition during active exploration of the world are almost entirely unknown. The principal bottleneck has been technical. Previous studies have achieved limited precision when projecting the gaze of freely-moving primates into visual scenes (Shepherd and Platt 2006; Milton, Shahidi, and Dragoi 2020; Mao et al. 2021) and as a result have not detailed the stability nor the eye and head dynamics that contribute to gaze during free motion. Here we introduce a method that precisely quantifies eye movements and can accurately project the gaze of a primate into scenes as individuals freely explore an environment.
Recent work with mice demonstrates that eye-tracking systems can be miniaturized and mounted to the head in order to provide insight into vision during natural visual behavior (Michaiel, Abe, and Niell 2020; Meyer et al. 2018; Wallace et al. 2013; Meyer, O’Keefe, and Poort 2020), but these systems are not well suited for comparable studies in primates for at least two critical reasons. First, systems in mice rely on a tether, which would restrict the 3D mobility of primates. Second, they lack the precision and temporal resolution needed to accurately characterize the high-resolution vision of primates. To address this challenge, we developed an innovative head-mounted eye tracking system to enable the study of active, natural visual behaviors, and related neural processes, in freely-moving marmosets. Our system - CEREBRO - allows Chair-free Eye-Recording using Backpack mounted micROcontrollers at the speed and resolution needed to accurately quantify the visual behavior and underlying neural mechanisms of natural, active vision in primates. Using CEREBRO we confirmed that freely-moving marmosets exhibit frequent compensatory eye movements that enable them to stabilize gaze when viewing real-world scenes, consistent with previous studies in body-restrained animals. Using this innovative system, however, we discovered that gaze stabilization and predictability were enhanced when the monkeys were moving naturally, despite increases in eye/head motion during locomotion. This suggests that previously unreported synergistic mechanisms for gaze stabilization are not only integral to primate active vision in the real world but can be enhanced for greater compensation as animals move naturally through their environment.
Results
CEREBRO is a head mounted eye tracking system for freely moving marmosets
The small body size of marmosets (∼300-400 g) necessitated certain design considerations when developing CEREBRO. The first of these considerations relates to the weight and wearability of the system itself. Based on previous experience, the estimated 60 g weight of the complete system could not feasibly be situated entirely on the animal’s head without affecting its visual behavior. To resolve this issue, we separated the system into two separate, but integrated, hardware submodules: the Head-piece and the Backpack. The Head-piece includes the camera assembly and scaffold fitted to the animal’s head (fig 1a); the Backpack includes the backend electronics for camera synchronization, image acquisition and local data storage on custom designed printed circuit boards (PCBs) (fig 1b). Both the PCB and the head-piece IR LED are powered by a 600 mAh lithium-polymer (Li-Po) battery housed inside the backpack.
The second consideration for this system was the choice of cameras, as we sought to accurately quantify the high-resolution vision of primates. Commercially available parallel-communication camera modules such as the OV7725 and OV2640 have low frame rates (< 30 Hz), which are not ideal for eye tracking in primates. Better camera modules with faster frame rates (60 or 90 Hz) and HD resolution (1080p) are available, but they do not use a parallel communication protocol. Rather, these cameras use a camera serial interface called MIPI CSI-2 (Mobile Industry Processor Interface Camera Serial Interface). We also wanted to use a microprocessor as the core of the system to get better temporal resolution of the incoming data. Since the STM32H750 supports only a parallel camera interface, we used an STMIPID02 MIPI CSI-2 deserializer to convert the CSI communication from a MIPI CSI-2 camera sensor to DCMI, in order to use cameras with the recording frame rate and resolution needed here (Supplementary fig 3b). For our current system, we used OV4689 camera modules with two MIPI lanes and a customized flex cable length of 15 cm (SincereFirst, Guangzhou, China).
Head-Piece Module
The head-piece weighs ∼20 g and comprises five components (fig 1a): (1) the scaffold: a curved metal tube that serves as an anchor for all the pieces, (2) eye cam: an HD MIPI camera (90 fps) with a visible-light filter and a macro lens for imaging the eye, (3) world cam: an HD MIPI camera (60 fps) for capturing the world scene in front of the animal, (4) an IR LED to illuminate the eye, and (5) a strategically placed hot mirror to image the eye. The mechanical parts for the head-piece are made of a titanium alloy (Ti-6Al-4V), chosen for strength and weight considerations, and are held to the scaffold using M2 and M4 screw sets (see Supplementary Information 1 for a detailed assembly description). Each piece is custom designed using FreeCAD and 3D printed using direct laser sintering (DLS). The eye camera, IR LED and the hot mirror are positioned to achieve a direct image of the eye with optimal illumination.
To achieve reliable eye-tracking in a fully unrestrained animal, the eye camera’s position must remain fixed with respect to the animal’s eye. This is accomplished by attaching two vertical headposts (5 mm diameter, 1 cm tall cylinders with a flat cut on one side) to the animal’s head (fig 1c). The two headposts restrict any rotational or translational movement of the headpiece and allow the headpiece to be placed in the same position for every recording session.
Backpack Module
The backpack module weighs 40 g and comprises two customized PCBs, as well as the casing and harness worn by the animals (fig 1b). Each camera has its own dedicated PCB (fig 1d). Both cameras (eye cam and world cam) have long flex cables (15 cm) and are operated by their respective PCBs.
Customized Printed Circuit Boards
The PCB used here is an embedded system that runs on an STM32H750 microprocessor (fig 1d). The STM32H750 is a 32-bit Arm Cortex-M7 processor running at up to 480 MHz and is well suited for interfacing with camera sensors because it embeds a digital camera interface (DCMI). Each PCB can be programmed and debugged using the Serial Wire Debug (SWD) mode. The PCBs store the raw camera data on a locally mounted, readily available UHS-I (Ultra High Speed) class SD card. The binary files are decoded offline to produce an AVI-format video of the stream and a CSV file with timestamp and IMU data. Two PCBs are stacked on top of each other and enclosed in a custom designed 3D printed casing with a custom designed back harness (described below). To measure camera-synchronized head movements (acceleration and orientation), the boards interface with an inertial measurement unit (IMU). The PCBs also feature a 0.96” SPI TFT display, which allows users to preview the captured camera image and confirm the placement of the head-piece at the start of the session.
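A minimal sketch of the offline decoding step is shown below. The actual binary layout written by the firmware is not specified here, so the record structure, field names, and frame size are hypothetical; only the overall workflow (binary stream in, AVI video plus timestamp/IMU CSV out) follows the description above.

```python
# Hypothetical record layout: frame index (uint32), timestamp in microseconds
# (int64), six float32 IMU values, then one raw 8-bit grayscale frame.
import struct
import csv
import numpy as np
import cv2

FRAME_W, FRAME_H = 640, 400               # assumed eye-camera resolution
HEADER = struct.Struct("<Iq6f")           # index, timestamp_us, ax..gz

def decode_session(bin_path, avi_path, csv_path, fps=90):
    writer = cv2.VideoWriter(avi_path, cv2.VideoWriter_fourcc(*"MJPG"),
                             fps, (FRAME_W, FRAME_H))
    with open(bin_path, "rb") as f, open(csv_path, "w", newline="") as log:
        out = csv.writer(log)
        out.writerow(["frame", "timestamp_us", "ax", "ay", "az", "gx", "gy", "gz"])
        while True:
            head = f.read(HEADER.size)
            if len(head) < HEADER.size:
                break                                     # end of file
            idx, ts, *imu = HEADER.unpack(head)
            raw = f.read(FRAME_W * FRAME_H)
            if len(raw) < FRAME_W * FRAME_H:
                break                                     # truncated last frame
            frame = np.frombuffer(raw, np.uint8).reshape(FRAME_H, FRAME_W)
            writer.write(cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR))
            out.writerow([idx, ts, *imu])
    writer.release()
```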
Eye-Tracking System Wearability
When fully configured on a marmoset (fig 1e), CEREBRO allows the full range of motion characteristic of the marmoset’s natural behavior, while monitoring its position in the environment, eye position, and view of the scene (fig 1f). Overall, CEREBRO weighs ∼60 g (headpiece module: 20 g + backpack module: 40 g), which is comparable to the weight of two infant marmosets that an adult would normally carry on its back. To test the influence of this added weight on mobility, we compared marmosets’ behavior with and without CEREBRO in multiple sessions (30 min) of freely-moving exploration in a large open arena (200 cm × 100 cm × 240 cm). The animals carried a 4 mm IR-reflective bead on their backpacks in the arena, and body movement was tracked using the OptiTrack motion tracking system with a resolution of less than 1 mm at 120 Hz. We defined locomotion as any movement of the animal greater than 5 cm/sec (i.e. moving more than 20-30% of the typical marmoset body length in under one second). Using this threshold criterion, we quantified the percentage of time animals spent moving through the arena. Over 12 sessions spread across months, we observed no significant difference (fig 1g) in the time animals spent locomoting versus stationary and scanning the environment (p=0.83). Even though the animals moved for the same amount of time, there remained the possibility that the weight could impede the speed of the animal. However, we did not observe any significant difference in either the distance covered by the animals (p=0.569) or the average speed of the animal with and without CEREBRO (p=0.5692). This leads us to conclude that our system does not significantly impede the animals’ movement and can be comfortably used with a freely behaving animal.
Fast, efficient and accurate pupil detection
In the traditional experimental paradigms that have dominated primate neuroscience, animals are head-fixed and the light source can be controlled. The consistency of the lighting allows highly accurate eye tracking to be achieved by thresholding the dark iris observed under infrared light in a brightly illuminated image, with the pupil then detected using either its centroid or the center of an ellipse fit. A critical challenge for eye-tracking in freely moving animals is the lack of control over the illumination. As an individual moves through any real-world environment, external lighting conditions and shadows are constantly changing. This point is illustrated by our application of thresholding for pupil detection in a freely-moving marmoset (Fig 2a), which can lead to labeling of shadows at the edge of the eye rather than the pupil. Previous studies that examined gaze in freely-moving primates relied on conventional pupil thresholding (Shepherd and Platt 2006; Milton, Shahidi, and Dragoi 2020; Mao et al. 2021), which limited the precision that could be obtained. To achieve accurate primate eye-tracking in natural, freely-moving conditions, an alternative method for pupil detection was needed.
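For illustration, a minimal sketch of this conventional threshold-plus-ellipse approach is given below (OpenCV); the threshold value and morphological cleanup are assumptions rather than parameters used in any prior study. Its failure mode in natural settings is visible in the code itself: the largest dark blob may be a shadow rather than the pupil.

```python
import cv2
import numpy as np

def threshold_pupil(gray_frame, dark_thresh=40):
    """Conventional dark-pupil detection: threshold, clean up, fit an ellipse."""
    _, mask = cv2.threshold(gray_frame, dark_thresh, 255, cv2.THRESH_BINARY_INV)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    # Assumes the largest dark blob is the pupil; under natural lighting a
    # shadow at the edge of the eye can easily win instead (cf. Fig 2a).
    blob = max(contours, key=cv2.contourArea)
    if len(blob) < 5:                       # fitEllipse needs at least 5 points
        return None
    (cx, cy), axes, angle = cv2.fitEllipse(blob)
    return (cx, cy), axes, angle
```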
Artificial Neural Networks (ANNs) offer an alternative approach to pupil detection under real-world conditions because they rely on features of an image rather than raw grayscale values, and are therefore resistant to brightness changes, shadows and occlusions. Such approaches have been successfully applied using commercial software, such as DeepLabCut (DLC), to track the pupil of freely-moving mice (Michaiel, Abe, and Niell 2020). To further optimize pupil tracking in freely-moving marmosets, we developed a custom semantic segmentation ANN based on the UNet architecture (fig 2b) (Ronneberger, Fischer, and Brox 2015) that yielded superior performance in these conditions. Whereas classical convolutional networks are classifiers that output a single class label with a confidence value, the UNet architecture assigns a class label to each pixel of the image. The core logic of the UNet implemented here is to supplement a classical convolutional network (the contracting path) with successive upscaled layers such that the final output image has the same dimensions as the input, with the object of interest segmented out as white (>0) and the background set to black (0). This workflow allows pupil detection that is robust to lighting changes and noise (fig 2b). With a fully trained network, we reliably detected pupils across various sessions and different animals. The network is easily trainable with relatively few epochs and a training dataset of only 250-500 images (fig 2c). The data preparation and training of the ANN are explained in detail in Supplementary information 4, which also includes a user-friendly GUI (Supplementary fig 4b).
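The sketch below illustrates the encoder-decoder-with-skip-connections idea in a compact PyTorch module. It is not the authors' network (the released implementation is in the UNET_implementation_V2 repository linked under Data Availability); the depth, channel counts, and the training objective in the trailing comment are illustrative assumptions.

```python
# Minimal U-Net-style segmentation sketch (PyTorch); input height/width must be
# divisible by 4 for the skip connections to line up.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 16)                 # grayscale eye image in
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = conv_block(64, 32)                # 32 skip + 32 upsampled
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)               # per-pixel pupil logit

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.head(d1))           # pupil mask in [0, 1]

# Training sketch: binary cross-entropy against hand-labeled pupil masks, e.g.
#   model = TinyUNet(); loss = nn.BCELoss()(model(images), masks)
```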
Accurately calculating the gaze point from the world camera requires that the pupil position from the eye camera be calibrated to real-world position. To this end, we developed the following procedure. The animal is chaired and head-fixed, with a computer screen placed ∼35 cm directly in front of the subject. Because marmosets naturally look at faces (Mitchell, Reynolds, and Miller 2014), 1-12 small marmoset faces are presented as calibration targets at multiple locations on the display monitor (fig 2d). A custom designed graphical user interface is then used by a human operator offline to adjust the scaling and offset in the horizontal and vertical axes.
This allowed us to reliably map pupil position to screen coordinates (fig 2e), which, when head-free, generalized to world coordinates in front of the marmoset. The GUI provides functionality to project the estimated eye position, based on the manually defined gain and center parameters, onto the video of the scene camera in order to evaluate the quality of the eye calibration and iteratively refine the scaling and offset parameters to better fit the eye position data. The transformed raw eye positional data can then be exported in a format convenient for further analysis. To compare the system against other pupil-based eye trackers, we computed the root-mean-square (RMS) stability of eye position during stable fixation epochs while animals were head-fixed. We find that across 7 recording sessions the RMS stability was 0.05 (± 0.0012 std). These estimates provide a lower bound on the system precision that rivals head-stabilized pupil-based eye trackers.
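A minimal sketch of the two quantities involved is given below: the per-axis gain/offset calibration and an RMS stability metric computed over a fixation epoch. The exact RMS definition used in the paper is not specified, so the sample-to-sample form here is an assumption, as are the function names and the example gain/center values.

```python
import numpy as np

def pupil_to_gaze(pupil_xy, gain_xy, center_xy):
    """Per-axis linear calibration: gaze = gain * (pupil - center)."""
    return (np.asarray(pupil_xy, float) - center_xy) * gain_xy

def fixation_rms(gaze_xy_deg):
    """Sample-to-sample RMS of gaze (deg) within one stable fixation epoch."""
    steps = np.diff(np.asarray(gaze_xy_deg, float), axis=0)
    return np.sqrt(np.mean(np.sum(steps ** 2, axis=1)))

# Example (illustrative parameter values):
#   gaze = pupil_to_gaze(pupil_trace, gain_xy=(0.12, 0.11), center_xy=(320, 240))
#   precision = fixation_rms(gaze[fixation_start:fixation_stop])
```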
Electrophysiology in combination with CEREBRO
CEREBRO was designed explicitly as a tool to investigate the neurobiology of natural, active vision in freely-moving monkeys. As such, integral to its design was the capacity to simultaneously record neural activity, eye behavior, and the visual scene of the animal, as each is integral to this broader ambition. We tested the validity and sufficiency of the system to this end by performing experiments with CEREBRO while the activity of single neurons was recorded with chronically implanted multi-electrode arrays (N-form, Modular Bionics) in the visual cortex (V1/V2) of the marmosets. These experiments sought to (1) estimate visual tuning properties of neurons in primate visual cortex, i.e. receptive field mapping as well as orientation and spatial frequency tuning (Emerson et al. 1987; DeAngelis, Ohzawa, and Freeman 1993; Ringach, Sapiro, and Shapley 1997; Livingstone, Pack, and Born 2001), using CEREBRO in more traditional head-fixed paradigms, so as to demonstrate the accuracy of our eye-tracking system by replicating these classic effects, and (2) obtain eye, head and body behavior, the visual scene, and the activity of single neurons simultaneously in a freely-moving paradigm to demonstrate the capacity of CEREBRO for investigating primate active vision.
To recapitulate the tuning properties of V1 neurons, subjects were head-fixed while wearing CEREBRO and presented with the following stimuli: flashing dots for receptive field mapping, and drifting gratings with different orientations and spatial frequencies for tuning properties (fig. 3a; see Online Methods). Critically, marmosets were allowed to free-view the video screen during stimulus presentation, and offline corrections for eye position enabled accurate reconstruction of visual properties following a recently developed free-viewing approach (Yates et al. 2023). A key difference from the previous head-fixed marmoset study is that here the visual input was obtained from the world camera’s view together with the estimated eye position from CEREBRO, rather than from what was known to be displayed on the screen, thus validating that these methods can generalize to real-world stimuli. Results from three example neurons demonstrate visual receptive fields estimated at the peak visual latency (fig. 3b top row, Online Methods). The orientation tuning at the peak visual latency (fig. 3b bottom row, Online Methods) estimated from the CEREBRO world camera image (in red) is shown in comparison to the tuning curves estimated from the known stimuli presented on the screen (in blue). These results replicate classic findings about the properties of neurons in early visual cortex, thereby demonstrating that our eye-tracking system and calibration approach can accurately relate neural responses to the visual stimuli falling on the primate retina.
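As a simplified illustration of gaze-corrected receptive field estimation at a fixed visual latency, the sketch below averages the stimulus frame shown a set latency before each spike (a spike-triggered average). The analysis in the paper is richer (forward correlation on calibrated world-camera input); the latency value and function names here are assumptions.

```python
import numpy as np

def sta_receptive_field(stim_frames, frame_times, spike_times, latency_s=0.05):
    """Average the (gaze-centered) stimulus frame shown `latency_s` before each spike."""
    sta = np.zeros_like(stim_frames[0], dtype=float)
    n = 0
    for t in spike_times:
        idx = np.searchsorted(frame_times, t - latency_s) - 1   # frame on screen at t - latency
        if 0 <= idx < len(stim_frames):
            sta += stim_frames[idx]
            n += 1
    return sta / max(n, 1)
```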
Following this, the animal was allowed to actively explore a 200 cm × 100 cm arena decorated with various visual stimuli. The activity of single neurons was recorded continuously across the head-restrained and freely-moving conditions; body and head movements were simultaneously recorded using the OptiTrack system (fig. 3c, Online Methods). The comparison of spike waveforms in the head-restrained versus freely-moving condition for 5 example neurons demonstrates the stability of neural recording throughout the session (fig. 3d). Within the same session, both eye behavior and neural activity (spike rate) changed between the head-restrained and freely-moving conditions. We explore some of these behaviors in the following sections.
Visual behavior of freely moving marmosets
The primary motivation for developing CEREBRO was to precisely quantify the characteristics of eye movements and gaze in freely-moving, naturally behaving primates. To this end, we recorded visual behavior in marmosets wearing CEREBRO as they explored an open rectangular arena (fig 4a). As marmosets do not move continuously in open-field environments, we distinguished between two behavioral states in these test sessions to determine whether differences in visual behavior emerged: (a) “stationary” - the monkey was seated or standing and visually scanning the environment without physically changing location; (b) “locomotion” - the monkey was physically moving and changing its position in the environment (Online Methods). In a typical session, a marmoset remained stationary for extended periods at fixed locations (occupancy map in fig 4a) and then moved between those locations (gray traces, fig 4a). We conjectured that gaze dynamics likely differ between these two behavioral states, as each differs in motoric demands and exploratory function.
In a freely moving primate, visual exploration is accomplished by changing gaze, which can be defined as the sum of head and eye movements (fig 4b). Stable eye positions in this context are rare because even when an animal fixates a fixed position in the scene – referred to here as a gaze-fixation – the eyes must still move smoothly to compensate for any head movements and stabilize the retinal image. These compensatory eye movements reflect the vestibulo-ocular reflex (VOR), which is well conserved across species and is normally engaged to reduce retinal motion caused by head and body motion (Wallman et al. 1982; Angelaki and Hess 2005; Angelaki and Cullen 2008; Cullen 2011, 2012). These compensatory movements correlate negatively with head-movement velocity to subtract its effect and achieve stable gaze. By contrast, during rapid gaze shifts, equivalent to traditional saccades in the head-fixed case, the VOR is suppressed and the head and eyes move conjugately along the same direction, with eye velocity reversing at the end of the rapid shift as the VOR is restored and gaze is again stabilized by eye movements that compensate for the continuing head velocity. The inset in Figure 4b illustrates this point: the sum of eye and head position (gaze) exhibits steps in position with stable periods in between. While head position changes continuously, eye position exhibits saw-tooth patterns in which a jump in position is followed by a backward decay that compensates for the change in head position. To distinguish epochs of rapid gaze shifts from compensatory movements, we set a gaze velocity threshold of ±200 °/sec. Examination of eye movements during gaze-shifts and gaze-fixations revealed a clear negative correlation between head and eye movements during fixation periods, reflecting compensatory eye movements that stabilize gaze (purple, fig 4c) in a freely-moving marmoset. With this gaze velocity threshold, compensatory movements are well separated from rapid gaze shifts (green, fig 4c).
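A minimal sketch of this classification is shown below: gaze is taken as the sum of eye-in-head and head position, and samples whose gaze velocity exceeds ±200 °/sec are labeled as rapid gaze shifts, the remainder as fixation/compensation epochs. The sampling rate and differentiation scheme are assumptions.

```python
import numpy as np

def classify_gaze_epochs(eye_deg, head_deg, fs=90.0, vel_thresh=200.0):
    """Label samples as rapid gaze shifts (True) or fixation/compensation (False)."""
    gaze = np.asarray(eye_deg) + np.asarray(head_deg)   # gaze = eye-in-head + head
    vel = np.gradient(gaze) * fs                        # deg/s
    is_shift = np.abs(vel) > vel_thresh
    return gaze, vel, is_shift
```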
A core feature of mammalian oculomotor behavior is the main sequence: a characteristic linear relation between the amplitude and peak velocity of eye movements. To test whether the main sequence is evident in a freely-moving primate, and to characterize it with respect to head and gaze, we quantified the main sequence and the amplitude-duration relationship. For conjugate gaze shifts we observed a characteristic pattern wherein eye velocity reached its peak more quickly than head velocity. As head velocity decayed with a long tail, eye velocity reversed direction to counteract the head velocity and stabilize gaze (Fig 4d). This pattern was evident for both small and large gaze shifts, with the duration of the gaze shift being longer for larger shifts. To quantify these patterns more precisely, we next plotted the peak velocity, latency to peak, and duration of the eye, head, and gaze components as a function of gaze amplitude (Fig 4e-h). Although the main sequence was evident in freely-moving marmosets, the eye and head components contributed to it differently as a function of gaze amplitude (Fig 4e). Whereas the peak velocity of eye movements saturated around 400 degrees/sec for gaze shifts of roughly 10 degrees in amplitude, the peak velocity of head movements continued to increase even for the largest measured shifts, out to 80 degrees. The peak eye velocity always led the peak head velocity as a function of gaze amplitude, with both latencies growing with shift amplitude (Fig. 4f). The peak gaze velocity followed eye velocity more closely for small shifts (<36 degrees) and head velocity for larger shifts. The duration of gaze shifts followed a roughly linear relation with gaze amplitude (Fig. 4g). In summary, the eye and head components differ in their contribution to gaze shifts depending on shift amplitude, with eye velocity rapidly saturating at its maximum for shifts of about 10 degrees in size, after which the more slowly initiated head shift contributes more to the total gaze movement. As the gaze amplitude becomes larger, the contribution of the head increases linearly with the overall shift while the eye saturates in its contribution after about 20 visual degrees (fig 4h).
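The per-shift quantities plotted in Fig 4e-h (amplitude, peak velocity, latency to peak, duration) can be extracted from the velocity trace and shift mask of the previous sketch; a minimal, illustrative version is below. It assumes the trace begins and ends outside a gaze shift.

```python
import numpy as np

def main_sequence(gaze, vel, is_shift, fs=90.0):
    """Per-shift amplitude, peak velocity, latency to peak, and duration."""
    edges = np.flatnonzero(np.diff(is_shift.astype(int)))    # shift on/off transitions
    starts, stops = edges[::2] + 1, edges[1::2] + 1           # assumes trace starts off-shift
    metrics = []
    for s0, s1 in zip(starts, stops):
        seg_speed = np.abs(vel[s0:s1])
        metrics.append({
            "amplitude_deg": abs(gaze[s1 - 1] - gaze[s0]),
            "peak_velocity_dps": seg_speed.max(),
            "latency_to_peak_s": seg_speed.argmax() / fs,
            "duration_s": (s1 - s0) / fs,
        })
    return metrics
```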
To determine how visual behavior differed between behavioral contexts, we compared the eye movements of marmosets when head-fixed and when freely-moving, distinguishing in the latter context between instances when individuals were visually scanning the environment while ‘stationary’ and instances when animals were actively moving through the world during ‘locomotion’ (fig 5a). As demonstrated in fig 5b, the differences in marmosets’ visual behavior between the head-fixed and both freely-moving contexts were stark, with eye movements being notably more dynamic in the freely-moving contexts, reflecting VOR adjustments for self-motion. We quantified these differences further by calculating the approximate entropy of marmoset eye movements in each of these three contexts. This analysis estimates the randomness of a time series, where a higher value of approximate entropy indicates a more random system and vice versa, thereby allowing a comparison of the collective consistency and predictability of the visual behavior. Overall, we observed that marmoset eye movements in the head-restrained condition exhibited significantly lower entropy than in both freely-moving conditions (stationary and locomotion; fig 5c), suggesting that eye movements in a freely moving animal are more chaotic/random. However, when we computed the approximate entropy of the “gaze” of freely-moving animals – the combination of both head and eye movements – a different pattern emerged. Here the approximate entropy in both freely-moving contexts was in fact lower than in the head-fixed context. This suggests that the synergistic combination of head and eye movements in a freely-moving animal yields more predictable visual behavior than when head-fixed, a likely computational optimization of the visual system to stabilize visual perception in animals as they naturally move and explore the world. The statistics of maximal speed and amplitude of eye movements show significantly faster and larger eye movements in the freely-moving contexts (fig 5d).
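For reference, a standard approximate entropy (ApEn) implementation is sketched below. The embedding dimension m and tolerance r are analysis choices not specified in the text, so the defaults here are illustrative; lower ApEn indicates a more regular, predictable time series.

```python
import numpy as np

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy (ApEn) of a 1-D time series."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)                          # tolerance scaled to the signal

    def phi(m):
        n = len(x) - m + 1
        patterns = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between all pairs of m-length patterns
        dists = np.max(np.abs(patterns[:, None, :] - patterns[None, :, :]), axis=2)
        frac_similar = np.mean(dists <= r, axis=1)    # includes self-matches
        return np.mean(np.log(frac_similar))

    return phi(m) - phi(m + 1)
```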
In addition to the pronounced contrasts in visual behavior between head-fixed and freely-moving contexts, we also observed differences within the latter when marmosets were stationary or locomoting. Specifically, marmosets made significantly more gaze shifts per second (fig 5e), with greater speeds (fig 5f) and larger amplitudes and maximal speeds (fig 5g), during locomotion than during stationary phases. This pattern raises the question of whether the gaze of the animal is less stable during locomotion, given the larger gaze shift amplitudes. To investigate this, we calculated the root-mean-square (RMS) stability of gaze during gaze fixations in the locomotion and stationary contexts. Analyses revealed that gaze was on average about 3 times more stable than head movements during gaze fixations (fig 5h), consistent with compensatory eye movements helping to correct for head motion in these periods. When comparing the stability of gaze to that of the head, however, we observed that although head movements were less stable during locomotion, gaze stability surprisingly improved during locomotion (fig 5h), despite the increase in head motion. Thus, while the head is clearly less stable during locomotion, compensatory eye movements appear to provide better stabilization, achieving more stable epochs of gaze fixation than even during the stationary phase. This highlights the importance and potential context-dependence of gaze stabilization, which appears to be enhanced to achieve greater stability during locomotion.
Discussion
Here we quantified the active visual behavior of a freely-moving primate by developing an innovative head-mounted eye-tracking system. Our system achieves several technical innovations that enable accurate quantification of head and eye movements in a small-bodied (∼300-400 g) monkey, the common marmoset, with the resolution and speed needed to accurately quantify primate visual behavior in real-world contexts. Specifically, the system achieves high-speed recording of the eye (90 FPS) and world (63 FPS), and applies an innovative solution, a segmentation neural network, to overcome the challenge that frequent changes in lighting pose for pupil detection. As CEREBRO was designed to seamlessly integrate with wireless neural recording methods, the system is poised to elucidate not only the dynamics of active visual behaviors but also the supporting neural processes. While the findings here recapitulate the core features of conjugate gaze movements from prior studies using head-free, but chair-restrained, macaques and extend those to freely-moving primates, our analyses also revealed several characteristics of visual behavior that have not been observed previously. Perhaps most notably, we observed that the entropy of the visual system was significantly lower when animals were freely-moving than when head-fixed, and that the stability of gaze improved during locomotion. The technical innovations described here enable us to examine visual processing in the evolutionary context to which it has been optimized, wherein coordinated head-eye movements act to stabilize the visual system during natural exploration, critical features that cannot be recapitulated in the absence of ethological movements.
The contributions of eye and head movements to conjugate gaze shifts in freely-moving marmosets were qualitatively similar to those of chair-restrained but head-free macaques and other mammals (Martinez-Trujillo, Wang, and Crawford 2003; Freedman and Sparks 1997; Morasso, Bizzi, and Dichgans 1973; Tomlinson and Bahra 1986; Guitton, Douglas, and Volle 1984; Collewijn 1977; Goossens and Van Opstal 1997). Although broadly similar, marmosets did exhibit some quantitative differences relative to macaques, potentially due to their smaller head size. For example, in the marmoset, eye movement velocity and amplitude saturate for much smaller gaze shifts, around 10-20 degrees, after which the head contributes the bulk of the shift in gaze (Fig. 4h), whereas in the macaque this transition does not occur until much larger gaze shifts of 20-40 degrees (Freedman and Sparks 1997). Moreover, most gaze shifts in macaques under 20 degrees in size are predominantly driven by shifts in eye position and not the head, while in the marmoset the same gaze shifts would have significant head-movement components. Only the smallest gaze shifts, under 5-10 degrees in the marmoset, a range comparable to the movements made in head-fixed marmosets (Fig 5d), are dominated by changes in eye position, after which the head normally makes significant contributions. These differences likely reflect an efficiency tradeoff related to the smaller head size of marmosets (Malinzak, Kay, and Hullar 2012).
Because we recorded marmoset eye movements in head-fixed and freely-moving conditions, this novel dataset can both be compared to prior studies and extend our understanding of visual behavior in this New World primate. As in previous experiments (Mitchell, Reynolds, and Miller 2014), when head-fixed the oculomotor range is relatively restricted, within about 10 visual degrees. But when the marmoset is freely-moving, that range increases to roughly 20 degrees (Fig. 5d), suggesting that the limited oculomotor range when head-fixed represents more of a motor preference than a physical limitation. In a previous study of chaired but head-free marmosets, a paradigm was used to evoke large gaze shifts involving up to 180-degree head rotations (Pandey, Simhadri, and Zhou 2020). Although similarly large shifts were rare in the present study, a comparable linear relation between peak head velocity and gaze shift amplitude, peaking near a velocity of 750 °/sec for 80-degree gaze shifts, was evident in freely-moving marmosets, suggesting that this aspect of head-gaze control is relatively invariant to the form of the task. A direct comparison of the two freely-moving contexts here revealed a number of notable differences, including larger-amplitude gaze shifts with higher maximum velocities when locomoting (Fig. 5, d-g). By contrast, when stationary and visually scanning the environment, marmosets were biased toward smaller gaze shifts, indicating that the motor demands of actively moving through space likely drive differences in how the head and eyes coordinate to stabilize the visual field.
Our understanding of primate vision is almost entirely based on studies employing conventional head-fixed paradigms. While it has long been assumed that primate visual behavior in conventional, head-fixed paradigms reflects the core features of the system and, once movement is compensated for, is representative of the system in more ethological contexts, there have been few explicit tests of this assumption. Quantifying primate visual behavior with the innovative eye-tracking system here afforded the opportunity to examine directly whether this assumption is indeed accurate. Indeed, we observed broad qualitative similarities in many features of conjugate eye movements between head-fixed and freely-moving contexts. However, it was also apparent that certain assumptions drawn from the more conventional paradigm may not be strictly true. First, gaze stability was not degraded by the greater eye and head movements during locomotion and, in fact, was slightly enhanced. We observed that the approximate entropy of the marmoset visual system was significantly lower (i.e. improved) when individuals were freely-moving than when head-fixed, suggesting that the collective visual system is more consistent and predictable when marmosets are moving and exploring the environment. Likewise, a comparison of freely-moving marmosets when stationary – i.e. visually scanning – and locomoting revealed that the stability of visual gaze, as measured by RMS, was significantly better when animals were locomoting. In other words, despite an increase in the number, speed and length of gaze shifts when locomoting, vision actually became more stable. These findings contradict the intuition that greater freedom of movement would introduce chaos and instability to vision. Rather, as evidenced here, the coordinated movements of head and eyes have been optimized to accommodate this self-motion.
The context-dependence of visual stability between the stationary scanning and locomotion states could reflect changes in the VOR at the neural level, as well as other positional strategies that optimize the VOR system (Vidal, Graf, and Berthoz 1986; Graf et al. 1995). Such context-dependence of gaze control has long been appreciated from head-mounted eye tracking studies in humans (Land and Lee 1994; Hayhoe and Ballard 2014), including, more recently, free motion in natural contexts (Matthis, Yates, and Hayhoe 2018). These strategies likely reflect species-specific adaptations to movement. For example, the natural movement statistics of head position in freely-moving primates are known to differ substantially from those of rodents (Carriot et al. 2014). However, technical limitations on using freely-moving primates in neural investigations, particularly the limited precision of conventional pupil thresholding in those contexts, have kept the number of studies relatively small (Shepherd and Platt 2006; Milton, Shahidi, and Dragoi 2020; Mao et al. 2021). The methods introduced here provide a basis to study gaze during free motion in non-human primates, even ones as small as the marmoset (∼300-400 g). In future work the system could be expanded to address other features of oculomotor control that could be especially relevant during free motion, including vergence and torsional eye movements (Tweed, Haslwanter, and Fetter 1998; Ong and Haslwanter 2010). The current eye tracking system offers an opportunity to study how computational constraints and movement strategies during active vision act to achieve stability and provide high-acuity vision in different contexts, including locomotion and navigation through the environment. The necessity of stabilizing vision during free movement is the basal condition under which vision evolved in all organisms, including human and nonhuman primates.
The sensory and motor systems of animals co-evolved. Among the most significant selective forces acting on animals is the need to move and interact with the world, and doing so requires sensory feedback. This ground truth inherently couples sensory and motor systems. Despite this, the study of the primate visual system has largely ignored considerations of movement, assuming instead that a head-restrained animal able to move only its eyes is representative of vision. Leveraging our innovative head-mounted eye-tracking system for marmosets, we provide compelling evidence that the coordinated actions of eye and head movements are optimized to stabilize primate vision and increase its predictability. These patterns emphasize the importance of considering the ethological relevance of how primates are tested in vision studies. Evidence suggests that neural responses in head-fixed paradigms are not necessarily predictive of how the same single neurons – or population ensembles – respond in naturalistic contexts (Jovanovic et al. 2022; McMahon et al. 2015). Indeed, investigations of vision in freely-moving mice suggest that at least some elements of neural activity in this context are distinct (P. R. L. Parker et al. 2022; Chaplin and Margrie 2020). Our innovative eye-tracking system can be leveraged in marmosets to precisely examine the primate visual system during natural, freely-moving behaviors (Ngo et al. 2022; Shaw, Wang, and Mitchell 2023) and to address a suite of foundational questions that we have so far been unable to address.
Online Methods
Fitting the animal with CEREBRO
The head assembly for CEREBRO is custom fitted to every animal. For the initial fitting, the animal is anesthetized using a combination of Ketamine (dose: 20 mg/kg) and Acepromazine (dose: 0.75 mg/kg). All the parts (Supplementary fig 1b) are adjusted for the best view from the eye and front cameras and fastened using set screws.
Multiple system synchronization
Our neural recording system, motion tracker and CEREBRO were all synchronized using a custom board based on an ESP32 microcontroller. CEREBRO and the motion tracking system sent TTL signals at each mode shift, and these were recorded by the synchronization microcontroller. To synchronize the neural signal, we sent a TTL pulse (100 mV, 100 ms) to the electrophysiology system (in recording mode) before putting it on the animal. The timestamp of this signal was logged by the sync microcontroller and, since the same signal was also logged in the data stream of the electrophysiology system, we could align the time of this pulse in the data to the other systems (CEREBRO and motion tracking). The board was connected to a computer via serial communication and the timing data were logged to a console terminal (CoolTerm).
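Offline, each system's clock can be mapped onto the common sync clock from the shared TTL events; a minimal sketch is below. With several shared events a linear fit recovers both drift and offset; with a single pulse, as used for the electrophysiology system here, it reduces to a constant offset. Variable and function names are illustrative.

```python
import numpy as np

def fit_clock_map(events_system_s, events_sync_s):
    """Least-squares linear map from one system's clock onto the sync clock."""
    slope, offset = np.polyfit(events_system_s, events_sync_s, deg=1)
    return lambda t: slope * np.asarray(t) + offset

# With a single shared pulse, use a constant offset instead:
#   offset = ttl_time_in_sync - ttl_time_in_ephys
#   spikes_on_sync_clock = spike_times_ephys + offset
```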
Animal subjects
The development and testing of CEREBRO were performed on two common marmosets (monkey M, male; monkey S, female). Both subjects were group housed and were 2 years old at the time of implant. Monkey M had a chronic implant in left V1 and monkey S had bilateral chronic implants in both left and right V1. All surgeries and experiments were performed in the Cortical Systems and Behavior Laboratory at University of California, San Diego (UCSD), and approved by the UCSD Institutional Animal Care and Use Committee in accordance with National Institute of Health standards for care and use of laboratory animals.
Electrophysiology
Neural activity was recorded with multi-electrode arrays (N-form, Modular Bionics, Berkeley CA) chronically implanted in V1. The N-form array has 64 channels in total, embedded in 16 shanks arranged in a 4×4 grid evenly spaced 0.25 mm apart. Each shank has 4 iridium oxide electrodes located at 0.5, 0.375, 0.25, and 0 mm from the tip. Implantations followed standard surgical procedures for chronically implanted arrays in primates. In each recording session, a wireless Neurologger (SpikeLog-64, Deuteron Technologies) was housed in a 3.5 cm (width) × 2.5 cm (height) × 1.2 cm (depth) protective case and connected to the array to record the extracellular voltage at 32 kHz. Spike sorting was performed offline using Kilosort (Pachitariu et al. 2016) and manually curated using the graphical user interface Phy.
Receptive field and tuning property of V1 neurons
For receptive field mapping, marmosets were head-fixed and freely viewed a stimulus of 120 randomly positioned black and white dots flashed at 10 Hz for 4 minutes. Eye calibration was performed offline with the GUI provided in this paper. We then used the calibrated eye position on the world camera images to estimate the visual input on the retina, and mapped the receptive fields using forward correlation (Zhao et al. 2020). For the orientation and spatial frequency tuning of V1 neurons, we presented full-field drifting gratings with 12 orientations (evenly spaced at 30-degree intervals) and 3 spatial frequencies (0.5, 1, 2 cycles/degree). This information was used to calculate tuning curves from the known screen stimuli; when calculating tuning curves from CEREBRO, we applied a Hough transform to the world camera images to detect the grating lines and from these calculated the orientation and spatial frequency of the gratings.
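A minimal sketch of this line-based estimate is below (OpenCV). The edge-detection thresholds, Hough accumulator threshold, and the pixels-per-degree conversion are assumptions; the grating period is taken as twice the median spacing between detected parallel lines, since edges occur at both light-to-dark and dark-to-light transitions.

```python
import cv2
import numpy as np

def grating_params(gray_frame, px_per_deg=20.0):
    """Estimate grating orientation (deg) and spatial frequency (cycles/deg)."""
    edges = cv2.Canny(gray_frame, 50, 150)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)   # (rho, theta) per detected line
    if lines is None:
        return None
    rho, theta = lines[:, 0, 0], lines[:, 0, 1]
    orientation_deg = np.degrees(np.median(theta)) % 180
    # Adjacent parallel edge lines sit half a grating period apart.
    spacing_px = np.median(np.diff(np.sort(rho)))
    spatial_freq = px_per_deg / (2.0 * spacing_px)
    return orientation_deg, spatial_freq
```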
Head and body movement tracking using OptiTrack
The head and body movements of the animals were recorded using OptiTrack, an image-based motion tracking system. For head movements, we added 3 IR-reflective beads (12 mm in diameter) to the head assembly of the animal. For the body, we added a single IR bead to the backpack. The arena was fitted with 10 OptiTrack cameras strategically placed to cover every view of the arena. The cameras were calibrated before each session, and sessions were run only when the calibration error for the worst camera was less than 0.1 mm. The data collected from each session were manually curated to remove stray/false markers, and any gaps were filled with linear interpolation.
Locomotion behavior analysis
Locomotion was detected from the body and head tracking data from OptiTrack. We first calculated the velocity of the body IR bead, applied a 0.2 Hz low-pass filter, and set a threshold of 3 cm/s to select periods when the monkey made fast body movements. However, not all fast body movements came from locomotion; they could also occur when the monkey made a body turn or a sudden change of posture. We therefore added two other conditions: 1) the duration of the period is greater than 3 s; 2) the position of the head is lower than 18 cm. These two conditions were determined by manually matching the data with the video. After the locomotion periods were detected, the rest of the time was defined as sedentary.
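A minimal sketch of this classifier is shown below. The filter order, run-length bookkeeping, and variable names are assumptions; the thresholds mirror the values stated above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_locomotion(body_xyz_cm, head_z_cm, fs=120.0,
                      speed_thresh=3.0, min_dur_s=3.0, head_max_cm=18.0):
    """Boolean locomotion mask; everything else is treated as sedentary."""
    speed = np.linalg.norm(np.diff(body_xyz_cm, axis=0), axis=1) * fs   # cm/s
    b, a = butter(2, 0.2 / (fs / 2), btype="low")                       # 0.2 Hz low-pass
    speed_lp = filtfilt(b, a, speed)
    candidate = (speed_lp > speed_thresh) & (head_z_cm[1:] < head_max_cm)

    locomoting = np.zeros(candidate.shape, dtype=bool)
    runs = np.flatnonzero(np.diff(np.r_[0, candidate.astype(int), 0]))
    for s0, s1 in zip(runs[::2], runs[1::2]):          # keep only runs >= min_dur_s
        if (s1 - s0) / fs >= min_dur_s:
            locomoting[s0:s1] = True
    return locomoting
```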
Data Availability
The GUI for UNET segmentation-based pupil detection can be found at https://github.com/Vickey17/UNET_implementation_V2. The STL designs for the headpiece and its material description can be downloaded at https://github.com/Vickey17/CEREBRO_HeadPiece. Data for the paper can be obtained upon request from the authors.
Acknowledgements
We thank Alex Huk and Cris Niell for their invaluable insights throughout the development of this system. We thank Brian Corneil for his valuable comments on the manuscript. This work was supported by grants from The BRAIN Initiative (R01 NS118457 & U01 NS116377) and AFOSR (FA9550-19-1-0357).