Three-dimensional markerless motion capture of multiple freely behaving monkeys for automated characterization of social behavior

Given their high sociality and close evolutionary distance to humans, monkeys are an essential animal model for unraveling the biological mechanisms underlying human social behavior and elucidating the pathogenesis of diseases exhibiting abnormal social behavior. However, behavioral analysis of naturally behaving monkeys requires manual counting of various behaviors, which has been a bottleneck due to problems in throughput and objectivity. Here, we developed a three-dimensional markerless motion capture system that utilized multi-view data for robust tracking of individual monkeys and accurate reconstruction of the three-dimensional poses of multiple monkeys living in groups. Validation analysis in two monkey groups revealed that the system enabled the characterization of individual social dispositions and relationships through automated detection of various social events. Analyses of social looking facilitated the investigation of adaptive behaviors in a social group. These results suggest that this motion capture system will significantly enhance our ability to analyze primate social behavior.

Non-human primates, including macaque monkeys, are social animals who utilize their knowledge about individual group members and their relationships to navigate a complex and dynamic social environment 1-6. Together with their evolutionary proximity to humans, this makes monkeys a vital animal model for unraveling the biological mechanisms underlying human social behavior and elucidating the pathogenesis of neuropsychiatric disorders that cause significant difficulties in social life 7,8. However, to investigate the social functions and dysfunctions of non-verbal primates, it is necessary to assess their social interactions based on body actions, which are their primary communication tools 9,10. Traditionally, behavioral actions are counted by visual inspection by human experts; however, the significant costs and reproducibility problems associated with manual annotation have been a major obstacle for these analyses. Recently, markerless motion capture using deep learning has emerged as a promising way to overcome this issue, as it allows high-throughput quantification of actions automatically and reproducibly 11.

However, analysis of the social behavior of macaque monkeys in groups using markerless motion capture has not yet been achieved. Since monkeys move freely in their living environment in a three-dimensional (3D) manner, 3D tracking and pose estimation of multiple monkeys are essential for detecting their social interactions. Previous reports on primate 3D markerless motion capture systems 12-14 did not implement a multi-animal tracking algorithm, so those systems cannot be applied to groups of monkeys. Moreover, simple triangulation from single-view (2D) markerless motion capture 15,16 is inappropriate because frequent and severe occlusions of monkeys hinder the tracking of individuals. Therefore, multi-animal tracking algorithms operating in 3D space are required to reduce failures in individual tracking.
Here, we constructed a new pipeline utilizing multi-camera (multi-view) data for robust tracking and 3D pose reconstruction of multiple monkeys in their living environment.

We compared tracking performance with a conventional approach that does not utilize multi-view information but instead reconstructs the 3D motion of a monkey tracked separately in each view (control algorithm). The IDP and IDR of the control algorithm dropped by approximately 20% and 10%, respectively, compared with the proposed algorithm (Fig 2b); its poor performance was partly due to erroneous integration of different monkey detections across views caused by ID tracking errors (ID switches) in some views. We also tested tracking performance after randomly reducing the frequency of ID detection, to check the system's potential for applications in which ID detection will be sparser, e.g., field recordings where severe occlusions are common, or the use of faces for ID instead of color tags. We found that the performance of the proposed algorithm was well maintained compared with the control algorithm (significant interaction [IDP, p = 0.038; IDR, p = 0.041] between the algorithms and the ID detection rate in two-way repeated measures ANOVA), suggesting that our system was robust to reductions in the ID detection rate thanks to the utilization of multi-view information for tracking (Fig 2c).

To verify the practicality of this motion capture system, we attempted to identify individuals by their posture patterns. The postures were classified in an unsupervised manner using k-means clustering after dimension reduction (Fig 2d, e), and the frequency and duration of each posture were used to discriminate individuals with a support vector machine (SVM).
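The posture-based discrimination idea can be illustrated with a self-contained toy sketch (not the authors' code: the posture features are synthetic, the cluster count is arbitrary, and a nearest-centroid classifier stands in for the SVM):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for dimension-reduced posture features: monkeys "A"
# and "B" each contribute 200 frames of 5-D vectors drawn from different
# distributions, mimicking individual differences in movement style.
frames = {"A": rng.normal(0.0, 1.0, (200, 5)),
          "B": rng.normal(1.5, 1.0, (200, 5))}

def kmeans(X, k=4, iters=50, seed=1):
    """Minimal Lloyd's k-means; returns one cluster label per row of X."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

X = np.vstack([frames["A"], frames["B"]])
labels = kmeans(X)

def posture_hist(lbl):
    """Posture-frequency histogram describing a stretch of behavior."""
    return np.bincount(lbl, minlength=4) / len(lbl)

# First half of each monkey's frames -> "training" description,
# second half -> held-out data to classify.
train = {m: posture_hist(labels[i * 200: i * 200 + 100]) for i, m in enumerate("AB")}
test = {m: posture_hist(labels[i * 200 + 100: (i + 1) * 200]) for i, m in enumerate("AB")}

def predict(h):
    """Nearest-centroid classifier (stand-in for the paper's SVM)."""
    return min(train, key=lambda m: np.linalg.norm(h - train[m]))

preds = {m: predict(test[m]) for m in "AB"}
print(preds)
```

Because each individual visits the posture clusters with a characteristic frequency, even this crude histogram-plus-distance classifier recovers the identities in the toy data.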
We found that the SVM could discriminate individuals with an accuracy of >95% (98.5 ± 0.3% and 95.8 ± 1.6% in Groups I and II, respectively; Fig 2f), suggesting that our system is sufficiently accurate to extract each individual's movement characteristics.

Automatic detection of social behavioral events and behavioral characterization

Then, we analyzed the social behaviors of two groups of monkeys (Groups I and II; Table S1). The analyses focused on the last 8 recording days of the 3-week (Group I) and 4-week (Group II) stays in the cage. In this study, we defined and automatically counted affiliative (Proximity, Groom), vigorous (Chase, Glare, Grab, Pounce), and other (Mount, Look) social behaviors based on quantitative motion parameters, e.g., the distance between two monkeys and their postures (Fig 3a; see also Methods and Text S1 for detailed definitions). Note that we defined Look as orienting the head toward a conspecific, following previous studies 3,15,20, since measuring the actual gaze direction from the eye movements of freely moving monkeys is difficult in our recording setup. Comparisons between the automatic event detections and manual annotations by human experts for Chase (recall: 70%; precision: 76%; n = 23 events detected by the experts), Groom (recall: 74%; precision: 83%; n = 68 events), and Mount (precision: 91%; n = 10 events; recall could not be calculated because Mount was an infrequent event) indicated high detection performance for these events (see Movie S2 for examples of the events). Automated counting of each event revealed unique characteristics in the social dispositions of individuals and groups (Fig 3b). In the pubertal female group (Group I), Monkey B, the smallest in the group, tended to orient her head toward the other individuals more frequently.
On the other hand, Monkey R, which originated from a different colony than the others, tended not to participate in vigorous behavior but showed affiliative behavior toward Monkey G. In the juvenile male group (Group II), the counts of vigorous behaviors were high, consistent with the visual observation that play was common throughout this group. Pair- or individual-specific social behaviors were fewer in Group II than in Group I, but were still detected.

To examine the stability of the detected behavioral characteristics, we calculated the mean duration of Look and Proximity for each analysis day in Group I, in which significant individual differences in those events were found (Fig 3c, d). The results showed stability throughout the 8 successive recording days (significant main effect of ID or pair, but no significant main effect or interaction relating to the recording date in two-way repeated measures ANOVA; see Table S3 for the ANOVA results), suggesting that the analysis could extract the social characteristics of each individual or pair. In addition, the tendency of Monkey B to perform looking behavior frequently was preserved across the initial 7 recording days (early phase) and the last 8 recording days (late phase) (Fig 3e). Conversely, the proximity duration of the G-R and G-W pairs significantly increased (p = 0.0094) and decreased (p = 0.048), respectively, in the late phase, suggesting a proximity-pair transition (Fig 3f). These results indicate that the system may be helpful for tracking long-term changes in the social relationships within monkey groups.

We then examined whether individuals could be discriminated from the patterns of their social behavior by SVM (Fig 4a, b), in the same way as with the posture patterns (Fig 2f).
The results indicated high prediction accuracy (91.8 ± 5.7% and 80.0 ± 6.4% in Groups I and II, respectively), although accuracy was slightly lower than with the posture patterns, especially in the juvenile male group (Group II). To examine which social behavioral events were important for individual discrimination, we evaluated the performance of the SVM model when only a single type of event was used and when a single type of event was omitted. The performance of the SVM models using a single type of event (Fig 4c) indicated that many different events could individually predict the monkey ID above chance level, although performance was lower than with the full SVM model using all events. On the other hand, omitting a single type of event did not result in a drop in discrimination performance, except for Look (Fig 4d). These results indicate that the social behavioral events analyzed here, especially Look, were effective in characterizing individual monkeys. Furthermore, ignoring the ID of the social behavior partner decreased discrimination performance (Fig 4e), indicating that social relationships were important for ID discrimination. Overall, these results suggest that our system has the potential to detect the social behavioral characteristics of individuals and their relationships.

Analysis of social looking

Looking at a conspecific is a critical component of monkey social behavior 1,9. We further analyzed Look behavior, which was the most effective behavioral event for the individual discrimination above. First, we compared the Look duration calculated with and without shuffling the monkey motion data across recording sessions (Fig S2a). The actual duration of Look behavior was significantly longer than that calculated with the shuffled data, supporting the tendency to look at other monkeys.
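The logic of this shuffle control can be sketched with synthetic signals (all values hypothetical; circular time shifts stand in for the across-session shuffle used in the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D stand-in: a "looker" whose facing angle noisily tracks a
# target's angular position, so looking co-occurs with the target above
# chance only when the true temporal pairing is kept.
n = 2000
target = np.cumsum(rng.normal(0.0, 0.05, n))   # target's angular position (rad)
face = target + rng.normal(0.0, 0.2, n)        # looker's facing angle (rad)

def look_fraction(face, target, thresh=0.3):
    """Fraction of frames where the face points within `thresh` rad of the target."""
    return np.mean(np.abs(face - target) < thresh)

actual = look_fraction(face, target)

# Chance level: break the temporal pairing with circular time shifts
# (the study instead shuffles motion data across recording sessions).
chance = np.mean([look_fraction(face, np.roll(target, s))
                  for s in range(100, 1000, 100)])
print(actual, chance)
```

The actual look fraction exceeds the shift-based chance level, which is the signature the shuffle control tests for.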
We also calculated Look duration with temporally shifted monkey motion data (Fig S2b) and found that the peak was at zero time shift, suggesting that looking follows the target monkey's movement. Counting the third party's Look behavior toward each social behavioral event demonstrated that the monkeys tended to observe vigorous behavior, such as Chase (Fig 5a), and a detailed analysis of Chase demonstrated that the third party's facial direction followed the chasing movement (Fig 5b). Such social looking is essential for understanding ongoing situations and relationships between group members 1-5,21. In primates, mutual looking (both parties looking at each other) has social meaning, as direct staring is a threatening behavior, and gaze aversion is often associated with anxiety and submissiveness 9,22. We compared the actual mutual look duration with the chance-level duration that would be expected if each monkey's looking behavior were independent (Fig S2c). The actual mutual look duration was significantly shorter than the chance level, suggesting that monkeys may avoid mutual looking. To analyze which monkey stayed in, and which withdrew from, looking at the end of each mutual look event, we counted the stay ratio across all mutual look events for each pair of monkeys (Fig 6a, b). The results revealed that in each group one monkey often withdrew from looking at most of the members (Monkeys B and W in Groups I and II, respectively), one monkey in Group II stayed looking at all members (Monkey B), and two monkeys in Group II withdrew or stayed depending on the member (Monkeys G and R). Interestingly, when we analyzed the relationship between the stay rate and the total Look duration of pairs in each group (Fig 6c), negative correlations were found in both groups.
Previous studies have reported that subordinate monkeys tend to avert their gaze from dominant monkeys when two monkeys face each other 22, and that dominance rank is negatively correlated with looking duration 17. Thus, the stay-withdraw pattern we found might reflect the monkeys' hierarchy. These results suggest the utility of analyzing looking behavior based on the motion capture system for investigating monkeys' adaptive behaviors in a social group.

Discussion
Here, we constructed a pipeline for the long-term 3D markerless motion capture of monkeys living in groups. The pipeline utilized multi-camera (multi-view) data for robust tracking of individual monkeys and accurate reconstruction of their 3D poses. Using this system, we obtained 3D motion data of monkeys living in groups and, for the first time, analyzed social behavior based on motion data. Our analysis demonstrated that this system could characterize individual motion and social traits and define their relationships. We further demonstrated the system's usefulness for analyzing the adaptive behaviors of monkeys in social groups through a detailed analysis of their looking behavior.

Measuring monkeys' 3D poses and motion in groups enabled the analysis of their various social interactions. In contrast to image classification with supervised machine learning, which detects a specific behavior using large amounts of manually annotated training data 23,24, 3D motion data can be used flexibly to detect various behaviors, including those that are difficult to detect with image-based analysis, e.g., looking. Previous studies also proposed 3D markerless motion capture systems for monkeys 12-14 and demonstrated detailed, automatic analysis of various behaviors of freely moving monkeys. However, those systems cannot be applied to monkeys in a group due to the lack of a multi-animal tracking algorithm. Our system overcomes this limitation through robust individual tracking using multi-view data (Fig 1c, d). Our system used an optimization algorithm for cross-view matching, but machine learning algorithms for cross-view matching have also been suggested 25. An important advantage of the optimization-based approach is its ease of extension. The development of deep learning algorithms for 2D processing is more active than that for 3D algorithms (paperswithcode.com) because of their high versatility.
In addition, many existing training datasets 26-28 were made for 2D image processing. It would therefore be relatively easy to apply our system to existing datasets of different species. Moreover, updating the 2D processing algorithm or the ID detection algorithm would directly enhance the performance of our pipeline.

Social looking is a critical component of monkeys' social behavior and has received much attention in studies on primate social behavior. Monkeys understand the relationships between others through observation 1-3,5,21. Their visual attention toward others depends on their social relationship 17,22, and gaze and gaze aversion may be social signals in themselves 9,22. Neuroscience studies in laboratory settings have revealed the neural mechanisms involved in monkeys' looking behavior 6,7,29 and its impairment in animal models of autism 30,31. Furthermore, the neural bases of sophisticated social functions in monkeys have mainly been studied in highly controlled laboratory settings in which monkeys' movement is constrained 6. Although these studies have provided many important insights into the neural basis of social behaviors and their dysfunctions 6,7,29,32, such approaches have problems of external validity. Examination in a more naturalistic (ethologically relevant) social environment is needed to compensate for this limitation 33-39. Detailed quantitative analysis of the social behavior, including social looking, of freely behaving monkeys in groups with markerless 3D motion capture will provide unique opportunities to extend findings from specific social tasks in the laboratory.

In addition, markerless 3D motion capture could be applied to analyze various social functions of primates, e.g., long-term changes of relationships in groups, the developmental trajectory of social skills, and relationships in larger groups 10,38,40.
Our approach may also be applicable to field studies, which require the tracking of individuals with facial identification 38, thanks to its robust tracking ability using multi-view data. Extensions for detecting other social gestures in different modalities, such as vocalizations and facial expressions 2,9, and computational analysis of primate social behavior using motion data with high spatiotemporal resolution 41-43 will be critical next steps.

Methods

A large room and a small temperature-controlled room (2 × 1.5 × 2 [height] m) were used for recording (Fig S1). A small rectangular hole (0.7 × 0.7 [height] m) with an electric door connected the two rooms, making it easy for the experimenters to clean each room while keeping the monkeys in the other room. We placed eight cameras (acA2040-35gc, Basler) equipped with wide-angle lenses (ML410 4-10 mm, Theia) on the wall near the ceiling inside the large room, surrounding the center of the room. Each camera was mounted in a custom stainless-steel housing with an acrylic dome window (O'Hara) securely fastened to the cage frame to prevent the monkeys from moving or touching the camera. Videos (2,048 × 1,536 px, 24 fps) were captured synchronously from the eight cameras using the Motif acquisition system (Loopbio).

Camera calibration

Intrinsic (e.g., lens distortion coefficients) and extrinsic (camera pose and location) camera parameters are required to reconstruct the 3D coordinates of a keypoint (e.g., nose, shoulder, elbow) from the 2D coordinates of the keypoint projected onto camera images from different views. To calibrate these parameters, we first initialized the intrinsic parameters of each camera with the cv2.omnidir.calibrate() function in OpenCV 44, using images of checkerboards taken from multiple angles.
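The 2D-to-3D reconstruction that these parameters enable can be sketched with an idealized pinhole model in NumPy (a simplification that ignores the omnidirectional lens distortion handled by cv2.omnidir; the camera matrices and keypoint below are made up for illustration):

```python
import numpy as np

def projection_matrix(K, R, t):
    """3x4 pinhole projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point (mm) to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(Ps, xs):
    """Direct Linear Transform: two equations per view, null space via SVD."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

# Made-up intrinsics and two camera poses (second camera 1 m to the right).
K = np.array([[1000., 0., 1024.],
              [0., 1000., 768.],
              [0., 0., 1.]])
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-1000., 0., 0.]))

X = np.array([250., -100., 3000.])            # a keypoint 3 m from the cameras
xs = [project(P1, X), project(P2, X)]         # its 2D projections in both views
X_hat = triangulate([P1, P2], xs)
print(X_hat)
```

With noise-free projections the DLT recovers the 3D keypoint exactly; with real detections, more views and the bundle-adjusted parameters reduce the reconstruction error.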
Then, the extrinsic parameters of each camera were initialized with the cv2.solvePnP() function, using known 3D coordinates in the cage, e.g., the corners of the room, and their 2D coordinates projected onto the camera image. Finally, we waved a wand with a marker (a ping-pong ball) on its tip throughout the recording volume, and the intrinsic and extrinsic parameters of all cameras were simultaneously optimized with bundle adjustment 19 by minimizing the reprojection errors of the marker locations.

Data collection

Groups I and II were moved to the recording cage and recorded for 3 weeks in October and for 4 weeks across May and June, respectively. Before moving to the recording cage, the monkeys had lived with the same group members for >10 months, either in a similar-sized group cage (Group I) or in a large breeding colony (Group II). The monkeys wore colored necklaces for individual identification (Fig S1b). The light in the large room was turned on from 08:30 to 17:30. Food pellets were supplied once a day in the food container (Fig S1a). Supplemental fresh vegetables and fruits were given 2-3 times a week. Water was supplied ad libitum from water dispensers (Fig S1a). Recording was conducted daily during the daytime from 08:00 to 18:00. Recording was occasionally paused due to technical issues. Unless noted, all data used were from each group's last 8 recording days, after the monkeys were well acclimated to the recording environment.

Two-dimensional video processing

Different deep neural network models were used for monkey detection (YOLOv3 45), pose estimation (HRNet w32 46), and monkey identification (ResNet50 47) (Fig 1b).

Multi-view tracklet generation

We generated multi-view tracklets, i.e., sets of 2D monkey detections (estimated in the process described above) corresponding to each individual across views and video frames, by the following processing.
For cross-view matching of 2D monkey detections, we customized and used the MVPose algorithm proposed for human pose estimation 18 (Fig 1c). To estimate optimal cross-view matches, we calculated the affinity matrix (A), which represents the affinity of all pairs of 2D detections in all views at a time point. We defined the affinity matrix as a weighted sum of geometric affinity (A_g) and appearance affinity (A_a), as follows:

A_ij = λ·A_g,ij + (1 − λ)·A_a,ij, (1)

where i and j index a pair of 2D monkey detections (1 ≤ i, j ≤ m; m, the total number of 2D monkey detections), and λ is a constant (0.8 in this study). To derive the geometric affinity (A_g), we calculated the geometric distance (D) of a pair of 2D monkey detections as

D_ij = (1/N) Σ_k d_ijk,

where N is the number of keypoints and d_ijk is the distance between the camera rays to the k-th keypoint of the i-th and j-th 2D monkey detections. Only the keypoints commonly detected (prediction confidence > 0.1) in the i-th and j-th detections were included. Then, we defined the geometric proximity (C) as

C_ij = 1 − D_ij/β,

where β is a constant (1,500 mm in this study). The geometric affinity (A_g) was obtained by mapping the proximity C to values in (0, 1) with a sigmoid function. The original algorithm calculated the appearance affinity (A_a) using image features obtained with a person re-identification network 49, which extracts view-invariant discriminative appearance features such as clothing and hairstyle. However, extracting such view-invariant discriminative appearance features from similar-looking monkeys is difficult, so we instead estimated a putative ID for each single-view tracklet from the outputs of the monkey identification network, using an ID assignment algorithm similar to that used for the multi-view tracklets (see next section) applied separately to each view. Then, we calculated the appearance affinity (A_a) from the agreement between the putative IDs of the two detections. The optimal matching P was obtained by solving

maximize ⟨A, P⟩ − α·‖P‖_* subject to P ∈ C,

where ⟨A, P⟩ denotes the inner product of the matrices, α is a constant (50 in this study), and ‖P‖_* is the nuclear norm of P, which promotes the low-rank structure implied by cycle consistency across views. C represents the set of matrices satisfying the following constraints: P is a symmetric block matrix whose block P_vw (1 ≤ v, w ≤ n, where n is the number of views) is a partial permutation matrix representing the matching of detection pairs of views v and w, i.e.,

P_vw ∈ {0, 1}^(m_v × m_w), P_vw·1 ≤ 1, (P_vw)ᵀ·1 ≤ 1, P_vv = I, P_wv = (P_vw)ᵀ.

This cross-view matching was performed for one keyframe every 0.5 s to reduce the computational load.
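A toy version of the geometric affinity term, assuming the proximity mapping C = 1 − D/β (one form consistent with the definitions above, not necessarily the authors' exact expression; the ray distances are made up):

```python
import numpy as np

beta = 1500.0  # mm, as in the text

def geometric_affinity(d_ij):
    """d_ij: ray-to-ray distances (mm), one per commonly detected keypoint."""
    D = d_ij.mean()                  # mean ray distance across keypoints
    C = 1.0 - D / beta               # proximity (assumed form): 1 at zero distance
    return 1.0 / (1.0 + np.exp(-C))  # sigmoid maps proximity into (0, 1)

# Same monkey seen from two views: the keypoint rays nearly intersect.
same = geometric_affinity(np.array([40.0, 55.0, 30.0, 70.0]))
# Different monkeys: rays pass far from one another.
diff = geometric_affinity(np.array([900.0, 1200.0, 1500.0, 2000.0]))

# Combined affinity (Eq. 1) with lambda = 0.8; the appearance term here is
# a placeholder for the putative-ID agreement score.
lam = 0.8
A_same = lam * same + (1 - lam) * 1.0
print(same, diff, A_same)
```

Detection pairs whose rays nearly intersect get an affinity close to the upper end of the sigmoid's range, while pairs belonging to different monkeys are pushed toward the lower end, which is what the matching optimization exploits.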
Between the keyframes, cross-view and cross-frame matching was estimated using the single-view tracklets and the sets of cross-view-matched 2D monkey detections at the keyframes. Specifically, we defined the tracking consistency of a pair of sets of matched 2D monkey detections in neighboring keyframes as the number of single-view tracklets shared by both sets (Fig 1d). Then, the matching of the pairs of sets that maximized the total tracking consistency was calculated using the Hungarian method, resulting in multi-view tracklets, i.e., sets of 2D monkey detections corresponding to an individual monkey across views and video frames.

First, IDs were assigned to the multi-view tracklets according to a criterion on the ID detection counts within a tracklet (Eq. 8), in which l represents an index of ID (B, G, R, or W), l_max represents the index of the most frequently detected ID, and N_l is the detection count of ID l. The validity of the ID at each time point was checked using a 5-s sliding window. If only a single valid ID was found in a multi-view tracklet, that ID was assigned to the tracklet. If multiple valid IDs were found, the tracklet was separated at the mid-point between the times when the different valid IDs were found. If no valid ID was found, Eq. 8 was re-examined with a time window covering the entire time range of the tracklet, and if it passed, the ID was assigned to the tracklet.

Since the linking of multi-view tracklets was interrupted when no correspondence was found between neighboring keyframes, e.g., in the case of severe occlusion, second, we stitched the tracklets together based on their continuity using network flow optimization 15,50. Specifically, a directed graph was constructed consisting of nodes representing tracklets and edges representing possible connections between tracklets. A pair of multi-view tracklets (nodes) separated by no more than 5 s and sharing at least one common single-view tracklet was connected with an edge.
Each edge had a cost value equal to the distance between the monkey's neck location at the end of one tracklet and that at the beginning of the other tracklet. If the assigned IDs of a tracklet pair were different, the corresponding edge was removed. If the IDs were the same, the cost value of the corresponding edge was divided by 100. In addition, sink and source nodes were added and connected to all tracklet nodes by edges with a fixed cost (1,000 mm in this study). Then, the optimal associations between tracklets were calculated by a minimum-cost flow algorithm using the capacity_scaling() function in the NetworkX library (networkx.org), and the tracklets were stitched accordingly.

Third, the IDs were re-assigned to the stitched tracklets using the same criteria (Eq. 6). Sometimes, the ID assignment resulted in multiple tracklets having the same ID at the same time. We resolved such ID overlaps as follows: 1) if an overlapping tracklet had previously been stitched and the original (pre-stitching) tracklet corresponding to the overlapping part had no valid ID, the tracklet was unstitched; 2) overlapping tracklets without a non-overlapping period (Fig S3a) were excluded; and 3) overlapping tracklets were trimmed based on the timings at which the valid IDs were detected (Fig S3b). Finally, a tracklet that still had no ID, and whose period contained ≥3 frames in which tracklets with the other three IDs were present, was assigned the fourth (remaining) ID.

Pose filtering

We reconstructed the 3D motion of each monkey using the multi-view tracklets with the corresponding IDs. For robust 3D motion estimation, we used the Anipose algorithm 19.
Briefly, the 2D keypoint trajectories were Viterbi filtered in each view, and then a plausible 3D motion of the animal was estimated by minimizing the error in body-part length, the change in keypoint acceleration, and the reprojection error.

Detection of social behavioral events

We calculated basic behavioral parameters from the estimated 3D motions of the monkeys to detect behavioral events. Specifically, we determined the position (neck; the mid-point between the left and right shoulders), speed, and face direction (the vector from the mid-point of the left and right ears to the nose, rotated upward by 35°) of each monkey. We also calculated the distance and the approaching/leaving speed (speed along the axis connecting the pair) for each pair of monkeys. These parameters were lowpass filtered with a cut-off frequency of 0.5 Hz. In addition, we classified body-centered postures in a feature space obtained with principal component analysis to detect sitting, lying (Fig S4a), and the typical forelimb posture associated with grooming (Fig S4b). The social behavioral events were detected (Fig 3a) using these parameters according to the definitions shown in Text S1.
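A minimal sketch of how such parameters can drive event detection, using Look as an example (the 20° threshold below is a placeholder for illustration, not the criterion in Text S1):

```python
import numpy as np

def looking(face_dir, head_x, head_y, max_angle_deg=20.0):
    """True if monkey X's face-direction vector points within `max_angle_deg`
    of the line from X's head to Y's head (threshold is a placeholder)."""
    to_other = head_y - head_x
    cos = np.dot(face_dir, to_other) / (
        np.linalg.norm(face_dir) * np.linalg.norm(to_other))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) < max_angle_deg

head_x = np.array([0.0, 0.0, 800.0])       # mm, made-up head positions
head_y = np.array([1000.0, 100.0, 800.0])

toward = looking(np.array([1.0, 0.05, 0.0]), head_x, head_y)  # facing roughly at Y
away = looking(np.array([0.0, 1.0, 0.0]), head_x, head_y)     # facing perpendicular
print(toward, away)
```

Applying such a per-frame predicate to the filtered 3D parameters yields raw event markers, which are then cleaned up by the event filtering described in the supplementary definitions.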

Performance evaluation
To examine the performance of 3D pose estimation, we annotated the 3D poses of monkeys in one frame every 12 s from eight 5-min video clips that did not overlap with the training dataset. The error of each keypoint between a manually annotated and an automatically estimated pose was calculated when the corresponding instance was found (root mean square error across all keypoints < 500 mm). To assess the performance of 3D tracking, we annotated the 3D neck positions of monkeys with IDs at 2.5 frames/s from the same video clips. Manually annotated instances and automatically estimated instances in each frame were matched by the Hungarian method to minimize the total distance between the necks of the matched pairs. If the distance of a matched pair was <400 mm and the IDs of the pair coincided, the pair was counted as successful (true positive [TP]). The numbers of false positives (FP) and false negatives (FN) were calculated by subtracting TP from the total numbers of automatically estimated and manually annotated instances, respectively.

We tested the performance of social behavioral event detection using another dataset. Sessions in which all four monkeys were tracked were used to analyze their interactions. As a result, we obtained 58 and 111 sessions for Groups I and II, respectively (Table S2).

We used an SVM to predict monkey IDs from behavioral patterns to assess individual differences in those patterns. Twenty sessions (100 min) were combined and used as input to the SVM, and leave-one-out cross-validation was used to evaluate its performance. Specifically, we used a set of 20 randomly sampled sessions as test data. From the rest of the data, we randomly sampled 100 combinations of 20 sessions and used them for training the SVM. Then, we compared the trained SVM's ID predictions on the test data with the correct IDs.
This training-test process was repeated 100 times, and the accuracy of the ID prediction by the SVM was calculated.

Statistical tests were performed using SPSS and MATLAB. The significance threshold was set to 0.05.

Pounce: The distance between the monkeys is <600 mm. Neither monkey moves fast (speed < 1,500 mm/s). At least one of the monkeys is not sitting or lying. The approach speed from Monkey X to Monkey Y is >0 and <1,500 mm/s. The acceleration of the approach speed is >2,500 mm/s².

Since the monkeys were smaller in Group II than in Group I, the distance, speed, and acceleration thresholds above were normalized by multiplying the values by 0.82 for Group II. After the detection of each event in each frame, except for Look, the event markers were filtered according to the following rules: if the same type of event recurred within an interval of less than I_max, the event was considered to have continued during the interval; and if the duration of an event was less than D_min, the event was ignored. The filter parameters for each event are shown in Table S4.
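The two filtering rules can be sketched on per-frame event flags as follows (a frame-based toy example; the I_max and D_min values here are illustrative, not those in Table S4):

```python
def filter_events(flags, i_max, d_min):
    """Per-frame event flags -> [start, end) event intervals, after
    bridging gaps shorter than i_max and dropping events shorter than
    d_min (both in frames here, for simplicity)."""
    runs, start = [], None
    for t, f in enumerate(list(flags) + [0]):   # sentinel closes a trailing run
        if f and start is None:
            start = t
        elif not f and start is not None:
            runs.append([start, t])
            start = None
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < i_max:
            merged[-1][1] = run[1]              # bridge the short gap
        else:
            merged.append(run)
    return [(s, e) for s, e in merged if e - s >= d_min]

flags = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(filter_events(flags, i_max=3, d_min=3))   # two early runs merge; the lone frame is dropped
```

Merging before dropping matters: a detection briefly interrupted by occlusion survives as one long event instead of being discarded as two short ones.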