A framework for studying behavioral evolution by reconstructing ancestral repertoires
=====================================================================================

* Damián G. Hernández
* Catalina Rivera
* Jessica Cande
* Baohua Zhou
* David L. Stern
* Gordon J. Berman

## Abstract

Although extensive behavioral changes often exist between closely related animal species, our understanding of the genetic basis underlying the evolution of behavior has remained limited. Here, we propose a new framework to study behavioral evolution by computational estimation of ancestral behavioral repertoires. We measured the behaviors of individuals from six species of fruit flies using unsupervised techniques and identified suites of stereotyped movements exhibited by each species. We then fit a Generalized Linear Mixed Model to estimate the suites of behaviors exhibited by ancestral species, as well as the intra- and inter-species behavioral covariances. We found that much of intraspecific behavioral variation is explained by differences between individuals in the status of their behavioral hidden states, what might be called their “mood.” Lastly, we propose a method to identify groups of behaviors that appear to have evolved together, illustrating how sets of behaviors, rather than individual behaviors, likely evolved. Our approach provides a new framework for identifying co-evolving behaviors and may provide new opportunities to study the genetic basis of behavioral evolution.

Behavior is one of the most rapidly evolving phenotypes, with notable differences even between closely-related species [1, 2]. Variable behaviors and rapid behavioral evolution likely allows species to adapt rapidly to new or varying environments [3, 4]. Despite the importance of animal behavior, progress in revealing the genetic basis of behavioral evolution has been slow [5–8]. In contrast, recent decades have seen significant progress in understanding the genetic causes of morphological evolution [9–12].

While there are many potential reasons for the discrepancy between studies of behavioral and morphological evolution, including the lack of a fossil record for behavior, a key difficulty is identifying which aspects of an animal’s development and physiology are responsible for the observed changes in animals’ actions. Changes in behavior along a phylogeny could emerge from alterations in the developmental patterning of neural circuitry (e.g., brain networks, descending commands, central pattern generators), hormonal regulation that affects the expression of behaviors, or the gross morphology of an animal’s body or limbs [13]. Each of these possibilities could result in behavioral effects at different, yet overlapping, scales – from muscle twitches to stereotyped behaviors to longer-lived states like foraging or courtship or aging that may control the relative frequency of a given behavior – making it difficult to identify the precise aspects of behavior that are changing.

To address these difficulties, the standard approach in the study of behavioral evolution has been to identify focal behaviors that exhibit robust differences between species, such as courtship behavior in fruit flies [14–16] or burrow formation in deermice [17, 18]. With these robust differences in phenotype, it is possible to perform analyses that isolate regions of the genome that correlate with quantitative changes in the performance of the focal behavior. However, there tend to be multiple such regions identified, each containing many genes. Given the large number of putative genes involved, combined with the possibility of epistatic interactions between loci, identification of the contributions of individual genes to behavioral evolution has moved slowly.

An alternative approach to focusing on single behaviors is to examine the full repertoire of movements that an animal performs. By identifying sets of behaviors that evolve together, it may be possible to identify regulators of these suites of behaviors. This approach has been made possible by recent progress in unsupervised identification of animal behaviors across length and time scales [20, 21]. In this study, we introduce a quantitative framework for studying the evolutionary dynamics of large suites of behavior. We have focused initially on fruit flies, which provide a convenient model for this problem because they exhibit a wide range of complex behaviors and unsupervised approaches can be used to map all of the animal movements captured in video recordings [19, 22, 23].

We recorded movies of isolated male flies from six species in a nearly stimulus-free environment. Because we did not record flies experiencing social and other environmental cues, we did not observe many charismatic natural behaviors, such as courtship and aggression. Nevertheless, we found that the behaviors they performed, including walking and grooming, contain speciesspecific information. We thus hypothesized that our quantitative representations of behaviors could be studied in an evolutionary context. To infer the evolutionary trajectories of behavioral evolution, we estimated ancestral behavioral repertoires with a Generalized Linear Mixed Model (GLMM) approach [24], which builds upon Felsenstein’s approach to reconstructing ancestral states [25, 26]. Using these results, we develop a framework that allows us to model the behavioral traits that co-vary both within a species and along the phylogeny. We find that within-species variance is related primarily to long-lasting internal states of the animal, what might be called a fly’s “mood,” and that inter-species variance can capture how disparate behaviors may evolve together. This latter finding points towards the presence of higher-order behavioral traits that may be amenable to further evolutionary and genetic analysis.

## EXPERIMENTS AND BEHAVIORAL QUANTIFICATION

We captured video recordings of all behaviors performed by single flies isolated in a largely featureless environment for multiple individuals from six species of the *Drosophila melanogaster* species subgroup: *D. mauritiana*, *D. melanogaster*, *D. santomea*, *D. sechellia*, *D. simulans*, and *D. yakuba* [22]. Although the animals could not jump or fly in these chambers and were not expected to exhibit social or feeding behaviors, the flies displayed a variety of complex behaviors, including locomotion and grooming. Each of these behaviors involves multiple body parts that move at varying time scales. The species studied here were chosen because their phylogenetic relationships are well understood [27–30] (summarized in the tree seen in Fig. 3), and genetic tools are available for most of these species [31]. Since a single strain represents a genomic “snapshot” of each species, we assayed individuals from multiple strains from each species to attempt to capture species-specific differences, and not variation specific to particular strains (see Materials and Methods). In total, we collected data from 561 flies, each measured for an hour at a sampling rate of 100 Hz.

While previous studies have identified differences in specific behaviors, such as courtship behavior, between these species [14, 16, 32, 33], here we assayed the full repertoire of behaviors the flies performed in the arena, with the aim of identifying combinations of behaviors that may be evolving together. To measure this repertoire, we used a previously-described behavior mapping method [19, 22] that starts from raw video images and attempts to find each animal’s stereotyped movements in an unsupervised manner. The output of this method is a two-dimensional probability density function (PDF) that contains many peaks and valleys (Fig. 1A), where each peak corresponds to a different stereotyped behavior (e.g., right wing grooming, proboscis extension, running, etc).

![FIG. 1.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F1.medium.gif)

[FIG. 1.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F1)

FIG. 1. 
Behavioral repertoires of *Drosophila*. A: The behavioral space probability density function, obtained using the unsupervised approach described in [19] on the entire data set of 561 individuals across all species. Coarse grained behaviors corresponding to the different types of movements exhibited in the map are shown as well. B: The relative performance of each of the 134 stereotyped behaviors for each of the six species. Each region here represents a behavior, and the color scale indicates the logarithm of the fraction of time that species performs the specified behavior divided by the average across all species.

Briefly, to create the density plots, raw video images were rotationally and translationally aligned to create an egocentric frame for the fly. The transformed images were decomposed using Principal Components Analysis into a low-dimensional set of time series. For each of these postural mode time series, a Morlet wavelet transform was applied, obtaining a local spectrogram between 1 Hz and 50 Hz (the Nyquist frequency). After normalization, each point in time was mapped using t-SNE [34] into a two dimensional plane. Finally, convolving these points with a two-dimensional gaussian and applying the watershed transform [35], produced 134 different regions, each of these containing a single local maximum of probability density that corresponds to a particular stereotypical behavior. Thus, by integrating the density of the region for a particular fly, we can associate to each of them a 134-dimensional real-valued vector that represents the probability of the fly performing a certain stereotyped behavior at a given time. We will refer to this quantity as the animal’s *behavioral vector* ![Graphic][1]</img>.

The behavioral map averaged across all six species is shown in Fig. 1A and displays a pattern of movements similar to those we found in previous work, where locomotion, idle/slow, anterior/posterior movements, etc. are segregated into different regions [19, 22]. Averaging across all individuals of each species, we found the mean behavioral vector for each species (Fig. 1B) and observed that each species performs certain behaviors with different probabilities. For example, *D. mauritiana* individuals spend more time performing fast locomotion than all other species on average, and *D. yakuba* individuals spend much of their time performing an almost species-unique type of slow locomotion, but little time running quickly.

These average probability maps provide some insight into potential species differences, but to identify species-specific behaviors, we also need to account for variation in the probability that individuals of each species perform each behavior. One way to address this problem is to ask whether an individual’s species identity can be predicted solely from its multi-dimensional behavioral vector. To explore this question, we first used t-SNE to project all 561 individuals into a 2 dimensional plane (Fig. 2A), using the Jensen-Shannon divergence as the distance metric between individual behavioral vectors. In this plot, different colors represent different species, and different symbols with the same color represent different strains within the same species. Although there is not a clear segregation of all species in this plane, the distribution of species is far from random, with individuals from the same species tending to group near to each other.

![FIG. 2.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F2.medium.gif)

[FIG. 2.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F2)

FIG. 2. 
Classification of fly species based on behavioral repertoires. A: A t-SNE embedding of the behavioral repertoires shows that behavioral repertoires contain some species-specific information. Each dot represents one individual fly, with different colors representing different species and different symbols with the same color representing different strains within the same species. The distance matrix (561 by 561) used to create the embedding is the Jensen-Shannon divergence between the behavioral densities of individual flies. B: Confusion matrix for the logistic regression with each row normalized. All the values are averaged from 100 different trials. The standard error is less than 0.01 for the diagonal elements and less than 0.005 for each of the off-diagonal elements.

To quantify this observation, we applied a multinomial logistic regression classifier that performed a six-way classification based solely on the high-dimensional behavioral vectors. After training, the classifier correctly classified 89 ± .2% of vectors (using a randomly-selected test set of 30% of the entire data set). Moreover, the confusion matrix (Fig. 2B) revealed no systematic misclassifications bias amongst the species. Note that we have used a relatively simple classifier compared to modern deep learning methods [36], so these results likely represent a lower bound on the distinguishability of the behavioral vectors. Thus, behavioral vectors appear to contain considerable species-specific information. We therefore proceeded to explore how these behavioral vectors may have evolved along the phylogeny.

## RECONSTRUCTING ANCESTRAL BEHAVIORAL REPERTOIRES

Multiple methods have been proposed for reconstructing ancestral states solely from data collected from extant species [25, 37]. These methods generally fall into two camps: parsimony reconstruction, which attempts to reconstruct evolutionary history with the fewest number of evolutionary changes [38], and diffusion-processes, which model evolution as a random walk on a multi-dimensional landscape [39]. Given the high-dimensional behavioral vectors that we are attempting to model, a diffusion process is more likely to capture the inter-trait correlations that we would like to understand. Thus, we focus on a diffusion-based model here.

Given a phylogeny for a collection of species, we modeled how species-specific complexes of behaviors might have emerged. Specifically, we assumed that each behavior is a quantitative trait, that is, each behavioral difference results from the additive effects of many genetic loci, each of small effect. We do not, however, assume that all behaviors evolve independently of each other. Thus, we are interested in predicting (1) how behaviors co-vary and (2) whether intra- and inter-species variation can be separated to identify independently evolving sets or linear combinations of behaviors.

We assumed that the flies’ behaviors evolved via a diffusion process, where initially the process starts at the common ancestor behavioral representation and eventually each individual’s trajectory performs a random walk with Gaussian noise along the known phylogenetic tree. Note that this is a less stringent assumption than neutrality, as multiple traits under selection may evolve in a correlated manner. More precisely, we fit a Multi-response Generalized Linear Mixed Model (GLMM) to the data, using the approach described in [24]: ![Formula][2]</img>  where ![Graphic][3]</img> denotes the logarithm of the *behavioral vector* ![Graphic][4]</img> for each individual, ![Graphic][5]</img> is the mean behavior of the common ancestor (treated as the fixed effects of this model), and ![Graphic][6]</img> and ![Graphic][7]</img> are the random effects corresponding to the phylogenetic and individual variability, respectively. We assume that these random effects are generated from the multi-dimensional normal distributions ![Graphic][8]</img> (phylogenetic) and ![Graphic][9]</img> (individual). Here, the matrix *A* represents the information contained in the phylogenetic tree, with *A**ij* being proportional to the length of the path from the most recent common ancestor of species *i* and *j* to the main ancestor. This matrix is normalized so that the diagonal elements are all equal to 1. *I* is the identity matrix, and *V* (*a*) and *V* (*e*) are the covariance matrices that govern the process. We fit *μ*, *V* (*a*), and *V* (*e*) using Markov Chain Monte Carlo (MCMC) simulation (see Materials and Methods). We checked that the MCMC converged using the Gelman-Rubin diagnostic (see Materials and Methods, Fig. S1). In addition to the inferred behavioral states corresponding to the common ancestor, ![Graphic][10]</img>, we also reconstructed the mean behavioral representations for the intermediate ancestors (Fig. 3). Further validation of our results corresponding to the current species behavior is shown in Fig. S2.

![FIG. S1.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F7.medium.gif)

[FIG. S1.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F7)

FIG. S1. 
Gelman Rubin diagnostic for model parameters inferred using MCMC. A: PSRF for the 134 ancestral behaviors inferred in the GLMM. 20 MCMC chains with different initial conditions were used. B: PSRF for the phylogenetic covariance matrix elements corresponding to the 10% most common behaviors performed by the measured flies. C: PSRF for the individual covariance matrix elements corresponding to the 10% most common behaviors performed by the measured flies. The PSRF values for all of these inferred parameters indicate that the MCMC chains are converging.

![FIG. S2.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F8.medium.gif)

[FIG. S2.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F8)

FIG. S2. 
Comparison between measured and inferred behaviors (in log scale) for each of the extant species. The mean of the measured behavioral repertoires for all the individuals of a particular species is taken in the log scale. Each measured behavioral mean gets compared to the mean obtained from the components of the MCMC samples corresponding to that particular species and behavioral mode (i.e., the inferred behavioral repertories from the GLMM). The biggest differences occur mostly in the low probability behaviors.

![FIG. 3.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F3.medium.gif)

[FIG. 3.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F3)

FIG. 3. 
Reconstructed behavioral repertoires using the GLMM. Inferred probabilities of the behavioral traits for the ancestral states are plotted in logarithmic scale. Except for the ancestral root, other ancestral states are plotted with respect to the closest ancestor. Here, all but the root ancestor are plotted with respect to their closest ancestral state. Therefore, for each behavioral trait, *i*, we show: ![Graphic][11]</img>, where ![Graphic][12]</img> and ![Graphic][13]</img> correspond to the inferred mean behavioral trait for the given ancestor and its closest ancestor, respectively.

## INDIVIDUAL VARIABILITY AND LONG TIMESCALE CORRELATIONS

While it is not possible to directly test the accuracy of our ancestral state reconstructions, the inferred covariance matrices generate predictions about genetic correlations that are, in principle, testable. We therefore focus on our fitted covariance matrix, ![Graphic][14]</img>, which accounts for within-species random effects.

We first note that *V* (*e*) exhibits a modular structure (Fig. 4A). After rearranging the behavior order via an information-based clustering procedure [40], we see that a block diagonal pattern emerges, with positive correlations lying within the blocks and negative correlations lying off the diagonal. This clustering approach minimizes the functional ![Graphic][15]</img>, where ⟨*d*⟩ is the average within-cluster distance between behaviors (defined here as ![Graphic][16]</img>), *I*(*C*; *b*) is the mutual information between cluster assignment and behavior number, and *β* modulates the relative importance of the two terms (see Materials and Methods). This modular structure emerges when applying other clustering methods as well (Fig. S3). Quantifying the matrix’s modularity, we find that ⟨*d*⟩ ≈ 0.30 and 0.22 for the 3 and 6-cluster solutions respectively. These values are significantly smaller than the average distances obtained using random cluster assignments (⟨*d*⟩ = 0.46 ± 0.03 and 0.45 ± 0.04 for 3 and 6 clusters respectively, see Fig. S5). Strikingly, these clusters are spatially contiguous in the behavioral map – implying that similar behaviors explain much of the intra-species variance [23]. Moreover, new clusters emerge in a hierarchical fashion, where coarse-grained behaviors sub-divide into new clusters (Fig. 4B), a feature that is not guaranteed by the information-based clustering algorithm.

![FIG. S3.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F9.medium.gif)

[FIG. S3.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F9)

FIG. S3. 
Behaviors clustered according to information of the individual covariance matrix using three different clustering methods. A: Results using k-medoids clustering method with distance matrix *d**ij* = (1 − *ρ**ij*)/2 for 2,3,..7 clusters. To the right, the WSI between the clusters obtained using k-medoids and those obtained using predictive information bottleneck method. Clearly, the similarity between these two orthogonal measurements is significant, as can be shown when compared to the WSI calculated by randomly shuffling the labels of the k-medoids clustering corresponding to each number of clusters. B: Same as in A but we used Spectral clustering instead of k-medoids. The similarity index between Spectral clustering and predictive information bottleneck is also statistically significant. C: Same as in A but we used Information based clustering instead of k-medoids. The similarity index between Information based clustering and predictive information bottleneck is statistically significant as well.

![FIG. S4.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F10.medium.gif)

[FIG. S4.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F10)

FIG. S4. 
Coarse-grained behavioral representations that are optimally predictive of the future behavior states via DIB. A: Behavioral representation with 2,3,…,7 clusters using *τ* = 50 in Eq. 5. B: Optimal trade-off curve (Pareto Front) between complexity of coarse grained description against predictive power. For each number of clusters, representations in A correspond to points (in red) in this curve with the highest predictive information

![FIG. S5.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F11.medium.gif)

[FIG. S5.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F11)

FIG. S5. 
Modularity measure of the intra-species behavioral covariance matrix using information based clustering. < *d* > corresponds to the average distance among elements of the same clusters, (see Materials and Methods for definition). We show for different number of clusters that matrix modularity is significantly smaller (in blue) than expected by random assignation of behaviors to clusters (in orange).

![FIG. 4.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F4.medium.gif)

[FIG. 4.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F4)

FIG. 4. 
The structure of variability between flies of the same species relates to long timescale transitions in behavior. A: The intra-species behavioral covariance matrix (*V* (*e*)), with columns and rows ordered via an information-based clustering algorithm [40]. The black squares represent behaviors that are grouped together in the three cluster solution. B: Behavioral map representation of the clustering solutions. The two, three, and six cluster solutions are shown on top (colors on the three cluster solution match those above the plot in A). The clusters are all spatially contiguous and break down hierarchically (see Fig. S3 for more examples). C: Clustering structure of the behavioral space obtained finding the optimally predictive groups of behaviors (see text for details). Note how these clusterings are nearly the same as the clusterings in B, despite having been derived from an entirely independent measure.

This hierarchical structure of the behavioral space is reminiscent of the hierarchical temporal structure of behavior that was hypothesized originally by ethologists [41] and was observed to optimally explain the long timescale structure of *Drosophila melanogaster* behavioral transitions [23]. To explore this connection further, we found coarse-grainings of the behavioral space that are optimally predictive of the future behaviors that the flies perform via the Deterministic Information Bottleneck (DIB) [42]. Similar to the previously described information-based clustering method, this approach minimizes a functional, ![Graphic][17]</img>, where *b*(*t*) is a fly’s behavior at time *t*, *Z*(*t* + *τ*) is the coarse-grained behavior visited at time *t* + *τ*, *τ* = 50, *I*(*b*(*t*); *Z*(*t* + *τ*)) is the mutual information between these quantities, *γ* is a positive constant, and ![Graphic][18]</img> is the entropy of the coarse-grained representation (see Material and Methods). As *γ* is increased, progressively coarser representations are found.

Applying this method to the data pooled across all six species (Figs. 4C, S4), we again found the same type of hierarchical division in the behavioral space that was observed for freely moving *D. melanogaster* [23]. Moreover, we found that the structure of the space using this approach closely mirrors the structure found via clustering *V* (*e*) (Fig. 4C). We quantify the similarity between both clustering partitions by calculating the Weighted Similarity Index (WSI), a modification of the Rand Index [43] (Materials and Methods). The WSI between the information-based clustering method and the predictive information bottleneck for three clusters is *WSI* = 0.73 and *WSI* = 0.87 for six clusters. For random clusterings, we would expect to observe 0.51±0.02 and 0.70±0.01 for 3 and 6 clusters, respectively, indicating a non-random overlap between these two partitions. Fig. S3, shows that this result is independent of the clustering method and the number of clusters.

The overlap between these two coarse-grainings indicates that most individual variability in the behaviors we observe results from non-stationarity in behavioral measurements, rather than from individual-specific variation. That is, much of the intraspecific variation appears to reflect flies recorded when they were experiencing different hidden behavioral states (i.e. “moods”), rather than reflecting fixed (environmental or genetic) differences between flies. This variation may have arisen because, although we controlled many variables (e.g., fly age, circadian cycle, temperature, and humidity), it is not possible to control for all internal factors (e.g., hunger, arousal, etc.) that affect an animal’s behavioral patterns [44]. The temporal coarse-graining of the behavioral space that we found via the DIB, gives insight into these non-stationarities, as they are optimally-predictive of the fly’s future behaviors. Given the contiguous nature of these regions, this result means that flies tended to stay within specific regions of the behavioral space much longer than one would assume from a Markov model.

This observation implies that variation in behavior observed among individuals, especially in non-manipulated settings, is likely to often reflect a large component of hidden behavior states (Fig. 5A). Thus, it may be possible to improve upon behavioral measurements in many settings by controlling for the variability associated with these hidden states. For example, just because one fly performs less anterior grooming than another may reflect that the animal is in a different long timescale behavioral state, rather than that the animal has a genetically encoded preference for reduced grooming.

![FIG. 5.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F5.medium.gif)

[FIG. 5.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F5)

FIG. 5. 
Variability within a species, long timescale transitions, and hidden states modulating behavior. A: A cartoon of the hypothesized relation between individual variability within a species and long timescale transitions through hidden states. B: Accounting for the long timescale dynamics - by adjusting for the amount of time spent in each coarse-grained region (here, the six cluster solution at the top right of Fig. 4C) - affects the measured behavioral distributions between *D. santomea* and *D. yakuba*. Shown is the comparison of the Mahalanobis distance ((*z**b*)*ij*) between behavioral distributions before (x-axis) and after (y-axis) adjusting. C: Kernel density estimates of the distributions for the circled behaviors in B) on the left before (left) and after (right) adjustments. Solid lines represent *D. santomea* and dashed lines represent *D. yakuba*.

A potential method for accounting for these artifacts is to normalize each individual’s behavioral density such that the amount of time that the animal spends in each of the coarse-grained regions is equalized. In other words, the amount of time spent anterior grooming, locomoting, etc. are set to be the same for all animals, thus accounting for the variability associated our inferred hidden states. Mathematically, if *P**i* is the probability of observing behavior *i*, and *C**i* is the clustering assignment of this behavior, we can define a normalized probability, ![Graphic][19]</img>, via ![Formula][20]</img>  where ![Graphic][21]</img> is the total density in cluster *C* for an individual fly and ![Graphic][22]</img> is the average across all animals.

We found that applying this normalization to our data often results in substantial changes in the inferred distributions of behavioral densities. For example, Fig. 5B displays how the difference in behavioral density between *D. santomea* and *D. yakuba* (as measured by the Mahalanobis distance between the distributions) alters as a result of normalization. For some behaviors, the signal increases (red points), and in some cases, it reverses (blue points). Thus, it is important to take these non-stationary effects into account when estimating how often single behaviors are performed in studies of behavioral evolution. To measure these non-stationary effects, many behaviors must be measured, not just a focal behavior.

## IDENTIFYING PHYLOGENETICALLY LINKED BEHAVIORS

One of the advantages of our approach is that we separate variations in behavior corresponding to evolutionary patterns, the phylogenetic variability, from variations among individuals of the same species. By studying the properties of the phylogenetic covariance matrix (*V* (*a*)), we can identify behaviors that may be evolving together.

We first characterized the coarse-grained structure within *V* (*a*) through the information-based clustering described in the previous section [40]. As seen in Fig. 6A, these clusters are not spatially contiguous in the behavioral space. This pattern contrasts to the spatial contiguity we observed for the individual covariance matrix (Fig. 4B). For example, the two-cluster solution (Fig. 6A, left) separates the behavioral space into side legs movements (middle) and certain locomotion gaits (far left) from the rest of behaviors. Similarly, non-localized structure is also observed when the matrix is clustered into a larger numbers of clusters as well.

![FIG. 6.](http://biorxiv.org/https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.17.209361/F6.medium.gif)

[FIG. 6.](http://biorxiv.org/content/early/2020/07/17/2020.07.17.209361/F6)

FIG. 6. 
Phylogenetic variability and behavioral meta-traits. A: (top) Clustering the phylogenetic covariance matrix (using the same information-based clustering method from Fig. 4), we observe that the clusters are no longer spatially contiguous. (bottom) The phylogenetic covariance matrix reordered according to four clusters (colors corresponding to the four-cluster map above). B: Fraction of variance explained by the largest eigenvalues of the phylogenetic covariance matrix. C: The eigenvectors corresponding to the largest six eigenvalues. D: Distributions of the projections of individual density vectors from *D. santomea* and *D. yakuba* onto eigenvector 3. E: Same as in D but using projections of individuals from *D. sechellia* and *D. simulans* onto eigenvector 4. F: Same as in D but using projections of individuals from *D. simulans* and *D. mauritiana* onto eigenvector 5.

One possible interpretation of these discontinuous clusters is that at the neural level, each of these groups of movements may reflect a motor response to shared up-stream commands [22]. For example, different types of locomotion might be controlled through the same descending neural circuitry, but due to evolutionary changes, the same commands could lead to different behavioral outputs, as has been observed in fly courtship patterns [16]. Thus, examination of phylogenetically course-grained regions such as these may provide a more biologically realistic view of suites of evolving behaviors than does focus on single behaviors.

To quantify these patterns as traits, we decomposed *V* (*a*) via an eigendecomposition. As seen in Fig. 6B, almost all of the variance within the matrix can be explained with only the first six eigenmodes. These eigen-vectors (Fig. 6C) share similar non-local structure to the clusterings described above. By projecting individual behavioral vectors onto these eigenvectors, the resulting dot products represent a meta-trait that is a linear combination of phylogenetically linked behaviors.

These evolving meta-traits may be suitable targets for further neurobiological or genetic studies. Three examples of these distributions are shown in in Fig. 6D for several pairs of closely related species. These three examples were not chosen at random, but instead because they showed significant differentiation between species. The aim of this analysis is not to show that all meta-traits would differ between all pairs of species, which strikes us as unlikely, but rather that it is possible to identify synthetic meta-traits that could be further interrogated with experimental methods.

## DISCUSSION

We have developed a quantitative framework to study the evolution of behavioral repertoires, using fruit flies (*Drosophila*) as a model system. We started with observations of 561 individuals from six extant species behaving in an unremarkable environment. This assay did not include social behaviors, such as courtship and aggression, nor many foraging behaviors. Thus, at first glance, it might seem like we had excluded most species-specific behaviors from the analysis. Nonetheless, we found that other complex behaviors, like walking, running, and grooming, exhibit species-specific features that can be used to reliably assign individuals to the correct species. Thus, the motor patterns of behaviors that are not normally investigated for their species-specific features are clearly evolving between even closely related species. It is not clear if these differences reflect natural selection or genetic drift on the details of these motor patterns. But, all of these behaviors would seem to be critical to individual survival, so it is possible that these behaviors have evolved, at least in part, in response to natural selection. It is clear, however, that the underlying neural circuitry controlling these behaviors must have evolved.

Inspired by these observations, we estimated patterns of behavioral evolution in the context of a well-understood phylogeny. We fit a Generalized Mixed Linear Model to our behavioral measurements and the given phylogeny to reconstruct ancestral behavioral repertoires and the intra- and inter-species covariance matrices. We found that the patterns of intra-species variability are similar to long timescale behavioral dynamics. This suggests that much of the intraspecific variability that emerged by sampling flies under well-controlled conditions reflects variability in the hidden behavioral states of individual flies. This variability is a clear confound for evolutionary and experimental studies of behavior and we therefore propose a method to control for these internal states and improve the accuracy of behavioral phenotyping. We showed that controlling for these internal states can dramatically alter estimates of the “heritable” elements of behavior.

Given our estimates for how suites of behaviors evolved, we examined whether the inter-species covariance matrix could be used to identify behavioral metatraits that might be subjected to further evolutionary and experimental analysis. We identified multiple suites of behaviors that differed between closely related species, providing a starting point for further analysis of how the mechanismns underlying these suites of behaviors have evolved.

The analysis framework introduced here represents the first attempt to analyze full behavioral repertoires to gain insight into evolution. In principle, this approach could be applied to any data set where a large number of behaviors have been sampled in many species. We envision several areas where future improvements may yield more detailed, comprehensive, and biologically meaningful results. First, we recorded behavior from only six species of flies. Adding additional species would place more constraints on the evolutionary dynamics, likely resulting in less variance in the ancestral state estimations and potentially adding more structure to the relatively low rank co-variance matrices. Additionally, further work is required to determine the balance between sampling within and between strains and species that optimizes estimates of evolutionary dynamics.

Second, our framework assumes that all evolutionary changes in behavior resemble a diffusion process. Although this assumption is a reasonable initial hypothesis [25], it may be possible to test this assumption. For example, deeper sampling of additional species may allow identification of specific behaviors on particular lineages where neutrality can be rejected [45].

In addition, all of our analyses involved measuring the fraction of behaviors performed during the recording time, ignoring the temporal structure and sequences of movements. While we show here that much of this information can be related to the structure of the intraspecies covariance matrix, the order in which behaviors occured may also provide important biological information. It should be possible to incorporate temporal structure directly into the regression. Deciding exactly which quantities to measure and how they should be incorporated, however, are complex questions that are outside the scope of this initial study.

Lastly, capturing the full range of animal behaviors for a large number of animals presents a number of technological challenges, which is why we focused on measuring behavior in a highly simplified environment. However, a more complete understanding of the structure of behavior will require more sophisticated ways to capture behavioral dynamics in more naturalistic settings and during complex social arrangements. While modern deep learning methods have made tracking animals in more realistic settings increasingly plausible [46, 47], there are still considerable hurdles to translating this information into a form that can be subjected to the kind of analysis we propose here.

Despite these limitations, this work represents a new way to quantitatively characterize the evolution of complex behaviors, which may provide new phenotypes that can be subjected to experimental analysis. In the absence of a behavioral fossil record, reconstructing ancestral behaviors requires an inferential approach like the one we present here. In addition, more complex models could be built to test assumptions underlying this initial, diffusion-based, model. Finally, a strength of our approach is that it makes falsifiable predictions about how behaviors are linked mechanistically, providing predictions that can be tested experimentally to provide further insight in the genetic and neurobiological structure of behavior.

## MATERIALS AND METHODS

### Data collection

All imaging of fly behavior followed the procedures described in [22], but without any red light stimulation. In total, we collected data from 561 individual from 18 strains and six species. These included three strains of *D. mauritiana* (*mau29* : 29 flies, *mau317* : 35 flies, *mau318* : 32 flies), four strains of *D. melanogaster* (*Canton-S* : 31 flies, *Oregon-R*: 33 flies, *mel54* : 34 flies, *mel56* : 31 flies), three strains of *D. santomea* (*san00* : 29 flies, *san1482* : 33 flies, *STO OBAT* : 22 flies), three strains of *D. sechellia* (*sech28* : 32 flies, *sech340* : 25 flies, *sech349* : 33 flies), three strains of *D. simulans* (*sim5* : 33 flies, *sim199* : 30 flies, *Oxnard* : 34 flies), and two strains of *D. yakuba* (*yak01* : 34 flies, *CYO2* : 31 flies).

### Generalized Linear Mixed Model

We fit our GLMM (Eq. 1) using the software introduced in [24]. The covariance matrices *V* (*e*) and ![Graphic][23]</img>, *K* = 134 and the mean vector ![Graphic][24]</img>, were inferred from the posterior distribution via MCMC sampling. Prior distributions for the covariance matrices were given by Inverse Wishart Distributions (conjugate priors for the multi-Gaussian model) with *K* degrees of freedom and ![Graphic][25]</img> as scale matrix, with *J* and *I* the unit and identity matrices respectively. Tree branch length were estimated from [30].

### Gelman-Rubin convergence diagnostic

This test evaluates MCMC convergence by analyzing the difference between several Markov chains. Convergence is evaluated by comparing the estimated between-chains and within-chain variances for each parameter of the model. Large differences between these variances indicate non-convergence [48]. Let *θ* be the model parameter of interest and ![Graphic][26]</img> be the *m*th simulated chain, *m* = 1, 2, …, *M*. Denote, ![Graphic][27]</img> and ![Graphic][28]</img> be the sample posterior mean and variance of the *m*th chain. If ![Graphic][29]</img> is the overall posterior mean estimator, the between-chains and within-chain variances are given by: ![Formula][30]</img>  

In reference [48], it is shown that the following weighted average of *W* and *B* is an unbiased estimator of the marginal posterior variance of *θ*: ![Graphic][31]</img>.

The ratio ![Graphic][32]</img> should get close to one as the M chains converge to the target distribution with *N* → ∞. In reference [49] this ratio known as the potential scale reduction factor (PSRF) was corrected to account for the the sampling variability using ![Graphic][33]</img>, where *d* is the degrees of freedom estimate of a t-distribution. Values of PSRF for all model parameters such that *R**c* < 1.1 are used in [49] as a criteria for convergence of the MCMC chains. Here, we used 20 independent chains, each with a different initialization.

### Information-based clustering

Minimizes the distance between elements within clusters while compressing the original representation. The method minimizes the functional ![Graphic][34]</img>, where ![Graphic][35]</img> is the mutual information between the original behavioral variable *i* and the representation *C*. ![Graphic][36]</img>, and *d*(*C*) is the average distance of elements chosen out of a single cluster: ![Formula][37]</img>  with *d*(*i*1, *i*2) being the distance measure between a pair of elements and *P*(*i* | *C*) being the probability to find element *i* in cluster *C*.

Given | *C* | = *N**c*, *T* and a random initial condition for *P*(*C* | *i*), a solution is obtained by iterating the following self-consistent equations until the criteria ![Graphic][38]</img> is satisfied. We chose 40, 000 different random values of *T* ∈ [0.1, 1000], *N**c* between 2 and 20, and performed the optimization in each case until the convergence criterion was met. We defined the Pareto front as the set of solutions *P*(*C* | *i*) such that no other solution presents a smaller ⟨*d*⟩ and a smaller *I*(*C*; *i*). Finally, for each number of clusters we selected the solution with the lowest ⟨*d*⟩.

For each number of clusters, the significance of the optimal value found for ⟨*d*⟩ is shown by comparing it to the average distance corresponding to random cluster assignments. These assignments are made in such a way that the amount of elements per cluster is conserved by randomly shuffling the vector that assigns each behavior to a particular cluster. The values presented in the main text correspond to the mean and standard deviation of ⟨*d*⟩ over 50 different random trials.

### Deterministic Information Bottleneck

Here we use the Deterministic Information Bottleneck (DIB) method to find coarse-grainings of the behavioral space that optimally predict future states [42]. Inspired by the Information Bottleneck [50], DIB replaces the compression measure *I*(*X, Z*) with the entropy *H*(*Z*), thus emphasizing constraints on the representation. DIB minimizes the functional: ![Formula][39]</img>  with respect to *p*(*z* ∈ *Z*|*x* ∈ *X*) and takes the limit as *α* → 0.

To apply DIB to the behavioral dynamics, we count time in units of the transitions between states, providing a discrete time series of behaviors, *b*(*n*) that can take on *N* = 134 different integer values at each discrete time *n*. Here, we relate the joint distributions of *b*(*n*) (*X* in Eqn. 5) and *b*(*n* + *τ*) (*Y*) through a coarse-grained clustering of the behavioral states (*Z*). We chose 10, 000 different pairs of random values for *β* between 0.1 and 104 and *N**c* between 2 and 30 clusters. Given *N**c*, *β* and a random initial condition for *p*(*t* | *x*), we find a solution by iterating the self-consistent equations [42] until the convergence criteria ![Graphic][40]</img> is satisfied. If any cluster has its probability become zero at any iteration, then that cluster is dropped for all future iterations, thus *N**c* is the maximum number of clusters that can be returned. Of these 10, 000 solutions, we keep all solutions that are on the Pareto front (i.e., no other solution has both a higher *I*(*Y*; *T*) and a smaller *H*(*T*)). The displayed clusters are the solutions on the Pareto front with the largest *I*(*Y*; *Z*) for a given number of clusters.

### Weighted Similarity Index

We quantify the similarity between clustering partitions by calculating the Weighted Similarity Index (WSI), a modification of the Rand Index [43] such that behaviors contribute the index according to their overall probability. Specifically, ![Formula][41]</img>  where *S**a*(*S**b*) is the set of pairs of behaviors that belong to the same (different) cluster in the two partitions and *P**k* is the probability of observing behavior *k*.

We thank Ilya Nemenman and Daniel Weissman for their helpful comments on the manuscript. D.G.H. was supported by Programa Raices from the MinCyT. C.R. was supported by the NSF Physics of Living Systems Student Research Network (1806833). G.J.B. was supported by NIMH R01 MH115831-01, the Human Frontier Science Program (RGY0076/2018), and a Cottrell Scholar Award, a program of the Research Corporation for Science Advancement (25999). J.C., D.L.S., and G.J.B. were supported by the Howard Hughes Medical Institute and the Janelia visiting researcher program.

*   Received July 17, 2020.
*   Revision received July 17, 2020.
*   Accepted July 17, 2020.


*   © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), CC BY-NC 4.0, as described at [http://creativecommons.org/licenses/by-nc/4.0/](http://creativecommons.org/licenses/by-nc/4.0/)

## References

1.  [1]. K. Z. Lorenz, Scientific American 199, 67 (1958).
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=13602816&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=A1958WD73900004&link_type=ISI) 

2.  [2]. E. P. Martins and  E. L. P. Martins, Phylogenies and the comparative method in animal behavior (Oxford University Press, 1996).
    
    
3.  [3]. F. Baier and  H. E. Hoekstra, Proceedings of the Royal Society B 286, 20191697 (2019).
    
    
4.  [4]. M. J. West-Eberhard, Developmental plasticity and evolution (Oxford University Press, Oxford, U.K., 2003).
    
    
5.  [5]. J. M. Gleason and  M. G. Ritchie, Genetics 166, 1303 (2004).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZ2VuZXRpY3MiO3M6NToicmVzaWQiO3M6MTA6IjE2Ni8zLzEzMDMiO3M6NDoiYXRvbSI7czo0ODoiL2Jpb3J4aXYvZWFybHkvMjAyMC8wNy8xNy8yMDIwLjA3LjE3LjIwOTM2MS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

6.  [6]. D. Yamamoto and  Y. Ishikawa, Journal of Neurogenetics 27, 130 (2013).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.3109/01677063.2013.800060&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=23806054&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

7.  [7]. C. Ellison,  C. Wiley, and  K. Shaw, Journal of Evolutionary Biology 24, 1110 (2011).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1111/j.1420-9101.2011.02244.x&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=21375646&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

8.  [8]. K. L. Shaw and  S. C. Lesnick, Proceedings of the National Academy of Sciences 106, 9737 (2009).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMToiMTA2LzI0Lzk3MzciO3M6NDoiYXRvbSI7czo0ODoiL2Jpb3J4aXYvZWFybHkvMjAyMC8wNy8xNy8yMDIwLjA3LjE3LjIwOTM2MS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

9.  [9]. T. M. Williams and  S. B. Carroll, Nature Reviews Genetics 10, 797 (2009).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1038/nrg2687&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=19834484&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000270915400015&link_type=ISI) 

10. [10]. N. Shubin,  C. Tabin, and  S. B. Carroll, Nature 457, 818 (2009).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1038/nature07891&link_type=DOI) 
    
    [GeoRef](http://biorxiv.org/lookup/external-ref?access_num=2009053843&link_type=GEOREF) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=19212399&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000263266700031&link_type=ISI) 

11. [11]. M. Levine and  E. H. Davidson, Proceedings of the National Academy of Sciences 102, 4936 (2005).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMToiMTAyLzE0LzQ5MzYiO3M6NDoiYXRvbSI7czo0ODoiL2Jpb3J4aXYvZWFybHkvMjAyMC8wNy8xNy8yMDIwLjA3LjE3LjIwOTM2MS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 

12. [12]. D. L. Stern and  N. Frankel, Philosophical Transactions of the Royal Society B: Biological Sciences 368, 20130028 (2013).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1098/rstb.2013.0028&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=24218640&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

13. [13]. B. S. Baker,  B. J. Taylor, and  J. Hall, Cell 105, 13 (2001).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1016/S0092-8674(01)00293-8&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=11300999&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000168063300004&link_type=ISI) 

14. [14]. J. Cande,  P. Andolfatto,  B. Prud’homme,  D. L. Stern, and  N. Gompel, PLoS One 7, 1 (2012).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1371/journal.pone.0035495&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=22506074&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

15. [15]. J. Cande,  D. L. Stern,  T. Morita,  B. Prud’homme, and  N. Gompel, Cell reports 8, 363 (2014).
    
    
16. [16]. Y. Ding,  J. L. Lillvis,  J. Cande,  G. J. Berman,  B. J. Arthur,  M. Xu,  B. J. Dickson, and  D. L. Stern, Current Biology 29, 1089 (2019).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1016/j.cub.2019.02.019&link_type=DOI) 

17. [17]. J. N. Webert,  B. K. Peterson, and  H. E. Hoekstra, Nature 493, 402 (2013).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1038/nature11816&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=23325221&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000313615900051&link_type=ISI) 

18. [18]. C. K. Hu and  H. E. Hoekstra, Seminars in cell & developmental biology 61, 107 (2016).
    
    
19. [19]. G. J. Berman,  D. M. Choi,  W. Bialek, and  J. W. Shaevitz, Journal of The Royal Society Interface 11, 20140672 (2014).
    
    
20. [20]. G. J. Berman, BMC Biology 16, 23 (2018).
    
    
21. [21]. A. E. X. Brown and  B. de Bivort, Nature Physics 20, 410 (2018).
    
    
22. [22]. J. Cande,  S. Namiki,  J. Qiu,  W. Korff,  G. M. Card,  J. W. Shaevitz,  D. L. Stern, and  G. J. Berman, eLife 7, e34275 (2018).
    
    
23. [23]. G. J. Berman,  W. Bialek, and  J. W. Shaevitz, Proceedings of the National Academy of Sciences 113, 11943 (2016).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTEzLzQyLzExOTQzIjtzOjQ6ImF0b20iO3M6NDg6Ii9iaW9yeGl2L2Vhcmx5LzIwMjAvMDcvMTcvMjAyMC4wNy4xNy4yMDkzNjEuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

24. [24]. J. Hadfield, Journal of Statistical Software 33, 1 (2010).
    
    
25. [25]. J. Felsenstein, The American Naturalist 125, 1 (1985).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1086/284325&link_type=DOI) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=A1985AAK4600001&link_type=ISI) 

26. [26]. J. D. Hadfield and  S. Nakagawa, Journal of evolutionary biology 23, 494 (2010).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1111/j.1420-9101.2009.01915.x&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=20070460&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000273949900005&link_type=ISI) 

27. [27].Drosophila 12 Genomes Consortium, Nature 450, 203 (2007).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1038/nature06341&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=17994087&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000250746200042&link_type=ISI) 

28. [28]. D. J. Obbard,  J. Maclennan,  K.-W. Kim,  A. Rambaut,  P. M. O’Grady, and  F. M. Jiggins, Molecular Biology and Evolution 29, 3459 (2012).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1093/molbev/mss150&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=22683811&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000310167700019&link_type=ISI) 

29. [29]. S. Chyb and  N. Gompel, Atlas of Drosophila Morphology: Wild-type and classical mutants (Academic Press, 2013).
    
    
30. [30]. A. S. Seetharam and  G. W. Stuart, PeerJ 1, e226 (2013).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.7717/peerj.226&link_type=DOI) 

31. [31]. D. L. Stern,  J. Crocker,  Y. Ding,  N. Frankel,  G. Kappes,  E. Kim,  R. Kuzmickas,  A. Lemire,  J. D. Mast, and  S. Picard, G3 7, 1339 (2017).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6MzoiZ2dnIjtzOjU6InJlc2lkIjtzOjg6IjcvNC8xMzM5IjtzOjQ6ImF0b20iO3M6NDg6Ii9iaW9yeGl2L2Vhcmx5LzIwMjAvMDcvMTcvMjAyMC4wNy4xNy4yMDkzNjEuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

32. [32]. D. Yamamoto and  Y. Ishikawa, Journal of Neurogenetics 27, 130 (2013).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.3109/01677063.2013.800060&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=23806054&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

33. [33]. T. O. Auer and  R. Benton, Current opinion in neurobiology 38, 18 (2016).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1016/j.conb.2016.01.004&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=26851712&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

34. [34]. L. van der Maaten and  G. Hinton, Journal of Machine Learning Research 9, 2579 (2008).
    
    
35. [35]. F. Meyer, Signal processing 38, 113 (1994).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1016/0165-1684(94)90060-4&link_type=DOI) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=A1994NX65900010&link_type=ISI) 

36. [36]. I. Goodfellow,  Y. Bengio, and  A. Courville, Deep learning (MIT press, Cambridge, MA, 2016).
    
    
37. [37]. Z. Yang et al., Computational molecular evolution, Vol. 284 (Oxford University Press Oxford, Oxford, U.K., 2006).
    
    
38. [38]. C. W. Cunningham,  K. E. Omland, and  T. H. Oakley, Trends in Ecology & Evolution 13, 361 (1998).
    
    
39. [39]. J. Hadfield and  S. Nakagawa, Journal of evolutionary biology 23, 494 (2010).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1111/j.1420-9101.2009.01915.x&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=20070460&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=000273949900005&link_type=ISI) 

40. [40]. N. Slonim,  G. S. Atwal,  G. Tkačik, and  W. Bialek, Proceedings of the National Academy of Sciences 102, 18297 (2005), [http://www.pnas.org/content/102/51/18297.full.pdf](http://www.pnas.org/content/102/51/18297.full.pdf).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NDoicG5hcyI7czo1OiJyZXNpZCI7czoxMjoiMTAyLzUxLzE4Mjk3IjtzOjQ6ImF0b20iO3M6NDg6Ii9iaW9yeGl2L2Vhcmx5LzIwMjAvMDcvMTcvMjAyMC4wNy4xNy4yMDkzNjEuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

41. [41]. N. Tinbergen, The Study of Instinct (Oxford University Press, Oxford, U. K., 1951).
    
    
42. [42]. D. Strouse and  D. J. Schwab, Neural Computation 29, 1611 (2017).
    
    
43. [43]. W. M. Rand, Journal of the American Statistical association 66, 846 (1971).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.2307/2284239&link_type=DOI) 
    
    [Web of Science](http://biorxiv.org/lookup/external-ref?access_num=A1971L498800028&link_type=ISI) 

44. [44]. D. J. Anderson, Nature Reviews Neuroscience 17, 692 (2016).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1038/nrn.2016.125&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=27752072&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

45. [45]. F. Tajima, Genetics 135, 599 (1993).
    
    [Abstract/FREE Full Text](http://biorxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6ODoiZ2VuZXRpY3MiO3M6NToicmVzaWQiO3M6OToiMTM1LzIvNTk5IjtzOjQ6ImF0b20iO3M6NDg6Ii9iaW9yeGl2L2Vhcmx5LzIwMjAvMDcvMTcvMjAyMC4wNy4xNy4yMDkzNjEuYXRvbSI7fXM6ODoiZnJhZ21lbnQiO3M6MDoiIjt9) 

46. [46]. T. D. Pereira,  D. E. Aldarondo,  L. Willmore,  M. Kislin,  S. S. H. Wang,  M. Murthy, and  J. W. Shaevitz, Nature Methods 16, 117 (2018).
    
    
47. [47]. M. W. Mathis and  A. Mathis, Current Opinion in Neurobiology 60, 1 (2020).
    
    [CrossRef](http://biorxiv.org/lookup/external-ref?access_num=10.1016/j.conb.2019.10.008&link_type=DOI) 
    
    [PubMed](http://biorxiv.org/lookup/external-ref?access_num=31791006&link_type=MED&atom=%2Fbiorxiv%2Fearly%2F2020%2F07%2F17%2F2020.07.17.209361.atom) 

48. [48]. A. Gelman and  D. B. Rubin, Statistical Science, 457 (1992).
    
    
49. [49]. S. P. Brooks and  A. Gelman, Journal of computational and graphical statistics 7, 434 (1998).
    
    
50. [50]. N. Tishby,  F. C. Pereira, and  W. Bialek, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (University of Illinois Press, Urbana-Champaign, IL, 1999) pp. 368–377.

 [1]: /embed/inline-graphic-1.gif
 [2]: /embed/graphic-3.gif
 [3]: /embed/inline-graphic-2.gif
 [4]: /embed/inline-graphic-3.gif
 [5]: /embed/inline-graphic-4.gif
 [6]: /embed/inline-graphic-5.gif
 [7]: /embed/inline-graphic-6.gif
 [8]: /embed/inline-graphic-7.gif
 [9]: /embed/inline-graphic-8.gif
 [10]: /embed/inline-graphic-9.gif
 [11]: F3/embed/inline-graphic-10.gif
 [12]: F3/embed/inline-graphic-11.gif
 [13]: F3/embed/inline-graphic-12.gif
 [14]: /embed/inline-graphic-13.gif
 [15]: /embed/inline-graphic-14.gif
 [16]: /embed/inline-graphic-15.gif
 [17]: /embed/inline-graphic-16.gif
 [18]: /embed/inline-graphic-17.gif
 [19]: /embed/inline-graphic-18.gif
 [20]: /embed/graphic-7.gif
 [21]: /embed/inline-graphic-19.gif
 [22]: /embed/inline-graphic-20.gif
 [23]: /embed/inline-graphic-21.gif
 [24]: /embed/inline-graphic-22.gif
 [25]: /embed/inline-graphic-23.gif
 [26]: /embed/inline-graphic-24.gif
 [27]: /embed/inline-graphic-25.gif
 [28]: /embed/inline-graphic-26.gif
 [29]: /embed/inline-graphic-27.gif
 [30]: /embed/graphic-9.gif
 [31]: /embed/inline-graphic-28.gif
 [32]: /embed/inline-graphic-29.gif
 [33]: /embed/inline-graphic-30.gif
 [34]: /embed/inline-graphic-31.gif
 [35]: /embed/inline-graphic-32.gif
 [36]: /embed/inline-graphic-33.gif
 [37]: /embed/graphic-10.gif
 [38]: /embed/inline-graphic-34.gif
 [39]: /embed/graphic-11.gif
 [40]: /embed/inline-graphic-35.gif
 [41]: /embed/graphic-12.gif