Learning to embed lifetime social behavior from interaction dynamics

Interactions of individuals in complex social systems give rise to emergent behaviors at the group level. Identifying the functional role that individuals take in the group at a specific time facilitates understanding the dynamics of these emergent processes. An individual’s behavior at a given time can be partially inferred by common factors, such as age, but internal and external factors also substantially influence behavior, making it difficult to disentangle common development from individuality. Here we show that such dependencies on common factors can be used as an implicit bias to learn a temporally consistent representation of a functional role from social interaction networks. Using a unique dataset containing lifetime trajectories of multiple generations of individually-marked honey bees in two colonies, we propose a new temporal matrix factorization model that jointly learns the average developmental path and structured variations of individuals in the social network over their entire lives. Our method yields inherently interpretable embeddings that are biologically relevant and consistent over time, allowing one to compare individuals’ functional roles regardless of when or in which colony they lived. Our method provides a quantitative framework for understanding behavioral heterogeneity in complex social systems, and is applicable to fields such as behavioral biology, social sciences, neuroscience, and information science.

Author summary

Group-level emergent behaviors are the result of interactions between individual group members. To understand these social dynamics, one must objectively measure the function of an individual in their group at any given time. Ideally, one would also like to compare individuals from different groups, for example, to measure how specific environmental conditions or other external factors influence group behavior.
Unfortunately, such an objective measure is hard to obtain because the group and its dynamics constantly change, making it challenging to define an individual’s role in the group as a function of its actions and interactions. We propose a principled approach to model individuals in complex social systems by considering that function often depends, at least partially, on common factors such as age. The model learns a meaningful and interpretable descriptor for all individuals, and can be used to understand how complex social systems function and the emergence of group behavior.

1 Introduction

Animals living in groups often coordinate their behavior, resulting in emergent properties at the group level. The dynamics of the inter-individual interactions produce, for example, the coherent motion patterns of flocking birds and shoaling fish, or the results of democratic elections in human societies. In many social systems, individuals differ consistently in how, when, and with whom they interact. The way an individual participates in social interactions and therefore contributes to the emergence of group-level properties can be understood as its functional role within the collective [1-4].

Technological advances have made it possible to track all individuals and their interactions, ranging from social insects to primate groups [5-10]. These methods produce datasets that have unprecedented scale and complexity, but identifying and understanding the functional roles of the individuals within their groups has emerged as a new and challenging problem in itself. Social network analysis of interaction networks has proven to be a promising approach because interaction networks are comparatively straightforward to obtain from tracking data, and the networks represent each individual in the global context of the group [2,3,11,12].

In most social systems, the way individuals interact changes over time, due to new experiences, environmental changes, or physiological conditions. Furthermore, groups themselves also tend to change, both in size and composition [13-18]. Despite these changes over time, an objective measure of the functional role should identify individuals that serve a similar function (e.g. a guard versus a forager). Unfortunately, we are now facing a recursive definition of function: We are trying to derive the function of an individual from the network, but the network itself is also a function of the individuals' behavior (and other factors). Still, consider a group-living species in which only a subset of individuals engage in nursing duties. If we analyze the networks of different groups of the same species in different environmental conditions and group sizes, we still expect an objective measure of function to be shared among individuals engaged in nursing, regardless of these confounding factors. How can we extract such an objective measure from a constantly changing network of interactions without a fixed frame of reference?

In many social systems, individuals share common factors that partially determine the roles they take. For example, an individual's age can have a strong influence on behavior. In humans, factors such as socioeconomic status are comparatively easy to measure yet determine behavior and, therefore, interactions to a large extent. If individuals take on roles partially determined by a common factor, can we use this dependency to learn an objective measure of function? Here, we show that such common factors are a powerful inductive bias to learn semantically consistent functional descriptors of individuals over time, even in highly dynamic social systems.

In recent years, methods that automatically learn semantic embeddings from high-dimensional data have become popular. These methods map entities into a learned vector space. For example, in natural language models, a word can be represented as a vector, such that specific regions in the manifold of learned embeddings correspond to words with similar meaning. Similarly, recommender systems can learn meaningful embeddings of users and items, for example, movies, such that similar entities cluster in the manifold of learned embeddings [19-22].

Such embeddings are usually learned from the data without additional supervision. In recommender systems, a movie's genre is usually not given in a dataset of user ratings, yet the genre can be identified given the learned embeddings [23]. This capability of learning embeddings from raw data and using them in downstream tasks is desirable in datasets of social interactions, where raw data is often abundant but labels are hard to acquire. Furthermore, embeddings are interpretable. For example, vector arithmetic of word embeddings can be used to understand how semantic concepts the natural language model has learned from the data relate to each other [24]. For entities that change over time, trajectories of embeddings can be analyzed, i.e., how one entity changes within the learned manifold of embeddings. Such analyses can, for example, reveal how environmental conditions such as resource availability affect behavioral changes within the group [25,26].

Most real-world networks have a hierarchical organization with overlapping communities, and thus soft community detection algorithms are often used to group and describe entities [26-28]. Non-negative matrix factorization (NMF) is a principled and scalable method to learn embeddings from data that can be represented in matrix form, such as interaction networks. NMF has an inherent soft clustering property and is therefore well suited to derive embeddings from social interaction networks [29]. If the embeddings allow us to predict relevant behavioral properties, they serve our understanding as semantic representations.

In symmetric non-negative matrix factorization (SymNMF), the dot products of any two individuals' embeddings (factor vectors) reconstruct their interaction affinity [30,31] (see Figure 1 a and b). However, this algorithm has no straightforward extension in temporal settings where the interaction matrices change over time. The interaction matrices at different time points can be factorized individually, but there is no guarantee that the embeddings stay semantically consistent over time. The dot product is invariant to permutations of the factor dimensions; therefore, factorization can result in different embeddings depending on the optimization method being used, or noise in the data. Consider the hypothetical case of two groups of animals of the same species with two tasks, guards and nurses. Factorizing the interaction matrices of both groups will likely reveal two clusters, but there is no guarantee that the same cluster will be assigned to the same task for both groups. The same problem can occur in the case of only one group with new animals emerging and some dying over time without any changes in the distribution of tasks on the group level. In this case, the embeddings are not semantically consistent over time. The prediction of relevant behavioral properties will deteriorate, and individuals cannot be meaningfully compared against each other.
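The ambiguity described above can be illustrated with a minimal NumPy sketch. The embedding matrix `F` and its size are hypothetical toy values, not data from this work: swapping the factor columns changes every individual's embedding but leaves the reconstructed affinities untouched, so the reconstruction loss alone cannot fix the order of the factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-negative embeddings: 6 individuals, 2 factors.
F = rng.random((6, 2))

# Affinities reconstructed as pairwise dot products of the embeddings.
A_hat = F @ F.T

# Swap the two factor columns: every individual's embedding changes ...
F_swapped = F[:, ::-1]

# ... but the reconstruction is identical, so the loss cannot tell them apart.
A_hat_swapped = F_swapped @ F_swapped.T

same_reconstruction = np.allclose(A_hat, A_hat_swapped)
same_embeddings = np.allclose(F, F_swapped)
```

Two independently factorized days (or colonies) can therefore assign the same task group to different factor dimensions.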

Several approaches to extend NMF to temporal settings have been proposed in a variety of problem settings. Previous work proposed factorization methods for time series analysis [32,33], while others focus on the analysis of communities that are determined by their temporal activity patterns [34]. Jiao and coworkers consider the case of communities from graphs over time and enforce temporal consistency with an additional loss term [35]. Several previous works represent network embeddings as a function of time [36,37], but the meaning of these embeddings can still shift over time. Temporal matrix factorization is similar to the tensor decomposition problem, which has many proposed solutions, see review by [38]. In particular, time-shifted tensor decomposition methods have been used in multi-neuronal spike train analysis, when recordings of multiple trials from a population of neurons are available [39,40].

We approach this problem in the honey bee, a popular model system for studying individual and collective behavior [41]. Honey bees allocate tasks across thousands of individuals without central control, using an age-based system: young bees care for brood, middle-aged bees perform within-nest labor, and old bees forage outside [42,43]. While age is a good predictor for the task of an average bee, individuals often deviate drastically from this common developmental trajectory due to internal and external factors. Honey bee colonies are also organized spatially: brood is reared in the center, honey and pollen are stored at the periphery, and foragers offload nectar near the exit. Therefore, an individual's role is partially reflected in its location, which provides the unique opportunity to evaluate whether learned embeddings based on the interaction data alone are meaningful.
Figure 1. For a daily snapshot of a temporal social network, symmetric NMF is able to extract meaningful factor representations of the individuals. Colors represent the interaction frequencies of all individuals (a). The age-based division of labor in a honey bee colony is clearly reflected in the two factors: same-aged individuals are likely to interact with each other (b). For long observation windows spanning several weeks, the social network changes drastically as individuals are born, die, and switch tasks (c). Here, we investigate how a representation of temporal networks can be extracted, such that the factors representing individuals can be meaningfully compared over time, and even across datasets.
[...] these embeddings can be used to predict task allocation, survival, activity patterns, and future behavior [12]. The method proposed here is conceptually similar but solves several remaining challenges. Here, we introduce Temporal NMF (TNMF), which yields consistent semantic embeddings even for individuals from disjoint datasets, for example, data from different colonies, or for long-duration recordings that contain multiple lifetime generations.

TNMF jointly learns a) a functional form of the average trajectory of embeddings along the common factor, b) a set of possible functional deviations from the average trajectory, and c) for each individual, a soft-clustering assignment (individuality embedding) to these deviations. We show that these representations can be learned in an unsupervised fashion, using only interaction matrices of the individuals over time. We analyze how well the model is able to disentangle common development from individuality using a synthetic dataset. Furthermore, we introduce a unique dataset containing lifetime trajectories of multiple generations of individually-marked honey bees in two colonies. We evaluate how well the embeddings learned by TNMF capture the semantic differences of individual honey bee development by evaluating their predictiveness for different tasks and behaviorally relevant metrics compared to several baseline models proposed in previous works.
[Figure labels: interactions $A_t$; age $c(t, i)$; age $c(t, j)$; individuality embeddings]

When applied to social networks, $f(i)$ can represent the role of an entity within the social network $A$ [30,31]; however, in temporal settings, factorizing the matrices for different times separately will result in semantically inconsistent factors.

Here we present a novel temporal NMF algorithm (TNMF) which extends SymNMF to temporal settings in which $A \in \mathbb{R}_+^{T \times N \times N}$ changes over time $t$. We assume that the entities $i \in \{0, 1, \dots, N\}$ follow to some extent a common trajectory depending on an observable property (for example the age of an individual). We represent an entity at a specific point in time $t$ using a factor vector $f^+(t, i)$ such that

$$\hat{A}_{t,i,j} = f^+(t, i) \cdot f^+(t, j)^T \tag{2}$$

In contrast to SymNMF, we do not directly factorize $A_t$ to find the optimal factors that reconstruct the matrices. Instead, we decompose the problem into learning an average trajectory of factors $m(c(t, i))$ and structured variations from this trajectory $o(t, i)$ that depend on the observable property $c(t, i)$:

$$f^+(t, i) = \max\big(0,\; m(c(t, i)) + o(t, i)\big) \tag{3}$$

This decomposition is an inductive bias that allows the model to learn semantically consistent factors for entities, even if they do not share any data points (e.g., there is no overlap in their interaction partners), as long as the relationship between functional role and $c(t, i)$ is stable. Note that in the simplest case $c(t, i) = t$, TNMF can be seen as a tensor decomposition model, i.e. the trajectory of all entities is aligned with the temporal dimension $t$ of $A$. In our case, $c(t, i)$ maps to the age of individual $i$ at time $t$.

While many parameterizations for the function $o(t, i)$ are possible, we only consider one particular case in this work: We learn a set of individuality basis functions $b(c(t, i))$ (shared among all entities) that define a coordinate system of possible individual variations and the individuality embeddings φ, which capture to what extent each basis function applies to an entity:

$$o(t, i) = \sum_{k=1}^{K} \phi_{i,k} \, b_k(c(t, i)) \tag{4}$$

where $K$ is the number of learned basis functions. This parameterization allows us to disentangle the forms of individual variability (individuality basis functions) and the distribution of this variability (individuality embeddings) in the data.
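The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the learned functions m and b are replaced by hypothetical lookup tables indexed by integer age (`m_table`, `b_table`), non-negativity is assumed to be enforced by clipping at zero, and all sizes and birth days are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: days, individuals, factors, basis functions.
T, N, M, K = 5, 4, 3, 2
max_age = 10

# c(t, i): age of individual i on day t (hypothetical birth days).
birth_day = np.array([-3, -1, 0, -5])
age = np.arange(T)[:, None] - birth_day[None, :]   # shape (T, N)

# Stand-ins for the learned functions, tabulated per integer age.
m_table = rng.normal(size=(max_age, M))            # m(c): mean trajectory
b_table = rng.normal(size=(max_age, K, M))         # b_k(c): basis functions
phi = rng.random((N, K))                           # individuality embeddings

m = m_table[age]                                   # shape (T, N, M)
b = b_table[age]                                   # shape (T, N, K, M)

# o(t, i) = sum_k phi[i, k] * b_k(c(t, i))
o = np.einsum("nk,tnkm->tnm", phi, b)

# Non-negative factors f^+(t, i), assuming clipping at zero.
f_plus = np.maximum(0.0, m + o)

# Reconstructed interactions: dot products of the factors of each pair.
A_hat = np.einsum("tim,tjm->tij", f_plus, f_plus)
```

Because every individual is routed through the same `m_table` and `b_table`, two individuals that never co-occur still share one coordinate system; only `phi` is specific to the individual.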
We implement the functions $m(c(t, i))$ and $b(c(t, i))$ with small fully connected neural networks with non-linearities and several hidden layers. The parameters θ of these functions and the entities' embeddings φ are learned jointly using minibatch stochastic gradient descent:

$$\hat{\theta}, \hat{\phi} = \arg\min_{\theta, \phi} \lVert A - \hat{A} \rVert^2 \tag{5}$$

Note that non-negativity is not strictly necessary, but we only consider the non-negative case in this work for consistency with prior work [30,31]. Furthermore, instead of one common property with discrete time steps, the factors could depend on multiple continuous properties, i.e. $c: \mathbb{R}^{T \times N} \to \mathbb{R}^P$, e.g. the day and time in an intraday analysis of social networks.

We find that the model's interpretability can be improved using additional regularization terms without significantly affecting its performance. We encourage sparsity in both the number of used factors and individuality basis functions by adding $L_1$ penalties of the mean absolute magnitude of the factors $f(t, i)$ and basis functions $b(c(t, i))$ to the objective. We encourage individuals' lifetimes to be represented with a sparse embedding using an $L_1$ penalty of the learned individuality embeddings φ.

We also introduce an optional adversarial loss term to encourage the model to learn embeddings that are semantically consistent over time, i.e. to only represent two entities that were present in the dataset at different times with different embeddings if this is strictly necessary to factorize the matrices $A$. We jointly train a discriminative network $d(\phi_i)$ that tries to classify the time of the first occurrence of all entities based on their individuality embeddings φ. The negative cross-entropy loss of this model is added as a regularization term to equation 5 in a training regime similar to generative adversarial networks [44].
Note that a high cross-entropy loss of the discriminative network $d(\phi_i)$ implies that the distribution of individuality embeddings φ is consistent over time. See appendix S1.1 for more details and S2 for an ablation study of the effect the individual regularization terms have on the results of the model.

We implemented the model using PyTorch [45] and trained it in minibatches of 256 individuals for 200 000 iterations with the Adam optimizer [46]. We calculate the reconstruction loss $\lVert A_t - \hat{A}_t \rVert^2$ only for valid entries, i.e., we mask out all matrix elements where one of the individuals is not alive at the given time $t$. See appendix S1.3 for the architecture of the learned functions, a precise description of the regularization losses, and further hyperparameters. The code of our reference implementation is publicly available: https://github.com/nebw/temporal_nmf.

We created synthetic datasets using a generative model of interactions based on a common latent trajectory of factors and groups with structured variations from this trajectory. We compute the number of interactions between two individuals as the dot product of their latent factors and additive Gaussian noise. Using these datasets we can evaluate whether the model successfully converges and is able to correctly identify which individual belongs to which latent group, even in the presence of high amounts of observational noise. While we believe that such a latent structure exists in most complex social systems, it is not directly observable, and thus, for data from a real system, we can only evaluate the model on proxy measures (see section 2.3) that are observable.

We model a common lifetime trajectory of factors using a smoothed Gaussian random walk in $\mathbb{R}_+$ with $\sigma_{walk} = 1$ for the steps of the random walk and $\sigma_{smoothing} = 10$ for the Gaussian smoothing kernel.
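A smoothed Gaussian random walk of this kind could be generated as in the following NumPy sketch. The function name `smoothed_random_walk`, the kernel truncation at three standard deviations, the edge padding, and the clipping at zero used to stay in $\mathbb{R}_+$ are all assumptions made for illustration; the text does not specify how these details were handled.

```python
import numpy as np

def smoothed_random_walk(n_steps, n_factors, sigma_walk=1.0,
                         sigma_smoothing=10.0, seed=0):
    """Hypothetical helper: non-negative smoothed Gaussian random walk."""
    rng = np.random.default_rng(seed)

    # Gaussian random walk: cumulative sum of independent normal steps.
    steps = rng.normal(0.0, sigma_walk, size=(n_steps, n_factors))
    walk = np.cumsum(steps, axis=0)

    # Normalized Gaussian kernel, truncated at 3 standard deviations.
    radius = int(3 * sigma_smoothing)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_smoothing) ** 2)
    kernel /= kernel.sum()

    # Pad with edge values so the smoothed output keeps its length.
    padded = np.pad(walk, ((radius, radius), (0, 0)), mode="edge")
    smoothed = np.stack(
        [np.convolve(padded[:, k], kernel, mode="valid")
         for k in range(n_factors)],
        axis=1,
    )

    # Assumption: clip at zero so that the trajectory stays non-negative.
    return np.maximum(smoothed, 0.0)

# One common lifetime trajectory with three factors over 60 days.
trajectory = smoothed_random_walk(n_steps=60, n_factors=3)
```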
See Figure 3 a) for one example of a generated lifetime trajectory with three factors. We then randomly create latent groups by creating smoothed Gaussian random walks that define how these groups differ from the common lifetime trajectory. See Figure 3 b) for the lifetime trajectory of one latent group. For each group, we also define different expected mean lifetimes. We set the average lifetime of an entity to 30 days with a standard deviation of 10 days. We then randomly assign 1024 individuals to those latent groups and also assign random dates of emergence and disappearance of these individuals in the dataset. We then compute the individual factor trajectories for each individual, as can be seen in Figure 3 c). Finally, for 100 days of simulated data, we generate interaction matrices by computing the dot products of the factors of all individuals (Figure 3 d).

We then measure how well the individuality embeddings φ of a fitted model match the true latent groups from the generative model using the adjusted mutual information score [47]. Furthermore, we measure the mean squared error between the ground truth factors and the best permutation of the factors $f^+$. We evaluate the model on 128 different random synthetic datasets with increasing Gaussian noise levels in the interaction tensor.

Honey bees are an ideal model system with a complex and highly dynamic social structure. The entire colony is observable most of the time. In recent years, technological advances have made it possible to automatically track individuals in entire colonies of honey bees over long periods of time [6,10,48]. We analyze a dataset obtained by tracking thousands of individually marked honey bees at high temporal and spatial resolution, covering entire lifespans and multiple generations.

Two colonies of honey bees were continuously recorded over a total of 155 days. Each individual was manually tagged at emergence, so the date of birth is known for each bee. Timestamps, positions, and unique identifiers of all (N=9286) individuals from these colonies were obtained using the BeesBook tracking system [10,12,48]. See Table 1 for dates and number of individuals. Temporal affinity matrices were derived from this data as follows: For each day, counts of proximity contact events were extracted. Two individuals were defined to be in proximity if their markers' positions had a Euclidean distance of less than 2 cm for at least 0.9 seconds. The daily affinity between two individuals $i$ and $j$ based on their counts of proximity events $p_{t,i,j}$ at day $t$ was then computed as: $A_{t,i,j} = \log(1 + p_{t,i,j})$, $A \in \mathbb{R}^{N_t \times N_i \times N_i}$, where $N_t$ is the number of days and $N_i$ the number of individuals in the dataset.
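As a small worked example of this transformation, the NumPy sketch below applies it to a hypothetical count tensor `p` for two days and three individuals; the counts are made up for illustration.

```python
import numpy as np

# Hypothetical daily contact counts p[t, i, j] for 2 days and 3 individuals.
p = np.array([
    [[0, 4, 0], [4, 0, 1], [0, 1, 0]],
    [[0, 0, 9], [0, 0, 2], [9, 2, 0]],
])

# A[t, i, j] = log(1 + p[t, i, j]); log1p is the numerically stable form.
A = np.log1p(p)
```

Pairs without any contact keep an affinity of exactly zero, while the logarithm compresses the range of very frequent contacts.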

The datasets also contain labels that can be used in proxy tasks (see section 2.3) to quantify if the learned embeddings and factors are semantically meaningful and temporally consistent.

[...] source cohort (date of birth) and calculate the area under the ROC curve ($\mathrm{AUC}_{cohort}$) using a stratified 100-fold cross-validation with scikit-learn [50]. The baseline models do not learn an individuality embedding; therefore we compute how well the model can predict the cohort using the mean factor representation of the individuals over their lives. We define consistency as $1 - \mathrm{AUC}_{cohort}$ of this linear model. Note that a very low temporal consistency would indicate that the development of individual bees changes strongly between cohorts and colonies, which we know not to be true.

Mortality and Rhythmicity: We evaluate how well a linear regression model can predict the mortality (number of days until death) and circadian rhythmicity of the movement [12] ($R^2$ score of a sine with a period of 24 h fitted to the velocity over a three-day window). These metrics are strongly correlated with an individual's behavior (e.g. foragers exhibit strong circadian rhythms because they can only forage during the daytime; foragers also have a high mortality). We follow the procedure given in [12] and report the 100-fold cross-validated $R^2$ scores for these regression tasks.
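The rhythmicity score can be illustrated with a least-squares sine fit. The sketch below is one possible reading of the description, not the procedure of [12]: it assumes a fixed 24 h period with free phase, amplitude, and offset, and the function name `rhythmicity_r2` as well as the two synthetic velocity series are hypothetical.

```python
import numpy as np

def rhythmicity_r2(hours, velocity, period=24.0):
    """R^2 of a least-squares sine with fixed period and free phase."""
    w = 2.0 * np.pi / period

    # A sine with free phase is linear in its sin and cos components.
    X = np.column_stack(
        [np.sin(w * hours), np.cos(w * hours), np.ones_like(hours)]
    )
    coef, *_ = np.linalg.lstsq(X, velocity, rcond=None)

    residual = velocity - X @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((velocity - velocity.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Three-day window sampled every 30 minutes (synthetic velocities).
hours = np.arange(0.0, 72.0, 0.5)
rng = np.random.default_rng(0)
rhythmic = (2.0 + np.sin(2 * np.pi * (hours - 6) / 24)
            + 0.1 * rng.normal(size=hours.size))
arrhythmic = 2.0 + rng.normal(size=hours.size)

r2_rhythmic = rhythmicity_r2(hours, rhythmic)
r2_arrhythmic = rhythmicity_r2(hours, arrhythmic)
```

Writing the sine as a linear combination of a sine and a cosine term avoids a non-linear search over the phase.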

Time spent on different nest substrates: For a subset of the data, from 2016-08-01 to 2016-08-25, nest substrate usage information is also available. This data contains the proportion of time each individual spends in the brood area, honey storage, and on the dance floor. This data was previously published and analyzed [12,51]. The task of a honey bee worker is strongly associated with her spatial distribution in the hive. We therefore expect a good representation of the individuals' functional role to correlate with this distribution.

For this data, we expect the factors $f^+$ and individuality embeddings φ to be semantically meaningful and temporally consistent if they reflect an individual's behavioral metrics (mortality and rhythmicity) and if they do not change strongly over time (measured in the consistency metric).

Baseline models
Biological Age: Task allocation in honey bees is partially determined by temporal polyethism. Certain tasks are usually carried out by individuals of about the same age, e.g. young bees are usually occupied with nursing tasks. We therefore use the age of an individual as a baseline descriptor.

Symmetric NMF: We compute the factors that optimally reconstruct the original interaction matrices using the standard symmetric NMF algorithm [31,52], for each day separately, using the same number of factors as in the TNMF model.

Optimal permutation SymNMF: We consider a simple extension of the standard SymNMF algorithm that aligns the factors to be more consistent over time. For each pair of subsequent days, we consider all combinatorial reorderings of the factors computed for the second day. For each reordering, we compute the mean $L_2$ distance of all individuals that were alive on both days. We then select the reordering that minimizes those pairwise $L_2$ distances and greedily continue with the next pair of days until all factors are aligned. Furthermore, we align the factors across colonies (where individuals cannot overlap) as follows: we run this algorithm for both datasets separately and align the resulting factors by first computing the mean embedding for all individuals grouped by their ages. As before, we now select from all combinatorial possibilities the reordering that minimizes the $L_2$ distance between the embeddings obtained from both datasets. See section S3.1 for pseudo code.

[...] constrained to the diagonals, i.e. $D \in \mathbb{R}^{T \times M \times M}$

Temporal NMF models: We evaluate variants of the temporal symmetric matrix factorization algorithms proposed by [35] and [36].
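The day-to-day alignment step of the optimal permutation SymNMF baseline described above can be sketched as follows. This is an illustrative reading of the prose, not the pseudo code of section S3.1: the function names `best_permutation` and `align_days`, the boolean `alive` masks, and the toy data are assumptions.

```python
import itertools
import numpy as np

def best_permutation(reference, factors):
    """Column order of `factors` with minimal mean L2 distance to `reference`."""
    n_factors = reference.shape[1]
    best_perm, best_dist = None, np.inf
    for perm in itertools.permutations(range(n_factors)):
        diff = reference - factors[:, list(perm)]
        dist = np.linalg.norm(diff, axis=1).mean()
        if dist < best_dist:
            best_perm, best_dist = list(perm), dist
    return best_perm

def align_days(daily_factors, alive):
    """Greedily align each day to the previous, already aligned day."""
    aligned = [daily_factors[0]]
    for t in range(1, len(daily_factors)):
        # Only individuals alive on both days enter the distance.
        shared = alive[t - 1] & alive[t]
        perm = best_permutation(aligned[-1][shared], daily_factors[t][shared])
        aligned.append(daily_factors[t][:, perm])
    return aligned

# Toy example: the same embeddings with shuffled factor columns on later days.
rng = np.random.default_rng(0)
base = rng.random((8, 3))
days = [
    base,
    base[:, [2, 0, 1]] + 0.01 * rng.random((8, 3)),
    base[:, [1, 2, 0]],
]
alive = [np.ones(8, dtype=bool)] * 3
aligned = align_days(days, alive)
```

The exhaustive search grows factorially with the number of factors, so it is only practical for a small number of factors.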

For the tensor decomposition and temporal NMF baselines, we follow the procedure given above for the Optimal permutation SymNMF to find the optimal reordering to align the factors obtained by applying models to the two datasets separately.

We factorize the interaction matrices of the 128 synthetic datasets with varying levels of Gaussian noise. We confirmed that our model converges in all datasets and evaluate whether we can distinguish the individuals' ground truth group assignments. To that end, we extract the individuality embeddings φ from the models and measure how well they correspond to ground truth data using the adjusted mutual information (AMI) score. Furthermore, we measure the mean squared error between the best permutation of learned factors $f^+$ and the ground truth factors.

We find that for low levels of noise, our model can identify the ground truth group assignments with high accuracy, and its assignments are still significantly better than random assignments even at very high levels of noise (see figure 4). Note that for this experiment, we evaluated a model with the same hyperparameters as used in all plots in the results section (see Table 2) and a variant without explicit regularization except the $L_1$ penalty of the learned individuality embeddings φ ($\lambda_{embeddings}$, because this regularization is required to meaningfully extract clusters), which was set to 0.1. See appendix 2.2.1 for more details on the synthetic datasets.

As encouraged by the model, most individuals can predominantly be described by a single basis function. That means that while each honey bee can collect a unique set of experiences, most can be described with a few common individuality embeddings which are consistent across cohorts and colonies. In the context of honey bee division of labor, the basis functions are interpretable because the factors correspond to different task groups.
For example, $b_{12}(c(t, i))$ (accounting for ≈ 10.7% of the individuals) describes workers that occupy nursing tasks much longer than most bees. As the individuality embeddings φ only scale the magnitude of the basis functions, they can be interpreted in the same way. Individual lifetime trajectories in the factor space can be computed based on the mean lifetime trajectories ($m$), individuality basis functions ($b(c(t, i))$) and individuality embeddings (φ). See figure 6 for examples of individual lifetime trajectories from workers that most strongly corresponded to the common individuality basis functions.

Evaluation: We verify that the learned representations of the individuals are meaningful (i.e., they relate to other properties of the individuals, not just their interaction matrices) and semantically consistent over time and across datasets using the metrics described in the section Evaluation. We compare variants of our model with different adversarial loss scaling factors and factor $L_1$ regularizations, the baseline models, and the individuals' ages. We expect a good model to be temporally consistent and semantically meaningful. All variants of our model outperform the baselines in terms of the semantic metrics Mortality and Rhythmicity, except for the [36] model, which performs comparably well in the Mortality metric. The adversarial loss term further increases the Consistency metric without negatively affecting the other metrics. A very strong adversarial regularization (see row with $\lambda_{adv} = 1$ in Table 2) prevents the model from learning a good representation of the data. See Table 2 for an overview of the results. We also evaluate the tradeoff between the different metrics using a grid search over the hyperparameters (see appendix 3.2).

Scalability: The functions $m(c(t, i))$ and $b(c(t, i))$ are learned neural networks with non-linearities. The objective is non-convex and we learn the model parameters using stochastic gradient descent. Optimization is therefore slower than the standard NMF algorithms that can be fitted using algorithms such as Alternating Least Squares [53]. We found that the model converges faster if the reconstruction loss of the age based model $m(c(t, i))$ is additionally minimized with the main objective in equation 5. Due to the minibatch training regime, our method should scale well in larger datasets. Small neural networks were sufficient to learn the functions $m(c(t, i))$ and $b(c(t, i))$ in our experiments. Most of the runtime during training is spent on the matrix multiplication $f^+(t, i) \cdot f^+(t, j)^T$ and the corresponding backward pass.

Tradeoff between temporal consistency and semantic meaningfulness: We performed a grid search over the hyperparameters $\lambda_f$, $\lambda_{adv}$, $\lambda_{basis}$, and $\lambda_{embeddings}$ (see Table 1) to evaluate whether models can only be either semantically meaningful or temporally consistent. For this analysis, we define Semantic meaningfulness as the sum of the Rhythmicity and Mortality metrics introduced in section 2.3. We find that models that are very temporally consistent fail to learn semantically meaningful information. Interestingly, the models with the best tradeoff between the two metrics are almost as semantically meaningful as those models with low temporal consistency and the highest semantic meaningfulness. This analysis suggests that regularization encourages the model to only represent different individuals differently if this is strictly necessary to factorize the data. See Figure 5.

Temporal NMF factorizes temporal matrices with overlapping and even disjoint communities by learning an embedding of individuals as a function of a common factor, such as age, and a learned representation of the individuals' individuality. This explicit dependency on a common factor that partially determines the function of an individual constitutes an inductive bias. We show that the model learns semantically consistent representations of individuals, even in challenging cases, such as the datasets analyzed in this work.

The individual components of the model are straightforward to visualize and interpret. The learned individuality embeddings φ can be understood as soft-cluster assignments relating to the whole lifetime of an individual, while the factor vectors $f^+(t, i)$ can be interpreted as cluster assignments of the individuals at a specific point in time, i.e., two individuals with similar factor vectors are likely to interact if they exist in the same group at the same time. Furthermore, the model encourages sparsity, making the results easier to interpret because the model only uses as many factors and clusters as necessary. We identified a crucial trade-off that comes with temporal consistency: For a specific point in time, the ability to predict behaviorally relevant attributes will likely be worse for a model that learns temporally consistent representations compared to a non-consistent model with the same capacity. Conversely, in more challenging cases, e.g., when taking long periods of time or data from disjoint communities into consideration, temporal consistency is indispensable for a good representation. Furthermore, we found that models can be temporally consistent, semantically meaningful, or both; selecting the correct model requires an inductive bias, but regularization of the model also influences the results.
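The two readings described above can be made concrete with a small NumPy sketch. The embedding and factor values are hypothetical and chosen by hand; taking the largest entry of φ as a hard cluster label is a simplification of the soft assignment, used here only for illustration.

```python
import numpy as np

# Hypothetical individuality embeddings phi: 3 individuals, 3 dimensions.
phi = np.array([
    [0.90, 0.00, 0.10],
    [0.05, 0.80, 0.00],
    [0.85, 0.10, 0.00],
])
# Lifetime reading: the dominant dimension serves as a cluster label.
clusters = phi.argmax(axis=1)

# Hypothetical factor vectors f+(t, i) of the same individuals at one time t.
factors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.9, 0.1],
])
# Point-in-time reading: the inner product of two factor vectors is the
# predicted interaction strength, so similar vectors imply likely interaction.
affinity = factors @ factors.T
```

Here individuals 0 and 2 share a lifetime cluster and also have a high predicted interaction at time t, whereas individuals 0 and 1 have orthogonal factor vectors and a predicted interaction of zero.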

Previous works have demonstrated that biologically relevant findings can be obtained using network analysis of social interaction networks [4,5,26,54–62]. A recent method, Network Age [12], proposes using spectral decomposition of honey bee interaction networks into succinct descriptors of the individual's social network that can be used to predict task allocation, survival, activity patterns, and future behavior. Symmetric nonnegative matrix factorization and Laplacian-based spectral clustering have been shown to be equivalent [29]. Thus, TNMF can be understood as a further development of Network Age. TNMF learns representations of individuals based on their social interaction network that can facilitate the analysis of developmental trajectories, division of labor, and individual variance in behavior. Furthermore, TNMF provides temporally consistent embeddings and with that rectifies a remaining limitation of Network Age. We confirmed that, on the honey bee dataset, TNMF obtains biologically meaningful lifetime trajectories with promising prospects for experimental application. TNMF may help advance our understanding of the colony function and the interplay between environmental factors and individual and collective responses. The method presented here offers a way to investigate the impact of stress factors, such as pesticides, parasitic mites, and agricultural monoculture, on the social structure of colonies and therefore may present an avenue to improve honey bee health.

Figure 7. Individuality embeddings φ (panel axes: individuality embedding dimension, individual). Left: Hierarchical clustering of individuality embeddings: Most individuals strongly correspond to a single individuality basis function, making it easy to cluster their lifetime social behavior (i.e. each individual has a high value in a single dimension for their individuality embedding). Because each cluster is strongly associated with a specific individuality basis function, and because each basis function is interpretable (Figure 5), these blueprints of lifetime development can also be intuitively understood and compared. Right: TSNE plots of the individuality embeddings colored by cluster (left) and the maximum circadian rhythmicity of an individual during her lifetime (right), indicating that the embeddings are semantically meaningful.

We applied our method to the honey bee model, which has numerous individuals, with an entangled and highly dynamic social structure. Our method, however, can be applied to any setting in which matrix factorization is commonly used, such as recommender systems, network analysis, audio processing, bioinformatics, etc. Interaction matrices in networked systems have a broad class of use-cases. In any system with dynamically interacting units, our model reduces high-dimensional interaction patterns to low-dimensional embeddings. The only requirement is that the interactions follow our generic model of an average path from which individual units can deviate. The method could serve as a means to generate hypotheses: Clustering individuals in the embedding space may reveal functional groups, and the basis functions can indicate relevant time points in individual developments that can be investigated in follow-up studies. Note that time in our model may be replaced by any variable along which one wants to study the matrix dynamics. While we evaluate TNMF on honey bees, the method may be used to study human social networks and their underlying dynamics. A deeper understanding of human interaction dynamics may benefit aspects of human life, such as health, technology, and work. The method could, for example, be used to identify individuals with a higher risk of contracting or transmitting a disease, or help assess the effect of pandemics and potential interventions.

By publishing the honey bee dataset and our reference implementation of the TNMF algorithm, we hope to encourage the scientific community to build upon our efforts.