Machine Learning to Summarize and Provide Context for Sleep and Eating Schedules

The relative timing of sleep and of eating within the circadian day is important for human health. Despite much data on sleep, and a growing data set for eating, there remains a need for an interpretative framework for understanding these data when making health decisions. This study provides a new statistical and machine learning analysis of more than 500 participants in the Daily24 project. From their data and the analysis, we propose a framework for classifying participants into different chronotypes and, with that, for assessing the potential impact of daily circadian habits on health. We propose that our resulting distribution curves could be used, similarly to the NHANES (National Health and Nutrition Examination Survey) data for pediatric growth, as a measure of circadian misalignment and to help guide re-entrainment schedules.

Author summary

Daily habits can be positive, negative or neutral for human health. Sleep and eating schedules are generally adopted without thought for their potential to help or interfere with health. In this study we propose a framework, based on data from more than 500 participants, for evaluating the relative timing of meals and sleep schedules. This evaluation, similar to pediatric growth charts, can guide clinical suggestions for those at the extremes, while helping others to realize when they are unusual relative to the population average.

December 31, 2020

Introduction

A few general rules for optimizing sleep and eating schedules have arisen from anecdotal, cultural and research-based findings. For example, it is now generally well accepted that eating a large meal before sleep is, on average, a poor idea for optimal health [1]. Similarly, the stress that many years of shift work places on an individual has been well documented [2]. What is not well understood is how much natural variability there is in a population of individuals with respect to their sleep and eating schedules. Related to this natural variability is the important but challenging question of whether individuals can be characterized by their schedule relative to the population distribution and placed into different risk categories based on their sleep and eating behaviour.

For example, important milestones in an individual's pediatric development are compared against population averages. This lets pediatricians and parents understand, and even take corrective action, if development is not proceeding normally. A similar measurement for circadian events would be ideal but is not nearly so easy to attain. For children's measures of development, a single set of office measures and comparison against the NHANES population densities is all that is needed [3]. In contrast, to determine daily habits, especially ones that may have health benefits or may be dangerous to health, a set of measures needs to be performed over multiple days, extending into weeks or months. Complicating the comparison to NHANES, there is currently no similar population measure against which to compare the distribution of circadian measures.

With this paper we aim to present the first steps towards the ability to measure circadian patterns within an individual and to compare those patterns against a population.
We propose to do this by building on our Daily24 data collection of more than 500 individuals who collectively contributed to an ongoing project about the timing of eating and sleeping. While this project is ongoing and still of modest size, it presents an outstanding opportunity to define what a population measure means for these types of events and how an individual can be fairly compared against that larger population.

Our framework leaves many questions open for more study. For example, while we can estimate how many days an individual needs to contribute for a fair comparison, we can only do so under assumptions about the stability of a particular participant's set of habits. In a related way, we can posit extreme schedules for comparison to our population dataset, but we have not collected data from a sufficiently large range of individuals and behavior patterns to clearly delineate the full complexity of the measurement space. Further complicating our analysis is that there is very little data connecting long-term behavior patterns with health risk.

Despite these limitations, we believe that it is important to phrase these questions and to begin the process of defining what a measure of circadian patterns should look like and how it may be used to help particular patients and their clinical care teams. We present this work with a full realization that the current framework is only a first step into this fascinating problem, and we do not believe that it is immediately ready for clinical work. In that spirit we provide an outline for how our initial dataset and analysis could be extended, validated, and eventually used in a clinical setting. We believe that efforts to establish the importance of daily awareness of eating and sleeping times can be a substantial benefit to human health and that this dimension of human health has not been fully addressed in all of its ramifications.

Participants in the Daily24 project submitted their daily eating and sleeping times through a smartphone App. We view their entries as representing stable habits that are sampled via the App on a daily basis. Clearly this is a strong assumption, since individuals may have weekly variability, may change jobs or habits, or may simply have a widely variable schedule. By collecting this information, we can identify those with very regular habits as well as those with much wider latitude in their daily schedules.

We immediately note that we are not connecting any of the Daily24 sleeping and eating events with long-term health. This is in part due to the difficulty of defining the causal connections, and in part due to the limited time window (6 months of data) of the study. We instead view the shared data as helping us to think about the definition of a community-averaged distribution of sleeping and eating events within a (we hope) generally healthy population. We collected this data in an effort to determine daily schedules and did not instruct individuals to change their schedules as part of the study. In that regard we assume that we have a reasonable, statistically valid distribution of collected eating and sleeping events.

The total number of lines of data was more than a quarter of a million. We view each of these events as a tweet-like note about a meal eaten, a sleep duration, or a snack. Some individuals were outstanding, contributing over the entire six months. Other participants were less prolific, but still contributed a statistically meaningful set of events. We elected to trim our initial set of events to reflect individuals that contributed at least 21 days over the 6 months of the study.

Each day was considered complete if it had at least one complete sleep event associated with it.
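The trimming rule above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the record layout `(participant_id, day, kind)`, the function name `active_participants`, and the toy data are all assumptions introduced here.

```python
from collections import defaultdict

MIN_COMPLETE_DAYS = 21  # threshold stated in the text


def active_participants(events, min_days=MIN_COMPLETE_DAYS):
    """Return the ids of participants with at least `min_days` complete days.

    `events` is an iterable of (participant_id, day, kind) tuples; this
    layout is assumed for illustration. A day counts as complete when it
    has at least one event whose kind is "sleep".
    """
    complete_days = defaultdict(set)
    for participant_id, day, kind in events:
        if kind == "sleep":
            complete_days[participant_id].add(day)
    return {pid for pid, days in complete_days.items() if len(days) >= min_days}


# Toy data: "a" logs sleep on 21 distinct days, "b" on only 5 days.
events = [("a", d, "sleep") for d in range(21)]
events += [("a", d, "smallMeal") for d in range(30)]
events += [("b", d, "sleep") for d in range(5)]
events += [("b", d, "largeMeal") for d in range(40)]
kept = active_participants(events)
```

In this toy example only participant "a" is kept: "b" logs many meals, but too few days carry a sleep event.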

In sum, the participants in this study were recruited as part of our Daily24 project and were not selected based on their particular chronotype, or on the basis of any factor known to be correlated with circadian patterns, such as shiftwork or any medical condition. We view the participating pool as reflecting generally healthy individuals, generally older, and technically literate (so that they could reliably use a smartphone-based app and enter their eating and sleeping times).

In Fig 1 we describe the data collection process and our assumptions for analyzing the information. We make the assumption that our participants were truthful about their daily behavior and consistent in entering their schedules. We accept that we did not have the resources to verify every event and that, while individual users could edit their entered events (within 48 hours), we did not have the ability to check each event as it was uploaded. We view this as more of a strength than a limitation, since we tried to encourage users to view this as an observational study with no intent to modify their behavior and no judgements as to the relative timing of their sleep and eating events.

With sufficient inputs, we believe that patterns in sleeping and eating can be found. As an extreme, and to illustrate this idea, it is commonly accepted that the probability of eating is not constant (a uniform distribution) over 24 hours. There are periods in the day that are more likely (greater probability) than others for an eating event to occur. Similarly, sleep is not expected to occur with a uniform probability, with the most likely start of sleep occurring near the end of the circadian day.

Primary data consisted of tweet-like entries sent from the App installed on each participant's phone to our AWS backend, which stored the timing of eating and sleeping events. From this primary data, the derived data summarized each day in terms of the eating trajectory and its length and position relative to the sleep event.

Circular statistics were selected to account for the 24-hour cycle of our primary data.
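A small example may help show why ordinary statistics are a poor fit for clock times. The sketch below (the function name `circular_mean_hours` and the example bedtimes are made up for illustration) compares the arithmetic mean of two bedtimes on either side of midnight with their circular mean.

```python
import math


def circular_mean_hours(hours):
    """Mean clock time (hours, 0 <= result < 24) of a list of clock times.

    Each time is mapped to an angle on the 24-hour circle, and the mean
    direction is the angle of the averaged unit vectors.
    """
    angles = [2.0 * math.pi * h / 24.0 for h in hours]
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_angle = math.atan2(mean_sin, mean_cos)
    return (mean_angle * 24.0 / (2.0 * math.pi)) % 24.0


bedtimes = [23.0, 1.0]  # 11 p.m. and 1 a.m.
arithmetic = sum(bedtimes) / len(bedtimes)  # 12.0, i.e. noon
circular = circular_mean_hours(bedtimes)    # midnight, up to rounding
```

The arithmetic mean lands at noon, twelve hours away from either bedtime, while the circular mean lands at midnight (numerically it may come out as a value just below 24.0 rather than exactly 0.0).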

The best known of the distributions used in circular statistics is the von Mises distribution [4], which was the first circular distribution ever proposed and is arguably the most widely studied.

There are a few common characteristics of the von Mises distribution. It is defined by two parameters: loc, a measure of location, and concentration, which controls the spread (a larger concentration gives a narrower peak). The distribution becomes a uniform distribution when its concentration is zero. Unlike other probability distributions, the von Mises distribution is a special case for a related

Each von Mises distribution has only one peak in its density function, while in a person's circadian cycle there can be several peaks. For instance, a user's meal records may show three peaks, say at 8:00 a.m., 12:00 p.m., and 8:00 p.m., which means this user most likely had meals at these three time points.

Each peak needs to be represented by its own distribution, so a participant's recorded days were modeled as a mixture of von Mises distributions, with every participant's data represented as a sum of K component von Mises distributions, each weighted by its own coefficient. Within a Bayesian framework, the conjugate prior for each loc parameter was an individual von Mises distribution; for each concentration we used a Gamma distribution, and for the weight coefficients a Dirichlet distribution. We assume that K is known; in practice, we tried several values of K ranging from 2 to 6 and kept the one that gave the best fit.
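To make the mixture concrete, the following sketch evaluates a three-component von Mises mixture on the 24-hour clock. It only evaluates a density and does not perform the Bayesian fit described above. The weights, locations, and concentrations are made-up values chosen to mimic meals near 8:00, 12:00, and 20:00, and the helper names are assumptions introduced here.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def von_mises_pdf_hours(t, loc, concentration):
    """Von Mises density over clock time t, in units of 1/hour (period 24)."""
    angle = 2.0 * math.pi * (t - loc) / 24.0
    per_radian = math.exp(concentration * math.cos(angle)) / (
        2.0 * math.pi * bessel_i0(concentration)
    )
    return per_radian * (2.0 * math.pi / 24.0)  # change of variable to hours


def mixture_pdf_hours(t, weights, locs, concentrations):
    """Weighted sum of K von Mises components; weights should sum to 1."""
    return sum(
        w * von_mises_pdf_hours(t, m, k)
        for w, m, k in zip(weights, locs, concentrations)
    )


# Illustrative (made-up) parameters for K = 3 meal peaks.
weights = [0.3, 0.3, 0.4]
locs = [8.0, 12.0, 20.0]
concentrations = [8.0, 8.0, 8.0]

grid = [i * 0.1 for i in range(240)]  # 0.0, 0.1, ..., 23.9 hours
density = [mixture_pdf_hours(t, weights, locs, concentrations) for t in grid]
total_mass = sum(density) * 0.1                     # numerically close to 1
uniform_value = von_mises_pdf_hours(3.0, 0.0, 0.0)  # concentration 0: 1/24
```

With a concentration of zero the density is flat at 1/24 per hour, matching the uniform limit noted earlier, and the mixture shows three separate peaks with dips between them.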

To approximate the posterior, which has the same functional form as the prior distribution, we used variational inference [5]. Compared to sampling-based methods it has two advantages: it is deterministic and it converges quickly.

With this approach we obtain the posterior parameters and then use the expectation of each posterior distribution as the Bayes estimate of the corresponding model parameter.
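As a worked illustration of this last step, suppose the approximate posteriors are a Dirichlet distribution with parameters \alpha_1, ..., \alpha_K for the weights, a Gamma distribution in shape-rate form with parameters (a_k, b_k) for each concentration, and a von Mises distribution with location m_k for each loc. These symbols and the shape-rate convention are assumptions introduced here, not notation from the original. Under those assumptions the point estimates would be:

```latex
\hat{w}_k = \mathbb{E}[w_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j},
\qquad
\hat{\kappa}_k = \mathbb{E}[\kappa_k] = \frac{a_k}{b_k},
\qquad
\hat{\mu}_k = m_k .
```

For the loc parameter the "expectation" is taken in the circular sense, that is, as the mean direction of the von Mises posterior.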

Using the mixed von Mises model, we are able to compare the posterior density functions among different users, in order to begin the process of seeing and capturing the diversity among the population. We view this distribution and approach as the simplest of those that we tried, and so it is also the main reference and comparison point for the data. It is also important to reemphasize that we chose the users who were active more than 21 days as active users and took all of their records as the population. For further analysis we then compared the posterior density estimate of the population to individual densities computed for the top ten users. We calculated the KL divergence between two density functions as a numeric measure, in keeping with the goodness-of-fit metric of the variational inference algorithm.

Gaussian process models have proven to be reliable estimators of many complex distributions [8]. They are a first choice for many applications in machine learning, and many robust computer algorithms have been built around their application to different types of data. For our application we looked mainly at variants of Gaussian process models that build the distributions as chains of conditional events [9]. This approach builds on the independent mixture model of the von Mises distribution by arguing, computationally, that the density fit is better seen as a series of conditional probability estimates. That can be simply seen as phrasing the question: given that a

Regression Model [9]. GPAR models are multi-output regression models that exploit dependencies between outputs to maximize predictive performance and can also capture nonlinear relationships between outputs. One specific feature for which we thought GPAR useful is its ability to define functions in terms of each other and to then stack them together.
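The chain-of-conditionals idea can be written out with the product rule. Writing y_1, ..., y_M for the ordered outputs (for example, successive events in a day) and x for the input, notation introduced here for illustration only, the joint density factorizes as:

```latex
p(y_1, \dots, y_M \mid x) \;=\; \prod_{m=1}^{M} p\left(y_m \mid x,\, y_1, \dots, y_{m-1}\right)
```

In a GPAR-style model each conditional factor is modeled by its own Gaussian process, so every output is regressed on the input and on the outputs that precede it in the ordering. This is why the ordering of the events matters.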

In order for GPAR to work for our data, we first ordered our data into three contributors, the total meal numbers ranged from six to eleven meals. After we ordered the data in this way, we defined distributions for each event. When we fit the von Mises distribution, there were many meals that overlapped with each other. As a result, for each individual we combined meals that were close to each other so that there were four meal events.

After working through the data component of GPAR, we defined each function to be a von Mises distribution. Because every sequence of events should start with the wake-up time, we used the same von Mises distribution fit defined previously for that event. For the other events, we created sample points from previous events and refitted, so that we have functions that depend on both the data and the previous events. Afterward, we stacked them and added some noise. Using GPAR's regression class GPARRegressor, we then fit our functions with the Gaussian process model. Finally, we plotted the observed points, the von Mises line, and the Gaussian process regression.

Multi-Output Gaussian Process Toolkit (MOGPTK), our second Gaussian process approach, builds from a Python package for multi-channel data modeling using Gaussian processes (GP) [10,11]. This toolkit addresses the need for a multi-output Gaussian process kernel and provides a natural way to train our model and then use the trained model to predict the following pattern. To apply this toolkit, we also need GPFlow [12], which is an extensive GP framework with a wide variety of implemented kernels, likelihoods, and training strategies. MOGPTK provides a base MOGP kernel from which specific kernels are derived. The base kernel provides the functionality to split the input data into multiple channels and process them by

Under this toolkit, we mainly focused on the MOSM kernel, the Multi-Output Spectral Mixture kernel [13]. The MOSM kernel is designed to provide a closed-form covariance function after applying the inverse Fourier transform. Following the Parra and Tobar paper, the cross-spectral density between channels i and j is modeled as a complex-valued squared-exponential (SE) function. In our approach, we applied the MOSM kernel to find the covariance, mean, magnitude, delay, and phase between every two variables, and then built our model under the MOSM kernel.

In order to make the state-space model work for our data, we converted our data to "counts" type for 8 behaviors as realizations of the observable variables (day-counts for 'wake', 'drinkOnly', 'smallSnack', 'largeSnack', 'smallMeal', 'mediumMeal', 'largeMeal', and 'sleep'). This reflects the type of data that the SSM package expects for fits. Our aim is to estimate the underlying latent variables, which in our case can be

Specifically, we used a hidden Markov model (HMM) [16] and a switching linear dynamical system (SLDS) model [17] to fit the data. The HMM, as a classic state-space model, is simple and easy to fit, while the SLDS can interpret the data more subtly with continuous hidden states [18].
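The conversion to count data can be sketched as follows. This assumes that "day-count" means the number of events of each behavior per day and that events arrive as `(day, behavior)` pairs; both the record layout and the function name `daily_count_vectors` are assumptions made for illustration, and the step of passing the counts to a state-space library is not shown.

```python
from collections import Counter

# The eight behaviors named in the text, in a fixed column order.
BEHAVIORS = ["wake", "drinkOnly", "smallSnack", "largeSnack",
             "smallMeal", "mediumMeal", "largeMeal", "sleep"]


def daily_count_vectors(events):
    """Turn (day, behavior) records into one 8-element count vector per day.

    Days are returned in sorted order; a behavior that does not occur on
    a given day gets a count of 0.
    """
    events = list(events)
    counts = Counter(events)
    days = sorted({day for day, _ in events})
    return days, [[counts[(day, b)] for b in BEHAVIORS] for day in days]


events = [
    (1, "wake"), (1, "smallMeal"), (1, "largeMeal"), (1, "smallSnack"), (1, "sleep"),
    (2, "wake"), (2, "drinkOnly"), (2, "drinkOnly"), (2, "mediumMeal"), (2, "sleep"),
]
days, vectors = daily_count_vectors(events)
```

Each row of `vectors` is one observation for a count-based emission model, with columns ordered as in `BEHAVIORS`.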

To compare the two state space models with the GP models and the mixed von

We conclude this presentation of results for the population fits by reference to

between the models, these may not be sufficiently different to enable a clear winner or subset of losers in the models to be discriminated. For that reason we evaluated how individuals are scored in the following section.

To evaluate our five different models we explored how individuals were scored in each of them. This did not lead to a clear winner (i.e. one best model), but did let us evaluate how different types of individuals would be represented within the different formulations.

What we feel is thus important, for model selection in this instance, is to closely evaluate which properties of which individuals are most critical for a clinical application. Since we do not have that additional data, there is no easy way to discriminate between the different models.

Table 1. Comparison summary table for 5 methods on primary data. We use 2 error metrics: KL divergence (KLD) and mean absolute error (MAE). We calculate the KLD between each method's fit and the Gaussian kernel density estimate (Gkde) of the data. (Since KL(p, q) ≠ KL(q, p) in general, we use the mean of the two values.) For the mixed von Mises model the calculation is straightforward; for the GP models and SSM models, we can only sample from the model results and get data counts for time windows. In this way we recover a sample (size 1000) from each model, apply Gkde to the sample, compare it to the Gkde of the original primary data, and obtain the KLD. Repeating this procedure 100 times gives an estimate of the KLD and the corresponding confidence interval (CI). For the MAE, we compare the sample from each model with the original primary data using histograms (48 bins), then calculate the mean absolute difference of the counts in each bin. The MAE calculation for the GP models and SSM models is straightforward; for the mixed von Mises model we use the inverse function method to sample from the estimated distribution. Repeating this procedure 100 times gives an estimate of the MAE and the corresponding confidence interval.

state space models provide better fits. Note that only one of the two state space models (HMM) is able to give some indicators of internal variables that underlie shifts differing between early and late initiations.

One way to further compare the different models is to plot the individual KL divergence scores for each model relative to a common reference probability. This is shown in two ways in Fig 9 and Fig 10. Note that a set of individuals (8 in total; three of them are shown in Figs 6, 7 and 8, while the rest are in the supporting information) for comparison is shown along the x-axis. For Fig 9 the ordering along the x-axis is from most consistent score to most divergent score. The distribution of scores in Fig 10 illustrates that there is a non-symmetric distribution with a very long tail on the right-hand side.
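The two error metrics used in Table 1 can be sketched with a simplified stand-in. Instead of a Gaussian kernel density estimate, the example below compares smoothed 48-bin histograms, which is a deliberate simplification and not the procedure used in the paper. The synthetic samples, the smoothing constant `pseudo`, and all function names are assumptions made for illustration.

```python
import math
import random


def histogram(times, bins=48):
    """Counts of clock times (hours) in `bins` equal-width bins over 24 h."""
    counts = [0] * bins
    for t in times:
        counts[int((t % 24.0) / 24.0 * bins) % bins] += 1
    return counts


def kl(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(counts_p, counts_q, pseudo=0.5):
    """Mean of KL(p||q) and KL(q||p) after additive smoothing of the counts."""
    def normalise(counts):
        total = sum(counts) + pseudo * len(counts)
        return [(c + pseudo) / total for c in counts]
    p, q = normalise(counts_p), normalise(counts_q)
    return 0.5 * (kl(p, q) + kl(q, p))


def mean_absolute_error(counts_a, counts_b):
    """Mean absolute difference of bin counts between two histograms."""
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b)) / len(counts_a)


rng = random.Random(0)


def sample_hours(loc_hours, kappa, n):
    """Draw n clock times from a von Mises distribution centred at loc_hours."""
    mu = 2.0 * math.pi * loc_hours / 24.0
    return [rng.vonmisesvariate(mu, kappa) * 24.0 / (2.0 * math.pi)
            for _ in range(n)]


data = sample_hours(12.0, 4.0, 1000)        # stand-in for the primary data
good_model = sample_hours(12.0, 4.0, 1000)  # sample from a well-matched model
poor_model = sample_hours(18.0, 4.0, 1000)  # sample from a shifted model

h_data, h_good, h_poor = (histogram(s) for s in (data, good_model, poor_model))
kld_good = symmetric_kl(h_data, h_good)
kld_poor = symmetric_kl(h_data, h_poor)
mae_good = mean_absolute_error(h_data, h_good)
mae_poor = mean_absolute_error(h_data, h_poor)
```

Both metrics come out smaller for the well-matched sample than for the shifted one. Repeating the sampling step many times, as described in the Table 1 caption, would give a spread for each metric from which a confidence interval could be estimated.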

Circadian biology is an ancient part of human physiology and reflects our evolution within the context of a twenty-four hour day [19]. The adaptation to the light/dark cycle of each 24-hour day has been an important regulator of many biological systems. While the molecular, cellular, and tissue ramifications of these adaptations remain an active area of research, every individual makes daily decisions about eating and sleeping without much thought or context.

Recent work has shown that the timing of eating has a major impact on circadian function [20]. Early work in circadian physiology emphasized light and sleep as the main drivers of circadian rhythms. With the realization of the importance of the timing of eating, the coupling between peripheral and central components of circadian biology came into focus. With this awareness has come an improved ability to define circadian misalignment as due to behavior (for example shiftwork) that does not support a consistent 24-hour rhythm that aligns light/dark, sleep and meals [2].

With the growing acceleration of technology, the ability of many individuals to ignore light/dark and to work at many hours has become common. While the new habits that this brings may seem to have only a trivial impact on a daily level, they can lead to significant physiological stress over years. Evaluating the relative impact of daily habits over many years is a challenge that is not yet fully addressed. To help interpret what a circadian daily habit means for human health, there is a need for a summary of many days of behavior and a way to relate the individual's behavior back to both optimal behavior and the statistical behavior of many others. The NHANES project has provided many families and pediatricians with a dataset that lets a comparison of an individual child relative to the population be easily defined [3]. Our work is in the same tradition, with the expectation that the methods defined in this paper can provide the entry point to a larger, community-defined dataset for summarizing, interpreting, and aiding in the evaluation of circadian health.

Wednesday is simply different from other Wednesdays. This is a challenging problem with many algorithms that have been defined for changepoint detection, but without a clear indication for when a particular data point represents a genuine change or simply a fluctuation within the same stable distribution.

An additional type of changepoint is that seen with shiftwork. While we did not have participants in Daily24 that shared a schedule similar to the traditional shiftwork schedule, we do feel that the Daily24 App should be applicable to shiftwork schedules as readily as the 'normal' schedules that we sampled. In addition, the changepoint algorithm should be capable of detecting shifts in daily schedule that reflect an 'on-shift' day relative to days that are 'off-shift'.

Defining the optimal penalty function for the Changepoint Detector. Participants will change their schedule, sometimes due to a work change, sometimes due to a weekday versus a weekend. To simplify the initial analysis, we made the very strong assumption of a single stable pattern. To define when a shift from that pattern is due to a schedule change versus an unusual event from the same stable distribution is the challenge of optimizing a changepoint detector. We present, in this figure, the first steps into defining one for Daily24 data.

The changepoint approach that we used is built within the ruptures Python package and is based on multiple papers [21]. We evaluated a range of different possible algorithms and different penalty functions. While we were not able to tune the changepoint detection to reliably label all changes correctly, we did get results that suggest that implementing changepoint detection would be important for the generalization of our results. In particular, we suggest that changepoint measures, even if imperfect, can be a large help in identifying individuals with a large weekend effect or with shiftwork schedules.
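The role of the penalty can be illustrated with a small self-contained detector. The sketch below is not the ruptures implementation and not the configuration used in this study. It is a plain penalized least-squares segmentation (optimal partitioning by dynamic programming) applied to a synthetic series of bedtimes expressed as hours past noon, a convention assumed here to avoid wrap-around at midnight. The function name, the synthetic series, and the penalty values are all made up for illustration.

```python
def detect_changepoints(signal, penalty):
    """Penalised least-squares segmentation by dynamic programming.

    Returns the sorted indices at which a new segment starts. A segment is
    scored by its sum of squared deviations from the segment mean, and
    every additional changepoint costs `penalty`.
    """
    n = len(signal)
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        prefix[i + 1] = prefix[i] + x
        prefix_sq[i + 1] = prefix_sq[i] + x * x

    def cost(start, end):
        """Squared error of signal[start:end] around its own mean."""
        total = prefix[end] - prefix[start]
        return (prefix_sq[end] - prefix_sq[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)   # best[t]: minimal penalised cost of signal[:t]
    best[0] = -penalty       # so that the first segment carries no penalty
    last_start = [0] * (n + 1)
    for end in range(1, n + 1):
        candidates = [(best[s] + cost(s, end) + penalty, s) for s in range(end)]
        best[end], last_start[end] = min(candidates)

    changepoints, end = [], n
    while end > 0:
        start = last_start[end]
        if start > 0:
            changepoints.append(start)
        end = start
    return sorted(changepoints)


# Synthetic bedtimes in hours past noon: about 23:00 for 20 days, then about
# 02:00 for 20 days, with a small repeating day-to-day jitter.
jitter = [0.2, -0.1, 0.0, 0.1, -0.2]
signal = [11.0 + jitter[i % 5] for i in range(20)]
signal += [14.0 + jitter[i % 5] for i in range(20)]

moderate = detect_changepoints(signal, penalty=2.0)    # finds the real shift
too_high = detect_changepoints(signal, penalty=200.0)  # misses the shift
too_low = detect_changepoints(signal, penalty=0.001)   # splits on the jitter
```

With a moderate penalty the single schedule change at day 20 is recovered, a very large penalty suppresses it, and a very small penalty reports ordinary day-to-day variation as changes. This is the tuning trade-off described above.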

While the analysis we present is not ready for clinical work, the approach that we outline may provide a framework for how daily habits and their summary can be presented within a clinical setting.

As an example, we imagine that the synthetic data is a reasonable representation of the range of observed human behavior. This lets us describe the approach, but we emphasize that, without real participant data, the ideas presented are still at the idea stage.

With the assumption of five converged population distributions, representing the early and late chronotypes, the shiftworkers, the weekend/weekday shifting schedules and those with a long-term habit of late meals, we can define each person's circadian schedule relative to their main reference population. We can imagine a type of

This may be especially important for clinical issues surrounding circadian misalignment, for example in high-risk populations like shift workers [2]. We can imagine the ideas of this paper being combined with ideas for optimal re-entrainment from travel with light schedules to define new schedules for optimal recovery from misaligned light/dark and eating/sleeping [22,23].

An additional focus area for this type of analysis is providing a continued assessment of the impact of intermittent fasting on human health. This was our own entry point to Daily24, and the analysis of this paper could be used to more fully characterize the impact of the timing of eating by comparing individuals back to the community. There has already been much valuable research on the impact of the timing of eating, and this framework could help to move these types of questions from individual anecdotal studies, or from purely research-based studies, into a clinical setting by providing a consistent framework for comparison [19,20].

Fig 13. Interpretation of data with chronotypes and for the clinical setting. To enable clinical care teams and patients to readily see what their data means and to help with communication, something like this data presentation may be used. What is plotted is the strongest set of events distributed over the participant's logged timings for eating and for sleep. By plotting this on a relative axis, with additional use of color and symbols, the clinical meaning can be made clear.

A significant unsolved problem for circadian physiology is how to extrapolate from a daily habit to the impact of that habit over years. While our current study does not connect health outcomes with the daily habit, it is a framework for quickly summarizing the daily habits of an individual within the context of a larger group. While the direct clinical application of this approach will need still larger datasets and still more analysis work to connect the distributions to health outcomes, we believe that this framework can provide an important anchoring point within a population for the interpretation of many days of circadian data. This is a real improvement over a scatter chart of an individual's data and can potentially provide a way for nuanced discussion of an individual's daily habits within the context of a clinical office visit.

We emphasize that the approach outlined will need to be extended to a larger dataset with more varied individuals (by age, gender and race). The importance of potential cultural bias (since all participants are in the US) is also a factor that should be considered in enlarging the dataset.

Furthermore, the optimal approach for how long an individual's circadian schedule should be tracked, and with what confidence bars, will need to be worked out. While some confidence bars can be supplied based on assuming that (for example) a two-week schedule is fully representative of a two-month or two-year schedule, this may clearly lead to a large systematic error if the assumption of an unbiased and consistent sample is wrong.

Supporting information

S1 Individual example-1. Active more in the early part of the day.