Scaling laws in spoken language associated with cognitive functions

A decline in cognitive functions due to aging has led to critical problems in modern society, and it is imperative to develop a method to predict the decline or related diseases, including dementia. Although it has been expected that language could be a sign of the cognitive decline, it remains less understood, especially in natural conditions. In this study, we recorded the large-scale data of one million words from group conversations among healthy elderly people and analysed the relationship between spoken language and cognitive functions based on scaling laws, namely, Zipf’s law and Heaps’ law. We found that word patterns followed these scaling laws irrespective of cognitive function, but the variations in scaling laws were associated with cognitive functions. Moreover, using generative models, we uncovered the relationship between the variations and cognitive functions. These results indicate that scaling laws in language can be a biomarker for the cognitive decline.


Main
Understanding the aging of brain functions and predicting cognitive decline are crucial for modern aging societies because mental health problems in elderly people have a huge impact on their daily life and cause significant medical and economic costs in many countries around the world [1,2]. The most common example of cognitive decline with age is dementia, especially Alzheimer's disease, which leads to the impairment of executive functions or to behavioural and psychological symptoms [3,4]. While the neuropathological mechanisms of dementia are considered remarkably complex [2], it is urgently necessary to develop practical methods to predict the decline or maintain healthy cognitive functions. Language is expected to be a key sign of the state of cognitive functions [5-9].

Language is the most sophisticated means of communication for humans; it allows abstract thoughts and underlies our various social activities, ranging from daily conversations to cultural accumulation [10,11]. Complex language processing clearly comes from our brain functions; thus, impairments in cognitive functions can result in language disorders [12,13]. Therefore, it is imperative to understand the relationship between language and aging in order to predict cognitive functions using language data. Some methods for predicting cognitive diseases using language data have been proposed [5-9,12-14]. For instance, scores of verbal fluency tests (in which participants produce as many words as possible from a category in 60 seconds) are one of the indices used to distinguish people with dementia or mild cognitive impairment (MCI) from healthy people [12]. Recently, it has become easier to record and analyse large-scale language data such as ordinary conversations owing to the development of devices and algorithms [7-9].
Therefore, we expect that extracting information about cognitive functions from large-scale data makes it possible to develop a more effective method. However, the understanding of the relationship between cognitive functions and language in natural conditions such as ordinary conversations is still lacking. In this study, we focused on statistical laws, namely, the scaling laws of spoken language in conversations.

Scaling laws, which are generally defined as $f(x) \sim x^{\mu}$, where $\mu$ is a scaling exponent (here, "$\sim$" means that the left side is proportional to the right side), are ubiquitous in natural phenomena [15]. Interestingly, they are widely observed in phenomena related to brains or behaviour, such as neural dynamics, decision-making, semantic memory, memory retrieval, cognition, movements, language, and social dynamics [16-24]. The most famous example can be seen in the word patterns of human language [16]. Previous studies that focused on language patterns in corpus data from written texts or spoken language [16,19,25,26] have found two main scaling laws, namely, Zipf's law and Heaps' law. Zipf's law states that the frequency of a word with frequency rank $r$ follows a power-law distribution $P(r) \sim r^{-\alpha}$, suggesting that a huge number of words are rarely used, while a small number of words are used extremely often. Since it was reported that the exponent $\alpha$ is close to 1 in most cases [27], the most frequent word will appear twice as often as the second most

although one may expect that talkative people have high cognitive scores, we found no evidence of a relationship between the cognitive score and the total number of words spoken (Spearman's correlation coefficient ρ = −0.11, p = 0.38). Second, we straightforwardly calculated a type-token ratio, which is defined as the number of different words divided by the total number of words.
The Pearson's correlation coefficient between the ratio and the cognitive score was 0.18 (p = 0.15); thus, there was no significant relationship between the type-token ratio and the cognitive scores. Then, we analysed the relationship between the cognitive score and the exponent of Zipf's law and found no significant relationship (correlation coefficient r = −0.23, p = 0.07; Fig. 2a; Supplementary Table 3).

Then, we examined the relationship between the exponent β of Heaps' law and cognitive scores and found a significant positive correlation between the exponents β and cognitive scores (Supplementary Table 3). The participants with higher cognitive scores were likely to have word patterns with a higher Heaps' exponent β, and vice versa. On the other hand, the type of conversation and the age of participants were not associated with the exponent (p = 0.75 and p = 0.72, respectively; Supplementary Table 3). We then confirmed that the relationship was robust across the original cognitive scores. The correlation coefficients between the exponents and each of the four original cognitive scores (MoCA-J, logical memory I + II, digit symbol coding, digit span) were also significant (Supplementary Table 4). Thus, these results indicate that the variation in Heaps' law could come from differences in cognitive functions.

When one uses conversational data as a biomarker of cognitive decline, it is useful to investigate the relationship between the number of words (i.e. data length) and the degree of association of scaling laws with cognitive scores. We calculated the correlation coefficient between the exponents and cognitive scores with different data lengths. Fig. 3a shows that for Zipf's law, no correlation was found, which is consistent with the result obtained when all data were used. In contrast, Fig. 3b shows that the longer the data length, the higher the correlation coefficient between Heaps' exponent and cognitive score. Importantly, this indicates that it is not necessary to analyse data sets with tens of thousands of words for each participant. Therefore, the result suggests that we could quantify the association with cognitive scores with as little as an hour or two of utterances per person, suggesting that we can extract useful information about cognitive functions in realistic conditions such as ordinary conversations.
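As a minimal sketch of the quantities discussed above, the following Python code computes the type-token ratio, the vocabulary growth curve N(M), and a Heaps' exponent estimated by ordinary least squares on the log-log curve. The function names and the `m_min` cut-off are illustrative assumptions; the study itself fitted a double power law in R, which this simplified single-slope fit does not reproduce.

```python
import math


def vocabulary_growth(tokens):
    """Return N(M): the number of different words among the first M tokens."""
    seen, growth = set(), []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def type_token_ratio(tokens):
    """Number of different words divided by the total number of words."""
    return len(set(tokens)) / len(tokens)


def heaps_exponent(tokens, m_min=1):
    """Least-squares slope of log N against log M for M >= m_min.

    m_min is a hypothetical cut-off for skipping the initial linear regime;
    it is not a parameter taken from the paper.
    """
    growth = vocabulary_growth(tokens)
    xs = [math.log(m) for m in range(m_min, len(growth) + 1)]
    ys = [math.log(growth[m - 1]) for m in range(m_min, len(growth) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def exponent_by_data_length(tokens, lengths):
    """Estimate the exponent on prefixes of increasing length."""
    return {n: heaps_exponent(tokens[:n]) for n in lengths if 2 <= n <= len(tokens)}
```

Calling `exponent_by_data_length(tokens, [1000, 5000, 10000])` on one participant's pooled word list would give the per-length exponents that could then be correlated with cognitive scores across participants.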

Generative models bridging scaling laws and cognitive functions
In this section, we focus on the Heaps' exponent because we obtained evidence of a significant association with cognitive functions. Next, we asked why the cognitive functions are associated with the Heaps' exponent. To bridge them mechanistically, we used a generative model for scaling laws in language developed by Gerlach and Altmann [25], although various models have been proposed [29]. The model we used is a stochastic model originally based on the Yule process [35], and can produce Zipf's law and Heaps' law from a simple assumption (see Methods). Here, we focused on a parameter related to the decay rate of the probability of new word production and interpreted it as a cognitive function. We analysed the relationship between the scaling exponent β and the parameters (see Methods). We set the maximum number of words $M_{max} = 20{,}000$, which is close to the empirical data length and relatively small compared with book corpus data (e.g. $10^{9}$ words in [25]). Moreover, we fitted the double power-law model to the simulated data and estimated the exponent. Fig. 4a shows the relationship between the number of words and the number of different words derived from the model. When the parameter γ of cognitive functions changes, the patterns of Heaps' law can change (Fig. 4a). Further, the parameter corresponds closely to the exponent β, and a lower cognitive function parameter produces Heaps' law with a lower β (Fig. 4b), suggesting that the exponent β represents the growth rate of new words: higher values correspond to a higher growth rate of new words, and vice versa. Note that the model and the relationship between the parameters and the exponent we observed are not novel, but we confirmed that the statistical fitting for simulated data from the model can recover the relationship even for a small number of words.
In this study, we quantitatively investigated the natural spoken language of healthy elderly people with various cognitive function scores from the viewpoint of scaling laws in word patterns and explored the relationship between the scaling laws, namely Zipf's law and Heaps' law, and cognitive function scores (Fig. 1). By fitting Zipf's law and Heaps' law, we found that the scaling laws in spoken language were robust irrespective of cognitive function scores. We did not find a significant relationship between the Zipf's exponent and the cognitive score (Figs. 2a, 3a). In contrast, the exponents of Heaps' law, that is, the slope of the relationship between the number of words and the number of different words, were significantly associated with cognitive function scores (Figs. 2b, 3b). The relationship with Heaps' law was supported by the stochastic model for word production patterns (Fig. 4). Moreover, large Heaps' exponents were related to obtaining new words from others and using them in conversations (Fig. 5). Note that the participants were healthy elderly people screened using the criterion MMSE score ≥ 24, although there were variations in their cognitive scores. Therefore, we can exploit the relationship with scaling laws as a biomarker to detect the tendency of cognitive decline, even for people regarded as being healthy.

A feature of our approach to conversational data is that large-scale data make it possible to analyse statistical patterns such as scaling laws, which have been considered robust properties across languages and cultures [36]. Therefore, although the participants in our study were Japanese, our results and implications could be extended to other languages. Furthermore, by focusing only on the statistical patterns and scaling laws, we can exclude the meaning of words or the contents of a conversation and separate the frame (i.e. scaling function) from the variations (i.e. exponent). This can be regarded as a type of coarse-graining method for large and complex data that keeps important information and could extract understandable characteristics from large language data. In this sense, our approach is different from machine learning approaches that specialise in prediction rather than in understanding the mechanisms behind the word patterns. Thus, we consider that our approach does not need much data and can detect the relationship between cognitive functions and word patterns even for healthy participants.

Our findings suggest that healthy elderly people with variations in cognitive scores still follow the scaling laws (Fig. 1). In contrast, MCI or dementia patients might not follow the scaling laws because repeating a certain word due to critical cognitive impairment or memory disorder may result in the collapse of the scaling laws. Previous studies have reported that the Zipf's exponents of schizophrenia patients are different from those of healthy people [31]. However, the rigorous statistical techniques that we used in the present study for fitting power-law distributions were developed recently; thus, they can reveal the patterns of scaling laws more accurately [34]

The production of new words is important for communication or creating new ideas. There would be two points regarding the mechanisms of new word production. First, the number of new words reflects how much the participants memorise things, particularly in long-term memory. Theoretically, it was reported that Heaps' law is related to the size of potential words [37]. Second, the number of new words suggests an ability of cognitive functions to take new information into memory and to use it, which could be crucial especially for elderly people.
People must cope with complex and unpredictable environments, but this ability could decline with aging.

Some of the scaling laws observed in brain and behaviour have been reported to be associated with functions or adaptability. For example, the movement patterns of various animals, including humans, often follow a scaling law, which can result in an efficient searching strategy for unpredictable environments [21]. As another example, it has been observed that brain dynamics poised at a critical point between order and disorder follow a scaling law [38], which makes it possible to achieve decision-

Data collection
To obtain spoken language data from healthy elderly people, we recorded conversations among the participants. The participants were recruited from the Tokyo Silver Human Resources Center and were community-living healthy Japanese retired adults who speak Japanese as their mother tongue. To limit participants to healthy people, the exclusion criteria were set as follows: dementia; neurological impairment; any disease or medication known to affect the central nervous system; and

participants in the other eight groups made a short presentation on a pre-determined theme (e.g. favourite places in the neighbourhood), which was given in advance each week and included Q&A sessions for participants within the same group. The latter is a method that we had developed previously for preventing dementia [40], but here we focused not on the details and effects of the method, but on the conversational patterns extracted from the recorded conversational data.

Conversational data pre-processing
To investigate word production patterns, we quantitatively analysed conversation transcriptions derived from the recorded audio data. For the analysis, we first applied Google Cloud Speech-to-Text (Google, Mountain View, CA, 2018) to transcribe the audio automatically into text, and then manually checked all the text by comparing it to the audio data and fixing any mistakes. Second, we automatically decomposed all text into words using MeCab (ver. 0.996), which is a tool for Japanese morphological analysis based on conditional random fields [45]. Finally, we obtained the words that each participant spoke and used all the data in our analysis by pooling all the sessions, from the first to the fourteenth. These analyses were conducted using R (ver.

Cognitive function scores
We conducted a principal component analysis to summarise four cognitive function scores: MoCA-J, WAIS III logical memory I + II (delayed), digit symbol coding, and digit span (forward + backward). The first principal component (PC1) explains 40.6% of the total variance, and the coefficients of each cognitive score on PC1 have the same sign because the four cognitive function scores are positively correlated with each other. Therefore, we can conclude that the larger the value on PC1, the better the cognitive function. Hereafter, we use the value on PC1 as the 'cognitive function score'. Additionally, for simplicity of interpretation, the cognitive score was normalised to mean = 0 and SD = 1.
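The summarisation step above can be sketched as follows. This is only an illustration under stated assumptions: the four scores are standardised before the analysis (the text does not say whether the covariance or the correlation matrix was used), the first component is found by power iteration instead of a library routine, and all function names are hypothetical.

```python
import math


def standardise(values):
    """Rescale a list of numbers to mean 0 and sample SD 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_principal_component(rows, iterations=500):
    """rows: one list of test scores per participant.

    Returns (loadings, explained_ratio, normalised_pc1_scores).
    Assumption: PCA is run on the correlation matrix of the scores.
    """
    columns = [standardise(list(col)) for col in zip(*rows)]
    k, n = len(columns), len(rows)
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / (n - 1)
             for j in range(k)] for i in range(k)]

    # Power iteration converges to the leading eigenvector of corr.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the eigenvalue; the trace of corr equals k.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    if sum(v) < 0:  # orient PC1 so that larger means better scores
        v = [-x for x in v]
    pc1 = [sum(v[j] * columns[j][i] for j in range(k)) for i in range(n)]
    return v, eigenvalue / k, standardise(pc1)
```

With four positively correlated tests, the loadings share one sign, so the normalised PC1 value can be read directly as a composite score.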

Zipf's law and Heaps' law
To quantitatively analyse word production patterns, we pay attention to scaling laws in language, which have been investigated in the context of statistical linguistics [16]. Previous studies have reported that most language data, including corpus data, robustly follow Zipf's law and Heaps' law [25,27,28,46]. Zipf's law states that the relationship between the frequency rank $r$ of a word and its frequency $P(r)$ is described as $P(r) \sim r^{-\alpha}$, where $\alpha$ is the scaling exponent and has been reported to be approximately 1, and "$\sim$" means that the left side is proportional to the right side. Heaps' law describes how the number of new words grows: the relationship between the number of words $M$ and the number of different words $N$ follows a function $N \sim M^{\beta}$. In this study, we focused on whether words in the spoken language of healthy elderly people follow scaling laws and, if so, on the variation of the exponents in the scaling relationships.

For fitting the distribution to the rank-frequency data, we compared seven candidate distributions including a power-law, shifted power-law, cutoff power-law, and so on (see the details under Supplementary Note). First, we fitted each candidate model to the data using maximum likelihood estimation (MLE) [34] and estimated the parameters (i.e. exponents) using the Nelder-Mead method for maximising the log-likelihood. Then, we selected the best model using AICs and Akaike weights. As for Heaps' law, we used the least-squares method to estimate the scaling exponents and calculated the R-squared value to evaluate the goodness of fit.

To investigate how the variations of the scaling exponents in Heaps' law emerge from differences in cognitive functions or behavioural rules, we used a mathematical generative model based on a previously proposed stochastic model for book corpus data [25].
In the model, new words are produced by the following rules. When a word is produced, a novel word is produced with probability $p_{new}$ and an already used word is used with probability $1 - p_{new}$. The probability $p_{new}$ is updated as a function of $n_c$, the number of different words, every time a novel word is produced. The update rule is given by equation (1), where $a$ is a decay parameter and $s$ is a small constant; this rule can produce Heaps' law. Here, we interpret $a$ as $1/\gamma$, where $\gamma$ is a parameter of cognitive functions, because $a$ directly determines the exponent β [25]. Moreover, we observed a double scaling law in the relationship between the number of words and the number of different words: for $M < m$, $N = M^{1}$, and for $M \ge m$, $N \sim M^{\beta}$. To model this property, we adopted a different rule for small $M$ following Gerlach and Altmann [25]. If $N$ is small (here $N < 32$, the empirical mean value of the break points $m$), we assumed that new words were produced with probability 0.8; the results were robust to higher values. For $N \ge 32$, we used equation (1). The two rules made it possible to switch from a linear relationship to a sublinear relationship.
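The two-regime rule described above can be illustrated with a short simulation. Because the explicit form of equation (1) is not shown in this text, the sketch below assumes a multiplicative decay $p_{new} \leftarrow p_{new}\,(1 - a/(n_c + s))$ applied each time a novel word appears; this form, the default value of `s`, and all function and parameter names are assumptions for illustration and may differ from the authors' model. Only the counts of words are tracked, so the sketch reproduces vocabulary growth (Heaps' law) but not the rank-frequency distribution.

```python
import random


def simulate_vocabulary_growth(gamma, m_max=20000, n_break=32,
                               p_initial=0.8, s=1.0, seed=0):
    """Return N(M) for M = 1..m_max under an ASSUMED decay rule.

    gamma     : 'cognitive function' parameter, with a = 1 / gamma
    n_break   : below this vocabulary size p_new stays at p_initial
    s         : small regularising constant (value is a guess)
    """
    rng = random.Random(seed)
    a = 1.0 / gamma
    p_new = p_initial
    n_distinct = 0
    growth = []
    for _ in range(m_max):
        # The very first word is necessarily novel.
        if n_distinct == 0 or rng.random() < p_new:
            n_distinct += 1
            if n_distinct >= n_break:
                # Assumed form of the update: multiplicative decay of p_new.
                p_new *= max(0.0, 1.0 - a / (n_distinct + s))
        growth.append(n_distinct)
    return growth


if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        growth = simulate_vocabulary_growth(gamma)
        print(gamma, growth[-1])
```

Under this assumed rule, a continuum argument ($dN/dM = p_{new}$ with $p_{new} \propto N^{-a}$) suggests that $N$ grows roughly as $M^{1/(1+a)}$, so a larger γ yields a steeper growth curve, in line with the direction of the relationship reported for Fig. 4b.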