Fractal analysis of muscle activity patterns during locomotion: pitfalls and how to avoid them

Time-dependent physiological data, such as electromyogram (EMG) recordings from multiple muscles, is often difficult to interpret objectively. Here, we used EMG data gathered during mouse locomotion to investigate the effects of calculation parameters and data quality on two metrics for fractal analysis: the Higuchi’s fractal dimension (HFD) and the Hurst exponent (H). A curve is fractal if it repeats itself at every scale or, in other words, if its shape remains unchanged when zooming in the curve at every zoom level. Many linear and nonlinear analysis methods are available, each of them aiming at the explanation of different data features. In recent years, fractal analysis has become a powerful nonlinear tool to extract information from physiological data not visible to the naked eye. It can present, however, some dangerous pitfalls that can lead to misleading interpretations. To calculate the HFD and the H, we have extracted muscle synergies from normal and mechanically perturbed treadmill locomotion from the hindlimb of adult mice. Then, we used one set per condition (normal and perturbed walking) of the obtained time-dependent coefficients to create surrogate data with different fluctuations over the original mean signal. Our analysis shows that HFD and H are exceptionally sensitive to the presence or absence of perturbations to locomotion. However, both metrics suffer from variations in their value depending on the parameters used for calculations and the presence of quasi-periodic elements in the time series. We discuss those issues giving some simple suggestions to reduce the chance of misinterpreting the outcomes. New & Noteworthy Despite the lack of consensus on how to perform fractal analysis of physiological time series, many studies rely on this technique. Here, we shed light on the potential pitfalls of using the Higuchi’s fractal dimension and the Hurst exponent. We expose and suggest how to solve the drawbacks of such methods when applied to data from normal and perturbed locomotion by combining in vivo recordings and computational approaches.


Introduction
The collection of physiological data over different time spans is crucial for neuroscience research 45 (Sherrington 1906). However, after collecting the data there always remains the issue of finding convenient and informative ways of analyzing it. For instance, electromyogram (EMG) recordings from multiple muscles are widely used for their important role in the study of locomotion (Bizzi et al. 2008). Despite their intuitive meaning (a muscle is either active or inactive and thus either does or does not produce a recordable signal), EMG recordings contain 50 complex features that might not always be detectable and/or quantifiable by traditional analysis methods. To uncover these local or global features of the signal, many linear and nonlinear approaches exist (Kantz and Schreiber 2004;Ting et al. 2015). Here, we used linear decomposition to extract time-dependent motor primitives from EMG data (Santuz et al. 2019).
Then, we selected two nonlinear metrics derived from fractal analysis and endeavored to answer 55 two main questions: 1) are those metrics able to detect differences in EMG-derived signal (i.e. motor primitives) when external perturbations are added to locomotion? And 2) what are the potential pitfalls -and ways to avoid them -of using those metrics? Nature exhibits similar patterns at different scales. A snowflake, when looked through a magnifying glass, shows shapes comparable to those visible to the naked eye. And so do the 60 branches of a tree, the coastline of an island, the course of a river, and many other natural or artificial entities (Mandelbrot 1983). Using mathematics, it is possible to build some geometric constructs that have the property of repeating themselves at every level of zoom, again and again. In other words, they are self-similar (Mandelbrot 1983). Such constructs have been called "fractals" by the mathematician Benoit B. Mandelbrot (Mandelbrot 1983). A celebrated 65 example of fractal, the famous Koch curve (Von Koch 1904) was discovered by another Submitted Manuscript 5 mathematician, Helge von Koch. As shown in the detail of Fig. 1A, the Koch curve repeats itself at every level of zoom: a demonstration of self-similarity. Mandelbrot, following the empirical work of yet another mathematician, Lewis Fry Richardson, asked himself whether it would be possible to quantify the "degree of complication" of such curves (Mandelbrot 1967;Richardson 70 1961). If in Euclidean geometry a line has dimension 1 and a plane has dimension 2, why not assigning a "fractional" (or fractal) dimension to a curve that is neither a straight line nor, say, a filled rectangle (Mandelbrot 1967)? Since the seminal work of Mandelbrot, the term "fractal analysis" has been used to denote a wide range of approaches, some of them pre-existing, that deal with self-similar entities (Mandelbrot 1983). 75 Fractal analysis has been used, amongst other disciplines, in hydrology (Hurst 1951), geography (Mandelbrot 1967), genetics (Peng et al. 1994), neuroscience (Lutzenberger et al. 1992), finance (Mandelbrot 1997), physiology and medicine (Losa et al. 2005), biomechanics (Scafetta et al. 2009), and seismography (Flores-Marquez et al. 2012). For instance, it can be used to predict stock market shifts and an example is given in Fig. 1B. Here it is shown the daily stock closing 80 price (NASDAQ Composite 2020) of Twitter, a popular social media amongst scientists (Cheplygina et al. 2020). Due to the SARS-CoV-2 and related COVID-19 outbreak (WHO 2020), after around 60 working days of positive trend, the stock price started to oscillate only to fall drastically by 44% in just 19 working days. The Hurst exponent (H, a metric for fractal analysis, see details in the methods and caption of Fig. 1) values reported in Fig. 1B, show the 85 increasing and decreasing trends before and after the COVID-19 crisis with 0.5 < H < 1.
However, when the time series is analyzed as a whole and has many oscillations as those we might find in EMG recordings from locomotion (see e.g. Fig. 2A-B), H values do not display any trend and 0 < H < 0.5. This is an indication that different fractal dimensions can be attributed to Submitted Manuscript 6 the same curve, depending on the scale and subset considered. The discrepancy can be explained 90 by the fact that data coming from real-world scenarios cannot always be classified as self-similar (Mandelbrot 1985). As shown in the enlarged circle and rectangle of Fig. 1B, the shape of this time series is comparable at different zoom levels, but is not exactly the same as for the Koch snowflake of Fig. 1A. The same is true for EMG data, as shown in the enlarged circles of Fig.   2A-B. Both the stock prices and EMG do not satisfy the condition of strict self-similarity and 95 should hence better be defined as statistically self-similar or, as Mandelbrot suggested, selfaffine (Mandelbrot 1985). For this and other reasons, care should be taken when conducting fractal analysis of physiological data, not to misunderstand the sources of variation (Gálvez Coyt et al. 2013;Jelinek et al. 2005;Kesić and Spasić 2016;Müller et al. 2017). For instance, the Higuchi's fractal dimension (HFD, see methods) is a popular metric for the estimation of fractal 100 dimension in neurophysiology (Kesić and Spasić 2016). Its values, however, can be influenced by numerical perturbations (Liehr and Massopust 2020), noise (Müller et al. 2017), and periodicity of the signal (Gálvez Coyt et al. 2013).
Here, we propose to use HFD and H as descriptors of changes in EMG-related time series when external perturbations are added to mouse locomotion. We show the sensitivity of the HFD and 105 H to well defined changes caused by targeted manipulation of surrogate data (Dingwell and Cusumano 2000;Theiler et al. 1992). Moreover, we investigated the sensitivity of the HFD and H to physiological changes in motor primitives caused by external perturbations. Finally, we offer some evidence-based suggestions to make the most out of physiological data's fractal analysis. Daily market closing price for Twitter stocks (NASDAQ Composite 2020). This time series is not strictly selfsimilar, but it is self-affine, as shown in the circular detail. The Hurst exponent (H) is calculated for several intervals 115 indicated as days in the subscripts, between brackets. For the calculation of H, the smallest window was set to half the length of the considered interval. 0.5 < H < 1 indicates persistence (negative or positive trend, e.g. that of the inreasing prices between day 615 and 672), while 0 < H < 0.5 indicates anti-persistence (no trend or oscillation around a mean value, e.g. when considering the whole time series from day 1 to 712 or in the case of EMG recordings from locomotion).

Baseline data
All procedures were performed according to the guidelines of the Canadian Council on Animal 125 Care and approved by the local councils on animal care of Dalhousie University with protocol #19-014. Five adult wild-type mice (age 67 ± 8 days) with C57BL/6 background walked at 0.3 m/s on a custom-built treadmill (workshop of the Zoological Institute, University of Cologne) capable to administer perturbations through quick mediolateral displacements and accelerations of the belt. EMG data was recorded from the hindlimb as previously reported (Akay et al. 2014;130 Santuz et al. 2019). Basic muscle activation patterns, or motor primitives (Dominici et al. 2011;Santuz et al. 2017), were extracted from EMG using the muscle synergies approach (Bernstein 1967;Bizzi et al. 1991) via non-negative matrix factorization (Lee and Seung 1999; 2019) using custom R scripts (R v3.6.3, R Core Team, 2020, R Foundation for Statistical Computing, Vienna, Austria). Of the extracted primitives, we arbitrarily chose one for normal 135 and one for perturbed walking, both representative of all five recorded animals. The two time series had the same length (i.e. 6000 data points, see Fig. 2A-B) and described one of the basic muscle activation patterns extracted from 30 gait cycles. Each cycle was time-normalized to 200 points (Santuz et al. 2019): 100 points assigned to the stance (when the limb was in contact with the ground) and 100 to the swing phase (when the limb was swinging in the air). 140

Surrogate data
From the two baseline time series, we generated three datasets using a modified version of the surrogate data method (Dingwell and Cusumano 2000;Theiler et al. 1992) using custom R scripts. For each of the two baseline series we: 1) calculated the mean gait cycle; 2) subtracted

Higuchi's fractal dimension
The Higuchi's fractal dimension (HFD) was calculated as previously reported (Santuz et al. 2020). Briefly, using moving windows of different size k, the length L defined by Higuchi (Higuchi 1988) was determined as shown in Fig. 3. L is not an Euclidean length, but is calculated the considered sub-series, to make L comparable at different k. This definition implies that L = 0 when taken between two points with the same ordinate (i.e. when L is "horizontal") and L → ∞ if the two points are very far apart on the y-axis (i.e. when L tends to be "vertical"). Following Higuchi's definition, if the time series is fractal then the logarithmic average of all obtained 180 lengths is proportional to the logarithm of the window size, with a slope that is the HFD (see Fig.   4A-C). HFD values are independent on the amplitude of the signal and range from 1 (e.g. for a straight line) to 2 (e.g. for a filled rectangle), with HFD = 1.5 for random Gaussian noise; higher HFD indicate more complex data (Anmuth et al. 1994;Higuchi 1988;Kesić and Spasić 2016).  H [615, 672] ); in other words, the series is persistent or has long memory (Gneiting and Schlather 2004;Mandelbrot 1983). For 0 < H < 0.5, in the long term high values in the series will be 205 probably followed by low values as in EMG-related data from locomotion, with a frequent switch between high and low values (see example in Fig. 1B, H [1,712] ); in other words, the series is anti-persistent (Gneiting and Schlather 2004;Mandelbrot 1983). Examples are given in Fig.   1B. A Hurst exponent of 0.5 indicates a completely random series without any persistence (Mandelbrot 1983;Qian and Rasheed 2004). 210

Statistics
To evaluate differences in HFD and H, we estimated the 95% confidence intervals (CI) of the compared sets of surrogate data using the 2.5% sample quantile as the lower bound and the 97.5% sample quantile as the upper bound. Ten thousand resamples with replacement for each parameter were used to estimate the CI (Santuz et al. 2019). Moreover, we calculated the effect 215 Submitted Manuscript 13 size Hedges' g. Differences were considered significant when the zero was lying outside each CI.
The level of significance was set at α = 0.05. All statistical analyses were conducted using custom R scripts.

Data availability
In the supplementary data set accessible at Zenodo (DOI: 10.5281/zenodo.3764734) we made 220 available: a) the metadata file "metadata.dat"; b) the baseline signals file "baseline_data.RData"; c) the R script "fractal_analysis.R" to calculate the H and HFD of the baseline data and produce the log-log plots. Explanatory comments are profusely present throughout the script and in the metadata file.

Results and Discussion
We found that both HFD and H values of almost periodic physiological time series depend on several variables. Care should be taken when conducting this kind of fractal analysis and in the next paragraphs we will present and discuss our findings, concluding with some suggestions.

Quasi-periodic time series
Locomotion dynamics are quasi-periodic (Scafetta et al. 2009). This means that when we record some physiological parameters such as muscle activity, their behavior in time will not be as regular as that of a sinusoidal function, but it will likely be close enough to allow the identification of similar cycles (e.g. the steps or the strides). The time series considered in this 255 study have been time-normalized to 200 points assigned to each gait cycle. Thus, their period is 200. As it is visible in Fig. 2, the quasi-periodicity becomes more evident when shifting the attention from BSLN (Fig. 2C-D) to BSLN50 (Fig. 2E-F) and, ultimately, BSLN10 (Fig. 2G-H).

Higuchi's fractal dimension
The HFD values (Fig. 4A-C) were significantly different between normal (left) and perturbed 260 (right) walking at all window lengths with absolute values of the Hedges' g effect size between 3.2 and 96.3. This indicates the ability of HFD to be sensitive to the presence of perturbations Submitted Manuscript 16 during locomotion. However, HFD values increased as the length of the window used for regression increased. This behavior shows that the curve in the log-log plot is not linear as in the case of strictly self-similar data series (Higuchi 1988). Due to the lack of consensus in the 265 literature on how to choose the maximum window length (what is often called k max ) and select the most linear part of the log-log curve (Kesić and Spasić 2016), extreme care should be taken when calculating the HFD of quasi-periodic time series. It is not surprising that some authors suggest to only consider the first two points in the log-log plot, so as to make sure that the selected portion is trivially linear (Gneiting et al. 2012). Moreover, a discontinuity appeared in 270 the neighborhood of the 200 th time point, bigger in normal as compared to perturbed walking and increasing from BSLN to BSLN10. This discontinuity is responsible for a sharp increase in the slope (which is the HFD) of the regression lines that include it, with an effect that increases with the "regularity" of the time series. This is due to the definition of the Higuchi's length (Higuchi 1988), which tends to zero when taken between two points that have close ordinate component 275 or, in other words, when it is almost horizontal in the example of

Hurst exponent
The H values (Fig. 4D-F) were significantly different between normal and perturbed walking at all window lengths with absolute values of the effect size between 5.1 and 45.6. This indicates 280 the ability of H to be sensitive to the presence of perturbations during locomotion. However, H values (or the slope of the considered curve in the log-log plot) were higher for minimum window lengths smaller than 200 points and lower for minimum window lengths bigger than 200 points, divided by what is called a crossover point (Mandelbrot 1985). This not only shows the susceptibility of H to the chosen window length, but also that H is influenced by the presence or absence of quasi-periodic elements in the time series. This can be explained by comparing the properties of H and HFD. For self-similar series, the fractal dimension is 2-H (Mandelbrot 1985), thus lower H (or higher HFD, as stated above and in the methods) indicate more irregular or complex data (Higuchi 1988;Kesić and Spasić 2016). For self-affine series such as those treated in this study, fractal dimension and H are not necessarily linearly correlated and can exhibit 290 different local and global values (Gneiting and Schlather 2004;Mandelbrot 1997). In the words of Gneiting and Schlather (Gneiting and Schlather 2004): "[…] fractal dimension and Hurst coefficient are independent of each other: fractal dimension is a local property, and longmemory dependence is a global characteristic.". In fact, if the data shows any short-range dependence as in this case, it is commonly suggested to exclude the low end of the plot for the 295 estimation of H (Taqqu et al. 1995).

Concluding remarks
When dealing with physiological data, it is tempting to use fractal analysis for estimating the regularity, complexity or persistence of time series (Kesić and Spasić 2016;Losa et al. 2005).
Here we showed that the values of two common metrics for fractal analysis, HFD and H, are 300 exceptionally sensitive to changing in EMG-derived data when external perturbations are applied to murine locomotion. However, HFD and H also strongly depend on the parameters selected for the calculations and the presence or absence of quasi-periodic elements in the time series. In order to work around those issues, we suggest to: a) calculate HFD by using only the most linear part of the log-log plot, which in our case correspond to maximum the first 10 data points, and b) 305 calculate H by using a smallest window bigger than the period given by the length of the individual gait cycles.