Detection of task-relevant and task-irrelevant motion sequences: application to motor adaptation in goal-directed and whole-body movements

Motor variability is inevitable in our body movements and is discussed from several perspectives in motor neuroscience and biomechanics; it can originate from the variability of neural activities, it can reflect the large number of degrees of freedom inherent in our body movements, it can decrease muscle fatigue, or it can facilitate motor learning. How to evaluate motor variability is thus a fundamental question in motor neuroscience and biomechanics. Previous methods have quantified (at least) two striking features of motor variability: the smaller variability in the task-relevant dimension than in the task-irrelevant dimension, and the low-dimensional structure that is often referred to as a synergy or principal component. However, those previous methods were not only unsuitable for quantifying those features simultaneously but also applicable only under limited conditions (e.g., one method cannot consider the motion sequence, and another cannot consider how each motion is relevant to performance). Here, we propose a flexible and straightforward machine learning technique that can quantify task-relevant variability, task-irrelevant variability, and the relevance of each principal component to task performance while considering the motion sequence and the relevance of each motion sequence to task performance in a data-driven manner. We validate our method by constructing a novel experimental setting to investigate goal-directed and whole-body movements. Furthermore, our setting enables the induction of motor adaptation via perturbations and the evaluation of how task-relevant and task-irrelevant variabilities are modulated through motor adaptation. Our method enables the identification of a novel property of motor variability: the modulation of those variabilities differs depending on the perturbation schedule.
While a gradually imposed perturbation increases neither task-relevant nor task-irrelevant variability, a constant perturbation increases task-relevant variability.


In our daily life, we can repeatedly achieve desired movements, such as grasping a cup, throwing a ball, and playing the piano. To achieve the desired movements, our motor system needs to resolve at least two problems.

Minimizing (1/2)(Xw − X^rel w)^T (Xw − X^rel w) under the constraint X ≠ X^rel to avoid the self-evident answer, X^rel can be written as

X^rel = Xww^T (ww^T)^† = Xww^T / |w|^2,

where (ww^T)^† is a pseudo-inverse of ww^T and |w| = √(w^T w). The equality w^T (ww^T)^† = w^T / |w|^2 holds when w ∈ R^{D×1}. Under the decomposition X = X^rel + X^irr, X^irr can be written as

X^irr = X(I − ww^T / |w|^2),

where I ∈ R^{D×D} is an identity matrix. Under the appropriate normalization (i.e., the mean and standard deviation of each component of X and y were set to 0 and 1, respectively; see Materials and Methods for details), X^rel w = Xw = h and X^irr w = 0, indicating that X^rel and X^irr denote the task-relevant and task-irrelevant components under the framework of linear regression. An important point of this framework is that it does not require any explicit function (e.g., forward kinematics as in UCM or a task function as in GEM and TNC) but requires only the data X and y.

Figs. 1A and 1B demonstrate typical examples of the decomposition when X includes only 2 elements and the task constrains h = X_1 − X_2 (i.e., w_1 = 1 and w_2 = −1) to certain values (e.g., y = 2, 0, −2 in simulated tasks 1, 2, and 3, respectively). Because the constrained task was one-dimensional and the input data were two-dimensional, an infinite number of X patterns resulted in an identical h value. In this case, X^rel = X · (1/2) [[1, −1], [−1, 1]] = ((X_1 − X_2)/2, −(X_1 − X_2)/2) and X^irr = X − X^rel. The simulated data points on the dotted line in Fig. 1B indicate X^rel. On the dotted line, the data points can be

by controlling a large number of DoFs. We focus on a simplified version of whole-body movements: a vertical jump while crossing the arms in front of the trunk (Fig. 2A).
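The decomposition introduced above can be sketched numerically. The following is a minimal illustration assuming NumPy (the helper name `decompose` is ours, not the paper's): simulated two-dimensional data are split by the projection ww^T/|w|^2, and the defining properties X^rel w = Xw and X^irr w = 0 are checked.

```python
import numpy as np

def decompose(X, w):
    """Split motion data X (T x D) into task-relevant and
    task-irrelevant parts via the projection w w^T / |w|^2."""
    P = np.outer(w, w) / (w @ w)
    return X @ P, X @ (np.eye(len(w)) - P)

# Two-dimensional example as in Fig. 1: the task fixes h = X1 - X2,
# so w = (1, -1); motion along (1, 1) is task-irrelevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
w = np.array([1.0, -1.0])
X_rel, X_irr = decompose(X, w)

# Defining properties of the decomposition:
assert np.allclose(X_rel @ w, X @ w)   # X^rel carries the task output
assert np.allclose(X_irr @ w, 0.0)     # X^irr does not affect it
assert np.allclose(X_rel + X_irr, X)
```

Because P is idempotent (P @ P = P), applying the decomposition twice leaves both components unchanged.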
This goal-directed whole-body movement enabled us to focus on lower-limb and trunk motions and to assess task-relevant variability, task-irrelevant variability, and the low-dimensional spaces in which a high portion of the motor variability was embedded. We proposed a machine learning technique to simultaneously evaluate these features of variability while considering the motion sequence and the relevance of each motion to jumping height. Subjects stood in a fixed position and were instructed to look at a computer monitor located in front of them.

Second, we determined whether the subjects showed motor adaptation in the experimental setting.

In 96 learning trials in experiment 1 (Fig. 2C

where {a_i a_j} = (a_1^2, a_1 a_2, a_1 a_3, a_1 a_4, a_2^2, a_2 a_3, a_2 a_4, a_3^2, a_3 a_4, a_4^2). By comparing these three candidates, we found that the first candidate (i.e., X = ({q_i}, {q̇_i})) showed the lowest prediction error (Fig. 4A). In particular, the first candidate with four time frames before release yielded the lowest prediction error.

If the prediction error equals 1, the method cannot predict the output data at all. In contrast, if the prediction error equals 0, the method predicts the output data with 100% accuracy. As shown in Fig

Our method does not require any explicit task function, such as the parabolic approximation of jumping height; it determines the relevance of the motion sequence to the task in a data-driven manner.

Further, our method is robust against observation noise due to the properties of ridge regression.

Relevance of each principal component to task performance. Movement variability shows not only less task-relevant than task-irrelevant variability but also a low-dimensional structure. The current study compares our method to PCA, a conventional method for extracting low-dimensional structure.

Because the low-dimensional structure is considered to represent some features of motor control, it can be expected to correlate with task performance. We decomposed the motion sequence X into principal

participants. In a typical subject, however, the 2nd rather than the 1st PC showed the highest correlation with jumping height (red line in Fig. 4E). This typical subject was not an exception; Fig. 4F shows the PC number with the highest correlation with jumping height. In 6 out of 13 subjects, the 1st PC showed the highest correlation with performance; in 3 out of 13 subjects, the 2nd PC showed the highest correlation; and in 4 out of 13 subjects, the 3rd PC showed the highest. These results indicate that the explained movement variability did not correspond to the relevance to task performance.
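This dissociation between explained variance and task relevance can be reproduced in a toy simulation (a sketch assuming NumPy; the variance magnitudes and the task h = X_1 − X_2 are illustrative choices, not the experimental values): when most variability lies in a task-irrelevant direction, the 1st PC barely correlates with performance while the 2nd PC does.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
# Simulated motions: large variability along (1, 1) (task-irrelevant
# for h = X1 - X2) and small variability along (1, -1) (task-relevant).
irr = 3.0 * rng.standard_normal(T)
rel = 0.5 * rng.standard_normal(T)
X = np.column_stack([irr + rel, irr - rel])
y = X[:, 0] - X[:, 1]                  # performance, equals 2 * rel

# PCA via SVD of the centered data; scores[:, i] = lambda_i * u_{i, t}.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S

ratio = S**2 / (S**2).sum()            # explained-variance ratio per PC
corrs = [abs(np.corrcoef(scores[:, i], y)[0, 1]) for i in range(2)]
# The 1st PC explains most of the variance, yet the 2nd PC is the one
# that correlates with performance.
print(ratio, corrs)
```

Which PC ranks first under PCA is decided by variance alone, so the ordering says nothing about task relevance.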

Ridge regression enabled the prediction of jumping height with higher accuracy than PCA (red line in Fig. 4C) because ridge regression weights each PC based on both the explained movement variability and the task relevance. In PCA (or, equivalently, singular value decomposition (SVD)), the motion sequence at the tth trial is decomposed as X_t = Σ_i λ_i u_{i,t} v_i, where λ_i is the ith singular value corresponding to the ith PC v_i, and u_{i,t} indicates how the ith PC appears in that trial. The correlation of the ith PC with task performance was thus calculated based on u_{i,t}, and it did not reflect the relevance of the ith PC to task performance. In contrast, ridge regression enables the

An advantage of our method is its linearity, which enables the simultaneous comparison of the task-relevant and task-irrelevant variabilities among conditions in which the mean kinematics or task parameters change (e.g., before, during, and after motor learning). It was previously unclear how task-

previously unclear whether such modulation of variability could be observed in whole-body movements.

Our method, which requires no linear approximation, enabled us to discuss how task-relevant and task-irrelevant variabilities are modulated before and after motor adaptation in whole-body movements. We thus applied our method to motor adaptation in response to constant and gradually imposed perturbations.

In experiment 2 (two days for each subject), subjects experienced gradually increased or decreased perturbations. Each subject underwent ten learning trials without any perturbation. The perturbation was then gradually imposed over ten trials and was held at 0.05 or −0.05 for a further ten trials (Figs. 5A, B). The gradually imposed perturbation required gradual rather than abrupt compensation (i.e., subjects were required to modify their motions slightly in each trial). In all 30 trials, the target height was set to 50% of the subject's maximum jumping height. Subjects who experienced p_t > 0 on the first day experienced p_t < 0 on the second day and vice versa. The order of perturbations was counterbalanced across subjects.

The subjects could adapt to the gradually increased or decreased perturbations (Fig. 5C).

In experiment 3, the subjects experienced constant perturbations. Each subject underwent five learning trials without any perturbation. The perturbation was then set to 0.05 or −0.05 for 15 trials, to 0 for ten washout trials, and to −0.05 or 0.05 for 15 trials (Figs. 5D, E). In contrast to experiment 2, where the perturbation was gradually imposed, subjects were required to modify their motions abruptly in experiment 3.

Subjects who experienced p_t = 0.05 on the 6th–20th trials experienced p_t = −0.05 on the 31st–45th trials and vice versa. The order of perturbations was counterbalanced across subjects. In all 45 trials, the target height was set to 50% of the subject's maximum jumping height. In both experiments 2 and 3, the subjects adapted to the perturbations (Fig. 5F).

We calculated the task-relevant and task-irrelevant variabilities before and after adaptation in exper-

a simulated and two-dimensional case similar to that shown in Fig. 1 (Figs. 6C, D). In adapting to perturbations, the subjects needed to modify their output (i.e., jumping height) by determining an appropriate input (i.e., motion sequence). In adapting to gradually increasing or decreasing perturbations, there was no modulation in either the task-relevant or the task-irrelevant variability (Fig. 6C). In adapting to a constant perturbation, the task-relevant variability increased, while the task-irrelevant variability was not modulated (Fig. 6D). Notably, Figs. 6C and 6D are not real data but simulated examples to aid the interpretation of our results. In summary, the modulation of task-relevant variability depends on the schedule of the perturbation.

We proposed a flexible and straightforward machine learning technique that quantifies task-relevant variability, task-irrelevant variability, and the relevance of each principal component to task performance in a noise-robust manner while considering the motion sequence and how each motion sequence is relevant to task performance (Fig. 4). Our method can find the relevance of each motion sequence to performance (i.e., the task function) in a data-driven manner; our method does not require any explicit task function,

q, our framework corresponds to UCM. When we define w_1 = 1, w_2 = v̄/g, X_1 = p − p̄, and

Another advantage of our method is the ability to select the appropriate input based on predictive power (Fig. 4A).

How to select the length of the time frames is another crucial problem (Fig. 4A). In our data, the motion data with four time frames (approximately 33 ms) were chosen for the best predictive power. Although motion data spanning only four time frames sound like motion fragments rather than motion sequences, our method can be applied independently of the length of the time frames. In our case, four time frames were chosen to maximize the predictive power.
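This selection of the time-frame length by predictive power can be sketched as follows (assuming NumPy; the data shapes, the `cv_error` helper, and the noise levels are hypothetical, not the paper's implementation): candidate inputs built from different window lengths are scored by cross-validated ridge prediction error, and the window with the lowest error is kept.

```python
import numpy as np

def cv_error(X, y, lam, k=10, seed=0):
    """k-fold cross-validated prediction error of ridge regression,
    normalized so that 1 means no predictive power at all."""
    idx = np.random.default_rng(seed).permutation(len(y))
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        Xtr, ytr = X[train], y[train]
        w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]),
                            Xtr.T @ ytr)
        err += np.sum((y[fold] - X[fold] @ w) ** 2)
    return err / np.sum((y - y.mean()) ** 2)

# Hypothetical motion data: 200 trials x 10 frames x 3 joints; by
# construction only the last 4 frames determine the outcome.
rng = np.random.default_rng(2)
motion = rng.standard_normal((200, 10, 3))
coef = rng.standard_normal(12)
y = motion[:, -4:, :].reshape(200, -1) @ coef + 0.1 * rng.standard_normal(200)

# Compare candidate inputs built from the last n frames.
errors = {n: cv_error(motion[:, -n:, :].reshape(200, -1), y, lam=1.0)
          for n in range(1, 8)}
best = min(errors, key=errors.get)  # windows shorter than 4 fit poorly
```

Windows shorter than the truly relevant span leave predictive structure out of X, so their cross-validated error stays high; longer windows add little beyond the relevant frames.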

Although we compared our method to UCM and GEM (Fig. 4B), it remains to be compared to the TNC method [13, 37], another method used to quantify motor variability from a different perspective.

TNC enables the extraction of three types of information from motion data: T-cost, which quantifies how the mean motion data deviate from the optimal motion; N-cost, which quantifies how the motor variability deviates from the optimal variability; and C-cost, which quantifies how the covariance among motion data deviates from the optimal covariance. Although the TNC does not consider task-relevant

Var(X_i) = Var(X_i^rel) + Var(X_i^irr) + 2 Cov(X_i^rel, X_i^irr),

where X_{i,t} is X_i at the tth trial, X_{i,t}^rel is the ith component of X^rel at the tth trial, X_{i,t}^irr is the ith component of X^irr at the tth trial, and Cov(X_i^rel, X_i^irr) is the covariance between X_i^rel and X_i^irr. Notably, in the current experimental setting, Cov(X_i^rel, X_i^irr) in the analyzed trials was close to 0. We thus considered only Var(X_i^rel) and Var(X_i^irr).
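The per-component variance decomposition Var(X_i) = Var(X_i^rel) + Var(X_i^irr) + 2 Cov(X_i^rel, X_i^irr) follows directly from X = X^rel + X^irr and can be verified numerically (a minimal sketch assuming NumPy; the data and w are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 4))
w = rng.standard_normal(4)
P = np.outer(w, w) / (w @ w)               # projection onto w
X_rel, X_irr = X @ P, X @ (np.eye(4) - P)  # X = X_rel + X_irr

# Var(X_i) = Var(X_i^rel) + Var(X_i^irr) + 2 Cov(X_i^rel, X_i^irr)
for i in range(4):
    a, b = X_rel[:, i], X_irr[:, i]
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    assert np.isclose(np.var(X[:, i]), np.var(a) + np.var(b) + 2 * cov)
```

When the covariance term is close to 0, as in the analyzed trials, the total variance of each component splits cleanly into its task-relevant and task-irrelevant parts.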

Ridge regression
Ridge regression enabled us to determine the best one-dimensional linear space w ∈ R^{D×1} in the input data X ∈ R^{T×D} to predict the output data y ∈ R^{T×1} by minimizing the cost function

E(w) = |y − Xw|^2 + λ|w|^2.    (5)

The first term on the right-hand side is the fitting error, the second term is the regularization of w, and λ is a regularization parameter. The current study determined λ to minimize the prediction error based on 10-fold cross-validation, which enabled us to avoid overfitting [28]. Overfitting, which can appear without any regularization, leads to the selection of a model that is more complicated than the true one. Minimizing the cost function with respect to w leads to the optimal value of w:

w = (X^T X + λI)^{−1} X^T y,    (6)

where I is an identity matrix. When X^T X has multicollinearity, it is difficult to calculate its inverse because of the rank deficiency. The identity matrix with the regularization parameter λ makes it possible to calculate the inverse of X^T X + λI and to predict the output data with a certain accuracy.
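The closed-form solution (6) and its robustness to multicollinearity can be illustrated in a few lines (a sketch assuming NumPy; the perfectly collinear toy data are ours): X^T X alone is singular, but adding λI makes the system solvable.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(4)
x1 = rng.standard_normal(50)
X = np.column_stack([x1, x1])    # perfectly collinear columns
y = 2.0 * x1

# X^T X alone is singular (rank deficient), but X^T X + lam I is
# invertible, so the ridge solution exists and still predicts y.
w = ridge(X, y, lam=1e-6)
pred = X @ w
print(np.max(np.abs(pred - y)))  # tiny residual despite collinearity
```

With collinear columns ridge also picks the symmetric (minimum-norm-like) weight split rather than an arbitrary one, which is what makes the fitted w stable.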

Ridge regression showed high predictive power under the existence of measurement noise in X. Under Gaussian measurement noise ξ with mean 0, standard deviation σ_o, and zero covariance, the cost function averaged across all possible noise can be written as

⟨|y − (X + ξ)w|^2⟩_ξ = |y − Xw|^2 + Tσ_o^2 |w|^2.    (7)

The equivalence between equations (5) and (7) indicates that ridge regression selects the best w to predict y under measurement noise while avoiding overfitting. The equivalence also suggests that the regularization parameter λ corresponds to the variance of the observation noise.

Ridge regression enabled the estimation of an appropriate w based on the normalized y and X; i.e., the mean and standard deviation of each component of y and X should be normalized to 0 and 1, respectively. The weight w_i that corresponds to the unnormalized data should be divided by σ_i (w_i = w_i/σ_i), and the mean should be subtracted. In total, the normalization is indispensable for estimating an appropriate w; however, it did not affect the results at all.
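The noise-averaged cost identity can be checked by Monte Carlo simulation (a sketch assuming NumPy; T, D, σ_o, and the weights are arbitrary choices): averaging the fitting error over many noise draws should reproduce |y − Xw|^2 + Tσ_o^2|w|^2.

```python
import numpy as np

rng = np.random.default_rng(6)
T, D, sigma = 100, 4, 0.5
X = rng.standard_normal((T, D))
w = rng.standard_normal(D)
y = X @ rng.standard_normal(D)

# Monte Carlo estimate of the noise-averaged fitting error
# <|y - (X + xi) w|^2> with xi ~ N(0, sigma^2), i.i.d. entries.
costs = [np.sum((y - (X + sigma * rng.standard_normal((T, D))) @ w) ** 2)
         for _ in range(5000)]
mc = np.mean(costs)

# Analytic value from the equivalence: |y - Xw|^2 + T * sigma^2 * |w|^2
analytic = np.sum((y - X @ w) ** 2) + T * sigma**2 * (w @ w)
print(mc / analytic)   # close to 1
```

The cross term between the residual and the noise averages to zero, leaving exactly the λ|w|^2-shaped penalty with λ = Tσ_o^2, which is the correspondence between λ and the observation-noise variance stated above.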
h = p + v^2/(2g), where g ≃ 9.8 (m/s^2). In the joint-angle representation, p and v were written as follows:

and

where l_i is the length of the ith limb segment (i.e., l_1 is the length between the right toe and heel, l_2 is the length between the right heel and knee, l_3 is the length between the right knee and hip, and l_4 is the length between the hip and back). In the UCM (blue crosses in Fig. 4B), we calculated the task-relevant and task-irrelevant variabilities based on equations (9) and (10).
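The parabolic approximation of jumping height (the peak of a ballistic trajectory from the release position p and velocity v) can be sketched directly (a minimal illustration; the function name and the numbers are ours):

```python
def jump_height(p, v, g=9.8):
    """Peak height of ballistic flight after release at height p (m)
    with upward velocity v (m/s): h = p + v^2 / (2g)."""
    return p + v * v / (2.0 * g)

# e.g., a hip at 0.9 m leaving the ground at 2.8 m/s peaks at 1.3 m
print(jump_height(0.9, 2.8))
```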

Using equations (9) and (10), the predicted jumping height h can be written as

line in Fig. 4A). The second candidate data were the functions in the forward kinematics of the position and velocity of the hip joint (equations (9) and (10); red line in Fig. 4A). The third candidate data were the functions that appeared in equation (11) (orange line in Fig. 4A). In GEM (blue crosses in Fig. 4B), we calculated the task-relevant and task-irrelevant variabilities based on equation (11).

and D_{i,j} = 0 when i ≠ j, and V ∈ R^{D×D} is an orthogonal matrix. Using the SVD and equation (6), the predicted output h_t can be written as

where min(T, D) determines the rank of X, λ_i^2 is an eigenvalue of X^T X, Corr(·, ·) indicates the correlation between two vectors, v_i is the eigenvector of X^T X corresponding to λ_i^2, and u_{i,t} is the (i, t) component of U. On the other hand, PCA enables the decomposition of X_t as

X_t = Σ_i λ_i u_{i,t} v_i.

This equation indicates that the motion data can be decomposed into eigenvectors (principal components).

In PCA, we found the relation between the explained variance and the prediction power to be as follows: at a z% explained variance, we determine the number of principal components based on n_z, the minimum number of principal components whose cumulative explained variance exceeds z%. After determining n_z, the motion data can be reconstructed as X̂.

Figure 1. The concept of our method. A: An example of decomposing input data X into a task-relevant component X^rel and a task-irrelevant component X^irr. In this case, we assumed that task 1 required X_1 − X_2 to be 2 (green line), task 2 required X_1 − X_2 to be 0 (red line), and task 3 required X_1 − X_2 to be −2 (blue line). Green, red, and blue dots indicate the typical input data for tasks 1, 2, and 3, respectively.
In the ridge regression, these tasks can be achieved with w_1 = 1 and w_2 = −1; i.e., h = w_1 X_1 + w_2 X_2 = X_1 − X_2 should take a different value in each task. B: The input data were decomposed into a task-relevant component X^rel = Xww^T/|w|^2 (black dotted line) and a task-irrelevant component X^irr = X − X^rel (solid black line). X^rel was separated depending on the task, whereas X^irr was not, which indicates that the decomposition enables the discussion of the task-relevant and task-irrelevant components.
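The n_z criterion used in the PCA analysis above (the minimum number of principal components whose cumulative explained variance exceeds z%) can be sketched as follows (assuming NumPy; the helper name `n_components` and the toy data are ours):

```python
import numpy as np

def n_components(X, z):
    """Smallest number of principal components whose cumulative
    explained variance exceeds the fraction z."""
    Xc = X - X.mean(axis=0)
    lam2 = np.linalg.svd(Xc, compute_uv=False) ** 2  # eigenvalues of Xc^T Xc
    ratio = np.cumsum(lam2) / lam2.sum()
    return int(np.searchsorted(ratio, z) + 1)

rng = np.random.default_rng(5)
# Anisotropic toy data: one direction carries almost all the variance.
X = rng.standard_normal((400, 3)) * np.array([10.0, 1.0, 1.0])
print(n_components(X, z=0.9))  # the dominant direction alone suffices
```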

We assume that the task before adaptation required X_1 − X_2 to be 2 and that the task after adaptation required X_1 − X_2 to be −2. Panel C indicates an interpretation of our results in experiment 2; in experiment 2, there was no modulation in either the task-relevant or the task-irrelevant variability. Panel D

suggests an explanation of our findings from experiment 3. In that experiment, task-relevant variability increased after adaptation, and task-irrelevant variability remained unchanged.