ABSTRACT
Dexterous tool use is typically characterized by fast and precise motions performed by multiple fingers. One representative task is piano playing, which involves fast performance of a sequence of complex motions with high spatiotemporal precision. However, for several decades, a lack of contactless sensing technologies that are capable of precision measurement of piano key motions has been a bottleneck for unveiling how such an outstanding skill is cultivated. Here, we developed a novel sensing system that can record the vertical position of all piano keys with a time resolution of 1 ms and a spatial resolution of 0.01 mm in a noncontact manner. Using this system, we recoded the piano key motions while 49 pianists played a complex sequence of tones that required both individuated and coordinated finger movements to be performed as fast and accurately as possible. Penalized regression using various feature variables of the key motions identified distinct characteristics of the key-depressing and key-releasing motions in relation to the speed and accuracy of the performance. For the maximum rate of the keystrokes, individual differences across the pianists were associated with the peak key descending velocity, the key depression duration, and key-lift timing. For the timing error of the keystrokes, the interindividual differences were associated with the peak ascending velocity of the key and the interstrike variability of both the peak key descending velocity and the key depression duration. These results highlight the importance of dexterous control of the vertical motions of the keys for fast and accurate piano performance.
INTRODUCTION
One of the most representative features of skillful motor actions, such as surgery and musical performance, is dexterous tool use at high speed with high precision. This activity is challenging, particularly for individuals without any history of extensive manual training, due to the trade-off between speed and precision of movements (Fitts, 1954) and thus requires people to undergo years of training to overcome it. A precise description of such skillful behaviors is essential for elucidating biomechanical principles governing the production of movements and neuroplastic mechanisms subserving the acquisition and loss of skills through training and the development of disorders (Furuya & Hanakawa, 2016). A methodological challenge for such a precise description is difficulty in obtaining acurate measurements of fast and subtle movements in dexterous tool use, in contrast to gross and slow movements used in daily activities, such as grasping. Modern technologies for sensing human motions, such as motion capture with multiple high-speed cameras (Engel, Flanders, & Soechting, 1997; Goebl & Palmer, 2013) and data gloves with multiple bending sensors (Gentner et al., 2010; Jerde, Soechting, & Flanders, 2003), have enabled quantitative assessment of complex manual movements. However, the time resolution of these sensors is generally not enough to capture complex patterns of fast motions of a tool that can be manipulated in skillful motor actions, such as throwing a baseball and playing musical instruments. In addition, an occlusion of markers attached to the body and the tool has been the bottleneck of obtaining precise measurements of complex movements involving dynamic postural changes with motions at multiple joints. The development of novel sensing technologies has attempted to solve such problems. For example, a miniature magnetic sensor successfully recorded motions of a ball on the order of milliseconds, and a series of experiments with it uncovered various features of skillful ball-throwing motions (Hore & Watts, 2011; Hore, Watts, Leschuk, & MacDougall, 2001). However, sensors attached to a tool can alter physical properties, such as weight and inertia, and affect tactile and proprioceptive feedback in motion, the latter of which matters particularly in the assessment of the symptoms of focal hand dystonia due to sensory trick (Paulig, Jabusch, Grossbach, Boullet, & Altenmuller, 2014). Therefore, the development of contactless or noncontact sensors that enable to record motions of a tool to be manipulated at high spatiotemporal resolution is needed to fully unveil experts’ motor expertise in dexterous tool use.
Piano playing can be one of the most representative dexterous skills to perform (Furuya, Flanders, & Soechting, 2011; Furuya & Soechting, 2012; Winges, Furuya, Faber, & Flanders, 2013). Previous studies developed some sensors, such as pressure sensors that were placed on the bottom of the keys (Parlitz, Peschel, & Altenmuller, 1998), force sensors that were implemented on the key surface (Furuya & Kinoshita, 2008; Kinoshita, Furuya, Aoki, & Altenmuller, 2007; Oku & Furuya, 2017), and custom-made data gloves (Furuya, Tominaga, Miyazaki, & Altenmuller, 2015; Gentner et al., 2010; Tominaga, Lee, Altenmuller, Miyazaki, & Furuya, 2016), which successfully recorded the motions and/or force of the piano keys and fingers at high spatiotemporal resolution. However, none of these original sensors were no-contact sensors that enabled the recording of key motions without altering the touch sensation. In contrast, the noncontact sensing technology that has been used most frequently in previous studies was the Musical Instrument Digital Interface (i.e., MIDI) (Furuya & Altenmüller, 2013a; Furuya, Nitsche, Paulus, & Altenmuller, 2013; Hosoda & Furuya, 2016; Jabusch, Vauth, & Altenmüller, 2004; Nakahara, Furuya, Francis, & Kinoshita, 2010; Pfordresher, 2003; Repp, 1995; van der Steen, Molendijk, Altenmüller, & Furuya, 2014; van Vugt, Furuya, Vauth, Jabusch, & Altenmüller, 2014). This technology captures only two discrete events of the key motion, the moments when the key was depressed and released, which provides no high spatiotemporal resolution and therefore fails to capture various features of fine motor control.
Here, we propose a novel sensing system capable of capturing the time course of the vertical position of 88 piano keys, without any physical contact with the keys, with a time resolution of 1 ms and a spatial resolution of 0.01 mm. The sensors are embedded under the piano keys and did not have any mechanical contact with any keys. To test whether this sensing system allows for the identification of the motor proficiency of expert pianists, a behavioral experiment with expert pianists was performed, and a set of motor engrams of pianists’ touches were extracted from the collected data and analyzed by a penalized regression model. The results identified a novel motor skill that explains individual differences in both the maximum speed and timing precision across pianists’ fast piano performances.
METHODS
Participants
Forty-nine expert right-handed classical pianists (41 females, 20 - 45 years old) without a history of serious physical problems related to piano playing served as the participants in the present study. Each pianist underwent at least 15 years of piano training at music conservatories and/or privately under the supervision of professional pianists. In accordance with the Declaration of Helsinki, the experimental procedures were explained to all participants. Informed consent was obtained from each participant prior to the experiment. The Ethics Committee at Sophia University approved this study.
Sensing system
The vertical position of each key was measured using a custom-made contactless optical sensor system (Fig. 1). The sensor system was mounted beneath the piano keys of an acoustic piano (i.e., key-bed). The sensor consists of 88 photo reflectors (LBR-127HLD, Letex Technology Corp.), seven 12-bit analog-to-digital (A/D) converter integrated circuit chips (ADS7953, Texas Instruments), and a microprocessor (STM32F446, STMicroelectronics). Each of the photo reflective sensor beneath the key projects infrared light on the bottom surface of the key, and the derived voltage signal changes in relation to the intensity of the reflected infrared light. The intensity of the reflected infrared light increases linearly with a decrease in the distance between a photo reflector and the bottom surface of a key (see details in the Results). Therefore, the change in the voltage signal represents the change in the distance between a photo reflector and the bottom surface of a key. Each A/D converter controlled by the microprocessor collects the voltage signal from the 12 or 14 photo reflectors. The voltage signal is stored on a personal computer via the microprocessor at a sampling frequency of 1 kHz and converted to the vertical distance. The sensor system can therefore record the time-varying vertical position of all 88 piano keys at a 0.01 mm spatial resolution without any physical contact that can affect the mechanics characteristics of the piano keystroke.
Schematic illustration of the system for sensing the key positions. (A) Each of the 88 photo reflectors were mounted beneath each of the piano keys. (B) The photo reflector projected infrared light on the bottom surface of the key, and the derived voltage signal changed linearly with the distance between the photo reflector and the bottom surface of the key in relation to the intensity of the reflected infrared light.
Experimental setup and task
The experimental apparatus consisted of a Yamaha acoustic piano (U1) and the sensing system. Figure 2 shows the experimental task requiring the repetition of two sets of simultaneous keystrokes of the two keys with leaving one white key in between (i.e., major third interval), using the right index and ring fingers for one set and the right middle and little fingers for another set. Participants were asked to perform the task for 6 seconds as fast and accurately as possible and at paced tempo (100 bpm) at a predetermined loudness (i.e., mezzo forte), which was provided as a sound stimulus from the speaker located in front of the participant. We used this task because such a chord-trill task, which has been included in various musical pieces (e.g., Etude Op.25 no.6 by Frederik Chopin, Ondine by Maurice Ravel, Piano Sonata No.3 1st mov. by Ludwig van Beethoven). has been known to be technically challenging to play quickly and accurately.
The experimental task. The experimental task required the repetition of two sets of simultaneous keystrokes of the two keys, leaving one white key in between (i.e., major third interval), using the right index and ring fingers for one set and the right middle and little fingers for another set.
Data analysis and statistics
The position data of the keys were low-pass filtered using a second-order Butterworth filter with a cutoff frequency of 20 Hz. Figure 3 illustrates 6 features characterizing the time-varying waveform of each vertical motion of the keys and their derivatives (i.e., velocity). For each feature variable, the mean and standard deviation across strikes within a trial were computed.
Definition of the six characteristics of the key-depressing and key-releasing motions. (i) peak_des_vel: peak descending velocity during key depression, (ii) peak_asc_vel: peak ascending velocity during key release, (iii) max_depth: the maximum key-moving distance during key depression, (iv) peak_to_bottom: the time difference between the moment when the key reaches its peak descending velocity and the moment when the key reaches its bottom during key depression, (v) release_to_peak: the time difference between the moment when the key leaves from its bottom and the moment when the key reaches its peak ascending velocity during key lift, (vi) depression_ratio: the ratio of the duration when the key touched the bottom relative to the duration between the onset and offset of key motion.
We used the interkeystroke interval at the fastest tempo (a difference in timing between two successive keypresses) and loudness balance at the paced tempo (a difference in the peak descending velocity of the two keys to be depressed simultaneously) as variables representing the maximum tempo and precision of the performance, respectively. Using each of the two variables as a dependent variable and the interstrike mean and standard deviation values of the aforementioned 6 features characterizing the waveforms of the vertical positions of the keys as independent variables, a penalized regression (i.e., elastic net regression) was performed to identify motor skills associated with speed and accuracy of the finger movements of expert pianists. Here, each variable used for the regression was standardized (subtracting the mean value and dividing by the standard deviation). All of the analyses were performed using the library “Scikit-learn” (Pedregosa et al. 2011) in Python.
RESULTS
Performance of the sensing system
To evaluate the ground noise of the system, we computed the ratio of the maximum amplitude of the ground noise (i.e., the difference between the maximum and minimum sensor values when the key was stable at the highest position for 20 seconds) relative to the range of sensor values (i.e., the difference between the sensor value when the key was located at its bottom position and the sensor value when the key was at its highest position). The ratio was 0.10±0.032% across all sensors. This ratio almost corresponded to the spatial resolution of the sensor system and was negligible.
To evaluate the linearity of the system, we pushed the keys down from 0 mm to 10 mm at 1 mm intervals by means of a micrometer and calculated a linear error by comparing the measured sensor values with sensor values estimated from a least squared linear regression of the measured sensor values. The across-key average of the ratio of the average linear error of each point relative to the measurement range of the sensor value was 2.4±0.67%.
To evaluate the temporal precision of the system, we executed a continuous 3-minute measurement of the key position 10 times and evaluated the variation and accuracy of the sampling interval time. We evaluated the variation in the sampling interval time by computing the standard deviation of the sampling time recorded from the system’s internal clock. We evaluated the accuracy of the sampling interval by comparing the difference in receipt time between the start command and the end command recorded by the system’s internal clock with the difference in the sent time between the start command and the end command recorded by the internal timer of the Windows operation system called “QueryPerformanceCounter”, which has a 1-microsecond time resolution. The average standard deviation of the sampling time was 8.08 ± 0.02 × 10−3 milliseconds. This deviation is less than 1% of the sampling time interval, which is negligibly small. The error between the elapsed time recorded by the system’s internal clock and that recorded by the OS’s internal clock was 4.16 ± 0.50 × 10−5%. This error corresponds to less than half of a sampling interval time for a 10 s recording, and it was negligible enough for the short recording, such as that used in the experiment in this study.
Results of the regression model based on the human experiments
Table 1 summarizes the results of the elastic net regression explaining the interindividual variance for both the maximum tempo and loudness balance across the participants by displaying the mean and standard deviation of the six features of the key movements across strikes. For the maximum tempo and the loudness balance, the R2 value derived from the model prediction based on the observed values of the feature variables was 0.852 and 0.759, respectively, which are visually displayed in Figures 4 A and 4 B. The alpha value was 0.030 and 0.032 for the model explaining the fastest tempo and loudness balance, respectively, which indicates that the model was almost the same as the ridge regression. Between the fastest tempo and loudness balance, there was no correlation (r = -0.183, p = 0.214), which indicates that these performance variables are independent.
Schematics of the results of the penalized regression explaining the interindividual variance of each of the IOIs at the fastest tempo and loudness balance across the participants. (A) R2 of the regression model for the IOI at the fastest tempo was 0.852. The individual differences across the pianists of the IOI at the fastest tempo were mostly accounted for by the interstrike mean of the peak descending velocity of the key, the interstrike mean of the key-depression ratio, and the interstrike mean of the time to key release. (B) R2 of the regression model for the loudness balance at the paced tempo was 0.759. The individual difference across the pianists of loudness balance was mostly accounted for by the interstrike variability of the peak key descending velocity, the interstrike variability of the key depression ratio, and the interstrike mean of the peak ascending velocity of the key.
Figure 4 illustrates a schematic drawing of the elastic net model explaining both the fastest speed and loudness balance according to the feature variables of the key movements of the pianists. For the model of the fastest speed (i.e., interonset interval: IOI), the coefficient value was large, particularly for the interstrike mean of the peak descending velocity of the key, the interstrike mean of the key depression ratio, and the interstrike mean of the time to key release. For the model of loudness balance, the coefficient value was large, particularly for the interstrike variability of the peak key descending velocity, the interstrike variability of the key depression ratio, and the interstrike mean of the peak ascending velocity of the key.
DISCUSSION
In the present study, we developed a novel sensing system capable of measuring the time-varying vertical position of all piano keys with high spatiotemporal resolution without having any physical contact with the keys. By using the system, we assessed a variety of features of movements of multiple keys while expert pianists were performing fast and accurate keystrokes. By analyzing these features for a large number of pianists with a machine learning technique using a penalized regression model, we identified novel spatiotemporal features of key motion that were associated with the speed and accuracy of dexterous finger movements in piano performance. There are three major results in this study. First, we confirmed that our sensing system that has no mechanical contact with the piano keys could record the vertical position of the piano keys with 1 ms of temporal resolution and 0.01 mm of spatial resolution with a linearity between the position and voltage of the signal. Second, the system identified novel features of the key motions describing the pianists’ expertise (i.e., speed and accuracy), which are not only the timing and velocity of the key depression and release that had been assessed in many previous studies using the MIDI technology but also the spatiotemporal features of the peak velocities of key movements, the maximum displacement of key descent, and the duration during which keys were maximally depressed. Third, using a large dataset of key movements collected from 49 pianists, a penalized regression model identified a novel set of task-relevant features of the finger touches that explain the individual differences in the speed and accuracy in dexterous keystroke tasks. For the mean interkeystroke interval representing the agility of the repetitive keystrokes, the individual difference across the pianists was associated with the key depression duration, the time to the moment when the key ascending velocity reached its peak, and the peak descending velocity of the key. For the loudness balance of the two simultaneously depressed keys representing the precision of the key-depression velocity, the individual differences were associated with the interstrike variability of the peak descending velocity and the key-depression duration and the interstrike average of the peak ascending velocity of the keys. These results indicate that motor skills necessary for dexterously depressing and releasing the keys play a crucial role in both fast and accurate performance of sequential finger movements in piano performance.
A close inspection of the results of the regression model deepens the understanding of mechanisms underlying fast and accurate performance of the finger motions. First, the negative covariation between the loudness balance error of the two simultaneous keystrokes and the interstrike variability of the key depression duration suggests that pianists who changed motions in a strike-by-strike manner were better at keystroke feedback control based on afferent sensory information. In the accurate performance of repetitive piano keystrokes, previous studies have demonstrated crucial roles of feedback control based on somatosensory information (Hirano, Kimoto, & Furuya, 2020; Hirano, Sakurada, & Furuya, 2020; Hosoda & Furuya, 2016). In feedback control, afferent sensory information is integrated into motor commands to correct movement error. The key-depression duration indicates the duration during which the fingertip receives the reaction force originating from the mechanical interaction between the key and key-bed, whereas this duration is not at all associated with the tone loudness. It is therefore possible that the interstrike variability of key depression represents a process of online correction of movements based on the somatosensory feedback derived from fingertips during repetitive keystrokes. Second, it is reasonable that the loudness balance error covaried positively with the interstrike variability of the peak key descending velocity because the key descending velocity determines the tone loudness (Kinoshita et al., 2007). One possible account for this outcome is that pianists who can produce the target key-striking velocity consistently are able to discriminate subtle deviation of the tone loudness or force applied to the key and therefore can perceive a subtle loudness error between the two tones in the key presses (Hirano et al. 2020). It is also plausible that pianists with a smaller amount of signal-dependent noise in the motor commands (Faisal, Selen, & Wolpert, 2008) displayed both reduced interstrike variability of the key-striking velocity and lower loudness balance error. Third, a positive covariation between the peak ascending velocity of the key and the loudness balance error indicates that faster finger-lift motions allowed for earlier preparation of subsequent keystrokes, which may enable precise control of tone loudness. For example, preparatory auditory imagery in sequential motor actions modulates action planning to produce upcoming movements efficiently (Keller, Dalla Bella, & Koch, 2010). A possible implication for music pedagogy is therefore to encourage pianists to lift their fingers quickly following key depression for accurate loudness control of multiple tones with their fingers; this approach needs further evaluation of causality through interventional studies.
For the maximum tempo when playing at the fastest tempo, the individual differences were negatively associated with both the key depression duration and duration until the key ascending motion reached its peak velocity. These results corroborate our previous observation that the duration of the hand muscular activities during the individual keypress was negatively associated with the maximum tempo when pianists were playing as fast as possible (Winges et al., 2013). The shortened key-depression duration and hand muscular activation, as well as quicker initiation of the key ascending movement, can allow for quicker transition of the direction of the finger movement from flexion to extension; these features are prolonged abnormally in pianists with focal hand dystonia (Furuya et al., 2015; Furuya, Uehara, Sakamoto, & Hanakawa, 2018) and thereby skillful finger movements are impaired.
One pedagogical implication for practicing and teaching the piano can be to encourage pianists to lift their fingers quickly immediately after a key reaches its bottom to accomplish both speed and accuracy in virtuosic piano performance. When pianists are challenged to play faster or more accurately, there are many candidate skills to be taken into consideration. These include spatiotemporal features of movements of the keys and their attributes, such as movements, posture, and muscular activities of the fingers, arm, and trunk (Furuya & Altenmüller, 2013b). Identifying a small set of skills relevant to skillful piano performance is therefore useful to optimize piano practice and teaching efficiently. In the present study, a combination of the novel sensing system and machine learning analyses successfully identified only a few features of key movements, most of which were intuitively irrelevant to task performance. In future studies, it will be essential to address the causal relationship of these features through an interventional experiment to further identify crucial motor skills necessary for optimizing the musical performance of expert pianists.
From a technological point of view, the advantage of the present sensing system over various existing technologies is that it can retrofit the existing acoustic pianos. We have retrofitted the system into several grand pianos made by different companies (e.g., Kawai, Steinway, Yamaha), which took half an hour to complete, including the calibration process. To the best of our knowledge, the existing high-resolution sensing system should be built-in prior to shipment, which contrasts with our system. In addition, the spatiotemporal resolution of our sensing system is as high as that of other built-in sensing systems, such as Disklavier and SPIRIO.
ACKNOWLEDGMENT
We thank Mr. Tomohiro Saito for supporting the development of a prototype of the sensing system and Mr. Hayato Nishioka for enhancing the system.
Footnotes
Conflict of Interest: We have no conflicts of interest to declare.