Reliability metrics of outcome measures during the single and double leg drop jump tests

Although a number of previous studies have studied reliability metrics of outcome measures during single and double leg drop jump tests, none of them presented detailed reliability information for both tests with a group of collegiate athletes of both genders. The aim of this study was therefore to assess the reliability of outcome measures during the single and double leg drop jump tests with proper reliability metrics. Seventeen handball players (11 male and 6 female) participated in the experiments. Each player performed three double leg and three single drop jumps from a 30 cm height box onto a portable force plate in two sessions a week apart. Sixteen outcome measures, including jump height (JH), ground contact time (GCT), reactive strength index (RSI), were calculated. Four groups of reliability metrics like absolute agreement and consistency intraclass correlation coefficients (inter-day and intra-day), standard error of measurement, coefficient of variation, and minimal metrically detectable change were estimated for reliability assessment. The outcome measures as RSI and its individual components JH and GCT together with normalized vertical stiffness (Kvert) yielded high inter-day and intra-day intraclass correlation coefficients and low standard error of measurement and coefficient of variation levels for the double leg drop jump test. The more challenging single leg drop jump test could also be considered reproducible with some other highly reliable outcome measure set by keeping JH, GCT, and normalized Kvert and replacing vertical jump impulse with RSI. The results of the current study therefore suggested the drop jump tests could be deemed reliable to be used for short-term and long-term monitoring needs for a group of collegiate athletes of both genders.


Introduction
Muscle exercises are classified primarily into static and dynamic types (Knuttgen ve Komi, 2003).
Plyometrics is a dynamic type of exercise included in training programs to improve speed, endurance, reactive strength, and explosive power of athletes (e.g., Chu and Myer, 2013;Slimani et al., 2016).
Plyometric exercise comprises an eccentric stretch of the muscle tendon unit (MTU) immediately followed by a concentric shortening of the MTU. This combination of MTU undergoing active stretching followed by rapid shortening during the stretch-shortening cycle (SSC) is the foundation of plyometric exercise which seeks to exploit the force potentiating capabilities of the SSC (Norman and Komi, 1979). Different types of jumps such as tuck jump, single leg hop, and depth or drop jump are commonly used in plyometric exercise programs to improve vertical jumping ability (Chu and Meyer, 2013).
One of the most popular vertical jump drills in plyometric training programs is drop jump (DJ) (Bobbert, 1990). DJ consists of standing on a box, stepping off and dropping onto the ground, upon landing (eccentric stretching or braking phase), immediately performing a vertical jump (concentric shortening or propulsion phase) (Komi, 2003). One of the proposed mechanisms responsible for the force potentiation observed in SSC is the recovery of elastic potential energy (e.g. Alexander, 1987;Blazevich, 2011). The elastic potential energy is stored in the MTU during the eccentric stretch phase and that stored elastic energy is released during the concentric shortening phase, and MTU loading is the dominant factor that influences energy storage (Blazevich, 2011). Baxter and his colleagues (2021) evaluated Achilles tendon loading profiles of various exercises and proposed that single and double leg drop jumps are tier 3 and 4 (the highest tier) respectively. This research indicated that single and double leg drop jumps could be assessed separately as loading profiles are different, yet little research has been done on single leg DJ compared to double leg DJ.
Several previous studies examined double leg DJ and compared it with other vertical jumps using biomechanical instrumentation such as electromyography systems, motion capture cameras, and force plates (e.g. Bobbert et al., 1987;Ishikawa et al., 2006;Healy et al., 2014). In one of the first studies on the subject, Komi and Bosco (1978) studied the vertical jump performance of men and women for three types of jumps as squat jump, countermovement jump, and drop jump. In the many following studies, vertical jump performance has been studied extensively from the point view of biomechanical performance, neuromuscular control, and SSC function variables using total body biomechanics (e.g. Harry et al., 2021;McMahon et al., 2021). In order to do so, several outcome measures, such as jump height, ground contact time, reactive strength index have been calculated from the force-time curve variables recorded by a force plate during vertical jumps (McMahon et al., 2021). Those outcome measures have the potential to be useful for profiling and benchmarking, injury prevention, and rehabilitation and return to sport perspectives, and to be used to evaluate effectiveness and impact of vertical jump drills in plyometric training programs, if obtained in a reproducible way.
To be capable of identifying meaningful change for both short-term and long-term monitoring needs, reliable and repeatable test protocols are critical to creating a performance evaluation system (French et al., 2022). Sport scientists who evaluate performance have to make sure that the procedures they follow do not involve errors that are larger than the differences they are supposed to assess and also the outcome measures are reliable to count them on (Kibele, 1998). Thus, the jump data obtained from tests must be reproducible and consistent, in other words, the following measurements under the very same conditions should yield virtually the same results as the first measurement. This reproducibility is usually quantified using reliability metrics by implementing a test-retest method in which the first data set obtained in the first measurements is compared with the data obtained from the same subjects in the following measurements at some other time under the very same conditions (Vincent and Weir, 2020).
Such reliability metrics as intraclass correlation coefficient (ICC) and standard error of measurement are reported frequently to quantify test-retest reliability in the exercise and sport sciences (Weir, 2005).
The reliability evidence, however, often lacks clarity (Morrow and Jackson, 1993). The methodology used to obtain outcome measures and reliability metrics might not be defined in detail in the published studies (Liljequist et al., 2019). Also, studies with the purpose of establishing the reliability of tests should be representative of the defined population in terms of age and gender (Morrow and Jackson, 1993). However, as noted in a very recent study (Harry et al., 2021), most of the generated data on vertical jumps were stemming from male-only studies. Actually, this situation is not limited to studies on jumping, since it has been demonstrated quantitatively in the article "Invisible Sportswomen": The Sex Data Gap in Sport and Exercise Science Research (Cowley et al., 2021). The authors examined the gender distribution of approximately 12 million participants in 5261 articles published in the field of sports and exercise sciences, 63% of the publications presented data on both male and female participants, 31% were only men, and 6% only women (approximately 8.2 million male and 4.2 million female participants in total). As the results of the study (Cowley et al., 2021) clearly demonstrated, women are significantly underrepresented in sport and exercise science research.
Despite the scarcity of data on woman, scientific papers, technical notes, and new technological improvements are piling up about measurement, training, and analysis methods for vertical jumps (e.g. Comfort et al., 2019 and reactive strength index were to be highly reliable from trial to trial evidenced by very high ICC values. Conversely, TTS was deemed not reliable from trial to trial, as evidenced by moderate to low ICC values. However, arm position was not controlled throughout the jumping and landing movements which could be a reason for that result. Also, the gender of the subjects was not specified explicitly despite the fact that research on vertical jumps have suggested there are several differences between male and female subjects (e.g. Komi and Bosco, 1978;Laffaye et al., 2010). In a more recent study, Byrne and his colleagues (2017) investigated the inter-day reliability of the reactive strength index and optimal drop height using an optical system for measurements. The subjects were 19 male hurling players. The reactive strength index and optimal drop height measures yielded high reliability with ICC values of 0.87 and 0.80 as well as coefficient of variation values of 4.20% and 2.98% respectively.
However, this study was also performed with only male subjects. In addition to that an optical not a force plate system was used for the measurements. Moreover, the details of ICC computations and models were not reported in detail like many other studies (Liljequist et al., 2019).
As vertical jumping is regarded as a central and fundamental element of many sports, vertical jumping ability is one of the factors that determine performance in a variety of sports (Bobbert, 1990;Aragon-Vargas, 2000). Handball, a contact team sport which includes high-paced and short-term activities such as jumping, is not an exception to that. Due to the nature of handball, the players' biomechanical performance, neuromuscular control, and SSC function were emphasized. Therefore, monitoring variables related with those aspects by means of outcome measures of DJ tests might be beneficial.
Given also the resemblance of single leg jumps to some fundamental movements in handball suggests that studying single leg DJ as well as double leg DJ may be worthwhile.
This experimental study will test the hypothesis that outcome measures of single and double leg DJ tests such as jump height, ground contact time, reactive strength index calculated from a group of male and female handball players using a portable force plate would yield reliable measurements from trial to trial and day to day. A repeated measures experimental design was implemented with subjects performing three single leg and three double leg DJs from a fixed height in a non-fatigued state. The very same experimental measurements were repeated after one week. The inter-day and intra-day reliability of each of the outcome measures will be statistically assessed with reliability metrics as ICC, standard error of measurement, coefficient of variation, and minimal metrically detectable change. Ee shared all data and code repository on Open Science Framework. Hence, anyone interested could be able to repeat the analysis and change any procedure in the methodology to assess possible effects of them on the reliability metrics and outcome measures.

Subjects
Seventeen voluntary participants (11 male and 6 female) took part in the experiments (age: 20.9±2.5 years, height: 176.8±8.3 cm, weight: 71.6±14.3 kg) with no neuromusculoskeletal problem that would affect their jumping performance at the time of measurements. The subjects were the members of the college handball team. The study took place at METU, Ankara, Turkey. All subjects have experience in plyometric exercise but did not perform such exercises at the time of the study. Subjects had performed no strength training in the 48 hours before data collection. All subjects gave informed consent to participate in the study. This specific study was reviewed and approved by Noninterventional Clinical Researches Ethics Board of Hacettepe University (No: GO 17/902).
A statistical power analysis was performed a priori for sample size estimation. With a correlation coefficient, r = 0.80 and power = 0.90, the projected sample size needed was 11 subjects (Portney and Watkins, 2015). We recruited 17

Instrumentation
During DJs, the ground reaction force in the vertical direction (VGRF) were registered at 2000 Hz with a portable single force plate (Kistler 9260AA, Switzerland)) for the sample period. The computer software interface was Kistler's own software MARS. Each raw data of each trial was stored on the same Pentium-powered laptop and then exported as comma-separated values (csv) file to MATLAB environment for calculations of outcome measures and reliability metrics.

Procedures
Before testing, a week earlier, the athletes completed a familiarization session in which they had opportunity to practice single and double DJs. Then, the recorded DJ tests were administered by the same raters in two sessions a week apart at the same time of day, at the same indoor venue where the collegiate athletes train, on a level sports court with a synthetic hard floor. Before each measurement session, the athletes were re-informed about the procedures. The participants wore the same shoes and similar clothing for the tests, and were to avoid drinking (other than water), eating, and exercising in the hour before testing. DJ tests were assessed in two different conditions as double and single leg; first three trials of double leg, then three trials of single leg to provide the participants with a gradual increase in neuromuscular stress (Lloyd et al., 2009).
Before the measurements, the subjects were then given verbal description and visual demo of single and double leg DJs. The instruction set was: (i) step forward off the box of 30 cm without stepping down and drop onto the force plate, (ii) upon landing perform a rebound jump as high as possible and as quickly as possible, think the force plate as a hot plate (iii) on touch-down stick your landing and to stabilize as quickly as possible while facing straight ahead and remain motionless by simply looking at a cross sign on the wall approximately 4 m in front of you, (iv) keep your hands on the hips throughout the jump, (v) do not tuck your legs in the air (Dalleau et al., 2004;Flanagan et al., 2008;Byrne et al., 2017). In both DJ tests, the participants stepped forward off the box with their preferred foot which was established in the familiarization session and remained the same in the following sessions (Maloney et al., 2016).
The warm-up procedure consisted of five minutes of jogging, followed by dynamic stretching of each major muscle group of the lower limbs (Flanagan et al., 2008). After warm-up, the participants were provided with the opportunity to practice both jumps. Then the participants were allowed for five minutes of rest before the DJ tests. The participants performed six maximal DJs (three double and three single leg). The rest period was approximately one minute between the jumps. If the participant could not stabilize in single leg DJ, that trial was repeated. Data was successfully registered for all participants for both jumps.

Data Analysis
The data analysis had two stages: (i) detection of key time points such as landing and touch-down instants, (ii) calculation of outcome measures of DJs. Both single and double leg DJ performance of the participants were analyzed with the exact same code based on the methods described by Baca (1999) . Fig 1 showed an example of VGRF record for a double leg DJ. Fig 1 also indicated key time points of the jump. We used the same numbering order as in Baca (1999). Specifically, t2 was identified as the instant of landing onto the force plate by locating the data point where VGRF value first exceed 10 N threshold (Lloyd et al., 2009;Castilla et al., 2021) after starting recording.
Similarly, t5 was identified as the instant of touch-down on the force plate by locating the data point where VGRF value first exceed 10 N threshold after the maximal vertical jump. The take-off instant, t4, was located as the data point where the VGRF value first drops below 10 N threshold after the t2 instant. The t6 is the last point of force record at which the participant stands still on the force plate.
The vertical velocity of total body center of mass (BCOM) at the instant of landing (t2) was estimated by numerically integrating the vertical force minus body weight (BW = m*g where m is the body mass and g is the gravitational acceleration, 9.80665 m/s 2 ) from t6 to t2 using the trapezoid method (Baca, 1999).
The t3 is the instant where the vertical velocity of BCOM is zero, and located with solving the following equation below numerically (Baca, 1999). below, from t2 to t3 is the eccentric or braking phase, from t3 to t4 concentric or propulsion phase.
The t3 instant divides ground contact phase (from t2 to t4) into two as eccentric or braking phase (from t2 to t3) and concentric or propulsion phase (from t3 to t4) (Baca 1999;Komi 2003). The t1 instant was when participant take-off from the box, yet we did not locate that instant as Baca (1999) used a second force plate for that purpose.
After locating the key time points, we calculated several outcome measures of DJs. Kinetic analysis of vertical jumping by using force plates enables extensive assessment of neuromuscular function in eccentric (braking) and concentric (propulsion) phases of SSC separately and jointly, allowing researcher and practitioners to go beyond testing only simple measure of jump height (Bishop et al., 2021). Recently, Bishop and his colleagues (2021)  From the force-time curves obtained from the force plate, we calculated the following outcome measures.
 Jump height (JH in cm): JH was the peak displacement attained by the BCOM during the flight phase (from t4 to t5 in Fig 1) and calculated using the flight time in the air (JH-TIA) and the vertical velocity of BCOM the take-off instant (JH-TOV) (Linthorne 2001;Moir, 2008).
 Ground contact time (GCT in s): GCT was the time spent on the ground after landing until take-off (from t2 to t4 in Fig 1) and used to classify SSC as fast (100-250 ms) and slow (>250 ms) (Schmidtbleicher, 1992).
 Reactive strength index (RSI in cm/s): RSI has been used to quantify SSC performance and calculated as the ratio of JH to GCT (Flanagan and Comyns, 2008). (from t2 to t4 in Fig 1) with integration constants for velocity as V(t=t2) = V L and for position as Y(t=t2) = 0 (arbitrarily) (Baca, 1999).
 Rate force development (RFD in BW/s): RFD has been used to assess explosive strength of athletes and proposed to be determined by the capacity to exert maximal voluntary effort in the early phase of an explosive movement (Maffiuletti et al., 2016). RFD was calculated as the peak value of the time derivate of the filtered vertical force-time curve in the braking phase (from t2 to t3 in Fig 1). The vertical force signal was only filtered for RFD analysis using a second order low-pass zero-lag Butterworth filter with a 5 Hz cut-off frequency (Sarabon et al., 2020).

 Vertical jump impulse (VJI in BW*s): The vertical impulse applied during the jump is equal
to the momentum change (momentum of an object is equal to the mass of the object times the velocity of it). The impulse-momentum relation requires that impulse or total VGRF applied to the BCOM is directly proportional to the change in the vertical velocity (Cleather, 2021).
VJI has been used to assess performance (Cleather, 2021) and deficits before return to sport after injury (Costley et al., 2021). VJI was calculated by integrating the vertical force-time curve during the contact phase of jump (from t2 to t4 in Fig 1) and dividing the result by the BW.

 Normalized vertical stiffness (Normalized Kvert in N/m/kg): Normalized Kvert has been
linked with injury risk and athletic performance (Butler et al., 2003) and was calculated by dividing the absolute Kvert value by the body mass (Maloney et al., 2016). The logic behind this normalization is that vertical stiffness has been suggested to be affected by body size (Farley et al., 1993). . In this particular study, a method called as sequential estimation has been used (Colby et al., 1999). Specifically, sequential average is calculated as a cumulative average of the VGRF data points after touch-down (t5 in Fig 1) by successively adding one data point at a time, hence after the first point, the average of the first 2 data points was calculated, then the average of the first 3 data points was calculated, and so on. VGRF was considered to be stable and VTSS was located when the sequential average remained within one quarter standard deviation of the VGRF in the interval three seconds after the touch-down (Colby et al., 1999).

Statistical analysis
For the day-to-day or test-retest or inter-day reliability analysis, the mean values of three trials of the DJ outcome measures in single and double leg tests were used. We performed a paired t-test on the difference of mean values of outcome measures obtained at test and retest sessions to verify absence of systematic bias (Atkinson and Nevill, 1998). Alpha level was set at 0.05 level for all statistical analyses. For the trial-to-trial or intra-day reliability analysis, three trials of the DJ outcome measures from the retest session were used only.
To estimate relative reliability, we used two-way random effects model of ICC, which essentially compares the within-subject variability of the measurement data with the between-subject variability of the measurement data. (Shrout and Fleiss, 1979;McGraw and Wong 1996). The test-retest and trial-to-trial correlations were quantified with two forms of ICC, namely, absolute agreement ICC(A,k) and consistency ICC(C,k) (McGraw and Wong, 1996;Liljequist et al., 2019). For each ICC value, the lower and upper bounds of 95% confidence interval (CI) were also calculated by using the equations presented at McGraw and Wong (1996). The ICC is a real number usually between 0 and 1 (Liljequist et al., 2019). According to Munro's classification, the strength of correlation coefficients was interpreted as follows to describe the degree of relative reliability: 0.  (Corriveau et al., 2000, Salavati et al., 2009. Also, the coefficient of variation (CV), basically the ratio of the standard deviation (SD) to the mean of the data, was calculated to assess absolute reliability of the outcome measures. To do so, the mean CV value from the individual CV values was computed (Atkinson and Nevill, 1998). For the trial-to-trial CV calculations, the second and third trials have been taken into calculations. The following equations were used to compute the CV metric (n=number of subjects, x 1 and x 2 are measurements for the ith subject (Shechtman, 2013)).  -TIA, JH-TOV, GCT, BPT, PPT, MPF, PCOMD, VJI, absolute Kvert, and normalized Kvert, ranged from 0.70 to 0.89, which could be considered high level of reliability. Also, the inter-day ICC values of five outcome measures, like RSI, MBF, RFD, PPP, and VTTS, ranged from 0.50 to 0.69, which could be considered moderate level of reliability.

Results
Lastly, PGRF was the only outcome measure that have low level of inter-day reliability. In terms of absolute reliability, the highest CV level was observed for VJI as 2.82%.
For double leg DJ tests (Table 3) In terms of intra-day ICC and double leg condition (     should not be assumed to be always superior for the double leg DJ than the single leg DJ which is more challenging in terms of Achilles tendon loading. The inter-day and intra-day ICC values of JH calculated by two methods using the flight time in the air (JH-TIA) and the vertical velocity of BCOM at the take-off instant (JH-TOV) were almost identical. When a jump mat is used for measuring jumping performance, the flight time method is used to calculate JH. However, when a force plate is used, either method could be employed in the calculations. Moir (2008) suggested that both methods are valid and consistent, however, when a force platform is available, practitioners or experimenters should use the TOV method for countermovement jump. We wanted to see whether there is a large difference in terms of reliability metrics hence it would be possible to favor one method to the other for the DJ tests, yet, the results of the reliability analysis yielded very similar values for two methods.
Nevertheless, the CV values were slightly lower for the JH-TIA, therefore, time in the air method might be favored for a little better absolute reliability.
Among the outcome measures of single and double leg DJ tests, VTTS had the lowest or the second lowest inter-day and intraday ICC values. VTTS, or time to stabilization measures for any direction have been offered to analyze dynamic postural stability (Ross and Guskiewicz, 2008). Assessing dynamic stability using dynamic tests such as single and double leg DJ would especially be critical to detect postural instabilities in athletes (Ross and Guskiewicz, 2008). For that purpose, many methods had been developed to estimate time to stabilization measures (Fransz et al., 2015). Previously, Flanagan et al. (2008) assessed reliability of VTSS using a method in which VTSS was located as the first instance when VGRF reached and stayed within a threshold for one second, and the analysis yielded an intra-day ICC value that could be considered as moderate level of reliability. To estimate VTTS, we used sequential estimation method which was shown to be able to identify postural instabilities (Colby et al., 1999). For sequential estimation method, the results of current study with collegiate athletes demonstrated ICC values which could be considered moderate to high level of reliability and rather low CV values (CV < 10%).
Certain limitations affected our study. In this study, we studied 16 outcome measures, yet there is possibility to extract more information from DJ test, an instrumental jump test which has the potential to generate reliable and rich data sets to study biomechanical performance, neuromuscular control, and SSC function. Future studies may consider adding more outcome measures, possibly combining data from other sources as electromyography, three-dimensional motion capture, and modeling and simulation. In addition, it is possible that some choices on detecting key-time points might affect the reliability and magnitude of outcome measures. A recent study showed that changing threshold values to detect take-off instance influences reliability and magnitude of countermovement jump outcome measures (Perez-Castilla et al., 2021). Additional reliability studies may study the same issue for the DJ test.
In conclusion, the suggested DJ outcome measures (Bishop et al., 2021) as RSI and its individual components JH and GCT as well as normalized Kvert yielded high inter-day and intra-day ICC and low SEM and CV levels for the double leg DJ test. Those results thus suggest that the double leg DJ test could be used to assess jumping performance by using those above-mentioned outcome measures in collegiate handball players. The more challenging single leg DJ test could also be used in assessments with some other four reliable outcome measures like JH, GCT, normalized Kvert, and VJI. In conclusion, DJ tests could be deemed reliable to be used for shortterm and long-term monitoring from the point view of biomechanical performance, neuromuscular control, and SSC function.