From treadmill to trails: predicting performance of runners

Previous laboratory studies have measured the energetic costs to humans of running at uphill and downhill slopes on a treadmill. This work investigates the extension of those results to the prediction of relative performance of athletes running on flat, hilly, or very mountainous outdoor courses. Publicly available race results in the Los Angeles area provided a set of 109,000 times, with 2200 runners participating in more than one race, so that their times could be compared under different conditions. I compare with the results of a traditional model in which the only parameters considered are total distance and elevation gain. Both the treadmill-based model and the gain-based model have some shortcomings, leading to the creation of a hybrid model that combines the best features of each.

Author summary

Running a race on a road allows absolute measures of performance. Trail running, however, has traditionally been thought of as a sport in which the only valid comparison is between different runners competing on the same course on the same day. Even the exact measurement of distance is considered to be unimportant, since courses and conditions vary so much. An extreme example is the relatively new genre of “vertical” races, in which runners race up a mountain. In a typical example, the competitors cover a horizontal distance of 5 km, while climbing about 1000 m. The winner in one such race had a time almost triple that expected for a state-champion high school runner in a 5k road race. Clearly no comparison can be made here without taking into account the amount of climbing. In noncompetitive contexts, many runners venture onto mountain trails, lightly dressed and with little equipment, so that it becomes important to be able to anticipate whether they will have the endurance needed to safely complete a planned route. Again, this is impossible without some model of the effect of hill climbing.


This paper presents a method for predicting relative performance on trail runs, "relative" meaning that we can predict the time for course A divided by the time for [...]

Traditionally, runners and hikers have described a trail using two numbers, the horizontal distance and the total elevation gain. For example, if the route is an out-and-back trip consisting of steady climbing to a peak and a return, then the total elevation gain is simply the elevation of the peak minus the elevation of the trailhead. If the elevation profile of the trip consists of multiple clearly defined ascents and descents, then one adds up the ascents. Although this two-parameter description of the route is easy to derive from a paper topographic map, knowledge of the two numbers is not sufficient to make a very useful estimate of the total energy expenditure.

It has been known for a long time among the officials who measure road races that the effect of elevation change has a nonlinear dependence on the grade. The following argument was advocated by R. Baumel. [1] Consider a closed course whose elevation profile is described by some function $y(x)$. The derivative $y'$ is the trail's slope $i$. The total energy expenditure is an integrated effect of the slope, of the form $\int_0^L C(i)\,dx$, where $C$ is a function that describes the energetic cost of running up or down a hill. We will see that $C$ has been measured in laboratory experiments, but for the moment we assume only that $C$ is a smooth function, so that for small slopes it can be well approximated by the first few terms of its Taylor series, $C(i) \approx c_0 + c_1 i + c_2 i^2$. Then for any closed loop over a distance $L$, the contribution from the $c_1$ term vanishes, because $\int_0^L i\,dx = y(L) - y(0) = 0$, and the energy cost is $c_0 L + c_2 \int_0^L i^2\,dx$. The dependence on the slope is therefore quadratic rather than linear.
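Baumel's argument can be checked numerically. The following is a minimal sketch, not part of the original analysis: the sinusoidal elevation profile, the course length, and the coefficient values `c0`, `c1`, `c2` are all hypothetical, chosen only so that the course is closed and the slopes are small.

```python
import math

def loop_cost(amplitude, c0=1.0, c1=1.0, c2=1.0, length=1000.0, n=10000):
    """Integrate C(i) = c0 + c1*i + c2*i**2 over a closed course.

    The elevation profile is y(x) = amplitude * sin(2*pi*x/length), which
    returns to its starting elevation. All coefficients are hypothetical.
    """
    dx = length / n
    k_wave = 2 * math.pi / length
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * dx                              # midpoint rule
        slope = amplitude * k_wave * math.cos(k_wave * x)  # i = y'(x)
        total += (c0 + c1 * slope + c2 * slope**2) * dx
    return total

flat = loop_cost(0.0)            # equals c0 * L
extra_1 = loop_cost(10.0) - flat  # cost added by the hills
extra_2 = loop_cost(20.0) - flat  # same course with y -> 2y
ratio = extra_2 / extra_1
print(ratio)
```

Even though `c1` is as large as `c2` here, the linear term integrates to zero around the loop, so the extra cost of the hills scales with the square of the amplitude.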
For example, if we were to exaggerate the elevation profile by a factor of 2, $y \to 2y$, then the size of the $c_2$ term would go up by a factor of four, not two (in the low-slope limit, on a closed course).

From conversations with runners and hikers, I have found that the result of Baumel's argument almost always elicits total disbelief, especially when presented as a numerical example showing the extreme smallness of the slope effect when the slope is small. One of the goals of this paper is to test Baumel's result empirically. As an alternative hypothesis, it is commonly believed that one can get a good measure of the relative energy cost by taking the horizontal distance and adding in a term proportional to the total elevation gain. If the total gain is determined down to a fine enough scale (which with modern technology has become more practical), then this hypothesis is equivalent to the assumption that the cost of running is given by a function $C_g$ whose graph is shaped like a hockey stick (dashed line in Fig 1).

[...] distance, which usually differs negligibly from the increment of horizontal distance $d\ell$.

$C$ has units of J/(kg·m). The correctness of the factor of 1/m has empirical support. [10] Efficiency varies by ∼25% even among elite athletes, [7] [8] and differences are also to be expected between elite and recreational athletes. This is one of the reasons why this study presents a comparative technique, rather than an absolute method for determining a particular runner's actual energy expenditure in units of kilocalories.

The function $C(i)$, shown as the solid line in Fig 1, resembles [...]

To test these models, I use publicly available race results from the Los Angeles area. This area has a large population and tall mountains. The large population makes it possible to pick out a significant number of runners who have competed in several different races. If the ratio of the runner's time on courses 1 and 2 is $t_2/t_1$, then we take this as a measure of the ratio $E_2/E_1$ of the energy expenditure, which can be compared with the model. It was possible to find courses with a variety of elevation profiles, allowing a test of the dependence of the predictions on the amount of hill climbing. The proportionality of time to energy has been found to be good for level and uphill running, but less valid for downhill running, [11] and indeed we will see that these observations seem to hold for the data investigated here.

Table 1 lists the races used as sources of data. One-letter mnemonics are defined so that courses can be referred to succinctly in the text. Because a runner's performance can change over time due to training and aging, the time period of the study was restricted as much as possible to January 2017 through March 2020 (before the COVID epidemic ended races other than virtual ones in California). Distance and elevation data were analyzed as described in Appendix 3.

Runners' names and times were obtained by web-scraping public race results, and runners were assumed to be the same person if their first and last names matched. When a runner ran the same race more than once, their best time was used. To avoid biases in comparisons of times in different races, it is necessary here to define an upper limit on the times that will be used from a given race, and to do so in some consistent and unbiased way. Some such limit is in any case defined by race organizers, but is different for different races and usually quite long, often about 4-5 hours for a half-marathon. Competitors who clock the longer times are generally either walking the entire race or alternating between walking and jogging, and especially in more casual races may be pushing a stroller, running alongside their tween-age child, or staying in a costumed group for fun and emotional support. Because the physiological data and models used in this work are not applicable to walking, I impose a somewhat arbitrary time limit of 2.5 hours on half-marathon times. These limits, as well as others, where imposed, are described in the notes in Table 1. For course S, the time limit was derived by scaling down the half-marathon time limit in proportion to the distance. The other courses in this study are of a qualitatively different character, so for them I simply used the race organizers' cut-off. The resulting bias is an inherent limitation of this work.
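The selection rules just described (matching on first and last name, keeping the best time, and applying a distance-scaled cut-off) can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the study: the function names, the tuple layout of `results`, the case-insensitive name matching, and the sample data are all hypothetical.

```python
HALF_MARATHON_KM = 21.1
HALF_MARATHON_LIMIT_H = 2.5  # cut-off on half-marathon times stated in the text

def time_limit_hours(distance_km):
    """Scale the half-marathon cut-off in proportion to the race distance."""
    return HALF_MARATHON_LIMIT_H * distance_km / HALF_MARATHON_KM

def best_times(results, limit_hours):
    """Return {(first, last): best time in hours}, dropping times over the limit.

    `results` is a list of (first_name, last_name, hours) tuples. Lower-casing
    and stripping the names is an assumption made for this sketch.
    """
    best = {}
    for first, last, hours in results:
        if hours > limit_hours:
            continue  # likely walking; outside the scope of the models
        key = (first.strip().lower(), last.strip().lower())
        if key not in best or hours < best[key]:
            best[key] = hours
    return best

# Hypothetical results for one half-marathon, including a repeat entrant.
sample = [("Ann", "Lee", 2.1), ("ann", "lee", 1.9), ("Bob", "Ray", 3.0)]
kept = best_times(sample, time_limit_hours(HALF_MARATHON_KM))
print(kept)
```

Because a runner's best time is also their shortest, applying the cut-off before or after choosing the best time gives the same set of retained results.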

Exertion depends most strongly on distance, and the goal of this work is to tease out effects from other factors, which are often weaker. For this reason, distance is a confounding variable in this study and has been controlled for as much as possible by using races at a consistent distance, the half marathon (21.1 km), or distances that, [...]

In Table 1, two measures of hilliness are given. The total elevation gain is the only parameter needed in order to calculate an energy expenditure using the function $C_g$. The next column gives a statistic I will refer to as the climb factor, CF, which is defined as the fraction of the runner's total energy expenditure that is devoted to climbing. That is, if $E$ is the actual energy required for the course, and $E_0$ the energy that would have been required if the race had been perfectly flat, then

$$\mathrm{CF} = \frac{E - E_0}{E}.$$

Inverting this equation gives $E = E_0/(1 - \mathrm{CF})$, so when CF is known, a measure of effort can be found by dividing the distance by $1 - \mathrm{CF}$.
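As a worked illustration, the verbal definition of the climb factor implies CF = (E − E0)/E, and hence E = E0/(1 − CF). The sketch below applies that reading; the function names and the example numbers are hypothetical.

```python
def climb_factor(energy_actual, energy_flat):
    """Fraction of the total energy that is devoted to climbing."""
    return (energy_actual - energy_flat) / energy_actual

def equivalent_flat_distance(distance_km, cf):
    """Distance of flat running that would cost the same energy as the course."""
    if cf >= 1.0:
        raise ValueError("climb factor must be less than 1")
    return distance_km / (1.0 - cf)

# Hypothetical course: a half marathon on which 20% of the energy goes to climbing.
cf = climb_factor(energy_actual=125.0, energy_flat=100.0)
effort_km = equivalent_flat_distance(21.1, cf)
print(cf, effort_km)
```

With these made-up numbers the course is as demanding as a flat run of a little over 26 km.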

To define quantitative tests of the models, consider a comparison of courses 1 and 2. The observed data are the runner's times $t_1$ and $t_2$, and the model predicts the ratio of the energy consumption, $E_2/E_1$. [...] For small errors, the error $\mathcal{E}$ is approximately the relative error in the prediction, expressed as a percentage. The use of the logarithm transforms multiplicative sources of error into additive quantities.
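A minimal sketch of an error measure with these properties is given below. The specific form, 100 ln[(E2/E1)/(t2/t1)], is an assumption: it is logarithmic, reduces to the percentage error when the error is small, and is positive when the model overpredicts the time on course 2 relative to course 1. The function name and example times are hypothetical.

```python
import math

def log_error_percent(t1, t2, predicted_ratio):
    """Assumed error measure: 100 * ln(predicted ratio / observed ratio).

    t1, t2          -- one runner's times on courses 1 and 2 (same units)
    predicted_ratio -- the model's prediction for E2/E1
    """
    observed_ratio = t2 / t1
    return 100.0 * math.log(predicted_ratio / observed_ratio)

# Hypothetical runner: 100 min on course 1, 150 min on course 2,
# while the model predicts a ratio of 1.6.
err = log_error_percent(100.0, 150.0, 1.6)
relative_percent = 100.0 * (1.6 - 1.5) / 1.5
print(err, relative_percent)
```

The two printed values are close but not identical, and they converge as the discrepancy shrinks. Swapping the roles of the two courses flips the sign of the logarithmic measure exactly, which is one practical advantage over a plain percentage.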

August 21, 2023

We pick a feature of the model that is to be tested. For example, we would like to see whether the model does a good job of predicting the relative times for flat races compared to steep uphill-only races (Fig 2, c). For this example, we make a list of courses that are relatively flat (P, C, H, and I), and a list of some that are steep uphill-only courses (B and V). We then find every case where the same runner did a run $j$ from the first list and a run $k$ from the second, and compute the error $\mathcal{E}_{jk}$, which will be positive if the runner's time in the uphill race $k$ is overpredicted by the model relative to their time in the flat race $j$.

3 Results
Fig 2 shows a comparison of the quality of the predictions of the functions $C_g$ and $C_t$ as descriptions of the effects of going up and down hills. The third model $C_r$ is a hybrid of these, introduced in section 4.2. All predictions were corrected for distance as described in section 5. Each of the four sub-figures a through d has been constructed so as to test a particular feature of these models. We discuss each in turn.

[...] about twice as much. According to the treadmill-based model $C_t$, the effects of climbing and descending nearly cancel out, giving a negligible CF < 1% for each run, as expected from Baumel's argument. In the gain-based model $C_g$, however, the effect of the hills on course P is 6 times its elevation gain, which is equivalent to adding 1.0 km to its length. The effect for I would be half as much, causing the model to predict a considerable difference in the times on the two courses. In the figure we see that Baumel's approximation is a good one here. The median error for $C_t$ (open circles) is only 1.7%, while that for $C_g$ (filled circles) is +6.4%, the positive sign showing that the effect of the small hills is over-predicted.

Of the four tests a-d, this is the only one where the effect being probed is small enough to require statistical analysis rather than simple visual inspection. Such an analysis (Appendix 4) shows that the systematic error in $C_g$ is significant ($p = 3 \times 10^{-6}$), while any such evidence against $C_t$ is statistically marginal.

[...] runners' times are only slightly lower. We see that both models greatly underpredict the runners' times in the mountain race. Most of the running in this race is on slopes with $|i| \approx 0.10$ to $0.15$. A likely interpretation is that on the uphills, $C_g$ is an underestimate (see c, below), while on the downhills $C_t$ is an underestimate. The race is run on a trail that is mostly a narrow single track, with steep hillsides on the climber's right. Safety is likely to inhibit many runners from going downhill at anything like the pace that would be possible for the elite mountain runners in ref. [8] on a treadmill, and trail etiquette dictates that they yield the right of way when encountering people who are still on their way up.

[...] Table 2 so as to shift the minimum of the function up and to the right. This [...] by varying its parameters, to dramatically modify the function's behavior for $-0.06 \lesssim i \lesssim -0.03$ while retaining its apparently correct behavior at $-0.03 \lesssim i \lesssim 0$. A more successful ad hoc recipe was simply to introduce a cut-off in $C$, i.e., to define a "recreational" version of the function, [...] where $i_0 = -0.03$. In other words, we simply chop the bottom off of the curve of $C_t$, at the dotted line in Fig 3.

The results of the hybrid model $C_r$ are shown as gray circles in Fig 2 and [...]

It is convenient to describe the function $C(i)$ using a fit to the form [...] where the subscript t stands for treadmill. Parameters fitted to the results of ref. [8] are given in Table 2. The purpose of using this form, rather than the polynomial fit given by [8], is to make the computations degrade gracefully in cases where the limitations of GPS tracks or data from digital elevation models produce unrealistically steep slopes.

In such cases, this expression approaches the physiologically expected asymptotic [...] measured the energy consumption at the speed that was found to be most efficient for that particular subject.

Appendix 2. Analytic approximation to world-record speeds

Cameron [2] has given a convenient closed-form approximation to world-record speeds of runners at various distances, [...] This is shown as the red curve in Fig 4. The parameters are given in Table 3.

Table 2. Parameters for Eq (5). These parameters were found by constraining Eq 5 to agree with the polynomial fits in ref. [8] on the following degrees of freedom: the function is minimized at the same $i$, and has the same value of $C$ there; the functions agree at $i = 0$. Furthermore, the slopes at $\pm\infty$ were constrained to have the asymptotic values found in that work.

[...] The use of these data is inherently subject to certain errors, which need to be minimized. Trails and roads are intentionally constructed so as not to go up and down steep hills, but the digital elevation model (DEM) may not accurately reflect this. The most common situation seems to be one in which a trail or road takes a detour into a narrow gully in order to maintain a steady grade. If the gully is narrower than the horizontal resolution of the DEM, then the DEM does not resolve the gully, and the detour appears to be a steep excursion up and then back down the prevailing slope.

Empirically, I have found that sensitivity to these effects can be minimized if the elevation profile of the run $y(x)$ is filtered by convolving it with a rectangular windowing function having width $w = 200$ meters. This tends to eliminate unrealistic glitches in the elevation data, and also seems to give a fairly close reproduction of race organizers' estimates of total elevation gain. This choice of $w$ gives reasonable results for routes in mountainous terrain, and is used throughout this work, even for flat courses on city streets. For a course that is relatively flat and has many small, short hills, $w \approx 60$ m gives more accurate results, but I have used the larger value of $w$ throughout this work in an effort to maintain consistency.

[...] sample courses, or $|\Delta\mathrm{CF}/\mathrm{CF}_\text{interp}| \lesssim 0.05$. However, with $\Delta x/w = 0.5$, the use of bilinear interpolation becomes crucial to obtaining good results, with the additive errors rising to 0.01 to 0.03 and relative errors to as big as 0.2 on a flattish course.
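The rectangular-window filtering of the elevation profile described in this appendix can be sketched as a simple moving average. This is an illustration only: the function names, the even sample spacing, the truncation of the window at the ends of the track, and the synthetic profile with a single 15 m glitch are all assumptions.

```python
def boxcar_smooth(elevations, spacing_m, width_m=200.0):
    """Moving average of evenly spaced elevation samples.

    The window spans roughly width_m of horizontal distance; near the ends
    of the track it is truncated (an assumption made for this sketch).
    """
    half = max(0, int(round(width_m / spacing_m / 2)))
    n = len(elevations)
    smoothed = []
    for k in range(n):
        window = elevations[max(0, k - half):min(n, k + half + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed

def total_gain(elevations):
    """Sum of all positive elevation increments."""
    return sum(max(0.0, b - a) for a, b in zip(elevations, elevations[1:]))

# Synthetic profile sampled every 10 m: flat, a steady 100 m climb, flat.
profile = [0.0] * 20 + [float(k) for k in range(101)] + [100.0] * 20
profile[70] += 15.0  # spurious spike, like an unresolved gully in a DEM

raw_gain = total_gain(profile)
smooth_gain = total_gain(boxcar_smooth(profile, spacing_m=10.0))
print(raw_gain, smooth_gain)
```

In this example the raw gain is inflated by the spike, while the smoothed profile climbs monotonically and recovers the true 100 m of gain.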

The mileage derived from a GPS track can vary quite a bit depending on the resolution of the GPS data. Higher resolution increases the mileage, because small wiggles get counted in. This has a big effect on the energy calculation, because the energy is mostly sensitive to mileage, not gain. For races that were advertised as 5k or half-marathon races, I have therefore used the advertised distance, as shown in Table 1, in order to calculate the first-order estimate of the energy, but have used the elevation gain and CF value derived from the actual GNSS data.
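The dependence of measured mileage on track resolution is easy to reproduce. The sketch below uses a hypothetical zigzag track in planar coordinates (meters); real GPS tracks would first need to be projected from latitude and longitude.

```python
import math

def path_length(points):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical track: 1 km heading east, wiggling 5 m sideways every 10 m.
track = [(10.0 * k, 5.0 if k % 2 else 0.0) for k in range(101)]

fine = path_length(track)          # every recorded point
coarse = path_length(track[::10])  # every tenth point only
print(fine, coarse)
```

The coarse track measures the straight-line kilometer, while the fine track is roughly 12% longer because every wiggle is counted.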

Appendix 4: Statistical analysis

In section 3, test (a) probes an effect small enough that visual inspection of the scatter plots is not a satisfactory way of testing hypotheses. Specifically, we want to know whether the apparent systematic error in the model $C_g$ is statistically consistent with zero.

We do not know a priori the underlying probability distribution of the ratio of times or of its logarithm $\mathcal{E}$. One might have expected based on previous work [5] that the times would be log-normal, in which case $\mathcal{E}$ would be normally distributed. However, a Q-Q plot shows that this is not the case for the present data set, and in fact the distribution of $\mathcal{E}$ is asymmetric. The ratio of times, however, has a symmetric and leptokurtic distribution. Its symmetry allows the use of the one-sample Wilcoxon test. For $C_g$ the null hypothesis is rejected with $p = 4 \times 10^{-6}$, while for $C_t$, $p = 0.07$. Thus the defect in $C_g$ is significant, while any such evidence against $C_t$ is statistically marginal.

[...] distances, with the total energy expenditure being close to that of a flat half-marathon road race. But the endurance required for these races does vary, and this makes it desirable to have some rough method of compensating for the variation in pace with distance. Here I describe a very simple model that has the following characteristics that make it suitable for this study: (1) its two parameters are universal rather than fits to the characteristics of an individual; (2) its dependence on the parameters is purely multiplicative, i.e., varying the parameters only rescales the axes on a graph of speed versus distance. The model is essentially a simplification of the one constructed by Rapoport, [9] with modifications to suit these purposes.

First we compute an equivalent distance $d$, which is the distance of flat running that would require the same energy expenditure as the actual run. If the runner's time is $t$, then $v = d/t$ has dimensions of speed, but is in fact a measure of energy per unit time, or power. We then have [...] where $P$ is the power and $\epsilon$ is a measure of the runner's efficiency. For example, a recreational runner with a slight roll of belly fat will have a lower value of $\epsilon$ because of [...] where the proportionality constant $A$ is another per-individual parameter that it will be possible to normalize away later. This expression's linearity in $f$ is an approximation to results from real-world data that provide evidence for slightly nonlinear behavior. [9]

The runner's supply of carbohydrates $c$ is limited by the amount of glycogen that can be stored in the liver and the leg muscles. If $f$ is chosen optimally, then there will be some distance $d_c = c\epsilon$ that can be run with pure carbohydrate fuel, while longer distances will require $f < 1$. Thus, [...]

Under these assumptions, the runner's speed will be the same in races at all distances less than $d_c$, which is unrealistic. We will first work out the consequences of Eq 7-9 and then introduce a simple elaboration that more realistically reproduces the effects of fatigue.

Solving Eq 7-9 and expressing $\kappa$ as a correction factor relative to the short-distance maximum speed $v_m = A\epsilon$, we find [...] This depends on the universal parameter $\beta \approx 0.4$ and also on the critical distance $d_c$. The latter is a measure of endurance and does depend on individual factors such as body composition and training, as well as on strategies such as carbohydrate loading. However, for the sample of recreational athletes studied here, I hypothesize that one can fix a universal value of $d_c$ lying somewhere around the half-marathon distance, and find a reasonable description of real-world data.

It is not true in reality that runners can maintain the same pace at any of the distances below $d_c$, for which glycogen suffices. As the distance increases from 5 km to the half-marathon distance of 21 km, one observes a decrease in speed which, as originally observed by Hill, [6] appears linear on a graph of speed versus the logarithm of distance. In the men's and women's world-record times, this decrease is about 5%. The graph then shows a knee, like the one described by Eq 10. The more gradual decrease for distances before the knee is generically described as being due to fatigue, which is a complicated and poorly understood phenomenon involving a variety of factors, many of which are mediated by the central nervous system rather than by any change at the chemical or tissue level. As an ad hoc correction, we multiply the result of Eq 10 by a factor controlled by a small parameter $Q$: [...] The factor of 3 is introduced so that $Q$ is approximately equal to the reduction in speed between a 5k and a half-marathon, and we set $Q = 0.05$.

Fig 4. Relative speed versus equivalent distance $d$. All speeds are normalized relative to the speed at half-marathon distance. The black curve is the function defined by Eq 11, with $d_c$ set to a half-marathon distance. The red curve is a fit to world-record times. [2] The green and red violin plots show the distribution of speeds in races S and G relative to the same runners in half-marathon race P (sample sizes 1303 and 11, respectively). The gray dots are the author's personal-record times from a variety of courses. The equivalent distances were determined from the horizontal distances using the curvilinear function $C_t(i)$ in Eq 5, which is based on treadmill data.

Empirically, for the mostly recreational runners studied here, a reasonable description of the data is achieved when $d_c$ is set to the half-marathon distance, which is the value adopted in this work. Fig 4 shows that setting $d_c$ to half-marathon distance gives a good fit to some real-world data.