## Abstract

Previous laboratory studies have measured the energetic costs to humans of running at uphill and downhill slopes on a treadmill. This work investigates the extension of those results to the prediction of relative performance of athletes running on flat, hilly, or very mountainous outdoor courses. Publicly available race results in the Los Angeles area provided a set of 109,000 times, with 2200 runners participating in more than one race, so that their times could be compared under different conditions. I compare with the results of a traditional model in which the only parameters considered are total distance and elevation gain. Both the treadmill-based model and the gain-based model have some shortcomings, leading to the creation of a hybrid model that combines the best features of each.

**Author summary** Running a race on a road allows absolute measures of performance. Trail running, however, has traditionally been thought of as a sport in which the only valid comparison is between different runners competing on the same course on the same day. Even the exact measurement of distance is considered to be unimportant, since courses and conditions vary so much.

An extreme example is the relatively new genre of “vertical” races, in which runners race up a mountain. In a typical example, the competitors cover a horizontal distance of 5 km, while climbing about 1000 m. The winner in one such race had a time almost triple that expected for a state-champion high school runner in a 5k road race. Clearly no comparison can be made here without taking into account the amount of climbing.

In noncompetitive contexts, many runners venture onto mountain trails, lightly dressed and with little equipment, so that it becomes important to be able to anticipate whether they will have the endurance needed to be able to safely complete a planned route. Again, this is impossible without some model of the effect of hill climbing.

## 1 Introduction

This paper presents a method for predicting relative performance on trail runs — “relative” meaning that we can predict the time for course A divided by the time for course B.

Traditionally, runners and hikers have described a trail using two numbers, the horizontal distance and the total elevation gain. For example, if the route is an out-and-back voyage consisting of steady climbing to a peak and a return, then the total elevation gain is simply the elevation of the peak minus the elevation of the trailhead. If the elevation profile of the trip consists of multiple clearly defined ascents and descents, then one adds up the ascents. Although this two-parameter description of the route is easy to derive from a paper topographic map, knowledge of the two numbers is not sufficient to make a very useful estimate of the total energy expenditure.

It has been known for a long time among the officials who measure road races that the effect of elevation change has a nonlinear dependence on the grade. The following argument was advocated by R. Baumel. [1] Consider a closed course whose elevation profile is described by some function *y*(*x*). The derivative *y*′ is the trail’s slope *i*. The total energy expenditure is an integrated effect of the slope, of the form , where *C* is a function that describes the energetic cost of running up or down a hill. We will see that *C* has been measured in laboratory experiments, but for the moment we assume only that *C* is a smooth function, so that for small slopes it can be well approximated by the first few terms of its Taylor series, *C*(*i*) *≈ c*_{0} + *c*_{1}*i* + *c*_{2}*i*^{2}. Then for any closed loop over a distance *L*, the contribution from the *c*_{1} term vanishes, and the energy cost is . The dependence on the slope is therefore quadratic rather than linear. For example, if we were to exaggerate the elevation profile by a factor of 2, *y→* 2*y*, then the size of the *c*_{2} term would go up by a factor of *four*, not two (in the low-slope limit, on a closed course).

From conversations with runners and hikers, I have found that the result of Baumel’s argument almost always elicits total disbelief, especially when presented as a numerical example showing the extreme smallness of the slope effect when the slope is small. One of the goals of this paper is to test this empirically. As an alternative hypothesis, it is commonly believed that one can get a good measure of the relative energy cost by taking the horizontal distance and adding in a term proportional to the total elevation gain. If the total gain is determined down to a fine enough scale (which with modern technology has become more practical), then this hypothesis is equivalent to the assumption that the cost of running is given by a function of the form
whose graph is shaped like a hockey stick (dashed line in Fig 1). Popularly proposed rules are that 100 m of elevation gain is equivalent to either 400 m or 800 m of horizontal distance, so that *c*_{g} is said to be approximately in the range from 4 to 8. There is nothing mathematically impossible about this hypothesis. A function *C*(*i*) of this form evades Baumel’s argument because its hockey-stick shape is not smooth at *i* = 0, and therefore cannot be approximated by its Taylor series.

In a more sophisticated approach, Minetti *et al*. [2] have used oxygen consumption to measure the energy expenditure of runners on a treadmill at slope *i*, for both running and walking. The results are expressed as *C* = (1*/m*)*dE/ds*, where *m* is the person’s body mass, *E* is the energy expended, and *ds* is the increment of three-dimensional distance, which usually differs negligibly from the increment of horizontal distance *dℓ. C* has units of J*/*kg· m. The correctness of the factor of 1*/m* has empirical support. [3]

Efficiency varies by *∼*25% even among elite athletes, [2] [4] and differences are also to be expected between elite and recreational athletes. This is one of the reasons why this study presents a comparative technique, rather than an absolute method for determining a particular runner’s actual energy expenditure in units of kilocalories.

The function *C*(*i*), shown as the solid line in Fig 1, resembles a hyperbola, with a minimum occurring at *i≈* −0.1 to − 0.2. The asymptotes at large positive and negative values of *i* are interpreted in [2] as being determined by the efficiency of eccentric and concentric muscle contraction. For the purposes of this work, a new analytic approximation to the curve found by ref. [2] is used (Appendix 1), and is referred to as *C*_{t}, where “t” stands for “treadmill.”

Nearly all real-world walking and running is done at− 0.2 ≲ *i* ≲ 0.2, where the graph of *C*(*i*) is nearly parabolic.

## 2 Methods

### 2.1 Analysis of publicly available race data

To test these models, I use publicly available race results from the Los Angeles area. This area has a large population and tall mountains. The large population makes it possible to pick out a significant number of runners who have competed in several different races. If the ratio of the runner’s time on courses 1 and 2 is *t*_{2}*/t*_{1}, then we take this as a measure of the ratio *E*_{2}*/E*_{1} of the energy expenditure, which can be compared with the model. It was possible to find courses with a variety of elevation profiles, allowing a test of the dependence of the predictions on the amount of hill climbing.

Table 1 lists the races used as sources of data. One-letter mnemonics are defined so that courses can be referred to succinctly in the text. Because a runner’s performance can change over time due to training and aging, the time period of the study was restricted as much as possible to January 2017 through March 2020 (before the COVID epidemic ended races other than virtual ones in California). Distance and elevation data were analyzed as described in Appendix 3.

Runners’ names and times were obtained by web-scraping public race results, and runners were assumed to be the same person if their first and last names matched. When a runner ran the same race more than once, their best time was used. To avoid biases in comparisons of times in different races, it is necessary here to define an upper limit on the times that will be used from a given race, and to do so in some consistent and unbiased way. Some such limit is in any case defined by race organizers, but is different for different races and usually quite long, often about 4-5 hours for a half-marathon. Competitors who clock the longer times are generally either walking the entire race or alternating between walking and jogging, and especially in more casual races may be pushing a stroller, running alongside their tween-age child, or staying in a costumed group for fun and emotional support. Because the physiological data and models used in this work are not applicable to walking, I impose a somewhat arbitrary time limit of 2.5 hours on half-marathon times. These limits, as well as others, where imposed, are described in the notes in Table 1. For course S, the time limit was derived by scaling down the half-marathon time limit in proportion to the distance. The other courses in this study are of a qualitatively different character, so for them I simply used the race organizers’ cut-off. The resulting bias is an inherent limitation of this work.

Exertion depends most strongly on distance, and the goal of this work is to tease out effects from other factors, which are often weaker. For this reason, distance is a confounding variable in this study and has been controlled for as much as possible by using races at a consistent distance, the half marathon (21.1 km), or distances that, taking extreme climbing into account, result in similar times. These are the distances at which the largest sample sizes are available for mountain trail races. Section 2.2 describes how the remaining inevitable variations in distance have been taken into account, as much as possible.

In table 1, two measures of hilliness are given. The total elevation gain is the only parameter needed in order to calculate an energy expenditure using the function *C*_{g}. The next column gives a statistic I will refer to as the climb factor, CF, which is defined as the fraction of the runner’s total energy expenditure that is devoted to climbing. That is, if *E* is the actual energy required for the course, and *E*_{0} the energy that would have been required if the race had been perfectly flat, then

Inverting this equation gives *E* = *E*_{0}*/*(1− *CF*), so that if the horizontal distance is known, a measure of effort can be found by dividing the distance by 1− *CF*.

To define quantitative tests of the models, consider a comparison of courses 1 and 2. The observed data are the runner’s times *t*_{1} and *t*_{2}, and the model predicts the ratio of the energy consumption *E*_{1}*/E*_{2}. Define

For small errors, *ℰ* is approximately the relative error in the prediction, expressed as a percentage. The use of the logarithm transforms multiplicative sources of error into additive quantities.

We pick a feature of the model that is to be tested. For example, we would like to see whether the model does a good job of predicting the relative times for flat races compared to steep uphill-only races (Fig 3, c). For this example, we make a list of courses that are relatively flat (P, C, H, and I), and a list of some that are steep uphill-only courses (B and V). We then find every case where the same runner did a run *j* from the first list and a run *k* from the second, and compute the error *ℰ*_{jk}, which will be positive if the runner’s time in the uphill race *k* is overpredicted by the model relative to their time in the flat race *j*.

### 2.2 Model of endurance

Animals run more slowly at long distances, and mathematical modeling of this fact dates back about a century. [6] Recent workers have described methods for fitting parameters to the data for individual runners, [7] [8] which for example allows a first-time marathon runner to estimate an appropriate pace. In the present work, there are not enough data available to allow this kind of individualized description of the runners. For this reason, I have concentrated on data from a narrow range of middle distances, with the total energy expenditure being close to that of a flat half-marathon road race. But the endurance required for these races does vary, and this makes it desirable to have some rough method of compensating for the variation in pace with distance. Here I describe a very simple model that has the following characteristics that make it suitable for this study: (1) its two parameters are universal rather than fits to the characteristics of an individual; (2) its dependence on the parameters is purely multiplicative, i.e., varying the parameters only rescales the axes on a graph of speed versus distance. The model is essentially a simplification of the one constructed by Rapoport, [7] with modifications to suit these purposes.

First we compute an equivalent distance *d*, which is the distance of flat running that would require the same energy expenditure as the actual run. If the runner’s time is *t*, then *v* = *d/t* has dimensions of speed, but is in fact a measure of energy per unit time, or power. We then have
where *P* is the power and *ϵ* is a measure of the runner’s efficiency. For example, a recreational runner with a slight roll of belly fat will have a lower value of *ϵ* because of the increased energetic cost of transporting the additional body weight. Although it would seem that we are now introducing an individualized parameter *ϵ*, the model is designed so that at the end of the calculation, cancellations occur that allow *κ* to be predicted on a universal basis.

The power *P* depends on aerobic fitness and on the proportions of fat and carbohydrates being burned in aerobic metabolism. Fat burning is slower than carbohydrate burning by a factor *β≈* 0.4. [7] If we let *f* be the fraction of energy production from carbohydrates, then
where the proportionality constant *A* is another per-individual parameter that it will be possible to normalize away later. This expression’s linearity in *f* is an approximation to results from real-world data that provide evidence for slightly nonlinear behavior. [7]

The runner’s supply of carbohydrates *c* is limited by the amount of glycogen that can be stored in the liver and the leg muscles. If *f* is chosen optimally, then there will be some distance *d*_{c} = *cϵ* that can be run with pure carbohydrate fuel, while longer distances will require *f <* 1. Thus,

Under these assumptions, the runner’s speed will be the same in races at all distances less than *d*_{c}, which is unrealistic. We will first work out the consequences of Eq 4-6 and the introduce a simple elaboration that more realistically reproduces the effects of fatigue.

Solving Eq 4-6 and expressing *κ* as a correction factor relative to the short-distance maximum speed *v*_{m} = *Aϵ*, we find

This depends on the universal parameter *β≈* 0.4 and also on the critical distance *d*_{c}. The latter is a measure of endurance and does depend on individual factors such as body composition and training, as well as on strategies such as carbohydrate loading. However, for the sample of recreational athletes studied here, I hypothesize that one can fix a universal value of *d*_{c} lying somewhere around the half-marathon distance, and find a reasonable description of real-world data.

It is not true in reality that runners can maintain the same pace at any of the distances below *d*_{c}, for which glycogen suffices. As the distance increases from 5 km to the half-marathon distace of 21 km, one observes a decrease in speed which, as originally observed by Hill, [6] appears linear on a graph of speed versus the logarithm of distance. In the men’s and women’s world-record times, this decrease is about 5%. The graph then shows a knee, like the one described by Eq 7. The more gradual decrease for distances before the knee is generically described as being due to fatigue, which is a complicated and poorly understood phenomenon involving a variety of factors, many of which are mediated by the central nervous system rather than by any change at the chemical or tissue level. As an *ad hoc* correction, we multiply the result of Eq 7 by a factor controlled by a small parameter *Q*:

The factor of 3 is introuced so that *Q* is approximately equal to the reduction in speed between a 5k and a half-marathon, and we set *Q* = 0.05.

Empirically, for the mostly recreational runners studied here, a reasonable description of the data is achieved when *d*_{c} is set to the half-marathon distance, which is the value adopted in this work. Fig 2 shows that setting *d*_{c} to half-marathon distance gives a good fit to some real-world data.

Although the fit in Fig 2 is rather good, much higher values of *d*_{c} might be more appropriate for higher-level endurance runners. For example, Eliud Kipchoge’s personal-record speed is only 8% lower in the marathon than in the 1500 m. This can only be reproduced in this model if *d*_{c} is roughly marathon distance for him. For high-level running competitions, Gardner and Purdy give a method of comparing with a standard performance curve, based on a compilation of world records. [9] The red curve in Fig 2 is an approximation to the locus of world-record times (Appendix 2).

Although fitting parameters to individual runners’ characteristics is not the main purpose of this work, doing so is very easy with the model 8, due to its purely multiplicative structure. When data are viewed in the format used in Fig 2, as a log-log plot of speed versus distance, the standard curve *κ*(*d*) is simply slid around horizontally and vertically to match the data, which has the effect of determining the runner’s *v*_{m} and *d*_{c}.

## 3 Results

Fig 3 shows a comparison of the quality of the predictions of the functions *C*_{g} and *C*_{t} as descriptions of the effects of going up and down hills. The third model *C*_{r} is a hybrid of these, introduced in section 4.2. All predictions were corrected for distance as described in section 2.2. Each of the four sub-figures a through d has been constructed so as to test a particular feature of these models. We discuss each in turn.

Here we compare the extremely flat half-marathon I, having only 90 m of elevation gain, to half-marathon P, which is slightly more hilly with 170 m of gain, or about twice as much. According to the treadmill-based model

*C*_{t}, the effects of climbing and descending nearly cancel out, giving a negligible*CF <*1% for each run, as expected from Baumel’s argument. In the gain-based model*C*_{g}, however, the effect of the hills on course P is 6 times its elevation gain, which is equivalent to adding 1.0 km to its length. The effect for I would be half as much, causing the model to predict a considerable difference in the times on the two courses. In the figure we see that Baumel’s approximation is a good one here. The median error for*C*_{t}(open circles) is only 1.7%, while that for*C*_{g}(filled circles) is +6.4%, the positive sign showing that the effect of the small hills is over-predicted.Of the four tests a-d, this is the only one where the effect being probed is small enough to require statistical analysis rather than simple visual inspection. Such an analysis (Appendix 4) show that systematic error in

*C*_{g}is significant (*p*= 3*×*10^{−6}), while any such evidence against*C*_{t}is statistically marginal.In this portion of Fig 3, we compare times in a set of fairly flat half-marathons (with CF values ranging from 0.2% to 8%) with course W, a trail race up and down half of Mount Wilson,

*CF*= 20%. Although the distance of W is much shorter, most runners’ times are only slightly lower. We see that both models greatly underpredict the runners’ times in the mountain race. Most of the running in this race is on slopes with |*i*|*≈*0.10 to 0.15. A likely interpretation is that on the uphills,*C*_{g}is an underestimate (see c, below), while on the downhills*C*_{t}is an underestimate. The race is run on a trail that is mostly a narrow single track, with steep hillsides on the climber’s right. Safety is likely to inhibit many runners from going downhill at anything like the pace that would be possible for the elite mountain runners in ref. [2] on a treadmill, and trail etiquette dictates that they yield the right of way when encountering people who are still on their way up.This test compares runners’ times on the same flattish half-marathons with their performances in two races, B and V, in which runners go up a mountain and finish at the top. Of the sample size of

*n*= 32, only one person was on course V, a “vertical kilometer”-style race which was run in Northern California. Although both models systematically underestimated the difficulty of the uphill races (*ℰ <*0), the underestimate is far more severe for*C*_{g}than for*C*_{t}. Course B consists almost entirely of climbing on grades 0.05*< i <*0.25, at which*C*_{g}is less than*C*_{t}and is apparently a considerable underestimate. Additional factors leading to*ℰ<*0 in both cases are certain to include the aerobic challenge of finishing the race at an elevation of over 3000 m, as well as the difficult footing on the final section.Here we compare the same set of fairly flat half-marathons with a road half-marathon, course X, which consists entirely of running

*down*a mountain. The downhill race is run along with a marathon, which the race’s organizers advertise as being extremely fast and a good way to achieve a “BQ” or qualifying time for the Boston Marathon. Surprisingly, most runners’ times in the downhill half-marathon were only about 10 minutes shorter than in the flat ones, and quite a few runners in the sample actually took longer for the downhill race. We have already seen evidence in part b above that the high physiological efficiencies observed in the lab for downhill running may not translate proportionately into speed. In W, the issues may have been safety and courtesy, which would not have been relevant in this race on a wide asphalt road. A more likely explanation here is that for recreational runners who have not trained extensively on steep hills, a long downhill of this length can be physically difficult due to the strong eccentric strain on the quadriceps, as well as testing the tensor fascia latae.

## 4 Discussion

### 4.1 Interpretation of results

Fig 4 presents a graphical summary of the interpretation of these results.

The observations mainly support the model *C*_{t}, except that as downhill grades get steeper and steeper, it appears that most recreational runners in real-world conditions reach a point of diminishing returns far earlier than is the case in treadmill studies. The results for course X, which has an average *i ≈* −0.05, suggest that this point of diminishing returns is reached at relatively small negative slopes, perhaps *i ≈* −0.03. The factors causing this effect probably have as much to do with safety and etiquette as with physiology, so that they cannot be quantified in any universal way. However, it would be irresponsible to provide runners, especially recreational athletes, with scientific advice that would give an unrealistically rosy picture of the difficulty of a run.

### 4.2 Hybrid model

I have therefore investigated some possible modifications to the function *C*_{t}. A modification that did not work well was to adjust the values of the parameters *p* and *d* given in Table 2 so as to shift the minimum of the function up and to the right. This was unsuccessful, because the smooth analytic character of Eq (10) makes it impossible, by varying its parameters, to dramatically modify the function’s behavior for −0.06 ≲ *i* ≲ −0.03 while retaining its apparently correct behavior at −0.03 ≲ *i* ≲ 0. A more successful ad hoc recipe was simply to introduce a cut-off in *C*, i.e., to define a “recreational” version of the function,
where *i*_{0} = − 0.03. In other words, we simply chop the bottom off of the curve of *C*_{t}, at the dotted line in Fig 4.

The results of the hybrid model *C*_{r} are shown as gray circles in Fig 3 and are in general fairly good. The main remaining inaccuracy is the underprediction of the effect of steep hills in test c. It would be tempting to modify the function *C* to give it an even more severe upward curve for *i*≳ 0.3. This does not seem warranted by the present data, since other factors may be at work, including altitude and rough footing.

## 5 Conclusions

This paper presents a model that can be used to predict the time of a runner on a course, given their time on some other course. This model is a hybrid of two others that have been previously proposed. Testing against a sample of times clocked by mostly recreational athletes shows that the model usually gives predict corrections to within about 10-20%, even when the route includes extreme climbing or descents. It would be desirable for future work to refine this work by controlling better for factors such as altitude, training, and the quality of footing on mountain trails.

## Data availability statement

The compilation of race times was derived from public sources and is itself publicly available at https://github.com/bcrowell/trail. The code used to analyze the data is contained in that repository and in https://github.com/bcrowell/kcals. All code is under a GPL license. I have also used Zenodo to assign a DOI to the data: 10.5281/zenodo.4661554.

## Acknowledgements

The author thanks Wendy Yen for helpful conversations.

## Appendix 1. Analytic approximation to the treadmill function *C*_{t}

It is convenient to describe the function *C*(*i*) using a fit to the form
where the subscript *t* stands for treadmill. Parameters fitted to the results of ref. [2] are given in Table 2. The purpose of using this form, rather than the polynomial fit given by [2], is to make the computations degrade gracefully in cases where the limitations of GPS tracks or data from digital elevation models produce unrealistically steep slopes. In such cases, this expression approaches the physiologically expected asymptotic behavior. Although the present work focuses only on running, parameters for walking are presented as well. The results for running are empirically found to be nearly independent of speed, whereas the ones for walking are not. For walking, ref. [2] measured the energy consumption at the speed that was found to be most efficient for that particular subject.

## Appendix 2. Analytic approximation to world-record speeds

Cameron [10] has given a convenient closed-form approximation to world-record speeds of runners at various distances,

This is shown as the red curve in figure 2. The parameters are given in Table 3

## Appendix 3: Analysis of elevation data

Digital maps projected into a horizontal plane were obtained from the race organizers’ web site or in some cases by tracing roads and trails in a Google Maps application.

Elevation data were obtained from publicly available digital elevation models (SRTM1) having a horizontal resolution of 30 meters. (Elevation data from handheld GPS/GNSS units are more difficult to obtain from public sources, and are in any case of questionable reliability for this purpose, since the uncertainty can be very large when all satellites are near the horizon or when the terrain is rough, causing radio echoes from the walls of canyons.)

The use of these data is inherently subject to certain errors, which need to be minimized. Trails and roads are intentionally constructed so as not to go up and down steep hills, but the DEM may not accurately reflect this. The most common situation seems to be one in which a trail or road takes a detour into a narrow gully in order to maintain a steady grade. If the gully is narrower than the horizontal resolution of the DEM, then the DEM doesn’t know about the the gully, and the detour appears to be a steep excursion up and then back down the prevailing slope.

Empirically, I have found that sensitivity to these effects can be minimized if the elevation profile of the run *y*(*x*) is filtered by convolving it with a rectangular windowing function having width *w* = 200 meters. This tends to eliminate unrealistic glitches in the elevation data, and also seems to give a fairly close reproduction of race organizers’ estimates of total elevation gain. This choice of *w* gives sane results for routes in mountainous terrain, and is used throughout this work, even for flat courses on city streets. For a course that is relatively flat and has many small, short hills, *w ≈* 60 m gives more accurate results, but I have used the larger value of *w* throughout this work in an effort to maintain consistency.

The mileage derived from a GPS track can vary quite a bit depending on the resolution of the GPS data. Higher resolution increases the mileage, because small wiggles get counted in. This has a big effect on the energy calculation, because the energy is mostly sensitive to mileage, not gain. For races that were advertised as 5k or half-marathon races, I have therefore used the advertised distance, as shown in Table 1, in order to calculate the first-order estimate of the energy, but have used the elevation gain and CF value derived from the actual GNSS data.

## Appendix 4: Statistical analysis

In section 3, test (a) probes an effect small enough that visual inspection of the scatter plots is not a satisfactory way of testing hypotheses. Specifically, we want to know whether the apparent systematic error in the model *C*_{g} is statistically consistent with zero.

We do not know a priori the underlying probaility distribution of the ratio of times or of its logarithm *ℰ*. One might have expected based on previous work [5] that the times would be log-normal, in which case *ℰ* would be normally distributed. However, a Q-Q plot shows that this is not the case for the present data-set, and in fact the distribution of *ℰ* is asymmetric. The ratio of times, however, has a symmetric and leptokurtic distribution. Its symmetry allows the use of the one-sample Wilcoxon test. For *C*_{g} the null hypothesis is rejected with *p* = 4 *×* 10^{−6}, while for *C*_{t}, *p* = 0.07. Thus the defect in *C*_{g} is significant, while any such evidence against *C*_{t} is statistically marginal.