How much data do we need? Reliability and data deficiency in global vertebrate biodiversity trends

Global biodiversity is facing a crisis, which must be solved through effective policies and on-the-ground conservation. But governments, NGOs, and scientists need reliable indicators to guide research, conservation actions, and policy decisions. Developing reliable indicators is challenging because the data underlying those tools is incomplete and biased. For example, the Living Planet Index tracks the changing status of global vertebrate biodiversity, but taxonomic, geographic and temporal gaps and biases are present in the aggregated data used to calculate trends. But without a basis for real-world comparison, there is no way to directly assess an indicator’s accuracy or reliability. Instead, a modelling approach can be used. We developed a model of trend reliability, using simulated datasets as stand-ins for the "real world", degraded samples as stand-ins for indicator datasets (e.g. the Living Planet Database), and a distance measure to quantify reliability by comparing sampled to unsampled trends. The model revealed that the proportion of species represented in the database is not always indicative of trend reliability. Important factors are the number and length of time series, as well as their mean growth rates and variance in their growth rates, both within and between time series. We found that many trends in the Living Planet Index need more data to be considered reliable, particularly trends across the global south. In general, bird trends are the most reliable, while reptile and amphibian trends are most in need of additional data. We simulated three different solutions for reducing data deficiency, and found that collating existing data (where available) is the most efficient way to improve trend reliability, and that revisiting previously-studied populations is a quick and efficient way to improve trend reliability until new long-term studies can be completed and made available.

This means a lack of standardization in study design (individual population time series are standardized, but there is no standardization between populations), monitoring strategy, frequency of assessment, monitoring intensity and effort, and even data type (densities, counts of individuals, breeding pairs, or even nests, and population size estimates are mixed together). The LPI has taxonomic and geographical imbalances (Collen et al.).

Two challenges presented by the LPI require a different approach than that taken for the sRLI. First, LPI trends are based on population time series that are often short and/or infrequently measured, and there are no regional or taxonomic groups within the LPI where the data is comprehensive enough to be certain of the real-world trend. Therefore, comparing sampled trends to LPI trends would tell us little about how the sampled trends might compare to reality. Second, the LPI uses non-linear trends that change slope and direction over time, so trends should be compared in a way that reflects this. Here, we use a modelling approach to overcome these challenges, based on thousands of datasets of synthetic population time series, with variations in the underlying properties of the data to represent regional taxonomic groups in the real world, and sampling from those datasets. We degraded the samples by randomly removing observations and adding observation error to resemble regional taxonomic groups in the Living Planet Database (LPD, the database underlying the LPI). We then compared the trends calculated from the samples with those from the complete datasets using a distance metric (Dove et al., 2022).

We first created simulated datasets to represent "real-world" regional vertebrate groups for which the LPI calculates biodiversity trends.
The LPI is often represented as a single global index trend, but can also be disaggregated into hierarchical groups: first into systems (terrestrial, marine, freshwater), then geographical realms within each system, and finally taxonomic groups within each realm. It is this lowest level of the hierarchy, the regional taxonomic groups, which we simulated. From here on, simulated regional taxonomic groups will be referred to as datasets. The base units of the LPI, and of our synthetic datasets, are population time series, which we will refer to simply as populations. These populations are grouped into species, and species are grouped into datasets.

Our procedure to simulate a dataset requires six parameters: 1) the total number of populations to simulate (set to 10,000), 2) the mean number of populations assigned to each species (set to 10), 3) the number of years (length of trend) to simulate (set to 50), 4) the mean of the population mean growth rates (μds), 5) the standard deviation of the population mean growth rates (variation among populations, σds), and 6) the mean of the population standard deviations of the growth rate (process error, μɳ). The first three parameters were fixed. The first, total populations, affects trend accuracy only when more than half of all populations in a dataset are sampled (see Fig. S1), a situation that is unlikely for regional taxonomic groups in the LPD, as it is rare even at the species level (see taxonomic representativeness in McRae et al., 2017). The second parameter, the mean number of populations per species, has no effect on trend accuracy within the wide range of values we tested (see Fig. S2). The third, trend length, is constant across regional taxonomic groups in the LPD. However, it does affect trend accuracy (see Fig. S3), and would therefore need to be set appropriately if adapting the model for a different indicator.
Parameters four through six are variable in the LPD and affect trend accuracy, and were therefore set to vary in the simulations.

We model population time series using the stochastic exponential model with process error:

Nt+1 = Nt (1 + rt)

where Nt is population size at year t, 1 + rt is the annual growth rate at year t, and rt ~ N(μpop, σpop²). We represent process error (which could be caused by, for example, uncorrelated environmental variation) by sampling each annual growth rate from a normal distribution. Process error, ɳ, is represented by the mean of the population standard deviations of r, with σpop ~ Exp(1/ɳ).

The mean of the normal distribution of population growth rates is itself drawn from a normal distribution, μpop ~ N(μspec, σspec²). Thus, populations from a species will have similar but not identical underlying mean population growth rates, representing perhaps differences in environmental conditions between geographically isolated populations of a given species. In turn, similar species are grouped together into datasets, and we assume that species within taxonomic groups have underlying population growth rates that are drawn from an identical distribution, μspec ~ N(μds, σds²). Here, larger values for σds lead to a broader range of underlying species growth rates, perhaps signifying broader species-specific variation in responses to drivers such as habitat change within a taxonomic group. Using this hierarchical approach therefore captures the similarity of time series within a species, and the similarity of time series between species within a taxonomic group.

Growth for each population was modelled for 50 years, starting at a population size of 100. Populations were assigned to species by randomly sampling from a pool of 500 species IDs, with replacement, resulting in a normal distribution of populations per species, pps ~ N(μpps, σpps²), with μpps = 20 and σpps = 4.5.
While populations are unlikely to be normally distributed across species in the real world (one would expect more rare species than common species), simulations confirmed that our modelling approach is robust against distributional assumptions for this parameter (see Fig. S2).
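The hierarchical simulation described above can be sketched in Python. This is a minimal sketch under the stated assumptions, not the code used in the study; the function name `simulate_dataset` and the parameter `sigma_spec` (the spread of population means within a species) are our own labels.

```python
import numpy as np

def simulate_dataset(n_pops=10_000, n_species=500, n_years=50,
                     mu_ds=0.0, sigma_ds=0.05, sigma_spec=0.05,
                     mu_eta=0.05, seed=1):
    """Sketch of the hierarchical dataset simulation:
    mu_spec ~ N(mu_ds, sigma_ds^2), mu_pop ~ N(mu_spec, sigma_spec^2),
    sigma_pop ~ Exp(mean = mu_eta), N_{t+1} = N_t * (1 + r_t)."""
    rng = np.random.default_rng(seed)
    # Assign populations to species IDs by sampling with replacement
    species_ids = rng.integers(0, n_species, size=n_pops)
    # Species-level mean growth rates
    mu_spec = rng.normal(mu_ds, sigma_ds, size=n_species)
    # Population-level mean growth rates, centred on the species mean
    mu_pop = rng.normal(mu_spec[species_ids], sigma_spec)
    # Process error: each population's growth-rate SD, exponentially distributed
    sigma_pop = rng.exponential(mu_eta, size=n_pops)
    # Stochastic exponential growth from a starting size of 100
    N = np.empty((n_pops, n_years))
    N[:, 0] = 100.0
    for t in range(1, n_years):
        r = rng.normal(mu_pop, sigma_pop)  # r_t ~ N(mu_pop, sigma_pop^2)
        N[:, t] = N[:, t - 1] * (1.0 + r)
    return species_ids, N
```

For the growth-rate scales used here, draws of rt below −1 are vanishingly rare, so populations remain positive in practice.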

Observation error:
The variation in lambdas modelled above assumes all variation is due to process error. However, time series in the LPD are based on population estimates, which can be assumed to include some level of observation error due to e.g. species misidentification, non-detection, and counting errors. This observation error is not accounted for in the LPI, but may affect trend reliability. Observation error, ɛ, can be calculated using the coefficient of variation.

The LPI method imputes missing values using log-linear interpolation by

Ni = Np (Ns / Np)^((i − p) / (s − p))

where N is the population estimate, i is the year for which the value is to be interpolated, p is the preceding year with an observed value, and s is the subsequent year with an observed value. For all populations, whether interpolated or modelled by a GAM, species indices were formed by a three-step process. First, population sizes were converted to annual rates of change by

dt = log10(Nt / Nt−1)

where N is the population estimate and t is the year. Second, average growth rates were calculated for each species by

d̄t = (1/n) Σi=1..n di,t

where n is the number of populations in a given species, di,t is the growth rate for population i at year t, and d̄t is the average growth rate at year t. Growth rates were capped at [-1:1]. Finally, index values were calculated by

It = It−1 × 10^d̄t

where I is the index value and t is the year.

We selected an appropriate distance measure to compare sampled trends with 'true trends' using the process described in Dove et al. (2022). Of the distance measures deemed appropriate, we chose the Jaccard distance because it uses a 0–1 scale, making it easier to interpret. The Jaccard distance is calculated as

dJ = 1 − Σt=1..n min(xt, yt) / Σt=1..n max(xt, yt)

where x and y are the two trends being compared and n is the number of time points. From here on, any value calculated by applying the Jaccard distance to compare sampled versus 'true' trends will be referred to as a trend deviation value, or TDV.
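The chain-method steps and the Jaccard comparison above can be sketched as follows. This is a simplified stand-in that assumes complete annual data (no interpolation or GAM smoothing) and non-negative index values; it is not the code used for the published LPI, and the function names are ours.

```python
import numpy as np

def species_index(pops):
    """Index from a (populations x years) matrix of population sizes:
    log10 annual growth rates, capped at [-1, 1], averaged across
    populations, then chained from a base value of 1."""
    d = np.log10(pops[:, 1:] / pops[:, :-1])   # annual growth rates
    d = np.clip(d, -1.0, 1.0)                  # cap at [-1, 1]
    d_bar = d.mean(axis=0)                     # mean growth rate per year
    return np.concatenate([[1.0], 10.0 ** np.cumsum(d_bar)])

def jaccard_distance(x, y):
    """Generalised Jaccard distance between two non-negative trends;
    0 = identical, 1 = maximally different."""
    x, y = np.asarray(x), np.asarray(y)
    return 1.0 - np.minimum(x, y).sum() / np.maximum(x, y).sum()
```

Applying `jaccard_distance` to a sampled index and the corresponding complete-dataset index yields a trend deviation value.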
We use TDV here as a measure of trend accuracy, but it is in fact the complement of accuracy (a perfectly accurate trend would yield a TDV of zero); lower TDV means higher accuracy. Furthermore, when referring to TDVs of simulated trends, we use the term 'trend accuracy,' but when referring to TDVs of LPI trends, we use the term 'trend reliability.' This is because TDVs for simulated trends are measured, while TDVs for LPI trends are estimated based on a model. Trend reliability is thus a measure of expected accuracy based on underlying data sufficiency or deficiency, but should not be considered a proxy for accuracy. In other words, a data-deficient trend may be accurate, but we cannot rely on it to be so.

2.8. Generation of datasets:

We generated 3,000 datasets (each consisting of 1,000 species and 10,000 populations), with each dataset sampled 20 times, resulting in 60,000 samples. Values for mean time series length, μds, σds, and μɳ were randomly selected from uniform distributions, while sample size was randomly selected from a log-uniform distribution, ln(SS) ~ U(ln(a), ln(b)), where SS is sample size and a and b are the minimum and maximum values, respectively (log-uniform was chosen to ensure the model would be robust at small sample sizes, as most datasets in the LPD are small). Ranges for the distributions were chosen to ensure that parameter ranges in the samples would be broader than the ranges present in the LPD (Table 1). Regional taxonomic groups from the LPD with fewer than 20 populations were excluded from parameter range calculations to avoid extreme outliers. We set the minimum sample size to 50 because smaller samples rarely generate a complete trend, and the maximum to 10,000 to improve predictions of the effects of sample size increases.
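The log-uniform draw of sample sizes can be written as below (a sketch; the function name and seed handling are illustrative, not from the study's code):

```python
import numpy as np

def draw_sample_sizes(n, a=50, b=10_000, seed=0):
    """Draw n sample sizes with ln(SS) ~ U(ln(a), ln(b)), so that small
    sample sizes are as well represented as large ones."""
    rng = np.random.default_rng(seed)
    return np.exp(rng.uniform(np.log(a), np.log(b), size=n)).astype(int)
```

Because the draw is uniform on the log scale, the median sits far below the arithmetic midpoint of [a, b], concentrating training samples at the small sizes typical of the LPD.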
RRMSEP = RMSE / SD

where RMSE is the root mean squared error and SD is the standard deviation of the actual TDVs, and

RMSE = sqrt( Σi (yi − ŷi)² / n )

where yi is the ith actual TDV, ŷi is the predicted TDV, and n is the number of samples.

2.11. Maximum trend deviation value:

We set a maximum predicted TDV as a threshold that regional taxonomic group trends within the LPI should not exceed to be considered reliable. First, we built a linear regression model of the square root of TDV from our training datasets, with the natural log of sample size as the predictor variable, since sample size is the only user-controlled variable within the LPD. Every regional taxonomic group within the LPD represents a single sample from the real world; therefore, we were not interested in the mean TDV achieved by each dataset, but in the range of possible TDV values, especially the upper part of the range (the least accurate sample trends from each dataset).

We used 10,000 bootstrap estimations of the mean of the TDV from each dataset to calculate the 90% confidence intervals using the bias-corrected and accelerated bootstrap interval (BCa) method, also known as the adjusted bootstrap percentile method. The BCa method is a non-parametric method that does not assume the data is normally distributed (the TDV values have a beta distribution) and corrects for bias and skewness in the distribution of the mean estimates. We plotted the curve of the sqrt-log model of the upper 90% confidence interval of TDV in relation to sample size on a (non-log) graph of TDV versus sample size (Fig. 2).

Increasing sample size should naturally lead to a more desirable TDV, but it is costly in terms of time and money to increase sample size, and it may also be prudent to put the resources elsewhere. It is therefore important to choose a maximum TDV that reflects these trade-offs.
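These quantities can be computed as in the sketch below. For brevity the bootstrap here uses the plain percentile method rather than the BCa variant used in the study (BCa is available in, e.g., scipy.stats.bootstrap or R's boot package); the function names are ours.

```python
import numpy as np

def rrmsep(actual, predicted):
    """Relative RMSE of prediction: RMSE divided by the SD of the actual TDVs."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return rmse / np.std(actual)

def bootstrap_ci(tdvs, level=0.90, n_resamples=10_000, seed=0):
    """Bootstrap CI for the mean TDV (percentile method; the study uses the
    bias-corrected and accelerated, BCa, variant)."""
    tdvs = np.asarray(tdvs)
    rng = np.random.default_rng(seed)
    # Resample with replacement and take the mean of each resample
    idx = rng.integers(0, len(tdvs), size=(n_resamples, len(tdvs)))
    means = tdvs[idx].mean(axis=1)
    lo, hi = np.quantile(means, [(1 - level) / 2, 1 - (1 - level) / 2])
    return lo, hi
```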
To choose a maximum TDV, we used a method called the concordance probability method (CZ) (Liu, 2012). We adapted CZ from the field of biomedical research, where it is often necessary to specify a cut-off value to discriminate between positive and negative results from screening or diagnostic tests (Liu, 2012). First, a receiver operating characteristic (ROC) curve is built, plotting the rate of true positives (sensitivity) against the rate of false positives (1 − specificity). The idea is to find the point on the curve that maximises both sensitivity and specificity. The CZ method simply finds the point where their product is maximised.

By considering the sqrt-log model of the upper 90% confidence interval of TDV versus sample size (Fig. 2) as equivalent to an ROC curve, we applied the CZ method to find the point on the curve where TDV and sample size are minimised. This is the point where we should achieve maximum value from the data. Further right along the curve, increasing the sample size would give a smaller improvement in trend reliability and is therefore not cost- or resource-effective. Since an ROC curve is intended for binary classification, the CZ method assumes that both sensitivity and specificity are on a 0–1 scale. TDV already ranges from 0–1, so we set sensitivity as 1 − TDV. We normalised sample size to a 0–1 scale by converting it to a proportion of the complete dataset (dividing by the total number of time series in the dataset). Since all datasets were the same size, the relationship between TDV and sample size was not altered by the conversion to a proportion. Specificity was then 1 − sample size. The optimal cut-point on the curve is defined as

max(CZ), CZ(c) = Se(c) × Sp(c)   (12)

where Se is sensitivity, Sp is specificity, and c is any cut-point.
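The CZ cut-point selection reduces to a one-line maximisation once the curve is evaluated at candidate cut-points. A minimal sketch (illustrative values; `cz_cutpoint` is our own name):

```python
import numpy as np

def cz_cutpoint(tdv, sample_prop):
    """Concordance probability cut-point: with Se = 1 - TDV and
    Sp = 1 - sample proportion, return the index maximising Se * Sp."""
    se = 1.0 - np.asarray(tdv)          # sensitivity
    sp = 1.0 - np.asarray(sample_prop)  # specificity
    cz = se * sp
    best = int(np.argmax(cz))
    return best, float(cz[best])
```

Each candidate cut-point pairs a TDV on the sqrt-log curve with its normalised sample size; the maximum of CZ marks the point beyond which extra sampling buys little reliability.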

The vertical red line intersects the sqrt-log curve at the optimal cut-point.

2.12. Minimum sample size for regional taxonomic groups:

2.13. Assigning reliability ratings to regional taxonomic groups:

The actual number of populations in each regional taxonomic group was divided by the minimum sample size and multiplied by 100 to determine the percentage of the minimum sample size actually met by each group. Groups achieving 100% or greater were designated as reliable, those achieving between 50% and 100% were designated as data deficient, and those achieving less than 50% were designated as severely data deficient.

2.14. Correlations between reliability rating and LPI relative weighting:

(Tables 1 and 2). Together they describe 62% of the variation (adjusted r-squared: 0.62) in the TDV associated with sampled trends, with F(5, 29385) = 9,686, p < .001. All independent variables are statistically significant predictors, with p < 0.001. Interaction terms were also statistically significant but did not increase the adjusted r-squared of the model, so we left them out. RRMSEP is 0.231. Sample size is the most important variable affecting trend accuracy, with differences in importance between the other three variables comparatively small. Much of the unexplained variance from the model is due to random sampling. We confirmed this by remaking the model using the sample means, which resulted in an adjusted r-squared of 0.87. Using the square root of TDV instead of the log further increased the adjusted r-squared to 0.93. This was not the case for the model using the individual samples, where the log resulted in a higher adjusted r-squared than the square root.

The number of populations needed to achieve the TDV threshold for a reliable trend varies across taxonomic groups and realms (Table 3).

Reliability varies strongly across realms, taxonomic groups, and systems (Figs 3 & 4). Terrestrial trends are the most reliable and freshwater trends the least. Terrestrial and freshwater trends are more reliable in the global north than in the global south, except for terrestrial reptiles and amphibians. Marine bird trends are more reliable in temperate areas than the tropics, while marine fish trends are more reliable in tropical waters than polar. Globally, bird trends are the most reliable, but are nonetheless poor in the tropics, especially Africa. Reptile and amphibian trends are data deficient everywhere except the terrestrial Neotropical realm, and marine and freshwater mammal trends are data deficient everywhere (although marine Indo-Pacific mammals are very close to the threshold at 97%). The regional taxonomic groups with the greatest potential to affect the reliability of aggregated LPI trends are exclusively tropical (Fig. 5), due to a combination of high relative weighting and low reliability scores. The eight groups of greatest concern include five freshwater and three terrestrial groups, but no marine groups. All are from the tropics.

Fig. 3. Proportion of the total amount of time series data needed to achieve the trend reliability threshold that each regional taxonomic group in the LPD currently contains. A score of 100% or greater means that group already has enough data to produce a reliable trend. A white box refers to a group that meets the reliability threshold, while a coloured box means the threshold has not been met. The further the group is from meeting the threshold, the more intense the colour.
A grey box refers either to a group that could not be evaluated because there was too little data (South temperate marine reptiles) or due to an invalid realm-taxon combination (there are no marine reptiles in the Arctic).

Fig. 4. Sample size needed to achieve the TDV threshold. A check mark means that group has at least 100% of the minimum sample size and is considered reliable, a dash means it is data deficient (50–99%), and an X mark means it is severely data deficient (< 50%).

Acquiring accurate and comprehensive data is crucial, but the first step is to answer the question: what do we actually know? The present study quantifies the reliability of trends for each regional taxonomic group in the Living Planet Index and estimates the number of population time series needed to meet a standard of expected accuracy.

We used synthetic population time series datasets to construct a multiple regression model of trend accuracy by comparing trends of degraded samples with the trends of the full, undegraded datasets using a distance measure (Fig. 1). We applied the model to regional taxonomic groups in the Living Planet Database to reveal that the majority need additional data for their trends to be considered reliable. Data deficiency is a problem globally, but is more pronounced in the tropics. This is consistent with analyses of geographical biases in biodiversity data.

Our concern was that if trends from these areas are the least reliable due to data deficiency, then the LPI could have simply replaced one problem, representation bias, with another: overreliance on data-deficient trends. Indeed, our analysis shows that all regional taxonomic groups with a high relative weight and low reliability rating (bottom right of Fig. 5) are tropical. Surprisingly, though, we did not find a statistically significant negative correlation between reliability of trends and their relative weights in the LPI.
This also holds true for the terrestrial and freshwater systems when considered separately (the marine system actually shows a positive correlation), and is consistent with Nori et al. (2020), who found that species richness and knowledge gaps are not always correlated.

According to our model, the size of a dataset, i.e., the number of species or populations existing in the real world for any regional taxonomic group, is unimportant to the calculation of trend reliability for a given sample, as long as the sample represents less than half of the time series in the dataset (Fig. S1). In other words, it is the absolute number of populations represented in the sample that matters, regardless of whether that sample represents 1% or 50% of the total populations in a regional taxonomic group. There are two principles working to cause this seemingly counterintuitive effect. First, the relationship between population size and the sample size needed to reach a desired level of precision is logarithmic and becomes more extreme at lower levels of precision (Israel, 1992). This means that a small sample size should be able to estimate a large population almost as well as it can a small one.

This dramatic fall-off of observations suggests that more data is needed for the LPI to reliably reflect changes in the status of global vertebrate biodiversity over the past decade. While a reduction in the delay involved in getting new studies into the LPD might help, increasing the number of populations in the LPD is only possible to the extent that the necessary data exists. Therefore, we simulated two potential ways of generating new data to improve trend reliability: A) a global data blitz, with researchers coordinating to track as many unstudied populations as possible for ten years to generate new time series, and B) resampling already-studied populations to uncover recent changes and lengthen existing time series (Fig. 7). Both solutions had a slight but non-significant positive effect on trend accuracy, but were far less effective than adding existing data (as is currently done for the LPD). It is likely that both solutions have a greater effect on the accuracy of the final portion of the trend than on the overall trend, but further study would be required to be certain. Either way, resampling would be more efficient than a data blitz, as the same improvement could be achieved in one year instead of ten. In the long term, tracking additional populations is essential to completing our picture of biodiversity change. However, natural stochasticity means that short time series are of limited value in generating reliable trends (Wauchope et al., 2019), so tracking additional populations takes time to pay dividends.

There is another limitation underlying the LPI, which cannot be solved by generating new data. All trends in the LPI begin in the year 1970, which is set as the base year for calculating the index values. Past trends can only be determined by existing data; therefore, while there may be some currently inaccessible data that either could be shared or made available for confidential storage in the LPD (Saha et al., 2018), there are likely to be severe limitations to relieving data deficiency for the early years of the LPI. However, other potential solutions could be examined in future studies. One would be to begin the index at a later year in which there is more data available (e.g., 1990). Another would be to change the base year for calculating the index to a more data-rich year, thus increasing the uncertainty around the early years of LPI trends (Gregory et al.).

Our modelling approach to quantifying trend reliability is subject to several limitations.
Certain aspects of the underlying data, such as the distribution of observations and biases in which populations or species are tracked, are too complex to be included as factors in the model, but nonetheless may play a significant role in determining trend reliability. For example, monitoring efforts tend to focus on species at higher risk of extinction (Scheele et al., 2019). Many amphibian populations in the LPD were tracked because they were declining due to the devastating disease chytridiomycosis. This could negatively bias trends and falsely reduce variance in growth rates, leading the model to overestimate reliability because it assumes that tracked populations are randomly selected. On the other hand, Murali et al. (2022) found that population coverage in the LPD is biased towards protected areas, where species are less likely to be threatened, therefore potentially causing a positive bias in LPI trends. Another common phenomenon in the LPD is that time series are non-randomly distributed across time and/or space. For example, while some biodiversity hotspots (e.g., tropical Africa) are poorly known, others, especially islands (e.g., Madagascar), are well-studied (Nori et al., 2020), and this may bias entire realms. In the Afrotropical realm, six (12%) of the 51 terrestrial reptile and amphibian time series in the LPD are from Round Island (a tiny uninhabited island near Mauritius), and more than half (57%; 29/51) are from a single study that took place at a reserve in Madagascar over a nine-year period; only seven (14%) are from mainland Africa, and of those seven, four are from a single study at a reserve in Nigeria. In this case, the model likely severely underestimates the amount of data needed to get a reliable trend. While this is an extreme example, it shows that there are important underlying aspects of the data that cannot be assessed by a model.
Fortunately, these issues tend to diminish when more data is present, and thus should not have a large effect on trends assessed as reliable.

The model also assumes that adding additional time series to the LPD will maintain the parameters of the regional taxonomic group to which they are added (e.g., the mean time series length and the level of variance in population mean growth rates will not change). This can result in the model severely overestimating the number of populations required to achieve a reliable trend. For example, it suggested that 9,087 populations of freshwater Afrotropical birds are needed. This likely occurred due to problems with the existing data.

Another limitation of our modelling approach is that we could not correct for the sizes of the 'real-world' datasets (the number of populations that exist) that the LPD 'samples' are drawing from, and therefore may overestimate the sample size needed to achieve a reliable trend for very small datasets. Although there are estimates of the number of species for each regional taxonomic group, our model uses populations as the base unit to measure sample size. We chose to base sample size on populations rather than species for two reasons. First, we found that mean growth rates within the LPD vary almost as much between populations within a species as they do between species. Therefore, we cannot assume that the trend of a population represents the trend of the species it belongs to any better than it represents the trend of its entire regional taxonomic group. Second, localised threats such as land-use change and habitat destruction are likely to affect some populations within a species disproportionately. Population extinctions also occur much more frequently than species extinctions, and may serve as a prelude to them (Ceballos et al., 2017).
However, a population is not a well-defined unit, and we do not have estimates of how many populations each species or regional taxonomic group is composed of. While our testing suggested we can assume the number of existing populations to be unimportant in determining trend reliability, this assumption breaks down when the sample comprises a large percentage of the dataset. It is unlikely that any regional taxonomic groups currently approach this level of representation within the LPD, but it is nonetheless an important caveat to be aware of.

Despite these caveats, the results of our study reveal the strengths and weaknesses in our understanding of global vertebrate biodiversity, highlighting the regional taxonomic groups for which we have enough data to make responsible decisions, as well as those on which future data gathering and collation efforts should focus. Some underlying aspects of the data create biases that are not taken into account by our modelling approach, and more fine-scale studies on gaps in population trends should be performed to better understand these biases and where to divert scientific resources. We show that revisiting previously studied populations is a quick and efficient way to improve trend reliability for data-deficient groups until more long-term studies can be completed and made available. The modelling approach we use to quantify trend reliability can also be generalized to assess other global and/or regional biodiversity indices that utilize population time series data. We are facing an urgent global biodiversity crisis made worse by biased and deficient data, but through careful study and cooperative global efforts we can solve the data problem and begin to 'bend the curve' of biodiversity toward a positive trend.

Acknowledgements: