The influence of data type and functional traits on native bee phenology metrics: Opportunistic versus inventory records

Efforts to understand activity patterns of bees, our most important pollinators, often rely on opportunistically collected museum records to model temporal shifts or declines. This type of data, however, may not be suitable for this purpose given high spatiotemporal variability of native bee activity. By comparing phenological metrics calculated from intensive systematic inventory data with those from opportunistic museum records for bee species spanning a range of functional traits, we explored biases and limitations of data types to determine best practices for bee monitoring and assessment. We compiled half a million records of wild bee occurrence from opportunistic museum collections and six systematic inventory efforts, focusing analyses on 45 well-represented species that spanned five functional traits: sociality, nesting habits, floral specialization, voltinism, and body size. We then used permutation tests to evaluate differences between data types in estimating three phenology metrics: flight duration, number of annual abundance peaks, and date of the highest peak. We used GLMs to test for patterns of data type significance across traits. All 45 species differed significantly in the value of at least one phenology metric depending on the data type used. The date of the highest abundance peak differed for 40 species, flight duration for 34 species, and the number of peaks for 15 species. The number of peaks was more likely to differ between data types for larger bees, and flight duration was more likely to differ for larger bees and specialist bees. Our results reveal a strong influence of data type on phenology metrics that necessitates consideration of data source when evaluating changes in phenological activity, possibly applicable to many taxa. Accurately assessing phenological change may require expanding wild bee monitoring and data sharing.

however, has been paid to how the type, quantity, or quality of data used to measure phenology in bees, or other species, may produce conflicting conclusions.

To assess the vulnerability of plant-pollinator relationships to climate change, we first need to examine whether data used to assess bee phenology accurately reflects real changes in bee activity, as opposed to noise from unevenly-sampled biological variability, biased collecting protocols, or sample-size limitations of the data.
Sampling diverse organisms at large spatial and temporal scales can be an incredibly laborious and expensive process. As a result, our knowledge of bee trends necessarily draws from patchy and inconsistently-collected data (Meiners et

sources of error and uncertainty is an important step in advancing scientific understanding and maintaining public trust and investment in the ability of science to measure and mitigate changes in our natural world.

Data available to researchers interested in large-scale animal activity trends can generally be divided into two types: "opportunistic" and "inventory". "Opportunistic" data usually consist of records compiled from museum specimens belonging to specific groups of interest that are

the effort required to follow a systematic inventory protocol is higher, but the assumption is that inventory records have fewer biases, resulting in superior estimates of bee activity, floral reliability (Wright et al., 2015), and baseline community patterns against which evaluations of future change can be measured (Meiners et al., 2019). Despite these core data type differences, however, both opportunistic and inventory data have been used interchangeably in studies of native bee phenology without an assessment of their relative suitability for the task. Drawing mistaken conclusions from data that is flawed, incomplete, or was collected for another purpose may result in mismanagement of natural resources, misdirected sampling efforts, and missed opportunities to harness the full power of both opportunistic and inventory datasets.

We use data from six systematic bee inventories and approximately a quarter million opportunistic museum records collected over twenty years to compare estimates per data type of three phenology metrics for forty-five abundant native bee species.
To assess the possibility of extrapolating conclusions to additional species, we also examine trends related to five functional life history traits, which recent research has shown to be predictive of native bee rates of decline Mandelik, 2015). With this approach, we seek to answer two central questions: 1) can opportunistic data produce parameter estimates of native bee species phenology that are statistically equivalent to those from more expensive inventory data? and 2) if phenology metrics differ between data types, are there patterns associated with functional traits that could be useful indicators of which species are more susceptible to erroneous phenology estimates? In answering these questions, we seek to improve the utility of natural history collection data, the determination of native bee trends and conservation practices, and the broad reliability of phenology results.

We restricted the temporal and spatial range of our study, within reason, to limit the phenological variability introduced solely by spatiotemporal factors, while keeping our dataset large. For both inventory and opportunistic data, we only used specimen records that met all of the following criteria: 1) identified to a valid species, 2) collected between 1990 and 2015 in the USA or Canada, and 3) contained complete and reliable georeferencing. Data cleaning to meet these criteria was conducted in R (R Core Team, 2015). To ensure sufficient sample sizes for species-level comparisons between data types, we excluded any species with fewer than 180 occurrences for each data type, retaining only the fifty most abundant species shared between the two data sets.
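
The record- and species-level filters described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R code actually used; the field names (`species`, `year`, `country`, `lat`, `lon`, `data_type`) are hypothetical, the "valid species" check is reduced to a non-empty name, and ranking shared species by their combined record count is an assumption, since the text does not say how abundance was ranked.

```python
from collections import Counter

# Hypothetical record layout: these field names are illustrative assumptions,
# not the schema of the original dataset.
def is_usable(rec):
    """Apply the three record-level criteria described in the text."""
    return (
        bool(rec.get("species"))                     # 1) identified to species (stand-in check)
        and 1990 <= rec.get("year", 0) <= 2015       # 2) collected between 1990 and 2015 ...
        and rec.get("country") in {"USA", "Canada"}  #    ... in the USA or Canada
        and rec.get("lat") is not None               # 3) georeferenced
        and rec.get("lon") is not None
    )

def select_species(records, min_per_type=180, top_n=50):
    """Return species with enough usable records in BOTH data types."""
    counts = {"inventory": Counter(), "opportunistic": Counter()}
    for rec in records:
        if is_usable(rec):
            counts[rec["data_type"]][rec["species"]] += 1
    inv, opp = counts["inventory"], counts["opportunistic"]
    shared = [s for s in inv if inv[s] >= min_per_type and opp[s] >= min_per_type]
    # Assumption: "most abundant" is ranked by total records across both types.
    shared.sort(key=lambda s: (-(inv[s] + opp[s]), s))
    return shared[:top_n]
```

Requiring the minimum count in each data type separately, rather than in the pooled data, is what keeps a species with many inventory records but few museum records out of the comparison.
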
Once we finalized our list of fifty species, we conducted literature searches and expert surveys to assign them into categories of five pre-selected life history traits that literature

a Keyence digital microscope to measure body size as the average inter-tegular distance (between wing bases) for five female specimens of each species, following the method specified by Cane (1987). Based on species-specific literature searches, we categorized the sociality of a bee species as 1) solitary; 2) social, which included bee species that can be described as eusocial, communal, or primitively social; or 3) unknown (our list of fifty did not include any cleptoparasitic species). We noted whether a species was considered a floral specialist in the literature by a simple 1) yes, 2) no, or 3) unknown. We noted nest location as a binary trait, with species categorized as nesting primarily 1) above ground or 2) below ground. Due to a lack of published information, we classified voltinism based on a survey of expert opinion into the following classes: 1) univoltine (one generation), 2) multivoltine (>1 generation), 3) social (since these species replace members throughout the season but not in the same way as multiple

The final dataset contained 104,101 bee occurrence records, of which 71,152 were from inventory collections and 32,949 were opportunistically collected. From the original fifty species, we removed five from the dataset because they 1) are commonly managed (Apis mellifera); 2) have an unusual, socially-parasitic life history (Bombus insularis); or 3) could not reliably be distinguished in females (Agapostemon angelicus, Agapostemon texanus, Agapostemon angelicus/texanus), resulting in a final set of 45 species (Table 1).

The final 45 species selected for phenology metric analyses showed a relatively even spread of traits.
Some trait categories were impossible to assign for certain species based on current knowledge and remain labeled as "unknown" in our trait dataset (Table 1). All assigned trait categories were represented by at least 12 out of 45 total species.

Calculation of phenology metrics
We identified three measurable metrics of bee phenology that would be useful and reliable for quantitatively estimating changes in patterns of bee species activity over time: 1) flight duration, or the number of days in a year the bee species was active; 2) clusters, or the number of distinct peaks in abundance in a year; and 3) the date of a bee species' highest annual peak in abundance (Fig. 1). We defined flight duration as the span of the middle 90% of occurrences, removing the upper and lower 5% of values to eliminate outliers that may represent unusual activity in any given year. We determined the number of clusters in a set of occurrences, with a maximum possible of three clusters, using a gap statistic. We then used k-means clustering to find the location along the day-of-year axis of all clusters. The cluster with the highest value on the density plot was chosen as the date for the greatest abundance of occurrences. We calculated these three metrics twice for each species, once for all inventory occurrences and once for all opportunistic occurrences. In order to have a single number for each metric that showed how different it was between the two data types, we calculated a test statistic for each of the three metrics. We chose the test statistic as the absolute difference between the opportunistic and inventory metric, so that each species had three test statistics, one each for flight duration, number of clusters, and location of greatest abundance.

In order to determine if these test statistics indicated that there was a substantial difference between phenology patterns for data types, we compared these observed test statistics to a set of simulated test statistics that came from shuffling the data.
We randomly shuffled the data type labels for all occurrences of each species, retaining the same relative number of opportunistic and inventory labels for each species. We then recalculated the three phenology metrics for the two data types and the test statistic, so that each species had three simulated test statistics. Finally, we repeated this process 1000 times, so that each species had a distribution of simulated test statistics.

To determine if the observed test statistics were significantly different from the distribution of simulated test statistics, which would indicate that data type mattered for that phenological pattern, we calculated a p-value based on the number of simulated test statistics that were greater than the observed one. We used an alpha cut-off of 0.05, and each species had one p-value for each of the three phenology metrics. Given the multitude of pairwise comparisons, we also include a more stringent alpha cut-off of 0.001, which is the lowest value that can be achieved given the number of permutations.
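
The label-shuffling procedure above can be sketched for the flight duration metric as follows. This is a minimal Python illustration, not the R code used in the study. The trimming rule in `flight_duration` (dropping the lowest and highest 5% of records by count) is one simple reading of "middle 90%", and counting simulated statistics that are at least as large as the observed one is a conservative assumption; the text says "greater than". The function names, the `seed` parameter, and the day-of-year inputs are all hypothetical.

```python
import random

def flight_duration(days, trim=0.05):
    """Span in days of the middle 90% of occurrence dates (day of year)."""
    ordered = sorted(days)
    n = len(ordered)
    k = int(trim * n)                  # records dropped from each tail
    return ordered[n - 1 - k] - ordered[k]

def permutation_p_value(inventory_days, opportunistic_days,
                        metric=flight_duration, n_perm=1000, seed=0):
    """Return (observed test statistic, permutation p-value)."""
    rng = random.Random(seed)
    # Test statistic: absolute difference between the two data types.
    observed = abs(metric(opportunistic_days) - metric(inventory_days))
    pooled = list(inventory_days) + list(opportunistic_days)
    n_inv = len(inventory_days)        # keep the original group sizes
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # equivalent to shuffling the labels
        simulated = abs(metric(pooled[n_inv:]) - metric(pooled[:n_inv]))
        if simulated >= observed:      # assumption: ties count as extreme
            exceed += 1
    return observed, exceed / n_perm
```

With 1000 permutations, a result in which no simulated statistic reaches the observed value would be reported as p < 0.001 rather than as exactly zero. Any metric that maps a list of dates to a number (for instance a peak-date estimate) can be passed as `metric` in place of `flight_duration`.
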

Modeling influence of functional traits
After assigning a category value to each bee species for each of the five selected functional traits, as described above, we used generalized linear models to assess the influence of functional traits on significant differences between data types from permutation tests for each of the three identified bee phenology metrics (Fig. 1). This evaluation of the influence of functional traits on data type significance was only conducted for species with complete trait category information (Table 1). Species with "unknowns" were removed, and voltinism levels "social" and "multi" were ultimately combined so that all categorical traits were binary variables. All data manipulation, plotting, and statistical tests were conducted in the R statistical package (R Core Team, 2015).
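
As a rough illustration of this step, the sketch below fits a GLM with a binary response (whether a species' metric differed significantly between data types) and a single numeric predictor such as body size. It assumes a binomial family with a logit link, which the text does not state, and it uses one predictor where the study modeled five traits. It is written in plain Python rather than the R used by the authors, and the function name and return values are hypothetical.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, standard error of b1, two-sided Wald p-value for b1).
    Assumes the data are not perfectly separable.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)          # binomial variance weight
            g0 += yi - p               # score (gradient) terms
            g1 += (yi - p) * xi
            h00 += w                   # information matrix terms
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Var(b1) is the matching diagonal entry of the inverse information,
    # taken from the final iteration (effectively at convergence).
    se1 = math.sqrt(h00 / det)
    z = b1 / se1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b0, b1, se1, p_value
```

A positive `b1` for body size would correspond to the pattern reported in the results, where larger bees were more likely to show a data type difference.
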

Phenology Metrics by Data Type
With three phenology metrics for each of 45 species, we compared a total of 135 pairs of phenology variable calculations based on data type. We found significantly different values depending on which data type was used (inventory or opportunistic) in 87 out of 135 cases, which represents 64% of the possible total, much higher than the 5% expected under a null hypothesis and assuming a 5% alpha. The date of highest peak in abundance was the metric with the greatest number of value discrepancies due to data type: the date of the seasonal peak was significantly different depending on which dataset was used for 40 out of 45 species (89%, Fig. 3). Flight duration, or the number of days a species was active, differed based on data type for 34 out of 45 species (76%, Fig. 3). And the number of clusters, or distinct peaks in abundance, was different between data types for 15 out of 45 species (33%, Fig. 3). It should be noted, however, that the number of clusters was the least reliable of the three phenology metrics, due to limitations of the gap statistic used to calculate it, sensitivity to variable collection efforts over time in the opportunistic data, and the narrow range of options between just one and three for number of clusters detected. We considered other options for calculating number of clusters but found the gap statistic to be the most defensible, if still flawed.

All 45 species had at least one metric that was significantly different depending on data type (Table 2). Ten out of 45 species had significantly different results for all three phenology metrics depending on the type of data used to evaluate them. Occurrence curves for each species and data type illustrate the comparison between inventory and opportunistic data types of the three phenology metrics (Fig. 4).
The species Ceratina nanula, for example, differed in flight duration between data types, seen as the width of the x-axis between dotted lines, but had statistically similar results for the date of the highest peak and the number of clusters in abundance (Fig. 4, top left). Lasioglossum sisymbrii had the same number of clusters in both inventory and opportunistic datasets, but different values for both flight duration and date of the highest peak, shown by the solid vertical line on the plot (Fig. 4, top right). Two species of Osmia had different results for the number of clusters reported by the gap statistic (Fig. 4, middle row), as well as either a different flight duration or different date of highest peak depending on data type. As mentioned above, ten species differed in all three metrics between data types, as illustrated by Lasioglossum hudsoniellum and Anthophora urbana (Fig. 4,

Relationship of Functional Traits to Data Type Significance
For the group of forty species without any unknown trait values, body size was a significant (p = 0.047) predictor of whether flight duration would differ between data types, with larger bees being more likely to have different results for seasonal activity length depending on which dataset was used (Table 3). Generalized linear model results also found body size to be a marginally significant (p = 0.055) predictor of whether the number of clusters would differ between data types, with larger bees more likely to return a different number of clusters depending on whether opportunistic or inventory records were used to calculate them. Floral specialization was also a marginally significant (p = 0.051) predictor of difference between data types in flight duration, with species designated as floral specialists more likely to return different values for the number of days they were active over a season depending on data type (Table 3). The likelihood for date of the highest peak to differ between opportunistic and inventory data was not significantly related to any of the five functional traits.

and changes in phenology, not only for pollinators but potentially for other taxa studied using compiled museum records. With the high natural variability of many small organisms already obscuring measurable signals of behavior, adding noise to phenology models by using messy or inappropriate data may confound phenological estimations to the point that they become uninformative. If biases are consistent and directional, phenology studies that do not take into account the influence of data may even report patterns opposite the truth.

Our study reveals an urgent need to ensure that the data used for evaluating changes in phenology, not only of native bees but likely of many other organisms as well, are of sufficient quality to produce reliable results.
Comparing metrics over time that are not compatible (for example, checking for changes in the date of peak species activity by comparing recent inventory records to older opportunistic records from historical study sites) may add noise instead of clarity to collective efforts to detect real changes in phenology, or may create false impressions of a pattern. Critical interaction mismatches between any organisms reliant on each other could be obscured. Such misleading results may also hinder scientific progress and conservation efforts, erode public trust in science, and dilute the gravity of warnings about pollinator declines and other environmental changes.

The implications of our study may be relevant in many systems but are certainly of consequence as applied to native bees. Since plant reproductive success depends on the timing of

Despite the overall increased availability of data, records are still very limited for many taxa, and that is where common parameters like functional traits can be useful. Until further technological advances in bee species identification and specimen processing make it feasible to obtain sufficient data to evaluate phenological trends for a majority of cryptic or rare native bee species, efforts to identify unifying variables that correspond to data type reliability or phenological variability will be relevant. Our result that the importance of having high-quality data increases with increasing body size and increasing floral specialization for native bees, for instance, can help guide studies of smaller groups of species when deciding how to allocate data collection resources. Larger bees can emerge earlier in the season than smaller-bodied bees, due to their greater ability to generate and maintain elevated body temperatures under cold conditions

specialist bees in our study.
Knowing, as a result, that inventory data is more appropriate for these species allows for cleaner interpretation of their behavior. Likewise, it is useful to know from this result that opportunistic data may be more appropriate for estimating phenology metrics for smaller species and floral generalists, at least where many records are available. In these ways, our functional trait model illustrates how exploring limitations of data types can have both biological and statistical value.

Our study does not seek to undermine the great importance or value of natural history or museum collections, but rather to explore and illuminate best practices for data use in the study of phenology. The appropriate source of data may depend entirely on the nature or scale of the question being asked, or the level of specificity desired. Systematic inventories of native bee fauna provide ideal data for understanding bee ecology, but are hugely expensive and time-consuming, and should not take the place of opportunistic data for every research endeavor. In some cases, such as when gauging phenological changes across decades, it may not be possible to rely on inventory data, but the limitations of the data available must still be understood, because the best available data may fail to provide the correct answers, regardless of the methods employed. There is much to be gained from appropriate use of opportunistic data to estimate metrics of species phenology, and much to be lost from ignoring it. The influence of data type on phenology estimation is likely important for many other taxa with spotty records and high inherent variability. Incorporating measures of data bias and associated relevance of functional traits to guide interpretation of results may benefit the study of phenology and ecology in a myriad of ways.

While we improve our use of data, we must also continue expanding our knowledge base.
Natural history collections across the world are struggling to attain the financial, institutional, and cultural support required to develop, curate, document, and digitize museum collections. Improving the flow of high-quality data records from diverse areas and time periods is an important step in alleviating data bias and improving our understanding of phenology. Expanding and further standardizing inventory efforts will also be important. The majority of

We are grateful to all the professional and hobbyist bee collectors who contributed specimen records to the USDA NPIC museum dataset, and to the systematists and curators who identify, maintain, and database those collections. In particular, we owe thanks to Harold Ikerd and Skyler Burrows for their work on records used in this study.