Opportunistically collected photographs can be used to estimate large-scale phenological trends

Premise Research on large-scale patterns of phenology have utilized multiple sources of data to analyze the timing of events such as flowering, fruiting, and leaf out. In-situ observations from standardized surveys are ideal, but remain spatially sparse. Herbarium records and phenology-focused citizen science programs provide a source of historic data and spatial replication, but the sample sizes for any one season are still relatively low. A novel and rapidly growing source of broad-scale phenology data are photographs from the iNaturalist platform, but methods utilizing these data must generalize to a range of different species with varying season lengths and occurring across heterogenous areas. They must also be robust to different sample sizes and potential biases toward well travelled areas such as roads and towns. Methods/Results We developed a spatially explicit model, the Weibull Grid, to estimate flowering onset across large-scales, and utilized a simulation framework to test the approach using different phenology and sampling scenarios. We found that the model is ideal when the underlying phenology is non-linear across space. We then use the Weibull Grid model to estimate flowering onset of two species using iNaturalist photographs, and compare those estimates with independent observations of greenup from the Phenocam network. The Weibull Grid model estimate consistently aligned with Phenocam greenup across four years and broad latitudes. Conclusion iNaturalist observations can considerably increase the amount of phenology observations and also provide needed spatial coverage. We showed here they can accurately describe large-scale trends as long as phenological and sampling processes are considered.


INTRODUCTION
Earlier flowering and leaf onset in the spring is one of the strongest indicators of climate change (Parmesan and Yohe, 2003;Scheffers et al., 2016). Tracking and forecasting these trends for the myriad of plant species remains challenging due to the need for a large amount phenological observations across a species range (Chuine and Régnière, 2017). Large-scale observing networks and preserved herbarium specimens have provided a wealth of information, yet still leave large areas under-represented. Here we explore a new data stream of phenological observations from iNaturalist, an online social platform where users submit photos of plants and animals annotated with the time and geographic coordinates. Other users submit identifications of the taxon, resulting in a consensus vote on the final species determination. Observations of plants on the platform commonly include flowers in the photo, thus can indicate the time and location a species was in flower and is analogous to imaged herbarium records now commonly made available via aggregators such as iDigBio. With an adequate sample size over time and space, the flowering phenology can potentially be inferred for the entire growing season across a large portion of a species range.
There is no standardized method yet developed to model large-scale phenological trends across space from dispersed phenological observations. Potential methods utilizing these data need to account for several confounding factors, as the observations are derived from the interaction of both spatially heterogeneous phenological patterns and sampling bias. At a single location the full seasonal phenology of a species can resemble various statistical distributions such as the normal, beta, or uniform distribution. As the spatial extent expands the full phenological distribution can change depending on sample size and location ( Fig. 1) (de Keyzer et al., 2017). Observations pull from this underlying phenology, but may have their own spatial and temporal observer bias. As with other citizen science projects, we expect iNaturalist observations to have spatial bias toward roads and populated areas (Dickinson et al., 2012). Similar to herbarium records we also anticipate a temporal bias toward in-season observations of flower presence (Daru et al., 2018). Thus, a method to estimate onset at large spatial scales should be generalizable to a variety of phenologies and also accommodate highly clustered sampling with little to no flowering absence represented. In this study we developed a method to estimate the onset of flowering across large-spatial scales using a well-tested phenological estimator based on the Weibull distribution. Using the Weibull distribution is a well tested method and is beneficial as it utilizes only flowering presence data (Pearse et al., 2017;Taylor, 2019). We then integrated the Weibull estimator into a spatial ensemble framework (Fink et al., 2010). In the first part of our analysis, we test the method using simulated data where the underlying phenology and sampling scenarios are known. In the second part, we apply the methods to actual iNaturalist data and compare the estimates of flowering onset to landscape level greenup from the Phenocam network. Finally we discuss the potential for broadest use of iNaturalist data in ecological and phenological research.

The Model
Our model, the Weibull Grid model, combines a Weibull distribution based estimator with a spatially explicit ensemble to estimate flowering onset across a heterogeneous landscape. The Weibull distribution can model a large range of shapes, and is commonly used to used to estimate the start or end of a process (Roberts and Solow, 2003;Pearse et al., 2017). It utilizes only observations of flowering presence, thus is advantageous here since iNaturalist data has a potential in-season biases. The estimated date of first flowering is the sum of the dates of all flowering weighted by the joint Weibull distribution and is equivalent to estimating an origination or extinction date. Taylor (2019) found it to be the superior estimator for flowering onset in most cases, and at least as good as methods that also include sparse absence reporting.
To use the Weibull estimator across heterogeneous landscapes we incorporated it into the random spatial grid used by Fink et al. (2010). A landscape is first divided into equally spaced grids. Within each grid cell boxes are uniformly randomly located. Observations within each box are then used in the Weibull estimator to estimate onset, thus all boxes have a spatially independent estimate. An estimate of onset at any point on the landscape is the median estimate from all boxes in which the point resides, with quantiles of the estimates used for confidence intervals. Model parameters are 1) the sizing of the initial grid cells, 2) the sizing of the boxes, and 3) the number of boxes per grid cell.

Simulated Data
To evaluate the Weibull Grid model we first used simulated flowering data to create scenarios with a variety of underlying phenology and sampling schemes. We first simulate flowering on an x,y coordinate system designed to represent longitude and latitude, respectively. We used four parameters to describe the underlying phenology: 1) the initial start day of year (DOY) of flowering, 2) the length of flowering, 3) whether the spatial gradient is linear (ie. the change in the start DOY is a function of only the y-axis) or nonlinear (ie. a random surface is generated over which onset varies, Fig. S1), and 4) the strength of the spatial gradient. The start DOY is adjusted across the landscape by the equation `DOY ~ initial DOY + y*strength` when the gradient is linear, and `DOY = initial DOY + β*strength` when the gradient is non-linear, where β is a random surface scaled to 0-1. The random surface is generated with the R package gstat using a variogram with a Gaussian distribution, a nugget of 0, sill of 0.025, and range of 0.6. The true flowering curve is a normal distribution with the mean DOY at the midpoint of flowering, and the start DOY and the end doy (start DOY + flowering length) each at three standard deviations. From this, observations of flowering are randomly drawn using the distribution probabilities as weights. Simulated observations of flowering are randomly drawn from all DOYs 1:365 where peak flowering (the center of the distribution) has the highest probability, and any day before the Start DOY or after the End DOY has 0 probability.
The location of each observation in space is simulated to represent completely random sampling (non-clustered), or bias sampling toward 1 to several locations (clustered). Non-clustered sampling was generated by randomly selecting x,y coordinates from uniform distributions. Clustered sampling was generated using a Thomas process, where a random number of cluster centers was chosen from a Poisson distribution (λ=5) and n samples are generated with the highest probabilities centered on the cluster centers. Finally the total sample size of flowering observations was simulated to represent both abundant and relatively rare species.
We used a range of parameters to represent the potential phenology of as many species as possible. We tested flowering lengths of 15, 30, 45, and 60 days (Table  1). Using satellite data from a prior study we found that spring greenup in the Eastern U.S.A. had a trend of 3.36 days latitude -1 (Melaas et al., 2018). From this we derived three potential gradient strengths labeled Weak (1.68), Moderate (3.36), and Strong (6.72). In the simulation study the extent is 0-1, thus the gradient strengths were scaled to 0.1 so 10 degrees of latitude is simulated. Finally the spatial gradient type could be either linear or nonlinear. For the sampling scenarios we used sample sizes of 150, 300, 600, and 1200 flowering observations, where the observations could be uniformly randomly distributed or clustered. For each combination of parameters from both the underlying phenology and sampling scenarios we ran 100 simulations, with 19,200 in total.
Each simulation was fit with the Weibull Grid model several times using different combinations of parameters (Table 1). We tested initial grid cell sizes of 0.1, 0.2, and 0.3, box sizes of 0.2 and 0.4, and boxes per grid cell of 5, 10, and 20, 40. For each simulation we also fit a Naive model to test the performance of the Weibull Grid model. The Naive model used a linear regression of DOY~y, where y, representing latitude, is assumed to be the primary spatial gradient along which phenology varies. The Naive model onset estimate is then 0.01 percentile prediction interval.
For each simulation we generated 441 evenly spaced points on the simulated grid representing the true flowering onset date and calculated the RMSE of the models estimated onset date at the same locations. For each combination of Weibull Grid parameters, underlying phenology, and sampling scenarios we calculated, using the 100 replicate simulations, the average: 1) RMSE, 2) bias, and 3) percentage of the 441 testing observations which had no estimates. A location can lack an estimate in low sample size scenarios if none of its associated boxes had the minimum sample size (n=3) needed to produce an estimate. For each phenology and sampling scenario, we chose the set of Weibull Grid model parameters with the lowest RMSE to present here. The lowest errors for all simulated scenarios and the corresponding Weibull Grid parameters are available in the appendix (Table S1, Fig S2-

iNaturalist and Phenocam comparison
We compared estimates of onset using iNaturalist flowering data for two species and compared them with greenup estimates from Phenocam sites. We downloaded all research grade observations of Rudbeckia hirta and Maienthemum canadense from the dates 2016-01-01 to 2019-08-01 (GBIF.org, 2019). From the primary photo for each observation we scored whether a fully open flower was present or not. We scored R. hirta as "flowering" when at least 1 flowering head with at least 50% of visible ray petals were fully unfolded and non-senesced. We scored M. canadense as "flowering" when at least 1 fully open flower with non-senesced petals was visible.
We kept all observations scored as "flowering" and with coordinates accurate to at least 50km. We built Weibull grid models to estimate flowering onset for each species and year (8 onset models total for the 2 species across 4 years). Weibull Grid parameters were chosen from the optimal parameters from the simulation analysis described above, and were based on the sample size, range size, and flowering length for each species (Figs. S2-S3).
We matched Phenocam locations which best represented the primary habitat and range for the two species, and which had data available in the years 2016-2019. For R. hirta we chose 6 grassland sites in the continental U.S.A. east of longitude -100. For M. canadense we chose 6 deciduous broadleaf sites in the N.E. U.S.A., Great Lakes Region, and southern Appalachian Mountains (Table S2). Using the phenocamr package we downloaded and extracted greenup estimated by the 10% rising transition of the 3-day 90th percentile G cc and the associated confidence intervals (Richardson et al., 2018). For each Phenocam site location the flowering onset from the respective Weibull Grid model (R. hirta models for phenocam grassland sites, and M. canadense models for phenocam deciduous broadleaf sites, matched to the respective year) was estimated for comparison with the phenocam derived greenup date.

Simulation Analysis
Using simulated flowering observations with known onset dates the Weibull Grid model performed best when the underlying phenology had a non-linear spatial gradient (Fig. 2). When the underlying phenology of the simulated flowering data was based on a linear spatial gradient, the Naive model outperformed the Weibull Grid model in nearly all cases (Fig. 2, Table S1). When the underlying phenology was a non-linear spatial gradient, the Weibull Grid model outperformed the Naive model, except with long flower days of weak and moderate gradients (Fig. 2 B,D). Performance of both models generally decreased (ie. had higher error) with increasing flowering length, though with a strong spatial gradient the flowering length had either no effect on the Weibull Grid model (Fig. 3 F), or had the highest errors at the shortest and longest flowering lengths (Fig. 3 E). Simulating clustered sampling, as opposed to completely random sampling, increased errors for the Weibull Grid model regardless of the underlying phenology (Fig. 3, Table S1). The change in error was highest when the sample size was the lowest. Using a sample size of 150 simulated observations the Weibull Grid model RMSE increased by more than 1 day when clustered sampling was introduced. This increase happened regardless of flowering length or the spatial gradient strength (Table S1). As the sample size increased the magnitude of the RMSE change attributed to clustered sampling decreased. Higher sample sizes improved the Weibull Grid model performance overall, especially moving from 150 to 300 samples. Beyond 300 simulated samples performance improvements were relatively small.

Comparison of iNaturalist based estimates with Phenocam estimates
We used the Weibull Grid model to estimate flowering onset for two species based on flower phenology observations derived from iNaturalist photos. We compared these estimates with greenup dates from the Phenocam network. In deciduous broadleaf forests the Weibull Grid model estimated flowering onset of M. canadense an average of 2.7 days after canopy greenup (range: -10.4 to 17.3). At grassland sites the Weibull Grid model estimated flowering onset of R. hirta after canopy greenup at all sites and years, 33.9 days on average (range: 1.3 to 66.1). However, some estimates, especially at lower latitude sites, had confidence intervals extending well before the Phenocam greenup.

DISCUSSION
Here we have shown that opportunistically collected photos of in-flower plants from iNaturalist can provide consistent estimates of large-scale phenological patterns. We showed that a method to estimate flowering onset integrated into a spatial ensemble framework can consistently match independent estimates of greenup.
The same method was also validated using simulated data and was especially beneficial when non-linear spatial gradients were present. The iNaturalist platform is still relatively new, and its data has largely been used as verified observations in species distribution models. Extracting ecologically relevant information from user submitted photographs is challenging (Barve et al., 2019), yet the scale of the data can provide an extremely high return on investment for phenological research.

Diverse Data Harmonization
The sheer amount of observations available from iNaturalist as well as the spatial coverage provides potential for analysis at scales not before possible. The data density provides high replication across gradients, which provides the most information about large-scale patterns (Kreyling et al., 2018). Yet, as a relatively new platform which does not emphasize standardized sampling, iNaturalist data are missing historical context as well as site fidelity. Herbarium specimens have been a primary source of large-scale analysis and provide historic observations dating back over 100 years (Willis et al., 2017). Yet for the two species used here historic herbarium observations rarely exceed 100 total observations per year, thus are likely lacking adequate spatial replication to track seasonal large-scale trends (Fig.  5). Since 2016 the annual number of research grade R. hirta observations on iNaturalist has increased from 500 to nearly 2000. USA-NPN has similar increases in the number of observations, but does not provide as high a density since many observations are repeated at the same location (Fig. S4). Even with this lower density the repeated sampling observations from the USA-NPN, and other monitoring networks, helps constrain phenological timings (Gerst et al., 2016;Elmendorf et al., 2019). Thus with its dense sampling iNaturalist derived phenological data can compliment as opposed to replace other data sources.
Combining these diverse data streams will be a major challenge, as they have a variety of protocols and observation biases. The Plant Phenology Ontology (PPO) provides a framework for harmonizing different phenology observations into common indicators (Stucky et al., 2018;Barve et al., 2019;Brenskelle et al., 2019). Hierarchical models can be used to model the different observations processes under a common underlying phenology (Ogle et al., 2015;Elmendorf et al., 2019) and also share strength between closely related species or ones with similar traits (Theobald et al., 2017). Incorporating potential drivers into models may also be difficult. For example if gridded temperature data were combined with onset estimates from the Weibull Grid model then the spatial grain chosen will affect the results (Jelinski and Wu, 1996). Alternatively, the presence/absence observations could be integrated into a model directly without first estimating onset dates (Clark et al., 2014;Elmendorf et al., 2019). Future analysis using the strengths of each dataset will likely provide the most information for inferring phenological patterns and their drivers.

Weibull Grid Model Performance
The Weibull Grid model was most beneficial when the underlying phenology had a non-linear spatial gradient. This is due to the spatial ensemble method having a location dependent estimate which is informed only by nearby observations (Fink et al., 2010). This limits the model to regions with dense sampling, which we expect iNaturalist to provide for many abundant well-known species (Fig. 5). Estimates improved up to 300 total observations, but beyond that improvement of overall performance was minimal. Our analysis simulated 10 degrees of latitude/longitude, thus as a first approximation we can suggest, for every year of transitions to estimate, sampling densities of 3 observations per degree -2 at mid-latitudes. When the underlying phenology had a linear spatial gradient the Naive model always performed better. This scenario will likely be encountered when an analysis has a small spatial extent, and the onset of flowering, or other phenophases, is nearly simultaneous across the study region. The Weibull Grid model was fairly robust to clustered sampling, with errors increasing by 1 day or less when clustered sampling was introduced and sample size was 300 or greater (Fig. 3). The model performed best with shorter flowering seasons and moderate to strong nonlinear spatial gradients. As the flowering season was lengthened observations on average were farther from the true onset date, thus the estimate had higher uncertainty. With stronger gradients, nearby observations were also more likely to be farther apart in time, again contributing to higher uncertainty and overall error. Larger sample sizes did improve performance to a degree, especially at the strongest gradient (Table S3).
We found a larger box size (sized 0.4 on a 0-1 simulated grid) had the lowest errors except in two cases: With a strong non-linear gradient combined with a short flowering season, and with a high sample size (n=1200). (Figs. S2-S3). The optimal choice for the other two parameters (the initial grid cell size and number of boxes within each cell) varied as the interaction between these two parameters determined the total number of boxes across the entire simulated landscape. Since the final point estimates are based off quartiles, more estimates from a higher amount of boxes are not detrimental and the only downside is increased computational time. Thus a small initial grid cell size and high number of boxes is recommended.

iNaturalist vs. Phenocam Derived Onset Estimates
The flowering onset estimates for M. canadense and R. hirta were consistent with ecosystem level estimates. We had no a priori assumptions about when in the growing season the two species flowered, yet the Weibull Grid model fit with iNaturalist derived phenology data consistently provided estimates near Phenocam derived greenup.  Helenurm and Barrett (1987), as another study showed M. canadense flowering 7 days earlier since 1980 due to warmer temperatures.

The Potential Information Content of Photographs
We are discretizing the iNaturalist photos into binary presence or absence of open flowers, yet there is additional information in the photos which can be utilized. Phenophases such as fruiting, leaf onset and coloring can also be scored. The total number of flowers or fruit on an individual or in a photo can be tallied to quantify flower intensity or reproductive success. The natural progression of different phenophases can be used to constrain the timing of earlier and later transitions (Wolkovich and Ettinger, 2014;Ettinger et al., 2018). For example, fruiting follows flowering, thus the presence of fruit implies flowering onset happened at an earlier date, and vice versa. M. canadense, analysed here, had very discrete phenophase timing. During scoring we noted that open flowers of M. canadense never coincided with ripe fruit, but occasionally coincided with enlarging ovaries. We hypothesize the overlap between phenophases is highly species dependent and thus difficult to generalize in models. For example, some species have ripe fruit which persist for months, and the presence of maturing or unripe fruit may be a better phenophase to utilize. Likewise, long flowering plants with indeterminate inflorescences could simultaneously have unopened flower buds, open flowers, and immature and ripe fruit.
Other information beyond the presence or quantity of phenophases can be extracted from photos as well. Plants besides the target species may be present, allowing for analysis of competing species or community scale phenology. However, researchers should ensure the same photo was not submitted repeatedly to record multiple species. If the photo frame is large enough then landscape level greenness indices can be extracted using either machine learning and/or crowdsourcing techniques (Kosmala et al., 2016), which can be integrated with Phenocam data. iNaturalist observations identified as plant pollinators have a high chance of containing flowers, providing an avenue to analyse these phenological interactions (Gazdic and Groom, 2019). The long-term storage of iNaturalist photos and their metadata, similar to storing herbarium specimens, will provide future researchers with these analysis avenues, as well as other options which may not be obvious today (Heberling and Isaac, 2018).

Limits and Biases of Phenology From Photographs
While we utilized only flowering presence data here, the absence of flowers or other phenophases can be highly informative in constraining estimates of phenological timings (Taylor, 2019). The scoring of iNaturalist photos provides a method for characterizing phenophase absence, though depending on size, abundance, and other traits, such absence reporting may be better suited to some species more than others.
We expected a bias toward observations of plants in flower on iNaturalist, and that was the case with one of the two species analyzed. For R. hirta 94% of observations had open flowers, while M. canadense had 36%. We hypothesize the presence of flowers drives a higher proportion of research grade observations in two ways: 1) the ability to positively identify a plant to the species level, and 2) the prominence of a plant driving the motivation of an observer to photograph it. M. canadense is a common, sometimes dominant, understory plant in N.E. North America with distinctive parallel veined leaves. This allows for both more observations and easier identification when flowers are absent, thus more non-flowering observations. R. hirta is common throughout Eastern North America, but the early season basal rosette can resemble other species in the Asteraceae family, thus lacking flowers identification is difficult. Without a positive identification the observations will not be marked as "research grade", excluding them from most downstream analysis. Alternatively, if R. hirta is in the understory and lacking flowers it may not be prominent enough for an observer to take a photo to begin with. If the former is the primary driver for the dearth of non-flowering R. hirta research grade observations, then there is potentially a large history of phenological absence data available on iNaturalist if more advanced identification methods could be applied.
Extracting phenological information from photos has several strengths and limitations. False positives for flowers are likely low since the presence of flowers is easy to detect. There is potential dispute over what constitutes "flowering", as the inflorescence of some species may remain highly visible even after successful pollination and cessation of ovule viability. Flowering heads comprising numerous flowers can also remain viable over several days to weeks. For these issues the PPO can provide guidance (Stucky et al., 2018), and multiple observers resulting in a consensus vote on final phenology decreases scoring errors (Barve et al., 2019). The potential for false negative (flowers absent) is high, especially if the entire plant is not included in the photo frame. R. hirta is a perennial taprooted herb, so it's likely the full above ground portion of the plant, or at least the full inflorescence, is in most photos. M. canadense has spreading rhizomes, so it would be impossible to judge from a single photograph whether all stems of an individual plant are present. Larger plants will likely have higher error rates from scoring, either from only a portion of an individual being photographed or the plant being far away in the image (Barve et al., 2019). Photos of plants with charismatic flowers, even small ones, may exclude the rest of the plant which would prohibit deriving any information besides flower presence.
Taken together these biases will limit which plant species can have their phenology scored and analyzed from online photographs on iNaturalist and other platforms. Most suitable will be common, abundant and easily identifiable species which are large enough to be conspicuous, but small enough such that the entire plant is included in most photos at a near enough distance to adequately score phenophases. Still unstudied is the effect of observer effort across space and time.
To obtain an adequate sample size a plant must flower (or have other phenophases present) at the locations and times coinciding with observers. Plants more abundant in locations which are more likely to be visited, and which flower during times mostly likely to be visited will be better suited. Tall species such as trees, rare species, or those with inconspicuous flowers will likely not be suitable.

How Predominant is Nonlinear Phenology?
In our simulation analysis we found the Weibull Grid method to be superior when the underlying phenology is non-linear across space. In the real world we hypothesis that large-scale (>100km in extent) phenology gradients will have non-linear trends in the majority of plant species. Complex topography, which affects phenology thru abiotic drivers such as temperature and sunlight, predominates much of the world's land surface. Even in areas of uniform topography differences in weather patterns, community structure, and successional status can change the phenology such that a single explanatory variable such as latitude cannot adequately account for the variance (Melaas et al., 2016). Regions experiencing shifts in climate at faster rates than surrounding regions will also contribute to non-linearity (Ault et al., 2015). In the cases where a linear gradient exists the Weibull Grid method may still be beneficial when the species encompass a large range, and the shape of the phenological distribution varies over latitude. Future research should explore further the approximate threshold in non-linearity in which a simple naive model fails to capture the variation. We admit that beyond studies using gridded climate or remotely sensed data, this aspect of plant phenology is largely unstudied, and we believe iNaturalist data will provide valuable information in analysing large-scale patterns.

CONCLUSIONS
The iNaturalist platform has been operational since 2008, yet it's grown to be one of the most popular citizen science platforms, given its inclusive participatory nature, tools to help with identification, and how it balances ownership issues related to photographs, while also strongly supporting open data approaches. Here we have shown the potential to extract phenological info from iNaturalist photographs, and outlined how these data, with their high spatial replication, can complement other sources. Quantifying large-scale trends across space and time can validate past findings of experimental studies and provide new hypotheses for future ones. Location specific transition estimates can also be used to parameterize models for large-scale forecasts, which benefit from data across a large spatial extent (Taylor, 2019). We made a spatially explicit model which showed, given an adequate sample size, iNaturalist derived data can provide precise estimates of phenological onset for plant species which occur across heterogeneous environments. Flexible modelling approaches which can utilize the unique structure of these data, eg. the Weibull estimator, will be key in large scale analysis. In the future, and as the iNaturalist database grows, it will likely become a primary source of phenological data. non-linear Moderate 45 8 (6.6) 6 (7) 5.1 (7.5) 4.5 (6.6)