Monitoring photogenic ecological phenomena: Social network site images reveal spatiotemporal phases of Japanese cherry blooms

Some ecological phenomena are visually engaging and widely celebrated. Consequently, these have the potential to generate large footprints in online and social media image records, which may be valuable for ecological research. Cherry tree blooms are one such event, especially in Japan where they are a cultural symbol (Sakura, 桜). For centuries, the Japanese have celebrated Hanami (flower viewing) and the historical data record of the festival allows for phenological studies over this period, one application of which is climate reconstruction. Here we analyse Flickr social network site data in an analogous way to reveal the cherry blossoms’ seasonal sweep from southern to northern Japan over a twelve-week period. Our method analyses data filtered using geographical constraints, multi-stage text-tag classification, and machine vision, to assess image content for relevance to our research question and use it to estimate historic cherry bloom times. We validated our estimated bloom times against official data, demonstrating the accuracy of the approach. We also investigated an out-of-season autumn blooming that has gained worldwide media attention. Despite the complexity of human photographic and social media activity and the relatively small scale of this event, our method reveals that this bloom has in fact been occurring over a decade. The approach we propose in our case study enables quick and effective monitoring of the photogenic spatiotemporal aspects of our rapidly changing world. It has the potential to be applied broadly to many ecological phenomena of widespread interest.


Introduction
Climate change is disrupting many ecological phenomena, threatening insect pollinators vital to our food supply [1], generating conditions that increase the likelihood of wildfires [...]

Geo-tagged visual media in particular, as a form of volunteered geographic information, has seen strong interest in scientific research. This data may include photographic evidence of events in remote areas that would otherwise be impractical to survey [10][11][12][13][14], but it may also contain observations of popular events, in which case the sheer abundance of the data is potentially of benefit. Daume [12], however, has noted that although Twitter, the source of data in his study of invasive species, is potentially a rich source of observations, there are technical challenges involved in using SNS data; careful validation of results is required. Concern has also been raised about the implications for personal privacy of mass data sets being used in research, especially when the subject of the research relates to the personal attributes of people uploading images of themselves and their friends [15].

Of specific relevance to the work presented in this paper, if carefully managed, volunteered information gleaned from SNS may provide a valuable source of data to monitor and understand the dynamics of ecosystems [16,17]. In effect, every person posting to a social network site might be what we propose to call an "incidental citizen scientist" of value to ecological projects. Previous applications in the domain include a project where manually analysed tweets from a sample spanning three years were used to detect invasive species [12]. Purkart and Depa detected invasive species in new sites in Slovakia, Czechia and Austria using crowd-sourced information on Facebook [18]. In addition, Becken and Stantic monitored sentiments on the Great Barrier Reef in Australia using data contained in tweets [19].

The primarily socially-governed (rather than scientifically-governed) data-collection behaviour of incidental citizen scientists requires researchers to carefully assess the quality of their data [12]. In addition, researchers must be mindful that the data may be inadequate or unsuitable for answering some types of pertinent question. For example, the effect human activity is having on the climate, and the impact of this on ecosystems, is of major concern to ecologists [1], but it is not immediately apparent how interest in understanding these changes is reflected in the activities of SNS users (e.g., see [20] [...]

[...] for each month against a tag is coloured according to the relative frequency of its prevalence in that month. Note that "autumn" is a frequent tag in October, November and December. These photos were found by manual checking to contain autumn leaves rather than blossoms.

As anticipated, the computer vision API returned the text tag "cherry blossom" for most photos. Human analysis of the other returned tags revealed that most were conceptually related to cherry blossoms, except perhaps for the tags "autumn" and "maple tree", which appeared in the last months of the year associated with some photographs (Fig 2). We might sensibly expect these tags to be associated by the computer vision algorithm with autumn leaves and Japanese Maples. A visual inspection of images with these tags confirmed this, from which we learnt that: (i) the computer vision API was correct in its assignment of the tag; and (ii) the data originally downloaded from Flickr using the search term "cherry blossom" contained images irrelevant to our goal. To refine the data set, we therefore chose to automatically retain only photos with the tag "cherry blossom" and without the tags "autumn" or "maple tree".
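
The retention rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function names, the photo IDs, and the example tags are hypothetical, and the lower-casing of tags is an added assumption about how the API output might be normalised.

```python
from typing import Dict, Iterable, List

# Tags named in the text; everything else in this sketch is hypothetical.
REQUIRED_TAG = "cherry blossom"
EXCLUDED_TAGS = frozenset({"autumn", "maple tree"})


def is_relevant(tags: Iterable[str]) -> bool:
    """True if the tags include "cherry blossom" and neither excluded tag."""
    # Assumption: tags are compared case-insensitively, ignoring outer spaces.
    normalised = {tag.strip().lower() for tag in tags}
    return REQUIRED_TAG in normalised and normalised.isdisjoint(EXCLUDED_TAGS)


def filter_photos(photos: Dict[str, List[str]]) -> List[str]:
    """Return the IDs of photos that pass the rule, in input order."""
    return [photo_id for photo_id, tags in photos.items() if is_relevant(tags)]


# Hypothetical photo IDs and machine-vision tags, for illustration only.
example = {
    "photo_a": ["cherry blossom", "flower", "spring"],
    "photo_b": ["cherry blossom", "autumn", "leaf"],
    "photo_c": ["maple tree", "autumn"],
    "photo_d": ["Cherry Blossom", "branch"],
}

print(filter_photos(example))  # ['photo_a', 'photo_d']
```

Requiring the positive tag and excluding the two negative tags in a single predicate keeps the rule easy to audit against a manual inspection of the rejected images.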

After the initial data collection (section 2.1) and filtration (section 2.2) of photos sourced from across Japan (Fig 1), we used the geographic data associated with each image in the set to focus specifically [...]

[...] 1, column B). The text tags, and their relative frequencies, returned by the machine vision API for this data set are reported (Fig 5). Out of the images returned from the text tag filter, 21,908 were subsequently tagged "cherry blossom" by the computer vision API (section 2.2), but some were eliminated because they were also tagged "autumn" or "maple tree", resulting in a set of 21,633 images ([...] (Fig 6).

Tokyo autumn bloom identification
The temporal distribution of cherry blossom search results in Tokyo was amalgamated across all years in the study period by month and the monthly total calculated (Fig 7A). This data's annual pattern has its main peak early in the year corresponding roughly to the Northern spring, as did the Japan-wide image set (cf. Fig 1). In both the Japanese national and subset Tokyo-restricted datasets, a secondary peak of SNS images of cherry blossoms is apparent from October to December (Figs 7A & 7B). [...]

[...] October to December in the years 2008-2018 inclusive. A subset of these photos (blue bars) was also tagged "cherry blossom" and not "autumn" and not "maple tree", by the computer vision API (see Table 1, column B).
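
The monthly amalgamation across years described above might be sketched as follows. The function names and the example dates are hypothetical and serve only to illustrate the pooling step. The share of photos falling in October to December is an added convenience measure, not a statistic reported in the study.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Sequence


def monthly_totals(dates_taken: Iterable[date]) -> List[int]:
    """Pool photos from all years and count them per calendar month.

    Returns a list of 12 totals, index 0 for January to index 11 for December.
    """
    counts = Counter(d.month for d in dates_taken)
    return [counts.get(month, 0) for month in range(1, 13)]


def share_in_months(totals: Sequence[int], months: Sequence[int]) -> float:
    """Fraction of all pooled photos that fall in the given calendar months."""
    overall = sum(totals)
    if overall == 0:
        return 0.0
    return sum(totals[month - 1] for month in months) / overall


# Hypothetical capture dates, for illustration only.
dates = [
    date(2010, 4, 1),
    date(2011, 4, 3),
    date(2012, 3, 30),
    date(2015, 11, 20),
    date(2018, 11, 5),
]

totals = monthly_totals(dates)
print(totals)                                 # [0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 2, 0]
print(share_in_months(totals, (10, 11, 12)))  # 0.4
```

Pooling by calendar month discards the year, so a secondary peak in the totals shows only that autumn images exist somewhere in the study period; per-year counts are still needed to establish that the bloom recurs.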

The evidence for the November-centred flowering period was unexpected. Hence, to ensure that the photographs were not simply misclassified in this region due to an error in our method, we manually [...]

In the current study we formally assess SNS images within Japan to understand if blooming patterns of cherry blossom trees can be reliably estimated from this data. Fig 4 evidences [...] citizen scientists. However, our data collection method also showed it is possible to capture the temporal signature of the cherry blossom event even in regions away from major cities (Fig 4 and S1 Movie).

Applying our methodology, the SNS data can be used to efficiently regenerate the JNTO-reported dates of the flowering events over a decade, in both Tokyo and Kyoto. We note that our estimate each year for both regions consistently lagged or matched, but never foreshadowed, the date reported by JNTO (Tables 2 & 3). There may be a number of reasons for this. For instance, the SNS data peak may be generated as part of a self-fulfilling prophecy in which it follows the JNTO-published date. For this to be the case, a majority of visitors to the cherry blossoms would, for some reason, have to choose to visit only after the published full bloom dates, biasing the SNS data in the way we observed. An ethnographic survey might elucidate the relevance of this effect on our project, but that is well beyond our present scope. However, we comment below on the extent to which our method stands independently of the JNTO-reported dates.
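
The "lagged or matched, but never foreshadowed" observation amounts to a simple check on the signed difference between each estimated date and the corresponding reported date. The sketch below illustrates that check only. The function names are hypothetical, and the dates are placeholders chosen outside the study period; they are not values from Tables 2 and 3.

```python
from datetime import date
from typing import Dict


def lags_in_days(estimated: Dict[int, date], reported: Dict[int, date]) -> Dict[int, int]:
    """Signed lag per year: positive when the estimate falls after the reported date."""
    shared_years = sorted(estimated.keys() & reported.keys())
    return {year: (estimated[year] - reported[year]).days for year in shared_years}


def never_precedes(lags: Dict[int, int]) -> bool:
    """True if every estimate lags or matches the reported date."""
    return all(lag >= 0 for lag in lags.values())


# Placeholder dates for illustration only; NOT data from the study.
estimated_peak = {2000: date(2000, 4, 2), 2001: date(2001, 4, 5)}
reported_peak = {2000: date(2000, 3, 31), 2001: date(2001, 4, 5)}

lags = lags_in_days(estimated_peak, reported_peak)
print(lags)                  # {2000: 2, 2001: 0}
print(never_precedes(lags))  # True
```

Keeping the sign of the lag, rather than its absolute value, is what allows a consistent one-sided offset of this kind to be distinguished from symmetric estimation error.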

An alternative reason for the lag in SNS data with respect to the published dates might derive from some characteristic of the timestamping of uploaded images, which might be biasing the SNS data. For instance, perhaps international visitors to the site were more likely to have their phones set to a time zone preceding Tokyo's rather than after it, ensuring that the images uploaded were frequently offset as we observed. We find this argument difficult to justify given the auto-update of [...]

[...] guests who must plan their travel well in advance and would probably target the main season. In short, the evidence of the late autumn bloom is likely to be doubly incidental in the sense that the "citizen scientists" were incidentally contributing ecological data, but also, they were unlikely to have deliberately set out to witness this phenomenon.

The autumn bloom has, it turns out, gained mainstream media attention: "for the first time in memory"