Abstract
The steady decline of avian populations worldwide urgently calls for a cyber-physical system to monitor bird migration at the continental scale. Compared to other sources of information (radar and crowdsourced observations), bioacoustic sensor networks combine low latency with high taxonomic specificity. However, the scarcity of flight calls in bioacoustic monitoring scenes (below 0.1% of total recording time) requires the automation of audio content analysis. In this article, we address the problem of scaling up the detection and classification of flight calls to a full-season dataset: 6672 hours across nine sensors, yielding around 480 million neural network predictions. Our proposed pipeline, BirdVox, combines multiple machine learning modules to produce per-species flight call counts. We evaluate BirdVox on an annotated subset of the full season (296 hours) and discuss the main sources of estimation error that are inherent to a real-world deployment: mechanical sensor failures, sensitivity to background noise, misdetection, and taxonomic confusion. After developing dedicated solutions to mitigate these sources of error, we demonstrate the usability of BirdVox by reporting a species-specific temporal estimate of flight call activity for the Swainson’s Thrush (Catharus ustulatus).
Competing Interest Statement
The authors have declared no competing interest.