Abstract
Vocalizations in animals, particularly birds, are critically important behaviors that influence reproductive fitness. Automatically extracting vocalization data from existing large databases has only recently gained traction, however, and the accuracy of different approaches has yet to be evaluated. Here, we use a recently published machine learning framework to extract syllables from six bird species whose phylogenetic relatedness spans 1 to 85 million years of divergence, testing how phylogenetic relatedness affects accuracy and how useful trained models are when applied to novel species. Model performance is best on conspecifics, and accuracy decreases progressively as phylogenetic distance between taxa increases; however, models trained on multiple distantly related species can recover the lost accuracy. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The previous submission did not include figures.