Abstract
An automatic, fast, accurate, and scalable method for inferring animal behavior from video alone offers unprecedented opportunities to understand complex biological phenomena and answer challenging ecological questions. The advent of sophisticated machine learning techniques now allows the development and implementation of such a method. However, beyond developing a network model that infers animal behavior from video inputs, the key challenge is obtaining sufficient labeled (annotated) data to successfully train that network, a laborious task that must be repeated for every species and/or animal system. Here, we propose solutions to both problems: (i) a novel methodology for rapidly generating large amounts of annotated animal data from videos, and (ii) using these data to reliably train deep neural network models that infer the behavioral state of every animal in each frame of the video. Our method's workflow is bootstrapped with a relatively small amount of manually labeled video frames. We develop and implement this method by building upon the open-source tool Smarter-LabelMe, leveraging deep convolutional visual detection and tracking in combination with our behavior inference model to quickly produce large amounts of reliable training data. We demonstrate the effectiveness of our method on aerial videos of plains and Grévy's zebras (Equus quagga and Equus grevyi). We fully open-source the code of our method and provide large, accurately annotated video datasets of zebra behavior generated with it. A video abstract of this paper is also available.
Competing Interest Statement
The authors have declared no competing interest.