PT - JOURNAL ARTICLE
AU - Tianxun Zhou
AU - Calvin Chee Hoe Cheah
AU - Eunice Wei Mun Chin
AU - Jie Chen
AU - Hui Jia Farm
AU - Eyleen Lay Keow Goh
AU - Keng Hwee Chiam
TI - ConstrastivePose: A contrastive learning approach for self-supervised feature engineering for pose estimation and behavioral classification of interacting animals
AID - 10.1101/2022.11.09.515746
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.11.09.515746
4099 - http://biorxiv.org/content/early/2022/11/10/2022.11.09.515746.short
4100 - http://biorxiv.org/content/early/2022/11/10/2022.11.09.515746.full
AB - In recent years, supervised machine learning models trained on videos of animals with pose estimation data and behavior labels have been used for automated behavior classification. Applications include, for example, automated detection of neurological diseases in animal models. However, there are two problems with these supervised learning models. First, such models require a large amount of labeled data, but labeling behaviors frame by frame is a laborious manual process that does not scale easily. Second, such methods rely on handcrafted features obtained from pose estimation data that are usually designed empirically. In this paper, we propose to overcome these two problems using contrastive learning for self-supervised feature engineering on pose estimation data. Our approach allows the use of unlabeled videos to learn feature representations and reduces the need for handcrafting higher-level features from pose positions. We show that this approach to feature representation can achieve better classification performance than handcrafted features alone, and that the performance improvement is due to contrastive learning on unlabeled data rather than the neural network architecture. Author Summary: Animal models are widely used in medicine to study diseases.
For example, the study of social interactions between animals such as mice is used to investigate changes in social behavior in neurological diseases. The process of manually annotating animal behaviors from videos is slow and tedious. To solve this problem, machine learning approaches that automate the video annotation process have become more popular. Many recent machine learning approaches build on advances in pose-estimation technology, which enables accurate localization of the animals' keypoints. However, manual labeling of behaviors frame by frame for the training set remains a bottleneck that does not scale. In addition, existing methods rely on handcrafted feature engineering from pose estimation data. In this study, we propose ConstrastivePose, an approach that uses contrastive learning to learn feature representations from unlabeled data. We demonstrate improved performance when the features learned by our method are used for supervised learning, compared to handcrafted features. This approach can help efforts to build supervised behavior classification models when behavior-labeled videos are scarce. Competing Interest Statement: The authors have declared no competing interest.
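To make the abstract's idea concrete, the following is a minimal illustrative sketch of contrastive learning on pose-estimation data: two augmented "views" of each pose are encoded and a SimCLR-style NT-Xent loss pulls views of the same pose together while pushing other poses apart. This is not the authors' implementation; the jitter augmentation, the toy linear encoder, and all names and dimensions are assumptions for demonstration only.

```python
# Illustrative sketch (not the paper's code): NT-Xent contrastive loss
# applied to pose keypoint vectors, using only NumPy.
import numpy as np

rng = np.random.default_rng(0)

def augment(poses, jitter=0.05):
    """Perturb keypoint coordinates to create a second 'view' of each pose."""
    return poses + rng.normal(scale=jitter, size=poses.shape)

def encode(poses, W):
    """Toy encoder: flatten keypoints and apply a linear projection."""
    z = poses.reshape(poses.shape[0], -1) @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize

def nt_xent_loss(z1, z2, temperature=0.5):
    """Pull each pose's two views together; push all other poses apart."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)        # (2n, d) stacked views
    sim = z @ z.T / temperature                 # cosine similarities
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    # The positive partner of row i is row (i + n) mod 2n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# 8 poses, each with 7 hypothetical keypoints in 2D (e.g. nose, ears, tail base).
poses = rng.normal(size=(8, 7, 2))
W = rng.normal(size=(14, 16))                   # 7*2 inputs -> 16-d features
loss = nt_xent_loss(encode(augment(poses), W), encode(augment(poses), W))
print(float(loss))
```

In a full self-supervised pipeline, this loss would be minimized over the encoder's weights on unlabeled videos, and the resulting features would then replace or supplement handcrafted ones in a downstream behavior classifier.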