%0 Journal Article %A Jeremy G. Todd %A Jamey S. Kain %A Benjamin L. de Bivort %T Systematic exploration of unsupervised methods for mapping behavior %D 2016 %R 10.1101/051300 %J bioRxiv %P 051300 %X To fully understand the mechanisms giving rise to behavior, we need to be able to precisely measure it. When coupled with large behavioral data sets, unsupervised clustering methods offer the potential of unbiased mapping of behavioral spaces. However, unsupervised techniques to map behavioral spaces are in their infancy, and there have been few systematic considerations of all the methodological options. We compared the performance of seven distinct mapping methods in clustering a data set consisting of the x-and y-positions of the six legs of individual flies. Legs were automatically tracked by small pieces of fluorescent dye, while the fly was tethered and walking on an air-suspended ball. We find that there is considerable variation in the performance of these mapping methods, and that better performance is attained when clustering is done in higher dimensional spaces (which are otherwise less preferable because they are hard to visualize). High dimensionality means that some algorithms, including the non-parametric watershed cluster assignment algorithm, cannot be used. We developed an alternative watershed algorithm which can be used in high-dimensional spaces when the probability density estimate can be computed directly. With these tools in hand, we examined the behavioral space of fly leg postural dynamics and locomotion. We find a striking division of behavior into modes involving the fore legs and modes involving the hind legs, with few direct transitions between them. By computing behavioral clusters using the data from all flies simultaneously, we show that this division appears to be common to all flies. We also identify individual-to-individual differences in behavior and behavioral transitions. Lastly, we suggest a computational pipeline that can achieve satisfactory levels of performance without the taxing computational demands of a systematic combinatorial approach.Abbreviations GMM: Gaussian mixture model; PCA: principal components analysis; SW: sparse watershed; t-SNE: t-distributed stochastic neighbor embedding %U https://www.biorxiv.org/content/biorxiv/early/2016/05/02/051300.full.pdf