TY  - JOUR
T1  - NBLAST: Rapid, sensitive comparison of neuronal structure and construction of neuron family databases
JF  - bioRxiv
DO  - 10.1101/006346
SP  - 006346
AU  - Marta Costa
AU  - Aaron D. Ostrovsky
AU  - James D. Manton
AU  - Steffen Prohaska
AU  - Gregory S. X. E. Jefferis
Y1  - 2014/01/01
UR  - http://biorxiv.org/content/early/2014/08/09/006346.abstract
N2  - Efforts to map neural circuits from model organisms including flies and mice are now generating multi-terabyte datasets of 10,000s of labelled neurons. Technologies such as dense EM-based reconstruction and sparse/multicolor labeling with image registration allow neurons to be embedded within the spatial context of a circuit or a whole brain. These ever-expanding data demand new computational tools to search, organize and navigate neurons. We present a simple, but fast and sensitive, algorithm, NBLAST, for measuring pairwise neuronal similarity by position and local geometry. Inspired by the BLAST algorithm for biological sequence data, NBLAST decomposes a query and target neuron into short segments. Each matched segment pair is scored using a log-likelihood ratio scoring matrix empirically defined by the statistics of real matches and non-matches in the data. We demonstrate the application of a reference implementation of NBLAST to a dataset of 16,129 single Drosophila neurons. NBLAST scores are sensitive enough to distinguish 1) two images of the same neuron, 2) two neurons of the same identified neuronal type, and 3) two neurons of very closely related types. We demonstrate that NBLAST scores can be used to identify neuronal types, such as olfactory projection neurons, with reliability that matches or exceeds expert annotation in a fraction of the time. We also show that clustering using appropriately normalized NBLAST scores can reveal classic morphological types as well as identify unpublished classes. We carry out detailed analysis of a number of neuronal classes including Kenyon cells, olfactory and visual projection neurons, auditory, and male-specific P1 neurons. This identifies many new neuronal types and reveals unreported features of topographic organization. We also provide a complete clustering of the 16,129 neurons in the test dataset into 1,052 clusters of highly related neurons. These clusters are then organized into superclusters, enabling both exploration of the dataset and the matching of individual clusters to morphological types in the literature. Finally, NBLAST queries can be used to identify candidate neurons matching neurite tracts with transgene expression pattern. We provide a general-purpose open-source toolbox that implements construction of score matrices, the core pairwise scoring algorithm, de novo and precomputed database search, and clustering, along with complete source code and data for the analyses in this paper. The neuronal families can also be queried online through virtualflybrain.org and visualized in interactive 3D at jefferislab.org.
ER  - 