RT Journal Article
SR Electronic
T1 OrchID: a Generalized Framework for Taxonomic Classification of Images Using Evolved Artificial Neural Networks
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 070904
DO 10.1101/070904
A1 Serrano Pereira
A1 Barbara Gravendeel
A1 Patrick Wijntjes
A1 Rutger A. Vos
YR 2016
UL http://biorxiv.org/content/early/2016/08/22/070904.abstract
AB Taxonomic expertise for the identification of species is rare and costly. Ongoing advances in computer vision and machine learning have led to the development of numerous semi- and fully automated species identification systems. However, these systems are rarely agnostic to specific morphology, can rarely perform taxonomic “approximation” (by which we mean partial identification at least to a higher taxonomic level if not to species), and frequently rely on costly scientific imaging technologies. We present a generic, hierarchical identification system for automated taxonomic approximation of organisms from images. We assessed the effectiveness of this system using photographs of slipper orchids (Cypripedioideae), for which we implemented image pre-processing, segmentation, and colour and shape feature extraction algorithms to obtain digital phenotypes for 116 species. The identification system trained on these digital phenotypes uses a nested hierarchy of artificial neural networks for pattern recognition and automated classification that mirrors the Linnean taxonomy, such that user-submitted photos can be assigned a genus, section, and species classification by traversing this hierarchy. Performance of the identification system varied depending on photo quality, number of species included for training, and desired taxonomic level for identification. High-quality photos were scarce for some taxa, leaving those taxa under-represented in the training set and resulting in imbalanced network training.
The image features used for training were sufficient to reliably identify photos to the correct genus but less so to the correct section and species. The outcomes of this project include a library of feature extraction algorithms called ImgPheno, a collection of scripts for neural network training called NBClassify, a library for evolutionary optimization of artificial neural network construction called AI::FANN::Evolving, and a planned web application called OrchID for identification of user-submitted images. All project outcomes are open source and freely available.