ABSTRACT
Taxonomic expertise for the identification of species is rare and costly. Ongoing advances in computer vision and machine learning have led to the development of numerous semi- and fully automated species identification systems. However, these systems are rarely agnostic to specific morphology, can rarely perform taxonomic “approximation” (by which we mean partial identification to at least a higher taxonomic level, if not to species), and frequently rely on costly scientific imaging technologies. We present a generic, hierarchical identification system for automated taxonomic approximation of organisms from images. We assessed the effectiveness of this system using photographs of slipper orchids (Cypripedioideae), for which we implemented image pre-processing, segmentation, and colour and shape feature extraction algorithms to obtain digital phenotypes for 116 species. The identification system trained on these digital phenotypes uses a nested hierarchy of artificial neural networks for pattern recognition and automated classification that mirrors the Linnaean taxonomy, such that user-submitted photos can be assigned a genus, section, and species classification by traversing this hierarchy. Performance of the identification system varied depending on photo quality, the number of species included for training, and the desired taxonomic level for identification. High-quality photos were scarce for some taxa, which were therefore under-represented in the training set, resulting in imbalanced network training. The image features used for training were sufficient to reliably identify photos to the correct genus but less so to the correct section and species.
The outcomes of this project include a library of feature extraction algorithms called ImgPheno, a collection of scripts for neural network training called NBClassify, a library for evolutionary optimization of artificial neural network construction called AI::FANN::Evolving, and a planned web application called OrchID for identification of user-submitted images. All project outcomes are open source and freely available.
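The hierarchical traversal with taxonomic approximation described above can be illustrated with a minimal Python sketch. It is not the NBClassify implementation: the names Node, classify, fixed, and min_score are hypothetical, the stand-in scorers return fixed values in place of trained neural networks, and the section and species labels and all scores are made up for illustration. The sketch assumes that each level's classifier returns one score per child taxon and that traversal stops at the first level whose best score falls below a threshold, yielding a partial identification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
# A scorer maps an image's feature vector to one score per child taxon.
Scorer = Callable[[Features], Dict[str, float]]


@dataclass
class Node:
    """One level of the hierarchy (hypothetical structure)."""
    rank: str                      # rank of the taxa this node chooses between
    scorer: Scorer                 # stands in for a trained neural network
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify(root: Node, features: Features,
             min_score: float = 0.5) -> List[Tuple[str, str, float]]:
    """Walk down the hierarchy, keeping the best-scoring taxon per rank.

    Stops early when the best score is below min_score, so the result
    may be a partial identification (e.g. genus and section only).
    """
    path: List[Tuple[str, str, float]] = []
    node = root
    while node is not None:
        scores = node.scorer(features)
        label, score = max(scores.items(), key=lambda kv: kv[1])
        if score < min_score:
            break                  # not confident enough to go deeper
        path.append((node.rank, label, score))
        node = node.children.get(label)  # None when no deeper level exists
    return path


def fixed(scores: Dict[str, float]) -> Scorer:
    """Stand-in scorer returning constant, made-up scores."""
    return lambda _features: dict(scores)


# Toy hierarchy: section and species names and all scores are invented.
species = Node("species", fixed({"species_1": 0.40, "species_2": 0.35}))
section = Node("section", fixed({"section_A": 0.8, "section_B": 0.2}),
               {"section_A": species})
genus = Node("genus", fixed({"Paphiopedilum": 0.9, "Cypripedium": 0.1}),
             {"Paphiopedilum": section})

result = classify(genus, [0.1, 0.2])
for rank, label, score in result:
    print(f"{rank}: {label} ({score:.2f})")
```

With the default threshold, the toy example stops at the section level because neither species score reaches 0.5; lowering min_score lets the traversal continue to species at the cost of less reliable identifications.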