PT - JOURNAL ARTICLE
AU - Boyan Angelov
TI - species2vec: A novel method for species representation
AID - 10.1101/461996
DP - 2018 Jan 01
TA - bioRxiv
PG - 461996
4099 - http://biorxiv.org/content/early/2018/11/05/461996.short
4100 - http://biorxiv.org/content/early/2018/11/05/461996.full
AB - Word embeddings are omnipresent in Natural Language Processing (NLP) tasks. The same technology that defines words by their context can also define biological species. This study showcases this new method, species embedding (species2vec). By sorting 6,761,594 mammal observations from around the world (2,862 species) by proximity, we create a training corpus for the skip-gram model. The resulting species embeddings are tested on an environmental classification task. The classifier's performance confirms that these embeddings preserve the relationships between species and are also representative of species consortia within an environment.
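The abstract summarizes the pipeline only at a high level: observations are ordered by spatial proximity so that nearby species end up adjacent, and that ordered sequence is treated like a sentence for a skip-gram model. The Python sketch below illustrates that idea under stated assumptions. The record does not specify the sorting procedure, so a greedy nearest-neighbour chain over great-circle (haversine) distance is used as a stand-in. All names here (`Observation`, `haversine_km`, `proximity_sort`, `skipgram_pairs`) are hypothetical, and the (species, latitude, longitude) input layout is likewise assumed.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (species name, latitude, longitude) in degrees.
Observation = Tuple[str, float, float]


def haversine_km(a: Observation, b: Observation) -> float:
    """Great-circle distance in kilometres between two observations."""
    _, lat1, lon1 = a
    _, lat2, lon2 = b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point overshoot for near-antipodal points.
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def proximity_sort(obs: List[Observation]) -> List[str]:
    """Assumed stand-in for 'proximity sorting': a greedy nearest-neighbour chain.
    Start at the first observation and repeatedly move to the closest unvisited
    one; return the species names in visit order (the 'sentence')."""
    if not obs:
        return []
    remaining = list(obs[1:])
    current = obs[0]
    order = [current[0]]
    while remaining:
        idx = min(range(len(remaining)), key=lambda i: haversine_km(current, remaining[i]))
        current = remaining.pop(idx)
        order.append(current[0])
    return order


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """(centre, context) pairs within `window` positions, the training
    examples a skip-gram model learns from."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Toy observations: two spatial clusters (around London and southern Norway).
    obs = [
        ("Vulpes vulpes", 51.5, -0.1),
        ("Lynx lynx", 60.0, 10.0),
        ("Meles meles", 51.6, -0.2),
        ("Alces alces", 60.1, 10.1),
    ]
    sentence = proximity_sort(obs)
    print(sentence)
    print(skipgram_pairs(sentence, window=1))
```

The greedy chain is O(n²) and is only meant to show the concept; at the scale reported in the abstract (millions of observations) a spatial index or grid-based ordering would be needed. In practice the ordered species lists could be passed as sentences to an existing skip-gram implementation, for example gensim's `Word2Vec` with `sg=1`, rather than generating pairs by hand.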