TY  - JOUR
T1  - Machine learning based prediction of functional capabilities in metagenomically assembled microbial genomes
JF  - bioRxiv
DO  - 10.1101/307157
SP  - 307157
AU  - Fred Farrell
AU  - Orkun S. Soyer
AU  - Christopher Quince
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/04/25/307157.abstract
N2  - The increasing popularity of genome-resolved metagenomics (the binning of genomes of potentially uncultured organisms directly from environmental DNA) has resulted in a deluge of draft genomes. There is a pressing need to develop methods to interpret this data. Here, we used machine learning to predict functional and metabolic traits of microbes from their genomes. We collated an extensive database of 84 phenotypic traits associated with 9407 prokaryotic genomes and trained different machine learning models on this data. We found that a lasso logistic regression based on the frequency of gene orthologs had the best combination of functional prediction performance and interpretability. This model was able to classify 65 phenotypic traits with greater than 90
ER  - 