Abstract
The increasing popularity of genome-resolved metagenomics (the binning of genomes of potentially uncultured organisms directly from environmental DNA) has resulted in a deluge of draft genomes. There is a pressing need to develop methods to interpret these data. Here, we used machine learning to predict the functional and metabolic traits of microbes from their genomes. We collated an extensive database of 84 phenotypic traits associated with 9407 prokaryotic genomes and trained different machine learning models on these data. We found that a lasso logistic regression model based on the frequency of gene orthologs offered the best combination of functional prediction performance and interpretability. This model was able to classify 65 phenotypic traits with greater than 90