RT Journal Article
SR Electronic
T1 Combining multiple functional annotation tools increases completeness of metabolic annotation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 160887
DO 10.1101/160887
A1 Marc Griesemer
A1 Jeffrey Kimbrel
A1 Carol Zhou
A1 Ali Navid
A1 Patrik D'haeseleer
YR 2017
UL http://biorxiv.org/content/early/2017/07/07/160887.abstract
AB The dirty little secret behind genome-wide systems biology modeling efforts is that they are invariably based on very incomplete functional annotations. In annotated genomes, typically 30-50% of genes have little or no functional annotation [1], which severely limits our knowledge of the "parts list" each organism has at its disposal. In metabolic modeling, these incomplete annotations are often sufficient to derive a reasonably complete model of at least the core metabolism, which typically consists of well-studied (and thus well-annotated) pathways sufficient for growth in pure culture. However, secondary metabolic pathways, and pathways needed for growth on unusual metabolites exchanged in complex microbial communities, are often much less well understood, resulting in missing or lower-confidence functional annotations in newly sequenced genomes. For example, one third of the EC database consists of "orphan enzymes" that have been described in the literature but for which no sequence data are available [1]. Individual metabolic annotation tools often return annotations for different subsets of genes, so combining the outputs of multiple tools offers the potential to greatly increase the completeness of metabolic annotations.
Indeed, recent genome-scale modeling of Clostridium beijerinckii NCIMB 8052 demonstrated that incorporating multiple annotation tools could almost double the total number of genes and reactions included in the final curated model [2]. Here, we present preliminary results from a comprehensive reannotation of 27 bacterial Tier 1 and Tier 2 reference genomes from BioCyc [3], focusing on enzymes with EC numbers annotated by KEGG [4], RAST [5], EFICAz [6], and the BRENDA enzyme database [7].
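The abstract's central idea, that tools annotate different subsets of genes and that merging their outputs raises coverage, can be illustrated with a minimal sketch. It assumes each tool's output has already been reduced to a mapping from gene ID to a set of EC numbers; the gene IDs, the EC assignments, and the helper names `combine_annotations` and `annotated_genes` are hypothetical and do not come from the study or from any tool's real output format.

```python
from collections import defaultdict

# Hypothetical per-tool outputs: gene ID -> set of EC numbers.
# Toy values for illustration only; not data from the study.
TOOL_ANNOTATIONS = {
    "KEGG":   {"g1": {"1.1.1.1"}, "g2": {"2.7.1.1"}},
    "RAST":   {"g2": {"2.7.1.1"}, "g3": {"4.2.1.11"}},
    "EFICAz": {"g1": {"1.1.1.1"}, "g4": {"3.1.3.11"}},
}


def combine_annotations(tool_annotations):
    """Union the EC calls from every tool, recording which tools support each call.

    Returns a plain dict: gene -> {EC number -> set of supporting tool names}.
    Genes with no EC number from any tool are omitted.
    """
    combined = defaultdict(lambda: defaultdict(set))
    for tool, gene_to_ecs in tool_annotations.items():
        for gene, ecs in gene_to_ecs.items():
            for ec in ecs:
                combined[gene][ec].add(tool)
    # Convert the nested defaultdicts to ordinary dicts for easier inspection.
    return {gene: dict(ec_map) for gene, ec_map in combined.items()}


def annotated_genes(gene_to_ecs):
    """Return the genes that received at least one EC number."""
    return {gene for gene, ecs in gene_to_ecs.items() if ecs}


if __name__ == "__main__":
    combined = combine_annotations(TOOL_ANNOTATIONS)
    for tool, gene_to_ecs in TOOL_ANNOTATIONS.items():
        print(f"{tool}: {len(annotated_genes(gene_to_ecs))} genes with EC numbers")
    print(f"Combined: {len(combined)} genes with EC numbers")
    for gene in sorted(combined):
        for ec, tools in sorted(combined[gene].items()):
            print(gene, ec, "supported by", ", ".join(sorted(tools)))
```

In this toy case each tool annotates two genes while the union covers four. Keeping the set of supporting tools for each call, rather than a bare union, leaves room to weigh agreement between tools when judging confidence; the abstract does not say how the study itself reconciles conflicting calls.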