RT Journal Article
SR Electronic
T1 Unifying the global coding sequence space enables the study of genes with unknown function across biomes
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.06.30.180448
DO 10.1101/2020.06.30.180448
A1 Chiara Vanni
A1 Matthew S. Schechter
A1 Silvia G. Acinas
A1 Albert Barberán
A1 Pier Luigi Buttigieg
A1 Emilio O. Casamayor
A1 Tom O. Delmont
A1 Carlos M. Duarte
A1 A. Murat Eren
A1 Robert D. Finn
A1 Renzo Kottmann
A1 Alex Mitchell
A1 Pablo Sanchez
A1 Kimmo Siren
A1 Martin Steinegger
A1 Frank Oliver Glöckner
A1 Antonio Fernandez-Guerra
YR 2020
UL http://biorxiv.org/content/early/2020/11/18/2020.06.30.180448.abstract
AB One of the biggest challenges in molecular biology is bridging the gap between the known and the unknown coding sequence space. This challenge is especially extreme in microbial systems, where between 40% and 60% of the predicted genes are of unknown function. Discarding this uncharacterized fraction should not be an option anymore. Here, we present a conceptual framework and a computational workflow that bridges this gap and provides a powerful strategy to contextualize the investigations of genes of unknown function. Our approach partitions the coding sequence space removing the known-unknown dichotomy, unifies genomic and metagenomic data and provides a framework to expand those investigations across environments and organisms. By analyzing 415,971,742 genes predicted from 1,749 metagenomes and 28,941 bacterial and archaeal genomes we showcase our approach and its application in ecological, evolutionary and biotechnological investigations. As a result, we put into perspective the extent of the unknown fraction, its diversity, and its relevance in genomic and environmental contexts.
By identifying a target gene of unknown function for antibiotic resistance, we demonstrate how a contextualized unknown coding sequence space enables the generation of hypotheses that can be used to augment experimental data. Competing Interest Statement: The authors have declared no competing interest.