RT Journal Article
SR Electronic
T1 Functional and evolutionary significance of unknown genes from uncultivated taxa
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.01.26.477801
DO 10.1101/2022.01.26.477801
A1 Álvaro Rodríguez del Río
A1 Joaquín Giner-Lamia
A1 Carlos P. Cantalapiedra
A1 Jorge Botas
A1 Ziqi Deng
A1 Ana Hernández-Plaza
A1 Lucas Paoli
A1 Thomas S.B. Schmidt
A1 Shinichi Sunagawa
A1 Peer Bork
A1 Luis Pedro Coelho
A1 Jaime Huerta-Cepas
YR 2022
UL http://biorxiv.org/content/early/2022/01/27/2022.01.26.477801.abstract
AB Most microbes on our planet remain uncultured and poorly studied. Recent efforts to catalog their genetic diversity have revealed that a significant fraction of the observed microbial genes are functionally and evolutionarily untraceable, lacking homologs in reference databases. Despite their potential biological value, these apparently unrelated orphan genes from uncultivated taxa have been routinely discarded in metagenomics surveys. Here, we analyzed a global multi-habitat dataset covering 151,697 medium- and high-quality metagenome-assembled genomes (MAGs), 5,969 single-amplified genomes (SAGs), and 19,642 reference genomes, and identified 413,335 highly curated novel protein families under strong purifying selection among genes previously considered orphans. These new protein families, representing a three-fold increase over the total number of prokaryotic orthologous groups described to date, are spread across the prokaryote phylogeny, can span multiple habitats, and are notably overrepresented in recently discovered taxa. By genomic context analysis, we assigned thousands of unknown protein families to phylogenetically conserved operons linked to energy production, xenobiotic metabolism, and microbial resistance. Most remarkably, we found 980 previously neglected protein families that can accurately distinguish entire uncultivated phyla, classes, and orders, likely representing synapomorphic traits that fostered their divergence.
The systematic curation and evolutionary analysis of the unique genetic repertoire of uncultivated taxa open new avenues for understanding the biology and ecological roles of poorly explored lineages at a global scale. Competing Interest Statement: The authors have declared no competing interest.