PT - JOURNAL ARTICLE
AU - Jacob L. Steenwyk
AU - Dayna C. Goltz
AU - Thomas J. Buida III
AU - Yuanning Li
AU - Xing-Xing Shen
AU - Antonis Rokas
TI - orthoSNAP: a tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees
AID - 10.1101/2021.10.30.466607
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.10.30.466607
4099 - http://biorxiv.org/content/early/2021/11/02/2021.10.30.466607.short
4100 - http://biorxiv.org/content/early/2021/11/02/2021.10.30.466607.full
AB - Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of positive selection, often rely on gene families of single-copy orthologs (SC-OGs). In contrast, large gene families with multiple homologs in one or more species (a phenomenon observed among several important families of genes such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed orthoSNAP, a software that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term SC-OGs identified by orthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure. From 46,645 orthologous groups of genes inferred using graph-based clustering of sequence similarity scores across four separate eukaryotic datasets, we identified 6,634 SC-OGs; using orthoSNAP on the remaining 40,011 orthologous groups of genes, we identified an additional 6,630 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar. orthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life. Competing Interest Statement: Antonis Rokas is a scientific consultant for LifeMine Therapeutics, Inc.