TY  - JOUR
T1  - The Co-regulation Data Harvester for Tetrahymena thermophila: automated high-throughput gene annotation and functional inference in a microbial eukaryote
JF  - bioRxiv
DO  - 10.1101/115816
SP  - 115816
AU  - Lev M. Tsypin
AU  - Aaron P. Turkewitz
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/03/10/115816.abstract
N2  - Identifying co-regulated genes can provide a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, which is not available for most organisms with sequenced genomes. Studies in Tetrahymena thermophila, the most experimentally accessible ciliate, have generated a rich transcriptomic database covering many well-defined physiological states. Genes that are involved in the same pathway show significant co-regulation, and screens based on gene co-regulation have identified novel factors in specific pathways, for example in membrane trafficking. However, a limitation has been the relatively sparse annotation of the Tetrahymena genome, making it impractical to approach genome-wide analyses. We have therefore developed an efficient approach to analyze both co-regulation and gene annotation, called the Co-regulation Data Harvester (CDH). The CDH automates identification of co-regulated genes by accessing the Tetrahymena transcriptome database, determines their orthologs in other organisms via reciprocal BLAST searches, and collates the annotations of those orthologs' functions. Inferences drawn from the CDH reproduce and expand upon experimental findings in Tetrahymena. The CDH, which is freely available, represents a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant, and can be explored, in many other systems.
ER  -