PT - JOURNAL ARTICLE
AU - Arli A. Parikesit
AU - Peter F. Stadler
AU - Sonja J. Prohaska
TI - Large-Scale Evolutionary Patterns of Protein Domain Distributions in Eukaryotes
AID - 10.1101/142182
DP - 2017 Jan 01
TA - bioRxiv
PG - 142182
4099 - http://biorxiv.org/content/early/2017/05/27/142182.short
4100 - http://biorxiv.org/content/early/2017/05/27/142182.full
AB - The genomic inventory of protein domains is an important indicator of an organism’s regulatory and metabolic capabilities. Existing gene annotations, however, can be plagued by substantial ascertainment biases that make it difficult to obtain and compare quantitative domain data. We find that quantitative trends across the Eukarya can be investigated based on a combination of gene prediction and standard domain annotation pipelines. Species-specific training is required, however, to account for the genomic peculiarities in many lineages. In contrast to earlier studies we find wide-spread statistically significant avoidance of protein domains associated with distinct functional high-level gene-ontology terms. 1998 ACM Subject Classification: J.3 Life and Medical Sciences