TY  - JOUR
T1  - A high-throughput screen for transcription activation domains reveals their sequence characteristics and permits reliable prediction by deep learning
JF  - bioRxiv
DO  - 10.1101/2019.12.11.872986
SP  - 2019.12.11.872986
AU  - Ariel Erijman
AU  - Lukasz Kozlowski
AU  - Salma Sohrabi-Jahromi
AU  - James Fishburn
AU  - Linda Warfield
AU  - Jacob Schreiber
AU  - William S. Noble
AU  - Johannes Söding
AU  - Steven Hahn
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/12/12/2019.12.11.872986.abstract
N2  - Transcription activation domains (ADs) are encoded by a wide range of seemingly unrelated amino acid sequences, making it difficult to recognize features that permit their dynamic behavior, fuzzy interactions and target specificity. We screened a large set of random 30-mer peptides for AD function and trained a deep neural network (ADpred) on the AD-positive and negative sequences. ADpred correctly identifies known ADs within protein sequences and accurately predicts the consequences of mutations. We show that functional ADs are (1) located within intrinsically disordered regions with biased amino acid composition, (2) contain clusters of hydrophobic residues near acidic side chains, (3) are enriched or depleted for particular dipeptide sequences, and (4) have higher helical propensity than surrounding regions. Taken together, our findings fit the model of “fuzzy” binding through hydrophobic protein-protein interfaces, where activator-coactivator binding takes place in a dynamic hydrophobic environment rather than through combinations of sequence-specific interactions.
ER  - 