Cooption of polyalanine tract into a repressor domain in the mammalian transcription factor HoxA11

An enduring problem in biology is explaining how the functions of genes originated and how those functions diverge between species. Despite detailed studies on the functional evolution of a few proteins, the molecular mechanisms by which protein functions have evolved are almost entirely unknown. Here we show that a polyalanine tract in the homeodomain transcription factor HoxA11 arose in the stem-lineage of mammals and functions as an autonomous repressor module by physically interacting with the PAH domains of SIN3 proteins. These results suggest that long polyalanine tracts, which are common in transcription factors and often associated with disease, may generally function as repressor domains and can contribute to the diversification of transcription factor functions despite the deleterious consequences of polyalanine tract expansion.

Research Highlights
We show that a polyalanine tract in HoxA11 evolved into a repressor domain in mammals through an increase in alanine repeat number, indicating that transcription factors can evolve novel functions despite the potential deleterious consequences associated with amino acid repeats.


Introduction
The mechanisms of gene regulatory evolution are debated; however, it is clear that changes in the regulatory activities of transcription factors play an important role in the evolution of gene regulation (Wilson, 1975). Detailed functional dissection of a few transcription factors indicates that the evolution of new co-factor interactions (Löhr et al., 2001; Heffer et al., 2010; Brayer et al., 2011b), ligand-binding activities (Thornton et al., 2003; Ortlund et al., 2007; Bridgham et al., 2009), subcellular localization signals (Salichs et al., 2009), post-translational modifications (Lynch et al., 2011), and neo-allosteric effects (Nnamani et al., 2016) can generate functional diversity. Despite these studies, major questions in transcription factor evolution remain unanswered. For example, how have transcription factors evolved and modularized their regulatory functions, how are these functions maintained despite sequence divergence, and how do functional changes occur without strong negative pleiotropic effects?
Alanine-rich regions have long been recognized to be common in transcriptional repressor domains, suggesting they have a role in mediating transcriptional repression (Janody et al., 2001; Maurer et al., 2003), but a mechanistic explanation for these observations has been lacking. Here we show that a polyalanine tract in the homeodomain transcription factor HoxA11 evolved in the stem-lineage of mammals, physically interacts with the PAH domains of SIN3 proteins, and functions as a SIN3A/HDAC1-dependent repressor domain. Remarkably, while our results suggest that polyalanine tracts may generally be able to function as SIN3-dependent repressor domains, polyalanine tract expansions are also associated with several diseases, suggesting that such tracts can contribute to the functional diversification of transcription factors despite their potentially deleterious consequences.

Ancestral sequence reconstruction
To determine when the polyalanine tract in HoxA11 evolved, we identified HoxA11 genes from 156 placental mammals, 3 marsupials, 2 monotremes, 8 sauropsids (including birds, turtles, and squamate reptiles), 2 species of amphibian, 2 species of coelacanth, 30 Euteleost fish, and 4 species of Chondrichthyes (Fig. 2A). Ancestral sequences were inferred with PAML (CODEML) using the species tree and the GTR model of sequence evolution, with indels inferred by maximum parsimony. The Bayesian posterior probability at each site of the reconstructed ancestral sequence was >0.96 for all sites and genes. The ancestral Eutherian, Therian, Mammalian, and Amniote genes were synthesized by GenScript Corp. with human-optimized codon usage and ligated into pcDNA3.1(+)-V5/His as described previously (Brayer et al., 2011a; Nnamani et al., 2016). Proper expression and nuclear localization of all extant and reconstructed expression constructs were verified by western blotting. Interested readers are referred to Brayer et al. (2011a) and Nnamani et al. (2016) for further details.

Cell culture and luciferase reporter assays
HoxA11 was amplified by PCR from embryonic cDNA from mouse (Mus musculus). Full-length and deletion coding regions were cloned into the GAL4 DNA-binding domain vector pM2 (mouse amino terminal and internal deletions), pBIND (alanine repeat), or pFLAG. HeLa cells were grown in DMEM supplemented with 10% FBS. Cells were transiently transfected with Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocol with 2 ng of the Renilla control vector (pGL4.71) and 50 ng of the appropriate luciferase reporter. Luciferase expression was assayed 48 hours after transfection using the Dual Luciferase Reporter System (Promega). Each experiment was repeated four times, with 8 replicates per experiment.
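As an illustration of how readings from a dual-reporter assay of this kind are commonly summarized, the following minimal sketch divides each reporter reading by the Renilla reading from the same well and expresses repression as the percent drop in the mean normalized signal relative to a control. The text does not spell out the normalization that was used, so this scheme is an assumption; the function names and all numerical values are hypothetical and are not data from this study.

```python
from statistics import mean


def normalized_ratios(reporter, renilla):
    """Divide each reporter reading by the Renilla reading from the same well."""
    if len(reporter) != len(renilla):
        raise ValueError("reporter and Renilla readings must be paired")
    return [f / r for f, r in zip(reporter, renilla)]


def percent_repression(treated, control):
    """Percent drop in mean normalized signal relative to the control."""
    return 100.0 * (1.0 - mean(treated) / mean(control))


# Hypothetical luminescence readings (arbitrary units), NOT data from this study.
control_ratio = normalized_ratios([1000.0, 1200.0], [100.0, 120.0])  # empty vector
hoxa11_ratio = normalized_ratios([400.0, 500.0], [100.0, 125.0])     # HoxA11 vector

print(round(percent_repression(hoxa11_ratio, control_ratio), 1))
```

With several replicates per experiment, the same two functions could be applied within each experiment before averaging across experiments.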

Co-immunoprecipitation assays
HeLa cells were incubated overnight at a density of ~4.4 × 10^6 cells/mL on 10 cm plates prior to transfection. A total of 12 μg of appropriate expression vectors was transfected and incubated for 4 hours at 37°C, before addition of 7 mL DMEM. After 16 hours the transfection media was removed and replaced with fresh DMEM, and the cells were incubated an additional 24 hours before harvesting. After removing DMEM and washing cells twice with PBS, 1 mL ice-cold lysis buffer (20 mM Tris, pH 8.0, 40 mM KCl, 10 mM MgCl2, 10% glycerol, 1% Triton X-100, 1x Complete EDTA-free protease inhibitor cocktail (Roche), 1x PhosSTOP (Roche)) was added to each plate and cells were harvested by scraping with a rubber spatula. Cells were then incubated on ice for 30 minutes in 420 mM NaCl. Whole cell lysate was cleared by centrifugation at 10,000 rpm for 30 minutes at 4°C, and supernatant was transferred to a clean microfuge tube. After equilibrating protein concentrations, 1 mL of sample was mixed with 40 mL of antibody-conjugated agarose beads (Sigma) pre-washed with TNT buffer (50 mM Tris-HCl, pH 7.5, 150 mM NaCl, 0.05% Triton X-100), and rotated overnight at 4°C. The following day, samples were treated with 50 U DNase (Roche) and 2.5 µg RNase (Roche) for 60 minutes at room temperature, as indicated. Samples were washed 3x with 1 mL wash buffer (150 mM NaCl, 0.5% Triton X-100). After the final wash, agarose beads were resuspended in elution buffer (500 mM Tris pH 7.5, 1 M NaCl), and boiled to elute immunoprecipitated complexes.
Eluted protein was run on Bis-Tris gels, probed with antibodies specific to epitope tags, and visualized by chemiluminescence.

HoxA11 is a transcriptional repressor of IGFBP1
Previous studies have found that the insulin response sequence (IRS) from the insulin-like growth factor binding protein 1 (IGFBP1) enhancer is directly bound by HoxA11 (Kim et al., 2003; Gao et al., 2004). As a first step to determine if HoxA11 may regulate IGFBP1 expression, we explored their expression in previously generated RNA-Seq data from human endometrial stromal fibroblasts (ESFs) and ESFs differentiated into decidual stromal cells (DSCs) (Lynch et al., 2015) and found that both were expressed more highly in DSCs than ESFs. Next, we silenced endogenous HoxA11 expression using siRNA and assayed IGFBP1 expression 48 hours after transfection and decidualization by quantitative real-time PCR (qRT-PCR). We found that treatment with a HoxA11-specific siRNA led to a strong knockdown of [...] HeLa cells do not express HoxA11, allowing us to measure the effects of HoxA11 on luciferase expression without interference from endogenous HoxA11. We found that transfection of the HoxA11 expression vector repressed luciferase expression ~64% compared to controls, consistent with HoxA11 being a direct repressor of IGFBP1 expression (Fig. 1C).
Next we mapped the location of the experimentally determined Hox binding site in the human IRS (Kim et al., 2003; Gao et al., 2004) in relation to the location of H3K27me3 peaks (a mark of 'poised' cis-regulatory elements) identified from human ESFs using ChIP-chip (Grimaldi et al., 2011), as well as binding sites for co-repressors and DNaseI clusters identified by ChIP-Seq from ESFs (Lynch et al., 2015), to determine if the IRS has the signature of a negative regulatory element for IGFBP1. Indeed, the Hox binding site is located within a DNaseI hypersensitivity cluster and an H3K27me3 peak that is lost upon differentiation of ESFs into decidual stromal cells (Fig. 1D) and overlaps with ENCODE ChIP-Seq peaks for several co-repressors including YY1, SIN3A, and HDAC2 (Fig. 1D). Thus we conclude that HoxA11 may directly repress IGFBP1 expression in association with co-repressors such as SIN3A, YY1, and HDACs.

The HoxA11 polyalanine tract is a repressor domain
To determine if the HoxA11 polyalanine tract functions in transcriptional regulation, we assayed the ability of mouse HoxA11 amino (N)-terminal deletion mutants to repress luciferase expression from the pGL2-3xIRS[luc/3xIRS] reporter vector in transiently transfected HeLa cells.
We found that full-length HoxA11 (CDS) as well as N-terminal deletions up to amino acid 170 (Δ170) strongly repressed luciferase expression. In contrast, N-terminal deletions that removed the polyalanine tract (Δ222 and Δ242) and an internal deletion of the alanine tract (ΔAla) lost nearly all of their ability to repress luciferase expression (Fig. 2A), suggesting the polyalanine tract is necessary for HoxA11 to repress luciferase expression from the pGL2-3xIRS[luc/3xIRS] reporter vector.
To test if the HoxA11 polyalanine tract is sufficient to mediate transcriptional repression, we fused it to the GAL4 DNA-binding domain (GAL4-DBD) and tested the ability of the fusion protein to repress luciferase expression from the reporter vector pGL4.35[luc2P/9XGAL4UAS], which contains the SV40 enhancer/early promoter and nine repeats of the GAL4 Upstream Activator Sequence (UAS); this sequence drives transcription of the luciferase reporter gene luc2P in response to binding of fusion proteins containing the GAL4 DNA binding domain.
Consistent with the polyalanine tract being a repressor domain, we found that the polyalanine tract GAL4-DBD fusion protein strongly repressed luciferase expression (Fig. 2A).
We next tested whether HoxA11 repressed luciferase expression through steric interference with co-activator binding or by recruiting a co-repressor complex. To do so, we treated cells with trichostatin A (TSA), which selectively inhibits mammalian class I and II histone deacetylases.
Consistent with HoxA11 recruiting co-repressors, we found that TSA treatment effectively abolished luciferase repression by HoxA11 constructs containing the polyalanine stretch and by the polyalanine tract GAL4-DBD fusion protein compared to TSA-free controls (Fig. 2A). Thus class I or II histone deacetylases (HDACs) are likely required for the HoxA11 polyalanine tract to mediate repression.

The HoxA11 polyalanine tract interacts with SIN3A PAH domains
A previous study found that the homeodomain of HoxA11 binds a complex including YY1 and HDAC2 to mediate repression (Luke et al., 2006), suggesting that co-factors may mediate repression from the HoxA11 polyalanine tract. To determine if the polyalanine tract mediates physical interactions between HoxA11 and SIN3A or HDAC1, we tested the ability of FLAG-tagged full-length HoxA11 and the ΔAla internal deletion mutant to co-immunoprecipitate with HA-tagged HDAC1 and SIN3A. We found that full-length HoxA11 interacted with both HDAC1 and SIN3A; however, the ΔAla internal deletion mutant was unable to interact with either SIN3A or HDAC1 (Fig. 2B). These results indicate that the polyalanine tract is critical for the interaction of HoxA11 with SIN3A and HDAC1.
SIN3A and its paralog SIN3B contain two highly conserved PAH domains that interact with transcription factors through a 'wedged helical bundle' composed of four α-helices (Spronk et al., 2000; Swanson et al., 2004). These helices form a hydrophobic cleft in PAH domains that binds hydrophobic α-helices and the hydrophobic face of amphipathic α-helices in their binding partners (Eilers et al., 1999; van Ingen et al., 2003; Anderson et al., 2009), suggesting that the polyalanine tract in HoxA11 may bind to the PAH domains of SIN3A. Consistent with this hypothesis, we found that V5/His-tagged PAH1 and PAH2 domains of SIN3A co-immunoprecipitated with the polyalanine tract GAL4-DBD fusion protein (Fig. 2C), but that neither PAH domain co-immunoprecipitated with a GAL4-DBD tagged HoxA11 construct with an internal deletion of the core AAATSAAAVAAAA residues (ΔCoreAla) of the polyalanine tract (Fig. 2C). Taken together, these results indicate that the HoxA11 polyalanine tract interacts with the PAH domains of SIN3A, likely recruiting a larger co-repressor complex that includes HDAC1.

The HoxA11 polyalanine repressor domain evolved in mammals
We previously reported that the HoxA11 polyalanine tract evolved in mammals (Chiu et al., 2000; Roth et al., 2005), suggesting that non-mammalian HoxA11 proteins, which have polyalanine tracts shorter than six residues, may not be able to interact with SIN3A to mediate transcriptional repression from the 3xIRS regulatory element. To test this hypothesis, we assayed the ability of chicken, platypus, short-tailed opossum, mouse, and human, as well as reconstructed ancestral Eutherian, Therian, Mammalian, and Amniote HoxA11 proteins to repress luciferase expression from the pGL2-3xIRS[luc/3xIRS] reporter vector in transiently transfected HeLa cells. Similar to our observation for mouse HoxA11, extant and reconstructed HoxA11 proteins from other mammals repressed reporter gene expression by 35-81% depending on the species (Fig. 3A). The chicken and ancestral Amniote HoxA11 proteins, however, only repressed reporter gene expression by 11-14% (Fig. 3A). Thus the ability of HoxA11 to repress reporter gene expression greatly increased in mammals.
Next we tested whether extant and reconstructed ancestral proteins differed in their ability to interact with HA-tagged HDAC1 and SIN3A in co-immunoprecipitation assays.
Consistent with our luciferase assay results, V5/His-tagged extant and ancestral HoxA11 proteins from mammals were all able to interact with HDAC1 and SIN3A (Fig. 3B). The chicken and ancestral Amniote HoxA11 proteins, however, did not interact with either SIN3A or HDAC1 (Fig. 3B). These results demonstrate that the expansion of the polyalanine tract in HoxA11 created a new protein-protein interaction between HoxA11 and SIN3A leading to a new repressor domain in mammals.

The HoxA11 polyalanine tract evolved in the mammalian stem-lineage
To more precisely determine when the polyalanine tract originated, we identified HoxA11 genes from 207 vertebrates including 156 placental mammals, three marsupials, two monotremes, eight sauropsids (including birds, turtles, and squamate reptiles), two species of amphibian, two species of coelacanth, 30 Euteleost fish, and four species of Chondrichthyes.
We found that polyalanine tracts longer than six residues were only found in mammals; within mammals, polyalanine tracts range from 10-25 residues long (Fig. 4A). Parsimony-based ancestral state reconstructions across the HoxA11 genes in our dataset indicate that the common ancestor of amniotes likely had a polyalanine tract that was only 5 residues long, while the common ancestor of mammals had one or two polyalanine tracts that were 12 and 20 residues long, respectively (Fig. 4B). While many mammals had contiguous alanine runs 20-22 residues long, we found that only red fox and a 'long' allele from domestic dog had repeats longer than 23 residues (Fig. 4C). These results indicate that the polyalanine tract expanded in the Mammalian stem-lineage and that purifying selection constrains tract length to be generally less than 23 residues long.
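A length survey of this kind can be sketched as a simple scan for uninterrupted alanine runs in each protein sequence. The example below is only an illustration: the function names and toy sequences are hypothetical, and it counts strictly contiguous alanines, whereas the tract definition used for the figures may tolerate interruptions (the core AAATSAAAVAAAA motif, for instance, contains non-alanine residues).

```python
import re


def alanine_runs(protein):
    """Return the lengths of all uninterrupted alanine (A) runs, in sequence order."""
    return [len(m.group()) for m in re.finditer(r"A+", protein.upper())]


def longest_alanine_run(protein):
    """Length of the longest uninterrupted alanine run (0 if there is none)."""
    return max(alanine_runs(protein), default=0)


# Made-up toy sequences for illustration only, NOT real HoxA11 sequences.
toy_sequences = {
    "short_tract": "MSTAAAAAGSPL",
    "long_tract": "MST" + "A" * 12 + "GSPL",
}

for name, seq in toy_sequences.items():
    run = longest_alanine_run(seq)
    # Six residues is the cut-off the text uses to separate mammalian tracts.
    print(name, run, "longer than six" if run > 6 else "six or fewer")
```

Applied to the core motif quoted earlier, `alanine_runs("AAATSAAAVAAAA")` reports three separate runs, which shows why the choice between contiguous runs and interrupted tracts matters when comparing lengths across species.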

The polyalanine tract may dock into the PAH domains of SIN3
Polyalanine repeats transition from random coil to stable hydrophobic α-helices at ~10 residues long (Marqusee et al., 1989; Fiori et al., 1993); thus they are biophysically and structurally predisposed to bind the hydrophobic cleft of PAH domains and function as SIN3/HDAC-dependent repressor domains. To better understand the structural basis for the derived interaction between the mammalian HoxA11 polyalanine tract and the PAH domains of SIN3A, we modeled their interaction based on the solution structure of the SIN3A PAH2-HBP1 interaction (Swanson et al., 2004) using Rosetta and RosettaDock (Kim et al., 2004; Lyskov and Gray, 2008). We found that the chicken and ancestral amniote polyalanine tracts were predicted to be mostly unstructured with a short helix (5-8 residues long; Fig. 5A), in contrast to mammalian polyalanine tracts, which were predicted to form long hydrophobic α-helices (Fig. 5B). These structural models suggest that the short polyalanine tracts of non-mammalian HoxA11 proteins do not form an α-helix of sufficient length or stability to mediate a functional interaction with PAH domains, whereas the longer tracts found in mammals can form longer amphipathic α-helices that mediate a stable interaction with PAH domains (Fig. 5C).

A new repressor domain, but at what cost?
While longer polyalanine tracts in mammalian HoxA11 proteins may form more stable interaction interfaces with PAH domains than shorter repeats and thus facilitate their cooption into SIN3-dependent repressor domains, polyalanine tract expansions in transcription factors are associated with numerous human diseases (Goodman et al., 1997; Kjaer et al., 2002; Albrecht et al., 2004; Brown and Brown, 2004). Previous studies, for example, found that polyalanine repeats greater than 23 residues long self-interact to form cytoplasmic aggregates that are excluded from the nucleus (Albrecht et al., 2004; Oma et al., 2007). These results suggest there is a strict upper limit to polyalanine tract length, which is consistent with our observation that only two species possess HoxA11 polyalanine tracts >23 residues long. Given the significant potential for deleterious effects of expanding polyalanine repeats beyond the threshold of ~23 residues, it is remarkable that selection has maintained long polyalanine repeats in any protein and has coopted at least one into a repressor domain. These observations suggest that while there is a cost associated with functional polyalanine tracts, there is also a benefit for transcription factor function.

Conclusions
Alanine repeats are common in eukaryotic proteins and emerged during waves of repeat expansions across vertebrates (Albà and Guigo, 2004; Salichs et al., 2009; Haerty and Golding, 2010). Our observations suggest that the predisposition of polyalanine tracts to form stable α-helices may have facilitated their cooption into SIN3-dependent repressor domains independently in many proteins, and indicate that amino acid repeats can contribute to the diversification of transcription factor functions (Salichs et al., 2009; Radó-Trilla et al., 2015), despite the potential deleterious consequences of repeat expansions.