Intrinsically disordered pathogen effector alters the STAT1 dimer to prevent recruitment of co-transcriptional activators CBP/p300

Signal transducer and activator of transcription (STAT) proteins signal from cell-surface receptors to drive transcription of immune response genes. The parasite Toxoplasma gondii blocks STAT1-mediated gene expression by secreting the intrinsically disordered protein TgIST, which traffics to the host nucleus, binds phosphorylated STAT1 dimers, and occupies nascent transcription sites that unexpectedly remain silenced. Here we define a core repeat region within internal repeats of TgIST that is necessary and sufficient to block STAT1-mediated gene expression. Cellular, biochemical, mutational, and structural studies demonstrate that the repeat region of TgIST adopts a helical conformation upon binding to STAT1 dimers. The binding interface is defined by a groove formed from two loops in the STAT1 SH2 domains that reorient during dimerization. TgIST binding to this newly exposed site at the STAT1 dimer interface altered its conformation and prevented recruitment of co-transcriptional activators, thus defining the mechanism of blocked transcription.

[...] phosphorylation, dimerization, and nuclear transportation 2; whereupon they show high-affinity DNA-binding 3. Type I interferons signal through STAT1/STAT2 heterodimers to activate genes that contain IFN sensitive response elements (ISRE) in their promoters 4. Similarly, type II interferon signals through STAT1/STAT1 homodimers that recognize genes containing gamma activated sequences (GAS) in their promoters 5. There is considerable overlap between these two pathways as many IFN stimulated genes (ISGs) contain both ISRE and GAS sequences 4,5. Once bound to cognate response genes, transcriptional co-activators CBP/p300 and BRG1 interact with STAT complexes 6-8 and recruit polymerase II to initialize gene transcription 9. However, the structural basis of STAT1 dimer recognition by transcriptional co-activators is still largely unknown.
As a widespread and successful apicomplexan parasite, Toxoplasma gondii can infect and survive in almost all warm-blooded hosts, where it resides within its host cell in a protective compartment called the parasitophorous vacuole. Interferon signaling plays a central role in Toxoplasma infection and while type II interferon plays a dominant role 10,11, type I interferon is also implicated in control of chronic infection 12. Infection by T. gondii blocks STAT1-mediated transcription 13,14, despite not altering STAT1 phosphorylation, dimer formation, nuclear import or DNA recognition 15. This block is mediated by the secreted effector TgIST (Toxoplasma inhibitor of STAT1-dependent transcription) that disrupts type I and type II interferon mediated gene expression 12,16,17. TgIST is exported from the parasitophorous vacuole and transported to the host nucleus where it interacts with STAT complexes bound to cognate recognition GAS sequences on the DNA. Additionally, TgIST recruits a nucleosome remodeling and deacetylase complex known as Mi-2/NuRD, which is known for its role in repressing gene expression during development 18, suggesting that chromatin modification may contribute to altered expression 14,15. A number of pathogens have been shown to disrupt STAT1 signaling 19,20 by a variety of different mechanisms. However, these prior findings do not address how TgIST, which binds to STAT1 complexes that are poised on chromatin at correct transcriptional start sites, is able to block transcription. Moreover, TgIST, like many other pathogen secreted effectors, is an intrinsically disordered protein (IDP), making its function enigmatic. Such disordered proteins exhibit several features that facilitate their roles in transcriptional regulation and cell signaling, including flexible conformation that allows for permissive partner interactions and ability to function dynamically in regulatory networks 21. Hence the study of pathogen IDPs may inform us about how such proteins evolve and function to modulate host signaling pathways.
In the present study, we utilized a combination of cellular, biochemical, and structural studies to define a repeat region in TgIST that mediates binding to the phosphorylated STAT1 dimer. A core sequence in the repeat region of TgIST was both necessary and sufficient to bind to STAT1 and block transcription. Biophysical and structural analysis revealed that the repeat region binds in a groove formed by two loops of the SH2 domain in STAT1, resulting in a stabilized and altered conformation. Occupancy of the SH2 groove by the TgIST repeat displaced the transcriptional co-activators CBP/p300, thus both defining the molecular mechanism of silenced transcription, and providing new insight into STAT1-mediated gene expression.

TgIST recruits STAT1 and Mi-2/NuRD using different domains
To dissect the interactions between TgIST and host proteins that underlie its inhibition of IFN-γ signaling, we performed immunoprecipitation (IP) experiments to determine if TgIST binds the Mi-2/NuRD complex independently of STAT1 or only as a ternary complex. Human sarcoma cell lines (U3A, STAT1 deficient; U3A-STAT1, STAT1 complemented) were infected with TgIST-Ty expressing parasites for 16 hr and then activated with IFN-γ (100 U/mL) for 1 hr. Nuclear extracts were used to capture TgIST-Ty by IP and co-precipitating proteins were analyzed by western blot. Consistent with a previous report 17, TgIST interacted with STAT1 in U3A-STAT1 cells and also co-immunoprecipitated components of the Mi-2/NuRD complex MTA1 and HDAC1 (Fig. 1a). TATA binding protein (TBP), another component of nuclear extracts, did not interact with TgIST-Ty (Fig. 1a). TgIST-Ty also efficiently immunoprecipitated HDAC1 and MTA1 in U3A cells that lack STAT1 (Fig. 1a), indicating that TgIST binds to each complex separately.
To further explore host protein complexes interacting with TgIST, liquid chromatography-tandem mass spectrometry (LC-MS/MS) was performed on TgIST IP'd samples from cells described in Fig. 1a. We compared proteins IP'd by TgIST in U3A cells that do not express STAT1 vs. U3A-STAT1 expressing cells both in the absence and presence of IFN-γ. Uninfected but IFN-γ treated samples were used as a control to filter proteins that were IP'd non-specifically. Proteins represented by at least 2 peptides at a 99% identity threshold in three independent TgIST IP replicates were analyzed in Scaffold to identify those that were significantly enriched in U3A-STAT1 expressing cells (P ≤ 0.05, unpaired two-tailed Student's t-test) (Tables S1, S2). Significantly enriched proteins were then subjected to protein-protein network analysis using the STRING database 22 to identify putative interactions within host proteins. Consistent with the IP and western blot results, STRING network analysis indicated a highly significant interaction of TgIST with HDAC1 and MTA1 and other Mi-2/NuRD components (Fig. 1b). The interaction between TgIST and Mi-2/NuRD was STAT1-independent since the identical Mi-2/NuRD complex was detected when STAT1-deficient U3A cells were used (Fig. 1c). We did not detect significant interactions with other transcription factors or chromatin modifying complexes (Tables S1, S2).
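The enrichment filter described in this section (a minimum of 2 peptides, then an unpaired two-tailed Student's t-test at P ≤ 0.05) can be illustrated with a minimal Python sketch. This is not the Scaffold implementation: the protein names, the choice of per-replicate counts as the compared quantity, and all numbers are hypothetical placeholders, and the p-value is obtained by numerically integrating the t density so that only the standard library is required.

```python
import math
from statistics import mean, variance


def student_t_pvalue(a, b):
    """Two-tailed p-value of an unpaired, equal-variance Student's t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    if se == 0:
        return 1.0 if mean(a) == mean(b) else 0.0
    t = abs(mean(a) - mean(b)) / se
    # Density of the t distribution with df degrees of freedom.
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule for the area between 0 and t (n must be even).
    n = 2000
    h = t / n
    area = pdf(0.0) + pdf(t)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return max(0.0, 1.0 - 2.0 * area)


def enriched(proteins, min_peptides=2, alpha=0.05):
    """Return (name, p) for proteins passing the peptide and t-test filters."""
    hits = []
    for p in proteins:
        if p["peptides"] < min_peptides:
            continue  # too few peptides to be considered
        pval = student_t_pvalue(p["stat1"], p["null"])
        if pval <= alpha and mean(p["stat1"]) > mean(p["null"]):
            hits.append((p["name"], pval))
    return hits


# Hypothetical replicate values (three IPs per cell line), for illustration only.
demo = [
    {"name": "protein_A", "peptides": 5, "stat1": [4, 5, 6], "null": [1, 2, 3]},
    {"name": "protein_B", "peptides": 1, "stat1": [40, 50, 60], "null": [1, 2, 3]},
    {"name": "protein_C", "peptides": 4, "stat1": [3, 5, 4], "null": [4, 3, 5]},
]
for name, pval in enriched(demo):
    print(name, round(pval, 4))
```

In this toy input, protein_B is excluded by the peptide threshold despite its large difference, and protein_C is excluded because its replicate means do not differ; with only three replicates per group the test has 4 degrees of freedom, so fairly large effects are needed to reach P ≤ 0.05.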

Identification of STAT1 and Mi-2/NuRD binding domains in TgIST
TgIST is predicted to be an intrinsically disordered protein that lacks conserved folded domains or sequence similarity to other known proteins, thus complicating analysis of function. TgIST contains an N-terminal hydrophobic sequence important for secretion and several nuclear localization sequences (NLS), a composition that is highly conserved among a variety of strains representing all the major lineages of T. gondii 23 (Fig. S1a). Interestingly, several strains of T. gondii contain a ~40 amino acid region that is repeated twice in tandem, the arrangement and phylogenetic relatedness of which suggests that they arose by several independent duplications in different lineages (Fig. S1b). The importance of the repeat structure is described further below, but even strains with a single copy of this element are able to block IFN-γ signaling 17.
To explore the function of different regions of TgIST, we generated a series of C-terminal truncations of the type I allele of TgIST that contains the tandem repeat region, expressed them transiently in HEK293T cells, and tested binding to STAT1 and Mi-2/NuRD. Full length and truncated constructs of TgIST were designed to initiate downstream of the TEXEL processing site that is cleaved by the ASP5 protease 24 during export in T. gondii (Fig. 1d). Sequential C-terminal deletions were created to either contain both central repeat domains (TgIST-T1 and TgIST-T2), lack the second repeat (TgIST-T2-∆R2), or lack both repeats (TgIST-T3). All four of these constructs contained at least one nuclear localization sequence (NLS). Truncated TgIST proteins containing a C-terminal Ty-tag were transfected into HEK293T cells and immunoprecipitated using an anti-Ty antibody. Total and phosphorylated STAT1 were readily co-precipitated with the mature form of TgIST (TgIST-MT) in IFN-γ treated nuclear fractions but not in untreated cells, consistent with TgIST only binding to the phosphorylated STAT1 dimer that forms after IFN-γ treatment (Fig. 1e). Furthermore, we found the interaction between STAT1 and TgIST relied on the repeat region, based on the findings that truncations containing the repeats (i.e. TgIST-T1, TgIST-T2) pulled down pSTAT1 while a construct lacking the repeats (i.e. TgIST-T3) did not (Fig. 1e). The core components of the Mi-2/NuRD complex, MTA1 and HDAC1, co-immunoprecipitated only with the mature form of TgIST and this interaction was IFN-γ independent (Fig. 1e). Thus, these data indicate that the repeat region of TgIST mediates binding to STAT1 dimers, while independently, the C-terminal region is responsible for the recruitment of the Mi-2/NuRD transcription repression complex.

Repeat region of TgIST is sufficient to block IFN-γ signaling
To examine the ability of TgIST to directly interact with STAT1, we co-expressed them as recombinant proteins in E. coli. We expressed a wild type STAT1 core fragment that was previously used for crystallization 25, or a double cysteine mutated version (A656C and N658C, referred to as STAT1cc) that generates a locked dimer independent of phosphorylation 26. STAT1 monomer, or STAT1cc dimer, were expressed as tag-free forms together with the different 6XHis-tagged TgIST constructs shown in Figure 1. Capture of His-tagged TgIST by nickel chromatography was used to assess STAT1 binding by SDS-PAGE and Coomassie blue staining. These studies revealed that TgIST requires the internal repeats for STAT1 binding as both TgIST-T1 and TgIST-T2 pulled down STAT1cc, while TgIST-T3 did not (Fig. 2a). Furthermore, the dimer form of STAT1cc was required for the interaction with TgIST as all three constructs failed to pull down STAT1 monomer (Fig. 2a). To further validate the necessity of repeats in STAT1 binding, we compared binding of a construct containing only the first repeat (TgIST-T2-∆R2) with a construct that contains both repeats (TgIST-T4 as shown in Fig. 1d). TgIST-T4 pulled down STAT1cc much more efficiently than TgIST-T2-∆R2, suggesting that although one repeat is sufficient for binding, the two repeats act cooperatively (Fig. 2b). Collectively, these findings indicate that the internal repeats are both necessary and sufficient for binding to STAT1 and they only bind to the STAT1cc dimer form.
To examine the role of the repeats in blocking IFN-γ signaling, TgIST-GFP fusion constructs based on truncations similar to those described above were transfected into HeLa cells treated with IFN-γ and the expression of IRF1 was visualized by immunofluorescence microscopy. Surprisingly, TgIST-T1 and TgIST-T2, truncated forms of TgIST that harbor internal repeats but lack the C-terminal Mi-2/NuRD binding domain, were still able to block IFN-γ signaling (Fig. 2c). However, TgIST-T3, which lacks the repeat region, did not block expression of IRF1 despite its normal trafficking to the host nucleus (Fig. 2c). Deletion of one of the repeats in TgIST-T2-∆R2 still led to a block in expression of IRF1 (Fig. 2c), in agreement with its ability to bind STAT1cc (Fig. 2a). To evaluate the efficacy of IRF1 repression among different constructs, we monitored expression by quantitative imaging. There was no significant difference between the constructs containing repeats (i.e. TgIST-MT, TgIST-T1 and TgIST-T2); however, the IRF1 intensity of cells expressing TgIST-T3, which lacks repeats, was significantly elevated compared with the other constructs (Fig. 2d). As expected, the construct TgIST-T2-∆R2 was less effective in blocking IRF1 induction than constructs bearing two repeats (Fig. 2d), again suggesting the repeats act cooperatively. The differences in IRF1 induction were not due to differences in STAT1 expression levels, which remained the same in all conditions (Fig. S2).

Mapping the minimal STAT1 binding domain of TgIST
To map the binding interface, purified TgIST-T2 complexed with STAT1cc was subjected to limited proteolysis using trypsin, followed by SDS-PAGE separation of resistant fragments (Fig. S3). Six partial digestion products referred to as S1-S6 (Fig. S3a), spanning from high to low molecular weight, were isolated from the gel and subjected to mass spectrometry (MS) analysis. All of the limited proteolysis fragments contained a 7-amino acid region TALDV(F/L)R that is found at the core of both repeats (Fig. S3b). To confirm the importance of the 7-residue core region in the repeat, we changed all the residues to Ala in the construct that only contains the second of the two repeats (i.e. TgIST-R2) to generate the mutant called TgIST-R2-M1 (Fig. 2e). Compared to the wild-type TgIST-R2, mutated TgIST-R2-M1 lost the ability to bind to STAT1cc in the co-expression assay, confirming that this core region is necessary for STAT1 dimer binding (Fig. 2f). We also examined the role of the tandem repeats, and the 7-residue core, within TgIST in blocking IRF1 expression induced by IFN-γ using transient transfection in HeLa cells. TgIST-R2 that only contains the second repeat with an NLS sequence (TgIST-NLS-R2) partially blocked IRF1 induction while TgIST-T4, which contains both repeats with addition of an NLS (TgIST-NLS-T4), was much more efficient (Fig. 2g). To test whether the core 7 amino acids at the center of the repeat were necessary to block induction, we mutated these residues to Ala (similar to that shown in Fig. 2e) in both repeats of the TgIST-T2 construct to generate the mutant TgIST-T2-M2. This construct lost the ability to block IRF1 induction, demonstrating the repeats are necessary to block STAT1-mediated transcription (Fig. 2f,g). Taken together, these experiments define a core region within the repeats of TgIST that mediates binding to STAT1 and is both necessary and sufficient to block IRF1 induction.

A core sequence in the repeats of TgIST mediated binding to the phosphorylated STAT1 dimer
To facilitate further biochemical and structural studies, we expressed N-terminal Strep-tagged STAT1 in E. coli TKB1 cells that co-express ELK kinase to generate phosphorylated STAT1 dimers (pSTAT1d). [...] multi-angle laser light scattering with in-line size exclusion chromatography (SEC-MALS) analysis to confirm the correct formation of the ternary complex (Fig. S4a-d). To validate the importance of the internal repeats of TgIST in binding to pSTAT1d, we monitored interactions using Bio-Layer Interferometry (BLI). We immobilized phosphorylated pSTAT1d or STAT1 monomer proteins on an [...] and monitored the interaction with purified TgIST-R2 or TgIST-R2-M1 in solution. pSTAT1d interacted strongly with TgIST-R2, as shown by the considerable increase of signal, while no binding was detected to STAT1 monomer (Fig. 3a). The mutated TgIST-R2-M1 construct showed dramatically decreased binding compared to the wild type TgIST-R2 construct (Fig. 3a). These findings confirm that TgIST binds only to the phosphorylated STAT1 dimer, and that this interaction requires the 7 amino acid core region shared by the repeats. To explore the stoichiometry of TgIST binding to pSTAT1d, we immobilized pSTAT1d on the Ni-NTA biosensor then immersed the loaded pins in purified TgIST-R2 peptide after GST removal. In this configuration, where monomeric TgIST-R2 in solution binds to dimeric STAT1 on the pin, the calculated affinity was 330 ± 50 nM using a 1:1 model for curve fitting (Fig. 3b). We also reversed the sample order and charged the biosensor with TgIST-R2, then interacted the pins with soluble pSTAT1d. In this configuration, where TgIST on the pin binds to dimeric pSTAT1d in solution, the apparent avidity was 22.0 ± 2.0 nM when the curves were fit with a 1:2 model, consistent with cooperative binding by the STAT1 dimer (Fig. 3c).
We then examined the role of the TgIST repeats in interacting with pSTAT1d in complex with a GAS oligo to form the gamma activated factor (GAF), as detected by electrophoretic mobility shift assay (EMSA). The GAF complex readily formed when pSTAT1d was combined with a labeled GAS oligo, but not with an ISRE oligo that normally binds to STAT1/STAT2 heterodimers (Fig. 3d). TgIST-T2 bound to GAF and formed a super-shifted second GAF complex in a concentration dependent manner (Fig. 3d) that was competed by excess cold probe (Fig. S4e). A similar super-shifted second GAF complex was also observed when GST-tagged TgIST-R2 was used in the EMSA; however, the mutant TgIST-R2-M1 did not bind GAF (Fig. 3e, S4f). These findings indicate that the interaction between TgIST and pSTAT1d bound to DNA also requires the core 7 amino acids that are conserved in both repeats in TgIST.
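To make the 1:1 binding model used for the BLI curve fitting above more concrete, the following Python sketch evaluates the standard 1:1 (Langmuir) relationships: equilibrium occupancy C / (C + K_D) and the association-phase response R(t) = R_eq (1 - exp(-(k_on C + k_off) t)), with K_D = k_off / k_on. Only the 330 nM dissociation constant is taken from the text; the individual rate constants are hypothetical values chosen so that their ratio equals 330 nM, and the sketch does not reproduce the instrument software or the 1:2 model used for the reversed configuration.

```python
import math


def fraction_bound(conc_nM, kd_nM):
    """Equilibrium fractional occupancy for 1:1 binding: C / (C + K_D)."""
    return conc_nM / (conc_nM + kd_nM)


def association_response(t_s, conc_nM, k_on, k_off, r_max=1.0):
    """Association phase of a 1:1 model.

    k_on is in 1/(nM*s), k_off in 1/s; the response is in units of r_max.
    """
    k_obs = k_on * conc_nM + k_off           # observed rate constant
    r_eq = r_max * k_on * conc_nM / k_obs    # plateau reached at long times
    return r_eq * (1.0 - math.exp(-k_obs * t_s))


KD_REPORTED_NM = 330.0   # from the 1:1 fit described in the text
K_OFF = 0.033            # hypothetical, 1/s
K_ON = 1e-4              # hypothetical, 1/(nM*s); K_OFF / K_ON = 330 nM

for conc in (33.0, 330.0, 3300.0):
    print(conc, round(fraction_bound(conc, KD_REPORTED_NM), 3))
```

Under these assumptions, an analyte concentration equal to K_D gives half-maximal occupancy, and the association curve plateaus at the same fraction once (k_on C + k_off) t is large.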

Crystal structure of TgIST-STAT1 represents a novel binding mode
To reveal the mechanism of how TgIST recognizes pSTAT1d, we crystallized pSTAT1d complexed with DNA either alone or together with TgIST-R2 and obtained structures at 2.8 Å and 4.5 Å, respectively. Comparison of the structures revealed a new electron density located between two relatively unstructured loops of the SH2 domain of STAT1 (Fig. S5a and S5b). Because this structure was of limited resolution, we linked TgIST-R2 to the C-terminus of STAT1 (Fig. 4a) and obtained crystals in which the peptide bound in the same place, visualized at much better resolution. This construct had lower binding to TgIST-R2 immobilized on the pin when compared to STAT1 alone, supporting the conclusion that the R2 peptide in this fused construct was bound in a functional state (Fig. 4b). Four datasets were merged and scaled to yield a final model of the STAT1-TgIST-R2 structure at 2.9 Å resolution (Table S5, Fig. 4c). TgIST-R2 adopted an α-helix and was bound in a groove between two loops formed by the SH2 domains of STAT1 (Fig. 4c). The density of the two loops, which is not clearly resolved in the pSTAT1d structure and only partially seen in monomeric STAT1, is distinct in the new TgIST-R2 bound structure (Fig. S5c). The binding of TgIST-T2 creates a 17 Å wide, 18 Å deep cavity on top of the protomer of the STAT1 dimer structure (Fig. 4d). Electrostatic potential analysis of the TgIST binding region on STAT1 reveals a negatively charged cavity at the bottom of two loops (Fig. 4d, red), in addition to a small hydrophobic patch. One STAT1 protomer shares a 551 Å² buried surface area with TgIST-R2, a comparatively large binding interface in terms of the size of the α-helix. After binding to TgIST-R2, the length of three β-strands (L601-F603, I612-W616 and H629-A630) in the SH2 domain of STAT1 was extended to form four additional hydrogen bonds (Fig. S5d). Additionally, two small β-strands (W666-L667 and I671-D672) formed at the end of STAT1 and were stabilized by three hydrogen bonds. These newly established hydrogen bonds alter and stabilize the conformation of loop 1 and loop 2 (Fig. 4e). Some hydrophobic amino acids in TgIST (i.e. L383, V385, and L386) are positioned opposite to the hydrophobic patch formed by F614, W616, Y651, V653 and P663 located at the bottom of loop 1, and L664 at the bottom of loop 2 in the STAT1 dimer (Fig. 4e). On the opposite side of the helix, TgIST interacts with STAT1 by means of multiple polar or electrostatic interactions. For example, R387 on TgIST forms a salt bridge with E618, and a hydrogen bond forms between T381 on TgIST and E661 on STAT1. Additionally, residues D627 and H629 in STAT1 collectively form a hydrophilic surface on a small β-sheet, where D627 forms direct hydrogen bonds with T390 in TgIST (Fig. 4f). Finally, Q391 of TgIST forms a hydrogen bond with Q621 in loop 1 of STAT1, further strengthening the insertion of the helix into the cavity formed on top of STAT1.

The repeat region of TgIST undergoes a helical transition after binding to pSTAT1d
Although full length TgIST is predicted to be disordered, our crystal structure revealed that it adopts a helical conformation when bound to STAT1, a result consistent with secondary structure prediction of the repeat region (Fig. S5e). To further explore the secondary structural transition of the repeat region, we examined purified TgIST-R2 by circular dichroism (CD). When suspended in buffered saline, the CD spectrum for TgIST-R2 suggests the protein is mainly unstructured, though the non-zero intensity between 210-240 nm indicates some potential structure. When TgIST-R2 was suspended in buffered saline with increasing concentrations (0% to 40%) of trifluoroethanol (TFE), the CD signal increased, particularly between 205-225 nm, indicating a propensity to form an α-helix (Fig. 5a). Further analysis of the TgIST-R2 region using HeliQuest 27 indicated that the ten residues from V379 to E388 form an amphipathic α-helix (Fig. 5b). We generated a mutant of TgIST-R2 (TgIST-R2-M3) that interchanged two amino acid pairs (VR to RV at position 379-380, and EL to LE at position 388-389) to destroy the amphipathic nature of the peptide without altering the helical conformation (Fig. 5b). BLI assays confirmed that wild type TgIST-R2 bound strongly to pSTAT1d, while the altered TgIST-R2-M3 peptide showed no binding (Fig. 5c). These data suggest that the amphipathic nature of TgIST is important in mediating the interaction with pSTAT1d, which also confirms our observations in the crystal structure.
To test the sidechain interactions observed in the co-crystal structure, mutations were introduced into a co-expression plasmid harboring 6XHis-tagged TgIST-R2 and STAT1cc, followed by purification of TgIST by nickel chromatography and detection of STAT1 binding by SDS-PAGE and Coomassie blue staining. The TgIST mutations R380A and T390A significantly reduced binding to STAT1cc (by 93% and 99%, respectively), suggesting that these residues serve as N- and C-terminal anchors for the interaction (Fig. 5d and S6a). Mutation of hydrophobic residues L383A and D384A in TgIST-R2 completely ablated binding to STAT1cc, suggesting that both the length and hydrophobic nature of the sidechain are important for the interactions. Additionally, mutation of polar and charged residues such as T381, D384, and R387 reduced binding to STAT1cc to various degrees (from 43% to 99%, Fig. 5d), further indicating that the amphipathic property of the α-helix is critical for STAT1 binding. In summary, point mutations within the core binding domain region of TgIST-R2 support an α-helical amphipathic structure, and demonstrate the importance of key residues in this structure in binding to the STAT1 dimer.
The STAT1 mutations D627A in loop 1 and M654A in loop 2 almost completely abrogated binding to TgIST-R2 (Fig. 5e and S6b). The D627A mutation removes its interaction with T390 of TgIST, and might destabilize the composition of the β-sheet that maintains the charged binding groove. M654 is critical in forming the interface between two STAT1 protomers in the dimeric STAT1. Additionally, mutations of K652 and N662 in loop 2 also reduced binding, while other mutations outside the two loops had no effect on binding (Fig. 5e). The sidechain interaction between TgIST R387 and STAT1 E661 is further supported by the evidence that the STAT1 E661A mutation caused a partial defect in binding to TgIST-T2. In summary, results from the point mutations corroborate observations from the co-crystal structures and suggest that TgIST-R2 binds STAT1 at its symmetrical dimer interface.
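The percent reductions quoted for the point mutants can be read as band intensities expressed relative to wild type. A minimal Python sketch of that normalization is given below, assuming each mutant replicate is paired with a wild-type lane from the same gel; the pairing scheme and all intensity values are hypothetical and are not taken from the study's quantification.

```python
from statistics import mean, stdev


def relative_binding(mutant, wild_type):
    """Mean and standard deviation of mutant binding as percent of wild type.

    Each mutant intensity is divided by the wild-type intensity of the
    same replicate (assumed pairing) before averaging.
    """
    ratios = [100.0 * m / w for m, w in zip(mutant, wild_type)]
    return mean(ratios), stdev(ratios)


def percent_reduction(mean_relative):
    """Loss of binding relative to wild type, in percent."""
    return 100.0 - mean_relative


# Hypothetical band intensities from three replicates (arbitrary units).
wt_lanes = [100.0, 120.0, 90.0]
mutant_lanes = [8.0, 7.0, 6.5]

rel_mean, rel_sd = relative_binding(mutant_lanes, wt_lanes)
print(round(rel_mean, 1), round(rel_sd, 1), round(percent_reduction(rel_mean), 1))
```

Normalizing per replicate before averaging keeps gel-to-gel differences in loading or staining from inflating the spread; averaging raw intensities first would be an alternative with different error behavior.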

The basis of STAT1 dimer recognition by TgIST
To explain the mechanism of specific STAT1 dimer binding by the repeat region of TgIST, we superimposed the structure of the SH2 domain of the pSTAT1d dimer obtained here with the corresponding structure of monomeric STAT1 (PDB: 1YVL) 28 using sequence-based alignment. Interestingly, loop 2 is clearly present in monomeric STAT1, yet it shows a strikingly different orientation (Fig. 6a, orange colored closed state) compared to the structure of protomers within dimeric STAT1 (Fig. 6a, cyan colored open state). These alterations are stabilized by extended β-sheets forming additional hydrogen bonds at the base of each loop (Fig. S5d), as discussed above. In the free monomeric STAT1 structure, loop 2 is in a closed conformation, folding over towards loop 1, while it flips to an open conformation in the dimeric structure (Fig. 6a). Importantly, the position of loop 2 in the free monomeric structure partially occupies the proposed TgIST-T2 binding site, blocking entry into the groove formed by two loops in the pSTAT1 dimer (Fig. 6a,b). Hence, the altered conformation of loop 2 in the dimeric STAT1 structure exposes a surface for TgIST-R2 binding that is otherwise absent in the STAT1 free monomer, thus providing an explanation for the specificity of TgIST-R2 in binding to the dimeric form of STAT1. The loop regions of STAT1 are highly conserved across a number of mammalian species and they are predicted to adopt a similar topology (Fig. S7a), consistent with the ability of T. gondii to block IFN-γ induced gene expression from mouse to human 13-17.
Based on the importance of the SH2 domain of STAT1 in mediating interaction with TgIST, we made an alignment of SH2 harboring proteins including STAT1, STAT3 and STAT6. The core composition of the SH2 domain, which is composed of a large β-sheet flanked by two α-helices, aligned well for all structures (Fig. 6c,d). STAT3 has a similar loop 2 orientation compared to STAT1; however, loop 1 is folded inward and it does not create the open groove seen in the STAT1 dimer. STAT6 lacks the loop 2 region that is found in STAT1 and STAT3 (Fig. 6d), suggesting sequence differences between STAT proteins affect the folding of this region. These structural variations help explain the specificity of TgIST for binding to STAT1.

TgIST competes the interaction between STAT1 and CBP/p300 coactivators 310
The histone acetyltransferase coactivators CBP and p300 directly interact with STAT1 within its C-311 terminal transcriptional activation domain (TAD) domain to facilitate efficient transcription activation 29 . 312 The TAD domain is located C-terminally to the STAT1 dimer interface (our structure and PDB: 1BF5), 313 which suggests that binding of TgIST in the grove formed by the STAT1 dimer may be responsible for 314 preventing recruitment of CBP/p300. We tested this hypothesis by expressing constructs of TgIST in 315 HEK293T cells and determining whether they could displace CBP/p300 from immunoprecipitated STAT1. 316 We compared a form of TgIST containing two copies of the STAT1-binding repeat (TgIST-T2) to a similar 317 copy where the core residues where mutated to alanine (TgIST-T2-M2) and a truncated construct that 318 lacks the repeats (TgIST-T3). Although the efficiency of immunoprecipitation of STAT1 was the same, 319 CBP/p300 was less efficiently coprecipitated in cells expressing TgIST-T2 that binds STAT1, when 320 compared to the mutated TgIST constructs that do not bind STAT1 (Fig. 6e,f). A reciprocal experiment 321 performed by immunoprecipitating Ty-tagged TgIST from HEK293T cell lysates confirmed that only 322 TgIST-T2, and the mature form TgIST-MT, were able to interact with STAT1 (Fig. S7c). Consistent with 323 this result, label-free quantitative mass spectrometry showed less CBP/p300 co-precipitated with STAT1 324 in TgIST-T2 vs. TgIST-T2-M2 transfected cells (Fig. S7d,e). Taken together, these results indicate that the 325 ability of the repeat regions of TgIST to disrupt IFN-g signaling is based on the reduction in CBP/p300 326 association with STAT1. Although previous studies have shown that TgIST binds both to STAT1 and to Mi-2/NuRD, the 342 relationship between these two interactions in blocking IFN signaling is uncertain, especially as changes 343 in host chromatin modification differ between primary and secondary response genes 14,15 . 
Here we 344 compared IPs of TgIST from STAT1 null to STAT1 expressing cells and found that TgIST interacts with the 345 Mi-2/NuRD complex in the absence of STAT1 and without IFN-g stimulation. Furthermore, we 346 demonstrate using truncations that the C-terminus of TgIST contains the Mi-2/NuRD interacting domain, 347 and that this region is dispensable for the ability of the N terminal repeat-containing region to block IFN-348 g induction of the primary response gene IRF1. Although the recruitment of Mi-2/NuRD was not 349 required for the block mediated by the N terminal repeats, it may function in the regulation of other 350 IFN-g induced genes. Consistent with this model, it has recently been suggested that recruitment of Mi-351 2/NuRD to IFN-g responsive genes is more import in secondary response genes where chromatin 352 modulation may play a more important role 30 . 353 In addition to TgIST, a number of other T. gondii effectors released beyond the parasitophorous 354 vacuole are IDPs and they bind a range of different host molecules to disrupt signaling and gene 355 expression 31,32 . The binding mechanism of most such IDPs to their cognate host targets is largely unknown. One exception is GRA24, which contains a well recognizable kinase interaction motif (KIM) 357 within its two internal repeats that naturally adopt a helical structure when it was crystallized with its 358 host target p38a MAP kinase 33 34 . In contrast, TgIST lacks any secondary structure when unbound and 359 its repeat region only adopted a helical conformation on binding to the STAT1 dimer, a property of 360 induced-fit that is similar to other IDPs 21 . A single repeat region of TgIST was sufficient to block STAT1, 361 consistent with previous reports 17 , while the duplicated repeat was more effective. Internal repeats are 362 common in intrinsically disordered proteins where they likely arise by duplication, allowing rapid 363 evolution of new functionality 35 . 
The flexible features of IDPs 21 appear to be advantageous for secreted virulence factors due to their ease of export across membranes, adaptable binding to different host targets, and ability to rapidly evolve in the absence of structural constraint.

The crystal structure of the repeat region of TgIST with pSTAT1d revealed the R2 peptide bound in a groove that forms at the dimer interface. Binding of the helical peptide of R2 within the STAT1 groove was stabilized by polar and hydrophobic interactions, as supported by mutational analysis in both partners.

represented by red and black, respectively).

(d) Nickel affinity purification of His-tagged TgIST-R2 wild type (WT) and various point mutants of TgIST-R2 was used to test copurification with the STAT1 locked dimer (STAT1cc). Eluted fractions were separated by SDS-PAGE and then stained with Coomassie blue (see Fig. S6a). The relative binding intensities were normalized to the binding between wild type TgIST-R2 and STAT1cc. At least three biological replicates were performed, and each dot represents one experiment. The top of the bar represents the mean, and the error bars represent the standard deviation.

repeats; TgIST-T3 (T3) lacking the two repeats (see Fig. 1a and 3b). Cells were grown for 23 hr, then treated ± IFN-γ (100 U/mL) for an additional 60 min prior to whole cell extract preparation. Membranes were incubated with corresponding primary antibodies as indicated and then IR dye-conjugated secondary antibodies. Visualization was performed using an

Samples were eluted using 50 mM glycine (pH 2.8) after three washes with PBS. The input extracts (5% of total) and immunoprecipitated samples (10% of total) were separated using 8%-12% acrylamide gels and transferred onto a nitrocellulose membrane for western blotting.
The membrane was blocked using 5% milk diluted in PBST (phosphate-buffered saline with 0.05% v/v Tween-20) and probed with primary antibodies overnight at 4°C, followed by three washes with PBST.
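
Because the input lanes carry 5% of the extract and the immunoprecipitated lanes carry 10%, band intensities from the two lane types are not directly comparable until each is scaled by the fraction loaded. The following is a minimal sketch of one common way to do that scaling, not the authors' quantification procedure. The function names and all band intensities are hypothetical, and the "coprecipitation index" (CBP/p300 recovery divided by STAT1 recovery) is an assumed summary metric chosen for illustration.

```python
from math import isclose

def fraction_recovered(ip_signal, input_signal,
                       ip_loaded=0.10, input_loaded=0.05):
    """Estimate the fraction of a protein recovered in the IP.

    Each band intensity is divided by the fraction of the sample that was
    loaded on the gel (10% of the IP, 5% of the input, as in the text),
    so both values refer to the whole sample before taking the ratio.
    """
    if input_signal <= 0:
        raise ValueError("input band intensity must be positive")
    total_ip = ip_signal / ip_loaded
    total_input = input_signal / input_loaded
    return total_ip / total_input

def coprecipitation_index(cbp_ip, cbp_input, stat1_ip, stat1_input):
    """Hypothetical metric: CBP/p300 recovery relative to STAT1 recovery."""
    return (fraction_recovered(cbp_ip, cbp_input)
            / fraction_recovered(stat1_ip, stat1_input))

# Made-up band intensities (arbitrary units) for two transfections.
t2 = coprecipitation_index(cbp_ip=100, cbp_input=500,
                           stat1_ip=400, stat1_input=500)
t2_m2 = coprecipitation_index(cbp_ip=300, cbp_input=500,
                              stat1_ip=400, stat1_input=500)
print(f"TgIST-T2: {t2:.2f}  TgIST-T2-M2: {t2_m2:.2f}")
```

Dividing by STAT1 recovery guards against differences in pull-down efficiency between samples, which matters when the claim is that less CBP/p300 accompanies the same amount of STAT1.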

MS/MS Analysis
Dynabeads from IP experiments were suspended in ammonium bicarbonate, then reduced with 2 mM DTT at 37°C for 1 h. Subsequently, alkylation was performed in 10 mM iodoacetamide for 20 min at 22°C in the dark. Sample digestion by trypsin was carried out overnight at 37°C, followed by drying and redissolving in 2.5% acetonitrile and 0.1% formic acid. Samples were analyzed by nanoLC-MS/MS using a 2 hr gradient on a 0.075 mm x 250 mm C18 Waters CSH column feeding into a Q-Exactive HF mass

performed by the SWISS-MODEL server using the STAT1 dimer structure (PDB: 1BF5) as a template. All models were subsequently aligned and visualized by PyMOL 42 .

The SH2 domains of STAT1 (PDB: 1BF5), STAT3 (PDB: 1BG1) and STAT6 (PDB: 5D39) were obtained from their corresponding PDB files and imported into the DALI server 43 for structural comparison. Amino acids representing conserved secondary structures defined by the DALI server were manually highlighted.
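
For the MS/MS sample preparation described above, alkylation with iodoacetamide and digestion with trypsin determine which peptides and masses a database search should expect. The sketch below illustrates the two standard rules involved: trypsin cleaves C-terminal to lysine or arginine except when the next residue is proline, and iodoacetamide adds a carbamidomethyl group (+57.02146 Da) to each cysteine. It is an illustration only, not the search software or settings used in the study; the example sequence is hypothetical and missed cleavages are ignored.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide (free termini)
CARBAMIDOMETHYL = 57.02146  # iodoacetamide adduct on cysteine

def tryptic_digest(sequence):
    """Split a protein sequence after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides

def peptide_mass(peptide, alkylated=True):
    """Monoisotopic neutral mass, with cysteines alkylated by default."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    if alkylated:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass

# Hypothetical sequence chosen to show the proline rule and one cysteine.
for pep in tryptic_digest("AAKPGRCDEKM"):
    print(pep, round(peptide_mass(pep), 4))
```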

Immunofluorescence Microscopy
HeLa cells were grown on glass coverslips (for qualitative assays) or 96-well optically clear plates (Greiner) (for automated quantitative analysis). For conventional microscopy, transfected cells were fixed with 4% formaldehyde in PBS, permeabilized with 0.1% Triton X-100, blocked with 5% fetal bovine serum and 5% normal goat serum, labeled with primary antibodies followed by Alexa Fluor-conjugated secondary antibodies, and mounted with ProLong Gold with DAPI. Images were captured and analyzed with a Zeiss

quantitative assays, cells were stained with primary antibodies followed by Alexa Fluor-conjugated secondary antibodies and finally stained with DAPI. Images were captured using a Cytation 3 multimode plate imager with Gen5 software (BioTek). Following the capture of fluorescence signals using the Cytation 3, the intensity of IRF1 was analyzed using CellProfiler software 44 . To measure the IRF1 intensity only in TgIST-transfected HeLa cells, the GFP channel was analyzed to identify transfected cells, followed by measurement of IRF1 in regions defined by DAPI-positive nuclei.
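
The quantification logic described above (select transfected cells by GFP, then measure IRF1 within DAPI-defined nuclei) can be illustrated with a toy example. This is a simplified sketch in plain Python, not the CellProfiler pipeline used in the study. It assumes that nuclei have already been segmented into an integer label image (0 for background), that GFP is scored within the nuclear region, and that a fixed GFP threshold separates transfected from untransfected cells. The function name, threshold, and pixel values are all hypothetical.

```python
from statistics import mean

def irf1_in_transfected_nuclei(nuclei_labels, gfp, irf1, gfp_threshold):
    """Return {nucleus_id: mean IRF1 intensity} for GFP-positive nuclei.

    nuclei_labels, gfp and irf1 are same-shaped 2D lists; label 0 marks
    background pixels that lie outside any DAPI-positive nucleus.
    """
    pixels = {}
    for label_row, gfp_row, irf1_row in zip(nuclei_labels, gfp, irf1):
        for label, g, r in zip(label_row, gfp_row, irf1_row):
            if label != 0:
                pixels.setdefault(label, []).append((g, r))

    result = {}
    for label, values in pixels.items():
        # Keep only nuclei whose mean GFP signal marks the cell as transfected.
        if mean(g for g, _ in values) >= gfp_threshold:
            result[label] = mean(r for _, r in values)
    return result

# Toy 2 x 5 field: nucleus 1 is GFP-positive, nucleus 2 is not.
nuclei = [[1, 1, 0, 2, 2],
          [1, 1, 0, 2, 2]]
gfp    = [[90, 110, 5, 10, 12],
          [100, 100, 4, 8, 10]]
irf1   = [[20, 22, 3, 80, 82],
          [18, 20, 2, 78, 80]]

print(irf1_in_transfected_nuclei(nuclei, gfp, irf1, gfp_threshold=50))
```

In this toy field the GFP-positive nucleus has low IRF1 while the untransfected neighbor is excluded from the measurement, which mirrors the comparison the assay is designed to make.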

Protein Expression and Purification of TgIST and STAT1
A portion of the STAT1 protein corresponding to residues S132-H713 25 and a locked dimer mutant containing two mutations, A656C and N658C (designated as STAT1cc) 26 , were expressed in E. coli as recombinant proteins. A modified pET-15b plasmid that encodes both TgIST and STAT1 with separate T7 promoters and terminators (also see