HAP40 orchestrates huntingtin structure for differential interaction with polyglutamine expanded exon 1

Huntington’s disease results from expansion of a glutamine-coding CAG tract in the huntingtin (HTT) gene, producing an aberrantly functioning form of HTT. Both wildtype and disease-state HTT form a hetero-dimer with HAP40 of unknown functional relevance. We demonstrate in vivo that HTT and HAP40 cellular abundance are coupled. Integrating data from a 2.6 Å cryo-electron microscopy structure, cross-linking mass spectrometry, small-angle X-ray scattering, and modeling, we provide a near-atomic-level view of HTT, its molecular interaction surfaces and compacted domain architecture, orchestrated by HAP40. Native mass-spectrometry reveals a remarkably stable hetero-dimer, potentially explaining the cellular inter-dependence of HTT and HAP40. The polyglutamine tract containing N-terminal exon 1 region of HTT is dynamic, but shows greater conformational variety in the mutant than wildtype exon 1. By providing novel insight into the structural consequences of HTT polyglutamine expansion, our data provide a foundation for future functional and drug discovery studies targeting Huntington’s disease.

The autosomal dominant neurodegenerative disorder Huntington's disease (HD) is caused by the expansion of a CAG repeat tract at the 5' end of the huntingtin gene above a critical threshold of ~35 repeats [1]. CAG tract expansion corresponds to an expanded polyglutamine tract of the Huntingtin (HTT) protein which functions aberrantly compared to its unexpanded form [2]. Polyglutamine expanded HTT is thought to be responsible for disrupting a wide range of cellular processes including proteostasis [3,4], transcription [5,6], mitochondrial function [7], axonal transport [8] and synaptic function [9]. HD patients experience a range of physical, cognitive and psychological symptoms and longer repeat expansions are associated with earlier disease onset [10]. The prognosis for HD patients is poor, with an average life expectancy of just 18 years from the point of symptom onset and a continuous deterioration of quality of life through this manifest period. There are currently no disease-modifying therapies available to HD patients.

Huntingtin (HTT) is a 3144 amino acid protein comprised of namesake HEAT (Huntingtin, Elongation factor 3, protein phosphatase 2A, TOR1) repeats and is hypothesised to function as a scaffold for larger multi-protein assemblies [11,12]. Many proteomics and interaction studies suggest HTT has an extensive interactome of hundreds of proteins but the only biophysically and structurally validated interactor of HTT is the so-called 40-kDa huntingtin-associated protein HAP40 [13,14], an interaction partner conserved through evolution [15,16]. HAP40 is a TPR domain protein with suggested functions in endocytosis [17][18][19]. An earlier 4 Å mid-resolution cryo-electron microscopy (cryo-EM) model of HTT in complex with HAP40 reveals that the HEAT subdomains of HTT wrap around HAP40 across a large interaction interface [20].
Biophysical and biochemical analyses comparing purified HTT and HTT-HAP40 samples have revealed that HAP40-bound forms of HTT exhibit reduced aggregation propensity, greater stability and monodispersity as well as conformational homogeneity [20,21]. Consequently, apo HTT is a more difficult sample to work with for structural and biophysical characterisation, and several studies to date have required cross-linking approaches to constrain the HTT molecule to facilitate its analysis, suggesting HTT-HAP40 interactions may stabilize HTT [22,23]. The biological function of the HTT-HAP40 complex, however, remains elusive, and it is not clear if the function of this complex differs from apo HTT in vivo. It is also not yet understood whether HTT is constitutively bound to HAP40 or whether apo and HAP40-bound forms of HTT perform different functions in the cell.

[...] intrinsically disordered region (IDR), which spans residues 407-665, is subject to a range of post-translational modifications, is postulated to be critical in mediating various protein interactions [21,27,28], and is also unresolved in the cryo-EM structure. Understanding the function of both wildtype and expanded forms of HTT is critical as many potential HD treatments currently under clinical investigation aim to lower HTT expression, using either allele-selective or non-selective approaches [29]. Deeper biological insight into the determinants of cellular HTT protein levels, as well as normal and expanded HTT cellular function, would help direct which approaches should be prioritised for long-term patient therapies.

Here, we report in vivo studies that show a strong correlation of HTT and HAP40 levels in different genetic backgrounds, providing evidence for the importance of the HTT-HAP40 complex in a physiological setting.
Combining the power of multiple complementary structural techniques, we shed light on the missing regions of our high-resolution (2.6 Å) model of HTT-HAP40, including the biologically critical exon 1 region of HTT and the N-terminal region of HAP40. We demonstrate the remarkable stability of the HTT-HAP40 complex, potentially explaining in vivo codependence of these two proteins and providing important insight for future drug developments in pursuit of treating HD.

HTT and HAP40 protein levels correlate in vivo.

The Huntingtin-associated protein HAP40 co-evolved with HTT [15] and a HAP40 orthologue has been identified in many species, including invertebrates [16]. To investigate the in vivo relationship and hypothesised codependency of HTT and HAP40, we analysed the levels of both proteins in liver tissue from different mouse lines using western blot analysis (Figure 1). Comparing wildtype (WT) mice, Htt Q111/+ Huntington's knock-in mice [30] which express slightly lower levels of HTT [31], and hepatocyte-specific Htt knock out mice, a statistically significant correlation was observed for the levels of HTT and HAP40.

Figure 1. HAP40 levels correlate with the levels of HTT in vivo. (a) (i) HTT and (ii) HAP40 levels were quantified in mouse liver lysates by western blot in wildtype (WT), Htt Q111/+ and hepatocyte-specific knockout (LKO) mice. Hepatocytes constitute approximately 80% of liver mass [32] and an approximately 80% reduction in HTT levels is observed in the hepatocyte-specific LKO liver tissue as expected. (b) HTT and HAP40 levels correlate in these models with statistical significance.
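
As an illustration of the kind of analysis summarised in Figure 1b, the following minimal sketch computes a Pearson correlation coefficient between paired HTT and HAP40 band intensities. The text does not state which correlation statistic was used, so Pearson's r is an assumption here, and the intensity values are hypothetical placeholders, not data from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical normalised western blot intensities (NOT data from this study):
# two WT, two Htt Q111/+ and two LKO animals, in that order.
htt_levels = [1.00, 0.95, 0.80, 0.78, 0.22, 0.18]
hap40_levels = [1.00, 0.90, 0.85, 0.75, 0.30, 0.25]

r = pearson_r(htt_levels, hap40_levels)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that HAP40 levels track HTT levels across genotypes; a significance test on r would be a separate step.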

High-resolution structure of HTT-HAP40 Complex

HTT-HAP40 was expressed in insect cells and purified as previously described [21]. We determined the structure of HTT-HAP40 (PDBID: 6X9O) to a nominal resolution of 2.6 Å using cryo-EM (Figure 2a, Figure 2b and Supplementary Figure 1) [...] of the HTT-HAP40 complex, including exon 1 and the IDR, were not resolved in our high-resolution maps (Figure 2c). However, our improved resolution permits more confident positioning of amino acid side chains of the protein structure resolved in the maps and more precise analysis of the different features of the structure.

The overall structure of the complex is similar to the previously published model (PDBID: 6EZ8) with an RMSD of 1.9 Å across the models when superposed. However, key differences exist between the two models (Figure 2d). Two additional C-terminal α-helices in the HTT C-HEAT domain spanning residues 3105-3137 are resolved in our model (all residue numbering based on HTT NCBI reference NP_002102.4 sequence), whereas the resolution of two N-terminal α-helices of HAP40 spanning residues 42-82 is lost. The unmodified native HAP40 C-terminus in our model is able to thread into the centre of the C-HEAT domain (Figure 2e). This extended interaction of HAP40 with HTT may be responsible for a small shift we observe of the C-HEAT domain, which pivots ~5° relative to the previous model, reducing the interaction interface of HTT-HAP40 from ~5350 Å² to ~4700 Å². One potential reason for this difference is that the C-terminus of HAP40 in our construct is unmodified whereas Guo and colleagues used a C-terminal Strep-tag in their expression construct which is unresolved in their model.
The differences observed for the HTT and HAP40 interface when comparing our high-resolution structural model (PDBID: 6X9O) and the previous mid-resolution model (PDBID: 6EZ8) indicate that the extensive interaction interface is able to accommodate some variation.

Our high-resolution model enables a comprehensive analysis of the surface-charge features of the HTT-HAP40 complex. The HTT-HAP40 interface is predominantly formed by extensive hydrophobic interactions between the two proteins (Figure 2f). Previous analysis of this interface has also highlighted a charge-based interaction between the BRIDGE domain of HTT and the C-terminal region of the HAP40 TPR domain [20]. Interestingly, the N-HEAT domain of HTT has a defined positively charged tract spanning almost 40 Å in length and 5-10 Å in width formed between two stacked HEAT repeats in the N-HEAT solenoid (Figure 2f arrow). We also conducted an in-depth sequence conservation analysis of both HTT and HAP40, which we mapped to the high-resolution structure of the complex. Interestingly, this revealed surfaces of the protein on the HAP40-exposed face as highly conserved, with extended regions of strict conservation partially spanning the C-HEAT domain, BRIDGE and N-HEAT (Figure 2g). However, the opposite face is less conserved, whilst the HTT-HAP40 interface is moderately conserved for both HTT and HAP40. The HTT-HAP40 model was searched for ligand-able pockets which were assessed for druggability according to various factors, including their buriedness, hydrophobicity and volume. One of the most promising pockets, which is predicted to be ligand-able, lies at the HTT-HAP40 interface and is lined by residues from the N-terminal region of the HAP40 TPR domain as well as the HTT N-HEAT domain (Figure 2h, Supplementary Table 2).
The high resolution of our HTT-HAP40 model provides a foundation for virtual screening of such pockets and other structure-based drug-discovery efforts towards the identification of HTT ligands.
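
As a reading aid for the RMSD value quoted in this section for the comparison with the 6EZ8 model, the sketch below shows how a root-mean-square deviation is computed over matched atom positions once two models have already been superposed. It does not perform the superposition itself, it is not the procedure or software used for the comparison in the text, and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equal-length lists of (x, y, z) positions.

    Assumes the two coordinate sets are already superposed and that
    position i in coords_a corresponds to position i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical Cα positions (Å) for three matched residues in two models
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 2.0, 0.0), (3.8, 2.0, 0.0), (7.6, 2.0, 0.0)]

print(f"RMSD = {rmsd(model_a, model_b):.2f} Å")
```

Because every atom contributes its squared displacement, a small number of large local differences can dominate the value, which is why a low global RMSD can coexist with the local differences described above.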

Our 2.6 Å structure is of sufficient resolution to allow the identification of post-translational modifications (PTMs). However, no PTMs were observed for any of the resolved residues in the HTT-HAP40 complex. Native mass spectrometry (MS) analysis, on the other hand, revealed the high purity of our HTT-HAP40 samples, although a small mass difference (compared to the theoretical mass) was observed, consistent with the presence of a few PTMs (Supplementary Figure 2a). Further analysis of the HTT-HAP40 complex upon Caspase6 digestion revealed these PTMs to be primarily phosphorylations (at least two), which could be mapped to the regions spanning 586-2647 and 2647-3144 of the HTT sequence (Supplementary Figure 2b, c and d). Based on the cumulative evidence from the MS data, these modifications reside within the two flexible portions of HTT not resolved in our cryo-EM maps. Although many studies have identified numerous different sites and possible PTMs of the HTT protein [21,27,28,34], these approaches have so far been qualitative and do not give us a good understanding of the key proteoforms the Huntington's disease community is studying in either in vitro or in vivo models. Our quantitative top- and middle-down MS approaches suggest many post-translational modifications are in fact only present at very low abundance, at least in our insect cell expressed samples.

We attempted to separately purify HTT and HAP40 for comparison to the complex. As reported by Guo and colleagues [20], we were also unable to express recombinant HAP40 alone, although it is readily expressed in the presence of HTT, a trend that parallels our in vivo observations. In the absence of HAP40, we and others have shown that recombinant HTT self-associates and is conformationally heterogeneous in vitro [21,22,34].
Cryo-EM analysis of our apo HTT samples yielded a 12 Å resolution envelope (Figure 3a and b) [...] using collisions with neutral gas molecules typically results in dissociation of a non-covalent complex into constituent subunits. Interestingly, our native top-down MS analysis of the intact HTT-HAP40 complex (Figure 4a and b) primarily resulted in backbone fragmentation of HTT, eliminating both N- and C-terminal fragments (Figure 4c-g). Remarkably, the vast majority of concomitantly formed high-mass dissociation products retained HAP40 (Figure 4f), suggesting that the extensive hydrophobic interaction interface we observe in our high-resolution model keeps the HTT-HAP40 complex exceptionally stable. Similarly, gas-phase activation of Caspase6-treated HTT-HAP40 revealed that HAP40 remained intact and bound to HTT even at the highest activation energies, whereas the N- and C-terminal fragments of HTT produced upon digestion readily dissociated from the complex (Supplementary Figure 2c).

The recombinant samples of HTT-HAP40 were found to be highly monodisperse (Figure 4b), displaying optimal biophysical properties (see also Supplementary Figure 3a). Systematically screening the stability of the HTT-HAP40 complex using a differential scanning fluorimetry (DSF) assay indicates the complex is highly stable under a broad range of buffer, pH and salt conditions (Supplementary Figure 3b and c). Destabilisation of the complex was only observed at low pH (Figure 4h). Similarly, the interaction between HTT and HAP40 is retained upon mild proteolysis of the complex [...]

Next, we sought to understand how the disease-causing polyglutamine expansions affect HTT structure. Our structural, biophysical and biochemical data presented so far focus on wildtype HTT (23 glutamines; Q23) and illustrate the importance of HAP40 in stabilising and orienting the HEAT repeat subdomains of HTT. However, 25% of the complex is not resolved in the cryo-EM maps, including many functionally important regions of the protein such as exon 1 (residues 1-90), which harbors the polyglutamine repeat region, and the IDR (residues 407-665). To further investigate the HTT protein structure in its entirety and the influence of polyglutamine expansion within exon 1, we repeated the DSF and proteolysis studies using HTT-HAP40 samples containing either a pathological Huntington's disease HTT with 54 glutamines (Q54), or an HTT with a partially deleted exon 1 (Δexon 1; comprising residues 80-3144, missing N17, polyglutamine and proline-rich domain). We found that neither the Q54 expansion nor the removal of exon 1 had detectable effects on the stability of the HTT-HAP40 complexes compared to the canonical Q23 complex (Supplementary Figure 3).

To better describe the structure of exon 1 and the effects of the polyglutamine expansion on the HTT-HAP40 complex, we performed cross-linking mass spectrometry (XL-MS) experiments [36] using the IMAC-enrichable lysine cross-linker, PhoX [37]. For Q23, Q54 and Δexon 1 isoforms of HTT-HAP40, we mapped approximately 120 cross-links for each sample (Supplementary Data File 7). Importantly, the vast majority of cross-links map to regions unresolved in the cryo-EM maps (Figure 5a), thereby providing valuable restraints for structural modeling of a more complete HTT-HAP40 complex. The mean distance of cross-links observed for resolved regions of the cryo-EM model was significantly below the 25 Å distance limit of PhoX in all three datasets (Q23: 7 cross-links, mean distance 13.7 Å; Q54: 11 cross-links, mean distance 14.8 Å; Δexon 1: 12 cross-links, mean distance 14.9 Å; Supplementary Data File 7). This, together with mass photometry data of cross-linked HTT-HAP40, indicates that there is a low probability of intermolecular cross-links between HTT molecules, e.g. from aggregation, being included in our datasets (Supplementary Figure 4a).
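
The distance check described above can be sketched as follows: for each cross-link whose two residues are both resolved in a structural model, compute the Cα-Cα distance and compare it with the 25 Å PhoX limit quoted in the text. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline; the residue numbers and coordinates are hypothetical, and using Cα atoms as the reference points is an assumption.

```python
import math

PHOX_MAX_DISTANCE = 25.0  # Å, distance limit for PhoX quoted in the text

def crosslink_distances(coords, crosslinks):
    """Euclidean Cα-Cα distance (Å) for each cross-linked residue pair."""
    return [math.dist(coords[a], coords[b]) for a, b in crosslinks]

# Hypothetical Cα coordinates (Å), keyed by (chain, residue number)
coords = {
    ("HTT", 100): (0.0, 0.0, 0.0),
    ("HTT", 250): (12.0, 5.0, 0.0),
    ("HTT", 900): (0.0, 30.0, 0.0),
    ("HAP40", 200): (3.0, 4.0, 12.0),
}

# Hypothetical cross-links between resolved lysines
crosslinks = [
    (("HTT", 100), ("HTT", 250)),
    (("HTT", 100), ("HAP40", 200)),
    (("HTT", 100), ("HTT", 900)),
]

distances = crosslink_distances(coords, crosslinks)
mean_distance = sum(distances) / len(distances)
# Cross-links longer than the limit are candidates for intermolecular
# contacts or for conformations not captured by the static model.
violations = [d for d in distances if d > PHOX_MAX_DISTANCE]

print(f"mean distance: {mean_distance:.1f} Å, violations: {len(violations)}")
```

A mean distance well below the limit, with few or no violations, is the pattern that supports the cross-links being intramolecular.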

Overall, we obtained very similar cross-link data for the three different HTT-HAP40 constructs (Figure 5b). However, of particular note are the large number of exon 1 PhoX cross-links in the HTT-HAP40 Q23 and Q54 samples mediated via lysine-6 or lysine-9 within the N-terminal 17 residues (N17 region) of exon 1. N17 is reported to play key roles for the HTT protein including modulating cellular localisation, aggregation and toxicity [38-40] and is proposed to interact with distal parts of HTT [41].

For both samples (Q23 and Q54), N17 is found to contact several regions of the N-HEAT domain as well as the cryo-EM unresolved N-terminal region of HAP40, via lysine-32 and lysine-40. Interestingly, N17 of Q54 showed additional cross-links to the more distant C-HEAT domain (Figure 5b, Supplementary Figure 4b). Finally, the largest uninterrupted stretch of the HTT-HAP40 protein which is unresolved in the cryo-EM maps is the IDR. However, only a few PhoX cross-links are detected for it, even though this 258 aa region harbors 8 lysine residues.

Size-exclusion chromatography multi-angle light scattering (SEC-MALS) analysis of this same series of samples shows no significant difference in mass but does indicate a small shift in the peak for the elution volume of the HTT-HAP40 Δexon 1 complex compared to Q23 and Q54 complex samples (Figure 6a). Together with the XL-MS data, this suggests that there are subtle structural differences between the Q23, Q54 and Δexon 1 HTT-HAP40 complexes. To further interpret the cross-linking data in the context of the 3D structure of the HTT-HAP40 complex, we performed SAXS analysis of our samples to assess any changes to their global structures. We have previously reported SAXS data for HTT-HAP40 Q23 [21].
This revealed that the particle size was significantly larger than the cryo-EM model, likely reflecting the ~25% of the protein that is not resolved in the cryo-EM maps and therefore not modeled in the structure. Similar analysis of HTT-HAP40 Q54 and HTT-HAP40 Δexon 1 and comparison with our previous Q23 data shows that polyglutamine expansion or deletion of exon 1 has only very modest effects on the SAXS profiles (Figure 6b, c and d). HTT-HAP40 Q54 is slightly larger than HTT-HAP40 Q23 whereas HTT-HAP40 Δexon 1 samples are slightly smaller, as might be expected, but overall the SAXS-determined parameters for the three samples are very similar (Figure 6e). In line with that, the SAXS-calculated particle envelopes for the three samples are also very similar in size and shape (Supplementary Figure 5a).

Next, we modelled the complete structures of HTT-HAP40, including flexible and disordered regions, integrating our cryo-EM, SAXS and XL-MS data. Coarse-grained molecular dynamics simulations were performed and an ensemble of models that best fit both the cross-linking and SAXS data was calculated for all three variants of the HTT-HAP40 complex (Supplementary Figure 5b and c). This modeling approach assumed that the residues with known coordinates in the cryo-EM model form a quasi-rigid complex, whereas the residues with missing coordinates are flexible. As expected from our cross-linking results, the conformations adopted by exon 1 in the ensemble model of the Q54 HTT-HAP40 complex are skewed compared to the Q23 ensemble, with exon 1 interacting with many more surfaces of the Q54 HTT-HAP40 complex (Figure 7a).
Mapping our PhoX exon 1 cross-linked residues for each sample to a representative model from each ensemble reveals how exon 1 Q23 cross-links are largely constrained to the N-HEAT domain whereas exon 1 Q54 cross-links are also found on the C-HEAT domain (Supplementary Figure 4b). Exon 1 of our HTT-HAP40 Q54 ensemble explores a larger volume of conformational space and this seems to have a knock-on effect on the conformational space occupied by the IDR (Figure 7b). Modeling of our HTT-HAP40 structure indicates that the exon 1 region of the Q23 HTT is long enough to make cross-links with the C-HEAT domain, but we do not observe such cross-links in our PhoX datasets (Supplementary Figure 5d). This suggests that the additional cross-links observed for the polyglutamine expanded form of HTT-HAP40 may not be driven solely by the length of the exon 1 region. For all ensembles the IDR is differentially constrained and occluded from adopting certain conformations depending on the conformational space occupied by exon 1, suggesting polyglutamine and exon 1-mediated structural changes propagate to the IDR. For the HTT-HAP40 Q54 model ensemble, where exon 1 adopts the most diverse conformations, the IDR is the most constrained, occupying a more restricted space. However, for the HTT-HAP40 Δexon 1 model ensemble, the IDR is not occluded and so adopts a much wider range of conformations.

Together, our data suggest that whilst polyglutamine expansion does not affect the core HEAT repeat structure, it does affect the conformational dynamics of not only the exon 1 region but also the IDR. [...]

We present unprecedented findings for the HTT-HAP40 structure, highlighting the close relationship between HTT and HAP40 as well as unveiling the effect of the polyglutamine expansion, thereby contributing to a richer understanding of HTT and its dependence on HAP40.

HTT is reported to interact with hundreds of different proteins [14] [...]

The structural differences of Q23, Q54 and Δexon 1 HTT-HAP40 samples are not resolved within the high-resolution cryo-EM maps we calculated. Our experiments using lower resolution structural methods such as SAXS and mass spectrometry, which do consider the complete protein molecule, also show modest differences between the samples. One way we might rationalise this observation with what we know about HD pathology and huntingtin biology in physiological conditions is that our experimental systems do not capture any subtle, low abundance or slowly occurring differences of the samples which could be important in HD progression that occurs very slowly, over decades of a patient's lifetime. Alternatively, it may be that models of HD pathogenesis which posit large changes in HTT's globular structure caused by polyglutamine expansion are incorrect.

Notwithstanding the above caveats, our cross-linking mass spectrometry studies provide some of the first insight into the structure of the exon 1 portion of the protein in the context of the full-length, HAP40-bound form of HTT. In both Q23 and Q54 samples, exon 1 appears to be highly dynamic and able to adopt multiple conformations. We demonstrate clear and novel structural differences between the unexpanded and expanded forms of exon 1 in the context of the full-length HTT protein, with expanded Q54 forms of exon 1 sampling different conformational space than unexpanded Q23. This is not just due to the additional length of this form of exon 1, conferring a higher degree of flexibility and extension to different regions of the protein, but perhaps also to some biophysical consequence of a longer polyglutamine tract.
This is the opposite of what has been reported for HTT exon 1 protein in isolation, where polyglutamine expansion compacts the exon 1 structure [42][43][44] [...]

[...] and eluted with a gradient from 50 mM KCl buffer to 1 M KCl buffer over 10 CV. All samples were purified with a final gel filtration step, using a Superose6 10/300 column in 20 mM HEPES pH 7.4, 300 mM NaCl, 1 mM TCEP, 2.5 % (v/v) glycerol. HTT-HAP40 samples were further purified with an additional Ni-affinity chromatography step prior to gel filtration. [...]

For cross-linking experiments, HTT-HAP40 samples (HTTQ23-HAP40, HTTQ54-HAP40, HTT Δexon 1-HAP40) were diluted to a protein concentration of 1 mg/mL using cross-linking buffer (20 mM HEPES pH 7.4, 300 mM NaCl, 2.5 % glycerol, 1 mM TCEP). HTT-HAP40 samples were treated with an optimised concentration of PhoX cross-linker to avoid protein aggregation (Supplementary Figure 4a). After incubation with PhoX (0.5 mM) for 30 min at RT, the reaction was quenched for an additional 30 min at RT by the addition of Tris-HCl (1 M, pH 7.5) to a final concentration of 50 mM. Protein digestion was performed in 100 mM Tris-HCl, pH 8.5, 1 % SDC, 5 mM TCEP and 30 mM CAA, with the addition of Lys-C and Trypsin proteases (1:25 and 1:100 ratio (w/w)) overnight at 37 °C. The reaction was stopped by addition of TFA to a final concentration of 0.1 % or until pH ~ 2. Next, peptides were desalted using an Oasis HLB plate, before IMAC enrichment of cross-linked peptides as previously described [37].

LC-MS analysis of cross-linked HTT-HAP40 samples

For LC-MS analysis, the samples were re-suspended in 2 % formic acid and analyzed using an UltiMate™ 3000 RSLCnano System (Thermo Fisher Scientific) coupled on-line to either a Q Exactive HF-X (Thermo Fisher Scientific), or an Orbitrap Exploris 480 (Thermo Fisher Scientific).
Firstly, peptides were trapped for 5 min in solvent A (0.1 % FA in water), using a 100-µm inner diameter 2-cm trap column (packed in-house with ReproSil-Pur C18-AQ, 3 µm) prior to separation on an analytical column (50 cm of length, 75 µm inner diameter; packed in-house with Poroshell 120 EC-C18, 2.7 µm). Peptides were eluted following a 45 min gradient from 9-35 % solvent B (80 % ACN, 0.1 % FA) or a 55 min gradient from 9-41 % solvent B. On the Q Exactive HF-X, full scan MS spectra from 375-1600 Da were acquired in the Orbitrap at a resolution of 60,000 with the AGC target set to 3 × 10⁶ and maximum injection time of 120 ms. For measurements on the Orbitrap Exploris 480, full scan MS spectra from 375-2200 m/z were acquired in the Orbitrap at a resolution of 60,000 with the AGC target set to 2 × 10⁶ and maximum injection time of 25 ms. Only peptides with charge states 3-8 were fragmented, and dynamic exclusion properties were set to n = 1, for a duration of 10 s (Q Exactive HF-X) or 15 s (Orbitrap Exploris 480) [...]

[...] consisting of 6000 final frames were recorded using AcquireMP software at a 100 Hz framerate. Particle landing events were automatically detected, amounting to ∼3000 per acquisition. The data were analyzed using DiscoverMP software. Average masses of HTT proteins and HTT-HAP40 complexes were determined by taking the value at the mode of the normal distribution fitted to the histograms of particle masses. Finally, a probability density function was calculated and drawn over the histogram to produce the final mass profile. Measurement and analysis of mass photometry data were done for the following samples: HTT-Q23-HAP40, HTT-Q54-HAP40, and HTT-∆exon [...]

Raw native MS and high-m/z native top-down MS data were processed with UniDec [50] to obtain zero-charged mass spectra.
Native top-down MS data recorded with high resolution (140,000) were deconvoluted using the Xtract algorithm within FreeStyle software (1.7SP1; Thermo Fisher Scientific). The resulting zero-charge fragments were matched to the theoretical fragments produced for HTT and HAP40 using in-house scripts with 5 ppm mass tolerance. Final visualization was performed in R extended with the ggplot2 library.

Cryo-EM sample preparation and data acquisition

HTT was diluted to 0.4 mg/ml in 20 mM HEPES pH 7.5, 300 mM NaCl, 1 mM TCEP and adsorbed to glow-discharged holey carbon-coated grids (Quantifoil 300 mesh, Au R1.2/1.3) for 10 s. Grids were then blotted with filter paper for 2 s at 100 % humidity at 4 °C and frozen in liquid ethane using a Vitrobot Mark IV (Thermo Fisher Scientific).

HTT-HAP40 was diluted to 0.2 mg/ml in 25 mM HEPES pH 7.4, 300 mM NaCl, 0.025 % w/v CHAPS, 1 mM DTT and adsorbed onto gently glow-discharged suspended monolayer graphene grids [...]

[...] real-space refinement in PHENIX v. [...]

SAXS experiments were performed at beamline 12-ID-B of the Advanced Photon Source (APS) at Argonne National Laboratory. The energy of the X-ray beam was 13.3 keV (wavelength λ = 0.9322 Å), and two setups (small- and wide-angle X-ray scattering) were used simultaneously to cover a scattering q range of 0.006 < q < 2.6 Å⁻¹, where q = (4π/λ)sinθ, and 2θ is the scattering angle. For HTT-HAP40 Q54, thirty two-dimensional images were recorded for buffer or sample solutions using a flow cell, with an exposure time of 0.8 s to reduce radiation damage and obtain good statistics. The flow cell is made of a cylindrical quartz capillary 1.5 mm in diameter with a 10 µm wall thickness. Concentration-series measurements for this sample were carried out at 300 K with concentrations of 0.5, 1.0, and 2.0 mg/ml, in 20 mM HEPES, pH 7.5, 300 mM NaCl, 2.5% (v/v) glycerol, 1 mM TCEP. No radiation damage was observed, as confirmed by the absence of systematic signal changes in sequentially collected X-ray scattering images. The 2D images were corrected for the solid angle of each pixel, and reduced to 1D scattering profiles using the Matlab software package at the beamline. The 1D SAXS profiles were grouped by sample and averaged.
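
The relationship between beam energy, wavelength and momentum transfer stated above can be checked with a short calculation. The sketch below converts the 13.3 keV beam energy to a wavelength using λ = hc/E and evaluates q = (4π/λ)sinθ, with 2θ the scattering angle, as defined in the text. The function names are illustrative only and are not part of any beamline software.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å

def wavelength_angstrom(energy_kev):
    """X-ray wavelength (Å) for a photon energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev

def q_from_angle(two_theta_deg, wavelength):
    """Momentum transfer q (Å⁻¹) for a scattering angle 2θ given in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi / wavelength * math.sin(theta)

def angle_from_q(q, wavelength):
    """Scattering angle 2θ (degrees) corresponding to a given q (Å⁻¹)."""
    return 2.0 * math.degrees(math.asin(q * wavelength / (4.0 * math.pi)))

lam = wavelength_angstrom(13.3)
print(f"wavelength = {lam:.4f} Å")

# Scattering angles that bound the q range quoted in the text
for q in (0.006, 2.6):
    print(f"q = {q} Å⁻¹ corresponds to 2θ = {angle_from_q(q, lam):.3f}°")
```

The low-q limit corresponds to a very small scattering angle, which is why a dedicated small-angle detector setup is combined with a wide-angle one to cover the full range.
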

For HTT-HAP40 Δexon 1, data were collected using an in-line FPLC AKTA micro setup with a Superose6 Increase 10/300 GL size exclusion column in 20 mM HEPES, pH 7.5, 300 mM NaCl, 2.5% (v/v) glycerol, 1 mM TCEP. A 150 µL sample loop was used and the stock sample concentration was 5 mg/ml. The sample passed through the FPLC column and was fed to the flow cell for SAXS measurements. The SAXS data were collected every 2 seconds and the X-ray exposure time was set to 0.75 seconds. Only the SAXS data collected above the half maximum of the elution peak, about 50-100 frames, were averaged and used for further analysis. Background data were collected before and after the peak (100 frames each); the data before the peak were found to be better and were used for the background subtraction.

SAXS data were analyzed with the software package ATSAS 2.8. The experimental radius of gyration, Rg, was calculated from data at low q values using the Guinier approximation. The pair distance distribution function, P(r), the maximum dimension of the protein, Dmax, and Rg in real space were calculated with the indirect Fourier transform using the program GNOM [60]. Estimation of the molecular weight of samples was obtained by both SAXSMoW [61,62] and by using volume of correlation, Vc [63]. The theoretical scattering intensity of the atomic structure model was calculated using FoXS [64]. Ab-initio shape reconstructions (molecular envelopes) were performed using both bead modeling with DAMMIF [65] and calculating 3D particle electron densities directly from SAXS data with DENSS [66].

Coarse-grained molecular dynamics simulations

We used a Gō-like coarse-grained model of HTT/HAP40 for structural modeling of the complex as described previously [21]. We built two different models based on two experimental EM structures of the complex (PDBIDs: 6EZ8 and 6X9O, respectively).
We used experimentally observed cross-links to improve the sampling of the flexible regions of the model by introducing in the force field a distance restraint term given by the following potential: