Summary
R-Smads are effectors of the transforming growth factor β (TGFβ) superfamily and, along with Smad4, form trimers to interact with DNA. The 5GC-DNA complexes determined here by X-ray crystallography for Smad5 and Smad8 proteins corroborate that all MH1 domains bind SBE and 5GC sites similarly, although Smad2/3/4 MH1 domains bind DNA as monomers whereas Smad1/5/8 form helix-swapped dimers. To examine the relevance of the dimerization phenomenon and to exclude a possible crystallography-induced dimeric state, we studied these MH1 domains in solution. As in the crystals, Smad5/8 domains populate dimers and open monomers in equilibrium, whereas Smad2/3/4 domains adopt monomeric closed conformations. We also found that swapping the loop1 sequence between Smad5 and Smad3 results in the chimera-DNA complex crystallizing as a monomer, revealing that the loop1 sequence determines the monomer/dimer propensity of Smad MH1 domains.
We propose that distinct MH1-dimerization status of TGFβ and BMP activated Smads influences the interaction with specific loci genome-wide by distinct R-Smad and Smad4 complexes.
Significance
TGFβ- and BMP-activated R-Smads were believed to have different preferences with respect to the recognition of DNA motifs and to respond to specific activation inputs. However, recent results indicate that several types of R-Smads can be activated by similar receptors and that all Smads might recognize various DNA motifs. These results pose new questions as to why different types of R-Smads have been conserved for more than 500 million years if they could have a redundant function. They also raise questions as to how different Smad complexes recognize specific clusters of DNA motifs genome-wide.
Here, using structural biology approaches, we elucidate some of the rules that help define dimers of Smad-DNA complexes and propose how these complexes could influence the recognition of specific cis regulatory elements genome-wide.
Highlights
R-Smads and Smad4 interact with GGCGCx and GTCT sites using a conserved binding mode.
Functional differences of TGFβ- and BMP-activated R-Smads are not exclusively related to DNA specificity.
Dimer/monomer propensities are detected in solution and in the absence of DNA.
Introduction
The gene responses activated by the TGFβ cytokine family (a term that includes the transforming growth factor β, bone morphogenetic proteins (BMP), Nodal, Activin and other members) play important roles in embryo development, apoptosis, tissue homeostasis, repair, and immunity (1, 2). These critical roles demand a high level of conservation and fidelity of the TGFβ signaling elements in healthy organisms (1).
The main TGFβ signal transduction mechanism is the Smad pathway, with Smad transcription factors being responsible for the transmission of the signals from the membrane receptor into the nucleus (3). Receptor-activated Smads (R-Smads) and Smad4 (Co-Smad) are versatile proteins. They all contain a DNA-binding domain (MH1) and a protein-protein interaction region composed of the linker and the MH2 domain (4, 5). The MH1 and MH2 domains are highly conserved across Smad proteins and along evolution, whereas the linker has a higher sequence variability. R-Smad linkers contain PY motifs and phosphorylatable Ser/Thr residues, which are recognized by cofactors containing pairs of WW domains (Supplementary Figure S1A) (6, 7). After being phosphorylated at the MH2 domains by the TGFβ receptor, activated R-Smads interact with Smad4 and define the canonical trimeric functional unit. Once in the nucleus, and upon linker phosphorylation, the trimeric Smad complex is ready to define a new set of interactions with cofactors and with cis regulatory elements, interactions that go on to modulate the outcome of the signaling network (8–10).
R-Smad proteins were believed to have different specificities regarding the recognition of DNA motifs and to respond to specific BMP- and TGFβ-activation inputs (11). Initial hypotheses suggested that the TGFβ-activated Smads (Smad2/3) and Smad4 showed a preference for the GTCT site (known as the Smad Binding Element, SBE), whereas the BMP-activated Smads (Smad1/5/8) preferred GC-rich motifs. However, the sequence conservation of the MH1 domains (Supplementary Figure S1B) and recent experimental evidence indicate that the separation between DNA binding preferences of R-Smads is subtler than initially thought. For instance, combined TGFβ and BMP receptors influence Smad1/5-driven responses (12) and the MH1 domains of Smad3 and Smad4 proteins interact efficiently and specifically with GC-rich motifs grouped in the 5GC consensus GGC(GC)|(CG) (13). This 5GC consensus is functionally relevant for TGFβ-activated Smads and for Smad4, and it overlaps with the palindromic and compressed 6-BRE site GGCGCC, previously defined as the GC-rich target sequence of BMP-activated Smads (14). Crystal structures of Smad2/3 and Smad4 bound to GTCT and 5GC sites, as well as those of Smad1 and Smad5 bound to the GTCT site, have been determined (13, 15–18). These structures reveal that all R-Smads and Smad4 MH1 domains are able to interact with the SBE, as well as with specific 5GC sites in vitro, and perhaps also in vivo, as the analysis of numerous ChIP-Seq experiments showed that cis regulatory elements bound by the different Smad proteins are enriched in clusters of SBE and 5GC sites (13). In all these structures, the Smad proteins interact with the 5GC and SBE sites using a distinctive binding mode.
Notably, while keeping the same DNA binding characteristics, these crystal structures showed that Smad3 and Smad4 MH1 domains adopted closed conformations (13, 15, 18), whereas Smad1 and Smad5 domains adopted a dimer organization, where the α1 helix of one monomer was interchanged with the α1’ of a second monomer (16). To add more complexity to these observations, a structure of Smad5 bound to the palindromic compressed BRE has been determined (17). This model contains some parts that are not well-defined and shows the fewest specific hydrogen bonds between conserved protein residues and DNA bases of all Smad complexes.
In the search for new clues to clarify how BMP-activated Smad proteins interact with non-compressed GC sites and to decipher the characteristics that define monomers and dimers of MH1 domains, we focused our attention on the BMP-activated Smads, which are prone to form dimers in the presence of DNA. We set out to study the interaction of Smad5/8 MH1 domains with non-compressed GC sites using X-ray crystallography, as well as to examine the conformations of these MH1 domains in solution and in the absence of DNA. These structures confirmed that MH1 domains (dimers) of BMP-activated Smads interact efficiently with a non-compressed 5GC site using the canonical protein-DNA binding mode and covering seven base pairs with a single MH1 domain. Moreover, Nuclear Magnetic Resonance (NMR), Ion Mobility-Mass Spectrometry (IM-MS) and Small-Angle X-ray Scattering (SAXS) revealed open monomeric and dimeric conformations in equilibrium, thereby indicating that dimers are also present in non-crystallographic conditions. The analysis of these complexes and of the domains in solution also allowed us to clarify the molecular basis for dimer formation, which depends on the loop1 length and sequence. In fact, after replacing the loop1 sequence of Smad5 with that of Smad3, we shifted the monomer/dimer equilibrium towards a predominantly monomeric domain in crystals bound to DNA. The functional differences between Smads may therefore be independent of their capacity to interact with specific GC or GTCT motifs. Instead, these differences could have arisen from the dimerization propensity observed in Smad1/5/8 proteins, which is absent in Smad2/3 and Smad4.
In the context of full-length proteins, R-Smads form heterotrimers with other R-Smads or with Smad4 via contacts between their MH2 domains. Since not all possible R-Smads and Smad4 combinations are found in cellular experiments (19, 20), it is very tempting to suggest that other protein parts like the MH1 domains —and their capacity to form monomers or dimers— help define the selection of the Smad components for a given ternary complex.
Results
Smad5 and Smad8 complexes with the 5GC site
We first examined how Smad5/8 recognize 5GC-sites and if the recognition mode is similar to that described for other Smads (13, 16) or to the BRE-GC interaction (17). To test these hypotheses, we determined the structures of the Smad5 and Smad8 MH1 domains bound to the non-compressed 5GC GGCGC motif also using X-ray crystallography. For the constructs, we used the domain boundaries described in the Smad1/GTCT complex (16), which lack the first 10 protein residues since these residues were included in the Smad5/GTCT complex but were disordered (17). In both cases, the GTCT-bound Smad1/5 proteins were oriented as dimers and the interaction with SBE was unaffected by the presence or absence of these additional residues.
Before setting up the crystallization screenings, we studied the protein-DNA interactions by EMSA (Supplementary Figure S1C) and observed that the interaction with 5GC motifs occurs in the same concentration range as that observed for the SBE sequence. The best diffracting crystals were obtained with a 16-bp dsDNA, TGCAGGCGCGCCTGCA, containing the 5GC sequence (underlined). These crystals diffracted to 2.31 Å and 2.46 Å resolution for the Smad5 and Smad8 MH1 domains, respectively. We solved the Smad5/5GC complex by molecular replacement using a model derived from the Smad1/GTCT complex (PDB: 3KMP) and then used the Smad5 complex to refine that of Smad8 bound to the same DNA. In both complexes, the asymmetric unit (ASU, space group P2₁2₁2₁) contained a dimer of Smad MH1 domains bound to one 5GC site (Smad5 shown in gold and gray, Figure 1A, and Smad8 in violet and gray, Figure 1B), with the α1 helix being swapped between monomers. Crystallization conditions, data collection and statistics are shown in Table 1.
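As an illustration of how the motifs discussed in the text can be located in a duplex, the following minimal Python sketch scans both strands of the 16-bp crystallization oligonucleotide for the 5GC consensus GGC(GC)|(CG) and for the SBE (GTCT). The function and variable names are hypothetical, and this sketch is not part of the analysis performed in this work.

```python
# Motif words taken from the text: the 5GC consensus GGC(GC)|(CG) and the SBE (GTCT).
# All names below are hypothetical and chosen for illustration only.
MOTIFS = {"5GC": ("GGCGC", "GGCCG"), "SBE": ("GTCT",)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def scan(seq):
    """Return sorted (motif, strand, start) hits.

    'start' is the 1-based leftmost position of the hit in top-strand coordinates,
    for hits on either strand. Overlapping hits are all reported.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for name, words in MOTIFS.items():
            for word in words:
                k = len(word)
                for i in range(n - k + 1):
                    if s[i:i + k] == word:
                        # Map bottom-strand hits back to top-strand coordinates.
                        start = i + 1 if strand == "+" else n - i - k + 1
                        hits.append((name, strand, start))
    return sorted(hits, key=lambda h: (h[2], h[1]))


OLIGO = "TGCAGGCGCGCCTGCA"  # 16-bp duplex used for crystallization (top strand)
hits = scan(OLIGO)
for hit in hits:
    print(hit)
```

Because this oligonucleotide is palindromic, the scan reports one GGCGC word on each strand (top-strand positions 5-9 and 8-12) and no GTCT site.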
The electron densities for the Smad5 and Smad8 proteins and the bound DNA are well defined (Supplementary Figures S1D,E,F). The Smad5 and Smad8 MH1 structures display all the characteristic features of MH1 domains (5, 13). They are composed of four helices (arranged as a four-helical bundle) and three anti-parallel pairs of short strands (β1-β5, β2-β3, and β4-β6), and the fold is stabilized by a Zn2+ ion tightly coordinated by three cysteines and one histidine.
The protein-DNA binding region comprises the loop following the β1 strand and the β2–β3 hairpin (residues 70–83, highlighted in blue, Figure 1C). This hairpin contains the Arg75, Gln77 and Lys82 residues, which are strictly conserved in all MH1 domains. These residues interact directly with the major groove through a network of hydrogen bonds (HBs) with the first four consecutive base pairs of the GGCGCg motif (shown in graphite, Figure 1C). The complex is further reinforced by a set of HB interactions between Ser79, Leu72 and Gln77 (backbone atoms) and His101 and His102 (side chains) with the G8’, G10’ and C3 bases (Figure 1C, middle and right). There is also a set of nine well-ordered water molecules bound at the protein-DNA interface that contribute to the stability of the complex (Supplementary Table S1). Similar interactions are observed for the Smad8-5GC complex (Figure 1D). When superimposed, the Smad5 and the Smad8 MH1 domains are nearly identical (Cα RMSD of 0.25 Å for 124 aligned residues) and the complexes are very similar to that of Smad1 bound to the GTCT site (3KMP, 123 aligned residues, Cα RMSD of 0.30 Å, Supplementary Table S2). The observed contacts are summarized as a cartoon in Figure 1E, showing that one bound MH1 domain covers the 3-CAGGCGC-9 area.
Overall these results show that Smad5/8 MH1 domain dimers are observed in complexes with 5GC motifs as well as with the SBE site previously characterized and corroborate that dimers seem to be a characteristic of these MH1-DNA complexes. Given the sequence conservation at the MH1 dimer interface, we predict that homo- and hetero-dimers of BMP-activated Smads can also occur in a cellular context.
SBE/5GC sites: One binding mode for all Smads
With the exception of the monomer/dimer arrangement of the MH1 domains, the protein-DNA binding interface of the Smad5 and Smad8 complexes is very similar to those of Smad4 and Smad3 bound to the same GGCGC motif (PDBs: 5MEY, 5OD6) (Figure 2A) (13). The similarity of 5GC complexes corresponding to all R-Smads and to Smad4 is reflected by the conserved pattern of interactions between the protein and the DNA and by the RMSD value of their Cα superimposition (Supplementary Table S2). Even the general topology of the major groove (the principal binding site of all complexes) is conserved between the different bound 5GC DNAs (Supplementary Figure S2A,B), as characterized using Curves (21). This analysis reveals that the major groove widths in these 5GC complexes are larger than in the complexes with the GTCT site, with all major groove depths being comparable.
These complexes also revealed that one MH1 domain is efficiently accommodated on one full DNA major groove, with a clear distinction between the minor and major grooves in all SBE and 5GC Smad complexes. This contrasts with the arrangement observed in the compressed BRE-Smad5 complex, which shows a highly distorted DNA due to the presence of two MH1 domains bound to six bp (Figure 2B and Supplementary Figure S2C,D). In this BRE complex (17), both minor and major DNA grooves show a similar depth and width due to the distortion introduced by the two adjacent MH1 domains bound to the motif, an arrangement that could perhaps have been stabilized during the crystallization process.
Considering that cis regulatory elements bound by Smad proteins often contain clusters of SBE, 5GC and BRE sites (13), we propose that the most probable binding mode used by Smad proteins to recognize DNA in vivo is that of the 5GC and SBE complexes. It seems very unlikely that two MH1 domains would interact with a compressed BRE site —using half of their protein binding site and causing a high distortion to the DNA structure— if there is the possibility to interact with two neighboring sites (as shown in goldenrod in Figure 2B) using the full protein binding interface and perfect accommodation to the DNA major groove.
MH1 domain stability, a property independent from dimer or monomer propensities
In the literature, the presence of open/dimeric conformations of Smad1 has been related to the lower thermal stability of its MH1 domain with respect to R-Smads that populate closed-monomeric conformations (16). To study how the structural differences correlate with protein stability, we performed thermal unfolding experiments of the recombinant MH1 domains of Smad3, Smad4, Smad5 and Smad8. Average values obtained from triplicates revealed that melting point values of Smad8 (54.5 ± 0.2°C) are similar to those previously reported for Smad1, while the values of Smad3 and Smad5 are slightly higher and similar (57.8 ± 0.4°C), with Smad4 being the most stable (65.1 ± 0.2°C) (Figure 2C). Upon DNA binding, all MH1 domains are stabilized (2°C in the case of Smad4 and up to 6°C for the R-Smads).
The higher Tm of Smad5 (identical to Smad3 despite their distinct structural features) was surprising considering the high level of sequence/structural similarity of Smad5 with Smad1/8. Looking for an explanation to these results, we compared the sequences of Smad1/5/8 and observed that there are only eight sequence variations in these MH1 domains, most of them conservative (none at the dimer interface, Supplementary Figures S1B and S2E). The most different residue is Ile109 (loop 5), a position often occupied by hydrophobic residues in all Smad proteins including Smad5, but a Cys in Smad1 and Smad8. Introducing this mutation in Smad5 (Ile109Cys) caused a 3 °C decrease in the melting temperature, a Tm value close to that of native Smad1 and Smad8 domains. This residue (Ile109Cys) precedes Cys110, which is required for Zn coordination and perhaps the presence of two consecutive Cys residues might compete for metal coordination and facilitate some misfolding processes, as detected by the thermal denaturation experiments (Figure 2C).
All in all, these observations suggest that small differences in the domain sequence (such as Ile109 to Cys) might have a more prominent role in the stability of MH1 domains than dimer/monomer propensities only.
Smad5/8 MH1 domains display a dimer/monomer equilibrium in solution and in the absence of DNA
N-terminal helix-swapped dimers have been considered a crystallization-induced state potentiated upon DNA binding. We hypothesized that if the Smad1/5/8 MH1 domains can associate as dimers in crystals, it is because these domains have an intrinsic flexibility in solution that is absent in the monomeric Smad4 and Smad2 domains previously characterized (13, 22). This flexibility would allow the presence of an ensemble of conformations, which can facilitate the interchange of structural elements between monomers and dimers. To test this hypothesis, we sought to analyze the different species in solution using Small-Angle X-ray Scattering (SAXS) and Nuclear Magnetic Resonance (NMR). We also studied the different species using ion mobility coupled to mass spectrometry.
The SAXS data obtained for both Smad5 and Smad8 indicated an interval of radius of gyration of 17.6-19.3 Å and 16.7-18.6 Å respectively, which is between 15.8 Å expected for a compact monomeric form and 23.6 Å for the fully formed dimer (Figure 3A). Therefore, in order to fit the experimental data accurately, we incorporated dimeric models, together with open and closed conformations of monomeric domains (Figure 3B). The analysis indicated that the best fit was consistent with an equilibrium of open monomeric particles, as well as dimers, and that closed and compact monomeric conformations were not abundant in solution (Figure 3C).
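To illustrate why an intermediate radius of gyration points to a mixture, the sketch below computes the apparent Rg of a hypothetical two-state mixture of compact monomers and dimers, using the Guinier-regime result that the apparent Rg² of a mixture is weighted by each species' forward scattering (proportional to mass concentration times molecular mass). This is a deliberate simplification and not the fitting procedure used here: the actual analysis also included open monomer conformations, which this two-state model ignores, so the fractions it returns should not be read as measured populations. Function names are hypothetical.

```python
import math

RG_MONOMER = 15.8  # Å, compact monomer (value quoted in the text)
RG_DIMER = 23.6    # Å, fully formed dimer (value quoted in the text)


def apparent_rg(dimer_mass_fraction, rg_m=RG_MONOMER, rg_d=RG_DIMER):
    """Apparent Rg (Å) of a monomer/dimer mixture.

    dimer_mass_fraction: fraction of the total protein mass present as dimer (0..1).
    Forward-scattering weights are c_i * M_i, with M_dimer = 2 * M_monomer.
    """
    x = dimer_mass_fraction
    rg_squared = ((1.0 - x) * rg_m**2 + 2.0 * x * rg_d**2) / (1.0 + x)
    return math.sqrt(rg_squared)


def dimer_fraction_from_rg(rg_obs, rg_m=RG_MONOMER, rg_d=RG_DIMER):
    """Invert apparent_rg: dimer mass fraction implied by an observed Rg (two-state assumption)."""
    return (rg_obs**2 - rg_m**2) / (2.0 * rg_d**2 - rg_m**2 - rg_obs**2)


# Rg interval reported in the text for Smad5, interpreted under the two-state assumption only.
for rg in (17.6, 19.3):
    print(f"Rg = {rg} Å -> two-state dimer mass fraction = {dimer_fraction_from_rg(rg):.2f}")
```

Replacing the compact monomer with an open monomer of larger Rg lowers the dimer fraction needed to reproduce the same apparent Rg, which is one reason why a multi-species fit is needed to interpret these data.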
The presence of several conformations in solution was further analyzed by NMR. To this end we acquired backbone triple resonance experiments for the Smad5 MH1 domain. The analysis of the carbon chemical shift (CCS) values allowed us to identify all elements of secondary structure. However, the CCS values corresponding to the first helix indicate that, in solution and in the absence of DNA, this helix is slightly shorter and more flexible than expected for a compact MH1 domain structure (Figure 3D).
Furthermore, we also measured longitudinal, transverse relaxation times (T1 and T2), and heteronuclear 1H-15N-nuclear Overhauser effect (hetNOE) to characterize the relaxation properties of the domain. Similar T1 and T2 experiments were acquired for Smad3 for comparison. Weak signals as well as lower heteronuclear NOE values were also observed from some of the residues located at the end of the first α1 helix and in loop1, supporting the presence of several conformations. We also obtained average correlation times (τc) of 12.1 ns and of 13.5 ns at 850 and 600 MHz respectively in contrast to 9.8 ns for both Smad3 and Smad4 MH1 domains (13). Whereas values of 10 ns correspond to 15-kDa compact structures tumbling as monomers, we interpret the values obtained for Smad5 as an indication of the presence of larger conformations including monomers and dimers in equilibrium.
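For readers unfamiliar with how a correlation time can be derived from relaxation data, the sketch below applies one widely used approximation for rigid residues of a slowly tumbling protein, τc ≈ sqrt(6·T1/T2 − 7) / (4π·νN), where νN is the 15N resonance frequency in Hz. The text does not state which procedure was used to obtain the reported τc values, so this is an assumption for illustration only. The T1 and T2 values in the example are hypothetical and not measurements from this work.

```python
import math

GAMMA_RATIO_N_H = 0.1013629  # |gamma(15N) / gamma(1H)|, used to convert 1H to 15N frequency


def tau_c_from_t1_t2(t1_s, t2_s, proton_mhz):
    """Estimate the rotational correlation time (s) from 15N T1 and T2 (s).

    Uses the slow-tumbling approximation tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N).
    Only meaningful for rigid residues without exchange contributions to T2.
    """
    nu_n = proton_mhz * 1e6 * GAMMA_RATIO_N_H  # 15N Larmor frequency in Hz
    ratio = t1_s / t2_s
    if 6.0 * ratio <= 7.0:
        raise ValueError("T1/T2 too small for the slow-tumbling approximation")
    return math.sqrt(6.0 * ratio - 7.0) / (4.0 * math.pi * nu_n)


# Hypothetical example values (not data from this work): T1 = 763 ms, T2 = 70 ms at 600 MHz.
tau = tau_c_from_t1_t2(0.763, 0.070, 600.0)
print(f"tau_c = {tau * 1e9:.1f} ns")
```

With these hypothetical inputs the estimate is roughly 10 ns, the value the text associates with a compact 15-kDa monomer; larger T1/T2 ratios give longer correlation times.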
Finally, Smad5 dimers were also detected in the gas phase by ion mobility mass spectrometry (IM-MS). The analyses of m/z and drift time values revealed that Smad5 can populate monomeric, dimeric and even larger oligomeric conformations. All these distinct species are resolved at different drift times (Figure 3E, Supplementary Figure S3A-C), with monomer and dimer forms (more compact) travelling faster than oligomeric ones. It is worth noting that, under the same conditions, the most abundant MH1 species of Smad3 (Figure 3F, Supplementary Figure S3D) and Smad4 (Supplementary Figure S3E) were monomeric forms, while signals arising from the dimer conformations were detected as minority species. Attempts to measure Smad8 samples were unsuccessful under these experimental conditions since these samples lost the bound Zn2+ after buffer exchange.
All together, these results indicate that the different studies of these proteins in non-crystallographic conditions are consistent with a high degree of plasticity of the MH1 domains of Smad5/8 present as dimers in equilibrium with monomers mostly populating open —and not closed— conformations. These results indicate that the monomeric or dimeric properties exhibited in the crystal structures of the different Smad/DNA complexes are also displayed by the domains in solution and in the absence of DNA.
Swapping the loop1 sequence between Smads converts MH1 dimers into monomers
Our findings regarding the presence of dimers in solution challenge the previous hypothesis that crystallization conditions enriched an atypical domain-swap conformation (16). Our results also point to the first helix and loop1, which are swapped in the dimeric conformation, as the most flexible areas in the monomers (Figure 3D). When the structures of the Smad2, 3, 4, 5 and 8 MH1 domains are superimposed, we observe variable lengths of loop1 and of the α2 helix (Figure 4A). In fact, Smad proteins are highly conserved in metazoans and, in general, the differences are more marked with respect to Smad4 than among the five R-Smads (4) (Supplementary Figure S4A). As highlighted in the figure, BMP Smads have four residues in loop1, whereas the same loop has six residues in Smad3 and in Smad4, and sixteen in Smad2 (22). The long loops of Smad2/3 can bridge the distance needed to accommodate the α1 helix packed against the same monomer, even though their α2 helix is one turn longer than in Smad4 (Supplementary Figure S4A). In the case of Smad3 (PDB 5OD6), loop1 shows internal hydrogen bond contacts that help stabilize the turn, favoring the intramolecular packing of the first two helices (Figure 4B) (16). These hydrogen bonds are not essential, since they are absent in the monomers of Smad2 (22) and in the Smad4 structures (PDB 5MEY, Figure 4C). In Smad1/5/8, the combination of an α2 helix as long as that in Smad2/3 and a loop1 that is two residues shorter than in Smad2/3/4 explains why these BMP-activated sequences have difficulty maintaining a compact and monomeric structure. Instead, they select dimers (Figure 4D, Supplementary Figure SB) and ensembles of partially unfolded and flexible monomer conformations, as we detected by NMR and SAXS in solution (Figure 3B).
Indeed, when we extended the loop1 length of Smad5 by introducing several Gly residues or by swapping the loop1 sequence with that of Smad3, we detected monomers as the main species in native mass spectrometry for these two chimeric constructs (Supplementary Figure S3F).
Prompted by these results, we set out to determine whether crystal complexes of these chimeric constructs would again adopt dimeric arrangements. We found that both chimeras are now arranged as monomers in the crystals, with the same packing interface for helices 1 and 2 as in the dimer but with both helices belonging to the same monomer (Figure 4E). Moreover, in the Smad5/3 chimera, the loop1 sequence is well defined (Figure 4F, Supplementary Figure 4D-E). A similar monomeric packing is observed for the chimeric construct containing the Gly residues, but in this model we could not fully trace the loop in the electron density map (Supplementary Figure 4F).
In summary, these structures confirmed that the propensity to form dimers or monomers is encoded by the loop1 sequence, which has been conserved during metazoan evolution, and is not an artifact of the crystallographic conditions.
Discussion
In eukaryotes, the association between transcription factors (TFs) to form homo- and hetero-dimers is a common feature employed by many TF families (members of the helix–loop–helix (bHLH), leucine zipper (bZIP), Nuclear receptor (NR) and Nuclear factor-kappa B) (23–25). This capacity of association has implications in the regulation of specific cellular responses, in the stability of the proteins, in the optimal selection of DNA binding sites and in determining overall affinity. Smad proteins also follow this rule and associate with other Smads and with cofactors, ensuring the efficient interaction of these complexes with cis regulatory elements genome-wide (5, 13, 26).
Comparison of the Smad5/8 complexes determined here to those previously characterized for Smad2/3 and Smad4 indicates that all R-Smads and Smad4 are able to interact specifically with non-compressed GC motifs and SBE sites by means of a conserved binding mode, mostly using the β2-β3 hairpin. Only the long isoform of Smad2 shows additional contacts from residues in the E3 insert, which is exclusively present in this specific isoform (22). Indeed, the main difference observed among the MH1-DNA complexes is not in the recognition of DNA, but in the MH1 domain itself. Whereas TGFβ-activated Smads and Smad4 interact with DNA as monomers, BMP-activated Smads form dimers by swapping the α1 helix between two monomers (Figure 5A).
R-Smad and Smad4 interactions have been observed with over-expressed as well as with endogenous full-length proteins (27, 28). It is well accepted that these associations are driven by direct contacts between the conserved MH2 domains of R-Smads and a single Smad4 protein, as detected in the crystal structures of various complexes of MH2 domains (4, 5, 29, 30). However, despite the high level of MH2 domain conservation among different R-Smads, only complexes with the presence of Smad1/5/8 and either Smad2/3 or Smad4 proteins have been experimentally detected using full-length proteins (20, 21), suggesting that a second layer of selection might exist to favor the composition of some complexes over others.
One of these selection rules could involve combining one dimer of Smad1/5/8 MH1 domains with one monomer of Smad2/3/4 (whose MH1 domains cannot form dimers), which suggests that these dimers may also occur in full-length Smad complexes under native conditions. The formation of these complexes would be facilitated by the flexibility provided by the long linkers (80 residues) connecting the MH1 and MH2 domains, and it should not compromise the formation of MH2 domain trimers, as detected in crystals of isolated MH2 domains (5).
The monomer/dimer formation of MH1 domains would also have implications in the selection of several DNA sites recognized simultaneously by the complexes. In the case of Smad4 and BMP-activated Smad hetero trimeric complexes, the dimers of MH1 domains will select DNA sites separated by at least the distance of the two DNA binding sites in the dimer (a distance ≥ 60 Å) and with the Smad4 bound as close as 22 Å (the distance between two consecutive DNA sites that allows the interaction of two MH1 domains without steric hindrance, Schematic representation, Figure 5B). In contrast, Smad4 complexes with TGFβ-activated Smads (all MH1 domains monomers) can bind adjacent motifs, thereby allowing a much more compact interaction with DNA (Schematic representation, Figure 5C). These distinct structural features are in agreement with ChIP-Seq peaks where Smad binding motifs are distributed as clusters. Remarkably, clusters recognized by BMP-activated Smads contain few adjacent motifs whereas the clusters often contain several consecutive sites in TGFβ-activated ones (13).
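To put the distances quoted above on a sequence scale, the following sketch converts an axial distance in Å into a minimum number of base pairs. It assumes an idealized, straight B-DNA duplex with a rise of 3.4 Å per base pair and treats the quoted distances as measured along the helix axis; both are simplifying assumptions, and the function name is hypothetical.

```python
import math

RISE_B_DNA = 3.4  # Å per base pair for canonical B-DNA (assumes a straight, undistorted duplex)


def min_bp_spacing(distance_angstrom, rise=RISE_B_DNA):
    """Smallest whole number of base pairs spanning at least the given axial distance."""
    return math.ceil(distance_angstrom / rise)


dimer_spacing_bp = min_bp_spacing(60.0)     # sites bound by the two MH1 domains of a dimer
adjacent_spacing_bp = min_bp_spacing(22.0)  # closest approach of two MH1 domains on DNA
print(dimer_spacing_bp, adjacent_spacing_bp)  # prints: 18 7
```

Under these assumptions, 22 Å corresponds to about seven base pairs, in line with the seven-base-pair footprint of a single MH1 domain, whereas the two binding sites of a dimer are separated by roughly 18 base pairs or more.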
Overall, our results reveal two new hypotheses for the function of Smad complexes in vivo. First, the composition of Smad complexes can be modulated by the association through dimers/monomers via the MH1 domains and not only through MH2 domain interactions. We propose that this feature has been among the keys to shaping two classes of R-Smad proteins since the origin of metazoans. The second hypothesis is related to DNA recognition. Although all Smad proteins can interact with the same DNA motifs, finding the optimal DNA sites for a given Smad trimer must fulfill certain specific spatial requirements. This implies that not all theoretically available DNA motifs can be recognized by a given complex and also, that a given Smad complex could interact with different promoters whose Smad binding sites are not always separated by the same distance. This versatility can explain how similar regions in the genome can be bound to different Smad complexes as reported in the literature (31, 32).
All findings available till now suggest that the selection of optimal DNA targets in a native context is the result of a collaborative approach between the different Smad complexes and bound cofactors. This selection process seems to be also modulated by the internal association of Smad proteins, where all components fit in order to fine tune the context-dependent action of BMP and TGFβ signals. Certainly, additional experiments, as well as structures of full-length Smad complexes bound to DNA, will finally illustrate how these different layers of interactions are defined and modulated.
Methods
Detailed methods on protein expression and purification, Differential Scanning Fluorometry, NMR experiments and analysis, crystallography experiments, TWIM-MS experiments, and SAXS data acquisition and analysis can be found in SI Appendix, Methods. The atomic coordinates have been deposited in the Protein Data Bank, Small-angle scattering data and models in the SASBDB database, and NMR assignments and chemical shifts in the Biological Magnetic Resonance Data Bank (BMRB).
Supplemental Information
Methods
Protein production and cloning
The Smad5 (UniProt: Q99717-1, Ser9-Arg143), Smad5_gly and Smad5_3 chimeric constructs, and Smad8 (O15198-1, Thr14-Pro144) domains were cloned using synthesized DNA templates with codons optimized for bacterial expression (Thermo Fisher Scientific) and confirmed by DNA sequencing (GATC Biotech). Proteins were expressed fused to an N-terminal His-tag followed by a TEV or 3C protease cleavage site in E. coli BL21(DE3) or C41(DE3) pLysS and purified following standard procedures (13). Unlabeled and labeled samples were prepared using Luria Broth (LB) and minimal media (M9) cultures, respectively (Melford). D2O (99.95%, Silantes), 15NH4Cl and/or D-[13C] glucose (Cambridge Isotope Laboratories, Inc) were used to prepare the labeled samples (33). Cells were cultured at 37°C to reach an OD600 of 0.6-0.8. After induction with IPTG (final concentration of 0.4 mM) and overnight expression at 20°C, bacterial cultures were centrifuged and cells were lysed (EmulsiFlex-C5, Avestin) in PBS buffer at pH 7.5 in the presence of lysozyme and DNase I. The soluble supernatants were purified by nickel-affinity chromatography (HiTrap Chelating HP column, GE Healthcare Life Science) using an NGC™ Quest 10 Plus Chromatography System (Bio-Rad). Eluted proteins were digested with TEV or 3C proteases (at 4°C or room temperature, respectively) and further purified by ion exchange chromatography using a HiTrap SP HP column and by size-exclusion chromatography on a HiLoad™ Superdex 75 16/60 prep grade column (GE Healthcare) equilibrated in 20 mM Tris-HCl buffer (pH 7.2), 80 mM NaCl and 2 mM TCEP.
Purified proteins were verified by liquid chromatography-mass spectrometry (LC-MS) using an ACQUITY UPLC Binary Sol MGR LC system (Waters) equipped with a BioSuite Phenyl 1000Å column (Waters, 10 μm RPC 2.0×75 mm) at a flow rate of 100 μL/min. The column outlet was directly connected to the mass spectrometer, which acquired full MS scans (400-4000 m/z) in positive polarity mode. Samples were eluted using a linear gradient from 2% to 5% B in 5 min and from 5% to 80% B in 60 min (A = 0.1% formic acid, FA, in water; B = 0.1% FA in CH3CN) and analyzed using the MassLynx™ software (V4.1.SCN704, Waters). The purity of the recombinant proteins was over 95%, as shown by the mass spectrometry analysis.
Duplex DNAs
Duplex DNAs were annealed using complementary single-strand HPLC-purified DNAs. DNAs were mixed at equimolar concentrations (1 mM), heated at 90°C for 3 min and allowed to cool to room temperature for 2 h. DNAs (with and without fluorophores) were purchased from Biomers and/or from Metabion, Germany.
Differential scanning fluorometry
Experiments were performed in a StepOnePlus Real-time PCR System (Applied Biosystems) using 96-well plates (MicroAmp Fast 96-Well Reaction Plate, Applied Biosystems) and a total volume of 25 μL per reaction. Melting curves were acquired in triplicate to determine the average melting temperature (Tm), and lysozyme and Smad4 MH1 domains were used as positive controls. All samples (0.5 mg/mL) were prepared in 50 mM Tris-HCl (pH 7.2), 150 mM NaCl and 2 mM TCEP. SYPRO orange dye (Sigma-Aldrich) was used at 10X, diluted from a 5000X stock. The plate was sealed with optical quality sealing film (Platemax, Axygen) and centrifuged at 2000g for 30 s. Samples were equilibrated for 60 s and analyzed using a linear gradient from 25°C to 95°C at a rate of 1°C/min, recording the SYPRO orange fluorescence throughout the gradient. Tm values were determined from the maximum of the first derivative of the sigmoid curve, using the Applied Biosystems® Protein Thermal Shift™ Software.
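The first-derivative criterion for Tm can be illustrated with a minimal sketch. It is not the algorithm of the Protein Thermal Shift software, whose internals are not described here; the function name, the central-difference derivative and the synthetic sigmoid curves are all assumptions made for illustration.

```python
import math
import statistics


def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT (central difference) is maximal."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three matched (T, F) points")
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


def synthetic_curve(temps, midpoint, width=2.0):
    """Hypothetical two-state unfolding curve (logistic), used only as test data."""
    return [1.0 / (1.0 + math.exp(-(t - midpoint) / width)) for t in temps]


# 25 to 95 degrees C in 1 degree steps, as in the gradient described in the text
temps = list(range(25, 96))
fluor = synthetic_curve(temps, midpoint=52)

# Triplicate wells (synthetic midpoints), averaged as described in the text
triplicate_tms = [melting_temperature(temps, synthetic_curve(temps, m)) for m in (51, 52, 53)]
mean_tm = statistics.mean(triplicate_tms)
```

Real melting curves are noisy, so in practice the derivative is usually taken after smoothing; that step is omitted here.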
NMR chemical shift assignment and perturbation experiments
NMR data were recorded on a Bruker Avance III 600-MHz spectrometer equipped with a quadruple (1H, 13C, 15N, 31P) resonance cryogenic probe head and a z-pulse field gradient unit at 298 K. Backbone 1H, 13C and 15N resonance assignments were obtained by analyzing the 3D HNCACB and HN(CO)CACB experiment pair (33). Experiments were acquired as Band-Selective Excitation Short-Transient-type experiments (BEST) with TROSY and Non-Uniform Sampling (NUS) (34, 35). This strategy allowed us to unambiguously assign 110 of the 121 possible amides (131 residues, 10 of them prolines). Comparison of the Smad5 MH1 (Cα and Cβ) chemical shifts to reference values, as well as the 15N edited-NOESY data, corroborated the presence of bound Zn2+ and of four helices and six strands, characteristic of the MH1 fold. The strands are ordered as three anti-parallel pairs: β1β5, β2β3, and β4β6. The presence of many long-range interactions confirmed that, in the absence of DNA, the structure of the MH1 domain is well defined in solution. Chemical shifts have been deposited in the BMRB (entry 27548). For the titration experiments, HSQCs were recorded using a Non-Uniform Sampling (NUS) acquisition strategy to reduce experimental time and increase resolution.
T1 and T2 relaxation measurements were acquired using standard pulse sequences (34). The rotational correlation time of the Smad5 MH1 domain (τc) was calculated from the T1/T2 ratio, assuming slow molecular motion (τc larger than 0.5 ns) and that only the J(0) and J(ωN) spectral density terms contribute to the overall value:
$$\tau_c \approx \frac{1}{4\pi\nu_N}\sqrt{6\,\frac{T_1}{T_2} - 7}$$
where νN is the 15N resonance frequency.
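A minimal numerical sketch of this estimate follows, assuming the widely used slow-motion approximation τc ≈ sqrt(6·T1/T2 − 7)/(4π·νN). The function name is hypothetical, and the T1 and T2 values and the 15N frequency of about 60.8 MHz (for a 600 MHz spectrometer) are placeholders, not measurements from this work.

```python
import math


def rotational_correlation_time(t1_s, t2_s, nu_n_hz):
    """Estimate tau_c (seconds) from 15N T1 and T2 (seconds) and the 15N frequency (Hz).

    Assumes slow tumbling, with only J(0) and J(omega_N) contributing.
    """
    radicand = 6.0 * t1_s / t2_s - 7.0
    if radicand <= 0:
        raise ValueError("T1/T2 too small for the slow-motion approximation")
    return math.sqrt(radicand) / (4.0 * math.pi * nu_n_hz)


# Placeholder inputs (assumptions, not data from the paper)
NU_N_HZ = 60.8e6          # approximate 15N frequency at 600 MHz (1H)
tau_c = rotational_correlation_time(t1_s=0.8, t2_s=0.08, nu_n_hz=NU_N_HZ)
tau_c_ns = tau_c * 1e9    # convert to nanoseconds for reporting
```

Because τc scales with molecular size, comparing the value obtained this way with that expected for a monomer or a dimer is one way such data can inform on the oligomeric state.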
Protein samples (400 μM for backbone experiments) were equilibrated in a 20 mM Tris buffer containing 100 mM NaCl at pH 7.2, supplemented with 10% D2O. Spectra were processed with NMRPipe (36) and MddNMR (35), and analysis was performed with CARA (37).
X-ray
High-throughput crystallization screening and optimization experiments were performed at the HTX facility of the EMBL Grenoble Outstation (38). Human Smad5 was concentrated to 5 mg/mL prior to the addition of the annealed DNAs (Metabion) dissolved in 20 mM Tris-HCl pH 7, 10 mM NaCl and 2 mM TCEP. The final protein:DNA molar ratio was 1:1. Screenings and optimizations were prepared by mixing 100 nL of the complex solution and 100 nL of reservoir solution in 96-well plates. Crystals of the complexes were grown by sitting-drop vapor diffusion at 4°C. Crystals were obtained with several DNAs of different lengths and in different conditions, and in all cases they were reproducible. Several datasets were acquired for the best-diffracting crystals and were then analyzed. Final conditions for the best-diffracting complexes were optimized as follows:
Smad5: 0.2 M NaF, 0.1 M bis-tris propane pH 7.5, 20% PEG3350
Smad8: 0.2 M NaF, 0.1 M bis-tris propane pH 8.5, 20% PEG3350
Smad5_3 chimera: 0.05 M sodium citrate pH 5.5, 22% PEG3350
Smad5_gly: 0.05 M HEPES pH 7.0, 21% PEG Smear Medium (PEG 2000, 3350, 4000 and 5000 MME)
All crystals were cryoprotected in mother liquor supplemented with glycerol and frozen in liquid nitrogen. Diffraction data for the Smad5 and Smad8 complexes were recorded at the ESRF in Grenoble, France (beamline ID30a3), and data for the Smad5_3 chimera and Smad5_gly at the ALBA Synchrotron Light Facility (BL13-XALOC beamline), Barcelona, Spain. The data were processed, scaled and merged with autoPROC (39). Initial phases were obtained by molecular replacement using PHASER (40, 41) from the CCP4 and PHENIX suites (42, 43) (search model PDB code: 3KMP) with anisotropic correction. REFMAC (44), phenix.refine (Liebschner, Afonine et al. 2019) and BUSTER (45) were employed for the refinement, and COOT (46) for the manual improvement of the models. For Smad5 mutants, the PDB-REDO server was used to select the data resolution cutoff (paired refinement) and to optimize the structure model (47). Water molecules bound at the DNA-protein interface were selected when they participated in at least three hydrogen bonds (cutoff distance of 3.5 Å). Figures describing the structures were generated with UCSF Chimera (48).
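The water-selection rule (at least three hydrogen bonds within 3.5 Å) can be sketched as a simple distance filter. This is an illustrative simplification: it counts polar atoms within the cutoff and ignores hydrogen-bond angles, and it is not necessarily how the selection was carried out. All names and coordinates below are hypothetical.

```python
import math

HBOND_CUTOFF = 3.5  # Angstrom, cutoff distance stated in the text
MIN_HBONDS = 3      # minimum number of hydrogen bonds stated in the text


def count_contacts(water_xyz, polar_atoms, cutoff=HBOND_CUTOFF):
    """Count polar (N/O) atoms lying within the cutoff of a water oxygen."""
    return sum(1 for xyz in polar_atoms if math.dist(water_xyz, xyz) <= cutoff)


def select_interface_waters(waters, polar_atoms, min_hbonds=MIN_HBONDS):
    """Return names of waters with at least `min_hbonds` candidate hydrogen bonds."""
    return [
        name
        for name, xyz in waters.items()
        if count_contacts(xyz, polar_atoms) >= min_hbonds
    ]


# Toy coordinates (Angstrom), invented for illustration only
waters = {"HOH1": (0.0, 0.0, 0.0), "HOH2": (10.0, 10.0, 10.0)}
polar_atoms = [
    (2.8, 0.0, 0.0),     # e.g. a protein side-chain oxygen
    (0.0, 3.0, 0.0),     # e.g. a DNA base nitrogen
    (0.0, 0.0, 3.2),     # e.g. a backbone phosphate oxygen
    (10.0, 10.0, 13.0),  # single partner near HOH2
]
selected = select_interface_waters(waters, polar_atoms)
```

In this toy case only HOH1 has three partners within 3.5 Å, so only it is retained.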
TWIM-MS experiments
Experiments were performed using a Synapt G1 HDMS mass spectrometer (Waters UK Ltd., Manchester, UK). Mass spectra were acquired by positive nano-electrospray ionisation (ESI) using a Nanospray Triversa (Advion Biosciences Corpn., Ithaca, NY, USA) interface. To optimize the separation of the different conformers, travelling-wave drift times of selected ions corresponding to monomers and dimers of Smad MH1 domains (in 150 mM ammonium acetate buffer) were measured at wave heights of 7 V, 8 V, 9 V and 10 V and at a velocity of 300 m/s. Data acquisition and processing were carried out using MassLynx (v4.1) software. Drift time calibration of the T-Wave cell was performed using β-lactoglobulin (monomer, 18 kDa, and dimer, 37 kDa) from bovine milk. Reduced cross-sections (Ω’) were calculated from published cross-sections (49) and subsequently plotted against final corrected drift times (tD). Calibration coefficients were determined by applying an allometric fit, y = Ax^B. Experimental cross-sections were determined by measuring the drift time centroid for the molecular-related ions by means of Gaussian fitting to the drift time distribution (Prism v6, GraphPad Software Inc., California, USA).
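The allometric calibration, Ω’ = A·tD^B, can be sketched as a straight-line fit in log-log space. This is an assumption made for simplicity: a non-linear least-squares fit, as fitting software may perform, can give slightly different coefficients on noisy data. The function names and the calibrant values are invented for illustration.

```python
import math


def fit_power_law(x, y):
    """Fit y = A * x**B by linear least squares on (ln x, ln y); return (A, B)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    exponent = sxy / sxx
    prefactor = math.exp(mean_y - exponent * mean_x)
    return prefactor, exponent


def reduced_cross_section(drift_time, prefactor, exponent):
    """Apply the calibration to a corrected drift time of an analyte ion."""
    return prefactor * drift_time ** exponent


# Synthetic calibrant points generated from A = 2.5, B = 0.6 (not real data)
calib_drift = [2.0, 3.0, 4.5, 6.0, 8.0]
calib_omega = [2.5 * t ** 0.6 for t in calib_drift]

A, B = fit_power_law(calib_drift, calib_omega)
omega_reduced = reduced_cross_section(5.0, A, B)
```

Converting the reduced value back to an absolute cross-section additionally requires the ion charge and reduced mass, which is not shown here.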
SAXS data
Data were collected on samples of Smad5 and Smad8 MH1 domains at protein concentrations ranging from 0.96 to 10 mg/mL for Smad5 and from 0.97 to 7.69 mg/mL for Smad8_9. All samples were concentrated in 20 mM Tris buffer, 150 mM NaCl, and 2 mM TCEP, pH 7.2. Data were acquired at Beamline 29 (BM29) at the European Synchrotron Radiation Facility (ESRF; Grenoble, France). Protein samples were centrifuged for 10 min at 10,000g prior to data acquisition. Data were collected at an energy of 12.5 keV and recorded on a Pilatus 1M detector at 10°C. For each sample and buffer, 10 exposure frames of 1 s were collected, and the exposure set was combined during data reduction to produce each SAXS curve. Buffer subtraction was performed after data reduction. Image conversion to the 1D profile, data reduction, scaling and buffer subtraction were done by the software pipeline available at the BM29 beamline. Further processing was done with the ATSAS software suite and Scatter (50). Guinier plot calculation (for the estimation of the radius of gyration, Rg) was performed with PRIMUS, included in the ATSAS suite, using low q regions (qmax × Rg < 1.3). Small-angle scattering data and models were deposited in the SASBDB database under entry codes SASDE32 and SASDE42, respectively. Regarding the possible conformations in solution, the following four states were considered plausible: the crystallographic dimer; the closed monomer present in other crystal structures of MH1 domains; and two open monomers where the extended N-terminal helix was considered either as in the crystal structure or flexible. Molecular modeling for the open and free structure was performed with the Rosetta modeling software suite, using the FloppyTail application (51) and starting from the MH1 crystal structures of Smad5 and Smad8_9 determined in this work (PDB: 6FZS and 6FZT, respectively).
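The Guinier estimate of Rg mentioned above relies on the low-q approximation ln I(q) ≈ ln I(0) − (Rg²/3)·q². A minimal sketch follows, assuming a simple scheme that shrinks the fitting window until qmax × Rg < 1.3. It does not reproduce the PRIMUS implementation; the function names and the synthetic curve are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line; return (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_rg(q, intensity, limit=1.3, min_points=5):
    """Return (Rg, I0, n_points) from the largest low-q window with qmax*Rg < limit."""
    n = len(q)
    while n >= min_points:
        slope, intercept = linear_fit(
            [v * v for v in q[:n]],
            [math.log(v) for v in intensity[:n]],
        )
        if slope < 0:
            rg = math.sqrt(-3.0 * slope)
            if q[n - 1] * rg < limit:
                return rg, math.exp(intercept), n
        n -= 1  # drop the highest-q point and try again
    raise ValueError("no valid Guinier region found")


# Synthetic, noise-free curve for a particle with Rg = 20 Angstrom and I(0) = 100
q_values = [0.005 * i for i in range(1, 41)]  # in 1/Angstrom
i_values = [100.0 * math.exp(-(qv * 20.0) ** 2 / 3.0) for qv in q_values]

rg, i0, n_used = guinier_rg(q_values, i_values)
```

With experimental data, the lowest-q points are often discarded as well (beam-stop and aggregation effects); that refinement is left out of this sketch.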
For closed monomer models, we used Modeller (52) and the Smad4 MH1 structure (PDB: 3QSV), whereas the dimers were directly taken from the complexes of Smad5 and Smad8 (PDB: 6FZS and 6FZT). In all cases, DNA and water molecules were removed and secondary structure elements were restrained, except for the flexible N- and C-terminal tails and the flexible N-terminal helix. For each state, one thousand conformers were simulated to generate sufficient conformational sampling. Theoretical SAXS curves were calculated using CRYSOL (53) and compared to the experimental data, selecting the linear combination of the four states with the lowest chi-squared, as implemented in ATSAS.
The chi-squared metric for N data points was calculated using the equation:
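As an illustration of this kind of fit, the sketch below assumes the reduced chi-squared form commonly used when comparing SAXS curves, χ² = 1/(N−1) · Σ_j [(I_exp(q_j) − c·I_calc(q_j))/σ(q_j)]², where c is a scaling factor and σ(q_j) the experimental error. The exact normalisation used in this work may differ. The two-state grid search is a simplification of the four-state linear combination described above, and all names and curves are hypothetical.

```python
import math


def optimal_scale(i_exp, i_calc, sigma):
    """Least-squares scale factor c minimising sum(((I_exp - c*I_calc)/sigma)**2)."""
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    return num / den


def chi_squared(i_exp, i_calc, sigma):
    """Reduced chi-squared over N points, using the optimal scale factor."""
    c = optimal_scale(i_exp, i_calc, sigma)
    n = len(i_exp)
    return sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma)) / (n - 1)


def mix(fractions, curves):
    """Weighted sum of theoretical curves (one weight per state)."""
    return [
        sum(f * curve[j] for f, curve in zip(fractions, curves))
        for j in range(len(curves[0]))
    ]


def best_two_state_mixture(i_exp, sigma, curve_a, curve_b, steps=100):
    """Scan the fraction of state A on a grid; return (fraction_a, chi2) of the best fit."""
    best = None
    for k in range(steps + 1):
        w = k / steps
        score = chi_squared(i_exp, mix((w, 1.0 - w), (curve_a, curve_b)), sigma)
        if best is None or score < best[1]:
            best = (w, score)
    return best


# Synthetic example: "experimental" curve built from 30% state A and 70% state B
q_grid = [0.01 * i for i in range(1, 51)]
state_a = [math.exp(-(qv * 15.0) ** 2 / 3.0) for qv in q_grid]
state_b = [math.exp(-(qv * 25.0) ** 2 / 3.0) for qv in q_grid]
observed = mix((0.3, 0.7), (state_a, state_b))
errors = [0.01] * len(q_grid)

best_fraction, best_chi2 = best_two_state_mixture(observed, errors, state_a, state_b)
```

A grid search scales poorly beyond two or three states; a non-negative least-squares solver is the more usual choice for mixtures of several components.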
Author Contributions
L.R., Z.K. and T.G. designed and performed most experiments and coordinated collaborations with other authors. L.R. and E.A. cloned, expressed and purified all proteins. L.R., R.F., C.T., N.M. and J.C. performed EMSA and thermal denaturation experiments. L.R., E.A., T.G., T.C., P.M.M. and M.J.M. performed the SAXS and NMR measurements and analyzed the data. Z.K., B.B. and R.P. screened crystallization conditions, collected X-ray data and determined the structures. Z.K., R.P., T.G., J.A.M. and M.J.M. analyzed the structures. All authors contributed ideas to the project. M.J.M. designed and supervised the project, and wrote the manuscript with contributions from all other authors.
The authors declare no conflict of interest.
Data Deposition: NMR assignments and chemical shifts have been deposited in the Biological Magnetic Resonance Data Bank, BMRB entry 27548, and the Small-angle scattering data and models have been deposited in SASBDB, entries SASDE32 (Smad5) and SASDE42 (Smad8). Densities and coordinates have been deposited in the Protein Data Bank, entries (6FZS, Smad5) (6FZT, Smad8) (6TBZ and 6TCE, Smad5 chimeras).
Acknowledgments
We thank Dr. N. Berrow (IRB Barcelona) for some DNA constructs, the Automated Crystallography Platform staff (IRB Barcelona-CSIC) and the EMBL staff for assistance at the HTX facility (Grenoble), the joint EMBL and ESRF group for access to synchrotron beamlines ID29, ID23-1 and ID23-2, and the staff at the ALBA synchrotron (Barcelona) for access to the BL13-XALOC beamline. Thanks also go to Dr. J. Massagué for insightful suggestions and discussions, Drs. M. Díaz and M. Vilaseca (Mass Spectrometry Core Facility, IRB Barcelona) for support with the IM-MS data, Dr. M. Navia for suggestions on binding assays, and Dr. B. Brutscher (Institut de Biologie Structurale, Biomolecular NMR Spectroscopy Group, Grenoble, France) for help with the acquisition of the NMR data at 800 MHz. We also thank J. Cordero for some preliminary experiments.
T.G. is a PhD student funded by a Severo Ochoa and by the BBVA. Z.K., R.F., R.P. and B.B. are co-funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie COFUND actions of the EMBL, IRB Barcelona and the PROBIST and PREBIST Postdoc and Predoc Programmes (agreements EMBL_291772, IRBPostPro2.0_600404, PROBIST_754510 and PREBIST_754558). M.J.M. is an ICREA Programme Investigator. This work was supported by the Spanish MINECO program (BFU2014-53787-P and BFU2017-82675-P, M.J.M.), IRB Barcelona and the BBVA Foundation. Access to the HTX facility at EMBL (Grenoble) was granted by the Horizon 2020 Programme iNEXT of the European Commission (grant 653706, title: Smad complexes), and to the NMR facility (Grenoble) by the Instruct Integrating Biology program (grant 2520, title: Monomer-dimer equilibrium in Smad proteins). Access to Bio-SAXS BM29 was part of the MX-1941 BAG proposal and to ALBA through the BAG proposal 2018092972.
We gratefully acknowledge institutional funding from the CERCA Programme of the Catalan Government and from the Spanish Ministry of Economy, Industry and Competitiveness (MINECO) through the Centres of Excellence Severo Ochoa award.
Footnotes
1 These authors contributed equally to this work.