Structural basis of HMCES interactions with DNA reveals multivalent substrate recognition

HMCES can covalently crosslink to abasic sites in single-stranded DNA at stalled replication forks to prevent genome instability. Here, we report crystal structures of the HMCES SRAP domain in complex with DNA-damage substrates, revealing interactions with both the single-stranded and duplex segments of 3' overhang DNA. HMCES may also bind gapped DNA and 5' overhang structures to align single-stranded abasic sites for crosslinking to the conserved Cys2 of its catalytic triad.

by uracil DNA glycosylase (UDG) at stalled replication forks 4. The authors suggested that these DNA-protein crosslink (DPC) intermediates prevent the ssDNA breaks that may otherwise occur upon cleavage by AP endonucleases and that could subsequently be repaired through error-prone pathways 4.
Human HMCES has a highly conserved N-terminal SOS Response-Associated Peptidase domain (SRAPd) that is widely found in bacteria and eukaryotes, with occasional presence in certain bacteriophages and archaea 5. Animal SRAP proteins have an additional C-terminal disordered extension with multiple copies of the PCNA-interacting motif (PIP) 4. Gene-neighborhood analysis identified SRAPd as a novel component of the bacterial SOS response, associated with multiple components of the DNA repair machinery 5. The SRAPd contains a highly conserved triad of predicted catalytic residues, namely Cys2, Glu127, and His210, which are believed to support autoproteolytic activity 5. In addition, Cys2 was recently shown to mediate the DPC activity 4. To better understand the mechanism of HMCES association with DNA, we crystallized the human HMCES SRAPd in its DNA-free form (Apo-SRAPd) and in complex with several DNA-damage substrates containing 3' overhangs of different lengths.
The crystal structure of SRAPd in complex with duplex DNA containing a three-nucleotide overhang at the 3' end (referred to here as SRAPd_3nt) revealed SRAPd binding to two DNA molecules: one (DNA-A) interacts via its 3' overhang, and the other (DNA-B) via its blunt end (Fig. 1a). Both DNA interaction surfaces are highly conserved.
SRAPd interacts with the 3' overhang of DNA-A through a hydrophobic shelf created by Trp81 and Phe92, which form π-stacking interactions with the duplex segment of the DNA at the ssDNA-dsDNA junction (Fig. 1b, c). The ssDNA 3' overhang is sharply bent by ~90 degrees and lies in a narrow, positively charged cleft that directs it towards the catalytic triad. The ssDNA-binding cleft includes the conserved Arg98 and Arg212, which form salt bridges with the phosphate backbone of the ssDNA (Fig. 1b, d). Alanine substitution of either of these Arg residues severely hinders ssDNA binding (Supplementary Fig. 1), consistent with the gel-shift assays reported by Mohni et al. 4. The pocket housing the catalytic triad accommodates the 3'-OH of the ssDNA overhang (Fig. 1d). Mutating the catalytic triad residues individually yielded SRAPd variants with higher affinity for ssDNA than the wild-type (WT) protein, suggesting that the triad has a role other than simply DNA binding (Supplementary Fig. 1). In the SRAPd_3nt structure, Cys2 is ~5.0 Å from thymine 9 (T9) of the 3' overhang (Fig. 2a). In the case of an abasic site, the deoxyribose moiety could freely rotate to within less than 3 Å of Cys2 and crosslink with it at the catalytic triad site (Fig. 2a). This observation provides the structural logic for the recent demonstration that HMCES senses and covalently crosslinks to abasic sites in ssDNA through Cys2 4.
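The distance argument above can be illustrated with a minimal sketch. The coordinates below are arbitrary placeholders chosen only to reproduce separations of 5.0 Å and 2.5 Å; they are not taken from the deposited structures. The function names, the choice of which atoms to measure between, and the use of 3 Å as a cutoff are assumptions made for illustration.

```python
import math

# Assumed cutoff (in Å), taken from the "less than 3 Å" statement in the text.
CROSSLINK_CUTOFF_A = 3.0


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates, in Å."""
    return math.dist(atom_a, atom_b)


def within_crosslink_range(atom_a, atom_b, cutoff=CROSSLINK_CUTOFF_A):
    """True if the two atoms are closer than the assumed cutoff."""
    return distance(atom_a, atom_b) < cutoff


# Placeholder coordinates (hypothetical, NOT from the crystal structures).
cys2_atom = (0.0, 0.0, 0.0)
sugar_atom_observed = (3.0, 4.0, 0.0)  # 5.0 Å away, like the T9 position
sugar_atom_rotated = (1.5, 2.0, 0.0)   # 2.5 Å away, like a rotated abasic sugar

print(distance(cys2_atom, sugar_atom_observed))               # 5.0
print(within_crosslink_range(cys2_atom, sugar_atom_observed))  # False
print(within_crosslink_range(cys2_atom, sugar_atom_rotated))   # True
```

With real data, the two coordinates would be read from the relevant atom records of the structure files instead of being typed in by hand.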
Mohni et al. 4 showed that HMCES forms DPC intermediates with abasic sites in ssDNA generated by uracil-DNA glycosylase (UDG), a monofunctional glycosylase that cannot cleave ssDNA. However, other types of damaged bases require bifunctional glycosylases with both glycosylase and lyase activities, such as NEIL3, a single-strand-specific glycosylase with limited lyase activity that can cleave ssDNA 3' to an abasic site to generate a 3' overhang 6. Our structure indicates how SRAPd can recognize and potentially crosslink to abasic sites at the 3' end of ssDNA overhangs (Fig. 2a, d). By forming a stable DPC, SRAPd is thought to protect the ssDNA from error-prone DNA synthesis and nucleolytic degradation, thus safeguarding genome integrity 4.
Our SRAPd_3nt structure also revealed that the blunt end of DNA-B interacts with SRAPd via dsDNA-interaction site B, composed of residues Gly3, Arg4, Pro46, Asp47, and Trp128 (Fig. 1e). This interaction surface represents a potential binding site for 5' overhang DNA, as SRAPd was shown to bind both 5' and 3' overhangs with similar affinities 4. This dsDNA-interaction site B accounts for the remaining highly conserved residues of the SRAPd, suggesting that it is a universal functional feature of this domain. It is immediately adjacent to the catalytic triad and forms a contiguous, similarly charged surface with the ssDNA-binding site (Fig. 1b). These features suggested that dsDNA-interaction site B may also be able to accommodate ssDNA extending from a longer 3' overhang substrate bound to the dsDNA-interaction site A.
To address this question, we determined the crystal structure of SRAPd with DNA containing a six-nucleotide overhang at the 3' end (referred to here as SRAPd_6nt). Although SRAPd has 10-fold higher affinity for ssDNA than for dsDNA (Supplementary Fig. 1), the longer 3' overhang did not displace the blunt-end-interacting DNA-B from dsDNA-interaction site B. Instead, the extra single-stranded bases protrude out of the catalytic triad pocket (Fig. 2c; Supplementary Fig. 2). This suggests that dsDNA-interaction site B has specifically evolved to bind duplex DNA and may form the binding site for 5' overhang DNA structures as well. Nevertheless, given that DNA mediates the crystal lattice in this crystal form, we cannot entirely rule out that a longer ssDNA might occupy dsDNA-interaction site B in the absence of a competing duplex DNA.

In SRAPd_3nt, the distance between the 3' end of DNA-A and the 5' end of DNA-B at the catalytic triad is ~3.2 Å, which is sufficient to accommodate a phosphate group linking the two substrates together (Fig. 2b). Consistent with our observations, the affinity of SRAPd for dsDNA with a 3-nucleotide gap is ~7-fold higher than for intact dsDNA of the same sequence (Supplementary Fig. 1). These data suggest the potential for binding other types of gapped DNA structures that form during DNA repair (Fig. 2d), such as nucleotide excision repair intermediates.
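As a rough guide to what these fold differences in affinity mean energetically, a fold change in dissociation constant can be converted into a binding free-energy difference with the standard relation ΔΔG = -RT ln(fold). The sketch below applies this to the 10-fold (ssDNA versus dsDNA) and ~7-fold (gapped versus intact dsDNA) differences quoted above. The temperature of 298.15 K is an assumption, as the assay temperature is not stated here, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold(fold_change, temperature_k=298.15):
    """Free-energy difference (kcal/mol) for a fold increase in affinity.

    A negative value means tighter binding. The default temperature is an
    assumed value, not one reported in the text.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


for label, fold in (("ssDNA vs dsDNA", 10.0),
                    ("3-nt gapped vs intact dsDNA", 7.0)):
    print(f"{label}: {ddg_from_fold(fold):.2f} kcal/mol")
# ssDNA vs dsDNA: -1.36 kcal/mol
# 3-nt gapped vs intact dsDNA: -1.15 kcal/mol
```

Under this assumption, both preferences correspond to only about 1 to 1.4 kcal/mol.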
Our SRAPd structures also shed light on other proposed activities of HMCES. First, proteomics studies using dsDNA baits with modified cytosines identified HMCES as a reader of duplex DNA containing oxidised 5-methylcytosine (oxi-mC) 7. The SRAPd contacts only one base pair at the ssDNA-dsDNA junction (Fig. 1b); hence, the SRAPd of HMCES recognizes a single oxi-mC either at this junction or in single-stranded regions. Second, HMCES was previously reported to recognize and cleave dsDNA with an oxi-mC modification in a metal-ion-dependent manner 8. Our structure and the conservation pattern do not reveal any capacity in the SRAPd for metal ion binding. Accordingly, we and others 4 were unable to reproduce the HMCES nuclease activity. Taken together, our structures support an important role for HMCES in recognizing and sensing flapped and gapped DNA damage products, and reveal its broad substrate recognition spectrum.

ACKNOWLEDGMENTS:
We are grateful to Dr. Haley Wyatt for fruitful discussions and comments about the manuscript. We thank Dr. Wolfram Tempel for collecting two datasets at the ALS beamline.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Protein expression and purification
Wild-type and mutant variants of HMCES were subcloned into a pNIC-CH vector in which the C-terminal tag had been replaced with a TEV-cleavable N-terminal His6-tag, and were expressed in E. coli Rosetta. The recombinant proteins were purified first by nickel-affinity chromatography and, after TEV cleavage of the His6-tag, by anion exchange and gel-filtration chromatography using an S200 column. Purified SRAPd was concentrated to ~20 mg/mL in 20 mM Tris-HCl [pH 8.0], 150 mM NaCl, 2 mM tris(2-carboxyethyl)phosphine (TCEP). The sequences of all cloned constructs were verified by sequencing, and the molecular weights of all purified constructs were verified by liquid chromatography-mass spectrometry (LC-MS).
DNA used for co-crystallization was purchased from Integrated DNA Technologies, Inc. For co-crystallization, purified SRAPd protein at 12 mg/mL was mixed, at a molar ratio of 1:1.