Structure-based alignment of human caspase recruitment domains provides a framework for understanding their function

Intracellular signalling is driven by protein-protein interactions. Members of the Death Domain superfamily mediate protein-protein interactions in both cell death and innate immune signalling pathways. They drive the formation of macromolecular complexes that act as a scaffold for protein recruitment and downstream signal transduction. Death Domain family members have low sequence identity, complicating their identification and predictions of their structure and function. We have taken all known human caspase recruitment domains (CARDs), a subfamily of the Death Domain superfamily, and generated a structure-guided sequence alignment. This alignment has enabled the identification of 14 positions that define the hydrophobic core and present a template for the identification of novel CARD sequences. We identify a conserved salt bridge in over half of all human CARDs and find a subset of CARDs likely to be regulated by tyrosine phosphorylation in their type I interface. Our alignment highlights that the CARDs of NLRC3 and NLRC5 are likely to be pseudodomains that have lost some of their original functionality. Together these studies demonstrate the benefits of structure-guided sequence alignments in understanding protein functionality.


Introduction
The Death Domain protein superfamily contains a collection of helical protein domains that provide a central, and crucial, function in the formation of macromolecular protein signalling complexes in both cell-death and immune signalling pathways. There are four members of the superfamily: the death domain (DD), the pyrin domain (PYD), the death effector domain (DED) and the caspase recruitment domain (CARD). Each of these protein domains folds into a helical bundle which is usually formed from six independent alpha helices (Kersse et al., 2011).

Most interactions between DD family members are homotypic in nature, i.e. CARD with CARD. These interactions are mediated by discrete interfaces on the protein surface known as type I, II and III. Each of these interfaces consists of an 'a' and a 'b' component, one on each binding partner. For example, a Type I interaction involves the coming together of a Type Ia interface on one protein with a Type Ib interface on the other. The precise positioning of these interfaces means that potentially every DD-type fold involved in the complex can form six distinct interactions. It is this multiplicity of binding surfaces that helps to drive the formation of large signalling complexes such as the Myddosome (Lin et al., 2010; Motshwene et al., 2009) and Death Inducing Signalling Complex (Kischkel et al., 1995). Heterotypic interactions have been reported, but are not common. The first [...]

[...] 2N7Z) and missing in NOL3 (PDB: 4UZ0). In fact, overall helix 6 showed variability between CARDs in its length, its position relative to helices 1-5, and its contribution to the hydrophobic core. This suggests that it is helices 1-5 that provide the critical structural and biological functionality of the CARD family.

In total 17 different CARDs were used to generate a structure-guided alignment of the human CARD repertoire (Supplementary Figure 1).
Aside from the absence of helix 6 in RIPK2 and NOL3 discussed above, the hydrophobic core residues were conserved with the exception of residue h4. Despite pointing into the hydrophobic core, this residue is a serine in the RIPK2 structure and a histidine in the Bcl-10 structure. This suggests that, at least in some positions, there is flexibility in the composition of these residues, though it remains to be determined whether there is any functional relevance to this variation. It should also be noted that the inherent structural variation in helix 6 means that these conserved hydrophobic residues could not always be assigned with complete confidence, although two hydrophobic residues almost always faced towards the protein core.

All typical human CARDs can be incorporated into the alignment
Once all the human CARDs with solved structures had been aligned, we added in the sequences of the remaining 19 human CARDs using the presence of the core hydrophobic residues as a primary guide for the alignment, complemented using the FUGUE fold recognition server. The resulting alignment can be seen in Figure 5. NLRC3, NLRC5, GRP1 and TNFRSF21 were not included in the alignment due to their extensive deviation from the classical CARD sequence and structure.

The majority of CARDs aligned well and could be fitted so that any insertions and deletions were placed outside of the likely helix positions while still fulfilling the requirement to create a hydrophobic core. Somewhat remarkably, the hydrophobic core identified in the preliminary alignment (Figure 1) showed a strong level of retention in all of the 36 CARD sequences contained in the final alignment (Figure 5). The final alignment is most reliable over the first five helices (marked with numbers 1-5 in Figure 5) and for the conserved hydrophobic residues. The size of the inter-helical loops varies significantly between CARDs and is consequently more difficult to compare directly. The most likely hydrophobic residues for positions h13 and h14 are labelled in Helix 6 but structural information will be required to verify these in most cases. It has also been noted in previous studies (Hu et al., 2014) that the initiating methionine can contribute to the hydrophobic core, which suggests that the stability of recombinant constructs may benefit from beginning at the start of the protein sequence.

Analysis of the full CARD alignment identified a small number of notable deviations from the conservation of the hydrophobic core.
These include: caspases-4 and -5, which both possess a lysine residue at position h8; Bcl-10 and RIPK2, which contain a histidine and a serine at position h4 respectively; CIITA, which contains a serine at position h10; and NLRC4 and the highly similar group of CARDs formed from CARD9, 10, 11 and 14, which contain a threonine at h5. We know that RIPK2 (PDB: 2N7Z), Bcl-10 (PDB: 2MB9) and CARD11 (PDB: 4LWD) still adopt a CARD-like structure despite these substitutions, indicating that the h4 and h5 positions have at least some flexibility in terms of the biochemical properties of the residue situated there. It remains to be seen whether caspase-4 and caspase-5 also adopt the classical CARD fold or whether the presence of a large positively charged side chain results in structural distortion. Of course, it may well be that such structural distortion is necessary for the CARDs of caspase-4 and caspase-5 to be able to act as intracellular sensors of LPS (Shi et al., 2014).
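The screening for core deviations described above can be illustrated with a minimal Python sketch. It is not the procedure used to build the alignment: the set of residues treated as hydrophobic, the function name `core_deviations`, the column indices and the toy sequences are all assumptions introduced purely for illustration.

```python
# Assumption: residues counted as hydrophobic for the purpose of this sketch.
HYDROPHOBIC = set("AVLIMFWYC")


def core_deviations(aligned_seq, core_columns):
    """Return {label: residue} for core positions that are not hydrophobic.

    aligned_seq  -- one row of a gapped alignment ('-' marks a gap)
    core_columns -- mapping of core labels (e.g. 'h4') to 0-based alignment columns
    """
    deviations = {}
    for label, col in core_columns.items():
        residue = aligned_seq[col].upper()
        if residue not in HYDROPHOBIC:
            deviations[label] = residue
    return deviations


# Hypothetical toy data: three core columns in a 10-column alignment.
toy_columns = {"h1": 1, "h2": 4, "h3": 8}
toy_alignment = {
    "toy_typical": "MLKELAD-VL",
    "toy_variant": "MLKESAD-KL",
}

report = {
    name: core_deviations(seq, toy_columns)
    for name, seq in toy_alignment.items()
}
for name, deviations in report.items():
    print(name, deviations)
```

Applied to a real alignment, the column mapping would have to come from the structure-guided assignment of h1 to h14; a gap at a core column is reported as a deviation by this sketch.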

Surface salt bridges are found in over half the CARDs
Eight of the sixteen structures analysed possess a salt bridge between an aspartate or glutamate on Helix 2 and a lysine or arginine in the 4-5 loop (Figure 6A, B). The alignment in Figure 5 shows that a further 12 CARDs contain appropriately charged residues in these positions, suggesting that 20 of the human CARDs maintain this salt bridge.

It is plausible that these salt bridges may be important structurally and functionally. Certainly their potential formation needs to be considered when producing homology models and in the interpretation of site-directed mutagenesis data. The use of electron microscopy to study death domain filaments is becoming more common and, in the absence of high-resolution monomeric structures, models of these filaments are built using homology models of the individual domains. In a recent study (Lu et al., 2016) of the caspase-1 CARD filaments, homology models of the caspase-1 CARD were constructed based on the structure of the closely related CARD from CARD18 (ICEBERG; PDB 1DGN). These were then docked to an electron density map and provided a basis for understanding caspase-1 filament assembly. This study could arguably be further improved by the inclusion of a salt bridge in these homology models. Figure 6C shows the significant differences in residue position and surface electrostatic potential of caspase-1 homology models which either maintain or break this salt bridge, changes which may influence functional interpretation.

[...] The Type Ib patch in all six of these proteins also contains an aspartate residue and it may well be that tyrosine phosphorylation is necessary for the formation of a Type I interaction resembling that seen between caspase-9 and Apaf-1. In this instance positively charged and negatively charged surfaces interact around two key residues on each side (Qin et al., 1999).
Further structural and functional studies of CARD:CARD interactions may need to use expression systems other than E. coli in order to investigate the importance of these phosphorylation events.

The CARDs from human and murine NLRC3 and NLRC5 are pseudodomains
The structure of the N-terminus of murine NLRC5 has been solved by NMR. It forms an atypical CARD in that Helix 1a and Helix 6 are separated from the rest of the domain by extended loops and helix 3 is completely unstructured. Alignment of the human and murine NLRC5 sequences, based on the murine structure, shows that while most hydrophobic residues are conserved in both species, h1, h5, h8 and h9 are non-hydrophobic. The orthologous region of zebrafish NLRC5 contains the expected hydrophobic residues in these regions and in general aligned more readily with other CARDs (Figure 8A).

NLRC3 was examined in a similar manner. The zebrafish contains many NLRC3-like genes and so the NLRC3 orthologue from the platypus was used for comparison instead. Fold recognition analysis of the human NLRC3 sequence with FUGUE suggests that whilst many of the conserved hydrophobic residues are present in the human protein, five (h2, h6, h8, h12 and h14) appear to have been altered (Figure 8B). A similar pattern is seen for murine NLRC3, in which h1, h2, h5, h6, h12 and h14 are all substituted. In contrast, the platypus orthologue contains all of the expected hydrophobic residues and also appears to have the surface salt bridge commonly found in CARDs (Figure 8B), which is absent in both the murine and human NLRC3 CARDs.

The sequence deviation of the human and murine NLRC3 and NLRC5 CARDs, along with the non-typical structure of the murine NLRC5 CARD, indicates that they have become pseudodomains in humans and mice and have most likely lost some of their original functionality.
Little is known about the roles of the NLRC3 and NLRC5 CARDs, but our alignment shows apparent degeneration of these CARDs in humans and mice compared to their orthologues from certain other vertebrates. Future studies of CARD-containing proteins using other species should take these differences into account when drawing conclusions about protein function. It also suggests that too many substitutions in the hydrophobic core result in structural changes to the CARD fold. This has clear implications for downstream CARD function. This observation may have further implications for the CARDs of caspase-4 and caspase-5, which are substituted in the h1 and h8 positions and have uncertain helix 6 alignments. A disrupted or altered structure may help explain why these domains function as intracellular LPS receptors rather than the usual homotypic interaction motifs and structural scaffolds.
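As a small worked comparison, the substituted core positions reported above for the human and murine NLRC3 CARDs can be compared with plain set operations. The position labels are taken from the text; the variable names and the helper `by_position` are illustrative only.

```python
# Core positions reported in the text as substituted in each NLRC3 CARD.
human_nlrc3 = {"h2", "h6", "h8", "h12", "h14"}
murine_nlrc3 = {"h1", "h2", "h5", "h6", "h12", "h14"}


def by_position(label):
    """Sort key: order labels such as 'h12' by their numeric part."""
    return int(label[1:])


shared = sorted(human_nlrc3 & murine_nlrc3, key=by_position)
human_only = sorted(human_nlrc3 - murine_nlrc3, key=by_position)
murine_only = sorted(murine_nlrc3 - human_nlrc3, key=by_position)

print("shared:", shared)
print("human only:", human_only)
print("murine only:", murine_only)
```

Four of the substituted positions (h2, h6, h12 and h14) are common to both species, while h8 is altered only in the human sequence and h1 and h5 only in the murine one.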
The hydrophobic core can be used for identifying potential novel CARDs
Through use of PSI-BLAST and jackhmmer it is possible to find candidate CARDs in protein databases. However, the high sequence variability in the CARD superfamily can make such candidates difficult to detect and confirm. One example from our search is Tetratricopeptide Repeat Domain 3 (TTC3), which came up as a low-confidence hit in multiple PSI-BLAST searches and was labelled as containing a possible CARD by FUGUE (Supplementary Table 1). The TTC3 sequence fits into the constraints of the conserved hydrophobic pattern (Figure 8C) and no other domains are predicted to overlap the CARD prediction, suggesting that it may indeed adopt a CARD-like fold, although ultimately structural confirmation will be required.

Conclusion
We have used a combination of structure-based and sequence-derived information to create a global alignment of the human CARD superfamily. Almost all human CARDs were predicted to contain a set of fourteen hydrophobic residues and over half of these also maintain a surface salt bridge. Deviations were most common in helix 6, while some CARDs such as that from RIPK2 also had single position exceptions to the hydrophobic pattern. The atypical structure of NLRC5, which is missing three conserved residues, suggests that too many differences can significantly alter the CARD fold, leading us to suggest that the CARDs of human and murine NLRC3 and NLRC5 are likely to be pseudodomains, whilst also providing a potential explanation for the ability of the caspase-4 and -5 CARDs to bind LPS.
Our investigation highlights the potential importance of hydrophobic residues, salt bridges and phosphorylation sites as a resource for successful design and interpretation of bioinformatic and experimental studies on the CARD superfamily. In particular, homology modelling and site-directed mutagenesis efforts may benefit from these observations. For example, consideration of the potential structural roles of charged surface residues may aid the choice and design of appropriate expression systems to allow post-translational modifications and the production of functional, soluble and stable protein. It is highly likely that such an approach is also applicable to the other protein folds in the Death Domain superfamily.

[...] acts as a molecular switch that controls the formation of speck-like aggregates and inflammasome activity. Nat. Immunol. 14, 1247-1255.
Hu, Q., Wu, D., Chen, W., Yan, Z., Yan, C., He, T., Liang, Q., and Shi, Y. [...]

[...] The putative CARD-containing protein TTC3 shows conservation of the CARD hydrophobic core. In all panels residues contributing to the hydrophobic core are coloured red and presented in bold underlined text. Residues deviating from the hydrophobic core are light blue, bold and underlined. In panel B residues in the correct position to form a surface salt bridge are coloured orange (acidic), dark blue (basic) or purple (incompatible).
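The acidic, basic and incompatible classes used for the salt bridge positions can be written as a simple check. The sketch below is illustrative only: it assumes, following the description of the salt bridge in the Results, that the Helix 2 position should carry an aspartate or glutamate and the 4-5 loop position a lysine or arginine. The function names are hypothetical and no real sequence data are included.

```python
ACIDIC = set("DE")  # aspartate, glutamate
BASIC = set("KR")   # lysine, arginine


def classify(residue, expected):
    """Return `expected` ('acidic' or 'basic') if the residue matches it,
    otherwise 'incompatible'."""
    allowed = ACIDIC if expected == "acidic" else BASIC
    return expected if residue.upper() in allowed else "incompatible"


def can_form_salt_bridge(helix2_residue, loop45_residue):
    """True when the Helix 2 residue is acidic and the 4-5 loop residue is basic."""
    return (
        classify(helix2_residue, "acidic") == "acidic"
        and classify(loop45_residue, "basic") == "basic"
    )


# Hypothetical residue pairs (Helix 2 position, 4-5 loop position).
for pair in [("D", "K"), ("E", "R"), ("S", "K"), ("D", "Q")]:
    print(pair, can_form_salt_bridge(*pair))
```

A check of this kind only indicates whether the charges are compatible; whether the side chains are close enough to form the salt bridge still has to be assessed from a structure or model.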