Structure-Based Survey of the Human Proteome for Opportunities in Proximity Pharmacology

Proximity pharmacology (ProxPharm) is a novel paradigm in drug discovery where a small molecule brings two proteins into close proximity to elicit a signal, generally from one protein onto another. The potential of ProxPharm compounds as a new therapeutic modality is firmly established by proteolysis targeting chimeras (PROTACs) that bring an E3 ubiquitin ligase in proximity to a target protein to induce ubiquitination and subsequent degradation of the target protein. The concept can be expanded to induce other post-translational modifications via the recruitment of different types of protein-modifying enzymes. To survey the human proteome for opportunities in proximity pharmacology, we systematically mapped non-catalytic drug binding pockets on the structures of protein-modifying enzymes available from the Protein Data Bank. In addition to binding sites exploited by previously reported ProxPharm compounds, we identified putative ligandable non-catalytic pockets in 188 kinases, 42 phosphatases, 26 deubiquitinases, 9 methyltransferases, 7 acetyltransferases, 7 glycosyltransferases, 4 deacetylases, 3 demethylases and 2 glycosidases, including cavities occupied by chemical matter that may serve as starting points for future ProxPharm compounds. This systematic survey confirms that proximity pharmacology is a versatile modality with largely unexplored and promising potential, and reveals novel opportunities to pharmacologically rewire molecular circuitries.


INTRODUCTION
Proteolysis targeting chimeras (PROTACs) are bifunctional small molecules that simultaneously bind an E3 ubiquitin ligase and a target protein, thereby inducing the ubiquitination and subsequent proteasomal degradation of the protein target 1. This type of molecule has evolved over the past 20 years from a chemical biology curiosity to a promising therapeutic modality, with clear dose-dependent degradation of therapeutic targets such as AR, IRAK4 or BTK observed in man (clinicaltrials.gov identifiers NCT03888612, NCT04772885, NCT04830137), and the question is no longer whether but when the first PROTAC will be approved for therapeutic use by regulatory agencies. With proof-of-concept in sight, the scientific community is now looking at novel ways to apply the concept of proximity pharmacology (ProxPharm), where chemically induced proximity between proteins can be used to rewire the molecular circuitry of cells for chemical biology applications or therapeutic benefit 2,3. Indeed, ProxPharm compounds were recently reported that recruit a phosphatase, two kinases, an acetyltransferase, and a deubiquitinase to post-translationally modify neo-substrates [4][5][6][7].
Structural studies have shown that PROTACs do not simply act as chemical linkers but rather stabilize non-natural protein-protein interactions between E3 ligases and target proteins 8. Because compatible protein interfaces do not always exist between two proteins, a prevailing notion is that a collection of chemical handles binding a diverse array of E3 ligases will be necessary to productively induce the degradation of any given protein. Additionally, the tissue expression profile and subcellular localization of the E3 ligase must match that of the target protein for a PROTAC to be active. Finally, PROTACs recruiting E3 ligases with disease-specific tissue expression profiles can avoid adverse effects associated with the indiscriminate inhibition of the protein target. For example, a senolytic PROTAC exploits the restricted expression profile of the E3 ligase CRBN to avoid toxicity associated with the adverse inhibition of the target protein, Bcl-xL, in platelets 9. Similar rules are expected to apply to ProxPharm compounds beyond PROTACs, emphasizing the need to identify chemical handles for a diverse array of protein-modifying enzymes.
To uncover novel opportunities for the development of future ProxPharm compounds, we searched for non-catalytic ligandable pockets (structural cavities that can be occupied by small-molecule ligands) in all experimental structures of human protein-modifying enzymes, including kinases, phosphatases, acetyltransferases, deacetylases, methyltransferases, demethylases, glycosyltransferases, glycosidases and deubiquitinases. The ligandability of E3 ligases was previously reviewed 10 and is not considered in this analysis, which focuses on opportunities for proximity pharmacology beyond PROTACs 1,10-13. We identified non-catalytic pockets in 287 human enzymes, including those recruited by previously reported ProxPharm compounds. This analysis further confirms the rich potential of proximity pharmacology for chemical biology applications.

Mapping binding pockets
A list of enzymes was compiled from the Expasy ENZYME database and the UniProtKB database and mapped to corresponding PDB codes. The 3D structures were extracted from the PDB and the biologically relevant oligomeric state was generated with ICM. The icmPocketfinder module was run against each converted ICM object using default settings. The pockets were categorized as non-catalytic based on the following two approaches.

InterPro domain analysis
The domain architecture of each enzyme was extracted from the InterPro database 14. The domains were marked as either catalytic or non-catalytic based on Gene Ontology (GO) annotations or the literature. Residues within 2.8 Å of the pocket mesh generated by ICM were considered as lining the pocket, and the N- and C-terminal boundaries of this selection were used to define a 'pseudo' sequence for the pocket. These sequences were aligned and compared with the domain architecture of the enzyme to determine the domain location of the pocket. If the pocket was in a manually curated non-catalytic domain, the pocket was marked non-catalytic.
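The domain-based classification can be illustrated with a minimal Python sketch. It is not the ICM workflow used in this study: the `Domain` record, the function names and the rule of assigning a pocket to the domain with the largest residue overlap are assumptions made for illustration, and the domain boundaries in the usage note are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Domain:
    """One annotated domain (hypothetical record, e.g. built from InterPro)."""
    name: str
    start: int       # first residue number of the domain
    end: int         # last residue number of the domain (inclusive)
    catalytic: bool  # manual curation flag


def pocket_span(lining_residues: Iterable[int]) -> Tuple[int, int]:
    """N- and C-terminal boundaries of the pocket 'pseudo' sequence."""
    residues = list(lining_residues)
    return min(residues), max(residues)


def assign_domain(lining_residues: Iterable[int],
                  domains: List[Domain]) -> Optional[Domain]:
    """Return the domain sharing the most residues with the pocket span.

    Assumption: largest overlap wins; None if the span overlaps no domain.
    """
    lo, hi = pocket_span(lining_residues)
    best, best_overlap = None, 0
    for d in domains:
        overlap = min(hi, d.end) - max(lo, d.start) + 1
        if overlap > best_overlap:
            best, best_overlap = d, overlap
    return best


def is_noncatalytic_by_domain(lining_residues: Iterable[int],
                              domains: List[Domain]) -> bool:
    """True only if the pocket maps to a domain curated as non-catalytic."""
    d = assign_domain(lining_residues, domains)
    return d is not None and not d.catalytic
```

With two invented domains, `Domain("C1", 100, 150, False)` and `Domain("Kinase", 300, 560, True)`, a pocket lined by residues 105, 110, 132 and 140 maps to the C1 domain and is flagged non-catalytic, whereas a pocket lined by residues 310 and 400 maps to the kinase domain and is not.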

Catalytic residues proximity analysis
For each enzyme, the corresponding catalytic residue information was extracted from either the Mechanism and Catalytic Site Atlas database 15 or the UniProtKB database 16. If the catalytic residues were present in the structure, the distance between each pocket and the catalytic residues was measured. If the pocket was more than 7 Å away from the catalytic residues, it was categorized as non-catalytic.
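The 7 Å proximity rule can be sketched in a few lines of Python. The text does not specify how the pocket-to-residue distance was computed, so this sketch assumes the minimum distance between any pocket-lining atom and any catalytic-residue atom; the function names and the `None` return value for structures lacking catalytic residues are also illustrative choices.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]

CATALYTIC_CUTOFF = 7.0  # Å, threshold stated in the text


def min_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Smallest Euclidean distance between two sets of coordinates."""
    return min(math.dist(p, q) for p in a for q in b)


def is_noncatalytic_by_distance(pocket_atoms: Sequence[Point],
                                catalytic_atoms: Sequence[Point],
                                cutoff: float = CATALYTIC_CUTOFF
                                ) -> Optional[bool]:
    """True if the pocket lies farther than `cutoff` from the catalytic residues.

    Returns None when no catalytic atoms are resolved, since the rule
    cannot be evaluated in that case.
    """
    if not catalytic_atoms:
        return None
    return min_distance(pocket_atoms, catalytic_atoms) > cutoff
```

For example, a pocket with atoms at (0, 0, 0) and (1, 0, 0) is classified as non-catalytic relative to a catalytic atom at (10, 0, 0), which is 9 Å from the nearest pocket atom, but not relative to one at (5, 0, 0), which is 4 Å away.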

Additional filters
Nucleotide binding residues and co-factor binding residues information was extracted from the UniProtKB database to determine which pockets corresponded to nucleotide or co-factor binding sites, for example the ATP binding site in protein kinases or the acetyl-CoA binding site in acetyltransferases. If the distance between the pocket and nucleotide/co-factor binding residues was less than 7 Å, the pocket was filtered out. If the pocket was in proximity (<5 Å) of residues left unresolved in the structure due to poor electron density, the pocket was not included for further analysis. If the catalytic residues were among the missing residues, pockets were excluded as well.
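The three exclusion rules above can be summarized as a short Python sketch. The `PocketContext` record and the reason strings are hypothetical, and the distances are assumed to have been precomputed; only the 7 Å and 5 Å thresholds come from the text.

```python
from dataclasses import dataclass
from typing import Optional

COFACTOR_CUTOFF = 7.0    # Å, nucleotide/co-factor binding residues
UNRESOLVED_CUTOFF = 5.0  # Å, residues missing from the electron density


@dataclass
class PocketContext:
    """Precomputed context for one pocket (hypothetical record)."""
    dist_to_cofactor_site: Optional[float]  # None if no annotated site
    dist_to_unresolved: Optional[float]     # None if no missing residues
    catalytic_residues_resolved: bool


def exclusion_reason(ctx: PocketContext) -> Optional[str]:
    """Return why a pocket is excluded, or None if it passes all filters."""
    if not ctx.catalytic_residues_resolved:
        return "catalytic residues missing from structure"
    if (ctx.dist_to_cofactor_site is not None
            and ctx.dist_to_cofactor_site < COFACTOR_CUTOFF):
        return "nucleotide/co-factor binding site"
    if (ctx.dist_to_unresolved is not None
            and ctx.dist_to_unresolved < UNRESOLVED_CUTOFF):
        return "adjacent to unresolved residues"
    return None
```

Returning a reason rather than a boolean keeps a record of which filter removed each pocket, which is convenient when auditing a large survey; this is a design choice of the sketch, not something described in the text.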
Pockets were also excluded when located at the interface of inhibitor proteins and enzyme complexes. Next, pockets were filtered for duplicates (when two structures representing the same enzyme had a similar pocket, the largest pocket was retained) and druggability. Druggability was determined using the pocket properties generated by ICM (volume: 1555.
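The duplicate filter can be sketched as follows. The text does not define when two pockets count as similar, so this sketch assumes a Jaccard overlap of pocket-lining residue numbers with an arbitrary threshold of 0.5; the `Pocket` record and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Pocket:
    """One predicted pocket (hypothetical record)."""
    enzyme: str             # enzyme identifier, e.g. a UniProt accession
    pdb_id: str             # structure in which the pocket was found
    lining: FrozenSet[int]  # residue numbers lining the pocket
    volume: float           # pocket volume in Å^3


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Fraction of shared residues between two pocket linings."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def deduplicate(pockets: List[Pocket],
                min_similarity: float = 0.5) -> List[Pocket]:
    """Keep the largest pocket among similar pockets of the same enzyme.

    Pockets are visited from largest to smallest volume, so any pocket
    similar to one already kept is a smaller duplicate and is dropped.
    """
    kept: List[Pocket] = []
    for p in sorted(pockets, key=lambda x: x.volume, reverse=True):
        duplicate = any(
            k.enzyme == p.enzyme
            and jaccard(k.lining, p.lining) >= min_similarity
            for k in kept
        )
        if not duplicate:
            kept.append(p)
    return kept
```

Because comparison is restricted to pockets of the same enzyme, equivalent pockets in two different enzymes are both retained, which preserves recurrent sites across a protein family.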

Cysteine reactivity
The reactivity of cysteine side-chains lining pockets was predicted using the ReactiveCys module of ICM. The method is based on reactivity data for 34 reactive and 184 nonreactive cysteines from isoTOP-ABPP (isotopic tandem orthogonal proteolysis activity-based protein profiling) 18 and a nonredundant set of PDB protein structures (resolution < 2.5 Å) with covalently modified cysteines (272 reactive).

RESULTS
To assemble a list of druggable binding pockets that may be exploited by ProxPharm compounds, all high-resolution structures of human protein-modifying enzymes beyond E3 ligases in the PDB were analyzed with the cavity mapping tool IcmPocketFinder (Molsoft, San Diego). Only structural cavities with properties (volume, area, hydrophobicity, buriedness and drug-like density (DLID)) within a pre-defined range (detailed in the Methods section) were deemed ligandable and were considered further. A permissive definition of ligandability was used to reflect the fact that chemical handles for ProxPharm compounds do not need to bind potently to their target. Indeed, ligands with up to 10 µM affinity have been successfully used to make PROTACs 19. When a ligandable cavity was found in a non-catalytic domain, the domain was also deemed ligandable in the context of enzymes not in the PDB, but with a low confidence score. When enzymes were bound to other proteins in the PDB, cavities were also searched at the protein interface. Pockets that may be exploited by ProxPharm compounds could be divided into three categories: 1) those located in non-catalytic domains, 2) those found at non-catalytic sites of the catalytic domain, 3) those mapping at the interface of protein complexes (Figure 1).

Potentially ligandable non-catalytic pockets were found in 188 kinases, 42 phosphatases, 26 deubiquitinases, as well as several writers and erasers of methyl, acetyl and glycosyl groups (Figure 1, Tables S1 and S2). In the following sections, we review each protein family in detail.

Protein kinases
Ligandable non-catalytic pockets were found in the catalytic domain of 170 kinases (Figure 1, Table S2). For instance, in 86 kinases, a pocket is found in the a-lobe of the kinase domain (
Multiple potentially ligandable cavities were also identified in non-catalytic domains of kinases. For example, a cavity was found in the non-catalytic C1 domain of 23 kinases such as BRAF, CDC42 binding kinases, or PKC kinases (Table S1, Figure 3). Binding of diacylglycerol to this pocket leads to translocation from the cytosol to the membrane of PKC kinases, and catalytic
Other protein domains of potential interest were identified in human kinases, but even though cavities meeting our selection criteria were found, the general ligandability of these domains remains to be supported experimentally. For instance, 29 kinases contain an immunoglobulin-like domain (Figures 1 and 3). Small molecule ligands were shown to bind to the immunoglobulin-like domain of the unrelated protein RAGE, but ligands were prohibitively weak 31. Another 28 kinases contain both SH2 and SH3 domains (Figures 1 and 3), known to participate in the formation of an autoinhibitory state and contribute to substrate recruitment of Src family kinases. Despite sustained efforts, potent, drug-like, cell-penetrant ligands remain to be found for these domains. Nevertheless, they may be sufficiently ligandable for the discovery of weak compounds that may serve as valid chemical handles for kinase-recruiting ProxPharm compounds. In another example, the poorly characterized kinase STK31 includes a Tudor domain (Table 2, Figure 1), generally found in proteins involved in chromatin-mediated signaling. This domain was targeted by a potent chemical probe in the context of the methyltransferase SETDB1 32 and may be ligandable in STK31.
and PRKAA (PDB 6C9F 24). Details are provided in Table S1.

Protein phosphatases
Non-catalytic pockets were found in 43 protein phosphatases (Table 1). Among these, 40 were in the catalytic domain and 24 in juxtaposed domains (Figure 1). Some of the non-catalytic cavities were recurrently found in the phosphatase domain: 14 tyrosine-protein phosphatases share a cavity 15 Å from the catalytic site (Figure 4A, Pocket PP3), which, in the context of PTPN5, is occupied by an allosteric activator (PDB 6H8R) 36. Other recurrent cavities are found at five other locations of the catalytic domain and could potentially be exploited to recruit tyrosine-protein phosphatases to target proteins. Furthermore, 5 serine/threonine-protein phosphatases have 4 recurrent non-catalytic cavities in their catalytic domain (Figure 4B). Non-catalytic pockets were also found at multiple protein-protein interfaces, including a cavity
This hypothesis is further supported by the fact that PP2A was successfully recruited to dephosphorylate the kinases AKT or EGFR by linking kinase inhibitors to peptidic ligands that exploit the tetratricopeptide repeat domain in PP2A 5.
Details are provided in Table S1.
(bromodomains are typically druggable (Figure 1) 63, but no ligand was reported for these domains.
A recurrent pocket was also found in the catalytic domain of two protein arginine methyltransferases, PRMT3 and PRMT8, which is located more than 17 Å away from the catalytic site (Figure S1, Pocket M1). Other unique non-catalytic pockets were found in the methyltransferase domain of 3 PMTs (PRMT3, PRMT5, CARM1) (Table S2). These cavities met our ligandability criteria, but their chemical tractability has not yet been validated experimentally.

Lysine demethylases
A number of non-catalytic domains of lysine demethylases include potentially ligandable pockets.
and UTY and the SWIRM domain of KDM1A and KDM1B (Table S1, Figure 1), but no ligand has so far been reported for these domains.

Lysine acetyltransferases
With over 3000 acetylated lysine side-chains across 1700 human proteins, acetylation is a ubiquitous post-translational modification involved in a diverse array of cellular machineries such as the regulation of gene expression, splicing or cell cycle 64,65 . Out of 35 lysine acetyltransferases in the human genome, we found non-catalytic ligandable pockets in nine (Table S2, Figure 1).
Several acetyltransferases include an acetyl-lysine binding bromodomain, five of which were crystallized in complex with multiple small-molecule ligands (EP300, CREBBP, KAT2A, KAT2B and TAF1) (Figure 7) 63 . A compound targeting the bromodomain of one of these, EP300, was chemically linked to an FKBP12-binding molecule to successfully induce the acetylation of FKBP12-fusion proteins by EP300, thereby confirming that acetyltransferases are amenable to proximity pharmacology, and strongly suggesting that bromodomain ligands could be used as chemical handles to recruit other acetyltransferases to neo-substrates 7 .

A WDR domain is also found in GTF3C4, a poorly characterized acetyltransferase (Table S1, Figure 1). The structure of this domain was not experimentally solved, but WDR domains are ligandable in the context of other proteins 30,67 , and this enzyme could potentially be harnessed for targeted acetylation.

Lysine deacetylases
Deacetylases have a limited number of non-catalytic domains and a ligandable site was found in only one of them: the zinc-finger ubiquitin-binding domain (Znf-UBD) of HDAC6 (Figure 7).
This binding pocket recognizes the C-terminal extremity of ubiquitin and was successfully targeted by small molecule ligands 68 representing excellent chemical handles for proximity pharmacology applications. Non-catalytic pockets were also found in the catalytic domain of three other deacetylases: HDAC4, HDAC8 and HDAC1, but the ligandability of these sites remains to be experimentally validated (Figure 9).

Deubiquitinases
Deubiquitinases (DUBs) typically remove ubiquitin tags deposited by E3 ligases. When these tags are signalling for proteasomal degradation, DUBs deubiquitinate and rescue their protein substrates from the ubiquitin-proteasome system and have a stabilizing effect on their target.
Chemical handles binding non-catalytic pockets of DUBs may therefore enable the recruitment of DUBs for targeted protein stabilization. As a proof-of-concept, a bifunctional molecule linking a ligand that covalently engages the DUB OTUB1 to a chemical moiety that binds ΔF508-CFTR in cystic fibrosis could stabilize ΔF508-CFTR in an OTUB1-dependent manner 6 . There is no structural information on the N-terminal domain of OTUB1 that is covalently recruited by this chimeric compound, but structures of other non-catalytic domains in DUBs reveal other opportunities for targeted protein stabilization.
The most recurrent ligandable non-catalytic domain of DUBs is the Znf-UBD, found in 11 ubiquitin-specific proteases (USPs, a class of DUBs) (Figure 7, Table S1). Low micromolar ligands were reported for the Znf-UBD of USP5, but these compounds were shown to inhibit the catalytic activity of USP5 and therefore cannot be used as chemical handles to productively recruit USP5 to neo-substrates 54,69. However, the function of the Znf-UBD is poorly understood in other USPs, and ligands targeting this domain may still be valid handles for targeted protein stabilization in the context of other DUBs.
Ligandable pockets were also found in a tandem ubiquitin-like domain located at the C-terminus of three DUBs: USP7, USP11 and USP15 (Figure 1, Table S1). In the context of USP7, this domain binds and activates the catalytic domain 73. In the absence of a structure of full-length USP7 in its activated form, it is unclear whether ligands occupying this C-terminal binding pocket would preserve the activation mechanism of USP7 and could be used to productively recruit USP7 for targeted protein stabilization. Other non-catalytic domains present in deubiquitinases are an EF-hand in USP32 and a SWIRM domain in MYSM1. Chemical ligands have not yet been reported for these domains.
Non-catalytic pockets recurred at six locations in seven USPs within the peptidase C19-type catalytic domain (Figure 10A, Table S1). Another non-catalytic cavity is observed in the peptidase C12-type catalytic domain of UCHL1 and UCHL5 (Figure 10B). As above, the ligandability of these pockets needs to be confirmed experimentally.
3, and POFUT2 were also found but, as above, their ligandability should be confirmed experimentally.

Glycosidases
Similar to glycosyltransferases, protein constructs have been developed in which O-GlcNAcase or sialidase is connected to a nanobody to artificially induce deglycosylation [76][77][78]. Limited structural and domain information is available for glycosidases, but ligandable pockets are found in the catalytic domain of OGA and MAN1B1 that could be explored for deglycosylation-inducing chimeras.

Reactive cysteines
PROTACs covalently engaging an E3 ligase have demonstrated that covalent binding is a valid strategy for proximity-induced post-translational modification of target proteins [79][80][81][82][83]
Reactive cysteines were found in multiple proteins (Table S2). For instance, C576 lines a pocket in the UBL domain of USP7 C-terminal to the catalytic domain, C210 is found at an ectopic site of the STK16 kinase domain, C266 at a non-catalytic site of the PP2BA phosphatase domain, and C1030 at a cavity remote from the active site of the deacetylase HDAC4 (Figure 11). It would be interesting to screen such proteins with electrophilic fragments to find covalent adducts that may serve as a starting point for novel proximity-pharmacology applications.

Figure 11. Examples of non-catalytic pockets with a reactive cysteine residue lining the cavity. Pockets are highlighted in red. Cysteine residues predicted to be reactive are colored in yellow.

DISCUSSION
Our systematic structural survey of the human proteome reveals numerous opportunities for the pharmacological recruitment of protein-modifying enzymes beyond E3 ligases to non-natural substrates. The predicted ligandability of a binding pocket can vary from one method to another and is not a conclusive metric. Here, we use a permissive definition based on volume, area, hydrophobicity, buriedness and DLID values. We first note that this approach does retrieve binding sites for known ProxPharm compounds, including a protein-protein interface pocket used to recruit the kinase PRKAA (Figure 2B, Pocket PKI1) 4 and a bromodomain pocket used to recruit the acetyltransferase EP300 (Figure 7) 7.
Among the collection of binding sites that we compiled, the best validated are the ones for which a high-affinity ligand was already reported (Table S1, confidence level 1). For instance, V8-benzolactams bind the C1 domain of protein kinase C 4,28, UNC6934 binds the PWWP domain of NSD2 57 and compound R734 binds a protein interface of the kinase AMPK (Figures 3,7,2) 4,24. A number of non-catalytic pockets were also found that are targeted by weak ligands that may be valid starting points for the development of ProxPharm compounds (Table S1, (Table S1).
Similarly, Tudor domains are found in demethylases (KDM4A, KDM4B, KDM4C) and protein kinase STK31 (Table S1), and share a canonical aromatic cage with the Tudor domain of SETDB1 targeted by a high-affinity ligand (KD 90 nM) 32 . Finally, sites that meet our ligandability criteria but for which no ligands were found in the protein of interest or close homologues are less reliable, but of potential interest (confidence level 4).
A limitation of our analysis is that we focused exclusively on the structures of enzymes that add or remove chemical or peptidic tags to proteins and are therefore related to E3 ligases in their functional mechanism. In the future, we believe it would be interesting to expand to other enzymes, such as proteases, or potentially to proteins beyond enzymes. Indeed, targeted recruitment of proteins to specific protein interaction hubs may offer novel opportunities to regulate cellular machineries. We also limited our approach to proteins (and homologs) with structural information in the Protein Data Bank, but recent breakthroughs in protein structure prediction 87-89 may enable a future expansion of the analysis to the entire human proteome.

ASSOCIATED CONTENT
Supporting Information. The following figures and tables are provided as supporting online information. Putative ligandable domains (defined by their InterPro ID) and associated list of human protein-modifying enzymes (Table S1); ligandable non-catalytic pockets in protein-modifying enzymes including acetyltransferases, deacetylases, methyltransferases, demethylases, glycosyltransferases, glycosidases, deubiquitinases, protein kinases and protein phosphatases (Table S2); recurrent non-catalytic pockets in the catalytic domain of protein arginine methyltransferases (Figure S1).

Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.