Lysine-deficient proteome can be regulated through non-canonical ubiquitination and ubiquitin-independent proteasomal degradation

The ubiquitin-proteasome system (UPS) removes damaged and unwanted proteins by attaching ubiquitin to lysines in a process termed ubiquitination. Little is known about how functional components of the UPS, often exposed to erroneous labeling by ubiquitin during functioning, avoid premature proteolysis. An extensive lysine-less region (lysine desert) in the yeast E3 ligase Slx5 was shown to counteract its ubiquitin-dependent turnover. We conducted bioinformatic screens among prokaryotes and eukaryotes to describe the scope and conservation of this phenomenon. We found that lysine deserts are widespread among bacteria using pupylation-dependent proteasomal degradation, an analog of the UPS. In eukaryotes, lysine deserts appear with increasing organismal complexity, and the most evolutionarily conserved are enriched in the UPS members. Using VHL and SOCS1 E3 ligases, which elongate their lysine desert in the course of evolution, we established that they are non-lysine ubiquitinated, which does not influence their stability, and can be subject to proteasome turnover irrespective of ubiquitination. Our data suggest that a combination of non-lysine ubiquitination and ubiquitin-independent degradation may control the function and fate of the lysine-deficient proteome, as the presence of lysine deserts does not correlate with the half-life.


Introduction
Maintaining protein homeostasis (proteostasis) requires the degradation of damaged or unwanted proteins and plays a crucial role in cellular function, organismal growth, and, ultimately, viability 1,2 . A principal proteolytic component of the cellular proteostasis network is the ubiquitin-proteasome system (UPS) 3 . Enzymes operating within the UPS recognize the substrates destined for degradation and label them by covalent attachment of a small, evolutionarily conserved protein, ubiquitin 4 . Ubiquitination of a substrate requires a cascade of enzymes. The ubiquitin-activating enzyme (E1) hydrolyzes adenosine triphosphate (ATP) and forms a high-energy thioester bond between an internal cysteine residue and the C-terminus of ubiquitin, containing a di-glycine (diGly) motif. Activated ubiquitin is then passed on to the ubiquitin-conjugating enzyme (E2), which forms similar thioester-linked complexes. Finally, ubiquitin is covalently attached mainly to lysine sidechains of the substrate protein by a ubiquitin ligase (E3), which often directly interacts with the substrate 5 .
Mechanistically, two classes of E3s are widely characterized. HECT (Homologous to E6AP C-terminal) E3s form an intermediate thioester bond with the ubiquitin before catalyzing substrate ubiquitination. By contrast, RING (Really Interesting New Gene)/U-box E3s are molecular scaffolds that bring E2-ubiquitin and target protein in close proximity to facilitate the transfer of ubiquitin [6][7][8] . The majority of E3s belong to the RING family, with the largest acid sequence nor structure 23 . A feature shared with ubiquitin is the conserved diGly motif; however, it is not located at the C-terminus itself, as in processed ubiquitin, but followed by either glutamine or glutamic acid in all Pup sequences. A single enzyme, the Pup ligase, PafA 24,25 catalyses Pup attachment to substrates via isopeptide bonding. Pupylation is counterbalanced by a depupylating enzyme, Dop, that mediates isopeptide bond cleavage, releasing Pup from the modified protein 26,27 . The proteasome complex recognizes pupylated proteins due to the binding of Pup to the N-terminal coiled-coil domains of the proteasomal ATPase 28,29 . Pupylation-dependent proteasomal degradation is restricted to the Actinobacteria phylum, including pathogens of importance such as M. tuberculosis and M. leprae 21 .
E3s, while explicitly recognizing their target proteins, are known to be promiscuous when choosing the lysine residues and often modify multiple substrate positions in their region of action (ubiquitination zone) 30 . Furthermore, in the absence of target proteins, E3s can self-catalyze ubiquitination (auto-ubiquitination), leading to their degradation 31 . For some, self-initiated degradation is regulated by substrate availability and is probably a desirable feature evolutionarily consolidated 32 . Other E3s avoid this fate by recruiting suitable deubiquitinating enzymes (DUBs); examples include the protection of RNF123 by USP19 33 or the stabilization of viral ICP0 by recruiting cellular USP7 34 . E3s can also be targeted for degradation by ubiquitination performed by other E3s 31 . However, E3s targeting misfolded proteins do not require such moderation; their auto-ubiquitination and subsequent degradation would deplete the cell of valuable quality control factors 31 . Thus, proteolytic mechanisms may also threaten essential or functional proteins, including proteins that constitute the architecture of the UPS. Therefore, functional proteomes must have evolved mechanisms to prevent unintended proteolytic destruction.
An intuitive "strategy" of a proteome susceptible to premature ubiquitination is to avoid lysines in critical domains or entire sequences, potentially leaving a few whose ubiquitination can be precisely controlled. Indeed, an extensive lysine-less region (lysine desert) in the yeast SUMO-targeted ubiquitin ligase (STUbL) Slx5, constituting over 64% of its sequence, was shown to counteract its ubiquitin-dependent turnover 35 . A similar example is the San1 yeast E3 ligase, in which self-destruction is limited by the lack of lysines in disordered substrate-binding regions that could be immediately ubiquitinated after binding to the E2-ubiquitin complex 36 .
Extensive lysine deserts are also found in other protein quality control factors, including BAG6 37 . Moreover, many bacterial AB-type toxins exhibit unusual lysine depletion in their sequence 38 , presumably to avoid ubiquitination and degradation. Thus, lysine desert could be common in various organisms' proteomes and constitute an adaptation of functional proteomes to avoid premature turnover.
To measure how widespread the lysine desert phenomenon is, we conducted bioinformatic screens among prokaryotes and eukaryotes, considering not only the mere appearance of a lysine desert in a sequence but also its conservation in orthologs. This allowed us to gain deeper insights into the evolutionary traits of this phenomenon and to discover conserved lysine desert regions, pointing at their possible functional role. We also assessed in cellula the role of the lysine desert in preventing the degradation of VHL (von Hippel-Lindau tumor suppressor) and SOCS1 (suppressor of cytokine signaling 1), substrate receptor subunits of the cullin-RING E3 complexes 39 . We demonstrated that the lysine deserts in VHL and SOCS1 undergo non-canonical ubiquitination, which likely serves regulatory-only purposes. On the other hand, their stability is proteasome-dependent but ubiquitination-independent, which probably accounts for the lack of correlation between lysine desert and the half-life of the equipped human proteome.

Results
We first aimed to define the lysine desert region regarding the absolute amino acid (aa) length and sequence fraction. To find such thresholds, we searched for continuous lysine-less regions using sliding windows of varying lengths, defined as either sequence fraction (Fig 1A) or nominal value (Fig 1B), in proteins (≥150 aa) from all UniProt 40 eukaryotic reference proteomes (1329 taxons; Table S1). Based on these results, we defined the lysine desert as a continuous lysine-less region either constituting min. 50% of a given sequence (hereafter lysine desert min. 50%) or min. 150 aa length (hereafter lysine desert min. 150 aa) (Fig 1C), as such defined lysine deserts occur in less than 10% of eukaryotic proteins.
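The two thresholds defined above can be illustrated with a minimal Python sketch. It finds the longest continuous lysine-less region in a sequence and checks it against both definitions. The function names and the returned dictionary layout are illustrative assumptions and are not taken from the pipeline used in this study.

```python
from typing import Dict, Optional, Tuple


def longest_lysine_free_region(seq: str) -> Tuple[int, int]:
    """Return (0-based start, length) of the longest run containing no 'K'."""
    best_start, best_len = 0, 0
    run_start = 0
    # A sentinel 'K' appended at the end closes the final run.
    for i, aa in enumerate(seq.upper() + "K"):
        if aa == "K":
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    return best_start, best_len


def classify_lysine_desert(
    seq: str,
    min_fraction: float = 0.5,      # "lysine desert min. 50%"
    min_length: int = 150,          # "lysine desert min. 150 aa"
    min_protein_length: int = 150,  # only proteins >= 150 aa are screened
) -> Optional[Dict[str, bool]]:
    """Classify a protein; return None if it is too short to be screened."""
    n = len(seq)
    if n < min_protein_length:
        return None
    _, longest = longest_lysine_free_region(seq)
    return {
        "min_50_percent": longest / n >= min_fraction,
        "min_150_aa": longest >= min_length,
    }
```

A 200 aa protein whose longest lysine-less stretch spans 101 residues would satisfy the min. 50% definition but not the min. 150 aa definition under this sketch.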

Lysine deserts are widespread among bacteria with pupylation pathway
Next, we aimed to ascertain if the lysine deserts may have already emerged in bacteria that employ a pupylation and proteasome-dependent degradation pathway. We analyzed all available bacteria reference proteomes (for 8881 taxons; Table S1) from the UniProt database to check the fraction of proteins (≥150 aa) possessing a lysine desert min. 50%/150 aa in each proteome separately and averaged the results across different taxonomic classes. Interestingly, Actinobacteria possess the most proteins with lysine desert min. 50% in their proteomes (Fig 2A), and the same tendency is preserved for lysine desert min. 150 aa (Fig S1A). We also compared proteomes of the most studied bacteria taxons belonging to different taxonomic classes: M. tuberculosis H37Rv (virulent), M. smegmatis, C. glutamicum, S. coelicolor, L. ferrooxidans, B. subtilis, and E. coli. From each aforementioned proteome, we again selected sequences ≥150 aa with no more than two predicted transmembrane helices (TMH) using the TMHMM-2.0 software 41,42 ; this condition was applied to exclude proteins with multiple transmembrane regions as they would introduce bias due to their reduced frequencies of polar residues 43 . Applying these criteria resulted in analyzing 58-71% of sequences (see Table 1 in Materials and Methods). Again, the most lysine desert-rich proteomes belonged to the Actinobacteria phylum that utilizes the pupylation route for protein breakdown, regardless of the applied lysine desert definition (Fig 2B, Fig S1B). It is noteworthy that the number of sequences with lysine deserts min. 50% was approx. 3- to 4-fold higher in taxons that encode and use the proteasome for regulated degradation of pupylated proteins (25.7% for M. tuberculosis, 26.2% for M. smegmatis and 35.4% for S. coelicolor) than in those that use Pup modifications but lack proteasomal subunit genes (9.4% for C. glutamicum).
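The per-proteome screen followed by averaging across taxonomic classes, as described above, could be sketched as follows. This is a simplified illustration that assumes sequences are already loaded as plain strings and labeled with a taxonomic class; the input layout and function names are hypothetical, and only the min. 50% definition is shown.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def has_desert_min50(seq: str) -> bool:
    """True if the longest lysine-less run covers at least 50% of the sequence."""
    longest = max(len(part) for part in seq.upper().split("K"))
    return longest / len(seq) >= 0.5


def proteome_fraction(seqs: Iterable[str], min_len: int = 150) -> Optional[float]:
    """Fraction of eligible proteins (>= min_len aa) with a lysine desert min. 50%."""
    eligible = [s for s in seqs if len(s) >= min_len]
    if not eligible:
        return None  # nothing to screen in this proteome
    return sum(has_desert_min50(s) for s in eligible) / len(eligible)


def class_averages(
    proteomes: Iterable[Tuple[str, List[str]]],
) -> Dict[str, float]:
    """Average per-proteome fractions within each taxonomic class.

    `proteomes` is a hypothetical iterable of (class_name, sequences) pairs.
    """
    per_class = defaultdict(list)
    for class_name, seqs in proteomes:
        fraction = proteome_fraction(seqs)
        if fraction is not None:
            per_class[class_name].append(fraction)
    return {name: mean(values) for name, values in per_class.items()}
```

Computing the fraction per proteome first and averaging afterwards keeps large proteomes from dominating the class-level value, which matches the "each proteome separately" wording above.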
To further investigate the possible linkage of pupylation and lysine desert occurrence, we retrieved information on identified pupylated proteins (pupylomes) of M. tuberculosis, M. smegmatis, and C. glutamicum from the PupDB database 44 , which gathers data from four large-scale proteomics studies. Interestingly, only about 1% of these bacterial species' whole proteomes (unfiltered in any way) undergo pupylation (Table S2). As before, we selected sequences ≥150 aa with no more than two predicted TMH among pupylated and non-pupylated proteins of the taxons mentioned above (as non-pupylated we considered all proteins in a proteome except those reported in the PupDB) and searched for lysine deserts among them.
We noted that the fraction of sequences with lysine desert min. 50% among pupylated proteins was 1.52% for M. smegmatis, 2.56% for C. glutamicum, and 8.16% for M. tuberculosis (Table S3). Proteins with lysine desert min. 150 aa were also much more prevalent in M. tuberculosis (Table S4). Notably, this trend was not related to the average length of filtered sequences in both analyses. In line with the previous results, the fraction of sequences with lysine desert among non-pupylated proteins was approx. 3- (lysine desert min. 50%) to 4- (lysine desert min. 150 aa) fold higher in bacteria employing the proteasome for protein turnover (Table S5, Table S6). This observation strongly suggests an as-yet-unidentified mechanism of Pup avoidance on lysines and, thus, proteasomal degradation, similar to yeast examples 35,36 .

Mycobacterium phages are equipped with lysine desert proteins
Phages are known to be highly specific toward their hosts, co-evolving with them to adopt successful hijacking strategies. We, therefore, decided to inspect whether the difference in lysine desert quantity between Actinobacteria and other bacterial phyla is also reflected in phages specific to different genera of bacteria. We selected 133 to 527 proteomes of phages specific to the Bacillus, Escherichia, and Mycobacterium genera and clustered them into separate pan proteomes containing 5158 to 7332 sequences ≥150 aa (see Table 2 in Materials and Methods). Interestingly, sequences of Mycobacterium phages contain approx. 4- (lysine desert min. 50%) to 5- (lysine desert min. 150 aa) fold more proteins with lysine desert compared to Escherichia phages and approx. 12- (lysine desert min. 150 aa) to 22- (lysine desert min. 50%) fold more proteins with lysine desert compared to Bacillus phages (Fig 2C, Fig S1C). This observation suggests an adaptive strategy by phages to minimize the pupylation of viral proteins, potentially avoiding their removal by the host cell.

Lysine deserts appear with increasing organismal complexity in eukaryotes
We performed a similar screen for the presence of proteins with a lysine desert among the proteomes of five model eukaryotic organisms: S. cerevisiae, C. elegans, D. melanogaster, M. musculus, and H. sapiens. Again, we excluded very short sequences (<150 aa) and those with more than two TMH predicted by the TMHMM-2.0 software; hereafter, we refer to the remaining sets as filtered proteomes. This filtering procedure resulted in the analysis of 66-77% of the sequences (see Table 3 in Materials and Methods). We observed an ascending trend of lysine desert proteome coverage: fractions of lysine desert proteins ranged from 1.04%/2.38% of the S. cerevisiae proteome to 3.86%/10.5% of the H. sapiens proteome (lysine desert min. 50%/min. 150 aa, respectively) (Fig 3A, Fig S2A).
Next, we wanted to assess whether the lysine desert regions are conserved among closely related orthologs. We performed a similar analysis of lysine desert coverage within all available Orthologous Groups (OGs; each OG contains sequences of analogous proteins) from the eggNOG5 database 45 of Saccharomycetaceae, Rhabditida, Drosophilidae, Rodentia, and Hominidae, which include model organisms used in the above described analysis. As previously, we excluded OGs of short, transmembrane, or unrepresentative proteins (see Table 4 for details). Again, we noticed that with the increase of organismal complexity, fractions of OGs with conserved lysine desert proteins were ascending, constituting 0.27%/1.04% of OGs in Saccharomycetaceae to 3.2%/9.72% in Hominidae (lysine desert min. 50%/min. 150 aa, respectively) ( Fig 3B, Fig S2B). These results indicate that lysine desert fractions expand with increasing organismal complexity, and these regions are conserved in homologous proteins, indicating their potential functional involvement.

We also wanted to assess the uniqueness of the observed tendency of lysine desert ascendance with regard to the amino acid whose absence establishes the desert region. For this purpose, we analogously searched for regions devoid of each of the 20 amino acids constituting min. 50% of the sequence or min. 150 aa in the filtered proteomes of S. cerevisiae, C. elegans, D. melanogaster, M. musculus, and H. sapiens. We detected a similar gradient trend in isoleucine, asparagine, and aspartic acid for the desert min. 50% ( Fig 3C) and additionally in tyrosine and threonine for the desert min. 150 aa ( Fig S2C). Interestingly, there was no similar evolutionary ascendance of arginine or histidine desert regions, although these residues, similarly to lysine, yield a positive charge; in fact, we detected a reversed trend for arginine.
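The generalization from lysine to any amino acid described above can be sketched by parameterizing the residue whose absence defines the desert. The code below is a minimal illustration assuming a pre-filtered proteome given as a list of strings; the names `longest_run_without` and `desert_fractions` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def longest_run_without(seq: str, residue: str) -> int:
    """Length of the longest continuous region lacking `residue`."""
    return max(len(part) for part in seq.upper().split(residue))


def desert_fractions(
    proteome: Iterable[str],
    min_fraction: float = 0.5,
    min_length: int = 150,
) -> Dict[str, Tuple[float, float]]:
    """For each amino acid, return the fraction of proteins with a desert
    (min. 50% of the sequence, min. 150 aa) devoid of that amino acid."""
    seqs = [s for s in proteome if s]
    if not seqs:
        raise ValueError("empty proteome")
    result = {}
    for aa in AMINO_ACIDS:
        runs = [(longest_run_without(s, aa), len(s)) for s in seqs]
        n_frac = sum(run / length >= min_fraction for run, length in runs)
        n_len = sum(run >= min_length for run, _ in runs)
        result[aa] = (n_frac / len(seqs), n_len / len(seqs))
    return result
```

Running the same function on each filtered proteome and comparing the values for a given residue across organisms would reproduce the kind of gradient comparison described in the text.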
Next, we focused on deciphering whether lysine desert proteins are enriched or depleted in particular amino acid(s). We calculated the relative frequencies (each amino acid was normalized by its frequency in the population of a given filtered proteome) of 20 amino acids among the lysine desert proteins and, separately, within their lysine desert regions only, in the filtered proteomes of S. cerevisiae, C. elegans, D. melanogaster, M. musculus, and H. sapiens. We noted that arginine was moderately enriched in M. musculus and H. sapiens among proteins with lysine desert min. 50% but not min. 150 aa. This may indicate that lysine-less regions could compensate for the lack of lysine with another positively charged amino acid, arginine. We also observed that proteins with lysine desert min. 150 aa and lysine desert min. 50% are enriched (19-116%) in alanine, glycine (except for S. cerevisiae), and proline in all analyzed eukaryotic model organisms (Fig 3D, Fig S2D), which may point to the occurrence of low complexity regions within them 46 .
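The normalization described above can be illustrated with a short Python sketch. It assumes that enrichment is expressed as the percentage change of an amino acid's frequency in the subset (for example, lysine desert proteins) relative to its frequency in the background filtered proteome; this formula and the function names are our assumptions, as the exact computation is not spelled out here.

```python
from collections import Counter
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aa_frequencies(seqs: Iterable[str]) -> Dict[str, float]:
    """Frequency of each standard amino acid across all given sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_enrichment(
    subset: Iterable[str], background: Iterable[str]
) -> Dict[str, float]:
    """Percent enrichment (positive) or depletion (negative) of each amino
    acid in `subset` relative to `background` (assumed formula)."""
    sub = aa_frequencies(subset)
    bg = aa_frequencies(background)
    # Amino acids absent from the background are skipped to avoid division by zero.
    return {
        aa: (sub[aa] / bg[aa] - 1.0) * 100.0
        for aa in AMINO_ACIDS
        if bg[aa] > 0
    }
```

Under this convention, a value of 116 would mean the amino acid is a little more than twice as frequent in the subset as in the background proteome.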
As lysine desert proteins show features characteristic of low-complexity regions, we aimed to investigate their structural features in the human proteome. First, to obtain a picture of the preferred location of the lysine desert regions, we analyzed their distribution among all protein sequences (again, when referring to the human proteome, we mean its filtered version as described previously; data on lysine desert occurrence of a given type in each human protein are available in Table S7). Interestingly, regions of lysine desert min. 50% tend to occupy internal parts of the protein, avoiding the N-/C-terminus (Fig 3E). This tendency, albeit not as pronounced, is also evident for lysine desert defined as min. 150 aa (Fig S2E).
Since the lysine deserts of yeast San1 and Slx5 E3s are mainly disordered, we sought to determine the structural status of lysine desert regions throughout the human proteome. For each protein, we predicted its disorder score based on its sequence using the IUPred3 software 47 and obtained the pLDDT (predicted Local Distance Difference Test) values, which estimate the modeling accuracy of each residue 48 , from the corresponding AlphaFold2 model 49,50 (see Materials and Methods). Both techniques have been demonstrated to be the gold standard for forecasting disordered regions 47,51 . Using a sequence- and a structure-based method, our approach allowed us to get unbiased and consistent results -lysine desert regions, either min. regions in sequence and the structure of the human proteome (its filtered version; we also excluded from the analysis proteins with more than 5% of residues without calculated solvent accessibility values) (Fig 3G). Interestingly, the coverage of residues building structural lysine deserts among residues constituting sequence lysine deserts (in other words, the common residues between sequence/structural lysine desert) varies, with the most pronounced cases being those where the longest lysine-less region, constituting over 40% of the sequence, does not overlap with the structural lysine desert in that protein (Fig S2G). This indicates how important it is not to overlook the structures, as the information encoded in the sequence of a protein alone may lead to a misinterpretation of data. Information on the length and residues building the three most extended structural lysine deserts, tabulated with information about the longest lysine-less regions present in the sequence, can be found in Table S8.

Most evolutionarily conserved lysine desert proteins operate within the UPS pathway and tend to group their lysines into clusters
Yeast E3 ligases remain the only identified functional lysine deserts 35,36 . Therefore, we decided to establish if lysine-free sequences are exclusive to the UPS pathway. To this end, we analyzed the function of lysine desert proteins and their evolutionary conservation among organisms with comparable and vastly different levels of complexity. First, we checked which proteins of S. cerevisiae, the simplest eukaryote studied here, possess a conserved lysine desert min. 50% among their orthologs in the Saccharomycetaceae family. We used the same set of OGs from Saccharomycetaceae and the methodology of defining a conserved lysine desert as previously described (see Materials and Methods). As the criteria for recognizing OGs as conserved lysine desert-containing were very stringent, we found 10 such cases for the following proteins: Cue1, Dsk2, Hlj1, Mix17, Rad23, San1, Slx5, Tif6, Tir3, and YMR295C (gene names provided for S. cerevisiae) (see Table S9; summaries of OGs with conserved lysine desert min. 150 or 50% in Saccharomycetaceae, Rhabditida, Drosophilidae, Rodentia, and Hominidae are available at https://github.com/n-szulc/lysine_deserts 52 ). Notably, six of those listed (Cue1, Dsk2, Hlj1, Rad23, San1, and Slx5) regulate protein turnover (Fig 4A).
Next, we aimed to compare the evolutionary conservation of lysine deserts min. 50% between analogous OGs of different eukaryotic families/orders (e.g., whether protein X with a conserved lysine desert in a given OG of Saccharomycetaceae also has a conserved lysine desert in an analogous OG of Hominidae; we term such analogous OGs as vectors). For this analysis, we examined the occurrence of conserved lysine deserts in the vectors of respective OGs (see Materials and Methods for details; complete results are available at https://github.com/n-szulc/lysine_deserts). We applied stringent criteria for the presence of lysine desert min. 50% conserved in OGs from Drosophilidae, Rodentia, and Hominidae among 4282 vectors of known orthologs from these families/order (for Saccharomycetaceae and Rhabditida, OGs either could be absent or, if present, conserved lysine desert min. 50% also needed to occur). We again found proteins responsible for protein degradation (7 out of 9 vectors; see Table S10) -E3s, ubiquitin-adaptor proteins, and other proteostasis components ( Fig 4B).
Interestingly, during our analyses we noticed that lysines in some lysine desert proteins (e.g., RNF126, RNF165, ubiquilins, HERPUD1, HERPUD2) occur within a relatively limited part of the sequence. We reasoned that such clustering of lysines while maintaining a large lysine desert region might have a regulatory role in turnover control. We estimated the frequency of lysine clusters within the lysine desert proteins to examine how this tendency relates to our global studies of selected model organisms' proteomes (again, their filtered versions as previously described). We arbitrarily defined a lysine cluster as ⌊80% × total number of lysines in the sequence⌋ lysines located within 20% of the sequence; this definition applies only to proteins with at least two lysine residues (Fig 4C). Over 47% of human lysine desert min. 50% proteins possess a lysine cluster, and other model organisms also show high co-occurrence (31-49%) (Fig 4D). However, this trend is much less pronounced for lysine desert min. 150 aa, where a lysine cluster occurs in 10-16% of such proteins from the analyzed model organisms (Fig S3A). In the global filtered proteomes, lysine clusters occur in 0.8-3.3% of proteins (Fig S3B), which indicates that the size of the desert may correspond with the existence of the lysine cluster. We also analyzed the distribution of lysines among sequences of human proteins possessing lysine desert regions. Although lysine deserts favor internal parts of the protein (Fig 3E, Fig S2E), lysines show a much more bimodal distribution with peaks at the N-/C-terminus (Fig 4E, Fig S3C). Even among proteins with lysine desert min. 150 aa, which have a much flatter distribution of their lysine-less regions (Fig S2E), there is a visible tendency of lysines to localize at the C-terminus (Fig S3C).
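The lysine cluster definition given above can be made concrete with a small Python sketch. It reads the definition literally and makes two assumptions not stated in the text: the "20% of sequence" window is a contiguous stretch of ⌊0.2 × sequence length⌋ residues, and a cluster exists when the required number of lysines fits within such a window, measured inclusively from the first to the last lysine. The function name is hypothetical.

```python
def has_lysine_cluster(seq: str) -> bool:
    """Check for a lysine cluster: floor(80% of all lysines) located within
    a contiguous window spanning 20% of the sequence (assumed reading)."""
    seq = seq.upper()
    positions = [i for i, aa in enumerate(seq) if aa == "K"]
    n_lys = len(positions)
    if n_lys < 2:
        return False  # the definition applies only to proteins with >= 2 lysines

    required = (4 * n_lys) // 5      # floor(0.8 * n_lys) in integer arithmetic
    window = max(1, len(seq) // 5)   # floor(0.2 * sequence length), at least 1

    # Slide over consecutive groups of `required` lysines and test whether
    # the span from the first to the last one fits inside the window.
    for first in range(n_lys - required + 1):
        last = first + required - 1
        if positions[last] - positions[first] + 1 <= window:
            return True
    return False
```

Under this literal reading, a protein with exactly two lysines requires only ⌊1.6⌋ = 1 lysine in the window, so the criterion is trivially met; whether the original analysis treated this case differently cannot be determined from the definition alone.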
To evaluate molecular functions associated with lysine deserts and lysine clusters, we performed Gene Ontology (GO)-based overrepresentation analysis of genes derived from sets of human proteins with lysine desert min. 50%. We observed that molecular functions over-  (Table S11). We detected that 25 of them possess a lysine desert min. 50%, and within them, 14 also have a lysine cluster ( Fig 4G). Interestingly, when we analyzed E3s by their types, it was remarkable that the lysine desert and lysine cluster are typical features of the RING E3 ligases -12 out of 14 E3s with both lysine desert min. 50% and a lysine cluster were of this type; a similar trend also occurs for lysine desert min. 150 aa ( Fig 4G). In addition, lysine-deficient regions are primarily found in the disordered regions of the human RING E3s (Fig S3E, F).
As for lysine deserts, we wanted to check whether lysine clusters co-occurring with lysine deserts tend to locate in disordered regions of proteins. Using the same approach as previously, we noted that lysines of lysine clusters are more structured than other lysines among human proteins with lysine desert (Fig 4H). Intriguingly, many human E3s, such as the aforementioned highly conserved RNF126 and RNF165, but also others conserved among mammals, e.g., RNF6 (lysine desert of 527 aa, constituting 77% of the sequence), RNF12/RLIM (lysine desert of 477 aa, constituting 76% of the sequence), or RNF44 (lysine desert of 350 aa, constituting 81% of the sequence), aggregate their lysine residues within the RING domain, which interacts with E2s. Presumably, owing to the lysine clusters in the functional domains, E3s can undergo precise auto-ubiquitination without modification in the vast lysine desert region, which could, for example, affect substrate binding.

CRL substrate receptors are deprived of lysines in the course of evolution
The restricted availability of orthologous sequences from evolutionarily distant organisms (such as invertebrates and mammals) poses a challenge for comparative analyses. Moreover, searching only for long lysine-less regions may prevent the detection of proteins that lost lysines during evolution but do not have lysine-free stretches long enough to surpass the predetermined threshold (e.g., the threshold for considering a protein as containing a lysine desert min. 50%). For these reasons, we searched for E3 ligases in the filtered human proteome possessing at most five lysines (Table S7), excluding those described in the UniProt database as membrane-bound (even single-pass). We tabulated the obtained lysine-poor E3s with their distant vertebrate orthologs from D. rerio, X. tropicalis, and G. gallus, as well as from the more closely related M. musculus, based on the information from the eggNOG5 and Xenbase 53 databases (Table S12). Out of 14 such human E3 ligases, eight showed significant lysine "desertification" (from a 2-fold up to a 15-fold decrease in lysines) in the course of evolution (Fig 5A). Notably, the majority (seven out of eight) of such E3s act as substrate receptor subunits of CRLs. While operating within the complexes, these proteins risk being labeled by ubiquitin while binding client proteins and bringing them into the vicinity of the E2 enzyme for ubiquitination.
Since lysine is one of the most solvent-accessible amino acids 54 , we next aimed to determine if the few lysines of the abovementioned CRL substrate receptor subunits are also exposed to solvent. We performed the solvent accessibility analysis in precisely the same way as for the structural lysine desert search (see Materials and Methods for details) based on the AlphaFold2 models of selected lysine-poor CRL substrate receptor subunits, as their experimental structures, except for VHL, remain unsolved. Interestingly, not all their lysines are solvent accessible, making them even less prone to lysine-dependent ubiquitination, with the most extreme case of SOCS1, which decreased its lysine content 15-fold and possesses only a single, buried lysine (Fig 5B, Table S13). The complex-free VHL monomer has all its lysines exposed (Fig 5B), but when it associates with elongin B, elongin C, and cullin 2, access to one or two lysines may be restricted (Fig 5C), likely making them inaccessible for any modifications. Moreover, the burial of lysines within the complex might be exploited by other CRL substrate receptors; therefore, their lysine solvent accessibility could differ from the one calculated for their models of complex-free monomers. The above analyses may indicate an evolutionary pressure to limit the ubiquitination of these CRL substrate receptors.

Lysine desert human proteins, SOCS1 and VHL, undergo non-canonical ubiquitination and ubiquitin-independent proteasomal degradation
To assess the involvement of the UPS in the regulation of CRL substrate receptor subunits, we measured the ubiquitination of SOCS1 and VHL, as well as their all-lysine-deficient variants (VHL K159R, K171R, K196R; SOCS1 K118R), in living HEK293 cells. To this end, we utilized the NanoBRET technology (Promega), which involved the transient expression of NanoLuc fusions (at the C- or N-terminus) of SOCS1 and VHL variants and an N-terminal HaloTag-ubiquitin fusion, followed by the NanoBRET assay, the output of which is a signal that increases proportionally to the degree of ubiquitination. Both wild-type SOCS1 and VHL, as well as their lysine-free variants, exhibited comparable ubiquitination levels. Interestingly, inhibition of the proteasome achieved by treating the cells with the MG132 inhibitor (cell viability assessed with CellTiter-Glo following treatment was unaltered; see Fig S4A, B) did not result in the accumulation of ubiquitinated SOCS1 and VHL variants, which may suggest that they undergo non-lysine ubiquitination that does not contribute to proteasome-dependent degradation (Fig 6A, B).
To determine the turnover of SOCS1 and VHL and their lysine-free variants, we carried out cycloheximide (CHX) chase assays. Here, we expressed SOCS1 and VHL variants tagged with High BiT (HiBiT, 11 amino acid tag; Promega) in HEK293 cells and detected their levels with a reagent containing the complementary peptide Large BiT (LgBiT; 17.6 kDa; Promega), which binds to HiBiT with high affinity, generating luminescence. This provides quantitative protein abundance measurements in a linear dynamic range. Notably, there is no risk of introducing additional ubiquitination sites when tagging proteins with HiBiT, as the HiBiT tag was validated as not prone to ubiquitination (information from the Promega R&D Department). We did not detect significant differences in the stability of SOCS1 and VHL variants during the CHX chase. With this assay, we also determined if inhibition of ubiquitination via the E1 inhibitor (TAK-243) 55 or the proteasome inhibitor (MG132) impacts the wild-type and lysine-free SOCS1 and VHL degradation rates. Interestingly, E1 inhibition did not affect the stability of these CRL receptors, in contrast to the proteasome inhibition, where there was a significant accumulation of SOCS1 and VHL wt and lysine-free variants (Fig 6C, D). CHX assays and NanoBRET ubiquitination measurements revealed that the lysine deserts in SOCS1 and VHL do not prevent their ubiquitination but enable non-canonical ubiquitin labeling. In addition, these modifications are rather regulatory and do not increase proteasome targeting. Nevertheless, irrespective of ubiquitination, SOCS1 and VHL are degraded by the proteasome.

Lysine deserts do not correlate with human protein half-life
Since our experiments demonstrated that lysine deserts do not reduce the possibility of proteasomal degradation, we investigated whether there is a correlation between the half-life and the presence of lysine deserts in cell-specific human proteomes. To this end, we obtained protein turnover datasets from two large-scale proteomic studies 56,57  compare the nominal length of the lysine desert region or its fraction of the protein sequence (Fig 7, Fig S5). Together with ubiquitination and turnover analyses of SOCS1 and VHL, these global analyses suggest that the presence of lysine deserts does not contribute to the half-life of equipped human proteins, as they may be subject to efficient non-canonical regulation by the UPS.

Discussion
Functional components of the UPS exposed to ubiquitination should be equipped with mechanisms to prevent accidental proteolytic destruction. Selective pressure to avoid lysine residues, especially in intrinsically disordered regions, may underpin a strategy to avoid redundant ubiquitination of ubiquitin-proteasome components. To test this possibility, we designed and implemented a bioinformatic pipeline to quantitatively search for lysine-free regions and investigate their evolution and functional roles among orthologous prokaryotic and eukaryotic taxa.
We noted that the abundance of lysine deserts in Actinobacteria is most prevalent among species that possess proteasomes and utilize a pupylation pathway. The Pup-proteasome system (PPS) plays a key role in mycobacterial stress responses 58 . For example, nitrogen starvation or reactive nitrogen species secreted by host macrophages in response to infection by M. tuberculosis induces PPS 58,59 . Perhaps the abundance of lysine deserts in the M. tuberculosis proteome promotes the feasible degradation of only specific nitrogen metabolic network components, such as the HrcA repressor of chaperonin, which promote the nitrite reductase NirBD to assimilate nitrogen from nitrate 60 . Furthermore, we found that Mycobacterium phages' sequences contain several to dozens of times more lysine-depleted proteins than phages of bacteria not equipped with the PPS. Possibly, this contributes to limiting pupylation and degradation of phage proteins, enabling more efficient infection or killing of the mycobacterial host exposed to stress conditions.
We performed a similar screen for the presence of proteins with a lysine desert among the proteomes of model eukaryotic organisms and identified many E3s, ubiquitin-adaptor proteins, and other components of the cellular proteostasis system containing long sequence stretches completely devoid of lysine. One example of the latter is the ubiquitin-like (UBL) domain of BAG6. Kampmeyer and colleagues showed that introducing lysine residues into the BAG6 lysine-free sequence leads to increased ubiquitination and proteasomal degradation driven by its associated partner, E3 RNF126 61 , which itself is also an example of a protein with a highly conserved lysine desert 62 . We observed that in human E3s, predominantly among the class of the RING ligases, lysine-depleted regions are present primarily in disordered regions (Fig S3D, E). Perhaps avoidance of lysine modification by ubiquitin in these parts is required for their localization, conformation, activity and substrate binding 63,64 . In addition to the UPS-related proteins, we identified several components of multisubunit molecular complexes, such as NF-Y and the RNA exosome complex, equipped with an extensive lysine-deficient region. Notably, in NF-Y, only the NFYA protein is devoid of lysines, whereas in the RNA exosome complex, only two of its nine subunits, EXOSC4 and EXOSC6, contain extensive lysine deserts. We speculate that these lysine-deficient proteins may serve as a homing element for the UPS to target other components of the complexes or act as a stable seed initiating protein (de)complexation.
Interestingly, many lysine desert E3s aggregate their lysine remnants within the RING domain, which interacts with E2. This may indicate pressure for specific auto-ubiquitination in the ordered cluster zone but not in the disordered lysine-free region. The example of RNF12/RLIM, a 624 aa long E3 ligase with a large disordered lysine desert that undergoes intensive auto-ubiquitination on the RING-localised lysine cluster while sparing the remaining part responsible for sorting and substrate binding, seems to support this hypothesis 65-67 . However, we did not observe a correlation between the extent of lysine deserts and protein half-life in human protein turnover data sets from proteomics studies (Fig 7, Fig S5). This may indicate that a lysine desert does not generally increase protein stability but only reduces lysine ubiquitination. Correspondingly, the lysine desert in the proteasome substrate shuttle RAD23A protects against its ubiquitination but does not affect proteasomal degradation 62 . This raises the question of whether the UPS can use non-lysine ubiquitination or proteasomal degradation independent of ubiquitination to regulate the lysine-deficient proteome.
To investigate the turnover of lysine desert proteins in relation to non-canonical regulation by the UPS, we conducted studies with human VHL and SOCS1 proteins and their lysine-free variants. Using the NanoBRET technology, we measured intracellular ubiquitination of VHL and SOCS1 variants in HEK293 cells under normal growth conditions and noted that they are modified by ubiquitin at non-lysine positions. In a quantitative kinetic degradation assay based on the HiBiT tagging system, we showed that VHL and SOCS1, and their lysine-free variants, displayed similar turnover rates, which depended on proteasomal activity but not on ubiquitination. However, this result is not entirely consistent with previous studies. Pozzebon and colleagues showed that the expression of Gam1, an adenoviral protein, induces VHL ubiquitination and degradation that depends on cullin 2 and cullin 5 71 . However, these studies neither characterized the type of ubiquitination nor tracked VHL turnover under global inhibition of ubiquitination by E1 inhibition. Wu et al. suggested that SOCS1 is subject to ubiquitination, which is enhanced by proteasome inhibition and regulated by CUEDC2 72 . Yet, their experimental approach, which makes the analysis of non-lysine modification virtually impossible, indicates extensive polyubiquitination of SOCS1, despite there being only one lysine in this protein. Moreover, the SOCS1 pull-down approach used in this study did not exclude the scenario that these modifications involve substrates bound by SOCS1. Thus, the results of both reports do not preclude our conclusions, which are based on a methodology that allows us to show that lysine-deficient proteins can undergo non-lysine ubiquitination and proteasomal degradation independent of ubiquitination.
Supporting this assumption, recent Trim-Away assay results showed that the TRIM21 E3 efficiently degrades lysine-less substrates, potentially in tandem with a non-canonical ubiquitination mechanism 73 . This suggests that the UPS may recruit specialised E3s that control the abundance of lysine-deficient proteins.
The presence of disordered regions in proteins that interact with the proteasome is a necessary structural requirement for ubiquitin-independent degradation 74 . Yet, VHL and SOCS1 are virtually devoid of such elements and should therefore have other features that favour ubiquitin-independent proteasomal degradation. However, as most lysine desert proteins have a high content of disordered regions, we hypothesise that they can undergo proteasomal degradation independent of ubiquitination. We envisage that our analysis and observations provide the foundation for a deeper understanding of the origin and function of lysine deserts and the non-canonical regulation of such a proteome by the UPS.

Declaration of interests
The authors declare no competing interests.

Figure legend (fragment): Bar plot of the number of exposed and buried lysines in human E3 ligases possessing max. five lysines. Relative solvent accessibility (RSA), based on which a residue is classified as exposed or buried, was calculated using the DSSP program and the Sander method (see Materials and Methods for details). Residues with RSA >0.2 are considered solvent-exposed. Human CRL substrate receptors are marked in magenta. (C) VHL in complex with cullin 2 (PDB ID: 4WQO). Color code: yellow - VHL, green - elongin B, blue - elongin C, grey - cullin 2, magenta - lysine residues (three in total) of VHL. Visualized in the PyMOL software (Schrödinger) (v. 2.5.0).

Table S1. Proteomes from the UniProt database used in the lysine desert analyses.

Analysis of lysine deserts in bacterial and eukaryotic proteomes
All bacterial and eukaryotic reference proteomes (for 8881 and 1329 taxons, respectively) were downloaded from the UniProt database 40 (from the FTP repository; data obtained on 25.05.2022; see Table S1 for the summary of downloaded data). In all performed analyses, sequences <150 aa were excluded. For selected taxons, namely, M. tuberculosis H37Rv  41,42 . For analyses concerning these taxons, proteins with more than two predicted transmembrane helices (TMH >2) were excluded. A summary of the number of sequences before and after filtering can be found in Table 1 and Table 3. Sequences with a lysine desert of a declared type (lysine-less region of min. 150 aa or constituting min. 50% of the sequence) were counted in each proteome, and the fraction of sequences with a given lysine desert was reported for each taxon or averaged for proteomes of the same bacterial class (only classes with at least 10 taxons were considered).
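The counting step described above can be illustrated with a minimal Python sketch. It is not the code from the authors' repository: the function names and result keys are hypothetical, and it assumes that both desert types are evaluated on the single longest uninterrupted lysine-free stretch of each sequence. The TMH filter is omitted because it requires an external predictor.

```python
from __future__ import annotations


def longest_lysine_free_run(seq: str) -> tuple[int, int]:
    """Return (start, length) of the longest stretch without lysine (K)."""
    best_start, best_len = 0, 0
    run_start = 0
    # A trailing "K" acts as a sentinel so that the final run is also evaluated.
    for i, aa in enumerate(seq.upper() + "K"):
        if aa == "K":
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    return best_start, best_len


def classify_desert(seq, min_seq_len=150, min_desert_len=150, min_fraction=0.5):
    """Classify one sequence; sequences shorter than min_seq_len return None."""
    if len(seq) < min_seq_len:
        return None
    _, desert_len = longest_lysine_free_run(seq)
    return {
        # Hypothetical keys for the two desert types named in the text.
        "min_150_aa": desert_len >= min_desert_len,
        "min_50_percent": desert_len / len(seq) >= min_fraction,
    }


def desert_fractions(proteome):
    """Fraction of retained sequences that carry each desert type."""
    results = [r for r in map(classify_desert, proteome) if r is not None]
    if not results:
        return {"min_150_aa": 0.0, "min_50_percent": 0.0}
    return {key: sum(r[key] for r in results) / len(results) for key in results[0]}
```

Calling `desert_fractions` on a list of protein sequences (for example, parsed from a UniProt FASTA file) would give the per-proteome fractions that are then reported per taxon or averaged per bacterial class.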

Analysis of bacterial pupylomes
The dataset of pupylated proteins of M. tuberculosis, M. smegmatis, and C. glutamicum was downloaded from the PupDB database 44 (data obtained on 04.08.2022). As some of the UniProt IDs in the obtained dataset were obsolete and could not be mapped directly to the UniProt reference proteomes of the selected taxons, the UniProt Retrieve/ID mapping tool 40 was used to retrieve the correct UniProt IDs. Among pupylated and non-pupylated proteins of the aforementioned taxons, sequences <150 aa and with >2 predicted TMH were excluded from further analyses (all proteins from the UniProt reference proteome of a given taxon were considered non-pupylated except those reported in the PupDB). The remaining proteins were screened for the presence of a lysine desert.

Analysis of lysine deserts in phage proteomes
Proteomes of Mycobacterium, Escherichia and Bacillus phages were downloaded from the UniProt database using the mycobacterium AND (taxonomy_id:10239), escherichia AND (taxonomy_id:10239), and bacillus AND (taxonomy_id:10239) queries, respectively (data obtained on 31.07.2022). Proteomes marked as outliers as well as those with <40 sequences were excluded from further analyses (see the summary of proteomes selected for analysis in Table S1). Next, for each phage group separately, all sequences, excluding those <150 aa, were concatenated into one fasta file and clustered using the cd-hit web server 75 (with the default parameters, identity cutoff =0.9) to create a non-redundant pan proteome (a summary of the pan proteomes' properties is presented in Table 2). Sequences with a lysine desert of a declared type were counted in each pan proteome, and the fraction of sequences with a given lysine desert was reported.

(Table 4).

Aim. The algorithm intends to find the longest, uninterrupted lysine-less regions among solvent-exposed residues that remain in contact. Therefore, buried residues may also break the continuity of the structural lysine desert, as there may be no exposed neighbors to spread to (cases visible in Fig 3G when the entire protein is lysine-less yet the structural lysine desert does not equal 100%).
Algorithm. The complete code and documentation, including the algorithm's visualization, are available at https://github.com/n-szulc/lysine_deserts. Briefly, all contacts are analyzed for each residue that is not a lysine and is solvent-exposed (RSA cut-off value >0.2). If there is no solvent-exposed lysine among these contacts, the exposed contacts are added to the temporary list containing residues of the structural lysine desert; otherwise, the algorithm stops. Next, all contacts of the aforementioned contacts are analyzed. If there is no solvent-exposed lysine among them, the exposed ones are added to the temporary list containing residues of the structural lysine desert. The algorithm stops if such exposed ones are absent or a solvent-exposed lysine occurs. The algorithm repeats and saves the three most extended structural lysine deserts.

Remarks.
When iterating over contacts, if a residue has no calculated SASA or RSA value due to the DSSP error, it is omitted, and the algorithm proceeds to the next one, except for lysine.
In such a case, lysine is always considered solvent-exposed.
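
The layer-by-layer expansion described in the Algorithm and Remarks paragraphs can be sketched in Python as follows. This is an illustrative reading of the prose, not the implementation from the repository: the input structures (`residues` mapping a residue ID to an amino acid and RSA value, `contacts` mapping a residue ID to the IDs it touches) and all function names are hypothetical, and identical deserts reached from different seeds are assumed to be counted once.

```python
def is_exposed(aa, rsa, cutoff=0.2):
    """Residues lacking an RSA value are skipped, except lysine (always exposed)."""
    if rsa is None:
        return aa == "K"
    return rsa > cutoff


def grow_desert(seed, residues, contacts, cutoff=0.2):
    """Expand from one exposed non-lysine seed, one contact layer at a time."""
    desert = {seed}
    frontier = {seed}
    while frontier:
        neighbours = set()
        for res in frontier:
            neighbours |= contacts.get(res, set())
        exposed = {n for n in neighbours - desert if is_exposed(*residues[n], cutoff)}
        # Stop when no new exposed residue is reachable or an exposed lysine occurs.
        if not exposed or any(residues[n][0] == "K" for n in exposed):
            break
        desert |= exposed
        frontier = exposed
    return desert


def top_structural_deserts(residues, contacts, cutoff=0.2, keep=3):
    """Repeat the expansion for every valid seed and keep the largest deserts."""
    found = set()
    for res_id, (aa, rsa) in residues.items():
        if aa != "K" and is_exposed(aa, rsa, cutoff):
            found.add(frozenset(grow_desert(res_id, residues, contacts, cutoff)))
    ranked = sorted(found, key=lambda d: (-len(d), sorted(d)))
    return [set(d) for d in ranked[:keep]]
```

In this sketch a buried residue ends the expansion in its direction because it is never added to the frontier, which matches the remark that buried residues may break the continuity of a structural lysine desert.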

Analysis of the correlation of length of lysine desert and protein half-life
The protein half-life datasets were obtained from recent high-throughput proteomic studies 56,57 .  Table 5.

Over-representation analysis
The GO-based over-representation analyses of human proteins with a lysine desert of min. 150 aa/min. 50%, both regardless of and in combination with lysine clusters, were performed using the WebGestalt web server with default parameters 79 ; our filtered human proteome served as the background. The false discovery rate (FDR) was controlled at 0.05 using the Benjamini-Hochberg method for multiple testing. The results were visualized as treemaps using modified R scripts generated by the REVIGO web server 80 (species specified as Homo sapiens; the remaining parameters were set to default).
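
For readers unfamiliar with the Benjamini-Hochberg correction named above, the following is a minimal Python sketch of the textbook step-up procedure. It is shown only to make the FDR control concrete; WebGestalt performs this step internally, and the function name and return format here are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected
```

A GO term would be reported as over-represented when its adjusted p-value does not exceed the chosen FDR threshold (0.05 in the analyses described above).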

Plasmid construction
The sequence and ligation independent cloning (SLIC) method 81 was used to construct HiBiT vectors, respectively, and the linearized pNLF1-N and pNLF1-C vectors. The list of primer sequences used to generate the constructs is available in Table 6.

antimycotic antibiotic (15240062, Gibco) at 37°C, 5% CO2 in a humidified incubator.

NanoBRET ubiquitination assay
Cells preparation. HEK293 cells were seeded in 6-well plates at 800,000 cells per well. After 6-8 hours, cells were transiently transfected with 1 or 2 µg HaloTag-Ubiquitin (N2721, Promega) or the control vector pHTN HaloTag CMV-neo (G7721, Promega) and 0.01 or 0.02 µg NanoLuc-tagged SOCS1 or VHL expression constructs, respectively (the acceptor-to-donor ratio was maintained at 100:1). Transfection was carried out using the FuGENE

Cell viability assay. To assess cell viability during the CHX assays, the assays were multiplexed with the CellTiter-Fluor Cell Viability Assay (G6080, Promega) following the manufacturer's guidelines. Briefly, after a 3-hour incubation with CHX, 20 µl of the 5x concentrated CellTiter-Fluor Reagent was added to all wells. Cells were incubated at 37°C for 1 hour, and fluorescence was measured using the Tecan Infinite M1000 fluorescence plate reader equipped with the Magellan Pro software, with the parameters set to 390 nm excitation/505 nm emission. Untransfected cells were used as the global viability reference.

Oligonucleotides
Primers for VHL and SOCS1 cloning and mutagenesis, see Table 6; This work; N/A

Recombinant DNA
HiBiT-SOCS1 WT; Azenta Life Sciences; N/A
HiBiT- and NanoLuc-tagged constructs, see Table 6; This work; N/A

2). Results of the CHX and NanoBRET assays were analyzed and visualized in GraphPad Prism 9 (v. 9.5.0). All other plots were generated using the matplotlib (v. 3.5.1) and seaborn (v. 0.11.2) Python modules. Graphics were created with BioRender.com.

Data availability
The code required for performing all the described analyses, the datasets generated during this study, and raw luminescence and fluorescence measurements from the CHX and NanoBRET assays can be found in the repository at https://github.com/n-szulc/lysine_deserts 52 .