ABSTRACT
Genetic suppression occurs when the deleterious effects of a primary “query” mutation are rescued by a suppressor mutation elsewhere in the genome. To capture existing knowledge on suppression interactions between human genes, we examined 2,400 published papers for potential interactions identified through either genetic modification of cultured human cells or through association studies in patients. The resulting network encompassed 476 unique suppression interactions that often linked genes that function in the same biological process. The suppressor genes were strongly enriched for genes with a role in stress response or signaling, suggesting that deleterious mutations can often be buffered by modulating signaling cascades or immune responses. Suppressor mutations were frequently deleterious when they occurred in the absence of the query mutation, in apparent contrast with their protective role in the presence of the query. We formulated and quantified mechanisms of genetic suppression that could explain 71% of interactions and provided mechanistic insight into disease pathology. Finally, we used these observations to predict suppressor genes among all genes in the human genome. The emerging frequency of suppression interactions and range of underlying mechanisms suggest that compensatory mutations may exist for the majority of genetic diseases.
INTRODUCTION
Despite our progress in sequencing genomes, translating the variants detected in an individual into knowledge about disease risk or severity remains challenging. The relationship between genotype and phenotype is complex because genes and their products function as components of dynamic networks, with each gene or protein linked to many others through genetic and physical interactions. Modifying mutations in such interaction partners can either increase the severity of a genetic trait, or can have a protective effect and compensate for the deleterious effects of a particular mutation, a phenomenon referred to as genetic suppression (Genin et al, 2008; Harper et al, 2015). Genetic suppression is of particular interest for human disease, as suppressors of disease alleles highlight biological mechanisms of compensation, thereby potentially uncovering new therapeutic strategies. For example, a genome-wide association study discovered a loss-of-function variant in BCL11A, encoding a transcriptional repressor of fetal hemoglobin subunit γ, as protective against severe β-thalassemia (Uda et al, 2008). When expressed in adults, the γ-subunit of hemoglobin can replace the β-subunit, which is mutated in β-thalassemia patients, a finding that led to the development of gene therapies targeting BCL11A (Frangoul et al, 2021). Despite its success, this approach for discovering protective modifiers cannot be universally applied, as most monogenic diseases and/or protective variants are too rare for such systematic association studies (Genin et al, 2008). Alternative methods to identify suppressor genes are thus needed.
The systematic mapping of large numbers of suppressor mutations can highlight properties of suppression interactions that can be used to find or predict suppressors in other contexts (Van Leeuwen et al, 2020). To date, such systematic analyses have only been performed in inbred model organisms. The use of model organisms enables the rigorous assessment of the effects of combining mutations in an otherwise isogenic background. These systematic suppression studies have led to the discovery of specific mechanistic classes of suppression (Hodgkin, 2005; Prelich, 1999; Van Leeuwen et al, 2016). Suppression interactions can be intragenic, occurring between two mutations within the same gene, or extragenic, involving mutations in different genes (Hodgkin, 2005; Lehner, 2011; Prelich, 1999). In bacteria, fungi, fly, and worm, most extragenic suppression interactions occur between genes that are annotated to the same biological process (Fievet et al, 2013; Harcombe et al, 2009; Jorgensen & Mango, 2002; Manson, 2000; Szamecz et al, 2014; Van Leeuwen et al, 2016). Extragenic suppression of partial loss-of-function alleles can also occur through general mechanisms of suppression, which are often allele-specific and affect the translation of the deleterious mutation, the expression of the affected gene, or the stability of its gene-product (Hodgkin, 2005; Prelich, 1999; Van Leeuwen et al, 2017). Together, these mechanisms of suppression explain ∼70% of all described suppression interactions in the budding yeast Saccharomyces cerevisiae (Van Leeuwen et al, 2017) and have been used to predict suppressors among hundreds of genes on aneuploid chromosomes (Van Leeuwen et al, 2020).
Aside from the mechanisms of suppression that have been described in model organisms, additional biological mechanisms of compensation may exist in humans that could be of relevance for understanding variation in disease severity or penetrance. Here, we systematically analyzed suppression interactions among human genes to define general principles of suppression specific to humans. A thorough understanding of suppression mechanisms and properties may guide the discovery or prediction of protective alleles for rare genetic diseases, which could direct the rational design of new therapeutics.
RESULTS
A network of literature-curated suppression interactions
To capture existing suppression interactions among human genes, we examined 2,400 published papers for potential interactions (Data S1). Papers were derived from the BioGRID (Oughtred et al, 2021), OMIM (Amberger et al, 2015), specific PubMed searches (see Methods), and references found within the examined papers. We considered suppression interactions from two types of studies. First, we included interactions identified through genetic modifications in cultured human cells. Two genes were considered to have a suppression interaction when the genetic perturbation of a “query” gene led to reduced survival, decreased proliferation, or was otherwise associated with decreased cellular health, which was rescued by mutation of a different gene (the “suppressor” gene).
Second, we included interactions found through association studies in patients. Two genes were considered to have a suppression interaction when the disease risk or severity associated with a particular allele of a query gene was reduced in the presence of a minor allele of a suppressor gene. We excluded cancer patients from our analysis, as cancer is a disease of increased cell proliferation and thus mechanistically quite different from diseases caused by decreased cellular health. For genome-wide association studies (GWAS), we generally considered the gene that was closest to the SNP with the most significant association to a protective effect to be the suppressor gene. While the gene that is closest to a GWAS peak is not always the causal gene, it is in about 70-80% of the cases (Backman et al, 2021; Nasser et al, 2021; Pietzner et al, 2021; Wu et al, 2017). When data were provided indicating that another gene was causal, we based our suppressor annotation on this additional evidence (see Methods). For both cell-derived and patient-derived interactions, we excluded from the final dataset any suppression interactions that were intragenic, occurred between more than two genes, or involved the major allele of either the query or the suppressor gene.
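The closest-gene rule for GWAS hits described above can be sketched as follows. This is a minimal illustration and not the curation procedure itself: the gene names, coordinates, and the helpers `distance_to_gene` and `nearest_gene` are hypothetical, all genes are assumed to lie on the same chromosome, and distance is measured to the nearest edge of the gene body.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene body (hypothetical coordinates)
    end: int    # last base of the gene body


def distance_to_gene(snp_pos: int, gene: Gene) -> int:
    """Distance from a SNP to the nearest gene edge; 0 if the SNP lies inside the gene."""
    if gene.start <= snp_pos <= gene.end:
        return 0
    return min(abs(snp_pos - gene.start), abs(snp_pos - gene.end))


def nearest_gene(snp_pos: int, genes: list[Gene]) -> Gene:
    """Return the gene closest to the lead SNP (ties are resolved by list order)."""
    return min(genes, key=lambda g: distance_to_gene(snp_pos, g))


# Toy locus with three hypothetical genes and one lead SNP.
genes = [
    Gene("GENE_A", 1_000, 5_000),
    Gene("GENE_B", 9_000, 12_000),
    Gene("GENE_C", 20_000, 25_000),
]
lead_snp = 7_500
candidate = nearest_gene(lead_snp, genes)
```

In this toy locus the lead SNP lies between GENE_A and GENE_B and is assigned to GENE_B. As noted in the text, such an assignment would be overridden whenever additional evidence points to a different causal gene.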
In total, we collected 932 suppression interactions from 466 papers. For each interaction, we annotated the system in which the interaction was identified (cultured cells or patients), the query and suppressor mutations and whether these had a loss- or gain-of-function effect, the cell line used or tissue affected, the relative effect size of the suppression, whether any drugs were used, and the disease. After removing duplicate interactions that had been described multiple times, the resulting network encompassed 484 different genes and 476 unique suppression interactions (Fig. 1A). Four interactions were identified in both directions, meaning that both suppressor and query mutations were deleterious, but combination of the two gene mutants could restore fitness (Data S2). In total, 302 unique interactions were identified in cultured cells and 180 in patients (Fig. 1B). Although we observed significant overlap between these two subnetworks (6 shared interactions, p<0.0005, Fisher’s exact test), 99% of interactions were reported in only one type of study (either in cultured cells or in patients).
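The de-duplication and overlap bookkeeping described above can be illustrated with a short sketch. The records are hypothetical placeholders, and treating each (query, suppressor) direction as its own interaction is an assumption made for illustration. The last two lines only check that the counts reported in the text are mutually consistent.

```python
from collections import defaultdict

# Hypothetical curated records: (query gene, suppressor gene, system).
records = [
    ("QUERY1", "SUPP1", "cells"),
    ("QUERY1", "SUPP1", "cells"),     # same interaction described in a second paper
    ("QUERY1", "SUPP1", "patients"),  # also observed in patients
    ("QUERY2", "SUPP2", "patients"),
    ("SUPP2", "QUERY2", "patients"),  # reverse direction, kept as a separate pair here
    ("QUERY3", "SUPP3", "cells"),
]

# Collapse duplicates while remembering the systems each pair was seen in.
systems_by_pair = defaultdict(set)
for query, suppressor, system in records:
    systems_by_pair[(query, suppressor)].add(system)

unique_pairs = set(systems_by_pair)
in_cells = {p for p, s in systems_by_pair.items() if "cells" in s}
in_patients = {p for p, s in systems_by_pair.items() if "patients" in s}
shared = in_cells & in_patients
bidirectional = {p for p in unique_pairs if (p[1], p[0]) in unique_pairs}

# Consistency check on the reported counts: 302 (cells) + 180 (patients) - 6 (shared).
reported_unique = 302 + 180 - 6
single_system_fraction = (reported_unique - 6) / reported_unique
```

With the reported numbers, the union of the two subnetworks is 476 interactions, of which 470 (about 99%) were seen in only one type of study.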
About half of the query genes (46%) were suppressed by multiple suppressor genes, with eight query genes (BBS4, BRCA1, BRCA2, CFTR, HBB, HTT, PARP1, and PARP3) interacting with more than 10 suppressor genes (Fig. 1C). Especially for CFTR (127) and HBB (69), high numbers of suppressor genes have been described, likely because mutations in these genes lead to relatively common Mendelian disorders resulting in the availability of rather large numbers of patients to study. In contrast to the rather high average interaction degree observed for query genes, the vast majority of suppressor genes (92%) suppressed a single query gene (Fig. 1C, Data S2). The most common suppressor gene, TP53, suppressed 10 query genes. The encoded protein, p53, induces cell cycle arrest and apoptosis in response to various stresses (Hafner et al, 2019) and the suppressed query genes are functionally diverse with roles in transcription (TP63), DNA repair (FANCA, FANCD2, FANCG), protein degradation (CUL3, UBE2M, KCTD10), ribosome maturation (SBDS), and p53 regulation (MDM2, MDM4). Although loss of p53 can cause uncontrolled cell proliferation and tumor formation, heterozygous mutation of TP53 can be beneficial under conditions that would otherwise lead to excessive cell death. For example, mutation of a single copy of TP53 can protect against severe bone marrow failure in patients with Shwachman-Diamond syndrome (Kennedy et al, 2021). Thus, although suppressor genes tended to be specific to a single query gene, mutations in TP53 could suppress a diverse set of query genes.
Suppressor genes are essential for optimal (cellular) health
Consistent with their requirement for maintaining (cellular) health, query genes were significantly more likely to be intolerant to loss-of-function mutation in the human population, had a more deleterious effect on the proliferation of cultured human cells when inactivated, were more likely to be required for viability across a panel of human cell lines, and tended to be conserved in a higher number of species than other genes in the human genome (Fig. 2A-D, S1A-D). In apparent contrast with their role in ameliorating phenotypes in the presence of the query mutation, suppressor genes were also significantly depleted for deleterious mutations in the human population, were generally required for optimal proliferation of cultured cells, and tended to be highly conserved across species (Fig. 2A-D, S1A-D). Furthermore, mutations in suppressor genes were often associated with diseases themselves (Fig. 2E, S1E). The deleteriousness of query and suppressor mutations was weakly correlated (Fig. S1F). These results suggest that the beneficial effects of suppressor mutations may only be apparent in the presence of the query mutation. Alternatively, because these analyses look at the effect of deleterious mutations in the suppressor gene, the variants that cause the suppression phenotype may not lead to loss-of-function of the suppressor. To investigate the latter possibility, we considered gain-of-function or loss-of-function suppressor mutations separately (Fig. 2F, S1G). We did not observe significant differences in loss-of-function intolerance between genes carrying gain-of-function or loss-of-function suppressor mutations (Fig. S1H). Furthermore, when focusing solely on suppressor genes that were identified using knockout experiments in cell culture, 86% of these genes were needed for optimal cellular proliferation. These results suggest that the loss-of-function intolerance of suppressor genes is not driven by gain-of-function suppressor mutations. 
Suppressor mutations thus appear to be frequently detrimental in the absence of the query mutation.
Overlap with other interaction networks
The suppression interactions overlapped significantly with protein-protein interactions and with positive genetic interactions (Fig. S2) (Oughtred et al, 2021). The overlap with positive genetic interactions is expected, as suppression interactions are an extreme type of positive interaction. Significant overlap was also observed with negative genetic interactions (Fig. S2) (Oughtred et al, 2021). This overlap reflects that mutations in a gene may lead to either loss-of-function or gain-of-function effects, which may display opposite types of genetic interactions. Most of the suppression interactions that overlapped with the protein-protein or genetic interaction networks were observed in cultured cells (Fig. S2). This observation probably has two underlying causes: (i) both the protein-protein and genetic interaction datasets consist solely of interactions that have been mapped in cells, and (ii) researchers investigating genetic interactions in cultured cells may be more likely to test the interacting genes for protein-protein interactions than clinical researchers working with patient data. Despite the overlap with other interaction networks, the vast majority of suppression interactions (79%) are specific to the suppression network and thus highlight novel functional connections between genes.
Suppression interactions within and across cellular processes
Consistent with other organisms (Fievet et al, 2013; Harcombe et al, 2009; Van Leeuwen et al, 2017; Van Leeuwen et al, 2016), suppression interactions often occurred between functionally related genes, such that a query mutant tended to be suppressed by another gene annotated to the same biological process (Fig. 3A, S3A). Genes connected by suppression interactions also tended to be co-expressed and encode proteins that function in the same subcellular compartment and/or belong to the same pathway or protein complex (Fig. 3B). The extent of functional relatedness between suppression gene pairs did not depend on the conditions under which the interaction was identified (e.g., in the presence of a specific drug), whether the interaction was discovered in patients or in cultured cells, the number of times a particular interaction had been described, the relative effect size of the suppression, or whether the mutations had a gain- or loss-of-function effect (Fig. S3B). Furthermore, excluding data for CFTR and HBB, the two query genes with the highest number of suppression interactions, did not affect the observed enrichments (Fig. S3C). When multiple suppressors had been described for a query gene across independent studies, the suppressor genes also tended to be co-expressed and encode proteins that function in the same pathway or protein complex and/or that localize to the same subcellular compartment (Fig. 3C).
Despite their tendency to connect functionally related genes, suppression interactions also linked different biological processes. Genes with a role in signaling or the response to stress suppressed defects associated with mutation of genes involved in many different biological processes. This central role for signaling and stress response in the suppression network was observed both for interactions identified in patients and for those found in cultured cells (Fig. S3D). The suppressor genes in this category often played a role in protein phosphorylation and kinase cascades (57%) and/or in apoptosis or its regulation (46%). Moreover, in patients with infectious or inflammatory diseases, such as multiple sclerosis or HIV, the suppressor genes frequently encoded members of the major histocompatibility complex family that play a critical role in the immune system (Gregersen et al, 2006; Martin et al, 2018).
Genes involved in chromatin organization or transcription were also strongly overrepresented as suppressors, mainly in interactions identified in cultured cells (Fig. 3A and S3D). These interactions reflect a mechanism whereby modified expression of genes encoding members of the same pathway as the query gene can compensate for the altered activity of the query. For example, the deleterious effect of loss of BRCA2, which encodes a protein with a role in double-strand DNA break repair via homologous recombination, can be rescued by silencing transcriptional repressor E2F7 (Clements et al, 2018). E2F7 inhibits expression of several genes with a role in recombination or double-strand break repair, including CHEK1, DMC1, GEN1, and MND1, that when expressed can potentially compensate for the absence of BRCA2. In total, we found that ∼44% of suppressor genes that encode characterized transcription factors affect expression of query pathway members (see Methods).
Mechanistic categories of suppression interactions
We classified the 476 unique suppression interactions into distinct mechanistic categories on the basis of the functional relationship between the query and suppressor genes. Many of the reported queries (27%) were suppressed by mutations in functionally related genes (“Functional mechanisms”, Fig. 4A,C). These include 72 interactions in which both the query and the suppressor genes encode members of the same protein complex (“Same complex”, 21 interactions) or pathway (“Same pathway”, 51 interactions). Thirty-six interactions involved suppression by a different, but related, pathway or complex (“Alternative pathway”). In this scenario, the deleterious phenotype caused by absence of a specific cellular function required for normal (cellular) health is suppressed when an alternative pathway is rewired to re-create the missing activity. Finally, 21 gene pairs were annotated to the same biological process but pathway or complex annotation data were not available for one or both genes (“Unknown functional connection”). In addition to suppression interactions between functionally related genes, more general, pleiotropic classes of suppressors exist that affect degradation of the mutated query protein or mRNA, gene expression, or signaling and stress response pathways (“General mechanisms”, Fig. 4A,D). Together, these general mechanisms of suppression explain 44% of interactions, with half of these (22%) involving altered signaling or stress response processes. In total, 71% of interactions could be assigned to a mechanistic class.
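The percentages in this classification follow directly from the counts given above, as the short check below shows. Only numbers stated in the text are used. The count behind the 44% for general mechanisms is not given, so that value enters as a percentage.

```python
total_interactions = 476

# Counts stated in the text for the "Functional mechanisms" classes.
functional_counts = {
    "Same complex": 21,
    "Same pathway": 51,
    "Alternative pathway": 36,
    "Unknown functional connection": 21,
}

functional_total = sum(functional_counts.values())
functional_pct = round(100 * functional_total / total_interactions)

# Pairs sharing a complex or pathway, as a percentage of all interactions.
complex_or_pathway = functional_counts["Same complex"] + functional_counts["Same pathway"]
complex_or_pathway_pct = round(100 * complex_or_pathway / total_interactions)

general_pct = 44  # percentage given in the text; the underlying count is not stated
classified_pct = functional_pct + general_pct
unclassified_pct = 100 - classified_pct
```

The functional classes sum to 129 of 476 interactions (27%), which together with the 44% assigned to general mechanisms gives the 71% of classified interactions and leaves 29% unclassified.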
When comparing suppression interactions described among human genes to those identified using a similar literature curation approach in the budding yeast Saccharomyces cerevisiae (Van Leeuwen et al, 2016), there were significant differences in the distribution of the interactions across mechanistic classes (Fig. 4A). Notably, whereas 55% of suppression interactions in yeast occurred between genes with a functional connection, only 27% of the human suppression gene pairs were functionally related (p<0.0005 comparing yeast to human, Fisher’s exact test). Although the yeast genome is more extensively functionally annotated, this is unlikely to be the cause of this difference, as nearly all genes considered here have a biological process annotation (Data S2) and the percentage of unclassified gene pairs is similar between the two datasets (26% for yeast, 29% for human, p=0.22, Fisher’s exact test). In contrast, gene pairs involving a general suppression mechanism, in particular suppression by modifying the stress response or signaling pathways, were significantly less frequent among yeast gene pairs than among human suppression interactions (19% for yeast, 44% for human, p<0.0005, Fisher’s exact test).
The observed differences between yeast and human could be due to differences in the methods used to identify suppression interactions. Yeast suppressor isolation experiments generally rely on genetically engineered query mutant alleles, such as gene deletion alleles or temperature sensitive point mutants, and defined laboratory environments, whereas interactions detected in patients occur between natural variants in an uncontrolled setting. Because interactions that were discovered in cultured human cells also often involved genome modification and controlled laboratory environments, we investigated the distribution across mechanistic classes separately for interactions identified in cultured cells and those found in patients. The distribution across mechanistic classes between the two sets of human suppression interactions was largely similar (Fig. 4B). Although interactions found in patients more often involved suppression by altering signaling or stress response processes than those in cultured cells (p<0.0005, Fisher’s exact test), the percentage of interactions involving suppression by signaling or stress response genes was still significantly higher in cultured human cells than in yeast cells (p<0.0005, Fisher’s exact test). Moreover, the fraction of gene pairs with a functional connection was lower in cultured cells compared to patients, in contrast to the high percentage of functionally related pairs seen for yeast (Fig. 4A,B). Thus, experimental factors do not appear to be the main cause of the observed differences in frequency of suppression mechanisms between yeast and human.
Suppressors provide mechanistic insight into disease pathology
Combining data from multiple suppression studies can reveal the general significance of particular protein classes in attenuating disease phenotypes. As mentioned above, a relatively high number of suppressor genes have been identified for HBB and CFTR, which are mutated in sickle cell disease/β-thalassemia and cystic fibrosis patients, respectively (Fig. 1C). To investigate the molecular mechanisms driving suppression of these two query genes, we examined the 69 HBB and 127 CFTR suppressors in more detail, using our mechanistic suppressor classification (Fig. 4). Our systematic analysis highlighted both similarities and differences in disease pathology (Fig. 5). Attenuating cytokine signaling could for example reduce symptoms of both cystic fibrosis and sickle cell disease, highlighting the importance of inflammation in both diseases (Fig. 5) (Conran & Belcher, 2018; Roesch et al, 2018). However, whereas HBB suppressors occurred frequently in genes with a functional connection to HBB, CFTR suppressors tended to function through more general mechanisms of suppression (Fig. 5C). The most commonly found suppressors of HBB, encoding the β-subunit of hemoglobin, encode either other hemoglobin subunits (i.e. HBA1/2, HBG2) or their transcriptional regulators (i.e. BCL11A, MYB) (Fig. 5A,C). These hemoglobin subunits can either functionally replace the mutated HBB or balance the ratio of hemoglobin subunits, thereby increasing the relative amount of functional hemoglobin (Steinberg & Sebastiani, 2012). Thus, suppressors of complete loss-of-function mutations in HBB function through circumventing the need for HBB. In contrast, suppressors of CFTR mutants tend to restore CFTR function (Fig. 5B,C). CFTR encodes an ion channel located on the plasma membrane of epithelial cells where it regulates the flow of chloride and bicarbonate ions in and out of the cell. 
The F508del mutation, an inframe-deletion that removes the phenylalanine residue at position 508, occurs in ∼90% of cystic fibrosis patients (Rowe et al, 2005). Although CFTR-F508del retains substantial function, it is recognized by the ER quality control machinery as misfolded and is prematurely degraded (Ward et al, 1995). Changes in CFTR transcription or translation, chaperone levels, activity of the protein degradation machinery, or efficiency of ER to plasma membrane trafficking can (partially) restore expression of the mutant CFTR protein at the plasma membrane and explain 53% of the CFTR suppression interactions. These examples highlight how integrating data from tens to hundreds of papers can provide insight on the general mechanisms through which suppression of particular disease mutations can occur.
Query-suppressor gene pairs are often co-mutated in tumor cells
Cancer cells generally have increased genome instability and reduced DNA repair, leading to the accumulation of hundreds to thousands of mutations, the majority of which are considered passenger mutations that do not favor tumor growth (Kumar et al, 2020; Li et al, 2020). Because loss-of-function mutations in query genes tend to have a negative effect on cell proliferation, we suspected that damaging passenger mutations affecting query genes would be more likely to persist in a tumor if they were accompanied by mutations in the corresponding suppressor gene(s). To test this hypothesis, we first examined gene fitness data from genome-scale CRISPR-Cas9 gene knockout screens across 1,070 cancer cell lines from the Cancer Dependency Map (DepMap) project (Dempster et al, 2021). We found that knock-out of the query genes led to more variable effects on cell proliferation than knock-out of other genes with a comparable mean fitness defect (Fig. S4A,B). This suggests that the deleterious consequences of loss of the query gene are buffered in some cell lines but not in others, potentially due to differences in the presence of suppressor variants. To further explore this possibility, we looked at the presence of damaging mutations in query and suppressor genes across 1,758 cancer cell lines (Ghandi et al, 2019). We found that damaging mutations in the query gene were more frequently accompanied by mutations in the corresponding suppressor genes than expected by chance (Fig. 6A, S4C). Furthermore, we examined the co-occurrence of mutations in tumor samples collected from 69,223 patients across 213 different studies (Cerami et al, 2012). Also in these patient samples, impactful mutations in query genes frequently co-occurred with mutations in the corresponding suppressor genes (Fig. 6B, S4D-F). 
These results suggest that the genetic interactions that lead to improved health of patients with a genetic disease or increased proliferation of cultured cells also provide a selective advantage to tumor cells carrying mutations in these query genes.
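One way to ask whether damaging mutations in a query gene and its suppressor co-occur more often than expected by chance is a one-sided Fisher's exact test on a 2x2 table of cell lines. The sketch below implements that test with the standard library only. Whether this matches the statistic behind Fig. 6 is an assumption, the function name `fisher_enrichment_p` is hypothetical, and the counts are invented for illustration.

```python
from math import comb


def fisher_enrichment_p(both: int, query_only: int, suppressor_only: int, neither: int) -> float:
    """One-sided Fisher's exact test: P(overlap >= observed) under the hypergeometric null."""
    n_total = both + query_only + suppressor_only + neither
    n_query = both + query_only            # lines with a damaging query mutation
    n_suppressor = both + suppressor_only  # lines with a suppressor mutation
    max_overlap = min(n_query, n_suppressor)
    # Sum the hypergeometric probabilities of all overlaps at least as large as observed.
    tail = sum(
        comb(n_suppressor, k) * comb(n_total - n_suppressor, n_query - k)
        for k in range(both, max_overlap + 1)
    )
    return tail / comb(n_total, n_query)


# Invented example: 100 cell lines, 10 with a query mutation, 20 with a
# suppressor mutation, and 8 carrying both.
p_value = fisher_enrichment_p(both=8, query_only=2, suppressor_only=12, neither=78)
```

In this invented example, 8 of the 10 lines with a query mutation also carry a suppressor mutation, where about 2 would be expected by chance, so the test returns a very small p-value.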
Predicting suppressor genes
Given the strong functional connection frequently observed between interacting query and suppressor genes (Fig. 3), we developed models that use these signatures to identify suppressor genes for a given query gene of interest (see Methods). First, we adapted a model we developed previously to predict suppressor genes in yeast (Van Leeuwen et al, 2020) to predict suppressors among human genes. In brief, this model scores and ranks potential suppressor genes by prioritizing close functional connections to the query gene. In this functional prioritization model, shared complex or pathway membership weighs more heavily than more distant functional connections, such as co-localization or co-expression. For 27 query genes, at least one suppressor gene ranked among the top 100 of those predicted, with 15 suppressor genes ranking in the top 10 (Fig. S5). Consistent with the design of the model, 14 out of the 15 suppressors that were predicted with high accuracy encoded members of the same protein complex as the query gene.
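The idea behind the functional prioritization model can be sketched as a weighted score over the types of functional evidence linking a candidate to the query gene. The weights, evidence labels, and gene names below are hypothetical and chosen only to reflect the stated ordering (shared complex or pathway membership outweighs co-localization or co-expression). The actual scoring scheme is described in the Methods and may differ.

```python
# Hypothetical weights: closer functional links score higher (illustrative values only).
WEIGHTS = {
    "same_complex": 4.0,
    "same_pathway": 3.0,
    "co_localized": 1.0,
    "co_expressed": 1.0,
}


def score(evidence: set[str]) -> float:
    """Sum the weights of all evidence types linking a candidate to the query gene."""
    return sum(WEIGHTS[e] for e in evidence)


def rank_candidates(candidates: dict[str, set[str]]) -> list[str]:
    """Sort candidate suppressors by descending score; ties are broken alphabetically."""
    return sorted(candidates, key=lambda g: (-score(candidates[g]), g))


# Toy candidates for one query gene, each with its functional evidence.
candidates = {
    "GENE_A": {"co_expressed"},
    "GENE_B": {"same_complex", "co_localized"},
    "GENE_C": {"same_pathway"},
    "GENE_D": set(),
}
ranking = rank_candidates(candidates)
```

In this toy example the candidate sharing a protein complex with the query ranks first, followed by the pathway member, the co-expressed gene, and finally the gene with no functional link.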
Next, we aimed to further improve this model. We used a set of diverse features, including functional relationships (Fig. 3), other types of genetic and physical interactions (Fig. S2) and co-mutation in cancer cell lines (Fig. 6) to train a random forest classifier (see Methods). The model showed increased predictive power over the functional prioritization model, with 40 suppressor genes ranking among the top 100 of those predicted (Fig. 7). Only two suppressors would be expected to rank in the top 100 by random chance. In addition to predicting suppression interactions among genes with shared complex or pathway membership, the random forest model also accurately predicted 13 interactions involving genes with more distal functional relationships or involving general suppression mechanisms. For example, the query-suppressor pair TERC-TERT was correctly predicted based on protein-protein interaction and co-localization profiles of the gene pair. These results show that for ∼45% of query genes, the strong functional relationship that is generally observed for query-suppressor gene pairs can be used to narrow the search space for potential suppressor genes from thousands to fewer than a hundred genes.
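The comparison between observed top-100 hits and the number expected by chance can be illustrated with a small evaluation sketch. It assumes that each known query-suppressor pair has been given a rank among the candidate genes and that a random ranking places a known suppressor in the top k with probability k divided by the number of candidates. The ranks, pair names, and candidate count are hypothetical and are not the values behind Fig. 7.

```python
def hits_in_top_k(ranks: dict[tuple[str, str], int], k: int) -> int:
    """Count known query-suppressor pairs whose suppressor ranks within the top k."""
    return sum(1 for rank in ranks.values() if rank <= k)


def expected_random_hits(n_pairs: int, n_candidates: int, k: int) -> float:
    """Expected number of top-k hits if candidates were ranked uniformly at random."""
    return n_pairs * min(k, n_candidates) / n_candidates


# Hypothetical ranks of known suppressors among 1,000 candidate genes per query.
ranks = {
    ("QUERY1", "SUPP1"): 3,
    ("QUERY2", "SUPP2"): 57,
    ("QUERY3", "SUPP3"): 240,
    ("QUERY4", "SUPP4"): 100,
    ("QUERY5", "SUPP5"): 812,
}
observed = hits_in_top_k(ranks, k=100)
expected = expected_random_hits(n_pairs=len(ranks), n_candidates=1_000, k=100)
```

Here three of the five hypothetical pairs fall in the top 100, against 0.5 expected from a random ranking.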
DISCUSSION
We collected 932 suppression interactions from the biomedical literature and used this dataset to define general properties and mechanistic classes of suppression. We found that suppression interactions often linked functionally related genes. General compensation mechanisms were also frequent and tended to affect gene expression or stress response signaling. Furthermore, using CFTR and HBB as examples, we showed that systematic analysis of suppression interactions can highlight differences in disease pathology.
We discovered that in the absence of a query mutation, suppressor variants are likely deleterious (Fig. 2). This suggests that at least some suppressor variants are rare in natural populations and will therefore be difficult to detect using association studies. Nonetheless, suppressor variants may exist for the associated disease alleles and may be identified using alternative methods, such as in vitro studies. We have shown here that suppression interactions observed in patients had general properties similar to those found in cultured human cells. For example, both datasets displayed similar fractions of functionally related gene pairs or detrimental suppressor genes (Fig. 4B, S2, S3D). Furthermore, we found that there is a significant overlap between interactions detected in patients and those identified in cultured cells. Together, these observations suggest that cultured cells can be used to discover clinically relevant suppressor genes. Furthermore, these results also imply that despite problems with some association studies that may have lacked statistical power or failed to properly correct for population structure, genome-wide and hypothesis-driven association studies mostly map relevant modifier loci and genes.
Although many of the general properties we identified for human suppression interactions overlapped with those we previously observed in the budding yeast, there were several differences between the two species. Interactions discovered in yeast occurred more frequently between functionally related genes compared to human gene pairs and were depleted for general compensatory mechanisms, especially those involving signaling or stress response processes such as apoptosis or the immune response. Suppression in human cells or patients thus often involved more indirect mechanisms of suppression that were not available in unicellular organisms such as yeast. As both the yeast and human datasets are based on literature-curated data that can come from specific hypothesis-driven experiments, the datasets may be biased. This bias could differ between the two datasets, due to diverse interests of communities studying different organisms. Nonetheless, we previously mapped a systematic, unbiased experimental suppression network in yeast and showed that the properties of the unbiased network were largely comparable to a literature-curated network (Van Leeuwen et al, 2016). Using for example CRISPR-Cas9 knockout screens, a similar unbiased experimental suppression network could be mapped for human genes. Such a network could serve to validate the mechanisms of suppression identified here and, as about 80% of described interactions were unique to the suppression network, is likely to reveal new functional connections between genes.
We used the various properties of suppression interactions to develop predictive models of suppression (Fig. 7, S5). Although these models can be used to predict suppressor genes for any query gene, the quality of the predictions will be dependent on the availability of functional data for the query and suppressor genes. Although the random forest model could also predict more distal functional and general suppression relationships, the majority of the predicted interactions (68%) still involved query-suppressor gene pairs that encode members of the same protein complex or pathway, whereas only 15% of all suppression interactions in our dataset occurred between members of the same complex or pathway. The availability of larger, unbiased suppression interaction networks would likely further improve the prediction of suppression interactions beyond same complex or pathway relationships.
Protective modifiers have been identified for most common “monogenic” genetic diseases, including sickle cell disease, β-thalassemia, cystic fibrosis, Huntington’s disease, Duchenne muscular dystrophy, and spinal muscular atrophy. For hemoglobinopathies and spinal muscular atrophy, the protective modifiers can completely reverse disease symptoms and have led to the development of effective therapies targeting the suppressor gene (Day et al, 2021; Esrick et al, 2021; Finkel et al, 2016; Frangoul et al, 2021). Given that suppressor variants have been detected for most common monogenic diseases and that suppressor variants can be isolated for >95% of point mutations in model organisms (our unpublished results), compensatory mutations may exist for nearly all disease alleles. Identification of such suppressor variants may reveal the molecular mechanisms underlying the disease and has the potential to pinpoint new avenues of therapeutic intervention.
METHODS
Literature curation
Papers describing potential suppression interactions were collected from multiple sources. First, the Homo sapiens “synthetic rescue” and “dosage rescue” datasets were downloaded from the BioGRID on April 11th, 2020 (version 3.5.184) (Oughtred et al, 2021). After removing interactions that did not occur between two human genes, this dataset consisted of 36 genetic interactions described in 21 publications. Second, on April 29th, 2020, the OMIM dataset (Amberger et al, 2015) was downloaded and filtered for entries containing the word “modifier”, which led to the identification of 36 papers potentially describing suppression interactions. Third, we performed PubMed searches for the terms “positive modifier”, “protective modifier”, “synthetic rescue”, “dosage rescue”, “genetic suppression”, and “modifier locus”. Finally, we included papers containing potential suppression interactions that were cited within the examined papers. In total, this resulted in a set of 2,400 papers for further curation (Data S1).
All 2,400 papers were read in detail by at least two people. We collected suppression interactions from two types of studies: (i) interactions identified through genetic modifications in cultured human cells and (ii) interactions found through association studies in patients with diseases other than cancer. Two genes were considered to have a suppression interaction when genetic perturbation of a “query” gene led to a disease, reduced survival, decreased cellular proliferation, or was otherwise associated with decreased (cellular) health, which was at least partially rescued by mutation of a “suppressor” gene.
In total, 469 papers were found to describe suppression interactions. For each interaction, we annotated the type of study in which the interaction was identified (cell culture or patients), the query and suppressor genes and mutations and whether these had a loss- or gain-of-function effect, the cell line used or the affected tissue, the relative effect size of the suppression, whether any drugs were used, and the disease. All gene names were updated according to the latest approved human gene nomenclature rules (Tweedie et al, 2021). For GWAS, we generally assigned the gene that was closest to the most significant protective SNP as the suppressor gene. However, when the paper provided data supporting another gene as causal, we based our suppressor annotation on this additional evidence. In the case of suppression of HBB by SNPs in the intergenic HBS1L-MYB locus, we assigned all significant SNPs within this locus to MYB, which was identified as the causal gene (Galarneau et al, 2010; Jiang et al, 2006). Furthermore, all deletions in the HBA locus, for which it was often not specified whether HBA1 and/or HBA2 were deleted, were assigned to HBA2. The relative effect size of the suppression was classified as “small” if the deleterious phenotype was rescued by less than 50% or the suppressor explained less than 5% of the phenotypic variance in the population. Otherwise, the effect size was classified as “large”.
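As an illustration only, the effect-size rule described above can be written as a small Python function. The function name, its argument names, and the convention that a measurement not reported in a paper is passed as `None` are assumptions made for this sketch; they are not taken from the curation procedure itself.

```python
from typing import Optional


def classify_effect_size(fraction_rescued: Optional[float] = None,
                         variance_explained: Optional[float] = None) -> str:
    """Return "small" or "large" for one suppression interaction.

    fraction_rescued: fraction (0-1) of the deleterious phenotype that was
        rescued, e.g. in a cell culture experiment (None if not reported).
    variance_explained: fraction (0-1) of the phenotypic variance explained
        by the suppressor in a patient cohort (None if not reported).
    Treating an unreported measurement as "not applicable" is an assumption
    of this sketch.
    """
    if fraction_rescued is None and variance_explained is None:
        raise ValueError("at least one effect-size measurement is required")
    # Rescue of less than 50% of the phenotype counts as a small effect.
    if fraction_rescued is not None and fraction_rescued < 0.5:
        return "small"
    # Explaining less than 5% of the phenotypic variance counts as small.
    if variance_explained is not None and variance_explained < 0.05:
        return "small"
    return "large"
```

For example, `classify_effect_size(fraction_rescued=0.8)` falls into the large class, whereas `classify_effect_size(variance_explained=0.02)` falls into the small class.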
Interactions identified in high-throughput screens yielding >50 suppression interactions were excluded (PubMed IDs 28319085, 32694731, 29891926, and 34764293) because, owing to their size, these studies would have a disproportionate influence on the complete dataset. For paper 29891926, three interactions that were validated individually were included in the dataset. Suppression interactions that were intragenic, occurred between more than two genes, or involved the major allele of either the query or the suppressor gene were also excluded from the final dataset. In total, the resulting network encompassed 484 different genes and 932 suppression interactions, of which 476 were unique interactions (Data S2).
Loss-of-function tolerance
The loss-of-function tolerance of query and suppressor genes was evaluated using multiple datasets (Fig. 2, S1). First, we used the probability of loss-of-function intolerance of genes that was previously determined based on the frequency of deleterious variants affecting the genes in the human population (gnomAD v2.1.1) (Karczewski et al, 2020). Second, we used the median effect of gene knockout on cell proliferation across a panel of 1,070 cell lines that was determined as the change in abundance of guide RNAs targeting a gene in pooled CRISPR-Cas9 screens (Dempster et al, 2021) (version 22Q1). Third, we looked at the fraction of genes that were required for optimal proliferation across the same set of human cell lines (“common essential genes”, version 22Q1) (Dempster et al, 2021). Fourth, we used PANTHER version 16.1 to detect the presence of gene orthologs across species (Mi et al, 2019). Finally, we considered the number of diseases that were associated with a gene in DisGeNET v7.0 (Pinero et al, 2020).
Analysis of gene function and functional relatedness
For analysis of suppression interactions within and across different biological processes (Fig. 3A, S3A,D), genes were manually assigned to broadly defined functional categories (Data S2). Highly pleiotropic or poorly characterized genes were excluded from the analysis. Interactions involving query genes annotated to the “Protein folding & glycosylation” class were also removed from consideration, as only one interaction fell into this category. G:Profiler version e106_eg53_p16_65fcd97 (Raudvere et al, 2019) was used to identify suppressor genes within the broader “Signaling & stress response” category that had a role in protein phosphorylation (GO:0006468) or apoptosis (GO:0006915).
We used systematic, genome-wide datasets describing protein localization, GO term annotation, co-expression, protein complex membership, and pathway membership to evaluate the functional relatedness between query-suppressor gene pairs (Fig. 3B-C, S3B-C). In each case, only gene pairs for which functional data was available for both the query and the suppressor gene were considered. Protein localization was determined based on immunofluorescence staining data available in The Human Protein Atlas version 21.1 (Thul et al, 2017). Two proteins were considered to co-localize if they were found in at least one shared cellular compartment. GO co-annotation was calculated based on biological process terms with fewer than 500 annotated genes. Co-expression data was derived from SEEK (Zhu et al, 2015) as explained previously (Luck et al, 2020). Proteins that were annotated to the same protein complex in either CORUM 4.0 (Giurgiu et al, 2019) or BioPlex 3.0 (Huttlin et al, 2021) were considered as co-complexed. Proteins in distinct non-overlapping protein complexes were considered not co-complexed. The same approach was used to define the co-pathway membership using Reactome data (downloaded January 2020) (Gillespie et al, 2022). For each of these datasets, we calculated the overlap with the suppression interactions. The expected overlap by chance was calculated by considering all possible pairs between a background set of queries and suppressors. The background set of queries consisted of genes found as queries in the suppression network. As a background set of suppressors, we considered all genes in the genome. Pairs with a suppression interaction were removed from the background set. For a given functional standard, we defined the fold enrichment as the ratio between the overlap of the suppression gene pairs with that standard and the overlap of the background set with that standard. Significance of the overlap was assessed by Fisher’s exact tests.
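The enrichment calculation described above can be sketched in Python using only the standard library. The function and argument names are hypothetical. The text does not state whether the Fisher’s exact tests were one- or two-sided; the one-sided (enrichment) version below is an assumption of this sketch, as is expressing each overlap as the fraction of gene pairs found in the functional standard.

```python
from math import comb


def fold_enrichment(hits_suppression: int, n_suppression: int,
                    hits_background: int, n_background: int) -> float:
    """Ratio of the overlap fractions of suppression and background pairs.

    hits_*: number of gene pairs that are also in the functional standard.
    n_*: total number of gene pairs with functional data available.
    """
    overlap_suppression = hits_suppression / n_suppression
    overlap_background = hits_background / n_background
    return overlap_suppression / overlap_background


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: suppression pairs, background pairs (disjoint, since pairs with a
    suppression interaction are removed from the background).
    Columns: in the functional standard, not in the functional standard.
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    total = a + b + c + d
    row1 = a + b          # number of suppression pairs
    col1 = a + c          # number of pairs in the functional standard
    denominator = comb(total, row1)
    p_value = 0.0
    for x in range(a, min(row1, col1) + 1):
        p_value += comb(col1, x) * comb(total - col1, row1 - x) / denominator
    return p_value
```

With `k` of `n` suppression pairs and `K` of `N` background pairs in a standard, the corresponding calls would be `fold_enrichment(k, n, K, N)` and `fisher_exact_greater(k, n - k, K, N - K)`.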
Overlap with other types of interactions
We compared our suppression interaction network to three different interaction datasets collected from the BioGRID (Oughtred et al, 2021): physical interactions, negative genetic interactions, and positive genetic interactions (Fig. S2). For the genetic interaction datasets, we removed papers from the BioGRID data that were used for the suppression interaction literature curation. Overlap of the interaction networks with our suppression interaction dataset was calculated as explained above (see “Analysis of gene function and functional relatedness”).
To investigate whether suppressor genes with a role in transcription or chromatin organization affected expression of members of the same pathway as the query gene, we used transcription factor (TF) target gene information from g:Profiler e106_eg53_p16_65fcd97 (Raudvere et al, 2019) and MotifMap (Daily et al, 2011) and pathway annotations from Reactome (Gillespie et al, 2022). To exclude non-specific annotations, we only considered pathways and g:Profiler TF-target lists with fewer than 100 members. MotifMap and g:Profiler gave comparable results: 46% and 42%, respectively, of the suppressor genes that encode transcription factors with known targets affected expression of members of the corresponding query pathway.
Mechanistic classes
The 476 unique suppression interactions were assigned to distinct mechanistic classes. Gene pairs that had the same biological process annotation (Data S2) or gene pairs that were not annotated to the same biological process but encoded members of the same complex or pathway (see “Analysis of gene function and functional relatedness” for details on the datasets used) were considered to be functionally related. These functionally related gene pairs were further subdivided into subclasses. First, gene pairs that encoded subunits of the same protein complex were assigned to the “Same complex” subclass. Second, gene pairs that encoded members of the same pathway were assigned to the “Same pathway” subclass. Third, gene pairs that shared a biological process annotation but functioned in different pathways were assigned to the “Alternative pathway” subclass. Finally, all other functionally related gene pairs were assigned to the “Unknown functional relation” subclass.
Gene pairs that did not have a functional relationship were further subdivided based on the function of the suppressor gene. Suppressor genes that were annotated to the biological processes “Transcription & chromatin organization”, “Translation & RNA processing”, “Protein degradation”, and “Signaling & stress response” were assigned to the corresponding subclasses. The remaining gene pairs were assigned to the “Other” class.
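The assignment rules for the mechanistic classes can be summarized as a decision procedure. The Python sketch below is an interpretation, not the code used for the analysis: the function name and arguments are hypothetical, and it assumes that “functioned in different pathways” means that pathway annotations were available for both genes but not shared, with `same_pathway=None` standing for missing pathway data.

```python
from typing import Optional

# Suppressor processes that define the general (non-related) subclasses.
GENERAL_SUPPRESSOR_CLASSES = (
    "Transcription & chromatin organization",
    "Translation & RNA processing",
    "Protein degradation",
    "Signaling & stress response",
)


def assign_mechanistic_class(same_process: bool,
                             same_complex: bool,
                             same_pathway: Optional[bool],
                             suppressor_process: str) -> str:
    """Assign one query-suppressor gene pair to a mechanistic class.

    same_pathway is True/False when both genes have pathway annotations and
    None when pathway data is missing for at least one gene (an assumption
    of this sketch).
    """
    functionally_related = same_process or same_complex or bool(same_pathway)
    if functionally_related:
        # Subclasses are tested in the order given in the text.
        if same_complex:
            return "Same complex"
        if same_pathway:
            return "Same pathway"
        if same_process and same_pathway is False:
            return "Alternative pathway"
        return "Unknown functional relation"
    # Pairs without a functional relationship: classify by suppressor function.
    if suppressor_process in GENERAL_SUPPRESSOR_CLASSES:
        return suppressor_process
    return "Other"
```

Because the checks are ordered, a pair that is both co-complexed and in the same pathway is reported as “Same complex”, matching the order in which the subclasses are listed.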
Co-occurrence of mutations in cancer models and patients
To determine whether the effect of knockout of a given gene on cellular fitness strongly depended on the genetic background, we examined fitness data from genome-scale CRISPR-Cas9 gene knockout screens across 1,070 cancer cell lines from the DepMap project (Dempster et al, 2021). Because the variance in gene knockout fitness varied depending on the average fitness of the gene knockout across cell lines, we fitted a quadratic model to the fitness data and used it to determine whether a given gene had a higher fitness variance across cell lines than expected (Fig. S4A,B).
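A minimal sketch of this idea in Python is given below. It assumes that the quadratic model describes the variance of the knockout fitness effect as a function of the mean effect across cell lines and that “higher than expected” corresponds to a positive residual; the fitting procedure (ordinary least squares via the normal equations) and any cut-off applied to the residuals are likewise assumptions, since these details are not specified in the text. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(x: Sequence[float],
                  y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns (c0, c1, c2)."""
    # Power sums needed for the normal equations A @ c = t.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    det = _det3(a)
    if det == 0:
        raise ValueError("need at least three distinct x values")
    coefficients = []
    for col in range(3):
        # Cramer's rule: replace one column of A by t.
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coefficients.append(_det3(m) / det)
    return coefficients[0], coefficients[1], coefficients[2]


def excess_variance(mean_fitness: Sequence[float],
                    fitness_variance: Sequence[float]) -> List[float]:
    """Observed minus expected variance per gene under the quadratic fit."""
    c0, c1, c2 = fit_quadratic(mean_fitness, fitness_variance)
    return [v - (c0 + c1 * m + c2 * m * m)
            for m, v in zip(mean_fitness, fitness_variance)]
```

Genes with a large positive residual would be candidates for a background-dependent knockout effect; for genome-scale data, a numerically more robust fitting routine would be preferable to the closed-form solution used here.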
To evaluate the frequency of co-occurrence of mutations in query-suppressor gene pairs (Fig. 6, S4C-F), we used data from two sources. First, we used cell line mutation data from the Cancer Cell Line Encyclopedia from DepMap (Ghandi et al, 2019). We considered only “damaging mutations” as defined by DepMap (Ghandi et al, 2019). Second, we examined the co-occurrence of mutations in tumor samples collected from 69,223 patients across a curated set of 213 non-overlapping studies on cBioPortal (Cerami et al, 2012). We either used all variants that were detected in these samples, or we excluded variants of unknown significance as defined by cBioPortal (Cerami et al, 2012). We then calculated how often the query and corresponding suppressor genes were co-mutated compared to a background set of gene pairs, as explained above (see “Analysis of gene function and functional relatedness”), with the exception that for the cBioPortal analysis, we used genes found as suppressor genes in the suppression interaction dataset of interest as the background set. Excluding data for CFTR and HBB, the two query genes with the highest number of suppression interactions, did not significantly affect the results.
Predicting suppressor genes
We predicted potential suppressor genes by ranking all genes in the human genome by their functional relationship to the query gene (Fig. 7, S5). We used two different models to do this. First, based on a suppressor-prediction algorithm we previously developed for yeast (Van Leeuwen et al, 2020), we evaluated the following functional relationships in this order of priority: co-complex (highest priority), co-pathway, co-expression, and co-localization (lowest priority). Thus, genes with co-complex relationships were ranked above those with only co-pathway relationships. Additionally, the order between genes within a given set was established by evaluating the rest of the functional relationships. For instance, the set of genes that were co-expressed with the query gene, but did not encode members of the same complex or pathway, was further ranked by whether the encoded protein co-localized (highest rank) or not (lowest rank) with the query protein. Second, we used the same four functional datasets, genetic interactions, protein-protein interactions, and co-mutation data in cancer cell lines to train a random forest classifier using the R package “randomForest” (Liaw & Wiener, 2002). Performance of the predictor was evaluated with out-of-bag samples. See the previous sections for details on the datasets used.
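The priority-based ranking of the first model amounts to a lexicographic sort on the four functional relationships. The following Python sketch illustrates this; the function name, the dictionary layout, and the treatment of missing data as “no relationship” are assumptions made for illustration and do not reproduce the published algorithm in detail.

```python
from typing import Dict, List

# Functional relationships in decreasing order of priority.
PRIORITY = ("co_complex", "co_pathway", "co_expression", "co_localization")


def rank_candidate_suppressors(
        relationships: Dict[str, Dict[str, bool]]) -> List[str]:
    """Rank candidate genes by their functional relationship to one query gene.

    relationships maps each candidate gene to a dict with a boolean per
    relationship type. Missing entries are treated as False, which is an
    assumption of this sketch.
    """
    def sort_key(gene: str):
        # False sorts before True, so "not related" is used to place genes
        # with a relationship first; the tuple order encodes the priority.
        return tuple(not relationships[gene].get(r, False) for r in PRIORITY)

    # sorted() is stable: genes with identical relationships keep input order.
    return sorted(relationships, key=sort_key)
```

For instance, a candidate that is only co-expressed and co-localized with the query is placed above one that is only co-expressed, and both are placed below any candidate in the same complex or pathway as the query.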
ACKNOWLEDGEMENTS
This work was supported by an Eccellenza grant from the Swiss National Science Foundation (PCEGP3_181242) (J.v.L.) and a Ramón y Cajal fellowship (RYC-2017-22959) (C.P.).