bioRxiv
New Results

SWATH-MS co-expression profiles reveal paralogue interference in protein complex evolution

Luzia Stalder, Amir Banaei-Esfahani, Rodolfo Ciuffa, Joshua L Payne, Ruedi Aebersold
doi: https://doi.org/10.1101/2020.09.08.287334
Luzia Stalder
1Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
  • For correspondence: luzia.stalder@unine.ch aebersold@imsb.biol.ethz.ch
Amir Banaei-Esfahani
1Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
Rodolfo Ciuffa
1Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
Joshua L Payne
2Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
3Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Ruedi Aebersold
1Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
4Faculty of Science, University of Zurich, 8057 Zurich, Switzerland

Abstract

Understanding the conservation and evolution of protein complexes is of critical value to decode their function in physiological and pathological processes. One prominent proposal posits gene duplication as a potential mechanism for protein complex evolution. In this study we take advantage of large-scale proteome expression datasets to systematically investigate the role of paralogues, and specifically self-interacting paralogues, in shaping the evolutionary trajectories of protein complexes. First, we show that protein co-expression derived from quantitative proteomic matrices is a good indicator for complex membership and is conserved across species. Second, we suggest that paralogues are commonly strongly co-expressed and that for the subset of paralogues that show diverging co-expression patterns, the divergent co-expression patterns reflect both sequence and functional divergence. Finally, on this basis, we show that homomeric paralogues known to be part of protein complexes display a unique co-expression pattern distribution, with a subset of them being highly diverging. These findings support the idea that homomeric paralogues can avoid cross-interference by diversifying their expression patterns, and corroborate the role of this mechanism as a force shaping protein complex evolution and specialization.

Introduction

Protein complexes – described as stable protein assemblies that can be isolated by biochemical means – are one of the main modes of proteome organization and fundamental functional entities of the cell. The dramatic increase in structural knowledge of these complexes, as well as the advent of proteome-wide profiling methods (most prominently via mass spectrometry), has allowed the interrogation of general principles governing protein complex formation, function and evolution. For instance, bioinformatic analyses of the Protein Data Bank (PDB) and of large-scale proteomics datasets have identified, respectively, core symmetries that complexes obey and the extent to which the co-regulation of their subunits is constrained (refs 1,2). An especially fertile area of investigation relates to the evolution of protein complexes.

It has been demonstrated that a possible mechanism for protein complex core formation starts with the duplication of self-interacting proteins (homomeric paralogues; Figure 1A; refs 3,4). Importantly, recent research on the fate of paralogues pointed out that duplication of genes for obligate homomeric proteins can lead to interference between the resulting paralogues: mutations in only one of the paralogues can “poison” the oligomer and affect the ancestral function (ref 5). It has been suggested that this constrains the evolution of paralogues on the one hand, but also promotes additional regulatory complexity on the other (ref 5). Specifically, if sequence divergence of the homomeric paralogues impairs the ancestral function, the mutated paralogue acts as a highly specific competitive inhibitor of the ancestral protein. This is referred to as paralogue interference (ref 5). To prevent this type of interference, the paralogues will be under selection pressure to develop mechanisms that prevent their cross-interaction, for example by mutations that drive expressional separation. Importantly, if paralogues that are separated by expression give rise to protein complexes, these complexes will contain either the ancestral or the duplicated paralogue, but not both at the same time. Expressional separation after duplication is a common pattern on the RNA level (ref 5). However, expressional separation has not been analyzed on the protein level in a general way and its role in protein complex evolution remains elusive (ref 5).

Figure 1. SWATH-MS co-expression profiles of protein complex modules are conserved across species.

(A) Schematic drawing illustrating possible evolutionary routes from homomer-derived paralogues into protein complexes.

(B) Study design. Protein co-expression profiles were acquired in SWATH-MS from 272 samples covering the proteome of four species. Spearman’s rank correlation (rs) was used to measure profile similarity.

(C) Conserved correlation networks of human and mouse. Nodes represent proteins, edges indicate high correlation in both human and mouse datasets. Red edges are annotated in CORUM. In panels on the left, a stringent correlation cutoff of rs > 0.8 was chosen, on the right a more relaxed cutoff of rs > 0.6 was used. For comparison, correlation values of human and mouse were quantile normalized.

Left: Proteins of the ATP-synthase complex (top) and proteins of the NADH dehydrogenase complex (bottom).

Right: Proteins involved in the tricarboxylic acid cycle (top) and proteins involved in the spliceosome (bottom).

(D) (I) Mouse Spearman correlation for mouse orthologues that are highly correlated in human (top 0.5 percentile, rs > 0.8, npairs = 1429) and for mouse orthologues that do not correlate in human (bottom 0.5 percentile, npairs = 1429). Pairs that correlate highly in human also correlate significantly higher in mouse (md 0.2 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

(II) Mouse Spearman correlation for pairs that are annotated as CORUM complex pairs in human (npairs = 1807) and for pairs that are not annotated in CORUM (npairs = 712803). Human CORUM pairs correlate significantly higher in mouse (md 0.35 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

(III) Human Spearman correlation for human orthologues that are highly correlated in mouse (top 0.5 percentile, rs > 0.8, npairs = 1429) and for human orthologues that do not correlate in mouse (bottom 0.5 percentile, npairs = 1429). Pairs that correlate highly in mouse also correlate significantly higher in human (md 0.2 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

(IV) Human Spearman correlation for pairs that are annotated as CORUM complex pairs in mouse (npairs = 331) and for pairs that are not annotated in CORUM (npairs = 714279). Mouse CORUM pairs correlate significantly higher in human (md 0.3 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

A main limitation in studying protein complex evolution has been the difficulty of measuring protein complex states in a high-throughput manner. Here we address this by using data-independent acquisition and analysis of mass spectrometry data, particularly by the sequential windowed acquisition of all theoretical mass spectra (SWATH-MS; ref 6). We study complex states at the proteome level in human, mouse, Drosophila melanogaster and Saccharomyces cerevisiae (refs 7–9). We demonstrate that protein co-expression profiles determined by SWATH-MS are substantially conserved across species and that they can be used to map the diversification of protein paralogues, as measured for instance by divergence in sequence or interactors. Critically, we find that protein paralogues that are known to be part of a protein complex are enriched in homomer-derived paralogues that separated in their expression levels. These are indeed the proteins shown to be affected by interference, and our finding therefore supports the notion that this mechanism can propel protein complex diversification and expression divergence. In conclusion, our study indicates that quantitative proteomic data can be used to infer protein complex relationships and identifies paralogue interference as a constraint of their evolution.

Results

SWATH-MS co-expression profiles recover protein complexes and are conserved across species

To analyze the role of paralogue diversification in the evolution of protein complexes, we first set out to define a suitable methodological framework. In recent years, a number of studies have taken advantage of the rapidly accumulating body of data on protein-protein interactions and protein expression to analyze constraints on, and evolution of, protein complexes (refs 1–4,10–13). Some of these studies have shown that the expression levels of protein complex subunits generally covary, and that, conversely, co-expression patterns can be used as a proxy for functional, interaction and complex relatedness. Here we used abundance profiles derived from SWATH-MS data across four species: human, mouse, Drosophila melanogaster and Saccharomyces cerevisiae (refs 7–9; Figure 1B). The proteome dataset of each species contains 40 to 112 samples. Overall, between 1610 and 3171 proteins were consistently quantified per species dataset across samples (Table S1 and Figure S1). To examine covariance of protein pairs, we used Spearman’s rank correlation coefficient (rs). First, we aimed at showing that covariation patterns in our data can indeed preferentially recall known protein complexes and detail their conservation across species. We used manually curated catalogues of protein complexes as a benchmark, specifically CORUM (refs 14,15) for human and mouse complexes, DroID (ref 16) for Drosophila melanogaster complexes and CYC2008 (ref 17) for Saccharomyces cerevisiae complexes. The results showed that in the datasets of all four species, members of the same complex have significantly higher rs than pairs not annotated as members of the same complex (Figure S2, Wilcoxon signed-rank test p-values < 0.001, number of pairs human = 17453, mouse = 1739, Drosophila melanogaster = 2261, Saccharomyces cerevisiae = 1807). To further characterize conserved covariance profiles, we performed functional enrichment analysis with DAVID (refs 18,19).
We found that protein pairs with highly covarying abundance profiles in human and mouse (rs > 0.8) are enriched in mitochondrial processes. Interestingly, when we relaxed the cutoff to rs > 0.6 we noted that protein pairs functionally annotated with “splicing” and “signaling pathways regulating metabolism”, “proliferation”, “cell-cell adhesion” and “immune responses” were additionally enriched (Table S2, EASE score, a modified Fisher exact test p-value < 0.05). Figure 1C illustrates this point by showing that the correlation network with edges rs > 0.8 recovers the ATP synthase complex and the NADH dehydrogenase complex, whereas the correlation network with edges rs > 0.6 additionally recovers complexes in the tricarboxylic acid cycle and the spliceosome. Next, we asked whether protein pairs with highly correlated abundance in one species are also highly correlated in another species. For this, we selected the most and least correlating orthologues in one species (top and bottom 0.5 percentile) and tested whether these most correlating pairs also showed higher covariance than the least correlating pairs in a second species. We found that this was the case for all species combinations, indicating that covariance profiles are conserved across species (Figure 1D and Figure S3, Wilcoxon signed-rank test p-values < 0.001, number of mouse orthologue pairs that are highly correlated in human/ number of human orthologue pairs that are highly correlated in mouse = 1429). Consequently, proteins that belong to a protein complex in one species also correlate higher in a second species (Figure 1D and Figure S3, Wilcoxon signed-rank test p-values < 0.001, number of pairs in mouse that are annotated as CORUM complex pairs in human = 1807, number of pairs in human that are annotated as CORUM complex pairs in mouse = 331). 
Taken together, our analyses indicate that protein complex members exhibit coordinated expression, and that such coordinated expression is conserved across species.

SWATH-MS correlation profiles reflect evolutionary trajectories of paralogues

We next asked whether our framework is able to capture important principles driving the evolution of protein complexes. To this end, we focused on protein paralogues, because paralogue diversification has been proposed as a significant factor of complex evolution (refs 4,5). First, we wanted to verify that our protein abundance matrices recapitulated paralogue divergence over time, as well as diversification of protein interactions. We identified paralogues using Ensembl 92 (ref 27) and classified them into paralogue families, whereby a family was defined as the set of genes that emerged from a single ancestral gene by duplication (Figure S4 and Table S3, number of paralogue families human = 73, mouse = 114, Drosophila melanogaster = 3, Saccharomyces cerevisiae = 9, mean size of paralogue families human = 9, mouse = 9, Drosophila melanogaster = 11, Saccharomyces cerevisiae = 9). On a general scale, we found that paralogous proteins exhibit a stronger degree of covariance than non-paralogous proteins in all species examined (Figure 2A, Wilcoxon signed-rank test p-values < 0.001, number of paralogous pairs human = 1302, mouse = 1964, Drosophila melanogaster = 119 and Saccharomyces cerevisiae = 191). To assess whether paralogue covariance patterns recapitulated sequence diversification, we tested whether a higher frequency of differentiating mutations between paralogous pairs corresponded to a decrease in protein covariance. To do so, we quantified all pairwise correlations among paralogue family members and determined for each pair the rate of synonymous and non-synonymous sequence changes, i.e. the number of nucleotide changes between the two paralogues that leave the encoded amino acid unchanged or alter it, respectively, relative to the paralogue length.
As expected, the co-expression of paralogous pairs within a paralogue family was negatively associated with the rate of synonymous and non-synonymous nucleotide changes (Figure 2B left and center, respectively; due to limited numbers of observations we could not examine the Drosophila melanogaster and Saccharomyces cerevisiae datasets in this and subsequent analyses). Furthermore, the association of covariance with non-synonymous changes was stronger than with synonymous changes, in line with the intuition that non-synonymous mutations generally have a greater phenotypic effect (paired sample t-test p-values non-synonymous changes: human = 0.005 and mouse = 0.009, synonymous changes: human = 0.04 and mouse = 0.05; number of paralogue groups: human = 23 and mouse = 64). Finally, we reasoned that, if covariance is a good proxy for sequence diversification, paralogues with more strongly diverging interactomes – i.e. a smaller fraction of shared interactors – should also exhibit more strongly diverging expression patterns, as interactome rewiring is likely a consequence of sequence change. To test whether paralogue covariance correlates with the diversity of protein interactions, we calculated for each paralogous pair the Jaccard index of interaction partners, defined as the intersection of the interaction partners divided by the union of the interaction partners of each pair. By these means, we found that lower covariance within a paralogue family was associated with more strongly diverging interactomes (Figure 2B right, paired sample t-test p-values human = 0.008 and mouse = 0.09; number of paralogue groups in human = 35 and mouse = 7). Taken together, the data show that protein covariance recapitulates paralogue divergence over time as well as the diversification of protein interactions that drives the evolution of new protein complexes.

Figure 2: Co-expression profiles of paralogues are conserved across species and reflect sequence divergence within the paralogue family. Negatively correlating homomer-derived paralogues are functionally divergent and prominent in complexes.

(A) Spearman correlation of paralogous pairs in yeast, fly, mouse and human compared to all measured non-paralogous pairs (number of paralogous pairs human = 1302, mouse = 1964, fly = 119 and yeast = 191). Paralogues correlate significantly higher in all species (Wilcoxon signed-rank test p-values < 0.001, md 0.12 vs 0.03, md 0.25 vs 0.01, md 0.34 vs 0.08 and md 0.17 vs 0.04, respectively).

(B) Left: Spearman correlation coefficients of paralogous pairs within the top, respectively bottom third of non-synonymous sequence changes within their paralogue family. Families that have a range in Spearman correlation of < 0.4 and a range in non-synonymous changes of < 0.2 were excluded. Paralogous pairs with fewer non-synonymous changes tend to correlate higher (Paired sample t-test p-values human = 0.005 and mouse = 0.009; number of paralogue families in human = 23 and mouse = 64).

Middle: Spearman correlation coefficients of paralogous pairs within the top, respectively bottom third of synonymous sequence changes within their paralogue family. Families that have a range in Spearman correlation of < 0.4 and a range in synonymous changes of < 0.2 were excluded. Paralogous pairs with fewer synonymous changes tend to correlate higher (Paired sample t-test p-values human = 0.04 and mouse = 0.05, number of paralogue families in human = 28 and mouse = 78).

Right: Spearman correlation coefficients of paralogous pairs within the top, respectively bottom third of interactome divergence within their paralogue family. To quantify interactome divergence, the Jaccard index, defined as the intersection of the interaction partners divided by the union of the interaction partners of each pair was calculated. All interaction partners from Biogrid were considered. Families that have a range of the Jaccard index of < 0.2 and range in Spearman correlation of < 0.2 were excluded. Paralogous pairs with more similar interaction partners tend to correlate higher (Paired sample t-test p-values human = 0.008 and mouse = 0.09; number of paralogue families in human = 35 and mouse = 7).

(C) The overlap of the four most expressed tissues between paralogous pairs (as defined by HPM) is compared between strongly negatively correlating homomer-derived and monomer-derived paralogous pairs (rs < -0.5, human npairs = 21, mouse npairs = 36). Negatively correlating homomer-derived paralogues tend to be expressed more diversely across different tissues compared to monomer-derived paralogues.

(D) Left: Spearman correlation of homomer-derived paralogues that are annotated as CORUM complex pairs (npairs = 42) and for pairs that are not annotated in CORUM (npairs = 474). Homomer-derived paralogues are enriched in the bottom and top dodecile of the correlation distribution (Fisher’s exact test p-values 0.06 and 0.1).

Right: Spearman correlation of monomer-derived paralogues that are annotated as CORUM complex pairs (npairs = 51) and for pairs that are not annotated in CORUM (npairs = 569). Monomer-derived paralogues are enriched not in the bottom but in the top dodecile of the correlation distribution (Fisher’s exact test p-values 0.5 and 0.06).

*** indicates p-values ≤ 0.001, ** ≤ 0.01, * ≤ 0.05, + ≤ 0.1.

Negatively correlating SWATH-MS profiles from homomer-derived paralogues are functionally divergent and prominent in complex members

Since correlation of quantitative proteomics data can inform us about the divergence of evolutionary trajectories, we used it to assess the notion of paralogue interference (Figure 2C), first on a proteome-wide level and then focusing specifically on known protein complexes. We first reasoned that, if divergence in protein abundance can serve as a mechanism to escape interference, then homomeric paralogues should be differentially abundant to a greater extent than monomer-derived paralogues. To test this, we retrieved homo-, hetero- and monomer annotations from InterEvol (ref 32) and classified paralogous pairs either as homomer-derived if at least one member was annotated as homomer, or as monomer-derived when none of the members was annotated as either homo- or heteromer. In line with our expectations, we found that among negatively correlating pairs, homomer-derived paralogues showed an enrichment factor of 6.9 in human and 1.8 in mouse over monomer-derived paralogues (Figure S5; negative correlation rs < -0.7; number of human homomeric pairs = 540, human monomeric pairs = 620, mouse homomeric pairs = 615, mouse monomeric pairs = 1121, Fisher’s exact test p-values = 0.04 and 0.2, respectively). Of note, we also found that among all negatively correlating paralogous pairs, homomer-derived pairs were more likely to differ in abundance across tissues than monomer-derived paralogues, as defined by the tissue-specific expression analysis of the Human Proteome Map (HPM; ref 20) (Figure 2D and Figure S6, number of pairs human = 21 and mouse = 36). This indicates that homomeric paralogues are more prone to be affected by spatial separation of expression, and gives credence to the notion that this separation has evolved in response to protein interference.

Finally, we asked what impact the mechanism of paralogue interference has had on the diversification of protein complexes. If interference from homomeric paralogues has played any role in the organization of complexome diversity, then there must be a subset of complexes whose homomer-derived paralogue members exhibit highly divergent expression patterns; and, as a corollary, such divergence should not be observed in the case of monomer-derived paralogues. We therefore plotted the distribution of correlations of protein abundance for all paralogues present in CORUM complexes across the two classes listed above. Strikingly, we found that protein complexes containing homomer-derived paralogues contained two discrete subsets of highly positively and highly negatively correlating paralogue members (Figure 2E and Figure S7). In support of the homomer-derived paralogue specificity of such a pattern, we found no evidence for a similar distribution for the monomer-derived paralogues. Furthermore, by calculating the covariance correlation for all CORUM complex members, irrespective of them being paralogous or non-paralogous proteins, we found that homomer-derived paralogues are strongly enriched among the negatively correlating complex pairs (Figure S8; negative correlation rs < -0.7; number of human homomeric CORUM pairs = 5645, human CORUM pairs = 11820, mouse homomeric CORUM pairs = 457, mouse CORUM pairs = 1572, Fisher’s exact test p-values < 0.001 and 0.05, respectively). This leads us to suggest a classification of protein complexes containing homomer-derived paralogues in two distinct groups: those where paralogues are not interfering with each other’s function and therefore the need to minimize their spatiotemporal co-existence is alleviated; and those that have diverged in their abundance levels under the pressure of negative interference. 
To further corroborate our findings and conceptualization, we manually curated the whole set of protein complexes in the ‘escaped’, i.e. negatively correlating class. Consistent with our proposal, we found for 84% of the human and 67% of the mouse complexes in this category, respectively, literature evidence supporting mutual exclusivity/complementarity (number of complexes with negatively correlating pairs human = 13 and mouse = 6; for the complete list of paralogue correlations see Table S4 and Figure S9-10). In the human dataset, for example, negatively correlating, homomer-derived CORUM paralogous pairs included two subcomplexes of the emerin complex, one with lamin A and the other with lamin B1. Lamin A has been shown to regulate nuclear mechanisms and is also associated with several diseases, including Emery–Dreifuss muscular dystrophy. In contrast, Lamin B1 is involved in intermediate filaments from the cytoskeleton, but not in nuclear mechanisms (ref 21). Another example was the homomer-derived paralogues Hspa5 and Hspa8, which are part of the HCF-1 complex involved in cell cycle and transcriptional regulation (ref 21). Whereas Hspa5 localizes in the ER lumen, Hspa8 resides in the nucleolus and the cell membrane (ref 21). In both the human and the mouse dataset, the paralogues were among the most negatively correlated pairs. Further examples from the mouse dataset include negatively correlating homomeric paralogous pairs of the ubiquitin E3 ligase complex, namely Cul1 and Cul2, as well as Cul2 and Cul3. This is consistent with studies that established that the paralogues Cul1, Cul2 and Cul3 are involved in three distinct subcomplexes (ref 22). Additionally, the mouse dataset showed two negatively correlating members of the ubiquitin-proteasome complex, UBQLN1 and UBQLN2. Only UBQLN2 was shown to be able to translocate to the nucleus, and it has been shown that after heat stress, the two proteins are in distinct subcellular locations (ref 23).
Taken together, our data indicate that the class of proteins postulated to be more prone to protein interference, that is, homomer-derived paralogues, and especially those that are part of protein complexes, exhibit a stronger divergence in protein abundance than other paralogues, as well as subcellular specialization. By this means, our correlation studies pinpoint interference escape as an important mechanism of protein complex evolution.

Discussion

In this study we show that rigorous statistical analysis of sets of protein abundance maps across species and tissues can inform us about the evolution of protein complexes. We used this framework to address the role of paralogue interference and diversification in protein complex evolution and specialization. Besides demonstrating that protein complex members tend to display highly correlated expression profiles, and that these profiles are conserved across species, we also indicate that protein abundance matrices recapitulate paralogue divergence over time, as well as diversification of protein interactions. Our study culminates with the observation that homomeric paralogues that are part of protein complexes show highly divergent expression patterns. This supports the notion that this is a mechanism by which protein interference among homomeric paralogues is avoided and complexes are diversified.

While many general aspects of protein complex architectural principles and evolution have been addressed in previous studies, to the best of our knowledge this is the first time that the escape from paralogue interference, specifically by separation of expression, has been analyzed on the proteome level and across species. Typically, positive covariance of expression has been used to study conservation of protein complexes. Here we show that, since divergence of co-expression patterns seems to scale to some extent with functional and structural diversification, negative correlation may pinpoint specific evolutionary constraints in maintaining separation of homomer-derived paralogues. In fact, our data suggest that anti-correlating complex pairs are enriched in homomer-derived paralogues (Figure 2E). This is in agreement with the suggestion that homomer-derived paralogues are the most likely to be affected by protein-protein interference, which can be resolved by separating expression. We find that only a fraction of homomer-derived paralogues exhibit such anticorrelation, while others strongly correlate. We therefore suggest that complex paralogues can be divided into two main groups: first, highly correlating, co-evolved subunits; and second, negatively correlating paralogues which, by being expressed in complementary fashion, minimize the risk of interference. We indicate several examples of paralogues belonging to the latter class, for example lamin A and B1, which could represent suitable targets for follow-up studies.

At present, the scope and generalizability of our conclusions are limited by several factors. First, protein complex formation and evolution are likely to have many determinants, which can mask the effect of paralogue interference or compensate for it in ways other than expression divergence. Second, our analyses rest on resources curating protein complexes and distinguishing homomer- from monomer-derived paralogues. Broadening and improving such curation and annotation will allow more extensive and statistically more robust analyses and conclusions. This has a clear impact on comparative studies, where the extent of annotation varies greatly between species. Finally, the correlational nature of this, and many other methodologically related studies, must be stressed. We show that proteome-wide, cross-tissue and cross-species analyses are capable of capturing patterns that would otherwise be indiscernible. In this respect, we identified specific trends behind complex evolution which give support to specific proposals, such as complex formation driven by homomer-derived paralogues and interference escape. However, targeted studies should decode the mechanisms underlying these trends. Such investigations, together with correlation analyses covering larger sets of conditions, are what in our view hold the greatest promise to refine our understanding of the forces shaping protein complex evolution.

Material and methods

SWATH-MS datasets and co-expression measures

We obtained all protein abundance data from publicly available SWATH-MS datasets. For human we used the data of Guo et al. 2019 (ref 7), for mouse the data of Williams et al. 2018 (ref 9), for fly the data of Okada et al. 2016 (ref 8) and for yeast the SWATH-MS dataset of the yeast strains described in Zhu et al., 2008 (ref 24) and Brem et al., 2002 (ref 25) (manuscript in preparation). For further description of the datasets see Table S1. All analyses were based on the available protein matrix with relative protein intensities.

For each protein pair, we calculated the Spearman correlation of raw protein abundances across all samples. We used the cor function of the R package stats v3.4.4 (ref 26) with the option pairwise.complete.obs, so that the correlation between each pair of proteins is computed from all samples in which both proteins have an observed value.
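The pairwise-complete Spearman correlation described above can be sketched in plain Python. This is only an illustration of the idea, not the R code used in the study: the function names, the use of None for missing values, the min_obs cutoff and the toy abundance vectors are all assumptions made for the example.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; None if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def spearman_pairwise_complete(x, y, min_obs=3):
    """Spearman rs using only samples where both proteins were quantified.

    min_obs is a hypothetical safeguard, not a parameter from the paper.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < min_obs:
        return None
    xs, ys = zip(*pairs)
    return pearson(average_ranks(list(xs)), average_ranks(list(ys)))

# Toy abundance profiles across five samples (None = not quantified).
protein_a = [1.0, 2.0, None, 4.0, 5.0]
protein_b = [2.0, 4.0, 6.0, None, 10.0]
protein_c = [5.0, 4.0, 3.0, 2.0, 1.0]
rs_ab = spearman_pairwise_complete(protein_a, protein_b)  # 3 shared samples
rs_ac = spearman_pairwise_complete(protein_a, protein_c)  # 4 shared samples
```

Because each pair is restricted to its own set of shared samples, different protein pairs can be correlated over different numbers of observations.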

Orthologue identification

Orthologue mapping was conducted with BioMart Ensembl 92 (ref 27; release April 2018). We considered only genes with a “one2one” mapping, i.e. when the gene in one species has only one defined orthologue in another species. To translate protein and gene identifiers, we used the R package biomaRt v2.34.2 (refs 28,29).
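As a rough illustration of the one-to-one restriction, the sketch below filters a list of orthologue pairs by counting how often each gene occurs on either side. It does not query BioMart or rely on Ensembl's own homology labels; the function name and the gene identifiers are hypothetical.

```python
from collections import Counter

def one_to_one_orthologues(pairs):
    """Keep pairs whose genes each occur in exactly one orthologue pair.

    pairs: iterable of (gene_in_species_1, gene_in_species_2) tuples.
    Returns a dict mapping species-1 genes to their single orthologue.
    """
    unique_pairs = set(pairs)  # drop exact duplicate rows
    left_counts = Counter(a for a, _ in unique_pairs)
    right_counts = Counter(b for _, b in unique_pairs)
    return {
        a: b
        for a, b in unique_pairs
        if left_counts[a] == 1 and right_counts[b] == 1
    }

# Toy mapping with made-up identifiers.
toy_pairs = [
    ("hsA", "mmA"),   # one-to-one: kept
    ("hsB", "mmB1"),  # one-to-many: dropped
    ("hsB", "mmB2"),
    ("hsC", "mmD"),   # many-to-one: dropped
    ("hsE", "mmD"),
]
mapping = one_to_one_orthologues(toy_pairs)
```

Only pairs that survive this filter can be compared directly between two species' correlation matrices.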

Paralogue identification and analysis

We identified paralogues and paralogue families with Ensembl 92 (release April 2018; ref 27). We defined a paralogue family as all paralogues connected via direct pairs. We determined the rate of non-synonymous changes and synonymous changes between the paralogous pairs using Ensembl 92 (ref 27). We next assessed the similarity of interaction partners among paralogous pairs. We obtained the interaction partners of each paralogue from BioGRID v3.5 (refs 30,31) and calculated the Jaccard similarity index, that is, the size of the intersection divided by the size of the union of the distinct interaction partners of the two paralogues.
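The two definitions in this paragraph can be sketched as follows. Reading "connected via direct pairs" as connected components of the paralogue-pair graph is our interpretation, and the function names, the handling of pairs without any interactors, and the toy identifiers are assumptions for illustration.

```python
def paralogue_families(pairs):
    """Group genes into families: connected components of the pair graph."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())

def jaccard_index(partners_a, partners_b):
    """Shared interactors divided by all distinct interactors of the pair."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        return None  # undefined when neither paralogue has known partners
    return len(a & b) / len(union)

# Toy paralogue pairs and interaction partners (made-up identifiers).
toy_pairs = [("P1", "P2"), ("P2", "P3"), ("Q1", "Q2")]
families = paralogue_families(toy_pairs)
similarity = jaccard_index({"X", "Y", "Z"}, {"Y", "Z", "W"})
```

In the toy data, P1 and P3 end up in the same family although they are not a direct pair, because both are paired with P2. A Jaccard index of 1 means identical interaction partners and 0 means none are shared.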

To determine whether a paralogous pair is derived from an ancestral protein that either formed homomers or was only present as a monomer, we used the InterEvol database (release 2010), designed for the analysis of co-evolution events at the structural interfaces of hetero- and homo-oligomers (ref. 32). We considered a paralogous pair as homomer-derived if at least one member was annotated as homomer, and a pair as monomer-derived when none of the members was annotated as either homo- or heteromer. For manual paralogue annotation, we additionally considered the UniProt database (release November 2018; ref. 21).
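
The classification rule above can be written as a small decision function. The Python sketch below is illustrative only: the label strings, the function name and the explicit "unclassified" outcome for pairs that carry heteromer annotations but no homomer annotation are assumptions made to spell out the rule, not terms taken from InterEvol.

```python
def classify_paralogous_pair(annotations_a, annotations_b):
    """Classify a pair from the oligomer annotations of its two members.

    Each argument is a set drawn from {"homomer", "heteromer"};
    an empty set means the protein has neither annotation.
    """
    if "homomer" in annotations_a or "homomer" in annotations_b:
        return "homomer-derived"
    if not annotations_a and not annotations_b:
        return "monomer-derived"
    # Heteromer annotation only: covered by neither definition in the text.
    return "unclassified"


labels = [
    classify_paralogous_pair({"homomer"}, set()),
    classify_paralogous_pair(set(), set()),
    classify_paralogous_pair({"heteromer"}, set()),
]
```

Written this way, the rule makes explicit that the two categories are not exhaustive: a pair with only heteromer annotations falls into neither group.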

Functional annotation

For functional enrichment analysis we used DAVID v6.8 (ref. 18) with the following parameters: Annotation categories: GOTERM_BP_DIRECT; GO_Kappa similarity: Similarity term overlap = 3, similarity threshold = 0.5; Classification: Initial group membership = 2, final group membership = 2, multiple linkage threshold = 0.5; Enrichment thresholds: EASE = 0.05. We retrieved lists of protein complexes from CORUM v2.0 (refs. 14, 15). Tissue-specific expression data were retrieved from the Human Proteome Map (HPM) portal (ref. 20). To compare expression between tissues, the data were normalized using the normalizeBetweenArrays function from the R package limma v. 3.40.6 (ref. 33).

Statistics and visualization

We conducted all statistical analyses with R v3.4.4 (ref. 26). For the Fisher’s exact tests and the Wilcoxon tests we used a one-sided alternative hypothesis where applicable. We drew density graphs, bar- and boxplots with the R package ggplot2 v3.1.0 (ref. 34). Boxplots were drawn with default settings (lower and upper hinges correspond to the first and third quartiles; whiskers extend up to 1.5 × the inter-quartile range, i.e. the distance between the first and third quartiles). In addition, we used Van de Peer’s webtool to draw the Venn diagrams (ref. 35). For network representations, we used Cytoscape v3.6.1 (ref. 36).

Author contributions

L.S. conceived the study, performed the analysis and wrote the manuscript. R.C. helped with manuscript writing. R.A. and J.L.P. supervised the project and provided feedback on the manuscript. A.B.-E. helped with project supervision.

Declaration of interests

The authors declare no competing interests.

Acknowledgments

We thank Marija Buljan for help with the project supervision. This project was supported by the Swiss National Science Foundation through grant # SNSF 31003A_166435 to R.A. and by the European Research Council (ERC) through grant ERC-20140AdG 670821 to R.A. R.C. was supported in part by the IMI project ULTRA-DD (FP07/2007-2013, grant no. 115766). J.L.P. acknowledges support from Swiss National Science Foundation Grant PP00P3_170604.

References

  1. Romanov, N. et al. Disentangling Genetic and Environmental Effects on the Proteotypes of Individuals. Cell 177, 1308-1318.e10 (2019).
  2. Ahnert, S. E., Marsh, J. A., Hernandez, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science 350, 2245–2245 (2015).
  3. Marsh, J. A. & Teichmann, S. A. Structure, Dynamics, Assembly, and Evolution of Protein Complexes. Annu. Rev. Biochem. 84, 551–575 (2015).
  4. Pereira-Leal, J. B., Levy, E. D., Kamp, C. & Teichmann, S. A. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 8 (2007).
  5. Kaltenegger, E. & Ober, D. Paralogue Interference Affects the Dynamics after Gene Duplication. Trends Plant Sci. 20, 814–821 (2015).
  6. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: A new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, 1–17 (2012).
  7. Guo, T. et al. Quantitative Proteome Landscape of the NCI-60 Cancer Cell Lines. iScience 21, 664–680 (2019).
  8. Okada, H., Ebhardt, H. A., Vonesch, S. C., Aebersold, R. & Hafen, E. Proteome-wide association studies identify biochemical modules associated with a wing-size phenotype in Drosophila melanogaster. Nat. Commun. 7, 1–11 (2016).
  9. Williams, E. G. et al. Quantifying and localizing the mitochondrial proteome across five tissues in a mouse population. Mol. Cell. Proteomics 17, 1766–1777 (2018).
  10. Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).
  11. Skinnider, M. A. et al. An Atlas of Protein-Protein Interactions Across Mammalian Tissues. SSRN Electron. J. (2018). doi: 10.2139/ssrn.3219264
  12. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, 1–22 (2019).
  13. Snider, J. et al. Fundamentals of protein interaction network mapping. Mol. Syst. Biol. 11, 848 (2015).
  14. Giurgiu, M. et al. CORUM: The comprehensive resource of mammalian protein complexes - 2019. Nucleic Acids Res. 47, D559–D563 (2019).
  15. Ruepp, A. et al. CORUM: The comprehensive resource of mammalian protein complexes - 2009. Nucleic Acids Res. 38, 497–501 (2009).
  16. Murali, T. et al. DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 39, D736–43 (2011).
  17. Pu, S., Wong, J., Turner, B., Cho, E. & Wodak, S. J. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 37, 825–831 (2009).
  18. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
  19. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).
  20. Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).
  21. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699–2699 (2018).
  22. Bosu, D. R. & Kipreos, E. T. Cullin-RING ubiquitin ligases: Global regulation and activation cycles. Cell Division 3, 7 (2008).
  23. Hjerpe, R. et al. UBQLN2 Mediates Autophagy-Independent Protein Aggregate Clearance by the Proteasome. Cell 166, 935–949 (2016).
  24. Zhu, J. et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat. Genet. 40, 854–861 (2008).
  25. Brem, R. B., Yvert, G., Clinton, R. & Kruglyak, L. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002).
  26. R Core Team. R: A Language and Environment for Statistical Computing (2019). Available at: https://www.r-project.org/.
  27. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46 (2018).
  28. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).
  29. Durinck, S. et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).
  30. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535–9 (2006).
  31. Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).
  32. Faure, G., Andreani, J. & Guerois, R. InterEvol database: Exploring the structure and evolution of protein complex interfaces. Nucleic Acids Res. 40, 847–856 (2012).
  33. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
  34. Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2016).
  35. Van de Peer, Y. Calculate and draw custom Venn diagrams. (2018).
  36. Shannon, P. et al. Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Posted September 08, 2020.

Subject Area

  • Evolutionary Biology