Conditional interactions in literature-curated protein interaction databases

Databases of literature-curated protein-protein interactions (PPIs) are often used to interpret high-throughput interactome mapping studies and estimate error rates. These databases combine interactions across thousands of published studies and experimental techniques. Because the tendency for two proteins to interact depends on the local conditions, this heterogeneity of conditions means that only a subset of database PPIs are interacting during any given experiment. A typical use of these databases as gold standards in interactome mapping projects, however, assumes that PPIs included in the database are indeed interacting under the experimental conditions of the study. Using raw data from 20 co-fractionation experiments and six published interactomes, we demonstrate that this assumption is often false, with up to 55% of purported gold standard interactions showing no evidence of interaction, on average. We identify a subset of CORUM database complexes that do show consistent evidence of interaction in co-fractionation studies, and we use this subset as gold standards to dramatically improve interactome mapping as judged by the number of predicted interactions at a given error rate. We recommend using this CORUM subset as the gold standard set in future co-fractionation studies. More generally, we recommend using the subset of literature-curated PPIs that are specific to experimental conditions whenever possible.

associates proteins whose amounts are correlated between fractions. As each fraction is analyzed with mass spectrometry, PCP-SILAC and other CF techniques can detect tens of thousands of interacting proteins (2-8). In order to separate signal from noise, it is common for high-throughput protein interactome studies to consult databases of known, unequivocal interactions ("gold standards") (2,3,9,10). For example, co-fractionation studies often use gold standard interactions as training labels in a machine learning classifier (2,3,11). Gold standards are also used to define false positive/negative and true positive/negative interactions in order to calculate common statistics such as precision, recall, and sensitivity (5-7,11,12).
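As an illustration of how a gold standard turns a predicted interactome into precision and recall, here is a minimal Python sketch. It is not the pipeline used in the cited studies. The function names and the scoring convention are assumptions: a predicted pair counts as a true positive if both proteins share a gold standard complex, as a false positive if both proteins appear in the gold standard but never in the same complex, and is left unscored otherwise.

```python
from itertools import combinations


def gold_standard_pairs(complexes):
    """Return every unordered within-complex protein pair as a sorted tuple."""
    pairs = set()
    for members in complexes:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def precision_recall(predicted, complexes):
    """Score predicted pairs against gold standard complexes (hypothetical helper).

    Assumption: only predictions whose two proteins both occur somewhere in
    the gold standard can be labeled true or false positive.
    """
    gold = gold_standard_pairs(complexes)
    gold_proteins = {protein for pair in gold for protein in pair}
    scored = {
        tuple(sorted(pair))
        for pair in predicted
        if pair[0] in gold_proteins and pair[1] in gold_proteins
    }
    tp = len(scored & gold)   # predicted and in the gold standard
    fp = len(scored - gold)   # predicted, scorable, but not in the gold standard
    fn = len(gold - scored)   # gold standard pairs that were not predicted
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

Under this convention, a gold standard pair that is not interacting under the experimental conditions can only ever be counted as a false negative, so it lowers the apparent recall without reflecting a real detection failure.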

Gold standard databases are assembled from different experiments and techniques, each with a unique set of biases. Since protein-protein interactions (PPIs) can be conditional and transient, single datasets, which are typically generated by a single technique, can disagree with gold standards. This variability partly reflects true biological differences. For example, the majority of in vivo yeast PPIs were observed to depend on environmental and chemical conditions (13). Some assays also impose technical biases that limit detectable PPIs, such as a bias of some high-throughput techniques toward highly expressed or well studied protein pairs, or a bias against PPIs involving transmembrane proteins (12). Therefore, gold standard databases that include all interactions that can occur will fail to describe the subset of interactions that are either not occurring due to current experimental conditions, or that are unlikely to be detected due to technical limitations.

Therefore, a distinction should be made between the large, curated compilations of interactions across many studies, and the gold standard sets used as a reference for a single dataset. Our own focus has been on interactome mapping using co-fractionation, so here we quantify the proportion of gold standard interactions that fail to display any evidence for interaction in 20 co-fractionation datasets. Using a conservative measure of protein interaction, we find that between 19 and 55% of gold standard PPIs display no evidence of interacting. Across co-fractionation experiments, there is evidence that a subset of literature-curated complexes consistently co-fractionates, suggesting this subset would be a more appropriate gold standard reference set. Indeed, the number of predicted interactions at a given stringency increases dramatically when using this subset as a gold standard set. We recommend using this subset as the gold standard reference in future co-fractionation studies and, more generally, using experiment- and condition-specific gold standards whenever possible.
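To make "the number of predicted interactions at a given stringency" concrete, the sketch below counts how many top-ranked candidate pairs can be accepted while the cumulative precision against a gold standard stays at or above a threshold. This is an illustrative assumption about how such a count could be computed, not the authors' implementation. The function name, the score convention (higher means more confident), and the representation of gold standard positives and negatives as sets of `frozenset` pairs are all hypothetical.

```python
def interactions_at_precision(scored_pairs, gold_positive, gold_negative, min_precision):
    """Return the largest number of top-ranked pairs whose cumulative
    precision is still >= min_precision (hypothetical helper).

    scored_pairs:  list of ((protein_a, protein_b), score) tuples
    gold_positive: set of frozensets, pairs labeled as interacting
    gold_negative: set of frozensets, pairs labeled as non-interacting
    Pairs in neither set are unlabeled and do not change the precision.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    tp = fp = 0
    best = 0
    for rank, (pair, _score) in enumerate(ranked, start=1):
        key = frozenset(pair)
        if key in gold_positive:
            tp += 1
        elif key in gold_negative:
            fp += 1
        # Record the deepest rank at which the precision criterion holds.
        if tp + fp > 0 and tp / (tp + fp) >= min_precision:
            best = rank
    return best
```

With the same scores and the same threshold, swapping in a gold standard whose positives actually co-fractionate changes which pairs count toward `tp`, and therefore how deep into the ranked list the precision criterion is still met.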

Using the CORUM database of protein complexes (14), we first examined the degree to which literature-curated PPIs were unsupported by data from single co-fractionation datasets. Many database PPIs show clear evidence of interaction, as shown by their tendency to co-fractionate for the entire chromatogram (Fig 1A) or a portion of the chromatogram (Fig 1B). However, other protein pairs from within a single CORUM complex show little evidence of interaction in certain experiments.

For example, two chaperone proteins, HSP-90a (UniProt ID P07900) and BiP (P11021), are known to interact as part of a larger chaperone multiprotein complex (15) (CORUM complex "HCF-1"), yet there is little evidence that the two proteins co-fractionate in our data (Fig 1C).

More broadly, across 20 PCP-SILAC co-fractionation datasets, the majority of random protein pairs do not co-fractionate, as quantified by anti-correlated fractionation profiles, a conservative measure of which protein pairs are non-interacting (red, Fig 1D). While the majority of gold standard protein pairs

Euclidean distance between every gold standard interaction in our co-fractionation data (20 datasets, grey; average, black). All other protein pairs in our data are shown, the vast majority of which are not interacting (red). Example pairs A, B, C are shown (arrows).

While the full set of CORUM complexes is a widely used gold standard (6,7,9-11), there are many other literature-curated interaction databases. In addition to CORUM, we examined nine databases of protein interactions (16-24) and two subsets of CORUM used previously as gold standards (2,3).

These range from databases that include interactions from high-throughput experiments to manually curated databases composed exclusively of low-throughput experiments. All had anti-correlated protein pairs in our co-fractionation datasets (Fig 2). As a baseline, 62% of all protein pairs, the large majority of which can be assumed to be non-interacting, were anti-correlated (Fig 2, red). The proportion of anti-correlated pairs in gold standard sets ranged from 55% (HPRD) to 19% (CORUM).
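The proportion of anti-correlated pairs in a gold standard set can be sketched as follows. This is a minimal illustration, not the analysis code behind Fig 2. It assumes that "anti-correlated" means a negative Pearson correlation between two proteins' fractionation profiles, that profiles are stored in a dictionary keyed by protein ID, and that missing values are encoded as NaN. The function name and the minimum of three shared fractions are also assumptions.

```python
import numpy as np


def fraction_anticorrelated(profiles, pairs, min_fractions=3):
    """Fraction of scorable pairs with Pearson r < 0 (hypothetical helper).

    profiles: dict mapping protein ID -> sequence of abundances per fraction
    pairs:    iterable of (protein_a, protein_b)
    Pairs are skipped if either protein is absent, if fewer than
    `min_fractions` fractions are quantified for both, or if either
    profile is constant (correlation undefined).
    """
    n_scored = 0
    n_anti = 0
    for a, b in pairs:
        if a not in profiles or b not in profiles:
            continue
        x = np.asarray(profiles[a], dtype=float)
        y = np.asarray(profiles[b], dtype=float)
        shared = ~(np.isnan(x) | np.isnan(y))
        if shared.sum() < min_fractions:
            continue
        if np.std(x[shared]) == 0 or np.std(y[shared]) == 0:
            continue
        r = np.corrcoef(x[shared], y[shared])[0, 1]
        n_scored += 1
        n_anti += int(r < 0)
    return n_anti / n_scored if n_scored else float("nan")
```

Running the same function on the within-complex pairs of each database and on all detected protein pairs would give a per-database proportion and a baseline of the kind reported above, under the stated assumptions.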

Restricting gold standard PPIs to those supported by two or more publications limits but does not eliminate uncorrelated protein pairs (Supp. Fig 1). Therefore all interaction databases investigated here contain protein pairs that are not supported by our co-fractionation data, and comparisons to CORUM give a conservative estimate of the discrepancy between our data and interaction

Evaluating significance at four p-value thresholds (p < 1, 10⁻², 10⁻⁶, 10⁻¹⁰) produced four subsets of CORUM complexes that contain 302, 122, 95, and 80 complexes, respectively (Table 1). To avoid training and testing on the same data, we defined the co-fractionation-specific gold standard subsets using interactomes published by other groups (CF4, CF5, CF6), and these gold standard subsets were then used to predict interactomes using co-fractionation data generated by our group.

investigating their impact on recovering novel PPIs from high-throughput data (30). However, to our knowledge our study is the first to specifically address the conditional nature of PPI entries in these databases.

Since CORUM is manually curated from low-throughput experiments, we do not interpret these anti-correlated pairs as errors in the database. Rather, we attribute any discrepancy between our raw data and the databases to the conditional nature of protein interactions and the fact that databases compile evidence from many different experiments and conditions. Indeed, under certain conditions, 60S ribosomal proteins, which have been extensively studied and shown to interact, display poor evidence of interaction (Supp. Fig 1).

Therefore studies should take care not to conflate interaction databases, which attempt to list all

One solution is to use condition- or technique-specific gold standard subsets. We show that subsets of gold standard databases that have consistent, independent evidence taken from similar conditions to those under which the raw data was produced can increase the size of interactomes judged at the same precision level (Fig 4). We include this set of CORUM gold standard complexes and