Abstract
The expression and purification of integral membrane proteins remains a major bottleneck in the characterization of these important proteins. Expression levels are currently unpredictable, which renders the pursuit of these targets challenging and highly inefficient. Evidence demonstrates that small changes in the nucleotide or amino-acid sequence can dramatically affect membrane protein biogenesis; yet these observations have not resulted in generalizable approaches to improve expression. In this study, we develop a data-driven statistical model that predicts membrane protein expression in E. coli directly from sequence. The model, trained on experimental data, combines a set of sequence-derived variables to produce a score that predicts the likelihood of expression. We test the model against independent datasets from the literature spanning a variety of scales and experimental outcomes, demonstrating that the model significantly enriches for expressed proteins. The model is then used to score expression for membrane proteomes and protein families, highlighting areas where the model excels. Surprisingly, analysis of the underlying features reveals the importance of nucleotide sequence-derived parameters for expression. This computational model, as illustrated here, can immediately be used to identify favorable targets for characterization.
Introduction
The central role of integral membrane proteins motivates structural and biophysical studies that require large amounts of purified protein, often at considerable cost of both material and labor. Only a small percentage can be produced at high levels, resulting in membrane protein structural characterization lagging roughly 20 years behind that of soluble proteins1. To increase the pace of structure determination, the scientific community created large government-funded structural genomics consortia facilities, like the NIH-funded New York Consortium on Membrane Protein Structure (NYCOMPS)2. For this representative example, more than 8000 genes, chosen based on characteristics hypothetically related to success, yielded only 600 (7.1%) highly expressing proteins3, resulting to date in 34 (0.4%) unique structures (based on annotation in the RCSB PDB4). Despite considerable investment on many scales, the lack of expressed targets has hampered membrane protein structural studies5.
Tools for improving the number of expressed membrane proteins are needed. While significant work has shown promise on a case-by-case basis, e.g. growth at lower temperatures, codon optimization6, and regulating transcription7, a generalizable solution remains elusive. Currently, each target must be addressed individually, as the conditions that were successful for a previous target seldom carry over to other proteins, even amongst closely related homologs8,9. For individual cases, simple changes can have dramatic effects on the amount of expressed protein10,11. Considering the scientific value of membrane protein studies, it is surprising that no systematic method exists that provides solutions with broad applicability across protein families and genomes or that describes the variation in expression levels seen between closely related sequences.
While no approach exists to broadly decode sequence-level information, precluding its use for predicting membrane protein expression, the concept that sequence variation can measurably influence membrane protein biogenesis is commonplace. For example, positive charges on cytoplasmic loops are important determinants of membrane protein topology12,13; yet the introduction of mutations presumed to enhance certain properties, such as the positive inside rule, has not proven generalizable for improving expression9. The reasons for this likely lie in the complex underpinnings of membrane protein biogenesis, where the interplay between sequence parameters at the protein and nucleotide levels must be considered. Optimizing for a single sequence-level feature likely diminishes the beneficial effect of other features (e.g. increasing positive residues on internal loops might diminish favorable mRNA properties). Without accounting for the broad set of features related to membrane protein expression, it is impossible to predict differences in expression.
To connect sequence to prediction, we develop a statistical model that maps a set of sequences to experimental expression levels via calculated features, thereby simultaneously accounting for the many determinants of expression. The resulting model allows ranking of any arbitrary set of membrane protein sequences in order of their relative likelihood of successful expression. In doing so, we leverage the corpus of work showing that sequence-level characteristics are important determinants of protein biogenesis, e.g. RNA secondary structure14,15, transmembrane segment hydrophobicity16, the positive inside rule17, and loop disorder18. A total of 105 such sequence-derived parameters were calculated for individual proteins within datasets of interest (Table S1). In this first report of expression prediction, we train a linear equation that provides a score by summing weighted features, where the weights are derived from fitting to experimental expression data, a "training set." This model can be used broadly to score any membrane protein based on its calculated features. We validate the model against a variety of independent datasets, demonstrating its generalizability. To support further experimental efforts, we broadly score the membrane proteomes of a variety of important genomes and showcase the performance of the model across protein families. This approach and resulting model provide an exciting example of connecting sequence space to complex experimental outcomes.
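In practical terms, the trained model reduces to a weighted sum over sequence-derived features. A minimal sketch of this scoring and ranking step, with invented feature values and weights standing in for the 105 trained parameters:

```python
import numpy as np

def svmrank_score(features, weights):
    """Score one protein as the weighted sum of its sequence-derived
    features; higher scores predict a higher likelihood of expression."""
    return float(np.dot(features, weights))

def rank_candidates(feature_matrix, weights):
    """Order candidate proteins from highest to lowest score."""
    scores = feature_matrix @ weights
    order = np.argsort(scores)[::-1]
    return order, scores

# Toy example: 3 candidates x 4 hypothetical features (NOT trained values).
weights = np.array([0.8, -0.5, 0.1, 0.3])
X = np.array([[1.0, 0.2, 0.5, 0.1],
              [0.1, 0.9, 0.3, 0.0],
              [0.7, 0.1, 0.9, 0.4]])
order, scores = rank_candidates(X, weights)
```

Any set of sequences, once featurized, can be scored and ranked this way; the substance of the method lies in fitting the weights, described next.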
Results
For this study, we focus on heterologous expression in E. coli due to its ubiquitous use as a tool for membrane protein expression. While the benefits derived from low cost and low barriers to adoption are obvious, its applicability across the spectrum of the membrane proteome is becoming clearer. Of note, 43 of the 216 unique eukaryotic membrane protein structures were solved using protein expressed in E. coli (based on annotation in the RCSB PDB4). This demonstrates the utility of E. coli as a broad tool and its potential if the expression problem can be overcome.
Development of a computational model trained on E. coli expression data
A key component of any machine learning model is the choice of dataset used for training. Having searched the literature, we identified two publications containing quantitative datasets on the expression of E. coli membrane proteins in E. coli. The first set, Daley, Rapp et al., contained activity measures, proxies for expression level, from C-terminal tags of either GFP or PhoA (alkaline phosphatase)19. The second set, Fluman et al., contained a more detailed analysis of a subset of the first utilizing in-gel fluorescence to measure folded protein20 (see Methods 4c). The expression results strongly correlated between the two datasets, notably in that normalized GFP activity was a good measure of the amount of folded membrane protein (Figure 1a, also 21). The experimental set-up employed multiple 96-well plates over multiple days, resulting in pronounced variability in the absolute expression level of a given protein between trials. Daley, Rapp et al. calculated average expression levels by dividing the raw expression level of each protein by that of a control construct (Inverse LepB-GFP or LepB-PhoA) on the corresponding plate. While the resulting values were useful for the relevant question of identifying topology, we were unable to successfully fit a linear regression or a standard linear SVM on either the raw data compiled from all plates or the averaged outcomes of each gene. This unexpected outcome suggested that the measurements required a more complex analysis.
Training performance. a, A comparison of GFP activity19 with folded protein20 where each point represents the mean for a given gene tested in both works, and error bars plot the extrema. Spearman's rank correlation coefficient ρ and 95% confidence interval (CI)42 are shown. b, Plates are the number of independent sets of measurements within which expression levels can be reliably compared. Genes are the number of proteins for which the C-terminus was reliably ascertained19. Observations are the total number of expression data points accessible. Total pairs are the number of comparable expression measurements (i.e. those within a single plate). Kendall's τ is the metric maximized by the training process (See Methods 4b). The color of the column heading identifying each experimental set is retained throughout the figure. c, Agreement against the normalized outcomes plotted as the mean activity (see Methods 5 for definition) versus the score with error bars providing the extent of observed activities (Spearman's ρ and 95% CI noted). d, Illustrative ROCs for thresholds at 25th and 75th percentile in activity with the number of positive outcomes at that threshold, the AUC, and 95% CI43 indicated. e, The AUC of the ROC at every possible activity threshold.
We hypothesized that measurements could be more accurately compared within an individual plate than across the entire dataset. To account for this, a preference-ranking linear SVM algorithm (SVMrank 22) was chosen (see Methods 4b). Simply put, the SVMrank algorithm determines the optimal weight for each feature to best rank the order of expression outcomes within each plate over all plates, resulting in a model where higher expressing proteins have higher scores. The outcome is identical in structure to a multiple linear regression, but instead of minimizing the sum of squared residuals, the SVM cost function is used, accounting for the plate-wise constraint specified above. In practice, the process optimizes the correlation coefficient Kendall's τ as a training metric to converge upon a set of weights. Kendall's τ measures the agreement between ordinal quantities by counting correctly-ordered and swapped pairs.
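As a sketch of the training metric (not the SVMrank optimization itself, which is considerably more involved), Kendall's τ can be computed per plate and averaged, so that outcomes are only ever compared within a plate:

```python
from itertools import combinations

def kendall_tau(outcomes, scores):
    """Kendall's tau: (concordant - discordant) pairs divided by the
    number of comparable pairs, skipping pairs tied in either quantity."""
    concordant = discordant = 0
    for i, j in combinations(range(len(outcomes)), 2):
        d = (outcomes[i] - outcomes[j]) * (scores[i] - scores[j])
        if d > 0:
            concordant += 1
        elif d < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def plate_wise_tau(plates):
    """Average tau over (outcomes, scores) pairs per plate, mirroring the
    constraint that expression levels are only comparable within a plate."""
    taus = [kendall_tau(outcomes, scores) for outcomes, scores in plates]
    return sum(taus) / len(taus)
```

A perfectly ordered plate gives τ = 1, a fully reversed one τ = -1, so τ > 0 across plates indicates agreement between score and expression rank.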
Various metrics related to the training data can be derived to assess the accuracy with which the model fits the input data (see Methods 4c). The SVMrank training metric shows varying, but positive, agreement for all groups (i.e., Kendall's τ > 0) (Figure 1b). For individual genes, activity values normalized and averaged across trials were not directly used for the training procedure (see Methods 4a); yet one would anticipate that scores for each gene should broadly correlate with expression. Indeed, the observed normalized activities positively correlate with the SVMrank score output by the model (Figure 1c).
For a more quantitative approach to assessing the model's success within the training data, we turn to the Receiver Operating Characteristic (ROC). ROC curves quantify the tradeoff between true positive and false positive predictions across the numerical scores output from a predictor. This is a more reliable assessment of prediction than simply calculating accuracy and precision from a single, arbitrary score threshold23. The figure of merit that quantifies an ROC curve is the Area Under the Curve (AUC). Given that the AUC for a perfect predictor corresponds to 100% and that of a random predictor is 50% (Figure 1d, grey dashed line), an AUC greater than 50% indicates predictive performance of the model (percentage signs hereafter omitted) (see Methods 5 and 23). Here, the ROC framework will be used to quantitatively assess the ability of our model to predict the outcomes within the various datasets.
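The AUC can be computed directly from scores and binary outcomes via the rank-sum identity, i.e. the probability that a randomly chosen positive outscores a randomly chosen negative; a generic sketch, reported in percent as in the text:

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney U identity: the fraction of
    positive/negative pairs in which the positive receives the higher
    score, counting ties as half a win. Returned in percent."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return 100.0 * wins / (len(pos) * len(neg))
```

A predictor that perfectly separates the classes yields 100, a reversed one 0, and uninformative scores hover around 50, matching the interpretation above.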
The training datasets are quantitative measures of activity, requiring that an activity threshold be chosen to define positive and negative outcomes. For example, ROC curves using two distinct activity thresholds, at the 25th or 75th percentile of highest expression, are plotted with their calculated AUC values (Figure 1d). While both show that the model has predictive capacity, a more useful visualization considers all possible activity thresholds. For this, the AUC value for every activity threshold is plotted, showing that the model has predictive power regardless of an arbitrarily chosen expression threshold (Figure 1e). In total, the analysis demonstrates that the model can rank expression outcomes for all proteins. Interestingly, for PhoA-tagged proteins the model is progressively less successful with increasing activity. PhoA activity is an indirect measure of the expression of proteins with their C-termini in the periplasm, bringing into question either the utility of this quantification method relative to GFP activity or whether this class of proteins is treated distinctly by the model. An argument for the former is presented later (Figure 2e).
Success of the model against outcomes from NYCOMPS. a, An overview of the NYCOMPS outcomes and a plot of the number of conditions tested per gene with outcomes highlighted. b, The PPV plotted for each percentile SVMrank score, e.g. 75 on the x-axis indicates the PPV for the top 25% of genes based on score. The grey dashed line shows the ∼25% overall success rate of the NYCOMPS experimental outcomes. c, Histograms of the total count of proteins at a given SVMrank score colored by NYCOMPS-determined outcomes. d, ROC curve, positive (red) and total (black) counts, and AUC values with 95% CI. The grey dashed line shows the performance of a completely random predictor (AUC=50). e, The AUCs for all trials together (left) followed by outcomes in individual plasmid and solubilization conditions (DDM except LDAO where noted) along with 95% CI (numerically in Table S2). Performances are also split by predicted C-terminal localization24. Overall positive percentage (red) and total number of outcomes within each group is noted below the axis.
Demonstration of prediction against an independent large expression dataset
While the above analyses show that the model successfully fits the training data, we assess its broader applicability based on its success at predicting the outcomes of independent large- and small-scale expression trials. The first test considers results from NYCOMPS, where 8444 membrane protein genes entered expression trials, in up to eight conditions, resulting in 17114 expression outcomes2. The majority of genes were attempted in only one condition (Figure 2a), and outcomes were non-quantitative (binary: expressed or not expressed), as indicated by the presence of a band upon Coomassie staining of an SDS-PAGE gel after small-scale expression, solubilization, and purification3. Therefore, for this analysis, we consider the experimental results in various ways: outcomes per gene (if at least one trial is positive, the gene is considered positive for expression), all conditions (each expression trial considered independently), or defined expression conditions. For the first, several metrics demonstrate prediction (Figure 2b-d).
A major aim of this work is to enrich the likelihood of choosing positively expressing proteins. The positive predictive value (PPV, true positives ÷ predicted positives) becomes a useful metric for positive enrichment as it conveys the degree of improvement over the experimental baseline of the dataset. The PPV of the model is plotted as a function of the percentile of the SVMrank score threshold for the definition of predicted positives (Figure 2b). In the figure, the overall positive percentage (∼24%), an experimental baseline, is represented by a grey dashed line; therefore, a relative increase reflects the increased predictive power of the algorithm. For example, considering the top fourth of genes by SVMrank score (75th percentile) shows that the algorithm enriches for positive outcomes by 8.4% over baseline. Seen another way, a histogram of the SVMrank score for each protein is plotted separated by positive versus negative outcomes (Figure 2c). Visually, the distribution of the scores for the positive group is shifted to a higher score relative to the negative group, which is substantiated quantitatively by the ROC and its corresponding AUC (Figure 2d). Interestingly, considering the predictive power against all conditions as opposed to by gene shows slightly better statistics (AUC=62.6) reflective of the fact that many genes have mixed outcomes (Figure 2e). Importantly, the model shows consistent performance throughout each of the eight possible conditions tested (Figure 2e, black, numerically in Table S2).
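The enrichment reported here can be sketched as the PPV among genes above a score-percentile cutoff, with toy scores and outcomes standing in for the NYCOMPS data:

```python
import numpy as np

def ppv_above_percentile(scores, outcomes, percentile):
    """PPV (true positives / predicted positives) among genes scoring at
    or above the given score percentile."""
    cutoff = np.percentile(scores, percentile)
    return outcomes[scores >= cutoff].mean()

# Toy illustration: positives concentrate at high scores, so the PPV in
# the top quarter exceeds the overall baseline.
scores = np.arange(10.0)
outcomes = np.array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1])
baseline = outcomes.mean()                                # 0.4
top_quarter = ppv_above_percentile(scores, outcomes, 75)  # 1.0
```

Sweeping the percentile from 0 to 100 traces out a curve like Figure 2b, where the distance above the baseline reflects the enrichment delivered by the score.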
The ability to predict the experimental data from NYCOMPS allows a return to the question of alkaline phosphatase as a metric for expression. To investigate the trend that the expression of proteins with periplasmic C-termini measured by alkaline phosphatase (Figure 1, orange) shows less consistent fitting by the model, the NYCOMPS outcomes are split by putative C-terminal localization as predicted by Phobius24. The absence of a significant difference in AUC between C-terminal localizations across all conditions (Figure 2e, green vs. orange) indicates that the model is applicable for all topologies.
Further demonstration of prediction against small-scale independent datasets
The NYCOMPS example demonstrates the predictive power of the model across the broad range of sequence space encompassed by that dataset. Next, the performance of the model is tested against relevant subsets of sequence space (e.g. a family of proteins or the proteome from a single organism), which are reminiscent of laboratory-scale experiments that precede structural or biochemical analyses. While a number of datasets exist8,25–35, we could only identify six for which complete sequence information could be obtained to calculate all the necessary sequence parameters25–30.
The first dataset is derived from the expression of 14 archaeal transporters in E. coli chosen based on their homology to human proteins25. For each putative transporter, expression was performed in three plasmids and two strains (six total conditions) with the membrane fraction quantified by both a Western blot against a histidine-affinity tag and Coomassie Blue staining of an SDS-PAGE gel. Here, the majority of the expressing proteins fall into the top half of the SVMrank scores, 7 out of 9 of those with multiple positive outcomes (Figure 3a, top). Strikingly, quantification of the Coomassie Blue staining highlights a clear correlation with the SVMrank score where the higher expressing proteins have the highest score (Figure 3a, bottom). ROC curves are plotted for the two thresholds: expression detected at least by Western blot or, for the smaller subset, by Coomassie Blue (Figure 3b). In both cases, the model shows predictive power.
Success of the model against a variety of small-scale outcomes. For each set, vertical lines indicate the median SVMrank score. ROCs along with AUCs and 95% CI as well as the total number of positives for the given threshold (red hues) along with the total outcomes (black) are presented. In each curve, increasing expression thresholds as defined by the original publication are displayed as deeper red. a, The expression of archaeal transporters in up to 6 trials. Top, positive expression count is plotted above the dashed line and negative outcomes below the line. Bottom, from the same work, the expression of proteins detected by Coomassie Blue25. b, ROC curves for each positive threshold (i.e. Coomassie Blue or Western Blot) from trials in a. c, Experimental expression of M. tuberculosis membrane proteins plotted based on outcomes. d, ROC curves for each possible threshold from trials in c. e, Mammalian GPCR expression in either E. coli (top) or P. pastoris (bottom). f, ROC curves for each possible threshold from trials in e.
The next test considers the expression of 105 Mycobacterium tuberculosis proteins in E. coli26. Protein expression was measured both by Coomassie Blue staining of an SDS-PAGE gel and by Western blot, with only outcomes from the membrane fraction considered for this analysis. The highest expressing proteins (detected via Coomassie Blue) follow the trend given by the SVMrank score, with 7 of the 9 falling within the top half of scoring proteins (Figure 3c), a trend also reflected in the ROC (Figure 3d). In contrast, using the positive Western blot outcomes as the minimum threshold (Figure 3c) shows an AUC no better than random (Figure 3d). Given that no internal standard was used and that each expression trial was performed only once, proteins that were positive by Western blot may represent a pool indistinguishable in expression from those not detected; alternatively, these results support that our statistical model accurately captures the most highly expressing proteins.
A broader test considers expression trials of 101 mammalian GPCRs in bacterial and eukaryotic systems27. Trials in E. coli, measured via Western blot of an insoluble fraction, again show highly expressing proteins at higher SVMrank scores while the expression of the same proteins in P. pastoris, measured via dot blot, fail to show broad agreement (Figure 3e,f). The lack of predictive performance in P. pastoris suggests that the parameterization of the model, calibrated for broadly characterizing E. coli expression, requires retraining to generate a different model that captures the distinct interplay of sequence parameters in yeast.
Further expression trials of membrane proteins from H. pylori and T. maritima, as well as microbial secondary transporters, continue to show the same broad agreement28–30 (Figure S1). For H. pylori membrane proteins, as the threshold for positive expression increases, the performance of the model improves (at the highest threshold, n=46 and AUC=67.7) (Figure S1a,b). For T. maritima expression, the model weakly captures outcomes for two defined thresholds (n=5 and 19, AUC=61.7 and 58.7), but due to the small number of successful outcomes, the confidence intervals are broad (Figure S1c,d). The expression of microbial secondary transporters shows varied agreement with the model. Taking proteins at the lower defined expression threshold shows predictive performance (n=59, AUC=60.5); however, the result for the defined high-expressing proteins is less conclusive (n=26, AUC=52.0) (Figure S1e,f).
Forward predictions on genomes of interest
The model successfully enriches for heterologous expression of membrane proteins in E. coli, strikingly, across scales, laboratories, quantification methods, and protein families, supporting its broad generalizability. While few genes express in every condition tested (Figures 2a and 3a), the model predicts the likelihood that a gene will express within a set of conditions and enriches for those that will work in any condition (Figure 2e, numerically in Table S2). Notably, had this model been implemented during the NYCOMPS target selection process, testing only targets with an SVMrank score greater than 0.5 (90th percentile or above), the percentage of successful genes, based on known outcomes, would have increased from 25% to 37% (Figure 2b). For perspective, testing the same number of genes would have resulted in an additional 912 expressed proteins, representing a significant improvement in the return on investment.
To expand on the utility of this model, SVMrank scores were calculated for membrane proteins from a variety of metazoan and microbial genomes (Figure 4a and Figure 2a). Many genomes have a significant proportion of proteins with high scores, particularly evidenced by the portions of the distributions above the E. coli median, indicated by the vertical dashed line (Figure 4a). The likelihood of successful expression may be inferred by equating the SVMrank score with the PPV from the most prevalent NYCOMPS expression condition, which rises dramatically at scores above zero (Figure 4b). The range of scores spans those representative of high-expressing membrane proteins both in E. coli (Figure 1c) and in the NYCOMPS dataset (Figure 2c) and provides suggested targets for future biophysical studies (Table S4).
Forward predictions of membrane protein expression for various genomes. a, Calculated scores for proteins from a variety of genomes (count in parentheses; complete set provided in Figure S2a) plotted as contours of kernel density estimates of the number of proteins at a given score. Amplitude is only relative within a genome. The dot indicates the median, and the lines depict the quantiles of an analogous Tukey boxplot44,45. The vertical line shows the median score in E. coli to provide context for other distributions. b, PPV of the model within the most tested NYCOMPS condition. c, Distribution of overlap coefficients (see Methods 7) for each sequence parameter comparing the entire E. coli membrane proteome vs. the training set from E. coli. The dashed line provides a threshold separating the cluster of highly-related features from those with lower overlap. d-f, A comparison of overlap coefficients with the training set between NYCOMPS and d, all forward predictions (Figure S2a), e, thermophilic genomes (orange), or f, P. falciparum. The Mean Absolute Deviation is indicated for each plot.
The predictions present several surprises at the biological level. One is that membrane proteins from representative thermophilic bacterial genomes have generally lower relative SVMrank scores than those of other genomes, which implies that these proteins, on average, are harder to express in E. coli. This contrasts with the many empirical examples of proteins from thermophiles being used for biophysical characterization. In the case of the malarial parasite P. falciparum, the inverse trend is observed, with higher than expected relative SVMrank scores despite the expectation that these proteins would be hard to express in E. coli. A possible cause for the unexpected distribution of scores may lie in differences in the parameters that define the proteins in these particular groups. As the training set consists only of native E. coli sequences, the range of values for each parameter in the training set may not represent the full range of possible values for that parameter. For the special cases highlighted, perhaps the underlying sequence parameters fall into a poorly characterized subset of sequence space, bringing into question the applicability of the model for these cases.
To address the utility of the model relative to differences in the sampling of sequence parameters, we measure the overlap of the distributions of sequence parameters for a given subset (see Methods 7) (Figure S3b). Simply put, if two subsets contain the same distribution of sequence parameters, the overlap for a given parameter should approach 100%. In the simplest case, comparing the distribution of sequence parameters in all E. coli membrane proteins against the subset used in the training set shows that the majority of parameters have overlap values over 75% (Figure 4c), which provides a lower threshold for similarity of sequence parameter range. For NYCOMPS sequences, most of the overlap values relative to the training set are above this threshold. As this set shows predictive performance, comparison to the training set provides a baseline to assess the reliability of predictions within other subsets (Figure 4d-f, x-axis). In the first case (Figure 4d), there is a strong correlation between all the forward predictions and NYCOMPS, i.e. values are near the diagonal (quantified by a Mean Absolute Deviation (MAD) of 11.6), suggesting that differences in parameter space do not significantly affect the predictive power of the model. For the thermophile subset (Figure 4e), the values again lie close to the diagonal (i.e. low MAD = 10.6), implying that the predictions are credible. P. falciparum (Figure 4f), on the other hand, shows stark differences, as most parameters fall below the 75% cut-off (MAD = 29.0), bringing into question the reliability of these predictions. A training set with broader coverage of the parameter space may generate a better predictor for all genomes.
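A simple histogram-based estimate of such an overlap coefficient (a sketch under an assumed binning scheme; the exact definition is given in Methods 7) is:

```python
import numpy as np

def overlap_coefficient(a, b, bins=20):
    """Percent overlap of two empirical distributions: the shared mass of
    their normalized histograms computed on a common set of bins."""
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = pa / pa.sum()  # normalize counts to fractions
    pb = pb / pb.sum()
    return 100.0 * np.minimum(pa, pb).sum()
```

Identical distributions give 100%, disjoint ones 0%, so parameters scoring above ~75% against the training set can be read as well sampled by it.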
Performance of the model across protein families
To provide a clear path forward for experiment, we consider the performance of the model with regard to protein homology families, where family definitions are based on Pfam classifications36. For outcomes from NYCOMPS (Figure 5a), there is no significant difference in the predictive performance of the model between groups of genes, whether or not they belong to a protein family found in the training set.
Model performance across protein families. a, The NYCOMPS dataset is split by those proteins with a Pfam found in the training set, those with a Pfam not in the training set, and those without a Pfam (Pfam counts in parentheses). The AUC and 95% confidence interval for each set is plotted with the positive rate (red) and number of trials (black) indicated below the x-axis. b, For each family, the AUC across all outcomes is plotted arranged in order of the value, an empirical cumulative distribution function, with horizontal bars indicating the 95% confidence interval. The color indicates the significance of the prediction within the family: purple, predictive at 95% confidence; blue, predictive but not at 95% confidence; green, not predictive. The size of each significance group and the total number of families (black) are indicated on the plot. c, Outcomes for specific protein families. Each was only tested in a single condition (N). The overall positive percentage within the group, total number of outcomes, and AUC with 95% CI are labelled to the right.
The scale of NYCOMPS allows us to investigate whether there are protein families for which the model does better or worse than the aggregate. For this, an AUC is calculated for each protein family with at least five total outcomes (including at least one positive and one negative). Figure 5b plots the AUC for each protein family in increasing order as an empirical cumulative distribution function. The breadth of the AUC values highlights the variability in predictive power across families. Most families can be predicted by the model (115 of 159 have an AUC > 50, visually blue and purple), though some not at 95% confidence (57 of 115, blue), likely due to an insufficient number tested.
For the protein families that are well-predicted within the NYCOMPS set, the model gives accurate insight into the likelihood of expression of a given protein. We demonstrate the utility of this prediction by looking at protein families that have yet to be characterized structurally. While there are a number of choices, a first example is the protein family annotated as short-chain fatty-acid transporters (PF02667), characterized by AtoE in E. coli, that typically contains 10 transmembrane domains with an overall length of ∼450 amino acids. A second example is the protein family annotated as copper resistance proteins (PF05425), characterized by CopD in E. coli, that typically contains eight transmembrane domains with an overall length of ∼315 amino acids. In both cases, as indicated by the AUC values, the model provides a clear score cut-off for consideration for expression. For example, considering CopD homologs, one would expect that those with SVMrank scores above −1 will express.
Biological importance of various sequence parameters
Using a simple proof-of-concept linear model allowed for a straightforward and robust predictor; however, this intrinsically complicates determination of the biological underpinnings due to the unequal distribution of weight across correlated features. For example, the average ΔGinsertion of transmembrane segments has a positive weight whereas average hydrophobicity, a correlated parameter, has a negative weight (Table S1, Figure S3). As many parameters, such as those related to hydrophobicity, are highly correlated, conclusive information cannot be obtained simply by using the weights of individual features to interpret the relative importance of their underlying biological phenomena. An alternative is to collapse related features into biologically meaningful categories, reducing correlation (Figure 6a) and thereby providing a mechanism to interpret information from the model. For example, the hydrophobicity group incorporates sequence features such as average hydrophobicity, maximum hydrophobicity, ΔGinsertion, etc. The full list of groupings is provided in Table S1 and Figure S3.
Feature contributions to the model. a, Pearson correlation coefficients between feature categories are shown. Feature labels are green for protein-sequence-derived and brown for nucleotide-sequence-derived features. b, The total weight for each category is represented as a bar. The contribution of each feature to the category is shown by partitioning the bar. The red dot indicates the total sum of weights within the category. c, The AUC and 95% confidence interval when predicting with the entire model (SVMrank score) or with the single category specified on the NYCOMPS dataset. Red shows the outcome of predicting at the level of individual genes (Figure 2b-d) and grey shows the outcome within each vector individually (as in Figure 2e). d, The AUC and 95% confidence interval when predicting with the entire model (SVMrank score) or with the single specified category excluded, on the NYCOMPS dataset. e, As b, but classifying features by the type of sequence they are calculated from. f, The AUC and 95% confidence intervals using only protein or only nucleotide features. g, Relative difference in SD-like sites (green), expression (purple), and SVMrank score (yellow) between wild-type and mutants with silent mutations. See Methods 7 for further detail.
Analysis of categories suggests the phenomena that drive prediction. To visualize this, the collapsed weights are summarized in Figure 6b, where each bar contains the individual feature weights within a category. Features with a negative weight are stacked to the left of zero and those with a positive weight are stacked to the right. A red dot represents the sum of all weights, and the length of the bar gives the total absolute value of the combined weights within a category. Ranking the categories based on the sums of their weights suggests that some categories play a more prominent role than others. These include properties related to transmembrane segments (hydrophobicity and TM size/count), codon pair score, loop length, and overall length.
To explore the role of each category in prediction, we calculate the performance of the model either using only the weights from features within a single category or excluding the weights of a single category. We assess predictive performance by calculating ROC curves across genes and expression trials from the NYCOMPS dataset for each case (Figure 6c,d). Feature categories that are sufficient for prediction will have an AUC > 0.5 when used alone (Figure 6c), and those necessary for the model will show an AUC < 0.5 when excluded from prediction (Figure 6d). Notably, when considering all genes independent of condition, most individual categories cannot predict expression (i.e. AUC with a 95% CI straddling 0.5) (Figure 6c, red). A notable exception is the tRNA Adaptation Index, where the per-gene AUC is slightly higher than the performance of the full model. However, since the model demonstrates predictive performance at 95% confidence across all experimental conditions (Figure 2e), feature categories that are sufficient for prediction must also perform across these conditions. In this case, the tRNA Adaptation Index performs poorly against a number of the experimental subsets, so it is not sufficient for prediction. On the other hand, while Codon Pair Score alone shows predictive power across experimental conditions, its exclusion barely changes the model's performance (only a single 95% confidence interval crosses AUC = 50, Figure 6d), so this category alone cannot explain the model.
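A minimal sketch of this category-level ablation, using simulated features, hypothetical weights, and an illustrative category grouping (none of these are the actual values from the model); the AUC here is computed as the Mann-Whitney probability that a positive outranks a negative:

```python
import numpy as np

def auc(scores, labels):
    """Empirical AUC: probability a positive outranks a negative (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

rng = np.random.default_rng(0)
n_genes, n_feat = 500, 6
X = rng.normal(size=(n_genes, n_feat))           # centered/scaled features (simulated)
w = np.array([0.8, 0.5, -0.3, 0.1, 0.05, -0.6])  # hypothetical learned weights
# illustrative grouping of feature indices into categories
categories = {"hydrophobicity": [0, 1], "codon_usage": [2, 3], "length_pI": [4, 5]}
# simulated outcomes driven by the full score plus noise
y = (X @ w + rng.normal(size=n_genes) > 0).astype(int)

full_auc = auc(X @ w, y)
ablation = {}
for name, idx in categories.items():
    mask = np.zeros(n_feat, dtype=bool)
    mask[idx] = True
    ablation[name] = (auc(X[:, mask] @ w[mask], y),    # category alone
                      auc(X[:, ~mask] @ w[~mask], y))  # category excluded
```

Categories whose "alone" AUC stays near 0.5 are not sufficient for prediction; categories whose "excluded" AUC drops markedly are necessary for it.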
Importantly, no feature category independently drives the predictor, as excluding each individually does not significantly affect the overall predictive performance, except for Length/pI (isoelectric point) (Figure 6d). Sequence length composes the majority of the weight within this category and is one of the highest-weighted features in the model. This is consistent with the anecdotal observation that larger membrane proteins are typically harder to express. However, this parameter alone would not be useful for predicting within a smaller subset, like a single protein family, where there is little variance in length (e.g. Figures 3 and 4). One might develop a predictor that performs better for a given protein family under certain conditions using a subset of the features considered here; yet this would require a priori knowledge of the system, i.e. which sequence features were truly most important, and would preclude the broad generalizability shown for the predictor presented here.
A coarser view of the weights is a comparison of the features derived from protein sequence versus nucleotide sequence. The summed weight for protein features is around zero, whereas for nucleotide features it is slightly positive, suggesting that, in comparison, these features may be more important to the predictive performance of the model (Figure 6e). Comparing the predictive performance of the two subsets of weights shows that the nucleotide features alone give performance similar to the full model (Figure 6f). It is important to note that this does not suggest that protein features are unimportant for membrane protein expression. Instead, within the context of the trained model, nucleotide features are critical for predictive performance on a large dataset such as NYCOMPS. This finding corroborates growing literature that the nucleotide sequence holds significant determinants of biological processes14,20,37–39.
Sequence optimization for expression
The predictive performance of the model implies that the parameters defined here provide a coarse approximation of the fitness landscape for membrane protein expression. Attempting to optimize a single feature by modifying the sequence will likely affect the resulting score and expression due to changes in other features. Fluman et al.20 provide an illustrative experiment. They hypothesized that altering the number of Shine-Dalgarno (SD)-like sites in the coding sequence of a membrane protein would affect expression. To test this, silent mutations were engineered within the first 200 bases of three genes (ygdD, brnQ, and ybjJ from E. coli) to increase the number of SD-like sites with the goal of improving expression. Expression trials demonstrated that only one of the proteins (BrnQ) had improved expression of folded protein (Figure 6g). However, the resulting changes in the SVMrank score correspond with the changes in measured expression, as the model accounts for concomitant changes in other nucleotide features. That the model captures the outcomes of this small test case illustrates the utility of integrating the contributions of the numerous parameters involved in membrane protein biogenesis.
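As a rough sketch of how such a nucleotide feature can be computed, the function below counts SD-like sites within the first 200 bases by scanning for short matches to a canonical Shine-Dalgarno consensus; the consensus string, minimum match length, and window size are illustrative assumptions, not the exact definition used by Fluman et al. or by the model:

```python
ANTI_SD_CORE = "AGGAGG"  # canonical SD consensus; an illustrative assumption

def count_sd_like(cds, window=200, min_match=4):
    """Count positions in the first `window` bases of a CDS where at least
    `min_match` consecutive bases match a substring of the SD consensus."""
    region = cds[:window].upper()
    # all k-mers of the consensus of length min_match
    kmers = {ANTI_SD_CORE[j:j + min_match]
             for j in range(len(ANTI_SD_CORE) - min_match + 1)}
    return sum(1 for i in range(len(region) - min_match + 1)
               if region[i:i + min_match] in kmers)
```

Silent mutations that change this count will typically also shift codon-level features (e.g. codon pair score, tAI), which is why the model's score tracks the net effect rather than the SD-site count alone.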
Discussion
The model developed here provides a robust predictor for membrane protein expression. The current best practice for characterization of a membrane protein target begins with the identification and testing of many homologs or variants for expression. The model presented here will allow for prioritization of targets to test for expression, thereby making more optimal use of limited human and material resources. In addition, due to the scale of NYCOMPS, protein families that were extensively tested provide ranges of scores (e.g. Figure 5c) where the score of an individual target directly indicates its likelihood of expression relative to known experimental results. We provide the current predictor as a web service where scores can be calculated, and the method, associated data, and suggested analyses are publicly available to catalyze progress across the community (http://clemonslab.caltech.edu).
The generalizability of the model is remarkable despite several known limitations. Using data from a single study for training precludes including certain variables that empirically influence expression such as the parameters corresponding to fusion tags and the context of the protein in an expression plasmid, e.g. the 5’ untranslated region, for which there was no variation in the Daley, Rapp, et al. dataset. Moreover, using a simple proof-of-concept linear model allowed for a straightforward and robust predictor; however, intrinsically it cannot be directly related to the biological underpinnings. While we can extract some biological inference, a linear combination of sequence parameters does not explicitly reflect the reality of physical limits for host cells. To some extent, constraint information is likely encoded in the complex architecture of the underlying sequence space (e.g. through the genetic code, TM prediction, RNA secondary structure analyses). Future statistical models that improve on these limitations will likely hone predictive power and more intricately characterize the interplay of variables that underlie membrane protein expression in E. coli and other systems.
The ability to predict phenotypic results from sequence based statistical models opens a variety of opportunities. As done here, this requires a careful understanding of the system and its underlying biological processes enumerated in a multitude of individual variables that impact the stated goal of the predictor, in this case enriching protein expression. As new variables related to expression are discovered, future work will incorporate these leading to improved models. Based on these results, expanding to new expression hosts such as eukaryotes seems entirely feasible, although a number of new parameters may need to be considered, e.g. glycosylation sites and trafficking signals. Moreover, the ability to score proteins for expressibility creates new avenues to computationally engineer membrane proteins for expression. The proof-of-concept described here required significant work to compile data from genomics consortia and the literature in a readily useable form. As data becomes more easily accessible, broadly leveraging diverse experimental outcomes to decode sequence-level information, an extension of this work, is anticipated.
Author Contributions
S.M.S., A.M., and W.M.C. conceived the project. S.M.S. developed the approach. S.M.S., A.M., and N.J. compiled sequence and experimental data. N.J. created code to demonstrate feasibility. S.M.S. performed all published calculations. S.M.S. and W.M.C. wrote the manuscript.
Author Information
The authors declare no competing financial interests.
Correspondence and requests for materials should be addressed to clemons{at}caltech.edu.
Materials and Methods
Sequence mapping, retrieval, and feature calculation were performed in Python 2.746 using BioPython47 and NumPy48; executed and consolidated using Bash (shell) scripts; and parallelized where possible using GNU Parallel49. Data analysis and presentation were done in R50 within RStudio51 using magrittr52, plyr53, dplyr54, asbio55, and datamart56 for data handling; ggplot257, ggbeeswarm58, GGally59, gridExtra60, cowplot61, scales62, viridis63, and RColorBrewer64,65 for plotting; multidplyr66 with parallel50 and foreach67 with iterators68 and doMC69/doParallel70 for parallel processing; and roxygen271 for code organization and documentation, as well as other packages as referenced.
1. Collection of data necessary for learning and evaluation
E. coli Sequence Data
The nucleotide sequences from 19 were deduced by reconstructing the forward and reverse primers (i.e. ∼20-nucleotide stretches) for each gene in Colibri (based on EcoGene11, the original source cited), and these primers were later verified against an archival spreadsheet provided directly by Daniel Daley (personal communication). To account for sequence and annotation corrections made to the genome after Daley, Rapp, et al.'s work, these primers were used to reconstruct the amplified product from the most recent release of the E. coli K-12 substr. MG1655 genome72 (EcoGene 3.0; U00096.3). Although Daniel Daley mentioned that raw reads from the Sanger sequencing runs may be available within his own archives, it was decided that the additional labor to retrieve and parse these reads would not significantly impact the model. The deduced nucleotide sequences were verified against the protein lengths given in Table S1 from 19. The plasmid library tested in 20 was provided by Daniel Daley, and those sequences are taken to be the same.
E. coli Training Data
The preliminary results using the mean-normalized activities echoed the findings of 19 that these do not correlate with sequence features, either in the univariate sense (many simple linear regressions, Table S1 of 19) or in a multivariate sense (multiple linear regression, data not shown). This is presumably due to the loss of information regarding variability in expression level for given genes, or to the increase in variance of the normalized quantity caused by the normalization and averaging procedure (see Methods 4a). Daniel Daley and Mikaela Rapp provided spreadsheets of the outcomes from the 96-well plates used for their expression trials and sent scanned copies of the readouts from archival laboratory notebooks where the digital data was no longer accessible (personal communication). Proteins without a reliable C-terminal localization (as given in the original work) or without raw expression outcomes were not included in further analyses.
Similarly, Nir Fluman also provided spreadsheets of the raw data from the set of three expression trials performed in 20.
New York Consortium on Membrane Protein Structure (NYCOMPS) Data
Brian Kloss, Marco Punta, and Edda Kloppmann provided a dataset of actions performed by the NYCOMPS center, including expression outcomes in various conditions2,3. The protein sequences were mapped to NCBI GenInfo Identifier (GI) numbers either via the Entrez system73 or the Uniprot mapping service74. Each GI number was mapped to its nucleotide sequence via a combination of the NCBI Elink mapping service and the "coded_by" or "locus" tags of Coding Sequence (CDS) features within GenBank entries. A custom script was created for this purpose, although a script from Peter Cock on the BioPython listserv that performs the same task via a similar mapping mechanism was later found75. To confirm all the sequences, the TargetTrack76 XML file was parsed for the internal NYCOMPS identifiers and compared for sequence identity to those mapped using the custom script; 20 (less than 1%) of the sequences had minor inconsistencies and were manually replaced.
Archaeal transporters Data
The locus tags (“Gene Name” in Table 1) were mapped directly to the sequences and retrieved from NCBI25. Pikyee Ma and Margarida Archer clarified questions regarding their work to inform the analysis.
GPCR Expression Data
Nucleotide sequences were collected by mapping the protein identifiers given in Table 1 from 27 to protein GIs via the Uniprot mapping service74 and subsequently to their nucleotide sequences via the custom mapping script described above (see NYCOMPS). The sequence length and pI were validated against those provided. Renaud Wagner assisted by providing the nucleotide sequences for genes whose listed identifiers could not be mapped and/or did not pass the validation criteria, as the MeProtDB (the sponsor of the GPCR project) does not provide a public archive.
Helicobacter pylori Data
Nucleotide sequences were retrieved by mapping the locus tags given in Supplemental Table 1 from 28 to locus tags in the Jan 31, 2014 release of the H. pylori 26695 genome (AE000511.1). To verify sequence accuracy, sequences whose molecular weight matched that given by the authors were accepted. Those that did not match, in addition to the one locus tag that could not be mapped to the Jan 31, 2014 genome version, were retrieved from the Apr 9, 2015 release of the genome (NC_000915.1). Both releases are derived from the original sequencing project77. After this curation, all mapped sequences matched the reported molecular weight.
In this dataset, expression tests were performed in three expression vectors and scored as 1, 2, or 3. Two of the vectors were scored via two methods; for these, the two scores were averaged to give a single number for the condition, making them comparable to the third vector while yielding 2 additional thresholds (1.5 and 2.5), resulting in the 5 total curves shown (Figure S1b).
Mycobacterium tuberculosis Data
The authors note using TubercuList through GenoList78, therefore, nucleotide sequences were retrieved from the archival website based on the original sequencing project79. The sequences corresponding to the identifiers and outcomes in Table 1 from 26 were validated against the provided molecular weight.
Secondary Transporter Data
GI Numbers given in Table 1 from 30 were matched to their CDS entries using the custom mapping script described above (see NYCOMPS). Only expression in E. coli with IPTG-inducible vectors was considered.
2 Calculation of sequence features
Based on experimental analyses and anecdotal evidence, approximately 105 different protein and nucleotide sequence features thought to be relevant to expression were identified and calculated for each protein using custom code together with published software (codonW82, tAI83, NUPACK40, Vienna RNA84, Codon Pair Bias85, Disembl18, and RONN86). Relative metrics (e.g. codon adaptation index) are calculated with respect to the E. coli K-12 substr. MG1655 genome72. The octanol-water partitioning87, GES hydrophobicity88, and ΔG of insertion16 scales were employed as well. Transmembrane segment topology was predicted using Phobius Constrained for the training data and Phobius for all other datasets24. Two RNA secondary structure metrics were prompted in part by 14. Several features were obtained by averaging per-site metrics (e.g. per-residue RONN3.2 disorder predictions) in windows of a specified length. Windowed tAI metrics are calculated over all 30-base windows (not solely over 10-codon windows). Table S1 lists a description of each feature. Features are calculated solely from the gene of interest, excluding portions of the ORF such as linkers and tags derived from the plasmid backbone employed (future work will explore contributions of these elements).
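For illustration, two of the simpler sequence features might be computed as below; the Kyte-Doolittle hydropathy scale is used here purely as a stand-in (the study itself used the octanol-water, GES, and ΔG-of-insertion scales, among other metrics):

```python
# Kyte-Doolittle hydropathy values per amino acid (illustrative scale)
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

def avg_hydrophobicity(protein):
    """Mean hydropathy over the whole protein sequence."""
    return sum(KD[a] for a in protein) / len(protein)

def gc_content(cds):
    """Fraction of G/C bases in the coding sequence."""
    cds = cds.upper()
    return (cds.count('G') + cds.count('C')) / len(cds)
```

Windowed variants of such features (e.g. maximum hydrophobicity over a fixed-length window) follow the same pattern, applied to each window rather than the whole sequence.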
3 Preparation for model learning
Calculated sequence features for the membrane proteins in the E. coli dataset, as well as the raw activity measurements (i.e. each 96-well plate), were loaded into R. As is best practice when using Support Vector Machines, each feature was "centered" and "scaled": the mean value of a given feature was subtracted from each data point, which was then divided by the standard deviation of that feature using preProcess89. As is standard practice, the resulting set was then culled for features of near-zero variance, over 95% correlation (Pearson's r), and linear dependence (nearZeroVar, findCorrelation, findLinearCombos)89. In particular, this procedure removed extraneous degrees of freedom during the training process that carry little to no additional information with respect to the feature space and that may over-represent certain redundant features. Features and outcomes for each list ("query") were written into the SVMlight format using a modified svmlight.write90.
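In Python terms, the centering/scaling and culling steps might look like the sketch below, a simplified analogue of caret's preProcess, nearZeroVar, and findCorrelation (the greedy correlation filter is an assumption about their behavior, not a port of the R code):

```python
import numpy as np

def preprocess_fit(X, corr_cutoff=0.95, var_eps=1e-8):
    """Center/scale each feature, drop near-zero-variance columns, then drop
    one column of each pair correlated above the cutoff."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    keep = sd > var_eps                      # near-zero-variance filter
    Z = (X[:, keep] - mu[keep]) / sd[keep]   # center and scale
    # greedily drop the later column of any highly correlated pair
    C = np.abs(np.corrcoef(Z, rowvar=False))
    drop = set()
    for i in range(C.shape[0]):
        for j in range(i + 1, C.shape[1]):
            if i not in drop and C[i, j] > corr_cutoff:
                drop.add(j)
    cols = [k for k in range(Z.shape[1]) if k not in drop]
    # return transformed matrix plus the parameters needed to transform test data
    return Z[:, cols], (mu, sd, keep, cols)
```

Returning the fitted parameters matters: as described for the test datasets below, new sequences must be centered and scaled by the training-set parameters, not their own.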
The final features were calculated for each sequence in the test datasets, prepared for scoring by "centering" and "scaling" with the training-set parameters via preProcess89, and then written into SVMlight format, again using a modified svmlight.write.
4 Model selection, training, and evaluation using SVMrank
a. At the most basic level, our predictive model is a learned function that maps the parameter space (consisting of nucleotide and protein sequence features) to a response variable (expression level) through a set of governing weights (w1, w2, …, wN). Depending on how the response variable is defined, these weights can be approximated using several different methods. As such, defining a response variable that is reflective of the available training data is key to selecting an appropriate learning algorithm.
The quantitative 96-well plate results19 that comprise our training data do not offer an absolute expression metric valid over all plates: the top-expressing proteins in one plate would not necessarily be the best expressing in another. As such, this problem is suited for preference-ranking methods. As a ranking problem, the response variable is the ordinal rank of each protein derived from its overexpression relative to the other members of the same plate of expression trials. In other words, the aim is to rank highly expressed proteins (based on numerous trials) at higher scores than lower expressed proteins by fitting against the order of expression outcomes from each constituent 96-well plate.
b. As the first work of this kind, the aim was to employ the simplest framework necessary given the considerations above. The method chosen computes all valid pairwise classifications (i.e. within a single plate), transforming the original ranking problem into a binary classification problem. The algorithm outputs a score for each input by minimizing the number of swapped pairs, thereby maximizing Kendall's τ91. For example, consider data generated via context A, (X_A,1, Y_A,1), (X_A,2, Y_A,2), …, and context B, (X_B,1, Y_B,1), (X_B,2, Y_B,2), …, where the observed responses are ordered by index, i.e. Y_n < Y_n+1. The binary classifier f(X_i, X_j) gives a score of 1 if an input pair matches its ordering criterion, i.e. Y_i < Y_j, and −1 if not.
The free parameters describing f are calculated such that the calculated orderings f(X_A,1), f(X_A,2), …; f(X_B,1), f(X_B,2), … most closely agree (overall Kendall's τ) with the observed orderings Y_n, Y_n+1, …. In this sense, f is a pairwise learning-to-rank method.
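The pairwise construction above can be sketched as follows, where `plate` records the context of each example and only within-plate pairs with a strict ordering contribute training pairs (a minimal illustration, not the SVMrank implementation):

```python
import numpy as np
from itertools import combinations

def pairwise_transform(X, y, plate):
    """Turn within-plate ranking into binary classification: for each pair of
    proteins on the same plate with different expression, emit the feature
    difference labeled by which protein expressed higher."""
    diffs, labels = [], []
    for i, j in combinations(range(len(y)), 2):
        if plate[i] != plate[j] or y[i] == y[j]:
            continue  # only within-plate pairs with a strict ordering
        diffs.append(X[i] - X[j])
        labels.append(1 if y[i] > y[j] else -1)
    return np.array(diffs), np.array(labels)
```

A standard linear binary classifier trained on these difference vectors yields a single weight vector w whose dot product with any feature vector serves as the ranking score.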
Within this class of models, a linear preference-ranking Support Vector Machine was employed92. As an algorithm, a preference-ranking SVM operates similarly to the canonical SVM binary classifier. In the traditional binary classification problem, a linear SVM seeks the maximally separating hyperplane in the feature space between two classes, where class membership is determined by which side of the hyperplane points reside. For n linearly separable training examples D = {x_i | x_i ∈ ℝ^d} and two classes y_i ∈ {−1, 1}, a linear SVM seeks a mapping from the d-dimensional feature space ℝ^d → {−1, 1} by finding two maximally separated hyperplanes w · x − b = 1 and w · x − b = −1, with the constraints that w · x_i − b ≥ 1 for all x_i with y_i = 1 and w · x_i − b ≤ −1 for all x_i with y_i = −1. The feature weights correspond to the vector w, which is perpendicular to the separating hyperplanes, and are computable in O(n log n) time as implemented in the SVMrank software package (rather than in O(n²))22. See 92 for an in-depth, technical discussion.
c. In a soft-margin SVM, where training data is not linearly separable, a tradeoff between misclassified inputs and separation from the hyperplane must be specified. This parameter C was found by training models against raw data from Daley, Rapp, et al. over a grid of candidate C values (2^n ∀ n ∈ [−5, 5]) and evaluating each against the raw "folded protein" measurements from Fluman, et al. The final model was chosen as the one with the lowest error from this process (C = 2^5). To be clear, the final model is composed solely of a single weight for each feature; the tradeoff parameter C is only part of the training process.
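The selection of C can be sketched as below; the subgradient-descent trainer is a minimal stand-in for SVMrank's optimizer, and the learning rate and epoch count are arbitrary illustrative choices:

```python
import numpy as np

def train_linear_svm(X, y, C, epochs=200, lr=0.01, seed=0):
    """Minimal soft-margin linear SVM trained by subgradient descent on the
    primal hinge loss (a stand-in for SVMrank's optimizer)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            # regularization subgradient plus hinge subgradient when margin < 1
            grad = w / (C * n) - (y[i] * X[i] if margin < 1 else 0.0)
            w -= lr * grad
    return w

def select_C(Xtr, ytr, Xval, yval, grid=2.0 ** np.arange(-5, 6)):
    """Pick C minimizing validation misclassification over the 2^n grid."""
    best = None
    for C in grid:
        w = train_linear_svm(Xtr, ytr, C)
        err = np.mean(np.sign(Xval @ w) != yval)
        if best is None or err < best[0]:
            best = (err, C, w)
    return best[1], best[2]
```

As in the paper's procedure, only the final weight vector is kept; C influences training but does not appear in the scoring function.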
Qualitatively, such a preference-ranking method constructs a model that ranks groups of proteins with higher expression level higher than other groups with lower expression value. In comparison to methods such as linear regression and binary classification, this approach is more robust and less affected by the inherent stochasticity of the training data.
5 Quantitative Assessment of Predictive Performance
In generating a predictive model, one aims to enrich for positive outcomes while ensuring they do not come at the cost of increased false-positive diagnoses. This is formalized in Receiver Operating Characteristic (ROC) theory (for a primer, see 23), where the true positive rate is plotted against the false positive rate for all classification thresholds (score cutoffs in the ranked list). In this framework, the overall ability of the model to resolve positive from negative outcomes is evaluated by analyzing the area under an ROC curve (AUC), where AUCperfect = 100% and AUCrandom = 50% (percentage signs are omitted throughout the text and figures). All ROCs are calculated through pROC93 using the analytic DeLong method for AUC confidence intervals43. Bootstrapped AUC CIs (N = 10^6) were precise to 4 decimal places, suggesting that analytic CIs are valid for the NYCOMPS dataset.
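For reference, an ROC curve and its AUC can be computed directly from ranked scores; this sketch uses the trapezoidal rule and does not implement the DeLong analytic confidence intervals (nor any special handling of tied scores):

```python
import numpy as np

def roc_curve(scores, labels):
    """TPR and FPR at every cutoff, sweeping scores from high to low.
    (Tied scores are not grouped, a simplification.)"""
    order = np.argsort(-scores)
    y = labels[order]
    tpr = np.concatenate([[0.0], np.cumsum(y == 1) / max((y == 1).sum(), 1)])
    fpr = np.concatenate([[0.0], np.cumsum(y == 0) / max((y == 0).sum(), 1)])
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(0.5 * np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1])))
```

A score list that ranks every positive above every negative gives an AUC of 1.0 (100 in the paper's convention); random ordering converges to 0.5.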
With several of our datasets, no definitive standard or clear-cut classification for positive expression exists. However, the aim is to show and test all reasonable classification thresholds of positive expression for each dataset in order to evaluate predictive performance as follows:
Training data
The outcomes are quantitative (activity level), so each ROC is calculated by normalizing within each dataset to the standard well, subject to the discussion in Methods 4a above (LepB for PhoA, and InvLepB for GFP) (examples in Figure 1d), for each possible threshold, i.e. each normalized expression value, with each AUC plotted in Figure 1e. 95% confidence intervals of Spearman's ρ are given by 10^6 iterations of a bias-corrected and accelerated (BCa) bootstrap of the data (Figure 1a,c)42.
Large-scale
ROCs were calculated for each of the expression classes (Figure 2e). Regardless of the split, predictive performance is noted. The binwidth for the histogram was determined using the Freedman-Diaconis rule, and scores outside the plotted range comprising < 0.6% of the density were implicitly hidden.
Small-scale
Classes can be defined in many different ways. To be principled about the matter, ROCs for each possible cutoff are presented based on definitions from each publication (Figure 3b,d,f, Figure S1b,d,f). See Methods 1 for any necessary details about outcome classifications for each dataset.
6 Feature Weights
Weights for the learned SVM are pulled directly from the model file produced by SVMlight and are given in Table S1 after normalizing to the mean value.
7 Forward Predictions
Data collection
We selected several genomes for comparison as shown in Figure 4, Figure S2a, and Table S3. Coding sequences of membrane proteins from human and mouse genomes were gathered by mapping Uniprot identifiers of proteins noted to have at least one transmembrane segment by Uniprot74 to Ensembl (release 82) coding sequences94 via Biomart.95 C. elegans coding sequences were similarly mapped via Uniprot but to WormBase coding sequences96 also via Biomart. S. cerevisiae strain S288C coding sequences97 were retrieved from the Saccharomyces Genome Database. P. pastoris strain GS115 coding sequences98 were retrieved from the DOE Joint Genome Institute (JGI) Genome Portal99. Those sequences without predicted24 TMs were excluded from subsequent analyses. Microbial sequences were gathered via a custom, in-house database populated with data compiled primarily from Pfam36, DOE JGI Integrated Microbial Genomes100, and the Microbial Genome Database101.
Feature calculation
Because of the sheer number of sequences, we did not calculate the features derived from the most computationally expensive calculation (whole-sequence mRNA pairing probability). Since predictive performance on the NYCOMPS dataset is slightly lower, but not significantly different at 95% confidence, in the absence of these features (Table S2), the forward predictions remain valid. For future experiments, these features can be calculated for the subset of targets of interest.
Parameter space similarity
As a first approximation of the similarity of the ∼90-dimensional sequence parameter space between two groupings, features were compared pairwise via the following metric. Let fi and gi represent the true distributions for a given feature i between two groups of interest. The distribution overlap, i.e. shared area, Δi is formalized as
Δi = ∫ min(fi(x), gi(x)) dx
ranging from 0, for entirely distinct distributions, to 1, for entirely identical distributions.
As written, fi and gi are probability densities, so they must be approximated before calculating Δi; this is done via kernel density estimates (KDE) of the observed samples using a nonparametric, locally adaptive method allowing for variable bandwidth smoothing implemented in LocFit102 (adpen = 2σ^2), providing the estimates f̂i and ĝi. The distribution overlap Δi is evaluated over a grid of 2^13 equally spaced points over the range of fi and gi.
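A sketch of this overlap calculation in Python, where a fixed-bandwidth Gaussian KDE with Silverman's rule stands in for LocFit's locally adaptive estimator (an assumption; the paper's adpen = 2σ^2 setting is not reproduced):

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Fixed-bandwidth Gaussian KDE evaluated on a grid (assumes the samples
    are non-degenerate, i.e. have nonzero standard deviation)."""
    s = np.asarray(samples, float)
    if bandwidth is None:
        bandwidth = 1.06 * s.std() * len(s) ** (-1 / 5)  # Silverman's rule
    z = (grid[:, None] - s[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(s) * bandwidth * np.sqrt(2 * np.pi))

def overlap(f_samples, g_samples, n_grid=2 ** 13):
    """Shared area of the two estimated densities on a 2^13-point grid."""
    both = np.concatenate([f_samples, g_samples])
    grid = np.linspace(both.min(), both.max(), n_grid)
    f = gaussian_kde_1d(f_samples, grid)
    g = gaussian_kde_1d(g_samples, grid)
    dx = grid[1] - grid[0]
    return float(np.minimum(f, g).sum() * dx)  # Riemann sum of min(f, g)
```

Identical samples give an overlap near 1 (slightly less, since KDE tails extend beyond the grid), while well-separated distributions give an overlap near 0.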
8 Availability
All analysis is documented in a series of R notebooks105 available openly at github.com/clemlab/ml-ecoli-svmrank. These notebooks provide fully executable instructions for the reproduction of the analyses and the generation of figures and statistics in this study. The ranking engine is available as a web service at clemonslab.caltech.edu. Additional code is available upon request.
Acknowledgements
We thank Daniel Daley and Thomas Miller's group for discussion, Yaser Abu-Mostafa and Yisong Yue for guidance regarding machine learning, Niles Pierce for providing NUPACK source code40, and Welison Floriano and Naveed Near-Ansari for maintaining local computing resources. We thank James Bowie, Michiel Niesen, Stephen Marshall, Thomas Miller, Reid van Lehn, and Tom Rapoport for critical reading of the manuscript. Models and analyses are possible thanks to raw experimental data provided by Daniel Daley and Mikaela Rapp19; Nir Fluman20; Edda Kloppmann, Brian Kloss, and Marco Punta from NYCOMPS2,3; Pikyee Ma25; Renaud Wagner27; and Florent Bernaudat31. We acknowledge funding from an NIH Pioneer Award to WMC (5DP1GM105385); a Benjamin M. Rosen graduate fellowship, a NIH/NRSA training grant (5T32GM07616), and a NSF Graduate Research fellowship to SMS; and an Arthur A. Noyes Summer Undergraduate Research Fellowship to NJ. Computational time was provided by Stephen Mayo and Douglas Rees. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-105357541.
References