Introduction

Template-based modeling (TBM) is by far the only reliable approach to protein 3D structure prediction1,2. With rapid accumulation of experimental structures in the Protein Data Bank (PDB)3, TBM plays an increasingly important role in protein structure determination and structure-based function annotation studies as more and more protein structures become available as putative templates. In fact, recent studies showed that the current PDB library has already approached completeness in structural space4,5. Nevertheless, only around 2/3 of targets can have the templates reliably identified by current threading (or fold-recognition) methods in genome-wide protein structure prediction6,7,8,9. A critical issue for protein template identification is the correct construction and scoring of the target-to-template alignments of amino acid sequences.

Early efforts on protein sequence alignments can be traced back to the 1970s when Needleman and Wunsch pioneered a global alignment algorithm for protein sequences via dynamic programming recursion10. Smith and Waterman extended the algorithm for identifying highly conserved subsequence motifs by local alignments11. However, dynamic programming is too slow for scanning large-scale sequence databases. Altschul, Lipman and coworkers developed FASTA and BLAST based on a heuristic search and extension of common sequence patterns (words) among the compared sequences, which significantly increases the speed of sequence alignment and database search12,13. Later, the authors extended BLAST to PSI-BLAST which improves the sensitivity of sequence-sequence alignments14. The key idea of PSI-BLAST is to generate multiple sequence alignments (MSAs) by iterative sequence database search, where a sequence profile in terms of a position-specific scoring matrix (PSSM) is constructed from the MSAs and used to enhance the accuracy of sequence alignment by sequence-profile comparisons.

The idea of sequence profiles has revolutionized the sequence alignment search and template-based protein structure prediction15,16. A variety of profile alignment based threading methods have been recently developed for efficient protein homologous template identification and structure prediction17,18,19,20,21; most of the methods rely on PSI-BLAST for MSA search and profile generations. The multiple sequence alignments and sequence profiles can also be created by hidden Markov models (HMMs), which are represented by a chain of match and insert/deletion nodes with the MSAs corresponding to the paths with the highest probabilities given by the product of amino acid emission and insertion/deletion probabilities22. Typical HMM-based threading algorithms include SAM23 and HHsearch24, where SAM is based on HMM-sequence alignments and HHsearch on HMM-HMM alignments.

In addition to sequence profiles, a variety of structure features have been recently introduced to improve the alignment accuracy. For example, secondary structure predictions from neural network training25 are used by almost all contemporary threading/alignment programs17,19,24. Other structural characteristics, including residue-residue contacts, backbone torsion angles26, solvation27 and residue depth19, are often exploited in protein threading approaches17,21,28,29.

Despite the extensive effort made in developing sequence alignment algorithms, little is known about the relative performance of the methods. In particular, a number of critical questions remain to be addressed; for example, what are the quantitative differences of sequence- versus profile-based or local- versus global-alignment methods on close- and distant-homology detections? As the two most often-used approaches, what are the strength and weakness of PSSM- and HMM-based profiles in sequence alignments? How much can we possibly gain in fold-recognition by developing the best structural feature prediction methods? The answer to these questions is of essential importance for guiding the uses within the biology community, as well as for leveraging future method development studies in the field.

As community-wide platforms, the CASP30,31, CAFASP32 and Livebench33 experiments provided valuable opportunities for critical assessments of various threading methods. However, one limit of the assessments is due to the fact that predictors in the experiments usually exploit different template libraries, the construction of which can have important impact on the final modeling results. Meanwhile, the number of targets involved in the experiments is limited (~100) and unbalanced; most of the targets are closely homologous to the experimental structures, which are easy to be detected34. The problems have been partly addressed by several of previous studies that compared different sequence alignment methods on large sets of benchmark proteins35,36,37,38,39,40,41,42. For example, Park et al41 and Madera and Gough42 compared HMM- and PSSM-based profiles on the datasets collected from SCOP43 and found that HMM-based profiles can detect more homologous relationships than PSSM-based profiles. Dunbrack and coworkers35,37 examined different sequence alignment tools using structure alignments as the gold standard and found that sequence-profile alignments by PSI-BLAST are only slightly more accurate than sequence-sequence alignments by BLAST but PSI-BLAST achieves much longer alignments. Girshin and coworkers36 evaluated the alignment methods in multiple reference-dependent/independent and global/local modes and showed that different aspects of evaluation reveal different properties of the methods. Barton and coworkers38 developed a multiple-level benchmark suite to evaluate eight alignment methods and concluded that the majority of alignment improvements since 1985 were due to pair-score matrices rather than algorithmic refinements. Elofsson40 compared different sequence alignment and threading algorithms and found that the alignment difference among different methods occurs mainly in the region of 15–20% sequence identity where secondary structure prediction and PSI-BLAST profiles are the major driven force of alignment improvements.

Despite the valuable insights revealed, most of the benchmark studies focused on a limited set of traditional sequence alignment algorithms and were performed nearly a decade ago. Many recent developments, e.g. structural feature integrations and HMM-HMM alignments which are important for protein structure prediction, are yet to be assessed. Meanwhile, the testing datasets used in these studies were mostly collected from the SCOP library and largely belong to the easy homology category (which represents a similar problem in the CASP experiments mentioned above), while the performance of the methods on detecting hard distant-homology templates, which are more challenging to the field, needs to be appropriately examined.

In this work, we aim to develop a comprehensive and balanced experiment to systematically examine the strength and weakness of various up-to-date sequence alignment methods. Ten publicly available methods and ten in-house methods specially designed for concept testing, which constitute a representative set of various alignment/threading approaches, were installed on the local computer cluster. These methods are tested on a large set of 538 proteins consisting of a balanced category distribution of difficulty (i.e. including similar number of easy, medium and hard protein targets), based on a uniform set of template structure libraries. We conducted a detailed analysis on the benchmark results to address a series of critical questions in sequence alignment and template-based protein structure prediction, which aim to provide insightful guidance for biological use and future method developments. All alignments and modeling data in this study can be downloaded at http://zhanglab.ccmb.med.umich.edu/publicdata/benchmark1.

Results

Dataset and template library

All the sequence alignment programs are benchmarked on the same set of 538 non-redundant proteins randomly collected from the PDB library3. These proteins have a pair-wise sequence identity less than 30% and length ranging from 34 to 804 residues. Proteins with broken chains or missing residues were not included. The sequences were divided into three categories: Easy, Medium and Hard targets, based on the consensus confidence score of the meta-threading LOMETS program44, which consists of 9 protein threading programs (dPPAS, MUSTER, HHsearch-I, HHsearch-II, PPAS, PROSPECT, SAM, SPARKS and SP3). A target is defined as Easy if at least one strong template hit can be detected for the target by each program with the Z-score higher than the confidence cutoff; a target is defined as Hard if none of the threading programs has a strong template hit; otherwise, it is considered a Medium target. In total, the 538 proteins are selected to include a balanced category distribution of difficulty with 137 Easy, 177 Medium and 224 Hard targets. Here, we have put more focus on the challenging targets by arbitrarily increasing the number of Medium and Hard proteins in our benchmark protein set, although a naturally collected sample from the PDB would have a much lower portion of Medium/Hard cases. A list of all the 538 proteins, together with the classification, are provided in http://zhanglab.ccmb.med.umich.edu/publicdata/benchmark1/protein_types.txt.

The existence of template structures in the library is a precondition for template identification. To eliminate potential bias of the alignment algorithms from the template structure library, we constructed the libraries of all threading programs using the same sequence identity cutoff updated to the same time stamp (by Jan, 2013). In fact, the template libraries for NW-align, SW-align, BLAST, PSI-BLAST, PSA, PPA, PPAS, dPPAS, MUSTER, SAM, PRC, PROSPECT, SPARKS, SP3 and FFAS are generated from the same set of non-redundant PDB proteins with a pair-wise sequence identify cutoff 70% (see http://zhanglab.ccmb.med.umich.edu/library/). The libraries for HHsearch-I and HHsearch-II are downloaded from ftp://toolkit.lmb.uni-muenchen.de, which has also a sequence identity cutoff of 70%. The size of these two libraries is about the same. The programs of all the tested methods are described in METHODS.

Summary of performance by individual alignment methods

Table 1 presents a summary of the 3D structural models, which are built by copying the framework of the highest ranked and the best in the top ten scoring templates based on the alignments generated by different alignment programs. The quality of alignments is generally measured by the root-mean-square deviation (RMSD) of the models (Columns 4–5), where BLAST, PSI-BLAST, PRC, FFAS and HH- search programs have the lowest RMSD to the targets (~7–9 Å). However, the alignments by these programs tend to have a smaller number of residues aligned (i.e. lower alignment coverage, Columns 6–7), typically below 80%. Such short alignments can have a negative impact on the full-length structure construction by homology modeling since structure information is missed for a large portion of unaligned sequences. In fact, the full-length models by MODELLER45 have a very high RMSD (>20 Å) for all these local alignment methods (see values in parentheses). Here, the full-length models ware generated by the script model-default.py in the MODELLER package. The modeling results from MODELLER are deterministic in the sense that more runs do not change the quality result of the final models.

Table 1 Summary of template identification by different alignment methods

In Columns 2–3, we also list the result of the alignment models on TM-score, which is defined to combine the alignment accuracy and coverage as a unique score46:

where L is the length of target sequence, Lali the number of aligned residues and di the pair-wise distance of ith residue in the model and target after the optimal superposition. In this scoring function, the programs, which have a better balance of alignment coverage and RMSD, excel, including MUSTER, HHsearch and PPAS programs. The simple sequence-to-sequence based alignment algorithms generally have a lower TM-score.

Meanwhile, the local-to-local alignment based algorithms generally have a lower coverage and TM-score compared to the global-to-global alignment methods. A typical example is SW-align based on Smith-Waterman, which only identifies the highly conserved regions and has on average 56% residues aligned, while Needleman-Wunsch based NW-align uses the same parameter and scoring function but generates alignments with a much higher coverage (84.9%). Accordingly, the TM-score of NW-align is 21.1% higher than that of SW-align. The completeness of alignment searching also plays a role in final model determination. For instance, both BLAST and SW-align are local sequence alignments based on BLOSUM62 mutation scores. But BLAST searches are based on an incomplete heuristic word search algorithm, which has an average TM-score 7% lower than SW-align. BLAST is however 39 times faster than SW-align in our test.

Although TM-score aims to balance the accuracy and coverage of alignments, it still favors algorithms that have a higher coverage, since including additional residues in the alignments always has a positive contribution to TM-score according to Eq. 1, although the contribution is small if the added residues from templates are far away from the target. To examine the effect of such bias, we constructed full-length models of the targets based on the alignments, using the widely-used comparative modeling tool MODELLER45. Although TM-score is now normalized by the same length of the target sequence, the TM-score ranking of full-length models is largely consistent with that of the original alignments, except for some small but notable variations. Taking the top hits as an example, the original alignments by HHsearch-I have a lower TM-score than those by dPPAS (0.422 vs. 0.426) due to the low coverage of the sequences (76.3% vs. 81.9%). After full-length modeling, the TM-score of HHsearch-I becomes higher than that of dPPAS (0.444 vs. 0.438) and several other related algorithms (e.g. PPAS and SP3). Here, the more precise alignments by HHsearch-I in the aligned regions have probably introduced some restraint/guidance to the structure modeling of the unaligned regions, e.g. through bond-length and change connectivity, which have resulted in models of a higher overall TM-score.

Performance of sequence alignment programs in different target categories

The performance of different alignment programs varies with the difficulty of the targets, i.e. the evolutionary distance between target and template proteins. If we use the target structure as a probe to search through the PDB library by TM-align47, the average TM-scores of aligned regions of the best structural templates in the three categories of Easy, Medium and Hard are 0.779, 0.666 and 0.586, respectively, after excluding homologous templates with a sequence identity > 30%. This data on one hand sets up an upper-bar for template identifications by fold-recognition; on the other hand, it demonstrates that the target category as defined by the LOMETS prediction is largely consistent with the actual difficulty of the template identification for the targets.

In supplementary Tables S1, S2 and S3, we summarize the results of different programs on the Easy, Medium and Hard targets, respectively. Figure 1 is the histogram of the average TM-score achieved by different programs. As shown in Table S1, HHsearch programs generate the highest TM-score in the Easy targets. MUSTER and other structure-assisted alignment methods (dPPAS, SP3 etc) generally outperform the HHsearch programs in the Medium and Hard targets. This data demonstrates the usefulness of structure-based features in detecting the distant homologous templates.

Figure 1
figure 1

TM-score histogram of the top hits identified by different algorithms in Easy, Medium and Hard categories.

Specificity of alignment programs

Except for the accuracy of the template alignments (or sensitivity), the specificity of the alignments (i.e. the correlation of the scoring function and the accuracy of the final alignments) is another important measurement of the alignment algorithms, as this correlation essentially decides how the results can be used in the comparative structural modeling and function annotations.

In Figure 2, we present the TM-score data of the highest ranked alignment models versus the alignment scores by the 20 alignment programs. Here, we tried both the default alignment score of the programs and the Z-score (defined as the difference between the raw alignment score and average in units of standard deviation) and chose the one with the highest correlation to the TM-score of the final models to present in the plot. As expected, positive correlations are observed for all the alignment programs, with PPAS, SAM and MUSTER having the highest Pearson correlation coefficients (0.789, 0.787 and 0.782, respectively). The NW-align and BLAST programs have the lowest correlation coefficient because a number of targets have a high alignment score but with low quality (TM-score < 0.5), indicating a low specificity of the programs.

Figure 2
figure 2

TM-score of full-length models of 20 threading methods on 538 non-homologous proteins versus the alignment scores.

Easy, Medium and Hard targets are colored blue, green and red, respectively. PSI-BLAST, BLAST and PRC use bit score and others use z-score to score the alignments.

We also mark in Figure 2 an alignment score cut-off that minimizes the false positive rate, FPR = FP/(FP + TN) and the false negative rate, FNR = FN/(TP + FN), where a model of TM-score > 0.5 is defined as a positive hit that has the correct fold48. The score cut-offs, false positive and false negative rates are listed in Table 2. The programs with alignment score that are calibrated by the statistics of random samples, including PSI-BLAST, SAM and FFAS, have the lowest FPR + FNR values, i.e. the highest specificity, based on this measurement. Meanwhile, the Easy and Hard targets are clearly grouped in the right-up and left-bottom regions in Figure 2 for all programs, demonstrating the dependence of the performance of the alignment algorithms on the evolutionary distance of target and templates.

Table 2 Score cutoffs and false positive and negative rates of different programs

Profile-based alignments versus sequence-based alignments

Depending on whether the homologous sequences are included in the target-template alignments, the alignment methods can be grouped into the three general categories of sequence-to-sequence alignment (including NW-align, SW-align and BLAST), sequence-to-profile alignment (PSI-BLAST, SAM and PSA) and profile-to-profile alignment (PRC, HHsearch-I, HHsearch-II, PPA, PPAS, dPPAS, MUSTER, PROSPECT, SPARKS, SP3 and FFAS). Since the sequence profiles derived from multiple sequence alignment of protein families contain important information of conserved/diverged locations along the sequences, the profile-based alignments can generally generate more accurate target-template alignments than that made by single sequence-based alignments16,49.

Such insight is also observed in our data analysis. As shown in Table 1 (rows highlighted in bold), the average TM-score obtained by the sequence-to-profile based methods is 18.4% higher than the TM-score from the sequence-to-sequence based methods. Similarly, the TM-score from profile-to-profile alignment methods is 49.8% higher than that of sequence-to-sequence based methods. These increases in TM-score are not only due to the higher coverage of alignments (81.8% vs. 62.6%), but also the enhanced accuracy of alignments as the average RMSD is reduced slightly in the profile-profile methods from 10.4 to 10.2 Å. This tendency is also seen in Tables S1,S2,S3 where the targets were categorized into different groups of Easy, Medium and Hard, demonstrating that the profile-based alignments enhance both close and distant homology identifications.

Two types of sequence profiles, PSSM and HMM, are often exploited in various alignment methods. Given a MSA, the PSSM profile is designed to account for the estimated frequency of amino acids at each position, while the HMM profile accounts for both amino acid frequency and position-specific probabilities for insertion and deletion. Although the HMMs seem to contain additional gap information from MSAs, there is no obvious difference between the HMM-based (e.g. HHsearch-I and –II) and PSSM-based alignment algorithms (e.g. PPAS), in terms of the TM-score of the alignment models (Table 1). However, HMM-based methods did generate higher TM-scores than PSSM-based methods in the Easy targets (Table S1). Additionally, the HMM-based methods have generally a lower RMSD and lower coverage of alignments, indicating that the HMM method is more sensitive in detecting local structural motifs and scaffolds.

Meanwhile, there are a number of targets that have the correct templates identified by either HMM- or PSSM-based methods (but not both), demonstrating that these two types of methods are complementary to each other. This complementarity from multiple alignment algorithms is essential to the success of meta-server based structure modeling approaches44,50.

How much space is left for improvement by structural feature prediction?

The performance of profile alignments could be further improved by incorporating structural information. One example is secondary structure comparison, which has been used by almost all contemporary alignment/threading methods to guide the target-template alignments. As a quantitative test of the impact of secondary structure information on alignment accuracy, we developed two sequence profile-profile based methods, PPA and PPAS, where the only difference is that PPAS contains a secondary structure match in the scoring function but PPA doesn't (see Eqs. 3 and 4 in METHODS). As a result, the inclusion of secondary structure prediction increases the TM-score of PPA by 6.8%. MUSTER is another typical profile-profile alignment based algorithm that incorporates multiple composite structure features in the alignments, including secondary structure, residue depth, solvent accessibility and backbone torsion angle predictions. These features result in a TM-score increase of 9.6% compared to the PPA method, corresponding to a p-value < 10−14 in paired student t-test.

The performance of the structure feature assisted algorithms relies on the accuracy of structure feature predictions for the target sequence. In our test on the 538 non-homologous proteins, the average Q3 score (three structure states per residue overall accuracy) for PSSpred and PSI-pred is 83.1% and 80.7%, respectively; the mean absolute errors in ψ and φ angle predictions are 28° and 41°, respectively; and the Pearson correlation coefficient correlation between predicted and actual solvent accessibility is 0.678. Incorrect assignments of the structure features can compromise the performance of MUSTER. In fact, we observed a number of cases where the TM-score of the alignments by MUSTER, which considers additional structural features, is lower than that of PPAS.

In order to explore the potential of the alignment improvement obtained by considering structural features, we implemented MUSTER using the native structure features derived from the target structures, where the weighting parameters were re-optimized in a separate test of 100 proteins. As shown in Table 1, the average TM-score of the full-length models from MUSTER alignments showed a gradual increase from 0.449 to 0.511, when we exploited more native structure features from secondary structures (MUSTERSS), backbone torsion angle (MUSTERSS+BTA) and solvent accessibility (MUSTERSS+BTA+SA). This change corresponds to an overall increase of 13.8% in the average TM-score.

In Figure 3, we present an illustrative example from the PP7 bacteriophage coat protein (PDB ID: 2qudA), which has the secondary structures arranged as β1-β2-β3-β4-β5-β6-α1-α2 from the N- to C-terminals. The PSI-pred method has however mis-assigned most secondary structure elements in 12S-90M (see ‘*’ in Figure 3A), which resulted in the first two beta-strands (7T-11S and 15A-25T) in the target mis-aligned to the coiled regions in the template 1qbeB by MUSTER. This alignment has a TM-score = 0.607. When using the actual secondary structure assignment, the MUSTERSS program correctly matches the two beta strands of the target with the strands on the template. Based on the same template, the correction of the secondary structure comparison increases the TM-score of the model to 0.7 in this example.

Figure 3
figure 3

The illustration of template identifications for 2qudA.

(A) MUSTER with predicted secondary structure; (B) MUSTERSS with native secondary structure. The experimental structure and MUSTER models are shown in red and green cartoons, respectively and the first two beta-strands in (A) are in yellow on the template. The secondary structures are labeled as ‘C’ for coil, ‘E’ for strand and ‘H’ for helix, where ‘Pred’ and ‘Obs’ denotes the PSI-pred prediction and the native, respectively. ‘*’ in (A) marks the residues with mis-predicted secondary structure.

Despite the significant increase in alignment accuracy brought by the integration of structure features, the quality of the alignments using the best structure features from the native is still far from the best templates detected by structural alignments, i.e. the average TM-score by TM-align is 37.1% higher than that by MUSTERSS + BTA + SA (Table 1), which demonstrates considerable room for further alignment improvement. The gap is relatively small in Easy targets (7.4%) according to the data in Table S1, which indicates that the current state-of-the-art alignment methods generate nearly optimal alignments for close homology targets. But for the Medium and Hard targets, the gaps become highly significant, which correspond to a TM-score difference of 35.6% and 79.2%, respectively (Tables S2 and S3). Apparently, such gaps cannot be filled by solely improving the structure feature prediction methods and a completely different alignment system based on novel scoring and alignment schemes might be required.

Discussion

We developed a comprehensive experiment to systematically examine the strength and weakness of 20 representative sequence alignment methods for template-based protein structure prediction. The data analysis demonstrates the dominant advantage of profile-profile based alignment methods in protein template identification, which generates structural models with an average TM-score 26.5% higher than sequence-profile alignments and 49.8% higher than single sequence-sequence alignments. The superiority of profile-based alignments over sequence-based alignments was also observed in previous benchmark studies35,51.

The sequence profiles are typically constructed by PSI-BLAST and hidden Markov model (HMM) searches, where the former is usually specified by a position-specific scoring matrix (PSSM) and the latter by a trained chain model of matches and insertions/deletions. Our data analysis showed that there is no obvious difference between PSSM and HMM profiles in terms of overall average TM-score, although the HMM based alignments tend to obtain higher TM-scores in Easy targets and generate alignments of higher accuracy but with lower alignment coverage. This data seems in contradiction to the results by Park et al41 and Madera and Gough42 who concluded that the HMM based methods consistently outperform PSI-BLAST. We believe that the major reason for the seeming contradiction is due to the difference in sample preparations. In our testing dataset, we intentionally included more medium and hard targets to keep a balanced category distribution in difficulty and all templates with a sequence identity > 30% to the target were excluded. In the experiments by Park et al and Madera and Gough, however, the authors collected large-scale proteins from SCOP without intention to include more hard proteins. In addition, the authors used a sequence identity cutoff 40% for template filtering, which includes homologous templates with a sequence identity in 30–40% that are easy to detect by most threading methods. Therefore, it is anticipated that most of the test proteins in these two studies should correspond to Easy targets in our categorization and their conclusion on the HMM and PSSM profile comparisons is in fact consistent with our analysis on the Easy proteins (Table S1).

The profile-based sequence alignments can be considerably improved by the combination of structure feature predictions. For example, the program of MUSTER, which combines profile alignments with sequence-based secondary structure, residue depth, torsion angle and solvation predictions, has a 9.6% higher TM-score on average when compared to the profile-profile alignment algorithms. The performance of structure-assisted methods relies on the accuracy of the sequence-based structure feature predictions, which can be further improved by nearly 10.8% (or 13.8% in full-length models) if the native structure features extracted from experimental structures are exploited. Nevertheless, the latter is still far worse from the best possible templates as identified by structural alignment program TM-align, which uses the target structure as a probe to generate the optimal alignments. In the Easy, Medium and Hard categories, the TM-score by TM-align is 7.4%, 35.6% and 79.2% higher than that of MUSTERSS + BTA + SA. While filling such a big gap is one of the most urgent goals in template-based protein structure prediction, it apparently cannot be achieved solely by the improvement of structure feature prediction methods. New algorithms with completely novel scoring functions and alignment search engines are probably needed to attack the central problem of sequence alignment, which is essential to template-based protein structure prediction and function annotations.

Methods

Twenty threading/alignment methods, covering different categories of target-template alignment algorithms and possible to install at local computers, are benchmarked in this article. All algorithms without cited references are newly developed in house and first presented in this study.

1. NW-align

NW-align is a sequence-to-sequence alignment program constructed based on the standard Needleman-Wunsch dynamic programming algorithm10. The amino acid mutation matrix is from BLOSUM6252 with gap opening penalty = −11 and gap extension penalty = −1.

2. SW-align

SW-align is a sequence-to-sequence alignment program using a similar setting as NW-align but with dynamic programming based on the standard Smith-Waterman algorithm53. The major difference from NW-align is that the negative score values are set to zero and the alignment trace-back procedure starts from the highest scoring cell and ends with a cell of zero score in SW-align. This setting allows SW-align to identify subsequence motifs having the highest local sequence similarity. The source codes and the executables of both NW-align and SW-align are available at http://zhanglab.ccmb.med.umich.edu/NW-align.

3. BLAST

BLAST13 is a local sequence alignment tool based on a heuristic searching method, where high-scoring segment pairs (HSPs, or words) are first identified by gapless comparisons. The final alignments are constructed by extension and connection of the HSP regions. The heuristic algorithm in BLAST is often suboptimal but much faster than the optimal dynamic programming algorithms.

4. PSI-BLAST

PSI-BLAST14 is a sequence-to-profile alignment program extended from BLAST which aims to increase the alignment sensitivity of distant homologous proteins by iterative MSA search. It first collects a list of close homologous sequences from a reference database (e.g. NCBI non-redundant sequence database, NR) by BLAST. A PSSM is then derived from the MSA of the sequence homologies, which is used to search against the reference database again to identify a newer set of homologous sequences. The procedure can be repeated a number of times until the PSSM profiles converge. In our test, PSI-BLAST was searched against NR database for 3 iterations using an E-value cutoff, which assesses the significance of the HSP score, below 0.001.

5. PSA

PSA is sequence-to-profile alignment algorithm based on the Needleman-Wunsch dynamic programming. The scoring function of the ith position in the query (q) aligned with the jth position in the template (t) is

where Fq(i, k) represents the frequency profile of kth amino acid at ith position of the query. Bt(k, j) denotes a BLOSUM mutation score between the amino acid k and jth residue of the template. The shift parameter is introduced to avoid the alignment of unrelated residues in the local regions. Parameters of shift (−0.01), gap opening (go, −8.6) and gap extension (ge, −0.9) penalties were optimized on the ProSup dataset54.

6. PPA

PPA is an in-house profile-profile alignment method on the Needleman-Wunsch algorithm. The scoring function is defined by

where Fq(i, k) and Lt(j, k) stand for the sequence frequency profile of query and the log-odds profile of template, respectively. To build the sequence profiles, the sequences are searched against the NR by 3 PSI-BLAST iterations, at an E-value cutoff 0.001. The Henikoff weighting scheme55 is then used to generate frequency or log-odds profiles. Similarly, the parameters of shift (−0.94), go (−6.8) and ge (−0.52), are optimized by trial and error using the ProSup dataset.

7. PPAS

PPAS is an in-house profile-profile alignment method that combines profile log-odds score and secondary structure comparison. The scoring function is defined by

where ScorePPA(i, j) is defined in Eq. 3, δ(Sq(i), St(j)) is the Kronecker delta function to assess the secondary structure match between target and template. Sq(i) is the secondary structure of the ith residue on the target predicted by PSSpred (http://zhanglab.ccmb.med.umich.edu/PSSpred) and St(j) is the secondary structure of the jth residue on the template structure assigned by DSSP. A position-specific gap penalty scheme is used in the alignment search, i.e. no gap is allowed inside the secondary structure regions, go and ge penalties apply to other regions and the ending gap-penalty is neglected. The four parameters C1 (0.65), shift (−0.96), go (−7.0) and ge (−0.54), are optimized for PPAS in a similar way as PPA.

8. dPPAS

dPPAS is an in-house profile-profile alignment program extended from PPAS. The only difference from PPAS is that a structure fragment depth profile is added in dPPAS to enhance the alignments, i.e.

where Fq(i, k) and Lt(j, k) are defined in Eq. 3. Lstr(j, k) is a frequency depth profile derived from a set of structural fragments that have similar depth as the fragment at jth position of the template17,19. Similarly, the parameters (C1 = 6.5, shift = −0.96, go = −7.0 and ge = −0.54) are optimized on the ProSup dataset.

9. MUSTER

MUSTER17 is a profile-profile based threading program which combines multiple sequence and structure matching information. In addition to the sequence profiles obtained by PSI-BLAST searches, the scoring function contains secondary structure match (SS), fragment depth profiles, solvent accessibility (SA), backbone torsion angles (BTA) and hydrophobic scoring matrix. The optimal alignment is generated by Needleman-Wunsch dynamic programming. Compared to dPPAS, MUSTER contains additional terms from SA, BTA and hydrophobic scoring matrix matches, whereby the weighting parameters are re-trained by a new dataset.

To further examine the potential of structure-assisted threading algorithms, we developed three variants of MUSTER programs, MUSTERSS, MUSTERSS + BTA and MUSTERSS + BTA + SA, which exploit the SS, BTA and SA features extracted from the experimental structures of the target. Similar to MUSTER, all parameters in these algorithms are optimized in a separate training set of 100 non-redundant proteins by maximizing the TM-score.

10. SAM

SAM56 is a hidden Markov model (HMM) based protein fold-recognition method. Starting from the PSI-BLAST search, SAM constructs a HMM profile based on the iterative MSA searches. The HMM profile is then used to search through the PDB library to identify structural templates. SAM can conduct both local and global alignment searches and we use the local alignment mode in this work.

11. PRC

PRC57 is a program for scoring and aligning profile HMMs. To run PRC, we first construct HMMs of both target and template sequences by SAM56. The HMM–HMM based alignments are then computed and ranked by PRC which is designed to find the Viterbi path that maximizes the sum of forward–backward odds scores.

12. HHsearch-I and HHsearch-II

HHsearch24 is a HMM-HMM based alignment program which combines the profile log-odds score and the secondary structure prediction in the Viterbi dynamic programming. We run two versions of HHsearch: HHsearch-I uses PSI-BLAST to start the MSA search for building the profile HMMs for target and template sequences, while HHsearch-II uses HHblits to construct the profile HMM for target sequences. HHblits uses a discretized-profile prefilter that can generate HMM profiles faster than PSI-BLAST58. The final query-template alignments are constructed by the same HHsearch program. Both HHsearch-I and HHsearch-II are in the local alignment mode.

13. PROSPECT

PROSPECT21 is a sequence profile-profile alignment algorithm assisted with a residue-level contact potential and SS predictions. A global optimization of target-template alignment is generated by the divide-and-conquer searching method.

14. SPARKS and SP3

Both SPARKS18 and SP319 were developed in Zhou Lab. In SPARKS, the authors exploit a sequence profile–profile alignment combined with a single-body statistical potential; in SP3, they use a residue depth dependent structure profile to replace the single-body potential used in SPARKS.

15. FFAS

FFAS59 is a sequence profile-profile based alignment program. It calculates the sequence profile by PSI-BLAST searching against the NR85s database with 5 iterations. A dot-product scoring function is then used to align two sequence profiles. The alignment score is finally translated into a statistical measure by comparing it with the distribution of scores obtained for pairs of unrelated proteins.