Journal of Molecular Biology
Volume 292, Issue 2, 17 September 1999, Pages 195-202

Communication
Protein secondary structure prediction based on position-specific scoring matrices¹

https://doi.org/10.1006/jmbi.1999.3091

Abstract

A two-stage neural network has been used to predict protein secondary structure based on the position-specific scoring matrices generated by PSI-BLAST. Despite the simplicity and convenience of the approach, the results are superior to those produced by other methods, including the popular PHD method, both in our own benchmarking and in the recent Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP3), where the method was evaluated by stringent blind testing. Using a new testing set based on 187 unique folds, and three-way cross-validation based on structural similarity criteria rather than the sequence similarity criteria used previously (so that no similar folds were present in both the testing and training sets), the method presented here (PSIPRED) achieved an average Q3 score of between 76.5 % and 78.3 %, depending on the precise definition of observed secondary structure used. This is the highest published score for any method to date. Given the success of the method in CASP3, it is reasonable to be confident that the evaluation presented here gives a fair indication of the performance of the method in general.
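The cross-validation described above keeps structurally similar chains from appearing in both training and testing sets. The following Python sketch shows one way to build such fold-grouped three-way splits. It assumes each chain already carries a fold label; the `chain_folds` mapping, the function name, and the greedy balancing rule are illustrative assumptions, not the procedure actually used for PSIPRED.

```python
from collections import defaultdict


def fold_grouped_splits(chain_folds, k=3):
    """Yield (train, test) lists of chain IDs so that no fold label
    appears in both sets.

    chain_folds: dict mapping a chain ID to a fold label (hypothetical
    labels; any structural classification could supply them).
    """
    # Group chains by fold so that a fold is never split across partitions.
    by_fold = defaultdict(list)
    for chain, fold in chain_folds.items():
        by_fold[fold].append(chain)

    # Greedy balancing (an assumption): place the largest folds first,
    # each into the partition that currently holds the fewest chains.
    partitions = [[] for _ in range(k)]
    for fold in sorted(by_fold, key=lambda f: (-len(by_fold[f]), str(f))):
        smallest = min(range(k), key=lambda i: len(partitions[i]))
        partitions[smallest].extend(by_fold[fold])

    # Each partition serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(partitions[i])
        train = sorted(c for j, p in enumerate(partitions) if j != i for c in p)
        yield train, test


# Usage with made-up chain IDs and fold labels:
folds = {"1abc": "TIM", "2def": "TIM", "3ghi": "Ig",
         "4jkl": "Rossmann", "5mno": "Ig", "6pqr": "globin"}
for train, test in fold_grouped_splits(folds):
    print(test, "tested against a model trained on", train)
```

Grouping by fold rather than by sequence identity is what prevents a test chain from having a structural relative in the training set, even when the two share little detectable sequence similarity.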

Section snippets

Method

The prediction method (illustrated in Figure 1) is split into three stages: generation of a sequence profile, prediction of initial secondary structure, and finally the filtering of the predicted structure.
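The first network stage can be illustrated by how a sliding-window network sees a sequence profile. The sketch below is a minimal, hypothetical input encoder, not PSIPRED's own code: it assumes an L × 20 profile of raw log-odds scores, squashes each value with a logistic function, and uses a window of 15 residues with one extra flag per window position marking overhang past the chain ends. The window width, the scaling, and the helper names are assumptions for illustration. The same function is reused to encode input for the filtering (second) network, whose inputs here are the first network's three-state outputs.

```python
import math


def logistic(x):
    """Squash a raw profile score into the range (0, 1).

    Assumes scores of modest magnitude (log-odds values), so exp() cannot overflow.
    """
    return 1.0 / (1.0 + math.exp(-x))


def window_features(rows, width=15):
    """Build one flat input vector per residue from a centred sliding window.

    rows: list of per-residue feature lists, e.g. 20 scaled profile values
    for the first network, or 3 helix/strand/coil outputs for the second.
    Each window position contributes its features plus one flag that is
    1.0 when the position lies beyond either chain terminus (its features
    are then zero-padded) and 0.0 otherwise.
    """
    if width % 2 == 0:
        raise ValueError("window width must be odd so it can be centred")
    half = width // 2
    n_feat = len(rows[0])
    vectors = []
    for centre in range(len(rows)):
        vec = []
        for pos in range(centre - half, centre + half + 1):
            if 0 <= pos < len(rows):
                vec.extend(rows[pos])
                vec.append(0.0)  # position is inside the chain
            else:
                vec.extend([0.0] * n_feat)
                vec.append(1.0)  # window overhangs a terminus
        vectors.append(vec)
    return vectors


# Stage 1 input: a hypothetical 30-residue profile of all-zero scores.
pssm = [[0] * 20 for _ in range(30)]
scaled = [[logistic(v) for v in row] for row in pssm]
stage1_inputs = window_features(scaled)
print(len(stage1_inputs), len(stage1_inputs[0]))  # 30 315

# Stage 2 input: placeholder three-state outputs from the first network.
stage1_outputs = [[0.2, 0.3, 0.5]] * 30
stage2_inputs = window_features(stage1_outputs)
print(len(stage2_inputs[0]))  # 60
```

With 20 profile values plus a flag, each first-stage vector in this sketch has 15 × 21 = 315 inputs, and windows of three-state outputs give 15 × 4 = 60 inputs per residue for the second network. The networks themselves and their trained weights are not shown.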

Results

Figure 2(a) and (b) show the distributions of Q3 scores and Sov3 scores (Rost et al., 1994) for the testing set of 187 protein chains. Averaged by chain, the Q3 score for these 187 proteins is 76.0 % with a standard deviation of 7.8 %, and the average Sov3 score is 73.5 % with a standard deviation of 12.7 %. Taken by residue (i.e. averaging with weighting by sequence length), the average Q3 score is 76.5 %. Using the simpler DSSP mapping, which results in a higher …
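Q3 is the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed one. The results above quote it two ways, averaged over chains and taken by residue. The minimal Python sketch below (hypothetical helper names and toy data) shows why the two figures can differ. Sov3, a segment-overlap measure, is more involved and is not reproduced here.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def average_q3(chains):
    """Return (mean Q3 by chain, Q3 by residue) for (predicted, observed) pairs."""
    per_chain = [q3(p, o) for p, o in chains]
    by_chain = sum(per_chain) / len(per_chain)
    # Weighting each chain's Q3 by its length is equivalent to pooling residues.
    total = sum(len(o) for _, o in chains)
    correct = sum(sum(a == b for a, b in zip(p, o)) for p, o in chains)
    by_residue = 100.0 * correct / total
    return by_chain, by_residue


chains = [
    ("HHHHCCEEEC", "HHHHCCEEEE"),  # 9 of 10 residues correct -> 90.0
    ("CCHH", "CCCC"),              # 2 of 4 residues correct  -> 50.0
]
by_chain, by_residue = average_q3(chains)
print(by_chain)              # 70.0
print(round(by_residue, 1))  # 78.6
```

In this toy example the longer, better-predicted chain dominates the by-residue figure, so it comes out higher than the per-chain mean.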

Conclusions

At this stage it is not yet clear which factors contribute most to the success of the PSIPRED method, and work is currently underway to compare the results obtained from PSIPRED with those obtained from other methods using the same input profiles. Three aspects of the PSI-BLAST program no doubt contribute, perhaps equally, to the success of PSIPRED. First, the alignments produced by PSI-BLAST are based on pairwise local alignments. Previous work (Frishman and Argos, 1997) …

Availability

The PSIPRED Web server, along with the software and test sets used here, may be obtained electronically from the following address: http://globin.bio.warwick.ac.uk/psipred.

Acknowledgements

This work was supported by The Royal Society.

References (30)

  • M.J.J.M. Zvelebil et al. (1987). Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J. Mol. Biol.
  • S.F. Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res.
  • S.A. Benner et al. (1990). Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: a prediction of the structure of the catalytic domain of protein kinases. Advan. Enzyme Reg.
  • J.U. Bowie et al. (1991). A method to identify protein sequences that fold into a known three-dimensional structure. Science.
  • P.Y. Chou et al. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry.

¹ Edited by G. Von Heijne