Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

Yuedong Yang; Jianzhao Gao; Jihua Wang; Rhys Heffernan; Jack Hanson; Kuldip Paliwal; Yaoqi Zhou

doi:10.1093/bib/bbw129

Brief Bioinform. 2018 May 1;19(3):482-494. doi: 10.1093/bib/bbw129.

Authors

Yuedong Yang¹, Jianzhao Gao², Jihua Wang³, Rhys Heffernan⁴, Jack Hanson⁴, Kuldip Paliwal⁴, Yaoqi Zhou¹,³

Affiliations

¹ Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD 4222, Australia.
² School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
³ Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China.
⁴ Signal Processing Laboratory, Griffith University, Brisbane, 4122, Australia.

Abstract

Protein secondary structure prediction began in 1951, when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods are breathing new life into this field. The highest three-state accuracy without relying on structure templates is now at 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88-90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and of Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improving accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins, as predicted by SPIDER2, have <6 Å root-mean-squared distance from their native conformations. More powerful deep learning methods with improved capability of capturing long-range interactions are beginning to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
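
The three-state accuracy quoted in the abstract is commonly reported as Q3: the fraction of residues whose predicted state (helix H, strand E or coil C) matches the observed state. The sketch below is only an illustration of that measure, not the authors' evaluation code. The function names are hypothetical, and the eight-to-three-state reduction (H, G, I to H; E, B to E; everything else to C) is one widely used convention that is assumed here, not taken from the abstract.

```python
# Illustrative sketch only: names and the 8-to-3 state mapping are assumptions.

# Assumed reduction of DSSP-style eight-state codes to three states.
HELIX_CODES = {"H", "G", "I"}
STRAND_CODES = {"E", "B"}


def reduce_to_three_states(eight_state: str) -> str:
    """Map an eight-state string to H (helix), E (strand) or C (coil)."""
    reduced = []
    for code in eight_state:
        if code in HELIX_CODES:
            reduced.append("H")
        elif code in STRAND_CODES:
            reduced.append("E")
        else:
            # Turns, bends and unassigned residues are treated as coil.
            reduced.append("C")
    return "".join(reduced)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    observed = reduce_to_three_states("-HHHHEEET-")
    predicted = "CHHHCEEECC"
    print(f"Q3 = {q3_accuracy(predicted, observed):.2f}")
```

With this toy ten-residue example, nine of the ten states agree, so the reported Q3 is 0.90. Published benchmarks average this quantity over large non-redundant test sets.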

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computational Biology / methods*
Databases, Protein
Humans
Models, Theoretical*
Neural Networks, Computer*
Protein Structure, Secondary*
Proteins / chemistry*

Substances

Proteins