Improved the Protein Complex Prediction with Protein Language Models

Bo Chen; Ziwei Xie; Jiezhong Qiu; Zhaofeng Ye; Jinbo Xu; Jie Tang

doi:10.1101/2022.09.15.508065

Abstract

AlphaFold-Multimer has greatly improved protein complex structure prediction, but its accuracy depends on the quality of the multiple sequence alignment (MSA) formed by the interacting homologs (i.e., interologs) of the complex under prediction. Here we propose a novel method, denoted ESMPair, that identifies interologs of a complex by making use of protein language models (PLMs). We show that ESMPair generates better interologs than the default MSA generation method in AlphaFold-Multimer. Our method yields better complex structure prediction than AlphaFold-Multimer by a large margin (+10.7% in terms of the Top-5 best DockQ), especially when the predicted complex structures have low confidence. We further show that combining several MSA generation methods can yield even better complex structure prediction accuracy than AlphaFold-Multimer (+22% in terms of the Top-5 best DockQ). We systematically analyze the factors that affect our algorithm and find that the diversity of the interolog MSA significantly affects prediction accuracy. Moreover, we show that ESMPair performs particularly well on complexes in eukaryotes.
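The abstract reports gains in "Top-5 best DockQ" without defining the metric here. The sketch below shows one plausible reading, assuming it means the highest DockQ among the five top-ranked models for each target, averaged over targets, and assuming the quoted percentages are relative gains over the baseline. The function names, target names, and scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def top_k_best(scores_by_target, k=5):
    """Average, over targets, of the best DockQ among the first k ranked models.

    Assumption: each list is already ordered by the predictor's own ranking.
    """
    return mean(max(scores[:k]) for scores in scores_by_target.values())


def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent (assumed reading)."""
    return 100.0 * (new - baseline) / baseline


# Made-up DockQ values, purely for illustration.
baseline_scores = {
    "target_a": [0.20, 0.25, 0.10, 0.22, 0.18, 0.90],  # 6th model is ignored for k=5
    "target_b": [0.50, 0.45, 0.55, 0.40, 0.52],
}
new_scores = {
    "target_a": [0.30, 0.35, 0.28, 0.33, 0.31],
    "target_b": [0.50, 0.55, 0.48, 0.51, 0.53],
}

baseline_top5 = top_k_best(baseline_scores)
new_top5 = top_k_best(new_scores)
gain = relative_gain_percent(new_top5, baseline_top5)
print(f"baseline={baseline_top5:.3f} new={new_top5:.3f} gain={gain:+.1f}%")
```

Under this reading, a model ranked outside the top five does not count even if it has the highest DockQ, which is why the sixth score for `target_a` is ignored.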

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

1. We renamed our method ESMPair, because our MSA pairing method depends heavily on ESM-MSA-1b (MSA Transformer); 2. We reorganized the paper into Abstract, Introduction, Results, Methods, Conclusion, and Appendix, following the Nature style; 3. We redrew the overview framework figure to add more detail on column attention estimation, as shown in Fig. 1 of the paper; 4. We added case studies on 74 newly released targets and one special unresolved case, as shown in Fig. 2 of the paper.