Protein contact map prediction using multiple sequence alignment dropout and consistency learning for sequences with fewer homologs

Xuyang Liu; Lei Jin; Shenghua Gao; Suwen Zhao

doi:10.1101/2021.05.12.443740

Abstract

Accurate prediction of protein contact maps requires a sufficiently large normalized number of effective sequences (Nf) in the multiple sequence alignment (MSA). When Nf is small, the predicted contact maps are often unsatisfactory. To address this problem, we randomly selected a small subset of the sequence homologs of proteins with large Nf to generate MSAs with small Nf. Input features generated from these subsampled MSAs were passed through a consistency learning network, which was trained to produce the same predictions as those obtained from features generated from the original MSA with large Nf. The results showed that this method effectively improves the accuracy of contact map prediction for proteins with small Nf.
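The abstract does not give exact formulas, so the following is only a minimal sketch of the three ingredients it names, under stated assumptions. Nf is computed with a commonly used definition: each sequence is down-weighted by the number of MSA sequences sharing at least 80% identity with it, and the summed weights (Neff) are divided by the square root of the alignment length. "MSA dropout" is modeled as keeping the query plus a random fraction of the homologs, and the consistency term is assumed to be a mean squared error between contact probabilities predicted from the subsampled and the full MSA. All function names are hypothetical, the network itself is omitted (plain matrices stand in for its outputs), and this is not the authors' implementation.

```python
import math
import random


def effective_sequences(msa, identity_threshold=0.8):
    """Neff: sum of per-sequence weights (assumed definition).

    Each weight is 1 / (number of sequences, including itself, whose identity
    with this sequence is >= identity_threshold). Gaps count as ordinary
    characters here, which is a simplifying assumption.
    """
    length = len(msa[0])
    total = 0.0
    for s in msa:
        n_similar = sum(
            1
            for t in msa
            if sum(a == b for a, b in zip(s, t)) / length >= identity_threshold
        )
        total += 1.0 / n_similar
    return total


def normalized_nf(msa, identity_threshold=0.8):
    """Nf = Neff / sqrt(L), with L the alignment length (assumed normalization)."""
    return effective_sequences(msa, identity_threshold) / math.sqrt(len(msa[0]))


def msa_dropout(msa, keep_fraction, rng):
    """Keep the query (row 0) plus a random subset of the remaining homologs."""
    homologs = msa[1:]
    if not homologs:
        return list(msa)
    k = max(1, int(round(keep_fraction * len(homologs))))
    return [msa[0]] + rng.sample(homologs, k)


def consistency_loss(pred_sub, pred_full):
    """Mean squared difference between two L x L contact-probability maps
    (assumed loss form): the prediction from the full MSA acts as the target
    for the prediction from the subsampled MSA."""
    n = len(pred_full)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (pred_sub[i][j] - pred_full[i][j]) ** 2
    return total / (n * n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy MSA: the query followed by ten homologs of length 10.
    msa = ["ACDEFGHIKL"] + ["".join(rng.choice("ACDEFGHIKL") for _ in range(10)) for _ in range(10)]
    small = msa_dropout(msa, keep_fraction=0.2, rng=rng)
    print(len(small), round(normalized_nf(msa), 3), round(normalized_nf(small), 3))
```

In a training loop built on this sketch, the same protein would be featurized twice, once from the full MSA and once from a dropout MSA, and the consistency term would push the second prediction toward the first. Note that subsampling usually, but not always, lowers Nf, since removing a sequence can raise the weights of its close neighbors.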