Abstract
Protein alignment is a critical process in bioinformatics and molecular biology. Although structure-based alignment methods achieve strong performance, structures are available for only a very small fraction of the vast number of known protein sequences. Developing an efficient and effective sequence-based protein alignment method is therefore of significant importance. In this study, we propose CLAlign, a structure-aware, sequence-based protein alignment method that uses contrastive learning. Experimental results show that CLAlign outperforms state-of-the-art methods by at least 12.5% and 24.5% on two common benchmarks, Malidup and Malisam, respectively.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
yourh{at}nankai.edu.cn
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.