Abstract
Summary: Homology detection by sequence comparison is a typical first step in the study of protein function and evolution. Here, we describe a new homology detection tool, pLM-BLAST, that uses a modified Smith-Waterman algorithm for unsupervised comparison of single-sequence representations obtained from a protein language model (such as ProtT5) trained on millions of sequences. In our benchmarks, pLM-BLAST detected homology between highly divergent proteins, demonstrating its applicability to tasks such as protein classification, domain annotation, and function prediction.
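The general idea of aligning two proteins through their per-residue embeddings can be illustrated with a minimal sketch. It is not the pLM-BLAST implementation: it uses the textbook Smith-Waterman recurrence with a linear gap penalty rather than the modified algorithm, and random vectors stand in for ProtT5 embeddings. The function names `cosine_similarity_matrix` and `smith_waterman`, and the parameters `gap_penalty` and `offset`, are assumptions introduced here for illustration only.

```python
import numpy as np


def cosine_similarity_matrix(emb_a, emb_b):
    """Cosine similarity between every residue of A and every residue of B."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T


def smith_waterman(sim, gap_penalty=0.5, offset=0.3):
    """Textbook local alignment over a residue-residue similarity matrix.

    `offset` is subtracted from each similarity so that unrelated residue
    pairs score negatively (an assumption of this sketch, not a documented
    pLM-BLAST parameter).
    """
    n, m = sim.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1, j - 1] + sim[i - 1, j - 1] - offset
            up = H[i - 1, j] - gap_penalty
            left = H[i, j - 1] - gap_penalty
            H[i, j] = max(0.0, diag, up, left)

    # Trace back from the best-scoring cell until the score drops to zero.
    i, j = (int(k) for k in np.unravel_index(np.argmax(H), H.shape))
    score = float(H[i, j])
    path = []
    while i > 0 and j > 0 and H[i, j] > 0:
        diag = H[i - 1, j - 1] + sim[i - 1, j - 1] - offset
        if np.isclose(H[i, j], diag):
            path.append((i - 1, j - 1))  # aligned residue pair (0-based)
            i, j = i - 1, j - 1
        elif np.isclose(H[i, j], H[i - 1, j] - gap_penalty):
            i -= 1  # gap in sequence B
        else:
            j -= 1  # gap in sequence A
    return score, path[::-1]


# Toy data: random vectors stand in for language-model embeddings.
# Sequence B contains residues 5..19 of sequence A at positions 10..24.
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(30, 64))
emb_b = np.concatenate(
    [rng.normal(size=(10, 64)), emb_a[5:20], rng.normal(size=(8, 64))]
)

score, path = smith_waterman(cosine_similarity_matrix(emb_a, emb_b))
print(score, path[0], path[-1])
```

In this toy setup the shared segment is recovered as a diagonal run of aligned residue pairs. With real embeddings, the similarity values and a suitable `offset` would depend on the language model and would need to be calibrated.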
Availability and Implementation: pLM-BLAST is available as a web server in the MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de/tools/plmblast), where it can be used to search precomputed databases. It is also available as a standalone tool to build custom databases and run batch searches (https://github.com/labstructbioinf/pLM-BLAST).
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The link to the MPI Toolkit has been updated. Moreover, a missing citation of the knnProtT5 method has been added.