PT - JOURNAL ARTICLE
AU - Armin Behjati
AU - Fatemeh Zare-Mirakabad
AU - Seyed Shahriar Arab
AU - Abbas Nowzari-Dalini
TI - Protein sequence profile prediction using ProtAlbert transformer
AID - 10.1101/2021.09.23.461475
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.09.23.461475
4099 - http://biorxiv.org/content/early/2021/10/21/2021.09.23.461475.short
4100 - http://biorxiv.org/content/early/2021/10/21/2021.09.23.461475.full
AB - Protein profiles have many applications in bioinformatics. To construct a profile from a protein sequence, the sequence is aligned against a database. However, sometimes there are no sequences similar to the query. This paper proposes a method based on the pre-trained ProtAlbert transformer to predict the profile of a single protein sequence without alignment. The performance of transformers on natural languages is impressive. Protein sequences can be viewed as a language; therefore, we can benefit from using these models. We analyze the attention heads in different layers of ProtAlbert to show that the transformer can capture five essential characteristics of a protein family from a single protein sequence. These assessments are performed on the CASP13 dataset to find representative heads for each of the five protein characteristics. Then, these heads are investigated on one thermophilic and two mesophilic proteins as case studies. The results show which attention heads are significant for protein family properties extracted from a single protein sequence. This analysis led us to propose an algorithm called PA_SPP for profile prediction using only a single protein sequence as input. In our algorithm, we apply the masked language modeling method of ProtAlbert. The results show high similarity between the predicted profiles and HSSP profiles. Competing Interest Statement: The authors have declared no competing interest.