Abstract
AlphaFold 2 (AF2) has revolutionised protein structure prediction but, like any new tool, its performance on specific classes of targets, especially those potentially under-represented in its training data, merits attention. Prompted by a highly confident prediction for a biologically meaningless, scrambled repeat sequence, we assessed AF2 performance on sequences composed of perfect repeats of random sequences of different lengths. AF2 frequently folds such sequences into beta-solenoids which, although assigned high confidence, contain unusual and implausible features such as internally stacked and uncompensated charged residues. Several sequences confidently predicted as beta-solenoids are predicted by other advanced methods to be intrinsically disordered. The instability of some predictions is demonstrated by molecular dynamics. Importantly, other deep learning-based structure prediction tools predict different structures, or beta-solenoids with much lower confidence, suggesting that AF2 alone has an unreasonable tendency to predict confident but unrealistic beta-solenoids for perfect repeat sequences. The potential implications for structure prediction of natural (near-)perfect sequence repeat proteins are also explored.
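As an illustration of the kind of input described above, the following minimal sketch builds a perfect repeat from a random repeat unit. It is not the authors' procedure: uniform sampling over the 20 standard amino acids, the unit length of 15, the copy number of 10, and the helper names `random_unit` and `perfect_repeat` are all assumptions made for the example.

```python
import random

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_unit(length, rng):
    """Draw a random repeat unit (uniform sampling is an assumption)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def perfect_repeat(unit, copies):
    """Concatenate identical copies of the unit into a perfect repeat."""
    return unit * copies


rng = random.Random(0)               # fixed seed so the example is reproducible
unit = random_unit(15, rng)          # hypothetical repeat-unit length
sequence = perfect_repeat(unit, 10)  # hypothetical number of copies
print(unit)
print(len(sequence))
```

Varying the unit length and the number of copies gives a panel of perfect-repeat sequences of different lengths that could be submitted to a structure predictor.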
Competing Interest Statement
The authors have declared no competing interest.