Abstract
Type VI secretion system effectors (T6SEs) are crucial for bacterial pathogenicity, so identifying them accurately is essential for understanding bacterial virulence mechanisms. We compared the amino acid composition of N-terminal signal sequences in T6SEs and non-T6SEs and found distinct positional amino acid preferences in T6SEs. Using a combination of unsupervised and supervised analyses, we evaluated feature encoding methods and developed T6CNN, an ensemble model that integrates N-terminal signal sequences, evolutionary information, and pre-trained protein language model features for T6SE prediction. In independent testing, T6CNN outperformed existing tools, improving accuracy by 7.9% (to 0.953), sensitivity by 13.2% (to 0.964), and specificity by 6.6% (to 0.951). T6CNN offers a reliable and accurate solution for T6SE prediction, with significant potential to advance research on bacterial pathogenicity.