TY  - JOUR
T1  - Predicting Residues Involved in Anti-DNA Autoantibodies with Limited Neural Networks
JF  - bioRxiv
DO  - 10.1101/2020.08.06.240101
SP  - 2020.08.06.240101
AU  - Rachel St.Clair
AU  - Michael Teti
AU  - Mirjana Pavlovic
AU  - William Hahn
AU  - Elan Barenholtz
Y1  - 2020/01/01
UR  - http://biorxiv.org/content/early/2020/08/06/2020.08.06.240101.abstract
N2  - Computer-aided rational vaccine design (RVD) and synthetic pharmacology are rapidly developing fields that leverage existing datasets to develop compounds of interest. Computational proteomics uses algorithms and models to probe proteins for functional prediction. A potentially strong target for such a computational approach is autoantibodies, which arise from a breakdown of immune tolerance: the immune system can no longer distinguish “self” from “non-self” and attacks its own structures (mainly proteins and DNA). Information on the structure, function, and pathogenicity of autoantibodies may assist in engineering RVD against autoimmune diseases. Current computational approaches exploit large datasets curated with extensive domain knowledge, mostly require substantial computational resources, and have been applied only indirectly to problems of interest in DNA, RNA, and monomer protein binding. Here, we present a novel method for discovering potential binding sites. We trained long short-term memory (LSTM) models directly on FASTA primary sequences to predict protein binding in DNA-binding hydrolytic antibodies (abzymes). We also applied convolutional neural network (CNN) models to the same dataset. While the CNN model outperformed the LSTM on the primary task of binding prediction, analysis of the internal representations of both models showed that the LSTM models highlighted sub-sequences more strongly correlated with sites known to be involved in binding. These results demonstrate that analyzing the internal processes of recurrent neural network models may serve as a powerful tool for primary sequence analysis.
N1  - Competing Interest Statement: The authors have declared no competing interest.
ER  - 