RT Journal Article
SR Electronic
T1 DeepRC: Immune repertoire classification with attention-based deep massive multiple instance learning
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.04.12.038158
DO 10.1101/2020.04.12.038158
A1 Michael Widrich
A1 Bernhard Schäfl
A1 Milena Pavlović
A1 Geir Kjetil Sandve
A1 Sepp Hochreiter
A1 Victor Greiff
A1 Günter Klambauer
YR 2020
UL http://biorxiv.org/content/early/2020/04/13/2020.04.12.038158.abstract
AB High-throughput immunosequencing allows reconstructing the immune repertoire of an individual, which is a unique opportunity for new immunotherapies, immunodiagnostics, and vaccine design. Since immune repertoires are shaped by past and current immune events, such as infection and disease, and thus record an individual’s state of health, immune repertoire sequencing data may enable the prediction of health and disease using machine learning. However, finding the connections between an individual’s repertoire and the individual’s disease class, with potentially hundreds of thousands to millions of short sequences per individual, poses a difficult and unique challenge for machine learning methods. In this work, we present our method DeepRC that combines a Deep Learning architecture with attention-based multiple instance learning. To validate that DeepRC accurately predicts an individual’s disease class based on its immune repertoire and determines the associated class-specific sequence motifs, we applied DeepRC in four large-scale experiments encompassing ground-truth simulated as well as real-world virus infection data. We demonstrate that DeepRC outperforms all tested methods with respect to predictive performance and enables the extraction of those sequence motifs that are connected to a given disease class. Competing Interest Statement: The authors have declared no competing interest.