TY - JOUR
T1 - A billion synthetic 3D-antibody-antigen complexes enable unconstrained machine-learning formalized investigation of antibody specificity prediction
JF - bioRxiv
DO - 10.1101/2021.07.06.451258
SP - 2021.07.06.451258
AU - Philippe A. Robert
AU - Rahmad Akbar
AU - Robert Frank
AU - Milena Pavlović
AU - Michael Widrich
AU - Igor Snapkov
AU - Maria Chernigovskaya
AU - Lonneke Scheffer
AU - Andrei Slabodkin
AU - Brij Bhushan Mehta
AU - Mai Ha Vu
AU - Aurél Prósz
AU - Krzysztof Abram
AU - Alex Olar
AU - Enkelejda Miho
AU - Dag Trygve Tryslew Haug
AU - Fridtjof Lund-Johansen
AU - Sepp Hochreiter
AU - Ingrid Hobæk Haff
AU - Günter Klambauer
AU - Geir K. Sandve
AU - Victor Greiff
Y1 - 2021/01/01
UR - http://biorxiv.org/content/early/2021/07/08/2021.07.06.451258.abstract
N2 - Machine learning (ML) is a key technology to enable accurate prediction of antibody-antigen binding, a prerequisite for in silico vaccine and antibody design. Two orthogonal problems hinder the current application of ML to antibody-specificity prediction and the benchmarking thereof: (i) The lack of a unified formalized mapping of immunological antibody specificity prediction problems into ML notation and (ii) the unavailability of large-scale training datasets. Here, we developed the Absolut! software suite that allows the parameter-based unconstrained generation of synthetic lattice-based 3D-antibody-antigen binding structures with ground-truth access to conformational paratope, epitope, and affinity. We show that Absolut!-generated datasets recapitulate critical biological sequence and structural features that render antibody-antigen binding prediction challenging. To demonstrate the immediate, high-throughput, and large-scale applicability of Absolut!, we have created an online database of 1 billion antibody-antigen structures, the extension of which is only constrained by moderate computational resources.
We translated immunological antibody specificity prediction problems into ML tasks and used our database to investigate paratope-epitope binding prediction accuracy as a function of structural information encoding, dataset size, and ML method, which is infeasible with existing experimental data. Furthermore, we found that the conditions investigated in silico that were predicted to increase antibody specificity prediction accuracy align with and extend conclusions drawn from experimental antibody-antigen structural data. In summary, the Absolut! framework enables the development and benchmarking of ML strategies for biotherapeutics discovery and design. Graphical abstract: The software framework Absolut! enables (A,B) the generation of virtually arbitrarily large numbers of in silico 3D-antibody-antigen structures, (C,D) the formalization of antibody specificity as machine learning (ML) tasks as well as the exploration of ML strategies for paratope-epitope prediction. Highlights: Software framework Absolut! to generate an arbitrarily large number of in silico 3D-antibody-antigen structures; generation of one billion in silico antigen-antibody structures; immunological antibody specificity prediction problems formalized as machine learning tasks; exploration of machine learning architectures for paratope-epitope interaction prediction accuracy as a function of neural network depth, dataset size, and sequence-structure encoding. Competing Interest Statement: E.M. declares holding shares in aiNET GmbH. V.G. declares advisory board positions in aiNET GmbH and Enpicom B.V. V.G. is a consultant for Roche/Genentech.
ER -