Abstract
Identification of patients with life-threatening diseases including leukemias or infections such as tuberculosis and COVID-19 is an important goal of precision medicine. We recently illustrated that leukemia patients are identified by machine learning (ML) based on their blood transcriptomes. However, there is an increasing divide between what is technically possible and what is allowed because of privacy legislation. To facilitate integration of any omics data from any data owner world-wide without violating privacy laws, we here introduce Swarm Learning (SL), a decentralized machine learning approach uniting edge computing, blockchain-based peer-to-peer networking and coordination as well as privacy protection without the need for a central coordinator thereby going beyond federated learning. Using more than 14,000 blood transcriptomes derived from over 100 individual studies with non-uniform distribution of cases and controls and significant study biases, we illustrate the feasibility of SL to develop disease classifiers based on distributed data for COVID-19, tuberculosis or leukemias that outperform those developed at individual sites. Still, SL completely protects local privacy regulations by design. We propose this approach to noticeably accelerate the introduction of precision medicine.
Competing Interest Statement
H.S., K.L.S, S.Ma., S.Mu., V.G., R.S., C.S., M.D., B.M, C.M.S., S.C., M.S.W, E.L.G are employees of Hewlett-Packard Enterprise. Hewlett Packard Enterprise developed the Swarm Learning Library in its entirety as described in this work and has submitted multiple associated patent applications. E.J.G.-B. received honoraria from AbbVie USA, Abbott CH, InflaRx GmbH, MSD Greece, XBiotech Inc. and Angelini Italy; independent educational grants from AbbVie, Abbott, Astellas Pharma Europe, AxisShield, bioMerieux Inc, InflaRx GmbH, and XBiotech Inc.