Abstract
The classification of B cell lymphomas, which relies mainly on light-microscopic evaluation by a pathologist, requires many years of training. Since the B cell receptor (BCR) of the lymphoma clonotype and the microenvironmental immune architecture are important features that discriminate between lymphoma subsets, we asked whether BCR repertoire next-generation sequencing (NGS) of lymphoma-infiltrated tissues, combined with machine learning algorithms, could aid in the subclassification of these cancers. We trained a random forest and a linear classifier (logistic regression) on patterns of clonal distribution, VDJ gene usage and physico-chemical properties of the top-n most frequently represented clonotypes in the BCR repertoires of 620 paradigmatic lymphomas, namely nodular lymphocyte predominant B cell lymphoma (NLPBL), diffuse large B cell lymphoma (DLBCL) and chronic lymphocytic leukemia (CLL), as well as 291 control tissues. For DLBCL and CLL, the models performed best when only the most prevalent clonotype was used for classification, whereas for NLPBL, which has a dominant background of non-malignant bystander cells, including a broader array of clonotypes improved model accuracy. Surprisingly, the simple logistic regression model performed best in this seemingly complex classification problem, suggesting linear separability in the chosen feature dimensions. It achieved a weighted F1-score of 0.84 on a test cohort comprising 125 cases from all three lymphoma entities and 58 healthy individuals. Together, we provide proof of concept that at least the three studied lymphoma entities can be distinguished from one another by a trained machine learning model applied to BCR repertoire NGS data from lymphoma-infiltrated tissues.
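To make the reported metric concrete, the following is a minimal sketch of how a weighted F1-score can be computed for a multi-class problem such as lymphoma subtype versus control. It is not the authors' evaluation code: the function name `weighted_f1` and the example labels and predictions are hypothetical and chosen only for illustration. Each class's F1-score, 2·TP / (2·TP + FP + FN), is weighted by that class's share of the true labels (its support), so larger classes contribute proportionally more to the final score.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1-scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    n = len(y_true)
    total = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never occurs
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its support in the true labels
        total += support[label] / n * f1
    return total


# Illustrative (made-up) labels, not data from the study
y_true = ["DLBCL", "DLBCL", "CLL", "NLPBL", "control"]
y_pred = ["DLBCL", "CLL", "CLL", "NLPBL", "control"]
print(weighted_f1(y_true, y_pred))  # approximately 0.8
```

Classes absent from the true labels receive zero weight, which matches the usual definition of the weighted average (for example, `average="weighted"` in scikit-learn's `f1_score`).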
Author Summary
Lymphoma, a complex group of malignant blood cancers, poses a significant diagnostic challenge because of its diverse subtypes, yet precise classification is crucial for tailored treatment. In our research, we developed and comprehensively validated a machine learning algorithm to discern distinct B cell lymphoma subtypes. To this end, we leveraged the B cell repertoires of lymphoma-infiltrated tissue, as determined by next-generation sequencing. Our work offers three key contributions. First, we detail the creation and training of our machine learning algorithm, explaining how we selected features and designed the model. Second, we demonstrate the algorithm's diagnostic precision using sequencing data from a test set of patients. Third, through a detailed analysis of the most discriminating features of our algorithm, we uncover distinctive disease-related patterns in the malignant B cells and their surrounding environment. This analysis showed that not only the malignant lymphoma cells but also healthy bystander immune cells contribute to the distinctive architecture that characterizes a specific lymphoma subtype. We hope our work will contribute to tools that diagnose lymphoma more easily and accurately, ultimately leading to better outcomes for patients with this type of cancer.
Competing Interest Statement
The authors have declared no competing interest.