New Results

NetTCR 2.2 - Improved TCR specificity predictions by combining pan- and peptide-specific training strategies, loss-scaling and integration of sequence similarity

Mathias Fynbo Jensen, Morten Nielsen
doi: https://doi.org/10.1101/2023.10.12.562001
Mathias Fynbo Jensen
1Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
Morten Nielsen
1Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, DTU, 2800 Kgs. Lyngby, Denmark
  • For correspondence: [email protected]

Abstract

The ability to predict binding between peptides presented by Major Histocompatibility Complex (MHC) class I molecules and T-cell receptors (TCRs) is of great interest in vaccine development, cancer treatment and the treatment of autoimmune diseases. However, the scarcity of paired-chain data, combined with the bias towards a few well-studied epitopes, has challenged the development of pan-specific machine-learning (ML) models with accurate predictive power for peptides characterized by little or no TCR data. To address this, we here benefit from a larger paired-chain peptide-TCR dataset and explore different ML model architectures and training strategies that better handle imbalanced data. We show that while simple changes to the architecture and training strategies result in greatly improved performance, particularly for peptides with little available data, predictions on unseen peptides remain challenging, especially for peptides distant from the training peptides. We also demonstrate that ML models can be used to detect potential outliers, and that removing such outliers from training further improves the overall performance. Furthermore, we show that a model combining the properties of pan-specific and peptide-specific models achieves improved performance, and that performance can be further improved by integrating similarity-based predictions, especially when a low false positive rate is desirable. Moreover, in the context of the IMMREP 2022 benchmark, this updated modeling framework achieved state-of-the-art performance. Finally, we show that combining all these approaches results in acceptable predictive accuracy for peptides characterized by as few as 15 positive TCRs. This observation thus holds great promise for rapidly expanding the peptide coverage of the current models for predicting TCR specificity.
The final NetTCR 2.2 models are available at https://github.com/mnielLab/NetTCR-2.2, and as a web server at https://services.healthtech.dtu.dk/services/NetTCR-2.2/.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • We detected a few minor factual errors in the method section, which we have corrected in this revision. First, we wrongly stated that our final dataset had 6358 unique TCRs, whereas it in fact had 6353. Second, we stated that the maximum length of CDR1β was 5, whereas it was in fact 6. Third, we stated that we used a Levenshtein distance of at least 3 to discard similar peptides when swapping the TCRs to generate negatives; this should have been a Levenshtein distance greater than 3. We also added a brief mention of potential applications for our models to the introduction and discussion, and extended our conclusion to better cover all the findings. Finally, we changed Figure 3 and Supplementary Figure 1 to boxplots to better illustrate the differences in performance.

  • https://github.com/mnielLab/NetTCR-2.2

  • https://services.healthtech.dtu.dk/services/NetTCR-2.2/
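
The first footnote mentions generating negatives by swapping TCRs between peptides and discarding peptide pairs whose Levenshtein distance is not greater than 3. The sketch below illustrates one way such a filter could look; it is not the authors' implementation. The function names, the `(tcr_id, peptide)` input layout, the `per_positive` count and the random sampling are all assumptions made for illustration only.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def swapped_negatives(positives, threshold=3, per_positive=1, seed=0):
    """Pair each TCR with peptides other than its own to create negatives.

    positives: list of (tcr_id, peptide) pairs (hypothetical layout).
    A peptide is only eligible if its Levenshtein distance to the TCR's
    own peptide is strictly greater than `threshold`, matching the
    corrected wording in the footnote. `per_positive` is an assumed
    parameter, not a value taken from the paper.
    """
    rng = random.Random(seed)
    peptides = sorted({pep for _, pep in positives})
    known = set(positives)
    negatives = []
    for tcr, own in positives:
        candidates = [
            pep for pep in peptides
            if levenshtein(own, pep) > threshold and (tcr, pep) not in known
        ]
        rng.shuffle(candidates)
        negatives.extend((tcr, pep) for pep in candidates[:per_positive])
    return negatives


if __name__ == "__main__":
    # Toy peptides, chosen only to make the distances obvious.
    toy = [("tcr1", "AAAAAAAAA"), ("tcr2", "AAAAAAAAC"), ("tcr3", "CCCCCCCCC")]
    for pair in swapped_negatives(toy):
        print(pair)
```

With the strict inequality, a peptide at distance exactly 3 from a TCR's own peptide is never used as a negative partner, which is the distinction the revision note draws between "at least 3" and "greater than 3".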

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted December 19, 2023.
Subject Area

  • Bioinformatics