ATRAP - Accurate T cell Receptor Antigen Pairing through data-driven filtering of sequencing information from single-cells

Novel single-cell based technologies hold the promise of matching T cell receptor (TCR) sequences with their cognate peptide-MHC recognition motif in a high-throughput manner. Parallel capture of TCR transcripts and peptide-MHC is enabled through the use of reagents labeled with DNA barcodes. However, analysis and annotation of such single-cell sequencing (SCseq) data is challenged by dropout, random noise, and other technical artifacts that must be carefully handled in the downstream processing steps. We here propose a rational, data-driven method termed ATRAP (Accurate T cell Receptor Antigen Paring) to deal with these challenges, filtering away likely artifacts, and enable the generation of large sets of TCR-pMHC sequence data with a high degree of specificity and sensitivity, thus outputting the most likely pMHC target per T cell. We have validated this approach across 10 different virus-specific T cell responses in 16 healthy donors. Across these samples we have identified up to 1494 high-confident TCR-pMHC pairs derived from 4135 single-cells.

hashing to trace origin of a given cell to e.g., a given donor, sample, or time-point, 64 which is highly valuable in patient-studies. cells in a single GEM and it is proportional to the capture rate of cells introduced to 78 the system (Bloom, 2018;Zheng et al., 2017). The capture rate is determined by the 79 rate of pulsing cells relative to the rate of gel-beads. Thus, to include even low 80 frequency cell populations, the capture rate must be high at the expense of 81 introducing more multiplets. Contamination is particularly an issue when including 82 analytes such as pMHC multimers which may be dissolved in cell suspension We applied this algorithm to two datasets, each derived from screening PBMCs from 101 16 healthy donors for T cell recognition against common viruses. In total, we 102 evaluated TCR recognition against 10 different pMHC multimers, each labeled with 103 their unique barcode. We demonstrate that following the filtering steps described 104 here we can obtain a confident pairing of pMHC specificity and TCR sequence. This 105 strategy will open novel opportunities to evaluate the structural interplay and the 106 sequence-driven signatures of pMHC recognition at large scale. 107

108
Parallel capture of TCRɑβ sequences, peptide-MHC specificity 109 and sample origin from single-cells 110 To obtain single-cell-derived triad information on TCR sequence, pMHC specificity, 111 and sample origin; we stained peripheral blood mononuclear cells (PBMC) from a 112 total of 16 different healthy donors (Table 1). All samples were stained with the same 113 panel composed of 10 different viral-derived peptide-MHC (pMHC) multimers, each 114 labeled with a unique barcode for that specificity and a common fluorescent label 115 (allophycocyanin (APC)) ( Fig. 1) (Table 2). To serve as an experimental control for 116 the purity of the isolated T cells, we moreover stained the cells with three additional 117 viral-derived pMHC multimers bearing a different fluorochrome (phycoerythrin (PE)) 118 and labeled with their own unique DNA barcode (Supplementary Table 2). We sorted 119 only the APC-labeled pMHC multimer binding T cells (and hence deselected the PE-120 labeled T cells) and included these in the down-stream single-cell processing.  Based on raw, unfiltered data, we found 6,073 GEMs which contained all three 170 components, i.e. TCR, pMHC and sample hashing, corresponding to 40% of the 171 loaded cells (Fig. 2a). 716,069 GEMs only contained one or two of the components, 172 with the majority containing only the cell hashing barcode (n=677,502) and the 173 second largest share containing cell hashing as well as pMHC barcodes (n=37,277). 174 This number vastly exceeds the number of cells in the assay (15,700 cells loaded) 175 and indicates contamination from ambient barcodes in suspension. This is further 176 supported by the observation that the sample hashing UMI count was significantly 177 higher (p < 0.0005, Mann-Whitney U) in the 6,073 GEMs containing a TCR 178 compared to the GEMs void of TCR (Fig. 2b). 43,455 GEMs captured a DNA 179 barcode associated with the pMHC library and only 14% of these were completed 180 with TCR transcripts and sample hashing barcodes. In the GEMs containing a TCR, 181 84% were completed with all three components, i.e. included hashing and pMHC 182 barcodes, while less than 0.05% of these GEMs were void of both sample hashing 183 and pMHC barcodes. In the following, we will only consider the 6,073 GEMs 184 containing all three components, while taking into account that the high degree of 185 noise also affects these seemingly completely mapped GEMs. Comparison of distributions of UMI counts of sample hashing barcode between GEMs that 233 contain TCR transcripts (TCR-occupied GEMs) and GEMs that do not contain TCR 234 transcripts (TCR-void GEMs) (p < 0.0005, Mann-Whitney U). c) Matrix of the distribution of 235 pMHC singlets and multiplets across GEMs with TCRs either missing a chain, detected with 236 multiple chains, or with a single, unique αβ-pair. The counts are given for each field and 237 illustrated by a color. The lighter color represents higher counts. d) Scatterplot of all detected 238 pMHC barcodes (y-axis) within each of the 6073 GEMs (x-axis) that contain all three 239 components: TCR, pMHC and sample hashing. In each GEM the most abundant pMHC is 240 marked by a color, while the remaining pMHCs in the GEM are gray. The marker size 241 reports the UMI count of the given pMHC and the shape recounts whether the HLA allele of 242 the pMHC matches the HLA haplotype of the donor, which is deduced from sample hashing. 243 The fraction of HLA matches within the GEMs displaying a given specificity is annotated to 244 the right of the plot. The first colorbar indicates the type of TCR chain annotation; whether 245 the TCR has a unique αβ-pair, is missing a chain or consists of multiple chains. The data in Fig. 2d suggests that most of the captured T cells interact with several of 252 the screened pMHCs to a degree that exceeds the level expected from natural 253 cross-recognition. Therefore, it is reasonable to assume that a large proportion of 254 these multiplets are formed as a result of ambient pMHC leaking into GEMs. 255 A data-driven filtering approach 256 From these observations, it is clear that a substantial part of the data consists of 257 noise, i.e. GEMs with multiplets of pMHC and sample hashing, and that the data 258 must be filtered for proper interpretation. with the highest UMI count, similar to what is shown in Fig. 1d. If two pMHCs are equally 318 abundant in a GEM they are both colored. No marker means no detection of that pMHC in 319 that given GEM. b) The compiled distribution of UMI counts for each peptide (assigning 0 320 UMI when the pMHC is not detected in a GEM). The asterisk marks that a Wilcoxon test 321 showed that the UMI counts of TPR B0702 were on average higher than for VTE A0101 UMI 322 counts. c) The specificity concordance across the GEMs of clonotype 1. Concordance is 323 shown by a color gradient, i.e. the larger the fraction of GEMs supporting a given specificity 324 the darker the color. 325 Improving concordance between GEM and clonotype annotation based 326 on grid search on UMI features 327 To rationally filter data, an accuracy metric was defined, and optimized through the 328 filtering process. For all specificities belonging to clonotypes with an assigned 329 expected target, we calculated the overall accuracy as the proportion of GEMs 330 where highest abundance pMHC annotation corresponds to the expected target of 331 the clonotype. The raw unfiltered data yielded accuracy and average concordance 332 scores of 69.6% and 83.8%, respectively. Next, we set out to investigate how 333 different data driven UMI filters could improve these performance values, removing 334 noise and artifacts from the data. This removal would also reduce the number of 335 included observations, hence the performance of different thresholds for filtering the 336 data was evaluated based on a tradeoff between increased accuracy and discarded 337 number of GEMs.

339
We tested various thresholds on UMI count and UMI ratios, i.e. the ratio between the 340 most abundant and second most abundant UMI feature, for pMHC and TCRαβ 341 respectively. The optimal thresholds were chosen to maximize the weighted average 342 between accuracy and fraction of retained GEMs to favor increase in accuracy 343 above losing some GEMs. This filtering analysis resulted in optimal thresholds of 2 344 pMHC UMI counts and a ratio pMHC UMI counts between top one and two >1.The 345 latter results in removal of GEMs where two pMHC were equally abundant for low 346 UMI counts. The search did not result in thresholds imposing restrictions on neither 347 TCR UMI counts nor TCR UMI ratio, which underpins that the TCRs with a missing 348 chain as well as multiple chains also contribute to good performance. Imposing this 349 filter yielded 5,061 GEMs (83% of total), 2,233 clonotypes (91% of total), and 350 resulted in 96.4% accuracy, and a mean concordance of 93.6%. 351 Additional filters 352 Additional filters can be added to further clean the data. We compared how two 353 filters, integrated in the 10x Genomics software, Cellranger, performed in removing 354 potential noise from our data set (Supplementary Fig 2). The purpose of these filters 355 is to evaluate, with high confidence, whether a GEM has captured a cell: 356 "is cell" is defined based on the TCR transcript level in a given GEM and "is cell 357 (GEX)" is defined based on the full transcript level (10xGenomics, n.d. a). 358 Alternatively, viable cells are identified from the transcript data, independently of 359 Cellranger, based on mitochondrial load and a minimum and maximum gene count 360 per GEM. All three filterings are comparable (Supplementary Fig 2), and taken into 361 account in the further evaluations. It is worth noting that, while the filterings based on 362 the full transcript data might remove slightly more noise, the economic costs 363 associated could propose that this should only be applied when the transcript data is 364 required for additional purposes. Including only GEMs where the TCR-pMHC pair is observed more than once, i.e. 373 specificity multiplets, reduces the uncertainty described above. Below we investigate 374 the impact of imposing such filters. 375 Impacts of filtering 376 Evaluating filters by comparing TCR similarity across specificity 377 To objectively evaluate the performance impact of the presented filters, we define a 378 quantitative evaluation based on the hypothesis that T cells binding the same pMHC 379 (intra specificity) will share a higher sequence similarity compared to TCRs of 380 different specificities (inter specificity) (Fig. 4). Thus, filtering away artifacts should 381 increase intra-similarity while decreasing the inter-similarity. Here, the similarity 382 score between two TCRs was calculated from the summed score of the pairwise α-383 and β-chain similarities calculated using a kernel method described in (Shen,Wong,384 Xiao, Guo, & Smale, 2012) and applied in .

386
Based on this kernel similarity metric, the filters were tested individually and 387 cumulatively, i.e. each filter was added to the previous set of filters. The general 388 trend is that TCRs with the same specificity are more similar to each other than to 389 TCRs of different specificities, when computing the intra and inter similarities per 390 pMHC before and after filtering on the optimized UMI thresholds ( Fig. 4a-b). Before 391 filtering, nine out of 13 pMHCs displayed a higher mean intra-similarity than inter-392 similarity scores, whereas this number was 11 out 13 pMHCs when applying the UMI 393 thresholds. The outliers before filtering were GIL A0201, VLE A0201, CLG A0201, 394 and RPP B0702, while the outliers were reduced to VLE and RPP after filtering. 395 Generally, the similarity scores often have a wide, overlapping range between the 396 intra and inter categories. The three pMHCs that were deselected during sorting, GIL 397 A0201, GLC A0201, and NLV A0201, are only detected in a few TCR binding events. 398 To enhance the power of comparison, the intra and inter scores were pooled 399 respectively across the individual pMHCs ( Fig. 4c-d). The results demonstrate that 400 intra-similarity is significantly higher than inter-similarity at each filtering step, both 401 individually and combined. Moreover, we observe that the differences between intra-402 and inter-similarity appear to increase as filters are cumulatively added and fewer 403 observations are left (Fig. 4d). Particularly, the median inter-similarity score is 404 lowered, suggesting that the filtering steps predominantly removes false-positives. illustrate the distribution of paired similarity scores for each clonotype (containing both α-410 and β-chain). For each pMHC each clonotype is compared to the remaining clonotypes of 411 the same specificity (intra) and across specificities (inter). The count of compared clonotypes 412 is listed just to the right of the y-axis in both a) and b). c) Displays the pooled intra-and inter-413 scores across all peptides for each of the filtering methods: total (no filtering), optimal 414 threshold, matching HLA, hashing singlets, complete TCRs, specificity multiplets, "is cell" by 415 cell-flagging, "is cell" by cell-flagging when including GEX data, and viable cell from 416 analyzing GEX data. An asterisk marks filters where intra-similarity is significantly larger than 417 inter-similarity (Wilcoxon, α=0.05). d) Displays the pooled intra-and inter-scores across all 418 peptides for each of the filtering methods where each filtering is added cumulatively to the 419 previously listed above it. An asterisk marks filters where intra-similarity is significantly larger 420 than inter-similarity (Wilcoxon, α=0.05). The count of compared clonotypes is listed just to 421 the right of the y-axis in both c) and d). 422 423 Evaluating filters across selected performance metrics 424 To compare the effect of the filters, the similarity scores were converted to the 425 performance metric: AUC (area under the receiver operating characteristic (ROC) 426 curve). Here, intra specificity comparisons are regarded as true positive observations 427 and inter specificity comparisons as true negatives. Based on these performance 428 metric definitions, we quantify the effect of each filtering step (Fig. 5), and find that 429 the highest accuracy and highest average concordance is obtained by filtering on the 430 optimal threshold (95.3% and 90.6%), while the highest AUC is obtained from 431 filtering on specificity singlets (70.5%) (Fig. 5a). Expectantly, the accuracy and 432 average concordance increases when the filters are imposed cumulatively (Fig. 5b). 433 The accumulation of filters also results in drastic reduction of the GEMs, and it is 434 evident that one must carefully weigh out the need for specificity over sensitivity 435 when selecting the desired set of filters. 436 437 We conclude that the minimal filtering must include optimal threshold and matching total (raw, unfiltered data), optimal threshold obtained from grid search, matching HLA, 452 hashing singlets identified from Seurat HTO demultiplexing, complete TCRs with a unique 453 set of αand β-chain, specificity multiplets such that each TCR-pMHC pair must be observed 454 in two or more GEMs, is cell defined by 10x Genomics Cellranger, is cell (GEX) defined by 455 Cellranger where GEX data is included, and is viable cell defined by mitochondrial load and 456 gene counts. a) Presentation of the individual effect of each filter. b) Presentation of the 457 accumulated effects of the listed filters. 458 Inspecting the filtered data 459 To determine the impact of the filtering steps, we have compiled the binding 460 concordance for all clonotypes and applied three selected filtering steps: a) the raw, 461 unfiltered data, b) filtering on optimal UMI thresholds and matching HLA, and c) 462 additionally filtering on complete TCRs (Fig. 6). The raw, unfiltered data displays 463 many clonotypes where the most abundant pMHC in GEMs of a given clonotype are 464 dispersed across multiple of the screened pMHCs (Fig. 6a). When imposing the 465 recommended set of filters, optimal threshold and HLA match, the outliers are greatly 466 reduced, although not all low-concordance GEMs are removed (Fig. 6b). By 467 additionally filtering on complete TCRs even fewer outliers are left (Fig. 6c). Note 468 again that we have purposely deselected T cells specific for GIL A0201, GLC A0201, 469 and NLV A0201, explaining the few observations for these otherwise frequently 470 recognized epitopes.

472
Many of the remaining low concordance GEMs still suggest the improbable event of 473 cross-binding across HLA restriction. We suspect that these are artifacts that we 474 have not successfully removed. When the most strict filtering is imposed (Fig. 6c) 475 there are 56 GEMs (out of 2833) with a binding concordance of 0.5 or lower, which 476 will be referred to as outliers. 50 of those GEMs contain pMHC multiplets. 94% of the 477 multiplet outliers actually do contain the pMHC which defines the high-concordance 478 GEMs, however, at a lower UMI count. In the GEMs with multiple pMHC annotations, 479 the HLA is conserved across the pMHCs in 14% of the cases. In 68% of the cases 480 the HLAs are different, but still match the HLA haplotype of the donor given by the 481 cell hashing. Of the 56 outliers, the most dominant pMHCs are RVR A0301 (41%) 482 and TPR B0702 (27%). Prior to filtering the data, six clonotypes were identified 483 which were already registered in IEDB and VDJdb, five with matching pMHC and 484 one with a different annotation than in our observation (Fig. 2d). The five matching 485 clonotypes (9 GEMs) were successfully retained, while the mismatching clonotype (1 486 GEM) was filtered away.  (Fig. 7a), however the clonality of each 508 specificity is only available via single-cell data (Fig. 7b). Here ATRAP represents 509 data filtered by optimal UMI thresholds and matching HLA between pMHC and donor Our recommended approach of cleaning data with minimal elimination of GEMs is 591 obtained by two sets of filters: 1) the optimized data-driven UMI thresholds combined 592 with 2) information on matching HLA specificity (as obtained from donor-specific 593 hashing). Increasing filtering is naturally at the expense of the number of GEMs 594 which might reflect the trade-off between specificity and sensitivity of the assay. 595 However, any benchmarking or validation is made difficult without a golden standard. 596 Our best attempt at quantifying the impact of filtering is based on three metrics: are not expected to display cross-reactivity amongst the included pMHC multimers. 607 The optimal UMI thresholds are intended to remove observations deviating from the 608 expected target. The identified UMI thresholds are data specific and cannot be 609 universally applied, but must be fitted for individual experiments. The thresholds are 610 based on the assumption that contamination will predominantly exist at lower UMI 611 counts than actual binding events. This limits the sensitivity of the method in cases of 612 low-affinity low-frequency interactions which otherwise might be of great scientific 613 and clinical interest.

615
The binding concordance is a metric that highlights cross-reactive clonotypes. In  Fig 4 for gating strategy). Cells sorted from 762 individual samples were collected into the same tube (Fig 1b). The sorted cells were 763 centrifuged for 10 min at 390g and the buffer was removed. 764 DNA barcode-labeled MHC multimer stained cells on 10x platform 765 We utilize the 10x 5' v2 chemistry that allows the cell barcode to be appended at the  Effects of filtering were also evaluated through a comparison of TCR similarity. The 880 similarity score is based on the kernel similarity score underlying the TCRmatch 881 method between CDR3 sequences . This score can be 882 calculated for CDR3s of variable length, and takes a value between 0-1, with the 883 value of 1 representing identical pairs. As both the αand β-chain partake in the 884 pMHC interaction, TCRs will be compared based on the summed similarity between 885 the αand β-chains, and GEMs missing a chain will be excluded to avoid bias. Two 886 similarity scores are computed for each clonotype: an intra score and an inter score. 887 The intra score is based on the maximum similarity of the given clonotype to all other 888 clonotypes sharing its pMHC target (intra specificity). The inter score is based on the 889 maximum similarity of the given clonotype to an equal sized set of clonotypes 890 specific to other pMHC targets (inter specificity). The computation is done peptide-891 wise, such that clonotypes with maximum concordance for a given peptide will, for 892 that peptide, be included in an intra similarity score, but for another peptide be 893 included in an inter similarity score. Clonotypes consisting of GEMs causing 894 diverging specificities were limited to the expected target pMHC or, if non-existing, to 895 the specificity of highest concordance to avoid potential overlaps from "cross-896 reactive" clonotypes in the computation.

898
The similarity difference between intra and inter specificity clonotypes was tested for 899 the hypothesis that intra similarity is greater than inter similarity (Wilcoxon, α=0.05). 900 The similarity test was performed on all filtering methods described in the paper.