RT Journal Article
SR Electronic
T1 clusTCR: a Python interface for rapid clustering of large sets of CDR3 sequences
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.02.22.432291
DO 10.1101/2021.02.22.432291
A1 Sebastiaan Valkiers
A1 Max Van Houcke
A1 Kris Laukens
A1 Pieter Meysman
YR 2021
UL http://biorxiv.org/content/early/2021/02/23/2021.02.22.432291.abstract
AB The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. To date, the rules governing antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity are limited in performance and scalability. Multiple methodologies have been developed, but none of them can efficiently cluster large data sets exceeding 1 million sequences. To address this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales to millions of CDR3 amino acid sequences. Benchmarking comparisons showed that clusTCR achieves accuracy similar to that of other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed: through efficient similarity searching and sequence hashing, it can cluster millions of TCR sequences in just a few minutes. clusTCR was written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR). Competing Interest Statement: The authors have declared no competing interest.