Sebastiaan Valkiers (1, 2), Max Van Houcke (1), Kris Laukens (1, 2), Pieter Meysman (1, 2)
1. Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium.
2. Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), University of Antwerp, Antwerp, Belgium.
Abstract
MOTIVATION: The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. To date, the rules governing antigen recognition remain largely unknown. Current methods for grouping TCRs by epitope specificity are limited in both performance and scalability: although multiple methodologies have been developed, none of them can efficiently cluster data sets exceeding 1 million sequences. To address this limitation, we developed ClusTCR, a rapid TCR clustering alternative that scales efficiently to millions of CDR3 amino acid sequences without requiring knowledge of their antigen specificity.

RESULTS: Benchmarking showed that ClusTCR achieves accuracy similar to that of other TCR clustering methods, as measured by cluster retention, purity and consistency. ClusTCR offers a drastic improvement in clustering speed, clustering millions of TCR sequences in just a few minutes through highly efficient similarity searching and sequence hashing.

AVAILABILITY: ClusTCR is written in Python 3. It is available as an Anaconda package (https://anaconda.org/svalkiers/clustcr) and on GitHub (https://github.com/svalkiers/clusTCR).

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
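The abstract credits the speed-up to similarity searching and sequence hashing, and reports accuracy as cluster retention and purity. The following Python sketch illustrates one common hashing trick for this kind of problem together with those two metrics. It is not ClusTCR's implementation or API. Several assumptions apply: "similar" here means two CDR3 sequences of equal length that differ at exactly one position (Hamming distance 1); clusters are taken as connected components of the resulting similarity graph (a simple stand-in for ClusTCR's own clustering step); retention is the fraction of input sequences placed in a cluster of size two or more; and purity is the fraction of clustered sequences whose epitope matches their cluster's most common epitope. All function names, toy sequences and epitope labels are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def hamming1_pairs(seqs):
    """Hypothetical helper: index pairs (i, j) of sequences differing at exactly one position.

    Each sequence of length L is hashed under L masked keys (position, prefix, suffix).
    Two distinct equal-length sequences share a key only if they agree everywhere except
    at the masked position, so the work grows roughly linearly with the number of
    sequences instead of comparing all pairs (as long as buckets stay small).
    """
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        for pos in range(len(seq)):
            buckets[(pos, seq[:pos], seq[pos + 1:])].append(idx)
    pairs = set()
    for members in buckets.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                if seqs[i] != seqs[j]:  # skip exact duplicates
                    pairs.add((min(i, j), max(i, j)))
    return pairs


def connected_components(n, pairs):
    """Hypothetical helper: union-find grouping; singletons are treated as unclustered."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return [g for g in groups.values() if len(g) > 1]


def retention(clusters, n):
    """Fraction of all input sequences that ended up in some cluster."""
    return sum(len(c) for c in clusters) / n


def purity(clusters, labels):
    """Fraction of clustered sequences that carry their cluster's majority epitope label."""
    clustered = sum(len(c) for c in clusters)
    if clustered == 0:
        return 0.0
    correct = sum(Counter(labels[i] for i in c).most_common(1)[0][1] for c in clusters)
    return correct / clustered


# Toy CDR3 sequences with made-up epitope labels (illustration only).
seqs = ["CASSLGQGYEQYF", "CASSLGQAYEQYF", "CASSLGQAYEQFF",
        "CASRPGTEAFF", "CASRPGSEAFF", "CATSDLNTEAFF"]
labels = ["A", "A", "B", "C", "C", "D"]

pairs = hamming1_pairs(seqs)
clusters = connected_components(len(seqs), pairs)
print(clusters)                              # [[0, 1, 2], [3, 4]]
print(round(retention(clusters, len(seqs)), 3))  # 0.833
print(purity(clusters, labels))              # 0.8
```

In this toy run the first and third sequences differ at two positions but still end up in the same cluster through the middle sequence, which is the kind of chaining that a dedicated graph clustering step, rather than plain connected components, is typically meant to control.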