SUMMARY: High-throughput sequencing provides an opportunity to analyse the repertoire of antigen-specific receptors with unprecedented breadth and depth. However, the quantity of raw data produced by this technology requires efficient ways to categorize and store the output for subsequent analysis. To this end, we have defined a simple five-item identifier that uniquely and unambiguously defines each TcR sequence. We then describe a novel application of a finite-state automaton to map Illumina short-read sequence data for individual TcRs to their respective identifiers. An extension of the standard algorithm is also described, which allows for the presence of single-base-pair mismatches arising from sequencing error. The software package, named Decombinator, is tested first on a set of artificial in silico sequences and then on a set of published human TcR-β sequences. Decombinator assigned sequences at a rate more than two orders of magnitude faster than that achieved by classical pairwise alignment algorithms, and with a high degree of accuracy (>88%), even after introducing error rates of up to 1% in the in silico sequences. Analysis of the published sequence dataset highlighted the strong V and J usage bias observed in the human peripheral blood repertoire, which seems to be unconnected to antigen exposure. The analysis also highlighted the enormous size of the available repertoire and the challenge of obtaining a comprehensive description of it. The Decombinator package will be a valuable tool for further in-depth analysis of the T-cell repertoire.
AVAILABILITY AND IMPLEMENTATION: The Decombinator package is implemented in Python (v2.6) and is freely available at https://github.com/uclinfectionimmunity/Decombinator along with full documentation and examples of typical usage.
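To make the finite-state automaton idea concrete, the following is a minimal sketch of multi-pattern matching with an Aho-Corasick automaton, which scans each read once regardless of how many tag sequences are searched for. It is an illustration under stated assumptions, not Decombinator's own code: the function names `build_automaton` and `find_tags`, the tag sequences, and the labels are all invented for this example, and only exact matches are handled (the mismatch-tolerant extension mentioned above is not shown).

```python
from collections import deque


def build_automaton(tags):
    """Build an Aho-Corasick automaton from a dict {tag_sequence: label}.

    Hypothetical helper for illustration; not part of Decombinator.
    """
    goto = [{}]   # goto[state][base] -> next state (the trie)
    fail = [0]    # failure link for each state
    out = [[]]    # (label, tag_length) pairs recognised at each state

    # Insert every tag into the trie.
    for seq, label in tags.items():
        state = 0
        for base in seq:
            if base not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][base] = len(goto) - 1
            state = goto[state][base]
        out[state].append((label, len(seq)))

    # Breadth-first pass: set failure links and merge outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for base, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and base not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(base, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_tags(read, automaton):
    """Return (label, start_position) for every tag found in the read."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, base in enumerate(read):
        # Follow failure links until a transition on this base exists.
        while state and base not in goto[state]:
            state = fail[state]
        state = goto[state].get(base, 0)
        for label, length in out[state]:
            hits.append((label, pos - length + 1))
    return hits


# Invented tag sequences and labels, for illustration only.
example_tags = {
    "GATTACA": "V_example_1",
    "CCGTTAGC": "V_example_2",
    "TTGACCAG": "J_example_1",
}
automaton = build_automaton(example_tags)
read = "AAGATTACAGGGCCCTTGACCAGTT"
hits = find_tags(read, automaton)
print(hits)  # [('V_example_1', 2), ('J_example_1', 15)]
```

In a pipeline of this kind, the positions of the V and J hits within the read would be the starting point for deriving the remaining fields of the identifier. The automaton is built once and reused for every read, which is what makes the approach much cheaper than aligning each read against each germline sequence in turn.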