Thomas Peacock, James M Heather, Tahel Ronel, Benny Chain.
Abstract
MOTIVATION: Analysis of the T-cell receptor repertoire is rapidly entering the general toolbox used by researchers interested in cellular immunity. The annotation of T-cell receptors (TCRs) from raw sequence data poses specific challenges, which arise because TCRs are not germline encoded and because the process that generates them is stochastic.
Year: 2021 PMID: 32853330 PMCID: PMC8098023 DOI: 10.1093/bioinformatics/btaa758
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Pseudocode outlining the main functionality of the Collapsinator script. TCR data in the Decombinator format are read into the program and initially grouped by barcode. The groups then undergo pairwise comparison: the barcode (bc_i) and most frequent TCR sequence (TCR_i) of group i are compared with the barcode (bc_j) and most frequent TCR sequence (TCR_j) of group j. If barcodes bc_i and bc_j are similar relative to the barcode threshold (th_bc), and sequences TCR_i and TCR_j are similar relative to the sequence threshold (th_tcr), then groups i and j are merged. The merged groups are referred to here as clusters. Similarity is measured as the Levenshtein distance for barcodes and as a percentage-based Levenshtein distance for TCR sequences (the Levenshtein distance weighted by sequence length). Both thresholds are user-configurable. Once every group has been clustered, the identifying classifier of each TCR in the biological sample (V gene, J gene, number of V deletions, number of J deletions, insert sequence) is written to file, together with the number of times that TCR was found in the sample (TCR count) and the mean cluster size (BC count) associated with that TCR.
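The clustering described in the Fig. 1 caption can be illustrated with a minimal Python sketch. This is not the Collapsinator source code, and several details are assumptions made for illustration only: the function names (`levenshtein`, `percent_levenshtein`, `collapse`), the input format of `(barcode, sequence, classifier)` tuples, the default threshold values, treating the thresholds as inclusive, normalising the sequence distance by the longer of the two sequences, merging groups transitively with a union-find structure, taking the most frequent classifier as each cluster's TCR, and reading "cluster size" as the number of reads in a cluster.

```python
from collections import Counter, defaultdict


def levenshtein(a, b):
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def percent_levenshtein(a, b):
    """Edit distance as a percentage of the longer sequence (assumed weighting)."""
    longest = max(len(a), len(b))
    return 100.0 * levenshtein(a, b) / longest if longest else 0.0


def collapse(reads, th_bc=2, th_tcr=10.0):
    """Cluster reads by barcode and sequence similarity.

    reads: iterable of (barcode, tcr_sequence, classifier) tuples (hypothetical format).
    Returns {classifier: (tcr_count, mean_cluster_size)}.
    """
    # Step 1: group reads by exact barcode.
    groups = defaultdict(list)
    for barcode, seq, classifier in reads:
        groups[barcode].append((seq, classifier))
    barcodes = list(groups)

    # Most frequent TCR sequence in each barcode group.
    top = {bc: Counter(s for s, _ in groups[bc]).most_common(1)[0][0]
           for bc in barcodes}

    # Union-find, so that merges are transitive (an assumption of this sketch).
    parent = {bc: bc for bc in barcodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Step 2: pairwise comparison of groups against both thresholds.
    for i, bci in enumerate(barcodes):
        for bcj in barcodes[i + 1:]:
            if (levenshtein(bci, bcj) <= th_bc
                    and percent_levenshtein(top[bci], top[bcj]) <= th_tcr):
                parent[find(bcj)] = find(bci)

    # Step 3: gather the merged groups into clusters.
    clusters = defaultdict(list)
    for bc in barcodes:
        clusters[find(bc)].extend(groups[bc])

    # Step 4: per classifier, count clusters and average their sizes.
    sizes = defaultdict(list)
    for members in clusters.values():
        classifier = Counter(c for _, c in members).most_common(1)[0][0]
        sizes[classifier].append(len(members))
    return {c: (len(s), sum(s) / len(s)) for c, s in sizes.items()}


# Toy input with made-up barcodes, sequences and classifiers.
tcr_a = ("V1", "J1", 0, 0, "")
tcr_b = ("V2", "J2", 1, 2, "GG")
reads = ([("AAAAAA", "CASSLGTEAFF", tcr_a)] * 3
         + [("AAAAAT", "CASSLGTEAFF", tcr_a)]      # barcode one edit away
         + [("GGGGGG", "CASSPDRGYEQYF", tcr_b)] * 2)
result = collapse(reads)
```

In this toy input the `AAAAAT` read is absorbed into the `AAAAAA` group, so `tcr_a` is reported once with a cluster size of 4 and `tcr_b` once with a cluster size of 2. The all-against-all comparison is quadratic in the number of barcode groups, which is acceptable for a sketch but would need a faster candidate search on real repertoire data.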