Cinque Soto, Robin G Bombardi, Morgan Kozhevnikov, Robert S Sinkovits, Elaine C Chen, Andre Branchizio, Nurgun Kose, Samuel B Day, Mark Pilkinton, Madhusudan Gujral, Simon Mallal, James E Crowe.
Abstract
The collection of T cell receptors (TCRs) generated by somatic recombination is large, but its size is unknown. We generate large TCR repertoire datasets as a resource to facilitate detailed studies of the role of TCR clonotypes and repertoires in health and disease. We estimate the size of individual human recombined and expressed TCR repertoires by sequence analysis and determine the extent of sharing between individual repertoires. Our experiments reveal that each blood sample contains between 5 million and 21 million TCR clonotypes. Three individuals share 8% of TCRβ-chain and 11% of TCRα-chain clonotypes. Sorting by T cell phenotype in four individuals shows that the naive CD4+ and naive CD8+ subsets share 5% and 3.5% of their TCRβ clonotypes, respectively, whereas the memory CD4+ and CD8+ subsets share 2.3% and 0.4% of their clonotypes, respectively. We identify the sequences of these shared TCR clonotypes, which are of interest for studies of human T cell biology.
Keywords: CDR3; HLA; TCR; adaptive immunity; clonotypes; immune repertoire sequencing
Year: 2020 PMID: 32668251 PMCID: PMC7433715 DOI: 10.1016/j.celrep.2020.107882
Source DB: PubMed Journal: Cell Rep Impact factor: 9.423
Research Subject Demographics
| Subject | Subject No. from Clinical Site | Gender | Age (years) |
|---|---|---|---|
| HIP1 | VVC 1051 | F | 47 |
| HIP2 | VVC 657 | M | 22 |
| HIP3 | VVC 1056 | M | 29 |
| HIP4 | VVC 1124 | M | 32 |
| HIP5 | VVC 1386 | F | 30 |
See also Tables S2 and S3.
All samples were collected by leukapheresis from healthy Caucasian subjects living in Nashville, Tennessee.
VVC, Vanderbilt Vaccine Center.
Figure 1. TCRβ V3J Clonotype Diversity for Subjects HIP1, HIP2, and HIP3
(A) Rarefaction curves for species richness of V3J clonotypes generated using the program RTK (Saary et al., 2017). HIP1 had an endpoint species richness value of 5.9 million V3J clonotypes (left panel), HIP2 had a value of 17.6 million V3J clonotypes (middle panel), and HIP3 had a value of 21.1 million V3J clonotypes (right panel). The endpoint estimates for species richness appear on each plot with a filled-in circle.
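A rarefaction curve counts how many distinct clonotypes are observed as progressively more reads are drawn from a sample. The sketch below is a minimal illustration, assuming each read has already been reduced to a hashable clonotype key such as a (V gene, CDR3 amino acid sequence, J gene) tuple; it is not RTK's implementation, and the gene names and CDR3 strings in the usage lines are made-up placeholders.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def rarefaction_curve(
    reads: Sequence[Hashable], depths: Sequence[int], seed: int = 0
) -> List[Tuple[int, int]]:
    """Return (depth, distinct clonotypes observed) for each requested depth.

    Reads are drawn without replacement from one random permutation, so each
    deeper subsample contains the shallower ones and the curve never decreases.
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)

    seen = set()
    position = 0
    curve = []
    for depth in sorted(depths):
        depth = min(depth, len(shuffled))  # cannot draw more reads than exist
        while position < depth:
            seen.add(shuffled[position])
            position += 1
        curve.append((depth, len(seen)))
    return curve


# Hypothetical reads labelled by V3J clonotype key (V gene, CDR3 aa, J gene).
clone_a = ("TRBV-x", "CASSAAAF", "TRBJ-y")
clone_b = ("TRBV-x", "CASSBBBF", "TRBJ-y")
clone_c = ("TRBV-z", "CASSCCCF", "TRBJ-y")
reads = [clone_a] * 5 + [clone_b] * 3 + [clone_c]
print(rarefaction_curve(reads, depths=[1, 3, 9]))
```

Drawing from a single permutation keeps the curve monotone; averaging over several seeds would smooth it.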
(B) Recon (Kaplinsky and Arnaout, 2016) estimates suggested that about 38.9 million V3J clonotypes were not observed at this sequencing depth for HIP1 (left panel), 134 million for HIP2 (middle panel), and 120 million for HIP3 (right panel). The observed numbers of clonotypes binned by repeat frequency (clonotype group size) are shown as open circles, and the theoretical fits obtained using Recon are shown as X symbols. For clarity, only the first 25 clonotype group sizes are plotted.
See also Figure S1.
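Panel (B) bins clonotypes by clonotype group size, that is, by how many times each clonotype was observed. Recon fits a statistical model to this histogram to estimate the unseen clonotypes; as a much simpler stand-in, the sketch below builds the same histogram and applies the bias-corrected Chao1 estimator, f1(f1 − 1) / (2(f2 + 1)), which uses only singletons (f1) and doubletons (f2). Chao1 is a swapped-in technique for illustration and will generally not reproduce Recon's estimates; the reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def group_size_histogram(reads: Iterable[Hashable]) -> Dict[int, int]:
    """Map group size k to the number of clonotypes observed exactly k times."""
    reads_per_clonotype = Counter(reads)
    return dict(Counter(reads_per_clonotype.values()))


def chao1_unseen(histogram: Dict[int, int]) -> float:
    """Bias-corrected Chao1 estimate of clonotypes missed by the sample."""
    f1 = histogram.get(1, 0)  # clonotypes seen once (singletons)
    f2 = histogram.get(2, 0)  # clonotypes seen twice (doubletons)
    return f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical reads: three singletons, one doubleton, one clonotype seen 3 times.
reads = ["A", "B", "C", "D", "D", "E", "E", "E"]
histogram = group_size_histogram(reads)
print(histogram)                # {1: 3, 2: 1, 3: 1}
print(chao1_unseen(histogram))  # 1.5
```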
Figure 2. Shared V3J Clonotypes for TCRβ Chains Belonging to Subjects HIP1, HIP2, and HIP3
(A) Shared V3J clonotypes for experimentally determined TCRβ chains.
(B) Shared V3DJ clonotypes for experimentally determined TCRβ chains.
(C) Shared V3DJ clonotypes for synthetic repertoires generated using the program IGoR (Marcou et al., 2018). The pairwise percentage overlaps were based on the average computed for 50,000 comparisons. The average and standard error of the mean (SEM) for the pairwise percentage overlaps were 0.2% (2.0 × 10−5) between simHIP1 and simHIP2, 0.3% (2.0 × 10−5) between simHIP1 and simHIP3, and 0.2% (1.0 × 10−5) between simHIP2 and simHIP3. The average (SEM) for the percentage overlaps among all three synthetic repertoires was 0.01% (4.0 × 10−6). Reducing the CDR3 lengths of the V3DJ clonotypes in (B) so that the maximum length is 19 amino acids did not change the value of the percentage overlaps.
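The caption reports pairwise and three-way percentage overlaps without stating the denominator. The sketch below assumes the overlap is the number of clonotypes present in every repertoire divided by the number present in at least one (the union); if a different normalization was used, for example the size of the smaller repertoire, only the denominator would change. Repertoires are modeled as Python sets, and the single-letter keys are hypothetical stand-ins for V3DJ clonotypes.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple


def percent_overlap(*repertoires: Set[Hashable]) -> float:
    """Clonotypes found in every repertoire, as a percentage of the union.

    The union denominator is an assumption made for this sketch.
    """
    union = set().union(*repertoires)
    if not union:
        return 0.0
    shared = set(repertoires[0]).intersection(*repertoires[1:])
    return 100.0 * len(shared) / len(union)


def pairwise_overlaps(reps: Dict[str, Set[Hashable]]) -> Dict[Tuple[str, str], float]:
    """Percentage overlap for every unordered pair of named repertoires."""
    return {
        (a, b): percent_overlap(reps[a], reps[b])
        for a, b in combinations(sorted(reps), 2)
    }


reps = {
    "HIP1": {"A", "B", "C", "D"},
    "HIP2": {"B", "C", "E"},
    "HIP3": {"C", "F"},
}
print(pairwise_overlaps(reps))  # {('HIP1', 'HIP2'): 40.0, ('HIP1', 'HIP3'): 20.0, ('HIP2', 'HIP3'): 25.0}
print(percent_overlap(*reps.values()))  # 1 shared clonotype out of 6 in the union
```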
(D) Histogram of overlap counts among all three synthetic repertoires, simHIP1, simHIP2, and simHIP3, for 50,000 comparisons. Ranking the overlap count among the three experimentally determined TCRβ repertoires (n = 89,831 shared V3DJ clonotypes) against those obtained from the three synthetic repertoires gave an estimated p = 0.002.
See also Figure S2.
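Panel (D) ranks the observed three-way overlap count against a null distribution built from synthetic repertoires. One common way to turn such a ranking into a one-sided p-value is the (r + 1)/(N + 1) rule, where r is the number of null values at least as large as the observed value and N is the number of null draws. The sketch below assumes this convention, which the caption does not confirm, so it is not claimed to reproduce the reported p = 0.002; the null counts are hypothetical.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided empirical p-value with the (r + 1) / (N + 1) correction.

    r counts null values greater than or equal to the observed value; the +1
    terms keep the estimate above zero when no null value reaches it.
    """
    if len(null_values) == 0:
        raise ValueError("null_values must not be empty")
    r = sum(1 for value in null_values if value >= observed)
    return (r + 1) / (len(null_values) + 1)


# Hypothetical overlap counts from synthetic (null) comparisons.
null_counts = [12, 15, 9, 20, 14]
print(empirical_p_value(18, null_counts))  # 1 of 5 null counts >= 18, so (1 + 1) / (5 + 1)
```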
Figure 3. TCRβ Repertoire Statistics for Subjects HIP1, HIP2, and HIP3 Using the mRNA or gDNA Sequencing Methods
(A) Boxplot showing the CDR3 length distributions of repertoires obtained from the mRNA or gDNA sequencing methods. The mRNA method yielded a median CDR3 length of 13 amino acids for HIP1 (n = 3,161,410 unique CDR3s), HIP2 (n = 5,104,666 unique CDR3s), and HIP3 (n = 9,182,164 unique CDR3s). The gDNA method likewise yielded a median CDR3 length of 13 amino acids for HIP1 (n = 2,443,117 unique CDR3s), HIP2 (n = 9,967,538 unique CDR3s), and HIP3 (n = 9,018,345 unique CDR3s). Each box spans the interquartile range (IQR) from the 25th to the 75th percentile, the line within each box marks the median, and the whiskers extend to the most extreme values within 1.5 × IQR of the box. The median CDR3 lengths were identical for both methods; a Kruskal-Wallis test comparing the length distributions gave p < 2.2 × 10−16.
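The box-and-whisker convention in panel (A) can be made concrete with a short sketch. It assumes quartiles computed by linear interpolation between order statistics (the default method of NumPy's percentile function) and Tukey-style whiskers that stop at the most extreme observation within 1.5 × IQR of the box; the plotting software used for the figure may differ in these details, and the CDR3 lengths in the usage line are invented.

```python
from typing import Dict, Sequence


def _quantile(sorted_values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between neighbouring order statistics."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def tukey_box(lengths: Sequence[float], whisker: float = 1.5) -> Dict[str, float]:
    """Quartiles, median and whisker ends for a Tukey-style boxplot."""
    values = sorted(lengths)
    if not values:
        raise ValueError("need at least one value")
    q1 = _quantile(values, 0.25)
    q3 = _quantile(values, 0.75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": _quantile(values, 0.5),
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Invented CDR3 amino acid lengths; 30 falls outside the upper whisker.
print(tukey_box([11, 12, 13, 13, 13, 14, 15, 30]))
```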
(B) Morisita-Horn indices between the mRNA and gDNA sequencing methods for subjects HIP1, HIP2, and HIP3.
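The Morisita-Horn index compares two abundance distributions and ranges from 0 (no shared clonotypes) to 1 (identical relative abundances). Its standard form is C = 2 Σ x_i y_i / ((D_x + D_y) X Y), where X and Y are the total abundances and D_x = Σ x_i² / X². A minimal sketch, assuming each repertoire is a mapping from clonotype key to an abundance count (for example, reads or templates); how abundances were defined for this panel is not stated here, and the toy counts are hypothetical.

```python
from typing import Dict, Hashable


def morisita_horn(x: Dict[Hashable, float], y: Dict[Hashable, float]) -> float:
    """Morisita-Horn similarity between two clonotype abundance tables."""
    total_x, total_y = sum(x.values()), sum(y.values())
    if total_x == 0 or total_y == 0:
        raise ValueError("each repertoire needs a positive total abundance")
    # Simpson-type concentration of each sample: sum of squared proportions.
    d_x = sum(c * c for c in x.values()) / total_x ** 2
    d_y = sum(c * c for c in y.values()) / total_y ** 2
    # Cross-product over clonotypes present in both samples.
    cross = sum(c * y.get(k, 0) for k, c in x.items())
    return 2 * cross / ((d_x + d_y) * total_x * total_y)


mrna = {"A": 2, "B": 2}
gdna = {"A": 4, "B": 4}
print(morisita_horn(mrna, gdna))      # proportional abundances give 1.0
print(morisita_horn(mrna, {"C": 5}))  # nothing shared gives 0.0
```

Because the index works on proportions, it is insensitive to differences in total sequencing depth between the two methods.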
(C) Distributions of clonotype abundances for the mRNA and gDNA methods represented as the cumulative fraction of the repertoire versus the cumulative fraction of unique TCRβ V3J clonotypes. The clonotypes were ordered from largest to smallest based on the total number of unique somatic variants associated with each clonotype.
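Panel (C) ranks clonotypes by size and plots cumulative fractions. The sketch below assumes that "unique somatic variants" means distinct nucleotide sequences encoding the same V3J clonotype, and it takes (clonotype key, nucleotide sequence) pairs as input; both this interpretation and the toy records are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def cumulative_abundance_curve(
    records: Iterable[Tuple[Hashable, str]]
) -> List[Tuple[float, float]]:
    """(cumulative fraction of clonotypes, cumulative fraction of repertoire).

    Clonotypes are ranked from largest to smallest by their number of
    distinct nucleotide variants.
    """
    variants: Dict[Hashable, Set[str]] = defaultdict(set)
    for clonotype, nucleotide_sequence in records:
        variants[clonotype].add(nucleotide_sequence)
    sizes = sorted((len(v) for v in variants.values()), reverse=True)
    total, n_clonotypes = sum(sizes), len(sizes)
    curve, running = [], 0
    for rank, size in enumerate(sizes, start=1):
        running += size
        curve.append((rank / n_clonotypes, running / total))
    return curve


records = [
    ("c1", "TGTGCC"), ("c1", "TGTGCT"), ("c1", "TGTGCC"),  # repeated variant counted once
    ("c2", "TGTAGC"),
    ("c3", "TGCGCC"), ("c3", "TGCGCA"),
]
print(cumulative_abundance_curve(records))
```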
(D) TCRβ V3J clonotype overlap percentages between the mRNA and gDNA sequencing methods for subjects HIP1, HIP2, and HIP3. Subsampling yielded median percentage overlaps of 6.8% ± 1.1% × 10−2, 5.6% ± 7.5% × 10−3, and 6.6% ± 5.5% × 10−3 for subjects HIP1, HIP2, and HIP3, respectively.
See also Figure S3.
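Panel (D) reports overlaps after subsampling, but the caption does not give the subsampling scheme. The sketch below assumes one plausible version: draw the same number of unique clonotypes from each repertoire without replacement, compute the percentage overlap relative to the union of the two draws, repeat, and report the median. The draw size, number of rounds, and toy repertoires are placeholders.

```python
import random
import statistics
from typing import Hashable, Sequence


def subsampled_overlap_median(
    rep_a: Sequence[Hashable],
    rep_b: Sequence[Hashable],
    size: int,
    rounds: int = 100,
    seed: int = 0,
) -> float:
    """Median percentage overlap between equal-size random draws of two repertoires.

    Each repertoire is a sequence of unique clonotype keys; the union
    denominator and equal draw sizes are assumptions of this sketch.
    """
    if not 1 <= size <= min(len(rep_a), len(rep_b)):
        raise ValueError("size must be between 1 and the smaller repertoire size")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    rng = random.Random(seed)
    overlaps = []
    for _ in range(rounds):
        draw_a = set(rng.sample(rep_a, size))
        draw_b = set(rng.sample(rep_b, size))
        overlaps.append(100.0 * len(draw_a & draw_b) / len(draw_a | draw_b))
    return statistics.median(overlaps)


mrna_clonotypes = [f"clone{i}" for i in range(0, 60)]
gdna_clonotypes = [f"clone{i}" for i in range(40, 100)]
print(subsampled_overlap_median(mrna_clonotypes, gdna_clonotypes, size=30))
```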
Figure 4. Pairwise Percentage Overlaps of T Cell Subsets Based on gDNA Sequencing for Subjects HIP2, HIP3, HIP4, and HIP5
(A) Pairwise percentage clonotype overlaps for naive or memory CD4+ T cell subsets.
(B) Pairwise percentage clonotype overlaps for naive or memory CD8+ T cell subsets.
See also Tables S3 and S4 and Figure S4.
Figure 5. TCRβ Clonotype Sharing for T Cell Subsets
The total number of shared clonotypes in each T cell subset was determined for subjects HIP2, HIP3, HIP4, and HIP5. The percentage overlap among all four subjects was also determined for each T cell subset and is shown in white text in each panel.
(A) CD4+ naive cell subset.
(B) CD4+ memory cell subset.
(C) CD8+ naive cell subset.
(D) CD8+ memory cell subset.
See also Figure S5.
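Figure 5 summarizes how many clonotypes each combination of subjects shares within a T cell subset. A minimal sketch, assuming each subject's subset repertoire is a set of clonotype keys: group clonotypes by the exact set of subjects that carry them (the regions of a four-way Venn diagram), then express the clonotypes carried by all subjects as a percentage of all distinct clonotypes in the union, an assumed denominator. Subject labels follow the caption; the clonotype keys are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, Set


def membership_counts(reps: Dict[str, Set[Hashable]]) -> Dict[FrozenSet[str], int]:
    """Number of clonotypes carried by exactly each combination of subjects."""
    carriers: Dict[Hashable, Set[str]] = {}
    for subject, clonotypes in reps.items():
        for clonotype in clonotypes:
            carriers.setdefault(clonotype, set()).add(subject)
    return dict(Counter(frozenset(subjects) for subjects in carriers.values()))


def percent_shared_by_all(reps: Dict[str, Set[Hashable]]) -> float:
    """Clonotypes present in every subject, as a percentage of the union (assumed)."""
    counts = membership_counts(reps)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts.get(frozenset(reps), 0) / total


naive_cd4 = {
    "HIP2": {"a", "b", "x"},
    "HIP3": {"a", "b"},
    "HIP4": {"a", "c"},
    "HIP5": {"a"},
}
for subjects, n in membership_counts(naive_cd4).items():
    print(sorted(subjects), n)
print(percent_shared_by_all(naive_cd4))  # 1 of 4 distinct clonotypes gives 25.0
```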
Figure 6. TCRβ Clonotypes from HIP Subjects Appearing in GenBank
TCRβ V3J clonotypes from HIP subjects were searched against the entire GenBank database for possible matches.
(A) Exact matches were grouped either as patented sequences or into one of five categories based on the target: viral, bacterial, autoimmune, cancer, and other.
(B) Representative amino acid sequence alignments between the V region from GenBank and the V region from the HIP subject. In cases in which the sequence was missing in the framework region or regions, the closest-matching germline sequence was used to fill in the missing region. The filled-in portion of the sequence is highlighted by the thick gray line of the alignment.
See also Table S5.
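Panel (B) describes filling framework positions missing from a GenBank entry with the closest-matching germline sequence. Assuming the GenBank V region and the germline have already been aligned to equal length, with "-" marking positions absent from the GenBank sequence, the fill-in step reduces to a position-by-position substitution. The sketch below also returns a mask of filled positions, corresponding to the span the gray line marks in the alignment; the gap character and the toy sequences are assumptions, not real TRBV segments.

```python
from typing import List, Tuple


def fill_from_germline(
    query_aligned: str, germline_aligned: str, gap: str = "-"
) -> Tuple[str, List[bool]]:
    """Replace gap positions in an aligned query with germline residues.

    Returns the filled sequence and a per-position mask (True = filled in).
    """
    if len(query_aligned) != len(germline_aligned):
        raise ValueError("aligned sequences must have the same length")
    filled, mask = [], []
    for query_residue, germline_residue in zip(query_aligned, germline_aligned):
        if query_residue == gap:
            filled.append(germline_residue)
            mask.append(True)
        else:
            filled.append(query_residue)
            mask.append(False)
    return "".join(filled), mask


# Toy aligned sequences; the query keeps its own residues where present.
sequence, filled_positions = fill_from_germline("---SLTQ", "MGISVTQ")
print(sequence)          # MGISLTQ
print(filled_positions)  # [True, True, True, False, False, False, False]
```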
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | | |
| CD4+ T Cell Isolation Kit, Human | Miltenyi Biotec | Cat# 130-096-533 |
| CD8+ T Cell Isolation Kit, Human | Miltenyi Biotec | Cat# 130-096-495 |
| CD4+ Central Memory T Cell Isolation Kit, Human | Miltenyi Biotec | Cat# 130-094-302 |
| CD45RO MicroBeads, Human | Miltenyi Biotec | Cat# 130-046-001 |
| LS Columns | Miltenyi Biotec | Cat# 130-042-401 |
| Biological Samples | | |
| Human adult leukapheresis sample, HIP1 | Vanderbilt Vaccine Center | VVC 1051 |
| Human adult leukapheresis sample, HIP2 | Vanderbilt Vaccine Center | VVC 657 |
| Human adult leukapheresis sample, HIP3 | Vanderbilt Vaccine Center | VVC 1056 |
| Human adult leukapheresis sample, HIP4 | Vanderbilt Vaccine Center | VVC 1124 |
| Human adult leukapheresis sample, HIP5 | Vanderbilt Vaccine Center | VVC 1386 |
| Critical Commercial Assays | | |
| Immunoseq TCRB Kit | Adaptive Biotechnologies | Cat# hsTCRB |
| TCRB Library and Sequencing Service | AbHelix LLC | Cat# AH91004 |
| QIAGEN QIAamp DNA Blood Maxi Kit | QIAGEN | Cat# 51192 |
| QIAGEN RNeasy Maxi Kit | QIAGEN | Cat# 75162 |
| HiSeq Rapid SBS Kit v2 (500 cycles) | Illumina | Cat# FC-402-4023 |
| HiSeq PE Rapid Cluster Kit v2 | Illumina | Cat# PE-402-4002 |
| HiSeq Rapid Duo cBot Sample Loading Kit | Illumina | Cat# CT-403-2001 |
| NextSeq 500/550 Mid Output Kit v2.5 (150 cycles) | Illumina | Cat# 20024904 |
| Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | Cat# Q32851 |
| High Sensitivity DNA Kit | Agilent | Cat# 5067-4626 |
| KAPA Library Quantification Kit | Roche | Cat# KK4835 |
| SuperScript IV First-Strand Synthesis System | Thermo Fisher Scientific | Cat# 18091050 |
| Phusion High-Fidelity DNA Polymerase | Thermo Fisher Scientific | Cat# F530S |
| Deposited Data | | |
| gDNA sequencing (Adaptive Biotechnologies) | This paper | |
| mRNA sequencing (AbHelix LLC) | This paper | SRA: PRJNA511481 |
| Recombinant DNA | | |
| HLA typing HIP1 | Institute for Immunology Murdoch University Western Australia | VVC 1051 |
| HLA typing HIP2 | Institute for Immunology Murdoch University Western Australia | VVC 657 |
| HLA typing HIP3 | Institute for Immunology Murdoch University Western Australia | VVC 1056 |
| HLA typing HIP4 | Institute for Immunology Murdoch University Western Australia | VVC 1124 |
| HLA typing HIP5 | Institute for Immunology Murdoch University Western Australia | VVC 1386 |
| Software and Algorithms | | |
| PyIR v1.0 | This paper and | |
| USEARCH v9.1 | | |
| FastQC v.0.11.6 | | |
| IgBLAST v.1.9 | | |
| Recon v.2.1 | | |
| IGoR v1.1.0 | | |
| OLGA v1.1.0 | | |
| MiXCR v.2.1.10 | | |
| RTK v.0.93.1 | | |
| VDJTools v1.2.1 | | |
| Python v.2.7.12 | Python Software Foundation | |
| NumPy v.1.13.3 | NumPy developers | |
| Seaborn Python plotting module v.0.8.1 | | |
| MongoDB v.3.4.0 | MongoDB Inc., New York, NY, USA | |
| R statistical package v3.2.3 | R Core Team | |
| ORIGIN(Pro) 2018b | OriginLab Corporation, Northampton, MA, USA | |
| Other | | |
| Processed sequence data, analyses, Python scripts, and other resources related to the sequencing of human subjects HIP1, HIP2, HIP3, HIP4, and HIP5 | This paper | |
| gDNA (Adaptive Biotechnologies) | | |
| Synthetic repertoire sets created with IGoR | This paper | |
| GenBank (release 231) | | |
| IMGT Reference directory in FASTA format | | |