Mao-Jan Lin (1,2), Yu-Chun Lin (3), Nae-Chyun Chen (2), Allen Chilun Luo (3), Sheng-Kai Lai (4), Chia-Lang Hsu (3,5,6), Jacob Shujui Hsu (3), Chien-Yu Chen (7), Wei-Shiung Yang (3,4,8,9), Pei-Lung Chen (1,3,4,8,9).
Abstract
The adaptive immune receptor repertoire (AIRR) is encoded by T cell receptor (TR) and immunoglobulin (IG) genes. Profiling these germline genes encoding AIRR (abbreviated as gAIRR) is important for understanding adaptive immune responses but is challenging due to their high genetic complexity. Our gAIRR Suite comprises three modules. gAIRR-seq, a probe capture-based targeted sequencing pipeline, profiles gAIRR from individual DNA samples. gAIRR-call and gAIRR-annotate call alleles from gAIRR-seq reads and annotate whole-genome assemblies, respectively. We gAIRR-seqed the TRV and TRJ genes of seven Genome in a Bottle (GIAB) DNA samples with 100% accuracy and discovered novel alleles. We also gAIRR-seqed and gAIRR-called the TR and IG genes of one subject from both peripheral blood mononuclear cells (PBMC) and oral mucosal cells. The calling results from these two cell types are highly concordant (99% for all known gAIRR alleles). We gAIRR-annotated 36 genomes and uncovered 325 novel TRV alleles and 29 novel TRJ alleles. We further profiled the flanking sequences, including the recombination signal sequences (RSS). We validated two structural variants in HG002 and uncovered substantial differences in gAIRR genes between the GRCh37 and GRCh38 references. The gAIRR Suite serves as a resource for sequencing, analyzing, and validating germline TR and IG genes to study various immune-related phenotypes.
Keywords: adaptive immune receptor repertoire (AIRR); allele typing; germline genes encoding AIRR (gAIRR); immunogenomics; novel allele; targeted sequencing
Year: 2022 PMID: 36159868 PMCID: PMC9496171 DOI: 10.3389/fimmu.2022.922513
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 8.786
Figure 1. The gAIRR Suite pipelines. The gray arrows show the verification methods between the two pipelines when both gAIRR-seq reads and a personal assembly are available (Section 2.5). *: These databases can also be generated by gAIRR-seq + gAIRR-call.
Figure 2. gAIRR-seq and gAIRR-call results. (A) Read depths sequenced with gAIRR-seq in the TRV, TRJ and IGV regions using data from HG001. Columns without the “_200” suffix show the average read depth of a region; columns with the “_200” suffix show the read depth 200 bp away from the region boundaries. (B) gAIRR-call results using HG001 data. The results are sorted by the minimum read depth of the perfectly matched reads. The dashed line represents the adaptive threshold in gAIRR-call. All alleles not annotated by gAIRR-annotate, colored in orange, are below the adaptive threshold and are regarded as no-calls by gAIRR-call. True known alleles are in blue and true novel alleles are in green. In the HG001 analysis, all true alleles are successfully identified by gAIRR-call. Alleles with zero minimum read depth are not included in the figure. (C) gAIRR-call concordance with manually inspected gAIRR-annotate results for HG001. For the IGHV, IGKV, and IGLV regions, we only included functional genes.
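Panel (B) ranks alleles by the minimum read depth of perfectly matched reads and separates calls from no-calls with an adaptive threshold. The caption does not describe how that threshold is computed, so the Python sketch below is illustrative only: it computes the minimum-depth statistic from hypothetical read intervals and substitutes a hypothetical "largest relative drop" rule for the threshold. The function names and inputs are assumptions, not gAIRR-call's interface.

```python
from typing import Dict, List, Tuple


def min_perfect_depth(allele_len: int, perfect_reads: List[Tuple[int, int]]) -> int:
    """Minimum per-base depth over an allele, counting only perfectly matched reads.

    `perfect_reads` holds hypothetical half-open (start, end) intervals, in allele
    coordinates, of reads aligning to the allele with zero mismatches.
    """
    depth = [0] * allele_len
    for start, end in perfect_reads:
        for pos in range(max(start, 0), min(end, allele_len)):
            depth[pos] += 1
    return min(depth) if depth else 0


def adaptive_threshold(depths: Dict[str, int]) -> int:
    """Hypothetical rule: cut at the largest ratio between consecutive sorted depths.

    Returns the lowest depth that still lies above that drop.
    """
    values = sorted((d for d in depths.values() if d > 0), reverse=True)
    if not values:
        return 1  # no allele has support; depth-0 alleles are never called
    if len(values) == 1:
        return values[0]
    best_i, best_ratio = 1, 0.0
    for i in range(1, len(values)):
        ratio = values[i - 1] / values[i]
        if ratio > best_ratio:
            best_i, best_ratio = i, ratio
    return values[best_i - 1]


def call_alleles(depths: Dict[str, int]) -> List[str]:
    """Call every allele whose minimum depth reaches the (hypothetical) threshold."""
    threshold = adaptive_threshold(depths)
    return sorted(name for name, d in depths.items() if d >= threshold)
```

Note that a pure largest-drop rule always cuts somewhere, even when every candidate is a true allele; a real caller would need an additional guard against that case.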
Figure 3. Trio validation for gAIRR-call and gAIRR-annotate. We validated gAIRR-call using the HG002 and HG005 families and validated gAIRR-annotate using the HG00514, HG00733, and NA19240 families. (A) TR loci, including all genes in TRV and TRJ. (B) IG loci, including functional IGV genes. Genes with V(D)J recombination evidence and genes lacking parental information due to gene loss are classified as LimitedEvidence.
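Trio validation of this kind rests on Mendelian inheritance: a child's two alleles at a gene should be explainable as one inherited from each parent. The sketch below assumes diploid genotypes given as pairs of allele names. Only the LimitedEvidence label comes from the caption; the other labels, the function names, and the convention that an empty parental genotype means "no parental call" are hypothetical.

```python
from typing import Sequence


def is_mendelian_consistent(
    child: Sequence[str], father: Sequence[str], mother: Sequence[str]
) -> bool:
    """True if one child allele can come from the father and the other from the mother.

    Genotypes are pairs of allele names; a homozygous call repeats the allele.
    """
    if len(child) != 2:
        raise ValueError("expected a diploid child genotype")
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def classify_gene(
    child: Sequence[str], father: Sequence[str], mother: Sequence[str]
) -> str:
    """Hypothetical per-gene label for a trio."""
    if not father or not mother:
        return "LimitedEvidence"  # e.g. no parental information due to gene loss
    if is_mendelian_consistent(child, father, mother):
        return "Consistent"  # hypothetical label
    return "Inconsistent"  # hypothetical label
```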
Figure 4. gAIRR-annotate results. (A) Positions of HG001’s TRA and TRD alleles determined by gAIRR-annotate. Purple text indicates that the allele is novel, with its edit distance to the most similar base allele given in parentheses. Gene-name text colors generally follow IMGT’s color menu for genes (5). (B) Positions of CHM13 (36) IGH alleles determined by gAIRR-annotate. The figure settings are the same as in (A). (C) TRV and (D) TRJ novel alleles found in the HQ-12 samples (shown in yellow) and the remaining 24 samples (shown in green), compared with the novel alleles added to IMGT between v3.1.22 and v3.1.33.
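Panel (A) labels each novel allele with its edit distance to the most similar base allele. A minimal sketch of that comparison, assuming plain Levenshtein distance (unit-cost substitutions, insertions, and deletions) over hypothetical allele sequences; the caption does not state the exact distance definition gAIRR-annotate uses.

```python
from typing import Dict, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, using two rolling DP rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (or match at no cost)
            ))
        prev = cur
    return prev[-1]


def closest_base_allele(novel: str, known: Dict[str, str]) -> Tuple[str, int]:
    """Return (allele_name, distance) of the nearest known allele; ties break by name."""
    return min(
        ((name, edit_distance(novel, seq)) for name, seq in known.items()),
        key=lambda item: (item[1], item[0]),
    )
```

For example, `closest_base_allele("ACGTTA", {"X*01": "ACGTAA", "X*02": "TTTTTT"})` picks the hypothetical allele `X*01`, one substitution away.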
Number of known and novel TR RSS in the gAIRR-called and gAIRR-annotated flanking sequences.
| Sample | TRV #known (Func) | TRV #known (P/O) | TRV #novel (Func) | TRV #novel (P/O) | TRJ #known (Func) | TRJ #known (P/O) | TRJ #novel (Func) | TRJ #novel (P/O) |
|---|---|---|---|---|---|---|---|---|
| HG001 | 94 | 43 | 4 | 1 | 66 | 10 | 7 | 1 |
| HG002 | 95 | 43 | 6 | 1 | 65 | 10 | 9 | 1 |
| HG003 | 94 | 44 | 3 | 1 | 66 | 10 | 6 | 1 |
| HG004 | 91 | 43 | 7 | 1 | 65 | 10 | 8 | 1 |
| HG005 | 93 | 44 | 8 | 2 | 67 | 10 | 7 | 1 |
| HG006 | 96 | 46 | 6 | 1 | 67 | 10 | 8 | 1 |
| HG007 | 95 | 44 | 7 | 2 | 67 | 10 | 5 | 1 |
| Primary cell sample | 95 | 43 | 6 | 2 | 67 | 10 | 6 | 1 |
| HQ-12 set samples | 96 | 46 | 14 | 3 | 67 | 10 | 11 | 3 |
| HGSVC-additional-24 samples | 96 | 46 | 31 | 22* | 67 | 10 | 15 | 3 |
*: In addition to the 22 pseudogene and ORF alleles with novel RSSs, we did not find any appropriate nonamer at TRBV24/OR9-2*03 (ORF) for HG01505. The numbers of RSSs known in IMGT (5) and of novel RSSs are shown in the #known and #novel columns, respectively. Func indicates RSSs from functional genes, while P/O indicates RSSs from pseudogenes or open reading frames (ORFs). The RSSs from HG001 to HG007 and the primary cell sample were called using both gAIRR-seq and gAIRR-call, while the RSSs from the HQ-12 and the additional 24 samples were called using gAIRR-annotate alone.
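An RSS consists of a conserved heptamer, a 12 or 23 bp spacer, and a conserved nonamer, so "no appropriate nonamer" means that no nonamer-like motif sits at a permitted spacer distance from the heptamer. The sketch below checks for this under several assumptions: consensus motifs CACAGTG and ACAAAAACC, a flank given 5' to 3' starting at the heptamer (the heptamer-spacer-nonamer orientation downstream of a V gene), and arbitrary mismatch tolerances. It is a hypothetical illustration, not gAIRR's RSS-calling procedure.

```python
from typing import Optional, Tuple

HEPTAMER = "CACAGTG"   # consensus RSS heptamer
NONAMER = "ACAAAAACC"  # consensus RSS nonamer


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(
    flank: str,
    spacers: Tuple[int, ...] = (12, 23),
    max_heptamer_mm: int = 1,  # hypothetical tolerance
    max_nonamer_mm: int = 2,   # hypothetical tolerance
) -> Optional[Tuple[int, str, str]]:
    """Return (spacer_length, heptamer, nonamer) of the best-matching RSS, or None.

    `flank` is assumed to start at the heptamer and read heptamer-spacer-nonamer.
    """
    flank = flank.upper()
    heptamer = flank[:7]
    if len(heptamer) < 7 or hamming(heptamer, HEPTAMER) > max_heptamer_mm:
        return None
    best, best_mm = None, None
    for spacer in spacers:
        start = 7 + spacer
        nonamer = flank[start:start + 9]
        if len(nonamer) < 9:
            continue  # flank too short for this spacer length
        mm = hamming(nonamer, NONAMER)
        if mm <= max_nonamer_mm and (best_mm is None or mm < best_mm):
            best, best_mm = (spacer, heptamer, nonamer), mm
    return best  # None: no appropriate nonamer found
```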
Figure 5. Structural variants identified by gAIRR-annotate. (A–C) The 65 kbp structural variant in the TRA/TRD J region of HG002: (A) the maternal haplotype, (B) the paternal haplotype, and (C) the sequence alignment of HG002’s maternal and paternal TRA/TRD J sequences. In (A, B), arrows with the same color can be aligned between the two haplotypes; the green arrow marks the segment deleted in the paternal haplotype. (D–F) The inversion and deletion of TR beta chain germline genes between the reference genomes: (D) GRCh37 chr7, (E) GRCh38 chr7, and (F) the sequence alignment of GRCh37 and GRCh38 at the TR beta chain locus. In (D, E), arrows with the same color can be aligned between the two reference genomes; the deep blue arrow indicates the inversion between them.
Figure 6. Integrative Genomics Viewer visualization of HG002’s structural variant. Capture-based reads from HG002 (son, top), HG003 (father, middle), and HG004 (mother, bottom) aligned to GRCh37 chromosome 14. Two red arrows indicate abrupt read-depth changes in HG002’s reads at chr14:22,918,113 and chr14:22,982,924.
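The two red arrows mark step changes in read depth. The breakpoints are 22,982,924 minus 22,918,113 = 64,811 bp apart, consistent with the 65 kbp variant in Figure 5. As a rough illustration, the sketch below flags such steps in a per-position depth array (for example, one parsed from `samtools depth` output, which is an assumption about the input) by comparing mean depth in adjacent windows. The window size and fold-change cutoff are hypothetical, and this is not the method used to produce Figure 6.

```python
from itertools import accumulate
from typing import List, Sequence


def depth_breakpoints(depth: Sequence[int], window: int = 50, fold: float = 1.8) -> List[int]:
    """Indices where mean depth changes by at least `fold` between adjacent windows.

    Each run of consecutive qualifying positions reports only its strongest position.
    """
    prefix = [0, *accumulate(depth)]  # prefix sums for O(1) window means

    def mean(a: int, b: int) -> float:
        return (prefix[b] - prefix[a]) / (b - a)

    breakpoints: List[int] = []
    best_i, best_score = None, 0.0
    for i in range(window, len(depth) - window + 1):
        left, right = mean(i - window, i), mean(i, i + window)
        lo, hi = min(left, right), max(left, right)
        if lo > 0:
            score = hi / lo
        else:
            score = float("inf") if hi > 0 else 1.0
        if score >= fold:
            if best_i is None or score > best_score:
                best_i, best_score = i, score
        elif best_i is not None:
            breakpoints.append(best_i)  # close the current run
            best_i, best_score = None, 0.0
    if best_i is not None:
        breakpoints.append(best_i)
    return breakpoints
```

On a toy heterozygous deletion, `depth_breakpoints([30] * 200 + [15] * 200 + [30] * 200)` reports the two step positions, 200 and 400, so the deleted segment spans 200 positions.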