Neha Chaudhary, Duane R. Wesemann.
Abstract
Somatic assembly of T cell receptor (TCR) and B cell receptor (BCR) genes produces a vast diversity of lymphocyte antigen recognition capacity. The advent of efficient high-throughput sequencing of lymphocyte antigen receptor genes has recently created unprecedented opportunities to explore adaptive immune responses. With these opportunities have come significant challenges in determining which analysis techniques most accurately reflect the underlying biological phenomena. In this regard, sample preparation and sequence analysis techniques, which have largely been borrowed and adapted from other fields, continue to evolve. Here, we review current methods and challenges of library preparation, sequencing, and statistical analysis in lymphocyte receptor repertoire studies. We discuss the general steps in generating immune repertoire data, including sample preparation, available sequencing platforms, processing of sequencing data, measurable features of the immune repertoire, and the statistical tools that can be used to analyze and interpret the data. Because BCR analysis harbors additional complexities, such as somatic hypermutation and class switch recombination of immunoglobulin (Ig) (i.e., antibody) genes, this review emphasizes Ig/BCR sequence analysis.
Keywords: B cell repertoire; immunoglobulin; next-generation sequencing; repertoire; statistical analysis
Year: 2018 PMID: 29593723 PMCID: PMC5861150 DOI: 10.3389/fimmu.2018.00462
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Complete workflow for high-throughput sequencing and analysis of the immunoglobulin repertoire. Text within orange outlines describes the complications that can arise at each step.
Figure 2. Use of unique molecular identifiers (UMIs). Each strand is an mRNA or a cDNA, and the smaller bars are UMIs. The same color on strands or bars denotes copies of the same mRNA or UMI, respectively. (A) Molecular Identifier Group-based Error Correction (MIGEC) (17). Among all sequences sharing a UMI, only a few carry a late PCR error (red); such errors are identified and removed. If nearly 50% of the sequences carry the same error, the sequence is dropped. An early error (present in most sequences) cannot be identified this way, but it is dropped if it falls on a PCR hotspot. (B) Duplex Sequencing (18). UMIs are added to both ends of the sequence, and both strands are sequenced. A mutation (green, black, or cyan) present in only one of the two strands is an error. (C) Paired-end sequencing after UMI tagging. Errors are corrected in the individual reads, which are then merged to obtain the full, high-quality sequence (19). (D) Tn5-enabled molecular identifier-guided amplicon sequencing (TMIseq) (20). The PCR-amplified libraries are tagmented with Tn5 transposase, which inserts either the forward (green) or the reverse (pink) primer. Thus, only the fragments carrying both a forward and a reverse primer are amplified for sequencing. Both the smaller libraries and the complete sequence library are sequenced and used to generate an error-free consensus sequence. (E) Molecular amplification fingerprinting (MAF) (21). A reverse UMI (RUMI) is added at the reverse transcription (RT) step, and a forward UMI (FUMI) is added at each subsequent PCR amplification step. FUMIs track PCR bias across different sequences: some sequences are over-amplified, while others may be lost in the process.
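The consensus step in panel (A) can be illustrated with a short Python sketch. It is not the MIGEC implementation: it assumes reads are already grouped by UMI and that all reads sharing a UMI have the same length (no indels), and the thresholds `min_reads` and `min_fraction` are hypothetical. A minority base at a position is treated as a late PCR or sequencing error and outvoted; a position with no clear majority (for example, a 50/50 split) causes the whole UMI group to be dropped.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=3, min_fraction=0.6):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    min_reads and min_fraction are illustrative (hypothetical) thresholds:
    groups with fewer than min_reads reads are discarded, and the majority
    base at every position must exceed min_fraction of the group's reads,
    otherwise the whole group is discarded as ambiguous.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads or len({len(s) for s in seqs}) != 1:
            continue  # too few reads, or unequal lengths (indels not handled)
        bases = []
        for column in zip(*seqs):  # walk the aligned positions
            base, count = Counter(column).most_common(1)[0]
            if count / len(seqs) <= min_fraction:
                break  # no clear majority: drop this UMI group
            bases.append(base)
        else:  # loop finished without break: every position had a majority
            consensus[umi] = "".join(bases)
    return consensus


reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("AAT", "ACGT"),
    ("GGC", "TTTT"), ("GGC", "TTAT"), ("GGC", "TTTT"), ("GGC", "TTAT"),
    ("CCC", "GGGG"),
]
print(umi_consensus(reads))  # {'AAT': 'ACGT'}
```

With these toy reads, the single discordant base in the `AAT` group is outvoted, the `GGC` group, which splits evenly at one position, is discarded, and the single-read `CCC` group falls below `min_reads`.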
Figure 3. Impact of erroneous barcodes (25). Each strand represents an mRNA, and the bar at its end represents a unique molecular identifier (UMI). The same color on strands or bars denotes copies of the same mRNA or UMI, respectively. The UMI sequence is shown within each strand.
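A common remedy for the problem illustrated in Figure 3 is to merge rare UMIs into abundant UMIs that differ by one base, on the reasoning that the rare barcode most likely arose from a PCR or sequencing error. The Python sketch below uses the count-ratio rule of the "directional" method from the UMI-tools package; it is an illustration of the idea, not the approach of ref. (25), and the function names are hypothetical. Without correction, the four observed UMIs in the example would be counted as four molecules; after collapsing, `ACGA` and `ACCA` are attributed to `ACGT`.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(counts):
    """Map each UMI to the representative UMI it is merged into.

    counts maps UMI -> read count; all UMIs are assumed to have equal length.
    UMI b is absorbed by UMI a when they differ at exactly one position and
    counts[a] >= 2 * counts[b] - 1 (the "directional" rule).
    """
    order = sorted(counts, key=lambda u: (-counts[u], u))  # most abundant first
    parent = {}
    for root in order:
        if root in parent:
            continue  # already absorbed by a more abundant UMI
        parent[root] = root
        stack = [root]
        while stack:  # follow chains of likely erroneous copies
            a = stack.pop()
            for b in order:  # O(n^2) scan; adequate for a sketch
                if (b not in parent and hamming(a, b) == 1
                        and counts[a] >= 2 * counts[b] - 1):
                    parent[b] = root
                    stack.append(b)
    return parent


counts = {"ACGT": 50, "ACGA": 2, "ACCA": 1, "TTTT": 10}
groups = collapse_umis(counts)
print(groups)
print(len(set(groups.values())), "molecules")  # 2 molecules
```

The count-ratio condition keeps two similarly abundant UMIs apart, since they are more likely to be genuinely distinct molecules than one error-derived copy of the other.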
Figure 4. Bulk sequencing of single cells. (A) Single cells are sorted into 96-well plates, and VH and VL are tagged with a cell-specific unique molecular identifier (UMI). Sequences from all cells are pooled and sequenced together (29). (B) Single cells are isolated in polydimethylsiloxane slides (1.7 × 10⁵ wells per slide; 56-μm-diameter wells); poly(dT) microbeads are added; wells are sealed with a dialysis membrane and equilibrated with lysis buffer; VH and VL mRNAs attach to the poly(dT) beads; the beads are emulsified for cDNA synthesis; and linkage PCR generates paired VH:VL products, which are pooled and sequenced (30). (C) Single cells and poly(dT) magnetic beads are trapped in emulsions along with lysis buffer. VH and VL mRNAs anneal to the poly(dT) beads and are processed and sequenced as in (B) (31). (D) Single cells are sorted into 384-well PCR plates. Instead of a unique UMI for each cell, each row and each column has a unique UMI attached to the respective forward and reverse primers, which allows sequences to be traced back to their wells (32). The DNA is pooled and sequenced. (E) A microfluidic device joins two aqueous flows into distinct droplets: one carrying cells and the other carrying barcoded primer beads in lysis buffer. Each cell is lysed, and its mRNAs hybridize to the primers on the microparticle surface. The microparticles are collected and washed, and the mRNAs are reverse transcribed, each acquiring a unique UMI from the bead. The products are pooled and bulk sequenced together (33).
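The combinatorial indexing in panel (D) means that a 384-well plate (16 rows × 24 columns) needs only 16 + 24 = 40 barcoded primers rather than 384 cell-specific ones. The Python sketch below decodes a well from a read's forward and reverse barcodes. It assumes, following the caption, that row barcodes sit on forward primers and column barcodes on reverse primers; the 3-nt barcode sequences, lookup tables, and `well_of` function are hypothetical placeholders.

```python
import itertools
import string

ROWS = string.ascii_uppercase[:16]  # plate rows A..P
COLS = range(1, 25)                 # plate columns 1..24

# Hypothetical 3-nt barcodes: the first 16 tag rows (forward primers),
# the next 24 tag columns (reverse primers).
ALL_3MERS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]
ROW_LOOKUP = dict(zip(ALL_3MERS[:16], ROWS))
COL_LOOKUP = dict(zip(ALL_3MERS[16:40], COLS))


def well_of(fwd_barcode, rev_barcode):
    """Return the well label (e.g. 'C7') for a read, or None if unassignable."""
    row = ROW_LOOKUP.get(fwd_barcode)
    col = COL_LOOKUP.get(rev_barcode)
    if row is None or col is None:
        return None  # unknown barcode: discard the read or attempt correction
    return f"{row}{col}"


print(len(ROW_LOOKUP) + len(COL_LOOKUP), "primers for", len(ROWS) * len(COLS), "wells")
print(well_of("AAG", "CCG"))  # C7
```

Because a well is identified only by the pair of barcodes, a read whose forward or reverse barcode cannot be matched cannot be assigned to a cell, so barcode errors here have consequences similar to those shown in Figure 3.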
Common platforms used for immunoglobulin repertoire sequencing.
| Platform | Roche’s 454 GS FLX | Illumina MiSeq | Illumina HiSeq | PacBio | Ion Torrent |
|---|---|---|---|---|---|
| Mechanism | Pyrosequencing | Dye terminator sequencing | Synthesis (fluorophores attached to nucleotides are excited and detected after each cycle) | Synthesis (fluorescent tag attached to the phosphate chain) | Synthesis (detects H+ released on nucleotide incorporation) |
| Read length (bp) | 700 | 2 × 300 | 2 × 250 | 860–1,100 | >100 |
| Run time | 18–20 h | 26 h | 8 days | 0.5–2 h | 2 h |
| Reads/run | 1M | 3.5M | 2B | 0.01M | 60–80M |
| Error rate (%) | 1 | ~0.1 | ~0.1 | ~13 | ~1 |
| Type of errors | Indel | Substitution | Substitution | Indel | Indel |
| Cost per Mbp (US$) | 12.40 | 0.74 | 0.10 | 11–180 | <7.5 |
| Region of antibody covered | FWR1–CR | FWR1–CR | FWR1–CR | Amplification of linked H and L chains | FWR3–CR |
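The per-base error rates in the table translate into very different per-read error burdens for a full-length antibody amplicon. Under the simplifying assumption that errors are independent and evenly spread along a read of length L with per-base error rate p, the expected number of errors per read is pL and the probability that a read is error-free is (1 − p)^L. The Python sketch below applies this to three columns of the table; the 1,000-bp PacBio length is an assumed value within the tabulated range, and for paired-end platforms a single read end is considered.

```python
def read_error_profile(error_rate_pct, read_length):
    """Expected errors per read and probability of an error-free read.

    Assumes independent errors at a constant per-base rate.
    """
    p = error_rate_pct / 100.0
    expected_errors = p * read_length
    p_error_free = (1.0 - p) ** read_length
    return expected_errors, p_error_free


# (error rate in %, read length in bp), taken from or within the table's ranges
platforms = {
    "Illumina MiSeq, one 300-bp read end": (0.1, 300),
    "Roche 454 GS FLX, 700-bp read": (1.0, 700),
    "PacBio, 1,000-bp single-pass read": (13.0, 1000),
}
for name, (rate, length) in platforms.items():
    expected, clean = read_error_profile(rate, length)
    print(f"{name}: {expected:.1f} expected errors, P(error-free) = {clean:.2g}")
```

Under this model roughly three in four MiSeq read ends are error-free, against fewer than one in a thousand 454 reads and essentially no single-pass PacBio reads, which illustrates why consensus-based error correction is especially important for high-error platforms.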
Software available for sequence error correction, annotation, and analysis of the immunoglobulin (Ig) repertoire.
| Name | Platform/availability | Input format | Maximum sequence limit | Features | Reference |
|---|---|---|---|---|---|
| IMGT/V-QUEST | Online | Fasta | 50 | V(D)J annotation; junction analysis; mutation; amino acid statistics; comparisons between two repertoires | ( |
| IMGT/HighV-QUEST | Online | Fasta | 150,000 | As for IMGT/V-QUEST | ( |
| JOINSOLVER | Online/standalone | Fasta | – | Annotation; complementarity-determining region 3 (CDR3); mutation; insertions and deletions in human sequences only | ( |
| VDJSolver | Online | Fasta | 500 | Uses a hidden Markov model (HMM) or maximum likelihood to predict V(D)J recombination | ( |
| iHMMune-align | Online/standalone | Fasta | – | HMM to model the processes involved in human IGH gene rearrangement and maturation | ( |
| VDJFasta | Standalone | Fasta | – | HMM-based CDR identification; translation and alignment; probabilistic germline classification | ( |
| BASELINe | Online/standalone | Fasta | – | Quantifying selection based on somatic hypermutation (SHM) patterns | ( |
| IgAT | Standalone (windows) | IMGT output files | 150,000 | Gene segments usage; CDR3; antigen selection based on SHM; the hydrophobicity of antigen-binding sites; structural properties of the CDR-H3 loop using Shirai’s H3-rules | ( |
| IgBLAST | Online/standalone | Fasta | Online: 1,000; standalone: none | V(D)J assignment; CDR3 identification; mutation; can use a custom database in standalone mode | ( |
| pRESTO | Standalone | Fastq/Fasta | None | Merge; filter; error correction (with/without UMIs); annotation | ( |
| Vidjil | Online/standalone | Fastq/Fasta | None | Extract V(D)J junctions; clonality | ( |
| The antibody mining toolbox | Standalone | Fastq | None | Analysis based on CDR3 as sequence identifiers | ( |
| MIGEC | Standalone (Unix) | Fastq | None | Error correction and sequence assembly | ( |
| IgRepertoireConstructor | Standalone | Fastq | None | Merge; filter; error correction (with/without UMIs); validation using mass spectrometry; clonality; diversity | ( |
| MiXCR | Standalone | Fastq | None | Merge; filter; PCR error correction; annotation; gene segment usage; clonality; mutation | ( |
| IMonitor | Standalone | Fastq/Fasta | None | Merge; filter; V(D)J assignment; gene usage frequency; CDR3; mutation; insertion and deletion | ( |
| IgSCUEAL | Standalone | Fasta | None | V and J annotation based on phylogeny; gene usage frequency; CDR3 length | ( |
| Change-O | Standalone | IMGT/IgBLAST output | None | Gene usage; clonality; CDR3; diversity; phylogenetics; mutation; selection pressure; novel germline prediction | ( |
| TIgGER | Standalone | Fasta | – | Predicts germline alleles | ( |
| LymAnalyzer | Standalone | Fastq | None | V(D)J identification; CDR3; diversity; mutation; polymorphism analysis | ( |
| sciReptor | Standalone | SFF/Fastq/Fasta | 2,500 | Single-cell analysis; annotation; maintains a regional database; gene segment usage; clustering; mutation | ( |
| repgenHMM | Standalone | Fasta | None | Predicts scenarios of V(D)J recombination | ( |
| bcRep | Standalone (R) | IMGT output files | – | Gene usage frequency; clonality; diversity; mutations; repertoire comparison; visualization | ( |
| IgDiscover | Standalone | Fastq | – | Identification of existing and novel germline V genes | ( |
| Recon | Standalone | Frequency table (txt) | – | Diversity | ( |
| IMPre | Standalone | Fasta | – | Predicts germline genes and alleles | ( |
| ARResT/Interrogate | Standalone | IMGT output files | – | Calculation of statistics; visualization | ( |
| Antigen Receptor Galaxy | Online | Fastq/Fasta | None | Demultiplexing; annotation using IMGT/HighV-QUEST; V(D)J usage; SHM and class switch recombination (CSR); antigen selection; clonality | ( |
| IGoR | Standalone | Fasta | None | Calculates V(D)J recombination and mutation probabilities | ( |
| ClonoCalc and ClonoPlot | Standalone | Fastq | – | GUI; demultiplexing; merging and annotation using MiXCR; analysis and plots using the tcR package in R | ( |
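Diversity appears among the features of several tools in the table (for example IgRepertoireConstructor, Change-O, LymAnalyzer, bcRep, and Recon). As a hedged illustration of what such a calculation takes in and returns, the Python sketch below computes Hill numbers of order q from a clone-size frequency table: q = 0 counts clones (richness), q = 1 is the exponential of Shannon entropy, and q = 2 is the inverse Simpson index. It is a plug-in estimate from the observed sample only and is not the Recon algorithm, which is designed to also account for clones missed by sampling; the toy clone sizes are hypothetical.

```python
import math


def hill_number(clone_sizes, q):
    """Hill diversity of order q from clone sizes (reads or cells per clone)."""
    total = sum(clone_sizes)
    freqs = [n / total for n in clone_sizes if n > 0]
    if q == 1:
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


even = [25, 25, 25, 25]      # four equally sized clones
dominated = [97, 1, 1, 1]    # one clone dominates, as in Figure 5E
for name, sizes in (("even", even), ("dominated", dominated)):
    profile = [round(hill_number(sizes, q), 2) for q in (0, 1, 2)]
    print(name, profile)
```

Both toy repertoires contain four clones, so q = 0 gives 4 for each; the even repertoire stays at 4 for every q, whereas the dominated one falls to about 1.2 at q = 1 and about 1.06 at q = 2. Reporting several orders of q therefore separates "how many clones" from "how evenly the repertoire is spread across them".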
Figure 5. Network analysis of the immunoglobulin (Ig) repertoire: an explanatory model. (A) An example network arising from a single germline sequence (red). (B) Multiple clusters arising from different ancestral sequences; each color represents a cluster arising from a different germline sequence. (C) Representative network of a healthy individual: the clusters arising from each ancestral sequence are of uniform size and complexity. (D) Representative network of an individual exposed to an antigen: larger clusters represent antibodies that recognize the antigen and have therefore expanded and mutated. (E) Representative Ig network of a patient with chronic lymphocytic leukemia, showing one dominant, highly expanded cluster.
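The network construction behind Figure 5 can be sketched as follows: each unique sequence is a node, sequences differing by a single substitution are joined by an edge, and clusters are the connected components. This Python sketch assumes equal-length sequences and a one-mismatch edge rule (published network analyses may use other distance measures and thresholds); the toy CDR3-like strings and function names are hypothetical. The share of reads in the largest cluster gives a simple readout of the dominance pictured in panel (E).

```python
from collections import Counter
from itertools import combinations


def one_mismatch(a, b):
    """True if two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def sequence_clusters(unique_seqs):
    """Connected components of the one-mismatch network, largest first."""
    nodes = sorted(unique_seqs)
    parent = {s: s for s in nodes}  # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(nodes, 2):  # O(n^2) pair scan; fine for a sketch
        if one_mismatch(a, b):
            parent[find(a)] = find(b)  # an edge merges the two clusters

    clusters = {}
    for s in nodes:
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values(), key=len, reverse=True)


reads = ["CARDYW"] * 5 + ["CARDFW"] * 2 + ["CAKDFW"] + ["CSSGYW"] * 2 + ["CTTEAW"]
read_counts = Counter(reads)                # node sizes
clusters = sequence_clusters(read_counts)   # iterating a Counter yields its keys
largest_share = sum(read_counts[s] for s in clusters[0]) / len(reads)
print(clusters)
print(f"largest cluster holds {largest_share:.0%} of reads")
```

`CAKDFW` and `CARDYW` differ at two positions but fall into the same cluster through `CARDFW`, mirroring how a lineage in panel (A) accumulates mutations step by step away from its germline sequence.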