Bernardo Cortina-Ceballos, Elizabeth Ernestina Godoy-Lozano, Hugo Sámano-Sánchez, Andrés Aguilar-Salgado, Martín Del Castillo Velasco-Herrera, Carlos Vargas-Chávez, Daniel Velázquez-Ramírez, Guillermo Romero, José Moreno, Juan Téllez-Sosa, Jesús Martínez-Barnetche.
Abstract
The B cell antigen receptor repertoire is highly diverse and constantly modified by clonal selection. High-throughput DNA sequencing (HTS) of the lymphocyte repertoire (Rep-Seq) is a promising technology to explore this diversity ex vivo and to assist in the identification of antigen-specific antibodies based on molecular signatures of clonal selection. Integrative tools for repertoire reconstruction and analysis from antibody sequences are therefore needed. We developed ImmunediveRsity, a stand-alone pipeline written primarily in R for the integral analysis of B cell repertoire data generated by HTS. The pipeline integrates GNU software and in-house scripts to perform quality filtering, sequencing noise correction, and repertoire reconstruction based on V, D and J segment assignment, clonal origin, and unique heavy chain identification. Post-analysis scripts generate a wealth of repertoire metrics that, in conjunction with a rich graphical output, facilitate sample comparison and repertoire mining. Its performance was tested with raw and curated human and mouse 454-Roche sequencing benchmarks, for which it provided good approximations of repertoire structure. Furthermore, ImmunediveRsity was used to mine the B cell repertoire of mice immunized with a model antigen, allowing the identification of previously validated antigen-specific antibodies and revealing different and unexpected clonal diversity patterns in the post-immunization IgM and IgG compartments. Although ImmunediveRsity is similar to other recently developed tools, it offers significant advantages that facilitate repertoire analysis and repertoire mining. ImmunediveRsity is open source and free for academic purposes, and it runs on 64-bit GNU/Linux and MacOS. Available at: https://bitbucket.org/ImmunediveRsity/immunediversity/.
Keywords: CDR3; CDRH3, heavy chain complementarity determining region 3; HEL, hen egg lysozyme; Ig repertoire; Rep-Seq, repertoire sequencing; SHM, somatic hypermutation; data mining; high-throughput sequencing
Year: 2015 PMID: 25875140 PMCID: PMC4622655 DOI: 10.1080/19420862.2015.1026502
Source DB: PubMed Journal: MAbs ISSN: 1942-0862 Impact factor: 5.857
Figure 1. The overall algorithm used by ImmunediveRsity. The input file is a *.fastq. Pre-processing consists of VDJ assignment and non-VDJ sequence trimming (5′ UTR, signal sequence and IGHC), homopolymer correction (particularly required for 454-Roche or Ion Torrent reads), and quality, size and IGH germline transcript (GLT) filters. Processing: ImmunediveRsity first identifies the CDRH3 with HMMER3 and the V and J segment rearrangement with IgBLAST. The CDRH3 reads belonging to the same V and J assignment are clustered iteratively according to sequence identity and length to define clonotypes. A second clustering step according to sequence identity is performed within the full V region of reads belonging to each clonotype to identify the lineages with different somatic hypermutation patterns. The output consists of (1) Fasta files containing the CDRH3 sequences for each read and clonotype, as well as the sequence for each consensus lineage with a unique identifier; (2) text files describing V, D and J assignments for each read and the relation of each read to a given clonotype and lineage; (3) metrics files giving, for each clonotype, the frequency, the number of synonymous (Ks) and non-synonymous (Ka) mutations, diversity indices and CDRH3 physico-chemical characteristics (P.Q.); and (4) repertoire visualization as a series of predefined vectorized graphics: (i) rarefaction curves, (ii) amino acid composition per specific length of CDRH3, (iii) heat-map of VJ rearrangement frequencies, (iv) CDRH3 spectratyping, (v) 3D cloud of VDJ rearrangement frequency and (vi) network representation of the overall structure of the antibody repertoire. Abbreviations: CG, clonotypes; Id, lineages.
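The clonotype step in the Figure 1 caption (reads with the same V and J assignment grouped by CDRH3 length and sequence identity) can be sketched as below. This is a simplified illustration, not the pipeline's actual code: it uses a single greedy pass rather than the iterative clustering the caption describes, and the function names, the input tuple layout and the `min_identity` threshold are all hypothetical.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_clonotypes(reads, min_identity=0.97):
    """Group reads into clonotypes.

    reads: iterable of (read_id, v_segment, j_segment, cdrh3) tuples.
    min_identity: hypothetical threshold, not taken from the source.
    """
    # Step 1: bin reads by V segment, J segment and CDRH3 length.
    bins = defaultdict(list)
    for read_id, v, j, cdrh3 in reads:
        bins[(v, j, len(cdrh3))].append((read_id, cdrh3))

    # Step 2: within each bin, assign every read to the first seed it
    # matches at >= min_identity, or start a new seed (greedy, one pass).
    clonotypes = []
    for (v, j, _length), members in sorted(bins.items()):
        seeds = []  # list of (seed_cdrh3, [read_ids])
        for read_id, cdrh3 in members:
            for seed_seq, ids in seeds:
                if identity(seed_seq, cdrh3) >= min_identity:
                    ids.append(read_id)
                    break
            else:
                seeds.append((cdrh3, [read_id]))
        for seed_seq, ids in seeds:
            clonotypes.append({"v": v, "j": j, "cdrh3": seed_seq, "reads": ids})
    return clonotypes
```

The same idea, applied to the full V region of the reads inside one clonotype, would give an approximation of the second (lineage) clustering step.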
Figure 2. iGraph network representation of the sampled antibody repertoire in mouse spleen 15 days after immunization with HEL. (A) Left: IgM compartment. Right: IgG compartment. Each clonotype is represented by an agglomeration of lineages (nodes, drawn as circles); the diameter of each circle represents the relative frequency and the color encodes the number of non-synonymous mutations. The CDRH3 sequence (*3G1 ARGEGNYGY) of a recombinant HEL-specific antibody as described is shown. Fading of certain clonotypes allows visualization of other clonotypes in the background. (B) Quantitative analysis of SHM in the IgM vs. IgG compartment in the same dataset as in A. (C) Statistical analysis of SHM in the IgM vs. IgG compartment. The median for each compartment is shown (dotted line). Mann-Whitney U test. Frameshifted sequences were removed before the analysis. NSM, non-synonymous mutations.
Overview of the reference sequencing sets
| Set | Sequenced reads | After filters¹ | Observed clonotypes | Expected clonotypes | Well supported clonotypes² | Observed lineages (without singletons) | Expected lineages | Well supported lineages³ |
|---|---|---|---|---|---|---|---|---|
| MD4 | 5,359 | 99.6% | 10 | 1 | 1 | 21 (7) | 1 | 1 |
| IGHV1-3 | 1,044 | 95.2% | 1 | 1 | 1 | 469 (52) | 10 | 7 |
| Stanford22⁴ | 13,141 | 100% | 11,779 | 13,141 | NA | 12,421 | 13,141 | NA |
¹ Percent of reads that pass the pre-processing filters.
² Number of clonotypes whose corresponding lineages are composed of ≥ 6 reads.
³ Lineages composed of ≥ 6 reads.
⁴ The publicly available Stanford22 set was published as a set of non-clonally related immunoglobulin sequences; we removed one read with a duplicated identifier and 11 with duplicated sequences.
NA, not applicable.
Figure 3. Comparison of clonotype intra-clonal inequality (Gini coefficient) between control (PBS) and 2 HEL-immunized mice 15 days post-immunization. Clonotypes derived from control mice are shown as green dots. Clonotypes from HEL-immunized mice are shown as red (m8) and purple (m9) dots. x axis: Gini coefficient per clonotype; y axis: clonotype relative frequency. The CDRH3 sequences of anti-HEL recombinant antibodies as described are shown. Fading dots allow the visualization of overlapping dots.
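For readers unfamiliar with the Gini coefficient used in Figure 3, a minimal sketch of how it can be computed from per-lineage read counts within one clonotype follows. The function name and the input layout are assumptions for illustration; the record does not state which exact estimator ImmunediveRsity uses, and this is the standard sample formula without a small-sample correction.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    0 means all lineages have equal size; values approaching 1 mean that
    a few lineages hold nearly all of the reads.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum with ranks 1..n over the ascending counts.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

Under this formula a clonotype whose lineages all have the same number of reads scores 0, while one dominated by a single expanded lineage scores close to 1.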
Figure 4. Clonal diversity and somatic diversification after immunization. (A) Change in clonal (closed symbols) and lineage (open symbols) diversity measured by the average Shannon-Weaver index of HEL-immunized mice (n = 2) minus that of the PBS-injected mouse (n = 1) at days 3, 7 and 15 post-immunization for the IgM (upper panel) and IgG (lower panel) compartments. (B) The corresponding change in clonal (closed symbols) and lineage (open symbols) inequality measured by the average Gini coefficient. For A and B, 5,700 reads per library were randomly sampled using the post-processing multi-library analysis toolbox. Sequencing metrics of the libraries used to estimate diversity measurements are described in .
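The diversity measurement in Figure 4 (a Shannon-Weaver index computed after randomly sampling a fixed number of reads per library) can be illustrated with the sketch below. It is not the toolbox's own code: the function names, the use of the natural logarithm, and the representation of a library as a list with one clonotype (or lineage) label per read are assumptions made for the example.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over non-zero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


def subsampled_shannon(labels, n_reads=5700, seed=0):
    """Shannon index after sampling n_reads labels without replacement.

    labels: list with one clonotype or lineage label per read.
    Raises ValueError if the library has fewer than n_reads reads.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = rng.sample(labels, n_reads)
    return shannon(Counter(sample).values())
```

Sampling every library down to the same depth keeps the index comparable across libraries of different sizes; the change plotted in the figure would then be the mean index of the immunized libraries minus the index of the control library.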