Christos Maramis1,2, Athanasios Gkoufas3,4, Anna Vardi4, Evangelia Stalika4, Kostas Stamatopoulos4, Anastasia Hatzidimitriou4, Nicos Maglaveras3,4, Ioanna Chouvarda3,4.
Abstract
BACKGROUND: The study of the vast diversity of immune receptors, often referred to as immune repertoire profiling, is a prerequisite for the diagnosis, prognostication and monitoring of hematological disorders. In the era of high-throughput sequencing (HTS), the abundance of immunogenetic data has opened up unprecedented opportunities for the thorough profiling of T-cell receptors (TR) and B-cell receptors (BcR). However, the volume of the data to be analyzed calls for efficient and easy-to-use immune repertoire profiling software applications.
Keywords: B-cell receptors; High-throughput sequencing; Immune receptor profiling; Software pipeline; T-cell receptors
Year: 2018 PMID: 29669518 PMCID: PMC5907363 DOI: 10.1186/s12859-018-2144-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fields of the IMGT Summary Report that are employed by the introduced pipeline
| Index | Field Name |
|---|---|
| 1 | AA JUNCTION |
| 2 | V-GENE and allele |
| 3 | V-REGION identity % |
| 4 | J-GENE and allele |
| 5 | D-GENE and allele |
| 6 | Functionality |
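As an illustration of how these six fields might be pulled out of a summary report, the sketch below assumes the report is a tab-delimited text file whose header row uses the field names listed above. The helper name `read_summary` and the two sample reads are hypothetical and are not taken from the paper.

```python
import csv
import io

# Column names as listed in the table above.
FIELDS = [
    "AA JUNCTION",
    "V-GENE and allele",
    "V-REGION identity %",
    "J-GENE and allele",
    "D-GENE and allele",
    "Functionality",
]


def read_summary(handle):
    """Yield one dict per read, restricted to the six fields of interest."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Missing or empty cells are normalised to an empty string.
        yield {name: (row.get(name) or "").strip() for name in FIELDS}


# Made-up two-read excerpt standing in for a real summary file.
SAMPLE = (
    "Sequence ID\tFunctionality\tV-GENE and allele\tV-REGION identity %\t"
    "J-GENE and allele\tD-GENE and allele\tAA JUNCTION\n"
    "read_1\tproductive\tHomsap IGHV4-34*01 F\t98.25\t"
    "Homsap IGHJ4*02 F\tHomsap IGHD3-10*01 F\tCARGYSSGWYFDYW\n"
    "read_2\tunproductive\tHomsap IGHV1-69*01 F\t91.10\t"
    "Homsap IGHJ6*02 F\t\tCAR*GMDVW\n"
)

reads = list(read_summary(io.StringIO(SAMPLE)))
```

Selecting columns by name rather than by position keeps the sketch independent of the column order in the report.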
Fig. 1 Conceptual design of IRProfiler
List of clonotype definitions supported by IRProfiler
| Index | Clonotype Name | Components | Comment |
|---|---|---|---|
| 1 | V + D + J + CDR3 | (V-gene, D-gene, J-gene, CDR3-AA) | IMGT Clonotype (AA) with the allele information omitted |
| 2 | V + J + CDR3 | (V-gene, J-gene, CDR3-AA) | No D-gene information; caters for D-gene assignment ambiguity |
| 3 | V + CDR3 | (V-gene, CDR3-AA) | Specialized definition, focusing on V-gene |
| 4 | J + CDR3 | (J-gene, CDR3-AA) | Specialized definition, focusing on J-gene |
| 5 | CDR3 | (CDR3-AA) | The least detailed definition including only CDR3-AA |
CDR3-AA denotes the amino acid translation of the CDR3, including the anchor amino acids (positions 104 and 118)
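The five definitions can be read as different grouping keys over the same reads. The following is a minimal sketch under stated assumptions: each read is a dict with hypothetical keys `v`, `d`, `j` and `cdr3`, gene assignments carry an allele suffix after `*`, and the sample reads are invented. It is not IRProfiler's actual implementation.

```python
from collections import Counter

# Components of each definition, keyed by the table's "Clonotype Name".
DEFINITIONS = {
    "V + D + J + CDR3": ("v", "d", "j", "cdr3"),
    "V + J + CDR3": ("v", "j", "cdr3"),
    "V + CDR3": ("v", "cdr3"),
    "J + CDR3": ("j", "cdr3"),
    "CDR3": ("cdr3",),
}


def strip_allele(assignment):
    """Drop the allele suffix: 'IGHV4-34*01' -> 'IGHV4-34'."""
    return assignment.split("*", 1)[0].strip()


def clonotype_key(read, definition):
    """Build the tuple that identifies a read's clonotype."""
    parts = []
    for component in DEFINITIONS[definition]:
        value = read[component]
        # Genes are compared without allele information; CDR3-AA is used as-is.
        parts.append(value if component == "cdr3" else strip_allele(value))
    return tuple(parts)


def count_clonotypes(reads, definition):
    """Number of reads per clonotype under the chosen definition."""
    return Counter(clonotype_key(r, definition) for r in reads)


# Invented reads for illustration only.
reads = [
    {"v": "IGHV4-34*01", "d": "IGHD3-10*01", "j": "IGHJ4*02", "cdr3": "CARGYSSGWYFDYW"},
    {"v": "IGHV4-34*02", "d": "IGHD3-22*01", "j": "IGHJ4*02", "cdr3": "CARGYSSGWYFDYW"},
    {"v": "IGHV1-69*01", "d": "IGHD3-10*01", "j": "IGHJ6*02", "cdr3": "CARDGMDVW"},
]

strict = count_clonotypes(reads, "V + D + J + CDR3")
relaxed = count_clonotypes(reads, "V + J + CDR3")
```

In this toy input the first two reads differ only in allele and D-gene assignment, so they form two clonotypes under definition 1 but collapse into one under definition 2, which is the D-gene ambiguity case the table's comment refers to.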
Specifications of the hardware and software setup for the scalability evaluation experiments
| Component | Specification |
|---|---|
| Processor | Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz, x64 |
| RAM | 16 GB DIMM DDR3 Synchronous 1600 MHz |
| Storage | INTEL SSD SC2BW18, SATA 3.0 6 Gb/s |
| OS | Ubuntu 16.04.1 LTS |
| Python & Libraries | CPython 3.4.5 with Pandas 0.19.1 |
Results of theoretical and experimental execution time estimation (extracted independently) for the developed tools
| Index | Tool | Theoretical estimation | Experimental estimation (R²) |
|---|---|---|---|
| 1 | Data filtering | | 0.999808 |
| 2 | Clonotype diversity and expression | | [0.999508, 0.999717] |
| 3 | Gene usage | | [0.976990, 0.981139] |
| 4 | Public clonotypes | | [0.976313, 0.996352] |
| 5 | Exclusive clonotypes | | [0.937594, 0.958152] |
| 6 | Gene usage comparison | | [0.834540, 0.864164] |
For the theoretical estimation (3rd column), n is the number of input receptor reads or clonotypes and m is the number of input repertoire datasets. For the experimental estimation (4th column), exactly 2 input datasets have been assumed for the 4th–6th tools. The 4th column includes the coefficient of determination values (R²), assuming a first-order (1st–3rd tools) or second-order (4th–6th tools) polynomial model of the execution time; whenever multiple alternative clonotype definitions or gene subgroups are supported by a tool, ranges of values are reported
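To make the R² figures concrete, the sketch below fits a polynomial of a given order to (input size, execution time) pairs by ordinary least squares and reports the coefficient of determination. The fitting procedure and the timing data are assumptions for illustration (the data are synthetic and perfectly quadratic); the source does not state how its fits were computed.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    size = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(size):
            if row != col:
                factor = ata[row][col] / ata[col][col]
                ata[row] = [a - factor * b for a, b in zip(ata[row], ata[col])]
                aty[row] -= factor * aty[col]
    return [aty[i] / ata[i][i] for i in range(size)]


def r_squared(xs, ys, degree):
    """Coefficient of determination of a polynomial fit of the given order."""
    coeffs = polyfit(xs, ys, degree)
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data: input size in arbitrary units, time growing quadratically.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
times = [s ** 2 for s in sizes]

r2_linear = r_squared(sizes, times, 1)
r2_quadratic = r_squared(sizes, times, 2)
```

On this synthetic input the second-order model fits exactly, while the first-order model leaves a visible residual, which is the kind of contrast an R² value close to 1 is meant to rule out.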
Fig. 2 Bar charts of peak RAM usage for various groups of tools. Tools are grouped on the basis of the number of inputs and their size (in number of rows; M = 10^6, K = 10^3)
Comparison of IRProfiler with existing software with respect to functionality and other S/W properties
| | IMGT | IGGalaxy | tcR | IMonitor | IMSEQ | IMEX | Vidjil | IRProfiler |
|---|---|---|---|---|---|---|---|---|
| | Graphical | Graphical | Command-line | Command-line | Command-line | Graphical | Graphical | Graphical |
| | Web-based (Asynchronous) | Web-based (Galaxy) | Native (R package) | Native (Shell script) | Native (Shell script) | Native (C# executable) | Web-based | Web-based (Galaxy) |
| | +/+ | −/+ | +/− | +/+ | +/+ | +/+ | +/+ | +/+ |
| | Hundreds of files (HTML, PDF, PNG) | Tab delimited text and HTML files | R dataframes (can be saved to text) | Several PDF files | 1 Tabular text file (optionally PDF graph) | Tabular text and image files | Various files (HTML, JSON, CSV, PDF, FASTA) | Tabular text files (2 or 3 per tool) |
| | IMGT Clonotype (AA) | V + CDR3 (AA); V + CDR3 (NT); V + J + CDR3 (NT); V + D + J + CDR3 (NT) | IMGT Clonotype (AA) – works also with “relaxed” definitions | CDR3 (AA); CDR3 (NT) | V + J + CDR3 (AA) | CDR3 (AA); CDR3 (NT); V + D + J (incl. allele); whole read (NT \| AA) | V(D)J junction (NT) – also supports 3rd S/W definitions | 5 definitions (see Table |
| | Conserved anchors; V/J functional; ORF | Productive reads | User-specified filtering supported | Pseudogenes; out-of-frame; stop codons; etc. | Conserved anchors; out-of-frame; stop-codons | IMGT “no result” | Conserved anchors | 11 read quality criteria |
| | Clonotype diversity and expression | Clonotype diversity | Clonotype diversity and expression | Clonotype diversity and expression | Clonotype diversity and expression | Clonotype diversity and expression | Clonotype diversity and expression | Clonotype diversity and expression |
| | V, D and J gene subgroups | V, D and J gene subgroups | V and J gene subgroups | V and J gene subgroups | N/A | V, D and J gene subgroups | N/A | V and J gene subgroups |
| | Public clonotypes and number of exclusive ones | N/A | Public clonotypes | N/A | Top 10 public clonotypes | Top N (user defined) public clonotypes | Public clonotypes | Public and exclusive clonotypes |
| | V, D and J gene subgroups | V, D and J gene subgroups | Entire V and J gene usage repertoire comparison | N/A | N/A | N/A | N/A | V and J gene subgroups |
| | Various diversity and expression histograms (e.g., per CDR3 length), etc. | V-D, V-J and D-J gene combination heatmaps | Advanced statistics for diversity and gene usage, visualizations, etc. | Receptor annotation, error correction, visualizations | Clonotype clustering (for ambiguity resolution) | V-J gene combination heatmaps, primer efficiency analysis | Receptor annotation, Interactive visualization | N/A |
Fig. 3 Bar charts of the J gene usages that are calculated by IRProfiler and IGGalaxy for a public BcR dataset
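A gene usage figure of this kind boils down to a per-gene share of the repertoire. The sketch below is one plausible way to compute it, assuming usage is the percentage of reads assigned to each J gene and that assignments look like `Homsap IGHJ4*02 F`; neither tool's exact calculation is given in the text above, and the function names and sample calls are hypothetical.

```python
from collections import Counter


def gene_of(assignment):
    """'Homsap IGHJ4*02 F' -> 'IGHJ4' (species prefix and allele dropped).

    If several alternative genes are listed, only the first one is used.
    """
    tokens = assignment.split()
    # The gene token is the first one carrying an allele marker ('*').
    name = next((t for t in tokens if "*" in t), tokens[0] if tokens else "")
    return name.split("*", 1)[0]


def gene_usage(assignments):
    """Percentage of reads assigned to each gene; empty assignments are skipped."""
    counts = Counter(gene_of(a) for a in assignments if a.strip())
    total = sum(counts.values())
    return {gene: 100.0 * n / total for gene, n in counts.items()}


# Invented J-gene assignments for four reads.
j_calls = [
    "Homsap IGHJ4*02 F",
    "Homsap IGHJ4*01 F",
    "Homsap IGHJ6*02 F",
    "Homsap IGHJ4*02 F",
]

usage = gene_usage(j_calls)
```

Counting clonotypes instead of reads would only require passing one assignment per clonotype rather than one per read.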