Sheng-Jou Hung, Yi-Lin Chen, Chia-Hung Chu, Chuan-Chun Lee, Wan-Li Chen, Ya-Lan Lin, Ming-Ching Lin, Chung-Liang Ho, Tsunglin Liu.
Abstract
BACKGROUND: T cells and B cells are essential to adaptive immunity: they express T cell receptors and immunoglobulins, respectively, to recognize antigens. To recognize a wide variety of antigens, a highly diverse repertoire of receptors is generated via complex recombination of the receptor genes. Accordingly, frequencies of the recombination events have been shown to predict immune diseases and provide insights into the development of immunity. The field has been further boosted by high-throughput sequencing, and several computational tools have been released to analyze the recombined sequences. However, all current tools assume regular recombination of the receptor genes, which is not always valid in data prepared using a RACE approach. Compared to the traditional multiplex PCR approach, RACE is free of primer bias and can therefore provide accurate estimates of recombination frequencies. To handle the non-regular recombination events, a new computational program is needed.
Keywords: Immunoglobulin; Next-generation sequencing; RACE; Sequence alignment; T-cell receptor; VDJ recombination
Year: 2016 PMID: 27782801 PMCID: PMC5080739 DOI: 10.1186/s12859-016-1304-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
RACE data used in this study
| Species; gene; sequencer | Read number | Mean length (bp) | SRA accession |
|---|---|---|---|
| Human; TRβ; 454 | 16,545 | 157 | Our data |
| Human; TRβ; Illumina | 1,522,640 | 209 | SRR1544031 |
| Mouse; IgH; 454 | 106,189 | 322 | SRR934668-79; SRR934686-91 |
Fig. 1 TRIg pipeline. TRIg starts by aligning TR or Ig reads to the corresponding reference using nucmer. Because many initial alignments overlap and are therefore not all authentic, optimal sets of alignments are obtained using a heuristic iterative approach. Some of the optimal alignments are further filtered based on the VDJ annotations. If more than one V annotation survives the filtering, TRIg attempts to extend the alignments by relaxing the breaklen parameter of nucmer, followed by re-identification of authentic alignments. Because the resulting alignments may still overlap by a few bases, the overlapping bases are trimmed (in red). Finally, VDJ annotations are collected and the CDR3 segments are extracted. Please see the main text for detailed descriptions.
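The overlap-trimming step in the caption can be illustrated with a minimal Python sketch. It is not TRIg's implementation: the `Alignment` record, the 1-based inclusive read coordinates, and the choice to remove the overlapping bases from the start of the downstream alignment are all assumptions made for illustration, since the caption does not say which side is trimmed.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Alignment:
    # Hypothetical record; read coordinates are 1-based and inclusive.
    segment: str      # e.g. "V", "D", or "J"
    read_start: int
    read_end: int


def trim_overlaps(alignments: List[Alignment]) -> List[Alignment]:
    """Trim overlapping bases so consecutive alignments share no read positions.

    Assumption: the overlap is removed from the start of the downstream
    alignment; the actual tool may trim differently.
    """
    ordered = sorted(alignments, key=lambda a: (a.read_start, a.read_end))
    trimmed: List[Alignment] = []
    for aln in ordered:
        if trimmed and aln.read_start <= trimmed[-1].read_end:
            # Move the start just past the end of the previous alignment.
            aln = replace(aln, read_start=trimmed[-1].read_end + 1)
        if aln.read_start <= aln.read_end:
            # Keep only alignments that still cover at least one base.
            trimmed.append(aln)
    return trimmed


example = [
    Alignment("V", 1, 100),
    Alignment("D", 98, 110),
    Alignment("J", 109, 150),
]
result = trim_overlaps(example)
```

A full implementation would also shift the matching reference coordinates by the same number of bases; the sketch tracks read coordinates only.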
Number of VJ annotations by four programs
| Data | Decombinator | IgBLAST | IMGT | TRIg (including non-VJ annotations) |
|---|---|---|---|---|
| Our data | 4807 | 16,487 | 5711 | 12,260 (16,538) |
| SRR1544031 | 1,190,792 | 1,521,612 | 1,232,628 | 1,456,541 (1,517,758) |
| SRR9346(68-79;86-91) | N.A. | 105,850 | 64,819 | 87,286 (106,111) |
Consistency of VJ annotations to our data
| TRIg vs. | Identical | Extra | Missing | Distinct | Non-VJ |
|---|---|---|---|---|---|
| Decombinator | 4394 | 1 | 257 | 155 | 0 |
| IgBLAST | 5733 | 2028 | 30 | 4411 | 4278 |
| IMGT | 5466 | 131 | 68 | 45 | 1 |
Fig. 2 Comparison of immune sequence alignments by different tools. Differences in length (x-axis) and identity (y-axis) of non-identical alignments by IgBLAST and TRIg (left column) and IMGT and TRIg (right column) for (a) our 454 data of human TRβ gene, (b) public Illumina data of human TRβ gene, and (c) public 454 data of mouse IgH gene. The differences were obtained by subtracting IgBLAST’s or IMGT’s values from TRIg’s values. Thus, dots in the first quadrant clearly indicate better alignments by TRIg. The validity of alignments in the second and fourth quadrants is less clear. However, for most dots in these two quadrants, TRIg’s annotations are more convincing because TRIg’s alignments are much longer or their identities much higher. Note that dots may fall on top of each other, which explains why the first quadrant of (b) appears to contain fewer dots than indicated.
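The subtraction and quadrant reading described in the caption can be sketched as follows. The function name, argument names, and the handling of points that lie on an axis are assumptions for illustration; the caption only specifies that the other tool's values are subtracted from TRIg's.

```python
from typing import Tuple


def classify_difference(
    trig_length: float,
    trig_identity: float,
    other_length: float,
    other_identity: float,
) -> Tuple[float, float, str]:
    """Return (length difference, identity difference, quadrant label).

    Differences are TRIg's value minus the other tool's value, so the
    first quadrant means TRIg's alignment is both longer and more identical.
    """
    d_len = trig_length - other_length        # x-axis
    d_id = trig_identity - other_identity     # y-axis
    if d_len == 0 or d_id == 0:
        quadrant = "axis"   # assumption: points on an axis get their own label
    elif d_len > 0 and d_id > 0:
        quadrant = "Q1"     # longer and higher identity
    elif d_len < 0 and d_id > 0:
        quadrant = "Q2"     # shorter but higher identity
    elif d_len < 0 and d_id < 0:
        quadrant = "Q3"     # shorter and lower identity
    else:
        quadrant = "Q4"     # longer but lower identity
    return d_len, d_id, quadrant
```

For instance, `classify_difference(120, 99.0, 100, 97.0)` gives a positive difference on both axes and therefore the label `"Q1"`.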
Consistency of VJ annotations to the SRR1544031 data
| TRIg vs. | Identical | Extra | Missing | Distinct | Non-VJ |
|---|---|---|---|---|---|
| Decombinator | 1,111,024 | 66 | 42,448 | 37,254 | 0 |
| IgBLAST | 1,226,408 | 42,952 | 343 | 186,698 | 60,850 |
| IMGT | 1,191,269 | 38,376 | 796 | 1204 | 645 |
Consistency of VJ annotations to the SRR9346(68-79;86-91) data
| TRIg vs. | Identical | Extra | Missing | Distinct | Non-VJ |
|---|---|---|---|---|---|
| IgBLAST | 46,496 | 6601 | 5799 | 28,144 | 18,740 |
| IMGT | 44,684 | 2736 | 6328 | 11,057 | 14 |
Note that for 8.9 % of the reads, IgBLAST or IMGT gave an annotation not present in the reference set of TRIg, resulting in non-identical annotations
Run time of four programs on the three data sets
| Run time | Decombinator | IgBLAST | IMGT | TRIg |
|---|---|---|---|---|
| Our data | 0 m 8 s | 4 m 43 s | 84 m | 0 m 15 s |
| SRR1544031 | 11 m 41 s | 653 m 32 s | N.A. (a) | 42 m 10 s |
| SRR9346(68-79;86-91) | N.A. | 135 m 01 s | 155 m | 25 m 17 s |
IMGT jobs were run on the webserver. The remaining tools were run using only one processor (800 MHz). (a) Not available because the data were split into 11 files and the total run time did not reflect the true run time
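To compare the run times in the table numerically, the entries can be converted to seconds. The sketch below is a convenience for reading the table, not part of any of the four programs; the helper names `to_seconds` and `relative_to` are hypothetical. IMGT timings come from a webserver and are not directly comparable with the single-processor runs.

```python
import re

# Matches entries such as "4 m 43 s", "135 m 01 s", or "84 m".
_RUNTIME = re.compile(r"^\s*(\d+)\s*m(?:\s*(\d+)\s*s)?\s*$")


def to_seconds(entry: str) -> int:
    """Convert a run-time table entry to seconds."""
    match = _RUNTIME.match(entry)
    if match is None:
        raise ValueError(f"unrecognized run time: {entry!r}")
    minutes = int(match.group(1))
    seconds = int(match.group(2) or 0)   # seconds are optional, e.g. "84 m"
    return minutes * 60 + seconds


def relative_to(entry: str, baseline: str) -> float:
    """Return how many times longer `entry` is than `baseline`."""
    return to_seconds(entry) / to_seconds(baseline)


# Values copied from the table: IgBLAST vs. TRIg on SRR1544031.
ratio = relative_to("653 m 32 s", "42 m 10 s")
```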