Jian Ye, Ning Ma, Thomas L Madden, James M Ostell.
Abstract
The variable domain of an immunoglobulin (IG) sequence is encoded by multiple genes, including the variable (V) gene, the diversity (D) gene and the joining (J) gene. Analysis of IG sequences typically requires identification of each gene, as well as a comparison of sequence variations in the context of defined regions. General purpose tools, such as the BLAST program, have only limited use for such tasks, as the rearranged nature of an IG sequence and the variable length of each gene require multiple rounds of BLAST searches for a single IG sequence. Additionally, manual assembly of different genes is difficult and error-prone. To address these issues and to facilitate other common tasks in analysing IG sequences, we have developed the sequence analysis tool IgBLAST (http://www.ncbi.nlm.nih.gov/igblast/). With this tool, users can view the matches to the germline V, D and J genes, details at rearrangement junctions, and the delineation of IG V domain framework regions and complementarity determining regions. IgBLAST can analyse both nucleotide and protein sequences and can process sequences in batches. Furthermore, IgBLAST allows searches against the germline gene databases and other sequence databases simultaneously, to minimize the chance of missing the best-matching germline V gene.
Year: 2013 PMID: 23671333 PMCID: PMC3692102 DOI: 10.1093/nar/gkt382
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. IgBLAST result page. This example used a human IG sequence (GenBank accession AY671579) to search against the default germline gene databases [IMGT human V genes (F + ORF + in-frame P), IMGT human D genes (F + ORF) and IMGT human J genes (F + ORF)]. The search used default values for all parameters. A red box was added to indicate the overlapping nucleotides TAC at the D–J junction. The search was performed on 25 February 2013.
Figure 2. Example IgBLAST result of searching against the germline gene databases and the NCBI nr database simultaneously. A mouse IG sequence (GenBank accession AF104468) was searched against the default mouse germline gene databases [IMGT mouse V genes (F + ORF + in-frame P), IMGT mouse D genes (F + ORF + in-frame P) and IMGT mouse J genes (F + ORF + in-frame P)]. The ‘organism’ field was set to mouse, and the nr database was selected for the ‘additional database’ field. Default values were used for all other parameters, except that the ‘number of alignments for additional database’ was set to 25. The light blue pop-up message box is a feature that displays the sequence title when the mouse pointer is moved over the sequence identifier (i.e. the accession AF021857 in the example). Only part of the result page is shown because of space limitations. A red box was added to indicate the hits from the nr database that have 100% matches to the query over the 290 bases. The search was performed on 25 February 2013.
Characteristics of the D and J genes identified in 100 random IG heavy chain sequences
| | IgBLAST | iHMMune-align |
|---|---|---|
| Average D gene mutations per sequence (average D gene length) | 0.04 (16.29) | 0.056 (17.43) |
| Average J gene mutations per sequence (average J gene length) | 0.08 (44.29) | 0.23 (44.52) |
a. One hundred IG heavy chain sequences were randomly selected from the NCBI nr database (available in Supplementary File S1). The selection was based on a 100% identity match to any heavy chain germline gene from the IMGT database, as determined by the BLAST program with default parameters; therefore, there was no previous knowledge about their D and J gene compositions. Tests were performed using web IgBLAST and stand-alone iHMMune-align (version iHMMune-align_26-11-2007.zip) with default search parameters. iHMMune-align did not return a D gene match for 11 sequences, which were excluded from the D gene analysis.
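The per-sequence averages in the table above can be put on a common per-nucleotide scale by dividing the average number of mutations by the average gene length. The sketch below does only that arithmetic on the four table entries. It is an illustration, not part of the published analysis: the names `TABLE` and `per_base_rate` are hypothetical, and a ratio of two averages is only an approximation of the average per-sequence rate.

```python
# Values transcribed from the table above:
# (average mutations per sequence, average gene length in nucleotides).
TABLE = {
    ("IgBLAST", "D"): (0.04, 16.29),
    ("iHMMune-align", "D"): (0.056, 17.43),
    ("IgBLAST", "J"): (0.08, 44.29),
    ("iHMMune-align", "J"): (0.23, 44.52),
}


def per_base_rate(avg_mutations: float, avg_length: float) -> float:
    """Approximate mutations per nucleotide as a ratio of the two averages."""
    if avg_length <= 0:
        raise ValueError("average gene length must be positive")
    return avg_mutations / avg_length


# Hypothetical helper dictionary: one approximate rate per (tool, gene) pair.
rates = {key: per_base_rate(*values) for key, values in TABLE.items()}

for (tool, gene), rate in rates.items():
    print(f"{tool:<14} {gene} gene: {rate:.2%} mutations per nucleotide")
```

On this scale, a lower value means that the identified gene segments differ less from the germline genes they were assigned to.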
Number of sequences with correctly identified V, D and J genes or rearrangements in clonally related sequence data sets
| | IgBLAST | iHMMune-align |
|---|---|---|
| Data set 1 (57 sequences) | | |
| IGHV4-34*01 | 52 | 51 |
| IGHD7-27*01 | 54 | 55 |
| IGHJ3*02 | 52 | 52 |
| IGHV4-34*01-IGHD7-27*01-IGHJ3*02 rearrangement | 47 | 47 |
| Data set 2 (101 sequences) | | |
| IGHV4-34*01 | 96 (96) | 95 |
| IGHD6-6*01 | 87 (48) | 86 |
| IGHJ6*02 | 101 (101) | 97 |
| IGHV4-34*01-IGHD6-6*01-IGHJ6*02 rearrangement | 82 | 80 |
a. The clonally related sequences were obtained from Wilson and co-workers (15). Tests were performed using web IgBLAST and stand-alone iHMMune-align (version iHMMune-align_26-11-2007.zip) with default search parameters, except that the mismatch penalty for the D gene was set to −1 (instead of the default −4) for the IgBLAST test with data set 2. The identified germline genes are the top hits (or one of the top equivalent hits that have identical match scores, as well as identical per cent identity) from the IgBLAST or iHMMune-align searches. iHMMune-align did not return a D gene match for 1 and 6 sequences in data set 1 and data set 2, respectively. iHMMune-align also did not return any germline gene matches for one sequence in both data sets because of the presence of deletions in the V gene.
b. Results from IgBLAST using the default mismatch penalty for D genes.
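The counts in the table above are easier to compare across the two data sets when expressed as fractions of the data set size. The sketch below computes those fractions for the full V-D-J rearrangement rows, and for the IGHD6-6*01 row of data set 2 under both mismatch penalties. The names `REARRANGEMENT_COUNTS`, `D_GENE_SET2` and `fraction_correct` are hypothetical, and the code assumes that the parenthesized count (48) is the footnote b result obtained with the default penalty of −4.

```python
# Counts transcribed from the table above (correctly identified rearrangements).
REARRANGEMENT_COUNTS = {
    "Data set 1": {"total": 57, "IgBLAST": 47, "iHMMune-align": 47},
    "Data set 2": {"total": 101, "IgBLAST": 82, "iHMMune-align": 80},
}

# IgBLAST IGHD6-6*01 counts for data set 2. Assumption: the parenthesized
# table value is the run with the default D gene mismatch penalty.
D_GENE_SET2 = {"penalty -1": 87, "penalty -4 (default)": 48}


def fraction_correct(correct: int, total: int) -> float:
    """Return the fraction of sequences with a correct assignment."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("counts must satisfy 0 <= correct <= total, total > 0")
    return correct / total


fractions = {
    name: {
        tool: fraction_correct(row[tool], row["total"])
        for tool in ("IgBLAST", "iHMMune-align")
    }
    for name, row in REARRANGEMENT_COUNTS.items()
}

d_gene_fractions = {
    setting: fraction_correct(count, REARRANGEMENT_COUNTS["Data set 2"]["total"])
    for setting, count in D_GENE_SET2.items()
}

for name, row in fractions.items():
    for tool, value in row.items():
        print(f"{name}, {tool}: {value:.1%} rearrangements correct")
for setting, value in d_gene_fractions.items():
    print(f"Data set 2, IgBLAST D gene, {setting}: {value:.1%}")
```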