Yoji Nakamura, Tomoyoshi Komiyama, Motoki Furue, Takashi Gojobori, Yasuto Akiyama.
Abstract
BACKGROUND: Immunoglobulins (IG, or antibodies) and T-cell receptors (TR) are pivotal proteins in the immune system of higher organisms. In cancer immunotherapy, immune responses mediated by tumor-epitope-binding IG or TR play important roles in anticancer effects. Although public databases specific to immunological genes exist, their contents have not been linked to clinical studies. Therefore, we developed an integrated database of IG/TR data reported in cancer studies, the Cancer-related Immunological Gene Database (CIG-DB).

DESCRIPTION: This database is designed as a platform for exploring public human and murine IG/TR genes sequenced in cancer studies. A total of 38,308 annotation entries for IG/TR proteins were collected from GenBank/DDBJ/EMBL and the Protein Data Bank, and the 2,740 non-redundant MEDLINE references corresponding to them were appended. Next, we filtered the MEDLINE records for MeSH terms, titles, and abstracts containing cancer-related keywords. After a manual check, we classified the protein entries into two groups: 611 on cancer therapy (Group I) and 1,470 on hematological tumors (Group II). Thus, a total of 2,081 cancer-related IG and TR entries were tabulated. To classify future entries effectively, we developed a computational method based on text mining and canonical discriminant analysis of the words parsed from MeSH terms, titles, and abstracts. Leave-one-out cross-validation of the method showed high accuracy rates: 94.6% for IG references and 94.7% for TR references. We also collected 920 epitope sequences bound by IG/TR. The CIG-DB is equipped with search engines for amino acid sequences and MEDLINE references, sequence analysis tools, and a 3D viewer. This database is accessible without charge or registration at http://www.scchr-cigdb.jp/, and the search results are freely downloadable.
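The leave-one-out cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier on bag-of-words counts stands in for canonical discriminant analysis so that the example needs only the standard library, and the toy texts and labels are invented for illustration.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count word occurrences."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Average a list of word-count vectors into one mean vector."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict(train, query):
    """Assign the query to the group whose centroid is most similar
    (stand-in for canonical discriminant analysis)."""
    groups = {}
    for vec, label in train:
        groups.setdefault(label, []).append(vec)
    return max(groups, key=lambda g: cosine(query, centroid(groups[g])))

def leave_one_out_accuracy(samples):
    """Hold out each sample once, train on the rest, and return the hit rate."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += predict(train, vec) == label
    return hits / len(samples)

# Hypothetical toy references (not CIG-DB data).
TOY = [
    ("antibody therapy trial tumor antigen", "I"),
    ("therapeutic antibody tumor antigen binding", "I"),
    ("vaccine therapy antigen trial", "I"),
    ("lymphoma clonal rearrangement leukemia", "II"),
    ("leukemia clonal gene rearrangement", "II"),
    ("myeloma lymphoma clonal expression", "II"),
]
samples = [(bag_of_words(text), group) for text, group in TOY]
print(leave_one_out_accuracy(samples))
```

Each reference is scored by a model that never saw it during training, which is why the resulting rate is a fairer estimate than accuracy measured on the training set itself.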
Year: 2010 PMID: 20663186 PMCID: PMC2919518 DOI: 10.1186/1471-2105-11-398
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. CIG-DB statistics as of 1 October 2009
| Data content | IG | TR | Total |
|---|---|---|---|
| Screened from NCBI and PDB | 32240 | 6068 | 38308 |
| Cancer-related | 1605 | 476 | 2081 |
| Human | 879 | 318 | 1197 |
| Mouse | 726 | 158 | 884 |
| Group I<sup>a</sup> | 397 | 214 | 611 |
| Group II<sup>b</sup> | 1208 | 262 | 1470 |
| Chains | | | |
| Light | 791 | | |
| Heavy | 814 | | |
| Alpha | | 185 | |
| Beta | | 288 | |
| Gamma<sup>c</sup> | | 3 | |
| Epitope sequences | 772 | 148 | 920 |
<sup>a</sup> This group is involved in cancer therapy such as antibody medicines.
<sup>b</sup> This group is involved in expression in hematological tumors.
<sup>c</sup> A minor TCR allele.
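The counts in this table are internally consistent, which can be checked with a short script. The sketch below copies the figures from the table and assumes that the light and heavy chain counts belong to the IG column and the alpha, beta, and gamma counts to the TR column; the function names are illustrative.

```python
# Counts copied from the CIG-DB statistics table: (IG, TR, Total).
ROWS = {
    "Screened from NCBI and PDB": (32240, 6068, 38308),
    "Cancer-related": (1605, 476, 2081),
    "Human": (879, 318, 1197),
    "Mouse": (726, 158, 884),
    "Group I": (397, 214, 611),
    "Group II": (1208, 262, 1470),
    "Epitope sequences": (772, 148, 920),
}
# Assumed column assignment: light/heavy are IG chains; alpha/beta/gamma are TR chains.
IG_CHAINS = {"Light": 791, "Heavy": 814}
TR_CHAINS = {"Alpha": 185, "Beta": 288, "Gamma": 3}

def row_sums_ok(rows):
    """Check that IG + TR equals Total in every row."""
    return all(ig + tr == total for ig, tr, total in rows.values())

def partition_ok(rows, parts, whole="Cancer-related"):
    """Check that the named sub-rows add up to the cancer-related row, column by column."""
    return all(sum(rows[p][i] for p in parts) == rows[whole][i] for i in range(3))

print(row_sums_ok(ROWS))
print(partition_ok(ROWS, ["Human", "Mouse"]))
print(partition_ok(ROWS, ["Group I", "Group II"]))
print(sum(IG_CHAINS.values()) == ROWS["Cancer-related"][0])
print(sum(TR_CHAINS.values()) == ROWS["Cancer-related"][1])
```

The species split, the group split, and the chain counts each partition the same 2,081 cancer-related entries.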
Table 2. Classification of cancer-related references in CIG-DB
| Data content | IG | TR | Total |
|---|---|---|---|
| Collected from PubMed | 2054 | 686 | 2740 |
| Screened by keywords | 446 | 132 | 578 |
| Manually classified<sup>a</sup> | | | |
| Group I | 120 | 34 | |
| Group II | 139 | 37 | |
| (Group III) | 187 | 61 | |
<sup>a</sup> Groups I and II are the same as in Table 1. Group III comprises references that were wrongly screened.
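The keyword screening step that reduces the collected references to the screened subset can be sketched as follows. This is only an illustration: the source does not list the actual keywords, so `CANCER_KEYWORDS`, the record fields (`title`, `abstract`, `mesh`), and the toy records are all assumptions.

```python
import re

# Hypothetical keyword list; the source does not enumerate the terms actually used.
CANCER_KEYWORDS = ("cancer", "tumor", "tumour", "carcinoma",
                   "leukemia", "lymphoma", "myeloma", "neoplasm")

# Match any keyword at the start of a word, so plurals such as "Neoplasms" also hit.
PATTERN = re.compile(r"\b(" + "|".join(CANCER_KEYWORDS) + r")", re.IGNORECASE)

def is_cancer_related(record):
    """True if a keyword appears in the title, abstract, or MeSH terms."""
    text = " ".join([
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("mesh", [])),
    ])
    return PATTERN.search(text) is not None

def screen(records):
    """Keep only the records that pass the keyword filter."""
    return [r for r in records if is_cancer_related(r)]

# Hypothetical toy records (not CIG-DB data).
RECORDS = [
    {"id": "A", "title": "Antibody therapy of breast cancer", "mesh": ["Antibodies"]},
    {"id": "B", "title": "T-cell receptor repertoire in mice", "mesh": ["Receptors, Antigen, T-Cell"]},
    {"id": "C", "title": "Clonal rearrangement", "mesh": ["Neoplasms"]},
    {"id": "D", "title": "Antibody structure", "abstract": "Unrelated to tumors in this study."},
]
print([r["id"] for r in screen(RECORDS)])
```

Record D passes the filter even though its abstract says the work is unrelated to tumors. False positives of this kind are what the manual check assigns to Group III.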
Figure 1. Results of principal component analysis of cancer-related immunoglobulin (IG) and T-cell receptor (TR) references. PC1 and PC2 indicate the scores of the first and second principal components, respectively. (a) IG references and (b) TR references.
Table 3. Validation of reference classification by canonical discriminant analysis
| Protein | Group<sup>a</sup> | Predicted Group I | Predicted Group II | Predicted Group III | Total | Accuracy (%) |
|---|---|---|---|---|---|---|
| IG | Group I | 111 | 0 | 9 | 120 | 92.5 |
| | Group II | 0 | 128 | 11 | 139 | 92.1 |
| | Group III | 0 | 4 | 183 | 187 | 97.9 |
| | Total | | | | 446 | 94.6 |
| TR | Group I | 31 | 0 | 3 | 34 | 91.2 |
| | Group II | 0 | 33 | 4 | 37 | 89.2 |
| | Group III | 0 | 0 | 61 | 61 | 100.0 |
| | Total | | | | 132 | 94.7 |
<sup>a</sup> Groups I, II and III are the same as in Tables 1 and 2.
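The per-group accuracies in this table can be tied back to the overall rates quoted in the abstract (94.6% for IG and 94.7% for TR). The sketch below assumes that each accuracy is the share of a group's references assigned to the correct group, so the number of correct assignments can be recovered by rounding total × accuracy; the function names are illustrative.

```python
# (total references, per-group accuracy in %) copied from the validation table.
TABLE = {
    "IG": {"I": (120, 92.5), "II": (139, 92.1), "III": (187, 97.9)},
    "TR": {"I": (34, 91.2), "II": (37, 89.2), "III": (61, 100.0)},
}

def correct_count(total, accuracy_pct):
    """Recover the integer number of correctly classified references (assumed rounding)."""
    return round(total * accuracy_pct / 100)

def overall_accuracy(groups):
    """Pool the correct counts over all groups and return the rate in percent."""
    correct = sum(correct_count(t, a) for t, a in groups.values())
    total = sum(t for t, _ in groups.values())
    return round(100 * correct / total, 1)

for protein, groups in TABLE.items():
    print(protein, overall_accuracy(groups))
```

The overall rate is the pooled count of correct assignments divided by all references, so it is weighted toward the larger groups and is not the plain mean of the three per-group percentages.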
Figure 2. A search interface of the Cancer-related Immunological Gene Database (CIG-DB). (a) An example of search results where the word "Herceptin" is queried against references in the CIG-DB. (b) The subsequent sequence search result from (a). The check box that carries the reference search result over to the sequence search is indicated by a red arrow.
Figure 3. Sequence analysis tools of CIG-DB. (a) An example of BLAST query and result pages. (b) An example of CLUSTALW query and result pages. Both analyses were performed from the search result page in Figure 2b.
Figure 4. A 3D viewer of CIG-DB. Entries containing the two words "cancer" and "testis" were searched for in the "Epitope name" column of the TR-epitope table (EPI TCR table). A 3D structure of the top hit (PDB code = 2F53, the complex between the T-cell receptor and cancer/testis antigen 1B peptide) is shown. The epitope molecule is highlighted in green by checking a box in the viewer.