Hong-Woo Chun, Yoshimasa Tsuruoka, Jin-Dong Kim, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki, Jun'ichi Tsujii.
Abstract
BACKGROUND: Automatic recognition of relations between a specific disease term and its relevant gene or protein terms is an important task in bioinformatics. To make the results of this approach more useful, we identified prostate cancer and gene terms with the ID tags of public biomedical databases. Moreover, because genetics experts will use our results, we classified the relations into six topics that can be used to analyze the types of prostate cancers, genes, and their relations.
Year: 2006 PMID: 17134477 PMCID: PMC1764448 DOI: 10.1186/1471-2105-7-S3-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Selected TUIs.
| TUI | Semantic type | TUI | Semantic type |
| T019 | Congenital abnormality | T048 | Mental or behavioral dysfunction |
| T020 | Acquired abnormality | T049 | Cell or molecular dysfunction |
| T033 | Finding | T050 | Experimental model of disease |
| T037 | Injury or poisoning | T184 | Sign or symptom |
| T046 | Pathologic function | T190 | Anatomical abnormality |
| T047 | Disease or syndrome | T191 | Neoplastic process |
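A table like this is typically used as a lookup when filtering concepts by semantic type. The following is a minimal sketch of such a filter, assuming the selected TUIs serve to keep only disease-related concepts; that purpose, the `has_selected_tui` helper, and the toy concept records are illustrative assumptions, not details taken from the paper.

```python
# TUIs and semantic type names copied from the "Selected TUIs" table.
SELECTED_TUIS = {
    "T019": "Congenital abnormality",
    "T020": "Acquired abnormality",
    "T033": "Finding",
    "T037": "Injury or poisoning",
    "T046": "Pathologic function",
    "T047": "Disease or syndrome",
    "T048": "Mental or behavioral dysfunction",
    "T049": "Cell or molecular dysfunction",
    "T050": "Experimental model of disease",
    "T184": "Sign or symptom",
    "T190": "Anatomical abnormality",
    "T191": "Neoplastic process",
}


def has_selected_tui(concept_tuis):
    """Return True if any of the concept's TUIs is in the selected set."""
    return any(tui in SELECTED_TUIS for tui in concept_tuis)


# Hypothetical concept records; the names and TUI assignments are made up
# for illustration only.
concepts = [
    {"name": "example neoplasm term", "tuis": ["T191"]},
    {"name": "example protein term", "tuis": ["T116"]},
]

kept = [c["name"] for c in concepts if has_selected_tui(c["tuis"])]
print(kept)  # ['example neoplasm term']
```

Keeping the table as a dictionary rather than a bare set makes it easy to report the semantic type name alongside each retained concept.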
Performance of NER.
| Target Entities | Features | Precision (%) | Relative recall (%) | ||||||
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | |||
| GENE | 84.4 | 100.0 | |||||||
| 93.5 | 95.4 | ||||||||
| 95.0 | 97.6 | ||||||||
| 93.1 | 93.3 | ||||||||
| 84.4 | 100.0 | ||||||||
| 84.4 | 100.0 | ||||||||
| 84.4 | 100.0 | ||||||||
| 84.4 | 100.0 | ||||||||
| 97.0 | |||||||||
| PROSTATE CANCER | 99.2 | 100.0 | |||||||
| 99.3 | 99.8 | ||||||||
| 99.3 | 100.0 | ||||||||
| 99.3 | 100.0 | ||||||||
| 99.2 | 100.0 | ||||||||
| 99.2 | 100.0 | ||||||||
| 99.2 | 100.0 | ||||||||
| 99.2 | 100.0 | ||||||||
| 99.3 | 100.0 | ||||||||
Note: 1) Bag of words (all words in co-occurrence), 2) candidate gene and prostate cancer names, 3) unigram words, 4) presence of capital letters in candidate term, 5) presence of numerical digits in candidate term, 6) presence of Greek letters in candidate term, 7) presence of affixes of candidate term.
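Features 4 to 7 in the note describe surface properties of a candidate term. A minimal sketch of how such features could be computed is shown below; the function name `orthographic_features`, the affix length of three characters, and the way Greek letters are detected (Unicode Greek characters or a short list of spelled-out names) are all assumptions made for illustration, since the note does not specify them.

```python
import re

# Assumption: Greek letters may appear as Unicode characters or spelled out.
# The list of names is a hypothetical, deliberately short sample.
GREEK_NAMES = ("alpha", "beta", "gamma", "delta", "epsilon",
               "kappa", "lambda", "sigma", "omega")
GREEK_CHARS = re.compile(r"[\u0370-\u03FF]")


def orthographic_features(term, affix_len=3):
    """Compute surface features of a candidate term (features 4 to 7)."""
    lowered = term.lower()
    has_greek = bool(GREEK_CHARS.search(term)) or any(
        name in lowered for name in GREEK_NAMES
    )
    return {
        "has_capital": any(ch.isupper() for ch in term),  # feature 4
        "has_digit": any(ch.isdigit() for ch in term),    # feature 5
        "has_greek": has_greek,                           # feature 6
        "prefix": lowered[:affix_len],                    # feature 7 (affixes)
        "suffix": lowered[-affix_len:],
    }


print(orthographic_features("TGF-beta1"))
```

In a maximum-entropy or similar classifier, each key-value pair would typically be turned into a binary indicator feature; that encoding step is omitted here.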
Performance of relation recognition.
| Topic | Features (order of entities, bag of words, candidate entities, unigram, bigram) | Precision | Relative recall |
| Any relation | | 0.907 | 0.958 |
| | | 0.908 | 0.957 |
| | | 0.896 | 0.975 |
| | | 0.902 | 0.957 |
| | | 0.903 | 0.966 |
| | | 0.909 | 0.961 |
| Study description | | 0.669 | 0.567 |
| | | 0.650 | 0.568 |
| | | 0.597 | 0.295 |
| | | 0.662 | 0.551 |
| | | 0.659 | 0.552 |
| | | 0.651 | 0.550 |
| Genetic variation | | 0.793 | 0.691 |
| | | 0.781 | 0.665 |
| | | 0.737 | 0.464 |
| | | 0.791 | 0.669 |
| | | 0.796 | 0.687 |
| | | 0.794 | 0.680 |
| Gene expression | | 0.734 | 0.614 |
| | | 0.720 | 0.620 |
| | | 0.762 | 0.466 |
| | | 0.733 | 0.612 |
| | | 0.743 | 0.613 |
| | | 0.735 | 0.612 |
| Epigenetics | | 0.854 | 0.660 |
| | | 0.833 | 0.660 |
| | | 0.905 | 0.358 |
| | | 0.857 | 0.679 |
| | | 0.854 | 0.660 |
| | | 0.854 | 0.660 |
| Pharmacology | | 0.657 | 0.431 |
| | | 0.626 | 0.442 |
| | | 0.642 | 0.264 |
| | | 0.643 | 0.419 |
| | | 0.637 | 0.419 |
| | | 0.647 | 0.433 |
| Clinical marker | | 0.774 | 0.723 |
| | | 0.767 | 0.728 |
| | | 0.772 | 0.730 |
| | | 0.768 | 0.678 |
| | | 0.771 | 0.727 |
| | | 0.772 | 0.719 |
Experimental results.
| Topic-classified relations | (%) | Baseline w/o NER | Baseline with NE filter: Automatic | Baseline with NE filter: Manual | RR w/o NER | RR with NER (feature): Automatic | RR with NER (feature): Manual | RR with NER (filter): Automatic | RR with NER (filter): Manual |
| Any relation | P | 81.1 | 91.8 | 96.7 | 90.9 | 91.5 | 97.0 | 97.1 | |
| (3196) | R | 100.0 | 97.0 | 100.0 | 96.1 | 96.1 | 99.6 | 96.5 | 99.6 |
| Study description | P | 26.7 | 30.2 | 31.8 | 66.9 | 67.5 | 70.8 | 70.6 | |
| (1050) | R | 100.0 | 97.2 | 100.0 | 56.7 | 57.6 | 63.0 | 62.9 | 62.9 |
| Genetic variation | P | 7.1 | 8.1 | 8.4 | 79.3 | 78.6 | 81.9 | 83.1 | |
| (278) | R | 100.0 | 98.9 | 100.0 | 69.1 | 67.3 | 70.1 | 73.6 | 73.6 |
| Gene expression | P | 27.1 | 30.8 | 32.3 | 73.4 | 73.0 | 76.2 | 76.8 | |
| (1067) | R | 100.0 | 97.4 | 100.0 | 61.4 | 61.4 | 64.5 | 63.5 | 64.9 |
| Epigenetics | P | 1.3 | 1.6 | 1.6 | 85.7 | 86.0 | 85.7 | 88.1 | |
| (53) | R | 100.0 | 100.0 | 100.0 | 67.9 | 69.8 | 67.9 | 69.8 | 69.8 |
| Pharmacology | P | 9.1 | 10.3 | 10.9 | 65.7 | 66.7 | 63.7 | 67.2 | |
| (360) | R | 100.0 | 96.1 | 100.0 | 43.1 | 44.7 | 45.0 | 44.4 | 45.3 |
| Clinical marker | P | 31.5 | 35.9 | 37.5 | 77.4 | 78.2 | 76.6 | 78.3 | |
| (1240) | R | 100.0 | 97.8 | 100.0 | 72.3 | 73.2 | 74.0 | 73.6 | 75.4 |
Notes: Numbers in parentheses in the first column: frequency of correct predictions; NER: ML-based NER; RR: ML-based topic-classified relation recognition; Automatic: experiments using ML-based NER results; Manual: experiments using human-generated NER annotation results; P: precision; R: relative recall.
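The P and R values can be read as set overlaps between predicted and correct relations. The sketch below assumes that relative recall is measured against the correct relations found among the baseline's candidates, which is consistent with the 100.0 recall reported for the baseline without NER but is not stated explicitly in the notes. The function name and the toy relation IDs are hypothetical.

```python
def precision_and_relative_recall(predicted, reference):
    """Return (precision, relative recall) in percent.

    predicted: relation IDs output by a system.
    reference: correct relation IDs in the reference set (assumed here to be
               the correct relations among the baseline's candidates).
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = 100.0 * true_positives / len(predicted) if predicted else 0.0
    relative_recall = 100.0 * true_positives / len(reference) if reference else 0.0
    return precision, relative_recall


# Hypothetical toy data, not taken from the paper.
reference = {"r1", "r2", "r3", "r4"}
predicted = {"r1", "r2", "r3", "x1", "x2"}

p, r = precision_and_relative_recall(predicted, reference)
print(f"P = {p:.1f}%, R = {r:.1f}%")  # P = 60.0%, R = 75.0%
```

Under this reading, a filter that only removes candidates can raise precision but can never raise relative recall above the baseline's 100.0.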