Shengwen Peng, Ronghui You, Hongning Wang, Chengxiang Zhai, Hiroshi Mamitsuka, Shanfeng Zhu.
Abstract
MOTIVATION: Medical Subject Headings (MeSH) indexing, the task of assigning a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and the MeSH side. On the citation side, all existing methods, including the Medical Text Indexer (MTI) by the National Library of Medicine and the state-of-the-art method, MeSHLabeler, represent text as a bag of words, which cannot capture semantic and context-dependent information well.
Year: 2016 PMID: 27307646 PMCID: PMC4908368 DOI: 10.1093/bioinformatics/btw294
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Typical example to show how well D2V-TFIDF works
| PMID | Title | MH (assigned headings) | BOW | DSR-BOW | DSR | MH (similarity) |
|---|---|---|---|---|---|---|
| 25236620 | Cytopathology fellowship milestones. | Accreditation; Clinical Competence; Cytodiagnosis; Education, Medical, Graduate; Fellowships and Scholarships; Humans; Pathology; United States. | – | – | – | – |
| 23416813 | Comparison of computed tomographic and cytopathological findings in the evaluation of adult orbital mass. | Adult; Aged; Aged, 80 and over; Biopsy, Fine-Needle; Eye Neoplasms; | 0.3620 | 0.2208 | 0.0476 | |
| 23597252 | Fellowship training in pediatric pathology: a guide for program directors. | 0.2315 | 0.3930 | 0.4444 | ||
| 24576024 | The pathology milestones and the next accreditation system. | 0.43813 | 0.5489 |
BOW is ‘bag of words’, DSR is ‘deep semantic representation’ (equivalent to ‘document to vector’ (D2V)) and MH is ‘MeSH main heading’. The last four columns show the similarity scores against PMID:25236620.
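To make the similarity columns concrete, the following is a minimal sketch of how a similarity score between two citations could be computed from a bag-of-words representation and from sets of MeSH main headings. It assumes cosine similarity over raw term counts and over binary heading indicators; the excerpt above does not state which similarity measure or term weighting the authors used, so the sketch is not expected to reproduce the tabulated scores. The function names, the tokenisation and the second heading set are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bow_vector(text):
    """Term counts after a simple, assumed tokenisation
    (whitespace split, punctuation stripped, lowercased)."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


def mh_vector(headings):
    """Binary indicator vector over a set of MeSH main headings."""
    return {h: 1.0 for h in headings}


# Two titles taken from the table above.
query = bow_vector("Cytopathology fellowship milestones.")
other = bow_vector("The pathology milestones and the next accreditation system.")
bow_sim = cosine(query, other)

# Headings of PMID 25236620 from the table, compared with a
# hypothetical heading set chosen only for illustration.
mh_query = mh_vector([
    "Accreditation", "Clinical Competence", "Cytodiagnosis",
    "Education, Medical, Graduate", "Fellowships and Scholarships",
    "Humans", "Pathology", "United States",
])
mh_other = mh_vector(["Humans", "Pathology", "Accreditation"])
mh_sim = cosine(mh_query, mh_other)

print(f"BOW similarity: {bow_sim:.4f}")
print(f"MH similarity:  {mh_sim:.4f}")
```

With raw counts the two titles share only one term, which illustrates why a purely lexical representation can score related citations as dissimilar.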
Fig. 1. The workflow of (a) generating D2V-TFIDF and (b) DeepMeSH
Performance comparison of KNNs with different feature representations
| Method | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| KNNTFIDF | 0.4369 | 0.4455 | 0.4412 | 0.4362 | 0.4547 | 0.4317 | |||
| KNNW2V | 0.4133 | 0.4215 | 0.4174 | 0.4083 | 0.4216 | 0.4027 | 0.1438 | 0.1230 | 0.1326 |
| KNNWW2V | 0.2332 | 0.2126 | 0.2225 | ||||||
| KNNW2P | 0.4027 | 0.4106 | 0.4066 | 0.3968 | 0.4098 | 0.3914 | 0.1201 | 0.1018 | 0.1102 |
| KNNWW2P | 0.4392 | 0.4478 | 0.4435 | 0.4351 | 0.4515 | 0.4300 | 0.1970 | 0.1786 | 0.1873 |
| KNND2V | 0.4271 | 0.4355 | 0.4313 | 0.4207 | 0.4361 | 0.4156 | 0.1726 | 0.1450 | 0.1576 |
| KNNW2V-TFIDF | 0.4526 | 0.4615 | 0.4570 | 0.4516 | 0.4710 | 0.4472 | 0.3359 | 0.3027 | 0.3185 |
| KNNWW2V-TFIDF | 0.4602 | 0.4693 | 0.4647 | 0.4593 | 0.4793 | 0.4549 | 0.3412 | 0.3091 | 0.3244 |
| KNNW2P-TFIDF | 0.4750 | 0.4844 | 0.4797 | 0.4752 | 0.4951 | 0.4702 | 0.3371 | 0.3054 | 0.3205 |
| KNNWW2P-TFIDF | 0.4768 | 0.4862 | 0.4814 | 0.4764 | 0.4963 | 0.4714 | 0.3398 | 0.3095 | 0.3239 |
| KNND2V-TFIDF | |||||||||
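The column abbreviations in these tables appear to follow the usual convention for multi-label evaluation: Mi for micro-averaged, EB for example-based and Ma for macro-averaged (per-heading) measures, each with precision (P), recall (R) and F-measure (F). The sketch below shows one way to compute all nine values from gold and predicted heading sets. It is an illustration under stated assumptions, not the authors' evaluation code: MiF and MaF are taken as the harmonic mean of the corresponding precision and recall, and EBF as the mean of per-citation F-measures, which is consistent with the complete rows in the tables; treating an undefined per-heading precision or recall as 0 is an assumption, and the function names and toy data are hypothetical.

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def evaluate(gold, pred):
    """gold, pred: equal-length lists of label sets, one set per citation."""
    n = len(gold)
    if n == 0 or n != len(pred):
        raise ValueError("gold and pred must be non-empty and aligned")

    # Micro-averaged: pool true positives over all citations and labels.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    mip = tp / n_pred if n_pred else 0.0
    mir = tp / n_gold if n_gold else 0.0

    # Example-based: score each citation, then average over citations.
    ebp = ebr = ebf = 0.0
    for g, p in zip(gold, pred):
        hit = len(g & p)
        p_i = hit / len(p) if p else 0.0
        r_i = hit / len(g) if g else 0.0
        ebp += p_i
        ebr += r_i
        ebf += f_measure(p_i, r_i)

    # Macro-averaged: score each label, then average over labels.
    labels = set().union(*gold, *pred)
    map_ = mar = 0.0
    for label in labels:
        tp_l = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        pred_l = sum(1 for p in pred if label in p)
        gold_l = sum(1 for g in gold if label in g)
        map_ += tp_l / pred_l if pred_l else 0.0  # assumed convention
        mar += tp_l / gold_l if gold_l else 0.0   # assumed convention
    n_labels = len(labels) or 1
    map_ /= n_labels
    mar /= n_labels

    return {
        "MiP": mip, "MiR": mir, "MiF": f_measure(mip, mir),
        "EBP": ebp / n, "EBR": ebr / n, "EBF": ebf / n,
        "MaP": map_, "MaR": mar, "MaF": f_measure(map_, mar),
    }


# Toy data: two citations with gold and predicted headings.
gold = [{"Humans", "Pathology"}, {"Adult", "Humans"}]
pred = [{"Humans", "Accreditation"}, {"Adult", "Humans"}]
scores = evaluate(gold, pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Macro-averaged scores weight every heading equally, so rare headings that are never predicted correctly pull them well below the micro-averaged and example-based scores, as in the rows above.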
Comparison of binary relevance approaches with different features
| Method | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| BC D2V | 0.4395 | 0.4482 | 0.4438 | 0.4339 | 0.4519 | 0.4294 | 0.1627 | 0.1706 | 0.1666 |
| BC | 0.5584 | 0.5694 | 0.5638 | 0.5575 | 0.5892 | 0.5556 | 0.4662 | ||
| BCnorm | 0.5716 | 0.5829 | 0.5772 | 0.5704 | 0.5991 | 0.5667 | 0.4463 | 0.4402 | 0.4432 |
| BCD2V-TFIDF | 0.5974 | 0.6092 | 0.6033 | 0.5983 | 0.6280 | 0.5943 | 0.4741 | 0.4633 | 0.4686 |
Performance improvement of MeSHRanker by incorporating deep semantic representation
| Method | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| MTIDEF | 0.5753 | 0.5526 | 0.5637 | 0.5838 | 0.5737 | 0.5566 | 0.4939 | 0.5140 | 0.5037 |
| BCD2V-TFIDF | 0.5974 | 0.6092 | 0.6033 | 0.5983 | 0.6280 | 0.5943 | 0.4741 | 0.4633 | 0.4686 |
| MeSHRanker | 0.6067 | 0.6187 | 0.6126 | 0.6091 | 0.6400 | 0.6053 | 0.5249 | 0.5400 | 0.5323 |
| + Step 2 of DeepMeSH | 0.6156 | 0.6278 | 0.6216 | 0.6180 | 0.6492 | 0.6141 | 0.5361 | 0.5476 | 0.5418 |
| + Steps 1 and 2 of DeepMeSH |
Performance comparison of DeepMeSH with MTIDEF and MeSHLabeler (P-values are shown in the parentheses)
| Method | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| MTIDEF | 0.5753 | 0.5526 | 0.5637 | 0.5838 | 0.5737 | 0.5566 | 0.4939 | 0.5140 | 0.5037 |
| | (2.67E-85) | (1.28E-75) | (1.12E-86) | (1.16E-81) | (1.76E-73) | (8.17E-85) | (3.89E-69) | (2.19E-41) | (3.39E-63) |
| MeSHLabeler | 0.6457 | 0.5995 | 0.6218 | 0.6480 | 0.6200 | 0.6145 | 0.5304 | 0.5216 | 0.5259 |
| | (9.87E-54) | (1.43E-52) | (4.67E-60) | (3.24E-54) | (1.97E-50) | (1.49E-59) | (3.92E-48) | (5.25E-31) | (4.18E-45) |
| DeepMeSH |
Fig. 2. Comparison with different representation
Fig. 3. Performance improvement examples
Improvement of DeepMeSH on EBF over MeSHLabeler for languages
| Language | Occurrences | MeSHLabeler | DeepMeSH | P-value |
|---|---|---|---|---|
| | 5021 | 0.6143 | 0.6251 | 2.58E-31 |
| | 6000 | 0.6139 | 0.6248 | 2.35E-37 |
The P-values in the last column are by paired t-test with Bonferroni multiple test correction.
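The test named above can be sketched as follows, using only the Python standard library. The paired t statistic is computed exactly from per-citation score differences, but the two-sided P-value uses a normal approximation to the Student t distribution, a simplification that is reasonable only for large samples such as the thousands of citations reported here; an exact P-value would use the t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`). The function names, the toy scores and the number of comparisons `m` are hypothetical; the excerpt does not state how many tests were corrected for.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of the paired differences b[i] - a[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two aligned samples with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0.0:
        raise ValueError("all paired differences are identical")
    return mean_d / (sd_d / math.sqrt(len(diffs)))


def approx_two_sided_p(t):
    """Two-sided P-value under a normal approximation (large n only).

    erfc keeps precision for very small P-values, where
    1 - cdf(|t|) would round to zero.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


def bonferroni(p, m):
    """Bonferroni-corrected P-value for m comparisons, capped at 1."""
    return min(1.0, p * m)


# Hypothetical per-citation F-measures for two methods (toy data;
# far too few pairs for the normal approximation to be accurate).
baseline = [0.60, 0.62, 0.58, 0.65, 0.61]
improved = [0.62, 0.63, 0.60, 0.66, 0.64]

t_stat = paired_t_statistic(baseline, improved)
p_raw = approx_two_sided_p(t_stat)
p_adj = bonferroni(p_raw, m=2)  # m is an assumed number of comparisons
print(f"t = {t_stat:.4f}, approximate p = {p_raw:.3g}, corrected p = {p_adj:.3g}")
```

Pairing matters because both methods are scored on the same citations: the test works on per-citation differences, which removes the large variation in difficulty between citations.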