Ke Liu, Shengwen Peng, Junqiu Wu, Chengxiang Zhai, Hiroshi Mamitsuka, Shanfeng Zhu.
Abstract
MOTIVATION: Medical Subject Headings (MeSHs) are used by the National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), to assist MeSH annotation; it uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as predictions by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple types of evidence for MeSH annotation.
Year: 2015 PMID: 26072501 PMCID: PMC4765864 DOI: 10.1093/bioinformatics/btv237
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The 1st, 100th, 1000th, 10,000th, 20,000th and 25,000th MeSH, ranked by number of appearances in the 12,504,999 abstracts used in our experiments
| Rank | Counts | MeSH (ID) |
|---|---|---|
| 1 | 8,152,852 | Humans (6801) |
| 100 | 129,816 | Risk Assessment (18570) |
| 1,000 | 23,178 | Soil (12987) |
| 10,000 | 1,532 | Transplantation Tolerance (23001) |
| 20,000 | 199 | Hypnosis, Anesthetic (6991) |
| 25,000 | 31 | Pandanaceae (31673) |
Fig. 1. The workflow of (a) MeSHLabeler and (b) MeSHRanker
Fig. 2. Precision/recall curves of LogReg for four MeSH: Humans, Cell Survival, Prosthesis Failure and Follicular Fluid
Fig. 3. Precision of LogReg as the threshold changes, for the four MeSH used in Fig. 2
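The kind of curve described in the Fig. 3 caption can be sketched as follows: for one MeSH term, a classifier assigns each citation a score, and precision is computed over the citations whose score meets a given threshold. This is only an illustrative sketch; the function name `precision_at_thresholds` and the toy scores and labels are assumptions, not data or code from the paper.

```python
def precision_at_thresholds(scores, labels, thresholds):
    """Map each threshold to the precision of predictions with score >= threshold.

    scores: classifier scores for one MeSH term, one per citation.
    labels: 1 if the citation is truly annotated with the term, else 0.
    Returns None for a threshold at which nothing is predicted.
    """
    result = {}
    for t in thresholds:
        # Keep the true labels of every citation predicted positive at t.
        predicted = [y for s, y in zip(scores, labels) if s >= t]
        result[t] = sum(predicted) / len(predicted) if predicted else None
    return result


# Hypothetical toy data for a single MeSH term (not from the paper).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]
curve = precision_at_thresholds(scores, labels, [0.1, 0.5, 0.9])
for threshold, precision in curve.items():
    print(threshold, precision)
```

On such toy data, raising the threshold keeps fewer but more reliable predictions, which is the trade-off a precision-versus-threshold plot makes visible.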
Performance comparison of MLogRegN with typical existing methods
| Methods | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| MLogReg:MetaLabeler with LogReg | 0.5576 | 0.5614 | 0.5595 | 0.5555 | 0.5772 | 0.5502 | 0.4600 | 0.4623 | 0.4612 |
| MLogRegN:MLogReg with score normalization | 0.5734 | 0.5774 | 0.5754 | 0.5702 | 0.5884 | 0.5628 | 0.4508 | 0.4175 | 0.4335 |
| KNN | 0.5196 | 0.5231 | 0.5213 | 0.5176 | 0.5314 | 0.5095 | 0.4142 | 0.3733 | 0.3927 |
| Pattern matching using titles only | 0.5151 | 0.1273 | 0.2041 | 0.5112 | 0.1426 | 0.2101 | 0.3444 | 0.1997 | 0.2528 |
| Pattern matching using abstracts only | 0.2315 | 0.2990 | 0.2609 | 0.2445 | 0.3117 | 0.2582 | 0.3607 | 0.3956 | 0.3773 |
| Pattern matching using both titles and abstracts | 0.2363 | 0.3139 | 0.2696 | 0.2498 | 0.3291 | 0.2681 | 0.3739 | 0.4153 | 0.3935 |
| MTIFL | 0.5217 | 0.5642 | 0.5386 | 0.5549 | 0.4923 | 0.5038 | |||
| MTIDEF | 0.5740 | 0.5707 | 0.5724 | 0.5785 | 0.5909 | 0.5645 | 0.5128 | 0.5372 | 0.5247 |
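The column headers abbreviate three families of multi-label metrics: micro-averaged (MiP, MiR, MiF), example-based (EBP, EBR, EBF) and macro-averaged (MaP, MaR, MaF) precision, recall and F-measure. The excerpt does not spell out the formulas, so the sketch below assumes the standard definitions: micro pools counts over all citation-label pairs, example-based averages per-citation values, and macro averages per-label precision and recall and then takes their harmonic mean. The function names and the toy annotations are hypothetical.

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(tp, n_pred, n_true):
    """Precision, recall and F-measure from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_true if n_true else 0.0
    return p, r, f_measure(p, r)


def evaluate(gold, pred):
    """gold, pred: lists of label sets, one set per citation."""
    n = len(gold)
    # Micro: pool true positives and set sizes over all citations.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    micro = prf(tp, sum(len(p) for p in pred), sum(len(g) for g in gold))
    # Example-based: compute P, R, F per citation, then average each.
    per_doc = [prf(len(g & p), len(p), len(g)) for g, p in zip(gold, pred)]
    example = tuple(sum(d[i] for d in per_doc) / n for i in range(3))
    # Macro: compute P and R per label, average them, then combine into F.
    labels = set().union(*gold, *pred)
    per_label = [
        prf(
            sum(1 for g, p in zip(gold, pred) if l in g and l in p),
            sum(1 for p in pred if l in p),
            sum(1 for g in gold if l in g),
        )
        for l in labels
    ]
    ma_p = sum(x[0] for x in per_label) / len(per_label)
    ma_r = sum(x[1] for x in per_label) / len(per_label)
    macro = (ma_p, ma_r, f_measure(ma_p, ma_r))
    return {"micro": micro, "example": example, "macro": macro}


# Hypothetical toy annotations for two citations (not from the paper).
gold = [{"Humans", "Soil"}, {"Humans"}]
pred = [{"Humans"}, {"Humans", "Soil"}]
metrics = evaluate(gold, pred)
print(metrics)
```

As a consistency check under these assumptions, `f_measure(0.5576, 0.5614)` rounds to 0.5595, which matches the MiF reported for MLogReg in the table above. The reported EBF values are not the harmonic mean of EBP and EBR, which is consistent with averaging the per-citation F-measure.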
Performance comparison of MLogRegN and MeSHRanker as different types of evidence are added incrementally
| Step | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| MLogRegN | 0.5734 | 0.5774 | 0.5754 | 0.5702 | 0.5884 | 0.5628 | 0.4508 | 0.4175 | 0.4335 |
| MeSHRanker (MLogReg+KNN) | 0.5724 | 0.5763 | 0.5743 | 0.5708 | 0.5900 | 0.5637 | 0.4597 | 0.4396 | 0.4495 |
| +MLogRegN | 0.5878 | 0.5919 | 0.5899 | 0.5878 | 0.6072 | 0.5802 | 0.4741 | 0.4472 | 0.4602 |
| +MeSH dependency | 0.5937 | 0.5978 | 0.5957 | 0.5935 | 0.6134 | 0.5861 | 0.4889 | 0.4988 | 0.4938 |
| +Pattern Matching | 0.6036 | 0.6077 | 0.6056 | 0.6043 | 0.6242 | 0.5966 | 0.5162 | 0.5248 | 0.5205 |
| +MeSH frequency | 0.6038 | 0.6079 | 0.6059 | 0.6043 | 0.6243 | 0.5967 | 0.5166 | 0.5205 | 0.5187 |
| +MTI |
Performance of MeSHRanker and MeSHLabeler compared with MTIDEF, a current cutting-edge indexing tool provided by NLM
| Step | MiP | MiR | MiF | EBP | EBR | EBF | MaP | MaR | MaF |
|---|---|---|---|---|---|---|---|---|---|
| MTIDEF | 0.5740 | 0.5707 | 0.5724 | 0.5785 | 0.5909 | 0.5645 | 0.5128 | 0.5372 | 0.5247 |
| MeSHRanker | 0.6145 | 0.6166 | 0.6159 | 0.6082 | 0.5364 | ||||
| MeSHLabeler | 0.5959 | 0.6108 | 0.5172 | 0.5054 |