Antonio J Jimeno-Yepes, Laura Plaza, James G Mork, Alan R Aronson, Alberto Díaz.
Abstract
BACKGROUND: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using the Medical Subject Headings (MeSH) controlled vocabulary as a reference. For this task, human indexers read the full text of each article. Because MEDLINE keeps growing, the NLM Indexing Initiative explores indexing methodologies that can support the indexers' work. The Medical Text Indexer (MTI), developed by the NLM Indexing Initiative, provides MeSH indexing recommendations to indexers. Currently, MTI takes only the title and abstract of MEDLINE citations as input. Previous work has shown that using the full text as input to MTI increases recall but sharply decreases precision. We propose using summaries generated automatically from the full text as the input to MTI when suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that, if the results were good enough, manual indexers could use automatic summaries instead of the full texts, along with the MTI recommendations, to speed up the process while maintaining high-quality indexing.
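The summaries are produced at several lengths (the 15%, 30% and 50% variants in the result tables). As a minimal sketch only, assuming extractive summaries (sentences copied from the full text) and assuming the percentage is the fraction of full-text sentences retained, the length-control step could look like this. The `score` argument is a hypothetical stand-in for a salience measure; it is not the scoring behind Gr-sum or CF-sum.

```python
from typing import Callable, List


def compress(sentences: List[str],
             score: Callable[[str], float],
             ratio: float) -> List[str]:
    """Return the top `ratio` fraction of sentences by `score`, in document order.

    `score` is a hypothetical salience function supplied by the caller; it is
    not the scoring used by the summarizers evaluated in the paper.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Target length in sentences (assumption: the percentage counts sentences).
    k = max(1, round(ratio * len(sentences)))
    # Rank indices by descending score; sorted() is stable, so ties keep
    # their original document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    # Restore document order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:k])]


doc = ["a", "bbbb", "cc", "dddddd", "e", "fff", "gg", "hhhhh", "i", "jj"]
summary = compress(doc, score=len, ratio=0.3)  # toy score: sentence length
```

Keeping the selected sentences in their original order, rather than in score order, is a design choice that keeps the summary readable as running text for both MTI and a human indexer.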
Year: 2013 PMID: 23802936 PMCID: PMC3706357 DOI: 10.1186/1471-2105-14-208
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. MTI diagram.
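Micro/macro average measures for MTI indexing on different types of documents
| Input | Gold headings | TP | FP | Micro P | Micro R | Micro F1 | Macro P | Macro R | Macro F1 |
| Fulltext | 18185 | 12089 | 20125 | 0.3753 | 0.6648 | 0.4797 | 0.4651 | 0.6163 | 0.5301 |
| Medline | 18185 | 11117 | 7531 | 0.5961 | 0.6113 | 0.6036 | 0.5409 | 0.5834 | 0.5614 |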
| Gr-sum (15%) | 18185 | 11323 | 9982 | 0.5315 | 0.6227 | 0.5735 | 0.5051 | 0.5713 | 0.5362 |
| Gr-sum (30%) | 18185 | 11747 | 12585 | 0.4828 | 0.6460 | 0.5526 | 0.4932 | 0.5938 | 0.5388 |
| Gr-sum (50%) | 18185 | 11971 | 15304 | 0.4389 | 0.6583 | 0.5267 | 0.4843 | 0.6094 | 0.5397 |
| CF-sum (15%) | 18185 | 11955 | 15311 | 0.4385 | 0.6574 | 0.5261 | 0.4823 | 0.6083 | 0.5380 |
| CF-sum (30%) | 18185 | 11971 | 15355 | 0.4381 | 0.6583 | 0.5261 | 0.4823 | 0.6082 | 0.5380 |
| CF-sum (50%) | 18185 | 11999 | 16050 | 0.4278 | 0.6598 | 0.5191 | 0.4781 | 0.6108 | 0.5364 |
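The counts in this table are enough to recover the micro-averaged scores: the columns behave as gold-standard headings, true positives (TP) and false positives (FP), so micro precision is TP/(TP+FP) and micro recall is TP divided by the gold total. The sketch below is an illustration, not the evaluation code used in the paper. It takes macro F1 as the harmonic mean of macro P and macro R, which is what the reported numbers are consistent with (for Fulltext, 2·0.4651·0.6163/(0.4651+0.6163) ≈ 0.5301); the unit over which the macro average is taken (document or heading) is an assumption left to the caller.

```python
from typing import List, Sequence, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN) for one evaluation unit


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts; undefined ratios become 0.0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f1(p, r)


def micro_average(units: Sequence[Counts]) -> Tuple[float, float, float]:
    """Pool TP, FP and FN over all units, then score once."""
    tp = sum(u[0] for u in units)
    fp = sum(u[1] for u in units)
    fn = sum(u[2] for u in units)
    return prf(tp, fp, fn)


def macro_average(units: Sequence[Counts]) -> Tuple[float, float, float]:
    """Average per-unit P and R; F1 is the harmonic mean of those averages."""
    if not units:
        raise ValueError("need at least one unit")
    scores: List[Tuple[float, float, float]] = [prf(*u) for u in units]
    mp = sum(s[0] for s in scores) / len(scores)
    mr = sum(s[1] for s in scores) / len(scores)
    return mp, mr, f1(mp, mr)
```

For Gr-sum (15%), `prf(11323, 9982, 18185 - 11323)` gives approximately (0.5315, 0.6227, 0.5735), matching the table row. Note that some toolkits define macro F1 as the mean of per-unit F1 scores instead, which generally gives a different value.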
Results for the five MeSH headings with the highest number of positive index entries
| MeSH heading | Positive entries | P | R | F1 | P | R | F1 | P | R | F1 |
| Humans | 864 | 0.6938 | 0.9861 | 0.8145 | 0.8407 | 0.9225 | 0.8797 | 0.8056 | 0.9259 | 0.8616 |
| Animals | 455 | 0.5743 | 0.9429 | 0.7138 | 0.9037 | 0.7429 | 0.8154 | 0.8326 | 0.7978 | 0.8148 |
| Female | 437 | 0.4468 | 0.9314 | 0.6039 | 0.7167 | 0.7643 | 0.7398 | 0.6329 | 0.8284 | 0.7175 |
| Male | 406 | 0.4374 | 0.9039 | 0.5896 | 0.7400 | 0.7709 | 0.7551 | 0.6069 | 0.7833 | 0.6839 |
| Adult | 253 | 0.3036 | 0.7391 | 0.4304 | 0.6048 | 0.6957 | 0.6471 | 0.4972 | 0.7075 | 0.5840 |
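Per-heading scores such as those for Humans or Female count, for a single heading, the documents it was correctly suggested for, wrongly suggested for, and missed on. A minimal sketch, assuming gold and predicted annotations are available as a set of headings per document identifier; the data layout and the function name `per_heading_scores` are hypothetical.

```python
from typing import Dict, Set, Tuple


def per_heading_scores(gold: Dict[str, Set[str]],
                       pred: Dict[str, Set[str]],
                       heading: str) -> Tuple[int, float, float, float]:
    """Return (positive documents, P, R, F1) for one MeSH heading.

    `gold` and `pred` map a document identifier (e.g. a PMID) to the set of
    headings assigned to it. Only documents in `gold` are scored; documents
    missing from `pred` count as having no suggestions.
    """
    tp = fp = fn = 0
    for doc, gold_terms in gold.items():
        in_gold = heading in gold_terms
        in_pred = heading in pred.get(doc, set())
        if in_gold and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_gold:
            fn += 1
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return tp + fn, p, r, f
```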
Micro/macro average results for different indexing algorithms and different types of documents
| Input | Algorithm | Micro P | Micro R | Micro F1 | Macro P | Macro R | Macro F1 |
| FullText | MTI | 0.3753 | 0.6648 | 0.4797 | 0.4651 | 0.6163 | 0.5301 |
| | MMI | 0.3001 | 0.3536 | 0.3247 | 0.3692 | 0.4610 | 0.4100 |
| | PRC | 0.6548 | 0.0968 | 0.1686 | 0.1237 | 0.0695 | 0.0890 |
| MEDLINE | MTI | 0.5961 | 0.6113 | 0.6036 | 0.5409 | 0.5834 | 0.5614 |
| | MMI | 0.3731 | 0.3189 | 0.3439 | 0.4139 | 0.4547 | 0.4334 |
| | PRC | 0.6517 | 0.0710 | 0.1280 | 0.1059 | 0.0483 | 0.0663 |
| Gr-summ (15%) | MTI | 0.5315 | 0.6227 | 0.5735 | 0.5051 | 0.5713 | 0.5362 |
| | MMI | 0.3369 | 0.2994 | 0.3171 | 0.3550 | 0.4081 | 0.3797 |
| | PRC | 0.6625 | 0.0692 | 0.1253 | 0.1074 | 0.0546 | 0.0724 |
MeSH term ranking per document
| | FullText | MEDLINE | Gr-summ (15%) |
| MTI | 0.2714 | 0.3932 | 0.3589 |
| MMI | 0.1277 | 0.1457 | 0.1253 |
| PRC | 0.0337 | 0.0284 | 0.0284 |
| P@0R | FullText | MEDLINE | Gr-summ (15%) |
| MTI | 0.5750 | 0.7946 | 0.7403 |
| MMI | 0.4703 | 0.5527 | 0.5036 |
| PRC | 0.0905 | 0.0700 | 0.0700 |
| P@5 | FullText | MEDLINE | Gr-summ (15%) |
| MTI | 0.2938 | 0.5308 | 0.4610 |
| MMI | 0.2333 | 0.2917 | 0.2573 |
| PRC | 0.0313 | 0.0251 | 0.0251 |
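Precision at rank k (the P@5 rows) is the fraction of the first k suggested headings that appear in a document's gold-standard indexing. The sketch assumes each algorithm returns a ranked list of headings per document and that per-document scores are averaged; it divides by k even when fewer than k headings are suggested, which is one common convention but not necessarily the one used in the paper. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k ranked headings found in the gold set.

    Divides by k even if fewer than k headings were suggested (assumed
    convention: missing ranks count as misses).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for heading in ranked[:k] if heading in relevant)
    return hits / k


def mean_precision_at_k(rankings: Dict[str, List[str]],
                        gold: Dict[str, Set[str]], k: int) -> float:
    """Average P@k over the documents in `gold` (no ranking means no hits)."""
    if not gold:
        raise ValueError("gold must not be empty")
    total = sum(precision_at_k(rankings.get(doc, []), terms, k)
                for doc, terms in gold.items())
    return total / len(gold)
```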
Results on the 30 most frequent MeSH headings
| Algorithm | P | R | F1 | P | R | F1 | P | R | F1 |
| MTI | 0.4765 | 0.5697 | 0.6446 | 0.5874 | 0.6748 | 0.6281 | |||
| MMI | 0.4302 | 0.1512 | 0.2238 | 0.4265 | 0.1448 | 0.2162 | |||
| PRC | 0.5498 | 0.0887 | 0.1536 | 0.5628 | 0.0650 | 0.1166 | |||
| ML-SVM | 0.3555 | 0.6391 | 0.4584 | 0.5993 | 0.3376 | 0.4319 | |||
| ML-Ada | 0.4959 | 0.3883 | 0.4355 | 0.5603 | 0.3316 | 0.4166 | 0.5362 | 0.3129 | 0.3952 |
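Restricting the evaluation to the 30 most frequent MeSH headings can be done by ranking headings by how many gold-standard documents they index and then keeping only that subset in both gold and predicted annotations. A sketch under that assumption, with hypothetical helper names; `Counter.most_common` orders equal counts by first occurrence, so the cutoff can be arbitrary when counts tie at position 30.

```python
from collections import Counter
from typing import Dict, List, Set


def most_frequent_headings(gold: Dict[str, Set[str]], n: int = 30) -> List[str]:
    """The n headings that index the most gold-standard documents."""
    counts = Counter(h for terms in gold.values() for h in terms)
    return [heading for heading, _ in counts.most_common(n)]


def restrict(annotations: Dict[str, Set[str]],
             headings: List[str]) -> Dict[str, Set[str]]:
    """Keep only the given headings in every document's annotation set."""
    allowed = set(headings)
    return {doc: terms & allowed for doc, terms in annotations.items()}
```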
MeSH 2012 top-level branch codes
| Code | Branch |
| A | Anatomy |
| B | Organisms |
| C | Diseases |
| D | Chemicals and Drugs |
| E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
| F | Psychiatry and Psychology |
| G | Phenomena and Processes |
| H | Disciplines and Occupations |
| I | Anthropology, Education, Sociology and Social Phenomena |
| J | Technology, Industry, Agriculture |
| K | Humanities |
| L | Information Science |
| M | Named Groups |
| N | Health Care |
| Z | Geographicals |
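MeSH tree numbers start with the letter of their top-level category (C04, for example, lies under Diseases), so headings can be grouped by branch from that first letter. A minimal sketch using the codes listed in the table; the function name `branch_of` is hypothetical, and a letter not in the table raises an error.

```python
from typing import Dict

# Top-level MeSH 2012 branch codes, as listed in the table.
BRANCHES: Dict[str, str] = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
    "E": "Analytical, Diagnostic and Therapeutic Techniques and Equipment",
    "F": "Psychiatry and Psychology",
    "G": "Phenomena and Processes",
    "H": "Disciplines and Occupations",
    "I": "Anthropology, Education, Sociology and Social Phenomena",
    "J": "Technology, Industry, Agriculture",
    "K": "Humanities",
    "L": "Information Science",
    "M": "Named Groups",
    "N": "Health Care",
    "Z": "Geographicals",
}


def branch_of(tree_number: str) -> str:
    """Top-level branch name for a MeSH tree number such as 'C04.557'."""
    letter = tree_number.strip()[:1].upper()
    if letter not in BRANCHES:
        raise ValueError(f"no listed branch for tree number {tree_number!r}")
    return BRANCHES[letter]
```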