Aurélie Névéol, Sonya E Shooshan, Vincent Claveau.
Abstract
BACKGROUND: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, indexing involves selecting Medical Subject Headings (MeSH) that describe the subject matter of each article. As the number of publications added to MEDLINE grows, so does the need for automatic tools to assist MEDLINE indexers in this task.
Year: 2008 PMID: 19025687 PMCID: PMC2586750 DOI: 10.1186/1471-2105-9-S11-S11
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. ALEPH algorithm.
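ALEPH's basic loop is a sequential-covering procedure: pick an uncovered positive example, build the most specific clause that covers it (the bottom clause), search among more general clauses for the best one, add that clause to the theory, and discard the positives it covers. The sketch below is a deliberately propositional simplification, not ALEPH itself. Examples are hypothetical feature sets (for instance, the main headings of a citation), the "bottom clause" is just the seed example's feature set, candidate rules are its subsets, and the score (positives minus negatives covered, with no negatives allowed) is an assumption standing in for ALEPH's configurable evaluation. The names `covering`, `best_rule` and `max_len` are illustrative only.

```python
from itertools import combinations


def covers(rule, example):
    # A rule is a set of required features; it covers any example containing all of them.
    return rule <= example


def best_rule(bottom, pos, neg, max_len=2):
    """Search subsets of the bottom clause for the consistent rule with the best P - N score."""
    best, best_score = None, float("-inf")
    for size in range(1, min(max_len, len(bottom)) + 1):
        for body in combinations(sorted(bottom), size):
            rule = frozenset(body)
            p = sum(covers(rule, e) for e in pos)
            n = sum(covers(rule, e) for e in neg)
            if n == 0 and p - n > best_score:  # in this sketch no negative may be covered
                best, best_score = rule, p - n
    return best


def covering(pos, neg, max_len=2):
    """Hypothetical propositional analogue of the ALEPH covering loop."""
    theory, remaining = [], list(pos)
    while remaining:
        seed = remaining[0]                                # 1. select an uncovered example
        bottom = seed                                      # 2. most specific "clause": all its features
        rule = best_rule(bottom, remaining, neg, max_len)  # 3. search generalisations of it
        if rule is None:                                   # nothing consistent: keep the seed as-is
            rule = frozenset(seed)
        theory.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]  # 4. drop covered positives
    return theory


pos = [frozenset("ab"), frozenset("ac"), frozenset("bd")]
neg = [frozenset("cd"), frozenset("d")]
print(covering(pos, neg))  # [frozenset({'a'}), frozenset({'b'})]
```

Because every candidate rule is a subset of the seed's features, each iteration is guaranteed to cover at least the seed, so the loop always terminates.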
ILP representation of a MEDLINE citation
| Excerpt of a sample citation | Excerpt of ILP representation in |
| PMID – 16179550 | in_article(pmid16179550, mh16179550_1). |
| MH – Acrylamide/*pharmacology | hierarchy(mh16179550_1,"Acrylamide"). |
| MH – Animals | in_article(pmid16179550, mh16179550_2). |
| MH – Astrocytes/drug effects | hierarchy(mh16179550_2,"Animals"). |
| MH – Histidine/physiology | in_article(pmid16179550, mh16179550_3). |
| MH – Amino Acid Transport Systems/*biosynthesis | hierarchy(mh16179550_3,"Astrocytes"). |
| | in_article(pmid16179550, mh16179550_4). |
| ... | ... |
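The right-hand column follows a regular pattern: each MH line of the citation gets an identifier `mh<PMID>_<i>`, an `in_article` fact linking it to the citation, and a `hierarchy` fact naming its main heading. The sketch below reproduces only that visible pattern. The helper name `citation_to_facts` is hypothetical, and dropping subheadings and major-topic stars is an assumption based on the excerpt, which shows neither in these two predicates; the full representation elided by "..." presumably encodes more.

```python
def citation_to_facts(pmid, mh_lines):
    """Hypothetical helper: turn MH field values into in_article/hierarchy facts.

    Mirrors the excerpt only: subheadings and '*' major-topic marks are dropped,
    since the visible hierarchy/2 facts carry just the main heading.
    """
    facts = []
    for i, line in enumerate(mh_lines, start=1):
        # "Acrylamide/*pharmacology" -> "Acrylamide"; "*Animals" -> "Animals"
        main_heading = line.split("/", 1)[0].strip().lstrip("*")
        mh_id = f"mh{pmid}_{i}"
        facts.append(f"in_article(pmid{pmid}, {mh_id}).")
        facts.append(f'hierarchy({mh_id},"{main_heading}").')
    return facts


for fact in citation_to_facts("16179550", [
    "Acrylamide/*pharmacology",
    "Animals",
    "Astrocytes/drug effects",
]):
    print(fact)
```

Run on the first three MH lines of PMID 16179550, this prints the six facts shown in the table's right column, in the same interleaved order.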
Figure 2. Excerpt of a search space structured by θ-subsumption.
Figure 3. Excerpt of a search space structured by generalized subsumption.
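Under θ-subsumption, a clause C is at least as general as a clause D when some substitution θ of C's variables makes every literal of Cθ appear in D. The sketch below checks this by backtracking, for clauses written as lists of atoms. It assumes the Prolog convention that variables start with an uppercase letter or underscore, so constants such as the quoted heading names from the citation excerpt (`'"Astrocytes"'`) are left untouched. It ignores clause heads and literal signs, and it does not implement the generalized subsumption of Figure 3, which additionally takes background knowledge into account. All function names are illustrative.

```python
def is_var(term):
    # Prolog convention: variables start with an uppercase letter or "_".
    return term[:1].isupper() or term[:1] == "_"


def match(atom, target, theta):
    """Extend substitution theta so that atom·theta == target, or return None."""
    if atom[0] != target[0] or len(atom) != len(target):
        return None
    theta = dict(theta)  # copy so a failed branch leaves the caller's theta intact
    for a, b in zip(atom[1:], target[1:]):
        if is_var(a):
            if theta.setdefault(a, b) != b:  # variable already bound to another term
                return None
        elif a != b:  # constants must be identical
            return None
    return theta


def theta_subsumes(c, d, theta=None):
    """True if a single substitution maps every atom of clause c onto an atom of d."""
    theta = {} if theta is None else theta
    if not c:
        return True
    first, rest = c[0], c[1:]
    for target in d:  # backtrack over candidate images of the first atom
        extended = match(first, target, theta)
        if extended is not None and theta_subsumes(rest, d, extended):
            return True
    return False


citation = [
    ("in_article", "pmid16179550", "mh16179550_2"),
    ("hierarchy", "mh16179550_2", '"Animals"'),
    ("in_article", "pmid16179550", "mh16179550_3"),
    ("hierarchy", "mh16179550_3", '"Astrocytes"'),
]
general = [("in_article", "A", "M"), ("hierarchy", "M", '"Astrocytes"')]
print(theta_subsumes(general, citation))  # True, with A -> pmid16179550, M -> mh16179550_3
```

The shared variable `M` is what makes the check meaningful: `hierarchy(M, "Animals"), hierarchy(M, "Astrocytes")` does not subsume the same facts, because no single binding of `M` satisfies both atoms.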
MeSH subheading distribution in MEDLINE
| 5 allowable qualifiers | Fraction | 26 allowable qualifiers | Fraction |
| no subheading | 0.758 | no subheading | 0.047 |
| classification | 0.008 | classification | 0 |
| drug effects | 0.129 | drug effects | 0.082 |
| ethics | 0 | metabolism | 0.117 |
| physiology | 0.101 | pathology | 0.171 |
| radiation effects | 0.004 | radiation effects | 0.011 |
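The 5-qualifier column lists every possible outcome (no subheading plus the five allowable qualifiers), so its fractions should sum to 1, whereas the 26-qualifier column can only show an excerpt. A quick check with the values copied from the table makes this explicit and shows how skewed the first distribution is. The variable names are illustrative.

```python
# Fractions copied from the table; the 26-qualifier column lists only six of 27 outcomes.
five_qualifiers = {
    "no subheading": 0.758, "classification": 0.008, "drug effects": 0.129,
    "ethics": 0.0, "physiology": 0.101, "radiation effects": 0.004,
}
twenty_six_excerpt = {
    "no subheading": 0.047, "classification": 0.0, "drug effects": 0.082,
    "metabolism": 0.117, "pathology": 0.171, "radiation effects": 0.011,
}

for name, dist in [("5 allowable", five_qualifiers), ("26 allowable (excerpt)", twenty_six_excerpt)]:
    total = sum(dist.values())
    most_frequent = max(dist, key=dist.get)
    print(f"{name}: listed fractions sum to {total:.3f}; most frequent: {most_frequent}")
```

The first line reports a total of 1.000 with "no subheading" dominant at 0.758; the second reports only 0.428 for the listed rows, with pathology the largest at 0.171.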
Performance of ILP rule inference on MEDLINE citations
| | | | | | |
| 5,300 | 40,000 | 75 min | 41.4 | 53.0 | 46.5 |
| 5,700 | 30,500 | 51 min | 50.4 | 59.4 | 54.5 |
| 4,500 | 21,000 | 37 min | 42.4 | 60.2 | 49.7 |
| 5,000 | 22,000 | 45 min | 48.8 | 53.9 | 51.2 |
| 5,200 | 34,000 | 46 min | 41.4 | 41.5 | 41.5 |
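The extracted header of this table is empty, but the last three numbers in each row behave like precision, recall and F-measure: the third equals the harmonic mean of the first two, up to rounding. Assuming that reading (which of the first two is precision cannot be told from the values, since F = 2PR/(P + R) is symmetric), the check below recomputes the last column from the table's values.

```python
# (value 1, value 2, reported value 3) copied from the rows of the table above
rows = [
    (41.4, 53.0, 46.5),
    (50.4, 59.4, 54.5),
    (42.4, 60.2, 49.7),
    (48.8, 53.9, 51.2),
    (41.4, 41.5, 41.5),
]


def f_measure(p, r):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for p, r, reported in rows:
    print(f"{p:5.1f} {r:5.1f}  reported={reported}  recomputed={f_measure(p, r):.2f}")
```

Every recomputed value lies within 0.1 of the reported one, which is consistent with inputs that were themselves rounded to one decimal place.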
Performance on the test corpus using MTI main heading recommendations
| ILP | 166 | 38 | |||
| Manual | 1 | 1 | 1 | ||
| ILP-filtered | 95 | 45 | 25 | 32 | |
| ILP-reviewed | 124 | 37 | |||
| Baseline | - | 26 | 9 | 13 | |
| ILP | 200 | 55 | |||
| Manual | 226 | 28 | 39 | ||
| ILP-filtered | 181 | 55 | |||
| ILP-reviewed | 172 | 55 | |||
| Baseline | - | 33 | 10 | 15 | |
| ILP | 134 | 49 | |||
| Manual | 61 | 20 | 30 | ||
| ILP-filtered | 123 | 49 | |||
| ILP-reviewed | 73 | 49 | |||
| Baseline | - | 37 | 12 | 18 | |
| ILP | 217 | 47 | |||
| Manual | 7 | 3 | 5 | ||
| ILP-filtered | 183 | 48 | |||
| ILP-reviewed | 74 | 47 | |||
| Baseline | - | 28 | 12 | 17 | |
| ILP | 70 | ||||
| Manual | 0 | - | - | - | |
| ILP-filtered | 64 | ||||
| ILP-reviewed | 70 | ||||
| Baseline | - | 28 | 10 | 15 | |
Performance of MTI's subheading attachment module using Manual vs. ILP post-processing rules
| ILP | ||||
| Manual | 17 | 25 | ||
| ILP | 55 | |||
| Manual | 38 | 45 | ||
| ILP | 51 | |||
| Manual | 33 | 42 | ||
| ILP | 50 | |||
| Manual | 24 | 33 | ||
| ILP | ||||
| Manual | 44 | 23 | 30 | |
| ILP | 49 | |||
| Manual | 23 | 32 | ||