James Mork, Alan Aronson, Dina Demner-Fushman.
Abstract
BACKGROUND: Facing a growing workload and dwindling resources, the US National Library of Medicine (NLM) created the Indexing Initiative project in 1996. This cross-library team's mission is to explore indexing methodologies for ensuring quality and currency of NLM document collections. The NLM Medical Text Indexer (MTI) is the main product of this project and has been providing automated indexing recommendations since 2002. After all of this time, the question arises whether MTI is still useful and relevant.
Keywords: BioASQ; Indexing methods; MEDLINE; Machine learning; MeSH; Text categorization
Year: 2017 PMID: 28231809 PMCID: PMC5324252 DOI: 10.1186/s13326-017-0113-5
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 MTI processing flow diagram
MeSH terms MTI considers check tags
| | | |
|---|---|---|
| Animals | History of Medicine | Horses |
| Bees | History, 15th Century | Infant, Newborn |
| Cats | History, 16th Century | Mice |
| Cattle | History, 17th Century | Pregnancy |
| Cercopithecus aethiops | History, 18th Century | Rabbits |
| Chick Embryo | History, 19th Century | Rats |
| Child | History, 20th Century | Sheep |
| Cricetinae | History, 21st Century | United States |
| Dogs | History, Ancient | |
| Guinea Pigs | History, Medieval | |
All bolded check tags represent machine learning suggested check tags
Fig. 2 Hooper’s measure of indexing consistency
Fig. 3 Percentage of indexing production referenced via MTI
Fig. 4 Average daily usage of MTI by indexers
MTI web usage statistics 2012 – 2014
| | 2012 | 2013 | 2014 |
|---|---|---|---|
| MTI Requests | 44,970 | 42,919 | 87,549 |
| # Items processed | 3,148,431 | 7,963,477 | 11,294,998 |
| MeSH on demand requests | – | – | 225,750 |
| # Different domains | 118 | 124 | 147 |
Fig. 5 MTI and MTIFL performance 2007 – 2014
Inter-indexer consistency statistics - past and present studies
| | Lancaster | Leonard | Marcetich & Schuyler (manual) | Marcetich & Schuyler (computer) | Funk & Reid | MTI | MTIFL |
|---|---|---|---|---|---|---|---|
| Year of study | 1968 | 1975 | 1981 | 1981 | 1983 | 2014 | 2014 |
| Number of articles | 16 | 100 | 50 | 50 | 760 | 673,125 | 27,068 |
| Checktags (CT) | – | – | – | – | 74.70% | 62.01% | 70.91% |
| Geographics (GEOG) | – | – | – | – | 56.60% | 41.52% | 57.24% |
| Descriptors (DESC) | 46.10% | 48.20% | – | – | 55.40% | 40.85% | 53.97% |
| Main headings (MH) | – | – | – | – | 48.20% | 35.17% | 48.89% |
| All main headings (no Checktags) | – | – | 39% | 43% | – | 35.29% | 49.12% |
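The percentages above are pairwise agreement scores between indexers. As a minimal sketch of how such a score can be computed, the snippet below assumes the commonly cited form of Hooper's measure (Fig. 2): the number of terms both indexers assigned, divided by the number of terms either indexer assigned. The function name and the example term lists are hypothetical and are not taken from the study.

```python
def hooper_consistency(terms_a, terms_b):
    """Hooper's consistency between two indexers' term sets.

    Assumed form: A / (A + M + N), where A is the number of terms both
    indexers assigned, and M and N are the terms only one of them assigned.
    This equals |intersection| / |union| of the two sets.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty term sets agree fully.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical MeSH terms assigned to the same article by two indexers.
indexer_1 = ["Humans", "Mice", "Neoplasms"]
indexer_2 = ["Humans", "Mice", "Animals", "Female"]

# 2 shared terms out of 5 distinct terms overall.
print(f"{hooper_consistency(indexer_1, indexer_2):.2%}")
```

Averaging this score over many article pairs would give a collection-level percentage of the kind reported in the table, computed separately for each term category.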
2013 BioASQ results as of October 21, 2013 for winning system and MTI/MTIFL
| Batch | # Articles | System name | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 10,681 | System3 | 0.5602 | 0.5735 | 0.5668 |
| | | MTIFL | 0.5940 | 0.5196 | 0.5543 |
| 2 | 11,808 | System1 | 0.5921 | 0.5670 | 0.5793 |
| | | MTIFL | 0.6127 | 0.5050 | 0.5537 |
| 3 | 9,828 | MTI | 0.5610 | 0.6193 | 0.5887 |
| | | MTIFL | 0.6027 | 0.5653 | 0.5834 |
| | | System1 | 0.5873 | 0.5760 | 0.5816 |
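The F1 column is consistent with the harmonic mean of the precision and recall columns. The short check below assumes that standard definition; the function name is illustrative, and BioASQ's own evaluation scripts are not reproduced here. Because the inputs are the rounded table values, a recomputed score can differ from the published one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for batch 1 of the 2013 results, copied from the table.
batch_1 = {
    "System3": (0.5602, 0.5735),
    "MTIFL": (0.5940, 0.5196),
}

for system, (precision, recall) in batch_1.items():
    print(system, round(f1_score(precision, recall), 4))
```

The batch 1 rows show the trade-off behind the two MTI configurations: MTIFL has higher precision than System3 but lower recall, and its F1 ends up slightly lower.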
2014 BioASQ results as of August 5, 2014 for winning system and MTI/MTIFL
| Batch | # Articles | System name | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 1 | 17,061 | Asclepius | 0.5958 | 0.5923 | 0.5941 |
| | | MTI | 0.5908 | 0.5614 | 0.5757 |
| | | MTIFL | 0.6284 | 0.5199 | 0.5690 |
| 2 | 17,073 | Antinomyra SYS1 | 0.6189 | 0.5863 | 0.6022 |
| | | MTI | 0.6012 | 0.5621 | 0.5810 |
| | | MTIFL | 0.6176 | 0.5367 | 0.5743 |
| 3 | 18,256 | Antinomyra SYS1 | 0.6527 | 0.6120 | 0.6317 |
| | | MTI | 0.6099 | 0.5646 | 0.5864 |
| | | MTIFL | 0.6400 | 0.5257 | 0.5773 |
Fig. 6 Title and abstract versus full text example (PMID: 24000132)