Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín.
Abstract
BACKGROUND: Biomedical semantic indexing is a very useful support tool for human curators in their efforts to index and catalog the biomedical literature.
Keywords: Medical Subject Headings; information storage and retrieval; semantic indexing
Year: 2017 PMID: 29196280 PMCID: PMC5732329 DOI: 10.2196/medinform.7059
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. JSON-based format for the training data in the biomedical semantic indexing and question answering challenge (BioASQ) task 4a.
Main works for biomedical semantic indexing.
| System | Type | Guidelines | Search engine | Approach | F1 |
| MTIa, Mork et al | Hierarchical | Yes | PubMed | MetaMap, k-NNb | 0.548 |
| AUTH-Atypon, Papanikolaou et al | Flat | No | No | SVMc with NLPd features | 0.578 |
| NCBIe, Mao et al | Flat | No | No | SVM + k-NN | 0.605 |
| Antinomyra, Liu et al | Flat | No | No | k-NN + logistic regression | 0.619 |
| Ribadas et al | Hierarchical | No | No | Bayesian network | 0.615 |
| Kosmopoulos et al | Flat | No | No | k-NN + word embeddings | 0.57 |
| Peng et al | Flat | No | No | k-NN + word embeddings | 0.632 |
aMTI: Medical Text Indexer.
bk-NN: k-nearest neighbors.
cSVM: Support Vector Machine.
dNLP: Natural Language Processing.
eNCBI: National Center for Biotechnology Information.
Figure 2. Medical Subject Headings (MeSH) descriptor data for the term "Lymphoma".
Figure 3. MeSH terms for the abstract with PubMed unique identifier (PMID)=26852276.
Size of datasets (number of documents).
| Dataset | Documents, n |
| Training | 10,099,281 |
| Development | 1099 |
| Test | 13,936 |
Figure 4. Architecture of our system.
Figure 5. The element w_{i,d} is the frequency of the term i in the document d.
Figure 6. Cosine similarity between a document d and a query q, where V(q).V(d) is the dot product of their vectors, and |V(q)| and |V(d)| are their Euclidean norms.
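The captions of Figures 5 and 6 describe term-frequency vectors and their cosine similarity. A minimal sketch of that computation follows; the whitespace tokenizer, the function names, and the example strings are illustrative assumptions, and this is not the scoring code of the system or of its search engine.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count each lowercased whitespace token (assumed tokenizer, for illustration)."""
    return Counter(text.lower().split())


def cosine_similarity(q, d):
    """Dot product of two sparse vectors divided by the product of their Euclidean norms."""
    dot = sum(weight * d.get(term, 0) for term, weight in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_q * norm_d)


# Hypothetical query and document, used only to exercise the functions.
query = term_frequencies("b cell lymphoma")
doc = term_frequencies("lymphoma of the b cell lineage")
score = cosine_similarity(query, doc)
```

Because both vectors hold non-negative counts, the score lies between 0 (no shared terms) and 1 (same direction).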
Figure 7. Scoring function to rank Medical Subject Headings (MeSH) terms.
Figure 8. BlazeGraph query to obtain the ancestors of the term "Lymphoma".
Figure 9. List of ancestors for the term "Lymphoma" provided by BlazeGraph.
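Figures 8 and 9 concern retrieving the ancestors of a MeSH term from a graph store. The idea can be sketched in plain Python as a breadth-first walk over a child-to-parents map. The `PARENTS` dictionary below is a hypothetical, heavily simplified toy hierarchy, not the actual MeSH tree, and the sketch does not reproduce the BlazeGraph query itself.

```python
from collections import deque

# Hypothetical toy hierarchy (child -> list of parents); NOT the real MeSH structure.
PARENTS = {
    "Lymphoma, B-Cell": ["Lymphoma"],
    "Lymphoma": ["Immune System Diseases", "Neoplasms"],
}


def ancestors(term, parents):
    """Return every term reachable by repeatedly following parent links."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # a term can be reached through more than one path
        seen.add(node)
        queue.extend(parents.get(node, []))
    return seen


result = ancestors("Lymphoma, B-Cell", PARENTS)
```

A term may have several parents, so the walk tracks visited nodes to avoid reporting the same ancestor twice.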
MeSH (Medical Subject Headings) terms proposed by our system for the article with PMID (PubMed unique identifier)=25676421.
| MeSHa exploiting the hierarchy of MeSH | |
| Ataxia Telangiectasia Mutated Proteins | Ataxia Telangiectasia Mutated Proteins |
| B-Lymphocytes | B-Lymphocytes |
| Cell Cycle Proteins | Cell Cycle Proteins |
| DNA-Binding Proteins | DNA-Binding Proteins |
| Humans | Humans |
| Protein-Serine-Threonine Kinases | Protein-Serine-Threonine Kinases |
| Animals | Animals |
| Genomic Instability | Genomic Instability |
| Mice, Knockout | Mice, Knockout |
| Cyclin D1 | Cyclin D1 |
| Mice | |
| In Situ Hybridization, Fluorescence | In Situ Hybridization, Fluorescence |
| Immune System Diseases | Lymphoma, B-Cell |
| Ataxia Telangiectasia | |
| Lymphoma | |
aMeSH: Medical Subject Headings.
Comparison of search times on the Resource Description Framework (RDF) format and the graph database of the MeSH (Medical Subject Headings) thesaurus.
| MeSHa terms | RDFb in msc | Graph database in ms |
| Lymphoma, B-Cell | 193.39 | 112 |
| Cyclin D1 | 210.44 | 100 |
| Mice, Knockout | 239.86 | 130 |
aMeSH: Medical Subject Headings.
bRDF: Resource Description Framework.
cms: milliseconds.
Experimental results on our development dataset exploiting the hierarchical structure of Medical Subject Headings (MeSH).
| LCA-Pa | LCA-Rb | LCA-Fc | ||||
| Elastic-10-0 | 0.3021 | 0.8784 | 0.4386 | 0.2061 | 0.6046 | 0.3006 |
| Elastic-10-1.5 | 0.6290 | 0.6213 | 0.6039 | 0.4146 | 0.3979 | 0.3880 |
| Elastic-10-2.5 | 0.6599 | 0.6214 | 0.6179 | 0.4376 | 0.3981 | 0.3982 |
| Elastic-10-4 | 0.7371 | 0.6130 | 0.6466 | 0.4936 | 0.3927 | 0.4179 |
| Elastic-10-5 | 0.7898 | 0.5987 | 0.6576 | 0.5316 | 0.3843 | 0.4256 |
| Elastic-10-6 | 0.7434 | 0.6107 | 0.6475 | 0.4986 | 0.3914 | 0.4185 |
| Elastic-10-7 | 0.7904 | 0.5980 | 0.6573 | 0.5321 | 0.3840 | 0.4255 |
| Elastic-10-8 | 0.7968 | 0.5937 | 0.6566 | 0.5372 | 0.3415 | 0.4254 |
| Elastic-10-9 | 0.7910 | 0.5976 | 0.6571 | 0.5325 | 0.3838 | 0.4255 |
| Elastic-20-0 | 0.2174 | 0.9248 | 0.3441 | 0.1564 | 0.6530 | 0.2475 |
| Elastic-20-1.5 | 0.5268 | 0.6303 | 0.5546 | 0.3396 | 0.4045 | 0.3542 |
| Elastic-20-2.5 | 0.5723 | 0.6331 | 0.5803 | 0.3709 | 0.4044 | 0.3703 |
| Elastic-20-4 | 0.6266 | 0.6332 | 0.6080 | 0.4108 | 0.4047 | 0.3901 |
| Elastic-20-5 | 0.6879 | 0.6294 | 0.6350 | 0.4580 | 0.4037 | 0.4100 |
| Elastic-20-6 | 0.6413 | 0.6333 | 0.6150 | 0.4221 | 0.4054 | 0.3953 |
| Elastic-20-7 | 0.6914 | 0.6296 | 0.6367 | 0.4605 | 0.4039 | 0.4112 |
| Elastic-20-8 | 0.7104 | 0.6279 | 0.6441 | 0.4755 | 0.4041 | 0.4173 |
| Elastic-20-9 | 0.6945 | 0.6299 | 0.6383 | 0.4630 | 0.4043 | 0.4124 |
| Elastic-30-0 | 0.1790 | 0.9434 | 0.2945 | 0.1331 | 0.6770 | 0.2185 |
| Elastic-30-1.5 | 0.4776 | 0.6388 | 0.5290 | 0.3040 | 0.4108 | 0.3359 |
| Elastic-30-2.5 | 0.5231 | 0.6341 | 0.5537 | 0.3374 | 0.4076 | 0.3537 |
| Elastic-30-4 | 0.5652 | 0.6323 | 0.5766 | 0.3668 | 0.4046 | 0.3685 |
| Elastic-30-5 | 0.6200 | 0.6354 | 0.6067 | 0.4063 | 0.4060 | 0.3888 |
| Elastic-30-6 | 0.5831 | 0.6332 | 0.5864 | 0.3786 | 0.4050 | 0.3748 |
| Elastic-30-7 | 0.6256 | 0.6359 | 0.6096 | 0.4104 | 0.4069 | 0.3911 |
| Elastic-30-8 | 0.6506 | 0.6369 | 0.6217 | 0.4293 | 0.4070 | 0.3993 |
| Elastic-30-9 | 0.6302 | 0.6364 | 0.6120 | 0.4139 | 0.4069 | 0.3928 |
| Elastic-40-0 | 0.1555 | 0.9532 | 0.2621 | 0.1184 | 0.6924 | 0.1988 |
| Elastic-40-1.5 | 0.4412 | 0.6473 | 0.5081 | 0.2801 | 0.4166 | 0.3227 |
| Elastic-40-2.5 | 0.4915 | 0.6383 | 0.5366 | 0.3145 | 0.4106 | 0.3417 |
| Elastic-40-4 | 0.5302 | 0.6343 | 0.5582 | 0.3404 | 0.4077 | 0.3556 |
| Elastic-40-5 | 0.5755 | 0.6359 | 0.5840 | 0.3726 | 0.4069 | 0.3726 |
| Elastic-40-6 | 0.5472 | 0.6356 | 0.5682 | 0.3521 | 0.4073 | 0.3618 |
| Elastic-40-7 | 0.5819 | 0.6370 | 0.5878 | 0.3777 | 0.4083 | 0.3758 |
| Elastic-40-8 | 0.6073 | 0.6374 | 0.6011 | 0.3959 | 0.4077 | 0.3847 |
| Elastic-40-9 | 0.5870 | 0.6374 | 0.5908 | 0.3813 | 0.4082 | 0.3777 |
| Elastic-50-0 | 0.1395 | 0.9603 | 0.239 | 0.1082 | 0.7045 | 0.1846 |
| Elastic-50-1.5 | 0.4161 | 0.6542 | 0.4930 | 0.2628 | 0.4226 | 0.3127 |
| Elastic-50-2.5 | 0.4669 | 0.6445 | 0.5239 | 0.2965 | 0.4151 | 0.3328 |
| Elastic-50-4 | 0.5008 | 0.6392 | 0.5431 | 0.3192 | 0.4112 | 0.3452 |
| Elastic-50-5 | 0.5447 | 0.6357 | 0.5670 | 0.3500 | 0.4081 | 0.3610 |
| Elastic-50-6 | 0.5192 | 0.6390 | 0.5538 | 0.3324 | 0.4096 | 0.3518 |
| Elastic-50-7 | 0.5507 | 0.6361 | 0.5702 | 0.3548 | 0.4086 | 0.3637 |
| Elastic-50-8 | 0.5767 | 0.6366 | 0.5849 | 0.3734 | 0.4074 | 0.3734 |
| Elastic-50-9 | 0.5560 | 0.6360 | 0.5733 | 0.3585 | 0.4081 | 0.3656 |
aLCA-P: lowest common ancestor Precision.
bLCA-R: lowest common ancestor Recall.
cLCA-F: lowest common ancestor F-measure.
Experimental results on our development dataset without using the hierarchical structure of Medical Subject Headings.
| LCA-Pa | LCA-Rb | LCA-Fc | ||||
| Elastic-10-0 | 0.4201 | 0.6273 | 0.4858 | 0.2678 | 0.4074 | 0.3104 |
| Elastic-10-1.5 | 0.5737 | 0.7755 | 0.6439 | 0.3749 | 0.5260 | 0.4258 |
| Elastic-10-2.5 | 0.6128 | 0.7598 | 0.6602 | 0.4017 | 0.5151 | 0.4374 |
| Elastic-10-4 | 0.7102 | 0.7125 | 0.6927 | 0.4701 | 0.4812 | 0.4599 |
| Elastic-10-5 | 0.7724 | 0.6755 | 0.7010 | 0.5141 | 0.4515 | 0.4636 |
| Elastic-10-6 | 0.7178 | 0.7074 | 0.6935 | 0.4761 | 0.4773 | 0.4605 |
| Elastic-10-7 | 0.7731 | 0.6746 | 0.7007 | 0.5149 | 0.4508 | 0.4634 |
| Elastic-10-8 | 0.7803 | 0.6684 | 0.6997 | 0.5204 | 0.4456 | 0.4624 |
| Elastic-10-9 | 0.7738 | 0.6740 | 0.7005 | 0.5154 | 0.4505 | 0.4634 |
| Elastic-20-0 | 0.3498 | 0.6548 | 0.4413 | 0.2229 | 0.4274 | 0.2829 |
| Elastic-20-1.5 | 0.4263 | 0.8527 | 0.5559 | 0.2821 | 0.5856 | 0.3723 |
| Elastic-20-2.5 | 0.4859 | 0.8324 | 0.5982 | 0.3191 | 0.5702 | 0.3982 |
| Elastic-20-4 | 0.5631 | 0.8008 | 0.6458 | 0.3678 | 0.5459 | 0.4280 |
| Elastic-20-5 | 0.6433 | 0.7643 | 0.6820 | 0.4230 | 0.5213 | 0.4534 |
| Elastic-20-6 | 0.5822 | 0.7926 | 0.6547 | 0.3811 | 0.5406 | 0.4345 |
| Elastic-20-7 | 0.6479 | 0.7627 | 0.6838 | 0.4265 | 0.5203 | 0.4549 |
| Elastic-20-8 | 0.6713 | 0.7515 | 0.6917 | 0.4434 | 0.5131 | 0.4609 |
| Elastic-20-9 | 0.6518 | 0.7608 | 0.6852 | 0.4296 | 0.5195 | 0.4563 |
| Elastic-30-0 | 0.3141 | 0.6747 | 0.4152 | 0.2023 | 0.4444 | 0.2690 |
| Elastic-30-1.5 | 0.3538 | 0.8876 | 0.4950 | 0.2380 | 0.6146 | 0.3362 |
| Elastic-30-2.5 | 0.4165 | 0.8668 | 0.5492 | 0.2760 | 0.5972 | 0.3686 |
| Elastic-30-4 | 0.4769 | 0.8429 | 0.5956 | 0.3124 | 0.5773 | 0.3959 |
| Elastic-30-5 | 0.5528 | 0.8115 | 0.6428 | 0.3602 | 0.5541 | 0.4254 |
| Elastic-30-6 | 0.5016 | 0.8336 | 0.6113 | 0.3281 | 0.5705 | 0.4059 |
| Elastic-30-7 | 0.5602 | 0.8087 | 0.6466 | 0.3652 | 0.5524 | 0.4281 |
| Elastic-30-8 | 0.5913 | 0.7952 | 0.6623 | 0.3860 | 0.5430 | 0.4388 |
| Elastic-30-9 | 0.5657 | 0.8067 | 0.6494 | 0.3690 | 0.5508 | 0.4300 |
| Elastic-40-0 | 0.2905 | 0.6895 | 0.3962 | 0.1881 | 0.4562 | 0.2581 |
| Elastic-40-1.5 | 0.3086 | 0.9071 | 0.4508 | 0.2112 | 0.6319 | 0.3106 |
| Elastic-40-2.5 | 0.3710 | 0.8862 | 0.5110 | 0.2484 | 0.6135 | 0.3460 |
| Elastic-40-4 | 0.4200 | 0.8675 | 0.5534 | 0.2777 | 0.5979 | 0.3710 |
| Elastic-40-5 | 0.4895 | 0.8416 | 0.6054 | 0.3200 | 0.5770 | 0.4020 |
| Elastic-40-6 | 0.4469 | 0.8591 | 0.5740 | 0.2942 | 0.5909 | 0.3834 |
| Elastic-40-7 | 0.4980 | 0.8383 | 0.6106 | 0.3254 | 0.5752 | 0.4054 |
| Elastic-40-8 | 0.5327 | 0.8242 | 0.6321 | 0.3471 | 0.5639 | 0.4187 |
| Elastic-40-9 | 0.5052 | 0.8359 | 0.6152 | 0.3300 | 0.5733 | 0.4083 |
| Elastic-50-0 | 0.2719 | 0.7006 | 0.3803 | 0.1776 | 0.4651 | 0.2496 |
| Elastic-50-1.5 | 0.2769 | 0.9204 | 0.4168 | 0.1925 | 0.6458 | 0.2911 |
| Elastic-50-2.5 | 0.3379 | 0.9003 | 0.4805 | 0.2287 | 0.6261 | 0.3282 |
| Elastic-50-4 | 0.3791 | 0.8845 | 0.5192 | 0.2527 | 0.6118 | 0.3504 |
| Elastic-50-5 | 0.4420 | 0.8615 | 0.5718 | 0.2904 | 0.5926 | 0.3813 |
| Elastic-50-6 | 0.4065 | 0.8757 | 0.5425 | 0.2694 | 0.6042 | 0.3643 |
| Elastic-50-7 | 0.4513 | 0.8578 | 0.5783 | 0.2961 | 0.5901 | 0.3854 |
| Elastic-50-8 | 0.4876 | 0.8445 | 0.6043 | 0.3184 | 0.5793 | 0.4010 |
| Elastic-50-9 | 0.4594 | 0.8555 | 0.5841 | 0.3012 | 0.5883 | 0.3890 |
aLCA-P: lowest common ancestor Precision.
bLCA-R: lowest common ancestor Recall.
cLCA-F: lowest common ancestor F-measure.
Results on the biomedical semantic indexing and question answering 2016 test datasets (exploiting the Medical Subject Headings hierarchy).
| LCA-Pa | LCA-Rb | LCA-F1c | |||||
| Week1 | 0.705 | 0.619 | 0.635 | 0.470 | 0.389 | 0.406 | |
| Week2 | 0.717 | 0.627 | 0.646 | 0.476 | 0.397 | 0.413 | |
| Week3 | 0.701 | 0.625 | 0.635 | 0.467 | 0.395 | 0.407 | |
| Week4 | 0.725 | 0.613 | 0.643 | 0.486 | 0.385 | 0.410 | |
| Week5 | 0.707 | 0.624 | 0.638 | 0.474 | 0.398 | 0.410 | |
| Week1 | 0.695 | 0.633 | 0.637 | 0.457 | 0.398 | 0.405 | |
| Week2 | 0.713 | 0.637 | 0.649 | 0.467 | 0.410 | 0.412 | |
| Week3 | 0.691 | 0.637 | 0.673 | 0.464 | 0.402 | 0.410 | |
| Week4 | 0.676 | 0.659 | 0.641 | 0.446 | 0.420 | 0.412 | |
| Week5 | 0.686 | 0.660 | 0.648 | 0.448 | 0.414 | 0.409 | |
| Week1 | 0.701 | 0.625 | 0.639 | 0.461 | 0.403 | 0.410 | |
| Week2 | 0.698 | 0.652 | 0.648 | 0.457 | 0.408 | 0.407 | |
| Week3 | 0.694 | 0.641 | 0.641 | 0.447 | 0.406 | 0.405 | |
| Week4 | 0.429 | 0.513 | 0.399 | 0.284 | 0.264 | 0.258 | |
| Week5 | 0.674 | 0.660 | 0.640 | 0.447 | 0.419 | 0.409 | |
aLCA-P: lowest common ancestor Precision.
bLCA-R: lowest common ancestor Recall.
cLCA-F: lowest common ancestor F-measure.
Results on the biomedical semantic indexing and question answering 2016 test datasets (without exploiting the Medical Subject Headings hierarchy).
| LCA-Pa | LCA-Rb | LCA-F1c | |||||
| Week1 | 0.665 | 0.753 | 0.687 | 0.438 | 0.503 | 0.452 | |
| Week2 | 0.674 | 0.767 | 0.700 | 0.441 | 0.513 | 0.460 | |
| Week3 | 0.661 | 0.755 | 0.684 | 0.437 | 0.509 | 0.453 | |
| Week4 | 0.683 | 0.749 | 0.697 | 0.451 | 0.502 | 0.460 | |
| Week5 | 0.667 | 0.757 | 0.690 | 0.438 | 0.509 | 0.455 | |
| Week1 | 0.655 | 0.755 | 0.681 | 0.427 | 0.501 | 0.445 | |
| Week2 | 0.669 | 0.758 | 0.692 | 0.427 | 0.508 | 0.454 | |
| Week3 | 0.653 | 0.757 | 0.681 | 0.433 | 0.509 | 0.452 | |
| Week4 | 0.639 | 0.764 | 0.674 | 0.420 | 0.516 | 0.445 | |
| Week5 | 0.643 | 0.797 | 0.692 | 0.417 | 0.531 | 0.451 | |
| Week1 | 0.666 | 0.746 | 0.684 | 0.437 | 0.512 | 0.456 | |
| Week2 | 0.654 | 0.774 | 0.690 | 0.421 | 0.517 | 0.448 | |
| Week3 | 0.655 | 0.754 | 0.680 | 0.426 | 0.507 | 0.446 | |
| Week4 | 0.390 | 0.475 | 0.410 | 0.254 | 0.311 | 0.268 | |
| Week5 | 0.663 | 0.770 | 0.672 | 0.416 | 0.516 | 0.442 | |
aLCA-P: lowest common ancestor Precision.
bLCA-R: lowest common ancestor Recall.
cLCA-F: lowest common ancestor F-measure.
Baseline results provided by the Medical Text Indexer (MTI) tool. These results were taken from the biomedical semantic indexing and question answering website.
| LCA-Pa | LCA-Rb | LCA-F1c | |||||
| Week1 | 0.558 | 0.516 | 0.493 | 0.498 | 0.462 | 0.463 | |
| Week2 | 0.550 | 0.514 | 0.487 | 0.516 | 0.478 | 0.480 | |
| Week3 | 0.553 | 0.537 | 0.507 | 0.499 | 0.467 | 0.465 | |
| Week4 | 0.568 | 0.505 | 0.482 | 0.507 | 0.455 | 0.464 | |
| Week5 | 0.558 | 0.508 | 0.484 | 0.504 | 0.474 | 0.473 | |
| Week1 | 0.546 | 0.520 | 0.493 | 0.495 | 0.473 | 0.467 | |
| Week2 | 0.544 | 0.520 | 0.492 | 0.497 | 0.471 | 0.469 | |
| Week3 | 0.558 | 0.526 | 0.500 | 0.503 | 0.470 | 0.470 | |
| Week4 | 0.549 | 0.516 | 0.491 | 0.487 | 0.452 | 0.449 | |
| Week5 | 0.532 | 0.551 | 0.519 | 0.480 | 0.487 | 0.467 | |
| Week1 | 0.515 | 0.459 | 0.444 | 0.492 | 0.441 | 0.449 | |
| Week2 | 0.543 | 0.484 | 0.466 | 0.493 | 0.455 | 0.455 | |
| Week3 | 0.580 | 0.502 | 0.486 | 0.512 | 0.457 | 0.466 | |
| Week4 | 0.545 | 0.522 | 0.494 | 0.496 | 0.481 | 0.469 | |
| Week5 | 0.536 | 0.517 | 0.496 | 0.499 | 0.473 | 0.466 | |
aLCA-P: lowest common ancestor Precision.
bLCA-R: lowest common ancestor Recall.
cLCA-F: lowest common ancestor F-measure.
Results of the top systems in biomedical semantic indexing and question answering (BioASQ) task 4a. These scores were taken on December 5 from the BioASQ website.
| Batch | System | Week | Number of annotated articles | Total of articles | Precision | Recall | F1 |
| 1 | MeSHLabeler | 1 | 1853 | 3740 | 0.626 | 0.521 | 0.513 |
| | MeSHLabeler | 2 | 1578 | 2872 | 0.625 | 0.515 | 0.506 |
| | MeSHLabeler | 3 | 1115 | 2599 | 0.602 | 0.519 | 0.515 |
| | MeSHLabeler-1 | 4 | 1436 | 3294 | 0.649 | 0.496 | 0.495 |
| | MTI | 5 | 1181 | 3210 | 0.558 | 0.508 | 0.484 |
| 2 | MTI | 1 | 1080 | 3212 | 0.546 | 0.520 | 0.493 |
| | MeSHLabeler-2 | 2 | 901 | 3213 | 0.630 | 0.505 | 0.499 |
| | MeSHLabeler-2 | 3 | 850 | 2831 | 0.642 | 0.521 | 0.516 |
| | MTI | 4 | 800 | 3111 | 0.549 | 0.516 | 0.491 |
| | MeSHLabeler | 5 | 688 | 2470 | 0.615 | 0.538 | 0.526 |
| 3 | MeSHLabeler | 1 | 305 | 2994 | 0.637 | 0.462 | 0.462 |
| | MeSHLabeler | 2 | 507 | 3044 | 0.6449 | 0.4851 | 0.4825 |
| | MeSHLabeler | 3 | 501 | 3351 | 0.6544 | 0.4991 | 0.4956 |
| | MeSHLabeler | 4 | 514 | 2630 | 0.6312 | 0.5098 | 0.5012 |
| | MeSHLabeler | 5 | 627 | 3130 | 0.5017 | 0.5119 | 0.6135 |
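As an illustration of how flat Precision, Recall, and F1 scores like those in the table above can be obtained from sets of MeSH terms, the sketch below pools true positives, false positives, and false negatives over all documents (micro-averaging). The record does not state which averaging the reported scores use, so micro-averaging is an assumption here, and the gold and predicted label sets are invented for the example.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over parallel lists of label sets."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        g, p = set(gold_labels), set(predicted_labels)
        tp += len(g & p)  # labels both assigned and expected
        fp += len(p - g)  # labels assigned but not expected
        fn += len(g - p)  # labels expected but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: two documents with their gold and predicted MeSH terms.
gold = [{"Humans", "Lymphoma"}, {"Animals"}]
predicted = [{"Humans", "Mice"}, {"Animals"}]
precision, recall, f1 = micro_prf(gold, predicted)
```

With these toy sets there are two correct labels, one spurious label, and one missed label, so all three scores equal 2/3.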