Antonio Jimeno-Yepes, Bridget T McInnes, Alan R Aronson.
Abstract
BACKGROUND: The effectiveness of knowledge-based word sense disambiguation (WSD) approaches depends in part on the information available in the reference knowledge resource. Off the shelf, these resources are not optimized for WSD and might lack terms to model the context properly. In addition, they might include noisy terms which contribute to false positives in the disambiguation results.
Year: 2011 PMID: 21658291 PMCID: PMC3111590 DOI: 10.1186/1471-2105-12-S3-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Process diagram
Left side collocation examples
| Adjustment | Determination | Repair |
|---|---|---|
| psychosocial | quantitative | dna |
| psychological | spectrophotometric | excision |
| social | photometric | mismatch |
| marital | potentiometric | surgical |
| occlusal | accurate | hernia |
Collocation examples based on co-occurrences
| Adjustment | Determination | Repair |
|---|---|---|
| age | chromatography | damage |
| study | liquid | injury |
| results | standard | defect |
| women | chromatographic | strand |
| data | quantitative | excision |
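Co-occurrence-based collocations of this kind could be collected along the following lines. This is a minimal sketch, not the authors' procedure: it assumes whitespace tokenization and a fixed token window, and the function name `cooccurring_terms`, the `window` parameter, and the toy sentences are all hypothetical.

```python
from collections import Counter


def cooccurring_terms(sentences, target, window=3, top_n=5):
    """Count tokens that appear within `window` positions of `target`.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts.most_common(top_n)


# Toy sentences invented for illustration; they are not taken from the paper's corpus.
toy_sentences = [
    "dna damage triggers excision repair",
    "surgical repair of the hernia defect",
    "mismatch repair corrects dna damage",
]
print(cooccurring_terms(toy_sentences, "repair"))
```

A raw window count like this also picks up function words such as "of" and "the", which is consistent with the generic terms ("study", "results", "data") in the co-occurrence table and suggests why a filtering step is useful.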
Collocation examples filtered using the Stanford parser
| Adjustment | Determination | Repair |
|---|---|---|
| measures | assay | damage |
| illness | procedure | injury |
| parents | paper | dna damage |
| social support | techniques | |
| recurrence |
Figure 2. Collocation assignment diagram
Example top terms for profile vectors for semantic types
| Type: T046 | Type: T047 | Type: T116 | Type: T126 |
|---|---|---|---|
| patients | patients | activity | activity |
| management | case | delta | ec |
| case | hypoxic | rat | delta |
| cases | raeb | human | liver |
| diagnosis | management | liver | human |
| acute | diagnosis | ec | rat |
| treatment | treatment | deficiency | mitochondrial |
| spontaneous | allergic | mitochondrial | activities |
| massive | patient | alpha | enzyme |
| chronic | cases | enzyme | inhibition |
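One plausible way to obtain such top terms is to accumulate term frequencies over the text associated with each semantic type, then compare a new context against each profile. The sketch below assumes raw term counts and cosine similarity; the helper names (`build_profile`, `cosine`) and the toy texts are hypothetical, and the weighting actually used for the profile vectors is not stated here.

```python
import math
from collections import Counter


def build_profile(texts):
    """Build a term-frequency profile vector from a list of texts."""
    profile = Counter()
    for text in texts:
        profile.update(text.lower().split())
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy texts invented for illustration, keyed by semantic type identifier.
toy_citations = {
    "T047": ["patients with chronic disease", "diagnosis and treatment of patients"],
    "T126": ["enzyme activity in rat liver", "inhibition of enzyme activity"],
}
profiles = {stype: build_profile(texts) for stype, texts in toy_citations.items()}

# Assign a new context to the semantic type with the most similar profile.
context = build_profile(["treatment of chronic patients"])
best_type = max(profiles, key=lambda stype: cosine(context, profiles[stype]))
print(profiles["T126"].most_common(3), best_type)
```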
Example top terms for profile vectors for semantic groups
| Grp: DISO | Grp: CHEM | Grp: CONC | Grp: ANAT |
|---|---|---|---|
| patients | human | health | human |
| case | activity | patients | rat |
| treatment | acid | based | cells |
| cases | effects | study | function |
| diagnosis | effect | children | anatomy |
| management | rat | inter | normal |
| children | alpha | care | patients |
| congenital | synthesis | medical | case |
| patient | mg | data | left |
| syndrome | treatment | evaluation | neurons |
NLM WSD results comparing the baselines and the proposed methods
| Method | AEC | MRD |
|---|---|---|
| Initial | 0.7007 | 0.6362 |
| LSC | 0.7226† | |
| Coll | 0.7163 | 0.6365 |
| CollParser | 0.6406 | |
| 2-MRD | - | 0.7158‡ |
| 2-MRDFilter | 0.6295 | 0.6825‡ |
| MFS | 0.8550 | 0.8550 |
| NB | 0.8830 | 0.8830 |
Accuracy of the different methods on the NLM WSD set. The Initial system is the knowledge-based method under evaluation combined with the off-the-shelf UMLS.
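The accuracy figures and the MFS row can be read as in the following sketch, which assumes that accuracy is the fraction of instances assigned the correct sense and that MFS denotes the most-frequent-sense baseline (always predicting the commonest sense of an ambiguous term). The function names and the toy sense labels are hypothetical.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of instances whose predicted sense matches the gold sense."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def most_frequent_sense(gold):
    """Return the sense that occurs most often among the gold labels."""
    return Counter(gold).most_common(1)[0][0]


# Toy gold labels for one ambiguous term; invented for illustration.
toy_gold = ["M1", "M1", "M2", "M1"]
mfs = most_frequent_sense(toy_gold)
mfs_predictions = [mfs] * len(toy_gold)
print(accuracy(toy_gold, mfs_predictions))
```

Under this reading, an MFS baseline scores high on a data set whose sense distribution is skewed and close to chance on a balanced one, which is one way to interpret the gap between the MFS rows of the two result tables.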
MSH WSD results comparing the baselines and the proposed methods
| Method | AEC | MRD |
|---|---|---|
| Initial | 0.8383 | 0.8070 |
| LSC | 0.8082 | |
| Coll | 0.8407 | |
| CollParser | 0.8409 | 0.8098· |
| 2-MRD | - | 0.8069 |
| 2-MRDFilter | 0.8313 | 0.8072 |
| MFS | 0.5448 | 0.5448 |
| NB | 0.9386 | 0.9386 |
Accuracy of the different methods on the MSH WSD set. The Initial system is the knowledge-based method under evaluation combined with the off-the-shelf UMLS.
NLM WSD results at different k-NN threshold levels
| | | AEC | | | | MRD | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | 66 | 75 | 85 | 95 | 66 | 75 | 85 | 95 |
| LSC | 0.7226 | 0.7220 | 0.7201 | 0.7082 | 0.6360 | 0.6360 | |||
| SG | Coll | 0.7163 | 0.7038 | 0.7102 | 0.7055 | 0.6365 | 0.6365 | 0.6363 | 0.6363 |
| CollParser | 0.7120 | 0.7198 | 0.7055 | 0.6362 | 0.6364 | 0.6362 | 0.6356 | ||
| LSC | 0.7052 | 0.7050 | 0.7110 | 0.7053 | 0.6348 | 0.6348 | 0.6344 | 0.6352 | |
| kAEC | Coll | 0.7027 | 0.6992 | 0.7004 | 0.6358 | 0.6359 | 0.6347 | 0.6347 | |
| CollParser | 0.7118 | 0.7023 | 0.7079 | 0.6969 | 0.6372 | 0.6356 | 0.6357 | ||
Disambiguation results in terms of accuracy using the NLM WSD set. Several k-NN values are used in combination with the semantic group (SG) and the automatic extracted corpus (kAEC) methods. The disambiguation methods AEC and MRD are compared.
MSH WSD results at different k-NN threshold levels
| | | AEC | | | | MRD | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | 66 | 75 | 85 | 95 | 66 | 75 | 85 | 95 |
| LSC | 0.8370 | 0.8371 | 0.8371 | 0.8071 | 0.8070 | 0.8071 | 0.8071 | ||
| SG | Coll | 0.8173 | 0.8214 | 0.8268 | 0.8327 | 0.8077 | 0.8073 | 0.8071 | |
| CollParser | 0.8284 | 0.8271 | 0.8337 | 0.8355 | 0.8076 | 0.8071 | 0.8071 | 0.8071 | |
| LSC | 0.8391 | 0.8413 | 0.8400 | 0.8072 | 0.8072 | 0.8072 | 0.8071 | ||
| kAEC | Coll | 0.8252 | 0.8331 | 0.8385 | 0.8407 | 0.8100 | 0.8092 | ||
| CollParser | 0.8298 | 0.8337 | 0.8396 | 0.8409 | 0.8098 | 0.8093 | 0.8090 | 0.8090 | |
Disambiguation results in terms of accuracy using the MSH WSD set. Several k-NN values are used in combination with the semantic group (SG) and the automatic extracted corpus (kAEC) methods. The disambiguation methods AEC and MRD are compared.