| Literature DB >> 23645553 |
Diana Lynn MacLean, Jeffrey Heer.
Keywords: crowdsourcing; medical term extraction; online health forums; text mining
Year: 2013 PMID: 23645553 PMCID: PMC3822103 DOI: 10.1136/amiajnl-2012-001110
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1 Patient-authored text (PAT) medical word identification task instructions and interface. Access the article online to view this figure in colour.
Figure 2 An illustration of our corroborative, word-level voting policy. Stopwords (like ‘of’) are excluded from the vote.
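The voting policy of Figure 2 can be sketched as a word-level tally across workers. This is a minimal illustration, not the paper's implementation: the stopword list and the per-worker annotation format (a set of highlighted words per worker) are assumptions made here for clarity.

```python
from collections import Counter

# Illustrative stopword list; the actual list used in the paper differs.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}

def corroborative_vote(annotations, threshold=2):
    """Label a word as medical when at least `threshold` workers
    highlighted it; stopwords are excluded from the vote."""
    votes = Counter()
    for highlighted in annotations:          # one set of words per worker
        for word in highlighted:
            if word.lower() not in STOPWORDS:
                votes[word.lower()] += 1
    return {word for word, n in votes.items() if n >= threshold}

# Three workers annotate the phrase "history of diabetes":
workers = [{"history", "of", "diabetes"}, {"diabetes"}, {"diabetes", "history"}]
print(corroborative_vote(workers, threshold=2))
```

With a threshold of 2, ‘diabetes’ (three votes) and ‘history’ (two votes) survive, while the stopword ‘of’ never enters the tally.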
Turker performance against the Nurse gold standard along Turker voting thresholds
| Turker vote threshold | F1 | Precision | Recall | Accuracy | MCC |
|---|---|---|---|---|---|
| 1 | 78.45 | 67.15 | 93.96 | | 0.77 |
| 2 | 82.53 | 86.41 | | 96.29 | |
| 3 | 83.80 | 91.67 | 77.18 | | |
| 4 | 76.61 | 95.70 | 63.87 | 95.46 | 0.76 |
| 5 | 59.81 | | 43.04 | 93.26 | 0.62 |
A corroborative vote of 2 or more yields high scores across the board and maximizes the F1 score.
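The scores in the table follow from standard word-level confusion counts. A minimal sketch of the arithmetic (the counts passed in are hypothetical, not the study's data):

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """F1, precision, recall, accuracy (as percentages) and MCC
    from word-level true/false positive/negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"F1": 100 * f1, "Precision": 100 * precision,
            "Recall": 100 * recall, "Accuracy": 100 * accuracy, "MCC": mcc}

print(classification_metrics(tp=90, fp=10, fn=10, tn=890))
```

MCC is reported alongside accuracy because medical words are a minority class in PAT: a classifier that rejects everything scores high accuracy but an MCC near zero.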
Annotator performance against the crowd-labeled dataset and the gold standards
| Validation dataset | Annotator | F1 | Precision | Recall | Accuracy | MCC | Parameters |
|---|---|---|---|---|---|---|---|
| MedHelp, Crowd-labeled | MetaMap | 32.64 | 21.88 | 64.20 | 70.44 | 0.24 | Default |
| | | 34.97 | 25.45 | 55.85 | 76.83 | 0.26 | SNOMED CT |
| | | 34.88 | 24.48 | 60.63 | 74.75 | 0.26 | CHV |
| | OBA | 43.77 | 30.20 | 79.53 | 77.21 | 0.39 | Default |
| | | 43.23 | 36.15 | 53.76 | 84.25 | 0.35 | SNOMED CT |
| | Dictionary | 46.18 | 32.34 | 79.02 | | 0.42 | |
| | ADEPT | 74.59 | | | | | |
| MedHelp, Gold Standard | MetaMap | 37.73 | 28.03 | 57.67 | 77.82 | 0.29 | SNOMED CT |
| | OBA | 45.78 | 32.10 | 78.04 | | 0.41 | SNOMED CT |
| | TerMine | 42.35 | 52.67 | 35.41 | 88.77 | 0.37 | |
| | Dictionary | 37.30 | 26.34 | 63.89 | 74.98 | 0.29 | |
| | ADEPT | 74.53 | | | | | |
| CureTogether, Gold Standard | MetaMap | 39.12 | 29.33 | 58.57 | 74.13 | 0.27 | SNOMED CT |
| | OBA | 47.28 | 33.56 | 74.74 | | 0.40 | SNOMED CT |
| | TerMine | 43.09 | 53.11 | 36.25 | 86.43 | 0.37 | |
| | Dictionary | 38.74 | 27.53 | 65.35 | 70.65 | 0.27 | |
| | ADEPT | 76.69 | | | | | |
Figure 3 A comparison of terms identified as medically relevant (shown in black) by different models in five sample sentences. OBA and MetaMap are run using the SNOMED CT ontology.
Figure 4 Term classification accuracy plotted against logged term frequency in test corpora. Purple (darker) circles represent terms that are always classified correctly; blue (lighter) circles represent terms that are misclassified at least once. A LOWESS fit line to the entire dataset (black) shows that most terms are always classified correctly. A LOWESS fit line to the misclassified points (blue, or lighter) shows that classification accuracy increases with term frequency. Access the article online to view this figure in colour.
Figure 5 Top 50 terms, ranked by frequency, for MedHelp's Arthritis forum as determined by ADEPT (left) and OBA (right). Terms unique to their respective portion of the list are shown in black. Terms occurring in both lists are linked with a line. The gradient of these lines shows that all co-occurring terms, bar three, are ranked more highly by ADEPT.
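The ranked lists compared in Figure 5 can be produced by counting, over a forum corpus, the words that a classifier marks as medical. A small sketch under stated assumptions: the tokenizer, the dictionary-lookup classifier, and the sample posts below are all hypothetical stand-ins for the actual ADEPT and OBA pipelines.

```python
import re
from collections import Counter

def top_terms(posts, classify, k=50):
    """Rank the terms a word-level classifier marks as medical
    by their frequency in the corpus, keeping the top k."""
    counts = Counter()
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            if classify(word):
                counts[word] += 1
    return [term for term, _ in counts.most_common(k)]

# Toy corpus with a hypothetical dictionary-lookup classifier:
lexicon = {"arthritis", "pain", "ibuprofen"}
posts = ["Arthritis pain again today", "ibuprofen helps the pain a little"]
print(top_terms(posts, lexicon.__contains__, k=3))  # ['pain', 'arthritis', 'ibuprofen']
```

`Counter.most_common` breaks frequency ties in first-encountered order, so ‘arthritis’ precedes ‘ibuprofen’ here; any consistent tie-break would do for a ranking comparison like Figure 5.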
Examples of terms that occur more than once and are misclassified more than 50% of the time
| Category | Example terms |
|---|---|
| Frequently misclassified | baby, bc, condition, doctor, doctors, drs, health, ice, natural, relief, short, strain, weight |
| Mostly false positive | accident, decreased, drinks, drunk, exertion, external, healthy, heavy, higher, lie, lying, milk, million, pants, periods, prevention, solution, suicidal… |
| Mostly false negative | appointment, clear, copd, hiccups, lack, ldn, massage, maxalt, missed, nurse, physician, pubic, rebound, silver, sleeping, smell, tea, treat, tree, tx… |
| Infrequently misclassified | cravings, generic, growing, hereditary, increasing, lab, limit, lunch, panel, pituitary, position, possibilities, precursor, taste, version, waves, weakness… |