Noha Alnazzawi, Paul Thompson, Riza Batista-Navarro, Sophia Ananiadou.
Abstract
BACKGROUND: Phenotypic information locked away in unstructured narrative text presents significant barriers to information accessibility, both for clinical practitioners and for computerised applications used for clinical research purposes. Text mining (TM) techniques have previously been applied successfully to extract different types of information from text in the biomedical domain. They have the potential to be extended to allow the extraction of information relating to phenotypes from free text.
Year: 2015 PMID: 26099853 PMCID: PMC4474585 DOI: 10.1186/1472-6947-15-S2-S3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Annotated phenotype concept types.
| Entity Type | Description |
|---|---|
| Cause | Any medical problem that contributes to the occurrence of congestive heart failure (CHF) |
| Risk factor | A condition that increases a patient's chance of developing CHF |
| Sign & symptom | Any observable manifestation of a disease which is experienced by a patient and reported to the physician |
| Non-traditional risk factor | Conditions associated with abnormalities in kidney functions that put the patient at higher risk of developing "signs & symptoms" and causes of CHF |
Figure 1. Distribution of phenotypic information types in the corpus. Phenotypic concepts were manually annotated by our domain experts.
Inter-annotator agreement on the PhenoCHF annotations.
| Entity type | Discharge summaries (exact) | Discharge summaries (relaxed) | Articles (exact) | Articles (relaxed) |
|---|---|---|---|---|
| Cause | 0.84 | 0.95 | 0.59 | 0.78 |
| Risk factor | 0.84 | 0.94 | 0.86 | 0.79 |
| Sign & symptom | 0.69 | 0.97 | 0.53 | 0.82 |
| Non-traditional risk factor | 0.77 | 0.83 | 0.81 | 0.72 |
| Macro-average | 0.82 | 0.92 | 0.69 | 0.77 |
F-scores were calculated using both exact and relaxed matching.
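The difference between exact and relaxed matching can be illustrated with a minimal sketch. It assumes that an exact match requires identical span boundaries and entity type, and that a relaxed match requires the same entity type plus any character overlap; the record above does not define the two criteria, so both definitions, the `Span` layout, and the function names are illustrative assumptions rather than the authors' procedure.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset exclusive, entity type).
Span = Tuple[int, int, str]


def spans_match(a: Span, b: Span, relaxed: bool) -> bool:
    """Exact: same boundaries and type. Relaxed (assumed): same type, any overlap."""
    if a[2] != b[2]:
        return False
    if relaxed:
        return a[0] < b[1] and b[0] < a[1]
    return a[0] == b[0] and a[1] == b[1]


def agreement_f_score(ann_a: List[Span], ann_b: List[Span], relaxed: bool = False) -> float:
    """F-score between two annotators, treating annotator B as the reference."""
    if not ann_a or not ann_b:
        return 0.0
    # Precision: share of A's spans that have a counterpart in B.
    matched_a = sum(any(spans_match(a, b, relaxed) for b in ann_b) for a in ann_a)
    # Recall: share of B's spans that have a counterpart in A.
    matched_b = sum(any(spans_match(b, a, relaxed) for a in ann_a) for b in ann_b)
    precision = matched_a / len(ann_a)
    recall = matched_b / len(ann_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations: the second span differs only in its left boundary.
annotator_a = [(0, 13, "Cause"), (20, 28, "Sign & symptom")]
annotator_b = [(0, 13, "Cause"), (17, 28, "Sign & symptom")]

exact_f = agreement_f_score(annotator_a, annotator_b, relaxed=False)
relaxed_f = agreement_f_score(annotator_a, annotator_b, relaxed=True)
```

In this toy case the boundary disagreement counts as an error under exact matching but not under relaxed matching, so the relaxed score is the higher of the two.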
Comparison of different methods developed and evaluated on the PhenoCHF training and test sets, respectively.
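| Method | Discharge summaries (exact) P | Discharge summaries (exact) R | Discharge summaries (exact) F | Discharge summaries (relaxed) P | Discharge summaries (relaxed) R | Discharge summaries (relaxed) F | Articles (exact) P | Articles (exact) R | Articles (exact) F | Articles (relaxed) P | Articles (relaxed) R | Articles (relaxed) F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|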
| Dictionary (MetaMap) | 0.22 | 0.29 | 0.25 | 0.39 | 0.51 | 0.44 | 0.42 | 0.25 | 0.30 | 0.67 | 0.33 | 0.44 |
| Rules | 0.88 | 0.86 | 0.92 | 0.93 | 0.83 | 0.88 | 0.88 | 0.90 | ||||
| MEMMs | 0.67 | 0.33 | 0.52 | 0.87 | 0.60 | 0.54 | 0.18 | 0.55 | 0.24 | 0.20 | 0.56 | 0.28 |
| HMMs | 0.90 | 0.63 | 0.74 | 0.90 | 0.65 | 0.76 | 0.30 | 0.55 | 0.39 | 0.32 | 0.58 | 0.41 |
| CRFs | 0.88 | 0.77 | 0.82 | 0.90 | 0.86 | 0.88 | 0.48 | 0.62 | 0.54 | 0.53 | 0.69 | 0.60 |
For MEMMs, HMMs and CRFs, only the results from the model with the best performing combination of features are presented (P=precision, R=recall, F=F-score).
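As a quick consistency check on the table, the F-score can be recomputed from precision and recall. The sketch below assumes the balanced F-score (the harmonic mean of P and R); the record does not state which F-measure variant was used, and the function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall (assumed variant)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the CRFs row of the table above.
crf_pairs = [(0.88, 0.77), (0.90, 0.86), (0.48, 0.62), (0.53, 0.69)]
crf_f_scores = [round(f_score(p, r), 2) for p, r in crf_pairs]
```

Under this assumption the recomputed values agree with the F column reported for CRFs (0.82, 0.88, 0.54 and 0.60).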
Results of CRF model training and evaluation.
| Evaluation data | Training Data | P | R | F |
|---|---|---|---|---|
| PhenoCHF Articles | Discharge summaries | 0.79 | 0.47 | 0.58 |
| PhenoCHF Discharge summaries | Articles | 0.56 | 0.29 | 0.38 |
| PhenoCHF (full), 5-fold cross validation | | 0.89 | 0.83 | 0.86 |
Experiments were performed using different document types (P=precision, R=recall, F=F-score).
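For readers unfamiliar with the 5-fold cross validation setting in the last row, the following is a minimal sketch of how documents could be partitioned. It assumes document-level folds with a seeded shuffle; the record does not describe how the folds were built, and `k_fold_splits`, `n_docs`, and the document count used below are hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_docs: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs over document indices."""
    indices = list(range(n_docs))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        splits.append((train, test))
    return splits


# Hypothetical corpus size, chosen only for illustration.
splits = k_fold_splits(n_docs=23, k=5)
```

Each document is held out exactly once, a model is trained on the remaining four folds, and the per-fold scores are then combined into the reported figures.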
Results from the application of PhenoCHF models on ShARe/CLEF.
| Model | Training set P | Training set R | Training set F | Test set P | Test set R | Test set F |
|---|---|---|---|---|---|---|
| Record model | 0.25 | 0.49 | 0.33 | 0.06 | 0.18 | 0.09 |
| Article model | 0.29 | 0.22 | 0.25 | 0.06 | 0.07 | 0.06 |
| PhenoCHF model | 0.25 | 0.53 | 0.34 | 0.07 | 0.18 | 0.10 |
Experiments were performed using our various CRF models (P=precision, R=recall, F=F-score).
Figure 2. Distribution of phenotypic information in the ShARe/CLEF corpus. Phenotypic concepts were automatically recognised by our model and then validated by our domain experts.