Aron Henriksson, Hans Moen, Maria Skeppstedt, Vidas Daudaravičius, Martin Duneld.
Abstract
BACKGROUND: Terminologies that account for variation in language use, by linking synonyms and abbreviations to their corresponding concepts, are important enablers of high-quality information extraction from medical texts. The medical domain uses specialized sub-languages, so manually constructing semantic resources that accurately reflect language use is both costly and challenging, and often results in low coverage. Models of distributional semantics applied to large corpora offer a potential means of supporting the development of such resources, but their ability to isolate synonymy from other semantic relations is limited. Moreover, their application in the clinical domain has only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs.
Year: 2014 PMID: 24499679 PMCID: PMC3937097 DOI: 10.1186/2041-1480-5-6
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Ensembles of semantic spaces for synonym extraction and abbreviation expansion. Semantic spaces built with different model parameters are induced from different corpora. The outputs of the semantic spaces are combined in order to obtain better results than those of a single semantic space used in isolation.
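The following is a minimal sketch of the ensemble idea in Figure 1. It assumes that each semantic space is a plain dictionary from term to vector and that candidate lists are merged by summing cosine similarities (a SUM strategy). The names `nearest` and `ensemble_candidates` and the toy vectors are hypothetical. This is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def nearest(space, query, n=10):
    """Top-n neighbours of `query` in one semantic space (dict term -> vector),
    returned as a dict term -> cosine similarity; the query itself is skipped."""
    sims = {t: cosine(space[query], v) for t, v in space.items() if t != query}
    return {t: sims[t] for t in sorted(sims, key=sims.get, reverse=True)[:n]}

def ensemble_candidates(spaces, query, k=10, n=10):
    """SUM-combine the top-n lists from several spaces: a candidate's score is
    the sum of its similarities in the lists where it appears."""
    scores = {}
    for space in spaces:
        if query in space:
            for term, sim in nearest(space, query, n).items():
                scores[term] = scores.get(term, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

a = {"q": [1, 0], "x": [1, 0.1], "y": [0, 1], "z": [1, 1]}
b = {"q": [1, 0], "x": [0, 1], "y": [1, 0.05], "z": [1, 1]}
print(ensemble_candidates([a, b], "q", k=3))
```

In the toy data, space `a` alone ranks `x` first and space `b` alone ranks `y` first. The ensemble instead promotes `z`, the candidate that both spaces rate as fairly similar.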
Corpora statistics
| Corpus | With stop words | Without stop words | Size |
| Clinical | ∼42.5M tokens (∼0.4M types) | ∼22.5M tokens (∼0.4M types) | 268,727 documents |
| Medical | ∼20.3M tokens (∼0.3M types) | ∼12.1M tokens (∼0.3M types) | 1,153,824 sentences |
The number of tokens and unique terms (word types) in the medical and clinical corpora, with and without stop words.
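Token and type counts like those in the table can be computed on any tokenized corpus along the lines below. The lower-casing and the two-word stop-word list are assumptions made for illustration. The toy Swedish sentence means roughly "the patient has fever and the patient has a cough".

```python
def corpus_stats(tokens, stop_words):
    """Token and type (unique term) counts with and without stop words.
    Tokens are lower-cased first; `stop_words` is a hypothetical list."""
    tokens = [t.lower() for t in tokens]
    kept = [t for t in tokens if t not in stop_words]
    return {
        "with_stop_words": (len(tokens), len(set(tokens))),
        "without_stop_words": (len(kept), len(set(kept))),
    }

stats = corpus_stats("Patienten har feber och patienten har hosta".split(), {"har", "och"})
print(stats)
```

The table shows the same pattern. Dropping stop words removes many tokens but few types, because a handful of stop-word types account for a large share of the running text.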
Overview of experiments conducted with a single semantic space
For each of the …
| RI_20 | RI_2 | RI_4 | RI_8 |
| | RP_2, RP_2_sw | RP_4, RP_4_sw | RP_8, RP_8_sw |
The induced semantic spaces were combined in …
| Identical window size | RI_2, RP_2 | RI_4, RP_4 | RI_8, RP_8 |
| Identical window size, stop words | RI_2, RP_2_sw | RI_4, RP_4_sw | RI_8, RP_8_sw |
| Large window size | RI_20, RP_2 | RI_20, RP_4 | |
| Large window size, stop words | RI_20, RP_2_sw | RI_20, RP_4_sw | |
For each combination, …
For each of the two corpora and the conjoint corpus, 30 different combinations were evaluated. Configurations are named according to the pattern model_windowSize. For RP, sw means that stop words are retained in the semantic space. For instance, model_20 denotes a window of 10 tokens on each side of the target word (10+10).
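The naming convention and the two model families can be sketched as follows. RI and RP are read here as Random Indexing and Random Permutation. Several details are illustrative assumptions:

- the dimensionality, number of non-zero elements and seed;
- the use of vector rotation to encode word order;
- stop-word removal happening before `build_space` is called.

```python
import random
import re

def parse_config(name):
    """Split a configuration name such as 'RI_20' or 'RP_4_sw' into
    (model, tokens_left, tokens_right, stop_words_retained).
    A window size of 20 is read as 10 tokens on each side (10+10)."""
    match = re.fullmatch(r"(RI|RP)_(\d+)(_sw)?", name)
    if match is None:
        raise ValueError(f"unrecognised configuration: {name}")
    half = int(match.group(2)) // 2
    return match.group(1), half, half, match.group(3) is not None

def index_vector(rng, dim, nonzero):
    """Sparse ternary index vector: `nonzero` random entries set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((1, -1))
    return vec

def rotate(vec, shift):
    """Cyclically shift a vector right by `shift` positions (negative = left)."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)

def build_space(tokens, config, dim=100, nonzero=4, seed=0):
    """Induce a context vector for every token type.
    RI: add each neighbour's index vector unchanged.
    RP: rotate each neighbour's index vector by its offset from the target,
    so that word order is encoded (assumed permutation scheme)."""
    model, left, right, _ = parse_config(config)  # stop words assumed handled upstream
    rng = random.Random(seed)
    index, context = {}, {}
    for tok in tokens:
        if tok not in index:
            index[tok] = index_vector(rng, dim, nonzero)
            context[tok] = [0] * dim
    for i, target in enumerate(tokens):
        for j in range(max(0, i - left), min(len(tokens), i + right + 1)):
            if j == i:
                continue
            vec = index[tokens[j]]
            if model == "RP":
                vec = rotate(vec, j - i)
            for d, x in enumerate(vec):
                context[target][d] += x
    return index, context

idx, ctx = build_space("a b c a b c".split(), "RI_2")
```

With RI_2, a window of one token on each side, the context vector of `a` is the sum of the index vectors of its neighbours. Here those are `b` twice and `c` once.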
Figure 2. Distribution of candidate terms for the clinical corpus. The distribution (cosine similarity and rank) of candidates for synonyms for the best combination of semantic spaces induced from the clinical corpus. The results show the distribution for query terms in the development reference standard.
Figure 3. Distribution of candidate terms for the medical corpus. The distribution (cosine similarity and rank) of candidates for synonyms for the best combination of semantic spaces induced from the medical corpus. The results show the distribution for query terms in the development reference standard.
Figure 4. Distribution of candidate terms for clinical + medical corpora. The distribution (combined cosine similarity and rank) of candidates for synonyms for the ensemble of semantic spaces induced from medical and clinical corpora. The results show the distribution for query terms in the development reference standard.
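Figures 2 to 4 plot where reference-standard terms fall among the candidates, by rank and cosine similarity. The sketch below collects such points. It assumes that each query's candidates are available as a ranked list of (term, similarity) pairs. `correct_candidate_positions` is a hypothetical helper.

```python
def correct_candidate_positions(ranked_by_query, gold):
    """(rank, similarity) of each reference-standard term found among a query's
    candidates, i.e. the kind of points plotted in a rank/cosine distribution."""
    points = []
    for query, ranked in ranked_by_query.items():
        for rank, (term, sim) in enumerate(ranked, start=1):
            if term in gold.get(query, ()):
                points.append((rank, sim))
    return points

ranked = {"q": [("a", 0.9), ("b", 0.5)]}
print(correct_candidate_positions(ranked, {"q": {"b"}}))
```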
Reference standards statistics
| Reference standard | Size | 2 cor | 3 cor | Size | 2 cor | 3 cor | Size | 2 cor | 3 cor |
| Abbr → Exp (Devel) | 117 | 9.4% | 0.0% | 55 | 13% | 1.8% | 42 | 14% | 0% |
| Abbr → Exp (Eval) | 98 | 3.1% | 0.0% | 55 | 11% | 0% | 35 | 2.9% | 0% |
| Exp → Abbr (Devel) | 110 | 8.2% | 1.8% | 63 | 4.7% | 0% | 45 | 6.7% | 0% |
| Exp → Abbr (Eval) | 98 | 7.1% | 0.0% | 61 | 0% | 0% | 36 | 0% | 0% |
| Syn (Devel) | 334 | 9.0% | 1.2% | 266 | 11% | 3.0% | 122 | 4.9% | 0% |
| Syn (Eval) | 340 | 14% | 2.4% | 263 | 13% | 3.8% | 135 | 11% | 0% |
Size shows the number of queries, 2 cor shows the proportion of queries with two correct answers and 3 cor the proportion of queries with three (or more) correct answers. The remaining queries have one correct answer.
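Some queries have two or three correct answers, so recall has to be computed per query. The sketch below assumes that recall is the share of a query's reference-standard answers found among its top ten suggestions, averaged over queries. The excerpt does not spell out this averaging, and the data are hypothetical.

```python
def recall_at_k(suggestions, gold, k=10):
    """Macro-averaged recall: for each query, the share of its reference-standard
    answers among its top-k suggestions, averaged over all queries in `gold`."""
    total = 0.0
    for query, answers in gold.items():
        top = set(suggestions.get(query, [])[:k])
        total += len(top & set(answers)) / len(answers)
    return total / len(gold)

gold = {"q1": ["a"], "q2": ["b", "c"]}
sugg = {"q1": ["x", "a"], "q2": ["b", "y"]}
print(recall_at_k(sugg, gold))
```

Query `q2` has two correct answers but only one is suggested, so it contributes 0.5 rather than 1.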
Results on clinical development set
| RI_8 | RP_8_sw | 0.38 | RI_8 | RP_8 | 0.30 | RI_8 | RP_8 | 0.39 | |
| | | | | RI_4 | RP_4_sw | 0.30 | RI_8 | RP_8 | 0.38 |
| RI_20 | RP_4_sw | 0.35 | RI_20 | RP_4_sw | | RI_8 | RP_8_sw | | |
| | | | | | | | RI_20 | RP_2_sw | |
| RI_4 | RP_4_sw | RI_4 | RP_4_sw | RI_20 | RP_4_sw | ||||
| RI_8 | RP_8_sw | ||||||||
Results (recall, top ten) of the best configurations for each model and model combination on the three tasks. The configurations are described according to the following pattern: model_windowSize. For RP, sw means that stop words are retained in the model.
Results on medical development set
| RI_4 | RP_4_sw | 0.08 | RI_2 | RP_2 | 0.03 | RI_20 | RP_4_sw | 0.26 | |
| RI_20 | RP_2 | RI_4 | RP_4 | ||||||
| RI_20 | RP_4_sw | RI_4 | RP_4_sw | ||||||
| | | RI_8 | RP_8 | ||||||
| | | RI_20 | RP_2 | ||||||
| | | RI_20 | RP_2_sw | ||||||
| | | RI_20 | RP_4 | ||||||
| | | RI_20 | RP_4_sw | ||||||
| RI_2 | RP_2_sw | 0.08 | RI_2 | RP_2 | 0.03 | RI_8 | RP_8_sw | 0.24 | |
| RI_4 | RP_4 | RI_2 | RP_2_sw | ||||||
| RI_4 | RP_4_sw | RI_4 | RP_4 | ||||||
| RI_8 | RP_8 | RI_4 | RP_4_sw | ||||||
| RI_8 | RP_8_sw | RI_8 | RP_8 | ||||||
| RI_20 | RP_2_sw | RI_8 | RP_8_sw | ||||||
| RI_20 | RP_4 | RI_20 | RP_2 | ||||||
| RI_20 | RP_4_sw | RI_20 | RP_2_sw | ||||||
| | | RI_20 | RP_4 | ||||||
| | | RI_20 | RP_4_sw | ||||||
| RI_4 | RP_4_sw | RI_8 | RP_8_sw | RI_20 | RP_2_sw | ||||
| RI_20 | RP_4_sw | ||||||||
Results (recall, top ten) of the best configurations for each model and model combination on the three tasks. The configurations are described according to the following pattern: model_windowSize. For RP, sw means that stop words are retained in the model.
Conjoined corpus space results on clinical + medical development set
| RI_4 | RP_4_sw | RI_4 | RP_4_sw | RI_8 | RP_8_sw | 0.41 | |||
| RI_20 | RP_4_sw | ||||||||
| RI_4 | RP_4 | 0.23 | RI_4 | RP_4_sw | 0.13 | RI_8 | RP_8 | 0.36 | |
| RI_4 | RP_4_sw | RI_8 | RP_8_sw | RI_8 | RP_8_sw | ||||
| RI_8 | RP_8 | RI_20 | RP_2_sw | RI_20 | RP_2_sw | ||||
| RI_20 | RP_2 | RI_20 | RP_4_sw | RI_20 | RP_4_sw | ||||
| RI_20 | RP_4 | | | | | ||||
| RI_2 | RP_2_sw | 0.25 | RI_4 | RP_4_sw | RI_8 | RP_8_sw | |||
| RI_8 | RP_8_sw | ||||||||
| RI_20 | RP_4_sw | ||||||||
Results (recall, top ten) of the best configurations for each model and model combination on the three tasks. The configurations are described according to the following pattern: model_windowSize. For RP, sw means that stop words are retained in the model.
Disjoint corpus ensemble results on clinical + medical development set
| AVG | 0.13 | 0.09 | 0.39 | ||||
| AVG | 0.24 | 0.11 | 0.39 | ||||
| SUM | 0.13 | 0.09 | 0.34 | ||||
| SUM | |||||||
| AVG →AVG | | 0.15 | 0.09 | 0.41 | |||
| SUM →SUM | | 0.13 | 0.07 | 0.40 | |||
| AVG →SUM | | 0.15 | 0.09 | 0.41 | |||
| SUM →AVG | 0.13 | 0.07 | 0.40 | ||||
Results (P = weighted precision, R = recall, top ten) of the best models with and without post-processing on the three tasks. A dynamic number of suggestions allows the model to suggest fewer than ten terms in order to improve precision. The results are based on applying the model combinations to the development data.
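The row labels in the disjoint corpus ensemble table (AVG, SUM, AVG →SUM and so on) name combination strategies. The sketch below makes two assumptions. First, in a two-step label, the first strategy merges the spaces within each corpus and the second merges the per-corpus results. Second, AVG averages a term's similarities over the lists in which it appears. Averaging over all lists, with absences counted as 0, would rank terms exactly like SUM. The function names and scores are hypothetical.

```python
from statistics import mean

def combine(lists, strategy):
    """Merge candidate lists (dicts term -> cosine similarity) with SUM or AVG.
    AVG is taken over the lists in which a term actually occurs (assumption)."""
    if strategy not in ("SUM", "AVG"):
        raise ValueError(f"unknown strategy: {strategy}")
    pooled = {}
    for scores in lists:
        for term, sim in scores.items():
            pooled.setdefault(term, []).append(sim)
    agg = sum if strategy == "SUM" else mean
    return {term: agg(sims) for term, sims in pooled.items()}

def disjoint_ensemble(per_corpus_lists, first, second, k=10):
    """Two-step ensemble, e.g. first='AVG', second='SUM' for AVG → SUM:
    merge the spaces of each corpus, then merge the per-corpus results."""
    per_corpus = [combine(lists, first) for lists in per_corpus_lists]
    final = combine(per_corpus, second)
    return sorted(final, key=final.get, reverse=True)[:k]

clinical = [{"a": 0.9, "b": 0.5}, {"a": 0.5}]
medical = [{"b": 0.6}, {"c": 0.9}]
print(disjoint_ensemble([clinical, medical], "AVG", "SUM"))
```

The same candidate lists give `['b', 'c', 'a']` under AVG → SUM and `['a', 'b', 'c']` under SUM → SUM. This shows that the choice of strategy at each step can change the final ranking.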
Results on clinical evaluation set
| | Abbr → Exp P | Abbr → Exp R | Exp → Abbr P | Exp → Abbr R | Syn P | Syn R |
| RI Baseline | 0.04 | 0.22 | 0.03 | 0.19 | 0.07 | 0.39 |
| RP Baseline | 0.04 | 0.23 | 0.04 | 0.24 | 0.06 | 0.36 |
| Clinical Ensemble | 0.05 | 0.31 | 0.03 | 0.20 | 0.07 | |
| +Post-Processing (Top 10) | 0.08 | 0.05 | 0.43 | |||
| +Dynamic Cut-Off (Top ≤ 10) | 0.41 | 0.33 | 0.08 | 0.42 | ||
Results (P = weighted precision, R = recall, top ten) of the best models with and without post-processing on the three tasks. A dynamic number of suggestions allows the model to suggest fewer than ten terms in order to improve precision. The results are based on applying the model combinations to the evaluation data. The improvements in recall from the best baseline to the ensemble method are statistically significant (p-value < 0.05) for both the abbr → exp task (p-value = 0.022) and the synonym task (p-value = 0.002). The improvement in recall achieved by post-processing is statistically significant for both abbreviation tasks (p-value = 0.001 for abbr → exp and p-value < 0.001 for exp → abbr).
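The dynamic cut-off rows return fewer than ten suggestions when the candidates further down the list look weak. The sketch below implements this as a fixed similarity threshold. The threshold value, the function names and the use of plain (unweighted) precision are assumptions for illustration. In practice such a threshold would presumably be chosen on development data.

```python
def dynamic_cutoff(ranked, min_sim, max_n=10):
    """Return at most `max_n` terms, dropping any whose (combined) similarity is
    below `min_sim`, so fewer than ten suggestions may be returned.
    `ranked` holds (term, similarity) pairs in descending similarity order."""
    return [term for term, sim in ranked[:max_n] if sim >= min_sim]

def precision(suggested, correct):
    """Plain (unweighted) precision of a suggestion list."""
    return sum(t in correct for t in suggested) / len(suggested) if suggested else 0.0

ranked = [("a", 0.9), ("b", 0.6), ("c", 0.4)]
print(dynamic_cutoff(ranked, 0.5))
```

With `a` as the only correct term, precision rises from 1/3 for the full list to 1/2 after the cut-off. Recall is unchanged, because the dropped term was wrong.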
Results on medical evaluation set
| | Abbr → Exp P | Abbr → Exp R | Exp → Abbr P | Exp → Abbr R | Syn P | Syn R |
| RI baseline | 0.02 | 0.09 | 0.01 | 0.08 | 0.03 | 0.18 |
| RP baseline | 0.01 | 0.06 | 0.01 | 0.05 | 0.05 | 0.26 |
| Medical ensemble | 0.03 | 0.01 | ||||
| +Post-processing (top 10) | 0.03 | 0.17 | 0.02 | 0.11 | 0.06 | 0.34 |
| +Dynamic cut-off (top ≤ 10) | 0.17 | 0.11 | 0.06 | 0.34 | ||
Results (P = weighted precision, R = recall, top ten) of the best semantic spaces with and without post-processing on the three tasks. A dynamic number of suggestions allows the model to suggest fewer than ten terms in order to improve precision. The results are based on applying the model combinations to the evaluation data. Compared with the best baseline, the ensemble method gives a statistically significant difference in recall (p-value < 0.05) only for the synonym task (p-value < 0.001).
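The captions report P as weighted precision, which is not defined in this excerpt. The sketch below assumes one rank-weighted formulation. It gives the suggestion at 0-based rank i of j suggestions the weight j - i, so correct terms near the top count more. The function name is hypothetical.

```python
def weighted_precision(suggested, correct):
    """Rank-weighted precision (assumed definition): the suggestion at 0-based
    rank i out of j suggestions has weight j - i; the score is the weight of
    the correct suggestions divided by the total weight."""
    j = len(suggested)
    if j == 0:
        return 0.0
    hit_weight = sum(j - i for i, term in enumerate(suggested) if term in correct)
    return hit_weight / sum(j - i for i in range(j))

print(weighted_precision(["a", "b", "c"], {"a"}))
```

Plain precision would be 1/3 whether the single correct term is ranked first or last. Under this weighting it scores 0.5 at rank 1 but only 1/6 at rank 3.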
Results on clinical + medical evaluation set
| | Abbr → Exp P | Abbr → Exp R | Exp → Abbr P | Exp → Abbr R | Syn P | Syn R |
| Clinical space | 0.03 | 0.17 | 0.03 | 0.19 | 0.05 | 0.29 |
| Medical space | 0.01 | 0.06 | 0.01 | 0.08 | 0.03 | 0.18 |
| Conjoint corpus space | 0.03 | 0.19 | 0.01 | 0.08 | 0.05 | 0.30 |
| Clinical ensemble | 0.04 | 0.24 | 0.03 | 0.19 | 0.06 | 0.34 |
| Medical ensemble | 0.02 | 0.11 | 0.01 | 0.11 | 0.05 | 0.33 |
| Conjoint corpus ensemble | 0.03 | 0.19 | 0.02 | 0.14 | 0.07 | 0.40 |
| Disjoint corpora ensemble | 0.05 | 0.30 | 0.03 | 0.19 | ||
| +Post-processing (top 10) | 0.07 | 0.06 | 0.08 | 0.47 | ||
| +Dynamic cut-off (top ≤ 10) | 0.39 | 0.33 | 0.08 | 0.45 | ||
Results (P = weighted precision, R = recall, top ten) of the best semantic spaces and ensembles on the three tasks. The results are based on the clinical + medical evaluation set and are grouped according to the number of semantic spaces employed: one, two or four. The disjoint corpus ensemble is reported with and without post-processing. A dynamic cut-off allows fewer than ten terms to be suggested in an attempt to improve precision. Results of the tests of statistical significance are shown in Table 11.
P-values for recall results presented in Table 10
| Clinical space | 1.000 | 0.057 | 0.885 | |||
| Medical space | - | |||||
| Conjoint corpus | - | - | 0.210 | 1.000 | ||
| Clinical ensemble | - | - | - | 0.480 | 0.189 | |
| Medical ensemble | - | - | - | - | ||
| Conjoint corp. ens. | - | - | - | - | - |
P-values for the differences between the recall results on the synonym task for the semantic spaces/ensembles presented in Table 10. P-values showing a statistically significant difference (p-value < 0.05) are presented in bold-face.
P-values for the post-processing comparison and for the abbr → exp and exp → abbr tasks are not shown in the table. At the p-value < 0.05 significance level, there was no statistically significant recall difference between the standard Disjoint Corpora Ensemble and its post-processing version for any of the three tasks (p-value = 0.25 for abbr → exp and p-value = 0.062 for exp → abbr). For the abbr → exp task, pairwise tests of the semantic spaces/ensembles shown in Table 10 found significant differences only for Medical Space vs. Clinical Ensemble (p-value = 0.039), Medical Space vs. Disjoint Corpora Ensemble (p-value = 0.004) and Medical Ensemble vs. Disjoint Corpora Ensemble (p-value = 0.039). For the exp → abbr task, there were no statistically significant differences.
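The excerpt does not say which test produced these p-values. As an illustration only, the sketch below applies an exact two-sided sign test to paired per-query recall values from two systems. It is one common way to compare paired results and is not claimed to be the authors' test.

```python
from math import comb

def sign_test(recall_a, recall_b):
    """Exact two-sided sign test on paired per-query recall values of two
    systems. Queries where both systems score the same are ignored."""
    pairs = list(zip(recall_a, recall_b))
    wins = sum(a > b for a, b in pairs)
    losses = sum(a < b for a, b in pairs)
    n = wins + losses
    if n == 0:
        return 1.0
    # Probability of a split at least this lopsided under H0 (p = 0.5), both tails.
    tail = sum(comb(n, i) for i in range(min(wins, losses) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(sign_test([1.0] * 8 + [0.0] * 2, [0.0] * 10))
```

In the example, system A wins on 8 queries and ties on 2. The two ties are discarded, and the p-value is 2/256 ≈ 0.0078.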
Figure 5. Frequency thresholds. The relation between recall and the required minimum frequency of occurrence for the reference standard terms in both corpora. The number of query terms for each threshold value is also shown.
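Figure 5 relates recall to a minimum frequency that reference-standard terms must reach in both corpora. The sketch below reproduces that kind of analysis. It assumes that per-query recall values and a term-frequency table for each corpus are already available. All names and numbers are hypothetical.

```python
def recall_by_threshold(per_query_recall, freq_a, freq_b, thresholds):
    """For each minimum frequency, keep only query terms occurring at least that
    often in both corpora; report (number of queries kept, mean recall)."""
    out = {}
    for t in thresholds:
        kept = [q for q in per_query_recall
                if freq_a.get(q, 0) >= t and freq_b.get(q, 0) >= t]
        mean_r = sum(per_query_recall[q] for q in kept) / len(kept) if kept else 0.0
        out[t] = (len(kept), mean_r)
    return out

recall = {"x": 1.0, "y": 0.0, "z": 0.5}
clinical_freq = {"x": 100, "y": 5, "z": 50}
medical_freq = {"x": 80, "y": 200, "z": 20}
print(recall_by_threshold(recall, clinical_freq, medical_freq, [1, 10, 30]))
```

Raising the threshold shrinks the query set as well as shifting recall. This is why the figure also reports the number of query terms at each threshold.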
Examples of extracted candidate synonyms
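| Clinical ensemble | Medical ensemble | Disjoint corpus ensemble |
| Heartcenter (heart center) | Vårdcentral (health centre) | Vårdcentral (health centre) |
| Bröstklinik (breast clinic) | Akutmottagning (emergency department) | Mottagning (outpatient clinic) |
| Hälsomottagningen (the health clinic) | Akuten (the emergency room) | |
| Hjärtcenter (heart center) | Mottagning (outpatient clinic) | Gotland |
| Län (county) | Intensivvårdsavdelning (intensive care unit) | Sjukhus (hospital) |
| Eyecenter (eye center) | Arbetsplats (workplace) | Gård (farm) |
| Bröstklin (breast clinic, abbreviated) | Vårdavdelning (ward) | Vårdavdelning (ward) |
| Sjukhems (nursing home's) | Gotland | Arbetsplats (workplace) |
| Hartcenter (misspelling of heart center) | Kväll (evening) | Akutmottagning (emergency department) |
| Biobankscentrum (biobank center) | Ks | Akuten (the emergency room) |
| | | |
| Sömnstörning (sleep disorder) | Depressioner (depressions) | Sömnstörning (sleep disorder) |
| Sömnsvårigheter (sleeping difficulties) | Osteoporos (osteoporosis) | Osteoporos (osteoporosis) |
| Panikångest (panic disorder) | Astma (asthma) | Tvångssyndrom (obsessive-compulsive disorder) |
| Tvångssyndrom (obsessive-compulsive disorder) | Fetma (obesity) | Epilepsi (epilepsy) |
| Fibromyalgi (fibromyalgia) | Smärta (pain) | Hjärtsvikt (heart failure) |
| Ryggvärk (back pain) | Depressionssjukdom (depressive illness) | |
| Självskadebeteende (self-harm) | Bensodiazepiner (benzodiazepines) | Fibromyalgi (fibromyalgia) |
| Osteoporos (osteoporosis) | Hjärtsvikt (heart failure) | Astma (asthma) |
| Depressivitet (depressiveness) | Hypertoni (hypertension) | Alkoholberoende (alcohol dependence) |
| Pneumoni (pneumonia) | Utbrändhet (burnout) | Migrän (migraine) |
| | | |
| Pollenallergi (pollen allergy) | Allergier (allergies) | Allergier (allergies) |
| Födoämnesallergi (food allergy) | Sensibilisering (sensitization) | Hösnuva (hay fever) |
| Hösnuva (hay fever) | Hösnuva (hay fever) | Födoämnesallergi (food allergy) |
| Rehabilitering (rehabilitation) | Pollenallergi (pollen allergy) | |
| Kattallergi (cat allergy) | Fetma (obesity) | |
| Jordnötsallergi (peanut allergy) | Kol (COPD) | Astma (asthma) |
| Pälsdjursallergi (pet allergy) | Osteoporos (osteoporosis) | Kol (COPD) |
| Negeras (is negated) | Födoämnesallergi (food allergy) | Osteoporos (osteoporosis) |
| Pollen (pollen) | Astma (asthma) | Jordnötsallergi (peanut allergy) |
| Pollenallergiker (pollen allergy sufferer) | Utbrändhet (burnout) | Pälsdjursallergi (pet allergy) |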
The top ten candidate synonyms for three different query terms with the clinical ensemble, the medical ensemble and the disjoint corpus ensemble. The synonym in the reference standard is in boldface.