Hamed Hassanzadeh, Tudor Groza, Anthony Nguyen, Jane Hunter.
Abstract
Following the Evidence Based Medicine (EBM) practice, practitioners make use of the existing evidence to make therapeutic decisions. This evidence, in the form of scientific statements, is usually found in scholarly publications such as randomised controlled trials and systematic reviews. However, finding such information in the overwhelming amount of published material is particularly challenging. Approaches have been proposed to automatically extract scientific artefacts in EBM using standardised schemas. Our work takes this stream a step forward and looks into consolidating extracted artefacts, i.e., quantifying their degree of similarity based on the assumption that they carry the same rhetorical role. By semantically connecting key statements in the EBM literature, practitioners can not only find available evidence more easily, but also track the effects of different treatments/outcomes across a number of related studies. We devise a regression model based on a varied set of features and evaluate it both on a general English corpus (the SICK corpus) and on an EBM corpus (the NICTA-PIBOSO corpus). Experimental results show that our approach performs on par with the state of the art on general English and achieves encouraging results on biomedical text when compared against human judgement.
Year: 2015 PMID: 26039310 PMCID: PMC4454558 DOI: 10.1371/journal.pone.0129392
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Features used to encode pairwise sentence similarity as a basis for the learning model.
| Category | Group | Feature |
|---|---|---|
| Syntactic Similarity | Naive | Bags of words overlap (1 feature) […] |
| | | Bags of lemmatised/stemmed words overlap (2 features) |
| | | Set similarity of lemmatised effective words (1 feature) |
| | | Jaccard similarity of set of words/lemmas (2 features) […] |
| | | Cosine similarity of vectors of lemmatised effective words (1 feature) |
| | Window-based | Windows of words overlap (1 feature) |
| | | Size of the longest shared window of words (1 feature) |
| | | Windows of effective words overlap (1 feature) |
| | | Size of the longest shared window of effective words (1 feature) |
| | | Windows of POS tags overlap and longest overlapped windows (2 features) |
| | Other | Ratio of shared skipped bigrams (1 feature) […] |
| | | Pairwise sentence polarity (1 feature) […] |
| | | Ratio of sentence lengths (1 feature) […] |
| Structural Similarity | Sentence Structure | Ratio of number of clauses (1 feature) |
| | | Reduced parse tree overlap (1 feature) |
| Semantic Similarity | Basic | Role-based word-by-word similarity (3 features) […] |
| | | Semantic similarity of effective words (1 feature) […] |
| | | Cosine similarity of Information Content (IC) vectors (1 feature) |
| | | Role-based Part of Speech (POS) tags alignment (2 features) |
| | Synonymy | WordNet-based synonym similarity (1 feature) […] |
| | | FrameNet-based synonym similarity (1 feature) |
| | Sense Disambiguation | Normalised set similarity of best senses (2 features) […] |
| | | Category level similarity of best senses (2 features) |
| | | Normalised set similarity of the best senses of skipped bigrams (1 feature) |
| | Vector Space Model | Similarity of Sets of Associated Terms (1 feature) |
| | | Cosine Similarity of Matrices of Associated Terms Vectors (1 feature) […] |
Citations denote existing systems that have employed the corresponding features.
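Two of the naive features listed above, Jaccard similarity over word sets and cosine similarity over word-count vectors, can be illustrated with a minimal sketch. The two sentences are invented for illustration, and plain lowercased whitespace tokens stand in for the lemmatised effective words of the table; both are assumptions, not the pipeline used in the study.

```python
import math
from collections import Counter


def tokens(sentence: str) -> list[str]:
    # Assumption: lowercase whitespace tokenisation replaces the
    # lemmatisation and stop-word filtering described in the table.
    return sentence.lower().split()


def jaccard(a: str, b: str) -> float:
    """Size of the shared word set divided by the size of the union."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors."""
    vec_a, vec_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sentence pair, not taken from either corpus.
s1 = "aspirin reduced the risk of stroke"
s2 = "aspirin lowered the risk of stroke"
print(jaccard(s1, s2), cosine(s1, s2))
```

The pair shares five of seven distinct words, so the Jaccard value is 5/7 and the cosine value is 5/6; each would enter the regression model as one numeric feature.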
Fig 1. Example of parse tree and its reduced version for a sample sentence.
The parse tree represents the syntactic structure of a sentence in the form of a rooted tree. The reduced form retains only the major groups of part of speech tags—i.e., NPs and VPs.
Fig 2. Reduced parse trees of the two sample sentences (i.e. Outcome A and B) listed in the Introduction.
Fig 3. Example of role-based semantic similarity measure for two sample sentences.
Both measures are computed using Eq 7, with the actual similarity being specific to the pre-verb component (as defined in Eq 8) and to the predicates (as defined in Eq 9).
Statistics on the SICK corpus [9].
| Relatedness score | Sentence pairs |
|---|---|
| [1–2) range | 925 (9%) |
| [2–3) range | 1,380 (14%) |
| [3–4) range | 3,904 (39%) |
| [4–5] range | 3,718 (38%) |
| Total | 9,927 |
The statistics of the NICTA-PIBOSO corpus.
| Annotation type | Sentences | Sentence pairs |
|---|---|---|
| Background | 2,557 | 3,267,846 |
| Intervention | 690 | 237,705 |
| Outcome | 4,523 | 10,226,503 |
| Population | 812 | 329,266 |
| Study Design | 228 | 25,878 |
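The two numeric columns above are related by simple arithmetic: reading the first as the number of sentences per annotation type (an interpretation, since the column labels are not preserved here), the second is the number of unordered sentence pairs, n(n-1)/2. A short sketch that recomputes it:

```python
from math import comb

# Counts copied from the first numeric column of the table above.
sentence_counts = {
    "Background": 2557,
    "Intervention": 690,
    "Outcome": 4523,
    "Population": 812,
    "Study Design": 228,
}


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return comb(n, 2)


pair_counts = {label: pair_count(n) for label, n in sentence_counts.items()}
for label, pairs in pair_counts.items():
    print(f"{label}: {pairs:,}")
```

The quadratic growth is why the Outcome category, with fewer than twice as many sentences as Background, yields more than three times as many candidate pairs.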
Evaluation of regression algorithms on 10-fold cross-validation on the SICK training corpus.
| Model | Pearson correlation |
|---|---|
| Baseline Approach | |
| Baseline | 0.63 |
| Regression Algorithms | |
| M5 Rules (M = 10) | 0.7705 |
| RepTree (N = 2) | 0.7483 |
| K* (B = 30) | 0.7391 |
| Linear Regression | 0.7055 |
| Regression By Classification | |
| Regression by Random Forest (I = 150, #Bins = 10) | 0.8139 |
| Regression by KNN (K = 10, #Bins = 10) | 0.7539 |
| Regression by Naïve Bayes (#Bins = 10) | 0.6529 |
| Regression Ensemble | |
| Ensemble of Bagging (RepTree), Random SubSpace(K*), and Regression by Discretisation (Random Forest) | 0.8268 |
Analysis of effects of different similarity measures—Pearson Correlation results for 10-fold cross-validation using Leave-one-Out feature strategy (i.e. the model is trained on all features except the one mentioned in each row) and results for each measure individually (i.e. the model is trained only for the mentioned feature).
| Feature group | Leave-one-out | Individual |
|---|---|---|
| Naive features | 0.8266 | 0.7645 |
| Window-based features | 0.8266 | 0.7218 |
| Other syntactic features | 0.8183 | 0.7061 |
| Sentence structure features | 0.8272 | 0.5129 |
| Basic semantic features | 0.818 | 0.7543 |
| Words Synonymy features | 0.826 | 0.6538 |
| Word-sense features | 0.8237 | 0.7277 |
| Semantic space models features | 0.8256 | 0.7159 |
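The figures in this table are Pearson correlations between predicted and gold similarity scores. As a reminder of what the metric measures, here is a minimal pure-Python sketch; the gold and predicted scores are invented for illustration and are not drawn from the SICK corpus.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gold relatedness scores (1 to 5) and model predictions.
gold = [1.0, 2.5, 3.0, 4.2, 5.0]
predicted = [1.4, 2.2, 3.3, 3.9, 4.6]
print(round(pearson(gold, predicted), 4))
```

Because the coefficient is invariant to shifting and scaling, a model can score highly here while still being systematically biased, which is why an error measure is usually reported alongside it.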
Inter-annotator agreement.
| | Annotator 1 | Annotator 2 | Annotator 3 | Annotator 4 | Annotator 5 | Average |
|---|---|---|---|---|---|---|
| Annotator 1 | - | 0.62 | 0.70 | 0.67 | 0.68 | 0.67 |
| Annotator 2 | 0.62 | - | 0.85 | 0.77 | 0.81 | 0.76 |
| Annotator 3 | 0.70 | 0.85 | - | 0.86 | 0.82 | 0.81 |
| Annotator 4 | 0.67 | 0.77 | 0.86 | - | 0.84 | 0.78 |
| Annotator 5 | 0.68 | 0.81 | 0.82 | 0.84 | - | 0.79 |
| Overall average | | | | | | 0.76 |
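The agreement values above form a symmetric five-by-five matrix. Assuming that the last column is the mean of each row and that the final 0.76 is the mean of those row means (an interpretation, since the header labels are not preserved here), the figures can be checked with a short sketch:

```python
# Pairwise agreement values copied from the table above; None marks the diagonal.
agreement = [
    [None, 0.62, 0.70, 0.67, 0.68],
    [0.62, None, 0.85, 0.77, 0.81],
    [0.70, 0.85, None, 0.86, 0.82],
    [0.67, 0.77, 0.86, None, 0.84],
    [0.68, 0.81, 0.82, 0.84, None],
]

# Last column of the table, as reported.
reported_means = [0.67, 0.76, 0.81, 0.78, 0.79]


def row_mean(row: list) -> float:
    """Mean agreement of one annotator with all the others."""
    values = [v for v in row if v is not None]
    return sum(values) / len(values)


row_means = [row_mean(row) for row in agreement]
overall = sum(row_means) / len(row_means)

for computed, reported in zip(row_means, reported_means):
    print(f"computed {computed:.4f}  reported {reported:.2f}")
print(f"overall {overall:.3f}")
```

Under that reading the recomputed means match the reported column to within rounding, and the first annotator agrees noticeably less with the rest than any other.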
Evaluation of semantic similarity approach over EBM scientific artefacts.
| | | | | | |
|---|---|---|---|---|---|
| | 0.25 | 0.18 | 0.3 | 0.32 | 0.44 |
| | 0.87 | 0.78 | 0.84 | 0.79 | 0.53 |
| | 0.90 | 0.57 | 0.84 | 0.12 | 0.56 |
Analysis of effects of different similarity measures when the model is trained only on the mentioned features.
| Feature group | | | | | |
|---|---|---|---|---|---|
| Naive features | 0.8741 | 0.6231 | 0.7609 | 0.208 | 0.0435 |
| Window-based features | 0.5992 | 0.5588 | 0.7954 | 0.1198 | 0.4042 |
| Other syntactic features | 0.8815 | 0.6992 | 0.721 | -0.3219 | 0.398 |
| Sentence structure features | -0.2688 | -0.2521 | 0.0498 | 0.141 | 0.0652 |
| Basic semantic features | 0.8053 | 0.4235 | 0.6894 | -0.0553 | 0.0555 |
| Words Synonymy features | 0.8038 | 0.5996 | 0.5908 | 0.0703 | 0.099 |
| Word-sense features | 0.6009 | 0.1709 | 0.6483 | -0.1419 | 0.1976 |
| Semantic space models features | 0.7936 | 0.3122 | 0.7016 | -0.0099 | -0.1508 |
Fig 4. The error distribution of the ensemble predictions on SICK data.
Prediction errors from the ensemble model.
| Error range | Sentence pairs | Share |
|---|---|---|
| [-2.5, -2] | 6 | 0.1% |
| (-2, -1.5] | 30 | 0.6% |
| (-1.5, -1] | 147 | 3% |
| (-1, -0.5] | 723 | 14.7% |
| (-0.5, 0] | 1682 | 34.1% |
| (0, 0.5] | 1438 | 29.2% |
| (0.5, 1] | 665 | 13.5% |
| (1, 1.5] | 202 | 4.1% |
| (1.5, 2] | 31 | 0.6% |
| (2, 2.5] | 3 | 0.1% |
| Total | 4927 | 100% |
Evaluation of the ensemble model on the test set, split onto the four score ranges.
| Score range | Sentence pairs | | Negative errors | Positive errors | Minimum error | Maximum error |
|---|---|---|---|---|---|---|
| [1,2) | 452 | 0.6646 | 16 (4%) | 436 (96%) | -0.593 | 2.383 |
| [2,3) | 676 | 0.304 | 105 (16%) | 571 (84%) | -1.69 | 2.086 |
| [3,4) | 1966 | 0.1828 | 843 (43%) | 1123 (57%) | -1.85 | 1.849 |
| [4,5] | 1833 | 0.5649 | 1623 (89%) | 210 (11%) | -2.455 | 0.564 |
Comparative overview of the features used by existing systems.
| Category | Feature | | | | | | |
|---|---|---|---|---|---|---|---|
| Naive | Bags of words overlap | ✓ | ✓ | ✓ | ✓ | ||
| Bags of lemmatised/stemmed words overlap | ✓ | ✓ | |||||
| Set similarity of lemmatised effective words | ✓ | ✓ | |||||
| Jaccard similarity of set of words/lemmas | ✓ | ✓ | |||||
| Cosine similarity of vectors of lemmatised effective words | ✓ | ||||||
| Character subsequence/n-gram overlap | ✓ | ✓ | ✓ | ||||
| Weighted word overlap | ✓ | ✓ | |||||
| Numbers overlap | ✓ | ||||||
| Discourse Representation Structure overlaps | ✓ | ||||||
| POS tag based words comparison | ✓ | ||||||
| Window-based | Windows of words overlap | ✓ | |||||
| Size of the longest shared window of words | ✓ | ||||||
| Windows of effective words overlap | ✓ | ||||||
| Size of the longest shared window of effective words | ✓ | ||||||
| Windows of POS tags overlap and longest overlapped windows | ✓ | ||||||
| Other | Ratio of shared skipped bigrams | ✓ | ✓ | ||||
| Pairwise sentence polarity | ✓ | ✓ | ✓ | ✓ | |||
| Ratio of sentence lengths | ✓ | ✓ | ✓ | ✓ | |||
| Logical Model | ✓ | ||||||
| Alignment of lemma of words | ✓ | ✓ | |||||
| Dependencies features | ✓ | ✓ | ✓ | ||||
| Named Entity features | ✓ | ||||||
| Sentence Structure | Ratio of number of clauses | ✓ | |||||
| Reduced parse tree overlap | ✓ | ||||||
| Basic Similarity | Role-based word-by-word similarity | ✓ | ✓ | ✓ | |||
| Semantic similarity of effective words | ✓ | ✓ | |||||
| Cosine similarity IC vectors | ✓ | ||||||
| Role-based POS tags alignment | ✓ | ||||||
| Wordnet concepts difference | ✓ | ||||||
| Synonymy | WordNet-based synonym similarity | ✓ | ✓ | ||||
| FrameNet-based synonym similarity | ✓ | ||||||
| Antonymy | ✓ | ✓ | |||||
| Hypernymy | ✓ | ✓ | |||||
| Sense Disambiguation | Normalised set similarity of best senses | ✓ | ✓ | ||||
| Category level similarity of best senses | ✓ | ||||||
| Norm. set sim. of the best senses skipped bigrams | ✓ | ||||||
| Explicit Semantic Analysis | ✓ | ||||||
| Vector Space Model | Similarity of sets of associated terms | ✓ | |||||
| Cosine Similarity of Matrices of Associated Terms Vectors | ✓ | ✓ | ✓ | ✓ | ✓ | ||
| Weighted textual matrix factorization | ✓ | ||||||
| Distributional/Denotational Constituent Similarity | ✓ | ||||||
Experimental results achieved by our approach in comparison to the state of the art.
|
|
|
| |
|---|---|---|---|
|
|
|
| |
|
| 0.8207 |
| 0.3338 |
|
|
| 0.807±0.058 | 0.3250 |
|
| 0.8268 | - | 0.3223 |
|
| 0.8043 | - | 0.3593 |
|
| 0.799 | - | 0.3691 |