Austin J Brockmeier (1, 2), Meizhi Ju (1), Piotr Przybyła (1, 3), Sophia Ananiadou (4, 5).
Abstract
BACKGROUND: Machine learning can assist with multiple tasks during systematic reviews to facilitate the rapid retrieval of relevant references during screening and to identify and extract information relevant to the study characteristics, which include the PICO elements of patient/population, intervention, comparator, and outcomes. The latter requires techniques for identifying and categorising fragments of text, a task known as named entity recognition.
Keywords: Active learning; Evidence-based medicine; Logistic regression; Machine learning; Systematic review; Text mining
Year: 2019 PMID: 31805934 PMCID: PMC6896258 DOI: 10.1186/s12911-019-0992-8
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
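Named entity recognition is usually cast as token-level sequence labelling. As a minimal sketch of that framing (the BIO tag scheme, the helper `spans_to_bio`, and the example sentence are illustrative assumptions, not details from this record), annotated PICO spans can be converted into one tag per token:

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label), e.g. (4, 5, "Interventions").
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert non-overlapping token spans into BIO tags (hypothetical helper).

    "B-X" marks the first token of an entity of type X, "I-X" a continuation
    token, and "O" a token outside every entity.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("overlapping spans need a nested tagging scheme")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = "Adults with reflux received omeprazole daily".split()
spans = [(0, 3, "Participants"), (4, 5, "Interventions")]
for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

Here "O" is the BIO "outside" tag, unrelated to the O (outcome) tags used in the feature tables. A flat BIO sequence cannot encode the nested spans shown in Fig. 1, so this sketch simply rejects overlapping spans.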
Fig. 1 PICO recognition example. Visualisation of the trained model’s predictions of PICO elements within a reference (title and abstract) from the Proton Pump Inhibitors review. The intervention tags correspond to drug names; the participant spans cover characteristics of the population but erroneously include details of the intervention. The latter demonstrates the model’s ability to nest shorter spans within longer spans. The outcome spans cover qualitative and quantitative measures. Screenshot from the brat system [23]
Fig. 2 PICO recognition and abstract screening process. In the first phase, the PICO recognition model is trained on a human-annotated corpus of abstracts to predict the spans of PICO mentions. In the second phase, a collection of abstracts is processed by the PICO recognition model, and its predictions, together with the original abstract, are used to create a vector representation of each abstract. In the final phase, a user labels abstracts as included (relevant) or excluded; these decisions are used to train a machine learning (ML) model on the vector representations. The ML model is applied to the remaining unlabelled abstracts, which are then sorted by their predicted relevancy; the user labels the top-ranked abstracts, and the process repeats.
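The screening loop described for Fig. 2 can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions: each abstract is a set of binary features (BOW lemmas plus PICO features written here, hypothetically, as `"tag:lemma"` strings), the ML model is a plain L2-regularised logistic regression fitted by batch gradient descent, and the user is simulated by a list of known labels (`oracle`). Function names and hyperparameters are illustrative, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set

Features = Set[str]  # binary features, e.g. "relief" (BOW) or the hypothetical "O:relief" (PICO)
BIAS = "__bias__"


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Dict[str, float], x: Features) -> float:
    return weights.get(BIAS, 0.0) + sum(weights.get(f, 0.0) for f in x)


def train_logreg(docs: Sequence[Features], labels: Sequence[int],
                 epochs: int = 200, lr: float = 0.5, l2: float = 1e-3) -> Dict[str, float]:
    """L2-regularised logistic regression fitted by full-batch gradient descent."""
    weights: Dict[str, float] = defaultdict(float)
    n = len(docs)
    for _ in range(epochs):
        grad: Dict[str, float] = defaultdict(float)
        for x, y in zip(docs, labels):
            err = sigmoid(score(weights, x)) - y
            grad[BIAS] += err
            for f in x:
                grad[f] += err
        for f in set(grad) | set(weights):
            penalty = 0.0 if f == BIAS else l2 * weights[f]
            weights[f] -= lr * (grad[f] / n + penalty)
    return dict(weights)


def relevance_feedback(docs: Sequence[Features], oracle: Sequence[int],
                       seed: Sequence[int], batch_size: int = 1) -> List[int]:
    """Return the order in which a simulated user screens the abstracts."""
    labelled = {i: oracle[i] for i in seed}
    order = list(seed)
    while len(labelled) < len(docs):
        ids = sorted(labelled)
        weights = train_logreg([docs[i] for i in ids], [labelled[i] for i in ids])
        unlabelled = [i for i in range(len(docs)) if i not in labelled]
        # Rank the remaining abstracts by predicted relevancy.
        ranked = sorted(unlabelled, key=lambda i: score(weights, docs[i]), reverse=True)
        for i in ranked[:batch_size]:
            labelled[i] = oracle[i]  # the user labels the top-ranked abstracts
            order.append(i)
    return order


docs = [
    {"P:migraine", "O:relief"},
    {"P:migraine", "O:relief", "I:sumatriptan"},
    {"P:asthma", "O:mortality"},
    {"P:asthma", "I:steroid"},
    {"O:relief", "I:sumatriptan"},
    {"P:diabetes", "O:mortality"},
]
oracle = [1, 1, 0, 0, 1, 0]
order = relevance_feedback(docs, oracle, seed=[0, 2])
print(order)
```

Retraining from scratch after each batch keeps the sketch short; the inclusions (0, 1 and 4) are all screened before the remaining exclusions in this toy example.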
The top-level and fine-grained PICO elements in the training set for the PICO recognition model
| Top-level | Patient-population-problem | Intervention/Comparator | Outcome |
|---|---|---|---|
| Fine-grained | Age | Control | Adverse effect |
| | Condition | Educational | Mental |
| | Sample size | Pharmacological | Mortality |
| | Sex | Physical | Pain |
| | | Psychological | Physical |
| | | Surgical | Other |
| | | Other | |
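The table lists 3 top-level and 17 fine-grained element types, 20 in all. If each type receives a begin tag and an inside tag plus one shared outside tag (a BIO scheme, which is our assumption here), the label set has 2 × 20 + 1 = 41 tags, consistent with the size of the linear layer (41) in the architecture table below. The count can be checked directly:

```python
# Element types from the table: top-level categories and their fine-grained children.
FINE_GRAINED = {
    "Patient-population-problem": ["Age", "Condition", "Sample size", "Sex"],
    "Intervention/Comparator": ["Control", "Educational", "Pharmacological", "Physical",
                                "Psychological", "Surgical", "Other"],
    "Outcome": ["Adverse effect", "Mental", "Mortality", "Pain", "Physical", "Other"],
}

entity_types = list(FINE_GRAINED)  # the 3 top-level types
for parent, children in FINE_GRAINED.items():
    # Prefix with the parent so that "Physical" and "Other" stay distinct across categories.
    entity_types += [f"{parent}:{child}" for child in children]

# Assumed BIO scheme: B- and I- tags for every entity type plus a single O tag.
bio_tags = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]
print(len(entity_types), len(bio_tags))  # 20 41
```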
Details of the 3-layer network architecture for the PICO recognition model
| # | Layer | Size | Source |
|---|---|---|---|
| 1a | Word embedding | 200 | […] |
| 1b | Character embedding | 28 | trained from random initialisation |
| 1c | Character-based word representation | 2 × 28 | biLSTM applied to 1b |
| 1d | Combined embedding | 256 | concatenation of 1a and 1c |
| 2 | Recurrent layer | 2 × 128 | biLSTM over 1d |
| 3 | Linear layer | 41 | affine projection of 2 |
| | CRF output | 1 | most likely sequence of tags |
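Layer 3 produces a score for each tag at every token, and the CRF output layer then selects the most likely sequence of tags as a whole rather than the best tag per token. For a linear-chain CRF this decoding step is typically done with the Viterbi algorithm. The sketch below is a generic, dependency-free version; the toy scores, the three-tag set, and the transition penalty are made-up illustrations, not parameters of the trained model.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. the linear layer output).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]: score of starting the sequence with tag j.
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Toy example with tags 0 = O, 1 = B, 2 = I; the transition O -> I is effectively forbidden.
NEG = -1e4
start = [0.0, 0.0, NEG]
transitions = [[0.0, 0.0, NEG],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.0, 0.0],
             [0.0, 0.5, 1.0]]
greedy = [max(range(3), key=lambda j: e[j]) for e in emissions]
print(greedy, viterbi(emissions, transitions, start))  # [0, 2] [0, 1]
```

The greedy per-token choice yields the invalid sequence O followed by I, whereas Viterbi decoding uses the transition scores to return O followed by B.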
DERP systematic review descriptive statistics
| Review | Tot. | Inc. | Exc. | Prev. |
|---|---|---|---|---|
| ACE Inhibitors | 2544 | 41 | 2503 | 1.61% |
| ADHD | 851 | 20 | 831 | 2.35% |
| Antihistamines | 310 | 16 | 294 | 5.16% |
| Atypical Antipsychotics | 1120 | 146 | 974 | 13.04% |
| Beta Blockers | 2072 | 42 | 2030 | 2.03% |
| Calcium Channel Blockers | 1218 | 100 | 1118 | 8.21% |
| Estrogens | 368 | 80 | 288 | 21.74% |
| NSAIDS | 393 | 41 | 352 | 10.43% |
| Opioids | 1915 | 15 | 1900 | 0.78% |
| Oral Hypoglycemics | 503 | 136 | 367 | 27.04% |
| Proton Pump Inhibitors | 1333 | 51 | 1282 | 3.83% |
| Skeletal Muscle Relaxants | 1643 | 9 | 1634 | 0.55% |
| Statins | 3465 | 85 | 3380 | 2.45% |
| Triptans | 671 | 24 | 647 | 3.58% |
| Urinary Incontinence | 327 | 40 | 287 | 12.23% |
Abbreviated columns: Tot., total number of references; Inc., number of inclusions (relevant references); Exc., number of exclusions; Prev., prevalence (inclusions as a percentage of the total)
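In this table the first numeric column is the total number of references, the second the inclusions, and the third the exclusions; this is the only reading consistent with the prevalence column. The arithmetic can be checked for a few rows (the helper name is ours):

```python
def prevalence(included: int, total: int) -> float:
    """Inclusions as a percentage of all references in a review."""
    return 100.0 * included / total


# (total, inclusions) for a few DERP reviews, copied from the table above.
rows = {"ACE Inhibitors": (2544, 41), "Estrogens": (368, 80), "Opioids": (1915, 15)}
for name, (total, included) in rows.items():
    print(f"{name}: {prevalence(included, total):.2f}% "
          f"({total - included} exclusions)")
```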
OHAT and CAMARADES systematic review descriptive statistics
| Review | Tot. | Inc. | Exc. | Prev. |
|---|---|---|---|---|
| PFOA/PFOS and Immunotoxicity | 6331 | 95 | 6236 | 1.50% |
| Bisphenol A (BPA) and Obesity | 7700 | 111 | 7589 | 1.44% |
| Transgenerational Inheritance of Health Effects | 48638 | 765 | 47873 | 1.57% |
| Fluoride and Neurotoxicity in Animal Models | 4479 | 51 | 4428 | 1.14% |
| Neuropathic Pain | 29207 | 5011 | 24196 | 17.16% |
Abbreviated columns: Tot., total number of references; Inc., number of inclusions (relevant references); Exc., number of exclusions; Prev., prevalence (inclusions as a percentage of the total)
PICO recognition performance in terms of a token-wise evaluation and a document-level evaluation of the filtered bag-of-words (BOW)
| PICO element | Token-wise precision | Token-wise recall | Token-wise F-1 | Document-level BOW precision | Document-level BOW recall | Document-level BOW F-1 |
|---|---|---|---|---|---|---|
| Participants | 0.81 | 0.62 | 0.70 | 0.86 | 0.71 | 0.78 |
| Interventions | 0.69 | 0.47 | 0.56 | 0.83 | 0.52 | 0.64 |
| Outcomes | 0.66 | 0.75 | 0.70 | 0.73 | 0.81 | 0.77 |
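Token-wise scores compare the gold and predicted tag of every token. A minimal sketch, assuming one tag per token with a placeholder `-` for tokens outside any element (the paper's nested annotation may be scored differently, and the function name is ours). The reported F-1 values follow from the precision and recall columns via F-1 = 2PR/(P + R); for Participants, 2 × 0.81 × 0.62 / (0.81 + 0.62) ≈ 0.70.

```python
from typing import List, Tuple


def token_prf(gold: List[str], pred: List[str], label: str) -> Tuple[float, float, float]:
    """Token-wise precision, recall and F-1 for one PICO element.

    A token is a true positive when both its gold and predicted tags equal `label`.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must align token by token")
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    n_pred = sum(p == label for p in pred)
    n_gold = sum(g == label for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy sequences; "-" marks tokens outside any PICO element.
gold = ["Par", "Par", "Par", "-", "Int", "-"]
pred = ["Par", "Par", "-", "-", "Int", "Int"]
print(token_prf(gold, pred, "Par"))
print(token_prf(gold, pred, "Int"))

# F-1 from the reported token-wise precision and recall for Participants.
p, r = 0.81, 0.62
print(f"{2 * p * r / (p + r):.2f}")  # 0.70
```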
Relevancy feedback performance in terms of WSS@95% on DERP systematic review collections
| Review | […] | […] | LR | PICO | Δ (PICO − LR) |
|---|---|---|---|---|---|
| ACE Inhibitors | 74.3 | *82.7 | 74.7 | 74.4 | -0.3 |
| ADHD | 67.9 | *82.1 | 67.5 | 68.9 | 1.4 |
| Antihistamines | *24.5 | 17.7 | -1.7 | -1.9 | -0.1 |
| Atypical Antipsychotics | 18.0 | *33.6 | 18.0 | 20.5 | 2.5 |
| Beta Blockers | 65.0 | *68.5 | 54.7 | 55.7 | 1.1 |
| Calcium Channel Blockers | 17.3 | 12.8 | *47.6 | 47.1 | -0.5 |
| Estrogens | 22.6 | 28.5 | 36.6 | *39.1 | 2.4 |
| NSAIDS | *77.4 | 64.1 | 60.9 | 63.1 | 2.2 |
| Opioids | 9.0 | 17.4 | 19.5 | *34.1 | 14.6 |
| Oral Hypoglycemics | 13.5 | *15.9 | 6.9 | 9.2 | 2.3 |
| Proton Pump Inhibitors | 19.7 | 21.0 | *21.2 | 18.3 | -2.9 |
| Skeletal Muscle Relaxants | *58.6 | 29.9 | 25.9 | 32.4 | 6.5 |
| Statins | 27.8 | *43.7 | 42.9 | 43.3 | 0.3 |
| Triptans | 39.6 | *54.1 | 34.3 | 52.4 | 18.1 |
| Urinary Incontinence | 20.8 | 41.6 | 44.8 | *46.4 | 1.6 |
| Average | 37.1 | 40.9 | 36.9 | 40.2 | 3.3 |
The last column gives the change from adding the PICO features to the baseline logistic regression classifier (LR), i.e. PICO minus LR.
* indicates the best performance per review
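WSS@95% (work saved over sampling at 95% recall) is commonly defined as the proportion of references a reviewer does not need to screen to reach 95% recall, minus the 5% that random sampling would save at that recall. A minimal sketch under that standard definition (the function name and the handling of the recall threshold are our assumptions; the tables report the value as a percentage):

```python
import math
from typing import Sequence


def wss_at_recall(ranked_labels: Sequence[int], recall: float = 0.95) -> float:
    """Work saved over sampling for a ranking of 0/1 relevance labels (1 = inclusion).

    The reviewer screens from the top until at least `recall` of the inclusions
    are found; unscreened references count as saved work, and the 1 - recall
    that random sampling would save at the same recall is subtracted.
    """
    n = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        raise ValueError("the ranking contains no relevant references")
    # Small tolerance guards against floating-point error in recall * n_relevant.
    needed = math.ceil(recall * n_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return (n - screened) / n - (1.0 - recall)
    raise AssertionError("unreachable: every inclusion is eventually screened")


# 20 references, 4 inclusions ranked 1st, 2nd, 3rd and 10th.
ranking = [1, 1, 1] + [0] * 6 + [1] + [0] * 10
print(f"WSS@95% = {100 * wss_at_recall(ranking):.1f}")  # 45.0
```

Negative values, such as those for Antihistamines, mean that more than 95% of the references had to be screened before reaching 95% recall.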
Two-fold relevancy prediction in terms of WSS@95% on DERP systematic review collections
| Review | […] | […] | […] | LR | PICO | Δ (PICO − LR) |
|---|---|---|---|---|---|---|
| ACE Inhibitors | 52.3 | 73.3 | *80.1 | 78.5 | 77.6 | -0.9 |
| ADHD | 62.2 | 52.6 | *79.3 | 75.5 | 74.5 | -0.9 |
| Antihistamines | 14.9 | *23.6 | 13.7 | 4.9 | 5.0 | 0.1 |
| Atypical Antipsychotics | 20.6 | 17.0 | *25.1 | 19.9 | 20.9 | 1.0 |
| Beta Blockers | 36.7 | 46.5 | 42.8 | *55.5 | 54.1 | -1.4 |
| Calcium Channel Blockers | 23.4 | 43.0 | *44.8 | 38.8 | 39.3 | 0.6 |
| Estrogens | 37.5 | 41.4 | *47.1 | 41.0 | 43.7 | 2.7 |
| NSAIDS | 52.8 | 67.2 | *73.0 | 65.3 | 66.5 | 1.2 |
| Opioids | 55.4 | 36.4 | *82.6 | 53.3 | 57.0 | 3.7 |
| Oral Hypoglycemics | 8.5 | *13.6 | 11.7 | 7.1 | 8.9 | 1.8 |
| Proton Pump Inhibitors | 22.9 | 32.8 | *37.8 | 32.6 | 31.0 | -1.6 |
| Skeletal Muscle Relaxants | 26.5 | 37.4 | *55.6 | 40.1 | 45.3 | 5.3 |
| Statins | 31.5 | *49.1 | 43.6 | 42.2 | 44.3 | 2.1 |
| Triptans | 27.4 | 34.6 | 41.2 | 40.6 | *51.2 | 10.5 |
| Urinary Incontinence | 29.6 | 43.2 | *53.0 | 52.4 | 52.4 | 0.0 |
| Average | 33.5 | 40.8 | 48.8 | 43.2 | 44.8 | 1.6 |
The last column gives the change from adding the PICO features to the baseline logistic regression classifier (LR), i.e. PICO minus LR.
* indicates the best performance per review
Two-fold relevancy prediction in terms of WSS@95% on OHAT and CAMARADES systematic review collections
| Review | […] | LR | PICO | Δ (PICO − LR) |
|---|---|---|---|---|
| PFOA/PFOS and Immunotoxicity | 80.5 | 84.0 | *84.6 | 0.7 |
| Bisphenol A (BPA) and Obesity | 75.2 | 77.9 | *78.6 | 0.8 |
| Transgenerational Inheritance of Health Effects | 71.4 | *74.3 | *74.3 | 0.0 |
| Fluoride and Neurotoxicity in Animal Models | 87.0 | 89.3 | *89.4 | 0.1 |
| Neuropathic Pain | *69.1 | 64.3 | 64.1 | -0.1 |
| Average | 76.6 | 77.9 | 78.2 | 0.3 |
The last column gives the change from adding the PICO features to the baseline logistic regression classifier (LR), i.e. PICO minus LR.
* indicates the best performance per review
Fig. 3 Comparison of BOW and BERT word vectors as the machine learning representation. The two-fold relevancy prediction performance is reported in terms of WSS@95% across the DERP collections, sorted by BOW performance. In each group, the different colored bars correspond to BOW, BOW including PICO features, BERT, and BERT including PICO features. Bar heights are the average across 100 Monte Carlo trials. In the WSS@95% plot, the upper error bars indicate the standard deviation across the 100 Monte Carlo trials
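Fig. 3 reports averages and standard deviations over 100 Monte Carlo trials of two-fold relevancy prediction. The sketch below shows one way to organise such an evaluation; the stratified split, averaging the two folds within each trial, and the count-based `train_counts` scorer (a hypothetical stand-in for the actual classifiers) are all our assumptions rather than the paper's protocol.

```python
import math
import random
import statistics
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

Doc = Tuple[Set[str], int]  # (binary features, 1 = inclusion / 0 = exclusion)
Scorer = Callable[[Set[str]], float]


def wss95(ranked_labels: Sequence[int]) -> float:
    n_relevant = sum(ranked_labels)
    if n_relevant == 0:
        raise ValueError("no inclusions in this fold")
    needed = math.ceil(0.95 * n_relevant - 1e-9)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return (len(ranked_labels) - screened) / len(ranked_labels) - 0.05
    raise AssertionError("unreachable: every inclusion is eventually screened")


def train_counts(train: List[Doc]) -> Scorer:
    """Hypothetical stand-in classifier: +1 per inclusion, -1 per exclusion containing a feature."""
    weights: Counter = Counter()
    for features, label in train:
        for f in features:
            weights[f] += 1 if label else -1
    return lambda features: sum(weights[f] for f in features)


def two_fold_monte_carlo(data: List[Doc], train: Callable[[List[Doc]], Scorer],
                         trials: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of WSS@95% over repeated random two-fold splits."""
    rng = random.Random(seed)
    per_trial = []
    for _ in range(trials):
        # Stratified split (our assumption) so that both folds contain inclusions.
        folds: Tuple[List[Doc], List[Doc]] = ([], [])
        for label in (0, 1):
            group = [d for d in data if d[1] == label]
            rng.shuffle(group)
            half = len(group) // 2
            folds[0].extend(group[:half])
            folds[1].extend(group[half:])
        scores = []
        for train_fold, test_fold in (folds, folds[::-1]):
            scorer = train(train_fold)
            ranked = sorted(test_fold, key=lambda d: scorer(d[0]), reverse=True)
            scores.append(wss95([label for _, label in ranked]))
        per_trial.append(statistics.mean(scores))  # assumed: average the two folds
    return statistics.mean(per_trial), statistics.stdev(per_trial)


data = [({"rel", f"inc{k}"}, 1) for k in range(4)] + [({"irr", f"exc{k}"}, 0) for k in range(16)]
avg, sd = two_fold_monte_carlo(data, train_counts)
print(f"WSS@95% = {100 * avg:.1f} ± {100 * sd:.1f}")
```

With perfectly separable toy data every trial gives the same score, so the standard deviation is zero; on real collections it reflects the variability introduced by the random splits.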
PICO features with strong relevancy within the Proton Pump Inhibitors systematic review
| PICO tag | Lemma | PPV (PICO) | PPV (BOW) | TP/FP (PICO) | TP/FP (BOW) |
|---|---|---|---|---|---|
| O | relief | 0.21 | 0.17 | 21/78 | 22/111 |
| O | healing | 0.13 | 0.11 | 33/215 | 33/264 |
| O | heartburn | 0.15 | 0.11 | 16/94 | 16/125 |
| O | pain | 0.15 | 0.12 | 14/79 | 14/98 |
| P | oesophagitis | 0.14 | 0.11 | 13/77 | 13/107 |
| O | rate | 0.07 | 0.07 | 35/439 | 35/501 |
| P | grade | 0.15 | 0.08 | 8/44 | 8/90 |
| O | safety | 0.10 | 0.08 | 11/94 | 11/122 |
| P | reflux | 0.07 | 0.05 | 23/311 | 23/441 |
Positive predictive value (PPV) is the number of true positives (TP) divided by the total number of TP and false positives (FP). Each TP corresponds to an inclusion containing the feature; each FP corresponds to an exclusion containing the feature.
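Applying this definition to the TP/FP columns reproduces the PPV columns; for example, for the outcome feature relief (the helper name is ours):

```python
def ppv(tp: int, fp: int) -> float:
    """Share of references containing a feature that are inclusions: TP / (TP + FP)."""
    return tp / (tp + fp)


# Outcome feature "relief" in the Proton Pump Inhibitors review (counts from the table).
print(f"PICO: {ppv(21, 78):.2f}, BOW: {ppv(22, 111):.2f}")
```

Restricting the lemma to mentions inside predicted outcome spans removes far more false positives than true positives here (FP falls from 111 to 78, TP only from 22 to 21), which is why the PICO feature has the higher PPV.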
PICO features with strong relevancy within the Triptans systematic review
| PICO tag | Lemma | PPV (PICO) | PPV (BOW) | TP/FP (PICO) | TP/FP (BOW) |
|---|---|---|---|---|---|
| O | relief | 0.68 | 0.61 | 96/46 | 106/67 |
| O | headache | 0.53 | 0.43 | 130/113 | 161/212 |
| P | migraine | 0.50 | 0.41 | 138/138 | 198/281 |
| P | treat | 0.78 | 0.59 | 49/14 | 124/85 |
| O | pain | 0.59 | 0.52 | 90/63 | 96/89 |
| O | severe | 0.80 | 0.60 | 40/10 | 89/60 |
| O | moderate | 0.79 | 0.63 | 34/9 | 94/55 |
| O | response | 0.59 | 0.49 | 51/35 | 71/75 |
| I | sumatriptan | 0.43 | 0.41 | 141/187 | 145/211 |
| O | mild | 0.73 | 0.53 | 29/11 | 71/62 |
| O | migraine | 0.51 | 0.41 | 74/70 | 198/281 |
| O | functional | 0.81 | 0.56 | 21/5 | 25/20 |
| O | effective | 0.82 | 0.47 | 18/4 | 106/120 |
| O | patient | 0.67 | 0.43 | 26/13 | 194/253 |
| O | complete | 0.71 | 0.47 | 15/6 | 36/40 |
| O | reduction | 0.64 | 0.42 | 16/9 | 30/42 |
| O | reduce | 0.87 | 0.38 | 7/1 | 39/63 |
| O | migraine-specific | 0.80 | 0.50 | 8/2 | 10/10 |
Positive predictive value (PPV) is the number of true positives (TP) divided by the total number of TP and false positives (FP). Each TP corresponds to an inclusion containing the feature; each FP corresponds to an exclusion containing the feature.
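In both tables a PICO feature pairs a lemma with the tag of the PICO span it occurs in, whereas the BOW feature counts every occurrence of the lemma. This is consistent with `migraine` appearing twice in the Triptans table (tagged P and O) with identical BOW counts. A minimal sketch of the two feature types, assuming each token carries its lemma and at most one predicted tag (the `tag:lemma` string encoding, the toy abstracts, and the helper names are ours):

```python
from typing import List, Optional, Sequence, Set, Tuple

# Each token is (lemma, predicted PICO tag); the tag is "P", "I", "O" or None (hypothetical format).
Token = Tuple[str, Optional[str]]


def bow_features(tokens: List[Token]) -> Set[str]:
    # Every lemma, regardless of any PICO tag.
    return {lemma for lemma, _ in tokens}


def pico_features(tokens: List[Token]) -> Set[str]:
    # Only lemmas inside predicted PICO spans, keyed by their tag.
    return {f"{tag}:{lemma}" for lemma, tag in tokens if tag is not None}


def tp_fp(docs: Sequence[Set[str]], labels: Sequence[int], feature: str) -> Tuple[int, int]:
    """Count inclusions (TP) and exclusions (FP) whose feature set contains `feature`."""
    tp = sum(1 for d, y in zip(docs, labels) if y == 1 and feature in d)
    fp = sum(1 for d, y in zip(docs, labels) if y == 0 and feature in d)
    return tp, fp


abstracts = [
    [("migraine", "P"), ("relief", "O"), ("migraine", "O")],  # inclusion
    [("migraine", None), ("history", None)],                  # exclusion
    [("migraine", "P"), ("aspirin", "I")],                    # exclusion
]
labels = [1, 0, 0]
bow = [bow_features(a) for a in abstracts]
pico = [pico_features(a) for a in abstracts]
print(tp_fp(bow, labels, "migraine"))     # (1, 2)
print(tp_fp(pico, labels, "P:migraine"))  # (1, 1)
print(tp_fp(pico, labels, "O:migraine"))  # (1, 0)
```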