| Literature DB >> 36180888 |
Qianying Wang, Jing Liao, Mirella Lapata, Malcolm Macleod.
Abstract
BACKGROUND: Natural language processing could assist multiple tasks in systematic reviews to reduce workload, including the extraction of PICO elements such as study populations, interventions, comparators and outcomes. The PICO framework provides a basis for the retrieval and selection for inclusion of evidence relevant to a specific systematic review question, and automatic approaches to PICO extraction have been developed particularly for reviews of clinical trial findings. Considering the difference between preclinical animal studies and clinical trials, developing separate approaches is necessary. Facilitating preclinical systematic reviews will inform the translation from preclinical to clinical research.
Keywords: Information extraction; Named entity recognition; PICO; Preclinical animal study; Self-training
Year: 2022 PMID: 36180888 PMCID: PMC9524079 DOI: 10.1186/s13643-022-02074-4
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Fig. 1Preclinical PICO annotation example. Screenshot from tagtog
Statistics of the 400-abstract annotated PICO dataset

| Per abstract (mean) | |
|---|---|
| PICO sentences | 5 |
| Sentences | 11 |
| Entities | 17.5 |

| Entity type | Share of entities |
|---|---|
| Intervention | 24.1% |
| Comparator | 1.8% |
| Outcome | 40.6% |
| Induction | 10.6% |
| Species | 19.6% |
| Strain | 3.3% |
| Total | 100% |
Fig. 2The workflow of the PICO extraction
Fig. 3The workflow of the self-training in our experiments
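Fig. 3 depicts iterative self-training: train on the labelled data, predict on unlabelled abstracts, pseudo-label the confident predictions, and retrain. As a minimal sketch of the general technique (not the authors' implementation), the loop below stands in a toy nearest-centroid classifier for the BERT tagger; the margin-based confidence, the `threshold`, and all function names are illustrative assumptions:

```python
# Self-training by iterative pseudo-labelling (illustrative sketch only:
# a toy nearest-centroid classifier on 1-D points replaces the BERT tagger).
import statistics


def train(labeled):
    # Fit one centroid per class from (x, label) pairs.
    by_cls = {}
    for x, y in labeled:
        by_cls.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in by_cls.items()}


def predict(model, x):
    # Return (label, confidence); confidence is the margin between the
    # two nearest centroids, so ambiguous points score near zero.
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d_best, y_best), (d_second, _) = dists[0], dists[1]
    return y_best, d_second - d_best


def self_train(labeled, unlabeled, iters=6, threshold=1.0):
    labeled = list(labeled)
    for _ in range(iters):
        model = train(labeled)
        still_unlabeled = []
        for x in unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:
                labeled.append((x, y))      # confident: pseudo-label it
            else:
                still_unlabeled.append(x)   # revisit in the next round
        unlabeled = still_unlabeled
    return train(labeled)
```

In this sketch, low-margin points (e.g. one equidistant from both centroids) are never pseudo-labelled, mirroring the confidence filtering that makes self-training safer than labelling everything at once.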
Performance of PICO sentence classification by BERT with different pre-trained weights on the test set
| Pre-trained weights | F1 | Recall | Precision |
|---|---|---|---|
| BERT-base | 80.6 | 81.4 | 82.1 |
| BioBERT | 84.3 | 81.0 | 90.0 |
| PubMedBERT-abs | 85.4 | 88.4 | 85.0 |
| PubMedBERT-full | 84.2 | 87.1 | 83.8 |
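For reference, the F1 in these tables is the harmonic mean of precision and recall; a minimal helper (illustrative, not code from the paper) is shown below. Note that when scores are first averaged over classes, the reported F1 need not equal the harmonic mean of the averaged precision and recall.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```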
Overall performance of the PICO entity recognition models on the test set

| Model | Weight | F1 | Recall | Precision |
|---|---|---|---|---|
| BiLSTM | – | 43.5 | 38.1 | 50.6 |
| BiLSTM-CRF | – | 57.9 | 54.7 | 61.6 |
| BERT | Base | 61.3 | 66.3 | 57.1 |
| | BioBERT | 65.4 | 69.8 | 61.5 |
| | PubMed-abs | 70.1 | 73.2 | 67.3 |
| | PubMed-full | 69.9 | 73.4 | 66.7 |
| BERT-CRF | Base | 62.1 | 67.2 | 57.8 |
| | BioBERT | 66.5 | 70.1 | 63.3 |
| | PubMed-abs | 68.0 | 71.5 | 64.9 |
| | PubMed-full | 67.5 | 70.9 | 64.5 |
| BERT-BiLSTM-CRF | Base | 64.6 | 69.5 | 60.3 |
| | BioBERT | 68.3 | 71.2 | 65.6 |
| | PubMed-abs | 67.2 | 70.8 | 64.0 |
| | PubMed-full | 68.5 | 72.6 | 64.8 |
Entity-level performance of PubMedBERT on the gold test set. Original scores refer to the performance of the model before self-training; self-training scores refer to the performance of the model at the best iteration (6th iteration) of self-training. ‘R’ and ‘P’ refer to recall and precision, respectively
| Entity | Original F1 | Original R | Original P | Self-training F1 | Self-training R | Self-training P |
|---|---|---|---|---|---|---|
| Comparator | 16.0 | 10.0 | 40.0 | 48.5 | 40.0 | 61.5 |
| Induction | 49.1 | 50.6 | 47.7 | 48.0 | 49.4 | 46.6 |
| Intervention | 70.2 | 76.1 | 65.2 | 69.8 | 74.6 | 65.6 |
| Outcome | 65.4 | 70.6 | 60.9 | 66.9 | 70.6 | 63.6 |
| Species | 98.1 | 100.0 | 96.4 | 98.1 | 100.0 | 96.4 |
| Strain | 63.4 | 72.2 | 56.5 | 70.0 | 77.8 | 63.6 |
| Total | 69.9 | 73.4 | 66.7 | 71.0 | 74.0 | 68.2 |
Fig. 4Performance of PubMedBERT for PICO entity recognition using self-training
Fig. 5The visualisation of the Streamlit app
Performance of PICO sentence classification by BERT with different pre-trained weights on the validation set
| Pre-trained weights | F1 | Recall | Precision |
|---|---|---|---|
| BERT-base | 86.6 | 87.7 | 87.2 |
| BioBERT | 87.7 | 89.6 | 88.1 |
| PubMedBERT-abs | 89.3 | 91.3 | 89.1 |
| PubMedBERT-full | 85.8 | 89.3 | 84.6 |
Overall performance of PICO entity recognition models on the validation set
| Model | Weight | F1 | Recall | Precision |
|---|---|---|---|---|
| BiLSTM | – | 41.7 | 44.2 | 39.5 |
| BiLSTM-CRF | – | 58.8 | 56.9 | 61.0 |
| BERT | Base | 56.0 | 62.7 | 50.6 |
| | BioBERT | 64.2 | 69.8 | 59.4 |
| | PubMed-abs | 65.0 | 70.5 | 60.2 |
| | PubMed-full | 68.1 | 73.0 | 63.8 |
| BERT-CRF | Base | 57.7 | 62.9 | 53.3 |
| | BioBERT | 65.1 | 70.0 | 60.9 |
| | PubMed-abs | 65.5 | 70.9 | 60.9 |
| | PubMed-full | 68.0 | 72.8 | 63.7 |
| BERT-BiLSTM-CRF | Base | 60.8 | 66.4 | 56.1 |
| | BioBERT | 66.0 | 70.0 | 62.5 |
| | PubMed-abs | 68.1 | 73.3 | 63.5 |
| | PubMed-full | 68.0 | 72.8 | 63.8 |
Entity-level performance of PubMedBERT on the gold validation set. Original scores refer to the performance of the model before self-training; self-training scores refer to the performance of the model at the best iteration (6th iteration) of self-training. ‘R’ and ‘P’ refer to recall and precision, respectively
| Entity | Original F1 | Original R | Original P | Self-training F1 | Self-training R | Self-training P |
|---|---|---|---|---|---|---|
| Comparator | 33.3 | 66.7 | 22.2 | 80.0 | 66.7 | 100.0 |
| Induction | 46.2 | 50.9 | 42.3 | 45.7 | 40.7 | 52.2 |
| Intervention | 67.3 | 69.6 | 65.2 | 69.6 | 75.0 | 64.9 |
| Outcome | 61.5 | 68.0 | 56.2 | 69.9 | 73.5 | 66.7 |
| Species | 96.4 | 99.1 | 93.9 | 96.4 | 99.1 | 93.9 |
| Strain | 80.0 | 80.0 | 80.0 | 90.9 | 100.0 | 83.3 |
| Total | 68.1 | 73.0 | 63.8 | 73.6 | 76.4 | 71.0 |