Qianying Wang1, Jing Liao1, Mirella Lapata2, Malcolm Macleod1.
Abstract
We sought to apply natural language processing to the task of automatic risk of bias assessment in preclinical literature, which could speed the process of systematic review, provide information to guide research improvement activity, and support translation from preclinical to clinical research. We use 7840 full-text publications describing animal experiments with yes/no annotations for five risk of bias items. We implement a series of models including baselines (support vector machine, logistic regression, random forest), neural models (convolutional neural network, recurrent neural network with attention, hierarchical neural network) and models using BERT with two strategies (document chunk pooling and sentence extraction). We tune hyperparameters to obtain the highest F1 scores for each risk of bias item on the validation set and compare evaluation results on the test set to our previous regular expression approach. The F1 scores of the best models on the test set are 82.0% for random allocation, 81.6% for blinded assessment of outcome, 82.6% for conflict of interests, 91.4% for compliance with animal welfare regulations and 46.6% for reporting animals excluded from analysis. Our models significantly outperform regular expressions for four risk of bias items. For random allocation, blinded assessment of outcome, conflict of interests and animal exclusions, neural models achieve good performance; for animal welfare regulations, a BERT model with a sentence extraction strategy works better. Convolutional neural networks are the overall best models. The tool is publicly available, which may contribute to the future monitoring of risk of bias reporting for research improvement activities.
Keywords: automatic assessment; natural language processing; preclinical research synthesis; risk of bias
Year: 2021 PMID: 34709718 PMCID: PMC9298308 DOI: 10.1002/jrsm.1533
Source DB: PubMed Journal: Res Synth Methods ISSN: 1759-2879 Impact factor: 9.308
FIGURE 1. Overall methods of text representations and classification models being tested [Colour figure can be viewed at wileyonlinelibrary.com]
Percentage of papers reporting each risk of bias item, and example sentences from full texts indicating the reporting
| Risk of bias item | Reporting percentage | Positive example |
|---|---|---|
| Random allocation | 27.5% | … a randomisation code is used to allocate animals to treatment group … |
| Blinded assessment of outcome | 30.6% | … the midbrain sections from each animal were screened for … by a person unaware of the treatment condition of the animals … |
| Conflict of interests | 78.0% | The authors declare that they have no competing interests |
| Animal welfare regulations | 31.5% | … experiments were performed in accordance with protocols by the Institutional Animal Care and Use Committee at … |
| Animal exclusions | 12.2% | … cases in which the lesion was assessed to involve less than <50% of the dopamine neurons, the animal was excluded from … |
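To make the regular expression comparison more concrete, the following is a minimal sketch of a keyword-style detector run against the example sentences in the table above. The patterns and the `flag_reporting` helper are hypothetical illustrations written for this sketch; they are not the expressions used in the authors' earlier regular expression approach, which this record does not reproduce.

```python
import re
from typing import Dict

# Hypothetical patterns, one per risk of bias item. Illustrative only:
# they are NOT the expressions from the authors' earlier approach.
PATTERNS = {
    "random_allocation": re.compile(
        r"\brandomi[sz](?:ed|ation)\b|\brandomly\s+(?:allocated|assigned)\b", re.I),
    "blinded_assessment": re.compile(
        r"\b(?:blind(?:ed)?|unaware)\s+(?:to|of)\b", re.I),
    "conflict_of_interests": re.compile(
        r"\b(?:competing|conflicts?\s+of)\s+interests?\b", re.I),
    "animal_welfare": re.compile(
        r"\banimal\s+care\s+and\s+use\s+committee\b", re.I),
    "animal_exclusions": re.compile(
        r"\banimals?\s+(?:was|were)\s+excluded\b", re.I),
}


def flag_reporting(text: str) -> Dict[str, bool]:
    """Return, for each item, whether its pattern matches anywhere in the text."""
    return {item: bool(pattern.search(text)) for item, pattern in PATTERNS.items()}


if __name__ == "__main__":
    sentence = "a randomisation code is used to allocate animals to treatment group"
    print(flag_reporting(sentence))
```

Patterns of this kind only detect surface cues, so a sentence that uses "randomly" in an unrelated sense has to be excluded by hand-written context, which is one reason a learned classifier may be preferable.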
Data statistics
| | Samples for random allocation, blinded assessment of outcome and animal exclusions | | | Samples for conflict of interests and compliance with animal welfare regulations | | |
|---|---|---|---|---|---|---|
| Statistic | Train | Valid | Test | Train | Valid | Test |
| No. documents | 6272 | 784 | 784 | 5671 | 708 | 710 |
| Avg No. tokens per document | 4977 | 5112 | 5077 | 4947 | 5057 | 4964 |
| Avg No. sentences per document | 180 | 186 | 184 | 178 | 182 | 178 |
| Avg No. tokens per sentence | 28 | 28 | 28 | 28 | 28 | 28 |
Note: Samples for random allocation, blinded assessment of outcome and animal exclusions consist of 7840 records; samples for compliance with animal welfare regulations and conflict of interests consist of 7089 records.
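Documents here average roughly 5000 tokens, while a standard BERT model accepts at most 512 input tokens, which is the motivation for a chunk-based strategy such as the document chunk pooling named in the abstract. The sketch below shows the general idea under stated assumptions: the chunk size of 510 (leaving room for two special tokens), the pooling of per-chunk scores rather than embeddings, and the `score_chunk` callable standing in for a BERT classifier are all choices made for this illustration, not details taken from the source.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int = 510) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def pooled_score(
    tokens: Sequence[str],
    score_chunk: Callable[[List[str]], float],
    chunk_size: int = 510,
    pooling: str = "mean",
) -> float:
    """Score each chunk separately, then pool the chunk scores into one document score.

    score_chunk is a placeholder for a real model (assumption of this sketch).
    """
    chunks = chunk_tokens(tokens, chunk_size)
    if not chunks:
        raise ValueError("document has no tokens")
    scores = [score_chunk(chunk) for chunk in chunks]
    if pooling == "mean":
        return sum(scores) / len(scores)
    if pooling == "max":
        return max(scores)
    raise ValueError(f"unknown pooling: {pooling}")
```

With the average training document length from the table (4977 tokens), `chunk_tokens` yields ten chunks, the last one shorter than the rest.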
Performance of best model in three categories (baseline, neural model and BERT models with two strategies) for risk of bias items on the validation set
| Risk of bias item | Model | F1 | Recall | Precision | Specificity |
|---|---|---|---|---|---|
| Random allocation | SVM | 51.9 | 72.2 | 40.5 | 65.1 |
| | LogReg | 56.3 | 75.3 | 44.9 | 69.7 |
| | RF | 67.2 | 79.9 | 58.1 | 81.0 |
| | CNN | 86.4 | 93.2 | 81.8 | 92.8 |
| | **RNN + Attn** | | 92.4 | 83.7 | 93.7 |
| | HAN | 86.2 | 91.3 | 83.1 | 93.7 |
| | BERT‐DCP | 85.4 | 92.7 | 80.1 | 92.2 |
| | BERT‐SE | 80.6 | 82.0 | 82.0 | 92.7 |
| Blinded assessment of outcome | SVM | 59.3 | 67.8 | 52.7 | 74.7 |
| | LogReg | 60.0 | 69.1 | 53.0 | 74.6 |
| | RF | 57.8 | 68.3 | 50.2 | 71.8 |
| | CNN | 82.4 | 88.5 | 77.8 | 89.4 |
| | RNN + Attn | 83.0 | 91.1 | 77.2 | 88.5 |
| | HAN | 81.3 | 86.4 | 77.5 | 89.1 |
| | **BERT‐DCP** | | 91.8 | 77.0 | 87.7 |
| | BERT‐SE | 79.9 | 84.7 | 79.8 | 89.9 |
| Conflict of interests | SVM | 67.1 | 79.7 | 57.9 | 77.7 |
| | LogReg | 68.8 | 76.1 | 62.8 | 82.6 |
| | RF | 65.1 | 68.5 | 61.9 | 83.8 |
| | **CNN** | | 86.8 | 84.1 | 93.8 |
| | RNN + Attn | 82.9 | 85.4 | 82.0 | 92.9 |
| | HAN | 83.2 | 84.7 | 82.8 | 93.2 |
| | BERT‐DCP | 79.5 | 84.6 | 76.8 | 90.1 |
| | BERT‐SE | 64.0 | 64.3 | 70.9 | 88.3 |
| Compliance with animal welfare regulations | SVM | 90.1 | 96.3 | 84.6 | 42.8 |
| | LogReg | 87.6 | 85.4 | 89.9 | 68.7 |
| | RF | 88.8 | 89.7 | 88.0 | 60.2 |
| | CNN | 86.9 | 83.3 | 92.4 | 97.4 |
| | RNN + Attn | 76.3 | 77.6 | 78.3 | 93.5 |
| | HAN | 79.3 | 77.9 | 84.5 | 94.9 |
| | BERT‐DCP | 93.8 | 92.1 | 95.8 | 87.7 |
| | **BERT‐SE** | | 94.6 | 93.8 | 75.1 |
| Animal exclusions | SVM | 39.0 | 64.3 | 28.0 | 72.5 |
| | LogReg | 41.4 | 62.5 | 31.0 | 76.8 |
| | RF | 48.8 | 44.6 | 53.8 | 93.6 |
| | **CNN** | | 73.6 | 54.2 | 89.7 |
| | RNN + Attn | 58.0 | 68.3 | 54.3 | 90.0 |
| | HAN | 53.4 | 58.4 | 54.0 | 88.9 |
| | BERT‐DCP | 56.2 | 77.0 | 46.8 | 84.7 |
| | BERT‐SE | 34.4 | 46.5 | 30.5 | 79.5 |
Note: ‘SVM’ represents support vector machine; ‘LogReg’ represents logistic regression; ‘RF’ represents random forest; ‘CNN’ represents convolutional neural network; ‘RNN + Attn’ represents recurrent neural network with attention; ‘HAN’ represents hierarchical attention network; ‘BERT‐DCP’ represents BERT model with document chunk pooling; ‘BERT‐SE’ represents BERT model with sentence extraction. For each risk of bias item the best performing approach (by F1 score) is given in bold.
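The four reported metrics all derive from the counts of a binary confusion matrix. The sketch below uses the standard definitions and returns percentages, as in the table; the function name and the example counts are made up for illustration, and the source does not state whether its figures were computed in exactly this way (for instance, whether any averaging was applied).

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute F1, recall, precision and specificity (in percent) from confusion counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 when the denominator is zero to avoid a division error.
        return 100.0 * numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)        # share of reporting papers that are found
    precision = ratio(tp, tp + fp)     # share of positive predictions that are correct
    specificity = ratio(tn, tn + fp)   # share of non-reporting papers correctly rejected
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"f1": f1, "recall": recall, "precision": precision, "specificity": specificity}


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(binary_metrics(tp=80, fp=20, fn=20, tn=80))
```

Specificity is worth reading alongside F1 here: a model can reach a high F1 on a frequently reported item while still rejecting few of the papers that do not report it.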
Performance of the best natural language processing model and regular expression approach for each risk of bias item on the test set
| Risk of bias item | Model/approach | F1 | Recall | Precision | Specificity |
|---|---|---|---|---|---|
| Random allocation | RNN + Attn | 82.0* | 86.8 | 79.5 | 89.7 |
| | Regular expression | 68.8 | 96.4 | 53.6 | 62.7 |
| Blinded assessment of outcome | RNN + Attn | 81.6* | 87.8 | 78.2 | 88.4 |
| | Regular expression | 68.3 | 59.8 | 79.6 | 92.1 |
| Conflict of interests | CNN | 82.6* | 80.6 | 86.2 | 93.9 |
| | Regular expression | 48.7 | 33.8 | 87.1 | 97.8 |
| Compliance with animal welfare regulation | BERT‐SE | 91.4* | 91.4 | 92.0 | 70.9 |
| | Regular expression | 55.2 | 40.9 | 85.2 | 78.2 |
| Animal exclusions | CNN | 46.6 | 56.5 | 45.0 | 89.7 |
| | Regular expression | – | – | – | – |
Note: A regular expression approach has not been developed for animal exclusions. ‘CNN’ represents convolutional neural network; ‘RNN + Attn’ represents recurrent neural network with attention; ‘BERT‐SE’ represents BERT model with sentence extraction.
*p < 0.05 v regular expression approach, McNemar's test.
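McNemar's test compares two classifiers on the same test documents using only the discordant pairs, that is, the documents that one approach labels correctly and the other does not. The sketch below implements the exact (binomial) two-sided form using the standard library. The source does not say whether the exact or the chi-square variant was used, so the choice of variant, the function names and the input format are assumptions of this sketch.

```python
from math import comb
from typing import Sequence, Tuple


def discordant_counts(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Tuple[int, int]:
    """Count documents where only approach A is correct (b) or only approach B is correct (c)."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair favours either approach
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A p-value below 0.05 from `mcnemar_exact` would correspond to the significance threshold used in the table footnote.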
An example of model prediction and relevant sentence extraction for risk of bias items on a full‐text publication
| Risk of bias item | True | Prediction | High‐scored sentences |
|---|---|---|---|
| Random allocation | Yes | 99.97% | In the last 5 min of this habituation period, three 5 s, 56 dB, substartle‐threshold white noise tones were presented randomly by computer |
| Blinded assessment of outcome | Yes | 99.99% | Video records of 11 randomly selected animals were recorded by an observer blind to the experimental conditions |
| Conflict of interests | No | 0.32% | Schematic depictions of the regions dissected for neurochemical analysis are presented in Figure |
| Animal welfare regulations | No | 3.68% | Role of the amygdala in the coordination of behavioural, neuroendocrine and prefrontal cortical monoamine responses to psychological stress in the rat |
| Animal exclusions | Yes | 99.99% | In 8 of the original 26 lesioned animals in the pre‐training experiment, the lesions were judged incomplete by the criteria above and were excluded from the data analyses |
Note: Prediction probabilities are generated from the optimal model of each item, and most relevant sentences are extracted by the hierarchical attention network.
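Selecting high-scored sentences from sentence-level attention can be sketched as a ranking over normalised weights. In the sketch below the raw scores are invented numbers standing in for the output of a trained attention layer, and the `top_sentences` helper is hypothetical; it illustrates the ranking step only, not the hierarchical attention network itself.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalise raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_sentences(
    sentences: Sequence[str], raw_scores: Sequence[float], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k sentences with the largest attention weights, highest first."""
    if len(sentences) != len(raw_scores):
        raise ValueError("one score is required per sentence")
    weights = softmax(raw_scores)
    ranked = sorted(zip(sentences, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up sentences and scores for illustration only.
    doc = ["Animals were housed in groups.",
           "Outcomes were scored by an observer blind to the experimental conditions.",
           "Data are shown as mean and standard deviation."]
    print(top_sentences(doc, [0.1, 2.0, -1.0], k=1))
```

As the table shows for random allocation, the highest-weighted sentence is not guaranteed to be the one a human reviewer would cite, so extracted sentences are best treated as pointers for checking rather than as evidence.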
FIGURE 2. Most important words in the decision of classification for each risk of bias item, based on the average attention scores from RNN output over all positive samples
FIGURE 3. Percentages of false positives, false negatives, true positives and true negatives of each optimal model for the corresponding risk of bias item on the test set