Sungmin Aum, Seon Choe.
Abstract
BACKGROUND: Systematic reviews (SRs) are recognized as reliable evidence, enabling evidence-based medicine to be applied in clinical practice. However, because an SR requires significant effort, its creation is time-consuming, which often leads to out-of-date results. Tools for automating SR tasks have therefore been considered; however, applying a general natural language processing model to domain-specific articles, together with insufficient text data for training, poses challenges.
Keywords: Deep learning; Process automation; Systematic review; Text mining
Year: 2021 PMID: 34717768 PMCID: PMC8556883 DOI: 10.1186/s13643-021-01763-w
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Fig. 1 Procedure of building the srBERT model using datasets obtained via previous SRs. The abstracts of documents downloaded in EndNote are used to create the model vocabulary and pre-train the model. Data categorized as "Title," obtained through manual screening, were used for the fine-tuning of srBERT. SR, systematic review; BERT, Bidirectional Encoder Representations from Transformers
Fig. 2 Compositions of the three BERT models. srBERT was pre-trained with domain-specific literature data, whereas the original BERT model was pre-trained using Wikipedia and books. srBERTmy used the vocabulary created from domain-specific literature data, whereas srBERTmix used that of the original BERT model. All three models were fine-tuned using titles from literature data. SR, systematic review; BERT, Bidirectional Encoder Representations from Transformers
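The classical baselines in the tables below (K-neighbors, SVC, decision tree, random forest, AdaBoost, multinomial naive Bayes) classify articles from title text alone. A minimal from-scratch sketch of one such title screener, multinomial naive Bayes, in plain Python; the tokenization, smoothing, and class names here are illustrative assumptions, not the paper's actual pipeline:

```python
import math
from collections import Counter

def tokenize(title):
    # Lowercase whitespace tokenization; the paper's preprocessing is not
    # specified in this excerpt, so this is an illustrative simplification.
    return title.lower().split()

class TitleNB:
    """Multinomial naive Bayes over title tokens, one of the classical
    baselines compared against srBERT in the screening tables."""

    def fit(self, titles, labels):
        self.classes = set(labels)
        # Log class priors from label frequencies.
        self.priors = {c: math.log(labels.count(c) / len(labels))
                       for c in self.classes}
        # Per-class token counts.
        self.counts = {c: Counter() for c in self.classes}
        for t, y in zip(titles, labels):
            self.counts[y].update(tokenize(t))
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, title):
        toks = tokenize(title)
        best, best_lp = None, -math.inf
        for c in self.classes:
            total = sum(self.counts[c].values())
            # Log-likelihood with Laplace (add-one) smoothing.
            lp = self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / (total + len(self.vocab)))
                for w in toks)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

Trained on a handful of labeled titles, the model assigns a new title to the class whose token distribution it best matches; the deep models in the tables replace these bag-of-words counts with contextual representations.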
Performance of the models for the first task of article screening using the original dataset A
| Metric | srBERTmy250K | srBERTmix | Original BERT | K-neighbors | SVC | DecisionTree | RandomForest | Adaboost | MultinomialNB |
|---|---|---|---|---|---|---|---|---|---|
| AUC | 76.785 | 50.000 | 50.685 | 57.985 | 50.000 | 57.449 | 53.650 | 55.097 | 50.000 |
| Accuracy | 94.353 | 89.945 | 90.083 | 90.083 | 89.945 | 89.118 | 89.945 | 90.358 | 89.945 |
| Precision | 83.333 | 0.000 | 100.000 | 52.000 | 0.000 | 40.620 | 50.000 | 61.538 | 0.000 |
| Recall | 54.795 | 0.000 | 13.60 | 17.808 | 0.000 | 17.808 | 8.219 | 10.959 | 0.000 |
| F1 | 66.116 | 0.000 | 26.84 | 26.531 | 0.000 | 24.762 | 14.118 | 18.605 | 0.000 |
SR systematic review, BERT bidirectional encoder representations from transformers, srBERTmy250K srBERTmy model trained for 250 K steps, AUC area under the curve, SVC support vector classification, MultinomialNB multinomial naive Bayes model
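The precision, recall, and F1 rows are related by the identity F1 = 2PR/(P + R), all computed from the same confusion counts. As a quick consistency check against the srBERTmy250K column (the counts TP = 40, FP = 8, FN = 33 are illustrative values chosen to reproduce that column's reported percentages, not taken from the paper):

```python
def prf(tp, fp, fn):
    # Precision, recall, and F1 (as percentages, matching the table)
    # from confusion counts: true/false positives and false negatives.
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts consistent with the srBERTmy250K column:
# precision 83.333, recall 54.795, F1 66.116.
p, r, f1 = prf(tp=40, fp=8, fn=33)
```

Note that F1 simplifies to 100 · 2TP/(2TP + FP + FN), so any reported (P, R, F1) triple must satisfy this identity regardless of the underlying counts.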
Performance of the models for the first task of article screening using the adjusted dataset A
| Metric | srBERTmy355K | srBERTmix | Original BERT | K-neighbors | SVC | DecisionTree | RandomForest | Adaboost | MultinomialNB |
|---|---|---|---|---|---|---|---|---|---|
| AUC | 90.016 | 50.000 | 50.000 | 58.976 | 50.000 | 66.258 | 66.431 | 57.319 | 53.158 |
| Accuracy | 89.380 | 77.120 | 71.009 | 75.590 | 77.123 | 77.594 | 78.420 | 78.066 | 77.241 |
| Precision | 68.900 | 0.000 | 0.000 | 44.715 | 0.000 | 51.163 | 53.416 | 56.061 | 51.515 |
| Recall | 91.100 | 0.000 | 0.000 | 28.351 | 0.000 | 45.361 | 44.330 | 19.072 | 8.763 |
| F1 | 78.460 | 0.000 | 0.000 | 34.700 | 0.000 | 48.087 | 48.451 | 28.462 | 14.978 |
SR systematic review, BERT bidirectional encoder representations from transformers, srBERTmy355K srBERTmy model trained for 355 K steps, AUC area under the curve, SVC support vector classification, MultinomialNB multinomial naive Bayes model
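Several baselines in both screening tables report AUC = 50.000 together with zero precision and recall: they collapse to the majority ("exclude") class and rank no positive article above any negative one. AUC can be read as the Mann–Whitney probability that a randomly chosen positive receives a higher score than a randomly chosen negative, which makes the 50.000 floor for a constant scorer explicit. A minimal sketch (the function name and toy scores are illustrative, not from the paper):

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a random positive is scored
    above a random negative, with ties counting half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair contributes 1 for a correct ranking,
    # 0.5 for a tie, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    # Reported as a percentage, matching the tables.
    return 100 * wins / (len(pos) * len(neg))
```

A model that assigns the same score to every article (as a majority-class predictor effectively does) ties every pair and scores exactly 50.0, which is why AUC separates the degenerate baselines from srBERTmy even when their raw accuracies look comparable.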
Performance of srBERTmy with respect to the learning steps for the second task (relation extraction) using dataset B
| Metric | srBERTmy50K | srBERTmy100K | srBERTmy150K | srBERTmy200K | srBERTmy250K | srBERTmy300K | srBERTmy350K | srBERTmix | Original BERT |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.922 | 0.935 | 0.896 | 0.909 | 0.922 | 0.909 | 0.909 | 0.922 | 0.922 |
| Loss | 0.337 | 0.270 | 0.542 | 0.540 | 0.328 | 0.658 | 0.658 | 0.309 | 0.232 |
SR systematic review, BERT bidirectional encoder representations from transformers, srBERTmy#K srBERTmy model trained for # K steps
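Across learning steps, accuracy and loss do not move together: srBERTmy300K and srBERTmy350K match srBERTmy200K's accuracy (0.909) but at a higher loss (0.658 vs. 0.540). Assuming the "Loss" row is the standard mean cross-entropy of a classifier (this excerpt does not state the loss function explicitly), it depends on the probability the model assigns to each true label, not just on whether the argmax is correct:

```python
import math

def mean_cross_entropy(true_label_probs):
    # Mean negative log-likelihood of the true class. Two models with the
    # same accuracy can differ here if one is less well calibrated, i.e.,
    # assigns lower probability to the labels it gets right.
    return -sum(math.log(p) for p in true_label_probs) / len(true_label_probs)
```

For example, a model that assigns probability 0.5 to every true label has loss ln 2 ≈ 0.693 even if all of its argmax predictions happen to be correct.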