Kun Zeng, Zhiwei Pan, Yibin Xu, Yingying Qu.
Abstract
BACKGROUND: Eligibility criteria are the main strategy for screening appropriate participants for clinical trials. Automatic analysis of clinical trial eligibility criteria by digital screening, leveraging natural language processing techniques, can improve recruitment efficiency and reduce the costs involved in promoting clinical research.
Keywords: Clinical trial; Deep learning; Eligibility criteria; Ensemble learning; Text classification
Year: 2020 PMID: 32609092 PMCID: PMC7367522 DOI: 10.2196/17832
Source DB: PubMed Journal: JMIR Med Inform
Examples of eligibility criteria texts and corresponding annotated categories.
| Eligibility criteria text | Annotated category |
| --- | --- |
| 年龄>80岁 (Age >80 years) | Age |
| 近期颅内或椎管内手术史 (Recent history of intracranial or spinal surgery) | Therapy or surgery |
| 血糖<2.7 mmol/L (Blood glucose <2.7 mmol/L) | Laboratory examinations |
| 2)性别不限,年龄18~70岁 (2) No gender restriction, aged 18-70 years) | Multiple |
| 合并造血系统或恶性肿瘤等严重原发性疾病 (Comorbid serious primary disease, such as hematopoietic system disease or a malignant tumor) | Disease |
| 其他研究者认为不适合参加本研究的患者 (Other patients considered by the investigators to be unsuitable for this study) | Researcher decision |
| 预期生存超过12周 (Expected survival over 12 weeks) | Life expectancy |
| 男、女不限 (Male or female) | Gender |
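To make the task format concrete, here is a minimal Python sketch (not from the paper) that encodes a few of the annotated examples above as (text, category) pairs and derives an integer label index; the paper's full category set may be larger than the 8 categories shown in the table.

```python
# Hypothetical encoding of the annotated examples above as (text, label) pairs.
# The category names come from the table; the paper's full label set may differ.
labeled_criteria = [
    ("年龄>80岁", "Age"),
    ("血糖<2.7 mmol/L", "Laboratory examinations"),
    ("预期生存超过12周", "Life expectancy"),
    ("男、女不限", "Gender"),
]

# Map each category name to an integer id for the classifiers.
categories = sorted({label for _, label in labeled_criteria})
label_to_id = {label: i for i, label in enumerate(categories)}

texts = [text for text, _ in labeled_criteria]
labels = [label_to_id[label] for _, label in labeled_criteria]
print(labels)  # -> [0, 2, 3, 1]
```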
Figure 1. The framework of the proposed model, which contains two layers: a word embedding layer consisting of 4 pretrained models (BERT, XLNet, ERNIE, and RoBERTa), and a model ensemble layer containing LightGBM, which learns by combining the outputs of the 4 pretrained models. BERT: Bidirectional Encoder Representations from Transformers; ERNIE: Enhanced Representation through Knowledge Integration; LightGBM: Light Gradient Boosting Machine; RoBERTa: A Robustly Optimized BERT Pretraining Approach.
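A minimal sketch of this two-layer design, assuming Hugging Face `transformers` checkpoints and the `lightgbm` package; the checkpoint list, label count, and hyperparameters are placeholders rather than the authors' exact configuration, and the fine-tuning of each base model on the criteria data is omitted for brevity.

```python
import numpy as np
import torch
from lightgbm import LGBMClassifier
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoints: the paper uses BERT, XLNet, ERNIE, and RoBERTa,
# but these exact model ids are assumptions, not the authors' choices.
CHECKPOINTS = ["bert-base-chinese", "hfl/chinese-roberta-wwm-ext"]
NUM_LABELS = 8  # the table above shows 8 categories; the full set may differ

def class_probabilities(checkpoint: str, texts: list[str]) -> np.ndarray:
    """Word embedding layer: per-class softmax probabilities from one
    (ideally fine-tuned) pretrained encoder."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=NUM_LABELS
    )
    model.eval()
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).numpy()

def stacked_features(texts: list[str]) -> np.ndarray:
    """Concatenate all base models' probability vectors into one feature
    matrix for the ensemble layer."""
    return np.hstack([class_probabilities(c, texts) for c in CHECKPOINTS])

# Model ensemble layer: LightGBM learns to combine the base models' outputs.
meta_learner = LGBMClassifier(n_estimators=200)
# meta_learner.fit(stacked_features(train_texts), train_labels)
# test_predictions = meta_learner.predict(stacked_features(test_texts))
```

Feeding class probabilities rather than hard labels to the meta-learner preserves each base model's confidence, which may explain why the stacked model edges out simple voting in the tables below.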
Figure 2. Histogram distributions of the training set, validation set, and test set. The y-axis represents the category labels, and the x-axis represents the number of samples.
The performance of our model and baseline models using the full training data set.
| Model | Accuracy | Precision | Recall | Macro F1 |
| --- | --- | --- | --- | --- |
| BERTᵃ | 0.836 | 0.779 | 0.802 | 0.788 |
| XLNet | 0.844 | 0.790 | 0.811 | 0.795 |
| ERNIEᵇ | 0.836 | 0.786 | 0.795 | 0.783 |
| RoBERTaᶜ | 0.840 | 0.791 | 0.800 | 0.792 |
| Ensemble (Voting) | 0.846 | 0.800 | 0.812 | 0.802 |
| Our model | 0.846 | 0.803 | 0.817 | 0.808 |
ᵃBERT: Bidirectional Encoder Representations from Transformers.
ᵇERNIE: Enhanced Representation through Knowledge Integration.
ᶜRoBERTa: A Robustly Optimized BERT Pretraining Approach.
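For comparison, the Ensemble (Voting) baseline in the table combines the 4 base models by hard majority voting over their predicted labels. A small sketch of that scheme, under the assumption that ties fall to the smallest label id (the paper's tie-breaking rule is not given here):

```python
import numpy as np

def majority_vote(per_model_preds: np.ndarray, num_labels: int) -> np.ndarray:
    """Hard-voting ensemble: per_model_preds has shape (n_models, n_samples)
    of integer label ids; ties resolve to the smallest label id via argmax."""
    return np.apply_along_axis(
        lambda col: np.bincount(col, minlength=num_labels).argmax(),
        axis=0,
        arr=per_model_preds,
    )

# Toy predictions from 4 hypothetical base models on 4 samples.
preds = np.array([
    [0, 2, 1, 1],  # BERT
    [0, 2, 2, 1],  # XLNet
    [0, 1, 1, 1],  # ERNIE
    [3, 2, 1, 0],  # RoBERTa
])
print(majority_vote(preds, num_labels=4))  # -> [0 2 1 1]
```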
The performance of the 6 models using the reduced training data set.
| Model | Accuracy | Precision | Recall | Macro F1 |
| --- | --- | --- | --- | --- |
| BERTᵃ | 0.831 | 0.781 | 0.776 | 0.771 |
| XLNet | 0.839 | 0.797 | 0.759 | 0.773 |
| ERNIEᵇ | 0.822 | 0.754 | 0.765 | 0.751 |
| RoBERTaᶜ | 0.832 | 0.795 | 0.770 | 0.776 |
| Ensemble (Voting) | 0.832 | 0.795 | 0.770 | 0.776 |
| Our model | 0.834 | 0.790 | 0.785 | 0.780 |
ᵃBERT: Bidirectional Encoder Representations from Transformers.
ᵇERNIE: Enhanced Representation through Knowledge Integration.
ᶜRoBERTa: A Robustly Optimized BERT Pretraining Approach.
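The four columns in both tables correspond to standard multi-class metrics with macro averaging (each category weighted equally). A sketch of how they can be computed with scikit-learn, which is an assumption about tooling rather than the authors' evaluation code:

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Toy gold and predicted label ids standing in for the test-set results.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("Macro F1 :", f1_score(y_true, y_pred, average="macro"))
```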