| Literature DB >> 36068551 |
Jianfu Li, Qiang Wei, Omid Ghiasvand, Miao Chen, Victor Lobanov, Chunhua Weng, Hua Xu.
Abstract
BACKGROUND: Clinical trial protocols are the foundation for advancing medical sciences, however, the extraction of accurate and meaningful information from the original clinical trials is very challenging due to the complex and unstructured texts of such documents. Named entity recognition (NER) is a fundamental and necessary step to process and standardize the unstructured text in clinical trials using Natural Language Processing (NLP) techniques.Entities:
Keywords: Clinical trial; Eligibility criteria; Named entity recognition; Pre-trained language model
Year: 2022 PMID: 36068551 PMCID: PMC9450226 DOI: 10.1186/s12911-022-01967-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Basic information and statistics of entities in the three EC corpora for NER
| Corpus | EliIE | Covance | Chia |
|---|---|---|---|
| Number of documents | 230 | 470 | 1000 |
| Source | ClinicalTrials.gov | In-house by Covance | ClinicalTrials.gov |
| Disease Areas | Alzheimer’s disease only | All diseases | All diseases |
Main entities (entity types)—Count (number of occurrence) in the three EC corpora; numbers in the parentheses are nested occurrence for Chia corpus
| EliIE | | Covance | | Chia | |
|---|---|---|---|---|---|
| Main entities | Count | Main entities | Count | Main entities | Count |
| Condition | 4138 | Condition | 21,022 | Condition | 12,039 (127) |
| Drug | 1465 | Drug | 13,671 | Drug | 3801 (24) |
| Qualifier | 1715 | Qualifier_Modifier | 12,953 | Qualifier | 4157 (127) |
| Measurement | 1029 | Measurement | 7732 | Measurement | 3305 (9) |
| Procedure_Device | 652 | Procedure | 5635 | Procedure | 3595 (54) |
| Observation | 1765 | Observation | 12,391 | Observation | 1216 (19) |
| Temporal_measurement | 812 | Temporal_constraint | 11,326 | Temporal | 3580 (1066) |
| Anatomic_location | 83 | Anatomic_location | 648 | Negation | 843 (0) |
| | | Negation_Cue | 1551 | Device | 386 (2) |
| | | Event | 4053 | Multiplier | 671 (8) |
| | | Permission_Cue | 2108 | Person | 1666 (2) |
| | | Demographics | 869 | Value | 4002 (60) |
| | | Device | 360 | Visit | 165 (1) |
| | | Refractory_condition | 662 | Mood | 616 (13) |
| | | Investigational_product | 559 | Reference_point | 934 (116) |
Fig. 1 Examples of conversions of non-flat entities in the Chia corpus. Left: nested entities; right: disjoint entities
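The conversion shown in Fig. 1 matters because token-classification models assign exactly one label per token, so nested annotations must be flattened before training. A minimal sketch of one such conversion, keeping the longest span when entities overlap (an illustrative rule, not necessarily the paper's exact procedure):

```python
def flatten_to_bio(tokens, entities):
    # entities: list of (start, end, label) token spans, end exclusive;
    # nested spans are resolved by keeping the longest one (an assumed
    # tie-breaking rule for this sketch)
    labels = ["O"] * len(tokens)
    for start, end, label in sorted(entities, key=lambda e: e[0] - e[1]):
        if all(l == "O" for l in labels[start:end]):
            labels[start] = f"B-{label}"
            for i in range(start + 1, end):
                labels[i] = f"I-{label}"
    return labels

tokens = ["history", "of", "breast", "cancer"]
# "breast cancer" (Condition) nested inside the longer Observation span
ents = [(0, 4, "Observation"), (2, 4, "Condition")]
print(flatten_to_bio(tokens, ents))
# → ['B-Observation', 'I-Observation', 'I-Observation', 'I-Observation']
```

Because the inner Condition span is fully covered by the Observation span, it is dropped from the flat BIO sequence.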
Fig. 2 Architecture of the NER task using pre-trained transformer models
Hyperparameters used for all the transformer models
| Hyperparameters | Value |
|---|---|
| Training epochs | 10 |
| Learning rate | 5.00E−05 |
| Adam epsilon | 1.00E−08 |
| Training batch size | 8 |
| Maximum sequence length | 256 |
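The table's values map directly onto a fine-tuning configuration; a small sketch (the dictionary keys and helper function are illustrative, not from the paper's code) that also derives the total optimizer steps a linear learning-rate schedule would be configured with:

```python
# Hyperparameters from the table above (values as reported)
hparams = {
    "num_train_epochs": 10,
    "learning_rate": 5e-5,
    "adam_epsilon": 1e-8,
    "train_batch_size": 8,
    "max_seq_length": 256,
}

def total_training_steps(num_examples, hp):
    # batches per epoch (ceil division covers a final partial batch)
    # multiplied by the number of epochs
    batches = -(-num_examples // hp["train_batch_size"])
    return batches * hp["num_train_epochs"]

print(total_training_steps(1000, hparams))  # → 1250
```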
The strict and relaxed overall performance on the test sets of the Covance, EliIE, and Chia corpora
| Models | Covance | | | EliIE | | | Chia | | |
|---|---|---|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 | P | R | F1 |
| BERT | 0.691 (0.810) | 0.719 (0.849) | 0.705 (0.829) | 0.810 (0.877) | 0.842 (0.917) | 0.826 (0.896) | 0.577 (0.701) | 0.620 (0.761) | 0.598 (0.730) |
| SpanBERT | 0.692 (0.810) | 0.718 (0.847) | 0.705 (0.828) | 0.813 (0.879) | 0.843 (0.917) | 0.828 (0.897) | 0.593 (0.711) | 0.628 (0.758) | 0.610 (0.734) |
| BioBERT | 0.694 (0.812) | 0.722 (0.851) | 0.708 (0.831) | 0.810 (0.879) | 0.837 (0.915) | 0.823 (0.896) | 0.589 (0.707) | 0.632 (0.765) | 0.609 (0.735) |
| BlueBERT | 0.689 (0.807) | 0.718 (0.848) | 0.703 (0.827) | 0.811 (0.880) | 0.838 (0.917) | 0.824 (0.898) | 0.590 (0.702) | 0.616 (0.737) | 0.603 (0.719) |
| PubMedBERT | | | | | | | | | |
| SciBERT | 0.696 (0.813) | 0.723 (0.850) | 0.709 (0.831) | 0.813 (0.883) | 0.839 (0.915) | 0.825 (0.899) | 0.589 (0.709) | 0.634 (0.768) | 0.611 (0.737) |
Statistical significance of the bold strict F1 scores was assessed with the Wilcoxon rank sum test, a non-parametric method that determines, using ranks rather than the raw scores, whether the strict F1 scores from the 10-fold experiments of the PubMedBERT model and those of each other model (BERT, SpanBERT, BioBERT, SciBERT) are statistically different. See reference [33] for the detailed definition of the test
Numbers in the parentheses are results based on relaxed criteria
*Indicates p < 0.05 when comparing to other pre-trained models
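This record does not define the two criteria, but by convention "strict" requires an exact span-and-type match while "relaxed" credits any overlapping span of the same type. A sketch under that assumption (function and variable names are illustrative):

```python
def span_f1(gold, pred, strict=True):
    # gold/pred: lists of (start, end, label) spans, end exclusive
    def match(p, others):
        for g in others:
            if strict:
                if p == g:  # exact boundaries and type
                    return True
            elif p[2] == g[2] and p[0] < g[1] and g[0] < p[1]:
                # same type and overlapping boundaries
                return True
        return False

    tp_p = sum(match(p, gold) for p in pred)   # predictions with a match
    tp_g = sum(match(g, pred) for g in gold)   # gold spans with a match
    prec = tp_p / len(pred) if pred else 0.0
    rec = tp_g / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

gold = [(0, 4, "Condition"), (6, 8, "Drug")]
pred = [(0, 3, "Condition"), (6, 8, "Drug")]  # first span is one token short
print(span_f1(gold, pred, strict=True))   # → (0.5, 0.5, 0.5)
print(span_f1(gold, pred, strict=False))  # → (1.0, 1.0, 1.0)
```

The gap between the two rows in the tables above reflects exactly these partial-boundary predictions, which relaxed matching forgives.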
The strict performance of the PubMedBERT model for each main entity across the three corpora
| Main entities | Covance | | | Main entities | EliIE | | | Main entities | Chia | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | P | R | F1 | | P | R | F1 | | P | R | F1 |
| Condition | 0.783 | 0.806 | 0.795 | Condition | 0.871 | 0.892 | 0.881 | Condition | 0.742 | 0.773 | 0.757 |
| Drug | 0.734 | 0.762 | 0.748 | Drug | 0.850 | 0.881 | 0.865 | Drug | 0.747 | 0.798 | 0.771 |
| Qualifier_Modifier | 0.597 | 0.599 | 0.598 | Qualifier | 0.780 | 0.814 | 0.796 | Qualifier | 0.444 | 0.486 | 0.462 |
| Measurement | 0.786 | 0.818 | 0.801 | Measurement | 0.863 | 0.871 | 0.866 | Measurement | 0.669 | 0.689 | 0.678 |
| Procedure | 0.651 | 0.674 | 0.662 | Procedure_Device | 0.725 | 0.765 | 0.742 | Procedure | 0.574 | 0.630 | 0.600 |
| Observation | 0.651 | 0.679 | 0.664 | Observation | 0.754 | 0.792 | 0.771 | Observation | 0.278 | 0.260 | 0.267 |
| Temporal_constraint | 0.717 | 0.751 | 0.733 | Temporal_measurement | 0.807 | 0.829 | 0.815 | Temporal | 0.552 | 0.638 | 0.592 |
| Anatomic_location | 0.458 | 0.407 | 0.429 | Anatomic_location | 0.519 | 0.499 | 0.507 | Negation | 0.569 | 0.626 | 0.595 |
| Negation_Cue | 0.500 | 0.502 | 0.501 | | | | | Device | 0.528 | 0.515 | 0.520 |
| Event | 0.814 | 0.848 | 0.830 | | | | | Multiplier | 0.374 | 0.406 | 0.388 |
| Permission_Cue | 0.578 | 0.635 | 0.604 | | | | | Person | 0.795 | 0.824 | 0.808 |
| Demographics | 0.714 | 0.743 | 0.727 | | | | | Value | 0.727 | 0.745 | 0.735 |
| Device | 0.565 | 0.567 | 0.559 | | | | | Visit | 0.504 | 0.579 | 0.530 |
| Refractory_condition | 0.519 | 0.586 | 0.547 | | | | | Mood | 0.302 | 0.360 | 0.325 |
| Investigational_product | 0.657 | 0.630 | 0.641 | | | | | Reference_point | 0.398 | 0.524 | 0.453 |
The strict performance for the common main entities of Covance with augmented corpora using the PubMedBERT model
| Main entities | Covance | | | Covance + EliIE | | | Covance + Chia | | | Covance + EliIE + Chia | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 | P | R | F1 | P | R | F1 |
| Condition | 0.783 | 0.806 | 0.795 | 0.784 | 0.808 | 0.796 | 0.765 | 0.801 | 0.783 | 0.767 | 0.799 | 0.782 |
| Drug | 0.734 | 0.762 | 0.748 | 0.734 | 0.761 | 0.747 | 0.731 | 0.754 | 0.742 | 0.727 | 0.756 | 0.741 |
| Measurement | 0.786 | 0.818 | 0.801 | 0.783 | 0.814 | 0.798 | 0.751 | 0.790 | 0.770 | 0.748 | 0.786 | 0.766 |
| Observation | 0.651 | 0.679 | 0.664 | 0.651 | 0.678 | 0.664 | 0.643 | 0.657 | 0.650 | 0.650 | 0.661 | 0.655 |
| Procedure | 0.651 | 0.674 | 0.662 | 0.652 | 0.660 | 0.656 | 0.636 | 0.665 | 0.650 | 0.632 | 0.663 | 0.647 |
| Qualifier_Modifier | 0.597 | 0.599 | 0.598 | 0.602 | 0.595 | 0.598 | 0.580 | 0.572 | 0.576 | 0.584 | 0.579 | 0.581 |
| Temporal_constraint | 0.717 | 0.751 | 0.733 | 0.720 | 0.751 | 0.735 | 0.707 | 0.750 | 0.728 | 0.707 | 0.748 | 0.727 |
| Overall | 0.704 | 0.727 | 0.715 | 0.712 | 0.731 | | 0.697 | 0.720 | 0.708 | 0.697 | 0.721 | 0.709 |
Statistical significance of the bold strict F1 scores was assessed with the Wilcoxon rank sum test, a non-parametric method that determines, using ranks rather than the raw scores, whether the strict F1 scores from the 10-fold experiments of the PubMedBERT model on each augmented corpus and those on the original Covance corpus are statistically different. See reference [33] for the detailed definition of the test
*Indicates p < 0.05 when comparing to the original Covance corpus
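The rank-sum statistic behind this test can be computed without any statistics library; a sketch on illustrative fold-level F1 scores (the numbers below are made up, not from the paper):

```python
def rank_sum_statistic(a, b):
    # Wilcoxon rank-sum: pool both samples, assign 1-based ranks
    # (averaging ranks across ties), then sum the ranks of sample a
    pooled = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank over the tie group
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg
        i = j + 1
    return sum(ranks[i] for i in range(len(a)))

# Hypothetical strict F1 scores from two models over five folds
f1_a = [0.71, 0.72, 0.70, 0.73, 0.71]
f1_b = [0.69, 0.70, 0.68, 0.69, 0.70]
print(rank_sum_statistic(f1_a, f1_b))  # → 39.0
```

The p-value then comes from comparing this statistic to its null distribution (e.g. via `scipy.stats.ranksums`); a small p-value, as marked by the asterisks above, means the two models' fold-level F1 distributions genuinely differ.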
Computational time for training all the models on three corpora
| Models | Training time (seconds per epoch) | | |
|---|---|---|---|
| | Covance | EliIE | Chia |
| BERT | 518.4 | 69.9 | 212.3 |
| SpanBERT | 520.3 | 70.5 | 212.3 |
| BioBERT | 343.4 | 30.9 | 92.6 |
| BlueBERT | 529.8 | 69.6 | 212.6 |
| PubMedBERT | 395.7 | 30.7 | 92.5 |
| SciBERT | 341.7 | 30.5 | 92.3 |
Percentages of vocabulary coverage of BERT, PubMedBERT, and SciBERT in the EliIE, Covance, and Chia corpora
| | EliIE (%) | Covance (%) | Chia (%) |
|---|---|---|---|
| BERT | 47.5 | 28.1 | 34.3 |
| PubMedBERT | 63.2 | 44.4 | 53.4 |
| SciBERT | 54.8 | 34.1 | 41.9 |
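Coverage here presumably means the share of corpus word types found verbatim in each model's vocabulary; the exact computation is not given in this record, so the sketch below is one plausible reading (whole-word matching, ignoring subword pieces; names and data are illustrative):

```python
def vocab_coverage(corpus_tokens, vocab):
    # fraction of distinct (lowercased) corpus tokens that appear
    # verbatim in the model vocabulary -- subword pieces are not
    # considered, which is an assumption of this sketch
    types = {t.lower() for t in corpus_tokens}
    return len(types & vocab) / len(types)

# Toy vocabulary and a toy eligibility-criteria sentence
vocab = {"patients", "with", "diabetes", "a1c", "hba1c"}
tokens = ["Patients", "with", "uncontrolled", "diabetes", "and", "HbA1c"]
print(round(vocab_coverage(tokens, vocab), 3))  # → 0.667
```

Under this reading, the table suggests why PubMedBERT, whose vocabulary was built from biomedical text, covers these corpora best and tends to fragment clinical terms into fewer subwords than general-domain BERT.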