Yuanyuan Sun^1,2; Dongping Gao^1; Xifeng Shen^1; Meiting Li^1; Jiale Nan^1; Weining Zhang^1.
Abstract
BACKGROUND: With the growing prevalence of online consultation, large numbers of patient-doctor dialogues have accumulated. Produced in an authentic language environment, these dialogues are of significant value to research and development on intelligent question answering and automated triage in natural language processing.
Keywords: BERT; Bidirectional Encoder Representations from Transformers; China; Chinese; ERNIE; Enhanced Representation through Knowledge Integration; automatic classification; classification; machine learning; model; named entity; natural language processing; neural network; online consultation; patient doctor dialogue; patient-physician dialogue; semantics
Year: 2022 PMID: 35451969 PMCID: PMC9073616 DOI: 10.2196/35606
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Construction of our model. BERT: Bidirectional Encoder Representations from Transformers; CNN: convolutional neural network.
Figure 2. Bidirectional Encoder Representations from Transformers input characterization.
Figure 3. Transformer encoder structure.
Figure 4. An example of whole word masking in our model.
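Per Figure 1, the model couples a pretrained Transformer encoder with a convolutional neural network head for classification. The PyTorch sketch below shows one common way such a CNN head over encoder hidden states can be realized; the kernel sizes, filter count, and dropout are illustrative assumptions rather than the paper's reported hyperparameters, and in the full model the hidden states would come from the pretrained RoBERTa-WWM-ext encoder.

```python
import torch
import torch.nn as nn

class CNNClassifierHead(nn.Module):
    """TextCNN-style head over encoder hidden states (cf. Figure 1).

    Kernel sizes, filter count, and dropout are illustrative
    assumptions, not the paper's reported hyperparameters.
    """

    def __init__(self, hidden_size=768, num_labels=4,
                 kernel_sizes=(2, 3, 4), num_filters=128, dropout=0.1):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden_size, num_filters, k) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden); Conv1d wants channels first.
        x = hidden_states.transpose(1, 2)
        # Convolve, then max-pool each feature map over the sequence axis.
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.classifier(self.dropout(torch.cat(pooled, dim=1)))

# Smoke test with a dummy "encoder output"; in the full model these states
# would come from RoBERTa-WWM-ext.
logits = CNNClassifierHead()(torch.randn(2, 64, 768))
print(logits.shape)  # torch.Size([2, 4]) -- one logit per POS/NEG/OTHER/EMPTY
```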
Table 1. Label statistics of the training data (N=118,976) and the label distributions predicted by the 4 models on the test data (N=39,204).
| Label, n (%) | Training data | Test data: ERNIE^a | Test data: BERT^b | Test data: BERT-WWM^c | Test data: RoBERTa-WWM-ext + CNN^d |
| --- | --- | --- | --- | --- | --- |
| POS^e | 74,774 (62.85) | 25,163 (64.18) | 27,866 (71.08) | 26,116 (66.62) | 26,116 (66.62) |
| NEG^f | 14,086 (11.84) | 4271 (10.89) | 3125 (7.97) | 3871 (9.87) | 3871 (9.87) |
| OTHER^g | 6167 (5.18) | 1006 (2.57) | 684 (1.74) | 2587 (6.60) | 2587 (6.60) |
| EMPTY^h | 23,949 (20.13) | 8764 (22.35) | 7529 (19.20) | 6630 (16.91) | 6630 (16.91) |
^a ERNIE: Enhanced Representation through Knowledge Integration.
^b BERT: Bidirectional Encoder Representations from Transformers.
^c BERT-WWM: Bidirectional Encoder Representations from Transformers with whole word masking.
^d RoBERTa-WWM-ext + CNN: Robustly Optimized BERT Pretraining Approach with whole word masking, extended, plus a convolutional neural network.
^e POS: the tag "positive" is used when it can be determined that the patient has the symptoms, diseases, or corresponding entities that are likely to indicate a certain disease.
^f NEG: the tag "negative" is used when the disease and the symptoms are not related.
^g OTHER: the tag "other" is used when the user does not know, or when the answer is too unclear or ambiguous to infer the condition from.
^h EMPTY: the tag "empty" is used when an utterance has no practical meaning for determining the patient's condition, such as a doctor's explanation of general medical knowledge, inspection items, or drug names, independent of the patient's current condition.
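The percentages in the training column of Table 1 follow directly from the counts. The short Python sketch below reproduces them and pairs the 4 tags with a label-to-id mapping of the kind used when fine-tuning a classifier; the particular ids are an illustrative assumption, not taken from the paper.

```python
from collections import Counter

# Label-to-id mapping for 4-class fine-tuning; the specific ids are an
# illustrative assumption, not taken from the paper.
LABEL2ID = {"POS": 0, "NEG": 1, "OTHER": 2, "EMPTY": 3}

# Training-set label counts from Table 1 (N=118,976 in total).
train_counts = Counter(POS=74774, NEG=14086, OTHER=6167, EMPTY=23949)
total = sum(train_counts.values())  # 118,976

# Reproduce the "n (%)" training column of Table 1.
for tag, n in train_counts.items():
    print(f"{tag} (id {LABEL2ID[tag]}): {n} ({100 * n / total:.2f}%)")
# POS (id 0): 74774 (62.85%)
# NEG (id 1): 14086 (11.84%)
# OTHER (id 2): 6167 (5.18%)
# EMPTY (id 3): 23949 (20.13%)
```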
Table 2. Per-label recall (Rr), precision (Pr), and F1 scores and macro-averaged totals of the 4 models on the test data.
| Metric (%) | ERNIE^a | BERT^b | BERT-WWM^c | RoBERTa-WWM-ext + CNN^d |
| --- | --- | --- | --- | --- |
| POS^e-Rr^h | 87.32 | 87.11 | 89.82 | 89.23 |
| POS-Pr^f | 87.36 | 78.70 | 86.58 | 88.21 |
| POS-F1 | 87.34 | 82.69 | 88.17 | 88.72 |
| NEG^g-Rr | 67.70 | 41.50 | 66.96 | 70.14 |
| NEG-Pr | 71.03 | 59.46 | 77.51 | 77.30 |
| NEG-F1 | 69.33 | 48.88 | 71.85 | 73.54 |
| OTHER^i-Rr | 27.31 | 12.98 | 58.06 | 57.14 |
| OTHER-Pr | 52.68 | 36.84 | 43.58 | 45.06 |
| OTHER-F1 | 35.97 | 19.20 | 49.79 | 50.39 |
| EMPTY^j-Rr | 75.85 | 61.63 | 62.98 | 67.77 |
| EMPTY-Pr | 65.84 | 62.29 | 72.27 | 71.51 |
| EMPTY-F1 | 70.49 | 61.96 | 67.31 | 69.59 |
| Macro-Rr | 64.55 | 50.81 | 69.46 | 71.07 |
| Macro-Pr | 69.23 | 59.32 | 69.98 | 70.52 |
| Total score (Macro-F1) | 65.78 | 53.18 | 69.28 | 70.56 |
^a ERNIE: Enhanced Representation through Knowledge Integration.
^b BERT: Bidirectional Encoder Representations from Transformers.
^c BERT-WWM: Bidirectional Encoder Representations from Transformers with whole word masking.
^d RoBERTa-WWM-ext + CNN: Robustly Optimized BERT Pretraining Approach with whole word masking, extended, plus a convolutional neural network.
^e POS: positive (defined in Table 1).
^f Pr: precision rate.
^g NEG: negative (defined in Table 1).
^h Rr: recall rate.
^i OTHER: other (defined in Table 1).
^j EMPTY: empty (defined in Table 1).
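The macro rows in Table 2 are unweighted means over the 4 labels, and the total score is macro-F1, that is, the mean of the per-label F1 values rather than the F1 of the macro-averaged Rr and Pr. A short Python check against ERNIE's column (using the rounded table values, so the last printed digit may drift slightly):

```python
# ERNIE's per-label scores from Table 2.
ernie = {
    "POS":   {"Rr": 87.32, "Pr": 87.36, "F1": 87.34},
    "NEG":   {"Rr": 67.70, "Pr": 71.03, "F1": 69.33},
    "OTHER": {"Rr": 27.31, "Pr": 52.68, "F1": 35.97},
    "EMPTY": {"Rr": 75.85, "Pr": 65.84, "F1": 70.49},
}

def macro(per_label, metric):
    """Unweighted mean of one metric over all labels."""
    return sum(scores[metric] for scores in per_label.values()) / len(per_label)

print(f"Macro-Rr: {macro(ernie, 'Rr'):.2f}")  # table: 64.55
print(f"Macro-Pr: {macro(ernie, 'Pr'):.2f}")  # table: 69.23
print(f"Macro-F1: {macro(ernie, 'F1'):.2f}")  # table: 65.78
```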