Qianlong Liu, Kangenbei Liao, Kelvin Kam-Fai Tsoi, Zhongyu Wei.
Abstract
BACKGROUND: With the development of e-Health, predicting whether a doctor's answer in an online healthcare community will be accepted by the patient is increasingly important. Unlike previous work, which focuses mainly on numerical features, our framework combines both numerical and textual information to predict the acceptance of answers. The textual information consists of questions posted by patients and answers posted by doctors. To extract textual features, we first train a sentence encoder on a held-out dataset to encode a question-answer pair into a co-dependent representation. We then use this representation to predict whether the doctor's answer will be accepted.
Keywords: Co-attention mechanism; Deep learning; Natural language processing; Online healthcare community
Year: 2019 PMID: 31760931 PMCID: PMC6876081 DOI: 10.1186/s12859-019-3129-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 An example from the youwenbida (Q&A) section, including the patient’s information, the doctor’s information, the question-and-answer content, etc. Details can be viewed at http://club.xywy.com/static/20160401/104740588.htm
Fig. 2 The pipeline for data preprocessing and feature extraction. Textual and numerical information are extracted separately; the resulting numerical and textual features are then joined to predict the acceptance of answers. The details of the sentence encoder are shown in Fig. 3
Fig. 3 Architecture of our sentence encoder. The question and answer are first encoded by an LSTM and a non-linear layer as Q and A respectively; then Q and A are encoded by the co-attention encoder [30] as C. Next, a context function maps C into a vector h, i.e., the representation of the given question-answer pair, which is fed into a softmax layer for binary classification, i.e., accepted or not
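The co-attention step in Fig. 3 can be illustrated with a minimal sketch. This is not the authors' implementation (which builds on [30] with LSTM states and a trained context function); it only shows the core idea: compute an affinity matrix between question and answer token vectors, then use softmax-normalized affinities to build, for each answer token, a context vector over the question. All names and the toy 2-dimensional "embeddings" are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def co_attention(Q, A):
    """Toy co-attention: Q and A are lists of d-dim token vectors
    (standing in for the LSTM states of question and answer).
    Returns, for each answer token, a context vector over Q."""
    # Affinity matrix: L[i][j] = dot(Q[i], A[j])
    L = [[sum(qk * ak for qk, ak in zip(q, a)) for a in A] for q in Q]
    contexts = []
    for j in range(len(A)):
        # Attention weights of answer token j over all question tokens
        w = softmax([L[i][j] for i in range(len(Q))])
        # Weighted sum of question vectors -> context for answer token j
        ctx = [sum(w[i] * Q[i][k] for i in range(len(Q)))
               for k in range(len(Q[0]))]
        contexts.append(ctx)
    return contexts

# Tiny usage example: 2 question tokens, 1 answer token, d = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]
C = co_attention(Q, A)
```

In the full model this runs in both directions (question-over-answer and answer-over-question, visualized in Fig. 4), and the resulting co-dependent representation C is summarized by the context function into the vector h.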
Numerical features used in the model
| Category | Feature | Remarks |
|---|---|---|
| Patient’s information | Patient’s age | integer, 0–98 |
| | Patient’s gender | categorical: male/female |
| Doctor’s information | Doctor’s title | categorical: chief physician/attending physician/assistant physician |
| | Doctor’s reputation | categorical: level1/level2/level3 |
| | Number of patients the doctor helped | integer, 0–406342 |
| | Number of gratitude the doctor received | integer, 0–13883 |
| Answering information | Number of Chinese characters in the answer | integer, 12–734 |
| | Time difference between question and answer | continuous (in seconds), 0–8005.37 |
| | Answer’s order under the corresponding question | integer, 1–25 |
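As a sketch of how the numerical features in the table could be turned into a model input vector, the following encodes categorical fields as one-hot vectors and keeps numeric fields as floats. The field names and category vocabularies are hypothetical (the paper does not specify its exact encoding); the value ranges follow the table above.

```python
# Hypothetical encoding of the numerical features listed above.
GENDERS = ["male", "female"]
TITLES = ["chief physician", "attending physician", "assistant physician"]
LEVELS = ["level1", "level2", "level3"]

def one_hot(value, vocab):
    """One-hot encode a categorical value against a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocab]

def encode_record(rec):
    """rec: dict with the fields from the table; returns a flat feature vector."""
    return (
        [float(rec["age"])]
        + one_hot(rec["gender"], GENDERS)
        + one_hot(rec["title"], TITLES)
        + one_hot(rec["reputation"], LEVELS)
        + [float(rec["patients_helped"]),
           float(rec["gratitude_received"]),
           float(rec["answer_chars"]),
           float(rec["time_diff_seconds"]),
           float(rec["answer_order"])]
    )

# Example record: 1 numeric + 2 + 3 + 3 one-hot + 5 numeric = 14 dimensions
vec = encode_record({
    "age": 35, "gender": "female", "title": "attending physician",
    "reputation": "level2", "patients_helped": 120,
    "gratitude_received": 8, "answer_chars": 90,
    "time_diff_seconds": 42.5, "answer_order": 1,
})
```

A vector like this would be concatenated with the textual features from the sentence encoder (Fig. 2) before classification.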
Description of textual information
| Statistic | Value |
|---|---|
| average # of Chinese characters in questions | 73.6 |
| average # of Chinese characters in answers | 94.0 |
| average # of answers per question | 2.36 |
| # of patients | 6352 |
| # of doctors | 1394 |
Definition of metrics
| metric | definition |
|---|---|
| Recall | TP / (TP + FN) |
| Precision | TP / (TP + FP) |
| Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| F1-score | 2 × (Precision × Recall) / (Precision + Recall) |
| AUC | The area under ROC curve |
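The metrics in the table (apart from AUC, which requires ranked scores) follow directly from the confusion-matrix counts. A minimal sketch, using the standard definitions:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute recall, precision, accuracy and F1-score
    from confusion-matrix counts (true/false positives/negatives)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}

# Example: 30 accepted answers correctly predicted, 10 missed,
# 50 rejections correctly predicted, 10 false alarms
m = classification_metrics(tp=30, fp=10, tn=50, fn=10)
```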
Model results
| Feature set | Model | F1-score | AUC | Accuracy | Recall | Precision |
|---|---|---|---|---|---|---|
| Textual features | LSTM-GBC | 0.396 ±0.022 | 0.626 ±0.020 | 0.628 ±0.011 | 0.313 ±0.022 | 0.540 ±0.025 |
| Textual features | MLP-GBC | 0.349 ±0.008 | 0.619 ±0.016 | 0.627 ±0.009 | 0.257 ±0.006 | 0.545 ±0.027 |
| Numerical features | GBC | 0.729 ±0.015 | 0.859 ±0.015 | 0.803 ±0.009 | 0.677 ±0.027 | |
| All features | LSTM-GBC | 0.734 ±0.023 | 0.862 ±0.012 | 0.804 ±0.016 | 0.694 ±0.025 | 0.779 ±0.023 |
| All features | MLP-GBC | 0.772 ±0.019 | | | | |
Significance indicates, for each metric, which model performs better than the other models
Fig. 4 An example of the attention weights for a question and its accepted answer. a The attention weights across the question for each word in the answer. b The attention weights across the answer for each word in the question. Stop words in the question and answer text are removed, and out-of-vocabulary words are replaced with UNK. The web page of this question and answer is available at http://club.xywy.com/static/20160111/99078139.htm