Shoko Wakamiya, Mizuki Morita, Yoshinobu Kano, Tomoko Ohkuma, Eiji Aramaki.
Abstract
BACKGROUND: The amount of medical and clinical information on the Web is increasing. Among the different types of information available, social media-based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) task provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Participants then classify each tweet into 1 of 2 categories: those containing a patient's symptom and those that do not.
Keywords: artificial intelligence; infodemiology; infoveillance; machine learning; natural language processing; social media; surveillance; text mining
Year: 2019 PMID: 30785407 PMCID: PMC6401666 DOI: 10.2196/12783
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Ratio of positive labels.
| Symptom | Ratio of number of positive tweets to the number of each symptom’s tweets (N=320 tweets) | Ratio of number of positive tweets to the total number of all symptoms’ tweets (N=2560 tweets) |
| Cold, n (%) | 220 (0.6875) | 355 (0.1387) |
| Cough, n (%) | 295 (0.9219) | 306 (0.1195) |
| Diarrhea, n (%) | 230 (0.7188) | 246 (0.0961) |
| Fever, n (%) | 220 (0.6875) | 438 (0.1711) |
| Hay fever, n (%) | 208 (0.6500) | 209 (0.0816) |
| Headache, n (%) | 260 (0.8125) | 328 (0.1281) |
| Flu, n (%) | 128 (0.4000) | 130 (0.0508) |
| Runny nose, n (%) | 257 (0.8031) | 499 (0.1949) |
Samples of the training data corpus for the English subtask.
| Tweet ID | Message | s1 (a) | s2 | s3 | s4 | s5 | s6 | s7 | s8 |
| 1en (b) | The cold makes my whole body weak. | p (c) | n (d) | n | n | n | n | n | n |
| 2en | It’s been a while since I’ve had allergy symptoms. | n | n | n | n | p | n | n | p |
| 3en | I’m so feverish and out of it because of my allergies. I’m so sleepy. | n | n | n | p | p | n | n | p |
| 4en | I took some medicine for my runny nose, but it won’t stop. | n | n | n | n | n | n | n | p |
| 5en | I had a bad case of diarrhea when I traveled to Nepal. | n | n | n | n | n | n | n | n |
| 6en | It takes a millennial wimp to call in sick just because they’re coughing. It’s always important to go to work, no matter what. | n | p | n | n | n | n | n | n |
| 7en | I’m not going today, because my stuffy nose is killing me. | n | n | n | n | n | n | n | p |
| 8en | I never thought I would have allergies. | n | n | n | n | p | n | n | p |
| 9en | I have a fever but I don’t think it’s the kind of cold that will make it to my stomach. | p | n | n | p | n | n | n | n |
| 10en | My phlegm has blood in it and it’s really gross. | n | p | n | n | n | n | n | n |
(a) s1, s2, s3, s4, s5, s6, s7, and s8 are the IDs of the 8 symptoms (cold, cough, diarrhea, fever, hay fever, headache, flu, and runny nose, respectively).
(b) The ID corresponds to the corpora of the other languages (eg, tweet 1en corresponds to tweets 1ja and 1zh).
(c) p indicates the positive label.
(d) n indicates the negative label.
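The p/n labels in the sample table can be read as an 8-dimensional binary vector per tweet, which is the usual input format for multi-label classifiers and metrics. The following is a minimal sketch of that conversion, assuming the symptom order given in footnote a; the function names are hypothetical and are not part of the MedWeb distribution.

```python
# Symptom order follows the s1..s8 IDs in the table footnote.
SYMPTOMS = ["cold", "cough", "diarrhea", "fever",
            "hay fever", "headache", "flu", "runny nose"]


def labels_to_vector(labels):
    """Map 8 'p'/'n' labels (s1..s8) to a 0/1 vector (hypothetical helper)."""
    if len(labels) != len(SYMPTOMS):
        raise ValueError("expected one label per symptom")
    if any(label not in ("p", "n") for label in labels):
        raise ValueError("labels must be 'p' or 'n'")
    return [1 if label == "p" else 0 for label in labels]


def positive_symptoms(labels):
    """Return the names of the symptoms labeled positive."""
    return [name for name, label in zip(SYMPTOMS, labels) if label == "p"]


# Tweet 9en from the sample table: positive for cold (s1) and fever (s4).
tweet_9en = ["p", "n", "n", "p", "n", "n", "n", "n"]
print(labels_to_vector(tweet_9en))   # [1, 0, 0, 1, 0, 0, 0, 0]
print(positive_symptoms(tweet_9en))  # ['cold', 'fever']
```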
Exceptions for symptom labels.
| Symptom | Expressions with suspicion | Just a symptom word | Exceptions: regarded as symptom | Exceptions: not regarded as symptom |
| Cold | Accept | Accept | — (a) | — |
| Cough | Accept | Accept | Alcohol drinking and pungently flavored food | — |
| Diarrhea | Accept | Accept | Overeating, indigestion, alcohol drinking, medication, and pungently flavored food | — |
| Fever | Accept | Only | Hay fever and side effect due to any injection | — |
| Hay fever | Accept | Accept | — | — |
| Headache | Accept | Accept | — | Due to a sense of sight or smell |
| Flu | Not accept | Not accept | — | — |
| Runny nose | Accept | Not accept | Hay fever | Change in temperature |
(a) A dash indicates that there are no exceptions.
Participating systems in subtasks. A total of 19 participating systems and 2 baseline systems are constructed for the Japanese subtask, 12 participating systems and 2 baseline systems are constructed for the English subtask, and 6 participating systems and 2 baseline systems are constructed for the Chinese subtask.
| System ID | Models or methods | Language resources |
| AITOK-ja | Keyword-based, logistic regression, and SVM (a, b) | — (c) |
| AKBL-ja and AKBL-en | SVM and Fisher exact test | Patient symptom feature word dictionary and Disease-X feature words dict1 and dict2 |
| DrG-ja | Random forest | — |
| KIS-ja | Rule-based and SVM | — |
| NAIST-ja, NAIST-en, and NAIST-zh | Ensembles of hierarchical attention network and deep character-level convolutional neural network with loss functions (negative loss function, hinge, and hinge squared) | — |
| NIL-ja | Rule-based | — |
| NTTMU-ja | Principle-based approach | Manually constructed knowledge for capturing tweets that conveyed flu-related information, using common sense and ICD-10 (d) |
| NTTMU-en | SVM and recurrent neural network | Manually constructed knowledge for capturing tweets that conveyed flu-related information, using common sense and ICD-10 |
| TUA1-zh | Logistic regression, SVM, and logistic regression with semantic information | Training samples updated through active learning on unlabeled posts downloaded using the symptom names in Chinese |
| UE-ja | Rule-based and random forest | Custom dictionary consisting of nouns selected from the dry-run dataset and heuristics |
| UE-en | Rule-based, random forests, and skip-gram neural network for word2vec | Custom dictionary consisting of nouns selected from the dry-run dataset and heuristics |
| Baseline | SVM (unigram and bigram) | — |
(a) SVM: support vector machine.
(b) The method was tested after the submission of the formal run and is therefore not included in the results.
(c) A dash indicates that no language resources were used.
(d) ICD: International Classification of Diseases.
Interannotator agreement ratio.
| Symptom | Agreement ratio (number) |
| Cold | 0.9945 (2546/2560) |
| Cough | 0.9934 (2543/2560) |
| Diarrhea | 0.9785 (2505/2560) |
| Fever | 0.9922 (2540/2560) |
| Hay fever | 0.9918 (2539/2560) |
| Headache | 0.9773 (2502/2560) |
| Flu | 0.9734 (2492/2560) |
| Runny nose | 0.9793 (2507/2560) |
| Total | 0.9851 (20,174/20,480) |
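The fractions in the agreement table suggest that each ratio is simply the number of tweets on which the annotators agreed divided by the number of tweets annotated. Under that assumption, the short sketch below recomputes the per-symptom ratios and the total from the counts in the table; the variable and function names are illustrative only.

```python
TOTAL_TWEETS = 2560  # tweets annotated per symptom, from the table

# Number of tweets with matching annotations, copied from the table.
agreed = {
    "Cold": 2546, "Cough": 2543, "Diarrhea": 2505, "Fever": 2540,
    "Hay fever": 2539, "Headache": 2502, "Flu": 2492, "Runny nose": 2507,
}


def agreement_ratio(n_agreed, n_total):
    """Simple agreement: matching annotations over all annotations."""
    return n_agreed / n_total


per_symptom = {
    symptom: round(agreement_ratio(count, TOTAL_TWEETS), 4)
    for symptom, count in agreed.items()
}

# The total pools all 8 symptoms: 8 x 2560 label decisions.
overall_agreed = sum(agreed.values())
overall_total = TOTAL_TWEETS * len(agreed)
overall = round(agreement_ratio(overall_agreed, overall_total), 4)

print(per_symptom["Cold"])                     # 0.9945
print(overall_agreed, overall_total, overall)  # 20174 20480 0.9851
```

Note that simple agreement does not correct for chance agreement, unlike statistics such as Cohen's kappa.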
Performance in the Japanese subtask (19 participating systems and 2 baseline systems).
| System ID (a) | Exact match (b) | F1 (micro) | F1 (macro) | Precision (micro) | Precision (macro) | Recall (micro) | Recall (macro) | Hamming loss |
| NAIST-ja-2 | 0.880 | 0.920 | 0.906 | 0.899 | 0.887 | 0.941 | 0.925 | 0.019 |
| NAIST-ja-3 | 0.878 | 0.919 | 0.904 | 0.899 | 0.885 | 0.940 | 0.924 | 0.019 |
| NAIST-ja-1 | 0.877 | 0.918 | 0.904 | 0.899 | 0.887 | 0.938 | 0.921 | 0.020 |
| AKBL-ja-3 | 0.805 | 0.872 | 0.859 | 0.896 | 0.883 | 0.849 | 0.839 | 0.029 |
| UE-ja-1 | 0.805 | 0.865 | 0.855 | 0.831 | 0.819 | 0.903 | 0.902 | 0.033 |
| KIS-ja-2 | 0.802 | 0.871 | 0.856 | 0.831 | 0.815 | 0.915 | 0.904 | 0.032 |
| AKBL-ja-1 | 0.800 | 0.869 | 0.847 | 0.889 | 0.873 | 0.849 | 0.825 | 0.030 |
| UE-ja-3 | 0.800 | 0.866 | 0.855 | 0.823 | 0.812 | 0.913 | 0.911 | 0.033 |
| AKBL-ja-2 | 0.795 | 0.868 | 0.849 | 0.891 | 0.875 | 0.846 | 0.827 | 0.030 |
| KIS-ja-3 | 0.784 | 0.855 | 0.831 | 0.840 | 0.816 | 0.871 | 0.850 | 0.034 |
| SVM-unigram | 0.761 | 0.849 | 0.835 | 0.843 | 0.828 | 0.854 | 0.842 | 0.036 |
| KIS-ja-1 | 0.758 | 0.849 | 0.833 | 0.798 | 0.782 | 0.906 | 0.899 | 0.038 |
| SVM-bigram | 0.752 | 0.843 | 0.830 | 0.838 | 0.820 | 0.848 | 0.845 | 0.037 |
| NTTMU-ja-1 | 0.738 | 0.835 | 0.829 | 0.770 | 0.761 | 0.913 | 0.921 | 0.042 |
| UE-ja-2 | 0.706 | 0.815 | 0.803 | 0.696 | 0.702 | 0.983 | 0.984 | 0.052 |
| NIL-ja-1 | 0.680 | 0.749 | 0.742 | 0.862 | 0.845 | 0.662 | 0.671 | 0.052 |
| DrG-ja-1 | 0.653 | 0.777 | 0.774 | 0.825 | 0.808 | 0.734 | 0.779 | 0.049 |
| NTTMU-ja-3 | 0.614 | 0.775 | 0.773 | 0.740 | 0.720 | 0.814 | 0.840 | 0.055 |
| NTTMU-ja-2 | 0.597 | 0.770 | 0.753 | 0.741 | 0.706 | 0.801 | 0.813 | 0.056 |
| AITOK-ja-2 | 0.503 | 0.706 | 0.696 | 0.726 | 0.738 | 0.687 | 0.767 | 0.067 |
(a) The system ID comprises the group ID (see Multimedia Appendix 1), the abbreviation of the subtask (ja indicates the Japanese subtask), and the system number from 1 to 3, because each group can submit 3 systems per subtask.
(b) The results are ordered by exact match accuracy.
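The columns of the performance tables are standard multi-label metrics. As a rough illustration of how they relate, the sketch below computes exact match accuracy, micro- and macro-averaged F1, and Hamming loss with their textbook definitions on a toy example with 3 labels. This is an assumption about the definitions, not the task's official evaluation script, and the toy data are invented for illustration.

```python
def exact_match(y_true, y_pred):
    """Fraction of tweets whose whole label vector is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(ti != pi
                for t, p in zip(y_true, y_pred)
                for ti, pi in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def _counts(y_true, y_pred, j):
    """True positives, false positives, false negatives for label j."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t[j] and p[j])
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] and not p[j])
    return tp, fp, fn


def _f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true, y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    per_label = [_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp, fp, fn = (sum(c) for c in zip(*per_label))
    return _f1(tp, fp, fn)


def macro_f1(y_true, y_pred):
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    return sum(_f1(*_counts(y_true, y_pred, j))
               for j in range(n_labels)) / n_labels


# Toy gold labels and predictions (4 tweets, 3 labels), invented for illustration.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print(exact_match(y_true, y_pred))             # 0.5
print(round(hamming_loss(y_true, y_pred), 4))  # 0.1667
print(round(micro_f1(y_true, y_pred), 4))      # 0.8
print(round(macro_f1(y_true, y_pred), 4))      # 0.7778
```

Exact match is the strictest of these metrics because a single wrong label makes the whole tweet count as an error, which is consistent with the exact match column being lower than micro F1 in every row of the table.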
Performance in the Chinese subtask (6 participating systems and 2 baseline systems).
| System ID (a) | Exact match (b) | F1 (micro) | F1 (macro) | Precision (micro) | Precision (macro) | Recall (micro) | Recall (macro) | Hamming loss |
| NAIST-zh-2 | 0.880 | 0.920 | 0.906 | 0.899 | 0.887 | 0.941 | 0.925 | 0.019 |
| NAIST-zh-3 | 0.878 | 0.919 | 0.904 | 0.899 | 0.885 | 0.940 | 0.924 | 0.019 |
| NAIST-zh-1 | 0.877 | 0.918 | 0.904 | 0.899 | 0.887 | 0.938 | 0.921 | 0.020 |
| TUA1-zh-3 | 0.786 | 0.860 | 0.844 | 0.772 | 0.760 | 0.970 | 0.971 | 0.037 |
| SVM-unigram | 0.780 | 0.858 | 0.843 | 0.831 | 0.815 | 0.888 | 0.883 | 0.034 |
| TUA1-zh-1 | 0.773 | 0.853 | 0.838 | 0.766 | 0.753 | 0.963 | 0.965 | 0.039 |
| SVM-bigram | 0.767 | 0.850 | 0.835 | 0.824 | 0.806 | 0.878 | 0.876 | 0.036 |
| TUA1-zh-2 | 0.719 | 0.824 | 0.809 | 0.712 | 0.710 | 0.978 | 0.982 | 0.049 |
(a) The system ID comprises the group ID (see Multimedia Appendix 1), the abbreviation of the subtask (zh indicates the Chinese subtask), and the system number from 1 to 3, because each group can submit 3 systems per subtask.
(b) The results are ordered by exact match accuracy.
Performance in the English subtask (12 participating systems and 2 baseline systems).
| System ID (a) | Exact match (b) | F1 (micro) | F1 (macro) | Precision (micro) | Precision (macro) | Recall (micro) | Recall (macro) | Hamming loss |
| NAIST-en-2 | 0.880 | 0.920 | 0.906 | 0.899 | 0.887 | 0.941 | 0.925 | 0.019 |
| NAIST-en-3 | 0.878 | 0.919 | 0.904 | 0.899 | 0.885 | 0.940 | 0.924 | 0.019 |
| NAIST-en-1 | 0.877 | 0.918 | 0.904 | 0.899 | 0.887 | 0.938 | 0.921 | 0.020 |
| SVM-bigram | 0.800 | 0.866 | 0.856 | 0.865 | 0.849 | 0.868 | 0.865 | 0.031 |
| UE-en-1 | 0.789 | 0.858 | 0.848 | 0.846 | 0.831 | 0.871 | 0.876 | 0.034 |
| SVM-unigram | 0.783 | 0.858 | 0.845 | 0.851 | 0.830 | 0.864 | 0.864 | 0.033 |
| NTTMU-en-2 | 0.773 | 0.856 | 0.849 | 0.807 | 0.796 | 0.911 | 0.918 | 0.036 |
| NTTMU-en-3 | 0.758 | 0.845 | 0.828 | 0.836 | 0.818 | 0.854 | 0.844 | 0.037 |
| UE-en-2 | 0.745 | 0.821 | 0.809 | 0.861 | 0.838 | 0.786 | 0.800 | 0.040 |
| UE-en-3 | 0.739 | 0.820 | 0.815 | 0.870 | 0.851 | 0.776 | 0.795 | 0.040 |
| AKBL-en-2 | 0.734 | 0.819 | 0.799 | 0.832 | 0.808 | 0.806 | 0.793 | 0.042 |
| AKBL-en-3 | 0.716 | 0.804 | 0.787 | 0.853 | 0.834 | 0.760 | 0.747 | 0.043 |
| NTTMU-en-1 | 0.619 | 0.770 | 0.777 | 0.734 | 0.733 | 0.809 | 0.835 | 0.056 |
| AKBL-en-1 | 0.613 | 0.772 | 0.755 | 0.656 | 0.649 | 0.936 | 0.945 | 0.065 |
(a) The system ID comprises the group ID (see Multimedia Appendix 1), the abbreviation of the subtask (en indicates the English subtask), and the system number from 1 to 3, because each group can submit 3 systems per subtask.
(b) The results are ordered by exact match accuracy.
Figure 1. Statistical summary of the performance on 3 evaluation metrics (A: exact match accuracy, B: F1-micro, and C: Hamming loss) in each of the subtasks (ja: Japanese, en: English, and zh: Chinese). Note that higher scores are better for exact match accuracy and F1-micro, whereas lower scores are better for Hamming loss. The bottom and top of a box are the first and third quartiles, the band inside the box is the median, and the dotted band inside the box is the mean. Dots on the right side of the box represent the distribution of values of participating systems.
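The box plot elements described in the caption (quartiles, median, and mean) can be recomputed from the table values. The sketch below does so for the exact match column of the Chinese subtask table, using Python's `statistics` module. Two assumptions apply: all 8 rows of that table, baselines included, are used as input, and the quartile convention is the module's default "exclusive" method, which may differ slightly from the one used to draw the figure.

```python
import statistics

# Exact match accuracy of the 8 rows in the Chinese subtask table.
exact_match_zh = [0.880, 0.878, 0.877, 0.786, 0.780, 0.773, 0.767, 0.719]


def box_summary(values):
    """Return the statistics that a box plot of this kind displays."""
    # quantiles(n=4) returns the 3 cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "mean": statistics.mean(values),
    }


summary = box_summary(exact_match_zh)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For these values, the mean lies above the median because the 3 NAIST systems form a high-scoring cluster well above the remaining rows.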