Laith Abu Lekham (1,2), Yong Wang (1), Ellen Hey (2), Mohammad T. Khasawneh (1).
Abstract
This study builds a multi-criteria text mining model for COVID-19 testing reasons and symptoms, integrated with a temporal predictive classification model for COVID-19 test results in rural underserved areas. A dataset of 6895 testing appointments with 14 features is used. The text mining model classifies free-text notes on testing reasons and reported symptoms into one or more categories using look-up wordlists and a multi-criteria mapping process, converting an unstructured feature into a categorical feature that feeds both the temporal predictive classification model and the population analytics. The classification model is temporal (ordered and indexed by testing date) and uses machine learning classifiers to predict whether a test result is positive or negative. Both balanced and regular classifiers and performance measures are used; the two balanced classifiers are (1) a balanced random forest and (2) a balanced bagged decision tree. The balanced (weighted) methods account for the biased and imbalanced dataset and ensure correct detection of patients with COVID-19, the minority class. The model is tested in two stages, on validation and testing sets, to ensure robustness and reliability. The balanced classifiers outperformed the regular classifiers on the balanced performance measures (balanced accuracy and G-score), meaning they are better at detecting patients with positive COVID-19 results. The balanced random forest achieved the best average balanced accuracy (86.1%) and G-score (86.1%) on the validation set; the balanced bagged decision tree achieved the best average balanced accuracy (83.0%) and G-score (82.8%) on the testing set. Patient history, age, testing reasons, and time were found to be the key features for classifying test results.
Keywords: Classification; Community health; Machine learning; Population health analytics; Primary care; Text mining
Year: 2022 PMID: 35013649 PMCID: PMC8729325 DOI: 10.1007/s00521-021-06884-w
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
Fig. 1 Features and label available in the dataset
Fig. 2 Methodology flow chart for building the multi-criteria text mining model (a) and the temporal predictive classification model (b)
A pseudo code for the multi-criteria text mining algorithm
Pseudo code for the temporal predictive model algorithm
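Neither pseudo-code listing survives extraction. As a minimal sketch of the temporal prediction step, the snippet below assumes a date-ordered train/test split (no shuffling, so the held-out appointments are strictly later in time) and uses scikit-learn's `class_weight="balanced"` random forest as a stand-in for the paper's balanced random forest, which instead rebalances each bootstrap sample. The synthetic data and the ~0.8/0.2 split ratio are our assumptions for illustration only.

```python
# Sketch of the temporal predictive model: records are sorted by testing
# date, earlier appointments train the model, later ones are held out.
# class_weight="balanced" reweights classes inversely to their frequency,
# approximating the balanced random forest used in the paper (assumption).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n = 1000
dates = rng.integers(0, 180, n)                 # day index of each test
X = rng.normal(size=(n, 5))                     # synthetic features
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 1.2).astype(int)  # rare positives

# Temporal split: order by date, train on the earliest 80% of appointments.
order = np.argsort(dates, kind="stable")
X, y = X[order], y[order]
cut = int(0.8 * n)
X_train, X_test, y_train, y_test = X[:cut], X[cut:], y[:cut], y[cut:]

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_train, y_train)
print(f"balanced accuracy: {balanced_accuracy_score(y_test, clf.predict(X_test)):.3f}")
```

Keeping the split strictly chronological matters here: a shuffled split would leak future testing patterns (e.g. variant-driven positivity shifts) into training.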
Fig. 3 Most frequent words (top 25) in the text corpus: (a) before cleaning the corpus; (b) after cleaning the corpus
Main look-up words in the wordlists of the nine testing reasons and symptoms categories
| # | Testing Reason Category | Main Look-up Words |
|---|---|---|
| 1 | Asymptomatic | Asymptomatic |
| 2 | Constitutional | Body, aches, fever, fatigue, flu, weakness |
| 3 | Contract | Contract |
| 4 | Gastro | Nausea, vomiting, diarrhea, stomach, abdominal |
| 5 | Neurology | Headache, smell, taste, dizziness |
| 6 | Other/Unreported/Referral | Unreported, referral, mental, allergy, rash |
| 7 | Public Health Contact (PHC) | Exposure, contact |
| 8 | Required by state, county, or work | Mandatory, state, county, work, requirement |
| 9 | Respiratory/ENT | Cough, congestion, sinus, breathing, lungs, throat |
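A minimal sketch of the look-up approach behind the table above: a note is assigned to every category whose wordlist matches, yielding one or more labels per appointment. The matching rule (lowercase substring search) and the fallback to the Other/Unreported/Referral category are our assumptions; only a subset of the nine wordlists is reproduced.

```python
# Multi-criteria wordlist mapping: a free-text testing note is assigned to
# every category whose look-up words appear in it (multi-label).
# Wordlists are taken from the table above; the substring-matching rule
# and the fallback category are assumptions, not the paper's exact process.
WORDLISTS = {
    "Constitutional": ["body", "aches", "fever", "fatigue", "flu", "weakness"],
    "Gastro": ["nausea", "vomiting", "diarrhea", "stomach", "abdominal"],
    "Neurology": ["headache", "smell", "taste", "dizziness"],
    "Public Health Contact (PHC)": ["exposure", "contact"],
    "Respiratory/ENT": ["cough", "congestion", "sinus", "breathing", "lungs", "throat"],
}

def classify_note(note: str) -> list[str]:
    """Return every category whose wordlist hits the note."""
    text = note.lower()
    hits = [cat for cat, words in WORDLISTS.items()
            if any(w in text for w in words)]
    return hits or ["Other/Unreported/Referral"]  # assumed fallback

print(classify_note("Fever and dry cough, lost sense of taste"))
# → ['Constitutional', 'Neurology', 'Respiratory/ENT']
```

This is how the unstructured notes column becomes the categorical feature used by the downstream classifier and the population analytics.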
Fig. 4 Total frequency of each testing reasons and symptoms category
Fig. 5 Total frequency of the top 10 combinations of testing reasons and symptoms categories
Fig. 6 Positivity and negativity rates for the top 10 combinations of testing reasons and symptoms categories
Fig. 7 Average age (years) of the top 10 combinations of testing reasons and symptoms categories
Fig. 8 Frequency of the top 10 combinations of testing reasons and symptoms categories by (a) ethnicity and (b) gender
Performance measure results for each classifier. Values are mean (STD); STD = standard deviation. RF (balanced random forest) and BDT (balanced bagged decision tree) are the balanced & weighted classifiers; GB, K-NN, LDA, and GP are standard classifiers.

| Set | Measure | RF (balanced) | BDT (balanced) | GB | K-NN | LDA | GP |
|---|---|---|---|---|---|---|---|
| Validation | Accuracy | 86.8% (1.0%) | 89.2% (1.6%) | 94.7% (0.8%) | 96.7% (0.6%) | 96.6% (0.5%) | 95.0% (0.7%) |
| | Balanced accuracy | 86.1% (2.3%) | 85.8% (2.4%) | 77.2% (3.1%) | 70.8% (3.9%) | 71.9% (3.7%) | 63.5% (4.7%) |
| | F1-score | 90.2% (0.6%) | 91.7% (0.8%) | 95.1% (0.6%) | 95.5% (0.6%) | 96.4% (0.6%) | 94.6% (0.8%) |
| | Recall | 86.8% (0.3%) | 89.1% (1.6%) | 94.7% (0.8%) | 95.7% (0.6%) | 96.6% (0.5%) | 95.0% (0.7%) |
| | Precision | 96.0% (0.3%) | 96.0% (0.2%) | 95.6% (0.5%) | 95.4% (0.6%) | 96.3% (0.5%) | 94.3% (0.9%) |
| | G-score | 86.1% (2.4%) | 85.8% (2.4%) | 75.1% (3.9%) | 66.1% (5.4%) | 67.3% (5.1%) | 54.7% (8.3%) |
| Testing | Accuracy | 86.8% (0.4%) | 89.3% (0.6%) | 94.7% (0.8%) | 96.4% (0.3%) | 96.3% (0.0%) | 96.1% (0.3%) |
| | Balanced accuracy | 82.1% (0.2%) | 83.0% (0.7%) | 75.0% (2.4%) | 73.9% (1.7%) | 68.7% (0.0%) | 65.8% (2.7%) |
| | F1-score | 90.8% (0.3%) | 91.7% (0.4%) | 95.0% (0.6%) | 96.2% (0.3%) | 95.9% (0.0%) | 95.5% (0.4%) |
| | Recall | 86.8% (0.4%) | 89.3% (0.6%) | 94.7% (0.8%) | 96.4% (0.3%) | 96.3% (0.0%) | 96.1% (0.3%) |
| | Precision | 95.4% (0.0%) | 95.6% (0.1%) | 95.4% (0.4%) | 96.1% (0.3%) | 95.7% (0.0%) | 95.3% (0.5%) |
| | G-score | 81.9% (0.2%) | 82.8% (0.8%) | 72.3% (3.1%) | 70.3% (2.3%) | 62.9% (0.0%) | 58.4% (4.2%) |
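The table's pattern (standard classifiers posting high plain accuracy but low balanced accuracy and G-score) follows directly from how the balanced measures are defined. A minimal sketch, assuming the common definitions: balanced accuracy is the mean of sensitivity and specificity, and the G-score (G-mean) is their geometric mean; the confusion counts below are invented for illustration.

```python
# Balanced accuracy and G-score (G-mean) from binary confusion counts.
# These definitions are the standard ones; the counts are illustrative.
import math

def balanced_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (balanced accuracy, G-score) for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall on the positive (minority) class
    specificity = tn / (tn + fp)   # recall on the negative (majority) class
    balanced_acc = (sensitivity + specificity) / 2
    g_score = math.sqrt(sensitivity * specificity)
    return balanced_acc, g_score

# On an imbalanced set, plain accuracy can look high while the minority
# class is mostly missed; the balanced measures expose that.
tp, fp, fn, tn = 10, 5, 40, 945   # 50 positives among 1000 tests
acc = (tp + tn) / (tp + fp + fn + tn)
bal, g = balanced_metrics(tp, fp, fn, tn)
print(f"accuracy={acc:.1%}  balanced accuracy={bal:.1%}  G-score={g:.1%}")
# → accuracy=95.5%  balanced accuracy=59.7%  G-score=44.6%
```

Because the G-mean is zero whenever either class recall is zero, it penalizes missed positives even more sharply than balanced accuracy, which is why the gap between balanced and standard classifiers widens on the G-score rows.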
Fig. 9 The ROC curve for the BRF classifier
Fig. 10 Prediction probability distribution for each class in the testing set using the BRF classifier
Fig. 11 Feature importance of the 14 features using the BRF classifier