Ching-Chin Chern, Yu-Jen Chen, Bo Hsiao.
Abstract
BACKGROUND: Although previous research has shown that telehealth services can reduce the misuse of resources and urban-rural disparities, most healthcare insurers do not include telehealth services in their health insurance schemes. As a result, no target variable exists for classification approaches to learn from. Identifying the potential recipients of telehealth services when introducing such services into health welfare or health insurance schemes therefore becomes an unsupervised classification problem without a target variable.
Keywords: Data mining; Data preprocessing; Decision tree; Telehealth service
Year: 2019 PMID: 31146749 PMCID: PMC6543775 DOI: 10.1186/s12911-019-0825-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Flow Chart of HDTTCA
Attributes Used to Consult with the Experts
| Attributes | Data Type | Attributes | Data Type |
|---|---|---|---|
| Insurance Amount ( | numeric | Gender | category |
| Age | numeric | Reim_Spe ( | category |
| outpatient frequency ( | category | Economic Priority ( | category |
| inpatient frequency ( | category | Age Group ( | category |
| No. of outpatient times (op_time) | numeric | Insurance Level ( | category |
| No. of inpatient times (ip_time) | numeric | Drug Duration ( | category |
| No. of days in an emergency bed (EB day) | numeric | Chronic Bed Rate ( | numeric |
| No. of days in a chronic bed (CB day) | numeric | Target Disease ( | category |
| No. of drug prescription days (drug day) | numeric | Copayment Exemption Mark ( | category |
| Remoteness ( | category | Distance ( | numeric |
| the amount of medical fees (Total amount) | numeric | Target Treatment ( | category |
Conversion Formula for Numeric Attributes in the Decision Tree Algorithm
| Attribute | Conversion Formula |
|---|---|
|
| IF( |
| op_time_C | IF(op_time < mean, 0, IF(op_time < mean + standard deviation, 1, 2)) |
| ip_time_C | IF(ip_time < mean, 0, 1) |
| EB day_C | IF(EB day < mean, 0, 1) |
| CB day_C | IF(CB day < mean, 0, 1) |
| drug day_C | IF(drug day < mean, 0, IF(drug day < mean + standard deviation, 1, 2)) |
| Total amount_C | IF(Total amount < mean, 0, IF(Total amount < mean + standard deviation, 1, 2)) |
|
| IF( |
|
| IF( |
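
The surviving rows of the conversion table follow two patterns: a three-level code with cut points at the mean and at the mean plus one standard deviation, and a two-level code with a single cut point at the mean. A minimal Python sketch of both patterns is given here. The function names and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the table does not say which estimator was used.

```python
from statistics import mean, pstdev


def three_level_code(values):
    """IF(x < mean, 0, IF(x < mean + standard deviation, 1, 2))."""
    m = mean(values)
    # Assumption: population standard deviation; the source does not specify.
    sd = pstdev(values)
    return [0 if x < m else (1 if x < m + sd else 2) for x in values]


def two_level_code(values):
    """IF(x < mean, 0, 1)."""
    m = mean(values)
    return [0 if x < m else 1 for x in values]


# Made-up visit counts, for illustration only (not data from the study).
op_time = [0, 1, 2, 5, 12]
ip_time = [0, 0, 1, 0, 4]

op_time_c = three_level_code(op_time)  # pattern used for op_time_C
ip_time_c = two_level_code(ip_time)    # pattern used for ip_time_C
print(op_time_c, ip_time_c)
```

Values exactly equal to a cut point fall into the higher code, because the formulas use a strict `<` comparison.
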
Different Classification Results
| Predicted / Actual | Actual Positive ( | Actual Negative ( |
|---|---|---|
| Predicted as Positive ( | True Positive ( | False Positive ( |
| Predicted as Negative ( | False Negative ( | True Negative ( |
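
The sensitivity, specificity, precision, and accuracy figures reported with each results table in this document can be recomputed from the four cells of this matrix using the standard definitions. The sketch below is illustrative only; the function name is hypothetical, and the example counts are those of the physician's decision tree table (14, 9, 3, 174).

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard metrics derived from the four confusion-matrix cells."""
    return {
        "sensitivity": tp / (tp + fn),           # share of actual positives found
        "specificity": tn / (tn + fp),           # share of actual negatives found
        "precision": tp / (tp + fp),             # share of predicted positives that are correct
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts from the physician's decision tree results table in this document.
physician = confusion_metrics(tp=14, fn=9, fp=3, tn=174)
for name, value in physician.items():
    print(f"{name}: {value:.2%}")
```
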
Distribution of the Attributes Used in Oversampling
| Attributes | Distribution of each class | | |
|---|---|---|---|
| Remoteness ( | 1 | 2 | 3 |
| | 90.59% | 7.32% | 2.1% |
| Economic Priority ( | N | Y | |
| | 97.6% | 2.4% | |
| Age Group ( | Elder | Middle-aged | Young |
| | 9.72% | 79.86% | 10.42% |
| Insurance Level ( | High | Middle | Low |
| | 24.74% | 55.86% | 19.4% |
| Drug Duration ( | N | Y | |
| | 87.76% | 12.24% | |
| Chronic Bed Rate ( | = 0 | > 0 | |
| | 99.84% | 0.16% | |
| Target Disease ( | N | Y | |
| | 91.38% | 8.62% | |
| Target Treatment ( | N | Y | |
| | 99.998% | 0.002% | |
| Copayment Exemption Mark ( | N | Y | |
| | 95.28% | 4.72% | |
Distribution of Gender and Age Group of All Patients in the Entire Data Set
| Attributes | Age Group | |||
|---|---|---|---|---|
| Gender | Elder | Middle-aged | Young | Percentage |
| Female | 4.45% | 38.87% | 5.82% | 49.14% |
| Male | 5.28% | 40.99% | 4.59% | 50.86% |
| Percentage | 9.72% | 79.86% | 10.42% | |
Distribution of Gender and Age Group of All Patients in the Sample Data Set
| Attributes | Age Group | |||
|---|---|---|---|---|
| Gender | Elder | Middle-aged | Young | Percentage |
| Female | 10% | 28% | 5.5% | 43.5% |
| Male | 8% | 40.5% | 8% | 56.5% |
| Percentage | 18% | 68.5% | 13.5% | |
Results of Physician’s Decision Tree on Training Data
| Actual / Classified as | Y | N |
|---|---|---|
| Y | 14 | 9 |
| N | 3 | 174 |

Sensitivity: 60.87%; Specificity: 98.31%; Precision: 82.35%; Accuracy: 94%
Fig. 2 Physician’s Decision Tree
Results of Social Worker’s Decision Tree on Training Data
| Actual / Classified as | Y | N |
|---|---|---|
| Y | 16 | 4 |
| N | 4 | 176 |

Sensitivity: 80%; Specificity: 97.78%; Precision: 80%; Accuracy: 96%
Fig. 3 Social Worker’s Decision Tree
Results of Manager’s Decision Tree on Training Data
| Actual / Classified as | Y | N |
|---|---|---|
| Y | 18 | 4 |
| N | 1 | 177 |

Sensitivity: 81.82%; Specificity: 99.44%; Precision: 94.74%; Accuracy: 97.5%
Fig. 4 Manager’s Decision Tree
Distributions of Adoptability in Different Versions of the Sample Data Set
| Adoptability | Expert 1 | Expert 2 | Expert 3 | Final Version |
|---|---|---|---|---|
| Applicable or “Y” | 23 | 20 | 22 | 14 |
| Not Applicable or “N” | 177 | 180 | 178 | 186 |
Performance Metrics and ANOVA Tests for the Four Trees
| Tree | Sensitivity | Accuracy | Specificity | Precision |
|---|---|---|---|---|
| (0.5, 1 or 2) | .7562 | .9536 | .9656 | .5863 |
| (0.25, 1 or 2) | .9796 | .9874 | .9879 | .8380 |
| (0.75, 2) | .9877 | .9626 | .9610 | .6176 |
| (0.75, 1) | .8836 | .9442 | .9481 | .5201 |
| ANOVA F | 8773.99 | 17,067.17 | 17,232.80 | 13,619.86 |
| ANOVA p_value | 7.8999E–108 | 2.282E–120 | 1.5E–120 | 4.0981E–116 |
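
The table reports one F statistic per metric across the four trees. The record does not state the ANOVA design, so the sketch below assumes a one-way ANOVA with one group of repeated runs per tree. The function name and the toy groups are hypothetical and are not the study's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)                          # number of groups (e.g. trees)
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy groups, for illustration only.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(toy_groups)
print(round(f_stat, 4))
```

A large F, as in the table, indicates that the differences between the group means are large relative to the run-to-run variation within each group.
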
Results of the Final Version (0.75, 2) Decision Tree on Training Data
| Actual / Classified as | Y | N |
|---|---|---|
| Y | 11 | 1 |
| N | 2 | 186 |

Sensitivity: 91.67%; Specificity: 98.94%; Precision: 84.62%; Accuracy: 98.5%
Fig. 5 The Final Version
Logistic Regression Model
| Coefficients | Estimate | Std. Error | z value | Pr(>\|z\|) | Signif. codes |
|---|---|---|---|---|---|
| (Intercept) | −17.949 | 4.6122 | −3.892 | 9.95E–05 | *** |
| Distance_C ( | 3.033 | .9792 | 3.098 | .001952 | ** |
| Remoteness ( | 2.5578 | 1.1479 | 2.228 | .025866 | * |
| Age Group ( | 3.5081 | 1.2511 | 2.804 | .005048 | ** |
| Copayment Exemption Mark ( | 3.8109 | 1.5376 | 2.479 | .013192 | * |
| Target Disease ( | 8.0175 | 2.2871 | 3.506 | .000456 | *** |

Note. Signif. codes: ‘***’ for p < 0.001; ‘**’ for p < 0.01; ‘*’ for p < 0.05
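
The z value and Pr(>|z|) columns follow from the estimate and its standard error through the usual Wald test: z is the estimate divided by the standard error, and the p-value is the two-sided tail probability of a standard normal distribution. A small sketch that reproduces one row of the table is given here; the function name is hypothetical.

```python
import math


def wald_z_and_p(estimate, std_error):
    """Wald z statistic and two-sided p-value under a standard normal."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Estimate and Std. Error of the Distance_C row in the table above.
z, p = wald_z_and_p(3.033, 0.9792)
print(f"z = {z:.3f}, p = {p:.6f}")
```

The result agrees with the tabulated z value of 3.098 and p-value of .001952 up to rounding of the inputs.
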
Logistic Regression Model on the Training Data
| Classified as | |||
|---|---|---|---|
| Actual | Y | N | |
| Y | 10 | 2 | |
| N | 4 | 184 | |
Sensitivity: 83.33% | Specificity: 97.87% | Precision: 71.43% | Accuracy: 97.00% |
Pairwise t-tests for Performance Metrics
| Metric | Sensitivity | | Accuracy | | Specificity | | Precision | |
|---|---|---|---|---|---|---|---|---|
| Model | HDTTCA | LR | HDTTCA | LR | HDTTCA | LR | HDTTCA | LR |
| Mean | .9877 | .9875 | .9626 | .9451 | .9610 | .9424 | .6176 | .5219 |
| Variance | 9.2187E–06 | 6.4638E–06 | 1.6201E–06 | 3.2124E–06 | 2.1E–06 | 3.856E–06 | .00012 | .0001293 |
| DF | 29 | | 29 | | 29 | | 29 | |
| t statistic | −.7327 | | −81.5205 | | −79.4559 | | −88.2711 | |
| P(T <= t) | .2348 | | 4.1444E–36 | | 8.69E–36 | | 4.16E–37 | |
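
With 29 degrees of freedom, each comparison corresponds to 30 paired observations of the two models. A minimal sketch of the paired t statistic follows, assuming the pairs are matched runs of the two models. The function name and the toy values are hypothetical; the per-run results are not included in this record.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy paired values, for illustration only.
toy_first = [1.0, 2.0, 3.0]
toy_second = [0.0, 0.0, 0.0]
t_stat, df = paired_t(toy_first, toy_second)
print(round(t_stat, 4), df)
```

The sign of t depends on the order of subtraction, so a negative statistic only means that the second sample has the larger mean.
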