Sergey Samoilenko, Kweku-Muata Osei-Bryson.
Abstract
Awareness of a new disease brings much uncertainty, along with a search for answers and for the appropriate questions to ask. In this paper we focus on the perspective of public health decision-makers. Typically, they have a standard set of questions and supporting metrics that proved useful in previous disease outbreaks for assessing how effectively various 'solution' methods alter the trajectory of the disease. There may be other relevant questions with which such public health domain experts are not familiar, or with which they are familiar but for which they know of no methods that work when data are limited. Decision Support Systems (DSS) can be used to facilitate the exploration of established questions and some other relevant questions. Given an initial set of questions, the DSS designer should consider which sets of data analytic methods can adequately address them. Some of these data analytic methods may also be able to address further questions of interest to public health decision-makers, including researchers. In this paper we present a conceptual design for a relevant, easy-to-construct DSS and an example of a multi-method DSS based on this conceptual design. Using publicly available data on the CoViD-19 pandemic, we illustrate the benefits of the multi-method DSS in action.
Keywords: Data Analytics; Decision Support System; Modular Design; Public Health
Year: 2021 PMID: 34924698 PMCID: PMC8668606 DOI: 10.1016/j.eswa.2021.116385
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 6.954
Conceptual Design of a DSS: Underlying Assumptions and Justifications.
| ID | Assumption | Justification |
|---|---|---|
| A1 | For each disease, there is an identifiable set of demographic risk factors that provides a useful characterization of the disease. | For each known disease, public health practitioners attempt to identify variables that have high correlations with its occurrence and outcomes. For each disease, the identified variables are its ‘risk factors’. |
| A2 | The population of a geographic area (e.g., nation, state, city, county, etc.) could be described by a set of demographic factors that includes the risk factors for known diseases. | In practice this typically holds at the national and state levels, and in some cases at the county and city level. |
| A3 | The population of a geographic area could be characterized by a disease-related subset of its demographic factors i.e. the risk factors for a given disease. | In practice for known diseases this typically holds at the national and state levels, and in some cases at the county and city level. |
| A4 | Populated geographic areas could be grouped in terms of the risk factors for a given disease. | If A3 holds then A4 should also be possible. |
| A5 | A geographic area could be assessed in terms of the impact of the risk factors for a given disease on the associated infection rate. | The reasonableness of this assumption follows from the meaning of the concept of a ‘risk factor’ for a disease. |
| A6 | The spread of an infectious disease follows a path towards geographic areas with higher demographic risk factors. | The reasonableness of this assumption follows from the meaning of the concept of a ‘risk factor’ for a disease. |
| A7 | A geographic area could be assessed in terms of its relative effectiveness and efficiency in containing the spread of a disease. | The presence of any organized system of healthcare is associated with the collection of relevant patient data. |
| A8 | The spread of an infectious disease follows a path towards geographic areas with lower levels of effectiveness and efficiency in containing the spread of a disease. | The reasonableness of this assumption would follow from the meaning of the concepts of ‘effectiveness’ and ‘efficiency’ of disease containment. |
| A9 | A geographic area could be assessed in terms of the changes in its level of efficiency of containing a disease over time. | If A7 holds then A9 would also hold. |
| A10 | The efficiency and effectiveness of a geographic area in containing a disease could be improved via area-specific decisions (e.g. allocation of resources). | If A10 does not hold then it would not make sense to have a DSS that supports the making of appropriate area-specific decisions. |
Structural components of the DSS: offered insights and limitations.
| Method | Offered Insight | Limitation |
|---|---|---|
| CA: Cluster Analysis | Allows for testing an assumption of homogeneity of the sample and identifying presence of sub-groups in the sample. | In the presence of multiple sub-groups does not offer any insights into the sources of heterogeneity. |
| DTI: Decision Tree Induction | Given the target variable, allows for identifying attributes responsible for differentiating sub-groups of the sample. | Target variable must be provided “from outside.” Does not consider impact of differentiating variables on an “ |
| ARM: Association Rules Mining | Allows for identifying a set of “ | Does not provide any insights regarding an “ |
| DEA: Data Envelopment Analysis | Allows for calculating the relative efficiency scores of decision-making units (DMUs), as well as changes in the scores over time via using the Malmquist Index (MI) scores. | A “black box” model of the “ |
| MR: Multiple Regression | Allows for determining the significance of the impact of independent variables on a dependent variable and identifying the presence of complementarities. | Does not provide any insights regarding an “ |
Modular Design of the DSS: Methodological Steps.
| Module | Step | Method | Expected Outcome/Result |
|---|---|---|---|
| 1 | | CA | A group of n-clusters of geographic areas that differ in terms of the demographic risk factors. |
| | | DTI | A set of demographic factors that are responsible for the differences between the geographic areas. |
| | | ARM | A set of “if->then” naturally occurring associations that characterize the sample. |
| | | MR | Determination of the significance/presence of the impact between “if” and “then” parts of associations discovered in Step 4. |
| 2 | | CA | A group of n-clusters of geographic areas that differ in terms of the contagion-specific factors. |
| | | DTI | A set of contagion-specific factors that are responsible for the differences between the geographic areas identified in Step 8. |
| | | DEA | A set of scores of relative efficiency for each area, as well as for each cluster that was identified in Step 6. |
| | | DEA MI | Determination of the improvement, or deterioration of performance of each area in fighting the outbreak via Malmquist Index ( |
| | | DEA MI | Determination of the reasons for the improvement/deterioration in performance of each geographic area via the relationship between the |
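The module layout in the table above can be written down as an ordered list of method invocations. The following is only a minimal sketch: the names `Step`, `PIPELINE`, and `methods_by_module` are hypothetical, the outcome strings are shortened paraphrases of the table, and step numbers are left out because they do not survive in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    module: int   # module number from the table
    method: str   # method abbreviation from the table
    outcome: str  # shortened paraphrase of the expected outcome


# Row order follows the table; names and structure are illustrative only.
PIPELINE = [
    Step(1, "CA", "clusters of areas by demographic risk factors"),
    Step(1, "DTI", "demographic factors that differentiate the areas"),
    Step(1, "ARM", "if->then associations that characterize the sample"),
    Step(1, "MR", "significance of the discovered associations"),
    Step(2, "CA", "clusters of areas by contagion-specific factors"),
    Step(2, "DTI", "contagion-specific factors that differentiate the areas"),
    Step(2, "DEA", "relative efficiency scores per area and per cluster"),
    Step(2, "DEA MI", "improvement or deterioration of performance over time"),
    Step(2, "DEA MI", "reasons for the improvement or deterioration"),
]


def methods_by_module(pipeline):
    """Group method abbreviations by module, preserving row order."""
    grouped = {}
    for step in pipeline:
        grouped.setdefault(step.module, []).append(step.method)
    return grouped
```

Calling `methods_by_module(PIPELINE)` returns the method sequence for each module, which makes it easy to see that cluster analysis and decision tree induction are reused across both modules.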
Fig. 1. Design of the Proposed DSS.
Census Data - Variables Used in Module 1.
| Variable/Code | Description |
|---|---|
| PopDensity | Population Density |
| S2601C_C01_010E | Total Population, 55 to 64 years |
| S2601C_C01_011E | Total Population, 65 to 74 years |
| S2601C_C01_012E | Total Population, 75 to 84 years |
| S2601C_C01_013E | Total Population, 85 years + |
| S2601C_C01_017E | Total Population, 65 years + |
| S2601C_C01_018E | Total Population, 65 years +, Male |
| S2601C_C01_019E | Total Population, 65 years +, Female |
| S2601C_C01_020E | Total Population, Median age (years) |
| S2601C_C01_023E | Total Population, Black or African American |
| S2601C_C01_034E | Total population, 15 years +, Widowed |
| S2601C_C01_035E | Total population, 15 years +, Divorced |
| S2601C_C01_043E | Total population, 25 years +, Bachelor's degree or higher |
| S2601C_C01_047E | Total population With a disability |
| S2601C_C01_051E | Total Population 18 to 64 years With a disability |
| S2601C_C01_054E | Total population 65 years + With a disability |
| S2601C_C01_087E | Total population, 16 years +, Unemployed |
| S2601C_C01_088E | Total population 16 years +, Unemployed, Percent of civilian labor force |
| S2601C_C01_093E | Total population, 16 years +, Service occupations |
| S2601C_C01_106E | Total population, poverty rate, All people |
| S2601C_C01_107E | Total population, poverty rate, 18 years + |
| S2601C_C01_108E | Total population, poverty rate, 18 to 64 years |
| S2601C_C01_109E | Total population, poverty rate, 65 years + |
| S2601C_C02_009E | Total group quarters population, 45 to 54 years |
| S2601C_C02_010E | Total group quarters population, 55 to 64 years |
| S2601C_C02_011E | Total group quarters population, 65 to 74 years |
| S2601C_C02_012E | Total group quarters population, 75 to 84 years |
| S2601C_C02_013E | Total group quarters population, 85 years + |
| S2601C_C02_017E | Total group quarters population, 65 years + |
| S2601C_C02_018E | Total group quarters population, 65 years +, Male |
| S2601C_C02_019E | Total group quarters population, 65 years +, Female |
| S2601C_C02_023E | Total group quarters population, Black or African American |
| S2601C_C02_034E | Total group quarters population, 15 years +, Widowed |
| S2601C_C02_035E | Total group quarters population, 15 years +, Divorced |
| S2601C_C02_043E | Total group quarters population, 25 years +, Bachelor's degree or higher |
| S2601C_C02_047E | Total group quarters population, With a disability |
| S2601C_C02_051E | Total group quarters population, 18 to 64 years, With a disability |
| S2601C_C02_052E | Total group quarters population, 18 to 64 years, No disability |
| S2601C_C02_053E | Total group quarters population, Disability Status, 65 years + |
| S2601C_C02_054E | Total group quarters population, 65 years +, With a disability |
| S2601C_C02_087E | Total group quarters population, 16 years +, Unemployed |
| S2601C_C02_088E | Total group quarters population, 16 years +, Unemployed, % of the labor force |
| S2601C_C02_090E | Total group quarters population, 16 years +, Not in labor force |
| S2601C_C02_093E | Total group quarters population, 16 years +, Service occupations |
| S2601C_C02_094E | Total group quarters population, 16 years +, Sales and office occupations |
| S2601C_C02_105E | Total group quarters population, Individuals With Food Stamp/SNAP benefits |
| S2601C_C02_106E | Total group quarters population, Poverty Status is Determined, All people |
| S2601C_C02_107E | Total group quarters population, Poverty Status is Determined, 18 years + |
| S2601C_C02_108E | Total group quarters population, Poverty Status is Determined, 18 to 64 years |
| S2601C_C02_109E | Total group quarters population, Poverty Status is Determined, 65 years + |
NB: Name of the State is also included as the ID variable.
List of States Used in the Study.
| Alabama, Arizona, Arkansas, California, Colorado, Connecticut, Florida, Georgia, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, New Jersey, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, South Carolina, Tennessee, Texas, Virginia, Washington, Wisconsin |
Pandemic Data – Variables Used in Module 2.
| Variable | Description |
|---|---|
| Confirmed | Aggregated confirmed case count for the state. |
| Deaths | Aggregated Death case count for the state. |
| Active | Aggregated confirmed cases that have not been resolved. |
| Incidence_Rate | Confirmed cases per 100,000 persons. |
| People_Tested | Total number of people who have been tested. |
| Mortality_Rate | Number recorded deaths / Number confirmed cases. |
| Testing_Rate | Total number of people tested per 100,000 persons. |
NB: Name of the State is also included as the ID variable.
Module 1 – Summary Descriptions of the Clusters.
| Cluster | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Average rank | 12.54 | 16.09 | 28.22 |
| Assigned label | |||
| Cluster size | 13 | 11 | 9 |
Module 1 – Cluster Membership.
| Cluster | States |
|---|---|
| 1 | Arizona, California, Florida, Georgia, Illinois, Indiana, Michigan, New York, North Carolina, Ohio, Oregon, Pennsylvania, Texas |
| 2 | Colorado, Connecticut, Iowa, Kansas, Maryland, Massachusetts, Minnesota, New Jersey, Virginia, Washington, Wisconsin |
| 3 | Alabama, Arkansas, Kentucky, Louisiana, Mississippi, Missouri, Oklahoma, South Carolina, Tennessee |
Module 1 – Top Variables that differentiate the Clusters.
| DTI Iteration | Selected Splitting Variables |
|---|---|
| 1 | Total population for whom poverty status is determined ( Total population with a disability ( |
| 2 | Population Density ( Population 25 years and over with bachelor’s degree or higher ( |
Fig. 2. Visual Representation of the Results of DTI.
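Decision tree induction picks splitting variables by scoring candidate splits against the target, here the cluster label. The record does not say which splitting criterion was used, so the sketch below uses Gini impurity as one common choice; the function names and the poverty values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical poverty rates and cluster labels for six areas.
poverty = [10.0, 11.0, 12.0, 17.0, 18.0, 19.0]
cluster = [1, 1, 1, 3, 3, 3]
threshold, impurity = best_threshold(poverty, cluster)
```

In this toy data the midpoint between 12.0 and 17.0 separates the two clusters perfectly, so it is selected with a weighted impurity of zero. A full tree would repeat the search over every candidate variable and recurse on each side of the split.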
Module 1 – Association Rules.
| Left Side (If) | Association ( | Right Side (Then) |
|---|---|---|
| High | MidHigh | |
| High | High | |
| Low | LowMid | |
| Low | Low | |
| MidHigh | High | |
| Low | LowMid | |
| MidLow | High | |
| Low | High | |
| High | Low |
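Each row of the table above is an "if -> then" rule over discretized levels such as High, MidHigh, and Low. Rules of this kind are usually ranked by support and confidence; the sketch below computes both for a single rule. The attribute names `density` and `incidence` and the five records are hypothetical, since the attribute names of the rules are not preserved in the table.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    `antecedent` and `consequent` are (attribute, level) pairs.
    """
    a_attr, a_level = antecedent
    c_attr, c_level = consequent
    n_antecedent = sum(1 for r in records if r[a_attr] == a_level)
    n_both = sum(
        1 for r in records if r[a_attr] == a_level and r[c_attr] == c_level
    )
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical discretized records, one per geographic area.
records = [
    {"density": "High", "incidence": "High"},
    {"density": "High", "incidence": "High"},
    {"density": "High", "incidence": "MidHigh"},
    {"density": "Low", "incidence": "Low"},
    {"density": "Low", "incidence": "LowMid"},
]

support, confidence = rule_metrics(
    records, ("density", "High"), ("incidence", "High")
)
```

Support is the share of all areas that match both sides of the rule, and confidence is the share of areas matching the "if" side that also match the "then" side.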
Module 1 – Result of MR Analysis.
| Model Statistics | R2 | Adjusted R2 | F | Significance F |
|---|---|---|---|---|
| 0.390 | 0.327 | 6.190 | 0.002 | |
| Variable | Coefficients | Standard Error | t Stat | P-value |
| 107.935 | 40.872 | 2.641 | 0.013 | |
| −20478.073 | 7514.452 | −2.725 | 0.011 | |
| 13467.399 | 5916.581 | 2.276 | 0.030 | |
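The model statistics in the table can be cross-checked from R2 alone. The sketch below assumes a sample of 33 observations (the number of states in the study's state list) and three predictors; the predictor count is an inference that reproduces the reported adjusted R2, not a figure stated in the table.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F statistic of a regression with n observations, k predictors."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


N_STATES = 33     # length of the list of states used in the study
K_PREDICTORS = 3  # assumption: chosen because it matches the reported values

adj = adjusted_r2(0.390, N_STATES, K_PREDICTORS)
f_value = f_statistic(0.390, N_STATES, K_PREDICTORS)
```

Under these assumptions the adjusted R2 rounds to the reported 0.327, and the F statistic comes out near 6.18, which agrees with the reported 6.190 to within the rounding of R2.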
Module 1 – Priority-based Groupings of the States.
| Tier 1′ States- | Tier 2′ States- | Tier 3′ States- |
|---|---|---|
| New Jersey, Massachusetts, Connecticut, Maryland, New York, Florida, Ohio, Pennsylvania, California, Illinois, Virginia | North Carolina, Indiana, Georgia, Michigan, South Carolina, Tennessee, Kentucky, Washington, Texas, Wisconsin, Louisiana | Alabama, Missouri, Minnesota, Arizona, Mississippi, Arkansas, Oklahoma, Iowa, Colorado, Oregon, Kansas |
Module 2 – Results of Cluster Analysis.
| Cluster | States |
|---|---|
| 1 (n = 13) | Colorado, Indiana, Kansas, Kentucky, Michigan, Minnesota, Missouri, Ohio, Oklahoma, Oregon, Pennsylvania, Washington, Wisconsin |
| 2 (n = 12) | Alabama, Arizona, Arkansas, Georgia, Iowa, Maryland, Mississippi, North Carolina, South Carolina, Tennessee, Texas, Virginia |
| 3 (n = 8) | California, Connecticut, Florida, Illinois, Louisiana, Massachusetts, New Jersey, New York |
Module 2 – Summary Descriptions of the Clusters.
| Cluster | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Average Rank | 23.00 | 20.50 | 6.13 |
| Assigned Label | |||
| Cluster Size | 13 | 12 | 8 |
Module 2 – Comparison of Cluster Memberships.
| Confirmed Cases | High Level | Mid Level | Low Level |
|---|---|---|---|
| Based on demographics | Module 1: Cluster 1 | Module 1: Cluster 2 | Module 1: Cluster 3 |
| Based on actual spread | Module 2: Cluster 3 | Module 2: Cluster 2 | Module 2: Cluster 1 |
| Change in Avg. Ranking | 12.54 → 6.12 | 16.09 → 20.5 | 28.22 → 23 |
| States “ | California, Florida, Illinois, New York | Colorado, Kansas, Minnesota, Washington, Wisconsin | Alabama, Arkansas, Mississippi, South Carolina, Tennessee |
| States that “ | From | From | From |
| | Louisiana | Connecticut, Massachusetts, New Jersey | Oklahoma, Missouri, Kentucky |
| States that “ | From | From | From |
| | Ohio, Oregon, Pennsylvania, Indiana, Michigan | Virginia, Maryland, Iowa | Arizona, Texas, Georgia, North Carolina |
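The state groups in this comparison can be derived mechanically by cross-tabulating the Module 1 and Module 2 cluster memberships. The sketch below uses the memberships exactly as listed in the two membership tables; the function name `cross_tabulate` is hypothetical, and the keys are raw cluster numbers, so mapping them to the High/Mid/Low levels is left to the table.

```python
from collections import defaultdict

# Cluster memberships as listed in the Module 1 and Module 2 tables.
MODULE1 = {
    1: ["Arizona", "California", "Florida", "Georgia", "Illinois", "Indiana",
        "Michigan", "New York", "North Carolina", "Ohio", "Oregon",
        "Pennsylvania", "Texas"],
    2: ["Colorado", "Connecticut", "Iowa", "Kansas", "Maryland",
        "Massachusetts", "Minnesota", "New Jersey", "Virginia", "Washington",
        "Wisconsin"],
    3: ["Alabama", "Arkansas", "Kentucky", "Louisiana", "Mississippi",
        "Missouri", "Oklahoma", "South Carolina", "Tennessee"],
}
MODULE2 = {
    1: ["Colorado", "Indiana", "Kansas", "Kentucky", "Michigan", "Minnesota",
        "Missouri", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
        "Washington", "Wisconsin"],
    2: ["Alabama", "Arizona", "Arkansas", "Georgia", "Iowa", "Maryland",
        "Mississippi", "North Carolina", "South Carolina", "Tennessee",
        "Texas", "Virginia"],
    3: ["California", "Connecticut", "Florida", "Illinois", "Louisiana",
        "Massachusetts", "New Jersey", "New York"],
}


def cross_tabulate(first, second):
    """Map (cluster in first, cluster in second) -> sorted list of states."""
    lookup = {state: c for c, states in second.items() for state in states}
    table = defaultdict(list)
    for c1, states in first.items():
        for state in states:
            table[(c1, lookup[state])].append(state)
    return {key: sorted(states) for key, states in table.items()}


crosstab = cross_tabulate(MODULE1, MODULE2)
```

The nine cells of `crosstab` correspond to the nine state groups in the comparison table; for instance, the cell for Module 1 Cluster 1 and Module 2 Cluster 3 holds California, Florida, Illinois, and New York.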
Module 2 – Top Variables that differentiate the Module 2 Clusters.
| Cluster | Condition |
|---|---|
| Cluster 1 ( | |
| Cluster 2 ( | |
| Cluster 3 ( |
Module 2 – Average Relative Efficiency Scores.
| DEA Model | Group | ||
|---|---|---|---|
| High Risk | Mid Risk | Low Risk | |
| Clustering is based on Demographic Risk Factors | Cluster 1 | Cluster 2 | Cluster 3 |
| 0.86 | 0.97 | 0.78 | |
| 0.81 | 0.83 | 0.89 | |
| 0.78 | 0.79 | 0.89 | |
| High Preval | Mid Preval | Low Preval | |
| Clustering is based on Actual Spread of the Disease | Cluster 3 | Cluster 2 | Cluster 1 |
| 0.90 | 0.86 | 0.88 | |
| 0.86 | 0.89 | 0.78 | |
| 0.73 | 0.89 | 0.81 | |
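A relative efficiency score compares each decision-making unit (DMU) with the best performer in the sample. The inputs and outputs of the DEA models are not listed in this table, so the sketch below covers only the simplest case as an illustration: one input, one output, and constant returns to scale, where the score reduces to each unit's output/input ratio divided by the best ratio. With several inputs or outputs, a linear program has to be solved for each DMU instead. The unit names and numbers are hypothetical.

```python
def ratio_efficiency(units):
    """Relative efficiency for single-input, single-output units.

    `units` maps a unit name to (input, output). Under constant returns to
    scale, efficiency is the unit's output/input ratio over the best ratio.
    """
    productivity = {name: out / inp for name, (inp, out) in units.items()}
    best = max(productivity.values())
    return {name: value / best for name, value in productivity.items()}


# Hypothetical units: (input, output).
units = {"A": (100.0, 50.0), "B": (200.0, 80.0), "C": (150.0, 75.0)}
scores = ratio_efficiency(units)
```

Units on the frontier receive a score of 1, and every other unit receives a score below 1 that measures its distance from the best observed practice.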
Module 2 – Average Malmquist Index (MI) Scores.
| Change over time (MI) | Group | ||
|---|---|---|---|
| 0.33 | 0.42 | 0.47 | |
| 0.82 | 0.98 | 0.85 | |
| 0.96 | 0.99 | 0.94 | |
| 0.31 | 0.41 | 0.45 | |
| 0.98 | 0.81 | 0.89 | |
| 1.06 | 0.99 | 0.89 | |
Statistical Analysis of the (Population Density, Incidence Rate) → Mortality Rate link.
| Month | R2 | Significance F | Variable | Coefficient | P-value |
|---|---|---|---|---|---|
| April | 0.14 | 0.04130031 | −0.0012 | 0.1484 | |
| 0.0025 | |||||
| May | 0.30 | 0.00183188 | 0.0004 | 0.7809 | |
| 0.0023 | |||||
| June | 0.37 | 0.000384 | 0.0013 | 0.4520 | |
| 0.0023 | |||||
| July | 0.43 | 0.0001 | 0.0061 | ||
| 0.0000 | 0.9964 | ||||
| August | 0.55 | 0.0000 | 0.0065 | ||
| −0.0004 | 0.3406 |
Statistical Analysis of Testing Rate → Incidence Rate link.
| Module 2 Cluster | R2 | Coefficient | P-value | |
|---|---|---|---|---|
| 0.9610 | 0.4825 | |||
| 0.6036 | 0.3996 | |||
| 0.9120 | 0.2248 | |||
| 0.7764 | 0.3908 | |||
| 0.1749 | 0.1031 | 0.1549 | ||
| 0.1282 | 0.0667 | 0.2530 | ||
| 0.7936 | 0.3640 | |||
| 0.7328 | 0.2923 | |||
| 0.0700 | 0.033 | 0.3821 | ||
| 0.1473 | 0.0599 | 0.2180 | ||
| 0.8542 | 0.2087 | |||
| 0.7318 | 0.1537 | |||
| 0.1036 | 0.0206 | 0.2834 | ||
| 0.0605 | −0.0334 | 0.4408 | ||
| 0.5774 | 0.0759 | |||
| 0.5318 | 0.0809 | |||
| 0.1357 | 0.0180 | 0.2153 | ||
| 0.0328 | −0.0236 | 0.5726 | ||
| 0.1477 | 0.0353 | 0.3470 | ||
| 0.2938 | 0.0535 |
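Each row of the table above summarizes a one-predictor regression by its R2 and slope coefficient. The following sketch shows how those two quantities come out of ordinary least squares using only the standard library. The data points are hypothetical, and the p-values are omitted because they additionally require the t distribution.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x; return (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical testing rates and incidence rates (both per 100,000 persons).
testing = [1000.0, 2000.0, 3000.0, 4000.0]
incidence = [110.0, 190.0, 310.0, 390.0]
slope, intercept, r2 = simple_ols(testing, incidence)
```

Running the same fit separately on the states of each cluster would give one slope and one R2 per cluster, which is the shape of the results reported in the table.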
Statistical Analysis of Incidence Rate → Mortality Rate link.
| Month | Module 2 Cluster | R2 | Coefficient | P-value |
|---|---|---|---|---|
| 0.2866 | 0.0022 | 0.0594 | ||
| 0.0021 | 0.0002 | 0.8990 | ||
| 0.0484 | 0.0020 | 0.6007 | ||
| 0.1321 | 0.0017 | |||
| 0.4692 | 0.0079 | |||
| 0.0224 | 0.0012 | 0.6420 | ||
| 0.4773 | 0.0020 | 0.0577 | ||
| 0.3413 | 0.0025 | |||
| 0.4383 | 0.0073 | |||
| 0.3711 | 0.0032 | |||
| 0.6557 | 0.0027 | |||
| 0.3966 | 0.0029 | |||
| 0.3518 | 0.0078 | |||
| 0.0466 | 0.0008 | 0.5001 | ||
| 0.3641 | 0.0045 | 0.1131 | ||
| 0.1471 | 0.0021 | |||
| 0.0064 | 0.0007 | 0.7942 | ||
| 0.0072 | 0.0001 | 0.7921 | ||
| 0.0744 | −0.0015 | 0.5133 | ||
| Complete Sample | 0.0027 | 0.0001 | 0.7733 |
Clustering based on Demographic Risk Factors: Comparison of EC vs TC.
| High | Mid | Low | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Period | MI | EC | TC | MI | EC | TC | MI | EC | TC |
| 0.33 | 0.60 | 0.59 | 0.42 | 0.60 | 0.47 | 0.66 | |||
| 1.13 | 1.16 | 1.12 | |||||||
| 0.84 | 0.74 | 0.94 | |||||||
| 0.82 | 0.67 | 0.98 | 0.73 | 0.85 | 0.82 | ||||
| 0.96 | 0.97 | 0.99 | 0.98 | 0.94 | 0.97 | 0.98 | |||
Clustering based on Actual Spread of the Disease: Comparison of EC vs TC.
| High | Mid | Low | |||||||
|---|---|---|---|---|---|---|---|---|---|
| Period | MI | EC | TC | MI | EC | TC | MI | EC | TC |
| 0.31 | 0.51 | 0.41 | 0.58 | 0.45 | 0.66 | 0.67 | |||
| 1.19 | 1.16 | 1.02 | |||||||
| 0.86 | 1.19 | 1.07 | |||||||
| 0.98 | 0.64 | 0.81 | 0.80 | 0.89 | 0.72 | ||||
| 1.06 | 1.03 | 0.99 | 0.98 | 0.89 | 1.01 | 1.00 | |||
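The Malmquist Index is commonly decomposed into an efficiency change (EC) component and a technological change (TC) component, with MI = EC × TC for each unit. The sketch below uses the standard geometric-mean formulation built from four distance (efficiency) values; the argument names and the numbers are hypothetical, and the source does not state which model orientation it used. Under the common output-oriented convention, values above 1 indicate improvement.

```python
from math import isclose, sqrt


def malmquist(d00, d01, d10, d11):
    """Return (MI, EC, TC) from four distance-function values.

    dAB is the distance of the period-B observation measured against the
    period-A frontier, where 0 is the earlier and 1 the later period.
    """
    ec = d11 / d00                             # catching up to the frontier
    tc = sqrt((d01 / d11) * (d00 / d10))       # shift of the frontier itself
    mi = sqrt((d01 / d00) * (d11 / d10))       # overall productivity change
    return mi, ec, tc


# Hypothetical distance values for one geographic area.
mi, ec, tc = malmquist(d00=0.80, d01=0.90, d10=0.70, d11=0.85)
assert isclose(mi, ec * tc)
```

The identity holds for each unit individually. Because the tables report cluster averages, an average MI need not equal the product of the average EC and the average TC.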