
Computational Decision Support for the COVID-19 Healthcare Coalition.

Andreas Tolk¹, Christopher Glazner², Joseph Ungerleider².

Abstract

The COVID-19 Healthcare Coalition was established as a private sector-led response to the COVID-19 pandemic. Its purpose was to bring together healthcare organizations, technology firms, nonprofits, academia, and startups to preserve the healthcare delivery system and help protect U.S. populations by providing data-driven, real-time insights that improve outcomes. This required the coalition to obtain, align, and orchestrate many heterogeneous data sources and present this data on dashboards in a format that was understandable and useful to decision makers. To do this, the coalition employed an ensemble approach to analysis, combining machine learning algorithms together with theory-based simulations, allowing prognosis to provide computational decision support rooted in science and engineering.


Year: 2020 PMID: 35916867 PMCID: PMC9295914 DOI: 10.1109/MCSE.2020.3036586

Source DB: PubMed Journal: Comput Sci Eng ISSN: 1521-9615 Impact factor: 2.152

In the early months of 2020, the SARS-CoV-2 coronavirus took the world by surprise, resulting in the COVID-19 pandemic that has caused significant loss of life and challenged the sustainability of our health care systems. In mid-March, it became obvious that government and communities had to react immediately. Under the lead of the Mayo Clinic and The MITRE Corporation, the COVID-19 Healthcare Coalition (C19HCC) was established as a coordinated public-interest, private-sector response. The coalition brought together healthcare organizations, technology firms, nonprofits, academia, and startups to support supply chains, inform coordinated social policies, and provide data-driven insights to protect people and preserve the healthcare delivery system. The coalition quickly reached more than 1000 member organizations, many of them working in computational fields. Although the efforts focused on the United States, we had several international partners who not only observed but also contributed to the efforts. This article summarizes selected research results and lessons learned when highly diverse and heterogeneous organizations bring their data and computational infrastructure together to provide computational decision support in a new problem domain with daily changing scientific insights, as is the case with COVID-19. Of particular interest for this journal is the work of the analytics working groups, who had to obtain and align distributed and diverse data, orchestrate heterogeneous modeling approaches, use machine learning (ML) and artificial intelligence (AI) methods to identify trends, apply simulations implementing the latest research insights, and visualize the results using dashboards that allow decision makers in federal and state governments to understand the results, leading to actionable recommendations.

DATA CHALLENGE

One of the first activities of the coalition was to obtain data that could provide insight into the situation as it unfolded in the United States and around the world. In the early weeks and months of the outbreak, there was a strong need for open, available, authoritative data. In the absence of official government sources, the Johns Hopkins University's Center for Systems Science and Engineering established one of the first global sources of updated, curated epidemiological data, compiling information from around the globe. Their work was soon augmented with data provided by third parties such as the COVID Tracking Project, by C19HCC partners such as hospitals, pharmacies, and medical suppliers, and eventually by the Centers for Disease Control and Prevention (CDC). As expected, these data differed in many ways, including the definition of categories (e.g., all cases, lab-confirmed cases, suspected cases), the level of geospatial granularity (if any) they provided, the temporal frequency at which they were obtained, and the standards used to structure their format. This inconsistency can be attributed to the lack of standards for this data, which is usually managed at the state and local levels in the United States. The heterogeneity of the data required detailed analysis and cleansing and led to a series of normalization and conditioning steps that were necessary for our exploration. As the coalition's response and partnerships grew, however, other datasets from open, public, and scientific organizations further enriched and complicated the analytic approach. Our process for synthesizing and reporting the data in support of decision-making began with the identification of scientific questions, which were derived from several key documents originating at the CDC, the White House Task Force, and the National Governors Association.
Using these documents, we derived analytic goals for the coalition that could be used to align datasets and assign them to appropriate categories for exploratory data analysis. While data was made immediately available through the volunteer partnerships within the coalition, many of these data were not geospatially bound to the same common standard. For the coalition to provide both situational awareness and localized recommendations in response to the crisis, geospatial accuracy was a critical component that was often missing or inconsistent. As such, we had to implement multiple geocoding techniques to assign location accuracy to the data that was unbound, provide look-up tables to translate to locations conformant with the Federal Information Processing Standard Publication (FIPS) (as most analytics were structured around States or Counties), and merge datasets of overlapping geospatial locations into appropriate location codes. Our second and more difficult challenge was normalizing data according to a common controlled vocabulary or standard data format. Although the coalition attempted to define a schema with which all data would comply, it was difficult to implement such a standard across the broad spectrum of data providers. Most data providers followed a schema consistent with their current customer base (i.e., hospital group, care provider, pharmaceutical provider), and these schemas differed even among similar providers across the health industry. Our inability to standardize recording practices, covering both what data was collected and how that data was stored, resulted in a series of translation tables and heuristic optimization techniques to combine similar data into common tables. In the absence of generally accepted and applied standards, the coalition had to agree on case-specific mappings to support the short-term goals.
To ensure much quicker reaction times for future challenges comparable to COVID-19, better standards allowing the alignment of such diverse data sources will be needed, including pedigree data. A data-centric enterprise requires a guiding data strategy that allows data to remain application specific while still being alignable with alternative viewpoints and structures. Ultimately, we were able to take many disparate data sources and answer dozens of analytic questions using this approach within the coalition. Where federal standards existed, we resolved the differences in data format through our translators and data normalization engines, established our own data pull frequency to ensure a similar cadence, and conditioned all data into the FIPS format for geospatial accuracy. Where such agreements were missing, coalition-specific solutions were required.
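
The look-up-table step described in this section can be illustrated with a minimal sketch. It is not the coalition's implementation: the record layout (`state`, `county`, `metric`, `value`), the function names, and the three-entry table are assumptions made for illustration, and a real table would cover every county. Records from providers that spell locations differently are mapped to a county FIPS code and merged into one row per code, while records that cannot be resolved are set aside for other geocoding techniques.

```python
# Illustrative look-up table: (state abbreviation, county name) -> county FIPS code.
# Only three entries are shown; a production table would list all counties.
COUNTY_FIPS = {
    ("VA", "fairfax"): "51059",
    ("VA", "arlington"): "51013",
    ("GA", "fulton"): "13121",
}


def normalize_county(name):
    """Lower-case a county name, collapse whitespace, drop a ' county' suffix."""
    cleaned = " ".join(name.strip().lower().split())
    if cleaned.endswith(" county"):
        cleaned = cleaned[: -len(" county")]
    return cleaned


def to_fips(state, county):
    """Return the FIPS code for a location, or None if it is not in the table."""
    return COUNTY_FIPS.get((state.strip().upper(), normalize_county(county)))


def merge_by_fips(records):
    """Combine provider records into one row of metrics per FIPS code.

    Returns (table, unresolved); unresolved holds the records whose location
    could not be mapped, so they can be handled by another technique.
    """
    table, unresolved = {}, []
    for rec in records:
        fips = to_fips(rec["state"], rec["county"])
        if fips is None:
            unresolved.append(rec)
            continue
        table.setdefault(fips, {})[rec["metric"]] = rec["value"]
    return table, unresolved


# Hypothetical records from providers with inconsistent location spellings.
records = [
    {"state": "va", "county": "Fairfax County", "metric": "confirmed_cases", "value": 120},
    {"state": "VA", "county": " fairfax ", "metric": "icu_beds_in_use", "value": 14},
    {"state": "GA", "county": "Fulton", "metric": "confirmed_cases", "value": 95},
    {"state": "GA", "county": "Unknown", "metric": "confirmed_cases", "value": 3},
]
table, unresolved = merge_by_fips(records)
print(table)
print(unresolved)
```

Keeping unresolved records in a separate list, rather than dropping them, preserves the pedigree of the data and makes gaps in the look-up table visible.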

USING AI/ML FOR FORECASTING

A pioneer in systems thinking, Ackoff is credited with the development of the data, information, knowledge, wisdom pyramid. By putting data into context, we gain information, and by adding causal relationships in a procedural manner, we reach knowledge. AI and ML have become ubiquitous in recent years because of their ability to quickly apply methods rooted in statistics to huge amounts of data, discovering hidden relations by looking at correlations and related means of multivariate statistics. When these insights are used to visualize not only live data but also forecasted behavior, AI/ML becomes a reliable tool for supporting various operational decisions. Applying it to better understand the COVID-19 pandemic was therefore a logical early step for the coalition. However, two challenges of general interest to the community using computing in science and engineering were quickly identified in the process. First, because this was a novel coronavirus, there was not a significant amount of data for AI/ML to train on, and the data that was available was often inconsistent. Without consistent, curated big data to learn from, ML approaches were of limited use. With the standardization efforts described above, agreements within the community on data structures and tags to capture meaning, and, unfortunately, more days of transmission, these challenges could be overcome. The second challenge, however, was harder. If AI/ML identifies a trend and uses it to forecast future developments, that forecast depends on the constraints and assumptions that held when the trend was observed. To create a valid forecast, these constraints must remain unchanged. But decision makers and the population want to do exactly the opposite: to counteract an undesirable trend, they want to change the constraints and assumptions.
In the face of seeing others sick and dying around them, people will modify their behavior, but not all populations will respond in similar ways. Without a massive training data set containing trends of previous responses to all combinations of actions, representative of multiple geographies, AI/ML approaches will struggle. Our approach was to ensemble theory-driven simulations with the AI/ML driven models to develop forecasts that could help decision makers understand “what if.”

SIMULATION AS EXECUTABLE THEORY

Given the newness of COVID-19 and the limited amount of available data, the use of simulation modeling, built from a foundational understanding of the disease and its characteristics, was the only alternative to provide actionable forecasts, especially in the opening months of the pandemic. The professional simulation community reacted quickly with recommended solutions, such as in the publication by Currie et al., as well as with calls to action, as by Squazzoni et al. The C19HCC working group on modeling and simulation actively contributed to answering this call and utilizing proposed solutions. This occurred in several phases, along the lines of new scientific insights about the pandemic. The earliest phase of modeling was led by SEIR(S) compartment models, a long-time tool of public health in which differential equations describe the flow of the population between the compartments “Susceptible,” “Exposed,” “Infectious,” and “Recovered” or “Dead,” and possibly back to “Susceptible.” The discipline of system dynamics provides a well-supported set of methods and tools to allow the rapid development of such models, which could then be configured with empirically observed data on contact rates, infection rates and incubation time, and recovery/death rates and recovery time. These SEIR(S) models were the first simulations that used theoretical insights to drive the simulation, marking the move from data-driven forecasting to knowledge-driven prognosis. Using an established and accepted theory of how a virus generally spreads, simulation systems could help to think about which interventions could work best to decrease the number of people infected at the same time, as the first objective was to flatten the curve to preserve the healthcare delivery system. The earliest models released, such as that of Imperial College London, fall into this category of models, built from our understanding of disease transmission as observed in China and later Italy.
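
To make the compartment idea concrete, the following is a minimal sketch of an SEIR model with a death compartment, integrated with a simple forward Euler step. It is a textbook formulation, not the model used by the coalition or any of the groups named above. The parameter names (`beta` for the transmission rate, `sigma` for the inverse incubation time, `gamma` for the inverse infectious period, `fatality` for the fraction of removed individuals who die) and all numerical values are illustrative assumptions, not calibrated estimates.

```python
def simulate_seir(population, beta, sigma, gamma, fatality, days,
                  initial_infectious=1.0, dt=0.1):
    """Integrate a basic SEIR model with deaths using forward Euler steps.

    Returns a list of (time, S, E, I, R, D) tuples, one per time step.
    """
    s = population - initial_infectious
    e, i, r, d = 0.0, initial_infectious, 0.0, 0.0
    history = [(0.0, s, e, i, r, d)]
    for step in range(1, int(round(days / dt)) + 1):
        living = s + e + i + r
        new_exposed = beta * s * i / living * dt   # S -> E (even mixing assumed)
        new_infectious = sigma * e * dt            # E -> I
        removed = gamma * i * dt                   # I -> R or D
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - removed
        r += (1.0 - fatality) * removed
        d += fatality * removed
        history.append((step * dt, s, e, i, r, d))
    return history


def peak_infectious(history):
    """Largest number of simultaneously infectious people in a run."""
    return max(row[3] for row in history)


# Illustrative comparison: halving the transmission rate "flattens the curve".
baseline = simulate_seir(1_000_000, beta=0.5, sigma=1 / 5, gamma=1 / 7,
                         fatality=0.01, days=365)
distanced = simulate_seir(1_000_000, beta=0.25, sigma=1 / 5, gamma=1 / 7,
                          fatality=0.01, days=365)
print("peak infectious, baseline: ", round(peak_infectious(baseline)))
print("peak infectious, distanced:", round(peak_infectious(distanced)))
```

The term `beta * s * i / living` is where the even-mixing assumption enters: every susceptible person is treated as equally likely to meet every infectious person.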
SEIR(S) is a relatively simple theory leading to simple models that can show trends and effects. However, differential equation approaches to these models require assumptions of even mixing of heterogeneous populations, or cumbersome workarounds. As the pandemic unfolded, it quickly became evident that these were not valid assumptions: the virus does not impact all populations evenly, and the interaction among different groups is far from even. “Super spreader” events, where infections begin in places of very high contact rates, as well as infections among susceptible populations, such as retirement communities and incarcerated populations, play an outsized role in the evolution of the COVID-19 outbreaks. A further weakness of SEIR(S) models is that they are highly sensitive to their initial conditions, parameter values, and structural knowledge of the disease. Early in the outbreak, it was not yet known that asymptomatic individuals could spread the virus. Misunderstandings like this can drastically change model results. Given the number of unknowns in the early months, these models gave wildly divergent forecasts, leading some decision makers to completely discount models, even when they provided a logical basis for comparing courses of action. SEIR(S) models served their initial purpose of providing quick insight given limited information, but better solutions that could be tailored to local populations were needed to inform decision makers. The C19HCC developed a set of requirements for such improved simulation solutions, allowing both quantitative and qualitative data. They should be driven by detailed data describing the spatial and demographic details per region and be calibrated to reflect the region's characteristics. The calibration was supported by genetic algorithms.
Small experiments using agent-based models demonstrated that social groups and interactions, network structures, and local distribution are often more important for the spread of the disease than the infection parameters in SEIR(S). The coalition observed in its experiments that the social network structure of interactions usually led to infection numbers significantly lower than those predicted by SEIR(S) models that assumed uniform interactions. While the average infectivity is important, it is not always the dominant characteristic of an outbreak. We also observed that mobility and movement significantly impact the spread. Finally, human factors, like compliance with regulations or changes in behavior as perceived conditions change, were identified and demonstrated to be important. The need for more detailed information about distribution and social networks is reminiscent of Anscombe's quartet, shown in Figure 1. All four distributions shown in the figure have virtually identical standard statistics (mean, variance, and correlation) but are obviously quite different, so higher resolution modeling is required.

Figure 1.

Anscombe's quartet as an example of the need for higher resolution to capture the essence of local distributions.
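
The point made with the quartet can be checked numerically. The sketch below uses the published values of the four datasets and computes the summary statistics with plain Python; the helper name `summarize` is an assumption for illustration. The four sets agree in mean, variance, and correlation to about two decimal places even though their scatter plots look nothing alike.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean, sample variance, and Pearson correlation for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "corr": sxy / math.sqrt(sxx * syy),
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Plotting the four sets, or inspecting the individual points, is what reveals the differences; the aggregate statistics alone do not.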

Another insight came from members of the Operational Research community, namely that the spread of COVID-19 is characterized by deep uncertainty. Lempert et al. define deep uncertainty as conditions “where analysts do not know, or the parties to a decision cannot agree on, (1) the appropriate conceptual models that describe the relationships among the key driving forces that will shape the long-term future, (2) the probability distributions used to represent uncertainty about key variables and parameters in the mathematical representations of these conceptual models, and/or (3) how to value the desirability of alternative outcomes.” In other words, deep uncertainty is systemic in the research. Parameters of systems may be unknown, behaviors and roles are unclear, objectives to be reached are still in question, etc. Computational science addresses deep uncertainty using exploratory analysis, such as that captured for research on technology forecast and social change by Kwakkel and Pruyt. This approach requires conducting experiments in numbers large enough to be statistically significant, running multiple scenarios to understand the stability under various circumstances, and using intuitive interfaces to present the results to the decision makers. An additional challenge is the time delay between a cause and the observation of its effects. Decision makers are used to immediate feedback. In the case of disease spread, observable parameters, like new infections and deaths, change only days after new policies take effect. The use of simulation can bridge this gap by allowing such decisions to be simulated immediately and presenting the expected outcome over time. In the next phase, the need to understand countering COVID-19 as a multivalue, multicriteria decision problem became apparent.
Focusing predominantly on one challenge in the complex problem space, such as minimizing the chance of contacts possibly leading to infections through interventions like lockdowns, was usually effective with respect to the desired value but also had side effects that were not immediately obvious to the decision maker. A possible solution was the use of artificial societies, as being developed by three of the coalition members. The Argonne National Laboratory, the Bio-Complexity Group of the University of Virginia, and the Center for Health and Humanitarian Systems of the Georgia Institute of Technology all developed high-fidelity, individual-centered systems that address not only spatial-temporal aspects but also social networks, workplaces, malls, schools, and many other aspects of daily life. The socially capable agents within these models adjust their behavior to various constraints, including the interventions decision makers may choose, such as wearing masks, washing hands, etc. These simulations quickly become so computationally intensive that only high-performance computers can provide the necessary power. The coalition therefore also looked at new forms of large-scale distributed computing, such as those initially discussed by Chen et al. With the Center for Mind and Culture of Boston University as a member, the coalition was able to put these lessons learned into its own solution as well. Supported by the Virginia Modeling Analysis and Simulation Center and the industry partner Simudyne, a smaller scale artificial society was created to represent students, faculty, and administrators within a university. The Artificial University (TAU) is implemented as an agent-based epidemiological model that takes account of often neglected human factors such as compliance and social networks, represents most of the interventions under discussion within universities, and employs multiple metrics to express diverse priorities.
To support the coalition effort to empower universities centered on minority and underserved populations, the core model was published as open source. Using this simulation, the group demonstrated that TAU can be used to generate policy insights directly relevant to the challenges facing university administrators concerning reopening and operating universities in the COVID-19 era. These insights include the identification of which interventions are most impactful across the spectrum of supported values, as well as tipping points indicating how far to push any of these interventions, including questions like how to deploy limited testing kits and vaccination resources within the university. At the time of writing, several universities had started to use this resource in support of their decision-making and evaluation processes. Overall, the use of artificial societies and their ability to support multiple viewpoints and facets in a coherent computational representation proved to be a technically feasible and user-acceptable approach. This approach enables multicriteria and multivalue exploratory analysis of these complex situations.
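
The exploratory analysis discussed in this section, running many scenarios to see how stable a policy is under uncertainty, can be sketched in a few lines. This is a deliberately small stand-in: it uses a simple SEIR model rather than an artificial society, and the parameter ranges, the `capacity` threshold, the policy names, and the function names are all assumptions chosen for illustration. Each policy is scored by the fraction of sampled scenarios in which the peak number of infectious people stays below the capacity threshold.

```python
import random


def peak_infectious(population, beta, sigma, gamma, days=365, dt=0.25):
    """Peak of the infectious compartment in a forward-Euler SEIR run."""
    s, e, i = population - 1.0, 0.0, 1.0
    peak = i
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        removed = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - removed
        peak = max(peak, i)
    return peak


def sample_scenarios(count, seed=0):
    """Draw uncertain disease parameters from assumed, illustrative ranges."""
    rng = random.Random(seed)
    return [
        {
            "beta": rng.uniform(0.2, 0.6),        # transmission rate per day
            "sigma": 1.0 / rng.uniform(3.0, 7.0),  # 1 / incubation time
            "gamma": 1.0 / rng.uniform(5.0, 10.0),  # 1 / infectious period
        }
        for _ in range(count)
    ]


def robustness(contact_reduction, scenarios, population=100_000, capacity=5_000):
    """Fraction of scenarios in which the peak stays at or below capacity."""
    acceptable = 0
    for scenario in scenarios:
        beta = scenario["beta"] * (1.0 - contact_reduction)
        peak = peak_infectious(population, beta, scenario["sigma"], scenario["gamma"])
        if peak <= capacity:
            acceptable += 1
    return acceptable / len(scenarios)


scenarios = sample_scenarios(100)
for name, reduction in [("no intervention", 0.0), ("moderate", 0.3), ("strong", 0.6)]:
    print(f"{name}: acceptable in {robustness(reduction, scenarios):.0%} of scenarios")
```

Reusing the same sampled scenarios for every policy keeps the comparison fair: differences in the scores then come from the policies, not from sampling noise.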

DASHBOARDS

The main objective for the development of dashboards is the visualization of data in a way that clearly communicates research insights to the decision maker. However, as stated in the previous sections, countering COVID-19 is a multivalue, multicriteria decision problem, where multiple criteria influence several, often conflicting values. As Rouse points out, decision makers must be able to immerse themselves in the complex problem space and have controls at hand that are easy enough to understand quickly but also powerful enough to evaluate their various options. The COVID-19 Healthcare Coalition Decision Support Dashboard (C19HCC DSD), shown in Figure 2, was developed to support these ideas.

Figure 2.

Screenshot of the COVID-19 decision support dashboard (https://dsd.c19hcc.org/) initial screen.

The DSD was first developed to analytically address the reopening criteria established in the National Governors Association (NGA) Roadmap to Recovery report issued in April 2020. The report identifies key metrics and analytic questions that must be resolved before a state or location can take actions to return to normalcy. With the data challenges in mind, the analytics for this dashboard were processed to produce Red, Yellow, and Green (RYG) risk indicators, using both qualitative and statistical analysis methods to identify thresholds of performance against the NGA metrics. Using logic tables, the RYG risk is aggregated across the various decision aids contained within the dashboard to assess an overall risk status for each location. The power of the DSD is that it can display data with different pedigrees side by side for comparison. This also allows the results of different model forecasts and prognoses to be shown side by side. DSD can integrate data from different sources into a common representation and help to navigate through different levels of abstraction and resolution. This requires, however, that the data challenges addressed earlier in this article have been solved; otherwise, the dashboard risks giving decision makers a false sense of security, as they assume that the data are as well aligned as the common representation suggests. The C19HCC DSD allows users to “drill down” into data with higher resolution, starting at the State level and going down to Counties and to place names or cities, if these data are available. It should be emphasized that higher resolution does not always equal higher fidelity, as data can be updated with varying degrees of accuracy as well as frequency.
As such, precise and current data for the County may have a higher fidelity than some of the less precise and older data for some of the districts it contains. How to communicate the level of trust in a current display is a topic of current research and was not addressed within the Coalition. However, the challenge of multiple values and comparability was addressed by allowing the display of a variety of values, such as new infections, economic impact, utilized hospital beds and intensive care units, and more, side by side for different areas of interest, such as different districts of a city, Counties of a State, or different States. The compared areas did not have to be on the same level, so that it is possible, e.g., to compare how well certain regions are doing in comparison with the State. Additionally, we utilized location spatial-join methods to take lower-level granular data and join it to form larger County-level indicators. This made it possible in certain cases to use more granular data to support broader geospatial decisions. The coalition also applied hybrid models, allowing the use of data-driven AI/ML approaches as well as knowledge-based simulation models. A special challenge was the design of decision options for the decision makers. As observed by Rouse, offering too many decision parameters to choose from can easily become a distraction. Instead, having a small set of well-designed options prepared by the decision maker's staff is likely to lead to better acceptance and easier use. Therefore, an additional layer for the configuration of decision options had to be introduced, allowing the supporting staff to prepare the decision options to be used by the decision maker. Since the decision aids are logically grouped into broad analytic categories, this visualization of information enables meaningful analytic insights.
All data is furthermore specifically aligned with temporary analytic goals. However, as more data becomes available, these analytic goals are modified to reflect the value and accuracy of these datasets. As such, the analytic questions presented on the dashboard naturally evolved as the pandemic evolved and as our understanding of the key indicators of public health, economic recovery, and returning to work and school deepened. It was important to tag and attribute our data so that it could be used and realigned with these evolving criteria, allowing us to maintain a rapid decision-making response to the crisis. Within the coalition, DSD became the common interface to experience the complex situation. As such, the circle closes: the coalition recommended that dashboards display the aligned and orchestrated data describing observed or former situations, visualize forecasts of trends and allow them to be compared with each other as well as with developments prognosed by simulation, and finally also show the results of various options selected by the decision maker. The dashboard as such could be interpreted as the “command center” for the decision maker, providing situational awareness as well as allowing the evaluation of different courses of action. This ambitious goal had not been reached at the time of writing, but the full integration of solutions as described by Rouse should be considered for future designs to be better prepared for challenges like the COVID-19 pandemic. Finally, we observed the danger of epistemological and hermeneutical challenges as well. Epistemological challenges occur when not all relevant parameters are captured in the underlying model, often because they are not known.
As a result, the model outcome cannot reflect the influence of the excluded data, potentially leading to a wrong decision. Hermeneutical challenges occur when the user of the model interprets the results using parameters that were not reflected in the model, therefore reading something into the model that is not there. Both are more human than technical challenges, but they play a pivotal role in the use of data- and knowledge-driven decision support and should be considered. How to generally avoid them is a topic of ongoing research. With all these challenges that had to be overcome, was it worth all the work? This question cannot be answered for sure, but between its initial launch in May 2020 and September 2020, the DSD was visited over 14,000 times by over 6000 unique users from 38 states. These users represented citizens, parents, teachers, health experts, and decision makers in offices of public health, offices of education, offices of employment, chambers of commerce, and volunteer organizations. Users recognized the uniqueness of how data was organized around analytic questions, and they provided positive feedback on the simplicity of the red/yellow/green risk analysis methods.
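
The red/yellow/green risk indicators described in this section can be illustrated with a small sketch. The metric names, the threshold values, and the aggregation rule below are assumptions made for illustration; they are not the thresholds or logic tables used by the DSD. Each metric is classified against a pair of thresholds, and the overall status of a location is taken to be the worst individual status.

```python
# Severity order used when comparing statuses.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# Hypothetical thresholds per metric: (value at which yellow starts, value at which red starts).
THRESHOLDS = {
    "new_cases_per_100k": (10.0, 25.0),
    "test_positivity_pct": (5.0, 10.0),
    "icu_occupancy_pct": (70.0, 85.0),
}


def classify(metric, value):
    """Map one metric value to 'green', 'yellow', or 'red'."""
    yellow_at, red_at = THRESHOLDS[metric]
    if value >= red_at:
        return "red"
    if value >= yellow_at:
        return "yellow"
    return "green"


def overall_status(metrics):
    """Aggregate per-metric statuses; the assumed rule is 'worst status wins'."""
    statuses = {name: classify(name, value) for name, value in metrics.items()}
    worst = max(statuses.values(), key=SEVERITY.get)
    return worst, statuses


# Hypothetical readings for one location.
county = {
    "new_cases_per_100k": 12.0,
    "test_positivity_pct": 3.0,
    "icu_occupancy_pct": 60.0,
}
status, detail = overall_status(county)
print(status, detail)
```

A "worst status wins" rule is conservative; a logic table could instead require, for example, two yellow indicators before raising the overall status, which is a design choice for the analysts who configure the dashboard.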

CONCLUSION

The COVID-19 Healthcare Coalition Working Group brought experts from various disciplines together to battle the pandemic. The degree of collaboration between experts of different disciplines is often captured by the terms multi-, inter-, and transdisciplinary team. The general understanding is that in multidisciplinary efforts, experts from various disciplines are working together on one common question or topic of interest. Each expert contributes knowledge, methods, and expertise as needed, but when the problem is solved, all return to their disciplines. When common tools are developed and the participating disciplines start to link to each other instead of juxtaposing, the effort becomes interdisciplinary. Permanent bridges between the disciplines are established. Finally, when the participating disciplines are systematically integrated to create new knowledge components in transcending and transgressing form, a new transdisciplinary effort emerges. Computers in science and engineering are enablers of such increasingly mutually supporting developments. One of the first challenges is creating interconnectivity between the participating organizations and their infrastructure. This allows the exchange of data, but the coalition showed that a lot of work can easily go into the effort to align the data, making sure that a common understanding of structure and meaning of the data can be established. The absence of standards was a primary obstacle in the early phases of the coalition. While common tools were developed and distributed in later phases, the desired stage of interdisciplinarity was not reached. The reason was more on the organizational side than a challenge in the computational domain, as this requires the mutual understanding and merging of research methods and the systematic integration of knowledge components. To react to another outbreak comparable to the COVID-19 pandemic quickly and efficiently in the future, transdisciplinary teams will be needed. 
Interconnected and standardized computational infrastructures are necessary but not sufficient. Capturing metadata that provides information about data and processes to allow for rapid interconnections should be the minimal requirement for future preparedness. Within the coalition, the use of dashboards contributed to establishing a common, mutual understanding and a technical method of structuring analytics for decision support. This effect increased when the dynamic behavior of the represented system was visualized as well, exposing weaknesses in the underlying analysis methods and technologies. The visualization of decision support information also exposes the variability and gaps in our datasets that must be rectified before providing meaningful intelligence. The easier the methods utilized for decision support could be integrated into such dashboards, the easier the computational support with common tools became. The complexity of the solution space of this multivalue, multicriteria challenge with many uncertainties leads to the conclusion that model-based approaches are useful to better understand what may happen, but we are not able to predict exactly what will happen. Nonetheless, the provided decision support can be used to exclude policy ideas that have no positive effects under any constraints and to identify policies with positive effects under a great number of foreseeable constraints. The main lesson learned, however, did not concern the computational component but the social and educational component of the coalition. The mutual respect and recognition that all the various disciplines must collaborate to have a chance in fighting the common enemy, the SARS-CoV-2 coronavirus, was the main driver for success. The better future leaders, scientists, and researchers are prepared for teams that span disciplines, the more easily our society will be able to react, but our computational infrastructure must be prepared to support such collaborative efforts.

DISCLAIMER

The views, opinions, and/or findings contained in this article are those of The MITRE Corporation and should not be construed as an official government position, policy, or decision, unless designated by other documentation. It is approved for public release, distribution unlimited, Case Number 20-01789-3.
