Spatio-temporal Analysis for New York State SPARCS Data.

Xin Chen, Yu Wang, Elinor Schoenfeld, Mary Saltz, Joel Saltz, Fusheng Wang.

Abstract

Increased accessibility of health data provides unique opportunities to discover spatio-temporal patterns of diseases. For example, New York State SPARCS (Statewide Planning and Research Cooperative System) data collects patient level detail on patient demographics, diagnoses, services, and charges for each hospital inpatient stay and outpatient visit. Such data also provides home addresses for each patient. This paper presents our preliminary work on spatial, temporal, and spatial-temporal analysis of disease patterns for New York State using SPARCS data. We analyzed spatial distribution patterns of typical diseases at ZIP code level. We performed temporal analysis of common diseases based on 12 years' historical data. We then compared the spatial variations for diseases with different levels of clustering tendency, and studied the evolution history of such spatial patterns. Case studies based on asthma demonstrated that the discovered spatial clusters are consistent with prior studies. We visualized our spatial-temporal patterns as animations through videos.

Year:  2017        PMID: 28815148      PMCID: PMC5543354     

Source DB:  PubMed          Journal:  AMIA Jt Summits Transl Sci Proc


Introduction

Open data initiatives supported by governments are providing unprecedented information about our health. New York State SPARCS (Statewide Planning and Research Cooperative System[1]) data, for example, collects patient-level detail on patient characteristics, diagnoses and treatments, services, and charges for each hospital inpatient stay and outpatient (emergency department, ambulatory surgery, and outpatient services) visit. Other examples include a data set from CMS (Centers for Medicare & Medicaid Services) that contains information about providers who participate in Medicare[20], and the New York State Cancer Mapping dataset[2,3], which consists of the number of people diagnosed with cancer (cancer counts, 2005-2009) in small geographic areas. All these datasets provide street-level location information for each patient, healthcare provider, or facility site. They also cover a history of patient records, which makes longitudinal analysis of historical disease patterns possible. The improved availability of health data, combined with improved geospatial analysis and spatial statistics techniques, has significant potential to uncover the spatial and spatial-temporal patterns of diseases in a population at the community level and to provide new insights into their causes and controls.[17-18] Spatio-temporal data analysis for public health has a strong focus on locating patients and the agents of disease, studying region-level patterns and variations, and assessing spatio-temporal trends in diseases and human health[8]. In the past, because of the limited accessibility of health data, public health studies were often restricted to the global level, which may not allow public health researchers and officials to adequately identify the most at-risk populations or to analyze and monitor health events at fine-grained spatial resolutions, such as the community or neighborhood level[12-16].
Our goal is to integrate open health data with a comprehensive set of spatio-temporal exposure data, which ranges from levels of various environmental pollutants to the socioeconomic status of persons at risk[3]. We focus on spatio-temporal public health research at a fine-grained spatial resolution and consolidate a variety of spatial datasets into a data warehouse for scalable integrative spatial and spatial-temporal analytics[6]. In this paper, we introduce our preliminary study of spatial, temporal, and spatio-temporal analysis of disease patterns for New York State SPARCS data at the ZIP code level. The approach is generic and can be applied at a finer spatial resolution, the address level, by geocoding patients’ addresses and approximating them as census block group identifiers, which is an ongoing project.
The paper is organized as follows. We first present an overview of New York State SPARCS data with basic statistics for inpatient stays, emergency department visits, ambulatory surgery, and outpatient visits for the year 2014 (Table 1, Figure 1). We then study the spatial distributions of the top-ranking diseases by discharge count for the year 2014 (Table 2 and Figure 2-4). Next, we perform spatial clustering of asthma for the year 2014 (Figure 5) and analyze the temporal trends for a selected group of diseases for the years 2003-2014 (Figure 6-8). Finally, we present our results as animation videos that visualize how the spatial clusters of asthma vary over time for the years 2005-2014, followed by discussion and conclusion.
Table 1:

Demographics of New York State SPARCS data, 2014

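| Category | Measure | Census* | Inpatient Stay | Emergency Department | Ambulatory Surgery | Outpatient Visit |
|---|---|---|---|---|---|---|
| Total | Population/Discharge occurrence | 19,795,791 | 2,298,756 | 7,356,608 | 2,443,416 | 11,033,814 |
| Age and Sex | Persons under 5 years | 6.00% | 12.50% | 8.60% | 1.80% | 5.10% |
| Age and Sex | Persons under 18 years | 21.30% | 15.40% | 20.20% | 5.30% | 13.30% |
| Age and Sex | Persons 65 years and over | 15.00% | 34.20% | 13.90% | 32.50% | 24.50% |
| Age and Sex | Female persons | 51.40% | 56.20% | 55.10% | 55.90% | 59.20% |
| Race | White alone | 70.10% | 57.40% | 47.10% | 65.80% | 37.00% |
| Race | African American alone | 17.60% | 18.50% | 25.50% | 10.60% | 23.80% |
| Race | American Indian and Alaska Native alone | 1.00% | 0.30% | 0.20% | 0.40% | 0.40% |
| Race | Asian alone | 8.80% | 3.80% | 2.50% | 3.00% | 3.40% |
| Race | Native Hawaiian and Other Pacific Islander | 0.10% | 0.00% | 0.00% | 0.00% | 0.00% |
| Race | Two or More Races | 2.40% | 0.40% | 0.30% | 0.50% | 0.10% |
| Race | Other Race or Unknown | - | 19.60% | 24.40% | 19.70% | 35.30% |

* Estimates, July 1, 2015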

Figure 1.

Occurrence Counts for Major Disease Groupings by Discharge Claim Type, New York State, 2014

Table 2:

Spatial Autocorrelation of Top Ten Diseases by Discharge Count for Inpatient Stay and Emergency Department Visit, New York State, 2014

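| Claim Type | Disease Name | Total Discharge Count | Hospital Discharge Rate Mean (St. Dev.) | Moran's I Index |
|---|---|---|---|---|
| Inpatient Stay (IP) | Liveborn | 222,803 | 100 (165) | 0 |
| Inpatient Stay (IP) | Osteoarthritis | 54,367 | 44 (93) | 0.46* |
| Inpatient Stay (IP) | Congestive heart failure (non-hypertensive) | 45,722 | 25 (34) | 0.64* |
| Inpatient Stay (IP) | Mood disorders | 43,209 | 122 (230) | 0.54* |
| Inpatient Stay (IP) | Other complications of birth; puerperium affecting management of mother | 36,480 | 15 (30) | 0.70* |
| Inpatient Stay (IP) | Cardiac dysrhythmias | 35,297 | 22 (40) | 0.61* |
| Inpatient Stay (IP) | Complication of device; implant or graft | 33,305 | 20 (34) | 0.62* |
| Inpatient Stay (IP) | Diabetes mellitus with complications | 33,040 | 15 (31) | 0.70* |
| Inpatient Stay (IP) | Asthma | 32,505 | 10 (23) | 0.78* |
| Inpatient Stay (IP) | Acute myocardial infarction | 31,249 | 22 (36) | 0.53* |
| Emergency Department (ED) | Abdominal pain | 342,294 | 189 (210) | 0.05* |
| Emergency Department (ED) | Nonspecific chest pain | 300,623 | 197 (449) | 0.02 |
| Emergency Department (ED) | Asthma | 158,175 | 52 (191) | 0.04* |
| Emergency Department (ED) | Other non-traumatic joint disorders | 137,937 | 58 (73) | 0.25* |
| Emergency Department (ED) | Other complications of pregnancy | 134,195 | 49 (247) | 0.02* |
| Emergency Department (ED) | Other injuries and conditions due to external causes | 111,137 | 63 (114) | 0.04* |
| Emergency Department (ED) | Other viral infections*** | 109,746 | 35 (48) | 0.38* |
| Emergency Department (ED) | Sprains and strains | 109,429 | 81 (211) | 0.02 |
| Emergency Department (ED) | Superficial injury; contusion | 102,582 | 79 (222) | 0.03** |
| Emergency Department (ED) | Other gastrointestinal disorders | 93,094 | 49 (140) | 0.01 |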

* Significant at the 1% level
** Significant at the 5% level
*** Other viral infections include herpes zoster infection, herpes simplex infection, and other and unspecified viral infection.

Figure 2.

Liveborn Inpatient Discharge Rate per 10,000 Residents by ZIP Code, New York State, 2014

Figure 4.

Asthma Inpatient Discharge Rate per 10,000 Residents by ZIP Code, New York State, 2014

Figure 5.

Spatial Clusters and Outliers of Asthma Inpatient Discharge Rate by ZIP Code, New York State, 2014

Figure 6.

Temporal Trends of Top Ten Diseases by Discharge Count for Inpatient Stay, New York State, 2005-2014

Figure 8.

Temporal Trends of Inpatient Discharge Rate for Asthma’s Six Subcategories, New York State, 2005-2014

Methods

Overview of New York State SPARCS Data

Population. In this work, we used four types of New York State SPARCS data, distinguished by discharge claim type: inpatient stay (IP), emergency department visits (ED), ambulatory surgery (AS), and outpatient visits (OP). Basic demographics for the year 2014 are given in Table 1. Any New York State healthcare facility certified to provide inpatient services, ambulatory surgery services, emergency department services, or outpatient services is required to submit data to SPARCS. The purpose of SPARCS was to create a statewide data set to contribute to the goal of providing high quality medical care by serving as an information source[4].
Disease Categories. Although the SPARCS data contains a comprehensive list of diagnosis and treatment procedure codes for each discharge record, this paper focuses on analyzing the spatio-temporal trends of disease categories based on the principal diagnosis code. The ‘Principal/Primary Diagnosis’ is the condition established after study to have been chiefly responsible for occasioning the admission of the patient to the hospital for care.[4] We used the Clinical Classifications Software (CCS) to group patient diagnoses into a manageable number of clinically meaningful categories. Figure 1 shows the hospital discharge counts for the year 2014 for major disease groupings, while the rest of this paper uses the more detailed disease categories from the single-level CCS.[5]
Patients’ Home Address and Hospital Admission Year. We approximated patients’ home locations by combining the 5-digit ZIP code from each patient’s home address with geographic data from TIGER/LINE data.[7] We then aggregated disease occurrences at the ZIP code level and generated regional disease counts for the subsequent spatio-temporal analyses. To evaluate the temporal trends, this work used the hospital admission year for the time range 2005-2014 for IP discharges and 2003-2014 for ED, AS, and OP discharges.
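The aggregation of discharge records into regional counts can be illustrated with a minimal sketch. The record layout (ZIP code, admission year, CCS category) and the sample values are hypothetical and are not taken from the SPARCS schema; the sketch only shows the grouping step described above.

```python
from collections import Counter

# Hypothetical, simplified discharge records: (ZIP code, admission year, CCS category).
# Real SPARCS records carry many more fields; these rows are made up for illustration.
records = [
    ("11434", 2014, "Asthma"),
    ("11434", 2014, "Asthma"),
    ("10031", 2014, "Asthma"),
    ("12203", 2013, "Osteoarthritis"),
]


def regional_counts(rows):
    """Count discharges per (ZIP code, admission year, disease category)."""
    return Counter(rows)


counts = regional_counts(records)
print(counts[("11434", 2014, "Asthma")])
```

Keying the counts by ZIP code, year, and category keeps the same table usable for both the spatial analysis (fix the year) and the temporal analysis (fix the ZIP code or sum over it).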

Spatio-temporal Analysis

We first used global spatial clustering analysis to determine whether a disease depends on patients’ home locations. As a starting point for assessing all disease mappings of New York State, we examined the top ten diseases for inpatient stays and emergency department visits by discharge count. We then examined the spatial variation and local spatial clustering of exemplary diseases according to their degree of spatial dependence. Finally, we analyzed detailed temporal and spatio-temporal trends for one exemplary disease (asthma inpatient stay).
Hospital Discharge Rate. We used hospital discharge rates to measure the probability of occurrence of a given disease in a population within a specified period of time. We calculated hospital discharge rates for inpatient stays, emergency department visits, ambulatory surgery, and outpatient visits separately. The hospital discharge rate was calculated by dividing discharge counts by population counts from Census data[7]. In this paper, we evaluated both the statewide rate and rates at the ZIP code level. The discharge rates provide useful information about how common a disease is compared with other diseases, or how common a disease in a specific location is compared with the global baseline.
Global Spatial Clustering. In spatial statistics, spatial autocorrelation measures how similar close objects are to other close objects. To test for spatial autocorrelation, this work used Moran’s I (Table 2). Moran’s I is a widely used global cluster test, which determines the degree of clustering or dispersion within a data set. The resulting values range from 1 (perfect correlation) through 0 (complete spatial randomness) to -1 (perfect dispersion).[8] For the hospital discharge rates, a positive spatial autocorrelation means that areas with high discharge rates are close to other areas with high discharge rates.
Local Spatial Clustering. In addition to the global cluster test (Moran’s I index in Table 2) and visual analysis of the mapped hospital discharge rates (Figure 2-4), we performed cluster and outlier analysis with the local Moran’s I statistic[8] to quantitatively detect local clusters in the asthma IP rates (Figure 5). The local Moran’s I statistic is a local cluster test that, given a set of weighted features, identifies statistically significant hot spots, cold spots, and spatial outliers.
Temporal Analysis and Spatio-temporal Animation. As a starting point for spatio-temporal analysis of SPARCS data, we used the admission dates to examine the temporal trends of the top ten IP diseases (excluding liveborn) over the period 2003-2014 (Figure 6-8; IP data cover 2005-2014). To observe patterns that emerge in the mapping of discharge rates as time passes, we made spatio-temporal animations to visualize how the spatial clusters vary over time for the years 2005-2014.
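The discharge rate and the global and local Moran’s I statistics described in this section can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the software used in the study: the binary adjacency weights, the toy values, and the function names are all hypothetical, and the local statistic uses one common scaling (dividing by the mean squared deviation). Significance testing, which the cluster and outlier analysis also requires, is omitted.

```python
def discharge_rate(count, population, per=10_000):
    """Discharges per `per` residents (the maps report rates per 10,000)."""
    return per * count / population


def morans_i(x, w):
    """Global Moran's I for values x and an n-by-n spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                      # deviations from the mean
    s0 = sum(sum(row) for row in w)                # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


def local_morans_i(x, w):
    """Local Moran's I per area, scaled by the mean squared deviation (an assumed convention)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    m2 = sum(v * v for v in z) / n
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical example: four areas in a row, each adjacent to its immediate neighbors.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 0, 0]     # similar values sit next to each other
alternating = [1, 0, 1, 0]   # every neighbor differs

print(morans_i(clustered, w))      # positive: clustering
print(morans_i(alternating, w))    # close to -1: dispersion
print(local_morans_i(clustered, w))
```

In the clustered example, the two end areas have positive local values because each resembles its only neighbor, while the two middle areas sit on the boundary between high and low values and score zero.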

Results

Spatial Trends

According to the Moran’s I indexes and their significance levels (Table 2), four disease discharge rates were spatially random: IP liveborn, ED nonspecific chest pain, ED sprains and strains, and ED other gastrointestinal disorders (constipation, dysphagia, and other and unspecified gastrointestinal disorders). Among the top ten IP diseases, asthma had the highest Moran’s I index, indicating the strongest spatial clustering tendency. Among the top ten ED diseases, other viral infections (herpes zoster infection, herpes simplex infection, and other and unspecified viral infection) showed the strongest spatial clustering tendency.
In the choropleth maps (Figure 2-4), all ZIP code areas were classified into five classes according to their discharge rates. Class breaks were created with equal value ranges at intervals of one standard deviation, using the mean and the standard deviation of the rates. Figure 2 shows the spatial distribution of the liveborn inpatient discharge rate by ZIP code, New York State (NYS), 2014. The liveborn rates for most of NYS were within one standard deviation of the mean. Only a very small number of ZIP code areas were hotspots (areas with a liveborn rate more than 1.5 std. dev. above the mean, shown in orange and red on the map). This implies that the occurrence of liveborn was independent of the patients’ home locations. While Figure 2 shows an example of a spatially independent disease, we used the viral infection ED visit rate (Figure 3) and the asthma inpatient discharge rate (Figure 4) as two examples of spatially dependent diseases. Both viral infection (Figure 3) and asthma (Figure 4) exhibited a tendency toward spatial clustering. Compared with the spatial distribution of the liveborn rate, the maps of viral infection and asthma contained more hotspots (areas with rates more than 1.5 std. dev. above the mean, shown in orange and red on the map), which indicates that cases of viral infection or asthma were more likely to occur near each other.
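The standard-deviation classification used for the choropleth maps can be sketched as follows. The break points at -0.5, 0.5, 1.5, and 2.5 standard deviations are an assumption inferred from the 1.5 and 2.5 std. dev. hotspot thresholds mentioned in the text; the source does not list the exact breaks. The ZIP codes and rates are hypothetical.

```python
from statistics import mean, pstdev


def std_dev_class(rate, mu, sigma, breaks=(-0.5, 0.5, 1.5, 2.5)):
    """Return a class index 0..4 from the rate's distance to the mean in std. dev. units.

    Classes 3 and 4 correspond to hotspots (above 1.5 std. dev.);
    class 4 alone corresponds to extreme hotspots (above 2.5 std. dev.).
    The break points are assumed, not taken from the paper.
    """
    z_score = (rate - mu) / sigma
    return sum(z_score > b for b in breaks)


# Hypothetical discharge rates per 10,000 residents for a handful of ZIP codes.
rates = {"A": 8.0, "B": 10.0, "C": 12.0, "D": 9.0, "E": 11.0, "F": 40.0}
mu = mean(rates.values())
sigma = pstdev(rates.values())
classes = {zip_code: std_dev_class(r, mu, sigma) for zip_code, r in rates.items()}
hotspots = [zip_code for zip_code, c in classes.items() if c >= 3]
print(classes, hotspots)
```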
Figure 3.

Viral Infections Emergency Department Visit Rate per 10,000 Residents by ZIP Code, New York State, 2014

The map of viral infection ED rates has no extreme hotspots (areas with rates more than 2.5 std. dev. above the mean, shown in red on the map), which indicates a more even distribution of the rates. The asthma IP rates, on the other hand, have more extreme hotspots, which indicates a more uneven spatial distribution. Table 2 also shows that the Moran’s I index values of ED visit rates are generally lower than those of IP discharge rates, meaning that ED visit rates generally have a weaker clustering tendency than IP discharge rates. We found no rural-urban disparity for viral infections or asthma: most rural areas and most urban areas (shown with a line fill symbol on the map) have no hotspots. The few hotspots and higher-rate areas (values more than 1.5 std. dev. above the mean) appear more likely to be in New York City. By manual inspection, we found that the two hotspot areas for asthma are an area near JFK airport and upper Manhattan.
The cluster/outlier type field in Figure 5 distinguishes between a statistically significant cluster of high values (High-High cluster), a cluster of low values (Low-Low cluster), an outlier in which a high value is surrounded by low values (High-Low outlier), and an outlier in which a low value is surrounded by high values (Low-High outlier). Statistical significance is set at the 95 percent confidence level. We applied the False Discovery Rate (FDR) correction, which potentially reduces the critical p-value from 0.05 to a value that better reflects the 95 percent confidence level, in order to account for multiple testing and spatial dependency.[8] In the map of spatial clustering for asthma IP rates, two areas, one near JFK airport and one in upper Manhattan, were identified as High-High clusters (shown in pink in Figure 5), which represent statistically significant clusters of higher asthma IP rates. These results confirm the visual analysis of the spatial distribution map in Figure 4 and are consistent with prior research on the potential health impact of residential proximity to large NYS airports[9-10]. Many High-Low outliers and Low-High outliers were also identified near the city of Albany in northeastern New York State. This finding, however, requires further research into the potential driving factors.
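How an FDR correction lowers the critical p-value can be shown with a short sketch. The source does not state which FDR procedure was applied; the Benjamini-Hochberg step-up rule is assumed here, and the p-values are hypothetical.

```python
def fdr_threshold(p_values, alpha=0.05):
    """Critical p-value under the Benjamini-Hochberg rule (assumed procedure).

    Returns the largest sorted p-value p(k) with p(k) <= alpha * k / m,
    or 0.0 when no test passes. Tests with p <= threshold are kept as significant.
    """
    m = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * rank / m:
            threshold = p
    return threshold


# Hypothetical p-values from local cluster tests on five ZIP code areas.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
threshold = fdr_threshold(p_values)
uncorrected = [p for p in p_values if p <= 0.05]
corrected = [p for p in p_values if p <= threshold]
print(threshold, len(uncorrected), len(corrected))
```

With these made-up values, four areas pass the uncorrected 0.05 cutoff but only two survive the correction, which is the sense in which the procedure reduces the critical p-value.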

Temporal Trends

In Figure 6, most diseases show a downward trend over the past decade (2005-2014). Osteoarthritis and congestive heart failure (non-hypertensive), however, show a rising trend. We then took a closer look at the temporal trends for asthma across the different claim types (Figure 7). While inpatient stays, ambulatory surgery, and outpatient visits shared a similar downward trend, the emergency department visit rates rose over the same years. For inpatient discharge rates, we then broke asthma down into its six subcategories (Figure 8) and found that one subcategory (asthma other than chronic obstructive asthma with acute exacerbation) also had a rising trend. All these findings require further research into the potential driving factors.
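One simple way to label a yearly rate series as rising or falling is the slope of an ordinary least-squares line. The source reads the trends from plots and does not describe fitting a line, so this is only an illustrative sketch with hypothetical rates.

```python
def linear_trend(years, rates):
    """Least-squares slope of rate against year (rate units per year)."""
    n = len(years)
    mean_year = sum(years) / n
    mean_rate = sum(rates) / n
    num = sum((y - mean_year) * (r - mean_rate) for y, r in zip(years, rates))
    den = sum((y - mean_year) ** 2 for y in years)
    return num / den


# Hypothetical yearly discharge rates per 10,000 residents.
years = [2005, 2006, 2007, 2008]
falling = [20.0, 19.0, 17.5, 16.0]
rising = [10.0, 12.0, 14.0, 16.0]

for name, series in (("falling", falling), ("rising", rising)):
    slope = linear_trend(years, series)
    print(name, "up" if slope > 0 else "down")
```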
Figure 7.

Temporal Trends of Asthma Discharge Rate for Four Discharge Claim Types, New York State, 2003-2014

Spatio-temporal Animation

We made two exemplary animations of the spatial clusters for the years 2005-2014, one for New York State (https://vimeo.com/183126416) and one for the New York City and Long Island area (https://vimeo.com/183126718). At the statewide scale, the distribution of spatial clusters changed more dramatically than it did over the New York City and Long Island areas. For the areas near JFK airport, the spatial clusters with high rates emerged in 2006, disappeared during 2008-2012, and reappeared after 2012. For the Low-High outliers and High-Low outliers appearing near the city of Albany in northeastern New York State, we observed a similar fluctuation over the years. To evaluate the potential factors behind such fluctuation patterns, in future work we will integrate SPARCS data with a comprehensive set of spatial impact factors, ranging from levels of various environmental pollutants to the demographic and socioeconomic status of persons at risk[3, 20].

Discussion and Future Work

This study provides our preliminary results of spatio-temporal analyses of the New York State SPARCS data. It establishes the analysis framework and acts as a baseline for more refined studies of spatio-temporal trends in diseases and human health in our future work. We examined the spatial variations of the top-ranking diseases by claim type and performed case studies on the spatial clustering patterns of asthma, which yielded results consistent with previous work. While our long-term goal is to provide integrative spatial analytics at a fine-grained spatial resolution[3], this paper focused on spatio-temporal analysis of disease discharge rates at the ZIP code level. Our future work will break patients down by demographics, such as gender, age or age range, and race and ethnicity groups. For discharge-rate-based disease patterns, we could elaborate on age-based or race-based analysis. More fine-grained temporal information, such as patient admission or discharge dates, could also be used to discover refined temporal patterns in our future work.
There has long been a demand for spatio-temporal data analysis at a fine geographic resolution for use in etiologic hypothesis generation, methodological evaluation, and teaching. In our ongoing work, we are examining the comprehensive list of diagnoses and treatments, services, and charges for each hospital inpatient stay and outpatient visit. We will also take advantage of the street-level location information and specific date information (admission date, procedure date, etc.) for each patient, healthcare provider, and facility site. After geocoding addresses and approximating them as census block group identifiers, our framework for integrative spatial data analytics will provide spatial queries based on coordinates or boundaries, and will enable integrating and correlating health records with spatial exposure data at multiple resolutions. We will provide multi-dimensional analysis by grouping patients according to their demographic or socio-economic attributes. We will also study potential spatial clusters of disease distributions and correlations between disease risk and spatial impact factors. For example, we are interested in exploring potential hotspots of Hepatitis C, and potential environmental and weather factors that may correlate with asthma.

Conclusion

Vast amounts of spatio-temporal big data are increasingly being generated and made available in the public health domain. Spatio-temporal analyses could provide new insights and create new forms of value to support community-level or neighborhood-level public health studies. In this paper, we presented our preliminary results of spatio-temporal analysis of New York State SPARCS data. We focused on representative case studies of the top-ranking diseases by discharge count. Our results give a more refined picture of the spatio-temporal trends of diseases and demonstrate consistent spatial clustering patterns. The analysis framework we developed is generic and provides a foundation for advanced SPARCS data analysis at the community and neighborhood level in the future.