Marcela Bergamini1, Pedro Henrique Iora2, Thiago Augusto Hernandes Rocha3, Yolande Pokam Tchuisseu3, Amanda de Carvalho Dutra1, João Felipe Herman Costa Scheidt2, Oscar Kenji Nihei4, Maria Dalva de Barros Carvalho1, Catherine Ann Staton3,5, João Ricardo Nickenig Vissoci3,5, Luciano de Andrade1,2.
Abstract
Cardiovascular diseases are the leading cause of death globally. Machine learning studies predicting mortality rates for ischemic heart disease (IHD) at the municipal level are very limited. The goal of this paper was to create and validate a Heart Health Care Index (HHCI) to predict the risk of IHD based on location and risk factors. Secondary data, a geographical information system (GIS), and machine learning were used to validate the HHCI and stratify IHD risk by municipality in the state of Paraná. A positive spatial autocorrelation was found (Moran's I = 0.6472, p-value = 0.001), showing clusters of high IHD mortality. Among eight machine learning algorithms, the Support Vector Machine was the best for prediction after validation, with an RMSE of 0.789 and an error proportion close to one (0.867). In the north and northwest regions of the state, the HHCI was low and high-mortality cluster patterns were observed. By creating an HHCI through machine learning (ML), we can predict the IHD mortality rate at the municipal level and identify predictive characteristics that affect health conditions in these localities, which can guide health management decisions to improve IHD care within the emergency care network in the state of Paraná.
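As an illustration of the spatial autocorrelation statistic reported in the abstract, the following minimal Python sketch computes global Moran's I with a permutation-based pseudo p-value. The toy rates, the adjacency matrix, the function names, and the choice of 999 permutations are assumptions made for illustration only; the abstract does not describe the spatial weights or the software the authors used.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for n values and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                      # deviations from the mean
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, weights)
    # Shuffle the values across locations to simulate spatial randomness.
    sims = np.array([morans_i(rng.permutation(x), weights)
                     for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p

# Hypothetical data: six areas on a line, each adjacent to its neighbors.
rates = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 3.1])
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0

I, p = permutation_p_value(rates, W)
print(f"Moran's I = {I:.3f}, pseudo p = {p:.3f}")
```

A positive value of I indicates that neighboring areas tend to have similar rates, which is the pattern the abstract describes as clusters of high IHD mortality.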
Year: 2020 PMID: 33301451 PMCID: PMC7728276 DOI: 10.1371/journal.pone.0243558
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Representative flowchart of the main machine learning model development stages.
1) Variables pre-processing (correlation and variation tests); 2) Predictive variables cross-validation; 3) Machine learning models discrimination and validation (internal and external validation); 4) Heart Health Care Index (HHCI) calculation and mapping.
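Stage 1 of the flowchart mentions correlation and variation tests. One common way to carry out such a filter is sketched below in Python: predictors with near-zero variance are dropped, then one member of each highly correlated pair is removed. The function name, the column names, and both thresholds are hypothetical, and the caption does not say that the authors used this exact procedure.

```python
import numpy as np

def filter_predictors(X, names, var_tol=1e-8, corr_cutoff=0.9):
    """Return names of predictors kept after variance and correlation filters.

    var_tol and corr_cutoff are illustrative defaults, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: drop columns whose variance is (near) zero.
    keep = [j for j in range(X.shape[1]) if X[:, j].var() > var_tol]
    if not keep:
        return []
    # Step 2: greedily keep a column only if it is not highly correlated
    # with any column that has already been kept.
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, keep], rowvar=False)))
    selected = []
    for a in range(len(keep)):
        if all(corr[a, b] < corr_cutoff for b in selected):
            selected.append(a)
    return [names[keep[a]] for a in selected]

# Hypothetical predictors for five municipalities.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x1                               # perfectly correlated with x1
x3 = np.full(5, 7.0)                        # constant, zero variance
x4 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])    # correlation 0.8 with x1
X = np.column_stack([x1, x2, x3, x4])

kept = filter_predictors(X, ["x1", "x2", "x3", "x4"])
print(kept)
```

The greedy rule keeps the first column of each correlated group, so the result depends on column order; other tie-breaking rules are equally defensible.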
Fig 2. Spatial exploratory analysis of municipalities’ IHD mortality rates.
A: Spatial distribution of IHD mortality rates observed in the state of Paraná from 2009–2015, in quantiles by municipality. B: Local Indicators of Spatial Association (LISA) of IHD mortality rates by municipality.
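To make the LISA panel more concrete, here is a minimal Python sketch of local Moran's I with the usual quadrant labels (High-High, Low-Low, High-Low, Low-High). The toy data, the row-standardized adjacency weights, and the function name are assumptions for illustration; significance testing, which a published LISA map normally includes, is omitted here.

```python
import numpy as np

def local_morans(values, weights):
    """Local Moran's I and quadrant label for each area.

    Assumes every area has at least one neighbor, so rows can be standardized.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum(axis=1, keepdims=True)    # row-standardize the weights
    z = x - x.mean()                        # deviation of each area
    m2 = (z @ z) / x.size                   # second moment of the deviations
    lag = w @ z                             # mean deviation of the neighbors
    local_i = z * lag / m2
    labels = np.where(
        z >= 0,
        np.where(lag >= 0, "High-High", "High-Low"),
        np.where(lag >= 0, "Low-High", "Low-Low"),
    )
    return local_i, labels

# Hypothetical data: six areas on a line, each adjacent to its neighbors.
rates = np.array([1.0, 1.2, 1.1, 3.0, 3.2, 3.1])
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0

local_i, labels = local_morans(rates, W)
for area, (value, label) in enumerate(zip(local_i, labels)):
    print(area, round(float(value), 3), label)
```

High-High areas are those with above-average rates surrounded by above-average neighbors, which corresponds to the high-mortality clusters discussed in the abstract.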
Calibration of tested models ordered according to the performance indicator (RMSEs and proportionality between the RMSEs).
| MODELS | RMSE (2009–2014) | RMSE (2014) | RMSE (2015) | PROPORTIONALITY (2014) | PROPORTIONALITY (2015) |
|---|---|---|---|---|---|
| XGB | 0.299 | 0.835 | 0.877 | 0.358 | 0.341 |
| RF | 0.308 | 0.832 | 0.894 | 0.37 | 0.345 |
| BRNN | 0.491 | 0.945 | 1.059 | 0.52 | 0.464 |
| Tree Bag | 0.551 | 0.871 | 0.871 | 0.633 | 0.633 |
| KNN | 0.842 | 0.915 | 0.916 | 0.92 | 0.919 |
BRNN–Bayesian Regularized Neural Network; KNN–K-Nearest Neighbors Algorithm; PCR–Principal Component Regression Algorithm; PLS–Partial Least Squares with Wide Kernel Algorithm; RF–Random Forest; SVM–Support Vector Machine; TreeBag; XGB–xgbDART; RMSE–Root Mean Square Error of Prediction.
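The proportionality values in the table appear to be the ratio of the 2009–2014 RMSE to the RMSE of the corresponding later year. That reading is inferred from the tabulated numbers and is not stated in the caption, so the short Python check below should be taken as a sketch under that assumption. The function and variable names are illustrative.

```python
# RMSEs copied from the table: (2009-2014, 2014, 2015)
rmse = {
    "XGB": (0.299, 0.835, 0.877),
    "RF": (0.308, 0.832, 0.894),
    "BRNN": (0.491, 0.945, 1.059),
    "Tree Bag": (0.551, 0.871, 0.871),
    "KNN": (0.842, 0.915, 0.916),
}

# Proportionality values copied from the table: (2014, 2015)
reported = {
    "XGB": (0.358, 0.341),
    "RF": (0.37, 0.345),
    "BRNN": (0.52, 0.464),
    "Tree Bag": (0.633, 0.633),
    "KNN": (0.92, 0.919),
}

def proportionality(fit_rmse, later_rmse):
    """Assumed definition: RMSE on 2009-2014 divided by RMSE on a later year."""
    return fit_rmse / later_rmse

computed = {}
for model, (fit, rmse_2014, rmse_2015) in rmse.items():
    computed[model] = (proportionality(fit, rmse_2014),
                       proportionality(fit, rmse_2015))
    print(model,
          round(computed[model][0], 3), reported[model][0],
          round(computed[model][1], 3), reported[model][1])
```

Under this reading, a ratio near one means the error on later data is similar to the error on 2009–2014, while a low ratio, as for XGB and RF, points to a much larger error on later data than on the fitting period.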
Fig 3. Calibration graphs of the tested predictive models (adjustments using RMSE).
A: Example of an underfitting calibration model with the worst fit (K-Nearest Neighbors). B: Example of an overfitting calibration model (Random Forest). C: Example of the best-fit model (Support Vector Machine). Blue represents the observed mortality rate and red represents the predicted one.
Fig 4. RMSE distribution representation.
A: PCR model (2015). B: SVM model (2015).
Fig 5. Distribution of the generated municipalities index.
A value of 1 indicates lower risk and 0 indicates higher risk.