Massimo Stafoggia¹, Giorgio Cattani², Carla Ancona¹, Antonio Gasparrini³, Andrea Ranzi⁴
1. Department of Epidemiology, Lazio Regional Health Service/ASL Roma 1, Rome, Italy.
2. Institute for Environmental Protection and Research, Rome, Italy.
3. Department of Public Health, Environments and Society, London School of Hygiene & Tropical Medicine, London, UK.
4. Environmental Health Reference Centre, Regional Agency for Environmental Prevention of Emilia-Romagna, Modena, Italy.
Yu et al.[1] applied an innovative methodology, deep ensemble machine learning, to estimate daily concentrations of ambient particulate matter with an aerodynamic diameter of ≤2.5 μm (PM2.5) across Italy in 2015–2019 at a fine spatial resolution. To do so, they trained multiple prediction models on PM2.5 concentrations measured at 133 monitoring stations.
We recently published similar studies in Italy using alternative methods, including mixed-effects models,[2] random forests,[3,4] ensemble techniques,[5] and Bayesian approaches.[6] We believe that Yu et al.[1] did not adequately consider several critical issues, which negatively affects the validity and interpretation of their results.
First, the set of monitoring stations measuring PM2.5 in Italy during 2015–2019 is much larger than the one used by Yu et al. (Figure S2 in their appendix[1]). Figure 1 shows the 289 monitors operating during a slightly different period (2016–2019), which we used in our recent publication.[4] Comparing the two maps shows that Yu et al. did not include stations in Southern Italy or on the two main islands, Sicily and Sardinia. These areas have unique geoclimatic conditions and source profiles of ambient PM2.5 concentrations, with a mixture of anthropogenic emissions from large industrial plants and heavily urbanized areas, coupled with natural sources such as sea salt, desert dust from North Africa, forest fires, and volcanic emissions from Mount Etna.[7] Such complexity is extremely difficult to capture with any empirical predictive model.[2]
Figure 1.
Map of the 289 monitoring sites available in Italy during 2016–2019.
Second, the cross-validated coefficients of determination (R²) and root mean square errors (RMSE) reported by Yu et al. cannot be compared with those previously published,[2-6] because they are based on a small number of monitors located in areas where spatiotemporal prediction models have previously been shown to perform better.[4]
Third, we were surprised not to find several key predictors of the spatial (e.g., road network, impervious surfaces, industrial sites) or spatiotemporal (e.g., desert dust episodes, outputs from atmospheric dispersion models, planetary boundary layer, vegetation indices) variability of PM2.5. Such predictors were used in previous applications[2-6] and made it possible to capture, at least partially, the geoclimatic complexity of the southern regions.
In conclusion, we consider the methodological effort of Yu et al. a valid contribution to the literature. However, because their model represents only 6 of the 20 Italian regions (those contributing sufficient data), we question the use of their estimates in subsequent epidemiological studies based on Italian data.
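As a toy illustration of why cross-validated R² and RMSE depend on the set of monitors they are computed over, the sketch below uses two small, entirely hypothetical sets of held-out observations and predictions (in µg/m³; not data from any of the cited studies). Both sets have identical prediction errors, so their RMSE is the same, but R² differs sharply because it is scaled by the variance of the observed values in each evaluation set. This is a minimal sketch of the general mechanism only, not the validation procedure used by Yu et al. or in our own work.

```python
import math
from statistics import fmean


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


# Hypothetical held-out values: every prediction is off by exactly 2 µg/m³.
# Set A spans a wide range of observed concentrations; set B a narrow one.
obs_a, pred_a = [10, 20, 30, 40], [12, 18, 32, 38]
obs_b, pred_b = [20, 22, 24, 26], [22, 20, 26, 24]

for label, obs, pred in [("A", obs_a, pred_a), ("B", obs_b, pred_b)]:
    print(f"{label}: RMSE={rmse(obs, pred):.2f}, R2={r2(obs, pred):.3f}")
# A: RMSE=2.00, R2=0.968
# B: RMSE=2.00, R2=0.200
```

Because R² and RMSE summarize performance relative to the particular monitors held out, a fair comparison between models requires evaluating them on the same set of stations.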