Sanghyu Nam, Mi Young Shin, Jung Yeob Han, Su Young Moon, Jae Yong Kim, Hungwon Tchah, Hun Lee.
Abstract
This study investigated how changes in weather factors affect the prevalence of conjunctivitis in South Korea using public big data. A total of 1,428 public big data entries from January 2013 to December 2019 were collected. Disease data and basic climate and air-pollutant concentration records were obtained from nationally provided big data. Meteorological factors affecting eye diseases were identified using multiple linear regression and machine learning methods: extreme gradient boosting (XGBoost), decision tree, and random forest. Ranked by root mean square error (RMSE), the best-performing prediction model was XGBoost (1.180), followed by multiple linear regression (1.195), random forest (1.206), and decision tree (1.544). In the XGBoost model, province was the most important variable (0.352), followed by month (0.289) and carbon monoxide exposure (0.133). Other air pollutants, including sulfur dioxide, PM10, nitrogen dioxide, and ozone, showed low associations with conjunctivitis. We identified factors associated with conjunctivitis using both traditional multiple regression analysis and machine learning techniques. In addition to atmospheric and air-quality factors, regional factors were important for the prevalence of conjunctivitis.
Year: 2022 PMID: 35710775 PMCID: PMC9203752 DOI: 10.1038/s41598-022-13344-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Prevalence of conjunctivitis. (a) Prevalence of conjunctivitis by year (number of patients per 1,000 people); the number of patients increased from 2013 to 2019. (b) Prevalence by province.
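Panel (a) expresses prevalence as patients per 1,000 people. A minimal sketch of that normalization, assuming a patient count and a population figure are available for the same province and period; the function name `prevalence_per_1000` and the numbers in the example are hypothetical, not study data:

```python
def prevalence_per_1000(patients: int, population: int) -> float:
    """Return the number of patients per 1,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Scale the patients-to-population ratio to a per-1,000 rate.
    return patients * 1000 / population


# Hypothetical example: 4,500 patients in a province of 1,500,000 people.
print(prevalence_per_1000(4_500, 1_500_000))  # → 3.0
```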
Figure 2. Prevalence of conjunctivitis and weather parameters by month in each region. (a) Prevalence of conjunctivitis, (b) mean temperature, (c) mean daily temperature difference, and (d) mean wind speed.
Figure 3. Air quality parameters by month in each region. (a) Concentration of sulfur dioxide, (b) concentration of nitrogen dioxide, (c) concentration of carbon monoxide, (d) concentration of PM10, and (e) concentration of ozone.
Correlation coefficients between conjunctivitis prevalence and weather or air-quality parameters. Mean temperature, humidity, precipitation, and ozone were positively correlated with prevalence, whereas daily temperature difference, mean wind speed, sulfur dioxide, nitrogen dioxide, carbon monoxide, and PM10 were negatively correlated.
| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Regional Prevalence | - | 0.04 | - | - | - | - | |||||
| 2. Temperature | - | - | - | - | - | - | |||||
| 3. Daily temperature difference | - | - | - | -0.01 | |||||||
| 4. Humidity | - | - | - | - | - | - | |||||
| 5. Precipitation | - | 0.01 | - | - | - | - | |||||
| 6. Wind speed | 0.04 | - | - | 0.01 | -0.04 | - | |||||
| 7. SO2 | - | - | -0.01 | - | - | - | |||||
| 8. NO2 | - | - | - | -0.04 | - | ||||||
| 9. O3 | - | - | - | 0.00 | |||||||
| 10. CO | - | - | - | - | - | - | |||||
| 11. PM10 | - | - | - | - | 0.00 |
SO2, sulfur dioxide; NO2, nitrogen dioxide; O3, ozone; CO, carbon monoxide; PM10, particulate matter with an aerodynamic diameter < 10 μm.
Significant values are in bold.
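The caption does not say which correlation coefficient was used. As an illustration only, the sketch below assumes Pearson's r and computes it from scratch, showing how a positive or negative sign, as reported for the weather and air-quality variables, arises from paired observations. The function name `pearson_r` and the monthly series are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (covariance numerator) and squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly values for one province (not study data).
prevalence = [2.1, 2.4, 2.9, 3.3]
temperature = [1.0, 8.0, 17.0, 25.0]
wind_speed = [3.2, 2.9, 2.4, 2.0]
print(pearson_r(prevalence, temperature) > 0)  # → True
print(pearson_r(prevalence, wind_speed) < 0)   # → True
```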
Comparison of modeling techniques by root mean square error (RMSE).
| Model | RMSE |
|---|---|
| Multiple linear regression | 1.195 |
| XGBoost | 1.180 |
| Random forest | 1.206 |
| Decision tree | 1.544 |
RMSE, root mean square error.
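RMSE is the square root of the mean squared difference between actual and predicted prevalence, so a lower value indicates a better fit. The sketch below defines it and ranks the four models by the RMSE values in the table; the `rmse` helper and its toy inputs are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((actual - predicted)^2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# RMSE values as reported in the table; lower is better.
reported = {
    "Multiple linear regression": 1.195,
    "XGBoost": 1.180,
    "Random forest": 1.206,
    "Decision tree": 1.544,
}
ranking = sorted(reported, key=reported.get)
print(ranking)
# → ['XGBoost', 'Multiple linear regression', 'Random forest', 'Decision tree']

# Hypothetical toy data: every prediction is off by 2, so RMSE is 2.
print(rmse([1.0, 1.0, 1.0, 1.0], [3.0, -1.0, 3.0, -1.0]))  # → 2.0
```

Because errors are squared before averaging, RMSE penalizes large misses more than small ones, which is consistent with the decision tree's visibly worse fit in the predicted-versus-actual comparison.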
Figure 4. Predicted versus actual prevalence for each model. The XGBoost model gives the most accurate predictions and the decision tree model the least accurate.
Variable importance (gain) in the XGBoost prediction model.
| Variable | Gain |
|---|---|
| Province (local) | 0.352 |
| Month | 0.289 |
| CO | 0.133 |
| Temperature | 0.060 |
| Humidity | 0.030 |
| Wind speed | 0.030 |
| Precipitation | 0.022 |
| Temperature fluctuation | 0.021 |
| O3 | 0.019 |
| PM10 | 0.017 |
| SO2 | 0.016 |
| NO2 | 0.013 |
CO, carbon monoxide; O3, ozone; PM10, particulate matter with an aerodynamic diameter < 10 μm; SO2, sulfur dioxide; NO2, nitrogen dioxide.
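The reported gains sum to 1.002, which is consistent with each value being a feature's share of the total gain, rounded to three decimals; the table does not state the normalization, so this is an inference. A sketch of that rescaling, assuming a simple divide-by-total rule: it checks the reported values, recovers the top three variables, and normalizes a hypothetical set of raw gains. The function name `normalize_importance` is hypothetical.

```python
from typing import Dict


def normalize_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw importance scores so they sum to 1 (share of total)."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    return {name: score / total for name, score in raw.items()}


# Gain values as reported in the table.
reported_gain = {
    "Province": 0.352, "Month": 0.289, "CO": 0.133, "Temperature": 0.060,
    "Humidity": 0.030, "Wind speed": 0.030, "Precipitation": 0.022,
    "Temperature fluctuation": 0.021, "O3": 0.019, "PM10": 0.017,
    "SO2": 0.016, "NO2": 0.013,
}
print(round(sum(reported_gain.values()), 3))  # → 1.002 (rounding in the table)
print(sorted(reported_gain, key=reported_gain.get, reverse=True)[:3])
# → ['Province', 'Month', 'CO']

# Hypothetical raw gains for three features, rescaled to shares.
print(normalize_importance({"a": 6.0, "b": 3.0, "c": 1.0}))
# → {'a': 0.6, 'b': 0.3, 'c': 0.1}
```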
Basic variables from government-provided big data.
| Variable | Description | Data Source |
|---|---|---|
| Province* | 17 provinces (Si, Do) | KOSIS |
| Population | Population by province | KOSIS |
| Temperature (℃) | Mean temperature by province, rounded to 2 decimal places | KMA |
| Highest temperature (℃) | Mean highest temperature by province, rounded to 2 decimal places | KMA |
| Lowest temperature (℃) | Mean lowest temperature by province, rounded to 2 decimal places | KMA |
| Temperature difference (℃) | Mean daily temperature difference by province, rounded to 2 decimal places | KMA |
| Humidity (%) | Mean relative humidity by province, rounded to 2 decimal places | KMA |
| Precipitation (mm) | Monthly total precipitation by province, rounded to 2 decimal places | KMA |
| Wind speed (m/s) | Mean wind speed by province, rounded to 2 decimal places | KMA |
| SO2 (ppm) | Concentration of SO2 by province, rounded to 4 decimal places | Air Korea |
| NO2 (ppm) | Concentration of NO2 by province, rounded to 4 decimal places | Air Korea |
| O3 (ppm) | Concentration of O3 by province, rounded to 4 decimal places | Air Korea |
| CO (ppm) | Concentration of CO by province, rounded to 4 decimal places | Air Korea |
| PM10 (μg/m3) | Concentration of PM10 by province | Air Korea |
SO2, sulfur dioxide; NO2, nitrogen dioxide; O3, ozone; CO, carbon monoxide; PM10, particulate matter with an aerodynamic diameter < 10 μm; KOSIS, Korean Statistical Information Service; KMA, Korea Meteorological Administration.
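The abstract's 1,428 entries equal 17 provinces × 84 months (January 2013 to December 2019), which suggests one record per province and month; this is an inference from the numbers, not a stated design. The sketch below checks that arithmetic and applies the rounding rules listed in the table. The helper names and the record keys (`temperature`, `so2`, and so on) are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple

N_PROVINCES = 17  # "17 provinces (Si, Do)" in the variable table


def month_range(start: Tuple[int, int], end: Tuple[int, int]) -> List[Tuple[int, int]]:
    """All (year, month) pairs from start to end, inclusive."""
    months = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


months = month_range((2013, 1), (2019, 12))
print(len(months))                # → 84
print(N_PROVINCES * len(months))  # → 1428

# Decimal places per the table; PM10 has no stated rounding, so it is left as is.
# The key names are hypothetical.
DECIMALS: Dict[str, int] = {
    "temperature": 2, "temperature_max": 2, "temperature_min": 2,
    "temperature_diff": 2, "humidity": 2, "precipitation": 2,
    "wind_speed": 2, "so2": 4, "no2": 4, "o3": 4, "co": 4,
}


def round_record(record: Dict[str, float]) -> Dict[str, float]:
    """Round each known variable to its documented number of decimal places."""
    return {
        key: round(value, DECIMALS[key]) if key in DECIMALS else value
        for key, value in record.items()
    }


print(round_record({"temperature": 12.3456, "so2": 0.004567, "pm10": 41.7}))
# → {'temperature': 12.35, 'so2': 0.0046, 'pm10': 41.7}
```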