Abstract
Activity patterns are an important factor in identifying hotspots of personal exposure to air pollutants such as PM2.5. However, recording them can be burdensome for study participants, who are often asked to carry a diary or a tracking recorder and to write down or validate their activity whenever it changes. The resulting records can also be inaccurate, because participants may log their activities incorrectly. This paper proposes an approach that addresses these problems and makes the whole data-collection process easier and more reliable. The approach transforms the training data using statistical properties of the children's personal exposure to PM2.5, temperature, and relative humidity, and supplies those properties to a decision tree algorithm that classifies activity patterns. In our final machine-learning models, the accuracy of activity-pattern classification exceeded 90% in both the training and test data. We believe that the methodology can be used effectively in data-collection tasks and can reduce the burden on study participants.
Keywords: PM2.5; activity-pattern analysis; environmental data; machine learning
Year: 2020 PMID: 32917004 PMCID: PMC7559092 DOI: 10.3390/ijerph17186573
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Snapshot of the training data before (left) and after (right) the data transformation. The median (new_pm_med) and the maximum (new_pm_max) values of PM2.5 were used as features.
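The transformation in Figure 1 can be sketched roughly as follows. This is a minimal Python illustration, not the authors' R code. It assumes that the median and maximum are computed per activity label, which the record does not state explicitly. The field names `activity`, `pm25`, `temp`, and `humidity` and all readings are hypothetical; only `new_pm_med` and `new_pm_max` come from the caption.

```python
from collections import defaultdict
from statistics import median


def add_pm_statistics(rows, group_key="activity"):
    """Return copies of rows with per-group median and max of PM2.5 attached.

    Assumption: statistics are grouped by the activity label.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row["pm25"])

    # One (median, max) pair per group
    stats = {key: (median(vals), max(vals)) for key, vals in groups.items()}

    transformed = []
    for row in rows:
        med, mx = stats[row[group_key]]
        transformed.append({**row, "new_pm_med": med, "new_pm_max": mx})
    return transformed


# Made-up readings for illustration only
raw_rows = [
    {"activity": "Cooking", "pm25": 120.0, "temp": 24.0, "humidity": 40.0},
    {"activity": "Cooking", "pm25": 300.0, "temp": 25.0, "humidity": 42.0},
    {"activity": "Cooking", "pm25": 80.0, "temp": 24.5, "humidity": 41.0},
    {"activity": "Bus", "pm25": 30.0, "temp": 20.0, "humidity": 55.0},
    {"activity": "Bus", "pm25": 50.0, "temp": 21.0, "humidity": 54.0},
]
transformed_rows = add_pm_statistics(raw_rows)
```

Under this assumption, every row of a given activity carries the same two summary values, while the raw PM2.5, temperature, and humidity readings are left unchanged.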
Figure 2. Flow chart for the estimation of activity patterns.
Experimental setup.
| No. | Contents | Details |
|---|---|---|
| 1 | No. of activity patterns | 9 |
| 2 | No. of features | 4 |
| 3 | No. of observations | 142,654 |
| 4 | Features to use | Raw PM2.5, median and max of PM2.5, temperature, humidity |
| 5 | Type of classifier | decision tree |
| 6 | Training to test data ratio | 8:2 (with replacement) |
| 7 | R packages | stringr, dplyr, party |
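The confusion tables report 114,305 training and 28,349 test observations, which sum to the 142,654 observations listed above, a training share of about 80.1%. One reading of "8:2 (with replacement)" that is consistent with these totals is that each row receives a train or test label drawn independently with probabilities 0.8 and 0.2, as in the common R idiom `sample(2, n, replace = TRUE, prob = c(0.8, 0.2))`. That reading is an assumption, and the Python sketch below only illustrates it; the function name and seed are hypothetical.

```python
import random


def split_by_sampled_label(n_rows, train_prob=0.8, seed=0):
    """Assign each row index to train or test by an independent random draw.

    The two sets are disjoint and cover every row, but the train share is
    only approximately train_prob, not exactly.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for i in range(n_rows):
        if rng.random() < train_prob:
            train_idx.append(i)
        else:
            test_idx.append(i)
    return train_idx, test_idx


n = 10_000
train_idx, test_idx = split_by_sampled_label(n)
train_share = len(train_idx) / n
```

Because the labels are drawn independently, the realized split drifts slightly from 80:20, which would explain a training share of 80.1% rather than exactly 80%.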
Figure 3. Boxplot of PM2.5 on a logarithmic scale for each activity pattern over the entire dataset. Visual inspection shows that the PM2.5 concentrations differ markedly between activity patterns, especially in their maximum values. We therefore chose the median and the maximum values as features for classification.
Figure 4. Boxplots showing the ranges of the temperature and humidity values. The values typically range from 0 to less than 100 and are considerably lower than those for PM2.5.
Confusion table for training data simulation using raw dataset.
| Predicted (row) / Actual (column) | Bus | Car | Commercial Building | Cooking | Education Building | Indoor-House | Outdoor | Restaurant | Walking | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Bus | 89 | 1 | 19 | 0 | 7 | 27 | 0 | 0 | 4 | 147 |
| Car | 1 | 540 | 114 | 1 | 62 | 122 | 0 | 6 | 4 | 850 |
| Commercial | 54 | 115 | 3194 | 8 | 243 | 986 | 0 | 3 | 149 | 4752 |
| Cooking | 0 | 1 | 5 | 184 | 19 | 74 | 1 | 0 | 3 | 287 |
| Education | 69 | 397 | 291 | 33 | 9954 | 3216 | 25 | 4 | 249 | 14,238 |
| Indoor-house | 310 | 1036 | 2301 | 253 | 6687 | 80,757 | 216 | 106 | 1119 | 92,785 |
| Outdoor | 0 | 1 | 7 | 0 | 7 | 30 | 39 | 0 | 3 | 87 |
| Restaurant | 0 | 1 | 20 | 0 | 0 | 26 | 0 | 219 | 4 | 270 |
| Walking | 13 | 6 | 34 | 0 | 56 | 251 | 0 | 0 | 529 | 889 |
| Total | 536 | 2098 | 5985 | 479 | 17,035 | 85,489 | 281 | 338 | 2064 | 114,305 |
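To read the table above class by class, the sketch below computes recall (diagonal cell divided by the column total) and precision (diagonal cell divided by the row total) from the counts. It is an illustrative Python calculation on the published numbers, not part of the authors' workflow; the variable names are hypothetical.

```python
LABELS = [
    "Bus", "Car", "Commercial Building", "Cooking", "Education Building",
    "Indoor-House", "Outdoor", "Restaurant", "Walking",
]

# Rows are predicted classes, columns are actual classes (Total row/column omitted)
CONFUSION = [
    [89, 1, 19, 0, 7, 27, 0, 0, 4],
    [1, 540, 114, 1, 62, 122, 0, 6, 4],
    [54, 115, 3194, 8, 243, 986, 0, 3, 149],
    [0, 1, 5, 184, 19, 74, 1, 0, 3],
    [69, 397, 291, 33, 9954, 3216, 25, 4, 249],
    [310, 1036, 2301, 253, 6687, 80757, 216, 106, 1119],
    [0, 1, 7, 0, 7, 30, 39, 0, 3],
    [0, 1, 20, 0, 0, 26, 0, 219, 4],
    [13, 6, 34, 0, 56, 251, 0, 0, 529],
]

row_totals = [sum(row) for row in CONFUSION]            # predicted counts
col_totals = [sum(col) for col in zip(*CONFUSION)]      # actual counts

# Recall: share of each actual class that was predicted correctly
recall = {lab: CONFUSION[i][i] / col_totals[i] for i, lab in enumerate(LABELS)}
# Precision: share of each predicted class that was correct
precision = {lab: CONFUSION[i][i] / row_totals[i] for i, lab in enumerate(LABELS)}

for lab in LABELS:
    print(f"{lab:20s} recall={recall[lab]:.3f} precision={precision[lab]:.3f}")
```

With the raw features, the dominant Indoor-House class is recovered far more reliably than small classes such as Bus or Outdoor, most of whose observations are predicted as Indoor-House.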
Confusion table for test data simulation using raw dataset.
| Predicted (row) / Actual (column) | Bus | Car | Commercial Building | Cooking | Education Building | Indoor-House | Outdoor | Restaurant | Walking | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Bus | 13 | 0 | 0 | 0 | 6 | 15 | 0 | 0 | 0 | 34 |
| Car | 1 | 109 | 43 | 0 | 17 | 53 | 0 | 1 | 0 | 224 |
| Commercial | 9 | 28 | 732 | 6 | 67 | 258 | 0 | 2 | 30 | 1132 |
| Cooking | 0 | 0 | 1 | 31 | 13 | 25 | 0 | 0 | 0 | 70 |
| Education building | 16 | 89 | 94 | 8 | 2300 | 908 | 3 | 1 | 48 | 3467 |
| Indoor-house | 94 | 286 | 606 | 75 | 1835 | 19,905 | 53 | 25 | 254 | 23,133 |
| Outdoor | 0 | 0 | 3 | 0 | 3 | 7 | 11 | 0 | 0 | 24 |
| Restaurant | 0 | 0 | 7 | 0 | 0 | 12 | 0 | 50 | 0 | 69 |
| Walking | 10 | 1 | 11 | 0 | 11 | 64 | 0 | 0 | 99 | 196 |
| Total | 143 | 513 | 1497 | 120 | 4252 | 21,247 | 67 | 79 | 431 | 28,349 |
Confusion table for training data simulation using statistical dataset.
| Predicted (row) / Actual (column) | Bus | Car | Commercial Building | Cooking | Education Building | Indoor-House | Outdoor | Restaurant | Walking | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Bus | 536 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 536 |
| Car | 0 | 2098 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2098 |
| Commercial | 0 | 0 | 5985 | 0 | 0 | 1 | 0 | 0 | 0 | 5986 |
| Cooking | 0 | 0 | 0 | 479 | 0 | 0 | 0 | 0 | 0 | 479 |
| Education building | 0 | 0 | 0 | 0 | 17,035 | 0 | 0 | 0 | 0 | 17,035 |
| Indoor-house | 0 | 0 | 0 | 0 | 0 | 85,488 | 0 | 0 | 0 | 85,488 |
| Outdoor | 0 | 0 | 0 | 0 | 0 | 0 | 281 | 0 | 0 | 281 |
| Restaurant | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 338 | 0 | 338 |
| Walking | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2064 | 2064 |
| Total | 536 | 2098 | 5985 | 479 | 17,035 | 85,489 | 281 | 338 | 2064 | 114,305 |
Confusion table for test data simulation using statistical dataset.
| Predicted (row) / Actual (column) | Bus | Car | Commercial Building | Cooking | Education Building | Indoor-House | Outdoor | Restaurant | Walking | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Bus | 142 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 142 |
| Car | 0 | 513 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 513 |
| Commercial | 0 | 0 | 1497 | 0 | 0 | 2 | 0 | 0 | 0 | 1499 |
| Cooking | 0 | 0 | 0 | 120 | 0 | 0 | 0 | 0 | 0 | 120 |
| Education building | 0 | 0 | 0 | 0 | 4252 | 0 | 0 | 0 | 0 | 4252 |
| Indoor-house | 1 | 0 | 0 | 0 | 0 | 21,245 | 0 | 0 | 0 | 21,246 |
| Outdoor | 0 | 0 | 0 | 0 | 0 | 0 | 67 | 0 | 0 | 67 |
| Restaurant | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 0 | 79 |
| Walking | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 431 | 431 |
| Total | 143 | 513 | 1497 | 120 | 4252 | 21,247 | 67 | 79 | 431 | 28,349 |
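The overall accuracy behind each of the four confusion tables can be recovered as the sum of the diagonal cells divided by the grand total. The Python sketch below does this arithmetic on the published counts; the dictionary layout and names are hypothetical, and the calculation is offered as a check rather than as the authors' own evaluation code.

```python
# (dataset, split) -> (diagonal cells in table order, grand total)
TABLES = {
    ("raw", "train"): ([89, 540, 3194, 184, 9954, 80757, 39, 219, 529], 114305),
    ("raw", "test"): ([13, 109, 732, 31, 2300, 19905, 11, 50, 99], 28349),
    ("statistical", "train"): (
        [536, 2098, 5985, 479, 17035, 85488, 281, 338, 2064], 114305),
    ("statistical", "test"): (
        [142, 513, 1497, 120, 4252, 21245, 67, 79, 431], 28349),
}

# Accuracy = correctly classified observations / all observations
accuracy = {key: sum(diag) / total for key, (diag, total) in TABLES.items()}

for (dataset, split), acc in accuracy.items():
    print(f"{dataset:12s} {split:5s} accuracy = {acc:.4f}")
```

On these counts, the raw dataset yields roughly 0.84 (training) and 0.82 (test), while the statistical dataset exceeds 0.999 on both splits, consistent with the abstract's statement of more than 90% accuracy.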