Jun Wu, Chengsheng Jiang, Douglas Houston, Dean Baker, Ralph Delfino.
Abstract
BACKGROUND: Air pollution epidemiological studies increasingly use global positioning system (GPS) devices to collect time-location data because they offer continuous tracking, high temporal resolution, and minimal reporting burden for participants. However, substantial uncertainties in processing and classifying raw GPS data make it difficult to characterize time-activity patterns reliably. We developed and evaluated models to classify people's major time-activity patterns from continuous GPS tracking data.
Year: 2011 PMID: 22082316 PMCID: PMC3256108 DOI: 10.1186/1476-069X-10-101
Source DB: PubMed Journal: Environ Health ISSN: 1476-069X Impact factor: 5.984
Figure 1. Overall flow chart.
Figure 2. Identify static clusters.
Figure 3. Identify periods of movement.
Figure 4. Differentiate outdoor walking from in-vehicle travel.
Figure 5. Differentiate indoor from outdoor static points.
(a) We found that 99.5% of outdoor static clusters lasted less than 2 hours.
(b) About 2% (by number) of outdoor static clusters, accounting for 21% of the total outdoor static time, lasted 1-2 hours.
(c) We found that 77% of all the outdoor static clusters that satisfied the upper-level rules (the rules above this criterion) lasted less than 5 minutes.
(d) A home location was detected if the cluster included 12:00 AM or lasted for more than 24 hours, assuming that the participants were at home under such conditions.
(e) The spatial accuracy was 10 m for the GPS device without Wide Area Augmentation System. Thus, GPS points within 10 m of home are more likely to be indoor points.
(f) We used 50 m as an approximate size of an apartment.
(g) The 2nd lowest speed of all the outdoor static clusters was 0, whereas for 5% (by number) of the indoor static clusters, accounting for approximately 15% of the total indoor time, it was more than 0.1 km/h.
(h) About 64% of the indoor clusters satisfied this rule, with only 2 of 13 outdoor clusters being misclassified.
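The home-detection rule in the Figure 5 notes (a cluster counts as home if it includes 12:00 AM or lasts more than 24 hours) can be sketched as follows. This is only an illustration of that one rule, not the authors' implementation: the function names `includes_midnight` and `is_home_cluster` are hypothetical, and the timestamps are assumed to be naive local-time `datetime` values.

```python
from datetime import datetime, time, timedelta


def includes_midnight(start: datetime, end: datetime) -> bool:
    """Return True if a 12:00 AM instant falls within [start, end]."""
    if start.time() == time(0, 0):
        return True
    # First midnight strictly after the cluster start.
    next_midnight = datetime.combine(start.date() + timedelta(days=1), time(0, 0))
    return next_midnight <= end


def is_home_cluster(start: datetime, end: datetime) -> bool:
    """Hypothetical helper: flag a static cluster as a home location.

    Assumption (from the figure note): the cluster includes 12:00 AM
    or lasts for more than 24 hours.
    """
    if end < start:
        raise ValueError("cluster end precedes cluster start")
    return includes_midnight(start, end) or (end - start) > timedelta(hours=24)


# An overnight cluster versus a daytime cluster.
overnight = is_home_cluster(datetime(2011, 1, 1, 22, 0), datetime(2011, 1, 2, 6, 0))
daytime = is_home_cluster(datetime(2011, 1, 1, 9, 0), datetime(2011, 1, 1, 17, 0))
print(overnight, daytime)
```

Any cluster longer than 24 hours necessarily contains a midnight, so the duration check is redundant here; it is kept only to mirror the wording of the rule.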
Time activity classification using the rule-based model
| Evaluation | Coded activity | Indoor (modeled) | Outdoor static (modeled) | Outdoor walking (modeled) | In-vehicle travel (modeled) | Sensitivity | Specificity | Precision |
|---|---|---|---|---|---|---|---|---|
| Model evaluation against the HCTLS data | Indoor (coded) | 284830 | 9840 | 1002 | 1362 | 84.1% | 81.9% | 95.9% |
| | Outdoor static (coded) | 51991 | 12901 | 2336 | 5994 | 51.7% | 84.2% | 17.6% |
| | Outdoor walking (coded) | 1102 | 1145 | 9096 | 827 | 68.4% | 99.2% | 74.7% |
| | In-vehicle travel (coded) | 778 | 1073 | 857 | 21127 | 72.1% | 99.3% | 88.6% |
| Model evaluation against the supplemental UCI data | Indoor (coded) | 103930 | 5430 | 134 | 144 | 94.8% | 82.6% | 98.2% |
| | Outdoor static (coded) | 1558 | 2298 | 169 | 214 | 54.2% | 94.4% | 26.0% |
| | Outdoor walking (coded) | 284 | 764 | 646 | 193 | 34.2% | 99.6% | 57.4% |
| | In-vehicle travel (coded) | 114 | 336 | 176 | 4512 | 87.8% | 99.5% | 89.1% |
The rule-based model was developed based on logic and the summary statistics of the manually classified HCTLS time-activity data.
Sensitivity was calculated as true positive estimation/(true positive estimation + false negative estimation).
Specificity was calculated as true negative estimation/(true negative estimation + false positive estimation).
Precision was calculated as true positive estimation/(true positive estimation + false positive estimation).
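The three definitions above can be applied per class to a confusion matrix in a one-vs-rest fashion. The sketch below does this for the supplemental UCI counts from the rule-based table, taking rows as coded activities and columns as modeled activities. The function name `per_class_metrics` and the variable names are illustrative, not from the source.

```python
CLASSES = ["Indoor", "Outdoor static", "Outdoor walking", "In-vehicle travel"]

# Rule-based model, supplemental UCI data.
# Rows: manually coded activity; columns: modeled activity.
UCI_RULE_BASED = [
    [103930, 5430, 134, 144],
    [1558, 2298, 169, 214],
    [284, 764, 646, 193],
    [114, 336, 176, 4512],
]


def per_class_metrics(matrix, classes):
    """Compute one-vs-rest sensitivity, specificity, and precision per class."""
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for k, name in enumerate(classes):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # coded k, modeled as another class
        fp = sum(row[k] for row in matrix) - tp   # modeled k, coded as another class
        tn = total - tp - fn - fp
        metrics[name] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        }
    return metrics


for name, m in per_class_metrics(UCI_RULE_BASED, CLASSES).items():
    print(name, {key: f"{100 * value:.1f}%" for key, value in m.items()})
```

With these counts the indoor class comes out at 94.8% sensitivity, 82.6% specificity, and 98.2% precision, matching the table. The same function reproduces the tabulated HCTLS values for the rule-based model only when that block of counts is transposed first, which suggests it may list modeled classes in rows; this is an inference from the numbers, not something the source states.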
Time activity classification using the random forest models
| Model | Evaluation | Coded activity | Indoor (modeled) | Outdoor static (modeled) | Outdoor walking (modeled) | In-vehicle travel (modeled) | Sensitivity | Specificity | Precision |
|---|---|---|---|---|---|---|---|---|---|
| HCTLS random forest model | 10-fold cross validation | Indoor (coded) | 21959 | 6177 | 1604 | 260 | 73.2% | 81.8% | 64.1% |
| | | Outdoor static (coded) | 9777 | 9749 | 3596 | 1830 | 39.1% | 81.7% | 42.3% |
| | | Outdoor walking (coded) | 1052 | 1820 | 9931 | 477 | 74.8% | 92.9% | 62.5% |
| | | In-vehicle travel (coded) | 1475 | 5307 | 750 | 21762 | 74.3% | 96.2% | 89.4% |
| | Evaluation against the full UCI dataset | Indoor (coded) | 82874 | 26278 | 170 | 314 | 75.6% | 91.6% | 98.9% |
| | | Outdoor static (coded) | 850 | 3137 | 35 | 214 | 74.1% | 76.2% | 10.1% |
| | | Outdoor walking (coded) | 101 | 993 | 555 | 244 | 29.3% | 99.8% | 67.7% |
| | | In-vehicle travel (coded) | 0 | 515 | 60 | 4562 | 88.8% | 99.3% | 85.5% |
| UCI random forest model | 10-fold cross validation | Indoor (coded) | 3978 | 860 | 120 | 42 | 79.6% | 99.0% | 97.3% |
| | | Outdoor static (coded) | 109 | 3285 | 471 | 371 | 77.5% | 90.2% | 73.6% |
| | | Outdoor walking (coded) | 0 | 170 | 1313 | 410 | 69.4% | 94.4% | 62.1% |
| | | In-vehicle travel (coded) | 0 | 146 | 210 | 4781 | 93.1% | 92.6% | 85.3% |
| | Evaluation against the full HCTLS dataset | Indoor (coded) | 153216 | 54796 | 128829 | 1894 | 45.2% | 92.8% | 96.9% |
| | | Outdoor static (coded) | 3519 | 6590 | 13015 | 1828 | 26.4% | 84.5% | 10.0% |
| | | Outdoor walking (coded) | 320 | 725 | 11840 | 395 | 89.2% | 63.0% | 7.5% |
| | | In-vehicle travel (coded) | 999 | 3464 | 3550 | 21281 | 72.6% | 98.9% | 83.8% |
The results reported here came from random forest models with 10 trees and a maximum depth of 3 for each tree.
Sensitivity was calculated as true positive estimation/(true positive estimation + false negative estimation).
Specificity was calculated as true negative estimation/(true negative estimation + false positive estimation).
Precision was calculated as true positive estimation/(true positive estimation + false positive estimation).
The HCTLS random forest model was developed based on all outdoor static, outdoor walking, and in-vehicle travel points and 30,000 randomly selected indoor points.
The reported validation results were the averages from repeated 10-fold cross validation.
The UCI random forest model was developed based on all outdoor static, outdoor walking, and in-vehicle travel points and 5,000 randomly selected indoor points from the supplemental UCI data.
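The notes above describe two data-handling steps: keeping every non-indoor point while randomly subsampling the much larger indoor class, and evaluating with 10-fold cross validation. A minimal standard-library sketch of both follows. The function names, the seed handling, and the toy label list are assumptions made for illustration; the source does not say what software or sampling procedure was used.

```python
import random


def downsample_class(labels, target, n_keep, seed=0):
    """Keep all points not labeled `target`, plus a random `n_keep` of the `target` points.

    Returns the sorted indices of the retained points.
    """
    rng = random.Random(seed)
    target_idx = [i for i, label in enumerate(labels) if label == target]
    other_idx = [i for i, label in enumerate(labels) if label != target]
    if n_keep > len(target_idx):
        raise ValueError("n_keep exceeds the number of target-class points")
    return sorted(other_idx + rng.sample(target_idx, n_keep))


def kfold_splits(indices, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Toy labels standing in for classified GPS points (illustrative only).
labels = ["indoor"] * 100 + ["outdoor walking"] * 10
kept = downsample_class(labels, "indoor", n_keep=20, seed=1)
splits = list(kfold_splits(kept, k=10, seed=1))
print(len(kept), len(splits))
```

Repeating the cross validation, as the notes describe, would amount to calling `kfold_splits` with several different seeds and averaging the resulting metrics.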