Katherine Ellis, Suneeta Godbole, Simon Marshall, Gert Lanckriet, John Staudenmayer, Jacqueline Kerr.
Abstract
BACKGROUND: Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data.
Keywords: physical activity; random forest
Year: 2014 PMID: 24795875 PMCID: PMC4001067 DOI: 10.3389/fpubh.2014.00036
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Prescribed trip parameters for data collection.
| Condition | Description | Number of trips |
|---|---|---|
| Total | | 511 |
| Urban canyon | Downtown areas with high rise buildings that interfere with GPS signal | 259 |
| Open space | Areas without high rise buildings where GPS signal connectivity is high | 252 |
| Continuous | A continuous connection between transportation modes, e.g., the car stopped and the passenger started walking immediately. Most naturally occurring trip transitions are continuous | 142 |
| Pause | A 2-min pause between transportation modes. Pauses enable trip ends to be detected more easily | 192 |
| Indoor/outdoor transition | Stationary periods indoors and outdoors were tested, as well as transitions between indoors and outdoors including transitions every 30 s from indoor to outdoor environments. The Qstarz device allows collection of satellite ratios, which can help to detect indoor vs. outdoor locations. | 22 |
| Full/partial signal buildings | Single story buildings with large windows, wooden roofs, and open courtyards | 12 |
| Blocked signal | Multistory buildings, underground garages | 12 |
Minutes of data collected for each transportation mode.
| Mode | Minutes of data | Percent of total (%) |
|---|---|---|
| Bike | 857.5 | 10 |
| Bus | 632.3 | 7 |
| Car | 2063.0 | 23 |
| Sit | 849.5 | 9 |
| Stand | 1631.3 | 18 |
| Walk | 2490.3 | 28 |
| Unclassified | 464.0 | 5 |
| Total | 8987.8 | 100 |
Figure 1. The classification pipeline. (1) We started from raw sensor data, which was split into 1-min windows. (2) Features were extracted from each window of data. (3) Then the features from each window were classified into transportation modes.
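The first two pipeline steps can be illustrated with a minimal sketch. The function names, the `(timestamp, value)` sample format, the toy data, and the four summary statistics are assumptions chosen for illustration; they are not the feature set or code used in the study. Step 3 would pass each feature dictionary to a trained classifier and is not shown here.

```python
from statistics import mean, pstdev

WINDOW_SECONDS = 60  # 1-min windows, as in the pipeline description


def split_into_windows(samples, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, value) samples into consecutive windows.

    Returns a dict mapping window index -> list of values in that window.
    """
    windows = {}
    for timestamp, value in samples:
        index = int(timestamp // window_seconds)
        windows.setdefault(index, []).append(value)
    return windows


def extract_features(values):
    """Compute a few per-window summary statistics (illustrative subset)."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def featurize(samples):
    """Steps 1 and 2: window the raw data, then extract features per window."""
    windows = split_into_windows(samples)
    return {index: extract_features(vals) for index, vals in sorted(windows.items())}


# Hypothetical sensor magnitudes sampled once every 20 s (two 1-min windows).
samples = [(0, 1.0), (20, 1.2), (40, 0.8), (60, 2.0), (80, 2.0), (100, 2.0)]
features = featurize(samples)
```

Each entry of `features` is one row of the feature matrix that a classifier would consume, keyed by the index of its 1-min window.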
Figure 2. Average roll, pitch, and yaw angles.
Performance results for various classifiers (without output filtering).
| Classifier | Bike | Bus | Car | Sit | Stand | Walk | Average | Overall accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| | 0.924 | 0.585 | 0.855 | 0.682 | 0.829 | 0.955 | 0.805 | 86.2 |
| Naïve Bayes | 0.872 | 0.220 | 0.824 | 0.503 | 0.484 | 0.920 | 0.637 | 74.2 |
| SVM | 0.962 | 0.609 | 0.884 | 0.724 | 0.833 | 0.954 | 0.828 | 87.7 |
| Decision tree | 0.922 | 0.537 | 0.846 | 0.674 | 0.792 | 0.936 | 0.785 | 83.6 |
| Random forest | 0.971 | 0.601 | 0.888 | 0.778 | 0.855 | 0.962 | 0.843 | 89.8 |
Precision and recall for the random forest classifier with and without output filtering.
| Output filtering | Metric | Bike | Bus | Car | Sit | Stand | Walk | Average | Overall accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| No output filter | Precision | 0.982 | 0.795 | 0.880 | 0.779 | 0.844 | 0.968 | 0.874 | 89.8 |
| | Recall | 0.979 | 0.545 | 0.934 | 0.832 | 0.880 | 0.945 | 0.853 | |
| Output filter | Precision | 0.985 | 0.860 | 0.910 | 0.807 | 0.878 | 0.962 | 0.900 | 91.9 |
| | Recall | 0.976 | 0.701 | 0.952 | 0.821 | 0.871 | 0.970 | 0.882 | |
Figure 3. Example output for 1 day of data. We plot the activity mode vs. time for 1 day of data. In the top plot, we show the activity mode predicted by the random forest algorithm. In the middle plot, we show the smoothed predictions output by the moving average filter. In the bottom plot, we show the ground truth annotations for this day. Minutes in black were correctly classified, minutes in red were misclassified, and minutes in blue had no ground truth annotation with which to compare.
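One simple way to smooth a sequence of per-minute predictions is a sliding majority vote, which is what an equal-weight moving average over one-hot class indicators followed by an argmax reduces to. The sketch below is illustrative only: the function name, the window half-width, the tie-breaking rule, and the choice to filter hard labels rather than class probabilities are all assumptions, and the filter used in the study may differ.

```python
from collections import Counter


def smooth_predictions(labels, half_width=1):
    """Replace each label by the most common label in a centered window.

    half_width is a hypothetical parameter: the window covers the minute
    itself plus up to `half_width` minutes on each side (fewer at the edges).
    """
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half_width)
        hi = min(len(labels), i + half_width + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        if counts[labels[i]] == best:
            # On ties, keep the original prediction.
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


# Hypothetical per-minute classifier output with one isolated "bus" minute.
predicted = ["walk", "walk", "bus", "walk", "car", "car", "car"]
smoothed = smooth_predictions(predicted)
# smoothed == ["walk", "walk", "walk", "walk", "car", "car", "car"]
```

The isolated "bus" minute is overridden by its neighbors, while the genuine walk-to-car transition is preserved. A wider window removes longer spurious runs at the cost of blurring short real trips.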
Confusion matrix for the random forest classifier with output filtering.
| | Bike | Bus | Car | Sit | Stand | Walk |
|---|---|---|---|---|---|---|
| Bike | 1526 | 18 | 4 | 0 | 8 | 3 |
| Bus | 2 | 611 | 409 | 54 | 34 | 11 |
| Car | 2 | 127 | 3563 | 42 | 69 | 13 |
| Sit | 0 | 1 | 44 | 1232 | 186 | 17 |
| Stand | 5 | 8 | 8 | 228 | 2546 | 97 |
| Walk | 19 | 4 | 22 | 26 | 174 | 4228 |
Rows give the number of examples of each true activity; columns give the number of examples of each predicted activity. Entries along the diagonal are correct predictions.
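Per-class precision and recall, and the overall accuracy, follow directly from a confusion matrix laid out this way. The sketch below applies the standard definitions to the counts in the table above; the function and variable names are illustrative.

```python
MODES = ["Bike", "Bus", "Car", "Sit", "Stand", "Walk"]

# Rows: true activity. Columns: predicted activity (same order as MODES).
CONFUSION = [
    [1526, 18, 4, 0, 8, 3],
    [2, 611, 409, 54, 34, 11],
    [2, 127, 3563, 42, 69, 13],
    [0, 1, 44, 1232, 186, 17],
    [5, 8, 8, 228, 2546, 97],
    [19, 4, 22, 26, 174, 4228],
]


def per_class_metrics(matrix, labels):
    """Return {label: {"precision": ..., "recall": ...}} for each class."""
    metrics = {}
    for k, label in enumerate(labels):
        true_positives = matrix[k][k]
        actual = sum(matrix[k])                  # row sum: all true examples
        predicted = sum(row[k] for row in matrix)  # column sum: all predictions
        metrics[label] = {
            "precision": true_positives / predicted if predicted else 0.0,
            "recall": true_positives / actual if actual else 0.0,
        }
    return metrics


def overall_accuracy(matrix):
    """Fraction of all examples that lie on the diagonal."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


metrics = per_class_metrics(CONFUSION, MODES)
accuracy = overall_accuracy(CONFUSION)
```

For example, 409 of the 1121 true bus minutes in this matrix were predicted as car, so bus recall is 611/1121, while bus precision is 611/769 because 769 minutes in total were predicted as bus.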
Top 15 most informative features.
| Feature | Score |
|---|---|
| Standard deviation | 0.251 |
| Average speed | 0.147 |
| Net distance covered | 0.085 |
| Power at dominant frequency | 0.082 |
| Autocorrelation | 0.061 |
| Average yaw | 0.044 |
| Average roll | 0.039 |
| Minimum | 0.034 |
| FFT 4 Hz | 0.029 |
| FFT 3 Hz | 0.022 |
| Correlation between | 0.018 |
| Maximum | 0.018 |
| 25th Percentile | 0.013 |
| Total power | 0.013 |
| Average SNR used | 0.013 |
We computed an importance score for each feature and ranked the features according to this score.