| Literature DB >> 33267458 |
Song Wang, Xu Xie, Kedi Huang, Junjie Zeng, Zimin Cai.
Abstract
Reinforcement learning (RL)-based traffic signal control has been proven to have great potential in alleviating traffic congestion. The state definition, a key element of RL-based traffic signal control, plays a vital role in control performance. However, the data used for state definition in the literature are either coarse or difficult to measure directly with the detection systems prevailing in signal control practice. This paper proposes a deep reinforcement learning-based traffic signal control method that uses high-resolution event-based data, aiming to achieve cost-effective and efficient adaptive traffic signal control. High-resolution event-based data, which record the time at which each vehicle-detector actuation/de-actuation event occurs, are informative and can be collected directly from vehicle-actuated detectors (e.g., inductive loops) with current technologies. Given the event-based data, deep learning techniques are employed to automatically extract features useful for traffic signal control. The proposed method is benchmarked against two commonly used traffic signal control strategies, i.e., the fixed-time control strategy and the actuated control strategy, and experimental results reveal that the proposed method significantly outperforms both.
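The record does not reproduce the paper's exact state encoding, but the idea of discretizing actuation/de-actuation timestamps into a binary occupancy sequence (cf. Figure 3) can be sketched as follows. All names and the 1 s bin resolution are illustrative assumptions, not the paper's implementation:

```python
def encode_events(events, horizon, resolution=1.0):
    """Discretize detector (actuation, de-actuation) timestamp pairs into a
    binary occupancy vector: 1 if the detector is occupied in that time bin.

    events     -- list of (t_on, t_off) pairs in seconds
    horizon    -- length of the encoding window in seconds
    resolution -- bin width in seconds (1 s assumed here)
    """
    n_bins = int(horizon / resolution)
    state = [0] * n_bins
    for t_on, t_off in events:
        first = max(0, int(t_on / resolution))
        last = min(n_bins - 1, int(t_off / resolution))
        for i in range(first, last + 1):
            state[i] = 1
    return state

# One vehicle occupies the loop from t=2.3 s to t=4.1 s, another from t=7.0 s to t=7.6 s
print(encode_events([(2.3, 4.1), (7.0, 7.6)], horizon=10))
# → [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
```

A grid of such per-lane vectors, stacked with the signal-indication encoding, would form an image-like input suitable for the convolutional network of Figure 4.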
Keywords: deep reinforcement learning; double dueling deep Q network; event-based data; high-resolution data; traffic signal control
Year: 2019 PMID: 33267458 PMCID: PMC7515273 DOI: 10.3390/e21080744
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The interaction between the signal controller and the traffic environment.
Figure 2. A typical 4-arm intersection. (a) The geometry and detection system at the intersection; (b) The signal phases at the intersection.
Figure 3. The illustration of discrete-time traffic state encoding. (a) encoding the event data collected by detectors; (b) encoding the signal indication for a lane.
Figure 4. The structure of the deep convolutional neural network.
The variable traffic volume (vehicle/hour).
| Time (minute) | EW Right Turn | EW Through | EW Left Turn | NS Right Turn | NS Through | NS Left Turn |
|---|---|---|---|---|---|---|
| 1~15 | 180 | 360 | 240 | 120 | 240 | 160 |
| 15~30 | 240 | 480 | 320 | 180 | 360 | 240 |
| 30~45 | 180 | 360 | 240 | 240 | 480 | 320 |
| 45~60 | 180 | 280 | 320 | 180 | 440 | 160 |
| 60~75 | 180 | 440 | 160 | 180 | 280 | 320 |
| 75~90 | 120 | 240 | 160 | 180 | 360 | 240 |
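For readers reproducing this demand profile in simulation, hourly volumes convert directly to per-second arrival probabilities under a Bernoulli arrival model (a common simulation assumption, not stated in this record):

```python
def arrival_prob_per_second(veh_per_hour):
    """Probability of a vehicle arriving in any given 1 s simulation step,
    assuming independent Bernoulli arrivals at the stated hourly rate."""
    return veh_per_hour / 3600.0

# EW through movement during minutes 15~30 (480 veh/h):
print(round(arrival_prob_per_second(480), 4))  # → 0.1333
```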
Hyper-parameters used in the deep reinforcement learning algorithm.
| Hyperparameter | Learning rate | Discount factor | Initial ε | Final ε |
|---|---|---|---|---|
| Value | 0.0002 | 0.75 | 1.0 | 0.01 |

| Hyperparameter | | Minibatch size | Replay memory size | Target network update rate |
|---|---|---|---|---|
| Value | 450,000 | 32 | 100,000 | 0.001 |
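Two of the tabulated hyper-parameters are easy to misread in isolation, so here is a minimal sketch of how they are typically used in a DQN-style agent: a fixed-capacity replay memory sampled in minibatches, and a linearly annealed ε-greedy schedule. Treating 450,000 as the ε decay horizon is an assumption (its label is missing from the record), and all names are illustrative:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size experience replay buffer (capacity 100,000 per the table)."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=32):  # minibatch size 32 per the table
        return random.sample(self.buffer, batch_size)

def epsilon(step, eps_start=1.0, eps_end=0.01, decay_steps=450_000):
    """Linearly anneal exploration rate from eps_start to eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

print(epsilon(0))  # → 1.0
```

The target network update rate (0.001) would correspond to a soft update of the target network toward the online network at each step.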
Parameters of the benchmarked signal control strategies.
| Strategy | Parameter | Phase 1 | Phase 2 | Phase 3 | Phase 4 |
|---|---|---|---|---|---|
| Fixed-time | Green time splits (s) | 26 | 23 | 26 | 23 |
| Fixed-time | Cycle length (s) | 114 | | | |
| Fully actuated | Minimum green time (s) | 17 | 17 | 17 | 17 |
| Fully actuated | Maximum green time (s) | 36 | 32 | 36 | 32 |
| Fully actuated | Unit extension (s) | 3.5 | | | |
| Fully actuated | Passage time (s) | 3.4 | | | |
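The fully actuated benchmark holds green for at least the minimum green, then extends it in unit increments as long as vehicle actuations keep arriving, terminating on gap-out or max-out. A minimal sketch of that decision rule using the table's parameters (function and variable names are illustrative, not from the paper):

```python
def extend_green(elapsed_green, time_since_last_actuation,
                 min_green=17.0, max_green=36.0, unit_extension=3.5):
    """Return True to keep the current green phase, False to terminate it.

    Green is held at least min_green seconds; beyond that it is extended as
    long as each detector actuation arrives within unit_extension seconds of
    the previous one (otherwise it gaps out), and it maxes out at max_green.
    """
    if elapsed_green < min_green:
        return True                       # minimum green not yet served
    if elapsed_green >= max_green:
        return False                      # max-out
    return time_since_last_actuation <= unit_extension  # gap-out test

print(extend_green(10.0, 5.0))   # → True  (still inside minimum green)
print(extend_green(20.0, 2.0))   # → True  (actuation within unit extension)
print(extend_green(20.0, 4.0))   # → False (gap-out)
print(extend_green(40.0, 1.0))   # → False (max-out)
```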
Figure 5. Training performance of the event data-based agent against the aggregated data-based agent. (a) Cumulative reward; (b) Cumulative queue length.
Performance comparison of the last 100 training episodes (EBA: event data-based agent; ABA: aggregated data-based agent).
| Performance Metric | EBA Mean | EBA Std | ABA Mean | ABA Std |
|---|---|---|---|---|
| Cumulative reward | 771.6 | 37.0 | 52.1 | 90.1 |
| Cumulative queue length (veh) | 137,485.7 | 2849.4 | 209,773.87 | 14,168.4 |
Figure 6. Average green time. (a) Phase 1; (b) Phase 2; (c) Phase 3; (d) Phase 4.
Figure 7. Average queue length. (a) Phase 1; (b) Phase 2; (c) Phase 3; (d) Phase 4.
Figure 8. The performance metrics optimized by the proposed agent. (a) Throughput in evaluation simulations; (b) Total delay per vehicle.
Evaluation performance of our agent.
| Performance Metric | Agent (Event Data) Mean | Agent Std | Fixed-Time Mean | Fixed-Time Std | Actuated Mean | Actuated Std | Improvement vs. Fixed-Time | Improvement vs. Actuated |
|---|---|---|---|---|---|---|---|---|
| Vehicular delay (s/veh) | 48.0 | 28.0 | 60.9 | 33.5 | 53.4 | 29.9 | 21.2% | 10.1% |
| Queue length (veh) | 26.0 | 7.8 | 31.0 | 8.3 | 36.9 | 10.0 | 29.7% | 16.4% |
| Vehicle speed (km/h) | 24.5 | 2.1 | 22.9 | 2.3 | 21.2 | 4.2 | 15.5% | 6.9% |
| Number of stops (#/veh) | 0.85 | 0.42 | 0.90 | 0.41 | 0.83 | 0.40 | 5.6% | −2.2% |
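The Improvement columns follow from the tabulated means as a relative reduction against each baseline; the vehicular delay row, for instance, reproduces exactly (for "higher is better" metrics such as speed, the sign of the numerator flips):

```python
def improvement(baseline, agent):
    """Relative improvement of the agent over a baseline, in percent, for a
    'lower is better' metric (positive means the agent performs better)."""
    return (baseline - agent) / baseline * 100.0

# Vehicular delay means from the table: agent 48.0, fixed-time 60.9, actuated 53.4
print(round(improvement(60.9, 48.0), 1))  # → 21.2 (vs. fixed-time)
print(round(improvement(53.4, 48.0), 1))  # → 10.1 (vs. actuated)
```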