Abstract
Wireless networking in the GHz and THz spectra has encouraged mobile service providers to deploy small cells, connected by mmWave backhaul links, to improve link quality and cell capacity. Because green networking that reduces CO2 emissions is essential to confronting global climate change, energy-efficient network management is needed for these denser small-cell heterogeneous networks (HetNets), which already suffer from noticeable power consumption. We establish a dual-objective optimization model that minimizes energy consumption by switching off unused small cells while maximizing user throughput; the model is a mixed integer linear program (MILP). Recently, deep reinforcement learning (DRL) has been applied to many NP-hard problems in wireless networking, such as radio resource allocation, association and power saving, where it can produce near-optimal solutions with fast inference time, making it usable as an online solution. In this paper, we investigate the feasibility of DRL for a dual-objective problem, energy-efficient routing and throughput maximization, which has not been explored before. We propose a proximal policy optimization (PPO)-based multi-objective algorithm using the actor-critic model, realized as an optimistic linear support (OLS) framework in which the PPO algorithm searches for feasible solutions iteratively. Experimental results show that our algorithm achieves throughput and energy savings comparable to those of the CPLEX solver.
Keywords: deep reinforcement learning; energy saving; wireless backhaul mesh; wireless heterogeneous network
Year: 2021 PMID: 34883929 PMCID: PMC8659752 DOI: 10.3390/s21237925
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
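The dual-objective trade-off described in the abstract (switching off small cells to save energy versus keeping user throughput high) can be illustrated with a weighted-sum scalarization, which is the kind of single-objective subproblem a linear support framework solves for each weight vector. The following is only a minimal sketch under stated assumptions, not the paper's model: the function names, the toy capacities and powers, the normalization of both objectives to [0, 1], and the brute-force search (standing in for the PPO agent or a MILP solver) are all hypothetical.

```python
from itertools import product

def scalarized_reward(energy_saving, throughput, w):
    """Weighted sum of two normalized objectives; w is the weight on energy saving."""
    return w * energy_saving + (1.0 - w) * throughput

def evaluate(on_mask, cell_capacity, cell_power, demand):
    """Toy evaluation (assumed model): returns (energy_saving, throughput) in [0, 1]."""
    total_power = sum(cell_power)
    used_power = sum(p for p, on in zip(cell_power, on_mask) if on)
    capacity = sum(c for c, on in zip(cell_capacity, on_mask) if on)
    served = min(demand, capacity)
    return 1.0 - used_power / total_power, served / demand

def best_configuration(w, cell_capacity, cell_power, demand):
    """Enumerate every on/off pattern and keep the first one with the highest reward."""
    best = None
    for mask in product((0, 1), repeat=len(cell_capacity)):
        e, t = evaluate(mask, cell_capacity, cell_power, demand)
        r = scalarized_reward(e, t, w)
        if best is None or r > best[0]:
            best = (r, mask, e, t)
    return best

# Hypothetical three-cell example: capacities in Mbps, powers in arbitrary units.
caps, pows, demand = [60, 30, 10], [1.0, 1.0, 1.0], 80
for w in (0.0, 0.5, 1.0):
    reward, mask, e, t = best_configuration(w, caps, pows, demand)
    print(f"w={w}: active cells={mask}, energy saving={e:.2f}, throughput={t:.2f}")
```

Sweeping the weight from 0 to 1 moves the chosen configuration from throughput-oriented (more cells on) to energy-oriented (more cells off). Exhaustive enumeration grows exponentially with the number of cells, which is why a learned policy or a dedicated solver is needed at realistic scale.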
Figure 1. Heterogeneous cellular network architecture with mmWave backhaul mesh.
DRL-empowered wireless communication and networking research.
| References | Areas of DRL Studies on Wireless Communications |
|---|---|
| [ | Cognitive radio and dynamic wireless channel selection increase spectral efficiency; matching channels to nodes is typically a combinatorial problem. Using DRL, agents can learn the optimal policy by taking the degree of interference as the reward for each channel-selection action. |
| [ | The wireless link layer provides a medium access scheme for multiple users, realized in a MAC protocol. Several studies design the wireless MAC protocol based on the DRL algorithm, in which DRL agents learn an optimal transmission policy from the reward of contention resolution at a particular channel state. |
| [ | A user association or handover algorithm for a serving base station affects the throughput and QoS of each user. The DRL algorithm enables UEs to select an optimal base station based on past experience. |
| [ | Wireless networks have various resources to be scheduled, such as radio blocks, channels, sequence codes, power, time slots, etc. Many of the scheduling problems involve a non-convex feasible set and user mobility, which make them intractable. The DRL agents repeatedly learn an optimal scheduling policy from the resource utilization achieved by a chosen allocation. |
| [ | Energy and power consumption is critical, especially for green wireless networking, mobile edge cloud networks and UAV networks. The DRL algorithm explores possible policies based on the reward of energy saving while satisfying a throughput constraint. |
Parameters (P) and variables (V) used in the model.
| Symbol | Description | Type |
|---|---|---|
| | Bandwidth for a RB | P |
| | Maximum capacity of link ( | P |
| | Maximum AN capacity of eNB | P |
| | Total energy consumption at node | V |
| | Flow of UE | V |
| | Indicator if UE | V |
| | Set of interference links | P |
| | Set of links | P |
| | Set of AN links | P |
| | Set of BH links | P |
| | Set of eNB | P |
| | Set of Macro eNB (MeNB) | P |
| | Set of Small eNB (SeNB) | P |
| | Set of UE | P |
| | Number of antennas (MIMO) for UE | P |
| | Number of RBs at node | P |
| | Static power at node | P |
| | User demand data rate | V |
Parameters used for evaluation.
| Parameter | MeNB-AN | SeNB-AN | BH Link |
|---|---|---|---|
| Frequency band (GHz) | 2 | 2.6 | 60 |
| Available BW (MHz) | 20 ( | 20 ( | 1000 (10 × 100 MHz) |
| Antenna gain (dBi) ( | <15 | <15 | 36 |
| | 4 (MIMO 4 × 4) | 4 (MIMO 4 × 4) | 1 for each active BH link |
| | 130 | 6.8 | 3.9 |
| | 20 | 0.13 | 0.224 |
| | 4.7 | 4.0 | not used |
| Distance-dependent path loss | Equations (6)–(11) in [ | | |
Figure 2. System architecture of the PPO-based deep OLS learning in mmWave HetNet.
Figure 3. Experimental HetNet topology.
Training hyperparameters.
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| | 0.8 | | 0.8 |
| Trajectory size | 1024 | Batch size | 32 |
| K epoch | 10 | Clipping range | 0.2 |
| Learning rate of actor | 1 × | Learning rate of critic | 1 × |
| Network initialization | HE | Optimization method | |
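To make the role of the clipping range in the table concrete, the standard PPO clipped surrogate objective can be sketched as below. This is the generic textbook form of PPO, not necessarily the exact loss used in the paper. The function names are hypothetical, and the update count assumes that each of the K epochs splits the whole trajectory into non-overlapping minibatches.

```python
import math

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_range=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the samples.

    r is the probability ratio pi_new(a|s) / pi_old(a|s), computed from
    log-probabilities; eps is the clipping range (0.2 in the table).
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Taking the minimum removes any incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that raises the objective.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)

def updates_per_trajectory(trajectory_size=1024, batch_size=32, k_epochs=10):
    """Gradient steps per collected trajectory (assumed minibatch scheme)."""
    return (trajectory_size // batch_size) * k_epochs

# Two samples: one ratio above 1 + eps with positive advantage,
# one ratio below 1 - eps with negative advantage.
objective = ppo_clipped_objective(
    new_logp=[math.log(1.5), math.log(0.5)],
    old_logp=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(objective, updates_per_trajectory())
```

Under the assumed minibatch scheme, a trajectory of 1024 steps with batch size 32 and 10 epochs gives 32 × 10 = 320 gradient updates per collected trajectory.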
Figure 4. Performance evaluation of PPO according to learning rate. (a) Reward convergence. (b) Value loss. (c) Policy loss.
Figure 5. Performance evaluation of PPO according to reward weight (). (a) Scalarized reward. (b) Energy savings. (c) Average data rate per UE.
Figure 6. Performance evaluation of PDOLS according to UE's demand rate. (a) Average data rate per UE. (b) Number of active SeNBs.
Figure 7. Performance evaluation of PDOLS according to the number of distributed UEs. (a) Average data rate per UE. (b) Number of active SeNBs.
Figure 8. Performance comparison of proposed algorithms. (a) Average UE data rate and energy consumption. (b) Reward weight vector exploration.