| Literature DB >> 35459015 |
Johanna Andrea Hurtado Sánchez, Katherine Casilimas, Oscar Mauricio Caicedo Rendon.
Abstract
Network Slicing and Deep Reinforcement Learning (DRL) are vital enablers for achieving 5G and 6G networks. A 5G/6G network can comprise various network slices from a single tenant or multiple tenants. Network providers need to perform intelligent and efficient resource management to offer slices that meet the quality of service and quality of experience requirements of 5G/6G use cases. Resource management is far from straightforward; it demands complex and dynamic mechanisms to control admission and to allocate, schedule, and orchestrate resources. Intelligent and effective resource management must predict the service demand coming from tenants (each tenant with multiple network slice requests) and achieve autonomous behavior of slices. This paper identifies the relevant phases for resource management in network slicing and analyzes approaches that use reinforcement learning (RL) and DRL algorithms to realize each phase autonomously. We analyze the approaches according to the optimization objective, the network focus (core, radio access, edge, and end-to-end network), the space of states, the space of actions, the algorithms, the structure of deep neural networks, the exploration-exploitation method, and the use cases (or vertical applications). We also provide research directions related to RL/DRL-based network slice resource management.
Keywords: admission control; deep reinforcement learning; network slicing; resource allocation; resource orchestration; resource scheduling
Year: 2022 PMID: 35459015 PMCID: PMC9032530 DOI: 10.3390/s22083031
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Admission control based on RL and DRL.
| Ref. | Algorithm | Focus | Optimization Objective | Explore-Exploit | NN Structure | Use Case/Vertical App | Training | Dataset | Development |
|---|---|---|---|---|---|---|---|---|---|
| [ ] | N3AC | RAN | Meet service guarantees while maximizing profit | | FNN | Elastic and inelastic * | | | Emulation (Keras-TensorFlow) |
| [ ] | SARSA | E2E (RAN, TN, CN, Edge) | Maximize revenue while minimizing the dropping probability of NSLRs | | Not applicable | QoS and best-effort slices * | | | Simulation (Undeclared tool) |
| [ ] | DQN | RAN & TN | Maximize revenue while minimizing slice degradation | Undeclared | FNN | High and low priority * | | | Emulation (Python-NetworkX) |
| [ ] | DQN | RAN | Maximize revenue while minimizing costs related to SLA violations | | Target NN, Online NN | eMBB, uRLLC, and mMTC | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | Q-learning, R-learning | CN | Maximize long-term average profit | | Not applicable | Undeclared | | | Simulation (Undeclared tool) |
| [ ] | DQN | RAN | Enhance resource utilization and slice isolation | | Target NN, Online NN, replay memory, and mini-batch | Best effort, constant bit-rate, and minimum bit-rate | | | Simulation (Undeclared tool) |
| [ ] | Q-learning, DQN | RAN | Achieve a trade-off between the blocking and dropping probabilities of service requests | | Target NN and Online NN | Drop-sensitive and best-effort * | | | Simulation (3D Urban Macro, available [ ]) |
*: non-5G/6G terminology is used for the use case or vertical application.
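Several admission-control entries above rely on tabular RL (Q-learning, SARSA) with an accept/reject action per network slice request (NSLR). The sketch below illustrates that pattern with a minimal Q-learning admission agent; the state encoding, reward values, and hyperparameters are illustrative assumptions, not taken from any surveyed paper.

```python
import random

class SliceAdmissionAgent:
    """Tabular Q-learning admission controller, in the spirit of the
    SARSA/Q-learning rows above. State = free capacity units of the
    substrate network; action 0 = reject, 1 = admit an NSLR.
    Rewards and hyperparameters are illustrative assumptions."""

    ACTIONS = (0, 1)

    def __init__(self, capacity, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {(s, a): 0.0 for s in range(capacity + 1) for a in self.ACTIONS}

    def act(self, state):
        # epsilon-greedy exploration-exploitation
        if random.random() < self.epsilon:
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next):
        # Q-learning temporal-difference update
        best_next = max(self.q[(s_next, b)] for b in self.ACTIONS)
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next - self.q[(s, a)])

# Toy training loop: admitting earns revenue, but admitting with no free
# capacity incurs an SLA-violation penalty.
random.seed(0)
agent = SliceAdmissionAgent(capacity=5)
for _ in range(5000):
    s = random.randint(0, 5)
    a = agent.act(s)
    if a == 1:
        reward, s_next = (1.0, s - 1) if s > 0 else (-5.0, 0)
    else:
        reward, s_next = 0.0, s
    agent.update(s, a, reward, s_next)

def greedy(s):
    """Greedy policy extracted from the learned Q-table."""
    return max(agent.ACTIONS, key=lambda a: agent.q[(s, a)])
```

After training, the greedy policy rejects requests when capacity is exhausted and admits them otherwise, i.e., it learns the revenue/SLA trade-off that the optimization objectives in the table describe.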
Resource allocation based on RL.
| Ref. | Algorithm | Focus | Optimization Objective | Explore-Exploit | Use Case/Vertical App | Training | Dataset | Development |
|---|---|---|---|---|---|---|---|---|
| [ ] | Q-learning | RAN | Maximize resource utilization while meeting haptic communication performance requirements | | Haptic | Centralized | | Simulation (Undeclared tool) |
| [ ] | Q-learning, SARSA, Expected SARSA, and Monte Carlo | RAN | Guarantee efficient resource utilization while meeting low-latency requirements | | IoT | Centralized | | Simulation (Undeclared tool) |
| [ ] | Q-learning | RAN | Minimize end-to-end latency and maximize computing resource utilization | Undeclared | mMTC | Centralized | | Simulation (5G K-SimNet) |
| [ ] | Q-learning | RAN | Maximize profit and QoS satisfaction | | Undeclared | Centralized | Synthetic | Emulation (Mininet) |
| [ ] | Multiagent PPO | E2E (RAN, TN, CN, Edge) | Maximize resource efficiency while meeting QoS | | Undeclared | Distributed | | Emulation (Python-PyTorch) |
| [ ] | Q-learning | RAN | Maximize resource utilization | | V2X | Centralized | | Simulation (MATLAB) |
| [ ] | Monte Carlo and Q-learning | Edge | Maximize social welfare / maximize power allocation | | Undeclared | Centralized | | Simulation (Undeclared tool) |
| [ ] | Q-learning | RAN | Optimize latency, energy consumption, and cost | Undeclared | mMTC | Centralized | | Simulation (Undeclared tool) |
| [ ] | Multiagent Q-learning | RAN | Maximize profit while meeting end-to-end delay | | Undeclared | Distributed | | Simulation (Undeclared tool) |
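The table above mixes off-policy (Q-learning) and on-policy (SARSA) entries. The difference sits entirely in the bootstrap term of the temporal-difference update, sketched below; the dictionary-based Q-table and hyperparameter defaults are illustrative.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD update (most entries in the table above): bootstrap
    on the best next action, regardless of the behavior policy."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update (the SARSA entries): bootstrap on the action
    actually taken next, so exploration noise flows into the estimates."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
```

For resource allocation this distinction matters: an on-policy learner values an allocation policy including its exploratory missteps, while Q-learning values the greedy allocation directly.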
Resource allocation based on DRL.
| Ref. | Algorithm | Focus | Optimization Objective | Explore-Exploit | NN Structure | Use Case/Vertical App | Training | Dataset | Development |
|---|---|---|---|---|---|---|---|---|---|
| [ ] | DDQN & Dueling DQN | RAN | Maximize long-term profit while meeting diverse multitenant service demands | | Target NN, Online NN, replay memory, and mini-batch | Utilities, automotive, and manufacturing | Centralized | Synthetic | Emulation (TensorFlow) |
| [ ] | DQN | RAN | Maximize radio resource utilization while satisfying QoS | | Target NN, Online NN, replay memory, and mini-batch | eMBB, uRLLC, mIoT | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | DQN | E2E (RAN, TN, CN, Edge) | Optimize VNF placement while adapting to SFC traffic variations | | FNN | eMBB | Centralized | Real, available [ ] | Emulation (OpenAI Gym) |
| [ ] | DQN | Edge, RAN & TN | Optimize resource utilization at the edge network | | DNN, replay memory, and mini-batch | Internet of vehicles and smart cities | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | Dueling GAN-DDQN | RAN | Maximize profit and resource utilization | | Target NN, Online NN, Discriminator NN, replay memory, and mini-batch | VoLTE *, Video, and uRLLC | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | LSTM-A2C | RAN | Maximize spectral efficiency, SLA satisfaction ratio, and profit | | Policy RNN and Value RNN | VoLTE *, eMBB, and uRLLC | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | Constrained DQN | RAN | Maximize resource utilization and throughput during orchestration and network slice management under service constraints | | FNN | Video, VoLTE *, and uRLLC | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | DDQN | RAN | Minimize the number of allocated radio resource blocks while meeting diverse and dynamic slice performance requirements | | Ape-X and replay memory | Undeclared | Centralized | Synthetic | Simulation (NS3) |
| [ ] | DQN | E2E (RAN, TN, CN, Edge) | Maximize QoE satisfaction and resource utilization | | FNN | V2X | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | DQN | RAN | Maximize long-term revenue while ensuring QoS satisfaction | | Target NN, Online NN, replay memory, and mini-batch | Bandwidth sensitive * | Centralized | Synthetic | Simulation (MATLAB) |
| [ ] | DQN | CN | Maximize QoS satisfaction and minimize deployment costs while meeting bandwidth and computing resource constraints | Undeclared | FNN | Bandwidth sensitive * | Centralized | Real, available [ ] | Emulation (TensorFlow) |
| [ ] | DQN & DDQN | RAN | Maximize spectral utilization and minimize costs | | Target NN, Online NN, and replay memory | Elastic and real-time | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | DQN | RAN | Maximize QoE satisfaction and resource utilization | | Target NN, Online NN, replay memory, and mini-batch | Delay constrained, rate constrained, rate and delay constrained, and rate and delay nonconstrained * | Centralized | Synthetic | Simulation (MATLAB) |
| [ ] | DQN | Edge | Maximize resource utilization and QoS satisfaction | | Target NN, Online NN, replay memory, and mini-batch | Bit rate sensitive * | Centralized | Synthetic | Emulation (TensorFlow) |
| [ ] | Variation of Actor-Critic | RAN | Maximize total throughput over time | Gaussian distribution | Policy NN, Value NN, replay memory, and mini-batch | Undeclared | Centralized | Synthetic | Simulation (Undeclared tool) |
| [ ] | DQN | RAN | Maximize the data rate for eMBB and uRLLC | | Online NN, Target NN, replay memory, and mini-batch | eMBB and uRLLC | Distributed | Synthetic | Simulation (PyTorch) |
*: non-5G/6G terminology is used for the use case or vertical application.
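The "Target NN, Online NN, replay memory, and mini-batch" structure recurs in almost every DQN row above. The sketch below shows that training loop in miniature, with a linear Q-function standing in for the deep network; the toy task, hyperparameters, and class name are illustrative assumptions, not any surveyed paper's setup.

```python
import random
from collections import deque
import numpy as np

class ReplayDQN:
    """Minimal numpy sketch of the DQN training pattern from the table:
    an online network trained on mini-batches sampled from replay memory,
    with a periodically synchronized target network."""

    def __init__(self, n_features, n_actions, gamma=0.9, lr=0.05,
                 batch_size=16, sync_every=25):
        self.online = np.zeros((n_actions, n_features))  # trained every step
        self.target = self.online.copy()                 # frozen copy
        self.memory = deque(maxlen=1000)                 # replay memory
        self.gamma, self.lr = gamma, lr
        self.batch_size, self.sync_every = batch_size, sync_every
        self.steps = 0

    def store(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))

    def train_step(self):
        if len(self.memory) < self.batch_size:
            return
        # sample a mini-batch of past transitions from replay memory
        for s, a, r, s_next in random.sample(list(self.memory), self.batch_size):
            td_target = r + self.gamma * np.max(self.target @ s_next)
            td_error = td_target - self.online[a] @ s
            self.online[a] += self.lr * td_error * s     # SGD on the online net
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = self.online.copy()             # periodic target sync

# Toy single-state task: action 0 pays reward 1, action 1 pays 0.
random.seed(0)
agent = ReplayDQN(n_features=2, n_actions=2)
s, terminal = np.array([1.0, 0.0]), np.zeros(2)
for _ in range(300):
    a = random.randrange(2)
    agent.store(s, a, 1.0 if a == 0 else 0.0, terminal)
    agent.train_step()
q = agent.online @ s  # learned action values for state s
```

Decoupling the bootstrap target from the online weights (the target network) and breaking temporal correlation in the data (replay memory plus mini-batches) are the two stabilizers that make the DQN entries above trainable at all.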
Resource orchestration based on DRL.
| Ref. | Algorithm | Focus | Optimization Objective | Explore-Exploit | NN Structure | Training | Dataset | Environment |
|---|---|---|---|---|---|---|---|---|
| [ ] | DDPG | CN and Edge | Optimize placement of VNFs and service routing paths while addressing a large number of real-time traffic requests | Gaussian noise | Target NN and Online NN | Centralized | Synthetic | Emulation (TensorFlow) |
| [ ] | DDQN | RAN | Maximize the expected long-term needs of tenants | | Target NN, Online NN, replay memory, and mini-batch | Distributed | Synthetic | Emulation (TensorFlow) |
| [ ] | Online DQN | CN | Make chain placement decisions across geo-distributed data centers while minimizing deployment costs | | LSTM | Centralized | Real, available [ ] | Emulation (Google data center) |
| [ ] | TD3 | RAN | Reconfigure computing resources autonomously while minimizing latency, energy consumption, and deployment costs | Gaussian | Policy NN and Value NN | Centralized | Synthetic | Emulation (OpenAI Gym) |
| [ ] | DDPG | E2E (RAN, TN, CN, Edge) | Maximize resource utilization while meeting SLAs | Decaying Gaussian | Target NN, Online NN, replay memory, and mini-batch | Centralized | Real, available [ ] | Emulation (OpenAirInterface and OpenDaylight) |
| [ ] | Decentralized DQN | E2E (RAN, TN, CN, Edge) | Maximize slices' performance under networking and computing resource constraints | Decaying Gaussian | Target and Online NNs with actor-critic and replay memory | Distributed | Real, available [ ] | Emulation (OpenAirInterface and OpenDaylight) |
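Unlike the ε-greedy exploration typical of the DQN rows, the continuous-control entries above (DDPG, TD3) explore by adding Gaussian noise to the deterministic policy's action, often with a decaying scale. A minimal sketch, where the action vector, bounds, and decay schedule are illustrative assumptions:

```python
import numpy as np

def gaussian_explore(policy_action, sigma, low=0.0, high=1.0, rng=None):
    """Gaussian exploration for continuous actions (DDPG/TD3 style):
    perturb the policy's action, e.g. a resource share per slice,
    with zero-mean noise and clip to the feasible range."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noisy = policy_action + rng.normal(0.0, sigma, size=policy_action.shape)
    return np.clip(noisy, low, high)

# "Decaying Gaussian" schedule: explore broadly early, exploit later.
rng = np.random.default_rng(42)
action = np.array([0.3, 0.5, 0.2])   # e.g. bandwidth shares for 3 slices
sigma, decay = 0.2, 0.99
for episode in range(100):
    explored = gaussian_explore(action, sigma, rng=rng)
    sigma *= decay                    # noise scale shrinks each episode
```

Clipping keeps the explored allocation feasible (shares stay in [0, 1]), which is why the orchestration papers can inject noise directly into resource-configuration actions.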
Resource scheduling based on RL and DRL.
| Ref. | Algorithm | Focus | Optimization Objective | Explore-Exploit | NN Structure | Use Case / Vertical App | Training | Dataset | Environment |
|---|---|---|---|---|---|---|---|---|---|
| [ ] | A3C | RAN | Maximize resource utilization while guaranteeing slice isolation | | LSTM | Undeclared | Distributed | | Emulation (TensorFlow) |
| [ ] | Q-learning | CN & TN | Minimize SFC delay | | Undeclared | Delay-sensitive and non-delay-sensitive * | Centralized | | Simulation (Undeclared tool) |
| [ ] | QV-learning, QV2-learning, QVMAX-learning, QVMAX2-learning | RAN | Minimize packet delay and packet drop rate | | Distributed NNs | Undeclared | Centralized | | Simulation (LTESim) |
| [ ] | DQN | E2E (RAN, TN, CN, Edge) | Minimize SLA violations while maximizing physical nodes' resource utilization | | CNN | eMBB, uRLLC, mMTC | Centralized | Synthetic | Emulation (Python-Theano) |
| [ ] | Q-learning | CN & TN | Achieve adaptive and cost-effective SFC | | Undeclared | Undeclared | Centralized | | Simulation (Java-based) |
| [ ] | DQN | RAN | Minimize latency | | FNN | uRLLC | Centralized | | Simulation (Undeclared tool) |
| [ ] | DQN | RAN | Maximize long-term QoE | | Target NN, Online NN, and replay memory | Video streaming | Centralized | | Simulation (Undeclared tool) |
*: non-5G/6G terminology is used for the use case or vertical application.
Figure 1. 5G/6G network slices.
Figure 2. Resource management phases.
Figure 3. RL/DRL-based admission control architecture.
Figure 4. Resource allocation architecture using RL/DRL.
Figure 5. Resource orchestration architecture using RL/DRL.
Figure 6. Resource scheduling architecture using RL/DRL.