Wonseok Lee, Young Jeon, Taejoon Kim, Young-Il Kim.
Abstract
A network of unmanned aerial vehicles (UAVs) serving as base stations (a UAV-BS network) is emerging as a promising component of next-generation communication systems. In a UAV-BS network, optimal positioning of each UAV-BS is essential for establishing line-of-sight (LoS) links to ground users. This paper proposes a novel deep Q-network (DQN)-based learning model that enables optimal deployment of a UAV-BS. Moreover, the proposed model yields the optimal UAV-BS trajectory while ground users move, without retraining the model or acquiring the users' path information. Specifically, the model optimizes the UAV-BS trajectory by maximizing the mean opinion score (MOS) of ground users who move along various paths. Furthermore, the model is highly practical because it takes an average channel power gain as its input parameter instead of the locations of individual mobile users. The accuracy of the proposed model is validated by comparing its results with those of a mathematical optimization solver.
Keywords: reinforcement learning; trajectory optimization; unmanned aerial vehicles
Year: 2021 PMID: 34960332 PMCID: PMC8708867 DOI: 10.3390/s21248239
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Mathematical notations and descriptions.
| Notations | Description |
|---|---|
| | Environmental parameters |
| | Noise power spectral density |
| | Path loss exponent |
| | Attenuation factors for line of sight (LoS) and non-LoS (NLoS) |
| | Traffic load |
Figure 1. User mobility model with varying user density and random movement around a moving center point.
Figure 2. Example of the movement of an unmanned aerial vehicle serving as a base station (UAV-BS). The labeled quantities denote average received channel power gains; their symbols are missing from this record.
Input vector over time and state.
| Time | State |
|---|---|
Figure 3. Training process of the deep Q-network (DQN) model.
DQN learning parameter settings.
| Parameter | Value |
|---|---|
| Batch size | 64 |
| Learning rate | 0.001 |
| Size of replay memory | 5000 |
| Number of hidden layers | 2 |
| Number of neurons in each hidden layer | 48 |
| Type of activation function | Rectified linear unit (ReLU) |
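
As a rough illustration of how the settings in this table fit together, the following sketch builds a small fully connected Q-network (two hidden layers of 48 ReLU units) and a replay memory of 5000 transitions sampled in batches of 64. It is a minimal standard-library sketch, not the authors' implementation: `STATE_DIM`, `NUM_ACTIONS`, the weight initialization, and all function and class names are assumptions made for illustration, and no training step is shown.

```python
import random
from collections import deque

# Values taken from the DQN parameter table.
BATCH_SIZE = 64
LEARNING_RATE = 0.001  # listed for completeness; no training step is shown here
REPLAY_CAPACITY = 5000
HIDDEN_LAYERS = 2
HIDDEN_UNITS = 48

# Assumptions for illustration only (not given in the table).
STATE_DIM = 4
NUM_ACTIONS = 7


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small uniform random weights."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_network(rng):
    """Stack input -> 48 -> 48 -> output layers."""
    sizes = [STATE_DIM] + [HIDDEN_UNITS] * HIDDEN_LAYERS + [NUM_ACTIONS]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def q_values(network, state):
    """Forward pass: ReLU on hidden layers, linear output (one Q-value per action)."""
    x = list(state)
    for i, (weights, biases) in enumerate(network):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(network) - 1:
            x = [max(0.0, v) for v in x]
    return x


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity=REPLAY_CAPACITY):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, rng, batch_size=BATCH_SIZE):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In practice a deep-learning framework would replace the hand-written layers; the plain-Python version is used here only to keep the sketch self-contained.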
Experiment parameter settings.
| Time Parameter | Value |
|---|---|
| | 25 |
| | 10–50 m |
| | 2 GHz |
| Transmit power | 20 dBm |
| | 1 MHz |
| | 8,000,000 bits |
| | 9.61, 0.16 |
| | 1.120, 4.6746 |
| | 2 |
| | 3 dB |
| | 23 dB |
| | 5, 10 m |
| | −100 dBm |
| | 10, 1 |
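
To make the role of the channel-related settings more concrete, the sketch below evaluates a widely used sigmoid air-to-ground LoS probability model together with a mean path loss that adds LoS and NLoS attenuation to free-space loss. This is an illustration under stated assumptions, not the paper's equations, which are not preserved in this record. In particular, the mapping of table values to parameters (9.61 and 0.16 as environmental parameters, 2 GHz as carrier frequency, 2 as path loss exponent, 3 dB and 23 dB as LoS and NLoS attenuation) is inferred, and all names are hypothetical.

```python
import math

# Assumed mapping of table values to parameters (the symbol column is missing).
A_ENV, B_ENV = 9.61, 0.16            # environmental parameters
CARRIER_HZ = 2e9                     # 2 GHz carrier frequency
PATH_LOSS_EXPONENT = 2.0
ETA_LOS_DB, ETA_NLOS_DB = 3.0, 23.0  # LoS / NLoS attenuation factors
SPEED_OF_LIGHT = 299_792_458.0       # m/s


def los_probability(horizontal_m, altitude_m):
    """Sigmoid LoS probability as a function of the elevation angle in degrees."""
    elevation_deg = math.degrees(math.atan2(altitude_m, horizontal_m))
    return 1.0 / (1.0 + A_ENV * math.exp(-B_ENV * (elevation_deg - A_ENV)))


def mean_path_loss_db(horizontal_m, altitude_m):
    """Free-space loss plus the probability-weighted LoS/NLoS attenuation (dB)."""
    distance_m = math.hypot(horizontal_m, altitude_m)
    free_space_db = 10.0 * PATH_LOSS_EXPONENT * math.log10(
        4.0 * math.pi * CARRIER_HZ * distance_m / SPEED_OF_LIGHT
    )
    p_los = los_probability(horizontal_m, altitude_m)
    return free_space_db + p_los * ETA_LOS_DB + (1.0 - p_los) * ETA_NLOS_DB
```

Under this model a UAV-BS directly overhead has an LoS probability close to 1, while a low elevation angle at the same distance gives a lower LoS probability and therefore a higher mean path loss, which is the trade-off that positioning has to balance.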
Comparison between the proposed algorithm and Broyden–Fletcher–Goldfarb–Shanno (BFGS).
| | BFGS | Proposed Algorithm |
|---|---|---|
| | Exact positions of the UAV-BS and all the GUs | Differences of the average received channel power gain |
| | Optimal position (coordinate) | Optimal action (direction) |
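
The contrast in this table can be sketched in code: rather than solving for a coordinate, an action-based controller observes how the average received channel power gain changed and picks a movement direction. The example below is a hypothetical illustration only. The action set, the step size, and every function name are assumptions, and the Q-values would come from a trained network rather than being supplied by hand.

```python
# Hypothetical action set: one step along each axis, or hover in place.
ACTIONS = {
    0: (1, 0, 0),
    1: (-1, 0, 0),
    2: (0, 1, 0),
    3: (0, -1, 0),
    4: (0, 0, 1),
    5: (0, 0, -1),
    6: (0, 0, 0),
}


def average_gain(gains):
    """Average received channel power gain over all ground users."""
    return sum(gains) / len(gains)


def gain_difference(previous_gains, current_gains):
    """Change in the average gain between two consecutive UAV-BS positions."""
    return average_gain(current_gains) - average_gain(previous_gains)


def greedy_action(q_values):
    """Index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def apply_action(position, action, step_m=1.0):
    """Move the UAV-BS one step in the chosen direction (step size is assumed)."""
    dx, dy, dz = ACTIONS[action]
    x, y, z = position
    return (x + dx * step_m, y + dy * step_m, z + dz * step_m)
```

For example, `apply_action((0.0, 0.0, 10.0), greedy_action(q), step_m=5.0)` moves the UAV-BS by 5 m in whichever direction has the highest Q-value in `q`, without ever needing the individual ground-user coordinates.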
Figure 4. Execution times of the two methods as the number of GUs increases.
Figure 5. UAV-BS movement and line-of-sight (LoS) probability from random starting positions to the optimal position, without considering GU mobility.
Figure 6. Change in UAV-BS position with varying group radius.
Figure 7. Mean opinion score (MOS) and UAV-BS altitude with varying group radius.
Figure 8. UAV-BS trajectory and MOS with GUs moving along various paths.