Náthalee C. Almeida, Marcelo A. C. Fernandes, Adrião D. D. Neto.
Abstract
The use of beamforming and power control, combined or separately, has advantages and disadvantages that depend on the application. The combined use of beamforming and power control has been shown to be highly effective in applications involving the suppression of interference signals from different sources. However, efficient methodologies for the combined operation of these two techniques still need to be identified. The most appropriate technique may be selected by an intelligent agent capable of choosing between beamforming and power control. The present paper proposes an algorithm that uses reinforcement learning (RL) to determine the optimal combination of beamforming and power control in sensor arrays. The RL algorithm used was Q-learning with an ε-greedy policy, and training was performed offline. The simulations showed that RL was effective in implementing a switching policy between the two techniques, taking advantage of the positive characteristics of each in terms of signal reception.
Year: 2015 PMID: 25808769 PMCID: PMC4435209 DOI: 10.3390/s150306668
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Functional diagram of an adaptive array.
Figure 2. Scheme of the interaction between the agent and the environment.
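Figure 2's agent-environment loop is the standard Q-learning cycle: observe the current SINR state, select beamforming or power control, receive a reward, and update the action-value table. A minimal sketch follows, assuming states are discretized SINR levels and actions are 0 = beamforming (BF) and 1 = power control (PC); this encoding and the reward definition are illustrative assumptions, not the paper's exact formulation, though the constants match the simulation parameters below.

```python
import numpy as np

# Illustrative encoding (assumption): 18 discretized SINR levels as states,
# two actions: 0 = beamforming (BF), 1 = power control (PC).
N_STATES, N_ACTIONS = 18, 2
Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng()

def epsilon_greedy(s, eps=0.2):
    """epsilon-greedy policy: explore with probability eps, else act greedily."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def q_update(s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target."""
    target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```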
Simulation parameters.
| Parameter | Value |
|---|---|
| Number of elements in the array | 8 |
| Number of signals | 2 |
| Initial transmit power (P0) | 1 W |
| SINR threshold (δ) | 2 dB |
| Adaptation step size | 0.001 |
| Noise variance | 0.1 |
| Distance between array elements | |
| Learning rate | 0.1 |
| Discount factor (γ) | 0.9 |
| ε-greedy parameter (ε) | 0.2 |
| Attenuation | 1 |
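Read as a training configuration, the parameters above slot directly into an offline Q-learning loop. The sketch below (continuing the one under Figure 2) shows that wiring; `env_step`, the simulator that applies the chosen technique and returns the resulting SINR state and reward, stands in for the array/power-control model and is an assumption here, as are the episode and step counts.

```python
def train_offline(env_step, episodes=250, steps_per_episode=100,
                  alpha=0.1, gamma=0.9, eps=0.2):
    """Offline training: env_step(state, action) -> (next_state, reward)
    must be supplied by the adaptive-array / power-control simulator."""
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))          # random initial SINR level
        for _ in range(steps_per_episode):
            a = epsilon_greedy(s, eps)
            s_next, reward = env_step(s, a)
            q_update(s, a, reward, s_next, alpha, gamma)
            s = s_next
    return Q
```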
Policy improvement of the agent.
| Index | SINR (dB) | Policy 10: BF | Policy 10: PC | Policy 50: BF | Policy 50: PC | Policy 250: BF | Policy 250: PC |
|---|---|---|---|---|---|---|---|
| 1 | −0.8 | 0.5 | 0.5 | 1 | 0 | 0 | 1 |
| 2 | −0.6 | 0.5 | 0.5 | 1 | 0 | 1 | 0 |
| 3 | −0.4 | 0.5 | 0.5 | 1 | 0 | 0 | 1 |
| 4 | −0.2 | 0 | 1 | 1 | 0 | 1 | 0 |
| 5 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 6 | 0.2 | 0 | 1 | 0 | 1 | 0 | 1 |
| 7 | 0.4 | 0 | 1 | 0 | 1 | 0 | 1 |
| 8 | 0.6 | 0 | 1 | 0 | 1 | 0 | 1 |
| 9 | 0.8 | 1 | 0 | 1 | 0 | 0 | 1 |
| 10 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 11 | 1.2 | 0 | 1 | 1 | 0 | 0 | 1 |
| 12 | 1.4 | 0 | 1 | 0 | 1 | 0 | 1 |
| 13 | 1.6 | 0 | 1 | 0 | 1 | 0 | 1 |
| 14 | 1.8 | 0 | 1 | 0 | 1 | 0 | 1 |
| 15 | 2 | | | | | | |
| 16 | 3 | 1 | 0 | 0 | 1 | 1 | 0 |
| 17 | 4 | 1 | 0 | 1 | 0 | 1 | 0 |
| 18 | 5 | 1 | 0 | 1 | 0 | 1 | 0 |
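The BF/PC columns above can be read as the probability of selecting each technique in a given SINR state: after 10 episodes several states are still tied at 0.5/0.5, while by 250 episodes the policy is deterministic in every visited state. A sketch of how such a table could be read out of the learned Q-table (the tie-handling convention is an assumption):

```python
def policy_row(s, atol=1e-9):
    """Return (P_BF, P_PC) for state s: greedy on Q, 0.5/0.5 on a tie
    (e.g., an unvisited state whose Q-values are still equal)."""
    q_bf, q_pc = Q[s]
    if abs(q_bf - q_pc) <= atol:
        return 0.5, 0.5
    return (1.0, 0.0) if q_bf > q_pc else (0.0, 1.0)
```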
Figure 3. System response: (a) agent started with SINR = −0.8 dB; (b) agent started with SINR = 0 dB; (c) agent started with SINR = 5 dB.
Figure 4. The switching sequence between the two techniques.
Policy improvement of the agent.
| Index | SINR (dB) | Policy 10: BF | Policy 10: PC | Policy 50: BF | Policy 50: PC | Policy 250: BF | Policy 250: PC |
|---|---|---|---|---|---|---|---|
| 1 | −0.8 | 0.5 | 0.5 | 0 | 1 | 0 | 1 |
| 2 | −0.6 | 0.5 | 0.5 | 0 | 1 | 0 | 1 |
| 3 | −0.4 | 0.5 | 0.5 | 1 | 0 | 0 | 1 |
| 4 | −0.2 | 0 | 1 | 1 | 0 | 0 | 1 |
| 5 | 0 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 1 |
| 6 | 0.2 | 0 | 1 | 0 | 1 | 0 | 1 |
| 7 | 0.4 | 0 | 1 | 0 | 1 | 0 | 1 |
| 8 | 0.6 | 0 | 1 | 0 | 1 | 0 | 1 |
| 9 | 0.8 | 1 | 0 | 0 | 1 | 0 | 1 |
| 10 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
| 11 | 1.2 | 0.5 | 0.5 | 0 | 1 | 0 | 1 |
| 12 | 1.4 | 0 | 1 | 0 | 1 | 1 | 0 |
| 13 | 1.6 | 0 | 1 | 0 | 1 | 0 | 1 |
| 14 | 1.8 | 0 | 1 | 0 | 1 | 0 | 1 |
| 15 | 2 | | | | | | |
| 16 | 3 | 0 | 1 | 1 | 0 | 1 | 0 |
| 17 | 4 | 0 | 1 | 1 | 0 | 1 | 0 |
| 18 | 5 | 1 | 0 | 1 | 0 | 1 | 0 |
Figure 5. System response: (a) agent started with SINR = −0.8 dB; (b) agent started with SINR = 0 dB; (c) agent started with SINR = 5 dB.
Figure 6. The switching sequence between the two techniques.
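Figures 4 and 6 plot the technique-switching sequence during execution. After offline training the agent can run greedily, with no exploration, selecting beamforming or power control at each step from the learned Q-table. A sketch, with the SINR discretization taken from the policy tables above (the nearest-level mapping is an assumption):

```python
# SINR levels from the policy tables: -0.8 to 2 dB in 0.2 dB steps, then 3-5 dB.
SINR_LEVELS = [round(-0.8 + 0.2 * i, 1) for i in range(15)] + [3.0, 4.0, 5.0]

def to_state(sinr_db):
    """Map a measured SINR (dB) to the nearest discrete state index."""
    return min(range(N_STATES), key=lambda i: abs(sinr_db - SINR_LEVELS[i]))

def run_switching(env_step, sinr0_db, steps=50):
    """Greedy execution after training: returns the BF/PC switching sequence."""
    s, sequence = to_state(sinr0_db), []
    for _ in range(steps):
        a = int(np.argmax(Q[s]))      # no exploration at run time
        sequence.append("BF" if a == 0 else "PC")
        s, _ = env_step(s, a)
    return sequence
```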