| Literature DB >> 22346624 |
Thrishantha Nanayakkara, Malka N Halgamuge, Prasanna Sridhar, Asad M Madni.
Abstract
In a network of low-powered wireless sensors, it is essential to capture as many environmental events as possible while still preserving the battery life of the sensor node. This paper focuses on a real-time learning algorithm that extends the lifetime of a sensor node to sense and transmit environmental events. A common method adopted in ad-hoc sensor networks is to put the sensor nodes to sleep periodically. The purpose of the learning algorithm is to couple the sensor's sleeping behavior to the natural statistics of the environment so that it stays in optimal harmony with changes in the environment: the sensor can sleep when the environment is steady and stay awake when it is turbulent. This paper presents theoretical and experimental validation of a reward-based learning algorithm that can be implemented on an embedded sensor. The key contribution of the proposed approach is the design and implementation of a reward function that satisfies a trade-off between the two mutually contradicting objectives above, and a linear critic function that approximates the discounted sum of future rewards in order to perform policy learning.
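As a rough illustration of the trade-off the abstract describes, the Python sketch below shows one plausible shape for such a reward; the weights `w_event` and `w_energy` and the inputs are illustrative assumptions, not the paper's actual reward function.

```python
# Minimal sketch of the reward trade-off described in the abstract.
# The exact functional form is not given there; the weights and the
# state variables below are illustrative assumptions.

def reward(events_captured: int, awake_time: float,
           w_event: float = 1.0, w_energy: float = 0.1) -> float:
    """Reward captured events, penalize energy spent staying awake."""
    return w_event * events_captured - w_energy * awake_time
```

Under this form, a node that sleeps through a turbulent period forfeits event reward, while one that stays awake in a steady period pays the energy penalty for nothing, which is exactly the tension the learning algorithm is meant to balance.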
Keywords: Markov decision process; reward shaping; sensing; sensor network
Year: 2011 PMID: 22346624 PMCID: PMC3274088 DOI: 10.3390/s110101229
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Temporal-difference-based learning to predict.
Figure 2. Actor-critic-based learning: using the ability to predict to improve the behavior (control policy). Here, the policy is the sleeping policy of the sensor node.
Figure 3. How the temporal difference can be used to improve the policy. Here u(t) = π(s(t)).
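Figures 1-3 together outline a standard temporal-difference actor-critic loop. A minimal sketch of one update step is below, assuming a linear critic V(s) = w·φ(s), a discount factor γ, and a policy parameter θ that sets the sleep action u(t) = π(s(t)); the update rules are generic textbook forms, not reproduced from the paper.

```python
import numpy as np

GAMMA = 0.9       # discount factor for the sum of future rewards (assumed)
ALPHA_W = 0.05    # critic learning rate (assumed)
ALPHA_TH = 0.01   # actor learning rate (assumed)

def td_actor_critic_step(w, theta, phi_s, phi_s_next, r, grad_log_pi):
    """One temporal-difference actor-critic step.

    w           -- critic weight vector
    theta       -- policy (actor) parameter
    phi_s       -- feature vector of the current state s(t)
    phi_s_next  -- feature vector of the next state s(t+1)
    r           -- reward observed after acting with u(t) = pi(s(t))
    grad_log_pi -- gradient of log pi w.r.t. theta at (s(t), u(t))
    """
    # TD error: how much better or worse the outcome was than predicted.
    delta = r + GAMMA * np.dot(w, phi_s_next) - np.dot(w, phi_s)
    # Critic update: move the value prediction toward the observed return.
    w = w + ALPHA_W * delta * phi_s
    # Actor update: reinforce actions that produced a positive TD error.
    theta = theta + ALPHA_TH * delta * grad_log_pi
    return w, theta
```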
Figure 4. The structure of the polynomial critic function.
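Figure 4's caption suggests the critic is linear in polynomial features of the state, which keeps the value estimate cheap enough for an embedded node. A minimal sketch of that structure, assuming a scalar state and an arbitrary degree:

```python
import numpy as np

# Sketch of a polynomial critic (Figure 4): the value estimate is linear
# in polynomial features of the (here scalar) state. The degree is an
# assumption; the paper's exact feature set is not given in this record.

def poly_features(s: float, degree: int = 3) -> np.ndarray:
    """phi(s) = [1, s, s^2, ..., s^degree]."""
    return np.array([s ** k for k in range(degree + 1)])

def critic_value(w: np.ndarray, s: float) -> float:
    """V(s) = w . phi(s), the estimated discounted sum of future rewards."""
    return float(np.dot(w, poly_features(s, degree=len(w) - 1)))
```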
Figure 5. Evaluations of reward and critic.
Figure 6. Implementation of reinforcement learning on sensors in an outdoor environment, using the MTS400CA embedded board with an external antenna.
Comparison of performance with and without the critic-based adaptive sleeping behavior.
| Time | Packets Transmitted (With Critic) | Packets Transmitted (Without Critic) |
|---|---|---|
| 0 | 0 | 0 |
| 50 | 100 | 600 |
| 100 | 120 | 1200 |
| 150 | 200 | 1850 |
| 200 | 260 | 2500 |
| 250 | 310 | 3100 |
| 300 | 400 | 3750 |
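Reading the final row: by time 300 the node with the critic has transmitted 400 packets against 3,750 without it, roughly a nine-fold reduction in radio activity over the same observation period.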
Figure 7. Adaptive behavior of a cluster of sensor nodes following a Markov decision process in a stochastic environment.