Intelligent L2-L∞ Consensus of Multiagent Systems under Switching Topologies via Fuzzy Deep Q Learning.

Haoyu Cheng¹, Linpeng Xu²,³, Ruijia Song⁴, Yue Zhu²,⁵, Yangwang Fang¹.

Abstract

The problem of intelligent L2-L∞ consensus design for leader-follower multiagent systems (MASs) under switching topologies is investigated based on switched control theory and fuzzy deep Q learning. The communication topologies are assumed to be time-varying, and the model of MASs under switching topologies is constructed as a switched system. By employing a linear transformation, the consensus problem of MASs is converted into an L2-L∞ control problem. The consensus protocol is composed of a dynamics-based protocol and a learning-based protocol: robust control theory is applied to the former to guarantee the prescribed performance, and deep Q learning is applied to the latter to improve the transient performance. The multiple Lyapunov function (MLF) method and the mode-dependent average dwell time (MDADT) method are combined to give the scheduling interval, which ensures stability and the prescribed attenuation performance. Sufficient existence conditions for the consensus protocol are given, and the solutions of the dynamics-based protocol are derived based on linear matrix inequalities (LMIs). Then, the online design of the learning-based protocol is formulated as a Markov decision process, where fuzzy deep Q learning is utilized to compensate for the uncertainties and achieve optimal performance. The variation of the learning-based protocol is modeled as an external compensation on the dynamics-based protocol; therefore, the convergence of the proposed protocol can be guaranteed by employing nonfragile control theory. Finally, a numerical example is given to validate the effectiveness and superiority of the proposed method.

Year: 2022 PMID: 35222626 PMCID: PMC8865973 DOI: 10.1155/2022/4105546

Source DB: PubMed Journal: Comput Intell Neurosci

1. Introduction

In recent years, the coordination control of MASs has attracted considerable attention because of its broad applications in many fields [1, 2], such as formation control, cooperative attack, and attitude alignment. An MAS consists of a group of agents that communicate and interact with each other to accomplish multiple missions and adapt to complex environments [3, 4]. In particular, much attention has been paid to the consensus problem of MASs because of its great potential in both economic and military applications. The purpose of consensus is to construct a relationship among the agents so that their states/outputs reach an agreement. In the past decades, fruitful research has contributed to the development of both theory and applications. To mention a few, the problem of distributed formation control for MASs is studied in [5], a time-varying formation design for MASs with disturbances is proposed in [6], and the problem of finite-time consensus for switched nonlinear MASs is investigated in [7].

In practical applications, the communication topology among the agents may change dramatically over time to adjust to multiple missions and complex environments [8, 9]; for example, MASs can realize obstacle avoidance and higher flight efficiency through formation transformation [10, 11]. Switching topologies can improve design flexibility, security, and convergence performance, which has motivated studies on the switching topologies of MASs [1, 12]. Because of these broad potential applications, a considerable number of significant studies have recently been proposed by scholars at home and abroad. The communication topologies among interacting agents change according to the flight conditions and missions, which can be modeled as a switched system. A switched system consists of a family of continuous-time (or discrete-time) subsystems and a switching signal that determines the switching strategy between subsystems.
The switched-system framework provides an efficient approach to deal with fast time-varying conditions. The switching of topologies can therefore be viewed as switching between subsystems, and it is essential to study consensus protocol design to make sure that the state/output converges to the given value. In [13], the problem of time-varying formation control of MASs is investigated. The communication topology switches among given connected topologies, and the switching signal is governed by a Markov process. The Lyapunov function method is utilized to analyze convergence. In [14], the event-triggered leader-following consensus problem for MASs with external disturbances is addressed under switching topologies. A novel distributed event-triggered protocol is proposed to realize disturbance rejection based on an extended state observer, and the average dwell time (ADT) method is utilized to ensure the stability of the event-triggered protocol. In [15], the time-varying practical formation problem is studied for spacecraft, where switching topologies and time delays are taken into consideration. Sufficient conditions ensuring that the error system is convergent are derived based on the ADT method.

The studies mentioned above deal with the problem of switching topologies; however, convergence is guaranteed based on the ADT method. Because the ADT method applies common parameters to all subsystems, it leads to conservativeness. To obtain tighter bounds on dwell time and improve the design flexibility of the algorithm, the MDADT method has been applied during the last decades. In [16], the MDADT method and the multiple discontinuous Lyapunov function (MDLF) method are combined to analyze the stability of switched systems with unstable modes.
Sufficient conditions are established, and results in the existing literature are covered as a special case. Fast switching and slow switching in the framework of MDADT are applied to unstable modes and stable modes, respectively. In [17], a global adaptive control algorithm for switched systems is proposed based on the MDADT method, in which the different properties of subsystems are taken into consideration. The adaptive tracking controller is then applied to nonlinear switched systems with external disturbance and unmodeled dynamics, which illustrates the effectiveness and superiority of the MDADT method. In [18], an event-triggered sliding mode controller is proposed. By employing the MDADT method and an event-triggered strategy, less conservative and more practical results are obtained. Sufficient conditions are given to ensure stochastic exponential stability with the aid of the LMI technique.

The literature on consensus protocol design for MASs under switching topologies has provided fruitful results. However, stability and convergence are mostly ensured by the traditional ADT method, which cannot account for the different properties of subsystems and therefore leads to conservativeness. How to obtain less restrictive results is thus still an open and challenging problem that has not been fully investigated, and it has important value and potential applications in practice.

Moreover, in practical environments, there always exist uncertainties and disturbances, which lead to performance degradation and even instability [19, 20]. Therefore, it is essential to investigate the robust consensus problem to improve performance in uncertain environments [21-23]. In [24], the problem of distributed H∞ containment control for MASs with switching topologies is studied, and an observer-based containment control scheme is proposed.
The external disturbance and time delay in the environment are taken into consideration, which makes the scheme more applicable than the traditional method. By employing the Lyapunov function method and the LMI technique, sufficient existence conditions and solutions of the control protocol are given in the form of LMIs. In [25], the problem of time-varying formation of second-order discrete-time MASs under switching topologies and time delay is investigated. Sufficient conditions are given to ensure that the MASs accomplish the time-varying formation mission based on the state transformation method, with time delay and uncertainties considered. Compared with the existing literature, the proposed method can overcome the undesirable response caused by time delay and improve the transient performance. In [26], the problem of formation control for tail-sitters in flight mode transitions is studied. The nonlinear dynamics and uncertainties are considered, and a robust time-varying formation control protocol is proposed. It is proven that the tracking errors converge to the origin in finite time. The problem of an L2-gain robust protocol for time-varying output formation-containment of MASs is addressed in [27]. A PID-based output-feedback control protocol is provided to ensure that all followers track a time-varying formation reference, where communication delays and external disturbance are taken into consideration. The asymptotic stability of the MASs is proved by the Lyapunov function method. However, as is well known, the best transient performance and the best robustness cannot be achieved simultaneously. A compromise between transient performance and robustness is therefore needed, which remains an open and challenging problem.

In addition, with the development of computing ability, intelligent techniques have attracted considerable attention during the last decades [28-30].
Such techniques are widely applied in the areas of target recognition, machine vision, robotic systems, and controller design [31, 32], and they provide an efficient way to improve the autonomy and design flexibility of the system [33]. The most widely used methods are deep learning and reinforcement learning. Deep reinforcement learning combines the two and inherits the advantages of both, including the characteristics of self-fitting and self-learning. In [34], the automatic completion of multiple peg-in-hole assembly tasks is realized. Because the traditional method requires an accurate contact model and complex analysis, the intelligent control method is formulated by constructing the task as a Markov decision process. The deep deterministic policy gradient (DDPG) algorithm is used to accomplish the task, achieve the optimal policy, and avoid risky actions. In [35], a noninteger PID controller is proposed based on the DDPG algorithm, with measurement noise and external disturbances taken into consideration. A kinematic controller and a dynamic controller are proposed to achieve optimal performance, and the DDPG algorithm compensates for the uncertainties and disturbances in the actor-critic framework. A numerical example illustrates the effectiveness of the method. Cheng et al. [36] proposed a real-time controller for the fuel-optimal moon landing problem. Because the traditional method cannot meet the high requirements on real-time performance and autonomy, a deep reinforcement learning algorithm is proposed for real-time optimal control based on an actor-indirect method architecture. Deep neural networks are applied for initial guesses, and the efficiency of the training data is guaranteed. The literature mentioned above has provided considerable meaningful results in the area of machine learning.
However, to the best of the authors' knowledge, intelligent consensus design for MASs that considers stability, robustness, and optimal transient performance together has not been fully studied yet. It is important to achieve an optimal compromise between robustness and transient performance. Based on the statements above, the improvement of the autonomy and design flexibility of the system needs to be studied, and the problem of consensus protocol design for MASs under switching topologies has not been fully investigated. Design flexibility can be improved by employing tighter bounds on dwell time: less conservative results are obtained, which leaves more room for the switching logic to stay in the subsystems with better performance for a long enough time. Moreover, it is of great importance to combine the advantages of the traditional method and intelligent techniques so that convergence, robustness, and transient performance are ensured simultaneously.

Therefore, the problem of intelligent L2-L∞ consensus design of MASs under switching topologies is investigated. Convergence and robustness are guaranteed by the Lyapunov function method and the MDADT method. Transient performance is improved by fuzzy deep Q learning, in which a fuzzy reward function is proposed for the complex scheduling process. The main contributions of this study can be summarized as follows:

(1) The L2-L∞ consensus protocol of MASs under switching topologies is designed. The problem of L2-L∞ consensus of MASs is converted into the problem of stability analysis for switched systems, which is more applicable than the traditional method. The MDADT method and the multiple Lyapunov function method are combined to guarantee stability and the prescribed attenuation performance index, which yields tighter bounds on dwell time and less conservative results.

(2) The consensus protocol is composed of a dynamics-based consensus protocol and a learning-based consensus protocol. Compared with the traditional method, the proposed strategy ensures stability, robustness, and transient performance simultaneously.

(3) A fuzzy reward function is utilized to improve the efficiency of the deep reinforcement learning algorithm. In the traditional method, the design of the reward function mainly depends on the experience of the designer, which leads to complexity. The fuzzy reward function improves data efficiency and ensures optimal performance.

The rest of the study is organized as follows: the preliminaries and problem statement are provided in Section 2; the main results are given in Section 3; the numerical example is given in Section 4, which is followed by the conclusion in Section 5.

2. Preliminaries and Problem Statement

In this study, it is supposed that the MAS is composed of a leader labelled 0 and n followers labelled 1, 2, …, n. The connection topology among the n followers is described by a time-varying model with N topologies. We define 𝒢 = (𝒢_1, 𝒢_2, …, 𝒢_N) as the set of undirected connected graphs. ℋ = {1, 2, …, n}, n > 1, represents the finite set of nodes. s = σ(k) : [0, ∞) ⟶ R = {1, 2, …, N} denotes the switching signal, which is a piecewise constant function of time and takes values in the finite set R. 𝒜_s = (a_ij) and ℒ_s = (l_ij) are the adjacency matrix and the Laplacian matrix of the undirected graph 𝒢_s at time instant k, where a_ij is an element of the adjacency matrix, a_ij = 1 indicates that node i can obtain information from node j, and l_ij is defined in equation (1). Then, for a given node i ∈ ℋ, the neighbors of node i are defined as 𝒩_i = {j ∈ ℋ : a_ij = 1}. Another undirected connected graph is defined to indicate the information transmission between the leader and the n followers. Define a diagonal matrix Θ = diag{θ_1, θ_2, …, θ_n}, where θ_i = 1 indicates that node i ∈ ℋ can obtain information from the leader; otherwise, θ_i = 0. The leader-follower MAS can then be described as in equations (2)-(3), where A, B, C, and D are the system matrices with appropriate dimensions, x_0(k) is the state vector of the leader, x_i(k), u_i(k), and z_i(k) are the state, input, and output vectors of the ith follower, and ω_i(k) denotes the external disturbance belonging to L2[0, ∞); all are real vectors of appropriate dimensions. It is supposed that agent i can obtain information from its neighbors and from the leader.
Therefore, we define υ_i(k) as the relative state measurement of the ith agent, given in equation (4). The control input of the ith agent that ensures leader-follower consensus is proposed in equation (5), where K is the control parameter to be determined by robust control theory and ΔK is the compensating parameter obtained by deep Q learning. The gain variations ΔK are supposed to vary in a finite set with given bounds. ΔK can be viewed as an additional perturbation of K, described in equation (6), where M and N are known matrices with appropriate dimensions and F is an unknown matrix satisfying F^T F ≤ I. For the ith agent, the state error is defined as e_i(k) = x_i(k) − x_0(k). The closed-loop system can then be rewritten as equation (7), where e(k) = [e_1^T(k), …, e_n^T(k)]^T, z(k) = [z_1^T(k), …, z_n^T(k)]^T, ω(k) = [ω_1^T(k), …, ω_n^T(k)]^T, and the stacked system matrices are defined accordingly. To facilitate the proof, the following definitions and lemmas are given.

Definition 1 (see [37]).

For a given switching signal σ(k) and k_1 > 0, define N_σp(0, k_1) as the number of times the pth topology is activated over the time interval (0, k_1), and let T_p(0, k_1) be the total activated time of the undirected graph 𝒢_p during (0, k_1). Suppose there exist constant scalars N_0p ≥ 0 and τ_ap > 0 such that N_σp(0, k_1) ≤ N_0p + T_p(0, k_1)/τ_ap. Then, τ_ap is called the mode-dependent average dwell time and N_0p is the mode-dependent chatter bound. In this study, we set N_0p = 0.
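As an illustration of Definition 1, the following minimal sketch checks whether a recorded switching sequence respects a set of mode-dependent dwell times. It assumes the standard MDADT inequality N_p ≤ N_0 + T_p/τ_p and counts only switches into a mode (the initial mode is not counted); the function name `satisfies_mdadt` and its arguments are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def satisfies_mdadt(switch_times, modes, k_end, tau, n0=0.0):
    """Check N_p(0, k_end) <= n0 + T_p(0, k_end) / tau[p] for every mode p.

    switch_times[j] is the instant at which modes[j] becomes active;
    switch_times[0] is the start time and is not counted as a switch.
    tau maps each mode to its (assumed) mode-dependent average dwell time.
    """
    activations = defaultdict(int)    # N_p: number of switches into mode p
    active_time = defaultdict(float)  # T_p: total time spent in mode p
    for j, (start, mode) in enumerate(zip(switch_times, modes)):
        last = j + 1 == len(switch_times)
        end = k_end if last else switch_times[j + 1]
        active_time[mode] += end - start
        if j > 0:
            activations[mode] += 1
    return all(
        activations[p] <= n0 + active_time[p] / tau[p] for p in active_time
    )
```

For example, with switches at k = 0, 3, 5, 9 between two topologies over 12 steps, topology 2 is entered twice and is active for 5 steps, so the check passes for τ_2 = 2 but fails for τ_2 = 3.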

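Definition 2 (see [37]).

If there exists a control protocol of the form in equation (5) such that all agents asymptotically track the state trajectory of the leader, that is, lim_{k→∞} ‖x_i(k) − x_0(k)‖ = 0 for all i ∈ ℋ, then the MAS is said to achieve leader-follower consensus.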

Definition 3 (see [38]).

For given constant scalars 0 < δ < 1 and γ > 0, the prescribed L2-L∞ attenuation performance γ is said to be satisfied if the following two conditions hold: (1) the MASs in equations (2)-(3) are asymptotically stable when ω(k) = 0; (2) the L2-L∞ attenuation inequality with level γ holds for all nonzero ω(k) ∈ l2[0, ∞).
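To make the energy-to-peak idea behind Definition 3 concrete, the sketch below simulates a hypothetical error system e(k+1) = A e(k) + D ω(k), z(k) = C e(k) from zero initial state and returns the ratio sqrt(max_k z(k)ᵀz(k) / Σ_k ω(k)ᵀω(k)). This assumes the common unweighted L2-L∞ definition; the paper's inequality also involves δ and may differ. The system form and all names are assumptions for illustration only. Any such simulated ratio is a lower bound on the attenuation level γ that the system can achieve.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def energy_to_peak_gain(A, C, D, w_seq):
    """Empirical energy-to-peak ratio for one disturbance sequence.

    Assumed model: e(k+1) = A e(k) + D w(k), z(k) = C e(k), e(0) = 0.
    """
    e = [0.0] * len(A)
    peak = 0.0    # running maximum of z(k)^T z(k)
    energy = 0.0  # running sum of w(k)^T w(k)
    for w in w_seq:
        z = matvec(C, e)
        peak = max(peak, sum(v * v for v in z))
        energy += sum(v * v for v in w)
        e = [x + y for x, y in zip(matvec(A, e), matvec(D, w))]
    z = matvec(C, e)  # output after the last disturbance sample
    peak = max(peak, sum(v * v for v in z))
    if energy == 0.0:
        raise ValueError("disturbance must be nonzero")
    return (peak / energy) ** 0.5
```

A design with prescribed level γ should keep this ratio below γ for every admissible disturbance sequence tried.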

Lemma 1 (see [35]).

The matrices ℒ_s + Θ are symmetric and positive definite if and only if the graphs are connected for k ≥ 0. Moreover, there exists a transformation matrix T that diagonalizes ℒ_s + Θ, with diagonal entries λ_i, i ∈ ℋ, where λ_i are the nonzero eigenvalues of ℒ_s + Θ.
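The property in Lemma 1 can be checked numerically. The sketch below builds the Laplacian of an undirected graph from its adjacency matrix using the usual convention (l_ii = Σ_j a_ij, l_ij = −a_ij for i ≠ j, which is assumed here to match equation (1)), adds the leader-pinning matrix Θ, and tests positive definiteness with a plain Cholesky factorization. The four-node path graph and all function names are hypothetical.

```python
def laplacian(adj):
    """Graph Laplacian: degree on the diagonal, -a_ij off the diagonal."""
    n = len(adj)
    return [
        [float(sum(adj[i])) if i == j else -float(adj[i][j]) for j in range(n)]
        for i in range(n)
    ]


def add_pinning(lap, theta):
    """Return L + diag(theta), where theta[i] = 1 if follower i sees the leader."""
    return [
        [lap[i][j] + (theta[i] if i == j else 0.0) for j in range(len(lap))]
        for i in range(len(lap))
    ]


def is_positive_definite(m, tol=1e-12):
    """Cholesky test for a symmetric matrix; fails on any nonpositive pivot."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                low[i][j] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True


# Hypothetical path graph 1-2-3-4 with only follower 1 connected to the leader.
PATH = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
```

With `theta = [1, 0, 0, 0]` the matrix ℒ + Θ passes the test, whereas with no follower connected to the leader the Laplacian alone is singular and the test fails, which is why at least one follower must receive the leader's information.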

Lemma 2 (see [39]).

For a given constant a > 0 and real matrices Θ, U, V, and W, equation (12) is equivalent to equation (13).

Lemma 3 (see [39]).

For a given symmetric matrix 𝒯 and matrices ℳ and 𝒩, if there exists a constant scalar ε > 0 such that 𝒯 + ε^{-1}ℳℳ^T + ε𝒩^T𝒩 < 0, then 𝒯 + ℳℱ𝒩 + 𝒩^Tℱ^Tℳ^T < 0 holds for any ℱ of appropriate dimensions with ℱ^Tℱ ≤ I.

3. Main Results

3.1. L2-L∞ Consensus Protocol Design

In this section, the L2-L∞ consensus protocol is proposed, and the stability and prescribed performance are guaranteed.

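Lemma 4.

For given constant scalars 0 < δ < 1 and γ > 0, the system in (7) with the control input in (5) is asymptotically stable with L2-L∞ attenuation performance γ if and only if the same holds for the transformed system in equation (16), which is obtained through the change of variables in equation (17).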

Proof

Substituting equation (17) into (7), one obtains equation (16). Because the transformation matrix T is unique, the performance relations between the original and the transformed variables follow directly. Hence, the problem of robust consensus protocol design can be converted to the controller design of (16).

Remark 1.

The system in equation (16) consists of the n independent subsystems in equation (20). Therefore, the stability of equation (7) is equivalent to the stability of the n subsystems in equation (20), and ensuring the prescribed attenuation performance of (7) can be converted to guaranteeing the attenuation performance of (16). In Theorem 1, sufficient conditions that guarantee stability and the prescribed attenuation performance index are presented.

Theorem 1.

For given constant scalars μ > 1, 0 < δ < 1, and γ > 0, if there exist Lyapunov functions V_p, p ∈ R, and class 𝒦∞ functions κ1 and κ2 such that conditions (21)-(24) hold, then the switched system in equation (20) with MDADT satisfying equation (25) is globally uniformly asymptotically stable with prescribed L2-L∞ attenuation performance γ.

Proof

The proof is divided into two steps.

Step 1: stability of equation (20). Let the switching instants in the time interval (0, k) be k_1, k_2, …, k_l. Then, (26) holds when ‖ω(k)‖ ≡ 0. Together with (22), we can conclude that (27) holds. Based on equations (26)-(27), a bound on the Lyapunov function at time k is obtained by iteration. Combining this bound with Definition 1 and (21), we obtain (29). Therefore, the system in (20) with MDADT satisfying (25) is globally uniformly asymptotically stable.

Step 2: the system in equation (20) has the prescribed L2-L∞ attenuation performance γ. Together with equations (22)-(23), a one-step bound that includes the disturbance term is obtained, and iterating it gives a bound over the whole interval. Substituting this bound into (8) and using the conditions μ > 1, 0 < δ < 1, and (25), one arrives at equations (32)–(34), which combine to give equation (35). Together with (24), this implies that the prescribed L2-L∞ attenuation performance holds, and the proof is complete.
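The iteration used in the stability step follows a standard pattern in the MDADT literature. As a hedged sketch, assume the per-mode conditions take the usual discrete-time form below with mode-dependent constants μ_p and δ_p (the paper's exact conditions (21)-(25) are not reproduced here and may differ), and take N_0p = 0 as in Definition 1:

```latex
% Assumed conditions for \omega(k) \equiv 0 (standard MDADT form, not quoted from the paper)
\begin{align*}
V_p(e(k+1)) &\le (1-\delta_p)\,V_p(e(k)), && \sigma(k)=p,\\
V_p(e(k_l)) &\le \mu_p\,V_q(e(k_l)), && \text{switch from mode } q \text{ to mode } p \text{ at } k_l,\\[4pt]
V_{\sigma(k)}(e(k)) &\le \prod_{p=1}^{N}\mu_p^{\,N_{\sigma p}(0,k)}\,(1-\delta_p)^{\,T_p(0,k)}\;V_{\sigma(0)}(e(0))\\
&\le \exp\!\Big(\sum_{p=1}^{N} T_p(0,k)\Big[\frac{\ln\mu_p}{\tau_{ap}}+\ln(1-\delta_p)\Big]\Big)\,V_{\sigma(0)}(e(0)).
\end{align*}
% The last step uses N_{\sigma p}(0,k) \le T_p(0,k)/\tau_{ap} and \mu_p > 1.
```

Each bracket is negative exactly when τ_ap > −ln μ_p / ln(1 − δ_p), so under this assumed form the Lyapunov function decays to zero whenever every mode respects its own dwell-time bound.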

Corollary 1.

For given constant scalars μ > 1, 0 < δ < 1, and γ > 0, if there exist positive-definite matrices P_p, p ∈ R, satisfying equations (37)–(39), then the switched system in equation (20) with MDADT satisfying equation (25) is globally uniformly asymptotically stable with prescribed L2-L∞ attenuation performance γ.

Proof

The Lyapunov function is defined as in (40). According to (20) and (39), we can conclude that (38) is equivalent to (24). Along the trajectory of system (20), one has (41). Together with equations (40)-(41), the conditions of Theorem 1 are satisfied. According to Theorem 1, the system in (20) with MDADT satisfying (25) is globally uniformly asymptotically stable with prescribed L2-L∞ attenuation performance γ. Based on Theorem 1 and Corollary 1, the solutions of the consensus protocol are given in Theorem 2.

Theorem 2.

For given constant scalars μ > 1, 0 < δ < 1, γ > 0, a > 0, and ε > 0, if there exist positive-definite matrices P and matrices X and Y of appropriate dimensions such that equation (43) holds, then the MASs in (2)-(3) with the control input in equation (5) are asymptotically stable with prescribed L2-L∞ attenuation performance γ. The parameters of the control protocol can be derived from (43), where Ξ_11 = −(1 − δ)P + ε(λ_i)²N^T N and Ξ_13 = A^T P − λ_i Y^T B^T.

Proof

According to the Schur complement, equation (43) is equivalent to equation (45). Define Θ = 𝒯 + ε^{-1}ℳℳ^T + ε𝒩^T𝒩, with the remaining matrices of Lemma 2 chosen accordingly. Applying Lemma 2 and then Lemma 3 yields (46). According to the Schur complement, (46) is equivalent to (37), which completes the proof.

3.2. Compensated Consensus Protocol Design Based on FDQL

In this section, the learning-based consensus protocol is proposed based on deep reinforcement learning, where fuzzy deep Q learning is utilized. Stability and the prescribed attenuation performance are guaranteed by robust control theory, and the learning-based control protocol is introduced to improve the transient performance and realize the optimal control policy. The output of the learning-based control protocol can be viewed as an additional variation of the robust consensus protocol, and the online scheduling of the control protocol is formulated as a Markov decision process. In this way, the advantages of robust control theory and deep reinforcement learning are combined.

Reinforcement learning is composed of state, action, agent, and environment. The state at the kth step is denoted by s_k and the chosen action by a_k; the reward r_k and the next state s_{k+1} are then generated by the interaction with the environment. The optimal control policy is obtained by maximizing the cumulative reward. To improve the convergence of the consensus protocol, the action is defined as a = [ΔK], and the state s is defined correspondingly. In deep Q learning, a deep neural network is utilized to approximate the action-state value function, Q(s, a) ≈ f(s, a, ω), where f(s, a, ω) denotes the function realized by the deep neural network with parameters ω. The action is chosen based on the maximum Q value, a = argmax_a Q(s, a, ω). The deep Q learning algorithm uses two neural networks with the same structure, called the critic neural network and the target neural network. The parameters of the critic neural network are updated based on temporal-difference learning. The output of the critic neural network is defined as Q(s, a, ω), and the output of the target neural network is defined as Q(s, a, ω−). The critic neural network is therefore updated according to

Q(s, a, ω) ← Q(s, a, ω) + L[R + γ max_{a′} Q(s′, a′, ω−) − Q(s, a, ω)],

where L is the learning rate, γ denotes the discount factor, R represents the reward of the state transition from s to s′ through action a, and max_{a′} Q(s′, a′, ω−) stands for the maximum Q value of the target neural network.

The reward function has an important influence on the final performance. In traditional deep Q learning, the design of the reward function mainly depends on the experience of the designer, which cannot achieve optimal performance and increases the computational complexity. In this study, fuzzy logic is applied to design the reward function. The input values of the fuzzy reward function are divided into five categories, VB, B, N, G, and VG, which stand for very bad, bad, normal, good, and very good. It is supposed that there are four followers; therefore, the inputs of the fuzzy reward system are |e1|, |e2|, |e3|, and |e4|. Each fuzzy set includes 25 rules, and the total number of fuzzy rules is 75. The output of the fuzzy reward function is limited to the interval [−1, 0), and a defuzzifier maps the fired rules to the reward value. Based on the statements above, the learning-based consensus protocol design algorithm is established.
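A minimal sketch of a fuzzy reward of this kind is given below. It is a simplified stand-in and not the paper's rule base: it assumes triangular membership functions over a normalized error, one singleton reward per category (VG to VB), a weighted-average defuzzifier, and a plain average over the four followers instead of the 75 rules mentioned above. The centers, reward values, and function names are all hypothetical.

```python
# Category order: very good, good, normal, bad, very bad.
LABELS = ("VG", "G", "N", "B", "VB")
CENTERS = (0.0, 0.25, 0.5, 0.75, 1.0)         # normalized |e| where each set peaks
REWARDS = (-0.05, -0.25, -0.5, -0.75, -1.0)   # singleton outputs inside [-1, 0)
WIDTH = 0.25                                  # half-width of each triangle


def memberships(err, err_max=1.0):
    """Triangular membership degrees of |err| in the five categories."""
    x = min(abs(err) / err_max, 1.0)
    return [max(0.0, 1.0 - abs(x - c) / WIDTH) for c in CENTERS]


def fuzzy_reward(errors, err_max=1.0):
    """Weighted-average defuzzification per follower, averaged over followers."""
    total = 0.0
    for err in errors:
        mu = memberships(err, err_max)
        total += sum(m * r for m, r in zip(mu, REWARDS)) / sum(mu)
    return total / len(errors)
```

With these choices a perfectly converged formation receives −0.05 and a saturated error receives −1, so the reward stays in [−1, 0) and varies smoothly in between, which is the property the fuzzy design is meant to provide.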

Remark 2.

The FDQN algorithm proposed in this study can improve the transient convergence performance of MASs. The output of the deep Q network is taken to be the variation of the parameters of the consensus protocol. As is well known, the design of the reward function in the traditional method depends on the experience of the designer. To overcome this problem, the fuzzy reward function is developed in this study to improve the learning efficiency.
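The temporal-difference update used by the critic can be illustrated with a tabular stand-in. The sketch below replaces the two neural networks by two Q tables (`q` for the critic and `q_target` for the target network) and adds an ε-greedy action choice; the tabular representation, the exploration rule, and all names are assumptions made for illustration, not the paper's implementation.

```python
import random


def td_target(reward, next_q_values, gamma):
    """Target value R + gamma * max_a' Q_target(s', a')."""
    return reward + gamma * max(next_q_values)


def q_update(q, q_target, s, a, reward, s_next, lr, gamma):
    """One temporal-difference step on the critic table q (modified in place)."""
    y = td_target(reward, q_target[s_next], gamma)
    q[s][a] += lr * (y - q[s][a])
    return q[s][a]


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def sync_target(q, q_target):
    """Copy the critic table into the target table (periodic hard update)."""
    for row, target_row in zip(q, q_target):
        target_row[:] = row
```

In a gain-scheduling setting each action index would correspond to one admissible value of the gain compensation, and the reward would come from a function such as the fuzzy reward described above.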

4. Numerical Example

In this section, an example is provided to illustrate the effectiveness of the method. The model of the MAS and the external disturbance are specified first. The switching topologies are shown in Figure 1, from which the Laplacian matrices are obtained.

Figure 1

Switching topologies of MASs. (a) Interacting topology 𝒢1. (b) Interacting topology 𝒢2.

The parameters of the switching topologies are given, and the MDADT is obtained according to (25). The ADT method can be viewed as a special case of the MDADT method; therefore, the ADT is τ_a = max{τ_ap} = 0.4266. It follows that the MDADT method gives tighter bounds on dwell time and less conservative results. Then, we define the attenuation performance index γ = 0.9 and obtain the parameters of the consensus protocol based on Theorem 2. The switching logic is shown in Figure 2. To illustrate the effectiveness and superiority of the proposed method, the traditional ADT method and the MDADT method are compared. The state responses of the MAS with ADT switching topologies are shown in Figures 3-4, and the state responses with MDADT switching topologies are shown in Figures 5-6. We can see that the transient performance of the MDADT method is better than that of the ADT method because the different characteristics of the subsystems are taken into consideration, which improves the design flexibility and makes the method more applicable to practical conditions.
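The relation between the two dwell-time bounds can be sketched as follows. The code assumes that condition (25) has the standard discrete-time form τ_ap ≥ −ln μ_p / ln(1 − δ_p) and that the ADT bound is the largest of the per-mode bounds, as stated above. The parameter values in the usage note are hypothetical and are not the ones that produce 0.4266 in the example.

```python
import math


def mdadt_lower_bound(mu, delta):
    """Per-mode dwell-time bound -ln(mu) / ln(1 - delta) (assumed form of (25))."""
    if mu <= 1.0 or not 0.0 < delta < 1.0:
        raise ValueError("require mu > 1 and 0 < delta < 1")
    return -math.log(mu) / math.log(1.0 - delta)


def adt_lower_bound(params):
    """ADT as a special case: one common bound, the largest per-mode bound.

    params maps each mode to its (mu_p, delta_p) pair.
    """
    return max(mdadt_lower_bound(mu, delta) for mu, delta in params.values())
```

For instance, with hypothetical pairs (μ_1, δ_1) = (2, 0.5) and (μ_2, δ_2) = (1.5, 0.5), the per-mode bounds are 1.0 and about 0.585, while the ADT method would impose 1.0 on both topologies; the second topology may therefore switch faster under MDADT.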

Figure 2

The switching logic.

Figure 3

The state response of (x)1 under the ADT method.

Figure 4

The state response of (x)2 under the ADT method.

Figure 5

The state response of (x)1 under the MDADT method.

Figure 6

The state response of (x)2 under the MDADT method.

To validate the superiority of the proposed method, its results are shown in Figures 7–11. The state responses of the proposed method are shown in Figures 7-8, from which we can conclude that the transient performance is improved with the aid of fuzzy deep Q learning. The advantages of the traditional method and the intelligent method are combined: compared with the traditional method, the transient performance is improved, and compared with the intelligent method, stability and training efficiency are guaranteed. The attenuation performance index is shown in Figure 9, from which we can see that the robustness of the proposed method is ensured. The episode reward is shown in Figure 10; the reward of the fuzzy deep Q learning algorithm converges to a neighborhood of the origin, which demonstrates the effectiveness of the algorithm in this study. In addition, the response of the action is shown in Figure 11, from which we can see that the learning-based consensus protocol compensates for the additional input caused by the uncertainties.

Figure 7

The state response of (x)1 with the proposed method.

Figure 8

The state response of (x)2 with the proposed method.

Figure 9

The response of attenuation performance index.

Figure 10

The response of episodes reward.

Figure 11

The response of the action.

Based on the statements above, we can conclude that convergence, robustness, and the prescribed attenuation performance index are guaranteed. Less conservative results and tighter bounds on dwell time are obtained by the MDADT method, and the transient performance of the system is improved by the fuzzy deep Q learning algorithm. It is worth mentioning that the traditional robust method cannot achieve a compromise between robustness and transient performance, and the intelligent method alone cannot always guarantee convergence. By employing the proposed method, convergence, robustness, and transient performance are guaranteed simultaneously.

5. Conclusions

The problem of intelligent L2-L∞ consensus design for MASs under switching topologies is investigated in this study. The MAS under switching topologies is modeled as a switched system, and, by employing a linear transformation, the problem of consensus protocol design is converted to the problem of L2-L∞ control. To ensure convergence, robustness, and transient performance simultaneously, the proposed consensus protocol is composed of a dynamics-based consensus protocol and a learning-based consensus protocol, which provide the baseline and the compensation for uncertainties, respectively. The baseline is obtained by the dynamics-based consensus protocol, which is designed based on the MDADT method and the MLF method. The scheduling interval of the learning-based protocol is given by nonfragile control theory. Then, the learning-based consensus protocol is designed with the fuzzy deep Q learning algorithm to improve the transient performance and achieve the optimal policy, where the fuzzy reward function is introduced to improve the learning efficiency.
