Open Access

Formation Control of Multi-agent Nonlinear Systems using The State-Dependent Riccati Equation

  
31 March 2025


Introduction

This paper presents research on controlling a highly populated multi-agent system within the framework of the state-dependent Riccati equation (SDRE), a nonlinear optimal closed-loop control policy. A multi-agent system consists of a leader and a population of followers, playing the role of agents. The agents in a swarm system show collective behavior, with interactions achieving a common goal [1]. Previous reports on swarm robotics have been quite diverse in subject matter: technological aspects of swarm robotics [2,3]; arrangement, formation, and behavior [4–6]; and bio-inspiration [7,8], among others.

The application of swarm robotics has been reported in many works, including micro-robotics: the interaction and physical contact between the agents [9–11], nano-swarms for delivering medicine or performing an operation in the human body [12], search and rescue [13–15], monitoring and data collection [16,17], etc.

Multiple decision-making agents with autonomous operation, social ability (interaction), reactivity (perception), and pro-activeness were motivated by complex, inherently distributed systems to increase robustness. Multi-agent and swarm systems are similar in the sense that they require multiple agents that communicate and cooperate. Each agent in a multi-agent system can perform some meaningful part of a task; in swarm systems, however, each agent is expected to have only limited awareness of its environment. This emphasizes the existence of a large number of agents, and promotes scalability, soft collisions, and physical contact within the swarm during migration. This might not be the case for swarms in nature, e.g., birds, which do not collide and also do not receive commands from a leader [18]. The modeling in the current work is not based on bio-inspiration, since engineered systems are still far from the sophistication of such designs in nature.

Multi-agent systems, or cooperative robots, can usually fit into one of two categories: systems that consider complex dynamics for each agent, though the number of agents is small [19–24], and systems with simple dynamics or only kinematics for each agent when the number of agents is large [25–32]. For example, Yang et al. presented a formation control strategy based on a stable control law to guide the agents towards a constraint region and simulated it for a total of 155 agents [32]. Collision avoidance between the agents and obstacle avoidance are also two important characteristics of formation methods. Wang and Xin presented an optimal control approach for formation control taking obstacle avoidance into consideration for multiple unmanned aerial vehicle (UAV) systems [33]. Kumar et al. presented a velocity controller for a swarm of UAVs with obstacle avoidance capability [34]. Park and Yoo proposed a connectivity-based formation control with collision-avoidance properties for a swarm of unmanned surface vessels [35,36].

The application of the swarm regulation control is devoted to multi-copter UAV formation control for a large number of agents (1,089) to show the capabilities of the proposed approach. Swarm control using augmented methods such as graph theory and multilayer network design is a useful approach; however, in this work, the pure capacity of the SDRE is used to control a highly populated multi-agent system. This simplifies the design and limits the formulation to the pure SDRE. The advantage of this point of view is the ability to control large-scale systems with complex dynamics. A fully coupled six-degree-of-freedom (DoF) nonlinear model of the system is considered (12 state variables). The second case study is a leader-follower system of 45 wheeled mobile robots (WMRs) with non-holonomic and holonomic constraints of the wheels. A smaller number of robots was considered to highlight the effectiveness of obstacle and collision avoidance. SDRE has been used in both multi-agent systems [37–39] and leader-follower control [22,40–47]. The details of the number of agents using SDRE are reported in Table 1. The majority of these studies focused on the control and behavior of the leader-follower system, first considering two agents and then sometimes increasing the number of agents to 5 or 10. The only large populated system using SDRE control included 1,024 satellites; obstacle avoidance and collision avoidance between agents were not considered [39]. The large distance between the agents and the pattern of the satellites did not necessitate collision/obstacle avoidance. In this work, a more mature version of multi-agent SDRE [39] is presented to add those characteristics, since the agents work close together in both solved examples.

Table 1. A detailed report on the population of agents using SDRE in the previous literature, compared to this work.

Context                                    | Ref.      | No. of agents, type of agent
multi-agent / leader-follower / formation  | [37]      | 5, single-integrator
                                           | [38]      | 2, mobile robot
                                           | [40]      | 2, spacecraft
                                           | [41]      | 5, single-integrator
                                           | [42]      | 2, spacecraft
                                           | [43]      | 3, mobile robot
                                           | [44]      | 2, aircraft
                                           | [45]      | 3, spacecraft
                                           | [46]      | 2, spacecraft
                                           | [47]      | 2, satellite
                                           | [39]      | 1,024, satellite
consensus control                          | [22]      | 10, crane
cooperative multi-agents                   | [48]      | 2, manipulator
                                           | [49]      | 2, manipulator
                                           | [50]      | 3, spacecraft
                                           | [19]      | 4, manipulator
                                           | [51]      | 4, UAV
                                           | [52]      | 3, mobile robot
multi-agents / leader-follower             | this work | 1,089, UAV
                                           | this work | 45, mobile robot
                                           | this work | 1,050, mobile robot

The swarm formulation in control methods could be solved by graph theory for the formation design of the whole swarm; in that case, another controller must control the agents individually by checking the connections between the agents through multiple layers [53–55]. Designing the formation using graph theory and adapting a controller for handling the multi-agent system might pose implementation difficulties, because the controller must include other layers of design through external methods.

The reported numbers of agents were 16 satellites [53], 6 agents [54], and 90 agents [55]. The use of the Kronecker delta function for selecting the information of a particular agent limits the application of the swarm multi-agent formulation to a limited number of agents. In the SDRE method particularly, the solution to the Riccati equation might become overly complex if the SDRE must be solved for the entire swarm. In this approach, the gain of the targeted agent would be collected from the overall matrix of the swarm. Again, this approach would limit the implementation of the formulation for a large number of agents. The reported SDRE agent counts, using consensus control and graph theory, were 10 [22] and 2 [38], respectively.

The focus of this work is large-scale multi-agent systems that need simplicity in their design. This simplicity does not imply a simple linear controller; on the contrary, the SDRE is a nonlinear sub-optimal controller with a relatively complex solution. However, the practical approach to handling the multi-agent system relies solely on the SDRE's own capabilities, and uses the weighting matrix of the states to handle distance constraints and obstacle avoidance. As a result, no further complexity is raised for practical implementation by the interaction of the SDRE with another layer or with graph theory. This can be viewed as independent SDRE controllers working with neighbor components and their leader through a weighting matrix, which sits inside the umbrella of the SDRE rather than in another system augmented onto the multi-agents or other complementary methods. This is the key to large-scale modeling and simulation: simplicity in the communication of the multi-agent system and reliance on the capacities of the SDRE itself.

The main contribution of this work is the presentation of a control structure for forming a multi-agent system with complex dynamics, with obstacle avoidance and collision avoidance between the agents. Complex systems with a large number of agents are seldom reported; this work presents a suitable structure for that. This work adds collision and obstacle avoidance to SDRE multi-agent system control, motivated by the example set by [39], which reported solely multi-agent system SDRE without contact prevention. For very crowded systems, contact between the agents has been both expected and considered in the literature [30,32], but this work proposes a method that adds distance between the agents to reduce collisions during the control task.

The distance-based arrangement of agents, and the avoidance of collisions between them, is basically the idea of multi-agent system control methods that use graph theory, the potential field method, multi-layer designs, and the like; here, the distribution, as well as obstacle and collision avoidance, are structured within the SDRE formulation itself, specifically in the weighting matrix of states.

Section 2 presents the control structure of the SDRE and the multi-agent system modeling. Section 3 describes the dynamics of the two case studies of this work, the UAV and the WMR. The simulation results are presented in Section 4, and the concluding remarks are summarized in Section 5.

Control Structure: SDRE for Multi-agent Systems
Conventional Controller: A Brief Preliminary Formulation

Consider N agents, randomly distributed in a predefined area, with continuous-time nonlinear, affine-in-control dynamics. The j-th agent's dynamics are represented in state-space form as:

$$\dot{x}_j(t) = f_j(x_j(t)) + g_j(x_j(t), u_j(t)), \tag{1}$$

where $x_j(t) \in \mathbb{R}^n$ is the state vector and $u_j(t) \in \mathbb{R}^m$ is the input vector of the j-th agent. $f_j(x_j(t)) : \mathbb{R}^n \to \mathbb{R}^n$ and $g_j(x_j(t), u_j(t)) : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ in (1) are vector-valued smooth functions that satisfy the local Lipschitz condition, with equilibrium point $f_j(0) = 0$. The nonlinear system (1) is transformed into the state-dependent coefficient (SDC) parameterization:

$$\dot{x}_j(t) = A_j(x_j(t)) x_j(t) + B_j(x_j(t)) u_j(t), \tag{2}$$

where $A_j(x_j(t)) : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ and $B_j(x_j(t)) : \mathbb{R}^n \to \mathbb{R}^{n \times m}$ hold.

The SDC parameterization (2) is referred to as apparent linearization; however, it preserves the nonlinearity of the system. For example, consider an over-actuated single-degree-of-freedom system in state-space form with n = 2 and m = 2, with system vectors

$$f(x) = \begin{bmatrix} x_2 \\ x_1 x_2^2 \cos x_2 + x_1 e^{x_1} \end{bmatrix}, \quad g(x,u) = \begin{bmatrix} 0 \\ x_2 u_1 + x_1 x_2 u_2 \end{bmatrix},$$

which can present a set of SDC matrices as

$$f(x) = \underbrace{\begin{bmatrix} 0 & 1 \\ e^{x_1} & x_1 x_2 \cos x_2 \end{bmatrix}}_{A(x)} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad g(x,u) = \underbrace{\begin{bmatrix} 0 & 0 \\ x_2 & x_1 x_2 \end{bmatrix}}_{B(x)} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.$$

The multiplications $A(x)x$ and $B(x)u$ reproduce the original values of $f(x)$ and $g(x,u)$; hence, the SDRE method is a nonlinear optimal control that keeps the nonlinearities in the control structure. Another point about SDC matrices is that they are not unique representations, and numerous versions can be shown, such as:

$$f(x) = \underbrace{\begin{bmatrix} 0 & 1 \\ e^{x_1} + x_2^2 \cos x_2 & 0 \end{bmatrix}}_{A(x)} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad g(x,u) = \underbrace{\begin{bmatrix} 0 & 0 \\ x_2 + \dfrac{x_1 x_2 u_2}{u_1} & 0 \end{bmatrix}}_{B(x,u)} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.$$

$B(x,u)$ in the second set of SDC matrices is not ideal because it changes the problem into one with non-affine parameterization and a singularity in the case of $u_1 = 0$. The limitations and proper selection of SDC matrices can be summarized in the following controllability assumption.
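As a quick numerical check of the parameterizations above, the following sketch (plain Python with NumPy; the test state and input values are arbitrary) verifies that both SDC choices of $A(x)$ reproduce $f(x)$, and that the affine choice of $B(x)$ reproduces $g(x,u)$:

```python
import numpy as np

# Example system from the text: f(x) = [x2, x1*x2^2*cos(x2) + x1*exp(x1)],
# g(x, u) = [0, x2*u1 + x1*x2*u2].
def f(x):
    x1, x2 = x
    return np.array([x2, x1 * x2**2 * np.cos(x2) + x1 * np.exp(x1)])

def g(x, u):
    x1, x2 = x
    return np.array([0.0, x2 * u[0] + x1 * x2 * u[1]])

# First SDC choice: affine in u, no singularity.
def A1(x):
    x1, x2 = x
    return np.array([[0.0, 1.0],
                     [np.exp(x1), x1 * x2 * np.cos(x2)]])

def B1(x):
    x1, x2 = x
    return np.array([[0.0, 0.0],
                     [x2, x1 * x2]])

# Second (non-ideal) choice of A(x): also reproduces f(x),
# illustrating the non-uniqueness of the SDC factorization.
def A2(x):
    x1, x2 = x
    return np.array([[0.0, 1.0],
                     [np.exp(x1) + x2**2 * np.cos(x2), 0.0]])

x = np.array([0.7, -1.2])   # arbitrary test state
u = np.array([0.5, 2.0])    # arbitrary test input
assert np.allclose(A1(x) @ x, f(x))
assert np.allclose(A2(x) @ x, f(x))
assert np.allclose(B1(x) @ u, g(x, u))
```

Both factorizations agree with the original vector fields at any test point, which is exactly the non-uniqueness property discussed above.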

Assumption 1

The pair of {Aj(xj(t)), Bj(xj(t))} is a completely controllable parameterization of nonlinear affine-in-control system (1) for the states of the j-th agent xj(t) ∈ ℝn in t ∈ [0, ∞) [56].

The control objective is to minimize the cost function:

$$J = \frac{1}{2} \int_0^{\infty} \left\{ x_j^{\top}(t) Q_j(x_j(t)) x_j(t) + u_j^{\top}(t) R_j(x_j(t)) u_j(t) \right\} dt, \tag{3}$$

where $Q_j(x_j(t)) : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ and $R_j(x_j(t)) : \mathbb{R}^n \to \mathbb{R}^{m \times m}$ are the weighting matrices of the states and inputs of the j-th agent, respectively. $Q_j$ is symmetric positive semidefinite and $R_j$ is symmetric positive definite.

Assumption 2

The pair $\{ A_j(x_j(t)), Q_j^{1/2}(x_j(t)) \}$ is a completely observable parameterization of the nonlinear affine-in-control system (1) for the states of the j-th agent $x_j(t) \in \mathbb{R}^n$ in $t \in [0, \infty)$, where $Q_j^{1/2}$, in Eq. (3), is the Cholesky decomposition of $Q_j$ [57].

The control law of the SDRE is [56]:

$$u_j(t) = -R_j^{-1}(x_j(t)) B_j^{\top}(x_j(t)) K_j(x_j(t)) x_j(t), \tag{4}$$

where $K_j(x_j(t))$ is the positive-definite symmetric gain of the control law. The gain $K_j(x_j(t))$ is the solution to the state-dependent Riccati equation:

$$A_j^{\top}(x_j(t)) K_j(x_j(t)) + K_j(x_j(t)) A_j(x_j(t)) + Q_j(x_j(t)) - K_j(x_j(t)) B_j(x_j(t)) R_j^{-1}(x_j(t)) B_j^{\top}(x_j(t)) K_j(x_j(t)) = 0. \tag{5}$$

Without loss of generality, the input law (4) is transformed into:

$$u_j(t) = -R_j^{-1}(x_j(t)) B_j^{\top}(x_j(t)) K_j(x_j(t)) e_j(t),$$

for controlling the system to a desired condition, in which $e_j(t) = x_j(t) - x_{des,j}(t)$, where $x_{des,j}(t)$ is the desired state vector of the j-th agent.
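In practice, the SDRE is solved pointwise: at each sampled state, the SDC matrices are frozen and an algebraic Riccati equation is solved. A minimal sketch of this step is given below (NumPy only, using the stable-subspace method on the Hamiltonian matrix; the pendulum-like agent dynamics and all numerical values are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def solve_sdre(A, B, Q, R):
    """Solve A'K + KA + Q - K B R^-1 B' K = 0 for frozen (A, B)
    via the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    H = np.block([[A, -B @ Rinv @ B.T],
                  [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    stable = V[:, w.real < 0]            # the n stable eigenvectors
    X1, X2 = stable[:n, :], stable[n:, :]
    K = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (K + K.T)               # symmetrize against round-off

# Hypothetical single agent, x1' = x2, x2' = -a*sin(x1) + u, with SDC
# choice A(x) = [[0, 1], [-a*sin(x1)/x1, 0]] and constant B = [[0], [1]].
a_coef = 2.0
x = np.array([0.8, -0.3])
A = np.array([[0.0, 1.0],
              [-a_coef * np.sinc(x[0] / np.pi), 0.0]])  # sinc gives sin(x1)/x1
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K = solve_sdre(A, B, Q, R)
e = x - np.zeros(2)                      # desired state at the origin
u = -np.linalg.inv(R) @ B.T @ K @ e      # control law in the error form
```

Re-solving this Riccati equation at every sampled state, for every agent, is exactly the per-agent computation that the rest of the formulation builds on.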

Formation Control

The definition of formation control is introduced by a set of time-varying desired conditions for the agents to follow the leader of the group. The first agent, j = 1, is the leader, and the rest of the agents are followers, j = 2, …, N. The leader is regulated (point-to-point motion) from an initial condition, $x_1(0)$, to the desired condition, $x_{des,1}(t_f)$, in which $t_f$ is the final time of the control task. The schematic of the distribution of the agents, and the migration from the initial to the final condition, are illustrated in Fig. 1.

Figure 1.

A schematic view of the multi-agent system, moving from an initial randomly-distributed form to a final desired shape with arranged agents in an environment with an obstacle. The 2D representation is intended to show more detail; the formulation is general and could be applied to 3D formations as well.

The rest of the agents follow the leader while trying to keep the predefined geometric formation during the task, and sit at the exact predefined geometric formation at the end of the task:

$$x_{des,j}(t) = x_1(t) - x_{f,1} + x_{f,j}, \quad j = 2, \ldots, N, \tag{6}$$

where $x_{f,j}$ represents the desired coordinates (constant values) for the j-th agent in the predefined formation shape. Only the leader has a constant (time-invariant) desired condition. Equation (6) shows that the time-varying desired conditions of the followers are fed by the online position of the leader and the constant desired formation. That means the followers try to keep the formation over the whole point-to-point trajectory of the leader from the initial to the final condition.

Remark 1

It should be pointed out that $x_{f,j}$ is a state vector, and only the first two or three states are different from zero (those that represent the configuration space). The rest of the desired states could be set to zero. Common dynamics of multirotor UAVs usually have six degrees of freedom, which provide 12 states for the system. For the UAVs, only the three position states of the Cartesian coordinates (x, y, z) must be considered in the formation. The orientation and velocity states might be controlled independently.

Collision Avoidance and Obstacle Avoidance Between Agents

The weighting matrix for the states of the SDRE controller can include nonlinear functions of the state variables in its components. This property provides obstacle avoidance in regulation control based on the artificial potential field method [52,58,59]. The distance between an obstacle and an agent is:

$$d_{o,j}(t) = \left\| x_{p,j}(t) - x_o \right\|_2, \tag{7}$$

for j = 1, …, N, where $x_{p,j}(t)$ is the part of the state vector that includes the position coordinates, $x_{p,j}(t) \in x_j(t)$, and $x_o$ is the position of the obstacle.

The next step is to define the distance between two consecutive agents for collision avoidance. The distance between the j-th and (j – 1)-th agents is:

$$d_j(t) = \left\| x_{p,j}(t) - x_{p,j-1}(t) \right\|_2, \tag{8}$$

for j = 2, …, N. It should be noted that the leader does not need to include collision avoidance terms. The collision avoidance approach in Eq. (8) only avoids the impact between two consecutive agents. The topology of the agents is defined through the distributed pattern (6), where the main character is the time-varying position of the leader. Evidently, this pattern and the leader information do not provide enough safety in the migration of the multi-agent system. Hence, the collision avoidance term (8) was introduced to add another safety condition for migration without impact. This feature creates communication between two consecutive agents for reading and sending position data.

The index is assigned to a given agent randomly; as a result, two consecutive agents might not be neighbors at the initial condition and could even be far away from each other. This initialization might raise issues: the collision avoidance between neighbor agents (8) might not be effective, and a collision might happen between the j-th agent and other members. So, it is not assumed that agents j and j – 1 are close; however, once the regulation control starts, they follow the leader and the pattern, and they converge towards their neighbor agents in a short period. To avoid impact and collision in a global scheme, the following terms will be considered as well.

To include the collision avoidance between the j-th agent and all previous members, the following condition is introduced:

$$d_{s,j}(t) = \begin{cases} d_{tot,k,j}(t), & d_{tot,k,j}(t) < r_{\min}, \\ 1, & d_{tot,k,j}(t) \geq r_{\min}, \end{cases} \tag{9}$$

for j = 3, …, N and k = 1, …, j – 1, where $d_{tot,k,j}(t) = \left\| x_{p,j}(t) - x_{p,k}(t) \right\|_2$ is the distance between the j-th agent and the k-th agent, and $r_{\min}$ is the minimum collision avoidance safety distance. The pseudocode of Eq. (9) is presented as

for j = 3, …, N
    for k = 1, …, j – 1
        d_tot,k,j(t) = ‖x_p,j(t) – x_p,k(t)‖₂
        if d_tot,k,j(t) < r_min
            d_s,j(t) = d_tot,k,j(t)
        else
            d_s,j(t) = 1
        end
    end
end

Equation (9) is updated at each time step of the simulation while the agents move from the initial to the final condition. When the j-th agent is close to another agent, the term $d_{s,j}(t)$ is updated in (9); otherwise, $d_{s,j}(t) = 1$. The different distance terms ($d_{o,j}(t)$, $d_j(t)$, and $d_{s,j}(t)$) will be used in Section 2.4 to set the obstacle and collision avoidance strategy based on the modification of the $Q_j(x_j(t))$ matrix.
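The pseudocode above transcribes directly into runnable form (Python/NumPy; the positions and r_min are arbitrary test data). Note that the loop is written exactly as in the pseudocode of Eq. (9), so $d_{s,j}$ keeps the value produced by the last k checked:

```python
import numpy as np

def collision_distance_terms(P, r_min):
    """P: (N, d) array of agent positions x_p,j at one time step.
    Returns the d_s terms per Eq. (9)."""
    N = P.shape[0]
    d_s = np.ones(N)                       # default d_s,j = 1
    for j in range(2, N):                  # j = 3, ..., N (0-based index)
        for k in range(j):                 # k = 1, ..., j - 1
            d_tot = np.linalg.norm(P[j] - P[k])
            if d_tot < r_min:
                d_s[j] = d_tot             # proximity detected
            else:
                d_s[j] = 1.0
    return d_s

# Three agents on a line; the third is within r_min of the second.
P = np.array([[0.0, 0.0], [10.0, 0.0], [10.5, 0.0]])
d_s = collision_distance_terms(P, r_min=1.0)
```

In a full simulation this function would be re-evaluated at every time step, feeding the weighting-matrix construction of Section 2.4.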

Definition of Weighting Matrix

The state-dependent weighting matrices of the SDRE are one of the advantages of this method over a linear quadratic regulator (LQR) controller. The weighting matrices of the LQR are constant and cannot include parameters related to trajectories, such as the distance terms introduced in Section 2.3, for obstacle avoidance, collision avoidance, or adaptive designs. The definitions of the weighting matrices for the leader and the followers are different in order to obtain different motion speeds. The leader must be slower and the followers must be faster to track the leader properly. The following definition of the weighting matrix provides this condition.

The weighting matrix of states for the leader only includes obstacle avoidance:

$$Q_{m,m,1}(x_1(t)) = 1 + \frac{1}{d_{o,1}(t)}, \tag{10}$$

where $Q_{m,m,1}(x_1(t))$ is the m-th diagonal component of $Q_1(x_1(t))$ related to the m-th position state; m could be a state variable of the leader's dynamics, e.g., the second state, weighted by $Q_{2,2,1}(x_1(t))$. It should be noted that only a few diagonal components of $Q_1(x_1(t))$ are responsible for obstacle avoidance. For the example presented in Remark 1, only the first two diagonal components of the weighting matrix include (10), and the remaining 10 components are tuned conventionally. The UAV position variables are indeed defined in X, Y, Z, though the obstacle avoidance terms are set only for X and Y; the results were satisfactory. The choice rests with the designer, and in some cases it can also be assigned for all three axes in 3D platforms.

Remark 2

Considering the term $1 + \frac{1}{d_{o,1}(t)}$ in (10), $\frac{1}{d_{o,1}(t)}$ might take a very small value if the obstacle is far from the leader. The constant term "1" in $Q_{m,m,1}(x_1(t))$ makes sure that the velocity of the leader is properly defined when $\frac{1}{d_{o,1}(t)} \to 0$.

The j-th follower must avoid obstacles, as well as collisions with the other agents, through the matrix $Q_j(x_j(t))$:

$$Q_{m,m,j}(x_j(t)) = \frac{1}{d_{o,j}(t)} + \frac{1}{d_j(t)} + \frac{1}{d_{s,j}(t)}, \tag{11}$$

for j = 2, …, N. If the distance terms become small, which shows there is a probability of collision between agents, or between an obstacle and an agent, the diagonal component of Q gets bigger, and that state deviates the trajectory of the system (in other words, it avoids the collision). Assigning avoidance terms to all the position states for an obstacle, or for collision avoidance, speeds up all the states simultaneously, and the collision avoidance fails. More details can be revisited in [52]. The following is helpful to clarify this point. Imagine a mobile robot moving forward in a straight line with an obstacle on the way. The actuators are two motors on the wheels. If the obstacle avoidance mechanism commands both wheels to speed up at the same time, the robot just moves faster in a straight line. In order to move the robot to one side, only one wheel must rotate faster! Therefore, avoidance functions are introduced only in one or two components related to the Cartesian coordinates.

In this section, three different distance terms were introduced: do,j(t), dj(t), and ds,j(t). It must be noted that these distance terms will be applied in the weighting matrix of states, using the artificial potential field method. The distance terms increase the value of the weighting matrix, leading the agent to avoid the collision.

Each distance term plays its own role in the migration of the multi-agent system. $d_{o,j}(t)$ avoids collision of the agent with the obstacle in the environment; $d_j(t)$ avoids collision of two neighbor agents; and in case any previous agent comes close to the current agent, $d_{s,j}(t)$ is activated to avoid the collision. In summary, the three components contribute to safety against collisions of the agents with the obstacle and with other agents.
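Putting the three terms together, a follower's position-related diagonal entry of $Q_j$ in Eq. (11) can be sketched as below (the positions are arbitrary test values, and the helper name is hypothetical):

```python
import numpy as np

def q_position_weight(x_p, x_p_prev, x_obs, d_s):
    """Eq. (11): Q_mm,j = 1/d_o,j + 1/d_j + 1/d_s,j for one position state."""
    d_o = np.linalg.norm(x_p - x_obs)      # agent-to-obstacle distance (7)
    d_j = np.linalg.norm(x_p - x_p_prev)   # agent-to-previous-agent distance (8)
    return 1.0 / d_o + 1.0 / d_j + 1.0 / d_s

x_p      = np.array([2.0, 0.0])            # current follower position
x_p_prev = np.array([4.0, 0.0])            # previous agent position
x_obs    = np.array([2.0, 5.0])            # obstacle position

far = q_position_weight(x_p, x_p_prev, x_obs, d_s=1.0)
# Moving the previous agent closer inflates the weight, deviating the state:
near = q_position_weight(x_p, np.array([2.1, 0.0]), x_obs, d_s=1.0)
```

As the lead-in to Eq. (11) explains, the inflated weight is applied to only one or two Cartesian position states, so the resulting speed-up steers the agent to one side rather than accelerating it straight ahead.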

The definition of the rest of the components of the weighting matrix of states, as well as the inputs, could be done based on conventional tuning methods and rules [60].

System Dynamics and SDC Parameterization
Unmanned Aerial Vehicle

The dynamic equations of one quadrotor unmanned aerial vehicle (UAV) are presented in this section. The schematic view of the system and the configuration of the rotors are illustrated in Fig. 2. The agents are identical and share the same dynamics. The generalized coordinates of one system are $\{x_{c,j}(t), y_{c,j}(t), z_{c,j}(t), \phi_j(t), \theta_j(t), \psi_j(t)\}$, where $\xi_{1,j}(t) = [x_{c,j}(t), y_{c,j}(t), z_{c,j}(t)]^{\top}$ (m) defines the center of mass (CoM) of the j-th UAV in the Cartesian coordinate system, and $\xi_{2,j}(t) = [\phi_j(t), \theta_j(t), \psi_j(t)]^{\top}$ (rad) are its Euler angles. The linear and angular velocities of the drone are presented as $\upsilon_{1,j}(t) = [u_j(t), v_j(t), w_j(t)]^{\top}$ (m/s) and $\upsilon_{2,j}(t) = [p_j(t), q_j(t), r_j(t)]^{\top}$ (rad/s), respectively. The state vector of the system is defined as:

$$x_j(t) = \left[ \xi_{1,j}^{\top}(t), \xi_{2,j}^{\top}(t), \upsilon_{1,j}^{\top}(t), \upsilon_{2,j}^{\top}(t) \right]^{\top}. \tag{12}$$

Figure 2.

The schematic view and axis definition of a quadrotor.

The kinematics relations

$$\dot{\xi}_{1,j}(t) = R_{ZYX,j}(\xi_{2,j}(t)) \upsilon_{1,j}(t), \quad \dot{\xi}_{2,j}(t) = T_j(\xi_{2,j}(t)) \upsilon_{2,j}(t), \tag{13}$$

hold for the quadrotor system. Considering that the quadrotor flies based on small changes in the Euler angles, it is not supposed to perform aggressive and aerobatic maneuvers, and the hovering state is assumed as the flight condition.

Therefore, the derivative of the state vector (12) generates

$$\dot{x}_j(t) = \left[ \dot{\xi}_{1,j}^{\top}(t), \dot{\xi}_{2,j}^{\top}(t), \dot{\upsilon}_{1,j}^{\top}(t), \dot{\upsilon}_{2,j}^{\top}(t) \right]^{\top}.$$

Assuming small changes in the Euler angles during flight control, the derivative of the kinematics equations (13),

$$\ddot{\xi}_{1,j}(t) = \underbrace{\dot{R}_{ZYX,j}(\xi_{2,j}(t))}_{\approx 0} \upsilon_{1,j}(t) + \underbrace{R_{ZYX,j}(\xi_{2,j}(t))}_{\approx I} \dot{\upsilon}_{1,j}(t), \quad \ddot{\xi}_{2,j}(t) = \underbrace{\dot{T}_j(\xi_{2,j}(t))}_{\approx 0} \upsilon_{2,j}(t) + \underbrace{T_j(\xi_{2,j}(t))}_{\approx I} \dot{\upsilon}_{2,j}(t),$$

shows that in a hovering state, $\ddot{\xi}_{1,j}(t) \approx \dot{\upsilon}_{1,j}(t)$ and $\ddot{\xi}_{2,j}(t) \approx \dot{\upsilon}_{2,j}(t)$ are valid. As a result, the state-space representation of one agent is [61]:

$$\dot{x}_j = \begin{bmatrix} R_{ZYX,j}(\xi_{2,j}) \upsilon_{1,j} \\ T_j(\xi_{2,j}) \upsilon_{2,j} \\ \frac{1}{m_j} \left\{ R_{ZYX,3,j}(\xi_{2,j}) T_{B,j} - m_j g e_3 - D_{f,j} \dot{\xi}_{1,j} \right\} \\ J_j^{-1}(\xi_{2,j}) \left\{ \tau_{B,j} - C_j(\xi_{1,j}, \xi_{2,j}) \dot{\xi}_{2,j} \right\} \end{bmatrix}, \tag{14}$$

where $m_j$ (kg) is the mass of the drone, $T_{B,j}(t)$ (N) is the thrust force in the $Z_c$ direction (upward), g = 9.81 (m/s²) is the gravity constant, $e_3 = [0, 0, 1]^{\top}$, $D_{f,j} = \mathrm{diag}(D_x, D_y, D_z)$ (kg/s) includes drag, aerodynamic effects, etc., and $\tau_{B,j}(t) = [\tau_{\phi,j}(t), \tau_{\theta,j}(t), \tau_{\psi,j}(t)]^{\top}$ (Nm) is the input torque vector. Moreover, the details of $R_{ZYX,j}(\xi_{2,j}(t))$, $T_j(\xi_{2,j}(t))$, $C_j(\xi_{1,j}(t), \xi_{2,j}(t))$, and $J_j(\xi_{2,j}(t))$ can be found in Section III of Ref. [61], and $R_{ZYX,3,j}(\xi_{2,j}(t))$ is the third column of the rotation matrix $R_{ZYX,j}(\xi_{2,j}(t))$.

The quadrotor UAV is underactuated, and the control structure uses a cascade design, which controls the translation part of the dynamics (14) in the first layer and the orientation in the second layer. The translation dynamics include the states $x_{t,j}(t) = [x_{c,j}(t), y_{c,j}(t), z_{c,j}(t), u_j(t), v_j(t), w_j(t)]^{\top}$, expressed in rows 1–3 and 7–9 of Eq. (14). The SDC parameterization considers full actuation for translation control:

$$A_{t,j}(x_{t,j}(t)) = \begin{bmatrix} 0_{3 \times 3} & R_{ZYX,j}(\xi_{2,j}(t)) \\ 0_{3 \times 3} & -\frac{1}{m_j} D_{f,j} R_{ZYX,j}(\xi_{2,j}(t)) \end{bmatrix}, \quad B_{t,j} = \begin{bmatrix} 0_{3 \times 3} \\ \frac{1}{m_j} I_{3 \times 3} \end{bmatrix}.$$

The component $\left[ R_{ZYX,3,j}(\xi_{2,j}) T_{B,j} \right]_{3 \times 1}$ in Eq. (14) is the thrust vector, distributed into the translation dynamics by the third column of the rotation matrix. Besides the SDRE design, there is also a cascade controller, Eq. (17), to address the underactuation condition of the quadrotor.

In order to implement the cascade design and SDRE control, in the first step, the SDRE controller must be operated under the assumption of a full actuation condition, which demands the $B_{t,j} = \left[ 0_{3 \times 3}; \frac{1}{m_j} I_{3 \times 3} \right]$ matrix of 6 × 3 dimension. If the third column of the rotation matrix entered the SDC parameterization, a scalar thrust term would be obtained as the single input to the problem, which creates an obstacle in the design. This topic has been presented in previous implementations and exercised with different conditions, platforms, and controllers [61–63], but it was originally proposed as a cascade control design; details can be seen in Ref. [64].

Assigning weighting matrices with proper dimensions and solving the SDRE for the translation part,

$$A_{t,j}^{\top}(x_{t,j}) K_{t,j}(x_{t,j}) + K_{t,j}(x_{t,j}) A_{t,j}(x_{t,j}) - K_{t,j}(x_{t,j}) B_{t,j} R_{t,j}^{-1}(x_{t,j}) B_{t,j}^{\top} K_{t,j}(x_{t,j}) + Q_{t,j}(x_{t,j}) = 0,$$

results in the sub-optimal gain $K_{t,j}(x_{t,j}(t))$ for the control law:

$$u_{t,j}(t) = -R_{t,j}^{-1}(x_{t,j}(t)) B_{t,j}^{\top} K_{t,j}(x_{t,j}(t)) e_{t,j}(t), \tag{15}$$

where $e_{t,j}(t) = x_{t,j}(t) - x_{t,des,j}(t)$, in which $x_{t,des,j}(t)$ includes the desired translation variables. The three components of the control law (15) are substituted into the thrust equation (16) [61]:

$$T_{B,j}(t) = m_j \left\{ R_{ZYX,3,j}^{\top}(\xi_{2,j}(t)) \left( u_{t,j}(t) + g e_3 \right) \right\}. \tag{16}$$

Note that gravity is compensated for in (16), and for this reason, the SDC matrix $A_{t,j}(x_{t,j}(t))$ excludes the gravity term. The second cascade control layer regulates the orientation of the drone to help the translation part, based on two constraints [62]:

$$\theta_{des,j}(t) = \tan^{-1} \frac{u_{t,1,j}(t) \cos \psi_{des,j} + u_{t,2,j}(t) \sin \psi_{des,j}}{u_{t,3,j}(t) + g}, \quad \phi_{des,j}(t) = \sin^{-1} \frac{u_{t,1,j}(t) \sin \psi_{des,j} - u_{t,2,j}(t) \cos \psi_{des,j}}{\sqrt{u_{t,1,j}^2(t) + u_{t,2,j}^2(t) + \left( u_{t,3,j}(t) + g \right)^2}}, \tag{17}$$

where $\psi_{des,j}$ (rad) is the desired constant yaw of the system, and, e.g., $u_{t,2,j}(t)$ is the second component of $u_{t,j}(t)$. The state vector of the orientation control is set as $x_{o,j}(t) = [\phi_j(t), \theta_j(t), \psi_j(t), p_j(t), q_j(t), r_j(t)]^{\top}$, and the relevant dynamics for the orientation, rows 4–6 and 10–12 of (14), provide the SDC matrices:

$$A_{o,j}(x_{o,j}) = \begin{bmatrix} 0_{3 \times 3} & T_j(\xi_{2,j}) \\ 0_{3 \times 3} & -J_j^{-1}(\xi_{2,j}) C_j(\xi_{1,j}, \xi_{2,j}) \end{bmatrix}, \quad B_{o,j}(x_{o,j}) = \begin{bmatrix} 0_{3 \times 3} \\ J_j^{-1}(\xi_{2,j}) \end{bmatrix}.$$

The SDRE for the orientation control is:

$$A_{o,j}^{\top}(x_{o,j}) K_{o,j}(x_{o,j}) + K_{o,j}(x_{o,j}) A_{o,j}(x_{o,j}) - K_{o,j}(x_{o,j}) B_{o,j}(x_{o,j}) R_{o,j}^{-1}(x_{o,j}) B_{o,j}^{\top}(x_{o,j}) K_{o,j}(x_{o,j}) + Q_{o,j}(x_{o,j}) = 0,$$

which generates the sub-optimal gain $K_{o,j}(x_{o,j}(t))$ for the control law:

$$u_{o,j}(t) = -R_{o,j}^{-1}(x_{o,j}(t)) B_{o,j}^{\top}(x_{o,j}(t)) K_{o,j}(x_{o,j}(t)) e_{o,j}(t),$$

where $e_{o,j}(t) = x_{o,j}(t) - x_{o,des,j}(t)$ and $x_{o,des,j}(t)$ includes the desired orientation variables. Note that $\phi_{des,j}(t)$ and $\theta_{des,j}(t)$ in $x_{o,des,j}(t)$ are updated at each time step of the simulation or experiment by (17).

The input thrust and torque vector to the system are generated by four rotating propellers:

$$\begin{bmatrix} T_{B,j}(t) \\ \tau_{\phi,j}(t) \\ \tau_{\theta,j}(t) \\ \tau_{\psi,j}(t) \end{bmatrix} = M_{x,j} \omega_j^2(t), \tag{18}$$

where $\omega_j(t) \in \mathbb{R}^4$ is the angular velocity vector of the rotors and

$$M_{x,j} = \begin{bmatrix} k & k & k & k \\ 0 & -lk & 0 & lk \\ -lk & 0 & lk & 0 \\ d & -d & d & -d \end{bmatrix}$$

is the mixer matrix of the quadrotor, defined based on the "plus" configuration of the rotors in Fig. 2, in which k (Ns²/rad²) is the lift constant or thrust factor, l (m) is the distance between a motor and the CoM of the quadrotor, and d (Nms²/rad²) is the drag constant.
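Since the mixer matrix is invertible, the rotor speeds realizing a commanded thrust/torque set follow from $\omega^2 = M_x^{-1} [T_B, \tau_\phi, \tau_\theta, \tau_\psi]^{\top}$. A sketch for the hover condition is given below (the numerical constants k, l, d, and m are illustrative assumptions, not values from the paper, and the torque-row signs follow a common "plus"-configuration convention):

```python
import numpy as np

k, l, d = 2.98e-6, 0.225, 1.14e-7   # illustrative thrust/arm/drag constants
m, g = 1.2, 9.81                    # illustrative mass (kg), gravity (m/s^2)

# Mixer (18), "plus" configuration: [T, tau_phi, tau_theta, tau_psi]' = Mx w^2
Mx = np.array([[ k,     k,    k,     k   ],
               [ 0.0, -l*k,  0.0,   l*k ],
               [-l*k,  0.0,  l*k,   0.0 ],
               [ d,    -d,   d,    -d   ]])

# Hover: thrust balances weight, zero body torques.
wrench = np.array([m * g, 0.0, 0.0, 0.0])
w_sq = np.linalg.solve(Mx, wrench)       # squared rotor speeds
w = np.sqrt(w_sq)                        # all four rotors spin equally
```

In hover the unique solution is uniform, $\omega_i^2 = m g / (4k)$ for every rotor, which is a quick sanity check on any sign convention chosen for the mixer.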

Remark 3

To regulate the system (2) to the desired condition, the equilibrium point of the system should be zero. Many systems, such as robotic manipulators, multirotor UAVs, etc., work under the effect of gravity, where fj(0) ≠ 0. For those systems, the gravity compensation should be considered as shifting the equilibrium to zero [65]. Here in Section 3.1, the gravity will be compensated for through the cascade control design of the UAVs, which performs translation control in the first layer and orientation control in the second one [62].

Wheeled Mobile Robot

Consider the j-th wheeled mobile robot of a multi-agent system with differential wheels working in the {X, Y} planar Cartesian coordinates. The coordinates, and the global and local axes, are shown in Fig. 3. The center of mass of the robot is defined by $\{x_{c,j}(t), y_{c,j}(t)\}$ (m), and the rotation of the robot around the Z axis, measured from the X axis (counter-clockwise indicates the positive direction), is called $\phi_j(t)$ (rad). The system is subjected to two constraints showing that the robot cannot move alongside the axis of the wheels and that the wheels only have a pure rolling motion (rolling without slipping).

Figure 3.

The schematic view and axis definition of a differential-wheel mobile robot.

The first auxiliary relation is found by combining the constraints on the left and right wheels, $\dot{\phi}_j(t) = \frac{r}{2b} \left( \dot{\theta}_{r,j}(t) - \dot{\theta}_{l,j}(t) \right)$. Integration generates a holonomic constraint [66]:

$$\phi_j(t) = \frac{r}{2b} \left( \theta_{r,j}(t) - \theta_{l,j}(t) \right). \tag{19}$$

The non-holonomic constraint, derived from the rolling condition of the wheels, is obtained as [67]:

$$\dot{x}_{c,j}(t) \cos \phi_j(t) + \dot{y}_{c,j}(t) \sin \phi_j(t) = \frac{r}{2} \left( \dot{\theta}_{r,j}(t) + \dot{\theta}_{l,j}(t) \right). \tag{20}$$

Equation (19) is the holonomic constraint of the system, and the equation $\dot{y}_{c,j}(t) \cos \phi_j(t) - \dot{x}_{c,j}(t) \sin \phi_j(t) = 0$ together with (20) are the non-holonomic constraints, which can be represented as $A_j(q_j(t)) \dot{q}_j(t) = 0$, where

$$A_j(q_j(t)) = \begin{bmatrix} -\sin \phi_j(t) & \cos \phi_j(t) & 0 & 0 \\ \cos \phi_j(t) & \sin \phi_j(t) & -\frac{r}{2} & -\frac{r}{2} \end{bmatrix}, \tag{21}$$

in which the generalized coordinate vector of the j-th system is:

$$q_j(t) = \left[ x_{c,j}(t), y_{c,j}(t), \theta_{r,j}(t), \theta_{l,j}(t) \right]^{\top}. \tag{22}$$

The dynamics equation of the mobile robot is the common form of a second-order differential equation:

$$M_j(q_j(t)) \ddot{q}_j(t) + c_j(q_j(t), \dot{q}_j(t)) = E \tau_j(t) - A_j^{\top}(q_j(t)) \lambda_j(t), \tag{23}$$

where $M_j(q_j(t)) : \mathbb{R}^4 \to \mathbb{R}^{4 \times 4}$ is the inertia matrix; the Coriolis-centrifugal vector is $c_j(q_j(t), \dot{q}_j(t)) : \mathbb{R}^4 \times \mathbb{R}^4 \to \mathbb{R}^4$; $E = \left[ 0_{2 \times 2}, I_{2 \times 2} \right]^{\top}$; $\tau_j(t) = [\tau_{r,j}(t), \tau_{l,j}(t)]^{\top}$ is the input torque vector to the wheels; $A_j(q_j(t)) : \mathbb{R}^4 \to \mathbb{R}^{2 \times 4}$ is the constraint matrix (21); and $\lambda_j(t) = [\lambda_{1,j}(t), \lambda_{2,j}(t)]^{\top}$ is the vector of the Lagrange multipliers. The details of the inertia matrix and the Coriolis vector can be found in Ref. [67]. One way to omit the unknown Lagrange multipliers from Eq. (23) is to find the null space of $A_j(q_j(t))$,

$$S_j(q_j(t)) = \begin{bmatrix} \frac{r}{2} \cos \phi_j(t) & \frac{r}{2} \cos \phi_j(t) \\ \frac{r}{2} \sin \phi_j(t) & \frac{r}{2} \sin \phi_j(t) \\ 1 & 0 \\ 0 & 1 \end{bmatrix},$$

and pre-multiply the transpose of $S_j(q_j(t))$ onto (23) to obtain [66]:

$$S_j^{\top}(q_j(t)) M_j(q_j(t)) \ddot{q}_j(t) + S_j^{\top}(q_j(t)) c_j(q_j(t), \dot{q}_j(t)) = \tau_j(t). \tag{24}$$
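The fact that $S_j(q_j)$ spans the null space of $A_j(q_j)$, so that $A_j S_j = 0$ and the Lagrange-multiplier term drops out of (23), can be checked numerically (the wheel radius and the test heading below are arbitrary):

```python
import numpy as np

r = 0.05   # illustrative wheel radius (m)

def A_constraint(phi):
    """Constraint matrix (21) of the differential-drive WMR."""
    return np.array([[-np.sin(phi), np.cos(phi), 0.0, 0.0],
                     [ np.cos(phi), np.sin(phi), -r / 2, -r / 2]])

def S_null(phi):
    """Null-space matrix S_j(q_j) of A_j(q_j)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[r / 2 * c, r / 2 * c],
                     [r / 2 * s, r / 2 * s],
                     [1.0, 0.0],
                     [0.0, 1.0]])

phi = 0.9  # arbitrary heading angle (rad)
residual = A_constraint(phi) @ S_null(phi)   # should vanish: A S = 0
```

The residual is zero for any heading, confirming that pre-multiplying (23) by $S_j^{\top}$ eliminates $\lambda_j(t)$ identically.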

The angular velocities of the wheels are arranged in the vector $v_j(t) = \left[ \dot{\theta}_{r,j}(t), \dot{\theta}_{l,j}(t) \right]^{\top}$, and $\dot{q}_j(t) = S_j(q_j(t)) v_j(t)$ is introduced; the second time derivative of the generalized coordinates results in:

$$\ddot{q}_j(t) = \dot{S}_j(q_j(t)) v_j(t) + S_j(q_j(t)) \dot{v}_j(t). \tag{25}$$

The state vector of the system is chosen as $x_j = \left[ q_j^{\top}(t), v_j^{\top}(t) \right]^{\top}$. Substitution of $\ddot{q}_j(t)$ from (25) into (24) results in a state-space representation of the system:

$$\dot{x}_j = \begin{bmatrix} S_j v_j \\ -\left( S_j^{\top} M_j S_j \right)^{-1} \left[ S_j^{\top} M_j \dot{S}_j v_j + S_j^{\top} c_j \right] \end{bmatrix} + \begin{bmatrix} 0_{4 \times 2} \\ \left( S_j^{\top} M_j S_j \right)^{-1} \end{bmatrix} \tau_j. \tag{26}$$

In the control scheme of the WMR, the controlled output of the system is a point $P_w(t)$ ahead of the CoM of the mobile robot; $w_s$ is the distance between the CoM of the robot and $P_w(t)$ (see Fig. 3). The point is related to the state variables of the WMR through a kinematics relation. To control this point, the output- and state-dependent Riccati equation (OSDRE) approach is needed [57]. The output of the j-th WMR is defined as

$$y_j(t) = \begin{bmatrix} x_{c,j}(t) + w_s \cos \phi_j(t) \\ y_{c,j}(t) + w_s \sin \phi_j(t) \end{bmatrix},$$

and the first two components of $S_j(q_j(t)) v_j(t)$ result in

$$\dot{x}_{c,j}(t) = \frac{r}{2} \left( \dot{\theta}_{r,j}(t) + \dot{\theta}_{l,j}(t) \right) \cos \phi_j(t), \quad \dot{y}_{c,j}(t) = \frac{r}{2} \left( \dot{\theta}_{r,j}(t) + \dot{\theta}_{l,j}(t) \right) \sin \phi_j(t). \tag{27}$$

Substituting (27) into $\dot{y}_j(t)$, and taking another time derivative, $\ddot{y}_j(t)$, results in:

$$\ddot{y}_j(t) = \alpha_j(x_j(t)) \dot{v}_j(t) + \beta_j(x_j(t)), \tag{28}$$

where $\alpha_j(x_j(t)) : \mathbb{R}^6 \to \mathbb{R}^{2 \times 2}$ and $\beta_j(x_j(t)) : \mathbb{R}^6 \to \mathbb{R}^2$. $\alpha_j(x_j(t))$ and $\beta_j(x_j(t))$ are a nonlinear matrix and vector, respectively, generated during the computation of the derivative of the output; their symbolic representations can be revisited in [67]. $\alpha_j(x_j(t))$ and $\beta_j(x_j(t))$ are kinematic transformations that change the dynamics of the mobile robot from a state to an output representation. Substituting $\dot{v}_j(t)$ from the lower set of (26) into (28) provides:

$$\ddot{y}_j = \alpha_j \left\{ -\left( S_j^{\top} M_j S_j \right)^{-1} \left[ S_j^{\top} M_j \dot{S}_j v_j + S_j^{\top} c_j \right] \right\} + \beta_j + \alpha_j \left( S_j^{\top} M_j S_j \right)^{-1} \tau_j. \tag{29}$$

By defining a new state-vector for the output dynamics (29), $z_j(t)=[y_j^{\top}(t),\dot{y}_j^{\top}(t)]^{\top}$, one can find the state-space representation of the output dynamics:
$$\dot{z}_j = \begin{bmatrix} \dot{y}_j \\ \alpha_j\left\{-(S_j^{\top}M_jS_j)^{-1}\left[S_j^{\top}M_j\dot{S}_j v_j + S_j^{\top}c_j\right]\right\} + \beta_j \end{bmatrix} + \begin{bmatrix} 0_{2\times 2} \\ \alpha_j(S_j^{\top}M_jS_j)^{-1} \end{bmatrix}\tau_j. \tag{30}$$

The output- and state-dependent coefficient parameterization of system (30) is expressed as [57]:
$$A_j = \begin{bmatrix} 0_{2\times 2} & I_{2\times 2} \\ 0_{2\times 2} & \left[\alpha_j\left\{-(S_j^{\top}M_jS_j)^{-1}\left[S_j^{\top}M_j\dot{S}_j v_j + S_j^{\top}c_j\right]\right\} + \beta_j\right]\dot{y}_j^{+} \end{bmatrix},\qquad B_j = \begin{bmatrix} 0_{2\times 2} \\ \alpha_j(S_j^{\top}M_jS_j)^{-1} \end{bmatrix},$$
where $\dot{y}_j^{+}(t)$ is the pseudo-inverse of the velocity of the output $\dot{y}_j(t)$. Finally, the control law is set in the form of:
$$u_j(t) = -R_j^{-1}(x_j(t))\,B_j^{\top}(x_j(t))\,K_j(x_j(t))\left[z_j(t)-z_{des,j}(t)\right].$$
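A single SDRE step has the structure of an LQR problem with the coefficients frozen at the current state: solve the algebraic Riccati equation for $K(x)$ and apply the feedback law above. A minimal sketch of one such step, using SciPy's CARE solver on generic (A, B, Q, R) matrices (the state-dependent $\alpha_j$, $\beta_j$ terms from [67] are not reproduced here, so this is a structural illustration, not the paper's full controller):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def sdre_step(A, B, Q, R, z, z_des):
    """One SDRE iteration: freeze A(x), B(x) at the current state, solve the
    Riccati equation for K, and return u = -R^{-1} B^T K (z - z_des)."""
    K = solve_continuous_are(A, B, Q, R)              # state-dependent Riccati solution
    return -np.linalg.solve(R, B.T @ K @ (z - z_des))  # pointwise-optimal feedback
```

At each control instant the coefficient matrices are re-evaluated at the new state and the step is repeated, which is what makes the closed loop nonlinear despite the LQR-like structure of each step.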

The necessary conditions on the output, the controllability and observability of the OSDRE, and the condition on $\dot{y}_j(t)$ to avoid a singularity issue were reported in [57].

Simulations
Multi-agent systems of UAV

A multi-agent system of N = 1,089 quadrotor agents is considered for a simulation that represents a regulation case from an initial to a final condition in tf = 20(s). The agents must form a square shape when they are launched and keep the shape during the regulation. As they approach the final condition, the square arrangement of the agents becomes more precise, and the regulation error converges to zero. The following loop defines the local arrangement of the agents in a square shape during the regulation, presented in the pseudocode:

for k = 1, …,Nn,

    for j = 1+(k – 1)Nn,…,Nn + (k – 1)Nn,

       xc,f,j = −(L + Da)/2 + jDa − (k − 1)L,

       yc,f,j = −(L + Da)/2 + kDa,

       zc,f,j = 0,

   end,

end,

where the length of the square side is denoted by L = 30(m), Nn = 33 is the number of rows and columns, and Da = L/Nn = 0.9091(m) is the spacing between adjacent agents.
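The pseudocode above can be implemented directly. A minimal sketch is given below; the sign convention (centering the grid on the origin before the leader offset is applied) is an assumption reconstructed from the symmetric final pattern:

```python
def square_pattern(L=30.0, Nn=33):
    """Final (x, y, z) targets of the Nn*Nn agents on a square grid of side L,
    centered on the origin (a sketch of the paper's square-pattern pseudocode)."""
    Da = L / Nn  # spacing between adjacent agents; ~0.9091 m for L=30, Nn=33
    targets = []
    for k in range(1, Nn + 1):                                     # row index
        for j in range(1 + (k - 1) * Nn, Nn + (k - 1) * Nn + 1):   # global agent index
            x = -(L + Da) / 2 + j * Da - (k - 1) * L  # the (k-1)L term resets x each row
            y = -(L + Da) / 2 + k * Da
            targets.append((x, y, 0.0))
    return targets
```

Because j runs globally while the −(k − 1)L term subtracts exactly one side-length per row (33 · Da = L), every row spans the same x range.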

The leader is the first agent of the system and guides the multi-agent system in a point-to-point regulation from an arbitrary start to an endpoint. The 1,089 agents are initially scattered randomly inside a 10 × 10(m) square at a height of 10(m). The initial orientation angles and the translational and rotational velocities of the agents are zero. The three components of the pseudocode (xc,f,j, yc,f,j, zc,f,j) form the first set of the vector xf,j ∈ ℝ12 in (6) for the quadrotor multi-agent system.

The position of the obstacle is (xo, yo, zo) = (15, 0.25, 6)(m), with a safety radius of 0.5(m) for collision avoidance. The final position coordinates for the leader are:
$$x_{f,1} = \begin{bmatrix} x_{c,des,1} \\ y_{c,des,1} \\ z_{c,des,1} \\ 0_{9\times 1} \end{bmatrix} = \begin{bmatrix} 20 + x_{c,f,1} \\ y_{c,f,1} \\ z_{c,f,1} \\ 0_{9\times 1} \end{bmatrix},$$
which is constant, whereas the value for the rest of the agents is time-varying, as presented in Eq. (6). This is a result of following the leader.

The physical characteristics are identical across agents. The distance between each motor and the CoM of the quadrotor is l = 0.225(m); the radius of the propellers is R = 0.075(m); the lift constant is k = 2.98 × 10−6(Ns2rad−2); the drag constant is d = 1.14 × 10−7(Nms2rad−2); Ixx = 4.856 × 10−3(kgm2), Iyy = 4.856 × 10−3(kgm2), and Izz = 8.801 × 10−3(kgm2) are the moments of inertia around the x, y, z axes. The mass of each drone is m = 0.468(kg), the aerodynamics matrix is Df = diag(0.25, 0.25, 0.25)(kg/s), and the minimum and maximum rotor velocities are ωmin,max = 496.48, 744.73(rad/s).

The motion of the multi-agent system is represented in Fig. 4. Because the agents are very close to each other at the beginning, and the collision-avoidance terms in the tuning increase the Q matrix drastically, a rapid divergence is seen at the commencement of the simulation. The beginning of the regulation resembles an "explosion" of multiple agents escaping from collisions with each other. The Cartesian-coordinate errors of the agents in 3D positioning are illustrated in Fig. 5 for all agents, followed by a zoomed view that shows the leader first converging towards zero, then the followers catching up with the leader.

Figure 4.

The trajectories of the multi-agent system of 1,089 quadrotor UAVs.

Figure 5.

The error convergence of the multi-agent system of 1,089 quadrotor UAVs.

The weighting matrices of the leader are selected as:
$$R_{t,1} = 10 \times I_{3\times 3},\qquad R_{o,1} = I_{3\times 3},$$
$$Q_{t,1}(x) = 0.5 \times \mathrm{diag}\!\left(1+\frac{1}{d_{o,1}(x)},\; 1+\frac{1}{d_{o,1}(x)},\; 1,\; 2,\; 2,\; 5\right),\qquad Q_{o,1} = \mathrm{diag}\left(2_{1\times 3},\, 0.5_{1\times 3}\right)_{6\times 6}.$$

The weighting matrices of the followers are:
$$R_{t,j} = I_{3\times 3},\qquad R_{o,j} = I_{3\times 3},$$
$$Q_{t,j}(x) = \mathrm{diag}\!\left(1+Q_D(x),\; 1+Q_D(x),\; 1,\; 2,\; 2,\; 5\right),\qquad Q_{o,j} = \mathrm{diag}\left(2_{1\times 3},\, 0.5_{1\times 3}\right)_{6\times 6},$$
where $Q_D(x(t)) = \frac{1}{d_j(x(t))} + \frac{1}{d_{o,j}(x(t))} + \frac{1}{d_{s,j}(x(t))}$, with $d_{s,j}(x(t))$ set by (9).
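The distance-dependent weighting for the followers can be sketched as below. The baseline diagonal entries are those of $Q_{t,j}$ above; the small `eps` floor is an assumption added here to keep the entries finite (the paper instead relies on the avoidance scheme keeping the distances positive):

```python
def follower_Qt(d_prev, d_obs, d_swarm, eps=1e-6):
    """Diagonal of the followers' translational weight Q_{t,j}: the first two
    entries grow as the agent approaches its predecessor (d_prev), the
    obstacle (d_obs), or any earlier agent in the swarm (d_swarm)."""
    QD = 1.0 / max(d_prev, eps) + 1.0 / max(d_obs, eps) + 1.0 / max(d_swarm, eps)
    return [1.0 + QD, 1.0 + QD, 1.0, 2.0, 2.0, 5.0]
```

Far from everything, the diagonal approaches the baseline (1, 1, 1, 2, 2, 5); shrinking any of the three distances inflates the position weights, which is how the artificial-potential-field effect enters the SDRE through Q.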

Obstacle avoidance by embedding an artificial potential field into the SDRE has been reported on and analyzed in the literature. It has been stated that this method does not guarantee absolute obstacle avoidance, though it reduces the chance of impact significantly [52]. Figure 6 shows that the minimum distance between some agents and the obstacle was 30(cm). The collision avoidance between the agents is reported in Fig. 7. A group of drones in Fig. 7 closed to almost a 30(cm) distance; those are the ones at the beginning of rows, whose preceding agents are located at the end of the previous rows. For the rest of the agents, the collision-avoidance term, dj(t), indicates a collision when it is lower than 0.48(m). The reason for the collisions between the agents is the accumulation of 1,089 drones in a 10 × 10(m) square at the initial condition of the simulation. After the commencement of the simulation, the collision avoidance scheme guided the drones and increased the distances between them after 8 seconds.

Figure 6.

The distance between the drones and the obstacle for the multi-agent system of 1,089 quadrotor UAVs.

Figure 7.

The collision distance between the drones for the multi-agent system of 1,089 quadrotor UAVs.

Checking the collision between two consecutive agents does not necessarily reveal all the collisions. Collisions with other agents might also occur; these are demonstrated in Fig. 8.

Figure 8.

The ds,j(t) parameter for the multi-agent system of 1,089 quadrotor UAVs. Samples are shown on the axes instead of time as a result of the mesh presentation.

Contrary to dj(t), the parameter ds,j(t) checks the collision of the j-th agent with all previous agents. The behavior of ds,j(t) is similar to that of dj(t): at the beginning, the number of collisions is higher, and, moving towards the final pattern, the distances between the agents grow, from the first time-step to the 30,000-th, over the 20 seconds of total simulation time (samples are shown on the axes instead of time as a result of the mesh presentation). Some collision between the agents is expected, since their arrangement is quite ambitious, leaving only 42(cm) between two agents in the final square shape. This can be observed in Fig. 7, where the distance for some drones is less than 0.48(m).

The overview of the 1,089-agent simulation presented in Fig. 4 might not reveal the complexity of the multi-agent system dynamics; therefore, the simulation data for the 5th agent are presented. The position and velocity states of the 5th drone are shown in Figs. 9 and 10, respectively. The mixer-matrix equation (18) provides the input angular velocities of the rotors, limited to the lower and upper bounds, shown in Fig. 11.

Figure 9.

The position and orientation states for the 5th quadrotor UAV.

Figure 10.

The linear and angular velocity states for the 5th quadrotor UAV of the multi-agent system.

Figure 11.

The angular velocities of the rotors for the 5th quadrotor UAV.

WMR

A group of N = 45 agents was chosen for a regulation simulation over tf = 15(s), with a circular final arrangement of the agents. The number of agents was selected to be smaller in comparison with Section 4.1 to highlight the role of obstacle avoidance in the results. The following algorithm defines the arrangement of the agents in a circular pattern:

for j = 1, …, N,

    rp = 1,

    co = rp,

    xc,f,j = xshift + rp cos((j − 1)2π/Nc),

    yc,f,j = yshift + rp sin((j − 1)2π/Nc),

    for k = 1, …, Nr,

        if j > k(k + 1)Nc/2,

            co = co + rp,

            xc,f,j = xshift + co cos((j − k(k + 1)Nc/2 − 1)2πrp/(Nc co)),

            yc,f,j = yshift + co sin((j − k(k + 1)Nc/2 − 1)2πrp/(Nc co)),

        end,

    end,

end,

where (xshift, yshift) = (40, 20)(m) defines the center of the circular pattern; Nc = 3 is the number of agents in the first circle; Nr = 5 is the number of rings; N = Nr(Nr + 1)Nc/2 = 45 is the total number of agents; and rp = 1(m) is the distance between the rings.
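The ring construction can be sketched directly from the counts above: ring k holds k·Nc agents at radius k·rp, giving the 3 + 6 + 9 + 12 + 15 = 45 totals. The sketch below is a reconstruction of the pseudocode under that reading; the running `co` bookkeeping is replaced by an explicit per-ring loop:

```python
import math

def circular_pattern(Nc=3, Nr=5, rp=1.0, shift=(40.0, 20.0)):
    """Final (x, y) targets on Nr concentric rings around `shift`;
    ring k carries k*Nc agents at radius k*rp, so N = Nr(Nr+1)Nc/2."""
    targets = []
    for k in range(1, Nr + 1):
        n_ring = k * Nc          # agents on this ring: 3, 6, 9, 12, 15
        radius = k * rp          # ring radius grows by rp per ring
        for m in range(n_ring):
            ang = 2.0 * math.pi * m / n_ring  # even angular spacing on the ring
            targets.append((shift[0] + radius * math.cos(ang),
                            shift[1] + radius * math.sin(ang)))
    return targets
```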

The position of the obstacle is set as (xo, yo) = (25, 10)(m) with a safety radius of xmin = 0.5(m). The agents are randomly scattered in a 10 × 10(m) square at the origin. These randomly scattered agents are almost 40(m) away from the desired position of the pattern.

The physical characteristics of the mobile robots were chosen as follows: the radius of the wheels is denoted by r = 0.08(m); the length of the wheels' axis by b = 0.145(m); the mass of the base is mb = 6(kg); the mass of each wheel is mw = 0.32(kg); the moment-of-inertia components are Izz,b = 0.06363, Izz,w = 0.00006, Iyy,w = 0.00081(kgm2); and the distance of the look-ahead control term is selected as ws = 0.1(m).

The weighting matrices of the leader are set as:
$$R_1 = I_{2\times 2},\qquad Q_1(x(t)) = \mathrm{diag}\!\left(0.5,\; 0.5+\frac{1}{\left(d_{o,1}(x(t))/2\right)^2},\; 2,\; 2\right).$$

The weighting matrices of the followers are selected as:
$$R_j = I_{2\times 2},\qquad Q_j(x(t)) = \mathrm{diag}\!\left(0.5,\; 0.5+Q_D(x(t)),\; 2,\; 2\right),$$
where $Q_D(x(t)) = \frac{1}{d_j(x(t))} + \frac{1}{d_{s,j}(x(t))} + \frac{1}{\left(d_{o,j}(x(t))/2\right)^2}$, in which $d_j = \sqrt{(x_{c,j}-x_{c,j-1})^2 + (y_{c,j}-y_{c,j-1})^2}$.

The trajectories of the leader and follower agents are illustrated in Fig. 12. The leader and the followers avoided the obstacle, as the trajectories show. The error convergence of the multi-agent system is presented in Fig. 13. The obstacle and collision avoidance terms are plotted in Figs. 14 and 15, respectively. The input signals and state variables of one agent are presented in Figs. 16 and 17, respectively. The errors of the leader and the 5th follower are 122.6741(mm) and 411.7172(mm), respectively. It is important to notice that the error is a function of the look-ahead control parameter, which was set to ws = 0.1(m). This means that the robot regulates the look-ahead point towards the desired condition, not the CoM of the robot. If ws is reduced, the error changes. The overshoot and the behavior of the regulation also change as the look-ahead parameter varies. A comparison is presented in Table 2 to clarify this effect. Another simulation was done for 1,050 WMRs to show the capacity of the method for a large-population multi-agent system. For the sake of brevity, only the migration of the system from the initial to the final condition, following the leader and the pattern, is illustrated in Fig. 18.

Figure 12.

The trajectories of the leader and 45 follower agents.

Figure 13.

The errors of the leader and 45 follower agents.

Figure 14.

The obstacle avoidance performance of the leader/follower system of mobile robots.

Figure 15

The collision avoidance performance of the leader/follower system of mobile robots.

Figure 16.

The input signals of the wheels for one of the agents.

Figure 17.

The state information for one of the agents.

Figure 18.

The migration of 1,050 mobile robots in leader-follower formation.

Table 2. The error of the system under different ws values.

ws (mm)   leader error (mm)   5th-follower error (mm)
20        144.3912            911.3158
50        76.4762             345.8130
80        104.3379            388.0787
100       124.3148            427.7242
200       224.7471            630.6206
300       324.6205            827.9986
400       424.4897            1025.9861
500       524.4070            1223.8401
Tuning of Weighting Matrices

There are some general rules for the selection of the weighting matrices of the SDRE. Q penalizes the states and R penalizes the inputs. Since the inverse of R is the first term of the control law, a decrease in R increases the input signal, which results in more energy consumption. An increase in Q increases the precision of the state regulation. In this work, the weighting matrix of the states contains obstacle/collision-avoidance terms for the multi-agent control design. The definitions of the weighting matrices for the leader and the followers must be different: if the leader is too fast, the followers might not follow it with proper accuracy.

Hence, the diagonal components of Q relative to the position coordinates of the leader are smaller than those for the followers in both the UAV and the WMR simulations. Another important point in tuning is the balance between the magnitude of the weighting matrices and the distance terms: if large values are selected for Q, the changes due to the distance terms vanish, and the collision/obstacle-avoidance terms become ineffective. These points were considered in the tuning of the matrices in the simulation sections. Even with these guidelines, several rounds of trial and error are needed to tune the controller.

Conclusion

This work presents an SDRE control design (without augmentation of other techniques) for the formation regulation of a multi-agent system of robots that takes into account the complex dynamics of the agents. The command to the multi-agent system is given to the leader, and the followers pursue the leader in accordance with the predefined pattern of the multi-agent system. Obstacle and collision avoidance among the agents was designed through the modification of the weighting matrices of the states and the artificial potential field method. The proposed method was applied to the control of unmanned multi-copter systems and differential wheeled mobile robots. A highly populated multi-agent system of 1,089 agents was simulated for the UAVs, and further simulations were presented for the WMRs with 45 and then 1,050 agents, which were successfully regulated in the control task, preserving the expected formation shapes.