Introduction

As schools are a main vehicle for promoting social and individual development, the construction of an ‘intelligent’ campus has gradually become an important link in building an intelligent city in the macro context of social informatisation [1], and the diversity of campus safety characteristics is one of the main concerns in the process of intelligent campus construction. Meanwhile, campus security problems at different universities are characterised by dense populations, complex and diverse environments, and openness, which complicate security defence, decision-making, and governance on college and university campuses. The traditional management mode can no longer meet the needs of constructing an intelligent campus. Although advanced technologies such as artificial intelligence, security games, and digital twins are gradually being applied to specific types of public safety governance problems in smart parks, few studies approach intelligent campus security from a comprehensive perspective. Consequently, this paper addresses three aspects: the identification and detection of accidents as they occur on campus, the tracking of abnormal targets, and security resource path planning.

Abnormal target identification

In current studies of abnormal behaviour identification, feature extraction methods based on human appearance and movement information require specific hand-crafted features, and the data preprocessing is complicated. Behaviour identification based on depth images requires depth data, whose acquisition cost is relatively high. By comparison, behaviour identification based on a convolutional neural network is more suitable for this study: it does not require specific extracted features to be defined, and it learns useful features directly from raw data with little preprocessing through its hidden nodes. Although C3D [2], P3D [3], and LSTM [4] networks produce good results, C3D and P3D overlook local information, and LSTM itself is difficult to train; moreover, its strict iterative comparison in sequence order reduces training efficiency. This paper therefore designs a network that learns the spatiotemporal information in videos and adds an attention mechanism to focus on local information, thereby improving network performance and realising the identification of common abnormal behaviours on campus.

Abnormal target tracking and marking

With the development of artificial intelligence, current research increasingly incorporates the influence of inter-agent interactions into the decision-making process and designs network structures accordingly. Introducing multi-agent reinforcement learning to form Lenient DQN likewise aims to accelerate cooperative learning. From another perspective, dedicated networks can be designed to solve the multi-agent learning problem. For instance, in 2018, DeepMind proposed the VDN network, which aims to allocate the global reward signal so that each agent can learn faster. The RFM proposed by Tacchetti et al. uses a pre-trained graph network to predict the behaviour of other teammates, expecting to learn a better feature representation that improves learning speed and cooperation.

In applying reinforcement learning, current research is still constrained by the instability of the environment perceived by the agent, the design of the reward function, and so on. In NLP tasks, model training mainly relies on reducing the gap between the predicted value and the true label, whereas for an intelligent agent, performance mostly depends on the designer's accurate understanding and design of the reward signal function and on environmental instability, which reflect the accuracy and stability of the modelling. Accordingly, the key to applying reinforcement learning in unstable environments is the design of the reward function and the treatment of that instability. We can inject ideas from game theory into reinforcement learning methods to remove the obstacles created by an unstable environment and bridge the gap to applications. Starting from the two perspectives of environmental modelling and the solving process, we reduce the performance decline caused by environmental instability. Reward design based on the attention mechanism can be used to explain the behaviour of a reinforcement learning agent and to improve the interpretability of reinforcement learning. The concept of attention can encode the strategic tendencies of interactive behaviour in the environment, so the behaviour data produced by various participants can be encoded and mapped to a motivation space. Strategy styles can then be differentiated with methods such as clustering, and an evolutionary strategy can be employed to encourage the generation of new strategy styles different from those in the strategy base, yielding a diversified strategy base and inspiring more environmentally adaptive methods and practical applications of reinforcement learning.

Security resource path planning

Path planning is one of the main research topics in motion planning, which consists of path planning and trajectory planning. The sequence of points or the curve connecting the starting and ending positions is called a path. There are many studies on path planning, and the main research directions can be summarised as [5]: (1) intelligent search algorithms; (2) artificial intelligence algorithms; (3) geometric model algorithms; (4) local obstacle avoidance algorithms.

Shortest path planning has always been a hot issue and plays a significant role, especially in resource allocation, path planning, and related directions. Recently, the Dijkstra algorithm has been used extensively in numerous fields such as optimisation, image processing, and grid processing. With the rapid development and change of traffic, new requirements have been put forward for the efficient operation of the Dijkstra algorithm, and optimising algorithms for the shortest path problem has remained a research hotspot. For instance, Liu et al. [6] studied the ‘intersection path’ and ‘loop path’ problems in the Dijkstra algorithm and proposed an improved Dijkstra algorithm for these problems. Furthermore, in studying how to choose the most feasible path at the least cost or time, Tamatjita and Mahastama [7] applied the Dijkstra algorithm to a graph representing street paths with two possible digraphs, to name just a few.

Considering the characteristics of campus roads, this paper takes an optimised Dijkstra algorithm as the core to design and implement the path planning system. The research ideas of this paper are as follows: (1) data processing: preprocess the dataset so that the research proceeds more smoothly; (2) data analysis: study the characteristics of the data as the basis for selecting appropriate research methods; (3) strategy selection: when abnormal behaviour is identified, choose the best tracking strategy and plan the optimal path to transport materials to the accident site; (4) problem optimisation: analyse the remaining problems and optimise them.

Experimental procedure
Data processing
Data set

The dataset used in this paper is the CASIA behaviour analysis dataset of the Chinese Academy of Sciences, which consists of 422 videos with a frame rate of 25 fps. The dataset is shot from three angles, namely horizontal, oblique, and overhead, and contains both single-person behaviours and multi-person interaction behaviours. Each video lasts between 5 s and 30 s, depending on the behaviour.

Extract video frames

The videos contain segments of behaviours unrelated to the target behaviour. To prevent these irrelevant behaviours from affecting model performance, they first need to be removed from the video. Because adjacent frames overlap heavily, there is a great deal of redundant information between them, so extracting features from every frame would incur many redundant calculations and degrade the performance of the model. To reduce redundant information and improve efficiency, this paper extracts N frames at equal time intervals, as sketched below.
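A minimal Python sketch of this uniform-interval frame sampling is given below, assuming OpenCV is available; the function name, the default of 16 frames, and the omission of the manual trimming of irrelevant segments are assumptions rather than the exact implementation used in the paper.

import cv2
import numpy as np

def extract_frames(video_path, n_frames=16):
    """Sample n_frames at equal time intervals from one video clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Frame indices spaced evenly across the whole clip.
    indices = np.linspace(0, total - 1, n_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # shape: (n_frames, H, W, 3)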

Dataset augmentation

Training a model is a complex process requiring many calculations. If the video data for each class of behaviour are insufficient, the model easily overfits; in general, the larger the data volume, the more robust the model. To increase the generalisation ability of the model, the data are augmented before training. We use six augmentation methods and save the augmented results: upper-right corner cropping, upper-left corner cropping, lower-right corner cropping, lower-left corner cropping, centre cropping, and horizontal flipping of the above results and of the original image. The final dataset is thus enlarged to 11 times the original size, which increases the robustness of the model and provides enough data for training. In each training round, the pictures are randomly cropped to 112×112 and fed into the network.
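A minimal sketch of this crop-and-flip augmentation, assuming each frame is a NumPy array of shape (H, W, C); the helper names, the 112×112 crop size applied at this stage, and the counting of 11 augmented copies per image (five crops plus six horizontal flips) are assumptions.

import numpy as np

def five_crops(img, size):
    """Return the four corner crops and the centre crop of an (H, W, C) image."""
    h, w = img.shape[:2]
    ch, cw = size
    crops = [img[:ch, :cw],            # upper-left
             img[:ch, w - cw:],        # upper-right
             img[h - ch:, :cw],        # lower-left
             img[h - ch:, w - cw:]]    # lower-right
    top, left = (h - ch) // 2, (w - cw) // 2
    crops.append(img[top:top + ch, left:left + cw])  # centre
    return crops

def augment(img, crop_size=(112, 112)):
    """Generate 11 augmented copies: 5 crops plus horizontal flips of the crops and of the original."""
    crops = five_crops(img, crop_size)
    flips = [np.fliplr(x) for x in crops + [img]]
    return crops + flips  # 5 + 6 = 11 augmented samples per image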

Abnormal behaviour identification model construction

Convolutional neural networks (CNNs) are among the representative algorithms of deep learning; they reduce data preprocessing work, require no manual extraction of feature information, and possess excellent learning ability. In this paper, considering that the dataset consists of videos, a network structure that can learn the spatiotemporal information in a video is designed, as shown in Figure 1. After the input images pass through the 3DCNN convolutional layers, the obtained features are fed into the improved attention mechanism. The result then passes through the fully connected layer, and the classification result is obtained by softmax.

Fig. 1

Convolutional neural network architecture of spatiotemporal fusion

CNN model

The 3DCNN in this paper is made up of stacked convolutional and pooling layers. Unlike a 2DCNN, a 3DCNN can extract not only spatial features from the video but also temporal information. The pooling layers downsample, remove redundant features, and compress the features to reduce computation. The network takes 16 frames of size 16×112×112×3 as input and obtains feature vectors through eight convolution layers and five pooling layers. The convolution kernel size is 3×3×3 with a stride of 1. In the pooling layers, every pooling window is 2×2×2 with a stride of 2×2×2, except for the first pooling layer, whose window and stride are both 1×2×2.
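A PyTorch sketch of such a backbone is shown below; the kernel size, strides, pooling windows, and the eight-convolution/five-pooling structure follow the text, while the channel widths (taken from the common C3D layout) and the class name are assumptions.

import torch
import torch.nn as nn

class Backbone3D(nn.Module):
    """C3D-style backbone: 8 conv layers (3x3x3, stride 1) and 5 pooling layers,
    where the first pooling window/stride is 1x2x2 and the rest are 2x2x2."""
    def __init__(self):
        super().__init__()
        def conv(cin, cout):
            return nn.Sequential(nn.Conv3d(cin, cout, 3, stride=1, padding=1),
                                 nn.ReLU(inplace=True))
        self.features = nn.Sequential(
            conv(3, 64),    nn.MaxPool3d((1, 2, 2), (1, 2, 2)),
            conv(64, 128),  nn.MaxPool3d(2, 2),
            conv(128, 256), conv(256, 256), nn.MaxPool3d(2, 2),
            conv(256, 512), conv(512, 512), nn.MaxPool3d(2, 2),
            conv(512, 512), conv(512, 512), nn.MaxPool3d(2, 2),
        )

    def forward(self, x):          # x: (N, 3, 16, 112, 112)
        return self.features(x)    # feature map passed on to the attention module

features = Backbone3D()(torch.randn(1, 3, 16, 112, 112))  # shape check for one 16-frame clip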

Attention mechanism

The CBAM [8] attention mechanism contains two independent sub-modules, namely the channel attention module (CAM) and the spatial attention module (SAM). It is a lightweight attention module with negligible cost when added to other models and can be trained end-to-end together with the original model. Woo et al. [8] added it to ResNet and MobileNet and significantly improved model performance. For a given feature map, CBAM computes attention maps along the channel and spatial dimensions and then multiplies them by the input feature map for feature learning. So that the 3DCNN can learn features better, the CBAM model is improved and combined with the 3DCNN. The refined model is shown in Figure 2.

Fig. 2

The refined model of the attention mechanism

After the input feature $F_{3D} \in \mathbb{R}^{T \times W \times H \times C}$ passes through the channel attention module $M_{c3D} \in \mathbb{R}^{1 \times 1 \times 1 \times C}$, the result is multiplied by the input feature to obtain the output of the channel attention module, which is then fed into the spatial attention module $M_{s3D} \in \mathbb{R}^{T \times W \times H \times 1}$. Multiplying this result by its input feature gives the final output of the improved attention model. The formulas are as follows:
$$F_{3D}^{1} = M_{c3D}(F_{3D}) \otimes F_{3D}$$
$$F_{3D}^{2} = M_{s3D}(F_{3D}^{1}) \otimes F_{3D}^{1}$$

The CAM focuses on which channels contribute to the final classification. It adopts both global average pooling and global max pooling, which together extract richer high-level features. The specific formula is as follows:
$$M_{c3D}(F_{3D}) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool3D}(F_{3D})) + \mathrm{MLP}(\mathrm{MaxPool3D}(F_{3D}))\big) = \sigma\big(W_{1}(W_{0}(F_{avg}^{c})) + W_{1}(W_{0}(F_{max}^{c}))\big)$$

First, the input feature $F_{3D}$ is reduced by the two pooling operations to two $1 \times 1 \times 1 \times C$ descriptors, which are sent to the shared MLP. The element-wise sum of the MLP outputs is passed through the sigmoid activation function to obtain the channel attention map, which is multiplied with the input feature $F_{3D}$ to obtain the feature required by the CAM.

The SAM focuses on which positions in the image affect the final classification. It also adopts global average pooling and global max pooling. The specific formula is as follows:
$$M_{s3D} = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool3D}(F_{3D}^{1});\ \mathrm{MaxPool3D}(F_{3D}^{1})])\big) = \sigma\big(f^{7 \times 7}([F_{avg}^{s};\ F_{max}^{s}])\big)$$

Here $F_{3D}^{1}$ is the output feature of the CAM and serves as the input to the SAM. The two pooling operations reduce $F_{3D}^{1}$ to two $T \times H \times W \times 1$ features, which are concatenated. After a 7×7 convolution, the final attention map is obtained through the sigmoid, and multiplying it with the input feature $F_{3D}^{1}$ gives the output feature.
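The improved attention module can be sketched in PyTorch as follows; the reduction ratio of the shared MLP and the depth of the spatial convolution kernel are assumptions not specified in the text. In the full network of Figure 1, this module is applied to the backbone's output feature map before the fully connected layer.

import torch
import torch.nn as nn

class CBAM3D(nn.Module):
    """Sketch of the refined 3D CBAM of Figure 2: channel attention followed by spatial attention."""
    def __init__(self, channels, r=16):
        super().__init__()
        # Shared MLP W1(W0(.)) applied to the avg- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels))
        # 7x7x7 convolution over the two concatenated spatial descriptors (3D adaptation of the 7x7 kernel).
        self.spatial = nn.Conv3d(2, 1, kernel_size=7, padding=3)

    def forward(self, f):                                # f: (N, C, T, H, W)
        n, c = f.shape[:2]
        # Channel attention: Mc3D = sigmoid(MLP(AvgPool3D(F)) + MLP(MaxPool3D(F)))
        avg = self.mlp(f.mean(dim=(2, 3, 4)))
        mx = self.mlp(f.amax(dim=(2, 3, 4)))
        mc = torch.sigmoid(avg + mx).view(n, c, 1, 1, 1)
        f1 = mc * f                                      # F1 = Mc3D(F) x F
        # Spatial attention: Ms3D = sigmoid(conv([AvgPool; MaxPool] over channels))
        desc = torch.cat([f1.mean(dim=1, keepdim=True),
                          f1.amax(dim=1, keepdim=True)], dim=1)
        ms = torch.sigmoid(self.spatial(desc))
        return ms * f1                                   # F2 = Ms3D(F1) x F1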

Game modelling and solution

It is assumed that both the manchaser and the runaway are rational participants [9], that is, both choose their own strategies according to the principle of maximising their own returns.

Model definition

Definition 1. The multi-agent mobile target tracking strategy selection model, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), is described by the six-tuple (E, A, D, P, $\tilde{P}$, U).

E = {Ea, Ed} is the action space, where Na denotes the runaway and Nd the manchaser. The experiment grids the map, so the movement space of the manchaser and the runaway consists of five movements: up, down, left, right, and static. In the hunting process, the runaway is first given a random orientation probability, and the manchaser forms his own strategy according to the position information of the runaway.

A = {a1, a2,…, am} is the escape strategy set, where m ≥ 1.

D = {d1, d2,…, dk} is the hunting strategy set, where k ≥ 1.

P is the set of prior probabilities the manchaser assigns to the movements of the runaway, representing the manchaser's initial judgement of the runaway's direction and position, where P = {p1, p2,…, pn}, pi > 0.

$\tilde{P}$ is the set of posterior probabilities of the manchaser, indicating that after observing the escape strategy the manchaser updates his inference of the runaway's position by Bayes' rule, where $\tilde{P} = \{\tilde{p}_{ij}(\theta_i \mid a_j)\}$, 1 ≤ i ≤ n, 1 ≤ j ≤ m.

U = {Ua, Ud} is the set of revenue functions of the runaway and manchaser, and different hunting strategies have different revenue values.

Game algorithm

Optimal active hunting strategy selection algorithm for the hunting game

Input: Hunting game model
Output: Optimal hunting strategy
    Begin
     (Φa = {θ1, θ2, …, θn}, Φd = {θd});
    // Initialise the escape action space and the hunting action space
     (P = {p1, p2, …, pn});
    // Initialise the prior probabilities of the escape actions
     (A = {a1, a2, …, am}, D = {d1, d2, …, dk});
    // Initialise the strategy sets
    While (aj ∈ A, dh ∈ D)  // Calculate the proceeds
    {
     Bayes′(p̃(θ | a));
    // Calculate the posterior probabilities
     Ua(θi, aj, dh) = SLC(aj) + DC(dh, θi) − AC(aj, θi);
     Ud(aj, dh, θi) = SLC(aj) + AC(aj, θi) − DC(dh) − DSR(θi, aj, dh);
    }
    for (i = 1; i ≤ s; i++)
    // s is the number of stages in the game process
    {
     d*(a) ∈ argmax_d Σ_θ p̃(θ | a) Ud(a, d, θ);
     a*(θ) ∈ argmax_a Ua(a, d*(a), θ);
    // Calculate the optimal hunting and escape strategies
     Bayes′(p̃(θ | a));
    // Use Bayes' rule to update the posterior probability of the escape action
     Create (d*(a), a*(θ), p̃(θ | a));
    // Construct the refined Bayes' equilibrium solution EQ
     Output (d*(a));
    // Output the optimal hunting strategy in this stage
    }
    End

The runaway's action is first assigned a prior probability, which can be obtained from historical experience or assigned uniformly [10]. The manchaser adjusts his tracking strategy by observing the escape behaviour and continually correcting his judgement of the runaway's position. The game proceeds as follows:

An action Ei is selected from the action space Ea of the runaway Na with a certain probability, where Ei ∈ Ea. The runaway Na knows Ei while the manchaser Nd does not, but the manchaser Nd has an inference about Ei, that is, the manchaser knows the prior probability of the runaway's action.

After observing Ei, the runaway Na selects an escape strategy aj from his strategy space A.

The manchaser Nd observes the escape strategy aj, applies Bayes' rule to update the prior probability to a posterior probability, and selects a hunting strategy dh from his strategy space D.

The proceeds of the runaway and the manchaser are then calculated from Ua and Ud, respectively. The game ends in two cases [11, 12]: the manchaser's hunting strategy completely blocks every escape path, or the runaway has successfully escaped.

Refined Bayes’ equilibrium solution

The calculation steps of the refined Bayes’ equilibrium solution are as follows:

The manchaser establishes a posterior probability inference $\tilde{p}(\theta_i \mid a_j)$ on each meshed location information set.

The manchaser deduces the optimal hunting strategy d*(aj). When the manchaser observes the escape strategy a of the runaway, the optimal hunting strategy d*(a) is selected based on the posterior judgement $\tilde{p}(\theta \mid a)$ of the runaway's action θ so as to maximise his expected game proceeds; that is, it is calculated through $d^*(a) \in \arg\max_d \sum_{\theta} \tilde{p}(\theta \mid a)\, U_d(a, d, \theta)$ to obtain the optimal strategy d*(a).

The runaway infers the optimal escape strategy a*(θ). The runaway predicts that the manchaser will choose the optimal strategy d*(a) after observing his escape strategy a, so he chooses the optimal escape strategy a*(θ) to maximise his expected proceeds Ua; that is, it is calculated by $a^*(\theta) \in \arg\max_a U_a(a, d^*(a), \theta)$ to obtain the optimal strategy a*(θ).

Refined Bayes’ equilibrium solution $(a^*(\theta), d^*(a), \tilde{p}(\theta_i \mid a_w))$.

The subgame refined Nash equilibrium (a*(θ), d*(a)) obtained in steps (2) and (3) is used to derive the inference $\tilde{p}^* = \tilde{p}^*(\theta \mid a)$ of the runaway's location information that satisfies Bayes' rule. If $\tilde{p}^*(\theta \mid a)$ does not conflict with the posterior inference $\tilde{P}$ established above, the refined Bayes' equilibrium is solved as:
$$EQ = (a^*(\theta), d^*(a), \tilde{p}^*(\theta \mid a))$$

According to game theory [13], the equilibrium strategy is the optimal choice for both sides, so the manchaser takes d*(a) as the optimal hunting strategy.
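The posterior update and the two best responses in steps (1)–(3) can be illustrated with a small numerical sketch; the payoff tables, the behaviour model p(a | θ), and all dimensions below are placeholder assumptions, not values from the paper.

import numpy as np

n_theta, n_a, n_d = 3, 2, 2                      # runaway types, escape strategies, hunting strategies
prior = np.full(n_theta, 1.0 / n_theta)          # p(theta): historical experience or uniform
p_a_given_theta = np.array([[0.7, 0.3],          # assumed behaviour model p(a | theta)
                            [0.4, 0.6],
                            [0.2, 0.8]])
U_d = np.random.rand(n_a, n_d, n_theta)          # manchaser payoff Ud(a, d, theta)
U_a = np.random.rand(n_a, n_d, n_theta)          # runaway payoff Ua(a, d, theta)

def posterior(a):
    """Bayes' rule: p~(theta | a) is proportional to p(a | theta) p(theta)."""
    joint = p_a_given_theta[:, a] * prior
    return joint / joint.sum()

def best_hunting(a):
    """d*(a) = argmax_d sum_theta p~(theta | a) Ud(a, d, theta)."""
    return int(np.argmax(U_d[a] @ posterior(a)))

def best_escape(theta):
    """a*(theta) = argmax_a Ua(a, d*(a), theta), anticipating the manchaser's reply."""
    return max(range(n_a), key=lambda a: U_a[a, best_hunting(a), theta])

equilibrium = {a: best_hunting(a) for a in range(n_a)}   # hunting response to every observed escape strategy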

Optimal path modelling of campus security resources

The shortest path is the path from the starting point to the end point with the smallest sum of edge weights. However, after an emergency occurs on campus, besides requiring the shortest path for the demanded material to reach the demand point, restrictions such as the effective delivery time of the materials are added. Taken together, the ‘optimal path’ for security resource scheduling means that when a security event happens at some point on campus, the supplies are delivered along the shortest path within the required time window, so as to reduce unnecessary losses generated during delivery. Since campus materials are stored at more than one site, the in-campus ‘optimal path’ for security material scheduling involves multiple material points. It is therefore necessary to adapt the existing refined Dijkstra algorithm [14] to the actual situation of the campus:
$$d_1 = 0$$
$$d_j = \min_{i \ne j,\,(v_i, v_j) \in E} \{d_i + l_{ij} w_{ij}\} = \min_{i \ne j,\,(v_i, v_j) \in E} \left\{ d_i + \frac{t_{ij}}{k_{ij}} w_{ij} \right\}, \quad j = 2, 3, \ldots, n$$
$$d_j < d_{\max}$$

Here vi is a node in the mapped network, E is the set of edges from vi to vj, lijwij is the effective path length, dj is the optimal path length to node vj, dmax is the longest admissible path, tij is the time-length influence factor, and kij is the passage efficiency. One of the multiple storage points in the map is selected to mobilise supplies for the attacked point, and the optimal path planning platform is developed on this basis (see Subsection 3). A minimal sketch of this constrained search is given below.
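The following is a minimal sketch of the deadline-constrained Dijkstra search over effective edge lengths; the adjacency-list format and the function name are assumptions.

import heapq

def dijkstra_effective(adj, source, d_max):
    """Dijkstra on effective lengths (t_ij / k_ij) * w_ij, discarding labels beyond d_max.
    adj[u] is a list of (v, t_uv, k_uv, w_uv) tuples."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist.get(u, float("inf")):
            continue                                  # stale heap entry
        for v, t_uv, k_uv, w_uv in adj.get(u, []):
            d_v = d_u + (t_uv / k_uv) * w_uv
            if d_v < d_max and d_v < dist.get(v, float("inf")):
                dist[v] = d_v
                heapq.heappush(heap, (d_v, v))
    return dist                                       # feasible effective distances from the source

# With several storage points, the search can be run from each one and the best result kept, e.g.
# min((dijkstra_effective(adj, s, d_max).get(target, float("inf")), s) for s in storage_points)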

Results and discussion
Abnormal behaviour recognition

We first label each class of behaviour to distinguish between different behaviours and split the dataset into 75% for training and 25% for testing. After training for 250 epochs, the model accuracy and loss values are nearly stationary and all model parameters have approximately converged. Figure 3 shows the changes in accuracy and loss during training, with the network loss settling around 0.1 after 250 epochs.

Fig. 3

Loss and accuracy changes during training

To verify the advantages of the proposed network, it is compared with several other network structures. The method of Karpthy et al. [15] achieved an average accuracy of 79.9% on the CASIA dataset, and the method of Koesdwiady et al. [16] obtained 84.2% on the same dataset. Wang and Gao [17] proposed a dual-stream convolutional neural network with spatiotemporal fusion, which achieved 88.3% accuracy on CASIA. Our proposed method achieves an accuracy of 89.6% on the CASIA dataset, as detailed in Table 1.

Table 1. Accuracy comparison of several methods

Method used                  Accuracy (%)
Karpthy et al. [15]          79.9
Koesdwiady et al. [16]       84.2
Wang and Gao [17]            88.3
The method in this paper     89.6

To verify the effect of the added attention mechanism, we compare against the 3DCNN network without the attention mechanism and also compare insertion positions: adding the attention mechanism after each of the five convolution stages versus adding it only after the last convolution layer. The experimental results are shown in Table 2. Adding the attention mechanism improves classification, but an excessive number of attention layers worsens the generalisation performance of the model.

Table 2. Effect comparison of the attention mechanism

Number of attention mechanism layers    Accuracy (%)
No increase                             85.2
5                                       87.6
1                                       89.6

The game experiment and simulation

In this paper, we demonstrate the improved algorithm by introducing game ideas into existing algorithms and building our own hunting game test environment. We first improve the DQN family of algorithms and compare them with the originals; the experience replay method adopts prioritised experience replay (PER). In subsequent supplementary experiments, we improve and compare policy gradient algorithms to further demonstrate the reliability and superiority of the algorithm.

Experimental environment

In the experimental environment, n manchasers must cooperate to chase a runaway in a 2-dimensional discrete k×k grid. We assume the grid is bounded by walls: if moving in a certain direction would cause a manchaser or the runaway to hit a wall, the movement fails and the agent remains in the same cell. The environmental state includes the positions of all manchasers d1t, d2t,…, dnt and the location of the runaway gt. The environment is episodic: for each episode, the manchasers' initial locations d10, d20,…, dn0 and the runaway's initial position g0 are uniformly sampled so that no two locations coincide. Each episode is composed of many time steps, and at each time step a manchaser performs one of five possible actions: up, down, left, right, or static. If more than one manchaser moves into the same cell, the movement fails and those manchasers remain in their own cells. For the reward function, we consider the following settings (a minimal sketch of the environment and reward follows this list):

Sparse team reward: At each time step, all manchasers are punished by 1 for delaying capturing the runaway, and zero punishment if successful.

Full information team reward: At each time step, all manchasers are punished by $\frac{1}{n}\sum_{a=1}^{n} \|d_{at} - g_t\|_1$, where $\|d_{at} - g_t\|_1$ is the distance from manchaser a at time step t to the runaway.

Joint team and personal reward: At each time step, manchaser a receives two punishments, the team punishment $\frac{1}{n}\sum_{a=1}^{n} \|d_{at} - g_t\|_1$ and the personal punishment $\|d_{at} - g_t\|_1$.
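A minimal sketch of the pursuit grid and the full-information team reward described above; the coordinate convention, the collision handling details, and the class name are assumptions.

import numpy as np

MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1), 4: (0, 0)}  # up, down, left, right, static

class GridPursuit:
    """k x k pursuit grid: walls block moves, colliding manchasers stay put,
    and the reward is the negative mean L1 distance to the runaway."""
    def __init__(self, k=10, n_chasers=3, rng=None):
        self.k, self.n = k, n_chasers
        self.rng = rng or np.random.default_rng()
        cells = self.rng.choice(k * k, size=n_chasers + 1, replace=False)  # distinct start cells
        self.chasers = [np.array(divmod(int(c), k)) for c in cells[:-1]]
        self.runaway = np.array(divmod(int(cells[-1]), k))

    def _move(self, pos, action):
        new = pos + np.array(MOVES[action])
        return pos if (new < 0).any() or (new >= self.k).any() else new   # hitting the wall: stay put

    def step(self, chaser_actions, runaway_action):
        targets = [self._move(p, a) for p, a in zip(self.chasers, chaser_actions)]
        for i, t in enumerate(targets):
            # The move fails if more than one manchaser targets the same cell.
            if sum(np.array_equal(t, u) for u in targets) == 1:
                self.chasers[i] = t
        self.runaway = self._move(self.runaway, runaway_action)
        dists = [int(np.abs(p - self.runaway).sum()) for p in self.chasers]
        reward = -float(np.mean(dists))               # full-information team punishment
        done = any(d == 0 for d in dists)             # a manchaser reaches the runaway
        return reward, done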

Experimental hyperparameter adjustment

The combinatorial space of the hyperparameters is too large for an exhaustive search. Because of the high computational cost, we do not perform a systematic grid search and only conduct informal searches. The exploration rate is linearly annealed from 1.0 to 0.01 during the exploration phase (100,000 time steps) and fixed at 0.01 thereafter. For the first environment, we train each agent's network for a total of 50,000 episodes, while for the second environment each network is trained for 20,000 episodes. We use a replay memory of the 10,000 most recent time steps. The RMSProp optimiser uses a learning rate of 0.00005. Furthermore, we use a discount factor of 0.99 and perform a learning update every 4 time steps using a minibatch of 32 transitions. For replay prioritisation, we use a priority exponent ω of 0.5 and linearly increase the importance sampling (IS) exponent from 0.4 to 1 during the exploration phase (100,000 time steps). The specific parameters of the experiment are summarised below.
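Collected into a single configuration for reference, with key names chosen here and values not mentioned in the text omitted:

CONFIG = {
    "epsilon_start": 1.0,        # exploration rate, linearly annealed during exploration
    "epsilon_end": 0.01,
    "exploration_steps": 100_000,
    "episodes_env1": 50_000,     # training episodes for the first environment
    "episodes_env2": 20_000,     # training episodes for the second environment
    "replay_capacity": 10_000,   # most recent time steps kept in replay memory
    "optimizer": "RMSProp",
    "learning_rate": 5e-5,
    "discount_gamma": 0.99,
    "update_every": 4,           # learning update every 4 time steps
    "batch_size": 32,            # minibatch of transitions per update
    "per_priority_omega": 0.5,   # prioritised replay exponent
    "per_is_beta_start": 0.4,    # importance-sampling exponent, annealed to 1.0
    "per_is_beta_end": 1.0,
}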

Interpretation of the experimental results

During training, the manchaser and the runaway choose among five movements: up, down, left, right, or static. The algorithm is trained for 500 runs, and the cumulative return of each run is recorded. As can be seen from Figure 4, the algorithm starts to converge after about 100 training runs.

Fig. 4

Convergence trend of the reward algorithm

By analysing the attack and defence game model and the proceeds of the algorithm, a general law of its hunting strategy can be obtained: the accuracy with which the escape strategy is detected has a significant impact on the trend of the hunting game and on the equilibrium solution. When the manchaser may misjudge the escape strategy, introducing the minimum-risk strategy benefit brings the model closer to the actual hunting situation, so that the algorithm can choose a better hunting strategy. Therefore, the proposed algorithmic strategy should be deployed together with the hunting detection strategy to achieve the optimal hunting effect.

Campus security resource scheduling path planning

Based on the results in Section 4.2, the traffic topology network of the Xinjiang Normal University campus is planned, and the application scenario and practical value of the planned path are studied.

Scene description

Among the various safety issues studied, the campus, as a densely populated setting, has typical risk-amplification and social-response characteristics, which give it very important safety significance. Taking Xinjiang Normal University as an example, the campus map is shown in Figure 5a). MATLAB is used to abstract important sites as numbered nodes and the roads between them as edges. The resulting campus network map is shown in Figure 5b) and consists of 44 nodes and 64 edges.

Fig. 5

(a) Campus map. (b) Campus network

Based on the experimental content in Section 4.2, the planning platform sets the initial escape position of the attacker, g0 (point 39 in Figure 5b), as the attack point requiring a certain amount of security resources, and the material storage sites in the real campus (points 2, 22, and 31 in Figure 5b) as the starting points of the path planning. The simulation results are shown in Figure 6 (the bold blue line is the obtained optimal path):

Fig. 6

Results of the path simulation experiment

Conclusion

To meet the demands of constructing an intelligent campus and to improve campus security management, we target campus security events in three stages, namely abnormal behaviour identification, tracking of marked abnormal behaviour, and path planning for scheduling security resources to the attacked point after a safety event occurs, which helps college and university campuses reduce human resource costs as far as possible. This paper applies advanced technologies, fuses them with game-theoretic ideas, designs and develops a path planning platform, and combines it with the real scene of Xinjiang Normal University for simulation.

Some limitations remain because the model was simplified to ease its solution. First, the platform designed around a real campus scene has limited generality across scenarios. Second, the models and methods in this paper still have room for improvement. Future research can therefore aim to break these bottlenecks, incorporate the budget into the model, and further relax the assumption of sufficient facility capacity, so as to better fit real situations.
