A Baseline for Violence Behavior Detection in Complex Surveillance Scenarios

Long, Yingying; Wang, Zongxin; Wei, Hanzhu; Bai, Xiaojun

Open access


International Journal of Advanced Network, Monitoring and Controls

Volume 9 (2024): Issue 4 (December 2024)


Publication date: 31 December 2024

Pages: 48-58

DOI: https://doi.org/10.2478/ijanmc-2024-0036

Keywords
Violent Behavior Detection, Datasets, Spatio-temporal Feature, Target Detection, Feature Fusion

© 2024 Yingying Long et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Illustration of a sample of labeled acts of violence

The framework of the violence detection algorithm is shown in Fig. 2. It consists mainly of a spatio-temporal feature extraction model and a spatio-temporal feature fusion module. The spatio-temporal feature extraction model is made up of a temporal feature extraction model and a spatial feature extraction model. The temporal feature extraction model uses the I3D network, shown in (a). Panels (b) and (c) show the 3D-CBAM attention mechanism and the 3D Inception (3D Inc) module, respectively. The spatial feature extraction model uses the CSPDarkNet-Tiny network, with an Atrous Spatial Pyramid Pooling (ASPP) module appended at its end, as shown in (d), where "rate" denotes the dilation rate of the atrous (dilated) convolution. The ASPP module has five branches: one ordinary convolution branch, three atrous convolution branches and one global average pooling branch. Panel (e) shows the overall structure of the Channel Fusion and Attention Mechanism (CFAM), where D is the final output feature map of CFAM, and C1 and C2 are the numbers of output channels of the I3D network and the ASPP module, respectively.
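The text gives the inputs to CFAM (C1 channels from I3D and C2 channels from ASPP) but does not describe its internals here. The sketch below is a minimal version that assumes CFAM follows the channel fusion and attention mechanism popularized by YOWO. The two feature maps are concatenated along the channel axis. Each channel is then reweighted using a row-wise softmax over the Gram matrix F F^T, and a residual connection is added. Several simplifications are also assumptions:

* Both maps already share the same spatial size and are flattened to plain Python lists.
* The surrounding convolution layers that produce D are omitted.
* `alpha`, a learnable scale in YOWO, is treated here as a plain argument.

All helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = channels, columns = flattened spatial positions


def concat_channels(a: Matrix, b: Matrix) -> Matrix:
    """Stack two feature maps of equal spatial size along the channel axis (C1 + C2 rows)."""
    if a and b and len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have the same spatial size")
    return [list(row) for row in a] + [list(row) for row in b]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gram_channel_attention(f: Matrix, alpha: float) -> Matrix:
    """Return alpha * (M F) + F, where M is the row-wise softmax of the Gram matrix F F^T."""
    c, n = len(f), len(f[0])
    # G[i][j]: inner product of channel i and channel j over all positions
    gram = [[sum(x * y for x, y in zip(f[i], f[j])) for j in range(c)] for i in range(c)]
    attn = [softmax(row) for row in gram]
    # M F: every output channel is an attention-weighted mix of all input channels
    mixed = [[sum(attn[i][k] * f[k][p] for k in range(c)) for p in range(n)] for i in range(c)]
    # residual connection: the original features are returned unchanged when alpha = 0
    return [[alpha * mixed[i][p] + f[i][p] for p in range(n)] for i in range(c)]


# Usage: C1 = 1 channel from the I3D branch, C2 = 2 channels from the ASPP branch, 4 positions
i3d_feat = [[1.0, 0.0, 2.0, 1.0]]
aspp_feat = [[0.5, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 1.0]]
fused = gram_channel_attention(concat_channels(i3d_feat, aspp_feat), alpha=0.5)
print(len(fused), len(fused[0]))  # 3 4
```

The fused map has C1 + C2 channels. Because of the residual term, the attention can only add a reweighted mix on top of the concatenated features. It cannot discard them outright.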

CSPDarkNet-Tiny Network Overall Structure
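The ASPP module appended to CSPDarkNet-Tiny relies on atrous (dilated) convolution. A k-tap kernel with dilation rate r places its taps r positions apart, so it covers k + (k - 1)(r - 1) input positions per axis without adding weights. The sketch below shows a 1D atrous convolution and computes the per-axis span of each branch in a five-branch ASPP layout. The branch rates used here are an assumption: a 1x1 ordinary branch plus 3x3 atrous branches at rates 6, 12 and 18, which are DeepLabv3's defaults. The paper's actual rates appear only in its figure. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def effective_kernel_size(k: int, rate: int) -> int:
    """Input span covered by a k-tap kernel whose taps are `rate` positions apart."""
    return k + (k - 1) * (rate - 1)


def atrous_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """'Valid' 1D atrous convolution, written as cross-correlation like deep learning frameworks."""
    span = effective_kernel_size(len(w), rate)
    return [sum(w[j] * x[i + j * rate] for j in range(len(w)))
            for i in range(len(x) - span + 1)]


def aspp_spans(rates: Sequence[int] = (6, 12, 18)) -> Dict[str, Optional[int]]:
    """Per-axis span of each branch in a hypothetical five-branch ASPP; None means the whole map."""
    spans: Dict[str, Optional[int]] = {"conv1x1": effective_kernel_size(1, 1)}
    for r in rates:
        spans[f"atrous3x3_rate{r}"] = effective_kernel_size(3, r)
    spans["global_avg_pool"] = None  # pooling summarizes the entire feature map
    return spans


print(atrous_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], rate=2))  # [9, 12, 15]
print(aspp_spans())
# {'conv1x1': 1, 'atrous3x3_rate6': 13, 'atrous3x3_rate12': 25,
#  'atrous3x3_rate18': 37, 'global_avg_pool': None}
```

Each 3x3 atrous branch keeps only nine weights per input-output channel pair. With the assumed rates, those weights span 13, 25 and 37 positions per axis. This range of spans is what lets the parallel branches capture context at several scales.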

Detection results of the 3D-CBAM attention module embedded at different locations

Network	Embedding position	UCF101-24 mAP	JHMDB mAP	VioData mAP
I3D	None	84.4%	80.4%	86.5%
I3D	3D Inc_1	86.1%	83.7%	89.0%
I3D	3D Inc_2	86.7%	83.3%	88.3%
I3D	3D Inc_3	85.9%	84.2%	89.6%
I3D	3D Inc_1 + 3D Inc_2	88.2%	87.5%	90.7%
I3D	3D Inc_1 + 3D Inc_3	89.8%	88.6%	91.8%
I3D	3D Inc_2 + 3D Inc_3	88.0%	88.0%	91.4%
I3D	3D Inc_1 + 3D Inc_2 + 3D Inc_3	90.0%	88.7%	92.0%
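The table varies where the 3D-CBAM module is inserted into I3D. As a rough illustration of what such a module computes, the sketch below implements the channel-attention half of CBAM extended to 3D:

1. Each channel of a C x T x H x W feature volume is summarized by average pooling and max pooling over time and space.
2. Both summaries pass through a shared bottleneck MLP.
3. The sigmoid of their sum rescales that channel.

The weights are hypothetical placeholders, and each channel is flattened to one list. The spatial-attention half (pooling across channels followed by a 3D convolution) is omitted. None of this is taken from the paper's code.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w: List[Vector], v: Vector) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def shared_mlp(v: Vector, w1: List[Vector], w2: List[Vector]) -> Vector:
    """Bottleneck MLP used for both pooled descriptors: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, h) for h in matvec(w1, v)]
    return matvec(w2, hidden)


def channel_attention_3d(volume: List[Vector], w1: List[Vector], w2: List[Vector]) -> List[Vector]:
    """Rescale each channel; volume[c] holds the flattened T*H*W activations of channel c."""
    avg = [sum(ch) / len(ch) for ch in volume]  # average pooling over T, H and W
    mx = [max(ch) for ch in volume]             # max pooling over T, H and W
    logits = [a + m for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    weights = [sigmoid(z) for z in logits]      # one attention weight in (0, 1) per channel
    return [[weights[c] * x for x in ch] for c, ch in enumerate(volume)]


# Usage: C = 2 channels, T = H = W = 2, reduction ratio r = 2 (hypothetical weights)
volume = [[1.0] * 8, [0.0] * 8]
w1 = [[1.0, -1.0]]      # shape C/r x C
w2 = [[1.0], [-1.0]]    # shape C x C/r
out = channel_attention_3d(volume, w1, w2)
print(round(out[0][0], 4))  # 0.8808
```

The output keeps the input's shape and only changes the relative weight of each channel. This is why the module can be dropped in after any of the 3D Inc blocks compared in the table.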

Parameter settings for network training

Parameter	Setting
Initial learning rate	0.001
Epochs	230
Input size (resize)	(416, 416)
Weight decay	0.0005
Optimizer	Adam
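The training table lists Adam with an initial learning rate of 0.001 and a weight decay of 0.0005. The paper does not say how the decay is applied. The sketch below assumes the coupled L2 form used by PyTorch's `torch.optim.Adam(weight_decay=...)`, where the decay term is added to the gradient before the moment estimates are updated. It shows a single step for one scalar parameter. The betas and epsilon are Adam's customary defaults, not values reported in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    m: float = 0.0  # first-moment (mean) estimate
    v: float = 0.0  # second-moment (uncentered variance) estimate
    t: int = 0      # number of steps taken so far


def adam_step(param: float, grad: float, state: AdamState,
              lr: float = 1e-3, weight_decay: float = 5e-4,
              beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> float:
    """One Adam update with coupled L2 weight decay (decay added to the gradient)."""
    g = grad + weight_decay * param
    state.t += 1
    state.m = beta1 * state.m + (1 - beta1) * g
    state.v = beta2 * state.v + (1 - beta2) * g * g
    m_hat = state.m / (1 - beta1 ** state.t)  # bias correction for the zero initialization
    v_hat = state.v / (1 - beta2 ** state.t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = AdamState()
print(round(adam_step(1.0, 0.2, state), 6))  # 0.999
```

On the first step, the bias-corrected moments reduce to g and g squared. The parameter therefore moves by almost exactly the learning rate, whatever the gradient's magnitude. This is one reason Adam's initial learning rate is usually read as a per-step size.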

Violence detection accuracy of different models

Method	UCF101-24 mAP	JHMDB mAP	VioData mAP
MPS	82.4%	-	85.3%
P3D-CTN	-	84.0%	84.9%
STEP	83.1%	-	86.4%
YOWO	82.5%	85.7%	88.0%
Ours	89.8%	88.6%	91.8%
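Not every method reports results on every dataset (the "-" entries). Comparing the proposed model with the strongest prior method on each dataset therefore has to skip the missing values. The short sketch below does this using the numbers copied from the table. It only re-derives the margins and adds no new results.

```python
from typing import Dict, Optional

# mAP (%) copied from the table; None marks a "-" entry (not reported)
RESULTS: Dict[str, Dict[str, Optional[float]]] = {
    "MPS":     {"UCF101-24": 82.4, "JHMDB": None, "VioData": 85.3},
    "P3D-CTN": {"UCF101-24": None, "JHMDB": 84.0, "VioData": 84.9},
    "STEP":    {"UCF101-24": 83.1, "JHMDB": None, "VioData": 86.4},
    "YOWO":    {"UCF101-24": 82.5, "JHMDB": 85.7, "VioData": 88.0},
    "Ours":    {"UCF101-24": 89.8, "JHMDB": 88.6, "VioData": 91.8},
}


def margin_over_best(results: Dict[str, Dict[str, Optional[float]]],
                     ours: str = "Ours") -> Dict[str, float]:
    """mAP gap in percentage points between `ours` and the best other method per dataset."""
    margins = {}
    for dataset, score in results[ours].items():
        others = [r[dataset] for name, r in results.items()
                  if name != ours and r[dataset] is not None]
        margins[dataset] = round(score - max(others), 1)
    return margins


print(margin_over_best(RESULTS))  # {'UCF101-24': 6.7, 'JHMDB': 2.9, 'VioData': 3.8}
```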

Detection results with the embedded ASPP module and the introduced spatio-temporal depthwise separable convolution

Network	UCF101-24 mAP	JHMDB mAP	VioData mAP
Baseline	78.5%	75.3%	78.9%
CSPDarkNet-Tiny + ASPP	80.7%	76.6%	82.0%
CSPDarkNet-Tiny + ASPP + I3D (improved 3D Inc)	84.8%	80.4%	86.5%

Language: English

Publication frequency: 4 issues per year
Journal subjects: Computer Science, Computer Science, other


