A Baseline for Violence Behavior Detection in Complex Surveillance Scenarios

Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Computer Sciences, other

Yingying Long

Zongxin Wang

Hanzhu Wei

Xiaojun Bai

Published Online: Dec 31, 2024

Page range: 48 - 58

DOI: https://doi.org/10.2478/ijanmc-2024-0036

Keywords: Violent Behavior Detection, Datasets, Spatio-temporal Feature, Target Detection, Feature Fusion

© 2024 Yingying Long et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
