Spatial–temporal graph neural network based on node attention

eISSN: 2444-8656
Language: English

Publication frequency: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Published online: 08 Apr 2022

Pages: 703–712

Received: 23 May 2021

Accepted: 27 Sep 2021

DOI: https://doi.org/10.2478/amns.2022.1.00005

Keywords: Action recognition, skeletons, spatial–temporal graph convolution, attention mechanism

© 2021 Qiang Li et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
