Using Text and Visual Cues for Fine-Grained Classification

eISSN: 2470-8038
Language: English

Publication frequency: 4 times per year
Journal subjects: Computer Sciences, other

Published online: 22 Feb. 2021

Pages: 42 - 49

DOI: https://doi.org/10.21307/ijanmc-2021-026

Keywords: Scene Text, Product Text, Fine-Grained Classification, Convolution Neural Network, Attention, Product Search

© 2021 Zaryab Shaker et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
