This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.