Zitieren

Text is an important invention of humanity, which plays a key role in human life, so far from dark ages. Text in image is closely related to the scene or a product and is widely used in vision based application. In this paper we are addressing the problem of visual understanding with text. The main focus is combining textual cues and visual cues in deep neural network. First the text is recognized and classified from the image. Then we combine the attended word embedding and visual feature vector which are then optimized by CNN for Fine-grained image classification. We carried out the experiments on soft drink dataset in Pakistan. The results shows that the system achieves significant performance which can be potentially beneficial for real world application e.g. product search.

eISSN:
2470-8038
Sprache:
Englisch
Zeitrahmen der Veröffentlichung:
4 Hefte pro Jahr
Fachgebiete der Zeitschrift:
Informatik, andere