This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Ali, A., & Renals, S. (2018). Word Error Rate Estimation for Speech Recognition: e-WER. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). https://doi.org/10.18653/v1/p18-2004Search in Google Scholar
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.279Search in Google Scholar
Bagić Babac, M. (2023). Emotion analysis of user reactions to online news. Information Discovery and Delivery, 51(2), 179-193. https://doi.org/10.1108/IDD-04-2022-0027Search in Google Scholar
Bhatnagar, V., Sharma, S., Bhatnagar, A., & Kumar, L. (2021). Role of Machine Learning in Sustainable Engineering: A Review. IOP Conference Series: Materials Science and Engineering, 1099(1), 012036. https://doi.org/10.1088/1757-899x/1099/1/012036Search in Google Scholar
Bodnar, C. (2018). Text to image synthesis using generative adversarial networks. Available at: arXiv preprint arXiv:1805.00676.Search in Google Scholar
Čemeljić, H., & Bagić Babac, M. (2023). Preventing Security Incidents on Social Networks: An Analysis of Harmful Content Dissemination Through Applications. Police and Security (in press)Search in Google Scholar
Clark, A., Prosser, J., & Wiles, R. (2010). Ethical Issues in Image-Based Research, Arts & Health, 2(1), 81-93. doi: 10.1080/17533010903495298Search in Google Scholar
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). “Generative adversarial networks: An overview”. IEEE signal processing magazine, 35(1), 53-65.Search in Google Scholar
Cvitanović, I., & Bagić Babac, M. (2022). Deep Learning with Self-Attention Mechanism for Fake News Detection. In Lahby, M., Pathan, A.S.K., Maleh, Y., Yafooz, W.M.S. (Eds.), Combating Fake News with Computational Intelligence Techniques (pp. 205-229). Springer, Switzerland.Search in Google Scholar
Dunđer, I., Seljan, S. & Pavlovski, M. (2021), “What Makes Machine-Translated Poetry Look Bad? A Human Error Classification Analysis.”, Central European conference on information and intelligent systems, Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, pp.183 - 191.Search in Google Scholar
Dunđer, I., Seljan, S. & Pavlovski, M. (2020), "Automatic Machine Translation of Poetry and a Low-Resource Language Pair," 43rd International Convention on Information, Communication and Electronic Technology (MIPRO 2020), Opatija, Croatia, pp. 1034-1039, doi: 10.23919/MIPRO48935.2020.9245342.Search in Google Scholar
Elasri, M., Elharrouss, O., Al-Maadeed, S., & Tairi, H. (2022). Image Generation: A Review. Neural Processing Letters, 54(5), 4609-4646. https://doi.org/10.1007/s11063-022-10777-xSearch in Google Scholar
Girshick, R. (2015). Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.169Search in Google Scholar
Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K. R., & Samek, W. (2022). xxAI-Beyond Explainable Artificial Intelligence. In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (pp. 15-47). Springer, Cham.Search in Google Scholar
Hu, Y., Liu, B., Kasai, J., Wang, Y., Ostendorf, M., Krishna, R., & Smith, N. A. (2023). Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering. Available at: arXiv preprint arXiv:2303.11897.Search in Google Scholar
Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., & Belongie, S. (2017). Stacked Generative Adversarial Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).https://doi.org/10.1109/cvpr.2017.202Search in Google Scholar
Ivasic-Kos, M. (2022). Application of Digital Images and Corresponding Image Retrieval Paradigm. ENTRENOVA - ENTerprise REsearch InNOVAtion, 8(1), 350-363. https://doi.org/10.54820/entrenova-2022-0030Search in Google Scholar
Jamwal, A., Agrawal, R., & Sharma, M. (2022). Deep learning for manufacturing sustainability: Models, applications in Industry 4.0 and implications. International Journal of Information Management Data Insights, 2(2), 100107. https://doi.org/10.1016/j.jjimei.2022.100107Search in Google Scholar
Jurafsky, D., & Martin, J.H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, Upper Saddle River, NJ.Search in Google Scholar
Karimian, G., Petelos, E., & Evers, S. M. A. A. (2022). The ethical issues of the application of artificial intelligence in healthcare: a systematic scoping review. AI and Ethics, 2(4), 539-551. https://doi.org/10.1007/s43681-021-00131-7Search in Google Scholar
Karras, T., Laine, S., Aila, T. & Hellsten, J. (2020). Training generative adversarial networks with limited data. Proceedings of the International Conference on Learning Representations. Advances in Neural Information Processing Systems, 33 (NeurIPS 2020)Search in Google Scholar
Krivosheev, N., Vik, K., Ivanova, Y., & Spitsyn, V. (2021). Investigation of the Batch Size Influence on the Quality of Text Generation by the SeqGAN Neural Network. Proceedings of the 31th International Conference on Computer Graphics and Vision. Volume 2. https://doi.org/10.20948/graphicon-2021-3027-1005-1010Search in Google Scholar
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90. https://doi.org/10.1145/3065386Search in Google Scholar
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791Search in Google Scholar
Li, F., Ruijs, N., & Lu, Y. (2022). Ethics & AI: A Systematic Review on Ethical Concerns and Related Strategies for Designing with AI in Healthcare. AI, 4(1), 28-53. https://doi.org/10.3390/ai4010003Search in Google Scholar
Lin, C.-Y. (2004). ROUGE: a Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 – 26.Search in Google Scholar
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, 740-755. https://doi.org/10.1007/978-3-319-10602-1_48Search in Google Scholar
Lipovac, I., Bagić Babac, M. (2023), Developing a Data Pipeline Solution for Big Data Processing, International Journal of Data Mining, Modelling and Management. Accepted for publication.Search in Google Scholar
Lu, J., Xu, H., Yang, J., & Huang, Q. (2018). Neural baby talk. Proceedings of the European Conference on Computer Vision (pp. 721-736).Search in Google Scholar
Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z. & Smolley, P. (2017). Least squares generative adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2794-2802).Search in Google Scholar
Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I. & Chen, M. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. Available at: arXiv preprint arXiv:2112.10741.Search in Google Scholar
Oliveira dos Santos, G., Colombini, E. L., & Avila, S. (2021). CIDEr-R: Robust Consensus-based Image Description Evaluation. Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021). https://doi.org/10.18653/v1/2021.wnut-1.39Search in Google Scholar
Papineni, K., Roukos, S., Ward, T. & Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. ACL-2002: 40th Annual meeting of the Association for Computational Linguistics. pp. 311–318.Search in Google Scholar
Persello, C., Wegner, J. D., Hansch, R., Tuia, D., Ghamisi, P., Koeva, M., & Camps-Valls, G. (2022). Deep Learning and Earth Observation to Support the Sustainable Development Goals: Current approaches, open challenges, and future opportunities. IEEE Geoscience and Remote Sensing Magazine, 10(2), 172-200. https://doi.org/10.1109/mgrs.2021.3136100Search in Google Scholar
Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., & Rombach, R. (2023). SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv preprint arXiv:2307.01952.Search in Google Scholar
Puh, K., Bagić Babac, M. (2023a). Predicting sentiment and rating of tourist reviews using machine learning, Journal of Hospitality and Tourism Insights, 6(3), 1188-1204. https://doi.org/10.1108/JHTI-02-2022-0078Search in Google Scholar
Puh, K., & Bagić Babac, M. (2023b). Predicting stock market using natural language processing. American Journal of Business, 38(2), 41-61. https://doi.org/10.1108/ajb-08-2022-0124Search in Google Scholar
Qiao, T., Zhang, J., Xu, D., & Tao, D. (2019). MirrorGAN: Learning Text-To-Image Generation by Redescription. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).https://doi.org/10.1109/cvpr.2019.00160Search in Google Scholar
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. Available at: https://arxiv.org/abs/2204.06125Search in Google Scholar
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. In International Conference on Machine Learning (pp. 8821-8831). Available at: https://arxiv.org/abs/2102.12092Search in Google Scholar
Reed, S., Akata, Z., Lee, H., & Schiele, B. (2016). Learning Deep Representations of Fine-Grained Visual Descriptions. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).https://doi.org/10.1109/cvpr.2016.13Search in Google Scholar
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149. https://doi.org/10.1109/tpami.2016.2577031Search in Google Scholar
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28Search in Google Scholar
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0Search in Google Scholar
Sah, S., Peri, D., Shringi, A., Zhang, C., Dominguez, M., Savakis, A., & Ptucha, R. (2018). Semantically Invariant Text-to-Image Generation. 2018 25th IEEE International Conference on Image Processing (ICIP).https://doi.org/10.1109/icip.2018.8451656Search in Google Scholar
Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Kamyar, S., Ghasemipour, S., Ayan, B. K., Mahdavi, S. S., Lopes, R. G., Salimans, T., Ho, J., Fleet, D. J., & Norouzi, M. (2022). Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv:2205.11487Search in Google Scholar
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. Available at: https://arxiv.org/abs/1606.03498Search in Google Scholar
Samek, W., Wiegand, T. & Müller, K. R. (2017). Explainable artificial intelligence: Understanding, visualising and interpreting deep learning models. Available at: https://arxiv.org/abs/1708.08296Search in Google Scholar
Šandor, D., & Bagić Babac, M. (2023). Sarcasm detection in online comments using machine learning. Information Discovery and Delivery. https://doi.org/10.1108/idd-01-2023-0002Search in Google Scholar
Szegedy, C., Wei Liu, Yangqing Jia, Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).https://doi.org/10.1109/cvpr.2015.7298594Search in Google Scholar
Tomičić Furjan, M., Tomičić-Pupek, K., & Pihir, I. (2020). Understanding Digital Transformation Initiatives: Case Studies Analysis. Business Systems Research, 11 (1), 125-141. https://doi.org/10.2478/bsrj-2020-0009Search in Google Scholar
Tunmibi, S., & Okhakhu, D. (2022). Machine Learning for Sustainable Development. In Conference proceedings of the First Conference of the National Institute of Office Administrators and Information Managers (NIOAIM) between 7th and 10th February, Lead City University, Ibadan, Oyo State, Nigeria.Search in Google Scholar
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. & Polosukhin, I. (2017). Attention is all you need, In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Curran Associates (pp. 6000-6010). Red Hook, NY, USASearch in Google Scholar
Vinuesa, R., & Sirmacek, B. (2021). Interpretable deep-learning models to help achieve the Sustainable Development Goals. Nature Machine Intelligence, 3(11), 926-926. https://doi.org/10.1038/s42256-021-00414-ySearch in Google Scholar
Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2019). Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251-2265. https://doi.org/10.1109/tpami.2018.2857768Search in Google Scholar
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R. & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048-2057). PMLR.Search in Google Scholar
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2018). AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2018.00143Search in Google Scholar
Yildirim, E.. (2022). Text-to-Image Generation A.I. in Architecture, In (Kozlu Hale, H., 2022). Art and Architecture: Theory, Practice and Experience, Lyon: Livre de Lyon, 97-120.Search in Google Scholar
Zhang, C., Zhang, C., Zhang, M., & Kweon, I. S. (2023). Text-to-image Diffusion Models in Generative AI: A Survey. Available at: https://arxiv.org/abs/2303.07909Search in Google Scholar
Zhang, H., Koh, J. Y., Baldridge, J., Lee, H., & Yang, Y. (2021). Cross-Modal Contrastive Learning for Text-to-Image Generation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).https://doi.org/10.1109/cvpr46437.2021.00089Search in Google Scholar
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. N. (2019). StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1947-1962. https://doi.org/10.1109/tpami.2018.2856256Search in Google Scholar