Open access

Trends and Challenges of Text-to-Image Generation: Sustainability Perspective


Ali, A., & Renals, S. (2018). Word Error Rate Estimation for Speech Recognition: e-WER. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). https://doi.org/10.18653/v1/p18-2004

Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.279

Bagić Babac, M. (2023). Emotion analysis of user reactions to online news. Information Discovery and Delivery, 51(2), 179-193. https://doi.org/10.1108/IDD-04-2022-0027

Bhatnagar, V., Sharma, S., Bhatnagar, A., & Kumar, L. (2021). Role of Machine Learning in Sustainable Engineering: A Review. IOP Conference Series: Materials Science and Engineering, 1099(1), 012036. https://doi.org/10.1088/1757-899x/1099/1/012036

Bodnar, C. (2018). Text to image synthesis using generative adversarial networks. Available at: arXiv preprint arXiv:1805.00676.

Čemeljić, H., & Bagić Babac, M. (2023). Preventing Security Incidents on Social Networks: An Analysis of Harmful Content Dissemination Through Applications. Police and Security (in press).

Clark, A., Prosser, J., & Wiles, R. (2010). Ethical Issues in Image-Based Research. Arts & Health, 2(1), 81-93. https://doi.org/10.1080/17533010903495298

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53-65.

Cvitanović, I., & Bagić Babac, M. (2022). Deep Learning with Self-Attention Mechanism for Fake News Detection. In Lahby, M., Pathan, A.S.K., Maleh, Y., Yafooz, W.M.S. (Eds.), Combating Fake News with Computational Intelligence Techniques (pp. 205-229). Springer, Switzerland.

Dunđer, I., Seljan, S. & Pavlovski, M. (2021). What Makes Machine-Translated Poetry Look Bad? A Human Error Classification Analysis. Central European Conference on Information and Intelligent Systems, Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, pp. 183-191.

Dunđer, I., Seljan, S. & Pavlovski, M. (2020). Automatic Machine Translation of Poetry and a Low-Resource Language Pair. 43rd International Convention on Information, Communication and Electronic Technology (MIPRO 2020), Opatija, Croatia, pp. 1034-1039. https://doi.org/10.23919/MIPRO48935.2020.9245342

Elasri, M., Elharrouss, O., Al-Maadeed, S., & Tairi, H. (2022). Image Generation: A Review. Neural Processing Letters, 54(5), 4609-4646. https://doi.org/10.1007/s11063-022-10777-x

Girshick, R. (2015). Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.169

Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K. R., & Samek, W. (2022). xxAI-Beyond Explainable Artificial Intelligence. In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (pp. 15-47). Springer, Cham.

Hu, Y., Liu, B., Kasai, J., Wang, Y., Ostendorf, M., Krishna, R., & Smith, N. A. (2023). Tifa: Accurate and interpretable text-to-image faithfulness evaluation with question answering. Available at: arXiv preprint arXiv:2303.11897.

Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., & Belongie, S. (2017). Stacked Generative Adversarial Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2017.202

Ivasic-Kos, M. (2022). Application of Digital Images and Corresponding Image Retrieval Paradigm. ENTRENOVA - ENTerprise REsearch InNOVAtion, 8(1), 350-363. https://doi.org/10.54820/entrenova-2022-0030

Jamwal, A., Agrawal, R., & Sharma, M. (2022). Deep learning for manufacturing sustainability: Models, applications in Industry 4.0 and implications. International Journal of Information Management Data Insights, 2(2), 100107. https://doi.org/10.1016/j.jjimei.2022.100107

Jurafsky, D., & Martin, J.H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, Upper Saddle River, NJ.

Karimian, G., Petelos, E., & Evers, S. M. A. A. (2022). The ethical issues of the application of artificial intelligence in healthcare: a systematic scoping review. AI and Ethics, 2(4), 539-551. https://doi.org/10.1007/s43681-021-00131-7

Karras, T., Laine, S., Aila, T. & Hellsten, J. (2020). Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33 (NeurIPS 2020).

Krivosheev, N., Vik, K., Ivanova, Y., & Spitsyn, V. (2021). Investigation of the Batch Size Influence on the Quality of Text Generation by the SeqGAN Neural Network. Proceedings of the 31st International Conference on Computer Graphics and Vision. Volume 2. https://doi.org/10.20948/graphicon-2021-3027-1005-1010

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90. https://doi.org/10.1145/3065386

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791

Li, F., Ruijs, N., & Lu, Y. (2022). Ethics & AI: A Systematic Review on Ethical Concerns and Related Strategies for Designing with AI in Healthcare. AI, 4(1), 28-53. https://doi.org/10.3390/ai4010003

Lin, C.-Y. (2004). ROUGE: a Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25 – 26.

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, 740-755. https://doi.org/10.1007/978-3-319-10602-1_48

Lipovac, I., & Bagić Babac, M. (2023). Developing a Data Pipeline Solution for Big Data Processing. International Journal of Data Mining, Modelling and Management. Accepted for publication.

Lu, J., Xu, H., Yang, J., & Huang, Q. (2018). Neural baby talk. Proceedings of the European Conference on Computer Vision (pp. 721-736).

Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z. & Smolley, S. P. (2017). Least squares generative adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2794-2802).

Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I. & Chen, M. (2021). Glide: Towards photorealistic image generation and editing with text-guided diffusion models. Available at: arXiv preprint arXiv:2112.10741.

Oliveira dos Santos, G., Colombini, E. L., & Avila, S. (2021). CIDEr-R: Robust Consensus-based Image Description Evaluation. Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021). https://doi.org/10.18653/v1/2021.wnut-1.39

Papineni, K., Roukos, S., Ward, T. & Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. ACL-2002: 40th Annual meeting of the Association for Computational Linguistics. pp. 311–318.

Persello, C., Wegner, J. D., Hansch, R., Tuia, D., Ghamisi, P., Koeva, M., & Camps-Valls, G. (2022). Deep Learning and Earth Observation to Support the Sustainable Development Goals: Current approaches, open challenges, and future opportunities. IEEE Geoscience and Remote Sensing Magazine, 10(2), 172-200. https://doi.org/10.1109/mgrs.2021.3136100

Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., & Rombach, R. (2023). SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv preprint arXiv:2307.01952.

Puh, K., & Bagić Babac, M. (2023a). Predicting sentiment and rating of tourist reviews using machine learning. Journal of Hospitality and Tourism Insights, 6(3), 1188-1204. https://doi.org/10.1108/JHTI-02-2022-0078

Puh, K., & Bagić Babac, M. (2023b). Predicting stock market using natural language processing. American Journal of Business, 38(2), 41-61. https://doi.org/10.1108/ajb-08-2022-0124

Qiao, T., Zhang, J., Xu, D., & Tao, D. (2019). MirrorGAN: Learning Text-To-Image Generation by Redescription. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2019.00160

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C. & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. Available at: https://arxiv.org/abs/2204.06125

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. In International Conference on Machine Learning (pp. 8821-8831). Available at: https://arxiv.org/abs/2102.12092

Reed, S., Akata, Z., Lee, H., & Schiele, B. (2016). Learning Deep Representations of Fine-Grained Visual Descriptions. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2016.13

Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149. https://doi.org/10.1109/tpami.2016.2577031

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0

Sah, S., Peri, D., Shringi, A., Zhang, C., Dominguez, M., Savakis, A., & Ptucha, R. (2018). Semantically Invariant Text-to-Image Generation. 2018 25th IEEE International Conference on Image Processing (ICIP). https://doi.org/10.1109/icip.2018.8451656

Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Ghasemipour, S. K. S., Ayan, B. K., Mahdavi, S. S., Lopes, R. G., Salimans, T., Ho, J., Fleet, D. J., & Norouzi, M. (2022). Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv preprint arXiv:2205.11487.

Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. Available at: https://arxiv.org/abs/1606.03498

Samek, W., Wiegand, T. & Müller, K. R. (2017). Explainable artificial intelligence: Understanding, visualising and interpreting deep learning models. Available at: https://arxiv.org/abs/1708.08296

Šandor, D., & Bagić Babac, M. (2023). Sarcasm detection in online comments using machine learning. Information Discovery and Delivery. https://doi.org/10.1108/idd-01-2023-0002

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2015.7298594

Tomičić Furjan, M., Tomičić-Pupek, K., & Pihir, I. (2020). Understanding Digital Transformation Initiatives: Case Studies Analysis. Business Systems Research, 11(1), 125-141. https://doi.org/10.2478/bsrj-2020-0009

Tunmibi, S., & Okhakhu, D. (2022). Machine Learning for Sustainable Development. In Conference proceedings of the First Conference of the National Institute of Office Administrators and Information Managers (NIOAIM) between 7th and 10th February, Lead City University, Ibadan, Oyo State, Nigeria.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. & Polosukhin, I. (2017). Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17) (pp. 6000-6010). Curran Associates, Red Hook, NY, USA.

Vinuesa, R., & Sirmacek, B. (2021). Interpretable deep-learning models to help achieve the Sustainable Development Goals. Nature Machine Intelligence, 3(11), 926-926. https://doi.org/10.1038/s42256-021-00414-y

Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2019). Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251-2265. https://doi.org/10.1109/tpami.2018.2857768

Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R. & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (pp. 2048-2057). PMLR.

Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2018). AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2018.00143

Yildirim, E. (2022). Text-to-Image Generation A.I. in Architecture. In Kozlu Hale, H. (2022), Art and Architecture: Theory, Practice and Experience, Lyon: Livre de Lyon, 97-120.

Zhang, C., Zhang, C., Zhang, M., & Kweon, I. S. (2023). Text-to-image Diffusion Models in Generative AI: A Survey. Available at: https://arxiv.org/abs/2303.07909

Zhang, H., Koh, J. Y., Baldridge, J., Lee, H., & Yang, Y. (2021). Cross-Modal Contrastive Learning for Text-to-Image Generation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr46437.2021.00089

Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. N. (2019). StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1947-1962. https://doi.org/10.1109/tpami.2018.2856256