Open access

Optimizing the Structures of Transformer Neural Networks Using Parallel Simulated Annealing

ABOUT THIS ARTICLE

References

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017). https://doi.org/10.48550/ARXIV.1706.03762.

N. Li, S. Liu, Y. Liu, S. Zhao, M. Liu, Neural speech synthesis with transformer network, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 33, 2019, pp. 6706–6713.

P. Morris, R. St. Clair, W. E. Hahn, E. Barenholtz, Predicting binding from screening assays with transformer network embeddings, Journal of Chemical Information and Modeling 60 (9) (2020) 4191–4199.

F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, H. Fu, Transformers in medical imaging: A survey, arXiv preprint arXiv:2201.09873 (2022). https://doi.org/10.48550/ARXIV.2201.09873.

T. Lin, Y. Wang, X. Liu, X. Qiu, A survey of transformers (2021). https://doi.org/10.48550/ARXIV.2106.04554.

M. Zhang, J. Li, A commentary of GPT-3 in MIT Technology Review 2021, Fundamental Research 1 (6) (2021) 831–833.

M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, Ł. Kaiser, Universal transformers, arXiv preprint arXiv:1807.03819 (2018).

M. Feurer, F. Hutter, Hyperparameter optimization, in: Automated machine learning, Springer, Cham, 2019, pp. 3–33.

J. Wu, X.-Y. Chen, H. Zhang, L.-D. Xiong, H. Lei, S.-H. Deng, Hyperparameter optimization for machine learning models based on Bayesian optimization, Journal of Electronic Science and Technology 17 (1) (2019) 26–40.

R. Turner, D. Eriksson, M. McCourt, J. Kiili, E. Laaksonen, Z. Xu, I. Guyon, Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020, in: NeurIPS 2020 Competition and Demonstration Track, PMLR, 2021, pp. 3–26.

R. G. Mantovani, A. L. Rossi, J. Vanschoren, B. Bischl, A. C. De Carvalho, Effectiveness of random search in SVM hyper-parameter tuning, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–8.

X. He, K. Zhao, X. Chu, AutoML: A survey of the state-of-the-art, Knowledge-Based Systems 212 (2021) 106622.

K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, et al., A survey on vision transformer, IEEE transactions on pattern analysis and machine intelligence 45 (1) (2022) 87–110.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020).

M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, I. Sutskever, Generative pretraining from pixels, in: International conference on machine learning, PMLR, 2020, pp. 1691–1703.

N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, D. Tran, Image transformer, in: International conference on machine learning, PMLR, 2018, pp. 4055–4064.

Q. Wen, T. Zhou, C. Zhang, W. Chen, Z. Ma, J. Yan, L. Sun, Transformers in time series: A survey, arXiv preprint arXiv:2202.07125 (2022).

E. Dogo, O. Afolabi, N. Nwulu, B. Twala, C. Aigbavboa, A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks, in: 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS), IEEE, 2018, pp. 92–99.

S. J. Reddi, S. Kale, S. Kumar, On the convergence of Adam and beyond, arXiv preprint arXiv:1904.09237 (2019).

Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, K. C. Tan, A survey on evolutionary neural architecture search, IEEE transactions on neural networks and learning systems (2021).

G. Bender, P.-J. Kindermans, B. Zoph, V. Vasudevan, Q. Le, Understanding and simplifying one-shot architecture search, in: International conference on machine learning, PMLR, 2018, pp. 550–559.

J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, X. Wang, Densely connected search space for more flexible neural architecture search, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10628–10637.

P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, X. Wang, A comprehensive survey of neural architecture search: Challenges and solutions, ACM Computing Surveys (CSUR) 54 (4) (2021) 1–34.

K. Murray, J. Kinnison, T. Q. Nguyen, W. Scheirer, D. Chiang, Auto-sizing the transformer network: Improving speed, efficiency, and performance for low-resource machine translation, arXiv preprint arXiv:1910.06717 (2019).

M. Baldeon-Calisto, S. K. Lai-Yuen, AdaResU-Net: Multiobjective adaptive convolutional neural network for medical image segmentation, Neurocomputing 392 (2020) 325–340.

K. Chen, W. Pang, ImmuNetNAS: An immune-network approach for searching convolutional neural network architectures, arXiv preprint arXiv:2002.12704 (2020).

M. Wistuba, A. Rawat, T. Pedapati, A survey on neural architecture search, arXiv preprint arXiv:1905.01392 (2019).

J. G. Robles, J. Vanschoren, Learning to reinforcement learn for neural architecture search, arXiv preprint arXiv:1911.03769 (2019).

A. Vahdat, A. Mallya, M.-Y. Liu, J. Kautz, UNAS: Differentiable architecture search meets reinforcement learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11266–11275.

K. De Jong, Evolutionary computation: a unified approach, in: Proceedings of the 2016 on genetic and evolutionary computation conference companion, 2016, pp. 185–199.

S. Gibb, H. M. La, S. Louis, A genetic algorithm for convolutional network structure optimization for concrete crack detection, in: 2018 IEEE congress on evolutionary computation (CEC), IEEE, 2018, pp. 1–8.

L. Xie, A. Yuille, Genetic CNN, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 1379–1388.

F. Ye, Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data, PloS one 12 (12) (2017) e0188746.

K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, E. Xing, Neural architecture search with Bayesian optimisation and optimal transport, arXiv preprint arXiv:1802.07191 (2018). https://doi.org/10.48550/ARXIV.1802.07191.

F. P. Casale, J. Gordon, N. Fusi, Probabilistic neural architecture search, arXiv preprint arXiv:1902.05116 (2019).

H. K. Singh, T. Ray, W. Smith, Surrogate assisted simulated annealing (SASA) for constrained multi-objective optimization, in: IEEE congress on evolutionary computation, IEEE, 2010, pp. 1–8.

K. T. Chitty-Venkata, M. Emani, V. Vishwanath, A. K. Somani, Neural architecture search for transformers: A survey, IEEE Access 10 (2022) 108374–108412. https://doi.org/10.1109/ACCESS.2022.3212767.

J. Kim, J. Wang, S. Kim, Y. Lee, Evolved speech-transformer: Applying neural architecture search to end-to-end automatic speech recognition, in: Interspeech, 2020, pp. 1788–1792.

D. Cummings, A. Sarah, S. N. Sridhar, M. Szankin, J. P. Munoz, S. Sundaresan, A hardware-aware framework for accelerating neural architecture search across modalities, arXiv preprint arXiv:2205.10358 (2022).

D. So, Q. Le, C. Liang, The evolved transformer, in: International conference on machine learning, PMLR, 2019, pp. 5877–5886.

D. So, W. Mańke, H. Liu, Z. Dai, N. Shazeer, Q. V. Le, Searching for efficient transformers for language modeling, Advances in neural information processing systems 34 (2021) 6010–6022.

V. Cahlik, P. Kordik, M. Cepek, Adapting the size of artificial neural networks using dynamic autosizing, in: 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), IEEE, 2022, pp. 592–596.

B. Goldberger, G. Katz, Y. Adi, J. Keshet, Minimal modifications of deep neural networks using verification, in: LPAR, Vol. 2020, 2020, p. 23.

T. Chen, I. Goodfellow, J. Shlens, Net2Net: Accelerating learning via knowledge transfer, arXiv preprint arXiv:1511.05641 (2015).

T. Wei, C. Wang, Y. Rui, C. W. Chen, Network morphism, in: International conference on machine learning, PMLR, 2016, pp. 564–572.

H. Jin, Q. Song, X. Hu, Auto-Keras: An efficient neural architecture search system, in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 1946–1956.

T. Elsken, J. H. Metzen, F. Hutter, Efficient multi-objective neural architecture search via Lamarckian evolution (2019). https://openreview.net/forum?id=ByME42AqK7.

P. J. Van Laarhoven, E. H. Aarts, Simulated annealing, in: Simulated annealing: Theory and applications, Springer, 1987, pp. 7–15.

D. J. Ram, T. Sreenivas, K. G. Subramaniam, Parallel simulated annealing algorithms, Journal of parallel and distributed computing 37 (2) (1996) 207–212. https://doi.org/10.1006/jpdc.1996.0121.

N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of state calculations by fast computing machines, The journal of chemical physics 21 (6) (1953) 1087–1092.

D. Delahaye, S. Chaimatanan, M. Mongeau, Simulated annealing: From basics to applications, Handbook of metaheuristics (2019) 1–35.

Z. Xinchao, Simulated annealing algorithm with adaptive neighborhood, Applied Soft Computing 11 (2) (2011) 1827–1836.

Z. Michalewicz, GAs: Why Do They Work?, Springer, 1996.

S.-H. Zhan, J. Lin, Z.-J. Zhang, Y.-W. Zhong, List-based simulated annealing algorithm for traveling salesman problem, Computational intelligence and neuroscience 2016 (2016).

M. E. Aydin, V. Yigit, Parallel simulated annealing, Parallel Metaheuristics: A New Class of Algorithms (2005) 267.

L. Ozdamar, Parallel simulated annealing algorithms in global optimization, Journal of global optimization 19 (2001) 27–50.

G. P. Babu, M. N. Murty, Simulated annealing for selecting optimal initial seeds in the k-means algorithm, Indian Journal of Pure and Applied Mathematics 25 (1-2) (1994) 85–94.

R. S. Sexton, R. E. Dorsey, J. D. Johnson, Optimization of neural networks: A comparative analysis of the genetic algorithm and simulated annealing, European Journal of Operational Research 114 (3) (1999) 589–601.

D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).

S. Mo, J. Xia, P. Ren, Simulated annealing for neural architecture search, Advances in Neural Information Processing Systems (NeurIPS) (2021).

C.-W. Tsai, C.-H. Hsia, S.-J. Yang, S.-J. Liu, Z.-Y. Fang, Optimizing hyperparameters of deep learning in predicting bus passengers based on simulated annealing, Applied soft computing 88 (2020) 106068.

H.-K. Park, J.-H. Lee, J. Lee, S.-K. Kim, Optimizing machine learning models for granular NdFeB magnets by very fast simulated annealing, Scientific Reports 11 (1) (2021) 3792.

L. Ingber, Very fast simulated re-annealing, Mathematical and computer modelling 12 (8) (1989) 967–973.

M. Fischetti, M. Stringher, Embedded hyper-parameter tuning by simulated annealing, arXiv preprint arXiv:1906.01504 (2019). https://doi.org/10.48550/ARXIV.1906.01504.

M. Chen, H. Peng, J. Fu, H. Ling, AutoFormer: Searching transformers for visual recognition, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 12270–12280.

A. Cutkosky, H. Mehta, Momentum improves normalized SGD, in: International conference on machine learning, PMLR, 2020, pp. 2260–2268.

M. Trzciński, Optimizing the Structures of Transformer Neural Networks using Parallel Simulated Annealing (March 2023).

F. Guzmán, P.-J. Chen, M. Ott, J. Pino, G. Lample, P. Koehn, V. Chaudhary, M. Ranzato, Two new evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English, 2019.

P. Koehn, Europarl: A parallel corpus for statistical machine translation, in: Proceedings of Machine Translation Summit X: Papers, Phuket, Thailand, 2005, pp. 79–86. https://aclanthology.org/2005.mtsummit-papers.11.

J. Tiedemann, Parallel data, tools and interfaces in OPUS, in: LREC, Vol. 2012, Citeseer, 2012, pp. 2214–2218.

K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318.

C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74–81.

M. Post, A call for clarity in reporting BLEU scores, arXiv preprint arXiv:1804.08771 (2018).

D. Coughlin, Correlating automated and human assessments of machine translation quality, in: Proceedings of Machine Translation Summit IX: Papers, 2003.

K. Ganesan, ROUGE 2.0: Updated and improved measures for evaluation of summarization tasks, arXiv preprint arXiv:1803.01937 (2018).

eISSN: 2449-6499
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Artificial Intelligence, Databases and Data Mining