Open access

Optimizing the Structures of Transformer Neural Networks Using Parallel Simulated Annealing

ABOUT THIS ARTICLE

References

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017). https://doi.org/10.48550/ARXIV.1706.03762.

N. Li, S. Liu, Y. Liu, S. Zhao, M. Liu, Neural speech synthesis with transformer network, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 33, 2019, pp. 6706–6713.

P. Morris, R. St. Clair, W. E. Hahn, E. Barenholtz, Predicting binding from screening assays with transformer network embeddings, Journal of Chemical Information and Modeling 60 (9) (2020) 4191–4199.

F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, H. Fu, Transformers in medical imaging: A survey, arXiv preprint arXiv:2201.09873 (2022). https://doi.org/10.48550/ARXIV.2201.09873.

T. Lin, Y. Wang, X. Liu, X. Qiu, A survey of transformers (2021). https://doi.org/10.48550/ARXIV.2106.04554.

M. Zhang, J. Li, A commentary of GPT-3 in MIT Technology Review 2021, Fundamental Research 1 (6) (2021) 831–833.

M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, Ł. Kaiser, Universal transformers, arXiv preprint arXiv:1807.03819 (2018).

M. Feurer, F. Hutter, Hyperparameter optimization, in: Automated machine learning, Springer, Cham, 2019, pp. 3–33.

J. Wu, X.-Y. Chen, H. Zhang, L.-D. Xiong, H. Lei, S.-H. Deng, Hyperparameter optimization for machine learning models based on Bayesian optimization, Journal of Electronic Science and Technology 17 (1) (2019) 26–40.

R. Turner, D. Eriksson, M. McCourt, J. Kiili, E. Laaksonen, Z. Xu, I. Guyon, Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020, in: NeurIPS 2020 Competition and Demonstration Track, PMLR, 2021, pp. 3–26.

R. G. Mantovani, A. L. Rossi, J. Vanschoren, B. Bischl, A. C. De Carvalho, Effectiveness of random search in SVM hyper-parameter tuning, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–8.

X. He, K. Zhao, X. Chu, AutoML: A survey of the state-of-the-art, Knowledge-Based Systems 212 (2021) 106622.

K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, et al., A survey on vision transformer, IEEE transactions on pattern analysis and machine intelligence 45 (1) (2022) 87–110.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020).

M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, I. Sutskever, Generative pretraining from pixels, in: International conference on machine learning, PMLR, 2020, pp. 1691–1703.

N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, D. Tran, Image transformer, in: International conference on machine learning, PMLR, 2018, pp. 4055–4064.

Q. Wen, T. Zhou, C. Zhang, W. Chen, Z. Ma, J. Yan, L. Sun, Transformers in time series: A survey, arXiv preprint arXiv:2202.07125 (2022).

E. Dogo, O. Afolabi, N. Nwulu, B. Twala, C. Aigbavboa, A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks, in: 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS), IEEE, 2018, pp. 92–99.

S. J. Reddi, S. Kale, S. Kumar, On the convergence of Adam and beyond, arXiv preprint arXiv:1904.09237 (2019).

Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, K. C. Tan, A survey on evolutionary neural architecture search, IEEE transactions on neural networks and learning systems (2021).

G. Bender, P.-J. Kindermans, B. Zoph, V. Vasudevan, Q. Le, Understanding and simplifying one-shot architecture search, in: International conference on machine learning, PMLR, 2018, pp. 550–559.

J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, X. Wang, Densely connected search space for more flexible neural architecture search, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10628–10637.

P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, X. Wang, A comprehensive survey of neural architecture search: Challenges and solutions, ACM Computing Surveys (CSUR) 54 (4) (2021) 1–34.

K. Murray, J. Kinnison, T. Q. Nguyen, W. Scheirer, D. Chiang, Auto-sizing the transformer network: Improving speed, efficiency, and performance for low-resource machine translation, arXiv preprint arXiv:1910.06717 (2019).

M. Baldeon-Calisto, S. K. Lai-Yuen, AdaResU-Net: Multiobjective adaptive convolutional neural network for medical image segmentation, Neurocomputing 392 (2020) 325–340.

K. Chen, W. Pang, ImmuNetNAS: An immune-network approach for searching convolutional neural network architectures, arXiv preprint arXiv:2002.12704 (2020).

M. Wistuba, A. Rawat, T. Pedapati, A survey on neural architecture search, arXiv preprint arXiv:1905.01392 (2019).

J. G. Robles, J. Vanschoren, Learning to reinforcement learn for neural architecture search, arXiv preprint arXiv:1911.03769 (2019).

A. Vahdat, A. Mallya, M.-Y. Liu, J. Kautz, UNAS: Differentiable architecture search meets reinforcement learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11266–11275.

K. De Jong, Evolutionary computation: a unified approach, in: Proceedings of the 2016 on genetic and evolutionary computation conference companion, 2016, pp. 185–199.

S. Gibb, H. M. La, S. Louis, A genetic algorithm for convolutional network structure optimization for concrete crack detection, in: 2018 IEEE congress on evolutionary computation (CEC), IEEE, 2018, pp. 1–8.

L. Xie, A. Yuille, Genetic CNN, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 1379–1388.

F. Ye, Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data, PloS one 12 (12) (2017) e0188746.

K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, E. Xing, Neural architecture search with Bayesian optimisation and optimal transport, arXiv preprint arXiv:1802.07191 (2018). https://doi.org/10.48550/ARXIV.1802.07191.

F. P. Casale, J. Gordon, N. Fusi, Probabilistic neural architecture search, arXiv preprint arXiv:1902.05116 (2019).

H. K. Singh, T. Ray, W. Smith, Surrogate assisted simulated annealing (SASA) for constrained multi-objective optimization, in: IEEE congress on evolutionary computation, IEEE, 2010, pp. 1–8.

K. T. Chitty-Venkata, M. Emani, V. Vishwanath, A. K. Somani, Neural architecture search for transformers: A survey, IEEE Access 10 (2022) 108374–108412. https://doi.org/10.1109/ACCESS.2022.3212767.

J. Kim, J. Wang, S. Kim, Y. Lee, Evolved speech-transformer: Applying neural architecture search to end-to-end automatic speech recognition, in: Interspeech, 2020, pp. 1788–1792.

D. Cummings, A. Sarah, S. N. Sridhar, M. Szankin, J. P. Munoz, S. Sundaresan, A hardware-aware framework for accelerating neural architecture search across modalities, arXiv preprint arXiv:2205.10358 (2022).

D. So, Q. Le, C. Liang, The evolved transformer, in: International conference on machine learning, PMLR, 2019, pp. 5877–5886.

D. So, W. Mańke, H. Liu, Z. Dai, N. Shazeer, Q. V. Le, Searching for efficient transformers for language modeling, Advances in neural information processing systems 34 (2021) 6010–6022.

V. Cahlik, P. Kordik, M. Cepek, Adapting the size of artificial neural networks using dynamic autosizing, in: 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), IEEE, 2022, pp. 592–596.

B. Goldberger, G. Katz, Y. Adi, J. Keshet, Minimal modifications of deep neural networks using verification, in: LPAR, Vol. 2020, 2020, p. 23.

T. Chen, I. Goodfellow, J. Shlens, Net2Net: Accelerating learning via knowledge transfer, arXiv preprint arXiv:1511.05641 (2015).

T. Wei, C. Wang, Y. Rui, C. W. Chen, Network morphism, in: International conference on machine learning, PMLR, 2016, pp. 564–572.

H. Jin, Q. Song, X. Hu, Auto-Keras: An efficient neural architecture search system, in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 1946–1956.

T. Elsken, J. H. Metzen, F. Hutter, Efficient multi-objective neural architecture search via Lamarckian evolution (2019). https://openreview.net/forum?id=ByME42AqK7.

P. J. Van Laarhoven, E. H. Aarts, Simulated annealing, in: Simulated annealing: Theory and applications, Springer, 1987, pp. 7–15.

D. J. Ram, T. Sreenivas, K. G. Subramaniam, Parallel simulated annealing algorithms, Journal of parallel and distributed computing 37 (2) (1996) 207–212. https://doi.org/10.1006/jpdc.1996.0121.

N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of state calculations by fast computing machines, The journal of chemical physics 21 (6) (1953) 1087–1092.

D. Delahaye, S. Chaimatanan, M. Mongeau, Simulated annealing: From basics to applications, Handbook of metaheuristics (2019) 1–35.

Z. Xinchao, Simulated annealing algorithm with adaptive neighborhood, Applied Soft Computing 11 (2) (2011) 1827–1836.

Z. Michalewicz, GAs: Why Do They Work?, Springer, 1996.

S.-H. Zhan, J. Lin, Z.-J. Zhang, Y.-W. Zhong, List-based simulated annealing algorithm for traveling salesman problem, Computational intelligence and neuroscience 2016 (2016).

M. E. Aydin, V. Yigit, Parallel simulated annealing, Parallel Metaheuristics: A New Class of Algorithms (2005) 267.

L. Ozdamar, Parallel simulated annealing algorithms in global optimization, Journal of global optimization 19 (2001) 27–50.

G. P. Babu, M. N. Murty, Simulated annealing for selecting optimal initial seeds in the k-means algorithm, Indian Journal of Pure and Applied Mathematics 25 (1-2) (1994) 85–94.

R. S. Sexton, R. E. Dorsey, J. D. Johnson, Optimization of neural networks: A comparative analysis of the genetic algorithm and simulated annealing, European Journal of Operational Research 114 (3) (1999) 589–601.

D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).

S. Mo, J. Xia, P. Ren, Simulated annealing for neural architecture search, Advances in Neural Information Processing Systems (NeurIPS) (2021).

C.-W. Tsai, C.-H. Hsia, S.-J. Yang, S.-J. Liu, Z.-Y. Fang, Optimizing hyperparameters of deep learning in predicting bus passengers based on simulated annealing, Applied soft computing 88 (2020) 106068.

H.-K. Park, J.-H. Lee, J. Lee, S.-K. Kim, Optimizing machine learning models for granular NdFeB magnets by very fast simulated annealing, Scientific Reports 11 (1) (2021) 3792.

L. Ingber, Very fast simulated re-annealing, Mathematical and computer modelling 12 (8) (1989) 967–973.

M. Fischetti, M. Stringher, Embedded hyper-parameter tuning by simulated annealing, arXiv preprint arXiv:1906.01504 (2019). https://doi.org/10.48550/ARXIV.1906.01504.

M. Chen, H. Peng, J. Fu, H. Ling, AutoFormer: Searching transformers for visual recognition, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 12270–12280.

A. Cutkosky, H. Mehta, Momentum improves normalized SGD, in: International conference on machine learning, PMLR, 2020, pp. 2260–2268.

M. Trzciński, Optimizing the Structures of Transformer Neural Networks using Parallel Simulated Annealing (March 2023).

F. Guzmán, P.-J. Chen, M. Ott, J. Pino, G. Lample, P. Koehn, V. Chaudhary, M. Ranzato, Two new evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English, 2019.

P. Koehn, Europarl: A parallel corpus for statistical machine translation, in: Proceedings of Machine Translation Summit X: Papers, Phuket, Thailand, 2005, pp. 79–86. https://aclanthology.org/2005.mtsummit-papers.11.

J. Tiedemann, Parallel data, tools and interfaces in OPUS, in: LREC, Vol. 2012, Citeseer, 2012, pp. 2214–2218.

K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, Bleu: a method for automatic evaluation of machine translation, in: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318.

C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74–81.

M. Post, A call for clarity in reporting BLEU scores, arXiv preprint arXiv:1804.08771 (2018).

D. Coughlin, Correlating automated and human assessments of machine translation quality, in: Proceedings of Machine Translation Summit IX: Papers, 2003.

K. Ganesan, ROUGE 2.0: Updated and improved measures for evaluation of summarization tasks, arXiv preprint arXiv:1803.01937 (2018).

eISSN: 2449-6499
Language: English
Publication frequency: 4 times per year
Journal subjects: Computer Sciences, Artificial Intelligence, Databases and Data Mining