Open Access

Optimizing the Structures of Transformer Neural Networks Using Parallel Simulated Annealing

The Transformer is an important addition to the rapidly growing list of Artificial Neural Networks (ANNs) suited for extremely complex automation tasks. It has already become the tool of choice for automatic translation in many business solutions. In this paper, we present an automated approach to optimizing the Transformer structure based on Simulated Annealing, an algorithm widely recognized for both its simplicity and its usability in optimization tasks where the search space may be highly complex. The proposed method allows for parallel computing and time-efficient optimization because the structure is modified while the network is being trained, rather than performing the two steps one after the other. The algorithm presented does not reset the weights after changes in the Transformer structure. Instead, it continues the training process so that the network can adapt without randomizing all the training parameters. The algorithm showed promising performance in experiments compared to traditional training methods without structural modifications. The solution has been released as open source to facilitate further development and use by the machine learning community.
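The abstract outlines the core loop: propose a change to the network's structure, continue training without resetting the weights, and accept or reject the candidate according to an annealing schedule. Below is a minimal, hypothetical Python sketch of such a loop; the structural parameters (num_layers, num_heads, ff_dim), the mutation sets, and the train_and_evaluate stub are illustrative assumptions, not the authors' released implementation.

import math
import random

# Structural hyperparameters the annealer may mutate.
# Illustrative placeholders, not the paper's actual search space.
INITIAL_STRUCTURE = {"num_layers": 4, "num_heads": 8, "ff_dim": 1024}

MUTATIONS = {
    "num_layers": [2, 3, 4, 5, 6],
    "num_heads": [4, 8, 16],
    "ff_dim": [512, 1024, 2048],
}

def propose(structure):
    """Randomly perturb one structural parameter of the current network."""
    candidate = dict(structure)
    key = random.choice(list(candidate))
    candidate[key] = random.choice(MUTATIONS[key])
    return candidate

def train_and_evaluate(structure, steps=100):
    """Placeholder: continue training the network for a few steps and
    return a validation loss. Per the abstract, a real implementation
    would keep the existing weights and adapt only the changed parts;
    in a parallel variant, several candidates could be evaluated at
    once on separate workers."""
    return sum(structure.values()) * random.uniform(0.9, 1.1) / 1000.0

def simulated_annealing(t0=1.0, cooling=0.95, iterations=50):
    current = INITIAL_STRUCTURE
    current_loss = train_and_evaluate(current)
    best, best_loss = current, current_loss
    t = t0
    for _ in range(iterations):
        candidate = propose(current)
        candidate_loss = train_and_evaluate(candidate)
        # Metropolis criterion: always accept improvements; accept worse
        # structures with a probability that shrinks as temperature drops.
        if (candidate_loss < current_loss
                or random.random() < math.exp((current_loss - candidate_loss) / t)):
            current, current_loss = candidate, candidate_loss
        if current_loss < best_loss:
            best, best_loss = current, current_loss
        t *= cooling
    return best, best_loss

if __name__ == "__main__":
    structure, loss = simulated_annealing()
    print(f"best structure: {structure}, validation loss: {loss:.4f}")

The key design choice reflected here is that a rejected or accepted candidate never triggers a weight reset: training state carries over across structural changes, which is what makes interleaving the search with training time-efficient.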

eISSN:
2449-6499
Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Sciences, Artificial Intelligence, Databases and Data Mining