Open Access

Optimizing the Structures of Transformer Neural Networks Using Parallel Simulated Annealing

The Transformer is an important addition to the rapidly growing list of Artificial Neural Networks (ANNs) suited for extremely complex automation tasks. It has already become the tool of choice for automatic translation in many business solutions. In this paper, we present an automated approach to optimizing the Transformer structure based on Simulated Annealing, an algorithm widely recognized for both its simplicity and its usability in optimization tasks where the search space may be highly complex. The proposed method allows for parallel computing and time-efficient optimization because the structure is modified while the network is being trained, rather than performing the two steps one after the other. The algorithm presented does not reset the weights after changes in the Transformer structure. Instead, it continues the training process so that the network can adapt without randomizing all the training parameters. The algorithm showed promising performance in experiments compared to traditional training methods without structural modifications. The solution has been released as open source to facilitate further development and use by the machine learning community.
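The abstract outlines the core loop: propose a change to the network's structure, continue training without resetting the weights, and accept or reject the candidate according to an annealing schedule. Below is a minimal, hypothetical Python sketch of such a loop; the structural parameters (num_layers, num_heads, ff_dim), the mutation sets, and the train_and_evaluate stub are illustrative assumptions, not the authors' released implementation.

import math
import random

# Structural hyperparameters the annealer may mutate.
# Illustrative placeholders, not the paper's actual search space.
INITIAL_STRUCTURE = {"num_layers": 4, "num_heads": 8, "ff_dim": 1024}

MUTATIONS = {
    "num_layers": [2, 3, 4, 5, 6],
    "num_heads": [4, 8, 16],
    "ff_dim": [512, 1024, 2048],
}

def propose(structure):
    """Randomly perturb one structural parameter of the current network."""
    candidate = dict(structure)
    key = random.choice(list(candidate))
    candidate[key] = random.choice(MUTATIONS[key])
    return candidate

def train_and_evaluate(structure, steps=100):
    """Placeholder: continue training the network for a few steps and
    return a validation loss. Per the abstract, a real implementation
    would keep the existing weights and adapt only the changed parts;
    in a parallel variant, several candidates could be evaluated at
    once on separate workers."""
    return sum(structure.values()) * random.uniform(0.9, 1.1) / 1000.0

def simulated_annealing(t0=1.0, cooling=0.95, iterations=50):
    current = INITIAL_STRUCTURE
    current_loss = train_and_evaluate(current)
    best, best_loss = current, current_loss
    t = t0
    for _ in range(iterations):
        candidate = propose(current)
        candidate_loss = train_and_evaluate(candidate)
        # Metropolis criterion: always accept improvements; accept worse
        # structures with a probability that shrinks as temperature drops.
        if (candidate_loss < current_loss
                or random.random() < math.exp((current_loss - candidate_loss) / t)):
            current, current_loss = candidate, candidate_loss
        if current_loss < best_loss:
            best, best_loss = current, current_loss
        t *= cooling
    return best, best_loss

if __name__ == "__main__":
    structure, loss = simulated_annealing()
    print(f"best structure: {structure}, validation loss: {loss:.4f}")

The key design choice reflected here is that a rejected or accepted candidate never triggers a weight reset: training state carries over across structural changes, which is what makes interleaving the search with training time-efficient.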

eISSN:
2449-6499
Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Sciences, Artificial Intelligence, Databases and Data Mining