
Hyper-parameter optimization in neural-based translation systems: A case study


References

I. J. Unanue, E. Z. Borzeshi, and M. Piccardi, “Regressing Word and Sentence Embeddings for Low-Resource Neural Machine Translation,” IEEE Trans. Artif. Intell., pp. 1–15, 2022, doi: 10.1109/TAI.2022.3187680.

H. Wang, H. Wu, Z. He, L. Huang, and K. W. Church, “Progress in Machine Translation,” Engineering, vol. 18, pp. 143–153, 2022, doi: 10.1016/j.eng.2021.03.023.

Wang Na, Zhang Xiaohong, et al., “A Research on HMM based Speech Recognition in Spoken English,” Recent Adv. Electr. Electron. Eng., 2021.

A. Banerjee et al., “Bengali-English Relevant Cross Lingual Information Access Using Finite Automata,” 2010, pp. 595–599, doi: 10.1063/1.3516373.

P. Koehn, F. J. Och, and D. Marcu, “Statistical Phrase-Based Translation,” in Proc. HLT-NAACL, 2003, pp. 48–54.

P. Koehn et al., “Moses: Open source toolkit for statistical machine translation,” in Proc. 45th Annu. Meet. ACL, Interact. Poster Demonstr. Sess. (ACL ’07), 2007, p. 177, doi: 10.3115/1557769.1557821.

Y. Wu et al., “Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” pp. 1–23, 2016. [Online]. Available: http://arxiv.org/abs/1609.08144

F. Stahlberg, “Neural machine translation: A review,” J. Artif. Intell. Res., vol. 69, pp. 343–418, 2020, doi: 10.1613/JAIR.1.12007.

Z. Tan et al., “Neural machine translation: A review of methods, resources, and tools,” AI Open, vol. 1, pp. 5–21, 2020, doi: 10.1016/j.aiopen.2020.11.001.

E. Salesky, A. Runge, A. Coda, J. Niehues, and G. Neubig, “Optimizing segmentation granularity for neural machine translation,” Mach. Transl., vol. 34, no. 1, pp. 41–59, 2020, doi: 10.1007/s10590-019-09243-8.

R. Lim, K. Heafield, H. Hoang, M. Briers, and A. Malony, “Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures,” pp. 1–8, 2018. [Online]. Available: http://arxiv.org/abs/1805.02094

R. Rubino, B. Marie, R. Dabre, A. Fujita, M. Utiyama, and E. Sumita, “Extremely low-resource neural machine translation for Asian languages,” Mach. Transl., vol. 34, no. 4, Springer Netherlands, 2020.

Y. Li, J. Li, and M. Zhang, “Deep Transformer modeling via grouping skip connection for neural machine translation,” Knowledge-Based Syst., vol. 234, p. 107556, 2021, doi: 10.1016/j.knosys.2021.107556.

N. Tran, J.-G. Schneider, I. Weber, and A. K. Qin, “Hyper-parameter Optimization in Classification: To-do or Not-to-do,” Pattern Recognit., vol. 103, p. 107245, Jul. 2020, doi: 10.1016/j.patcog.2020.107245.

L. H. B. Nguyen, V. H. Pham, and D. Dinh, “Improving Neural Machine Translation with AMR Semantic Graphs,” Math. Probl. Eng., vol. 2021, 2021, doi: 10.1155/2021/9939389.

G. X. Luo, Y. T. Yang, R. Dong, Y. H. Chen, and W. B. Zhang, “A Joint Back-Translation and Transfer Learning Method for Low-Resource Neural Machine Translation,” Math. Probl. Eng., vol. 2020, 2020, doi: 10.1155/2020/6140153.

J. G. Carbonell, R. E. Cullingford, and A. V. Gershman, “Steps Toward Knowledge-Based Machine Translation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-3, no. 4, pp. 376–392, 1981, doi: 10.1109/TPAMI.1981.4767124.

C. K. Wu, C. C. Shih, Y. C. Wang, and R. T. H. Tsai, “Improving low-resource machine transliteration by using 3-way transfer learning,” Comput. Speech Lang., vol. 72, p. 101283, 2022, doi: 10.1016/j.csl.2021.101283.

L. Yang and A. Shami, “On hyperparameter optimization of machine learning algorithms: Theory and practice,” Neurocomputing, vol. 415, pp. 295–316, 2020, doi: 10.1016/j.neucom.2020.07.061.

Y. Bengio, “Practical recommendations for gradient-based training of deep architectures,” Lect. Notes Comput. Sci., vol. 7700, pp. 437–478, 2012, doi: 10.1007/978-3-642-35289-8_26.

M. Y. Mikheev, Y. S. Gusynina, and T. A. Shornikova, Building Neural Network for Pattern Recognition, 2020.

C. M. Bishop, Neural Networks for Pattern Recognition. USA: Oxford University Press, 1995.

J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” J. Mach. Learn. Res., vol. 13, pp. 281–305, 2012.

G. Dhiman, V. Vinoth Kumar, A. Kaur, and A. Sharma, “DON: Deep Learning and Optimization-Based Framework for Detection of Novel Coronavirus Disease Using X-ray Images,” Interdiscip. Sci. – Comput. Life Sci., vol. 13, no. 2, pp. 260–272, 2021, doi: 10.1007/s12539-021-00418-7.

G. Melis, C. Dyer, and P. Blunsom, “On the state of the art of evaluation in neural language models,” arXiv preprint arXiv:1707.05589, 2017.

I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” Adv. Neural Inf. Process. Syst., pp. 3104–3112, 2014.

D. Bahdanau, K. H. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in Proc. 3rd Int. Conf. Learn. Represent. (ICLR 2015), 2015, pp. 1–15.

M. T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” in Proc. 2015 Conf. Empir. Methods Nat. Lang. Process. (EMNLP), 2015, pp. 1412–1421, doi: 10.18653/v1/d15-1166.

A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., pp. 5999–6009, 2017.

J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. NAACL-HLT 2019, vol. 1, 2019, pp. 4171–4186.

I. Goodfellow et al., “Generative adversarial networks,” Commun. ACM, vol. 63, no. 11, pp. 139–144, 2020, doi: 10.1145/3422622.

Z. Yang, W. Chen, F. Wang, and B. Xu, “Generative adversarial training for neural machine translation,” Neurocomputing, vol. 321, pp. 146–155, 2018, doi: 10.1016/j.neucom.2018.09.006.

L. Wu et al., “Adversarial Neural Machine Translation,” 2017. [Online]. Available: http://arxiv.org/abs/1704.06933

Z. Zhang, S. Liu, M. Li, M. Zhou, and E. Chen, “Bidirectional generative adversarial networks for neural machine translation,” in Proc. 22nd Conf. Comput. Nat. Lang. Learn. (CoNLL 2018), 2018, pp. 190–199, doi: 10.18653/v1/k18-1019.

C. H. Lin, C. J. Lin, Y. C. Li, and S. H. Wang, “Using generative adversarial networks and parameter optimization of convolutional neural networks for lung tumor classification,” Appl. Sci., vol. 11, no. 2, pp. 1–17, 2021, doi: 10.3390/app11020480.

S. Merity, N. S. Keskar, and R. Socher, “Regularizing and optimizing LSTM language models,” in Proc. 6th Int. Conf. Learn. Represent. (ICLR 2018), 2018.

X. Liu, K. Duh, L. Liu, and J. Gao, “Very Deep Transformers for Neural Machine Translation,” 2020. [Online]. Available: http://arxiv.org/abs/2008.07772

X. Zhang and K. Duh, “Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems,” Trans. Assoc. Comput. Linguist., vol. 8, pp. 393–408, 2020, doi: 10.1162/tacl_a_00322.

X. Liu, W. Wang, W. Liang, and Y. Li, “Speed Up the Training of Neural Machine Translation,” Neural Process. Lett., vol. 51, no. 1, pp. 231–249, 2020, doi: 10.1007/s11063-019-10084-y.

K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a Method for Automatic Evaluation of Machine Translation,” in Proc. 40th Annu. Meet. Assoc. Comput. Linguist., Jul. 2002, pp. 311–318, doi: 10.3115/1073083.1073135.

G. Datta, N. Joshi, and K. Gupta, “Analysis of Automatic Evaluation Metric on Low-Resourced Language: BERTScore vs BLEU Score,” in Speech and Computer, 2022, pp. 155–162.

A. Pathak and P. Pakray, “Neural machine translation for Indian languages,” J. Intell. Syst., vol. 28, no. 3, pp. 465–477, 2019, doi: 10.1515/jisys-2018-0065.

A. Koutsoukas, K. J. Monaghan, X. Li, and J. Huan, “Deep-learning: Investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data,” J. Cheminform., vol. 9, no. 1, pp. 1–13, 2017, doi: 10.1186/s13321-017-0226-y.

D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” in Proc. 3rd Int. Conf. Learn. Represent. (ICLR 2015), 2015, pp. 1–15.

H. Gete and T. Etchegoyhen, “Making the most of comparable corpora in Neural Machine Translation: a case study,” Lang. Resour. Eval., 2022, doi: 10.1007/s10579-021-09572-2.

B. Zoph, D. Yuret, J. May, and K. Knight, “Transfer learning for low-resource neural machine translation,” in Proc. 2016 Conf. Empir. Methods Nat. Lang. Process. (EMNLP), 2016, pp. 1568–1575, doi: 10.18653/v1/d16-1163.

R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” in Proc. 54th Annu. Meet. Assoc. Comput. Linguist. (ACL 2016), Long Papers, 2016, pp. 1715–1725, doi: 10.18653/v1/p16-1162.
