Analysis of Dataset Limitations in Semantic Knowledge-Driven Multi-Variant Machine Translation

In this study, we explore the implications of dataset limitations in semantic knowledge-driven machine translation (MT) for intelligent virtual assistants (IVA). Our approach departs from traditional single-best translation techniques by using a multi-variant MT method that generates multiple valid translations per input sentence through a constrained beam search. The method is not restricted to specific verb ontologies; instead, it is embedded in a broader semantic knowledge framework.
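To make the idea of multi-variant decoding concrete, the following is a minimal sketch of a beam search that returns several ranked hypotheses instead of a single best one. It is not the authors' implementation: the toy model `TOY_MODEL`, the function `multi_variant_beam_search`, and the parameters `beam_width`, `n_variants`, and `required` are all hypothetical names introduced for illustration, and the lexical constraint is applied as a simple post-hoc filter rather than enforced during decoding.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "model": next-token log-probabilities that depend
# only on the previous token. A real MT model conditions on the source
# sentence and the full target prefix.
TOY_MODEL: Dict[str, List[Tuple[str, float]]] = {
    "<s>": [("play", math.log(0.6)), ("start", math.log(0.4))],
    "play": [("music", math.log(0.7)), ("song", math.log(0.3))],
    "start": [("music", math.log(0.6)), ("song", math.log(0.4))],
    "music": [("</s>", 0.0)],
    "song": [("</s>", 0.0)],
}


def multi_variant_beam_search(
    model: Dict[str, List[Tuple[str, float]]],
    beam_width: int = 4,
    n_variants: int = 3,
    required: Optional[str] = None,
    max_len: int = 10,
) -> List[Tuple[str, float]]:
    """Return up to n_variants (text, log-probability) pairs, best first."""
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<s>"])]
    finished: List[Tuple[float, List[str]]] = []

    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, tokens in beams:
            for tok, logp in model.get(tokens[-1], []):
                hyp = (score + logp, tokens + [tok])
                # Hypotheses ending in the end-of-sentence token are complete.
                if tok == "</s>":
                    finished.append(hyp)
                else:
                    candidates.append(hyp)
        if not candidates:
            break
        # Keep only the beam_width highest-scoring open hypotheses.
        beams = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam_width]

    # Simplified constraint (an assumption of this sketch): keep only
    # hypotheses that contain the required token.
    if required is not None:
        finished = [h for h in finished if required in h[1]]

    finished.sort(key=lambda h: h[0], reverse=True)
    # Strip the <s> and </s> markers before returning the text.
    return [(" ".join(toks[1:-1]), score) for score, toks in finished[:n_variants]]
```

Calling `multi_variant_beam_search(TOY_MODEL)` yields the highest-scoring variants in order, while passing `required="song"` restricts the output to variants containing that token. Each returned variant could then be paired with the intent label of the source utterance when building a translated NLU training set.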

We evaluate the performance of multi-variant MT models in translating training sets for Natural Language Understanding (NLU) models. These models are applied to semantically diverse datasets, including a detailed evaluation on the standard MultiATIS++ dataset. The results of this evaluation indicate that, while the multi-variant MT method is promising, its impact on intent classification (IC) accuracy is limited when it is applied to conventional datasets such as MultiATIS++. However, our findings underscore that the effectiveness of multi-variant translation is closely tied to the diversity and suitability of the datasets used.

Finally, we provide an in-depth analysis focused on generating variant-aware NLU datasets. This analysis aims to offer guidance on enhancing NLU models through semantically rich and variant-sensitive datasets, maximizing the advantages of multi-variant MT.

Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Engineering, Electrical Engineering, Control Engineering, Metrology and Testing, Mechanical Engineering, Fundamentals of Mechanical Engineering

Marcin Sowański

Jakub Hościłowicz

Artur Janicki

Published Online: Sep 12, 2024

Page range: 39 - 48

Received: Dec 27, 2023

Accepted: Mar 10, 2024

DOI: https://doi.org/10.14313/jamris/3-2024/20

Keywords: machine translation, intelligent virtual assistants, natural language understanding

© 2024 Marcin Sowański et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
