Practice and Evaluation of an Intelligent Translation System for English Language Teaching in the Classroom

As the most widely used language in the world, English is an important tool for China’s trade with other countries, so English teaching is highly valued. With the accelerating pace of scientific and technological innovation, the value of intelligent translation systems in higher education is becoming increasingly prominent. To improve the efficiency and quality of English teaching, many educators apply AI translation models in English classrooms to solve the translation problems that students encounter while learning English [1-2].

At this stage, teaching and translation needs occupy a central position in English teaching in colleges and universities. Teachers and students increasingly need real-time access to accurate translations in order to identify and fill gaps in their knowledge and thereby improve the quality and efficiency of learning [3-4]. Personalised learning solutions and resource sharing have also become an important part of teaching needs, helping to accommodate the different learning modes of different students and enriching teaching content [5-6]. An intelligent translation system for English language teaching not only helps to improve the quality of teaching, but also promotes students’ personalised learning, enriches teachers’ teaching resources, and promotes innovation in teaching and research in colleges [7-9]. An intelligent translation system based on digital technology provides strong support for students’ personalised learning: in a human-computer collaborative environment, students can freely explore translation practices across different languages and fields, strengthen their own skills, and gain a deeper understanding of the efficiency and practicality of machine translation [10-13]. In addition, through neural machine translation teachers can expand the teaching space and integrate post-editing techniques, which lays the foundation for cultivating translators adapted to the new era [14-16].

Literature [17] constructs an English intelligent translation system based on an improved multi-objective optimisation algorithm and uses a semi-supervised neural machine translation method trained on a parallel corpus and a monolingual corpus; the experiments show that the translation model has a high degree of intelligence and is able to meet actual translation needs. Literature [18] uses Internet of Things technology and a big data model to construct an intelligent translation system for English; in comparison experiments on translation speed, translation accuracy and response time, the proposed big data intelligent translation model performs best among the compared models and is shown to be reliable and effective. Literature [19] proposes an intelligent English automatic translation system (ATS) that combines artificial intelligence and support vector machines; the system is optimised and upgraded by analysing user behaviour data in logs, and a support-vector-machine-based intelligent vocabulary proofreading method effectively improves translation accuracy. Literature [20] establishes an intelligent English translation system based on a model predictive control (MPC) algorithm and machine learning, and proposes a control solution combining self-triggered MPC and robust control to reduce the loss of translation performance during speech-to-text translation. Literature [21] shows that AI technology plays an important role in the design of English translation teaching software: the new AI translation system can not only translate large volumes of English text and spoken material more intelligently, quickly and accurately, but also achieve personalised English translation teaching through added functional modules.
Literature [22] introduces a chatbot equipped with an AI learning system, which has the function of bidirectional conversion of speech and text, and can translate speech, and its use in the English translation teaching classroom can effectively enhance the learning experience of students. Literature [23] points out that although machine translation is convenient, it can be potentially harmful to language learners, and argues that teachers should use interventions to adjust students’ attitudes towards the use of machine translation tools when designing teaching and learning activities based on machine translation, and to cultivate students’ awareness of the process of language learning, so as to give them a sense of satisfaction in communicating in the target language.

From the perspective of system requirements and overall architecture, this paper identifies the neural network machine translation function module, personalised recommendation module of teaching resources, and adaptive learning path module based on genetic algorithm, which together form the function module of English teaching intelligent translation system. It is found that there are some problems in the practical teaching of English courses in colleges and universities. Based on this situation, it is proposed to introduce the English teaching intelligent translation system into the practical teaching of English courses in colleges and universities, so as to promote the development of practical teaching of English courses in colleges and universities. The research samples are selected, the relevant indicators and parameters are set, and a mixed approach of statistical analysis and simulation analysis is adopted to explore the promotion effect of the ELT Intelligent Translation System on the practical teaching of English courses in colleges and universities, with the aim of promoting the digital innovation and development of the practical teaching of English courses in colleges and universities.

2

Intelligent Translation System for English Language Teaching

2.1

Theoretical Analysis of System Requirements and Overall Architecture

2.1.1

System requirements theory analysis

Teaching and translation needs are at the centre of English language teaching in higher education. Teachers and students are increasingly demanding real-time access to accurate translations in order to find gaps and make up for deficiencies so as to improve the quality and efficiency of learning. Figure 1 shows a schematic diagram of the system requirements analysis. Personalised learning solutions and resource sharing have also become an important part of the teaching requirements, helping to adapt to multiple learning modes of different students and enriching the teaching content. In terms of translation requirements, quality and accuracy are at the top of the list to ensure that the meaning of the original text is accurately conveyed despite cultural and contextual differences. In addition, efficiency and professionalism are equally important, especially in academic and teaching scenarios, where the need for fast and accurate professional translation in different disciplines and fields has become a basic requirement.

2.1.2

Overall structure

The overall architecture of the English Teaching Intelligent Translation System is shown in Figure 2. The system is divided into functional modules and a non-functional part. The functional modules are the neural network machine translation module, the personalised recommendation module for teaching resources, and the adaptive learning path module based on a genetic algorithm. The non-functional part covers the visual interaction pages, request processing and forwarding, permission and system control, logging and management, data caching, and data persistence.

2.2

Neural Network Machine Translation Functional Module

2.2.1

Word vectors

The first step in turning a natural language processing problem into a machine learning problem is to find a way to represent the symbols mathematically. In natural language processing, the simplest method of word representation is the one-hot representation, which represents each word as a very long vector in which the vast majority of elements are 0 and only one dimension, the one that identifies the current word, has the value 1 [24]. For example:

“Noun” can be represented as [10000000000...].

“Adjective” can be represented as [00000001000...].

A word vector in a neural network is a low-dimensional vector of real numbers in which related or similar words lie closer together, where distance is usually measured by the Euclidean distance or the cosine of the angle between the vectors. Word vectors not only avoid the curse of dimensionality, but models built on them are also inherently smooth because the distances between similar or related words are small.
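The contrast between the two representations can be illustrated with a minimal sketch. The three-word vocabulary and the two-dimensional dense vectors below are hypothetical values chosen by hand for illustration; they are not taken from the system described in this paper.

```python
import math


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy vocabulary (assumption, for illustration only).
vocab = ["cat", "dog", "car"]
one_hot_vectors = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# Hypothetical hand-picked dense vectors; real word vectors are learned.
dense_vectors = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "car": [0.1, 0.9]}

# One-hot vectors of different words are always orthogonal.
print(cosine(one_hot_vectors["cat"], one_hot_vectors["dog"]))
# Dense vectors can place related words closer than unrelated ones.
print(cosine(dense_vectors["cat"], dense_vectors["dog"]))
print(cosine(dense_vectors["cat"], dense_vectors["car"]))
```

With one-hot vectors every pair of distinct words is equally dissimilar, whereas dense vectors allow similarity to be graded.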

2.2.2

The n-gram model

Language models occupy an important place in natural language processing. A language model measures the fluency of a sentence, or equivalently the likelihood that a person would produce a given sequence of words, and it is widely used in tasks such as speech recognition, part-of-speech tagging, and machine translation. Assuming that word sequence $w$ consists of $t$ words, i.e., $w = w_{1}, w_{2}, \dots, w_{t}$ , the probability of word sequence $w$ is: (1) $\begin{aligned} P (w) &= P (w_{1}, w_{2}, \dots, w_{t}) \\ &= P (w_{1}) P (w_{2} |w_{1}) P (w_{3} |w_{1}, w_{2}) \dots P (w_{t} |w_{1}, \dots, w_{t - 1}) \end{aligned}$

It is desired to find the $P (w)$ with the highest probability, but there are too many parameters to be estimated in the above equation, and the longer the sentence, the more parameters it involves, which would make it too computationally intensive. Appropriate methods can be taken to simplify the model and reduce the parameters.

If the Markov assumption is used, i.e., the probability of the current word depends only on the previous word, the model is simplified to: (2) $P (w) = P (w_{1}) P (w_{2} |w_{1}) P (w_{3} |w_{2}) \dots P (w_{t} |w_{t - 1})$

The above is the 2-gram (bigram) model. If the assumption is extended so that the probability of the current word depends on the previous $n - 1$ words, the $n$ -gram model is obtained, that is: (3) $P (w_{t} |w_{1}, \dots, w_{t - 1}) = P (w_{t} |w_{t - n + 1}^{t - 1})$

The $n$ -gram model is relatively simple and is the most commonly used language model. However, many $n$ -grams never occur in the training corpus, which makes the data sparse, so smoothing algorithms need to be used in the model [25]. As the length of the context increases, the number of possible $n$ -grams grows exponentially, which prevents the model from effectively capturing longer contexts; this is the biggest drawback of the $n$ -gram model. Applying neural networks to language modelling overcomes the exponential growth in parameters by sharing parameters between similar data.
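As an illustration of Equation (2) and of why smoothing is needed, the following is a minimal bigram sketch. Several choices here are assumptions rather than the paper's method: add-one (Laplace) smoothing stands in for the unspecified smoothing algorithm, a start symbol `<s>` replaces the unigram term $P (w_{1})$ , and the three-sentence corpus is a toy example.

```python
from collections import Counter

START = "<s>"  # assumed sentence-start symbol, not part of the vocabulary


def train_bigram(sentences):
    """Count how often each context word occurs and each (previous, word) pair."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = [START] + sentence.split()
        contexts.update(tokens[:-1])              # words that have a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing (an assumed smoothing choice)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def sentence_prob(sentence, contexts, bigrams, vocab_size):
    """Product of bigram probabilities, following the form of Equation (2)."""
    tokens = [START] + sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, contexts, bigrams, vocab_size)
    return prob


# Toy corpus (assumption, for illustration only).
corpus = ["the cat sat", "the cat ran", "the dog sat"]
vocab = sorted({w for s in corpus for w in s.split()})
contexts, bigrams = train_bigram(corpus)

print(sentence_prob("the cat sat", contexts, bigrams, len(vocab)))
print(sentence_prob("sat cat the", contexts, bigrams, len(vocab)))
```

Without smoothing, any sentence containing an unseen bigram would receive probability zero; with it, the fluent word order still scores higher than the scrambled one.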

2.2.3

Neural probabilistic language models

Statistical language models can be expressed as a product of conditional probabilities, viz: (4) $\hat{P} (w_{1}^{T}) = \prod_{t = 1}^{T} \hat{P} (w_{t} |w_{1}^{t - 1})$ where $w_{t}$ is the $t$ -th word and $w_{i}^{j} = (w_{i}, w_{i + 1}, \dots, w_{j - 1}, w_{j})$ denotes a sequence of words. Since words that are closer together in the word sequence are statistically more dependent, the $n$ -gram model gives the approximation: (5) $\hat{P} (w_{t} |w_{1}^{t - 1}) \approx \hat{P} (w_{t} |w_{t - n + 1}^{t - 1})$

Only those combinations of consecutive words that occur frequently enough in the training corpus are considered here.

The neural probabilistic language model is shown in Figure 3. It uses a neural network to estimate $\hat{P} (w_{t} |w_{t - n + 1}^{t - 1})$ . The training set for the model is a sequence $w_{1}, \dots, w_{T}$ of words $w_{t} \in V$ , where the vocabulary $V$ is a large but finite set, and the goal is to learn a good model $f (w_{t}, \dots, w_{t - n + 1}) = \hat{P} (w_{t} |w_{1}^{t - 1})$ . The only constraint on the model is that, for an arbitrary $w_{t - n + 1}^{t - 1}$ , $\sum_{i = 1}^{|V|} f (i, w_{t - 1}, \dots, w_{t - n + 1}) = 1$ with $f > 0$ . The function $f (w_{t}, \dots, w_{t - n + 1})$ is decomposed into two parts.

1)

A mapping $C$ from any element $i$ of $V$ to a real vector $C (i) \in ℝ^{m}$ , which represents the distributed feature vector associated with each word in the vocabulary. In practice, $C$ is represented by a $|V| \times m$ matrix.

2)

The probability function over words, represented by $g$ : the function $g$ maps the input sequence of feature vectors of the context words, $(C (w_{t - n + 1}), \dots, C (w_{t - 1}))$ , to a conditional probability distribution over the next word $w_{t}$ in the vocabulary $V$ . The output of $g$ is a vector whose $i$ -th element estimates the probability $\hat{P} (w_{t} = i |w_{1}^{t - 1})$ . Thus, merging these two steps yields: (6) $f (i, w_{t - 1}, \dots, w_{t - n + 1}) = g (i, C (w_{t - 1}), \dots, C (w_{t - n + 1}))$

Thus, function $f$ is a composition of the mappings $C$ and $g$ . The parameters of mapping $C$ are the feature vectors themselves, represented by a $|V| \times m$ matrix in which row $i$ is the feature vector $C (i)$ of word $i$ . Function $g$ can be implemented by a feed-forward neural network, a recurrent neural network, or some other parametric function with parameters $ω$ , so that the set of all parameters is $θ = (C, ω)$ .

Training is achieved by finding the $θ$ that maximises the penalised log-likelihood of the training corpus: (7) $L = \frac{1}{T} \sum_{t} \log f (w_{t}, w_{t - 1}, \dots, w_{t - n + 1}; θ) + R (θ)$ where $R (θ)$ is a regularisation term. In this article, $R$ is a weight penalty applied to matrix $C$ and to the weights of the neural network.

In subsequent experiments, the neural network has one hidden layer beyond the word feature mapping layer and, optionally, direct connections from the word feature vectors to the output layer. There are therefore effectively two hidden layers in the model: layer $C$ , which shares word features, and the ordinary hyperbolic tangent hidden layer. The network also has a softmax output layer, which ensures that every probability is positive and that the probabilities of all possible outcomes sum to 1: (8) $\hat{P} (w_{t} |w_{t - 1}, \dots, w_{t - n + 1}) = \frac{e^{y_{w_{t}}}}{\sum_{i} e^{y_{i}}}$ where $y_{i}$ is the unnormalised log-probability of output word $i$ , computed from parameters $b$ , $W$ , $U$ , $d$ and $H$ : (9) $y = b + W x + U \tanh (d + H x)$ where $W$ can be 0 (i.e., there is no direct connection from the feature vector layer to the output layer), and $x$ is the vector of the word feature layer, which is the concatenation of the input word feature vectors from matrix $C$ , i.e.: (10) $x = (C (w_{t - 1}), C (w_{t - 2}), \dots, C (w_{t - n + 1}))$

Thus, the set of parameters is $θ = (b, d, W, U, H, C)$ , the number of free parameters is $|V| (1 + n m + h) + h (1 + (n - 1) m)$ , and the dominating term is $|V| (n m + h)$ , where $h$ is the number of neurons in the hidden layer.
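The forward computation in Equations (8) to (10) and the parameter count above can be checked with a minimal pure-Python sketch. The sizes, the random initialisation range, and the helper names (`init_params`, `nplm_forward`, `count_params`) are assumptions made for illustration; training by Equation (7) is not shown.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def init_params(vocab_size, m, h, n, seed=0):
    """Randomly initialise theta = (b, d, W, U, H, C) with small values."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    in_dim = (n - 1) * m  # length of the concatenated context vector x
    return {
        "C": matrix(vocab_size, m),       # word feature vectors
        "H": matrix(h, in_dim),           # input-to-hidden weights
        "d": [0.0] * h,                   # hidden biases
        "U": matrix(vocab_size, h),       # hidden-to-output weights
        "W": matrix(vocab_size, in_dim),  # optional direct connections
        "b": [0.0] * vocab_size,          # output biases
    }


def count_params(p):
    """Total number of scalar parameters in theta."""
    matrices = sum(len(row) for key in ("C", "H", "U", "W") for row in p[key])
    return matrices + len(p["d"]) + len(p["b"])


def nplm_forward(context, p):
    """context = [w_{t-1}, ..., w_{t-n+1}] as word ids; returns P(w_t = i | context)."""
    x = [value for w in context for value in p["C"][w]]                        # Eq. (10)
    hidden = [math.tanh(d_j + dot(row, x)) for d_j, row in zip(p["d"], p["H"])]
    y = [b_i + dot(w_row, x) + dot(u_row, hidden)                              # Eq. (9)
         for b_i, w_row, u_row in zip(p["b"], p["W"], p["U"])]
    top = max(y)                                   # subtract max for numerical stability
    exps = [math.exp(value - top) for value in y]
    total = sum(exps)
    return [e / total for e in exps]                                           # Eq. (8)


# Hypothetical sizes: |V| = 10, m = 4, h = 5, n = 3.
params = init_params(vocab_size=10, m=4, h=5, n=3)
probs = nplm_forward([3, 7], params)
print(sum(probs), count_params(params))
```

For these sizes the count equals $|V| (1 + n m + h) + h (1 + (n - 1) m)$ , which matches the formula stated above when the direct connections $W$ are included.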

2.3

Personalised Recommendation Module for Teaching Resources

For English learning, the most important task is to give students relevant recommendations that match their interests, so as to expand their knowledge and further improve the intelligence of the system. Building on current intelligent recommendation algorithms, this paper proposes collaborative filtering of English learning content, which bases its recommendations on content and items [26]. The recommendation steps for English learning materials are designed as follows:

1)

Assume that $L = \{l_{1}, l_{2}, \dots, l_{N}\}$ denotes the set of learners and $M = \{m_{1}, m_{2}, \dots, m_{n}\}$ denotes the set of all English video materials; let $g_{l, m}$ denote the rating of video $m$ by user $l$ .

2)

Calculate the similarity as shown in Equation (11): (11) $\mathrm{sim} (x, y) = \frac{\sum_{m \in m_{x y}} (g_{x, m} - {\bar{g}}_{x}) (g_{y, m} - {\bar{g}}_{y})}{\sqrt{\sum_{m \in m_{x y}} {(g_{x, m} - {\bar{g}}_{x})}^{2} \sum_{m \in m_{x y}} {(g_{y, m} - {\bar{g}}_{y})}^{2}}}$

where $x \in L$ and $y \in L$ ; $g_{x, m}$ and $g_{y, m}$ denote the ratings of user $x$ and user $y$ on video $m$ , respectively; ${\bar{g}}_{x}$ and ${\bar{g}}_{y}$ denote the corresponding average ratings; and $m_{x y}$ denotes the set of items rated by both users.

3)

The learners with high similarity to the target learner $x$ are selected as its neighbour set, and the rating of learner $x$ for an unrated video $m$ is then predicted as described in Equation (12): (12) $g_{x, m} = {\bar{g}}_{x} + \frac{\sum_{a = 1}^{k} (g_{a, m} - {\bar{g}}_{a}) \, \mathrm{sim} (x, a)}{\sum_{a = 1}^{k} \mathrm{sim} (x, a)}$ where $k$ denotes the number of nearest neighbours of learner $x$ .

4)

Sort the predicted ratings $g_{x, m}$ in descending order and then recommend the $k$ highest-rated video materials to user $x$ .
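The four steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the system's implementation: the ratings dictionary is invented toy data, each user's average is taken over all of that user's ratings, and only neighbours with positive similarity are kept so that the denominator of Equation (12) cannot vanish.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson_sim(ratings, x, y):
    """Equation (11): similarity of users x and y over their co-rated items."""
    common = set(ratings[x]) & set(ratings[y])
    if not common:
        return 0.0
    gx, gy = mean(ratings[x].values()), mean(ratings[y].values())
    num = sum((ratings[x][m] - gx) * (ratings[y][m] - gy) for m in common)
    den = math.sqrt(sum((ratings[x][m] - gx) ** 2 for m in common)
                    * sum((ratings[y][m] - gy) ** 2 for m in common))
    return num / den if den else 0.0


def predict(ratings, x, m, k=2):
    """Equation (12): predicted rating of item m for user x from k neighbours."""
    gx = mean(ratings[x].values())
    candidates = [(pearson_sim(ratings, x, a), a)
                  for a in ratings if a != x and m in ratings[a]]
    # Assumption: keep only positively correlated neighbours.
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    denom = sum(s for s, _ in neighbours)
    if denom == 0:
        return gx  # fall back to the user's own average
    num = sum((ratings[a][m] - mean(ratings[a].values())) * s for s, a in neighbours)
    return gx + num / denom


def recommend(ratings, x, top_n=3):
    """Step 4: rank the items x has not rated by predicted rating."""
    all_items = {m for user in ratings.values() for m in user}
    unrated = all_items - set(ratings[x])
    ranked = sorted(((predict(ratings, x, m), m) for m in unrated), reverse=True)
    return [m for _, m in ranked[:top_n]]


# Invented toy ratings (learner -> {video: rating}); not data from the paper.
ratings = {
    "x": {"v1": 5, "v2": 3, "v3": 4},
    "a": {"v1": 5, "v2": 3, "v3": 4, "v4": 5},
    "b": {"v1": 1, "v2": 5, "v3": 2, "v4": 1},
}
print(predict(ratings, "x", "v4"), recommend(ratings, "x"))
```

In this toy data learner `a` rates like `x` while learner `b` rates in the opposite direction, so only `a` contributes to the prediction for `v4`.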

2.4

Adaptive learning path module based on genetic algorithm

2.4.1

Learning path encoding and population initialisation

In the evolution of the adaptive learning path population, every online learning path is regarded as a genetic individual, so that different individuals represent different learning programmes, and the corresponding learning resources represent the specific learning path of a student, including the learning resources, the learning time, and the path itself [27-28]. Taking a certain knowledge point as an example, assume that learning it requires 4 steps, $A$ , $B$ , $C$ and $D$ , each of which can be completed in several ways: 3 ways for step $A$ , 4 ways for step $B$ , 2 ways for step $C$ , and 3 ways for step $D$ . Coding of knowledge points (genetic individuals).
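One possible encoding of this example is sketched below. The integer encoding (one gene per step, holding the index of the chosen way) and the population size are assumptions made for illustration; the paper does not spell out the exact coding scheme here.

```python
import random

# Number of ways to complete each step, taken from the example in the text.
WAYS_PER_STEP = {"A": 3, "B": 4, "C": 2, "D": 3}


def random_individual(rng):
    """One learning path: for each step, the index of the chosen way (assumed encoding)."""
    return [rng.randrange(ways) for ways in WAYS_PER_STEP.values()]


def init_population(size, seed=0):
    """Create `size` random learning paths as the initial population."""
    rng = random.Random(seed)
    return [random_individual(rng) for _ in range(size)]


population = init_population(size=20)
print(population[:3])
```

Under this encoding there are 3 × 4 × 2 × 3 = 72 distinct learning paths for the knowledge point.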

2.4.2

Determining the fitness function

The fitness function of adaptive learning for online courses is an important indicator for evaluating how good or bad an individual learning path is. It needs to be determined from data such as online course learners’ learning records and learning outcomes, so as to ensure the degree of optimisation of the final generated adaptive learning path. The fitness function in this study is evaluated from the tendency and effect of students’ online learning and includes three evaluation dimensions: (1) the more students like the form of a resource, the higher the tendency value of that resource; (2) the higher the test score corresponding to each resource in students’ online learning, the higher the learning efficiency value; and (3) the shorter the learning time for each resource, the higher the learning efficiency value.

Let $X_{i j}$ denote the evaluation value of each resource, where $i$ represents the type of resource and $j$ represents the evaluation dimension, i.e., $X_{i j}$ is the value of resource $i$ on dimension $j$ .

Based on the above evaluation dimensions, the fitness function $f$ can be defined as shown in Equation (13): (13) $f = \sum_{i \in \{A, B, C, D\}} \sum_{j = 1}^{3} X_{i j}$

A larger value of the fitness function indicates a better learning path individual, that is, a planned learning path closer to the optimum.
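Equation (13) amounts to summing the three dimension values over the four steps. A minimal sketch, in which the scores are invented for illustration and the dictionary layout is an assumption:

```python
def fitness(scores):
    """Equation (13): sum X_ij over steps i in {A, B, C, D} and dimensions j = 1..3."""
    return sum(sum(dimensions) for dimensions in scores.values())


# Invented evaluation values (preference, test score, time score) for the
# resource chosen at each step of one learning path.
path_scores = {
    "A": (1.0, 2.0, 3.0),
    "B": (0.0, 1.0, 0.0),
    "C": (2.0, 2.0, 2.0),
    "D": (1.0, 0.0, 1.0),
}
print(fitness(path_scores))
```

Because the three dimensions are simply added, they would need to be on comparable scales in practice; how the values are normalised is not specified in the text.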

2.4.3

Selection operations

The selection operation is based on the assessment of the fitness of individuals in a population of adaptive learning paths for an online course. Commonly used selection operators include fitness-proportionate selection, stochastic universal sampling, and local selection. This study uses roulette-wheel selection, which is fitness-proportionate, together with tournament selection. In roulette-wheel selection, the better the fitness value of an adaptive learning path individual, the higher the probability that it is selected. In tournament selection, a certain number of individuals (the tournament size) is drawn from the adaptive learning path population each time, and the best of them enters the offspring population. These two operations are repeated until the new adaptive learning path population reaches the original population size. This preserves both the quality and the diversity of individuals in the population, as well as the speed of convergence.
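The two selection strategies can be sketched as follows. The alternation between the two operators, the tournament size of 3, and the toy population are assumptions for illustration; the sketch also assumes non-negative fitness values with a positive total.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def tournament_select(population, fitnesses, rng, size=3):
    """Draw `size` distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), size)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


def next_generation(population, fitnesses, seed=0):
    """Alternate the two operators until the original population size is reached."""
    rng = random.Random(seed)
    offspring = []
    while len(offspring) < len(population):
        offspring.append(roulette_select(population, fitnesses, rng))
        if len(offspring) < len(population):
            offspring.append(tournament_select(population, fitnesses, rng))
    return offspring


# Toy population of encoded learning paths with invented fitness values.
population = [[0, 1, 0, 2], [2, 3, 1, 0], [1, 0, 1, 1], [0, 2, 0, 0], [2, 1, 1, 2]]
fitnesses = [4.0, 9.0, 6.5, 3.0, 8.0]
print(next_generation(population, fitnesses))
```

Roulette-wheel selection keeps weaker individuals in play with small probability, which helps diversity, while the tournament pushes the population towards fitter paths.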

2.4.4

Crossover operations

The crossover operation plays the central role in genetic algorithms. It exchanges parts of the structure of the learning paths of two parents and recombines them to generate new learning path individuals. Crossover operators include those for binary coding and those suitable for floating-point encoding. This study adopts cycle crossover from the latter group, which is generally used when genetic individuals are composed of an ordered sequence in which genes cannot be repeated. The learning resources of different knowledge points may be the same, but a single knowledge point cannot be supported by two identical learning resources, since such a learning path is not optimal; cycle crossover is therefore adopted here as the crossover operation of the genetic algorithm.
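A minimal sketch of the standard cycle crossover (CX) operator, the cyclic crossover named above, is given below. It assumes that both parents are permutations of the same set of resource identifiers, which is the setting in which cycle crossover is defined; the integer identifiers are hypothetical.

```python
def cycle_crossover(parent1, parent2):
    """Standard cycle crossover: alternate whole cycles between the two parents."""
    n = len(parent1)
    child1, child2 = [None] * n, [None] * n
    position_in_p1 = {gene: i for i, gene in enumerate(parent1)}
    visited = [False] * n
    take_from_p1 = True
    for start in range(n):
        if visited[start]:
            continue
        i = start
        # Follow one cycle: the gene parent2 holds at i sits elsewhere in parent1.
        while not visited[i]:
            visited[i] = True
            if take_from_p1:
                child1[i], child2[i] = parent1[i], parent2[i]
            else:
                child1[i], child2[i] = parent2[i], parent1[i]
            i = position_in_p1[parent2[i]]
        take_from_p1 = not take_from_p1  # the next cycle comes from the other parent
    return child1, child2


# Hypothetical resource identifiers 1..8 arranged in two different orders.
p1 = [1, 2, 3, 4, 5, 6, 7, 8]
p2 = [8, 5, 2, 1, 3, 6, 4, 7]
c1, c2 = cycle_crossover(p1, p2)
print(c1, c2)
```

Each child takes every gene from one parent at the same position, so no resource is duplicated or lost, which is the property the text requires.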

2.4.5

Mutation operations

The basic idea of the adaptive learning path mutation operation for online courses is to change certain gene values of individual strings in the learning path population to form new learning path individuals. Uniform mutation is adopted: with some small probability, the original gene value at each locus in the coding string of a learning path individual is replaced by a random number drawn from a uniform distribution within a certain range. This mutation is particularly suitable in the early stage of the adaptive learning path genetic algorithm.
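Uniform mutation can be sketched as follows, assuming an integer encoding in which each gene is the index of the way chosen for one step; the per-step ranges reuse the 3, 4, 2, 3 example from the encoding section, and the mutation rate of 0.05 is a hypothetical value.

```python
import random

# Allowed number of values per locus (assumed integer encoding, steps A-D).
WAYS_PER_STEP = [3, 4, 2, 3]


def uniform_mutation(individual, rate, rng):
    """With probability `rate`, replace each gene by a uniform draw from its range."""
    return [
        rng.randrange(WAYS_PER_STEP[locus]) if rng.random() < rate else gene
        for locus, gene in enumerate(individual)
    ]


rng = random.Random(0)
path = [2, 0, 1, 1]
print(uniform_mutation(path, rate=0.05, rng=rng))
```

Because the replacement value is drawn uniformly over the whole allowed range, the operator explores the search space broadly, which fits its use in the early generations.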

2.5

Detailed process of function realisation

2.5.1

Real-time translation

The core of the real-time translation function is to provide accurate translation results quickly to meet users’ immediate translation needs. The implementation of this function relies on the neural probabilistic language model. The functional implementation of the system covers three key areas: algorithm implementation, data flow optimisation and user interface interaction. An efficient translation model that can accurately capture the dependency relationship between the source and target languages is implemented through the neural probabilistic language model. The data flow is optimised with parallel computing, data preloading and caching techniques to ensure that the source text can be processed and translated quickly and with less delay. The front-end interface design focuses on simplicity and intuition, enabling users to easily input text and obtain real-time translation, and the tight integration of the interface with the core translation engine further ensures real-time and accurate translation.

2.5.2

Personalised Recommendations

The personalised recommendation function aims to provide users with a customised translation and learning experience, and the implementation of this function is based on user behaviour analysis, machine learning and AI recommendation algorithms. The implementation of the personalised recommendation function relies on in-depth analysis of user behaviour, accurate recommendation algorithms and real-time update mechanisms. First, the system collects and analyses users’ translation history, preferences and behaviours by strictly adhering to privacy and security standards in order to form an accurate user profile. Second, it uses collaborative filtering and deep learning algorithms and combines user profiles with global data to achieve accurate personalised translation and learning recommendations. The recommendation system has the ability to respond to user behaviour and feedback in real time, and can dynamically update the recommended content to ensure the timeliness and relevance of the recommended content, thus greatly enhancing the user’s learning and usage experience.

2.5.3

Adaptive learning paths

The Adaptive Learning Path feature generates customised learning plans and resources for each user through genetic algorithms to optimise learning. The implementation of Adaptive Learning Path is based on the careful analysis of user learning data and the application of genetic algorithms. The system collects the user’s learning data, including data on learning progress, effects and difficulties encountered, and analyses them in depth through machine learning algorithms. Genetic algorithms are used to dynamically generate and adjust a personalised learning path based on the results of data analysis. The learning path is tightly integrated with rich learning resources to ensure that each user has access to the most suitable learning materials and practice content, thus effectively enhancing learning efficiency and effectiveness.

3

Practical application of the system in the classroom

3.1

Classroom Context Creation

The purpose of context creation is to introduce the theme of the unit and stimulate students’ interest and motivation in learning. Firstly, the teacher plays a video about Globalisation to introduce the theme of the unit - globalisation. Then the teacher asks students to form groups of two to discuss it. At the end of the discussion, the students enter the results of the discussion into the system of this paper as a group, and the projector at the teacher’s end will instantly display the content of the students’ discussion. The teacher takes a few more representative ideas for class discussion and comments on them.

3.2

Learning task design

3.2.1

Language input task design

Listening and reading are the two main forms of foreign language input. In traditional language input activities, teachers can only judge students’ answers by calling on individual students or by group responses, and cannot provide comprehensive and accurate instant feedback. In the teaching design case, after the students finished reading the text, the teacher set five true/false questions to gauge their initial understanding of it. Based on the instant results, the teacher focuses on discussing with the students how to solve the questions with low correct rates, while the remaining questions with higher correct rates are discussed only briefly, which improves teaching efficiency. Using this system for immediate feedback in listening practice and fast reading practice also produced good results.

3.2.2

Vocabulary learning task design

Language form-centred vocabulary teaching is an important part of the language curriculum, and guessing words from context, lexical learning, and word-formation learning are regarded as the three most effective vocabulary teaching strategies. College English classes that use a classroom response system (CRS) can improve the effectiveness of vocabulary teaching. In the teaching case, the teacher picked out the word "strengthen", which appeared in the text. Students were divided into groups of four to list more words with the affix -en and type them into Socrative. Because the results were displayed instantly, a competitive atmosphere was created between the groups, with students thinking more actively and listing more words than in a traditional classroom. The teacher also designed three multiple-choice questions on guessing the meaning of words from context, which the students discussed through peer teaching, and the teacher added example sentences and extended vocabulary according to the discussion.

3.2.3

Language Output Task Design

Output plays an important role in second language teaching, and successful language teaching requires that students be given opportunities for language output. Classroom discussions, writing and oral reports are all effective language output activities. In the design case, after the textbook content had been studied, the teacher asked the students to form task-based groups of four and divide among themselves the four tasks of data collection, report writing, PPT production and oral report. After the presentations, the whole class voted for the best group using the system of this paper, and the work of the other groups was evaluated through short-answer questions for inter-group evaluation. The whole process of voting and inter-group assessment was very exciting for the students and brought the learning of this unit to a climax.

3.3

Evaluation of learning

Assessment of learning is used to measure the fulfilment of established teaching objectives and is an integral part of the teaching and learning process. Tests are the most common form of learning assessment. At the end of the case unit, the teacher entered pre-designed test questions matching the unit objectives into the system of this paper; students completed the accompanying test within 10 minutes, the results were fed back immediately, and the questions with higher error rates were explained straight away. This interactive approach produced good results. Students not only summarised their learning, but also raised some problems encountered in their learning. As all the reflections were instantly presented on the big screen, the teacher could comment on or answer the questions as they arose, and the teacher-student interaction was very positive.

4

Functional testing of the system and evaluation of the effectiveness of its practical application

4.1

System function testing

4.1.1

Neural Network Machine Translation Functional Module Testing

1)

Introduction to the dataset

Translation tasks based on neural networks are data-driven, so choosing the right dataset is especially important for neural machine translation; the difficulty of data collection is also one of the challenges faced by translation tasks. The parallel corpus information for the experiment is shown in Table 1. The training dataset used in this section is the WMT20 Chinese-English (WMT20zh-en) dataset, which is used to train the neural machine translation model based on the neural probabilistic language model proposed in this chapter, and newsdev2020 and newstest2020 are used as the validation set and the test set, respectively.

Since the data are complex and come from a number of conference documents, online news, and international documents, they need to be pre-processed. The steps are as follows:

(1)

Segmentation processing. English sentences already contain spaces, which serve as word separators, and are tokenised with the Moses tool. Chinese sentences are written without spaces between words, so Chinese word segmentation is performed with the jieba segmentation tool.

(2)

The BPE method is used to generate the vocabulary lists of Chinese and English, in which the size of the Chinese dictionary is 37501 and the size of the English dictionary is 25503.

(3)

The distribution of sentence lengths in the WMT20zh-en dataset is shown in Figure 4, which counts the number of Chinese and English sentences in length intervals of 0-10, 11-20 words and so on. For Chinese sentences, the length distribution is shown in Figure 4(a). For English sentences, the length distribution is shown in Figure 4(b). It can be seen that most of the sentence lengths are distributed in the three intervals of 0-10, 11-20, and 21-30, while the other lengths relative to the

Setting a fixed text length amounts to fixing the number of time steps, which keeps the dimensionality of the neural network output layer controllable. However, during training, validation and testing the input sentences rarely have the same length, so truncation and padding must be applied. Before the experiment, the maximum length is set to 100. Sentences longer than 100 words are truncated, and sentences shorter than 100 words are padded: the missing positions are filled with zeros of the same dimension.
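The truncation and padding rule can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes sentences have already been mapped to integer token ids and that the id 0 is reserved for padding, whereas the text describes padding with zero vectors of the embedding dimension. The names `pad_or_truncate`, `MAX_LEN`, and `PAD_ID` are hypothetical.

```python
from typing import List

MAX_LEN = 100  # maximum sequence length stated in the text
PAD_ID = 0     # assumed padding id (the text only says "zeros")


def pad_or_truncate(token_ids: List[int],
                    max_len: int = MAX_LEN,
                    pad_id: int = PAD_ID) -> List[int]:
    """Return a sequence of exactly max_len token ids."""
    if len(token_ids) >= max_len:
        # Sentences at or above the limit keep only their first max_len tokens.
        return token_ids[:max_len]
    # Shorter sentences are right-padded with pad_id up to max_len.
    return token_ids + [pad_id] * (max_len - len(token_ids))


short = pad_or_truncate([5, 8, 13])           # 3 real tokens + 97 pads
long = pad_or_truncate(list(range(1, 151)))   # 150 tokens cut to 100
```

Both `short` and `long` end up with exactly 100 positions, so every batch has a fixed number of time steps.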

2)

Translation Effect Evaluation Metrics

BLEU computes a similarity score between the translation produced by a machine translation system and the reference translation, and this score is used to measure the performance of the system. The score lies in the range [0,1]; the closer it is to 1, the closer the machine translation is to the reference and the better the system performs. The BLEU score is calculated as shown in Eq. (14) and Eq. (15):

(14) $BLEU = BP \cdot \exp\left(\sum_{n = 1}^{N} w_{n} \log P_{n}\right)$

(15) $BP = \begin{cases} 1, & c > r \\ e^{(1 - r / c)}, & c \leq r \end{cases}$

In Eq. (14), $N = 3$ is used, so the score is based on 1-gram, 2-gram, and 3-gram matches. $w_{n}$ is the weight assigned to each $n$-gram order, and $P_{n}$ is the precision of the candidate's $n$-grams with respect to the reference sequence. In Eq. (15), $c$ is the length of the candidate sentence and $r$ is the length of the reference sentence.
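To make Eq. (14) and Eq. (15) concrete, the following Python sketch computes a sentence-level BLEU score with $N = 3$. It rests on several assumptions that the paper does not state: uniform weights $w_{n} = 1/N$, a single reference, clipped $n$-gram counts for $P_{n}$, $r$ taken as the reference length (as in the standard BLEU definition), and no smoothing, so the score is 0 whenever any $P_{n}$ is 0. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Clipped n-gram precision P_n of the candidate against one reference."""
    cand = ngram_counts(candidate, n)
    ref = ngram_counts(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate: List[str], reference: List[str], max_n: int = 3) -> float:
    """Sentence-level BLEU following Eq. (14)-(15) with uniform weights."""
    c, r = len(candidate), len(reference)
    if c == 0:
        return 0.0
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # log(0) is undefined; no smoothing is applied here
    bp = 1.0 if c > r else math.exp(1.0 - r / c)  # brevity penalty, Eq. (15)
    weight = 1.0 / max_n                           # assumed uniform w_n
    return bp * math.exp(sum(weight * math.log(p) for p in precisions))


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
score = bleu(candidate, reference)
```

For this pair the precisions are 5/6, 3/5, and 1/4 and the brevity penalty is 1, so the score is the geometric mean of the three precisions.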

3)

Experimental results and analysis

Three neural machine translation models are compared: the model in this paper, ConvS2S, and LSTM. Their BLEU scores on the training set WMT20zh-en, the validation set newsdev2020, and the test set newstest2020 are shown in Fig. 5. The model in this paper performs better than both ConvS2S and LSTM. On the WMT20zh-en dataset, it outperforms the LSTM model and the ConvS2S model by 1.77 and 0.62 BLEU points, respectively, and the same ordering holds on the other two datasets. This suggests that the neural probabilistic language model is able to mine latent semantic information in the sentence, which improves the efficiency and accuracy of machine translation.

Table 1.

Parallel corpus information for the experiment

Data type	Name	Size
Training set	WMT20zh-en	234k
Validation set	newsdev2020	8k
Test set	newstest2020	8k

4.1.2

Testing of Personalised Recommendation Module for Teaching Resources

1)

Experimental data

Ratings were collected from 100 users on 1000 teaching resources, 6000 ratings in total. The experiment also considers the sparsity of the dataset, defined as the proportion of entries in the user-item rating matrix that are unrated. For this experiment the sparsity is 1-6000/(100*1000) = 0.94. Ratings range from 1 to 5, with higher values indicating a stronger user preference for the resource. Following the literature, the rating data are split into a training set and a test set at a ratio of 0.8, i.e. 80% for training and 20% for testing.
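The sparsity figure and the 0.8 split can be reproduced with a short Python sketch. The rating triples below are synthetic placeholders generated only to have 6000 entries of the right shape; they are not the paper's data. The random shuffle and the seed are also assumptions, since the paper does not say how the split was drawn.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, int]  # (user id, resource id, rating in 1..5)


def sparsity(num_ratings: int, num_users: int, num_items: int) -> float:
    """Share of the user-item matrix that holds no rating."""
    return 1.0 - num_ratings / (num_users * num_items)


def split_ratings(ratings: List[Rating],
                  train_ratio: float = 0.8,
                  seed: int = 0) -> Tuple[List[Rating], List[Rating]]:
    """Shuffle the ratings and cut them into training and test parts."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


level = sparsity(6000, 100, 1000)  # the figures given in the text

# Synthetic placeholder data: 100 users x 60 resources = 6000 ratings.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(100) for i in range(60)]
train, test = split_ratings(ratings)
```

With these figures the training part holds 4800 ratings and the test part 1200.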

2)

Experimental evaluation criteria

If the recommended results meet the user’s requirements, they improve the user experience and increase the user’s stickiness to the system, which brings the corresponding economic or social benefit and achieves the purpose of the recommendation. If, however, the recommender system recommends inappropriate items, users will not only doubt the quality of the recommendations but may also abandon the system.

This experiment adopts the mean absolute deviation (MAE), the most widely used and intuitive measure of statistical accuracy, as the evaluation standard. The dataset is divided into a training set and a test set; the algorithm is fitted on the training set and predicts the ratings of the items in the test set. The MAE is the average absolute difference between the actual ratings given by users in the test set and the predicted ratings. The smaller the MAE, the higher the quality of the recommendations.

For target user $U_{i}$, the set of ratings predicted by the recommendation algorithm is $\{p_{1}, p_{2}, \dots, p_{n}\}$, and the corresponding set of actual user ratings is $\{q_{1}, q_{2}, \dots, q_{n}\}$. The mean absolute deviation $MAE_{i}$ for user $U_{i}$ is defined as:

(16) $MAE_{i} = \frac{\sum_{j = 1}^{n} |p_{j} - q_{j}|}{n}$

The mean absolute deviation $MAE$ over all users is therefore:

(17) $MAE = \frac{\sum_{i = 1}^{m} MAE_{i}}{m}$

where $m$ is the number of users.
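Eq. (16) and Eq. (17) translate directly into code. The sketch below assumes the test-set predictions are grouped per user as (predicted, actual) pairs; the dictionary `example` and the function names are illustrative, not taken from the paper.

```python
from typing import Dict, List, Tuple

Pairs = List[Tuple[float, float]]  # (predicted rating p, actual rating q)


def user_mae(pairs: Pairs) -> float:
    """Eq. (16): mean absolute deviation over one user's test ratings."""
    return sum(abs(p - q) for p, q in pairs) / len(pairs)


def overall_mae(by_user: Dict[str, Pairs]) -> float:
    """Eq. (17): average of the per-user values over all users with test ratings."""
    per_user = [user_mae(pairs) for pairs in by_user.values() if pairs]
    return sum(per_user) / len(per_user)


# Illustrative values only.
example = {
    "u1": [(4.0, 5.0), (3.0, 3.0)],  # per-user MAE = (1 + 0) / 2 = 0.5
    "u2": [(2.0, 4.0)],              # per-user MAE = 2.0
}
mae = overall_mae(example)
```

Note that Eq. (17) averages per-user values, so a user with few test ratings weighs as much as a user with many; pooling all absolute errors before averaging would generally give a different number.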

3)

Experimental programme design

Experiment 1: Using the collected dataset, compare the performance of the traditional personalised recommendation algorithm and this paper’s algorithm on the same training and test sets, adjusting the nearest-neighbour size so that each algorithm produces its best results as far as possible.

Experiment 2: Using the collected dataset, compare the performance of the traditional algorithm, the hybrid recommendation algorithm, and this paper’s algorithm on the same training and test sets.

4)

Experimental results

The results of Experiment 1, which compares the MAE values of the traditional personalised recommendation algorithm and this paper’s algorithm for different numbers of nearest neighbours, are shown in Figure 6. The figure shows that this paper’s algorithm has a smaller MAE (overall value: 0.819). This indicates that the algorithm effectively addresses the problem that, under extreme sparsity, traditional similarity measures computed on the user-item rating matrix cannot measure the similarity between users and items well. When computing similarity, the algorithm works on the union of the rated sets rather than their intersection: the available information is first used to predict the missing scores, and the similarity is then computed. This improves the accuracy of the item similarity sets and thereby the quality of personalised recommendation of teaching resources.
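The contrast between intersection-based and union-based similarity can be illustrated with a small Python sketch. It is only an illustration of the general idea under stated assumptions and not the paper's algorithm: cosine similarity is assumed as the similarity measure, and each missing rating in the union is filled with the item's own mean rating as a stand-in for the predicted score. Both items are assumed to have at least one rating, and all names and values are hypothetical.

```python
import math
from typing import Dict, List

Ratings = Dict[str, float]  # user id -> rating given to one item


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_intersection(item_a: Ratings, item_b: Ratings) -> float:
    """Similarity computed only over users who rated both items."""
    users = sorted(set(item_a) & set(item_b))
    return cosine([item_a[u] for u in users], [item_b[u] for u in users])


def similarity_union(item_a: Ratings, item_b: Ratings) -> float:
    """Similarity over users who rated either item, with gaps filled first."""
    users = sorted(set(item_a) | set(item_b))
    fill_a = sum(item_a.values()) / len(item_a)  # assumed fill: item mean
    fill_b = sum(item_b.values()) / len(item_b)
    return cosine([item_a.get(u, fill_a) for u in users],
                  [item_b.get(u, fill_b) for u in users])


item_a = {"u1": 5.0, "u2": 3.0}
item_b = {"u2": 4.0, "u3": 2.0}
sim_inter = similarity_intersection(item_a, item_b)
sim_union = similarity_union(item_a, item_b)
```

With a single co-rating user, the intersection-based value degenerates to 1 regardless of the ratings, while the union-based value draws on all three users. This is the kind of sparsity effect the paragraph above describes.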

The MAE values of the three algorithms for different numbers of nearest neighbours are shown in Fig. 7. With the same number of neighbours, the MAE of the traditional algorithm (0.842) exceeds that of the hybrid algorithm (0.814) by 0.028 (0.842-0.814=0.028), and the MAE of this paper’s algorithm (0.783) is smaller than both. In addition, the algorithm in this paper can be computed offline, which reduces the computational overhead and improves the efficiency of real-time recommendation. The experiments show that the algorithm in this paper has a small MAE and can improve the quality of recommendation.

4.1.3

Adaptive Learning Path Module Testing

Five English knowledge points (A1, A2, A3, A4, and A5) are selected as the study samples: A1 is the future progressive tense, A2 the simple present tense, A3 the present progressive tense, A4 the past progressive tense, and A5 the simple past tense. Figure 8 shows the mastery probabilities of two learners, B and C, for the five attributes. If the mastery probability is greater than 0.5, the learner is considered to have mastered the attribute. The attributes that learner B needs to learn are A4 and A5, and the corresponding knowledge state is (11100). The attribute that learner C needs to learn is A4, and the corresponding knowledge state is (11101).

A genetic algorithm was used to select appropriate learning materials for these two learners. Table 2 gives the resulting sequences of learning materials; read from top to bottom, the rows give the learning order for each learner. The learning difficulty of the two selected materials (D and E) is 0.627 and 0.618, respectively, which indicates that the features of the materials selected by the genetic algorithm match the characteristics of the learners. Figure 9 shows the learning paths of learners B and C in panels (a) and (b), respectively. The initial knowledge state of learner B is (11100), with two attributes (A4 and A5) still to be learnt. After studying material D (past progressive tense), learner B’s knowledge state becomes (11110); after then studying material E, the final knowledge state becomes (11111). Learner C’s initial knowledge state (11101) becomes (11111) after studying material D (past progressive tense) alone. Learning stops once the learner has mastered all the attributes measured.

Table 2.

The sequence of learning materials selected by genetic algorithm

Learners	Learning material	Knowledge point	Media	Content	Disciplines	Difficulty
B	D	A4	Image	Examples	English	0.627
B	E	A5	Image	Examples	English	0.618
C	D	A4	Image	Examples	English	0.627
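The knowledge-state bookkeeping behind Table 2 can be sketched in Python. The sketch covers only the 0.5 mastery threshold and the state updates along a given material sequence; it does not implement the genetic algorithm that selects the materials. The mastery probabilities for learner B are invented for illustration (the actual values are in Figure 8), and the sketch assumes that studying a material always leads to mastery of its target attribute.

```python
from typing import Dict, List

ATTRIBUTES = ["A1", "A2", "A3", "A4", "A5"]
THRESHOLD = 0.5  # mastery threshold stated in the text

# Target knowledge point of each learning material, as listed in Table 2.
MATERIAL_TARGET = {"D": "A4", "E": "A5"}


def knowledge_state(mastery: Dict[str, float]) -> str:
    """Binary state string: 1 where the mastery probability exceeds 0.5."""
    return "".join("1" if mastery[a] > THRESHOLD else "0" for a in ATTRIBUTES)


def learn(state: str, attribute: str) -> str:
    """Mark one attribute as mastered (assumes the material is effective)."""
    idx = ATTRIBUTES.index(attribute)
    return state[:idx] + "1" + state[idx + 1:]


def follow_path(state: str, materials: List[str]) -> List[str]:
    """Apply a material sequence; stop once every attribute is mastered."""
    states = [state]
    for material in materials:
        state = learn(state, MATERIAL_TARGET[material])
        states.append(state)
        if state == "1" * len(ATTRIBUTES):
            break
    return states


# Hypothetical mastery probabilities for learner B.
learner_b = {"A1": 0.9, "A2": 0.8, "A3": 0.7, "A4": 0.3, "A5": 0.2}
path_b = follow_path(knowledge_state(learner_b), ["D", "E"])
path_c = follow_path("11101", ["D"])
```

Here `path_b` walks through 11100, 11110, and 11111, and `path_c` goes from 11101 to 11111 in one step, matching the sequences in Table 2.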

4.2

Evaluation of the effectiveness of practical application

4.2.1

Analytical notes

To verify whether introducing this paper’s system into classroom practice improves its effect, three dimensions are examined: English translation performance, teacher-student interaction, and satisfaction. Two parallel classes of English majors at a university, class A and class B, are selected as research subjects, with 40 students in each class and 80 students in total. Class A adopts the traditional classroom practice mode, and class B adopts the classroom practice teaching mode that integrates the intelligent translation system for English teaching. Quantitative values of English translation scores, teacher-student interaction, and satisfaction before and after the intervention were obtained by distributing questionnaires. Independent samples t-tests were then used to analyse the differences in the three dimensions, and the results of these analyses are used to assess the facilitating effect of the system on classroom practice.
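For readers unfamiliar with the test, the statistic behind an independent samples t-test can be sketched with the Python standard library. The sketch assumes the equal-variance (pooled) form of the test; the paper does not say which variant was read from the SPSS output. It returns only the t statistic and the degrees of freedom, since the p-value requires the t distribution, which the standard library does not provide. The score lists are invented for illustration.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def independent_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Illustrative scores only, not the study's data.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = independent_t(group_a, group_b)
```

The resulting statistic is compared against the t distribution with `dof` degrees of freedom to obtain the P value that is checked against 0.05.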

4.2.2

Differential Analysis of Students’ Achievement in English Translation

Independent samples t-tests were carried out in SPSS 20 on the collated English translation scores of class A and class B before and after the intervention. The results are shown in Fig. 10, where (a) and (b) are class A and class B, respectively. The difference between the mean English translation scores before and after the intervention is 0.355 (67.14-66.79=0.355) for class A and 10.871 (75.531-64.66=10.871) for class B. The difference is significant for class B (P=0.008<0.05) but not for class A, which means that the classroom practice model incorporating this paper’s system helps to improve students’ English translation performance.

4.2.3

Analysis of differences in teacher-student interactions

Independent samples t-tests were used to analyse the differences in classroom teacher-student interaction before and after the intervention. The results are shown in Fig. 11, where (a) and (b) are pre-intervention and post-intervention, respectively. Before the intervention, the difference in the mean value of classroom teacher-student interaction between class A and class B is 0.33 (71.48-71.15=0.33), which is not significant. After a period of classroom practice intervention, the difference between the mean values is 7.77 (80.49-72.72=7.77), and the difference between the two classes is significant (P=0.001<0.05, T=0.477). This suggests that, compared with the traditional classroom practice model, the classroom practice model of this paper is more likely to promote classroom teacher-student interaction and to enhance students’ interest in learning.

4.2.4

Differential Analysis of Satisfaction

The same method is used to analyse the difference in satisfaction before and after the intervention. The results are shown in Fig. 12, where (a) and (b) are class A and class B, respectively. Fig. 12 shows that the pre- and post-intervention satisfaction of class A does not differ significantly (P>0.05), whereas that of class B differs significantly at the 0.05 level (P=0.009<0.05). In summary, in terms of satisfaction with classroom practice, class B performs better than class A, which supports the effectiveness of classroom practice that integrates the intelligent translation system for English language teaching.

5

Conclusion

Based on an analysis of the requirements of an intelligent translation system for English teaching, this paper introduces a neural network machine translation module, an adaptive learning path module based on a genetic algorithm, and a personalised recommendation module for teaching resources, and on this basis constructs the intelligent translation system for English teaching. To improve English classroom practice in colleges and universities, the system is introduced into that practice, and both its functions and its practical application effects are evaluated. The evaluation shows that the translation model in this paper exceeds the LSTM model and the ConvS2S model by 1.77 and 0.62 BLEU points, respectively, indicating strong translation performance. With the same number of neighbours, the MAE of the recommendation algorithm in this paper is smaller than that of the traditional algorithm (0.842) and the hybrid algorithm (0.814), which indicates that the teaching resources it recommends are of high quality. The learning materials selected by the genetic algorithm (D and E) have difficulties of 0.627 and 0.618, and they enable learners B and C to master all the knowledge points. Finally, in the evaluation of practical application, the college English classroom practice model that introduces the intelligent translation system shows significant differences (P<0.05) in all three dimensions of English translation performance, teacher-student interaction, and satisfaction, which supports the facilitating effect of the system on college English classroom practice.

Language: English

Publication frequency: 1 issue per year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Qilu Xu

Published: 24 Sep 2025

Received: 11 Jan 2025

Accepted: 26 Apr 2025

DOI: https://doi.org/10.2478/amns-2025-1001

Keywords: Neural network, Personalised recommendation, Adaptive learning path, Intelligent translation system

© 2025 Qilu Xu, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
