A Study of Imagery Translation Strategies in English and American Poetry Aided by Natural Language Processing Technology
Published online: 19 March 2025
Received: 27 October 2024
Accepted: 2 February 2025
DOI: https://doi.org/10.2478/amns-2025-0479
Keywords
© 2025 Yanyan Lei, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
China's translation of English and American poetry began in the middle of the 19th century, and in the two hundred years that followed, English and American poetry poured in like a tide. In the process of translation and dissemination, the old and the new cultural systems collided and rubbed against each other, mutually restraining, weighing, and impacting one another, and shaping the literary habits, values, and social trends of the time [1–2]. This was the case in the late Qing Dynasty and the early Republican period as well as in the new period, and the related translations, commentaries, and studies have been abundant. In contrast, the translation of English and American poetry in the seventeen years from 1949 to 1966, after the founding of the People's Republic of China, received a notably cold reception. The reason is that the ideological constraints of this period were strong and the political imprint sensitive and deep, underscoring the great specificity of the era [3–4]. The translation of British and American literature gave way to Russian and Soviet literature; the works of some important writers were rarely translated or avoided altogether, while some second-tier British and American writers, by contrast, attracted considerable attention and were emphasized in translation [5–6].
The historical and social context of the construction of New China's culture determined the unique nature and characteristics of the translation of English and American poetry during these seventeen years. This body of translation contains a wealth of phenomena, from the selection of the original poems to the strategies adopted by the translators. Viewed together with the specific historical and cultural context of the time and with the literary rules then vigorously promoted, such as "political standards first, artistic standards second", the seventeen years of poetry translation constitute a relatively singular and functional picture of poetry translation that eliminates differences [7–8].
With the trend toward world integration, communication between different languages, cultures, and international academic circles is becoming ever closer [9]. In this process, people inevitably come into contact with a large amount of information in non-native languages. When information seekers face such material, their limited command of non-native languages and the intricacy of online information cause great inconvenience: reading through the huge volume of pages would consume a great deal of time and energy, so it is difficult for them to obtain the useful information they need quickly [10–12].
The booming development of artificial intelligence has made natural language processing a promising answer to such problems. The field of Natural Language Processing (NLP) originated about 50 years ago with machine translation systems, and the technology is used for the automatic processing, analysis, and presentation of human natural language [13–14]. The field now encompasses various linguistic theories, cognitive models, and engineering approaches, and millions of web pages can be processed in less than one second [15–17]. Among the various natural language processing techniques, information extraction is an important branch: based on linguistic features, it extracts the information a user needs or a specified type of information (e.g., time, place, people and events, attribute relations, purposes, and conclusions) from unstructured or semi-structured text such as web pages, new media, forums, news, and academic literature, and converts the unstructured text into structured information through integration, merging and splicing, redundancy removal, noise processing, and other techniques [18].
In this paper, based on natural language processing technology, we design a strategy for translating English and American poetic imagery, with a machine translation model incorporating chapter context validity recognition at its core, quantitatively analyze English and American modern poetry, and test the translation performance of the model. The model extends the original Transformer: it uses the same encoder to encode the source sentence and its contextual sentences and adds a multi-head contextual attention layer, a residual network with a gating mechanism, and a loss function based on joint learning. Finally, the performance of the model is verified using the BLEU metric, English and American poetry is quantitatively analyzed, and a deep semantic statistical analysis of its imagery is performed.
In the process of building a corpus of English and American poetry imagery, the first step is to collect and preprocess the corpus of English and American poetry imagery.
The collection, organization, editing, and proofreading of the corpus should be guided by the intended application and should ensure the professionalism and accuracy of the material. Although a corpus needs a large amount of material to support it, quantity is not the only criterion; the material should be accurate, and redundant or unnecessary texts should be avoided. The imagery corpus of English and American poetry in this paper is mainly obtained from publicly released collections of English and American poetry and from translated versions of English and American poetry in different languages on relevant poetry websites.
The flow chart of the corpus collection in this paper is shown in Figure 1. In constructing the English and American poetry imagery resource base, the corresponding data and information are collected through corpus acquisition and then saved, processed, and managed, so that a database of English and American poetry imagery resources can be maintained in a computer system in accordance with standard database requirements.

Corpus collection process
In this paper, a web crawler is used to obtain English and American poetic imagery information from the Internet. Web crawlers, also called web spiders, automatically obtain the required resources from web pages according to certain acquisition rules. They fall into several types. A batch-type crawler, suited to crawling with a clear target and scope, stops immediately after the set target has been acquired. An incremental crawler, unlike the batch type, continuously acquires web content to ensure that the data obtained are up to date while shortening acquisition time and space. A general-purpose crawler, also known as a whole-web crawler, covers a huge range and data volume, places higher demands on storage space and crawling speed, works in parallel, and takes a long time to update pages. In this paper, manual collection and a web crawler are used together to obtain the required corpus of English and American poetry imagery.
The web crawler acquires an initial URL address, obtains new URL addresses from it, saves the data acquired from web pages in the database, and stops when the crawling goal is met. The crawler in this paper proceeds as follows: it sends a request to the crawler engine to get an initial URL address; receives the crawled URL address and collects web page data; refreshes the URL address, sends it to the acquisition module, and passes the collected data to the storage module for storage; sends the data to the Spider for processing and de-duplication; and repeats these steps until all the required data related to English and American poetry imagery have been acquired. A minimal sketch of such a batch crawler is given below.
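The following is a minimal sketch of the batch-type crawler described above, written in Python with requests and BeautifulSoup. The seed URL, page limit, and politeness delay are illustrative assumptions, not details given in the paper.

```python
# Minimal batch-crawler sketch: fetch a page, save its text, queue new links,
# stop once the target number of pages has been collected.
import time
from collections import deque

import requests
from bs4 import BeautifulSoup

START_URL = "https://example-poetry-site.org/poems"   # hypothetical seed URL
MAX_PAGES = 200                                       # stop condition of the batch crawler

def crawl(start_url: str, max_pages: int) -> dict[str, str]:
    seen, queue, corpus = {start_url}, deque([start_url]), {}
    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue                                  # skip unreachable pages
        soup = BeautifulSoup(html, "html.parser")
        corpus[url] = soup.get_text(separator="\n", strip=True)
        for link in soup.find_all("a", href=True):    # refresh the URL frontier
            href = link["href"]
            if href.startswith("http") and href not in seen:
                seen.add(href)                        # URL de-duplication
                queue.append(href)
        time.sleep(1)                                 # politeness delay
    return corpus
```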
In order to solve the problem of inconsistent or garbled text formats in the acquired corpus, corpus processing is needed to unify the collected texts into the same data format for analysis and collection. Corpus processing mainly includes denoising, encoding, and categorization. Denoising refers to the elimination of unwanted elements in the text, such as confusing formatting, illegible characters, broken lines, and unrecognizable graphical elements. Encoding refers to the conversion of various texts into a single encoded format. Categorization facilitates the collection and maintenance of the imagery corpus of English and American poetry, and is all the more necessary when constructing a large corpus. The denoised corpus is stored in the database, with each line numbered according to the line standard and sentence numbers added for easy modification.
The preprocessing of the original (English) corpus of English and American poetry imagery mainly involves the removal of web tags and the extraction of plain English text. In the first step, the corpus acquired from the web is checked for web tags and duplicated content. In the second step, the acquired corpus is batch-processed to clear redundant spaces, garbled characters, broken lines, unrecognizable graphics, and so on, and the text layout and formatting are further unified. This completes the preprocessing of the original text corpus and lays the foundation for corpus construction at a later stage; a small denoising sketch follows.
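A small sketch of the denoising step described above: strip leftover web tags, normalize odd code points, drop unprintable characters, and collapse redundant whitespace. The specific noise patterns are assumptions; real corpora may need additional rules.

```python
# Denoising sketch: remove residual web tags, normalise characters,
# drop unprintable symbols and collapse redundant whitespace.
import re
import unicodedata

def clean_poem_text(raw: str) -> str:
    text = re.sub(r"<[^>]+>", " ", raw)                 # remove residual web tags
    text = unicodedata.normalize("NFKC", text)          # normalise odd code points
    text = "".join(ch for ch in text if ch.isprintable() or ch == "\n")
    text = re.sub(r"[ \t]+", " ", text)                 # collapse redundant spaces
    text = re.sub(r"\n{3,}", "\n\n", text)              # merge runs of broken blank lines
    return text.strip()
```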
The preprocessing of a corpus of English and American poetry imagery translations is a key step in building a corpus of English and American poetry imagery. Before that, the first task is to convert the massive textual data of multilingual English-American poetry translations with different encoding inputs into unified Unicode-encoded textual data.
Following national and international character encoding standards, the English and American poetic imagery translations are re-encoded so as to unify the encoding method. This unification mainly consists of identifying the encoding of every text in the translation corpus, converting any non-Unicode encodings with existing encoding conversion software, and finally storing the texts in UTF-8 format; a sketch of this conversion follows.
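A sketch of the encoding-unification step: detect a file's original encoding and re-save it as UTF-8. The paper only states that existing conversion software is used, so the use of chardet here is merely one possible realization.

```python
# Encoding-unification sketch: detect the original encoding and re-save as UTF-8.
from pathlib import Path

import chardet

def to_utf8(path: str) -> None:
    raw = Path(path).read_bytes()
    guess = chardet.detect(raw)["encoding"] or "utf-8"  # fall back to UTF-8
    text = raw.decode(guess, errors="replace")          # decode with the detected encoding
    Path(path).write_text(text, encoding="utf-8")       # store the text in UTF-8 format
```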
Since the corpus of English and American poetry imagery is collected through different channels and exists in many forms, entering these corpora into the corpus requires a great deal of manpower and time to enter and save them in a uniform way. The corpus construction framework of this paper is shown in Figure 2, and the construction proceeds as follows. The original and translated texts of English and American poetry imagery are entered into the designated text box, and the original text is converted into the translated text using text conversion rules. Because current translation platforms can be inaccurate, human intervention is required to manually translate and correct certain words and phrases. The original texts and the translated sentences are then binned and saved into a database, and the collected TXT files are stored in the same database. The database is saved and the words and sentences are queried; when the query succeeds, the next corpus item can be processed. If a corpus error is reported or no corpus is found, the corpus must be proofread again or re-entered into the database. A minimal storage-and-query sketch is given after Figure 2.

Technical roadmap for corpus construction
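A minimal sketch of the bilingual entry-and-query step above: aligned original and translated lines are stored in a database and queried back. SQLite and the table layout here are assumptions made for illustration; the paper does not name a specific database system.

```python
# Store aligned original / translated lines and query them back (illustrative schema).
import sqlite3

con = sqlite3.connect("poetry_imagery.db")
con.execute(
    "CREATE TABLE IF NOT EXISTS corpus ("
    "id INTEGER PRIMARY KEY, source TEXT, target TEXT)"
)

def add_pair(source: str, target: str) -> None:
    con.execute("INSERT INTO corpus (source, target) VALUES (?, ?)", (source, target))
    con.commit()

def query(word: str) -> list[tuple[str, str]]:
    cur = con.execute(
        "SELECT source, target FROM corpus WHERE source LIKE ?", (f"%{word}%",)
    )
    return cur.fetchall()                               # an empty list means "no corpus found"

add_pair("The apparition of these faces in the crowd;", "<translated line>")
print(query("apparition"))
```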
In this paper, a certain size of multilingual word alignment corpus of English and American poetry imagery is constructed by using web crawlers and manual corpus collection. The features and characteristics of English and translated languages are analyzed. The corpus preprocessing of multilingual English and American poetry imagery will be carried out through the word alignment method, and a dictionary of English and American poetry imagery terminology will be constructed. A corpus management system for English and American poetry imagery will be designed to enter the collected multilingual English and American poetry imagery corpus, and the corresponding multilingual corpus will be obtained through querying.
With the rapid development of modern science and technology, there are more and more kinds of computer-aided translation software, and most of the translation plug-ins can improve the efficiency and quality of translation. But without translation plug-ins, even the best translation software cannot realize automatic translation. In this paper, real-time translation of English and American poetic imagery is realized through the use of translation plug-ins, which will facilitate free communication between researchers of English and American poetic imagery, improve the quality of translation of English and American poetic imagery, and lay a good foundation for the construction of a multilingual corpus of English and American poetic imagery. The translation plug-in and the computer program exist independently of each other, and the plug-in provides translation services for the program without affecting the normal operation of the program.
Unlike sentence-level neural machine translation, chapter-level translation requires translating on a chapter-by-chapter basis and making full use of the chapter context: given a source document consisting of multiple sentences, each sentence is translated conditioned not only on itself but also on its surrounding context.
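One common way to formalize this, given here only as an illustrative sketch (the notation is not taken from the paper), is to factor the document translation probability over sentences, each conditioned on its own source sentence and its chapter context:

```latex
\[
X = (x_1, \dots, x_K), \qquad
P(Y \mid X) = \prod_{k=1}^{K} P\!\left(y_k \mid x_k, C_k\right),
\]
\[
P\!\left(y_k \mid x_k, C_k\right) = \prod_{t=1}^{T_k} P\!\left(y_{k,t} \mid y_{k,<t},\, x_k,\, C_k\right),
\]
```
where $x_k$ and $y_k$ are the $k$-th source and target sentences and $C_k$ denotes the chapter context around $x_k$.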
The research in this paper addresses the automatic translation of English and American poetic imagery with a chapter-level neural machine translation model in a neural network framework, built on an encoder-decoder architecture with an attention mechanism.
One of the major drawbacks of using recurrent neural networks for neural machine translation is that the model must process the entire input sentence word by word, a structure that is both time-consuming and limits parallelism. Attention mechanisms allow the model to use the context around each word and can be highly parallelized.
Multihead self-attention mechanisms are able to capture dependencies between words within the same sentence or between different sentences within the same chapter through matrix operations and assign the required context for the word or sentence based on the dependency weights [19].
Currently, the Transformer model achieves optimal performance in sentence-level machine translation tasks [20]; a brief schematic of the Transformer model is shown in Fig. 3, and its main components are as follows. Word vector embedding layer: this layer converts each input word into a word vector by means of an embedding algorithm; word embedding occurs only in the bottom encoder of the model, and the other encoders in the stack receive the output of the previous encoder. Positional encoding: the Transformer determines the position of each word, and the distance between different words in a sequence, by adding a position encoding vector to each word vector. Self-attention layer: on the decoder side, the masked self-attention layer is allowed to attend only to positions before the word currently being decoded in the output sequence. In this paper, residual connections and layer normalization are still used to process the data passed between sub-layers, so the actual output of a sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $x$ is the sub-layer input and $\mathrm{Sublayer}(\cdot)$ is the function implemented by the sub-layer. Encoder-decoder attention layer: this layer integrates the sentence representation output by the source-language encoder into the decoder to help translate the target sentence. Fully connected feed-forward neural network layer: finally, a linear layer followed by a normalization (softmax) layer converts the floating-point vectors output by the decoder into words. A compact sketch of the sub-layer wrapper is given after Fig. 3.

Transformer model structure
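The following PyTorch sketch shows the sub-layer wrapper just described (residual connection plus layer normalization); the module and argument names are illustrative rather than taken from the paper.

```python
# Sub-layer wrapper: output = LayerNorm(x + Sublayer(x)), as in the original Transformer.
import torch
import torch.nn as nn

class SublayerConnection(nn.Module):
    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # sublayer is any callable sub-layer, e.g. self-attention or the feed-forward network
        return self.norm(x + self.dropout(sublayer(x)))
```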
In this paper, the target-side representation is computed with a decoder that stacks multiple identical decoding layers of the form described above.
Moses is a statistical machine translation system that can automatically train translation models for any language pair, requiring only parallel corpora, and it provides a feature-rich script library. In this paper, the Moses scripts are used to perform tokenization and truecasing on the English, German, and Spanish training sets.
To improve the performance of the neural machine translation model, all the training corpora used in this paper are segmented into subword units, which effectively reduces the size of the vocabulary. The Byte Pair Encoding (BPE) algorithm encodes according to byte pairs; its main purpose is to shorten the translation model's vocabulary and thus reduce the number of model parameters, while also lowering the frequency of out-of-vocabulary words in the translation results. Uncommon strings such as unregistered words are iteratively split into common subword units. The number of merge operations determines the number of iterations in the splitting process, and maximum-length matching is performed at the granularity produced by these operations to re-split words and form the subword vocabulary. A toy illustration of the BPE merge loop is given below.
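The sketch below is a toy illustration of the BPE idea: repeatedly merge the most frequent adjacent symbol pair. Real toolkits (e.g., subword-nmt) add end-of-word markers, frequency thresholds, and an apply step; none of those details are claimed here.

```python
# Toy BPE: learn merge operations by repeatedly joining the most frequent adjacent pair.
from collections import Counter

def learn_bpe(words: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    vocab = {tuple(w): f for w, f in words.items()}      # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)                 # most frequent adjacent pair
        merges.append(best)
        merged = {}
        for symbols, freq in vocab.items():              # re-split every word with the new merge
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

print(learn_bpe({"lower": 5, "newest": 6, "widest": 3}, num_merges=10))
```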
In this paper, all models are built with PyTorch, an open-source Python machine-learning library based on Torch that is widely used in natural language processing. PyTorch supports dynamic neural networks and, compared with static TensorFlow graphs, handles problems such as RNNs with variable-length outputs more effectively.
This paper reports the computation of sentence-level BLEU (denoted as s-BLEU) and chapter-level BLEU (denoted as d-BLEU) via scripts. In addition to BLEU scores, this paper also reports Meteor scores for translation results.
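The sketch below shows one way the two BLEU variants can be computed with sacrebleu: s-BLEU scores the sentence list directly, while d-BLEU first concatenates each document's sentences into one long segment. The toy data and the use of sacrebleu are illustrative assumptions; the paper only states that the scores are computed via scripts.

```python
# s-BLEU vs. d-BLEU sketch with sacrebleu (toy data).
import sacrebleu

hyps = ["the moon rises over the sea", "a lone boat drifts"]
refs = ["the moon rises above the sea", "a lonely boat drifts"]

s_bleu = sacrebleu.corpus_bleu(hyps, [refs]).score                            # sentence-level BLEU
d_bleu = sacrebleu.corpus_bleu([" ".join(hyps)], [[" ".join(refs)]]).score    # chapter-level BLEU
print(f"s-BLEU = {s_bleu:.2f}, d-BLEU = {d_bleu:.2f}")
```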
In addition, the quality of the translation results is assessed with a reference-based metric for the accuracy of pronoun translation (APT).
To test whether the method of this paper meets the research goal of using chapter context to resolve discourse inconsistencies in translation, a dedicated dataset is used to evaluate discourse phenomena in the translation of English and American poetic imagery. The dataset contains four test sets covering reference, lexical consistency, and ellipsis (inflectional changes and verb phrases). Each test set contains a group of contrastive samples, comprising correct (positive) translations of the discourse phenomenon and incorrect (negative) translations. The goal is to determine whether the improved translation model is more inclined to generate the correct translations.
The benchmark system model selected for this chapter is a sentence-level neural machine translation model based on the Transformer. The Transformer is a model based on the encoder-decoder structure, in which the encoder and decoder are each composed of multiple identical network layers. Each encoding layer consists of a multi-head self-attention sublayer and a fully connected feed-forward neural network, and each decoding layer consists of a multi-head self-attention sublayer, a multi-head encoder-decoder attention sublayer, and a fully connected feed-forward neural network. In addition, to address gradient explosion, gradient vanishing, and unstable training, the Transformer adds residual connections and layer normalization to each layer.
The architecture of the machine translation model proposed in this paper, which incorporates chapter context validity recognition, is shown in Figure 4. The basic idea is to encode the chapter-level context representation and incorporate the encoded context information into the decoder. To limit computational cost and attention burden, the multi-encoder structure of the model adopts a shared encoder: based on the original Transformer model, the same encoder is used to encode the source sentence and its contextual sentences. To utilize chapter context information effectively, the system adds a classification module. The classifier further processes the two encoder outputs with the aim of determining, from the encoded information, whether the context is relevant to the current sentence. On the one hand, the classification task improves the sentence representation; on the other hand, combined with the contextual attention mechanism and the gating mechanism, it guides the model to use the more valuable contextual information. The chapter contextual attention layer is located between the masked self-attention sublayer and the encoder-decoder attention sublayer of the decoder to guide the decoder in using contextual information, and a gating mechanism is introduced to further control the utilization of contextual information on the target side. A sketch of this gated fusion is given after Figure 4.

Transformer model incorporating text context validity recognition
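The following PyTorch sketch illustrates the gated residual fusion just described: a multi-head attention layer attends from the decoder state to the encoded context, and a sigmoid gate decides how much of that context is added back. The dimensions and module names are assumptions made for illustration, not the paper's exact implementation.

```python
# Gated fusion of chapter context into the decoder (illustrative sketch).
import torch
import torch.nn as nn

class GatedContextFusion(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, dec_state: torch.Tensor, ctx_enc: torch.Tensor) -> torch.Tensor:
        # dec_state: (batch, tgt_len, d_model); ctx_enc: (batch, ctx_len, d_model)
        ctx, _ = self.ctx_attn(dec_state, ctx_enc, ctx_enc)                # multi-head contextual attention
        g = torch.sigmoid(self.gate(torch.cat([dec_state, ctx], dim=-1)))  # gating mechanism
        return dec_state + g * ctx                                         # gated residual connection
```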
Transformer-based sentence-level models translate without considering context, so the sentence representations produced by their encoders are weaker at expressing relationships between sentences, whereas the encoded output of a chapter-level translation model should be better at determining such relationships. This leads to the hypothesis that the better the chapter translation encoder performs, the better its encoded output should be at judging inter-sentence relations. Accordingly, the translation system in this paper adds a joint classification task: using the encoder output to determine whether the contextual information of the current sentence is actually valid. Improved classification performance not only indicates an enhanced sentence representation in the system but also shows that the model can focus on contextual information more effectively during learning.
The core of classifier design is building the training corpus. To improve the classification effect, the system in this paper sets the positive example to the sentence in the chapter with the highest similarity to the current sentence and the negative example to the sentence with the lowest similarity. The TF-IDF strategy and the KL_div strategy are used to calculate inter-sentence similarity. In the TF-IDF strategy, the TfidfVectorizer function in the Sklearn library is used to transform the text sentences into TF-IDF sentence representations, from which inter-sentence similarity is computed; a sketch follows.
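A sketch of the TF-IDF strategy: represent every sentence of a chapter with TfidfVectorizer and pick, for the current sentence, the most and least similar other sentences as the positive and negative examples. Cosine similarity is assumed here as the comparison measure, which the paper does not specify.

```python
# Select positive / negative classification examples by TF-IDF similarity.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def pos_neg_examples(chapter: list[str], idx: int) -> tuple[str, str]:
    tfidf = TfidfVectorizer().fit_transform(chapter)     # TF-IDF sentence representations
    sims = cosine_similarity(tfidf[idx], tfidf).ravel()
    sims[idx] = -1.0                                      # exclude the current sentence from the maximum
    positive = chapter[int(np.argmax(sims))]              # highest similarity -> positive example
    sims[idx] = 2.0                                       # exclude it from the minimum as well
    negative = chapter[int(np.argmin(sims))]              # lowest similarity  -> negative example
    return positive, negative
```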
In the KL_div strategy, a pre-trained model based on bidirectional encoder representations from Transformers (BERT) is used to convert sentences into feature vectors, and the KL distance between the sentence representations is then calculated. For two distributions $P$ and $Q$, the KL distance is $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i)\log\frac{P(i)}{Q(i)}$.
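The sketch below shows one way such a KL distance can be computed: encode two sentences with a pre-trained BERT, turn the pooled vectors into probability distributions with softmax, and compute the KL divergence between them. The pooling and normalization choices are assumptions; the paper only states that BERT features and a KL distance are used.

```python
# KL distance between BERT-based sentence representations (illustrative sketch).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def kl_distance(sent_a: str, sent_b: str) -> float:
    with torch.no_grad():
        va = bert(**tok(sent_a, return_tensors="pt")).last_hidden_state.mean(dim=1)
        vb = bert(**tok(sent_b, return_tensors="pt")).last_hidden_state.mean(dim=1)
    p, q = F.softmax(va, dim=-1), F.softmax(vb, dim=-1)   # turn pooled vectors into distributions
    return F.kl_div(q.log(), p, reduction="sum").item()   # KL(p || q)
```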
The system constructs three computations for the classifier to capture the relational information between the representation of the current sentence and that of its candidate context. Finally, a fully connected layer and a Softmax layer are used to determine whether the context is valid for the current sentence; one plausible realization is sketched below.
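One plausible reading of the "three computations" (an assumption, following common practice in sentence-pair classification) is the element-wise difference, the element-wise product, and the concatenation of the two encoder outputs, fed into a fully connected layer and a Softmax:

```python
# Context-validity classifier sketch over sentence and context representations.
import torch
import torch.nn as nn

class ContextValidityClassifier(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.fc = nn.Linear(4 * d_model, 2)               # two classes: valid / invalid context

    def forward(self, sent_repr: torch.Tensor, ctx_repr: torch.Tensor) -> torch.Tensor:
        feats = torch.cat(
            [sent_repr, ctx_repr, sent_repr - ctx_repr, sent_repr * ctx_repr], dim=-1
        )
        return torch.softmax(self.fc(feats), dim=-1)      # probability that the context is valid
```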
In the decoder shown in Fig. 4, an additional multi-head contextual attention sublayer is added to fuse chapter context information; the output of the decoder's masked self-attention layer serves as its query. The residual network with a gating mechanism then fuses the acquired chapter context into the target-side representation, with the gate controlling how much of the contextual information is passed on to the following sublayers.
The loss function based on joint learning in this paper is divided into three parts: the prediction of the NMT target side, the identification of effective chapter context, and the importance of the chapter context in the residual network with a gating mechanism. The loss associated with predicting the NMT target side is the translation loss over the target sentence; the loss for effective chapter context recognition is the classification loss of the context-validity classifier; and the loss for the importance of the chapter context reflects how strongly the gating mechanism admits contextual information. Summing the three losses yields the training objective for joint learning; a sketch of one possible form is given below.
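The decomposition below is only an illustrative sketch of such a joint objective; the exact form and weighting used in the paper are not recoverable from the text, so the weighting factor $\gamma$ and the auxiliary terms are written generically.

```latex
\[
\mathcal{L}_{\mathrm{nmt}} = -\sum_{t=1}^{T} \log P\!\left(y_t \mid y_{<t},\, x,\, C\right),
\qquad
\mathcal{L} = \mathcal{L}_{\mathrm{nmt}} + \gamma\left(\mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{gate}}\right),
\]
```
where $\mathcal{L}_{\mathrm{cls}}$ is the loss of the context-validity classifier and $\mathcal{L}_{\mathrm{gate}}$ is the term associated with the gate values of the residual network.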
In this paper, a two-step training approach is adopted, which follows the idea of pre-training and additionally uses sentence-level parallel corpus data. In the first step, the sentence-level parameters are pre-trained on the fused corpus. In the second step, the chapter-level parameters are trained on the chapter-level parallel corpus, as sketched below.
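The sketch below illustrates the two-step idea: after the first step, the sentence-level parameters are frozen and only the newly added chapter-level modules (contextual attention, gate, classifier) are updated. The module-name keywords are hypothetical and only serve the illustration.

```python
# Second training step: freeze sentence-level parameters, update chapter-level modules only.
import torch

def set_chapter_level_trainable(model: torch.nn.Module,
                                keywords=("ctx_attn", "gate", "classifier")) -> None:
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in keywords)

# optimizer for the second step, over the chapter-level parameters only:
# optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))
```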
In order to study the imagery characteristics of Anglo-American modernist poetry and to support subsequent translation research, this paper compiles a corpus of Anglo-American modernist poetry totaling 42104 words. With the help of the corpus software AntConc, this study counts the high-frequency words of the English and American modernist poetry texts; the top 30 words by frequency are shown in Table 1.
High-frequency words of British and American modern poetry text (Top 30)
Rank | Word | Frequency | Rank | Word | Frequency |
---|---|---|---|---|---|
1 | the | 2477 | 16 | with | 289 |
2 | of | 1371 | 17 | are | 287 |
3 | and | 1355 | 18 | Not | 287 |
4 | a | 1084 | 19 | but | 277 |
5 | to | 1077 | 20 | on | 258 |
6 | in | 720 | 21 | one | 251 |
7 | I | 684 | 22 | They | 244 |
8 | it | 613 | 23 | at | 230 |
9 | is | 608 | 24 | you | 221 |
10 | that | 587 | 25 | we | 220 |
11 | was | 410 | 26 | his | 219 |
12 | for | 321 | 27 | have | 216 |
13 | as | 310 | 28 | all | 195 |
14 | he | 307 | 29 | an | 188 |
15 | be | 289 | 30 | or | 187 |
As can be seen from Table 1, the top five words by frequency in English and American modern poetry texts are the, of, and, a, and to, which is basically consistent with word-frequency statistics of general English corpora for original English texts. Among the top 30 high-frequency words, personal pronouns and possessive pronouns are numerous and used frequently; for example, I occurs 684 times, ranking 7th, and it occurs 613 times, ranking 8th. In addition, among the top 100 high-frequency words there are several content words worth noting besides function words. High-frequency nouns include time, day, and people, with time occurring 73 times and ranking 63rd; high-frequency adverbs include very and never. Little also has a high frequency, used 79 times and ranking 59th, and great appears 67 times, ranking 62nd.
In order to study the themes of the corpus of Anglo-American modernist poetry, this paper compiles a reference corpus consisting of the English text of a classic British poetry collection, totaling 41623 words. The top 30 theme words of the English and American modern poetry texts, computed against this reference corpus, are shown in Table 2.
The theme words of the British and American poetry text (Top 30)
Rank | Word | Frequency | Topicality | Rank | Word | Frequency | Topicality |
---|---|---|---|---|---|---|---|
1 | the | 2476.6 | 277.321 | 16 | school | 21 | 40.283 |
2 | is | 607.6 | 104.698 | 17 | or | 187.6 | 39.290 |
3 | we | 219.8 | 104.045 | 18 | grade | 18.9 | 37.907 |
4 | English | 51.1 | 98.177 | 19 | white | 25.9 | 32.884 |
5 | a | 1084.3 | 88.043 | 20 | out | 102.2 | 32.490 |
6 | one | 250.6 | 84.459 | 21 | through | 40.6 | 32.471 |
7 | are | 287 | 82.874 | 22 | social | 16.1 | 31.975 |
8 | water | 33.6 | 63.303 | 23 | don’t | 20 | 31.146 |
9 | our | 81.2 | 49.092 | 24 | its | 77 | 29.792 |
10 | students | 24.5 | 48.525 | 25 | snow | 14.7 | 29.457 |
11 | old | 38.5 | 45.868 | 26 | sun | 14.7 | 29.122 |
12 | like | 93.8 | 45.738 | 27 | black | 18.2 | 27.998 |
13 | sea | 23.8 | 44.718 | 28 | years | 37.8 | 26.154 |
14 | lake | 21.7 | 43.124 | 29 | down | 63 | 25.277 |
15 | up | 96.6 | 42.824 | 30 | Englishman | 14 | 24.883 |
By observing the top 30 words in Table 2, it can be found that when the British classic poetry collection is used as the reference corpus, the high-frequency theme words of the English and American modern poetry texts (the observation corpus) are the, is, we, English, a, and so on. Among the theme words, nouns indicating natural scenery, such as water, sea, lake, snow, and sun, and words related to country, society, and culture, such as English, Englishman, students, school, and social, reflect the breadth of themes in English and American modern poetry.
In addition, an examination of the top 100 theme words shows that color words appear frequently; those within the top 100 include white, black, green, and blue. Nouns indicating natural scenery also include trees, sky, earth, fog, wind, and sunlight, and topic-related words include political, world, war, work, national, modern, American, moral, language, politics, and so on. The statistical analysis of the theme words helps to objectively summarize the stylistic features of modern English and American poetry.
To validate the effectiveness of the machine translation model incorporating chapter context validity recognition in the translation of English and American poetic imagery, the English-German datasets TED, News, and Europarl are used in this experiment. The data processing is consistent with previous work in the field to allow fair comparison: Moses is used for tokenization, and OpenNMT is used to perform BPE segmentation. In addition, to verify the effectiveness of the proposed method on different language pairs, this chapter additionally tests it on the English-French dataset IWSLT.
Currently, BLEU is the metric mainly used in neural machine translation to evaluate translation quality; it was proposed by IBM for the evaluation of machine translation tasks. Its basic idea is precision: given a standard reference translation (reference) and a sentence generated by the neural network (candidate) of length n, if m words of the candidate appear in the reference, then m/n is the 1-gram precision of BLEU. BLEU has many variants, which differ according to the n-gram used; the common indicators are BLEU-1, BLEU-2, BLEU-3, and BLEU-4, where n refers to the number of consecutive words. BLEU-1 measures word-level accuracy, while higher-order BLEU measures sentence fluency. In accordance with the standards commonly used in the field, the evaluation in this work uses BLEU-4, which can be expressed as $\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{4} w_n \log p_n\right)$, where $p_n$ is the modified n-gram precision, $w_n = 1/4$ are the weights, and $\mathrm{BP} = \min\left(1, e^{1 - r/c}\right)$ is the brevity penalty computed from the reference length $r$ and the candidate length $c$.
In this paper, the experimental results of the proposed model are compared with typical chapter-level translation models and a sentence-level baseline. The sentence-level model uses the Transformer-base structure, and the chapter-level models include IMPR, which uses an additional encoder; HAN, which uses a hierarchical attention mechanism; SAN, which uses a selective attention mechanism; the query-based QCN; and the Flat-Transformer, which uses a single encoder. The results of the model performance comparison are shown in Table 3, where Snmt and CAEnc are the sentence-level and chapter-level baselines implemented in this experiment, and the evaluation metric is BLEU.
Model comparison experiment results
Model | TED (En-De) | News (En-De) | Europarl (En-De) | IWSLT (En-Fr) |
---|---|---|---|---|
Snmt | 24.14 | 27.03 | 30.81 | 40.78 |
IMPR | 25.09 | 23.25 | 30.23 | -- |
HAN | 25.58 | 26.35 | 30.78 | -- |
SAN | 25.61 | 25.62 | 30.27 | -- |
QCN | 26.47 | 23.42 | 30.39 | -- |
Flat | 25.73 | 24.95 | 31.55 | -- |
CAEnc | 25.21 | 27.47 | 31.21 | 41.01 |
This method | 26.13 | 28.04 | 32.15 | 41.92 |
As can be seen from Table 3, the method in this paper achieves the expected performance, with a maximum improvement of +1.99 BLEU over the sentence-level baseline and a maximum improvement of +0.94 BLEU over the chapter-level baseline, and it obtains the best performance among the typical chapter-level translation models compared.
For generation tasks such as machine translation, when fine-tuning from a pre-trained model, the representation capability of the pre-trained model is more important than its generation capability. Meanwhile, the masked language model objective has been validated by pre-trained models such as BERT and CeMAT. To let the model acquire representation capability more effectively, the masked language model is applied as an additional training task, which allows the encoder to obtain representation capability faster and better; a sketch of the masking step follows.
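The sketch below illustrates the masking step of such an auxiliary task: a ratio θ of source tokens is randomly masked before encoding, and the encoder is additionally supervised to recover them. The token-id conventions (mask_id, the -100 ignore index) are assumptions for illustration.

```python
# Mask a ratio theta of source tokens and build the labels for the auxiliary MLM task.
import torch

def mask_tokens(src: torch.Tensor, mask_id: int, theta: float = 0.2):
    mask = torch.rand(src.shape) < theta                  # choose roughly theta of the positions
    masked_src = src.masked_fill(mask, mask_id)           # replace them with the mask token
    labels = src.masked_fill(~mask, -100)                 # -100 is ignored by cross-entropy
    return masked_src, labels
```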
To explore the impact of two key parameters of this method on model performance, namely the mask ratio θ and the loss function weighting factor γ, supplementary comparison experiments were conducted, all on the English-German dataset of English and American poetry imagery translations. The results are shown in Fig. 5, where (a) and (b) denote the comparison experiments for the parameters θ and γ, respectively, and (c) denotes the comparison of loss function values before and after the use of the encoder-side supervised signal.

Supplementary comparison experiment results
As can be seen in Fig. 5, the model performance is ultimately optimal when the mask ratio is set to 20%, and the loss function weighting factor is set to 6. In the experiments, it is observed that after incorporating the encoder-side supervised signal, the model can converge quickly in the early stages of training, and the additional training task allows the encoder to gain representation capability quickly.
Also, to test whether it is the contextual information that helps recover the masked portion in this process, supplementary experiments are conducted in which this method is applied to a sentence-level model; the results are shown in Table 4.
Experimental results on the sentence level model
Model | BLEU | Δ |
---|---|---|
Snmt | 24.14 | |
Snmt + this method | 20.27 | -3.87 |
CAEnc + this method | 26.13 | 1.99 |
As can be seen from Table 4, there is a significant decrease in performance (ΔBLEU of -3.87) when the method is used on the sentence model, which proves that contextual information plays a key role in recovering the mask portion, and at the same time proves the effectiveness of the method on chapter-level translation.
In this paper, in order to study the subjective emotions embedded behind the imagery of English and American poems and to help both humans and machines translate this imagery accurately, the deeper meanings are annotated on top of the labeled literal meanings with the help of reference books, and an attempt is made to analyze, from a cognitive point of view, how the literal meanings and the deeper meanings are connected.
In this paper, the deep meanings of English and American poetic imagery are manually annotated for statistical reference. Examples of the top ten images and their deep meanings, ranked by the number of annotated deep meanings, are shown in Table 5; the higher the ranking, the more often the image is used to convey a deep meaning.
Selected images and their deep meanings
Imagery | Manual interpretation | Frequency | Manual interpretation | Frequency |
---|---|---|---|---|
Moon | Think of | 16 | Think of one’s home | 4 |
Rain | Tears | 2 | Strong | 2 |
Floating clouds | The weather is unpredictable | 2 | Wanderer | 2 |
Tears | Sadness | 4 | Grief | 2 |
Green Mountain | Live in seclusion | 6 | Hometown | 2 |
Wind | Miserable | 2 | Fast | 2 |
Ape | Sorrowful | 5 | Sad | 2 |
Smoke | Decline | 2 | Dense | 2 |
Spring wind | Good time | 3 | Love | 2 |
Sunset | Farewell | 2 | Decline | 2 |
From these high-frequency images and their deep meanings, it can be seen that the emotional tone of each image's deep meanings is mostly stable and related to the image's basic characteristics. For example, the moon, shared by viewers everywhere, often triggers the poet's emotions and is therefore associated with homesickness and nostalgia. Tears are secreted when a person is in pain, so tears are mainly associated with pain. The sunset is a symbol of dusk, which gives people a feeling of loneliness and depression, so the deeper meaning of the sunset also reflects loneliness and depression; because of this connotation, sunsets carrying a more positive emotion appear less frequently, which to a certain extent reflects the emotional tendency of English and American poetry.
On the whole, the deeper meanings of the imagery mostly reflect negative and painful emotions, and many of them are extremely rich in deeper meanings, which reflects the polysemy of poetry.
In turn, based on the deep-meaning information, this paper seeks the things that carry these sentiments. Using manual interpretation for labeling, similar sentiments are merged and their corresponding imagery types counted; the top 15 high-frequency types are shown in Table 6.
Top 15 high-frequency deep meanings and their corresponding imagery
Deep label | Frequency | Image types | Image example | Frequency | Image example | Frequency | Image example | Frequency |
---|---|---|---|---|---|---|---|---|
Human | 102 | 91 | Pretty eyebrows | 4 | Fair complexion | 4 | Flower | 4 |
Desolate | 62 | 53 | Yellow cloud | 4 | Withered grass | 4 | Fallen leaves | 4 |
Think of | 81 | 48 | Moon | 15 | Bright moon | 11 | Wild goose | 9 |
Lonely | 42 | 36 | Single light | 5 | Lonesome boat | 5 | Shadow | 3 |
Place | 35 | 32 | Green Mountain | 4 | Checkpoint | 4 | Nine levels | 2 |
Miserable | 31 | 31 | Snot | 2 | Flute | 2 | Frost | 2 |
Sorrowful | 43 | 31 | Tears | 6 | Broken intestine | 5 | Heartbroken | 5 |
Broad | 28 | 25 | Blue sky | 3 | Hirano | 3 | Greet famine | 3 |
Cold | 23 | 23 | Chick | 2 | Ice | 2 | Cold day | 2 |
Fact | 27 | 23 | Dust | 5 | Soot | 3 | Weapons of war | 3 |
Sad | 26 | 22 | Pipa | 6 | Moon | 6 | Ape | 4 |
Part | 23 | 20 | Rain | 4 | Cow hair | 2 | Willow | 2 |
Time | 24 | 20 | Spring wind | 4 | Shakedown | 4 | Clock | 2 |
Quiet | 19 | 18 | Wave | 2 | Chirping | 2 | Moon | 2 |
Beautiful | 16 | 16 | Prosperous flower | 2 | Flower face | 2 | Mountain light | 2 |
It can be seen from Table 6 that the same deep meaning can be expressed by different things. The ratio of image types to frequency shows that poets choose a variety of images to express the same deep meaning; the exceptions are the image "Moon" expressing longing and the image "Tears" expressing sadness, which account for a high proportion, indicating that poets converge in their choice of images when expressing sadness and longing.
It can also be seen that English and American poems make extensive use of rhetorical devices such as metonymy and simile: a poet can refer to a person through many things, for instance through a part of the person, such as the eyebrows or the face, through things that share certain characteristics with the person, such as using a flower to stand for a person, or in other ways, such as through location or collateral reference. The imagery expressing various emotional atmospheres is usually related to the attributes and characteristics of the things themselves. For example, "Yellow cloud" is often found in the desert, where people are scarce and the land is barren, which often conveys a sense of decay.
This paper takes metonymy, one deep-meaning type of imagery, as an example to explore the distribution of deep-meaning types in English and American poetry. The reasons for using metonymy are roughly as follows: first, the target domain is abstract or difficult to describe, and metonymy allows readers to grasp the meaning of the poem intuitively; second, certain characteristics of the target domain need to be highlighted, and metonymy can highlight them without hindering the coherence of meaning; third, it brings the poem closer to the reader's life and makes it more lifelike. Statistics on the more concentrated types of metonymy and on the source and target domains of different metonymies are shown in Table 7.
Types of metonymy with their source and target domains
Metonymy type | Image types | Frequency | Source domain-Target domain | Frequency | Example (source domain) | Example (target domain) |
---|---|---|---|---|---|---|
Part-whole | 25 | 46 | Location-Person | 10 | Pretty eyebrows | Belle |
Part-Ship | 11 | Ship | Set sail | |||
Component-Instrument | 5 | String | Musical instrument | |||
Category metaphor | 38 | 42 | Scene-View | 7 | Spring scenery | Falling flower |
Place-Refers to a place in general | 5 | Place | Paradise | |||
Person-Person | 5 | Nobleman | Distant friend | |||
Characteristic metaphor | 26 | 29 | Color-Object | 8 | Painting | Jadeite |
Characteristic-Group | 5 | Guards of honor | Army | |||
Characteristics-People | 3 | Fragrant | Belle | |||
Causal metaphor | 2 | 28 | Component-Reason | 16 | White hair | Aged |
Production metaphor | 5 | 11 | Materials-Tools | 7 | Canvas | Weapons |
Birds-Sound | 2 | Crying bird | ||||
Possessive metaphor | 5 | 21 | Clothing-People | 9 | Dress | Government officials |
Tools-People | 8 | Chief minister’s seal | Bureaucrat | |||
Transportation-People | 6 | Enemy troops | Nobility | |||
Container metaphor | 4 | 9 | Utensils-Beverages | 7 | Cup | Liquor |
Tools-Things | 2 | New cooking | Food | |||
Place metaphor | 27 | 40 | Landscape-Nonspecific location | 14 | Hometown | Fields and gardens |
Landscape-Specific location | 9 | Capital | Area | |||
Location-People | 12 | High buildings | Concubines | |||
Time metaphor | 9 | 11 | Tools-Time | 5 | Clock | Dusk |
Poultry-Time | 5 | Crow | Spring |
As can be seen from Table 7, part-whole metonymies predominate, and the targets of metonymy are especially often persons of various kinds. Part-whole metonymies, category metonymies, and characteristic metonymies are all characterized by "replacing the big with the small and the small with the big", i.e., using a typical part, a typical feature, or a typical member to realize the metonymy.
In this paper, based on the original Transformer model, a machine translation model incorporating chapter context validity recognition is constructed, and the design of an English and American poetry imagery translation strategy is realized on the basis of the corpus. Through the experimental analysis of the model, the following conclusions are drawn. (1) The top five words by frequency in English and American modern poetry texts are the, of, and, a, and to, which is basically consistent with word-frequency statistics for original English texts; among the top 30 high-frequency words, personal pronouns and possessive pronouns are used frequently. (2) When the British classic poetry collection is used as the reference corpus, the high-frequency theme words include the, is, we, English, a, etc.; among the theme words, nouns indicating natural scenery, such as water, sea, lake, snow, and sun, and words related to country, society, and culture, such as English, Englishman, students, school, and social, reflect the breadth of themes in British and American modern poetry. (3) Compared with the sentence-level baseline model, the model in this paper achieves a maximum improvement of +1.99 BLEU; compared with the chapter-level baseline model, it achieves a maximum improvement of +0.94 BLEU and outperforms all other typical chapter-level translation models compared. Model performance is optimal when the mask ratio is set to 20% and the loss function weighting factor to 6; after incorporating the encoder-side supervised signal, the model converges quickly in the early stage of training, and the additional training task allows the encoder to gain representation capability quickly. (4) The emotional tone of the deep meanings of high-frequency imagery in modern British and American poetry is mostly stable and related to the basic characteristics of the imagery; on the whole, the deep meanings mostly reflect negative and painful emotions, and many images are extremely rich in deep meanings, reflecting the polysemy of poetry. Different things can express the same deep meaning, and the choice of imagery shows a concentrated convergence. In addition, English and American poetry makes extensive use of rhetorical devices such as metonymy and simile; among the imagery metonymies, whole-part metonymies predominate, and their targets are most often persons of various kinds.