Journal details
License: Open Access
Format: Journal
eISSN: 2444-8656
First published: 01 Jan 2016
Publication timeframe: 2 times per year
Languages: English

Analysing the variation of metadiscourse verb patterns in English academic papers from within and between disciplines

Published online: 30 Nov 2022
Volume & Issue: AHEAD OF PRINT
Received: 19 May 2022
Accepted: 16 Aug 2022
Introduction

Linguist Harris first proposed the concept of metadiscourse in 1959 [1]. Since then, after being critically inherited and developed by J. Williams and A. Crismore, the concept has gradually become a research hotspot in the field of English written discourse [2]. As ‘discourse about discourse’, metadiscourse can not only organise discourse and propositional content, construct the framework of an article, project the self and reveal the author's communicative intention but also communicate with readers and realise interactive functions [3]. An academic paper is an article that presents the author's latest research findings after discussing an academic issue, and it plays an important role in promoting academic exchange and progress [4]. Therefore, in wording and sentence construction, the author should summarise the findings objectively, accurately and effectively so that the text is readily accepted by, and thus convinces, the reader. In view of this, authors need to rely on metadiscourse to organise language and to maintain and strengthen their relationship with readers.

On the basis of previous theoretical research and metadiscourse classifications, researchers began to examine academic writing, especially through empirical research on research papers. Foreign scholars have explored the application of metadiscourse in academic texts from different perspectives. For example, one work [5] studied metadiscourse from the perspective of contrastive rhetoric. A previous study [6] explored how metadiscourse facilitates social interaction and knowledge exchange across disciplines and attempted to demonstrate its importance by analysing the frequency and role of metadiscourse in 28 research papers from four disciplines; it found that metadiscourse is a common feature of English research writing and that there are disciplinary differences in its use. Another work [7] used Hyland's interpersonal model of metadiscourse as the theoretical framework and comparatively studied, from both quantitative and qualitative perspectives, the application of metadiscourse in English and Chinese academic texts and whether that application differs between the two languages. A further work [8] used a corpus-based metadiscourse taxonomy. Comparative studies of the cross-language variation of metadiscourse in the genre of Chinese and English academic book reviews have thus been carried out [5, 6, 7, 8].

However, in the aforementioned approaches based on individual words, much sentence-level information is lost. Therefore, this paper attempts to combine various traditional and recent sentence semantic representation methods, sentence distance calculation methods and clustering methods. Through a large number of comparative experiments, this paper finally proposes a recognition method that combines the word movement distance (WMD) model with the R&L clustering algorithm to cluster and recognise verb pattern variants. The best recognition performance is obtained on sentences containing morphological variants of verbs.

Metadiscourse and academic papers

The classification of metadiscourse mainly focuses on vocabulary. Different texts are analysed from different perspectives [9], the classification mode has become more specific and the classification of modern metadiscourse has developed more rationally. According to the theoretical basis of functional linguistics, metadiscourse is divided into two categories: textual metadiscourse and interpersonal metadiscourse. Textual metadiscourse is further divided into discourse connectives, code glosses and power markers according to different levels of content. The functions of interpersonal metadiscourse are mainly reflected in rationality, modality markers, attitude markers and so on. Irrespective of the classification, in practical application metadiscourse emphasises interaction and focuses on attracting and guiding readers [10]. Therefore, studying the application of metadiscourse in English papers is an important way to correctly understand the author's thoughts.

At the same time, by comparing the metadiscourse chunks used across disciplines, it is found that chunks in the papers of the two disciplines differ significantly in frequency and high-frequency chunks, formal structure and pragmatic function [11]. The specific findings are as follows: (1) the total amount of metadiscourse in engineering English academic papers is significantly larger than that in medical papers, and the high-frequency chunks also differ; (2) the formal structure of metadiscourse is distributed differently in medical and engineering papers: the proportion of intermediary phrases in engineering papers is higher than that of noun phrases, while the proportion of noun phrases is higher in medical papers; (3) the pragmatic function distributions of metadiscourse in medical and engineering papers differ: hedges are used more frequently in medical papers than in engineering papers, while markers of ability and possibility and of coercion and command are used more frequently in engineering papers. In addition, engineering dissertation authors use metadiscourse, textual reference and topic development chunks more frequently than authors of medical papers. These differences are closely related to the communicative purpose, writing tradition and information expression of the two disciplines' papers [12].

Metadiscourse is an important textual mechanism of persuasive and informative discourse [13], and academic discourse is an example of such discourse, so the application of metadiscourse there is particularly important. When the concept of metadiscourse was first proposed, it mainly served to explain the concept of language itself: the author uses metadiscourse to give readers directions for analysing the entire article [14]. Over time, many scholars have further studied the concept and expanded its scope [15]. These scholars believe that metadiscourse is a form of language in which the author influences the mind of the reader through intervention in the text. Compared with other mature language phenomena, metadiscourse is not yet mature as a theoretical model, and the nature of its practical application remains vague. In the past, when scholars studied information transfer [16] and emotion transfer in linguistics, they usually made a clear distinction between the two, some focusing on the understanding of propositional meaning and the expression of the writer's own thought, others on communication; however, this way of thinking ignores the discourse-creating function. The emergence of metadiscourse theory is thus a strong response to this view of the language proposition. During reading, readers directly receive the information conveyed by the author in the work. The process of information transmission includes not only the content transmitted but also the readers' sense of belonging to the information [17] and the fit between the information and their own experience. The reader's expectations of the information and the reader's level of expertise shape the entire process. These factors fully reflect participants' interpretation of the information [18] and the various responses and interactions between authors, readers and the information.
Compared with other genres, academic papers reflect the author's individual stance more strongly; therefore, the use of metadiscourse in them is deeper than in other styles. Metadiscourse theory shares a common core with English academic papers, namely the exchange of information and ideas between authors and readers. Through this exchange, readers can choose their own positions and form alliances with those who hold corresponding positions. Metadiscourse itself carries information and emotion, and it can not only involve people, places and activities in the real world but also promote the formation of social relations. Therefore, many experts and scholars regard metadiscourse as an important feature of written language theory, and it is widely used in the evaluation of various books, magazines and English works.

Verb pattern recognition algorithm
Pattern extraction algorithm

Patterns are the result of the predisposition and selectivity of words towards syntactic forms [19]. Traditional grammars describe speech structures in forms such as subject–predicate, verb–object or verb–complement. With the development of corpus linguistics, people have gained a deeper understanding of collocation. Hunston (2000), building on the corpus research of Sinclair, Francis and other linguists and on the compilation work of the Cobuild dictionary database, expounded a system for describing lexical and syntactic features [20] and proposed the theory of pattern grammar. The theory defines patterns around concentrated parts of speech such as verbs, content words, adverbs and nouns. At the same time, a normalised pattern extraction principle and pattern representation method are proposed. Through the definition of patterns, three quantitative characteristics of verb patterns are identified, and the pattern extraction algorithm used in this paper analyses these three characteristics.

First: pattern typicality. A pattern is a common form of a verb. To quantify the typicality of a pattern, the proportion of that pattern among all patterns of equal length is used. In practical calculation, it is expressed as the ratio of the probability of a candidate pattern to the sum of the probabilities of all equal-length patterns in a data set, denoted Gravity(C_x):

Gravity(C_x) = \frac{P(C_x)}{\sum_{i=1}^{n} P(C_i)}  (1)

In Eq. (1), C_x is a candidate pattern, P(C_x) is the probability of C_x in a data set and \sum_{i=1}^{n} P(C_i) is the sum of the probabilities of all candidate patterns of equal length in the data set. (Note that this sum is usually not equal to 1.)

Second: stickiness. Stickiness refers to the strong mutual selectivity and affinity between elements within a pattern. As the main feature of word collocation, stickiness is widely used in research on collocation extraction and is used to measure word collocations [21]. Commonly used collocation measures in existing research include the t-test, Z-test, log-likelihood test, chi-square test and mutual information. Because pattern features resemble collocation features, a similar method is used to measure the stickiness of pattern elements. In the pattern extraction calculation, mutual information is used as an optional measure. The mutual information equation is:

MI(X) = \log \frac{P(X)}{P(x_1) P(x_2) \cdots P(x_n)}  (2)

In Eq. (2), X = (x_1, x_2, \ldots, x_n) is a candidate pattern and x_i is one of its elements.

Third: differences in pattern use between verbs. A verb's pattern is compared with those of other verbs: do other verbs have the same pattern, and does the frequency of use differ? For transitive verbs, the pattern V n alone shows no difference; if a pattern is shared by verbs without any difference in use, its value for characterising an individual verb is diminished. It therefore makes sense to compare pattern usage across verbs. To model these differences, the method of testing whether the same pattern shows significant usage differences across different verbs [22] is mainly used; in addition, the proportion of other verbs showing significant usage differences for the same pattern is used to measure the distinctiveness of a verb. The chi-square test is used to judge whether the pattern difference between two verbs is significant:

I(C_x) = \chi^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}  (3)

In Eq. (3), O_{i,j} is the observed value, E_{i,j} is the expected value and C_x is a candidate pattern.
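As an illustration, the chi-square statistic of Eq. (3) can be computed over a 2×2 contingency table comparing one pattern across two verbs; the counts below are hypothetical, not from the paper's corpus:

```python
# Rows are the two verbs, columns are "sentences with the pattern" vs
# "sentences without it". Expected values follow the independence assumption.
def chi_square_2x2(table):
    """table: [[a, b], [c, d]] observed counts; returns the chi-square value."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total   # E_ij under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# e.g. verb A uses the pattern in 30 of 100 sentences, verb B in 10 of 100
stat = chi_square_2x2([[30, 70], [10, 90]])
```

A large statistic relative to the chi-square critical value indicates that the two verbs use the pattern with significantly different frequencies.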

The typicality of a verb pattern can be measured using the harmonic mean of Eqs (1) and (2):

C_x = \frac{2 \cdot Gravity(C_x) \cdot MI(C_x)}{Gravity(C_x) + MI(C_x)}  (4)

where C_x is a candidate pattern.
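A minimal sketch of the three measures of Eqs (1), (2) and (4) follows; the pattern probabilities are hypothetical values standing in for corpus estimates:

```python
import math

def gravity(p_candidate, p_equal_length):
    """Eq. (1): share of a candidate pattern among all equal-length patterns."""
    return p_candidate / sum(p_equal_length)

def mutual_information(p_pattern, p_elements):
    """Eq. (2): log of the pattern probability over the product of its
    element probabilities."""
    prod = 1.0
    for p in p_elements:
        prod *= p
    return math.log(p_pattern / prod)

def typicality(g, mi):
    """Eq. (4): harmonic mean of Gravity and MI."""
    return 2 * g * mi / (g + mi)

# hypothetical candidate pattern with P = 0.02 among equal-length patterns
g = gravity(0.02, [0.02, 0.01, 0.01])        # 0.02 / 0.04
mi = mutual_information(0.02, [0.1, 0.05])   # log(0.02 / 0.005)
score = typicality(g, mi)
```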

Clustering algorithm

Clustering divides a data set into several clusters according to a specific criterion [23] (such as a distance criterion) so that data objects in the same class are as similar as possible while data objects in different classes are as different as possible. Many clustering algorithms exist. Those commonly used in text clustering tasks include the k-means algorithm and hierarchical clustering. This paper mainly introduces the R&L density peak clustering algorithm and selects the expectation-maximisation (EM) algorithm as a comparison. The two algorithms are briefly described below.

R&L density peak clustering algorithm

The clustering algorithm introduced in this article is the density peak clustering algorithm published in Science by Alex Rodriguez and Alessandro Laio in 2014. In this paper, it is referred to as the R&L density peak clustering algorithm. The central idea of the R&L algorithm is its characterisation of cluster centres: a data point that simultaneously satisfies the following two requirements can become a cluster centre [24]:

Its own density is very large, that is, the density of surrounding data points cannot be greater than it.

It is far away from other cluster centres, that is, it is far away from other data points that are denser than itself.

The method for selecting cluster centres satisfying these requirements is introduced in detail below.

Let S = \{x_i\}_{i=1}^{N} be the data set to be clustered, with index set I_S = \{1, 2, \ldots, N\}, and let d_{ij} = dist(x_i, x_j) denote the distance between data points x_i and x_j. For any data point x_i in S, the parameters \rho_i and \delta_i can be defined to describe cluster centres.

Local density \rho_i: two calculation methods exist, the truncation kernel and the Gaussian kernel. With the truncation kernel, \rho_i is defined directly as the number of data points in S whose distance to x_i is less than the cut-off distance d_c. The parameter d_c > 0 is the cut-off distance, which must be specified in advance. The Gaussian kernel used in this paper is shown in Eq. (5):

\rho_i = \sum_{j \in I_S \setminus \{i\}} e^{-\left(\frac{d_{ij}}{d_c}\right)^2}  (5)

Distance \delta_i: when x_i has the largest local density, \delta_i is the distance from x_i to the data point in S farthest from it. Otherwise, \delta_i is the distance from x_i to the nearest data point among all those whose local density is greater than that of x_i. The calculation of \delta_i is shown in Eq. (6):

\delta_i = \begin{cases} \min_{j \in I_S^i} \{d_{ij}\}, & I_S^i \neq \emptyset \\ \max_{j \in I_S} \{d_{ij}\}, & I_S^i = \emptyset \end{cases}  (6)

In Eq. (6), I_S^i = \{k \in I_S : \rho_k > \rho_i\} is the index set of points denser than x_i.

To sum up, the pair (\rho_i, \delta_i) can be calculated for each data point x_i in S. The pairs \{(\rho_i, \delta_i)\}_{i=1}^{N} are plotted on a plane [25], with \rho on the horizontal axis and \delta on the vertical axis, and samples with both high \rho and high \delta values are then selected directly from the data set as cluster centres.
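The computation of (\rho_i, \delta_i) per Eqs (5) and (6) can be sketched as follows; the one-dimensional toy points and the cut-off distance are assumptions for illustration (real inputs would be sentence distance matrices):

```python
import math

def density_peaks(points, dc):
    """Return (rho, delta) for 1-D points using the Gaussian kernel."""
    n = len(points)
    d = [[abs(points[i] - points[j]) for j in range(n)] for i in range(n)]
    # Eq. (5): Gaussian-kernel local density
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [j for j in range(n) if rho[j] > rho[i]]
        if denser:                      # Eq. (6), I_S^i not empty
            delta.append(min(d[i][j] for j in denser))
        else:                           # densest point: farthest distance
            delta.append(max(d[i][j] for j in range(n)))
    return rho, delta

points = [0.0, 0.1, 0.2, 5.0, 5.1]      # two obvious groups
rho, delta = density_peaks(points, dc=0.5)
```

Points with both high rho and high delta (here the middles of the two groups) are the candidate cluster centres.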

EM algorithm

The EM algorithm is a model-based clustering method. When implementing the EM algorithm, each category in the data is assumed to follow a multidimensional Gaussian distribution, determined by its centre \mu_h and covariance \Sigma_h, h = 1, 2, \ldots, k, with density function as in Eq. (7):

f_h(x \mid \mu_h, \Sigma_h) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_h|}} \exp\left(-\frac{1}{2}(x - \mu_h)^T \Sigma_h^{-1} (x - \mu_h)\right)  (7)

where x has d attributes, \mu_h is the d-dimensional centre (column) vector and \Sigma_h is the d \times d covariance matrix.

According to the basic principle of the EM algorithm, each multidimensional Gaussian distribution has a corresponding weight coefficient w_h with \sum_{h=1}^{k} w_h = 1. The estimation of the parameter \phi is the execution stage of the EM algorithm [26]. The ultimate goal of estimation is to maximise the log-likelihood L(\phi) of the data set D. Eqs (8) and (9) give \phi and L(\phi), respectively:

\phi = \{(w_h, \mu_h, \Sigma_h), h = 1, \ldots, k\}  (8)

L(\phi) = \sum_{x \in D} \log\left(\sum_{h=1}^{k} w_h \cdot f_h(x \mid \mu_h, \Sigma_h)\right)  (9)

EM first makes an initial estimate of the value of the parameter ϕ and then updates the likelihood L(ϕ) through iterations [27]. The algorithm is as follows:

Take each data object x from the data set D and calculate the probability w_h^t(x) that x belongs to cluster h (h = 1, \ldots, k) by Eq. (10):

w_h^t(x) = \frac{w_h^t \cdot f_h(x \mid \mu_h^t, \Sigma_h^t)}{\sum_{i=1}^{k} w_i^t \cdot f_i(x \mid \mu_i^t, \Sigma_i^t)}  (10)

Update the parameter values in the model according to Eqs (11)–(13):

w_h^{t+1} = \frac{1}{|D|} \sum_{x \in D} w_h^t(x)  (11)

\mu_h^{t+1} = \frac{\sum_{x \in D} w_h^t(x) \cdot x}{\sum_{x \in D} w_h^t(x)}  (12)

\Sigma_h^{t+1} = \frac{\sum_{x \in D} w_h^t(x) (x - \mu_h^{t+1})(x - \mu_h^{t+1})^T}{\sum_{x \in D} w_h^t(x)}  (13)

The factor 1/|D| in Eq. (11) normalises the weights so that \sum_{h=1}^{k} w_h^{t+1} = 1.

If |L(\phi^t) - L(\phi^{t+1})| \le \varepsilon, the distribution of the data set is considered consistent with the estimated parameter values and the algorithm ends. If not, set t = t + 1 and return to step (1); each iteration uses the newly updated parameters [28] until the distribution of the data set is consistent with the estimated parameter values.
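As a minimal sketch (not the authors' implementation), the E-step and M-step above can be written out for a one-dimensional, two-component Gaussian mixture; real verb-pattern features would be high-dimensional sentence vectors, and the deterministic initialisation from the data extremes is an assumption for illustration:

```python
import math

def em_gmm_1d(data, iters=100, eps=1e-8):
    """Two-component 1-D EM following Eqs (7)-(13)."""
    k = 2
    mu = [min(data), max(data)]           # crude deterministic initialisation
    var = [1.0] * k
    w = [1.0 / k] * k                     # mixture weights, sum to 1

    def pdf(x, m, v):                     # Eq. (7) in one dimension
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    prev = None
    for _ in range(iters):
        # E-step: membership probabilities w_h(x), Eq. (10)
        resp = []
        for x in data:
            num = [w[h] * pdf(x, mu[h], var[h]) for h in range(k)]
            z = sum(num)
            resp.append([n / z for n in num])
        # M-step: update weights, means and variances, Eqs (11)-(13)
        for h in range(k):
            nh = sum(r[h] for r in resp)
            w[h] = nh / len(data)
            mu[h] = sum(r[h] * x for r, x in zip(resp, data)) / nh
            var[h] = max(sum(r[h] * (x - mu[h]) ** 2
                             for r, x in zip(resp, data)) / nh, 1e-6)
        # stop when the log-likelihood L(phi), Eq. (9), stops improving
        ll = sum(math.log(sum(w[h] * pdf(x, mu[h], var[h]) for h in range(k)))
                 for x in data)
        if prev is not None and abs(ll - prev) <= eps:
            break
        prev = ll
    return w, mu, var
```

On well-separated data the means converge to the two group centres within a few iterations.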

The EM algorithm scans all objects in the data set to calculate the membership between each object and each class. This step is computationally intensive and requires many iterations in order to maximise the log-likelihood L(\phi) of the data set, and repeated iteration is time-consuming [29]. The number of iterations is related to the data distribution; in addition, unreasonable parameter settings will increase the number of iterations.

Through the previous analysis, it can be found that as long as the distribution function is selected correctly, the EM algorithm can identify data sets of any shape. Since the number of iterations is parameter-dependent and unpredictable, the algorithm has no obvious advantage in efficiency.

C&W and HLBL word vector summation method

The method of analysing sentences using the mean of word vectors is widely accepted by most scholars because sentences usually have only a dozen words, so they can maintain relevant features even after averaging.

Since Bengio proposed the neural probabilistic language model in 2003, word vectors have become a research hotspot. In 2008, Ronan Collobert and Jason Weston proposed a neural network language model known as the C&W model. Unlike Bengio's model, the C&W model is not a probabilistic language model: it reads one N-gram, namely x = \{w_1, w_2, \ldots, w_n\}, from the training corpus at a time and then directly concatenates the corresponding word vectors to obtain the vector expression of x, representing its semantics. Since the word vectors obtained by a neural probabilistic language model are distributed in a continuous semantic space, even words with different forms occupy similar positions in the semantic space. The loss function is defined as Eq. (14):

L(x) = \max(0, 1 - s(x) + s(\tilde{x}))  (14)

In Eq. (14), s(x) is the score of a positive example and s(\tilde{x}) the score of a negative (corrupted) example. The parameters of the neural network are then trained by gradient descent to obtain the final word vectors.

The log-bilinear model is a probabilistic linear model proposed by Mnih and Hinton. Given an N-gram, it predicts the last word from the concatenated word vectors of the first N−1 words. A later study by Mnih and Hinton at NIPS in 2008 improved the most time-consuming part of this method, increasing speed while maintaining the same effect. Because it integrates the log-bilinear model with a hierarchical structure, this model is called the ‘hierarchical log-bilinear’, or simply ‘HLBL’, model. A comparison of C&W and HLBL in terms of training corpus, dimension and features is shown in Table 1.

Comparison of C&W and HLBL word vectors

Name    Training corpus    Dimension    Features
C&W     English RCV1       150,000      Not case sensitive
HLBL    Reuters RCV1       260,000      Case sensitive

Since C&W and HLBL word vectors perform well in many NLP tasks, this article represents text based on these two word vector addition methods as comparative experiments for verb pattern clustering. This study builds a corpus based on these two kinds of word vectors and conducts experiments under different parameter combinations; that is, the word vectors of the words in a sentence are added together and divided by the sentence length to represent the sentence. This text representation can be written as C = a + b, where C is the sentence representation and a and b are the vectors of its components. In this study, when using C&W word vector addition to represent sentences, 11,263 words are covered [30] and 1,497 words are not covered; when using HLBL word vector addition, 11,262 words are covered and 1,496 words are not, so the coverage of the two vector sets is almost identical.
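The averaging scheme above can be sketched as follows; the tiny 3-D vectors are made-up stand-ins for real C&W or HLBL embeddings, and uncovered words are simply skipped:

```python
def sentence_vector(sentence, embeddings):
    """Represent a sentence as the mean of the vectors of its covered words."""
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vecs:
        return None                      # no covered words at all
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

toy_embeddings = {                       # hypothetical 3-D word vectors
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.3, 0.1],
    "sat": [0.2, 0.8, 0.5],
}
v = sentence_vector("the cat sat", toy_embeddings)
```

Because sentences usually contain only a dozen or so words, the mean retains much of the lexical signal, at the cost of losing word order.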

WMD model

The WMD model was proposed by Kusner et al. in 2015. At present, this method performs best in the field of text similarity calculation. In this method, the n-dimensional vectors of all words are first obtained through the word2vec model, then the distances between words are calculated, as are the word frequencies of the words in each text [31]. The problem of calculating the distance between sentences can then be transformed as follows: convert all the word units of sentence A into the corresponding word units of sentence B at minimum cost, turning the problem into a transportation optimisation problem that can be solved with a transportation optimisation algorithm. The similarity of the word units of A and B before and after transportation is the standard for measuring the processing cost. For example, moving the word ‘president’ in sentence A to the word ‘Clinton’ in sentence B costs less than moving it to an unrelated word such as ‘Clare’.

The specific calculation of WMD is as follows: with the distances between words and the word frequencies as weights, find the optimal solution of the WMD linear programme under the weight constraints [32]. Given sentences A and B, the inputs are the word frequencies of all words in A, the word frequencies of all words in B and the matrix of distances from each word in A to each word in B. The output is the distance between A and B, that is, the minimum weighted cumulative cost required to move all words from A to B. The calculation is shown in Eqs (15)–(17):

\min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij} \, c(i, j)  (15)

s.t.\; \sum_{j=1}^{n} T_{ij} = d_i \quad \forall i \in \{1, \ldots, n\}  (16)

\sum_{i=1}^{n} T_{ij} = d_j' \quad \forall j \in \{1, \ldots, n\}  (17)

In the aforementioned equations, c(i, j) = \|x_i - x_j\|_2 is the distance from word i in A to word j in B; T_{ij} is the weight moved from word i to word j; d_i = \frac{c_i}{\sum_{k=1}^{n} c_k} is the word frequency of word i in A; and d_j' = \frac{c_j}{\sum_{k=1}^{n} c_k} is the word frequency of word j in B.
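A sketch of the transportation problem of Eqs (15)–(17) for the special case where both sentences contain n distinct words of equal weight 1/n: in that case the optimum of the linear programme is attained at an assignment, so for tiny inputs it can be found by enumerating permutations. The 2-D "embeddings" below are hypothetical toy vectors, not word2vec output:

```python
import itertools
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def wmd_equal_weights(words_a, words_b, emb):
    """WMD when every word carries weight 1/n and |A| = |B| = n."""
    n = len(words_a)
    assert len(words_b) == n
    best = float("inf")
    for perm in itertools.permutations(range(n)):
        # each word of A sends its full weight 1/n to exactly one word of B
        cost = sum(euclidean(emb[words_a[i]], emb[words_b[perm[i]]])
                   for i in range(n)) / n
        best = min(best, cost)
    return best

emb = {"obama": [1.0, 0.0], "president": [1.1, 0.1],
       "speaks": [0.0, 1.0], "talks": [0.1, 1.1]}
d = wmd_equal_weights(["obama", "speaks"], ["president", "talks"], emb)
```

With general word frequencies the full linear programme must be solved instead (e.g. with a transportation or LP solver), but the structure of the cost being minimised is the same.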

Clustering algorithm based on WMD model

In order to cluster verb patterns in text using statistical machine learning methods, it is first necessary to formalise the text and then calculate text similarity. Finally, the text similarity value is used as input to implement verb pattern clustering. The expression of the text needs to fully express the semantics of the sentence. To fully express the semantic information of a sentence, not only the lexical information within the sentence but also the relationship information between words is needed. This work will study the formal representation of sentences from the lexical information and sentence structure information of sentences. At the same time, combined with the current popular sentence similarity research methods, different methods are used to cluster verb patterns. First, the data preprocessing [33] method is introduced, and then the influence of different text representation methods and clustering algorithms on the verb pattern clustering results is analysed using the WMD model.

Before performing semantic similarity-based verb pattern clustering on the corpus, the vector expression of the sentence needs to be obtained first and then the sentence vector expression is used as the input of the clustering algorithm to calculate the distance matrix of the entire text set.

In this paper, the evaluation criterion used for the verb pattern clustering task is the F1 value combining precision and recall [34]. The evaluation code comes from the CPA evaluation task of SemEval 2015. The final evaluation score is obtained by averaging the F1 values of multiple target verbs. The F1 calculation and evaluation score for the verb pattern clustering task are shown in Eqs (18) and (19):

F1_{verb} = \frac{2 \times Precision_{verb} \times Recall_{verb}}{Precision_{verb} + Recall_{verb}}  (18)

Score_{Task} = \frac{\sum_{i=1}^{n_{verb}} F1_{verb}}{n_{verb}}  (19)

Precision and recall are important criteria for algorithm evaluation. This paper adopts the B-cubed standard, which was originally used for evaluating coreference resolution tasks and was later extended to evaluation metrics for clustering tasks. All evaluation values in this paper are averages calculated from the precision and recall of each category. The precision and recall used here take sentence pairs as the counting unit; that is, if there are N sentences in the data set, then there are N(N − 1)/2 sentence pairs. The precision and recall equations for the verb pattern clustering task are Eqs (20) and (21):

Precision_i = \frac{TP}{N_{run}}  (20)

Recall_i = \frac{TP}{N_{gold}}  (21)

In the aforementioned equations, TP is the number of sentence pairs of the same type clustered into the same cluster; N_{run} is the number of sentence pairs within the clusters of the clustering result; and N_{gold} is the number of sentence pairs per class in the standard annotation set.
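The pair-counting evaluation of Eqs (20) and (21) can be sketched as follows; with N sentences there are N(N − 1)/2 pairs, and TP counts pairs placed in the same cluster by both the system output and the gold standard:

```python
from itertools import combinations

def pairwise_prf(predicted, gold):
    """predicted/gold: one cluster label per sentence, in the same order."""
    idx = range(len(predicted))
    same_pred = {(i, j) for i, j in combinations(idx, 2)
                 if predicted[i] == predicted[j]}   # pairs counted in N_run
    same_gold = {(i, j) for i, j in combinations(idx, 2)
                 if gold[i] == gold[j]}             # pairs counted in N_gold
    tp = len(same_pred & same_gold)
    precision = tp / len(same_pred) if same_pred else 0.0
    recall = tp / len(same_gold) if same_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

The task score of Eq. (19) is then the mean of these F1 values over all target verbs.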

Then, through the WMD model, with the texts as input, the distance between texts is calculated directly; that is, based on the similarity of words, with the flow of word movement [35] as the weight, the text distance is calculated according to Eq. (15), where word similarity is based on word2vec word vectors and computed by cosine similarity, and the word movement flow is calculated according to the constraints in Eqs (16) and (17).

The WMD model adopted in this study considers not only word sense information and the semantic relationships between words but also the overall structural relationship of the sentence. Therefore, compared with other text representation models, the WMD model is expected to improve the results of the R&L algorithm markedly, which benefits the accuracy of verb pattern clustering.

In verb pattern clustering, the most important work is to choose an appropriate text representation model and clustering method. Through comparative experiments, the WMD model is selected for text representation. At the same time, by comparing the experimental results of the multi-text representation models of the three clustering algorithms, it is found that the R&L algorithm is the most suitable algorithm for the verb pattern clustering task.

The R&L density peak clustering algorithm is a simple clustering algorithm. Its conception is novel [36], simple and vivid; it can identify clusters of various shapes, and its parameters are easy to determine. When running the algorithm, cluster centres are selected manually, on the basis of the local density \rho_i and distance \delta_i values of all sample points, as those sample points with both a high \rho value and a high \delta value.

The R&L algorithm only needs to determine one parameter t to define the cut-off distance dc. In a sense, the choice of dc determines the effect of the algorithm, and its choice depends on the specific problem. For data of different magnitudes, different dc values need to be chosen. However, using the t value to determine the dc value as a percentage reduces the dependence of the parameter on a particular problem [37], and although the choice of the t value depends on the experience of the algorithm user, in general, the algorithm has only one parameter, which is more convenient to use.
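The derivation of d_c from the percentage parameter t can be sketched as follows; taking d_c as the t-th percentile of all pairwise distances is an assumption consistent with the description above, and the common rule of thumb t ≈ 2 % is used only as an illustrative default:

```python
def cutoff_distance(distances, t=2.0):
    """Pick d_c as the t-th percentile of the pairwise distances d_ij (i < j),
    so the same t transfers across data sets of different scales."""
    ordered = sorted(distances)
    pos = max(0, min(len(ordered) - 1, round(len(ordered) * t / 100.0) - 1))
    return ordered[pos]
```

Because t is a percentage rather than an absolute distance, the same setting can be reused on data sets of different magnitudes, which is the convenience noted above.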

Experiment and analysis of experimental results

In order to verify the effectiveness of the clustering algorithm based on the WMD model, this study compares C&W word vector addition and HLBL word vector addition with the WMD model, using five English papers downloaded from related journals as experimental data.

The title, abstract, outline, full text and other information of each paper are extracted from the full-text data, and an annotator marks the metadiscourse word blocks in the paper to build a word vector training corpus; the training model, optimisation algorithm and parameters are then selected according to the amount of data and the natural language processing task. The full text is split into sentences, and textual features such as the position, segmentation and contour of each sentence are extracted. Taking each subsection as a unit, the word vector files are used to construct the semantic similarity between sentences under the WMD model, C&W word vector addition and HLBL word vector addition, and the improved R&L algorithm is used to identify topic sentences. The results are compared in Table 2.

Table 2. Results of the first round of recognition

Method                                      Accuracy (%)   Recall (%)   F1 (%)
HLBL word vector addition + R&L algorithm   22.34          21.24        20.54
C&W word vector addition + R&L algorithm    24.67          22.79        22.10
WMD model + R&L algorithm                   29.70          28.03        28.63

WMD, word movement distance
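The "word vector addition" baselines compared above can be sketched as follows: each tokenised sentence is represented by the sum of its word vectors, and sentence pairs are scored by cosine similarity. The function name, the out-of-vocabulary handling and the toy vectors in the test are illustrative assumptions, not the C&W/HLBL vectors or corpus used in the paper.

```python
import numpy as np

def sentence_similarity_matrix(sentences, vectors):
    """Represent each tokenised sentence as the sum of its word
    vectors (the 'word vector addition' baseline) and return the
    matrix of pairwise cosine similarities."""
    dim = len(next(iter(vectors.values())))
    reps = np.zeros((len(sentences), dim))
    for k, sent in enumerate(sentences):
        for w in sent:
            if w in vectors:          # out-of-vocabulary words are skipped
                reps[k] += vectors[w]
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # avoid division by zero for empty sentences
    reps = reps / norms
    return reps @ reps.T
```

Identical sentences get similarity 1, and sentences sharing no vector directions get similarity 0, which is what makes this a convenient (if coarse) input for the clustering step.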

The experimental results in Table 2 show that although this method performs slightly better than the other methods on the same text passages, none of the three methods is satisfactory. Analysis of the experimental data and results reveals the specific reasons. Each subsection is used as the basic unit for identification, and a fixed-proportion method is applied, which assumes that the core topic sentences of a paper are evenly distributed across the text. In fact, the distribution of metadiscourse differs across the parts of a paper: the introduction, experimental results and conclusion, which present the paper's main findings, contain a high density of metadiscourse, whereas the survey of related research and the description of methods contain a low density. Applying the same identification ratio everywhere therefore strongly affects the identification results.

Therefore, the selection method is improved, while the rest of the experimental process remains as described above. Five articles were randomly selected from the word vector training corpus, and the sentences recognised in common were selected through multi-person collaborative annotation of the labelled data. At the same time, during identification, the identification ratio at the beginning and end of each paper is set to twice that of the other parts. The final experimental results are shown in Table 3.

Table 3. Results of the second round of recognition

Method                                      Accuracy (%)   Recall (%)   F1 (%)
HLBL word vector addition + R&L algorithm   25.89          37.42        29.54
C&W word vector addition + R&L algorithm    27.73          43.87        34.10
WMD model + R&L algorithm                   30.06          50.89        37.63

WMD, word movement distance
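The doubled identification ratio at the beginning and end of a paper can be expressed as a small quota rule over the sections of a document. The helper `section_quotas`, the base ratio and the rounding behaviour are hypothetical details added for illustration; the paper does not specify this exact procedure.

```python
def section_quotas(section_sizes, base_ratio=0.1):
    """Given the number of sentences in each section of a paper,
    return how many sentences to identify per section: the first and
    last sections get twice the base ratio, reflecting the denser
    metadiscourse in introductions and conclusions. At least one
    sentence is always selected per section."""
    last = len(section_sizes) - 1
    quotas = []
    for i, n in enumerate(section_sizes):
        ratio = base_ratio * (2 if i in (0, last) else 1)
        quotas.append(max(1, round(n * ratio)))
    return quotas
```

For example, with sections of 50, 40, 30 and 60 sentences and a 10% base ratio, the first and last sections contribute 10 and 12 sentences, while the middle sections contribute only 4 and 3.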

After these adjustments to the experimental process, the accuracy of the WMD model + R&L algorithm reaches 30.06%, while the other two algorithms remain between 20% and 30%, indicating that the fusion algorithm proposed in this paper identifies variations more accurately. The table also shows that the recall of the proposed algorithm is 50.89%, against 37.42% and 43.87% for the other two algorithms, which demonstrates that it retrieves a larger share of the varied verb patterns. Its F1 is nearly 10 percentage points higher than in the previous experiment, and the recognition effect based on the WMD model is about 3 percentage points higher than that of the other two recognition methods.
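For reference, F1 combines the two columns of the tables as the harmonic mean of precision (here, the accuracy column) and recall; the reported F1 values are close to, though not always exactly, this combination of the rounded percentages, which suggests per-document averaging in the original experiments.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision * recall / (precision + recall)
```

For the best row of Table 3, `f1_score(30.06, 50.89)` gives about 37.8, close to the reported 37.63.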

To sum up, the R&L algorithm based on the WMD model proposed in this paper performs better at identifying the variation of verb forms in English papers, reaching the average level of manual evaluation.

Conclusion

Metadiscourse in writing enables authors to write with the author–reader relationship in mind, combining reader-centred writing methods with language that revolves around the content of the text, and to use metadiscourse to organise the entire text. Metadiscourse helps readers understand the author's intentions more accurately and deeply. Based on the R&L algorithm and the WMD model, this study designs an automatic extraction method for verb pattern variation and classifies the syntactic components of patterns according to their semantic roles. The comparative experiments above show that although C&W word vector addition and HLBL word vector addition achieve complete coverage, the F1 gain obtained through the experiments is about 10 percentage points, and the WMD model is effective in text distance calculation, which proves its effectiveness in the verb pattern clustering task. In terms of sentence usability, although the recognition effect of this study still has much room for improvement, careful analysis of the recognition results shows that the sentences missed in the result set are generally valuable ones, but the overall quality is still better than that of the other two methods. The verb pattern clustering method combining the WMD model and the R&L algorithm achieves good results in the verb pattern variation clustering task, a clear improvement over the results on known data sets.

Comparison of C&W and HLBL word vectors

Name   Training corpus   Dimension   Features
C&W    ENGLISH RCV1      150,000     Not case sensitive
HLBL   REUTERS RCV1      260,000     Case sensitive


[1] Zhang Y, Qian T, Tang W. Buildings-to-distribution-network integration considering power transformer loading capability and distribution network reconfiguration[J]. Energy, 2022, 244. DOI: 10.1016/j.energy.2022.123104

[2] Qian T, Chen X, Xin Y, Tang W H, Wang L. Resilient Decentralized Optimization of Chance Constrained Electricity-gas Systems over Lossy Communication Networks[J]. Energy, 2022, 239:122158. DOI: 10.1016/j.energy.2021.122158

[3] Zhao B, Qian T, Tang W, Liang Q. A Data-enhanced Distributionally Robust Optimization Method for Economic Dispatch of Integrated Electricity and Natural Gas Systems with Wind Uncertainty[J]. Energy, 2022:123113. DOI: 10.1016/j.energy.2022.123113

[4] Qian T, Liu Y, Zhang W H, Tang W H, Shahidehpour M. Event-Triggered Updating Method in Centralized and Distributed Secondary Controls for Islanded Microgrid Restoration[J]. IEEE Transactions on Smart Grid, 2020, 11(2):1387–1395. DOI: 10.1109/TSG.2019.2937366

[5] Mauranen A. Contrastive ESP rhetoric: Metatext in Finnish-English economics texts[J]. English for Specific Purposes, 1993. DOI: 10.1016/0889-4906(93)90024-I

[6] Hyland K. Persuasion and context: The pragmatics of academic metadiscourse[J]. Journal of Pragmatics, 1998, 30(4):437–455. DOI: 10.1016/S0378-2166(98)00009-5

[7] Xianyu R, Jin X U. A Contrastive Study on Application of Metadiscourse in English and Chinese Petroleum Academic Texts[J]. Journal of Southwest Petroleum University (Social Sciences Edition), 2016.

[8] Yang C. A Comparative Study of Metadiscourse in English and Chinese Academic Book Reviews[J]. Journal of Changchun University of Science and Technology (Social Sciences Edition), 2009.

[9] Estaji M R. A comparative analysis of interactional metadiscourse markers in the Introduction and Conclusion sections of mechanical and electrical engineering research papers[J]. Iranian Journal of Language Teaching Research, 2015, 3.

[10] Schmid T, Vogel R. Dialectal Variation in German 3-Verb Clusters[J]. Journal of Comparative Germanic Linguistics, 2004.

[11] Zhang R. Symbolic flexibility and argument structure variation[J]. Linguistics, 2006, 44(4):689–720. DOI: 10.1515/LING.2006.022

[12] Baker M C. Pseudo Noun Incorporation as Covert Noun Incorporation: Linearization and Crosslinguistic Variation[J]. Language and Linguistics, 2014. DOI: 10.1177/1606822X13506154

[13] Lucas C, Bayley R. Variation in ASL: The Role of Grammatical Function[J]. Sign Language Studies, 2005, 6(1):38–75. DOI: 10.1353/sls.2006.0005

[14] Uiboaed K, Hasselblatt C, Lindstroem L, et al. Variation of verbal constructions in Estonian dialects[J]. Literary & Linguistic Computing, 2013, 28(1):42–62. DOI: 10.1093/llc/fqs053

[15] Oltra-Massuet I, Castroviejo E. A syntactic approach to the morpho-semantic variation of -ear[J]. Lingua, 2014, 151:120–141. DOI: 10.1016/j.lingua.2014.07.018

[16] Wong M L. Generalized Bounded Variation and Inserting point masses[J]. arXiv e-prints, 2007. DOI: 10.1007/s00365-008-9024-0

[17] Bybee J, et al. Phonological and Grammatical Variation in Exemplar Models[J]. Studies in Hispanic & Lusophone Linguistics, 2008, 1(2). DOI: 10.1515/shll-2008-1026

[18] Gordon C, et al. Metadiscourse in group supervision: How school counselors-in-training construct their transitional professional identities[J]. Discourse Studies, 2015, 18(1). DOI: 10.1177/1461445615613180

[19] Blom E, Baayen H R. The impact of verb form, sentence position, home language and proficiency on subject-verb agreement in child L2 Dutch[J]. Applied Psycholinguistics, 2013, 34(4):777–811. DOI: 10.1017/S0142716412000021

[20] Helstrup T, et al. Procedural dependence in action memory: Effects of verb form and individual vs group conditions[J]. Scandinavian Journal of Psychology, 1996, 37(3):329–337. DOI: 10.1111/j.1467-9450.1996.tb00665.x

[21] Blom E, Baayen H R, et al. The impact of verb form, sentence position, home language, and second language proficiency on subject–verb agreement in child second language Dutch[J]. Applied Psycholinguistics, 2013. DOI: 10.1017/S0142716412000021

[22] Xuan L, Michel J, Sylvie M, et al. The concept of Text Topology. Some applications to Verb-Form Distributions in Language Corpora[J]. Literary & Linguistic Computing, 2016(2):167–186.

[23] Lustigman L, et al. Exposure and feedback in language acquisition: adult construals of children's early verb-form use in Hebrew[J]. Journal of Child Language, 2019. DOI: 10.1017/S0305000918000405

[24] Waard A D, Maat H P. Verb form indicates discourse segment type in biological research papers: Experimental evidence[J]. Journal of English for Academic Purposes, 2012, 11(4):357–366. DOI: 10.1016/j.jeap.2012.06.002

[25] Bernander R. On the “Atypical” Imperative Verb Form in Manda[J]. Studia Orientalia Electronica, 2020, 8(3):22–42. DOI: 10.23993/store.69737

[26] Leone L. Phrasal verb vs. Simplex pairs in legal-lay discourse: the Late Modern English period in focus[J]. Yearbook of Phraseology, 2021, 12(1):197–220. DOI: 10.1515/phras-2021-0008

[27] Jdrzejowski U. On the habitual verb pflegen in German: Its use, origin, and development[J]. Linguistics, 2021, 59(6):1473–1530. DOI: 10.1515/ling-2021-0161

[28] Freudenthal D, Ramscar M, Leonard L B, et al. Simulating the Acquisition of Verb Inflection in Typically Developing Children and Children With Developmental Language Disorder in English and Spanish[J]. Cognitive Science, 2021, 45(3). DOI: 10.1111/cogs.12945

[29] Alotaibi Y. Verb Form and Tense in Arabic[J]. International Journal of English Linguistics, 2020, 10(5):284. DOI: 10.5539/ijel.v10n5p284

[30] Keshev M, Meltzer-Asscher A. Noisy is better than rare: Comprehenders compromise subject-verb agreement to form more probable linguistic structures[J]. Cognitive Psychology, 2021, 124. DOI: 10.1016/j.cogpsych.2020.101359

[31] Tatsumi T, Chang F, Pine J M. Exploring the acquisition of verb inflections in Japanese: A probabilistic analysis of seven adult-child corpora[J]. First Language, 2020, 41(3). DOI: 10.1177/0142723720926320

[32] Stockwell R. Contrast and verb phrase ellipsis: The case of tautologous conditionals[J]. Natural Language Semantics, 2022, 30(1):77–100. DOI: 10.1007/s11050-022-09189-3

[33] Lubis R, Widayati D. Marine Ecolexicon of Noun-Verb of the Coast Community in Pesisir Barus, Central Tapanuli[J]. Jurnal Arbitrer, 2021, 8(1):82. DOI: 10.25077/ar.8.1.82-92.2021

[34] D'Angelo L. Gender Identity and Authority in Academic Book Reviews: An Analysis of Metadiscourse across Disciplines[J]. Linguistica e Filologia, 2008.

[35] Troup A C. Metadiscourse in Middle English and Early Modern English Religious Texts: A Corpus-based Study (review)[J]. Journal of English & Germanic Philology, 2012, 111(1):120–122. DOI: 10.5406/jenglgermphil.111.1.0120

[36] Wang Q. A Study of Interactional Metadiscourse from the Perspective of Communicative Action Theory in English Academic Discourse[J]. Journal of Northeast Normal University (Philosophy and Social Sciences), 2018.

[37] Birhan A T. An exploration of metadiscourse usage in book review articles across three academic disciplines: a contrastive analysis of corpus-based research approach[J]. Scientometrics, 2021(4). DOI: 10.1007/s11192-020-03822-w
