AHEAD OF PRINT
Journal Details
License
Format
Journal
eISSN
2444-8656
First Published
01 Jan 2016
Publication timeframe
2 times per year
Languages
English
Open Access

A study on the phenomenon of anaphoric correction in college students’ English conversation

Published online: 23 Dec 2022
Volume & Issue: AHEAD OF PRINT
Pages: -
Received: 18 Jul 2022
Accepted: 16 Sep 2022
Introduction

English is a compulsory course in university teaching, and its importance is reflected in all aspects, especially in conversation. Because college students' English proficiency is limited, various problems often arise in English conversation, which makes learning conversational repair necessary. Conversational self-repair is the self-corrective behaviour taken by a speaker after monitoring his or her own discourse errors, and it is the main form of conversational repair [1–3]. Since 1977, when the American sociologists Schegloff and Sacks [4] conducted a systematic study of conversational repair and first proposed the internal structure and judgement criteria of conversational self-repair, the topic has received the attention of many researchers at home and abroad. Over the past two decades, research on self-repair has produced many results in second language acquisition and related fields. It is believed that self-repair leads to comprehensible output, an important factor in successful language learning and a measure of success in second language learning.

Anaphora is a common linguistic phenomenon, frequently used in everyday conversation and writing. To meet the needs of communicative activities, language has developed a complete and complex anaphoric system. We need to study not only how anaphora is used in communication but also the structural features of the anaphoric system itself. Specifically, the study of anaphora involves questions such as the following: Why do anaphors have their current structures? What are the roots of the differences between different anaphoric structures? How are anaphors used in discourse, and what functions do they serve? How is the meaning of an anaphor interpreted in the process of communication? After the first mention of a person, thing or event in a discourse, we may mention it a second or third time in subsequent utterances. This 'repeated mention' is an anaphoric reference. The person, thing or event first mentioned exists in memory as a conceptual entity and becomes the object of the subsequent anaphoric reference; we call it a prior concept because it is already present in the context, as opposed to the 'repeated mention' that follows. The word or phrase used in the first mention of the concept is called the 'antecedent', and it is the word to which the later anaphor must be related in order to be correctly interpreted.

An anaphor is a word or phrase used to refer back to a prior concept in a discourse. The object referred to by the anaphor should be the same as the object referred to by the antecedent, i.e., the two should be co-referential [5–10].

However, the questions of why there are structural and distributional differences between different anaphors and how anaphors are interpreted in the communicative process are central to the study of anaphora, and different linguistic theories have attempted to answer them. As a typical linguistic phenomenon, anaphora has become a favourite testing ground for linguistic schools to check the validity of their theoretical explanations. The explanation of anaphora depends fundamentally on our understanding of the nature of linguistic structure, and different theories have explained the phenomenon from their own perspectives. Broadly speaking, there are three basic positions on the nature of anaphora: formal, pragmatic and cognitive, represented respectively by generative grammar, neo-Gricean pragmatics and accessibility theory [11–16].

Generative grammar has long dominated the field of language studies. The main researchers who have applied generative grammar to anaphora are Reinhart (1983) and Chomsky (1981a, 1982). Chomsky's binding theory attempts to find a set of explanatory rules for sentence-internal reference. His study focuses on three types of referential expressions: referring expressions, anaphors and pronouns. Referring expressions include proper names, noun phrases, etc. (equivalent to what we call naming words); anaphors include reflexive pronouns, reciprocal pronouns, etc.; and pronouns refer mainly to what traditional grammar calls personal pronouns. Like all other theories, Chomsky's is built on a set of basic categories: his syntactic system consists mainly of a series of transformational rules for sentence generation, based on a system of categories of sentence constituents and their relations. His study of anaphora rests on this purely syntactic system of categories.

Among the many studies, Pillai [17] proposed a model of the internal structure of self-repair that has greatly influenced subsequent studies. The model is as follows (Figure 1):

Fig. 1

Self-repairing internal structure model

The above model shows that the talker self-monitors his own speech behaviour: when he detects a speech error after uttering 'at the', he stops the flow and enters the second phase of self-repair, the editing phase. To fill the gap in this stage, the speaker uses the filler word 'no' while he re-plans the speech and the repair strategy needed to correct the error. Finally, in the third stage, the repair proper, a replacement strategy is used, replacing 'left' with 'right'. In this stage, the speaker does not repair the error directly after the erroneous word but retraces to the beginning of the verb phrase; the process thus runs from self-monitoring to self-replanning to self-repair, where 'turn' is the initial word of the retraced constituent (hereinafter, the initial word of retracing).

It was found that retracing is common in the error-correction stage of self-repair: the speaker tends to go back to the beginning of the repaired constituent when performing the repair, rather than restarting directly after the erroneous word. The terminology was introduced by Levelt in 1983 in his model of the internal structure of self-repair described above and has been used ever since. A corpus study by Nooteboom [18] found that lexical errors were more likely than phonological errors to trigger retracing, but there is little literature on this prominent phenomenon of retraced repair. Later, in a colour-naming experiment with native speakers (henceforth L1), self-repairs of speaker errors were divided into three types: instant repairs (starting the repair directly with the erroneous word) accounted for 42%, anticipatory retracings (tracing back to and repeating some word prior to the trouble element) accounted for 35%, and fresh starts (beginning again with material that was not part of the original utterance) accounted for the remainder. Levelt argues that retracing indicates that the speaker is trying to help the listener with the problem of speech continuation: the listener must distinguish the word the speaker intends from the interrupted word, and retracing helps the listener relate the corrected word to the original one. Nagano [19] drew on Levelt's experimental model to compare the use of verbal self-repair between L1 speakers and second language learners (hereafter L2; Japanese learners of English in that study) and found that L2 used more retraced repair than L1 under appropriacy repair and error repair, with frequencies of 33% vs. 25% and 41% vs. 63%, respectively. He explained this by suggesting that L2 speakers used retraced repair to enable listeners to keep up with the communication, buying time to re-plan the language and using it as a delaying strategy.

A domestic researcher, Yao Jianpeng [20], classified speakers' retracing strategies as retracing without overt correction (hereafter self-repetition), retracing with correction (subdivided into replacement and insertion retracing) and retracing with reformulation. The study found that in self-repetition, function words were more likely than content words to trigger repetition, and that 70% of retracings were triggered by pronouns, i.e., retracing was most likely to be triggered by function words. The study also pointed out that one motivation for retraced repair is to make the repaired speech accurate and syntactically and semantically acceptable; in addition, by retracing, the speaker expects the hearer to follow and understand his or her communicative intention. However, that study investigated L1 retraced repair. The present study investigates this common phenomenon in learner speech in order to fill the gap and aims to answer two questions: (1) What are the characteristics of Chinese college students' use of retraced repair strategies in English conversation? (2) What is the distribution of anaphors in Chinese college students' retraced repair in English conversation?

Overview of related theories

The corpus of this study was obtained from the College Learners' Spoken English Corpus (hereinafter COLSEC) [21]. The source of COLSEC is the oral test of the National College English Test (CET-SET), and the corpus is based on transcriptions of that test. It was built using random proportional sampling, proportional to candidates' regional origin, major, test scores and conversation topics [22]. The discourse in the candidate corpus comprises three types, teacher-student interview, student discussion and teacher-student discussion, and all three parts of the examination revolve around set topics. A simple random sample of 31 texts from 2000 to 2002 was selected, involving 29 male and 65 female examinees. The test covered 14 topics, including mass media, the use of computers, examinations, staying healthy, spending, big or small families and caring for others. The topics fall into three main categories, campus life, social and natural concerns, and personal and interpersonal issues, which are sharply focused and readily reflect the characteristics of vocabulary use and communication strategies.

Corpus analysis and correction model

In classifying the retraced-repair strategies in the COLSEC corpus, we drew on previous classification models and adjusted them to the present corpus, dividing the strategies into three major categories. The first category is retracing without overt error correction (i.e., self-repetition, hereafter repetition), which can be subdivided into two-word repetition, multi-word repetition and continuous repetition (excluding word-fragment repetition and one-word repetition, which are common in self-repair but lack the retracing characteristic). Two-word repetition, the most common type in this corpus, is the repetition of two lexical items; multi-word repetition involves more than two lexical items; and in continuous repetition, a particular type found in this corpus, the speaker makes a two-word or multi-word repetition repeatedly. Researchers have interpreted repetition differently: Hieke [23] states that repetition performs two main functions, delay and repair. It also acts as a pause, indicating that the speaker is assembling speech; during this time, the speaker's memory is temporarily blocked, lexical retrieval becomes difficult and time is needed to extract the exact word and overcome the current lexical-selection problem [24]. The second category is retracing with error correction, which includes substitution, insertion and deletion. Substitution is accomplished mainly through the speaker's self-monitoring of lexical items, which suggests that the motivation for this repair is to use more appropriate lexical and grammatical expressions to convey communicative intentions.
The insertion type not only indicates a high level of self-monitoring and awareness of natural speech communication but also shows that the discourse is made more complete and accurate by inserting information. Deletion is a new type found in this corpus. In all these types, the discourse participant goes through a retrace-and-repair process when performing substitution, insertion or deletion. The third category is reorganisation: in this type of repair, the speaker retraces to the beginning of the constituent but, after the retraced word, re-plans the discourse in terms of syntactic structure or semantic composition. We manually annotated the corpus according to the above classification model and then counted the frequency of each repair type using the search function of the corpus retrieval software AntConc 3.2.2.
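The frequency count over the annotated corpus can be sketched in a few lines. The tag names below are invented for illustration and do not reflect the actual annotation scheme used with AntConc:

```python
from collections import Counter

# Hypothetical annotation tags for repair types (illustrative only):
# rep2 = two-word repetition, repmulti = multi-word repetition,
# sub = substitution, ins = insertion, del = deletion, reorg = reorganisation.
annotated = [
    "rep2", "rep2", "repmulti", "sub", "ins", "rep2",
    "reorg", "sub", "del", "rep2", "repmulti", "sub",
]

counts = Counter(annotated)
total = sum(counts.values())
for tag, n in counts.most_common():
    # Report each repair type with its share of all repairs.
    print(f"{tag}: {n} ({n / total:.1%})")
```

In practice the tags would be read from the annotated transcript files rather than a hard-coded list.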

However, the classification of anaphors can be multifaceted: we can classify them by their formal structural features, by their syntactic function or by the conceptual relationship between the anaphor and its antecedent. In a broad sense, all repeated references to people, things, features, behaviours, etc. in a discourse can be regarded as anaphoric. In sentences, these referred objects often belong to different semantic roles or event elements of the sentence proposition and correspond to different sentence constituents: people and things often correspond to nouns, features to adjectives or adverbs and actions to verbs. Therefore, some scholars classify anaphors according to the syntactic features of the antecedent.

Modified classification model

Classification methods are based on context-sensitive features and can be traced back to early studies of article and preposition error correction. In the classification approach, for a specific syntactic error type, the classification model outputs a label representing the correct form of the target word (e.g., for noun number, 0 for the singular form and 1 for the plural form). Classification models fall into two main categories: statistical classification models and deep classification models [25].

Statistical classification models

Statistical classification models were common in GEC research before the revival of deep learning and include models such as averaged perceptrons, naive Bayes classifiers and N-gram-based language models [26, 27].

The perceptron is a linear classification model that assigns an input feature vector [28] to one of two classes (+1 or -1) by means of a linear discriminant function. Specifically, the linear discriminant function is given in Eq. (1):

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \omega_i x_i + b\right) \tag{1}$$

where sign is the sign function of Eq. (2):

$$\operatorname{sign}(x) = \begin{cases} +1, & x > 0 \\ -1, & x \le 0 \end{cases} \tag{2}$$

The linear equation in Eq. (3),

$$\sum_{i=1}^{n} \omega_i x_i + b = 0 \tag{3}$$

represents a hyperplane in the feature space $R^n$ that divides the positive and negative points. The learning goal of the perceptron is to compute the parameters $\omega$ and $b$ such that the positive and negative points are completely separated on the two sides of the hyperplane, assuming that the training data set is linearly separable. The parameters are updated according to a loss function and gradient descent. Initially, the loss function was defined as the total number of misclassified points, but such a function is not continuously differentiable in $\omega$ and $b$, which makes optimisation difficult. Researchers therefore defined the loss function as the sum of the distances from the misclassified points to the separating hyperplane, which resolves this difficulty. Assuming that the set of misclassified points is $M$, the loss function is defined as in Eq. (4):

$$L(\omega, b) = -\sum_{x_i \in M} y_i (\omega \cdot x_i + b) \tag{4}$$
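A minimal sketch of Eqs (1)-(4) on a toy linearly separable data set: each misclassified point triggers a gradient update on ω and b until the hyperplane separates the two classes. The data and learning rate are invented for illustration.

```python
# Perceptron sketch for Eqs (1)-(4).
def sign(z):
    # Sign function of Eq. (2).
    return 1 if z > 0 else -1

def train_perceptron(X, y, lr=0.1, epochs=100):
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # A point is misclassified when y_i (w . x_i + b) <= 0.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                # Gradient step on L(w, b) = -sum over M of y_i (w . x_i + b).
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
                errors += 1
        if errors == 0:  # converged: data fully separated
            break
    return w, b

X = [(2.0, 3.0), (3.0, 3.0), (-1.0, -1.0), (-2.0, -1.5)]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
preds = [sign(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
print(preds)  # -> [1, 1, -1, -1]
```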

There is also the naive Bayes classifier, a classification model based on Bayes' theorem [29], which assigns an item to the category with the highest probability by computing the probability of each category given the item's features. Bayes' theorem computes the conditional probability $P(B \mid A)$ of event $B$ given event $A$ from the probability $P(A)$, the probability $P(B)$ and the conditional probability $P(A \mid B)$, as in Eq. (5):

$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)} \tag{5}$$

Suppose Eq. (6) defines an item to be classified, where $a_i$ is a feature attribute of $x$, and Eq. (7) defines the set of categories. Using the Bayes formula, we compute $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$; then, according to the naive Bayes algorithm, if Eq. (8) is satisfied, the item is assigned to category $y_k$:

$$x = \{a_1, a_2, \ldots, a_m\} \tag{6}$$

$$C = \{y_1, y_2, \ldots, y_n\} \tag{7}$$

$$P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\} \tag{8}$$
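The decision rule of Eqs (5)-(8) can be sketched as follows. Add-one (Laplace) smoothing is included to avoid zero probabilities, and the tiny training set is invented for illustration:

```python
from collections import Counter, defaultdict

# Toy labelled data: (feature tuple, category).
train = [
    (("fast", "cheap"), "good"),
    (("fast", "reliable"), "good"),
    (("slow", "cheap"), "bad"),
    (("slow", "buggy"), "bad"),
]

# Estimate the prior P(y) and the per-class feature counts for P(a_i | y).
prior = Counter(label for _, label in train)
feat_counts = defaultdict(Counter)
vocab = set()
for feats, label in train:
    for f in feats:
        feat_counts[label][f] += 1
        vocab.add(f)

def classify(feats):
    # Pick argmax_y P(y) * prod_i P(a_i | y), per Eq. (8).
    best, best_p = None, -1.0
    for label, nlab in prior.items():
        p = nlab / len(train)
        denom = sum(feat_counts[label].values()) + len(vocab)  # add-one smoothing
        for f in feats:
            p *= (feat_counts[label][f] + 1) / denom
        if p > best_p:
            best, best_p = label, p
    return best

print(classify(("fast", "cheap")))  # -> good
```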

N-gram is a natural language processing model based on a statistical approach [30]. Its core principle is to process text with a sliding window of length N, producing a sequence of fragments of length N, each called a gram. The algorithm counts the occurrences of all the grams that appear in the text and then filters out grams with low counts according to a threshold, yielding a list of key grams that forms the feature space of the text vector, where each gram in the list represents one dimension of that space.

The N-gram model assumes that the occurrence probability of the Nth word depends only on the previous N-1 words, so the probability of a whole sentence is the product of the probabilities of its individual words. The most commonly used N-grams at present are the bigram and the trigram.

N-grams can be used to evaluate whether an utterance is reasonable. Suppose there is a sequence of m words and we want the probability $p(\omega_1, \omega_2, \ldots, \omega_m)$; by the chain rule, we obtain Eq. (9):

$$p(\omega_1, \omega_2, \ldots, \omega_m) = p(\omega_1)\, p(\omega_2 \mid \omega_1)\, p(\omega_3 \mid \omega_1, \omega_2) \cdots p(\omega_m \mid \omega_1, \ldots, \omega_{m-1}) \tag{9}$$

Using the Markov assumption that the current word depends only on a limited number of preceding words, the computational effort can be significantly reduced. The specific formula is Eq. (10):

$$p(\omega_1, \omega_2, \ldots, \omega_m) = \prod_{i} p(\omega_i \mid \omega_{i-n+1}, \ldots, \omega_{i-1}) \tag{10}$$

where n denotes the number of preceding words on which each word depends.

The common bigram and trigram models are defined by Eqs (11) and (12), respectively:

$$P(\omega_1, \omega_2, \ldots, \omega_m) = \prod_{i=1}^{m} P(\omega_i \mid \omega_{i-1}) \tag{11}$$

$$P(\omega_1, \omega_2, \ldots, \omega_m) = \prod_{i=1}^{m} P(\omega_i \mid \omega_{i-2}\, \omega_{i-1}) \tag{12}$$

Therefore, given a training corpus, the above conditional probabilities can be estimated from counts. When choosing the more correct of two candidate corrected sentences, the N-gram model can compute the probability of each sentence, and the one with the higher probability is selected as the final correction.
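A bigram sketch of this scoring procedure, estimating the conditional probabilities of Eq. (11) from counts in a toy corpus (the sentences are invented for illustration, with `<s>` as a start token):

```python
from collections import Counter

# Toy training corpus, tokenised, with a start-of-sentence marker.
corpus = [
    "<s> the word is fashionable".split(),
    "<s> the word is fashion".split(),
    "<s> the word is fashionable".split(),
]

# Count contexts (all but the last word) and adjacent word pairs.
unigrams = Counter(w for sent in corpus for w in sent[:-1])
bigrams = Counter(tuple(sent[i:i + 2]) for sent in corpus for i in range(len(sent) - 1))

def sentence_prob(sentence):
    # Product of P(w_i | w_{i-1}) estimated by relative frequency, per Eq. (11).
    p = 1.0
    for i in range(1, len(sentence)):
        prev, cur = sentence[i - 1], sentence[i]
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

a = sentence_prob("<s> the word is fashionable".split())
b = sentence_prob("<s> the word is fashion".split())
print(a > b)  # the more frequent continuation scores higher -> True
```

For real GEC the counts would come from a large reference corpus, and smoothing would be needed for unseen bigrams.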

Model overview

This paper presents important research results in the field of English grammar error correction in recent years and research works that are closely related to the algorithms in this paper. These research studies mainly include word embedding and Seq2Seq models.

Word embedding

Word embedding maps each word or phrase from a high-dimensional space, whose dimensionality equals the vocabulary size, to a low-dimensional real-valued space, where each word or phrase corresponds to a vector. Word2Vec and GloVe are two excellent algorithms for generating word embedding vectors.

Word2Vec

Word2Vec learns vector representations of words from a corpus in an unsupervised way and includes two models, CBOW and Skip-gram. Assuming a window size of c, for a word $\omega_t$ taken from the corpus, its context consists of the c words before it and the c words after it, as in Eq. (13):

$$\operatorname{context}(\omega_t) = (\omega_{t-c}, \ldots, \omega_{t-2}, \omega_{t-1};\ \omega_{t+1}, \omega_{t+2}, \ldots, \omega_{t+c}) \tag{13}$$
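The context window of Eq. (13) can be sketched directly; truncation at sentence boundaries is an assumption not spelled out in the equation:

```python
# Extract the context of Eq. (13): c words before and c words after position t,
# truncated at the sentence boundaries.
def context(tokens, t, c):
    left = tokens[max(0, t - c):t]
    right = tokens[t + 1:t + 1 + c]
    return left + right

tokens = "we study self repair in conversation".split()
print(context(tokens, 2, 2))  # -> ['we', 'study', 'repair', 'in']
```

CBOW predicts the target word from this context, while Skip-gram predicts each context word from the target.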

GloVe

GloVe is an unsupervised machine learning algorithm that trains word vectors based on global word frequency statistics. Similar to the Word2Vec method, GloVe word vectors can also obtain some semantic properties between words, such as similarity and analogy. GloVe is implemented in three steps as follows:

Construct a co-occurrence matrix X from the corpus, where each element $X_{ij}$ denotes the number of times word $\omega_i$ and its context word $\omega_j$ appear together within a context window of a specified size. GloVe adopts a distance-dependent weight: if the distance between two words in the context window is d, the pair contributes with a decay weight of 1/d. Thus, the further apart two words are, the smaller their contribution to the total count.

Construct the approximate relationship between the word vectors and the co-occurrence matrix, given by Eq. (14):

$$\omega_i^{T} \hat{\omega}_j + b_i + \hat{b}_j = \log(X_{ij}) \tag{14}$$

where the vector $\omega_i$ represents the word itself and the vector $\hat{\omega}_j$ represents the word as context, both of which are the word vectors to be solved for; $b_i$ and $\hat{b}_j$ are the bias terms of the two word vectors, respectively.

Construct a loss function based on the squared error, as in Eq. (15):

$$L = \sum_{i,j=1}^{V} f(X_{ij}) \left( \omega_i^{T} \hat{\omega}_j + b_i + \hat{b}_j - \log(X_{ij}) \right)^2 \tag{15}$$

where the weighting function $f(x)$ is given in Eq. (16):

$$f(x) = \begin{cases} (x / x_{\max})^{\alpha}, & x < x_{\max} \\ 1, & \text{otherwise} \end{cases} \tag{16}$$

Here, α is set to 0.75 and $x_{\max}$ to 100.
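The two counting pieces described above, the 1/d co-occurrence weighting and the loss weight f(x) of Eq. (16), can be sketched as follows (the toy token sequence is invented for illustration; fitting the vectors themselves would require an optimiser):

```python
from collections import defaultdict

def cooccurrence(tokens, window=3):
    # Build X with GloVe's distance decay: a pair at distance d adds 1/d,
    # counted symmetrically for both (w_i, w_j) and (w_j, w_i).
    X = defaultdict(float)
    for i, wi in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                X[(wi, tokens[i + d])] += 1.0 / d
                X[(tokens[i + d], wi)] += 1.0 / d
    return X

def f(x, x_max=100.0, alpha=0.75):
    # Loss weighting of Eq. (16): caps the influence of very frequent pairs.
    return (x / x_max) ** alpha if x < x_max else 1.0

X = cooccurrence("a b c a b".split())
print(X[("a", "b")], f(X[("a", "b")]), f(250.0))
```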

Compared with Word2Vec, GloVe is easier to parallelise during training and is therefore faster to train on large data sets. In practice, GloVe word vectors perform better on the test sets of some tasks, and Word2Vec word vectors perform better on others. Overall, neither method has a clear advantage, and both are preferred word embedding methods today.

Seq2Seq model

The Seq2Seq model was first applied to neural machine translation. It first obtains the vector representation of each word in the input sentence by word embedding and then feeds the vector sequence (V(A), V(B), V(C), V(EOS)) into the encoder RNN to obtain the encoding result h, where EOS marks the end of the sentence. The decoder RNN then outputs the most likely word vector at each step and takes its own output as the input for the next step, producing V(X), V(Y), V(Z), V(EOS) in turn.

The RNN units in the Seq2Seq model are designed to evaluate the conditional probability in Eq. (17):

$$p(y_1, \ldots, y_{T_2} \mid x_1, \ldots, x_{T_1}) = \prod_{t=1}^{T_2} p(y_t \mid h, y_1, \ldots, y_{t-1}) \tag{17}$$

where the input sequence $Y_1$ is given by Eq. (18) and the output sequence $Y_2$ by Eq. (19); the lengths $T_1$ and $T_2$ may differ, and $p(y_t \mid h, y_1, \ldots, y_{t-1})$ is a softmax distribution over all words in the vocabulary. In the training phase, the objective function is defined by maximising the log probability, as shown in Eq. (20).

$$Y_1 = (x_1, \ldots, x_{T_1}) \tag{18}$$

$$Y_2 = (y_1, \ldots, y_{T_2}) \tag{19}$$

$$L = \frac{1}{|Z|} \sum_{(T, S) \in Z} \log p(T \mid S) \tag{20}$$

where S denotes the input sentence, T the corresponding correct translation and Z the training set. After training, the most likely translation $\hat{T}$ is found by the decoder RNN, as in Eq. (21):

$$\hat{T} = \underset{T}{\arg\max}\; p(T \mid S) \tag{21}$$
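The factorisation in Eq. (17) and the argmax of Eq. (21) can be illustrated with greedy decoding over a hand-written transition table that stands in for a trained decoder (the table and tokens are invented for illustration):

```python
import math

# Toy stand-in for p(y_t | h, y_<t): next-word distributions keyed on the
# previous token only. A real decoder conditions on h and the full prefix.
step_probs = {
    "<s>": {"he": 0.7, "she": 0.3},
    "he": {"goes": 0.6, "go": 0.4},
    "goes": {"</s>": 0.9, "home": 0.1},
}

def greedy_decode(max_len=10):
    # At each step take argmax p(y_t | y_<t) and accumulate the log probability,
    # i.e. a greedy approximation of Eq. (21).
    y, log_p = ["<s>"], 0.0
    for _ in range(max_len):
        dist = step_probs.get(y[-1], {"</s>": 1.0})
        nxt = max(dist, key=dist.get)
        log_p += math.log(dist[nxt])  # sum of log p(y_t | y_<t), per Eq. (17)
        y.append(nxt)
        if nxt == "</s>":
            break
    return y[1:], log_p

tokens, log_p = greedy_decode()
print(tokens)  # -> ['he', 'goes', '</s>']
```

Greedy decoding is only an approximation of the argmax in Eq. (21); beam search keeps several prefixes to get closer to the true maximum.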

The Seq2Seq model in English grammar error correction is essentially the same as in machine translation, except that the parallel corpus no longer pairs one language with another but pairs sentences containing grammatical errors with their corrected versions. The various improvements and optimisations of the Seq2Seq model apply to both scenarios.

Results and discussion
Research results

We counted the anaphors (using the same method as above) and compared them with the distribution of anaphors in the Santa Barbara Corpus of Spoken American English (LDC 2003S06) reported by Yao Jianpeng [20] (Table 1).

Distribution of anaphora in corpus COLSEC

Anaphora        Part of speech  Frequency  Percentage (%)
Function word   Pronoun         120        45.2
                Preposition     20         9.1
                Conjunction     25         11.2
In total                        165        65.5
Notional words  Noun            4          2.1
                Verb            40         19.2
                Adjective       9          4.3
                Adverb          5          2.2
In total                        58         27.8
Analysis of the characteristics of anaphoric repair strategies

From the previous discussion, it can be seen that Chinese college students used repetition most among the anaphoric repair strategies, with two-word repetition, multi-word repetition and continuous repetition dominant, which is basically consistent with related studies at home and abroad. Most error-correcting retracings were substitutions, insertions and deletions, with students showing the most flexibility in substitution, while only a small proportion of reorganisation retracings were used. We suggest that these characteristics of students' use of anaphoric repair strategies are influenced by their self-monitoring ability, information processing ability and attention allocation in verbal communication.

Self-monitoring is essential for the self-repair of verbal errors, and in natural speech-monitoring theory, the perceptual loop theory describes this process. In this account, the speech production model consists of four modules (conceptualiser, formulator, articulator and self-monitor), two perceptual loops (an internal loop and an external loop) and a speech comprehension system (see Figure 2), through which the speaker monitors his or her own speech. The inner loop monitors the information passing between the formulator and the conceptualiser via the speech comprehension system, while the outer loop monitors the information between the articulator and the conceptualiser via hearing and the speech comprehension system. In this way, both covert and overt speech errors are monitored, so that communication between speaker and listener can proceed smoothly and accurately.

Fig. 2

Speech production model

Self-monitoring is closely related to controlled processing and attention. Since human attention is limited, if attention is insufficient for the task, language use is affected, as are the efficiency of the monitoring process and the number and type of errors the speaker notices. According to our research, the attention a speaker allocates depends on the context and the task, and the speaker may attend only to certain errors or disfluencies and ignore others. In cognitive psychology, attention is seen as an internal mechanism that controls stimulus selection and regulates behaviour, i.e., it discards some information in order to process important information efficiently. Attention theory distinguishes two modes of information processing, controlled processing and automatic processing, and both bear on college students' anaphoric correction in English. Controlled processing requires attention and has limited capacity, and is therefore also called attentional processing; automatic processing requires no attention, is not capacity-limited and, once formed, is difficult to change. Applied to second language learning, this means that learners have a limited capacity to process information: to master a new language, one can only learn its sub-skills gradually, then chain the sub-skills together and finally reach proficiency in the whole skill. This is the progression from controlled processing to automatic processing and, ultimately, to the gradual internalisation of the learner's linguistic knowledge.

Our research shows that the most important reason why L2 monitoring differs from native-language monitoring is that L2 lexical, syntactic and phonological processing requires attentional capacity, which is limited. In bilingual production, speakers have limited knowledge of the target language, encounter difficulties in retrieving appropriate lexical items and syntactic structures and are far from automatic in encoding the phonological plan; so they cannot express their thoughts coherently, which inevitably leads to many speech errors. The speakers' limited attentional resources can therefore only be devoted to monitoring and self-repairing the linguistic components that may affect the accuracy of the information, while errors that do not affect accuracy are left in place. Take the substitution 'And you know, many big companies er make big deals through the Internet. There is a very fashion, is a very fashionable word(s) called commercial business.', a type that occurs frequently in this corpus. This example is part of a candidate's answer to the examiner's question about the negative impact of computers. The candidate clearly had difficulty retrieving the word 'fashionable' and the concept 'commercial business'; his limited attention was focused on monitoring the components that might affect the accuracy of the information. He then stalled for time by repeating 'is a very' to free cognitive resources, finally replacing the erroneous 'fashion' with 'fashionable', while errors that did not affect the accuracy of the information, such as 'a very fashionable words', were retained.
The example shows that when repairing an outright error, the speaker tends to keep the syntax of the original utterance, repeating part of it and replacing the part that needs correction. When the trouble is incompleteness rather than error, however, the speaker often reconstructs the utterance, either inserting new material into the original discourse (an insertion strategy) or suspending the flow of discourse online and re-planning it (a restructuring strategy). These two repair strategies require a high level of information processing and online language processing skill.

Conclusion

English education for college students is of great developmental importance. This study analysed the patterns and characteristics of anaphoric repair and the use of anaphors among Chinese college students and reached the following conclusions: (1) The frequency of anaphoric repair is high, reflecting that college students have a strong sense of communicative interaction and use anaphoric repair to help the listener solve the problem of topic connection; (2) Students' excessive use of certain two-word or multi-word repetitions conveys a 'self-centred' (I-perspective) stance and often strikes listeners as monotonous and inauthentic, which may be due to the influence of their native language; and (3) Self-repetition accounts for the highest proportion among college students, about 75%, while reorganisation and insertion strategies are rare, about 23%, which reflects the characteristics of college students' linguistic information processing and online processing abilities. The present study aims to point the way for subsequent in-depth investigation of the function of verbal anaphoric repair.

Fig. 1

Self-repairing internal structure model

Fig. 2

Speech production model

Distribution of anaphora in corpus COLSEC

Anaphora        Part of speech   Frequency   Percentage (%)
Function words  Pronoun          120         45.2
                Preposition      20          9.1
                Conjunction      25          11.2
                In total         165         65.5
Notional words  Noun             4           2.1
                Verb             40          19.2
                Adjective        9           4.3
                Adverb           5           2.2
                In total         58          27.8
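The category subtotals in the table can be checked directly from the frequency column. The sketch below does so; note that it recomputes shares relative to the 223 tokens listed, whereas the table’s printed percentages use an unstated base, so the two need not match exactly:

```python
# Recompute the COLSEC table's subtotals and shares from the raw frequencies.
# Shares here are relative to the 223 tokens listed in the table.
counts = {
    "Function words": {"Pronoun": 120, "Preposition": 20, "Conjunction": 25},
    "Notional words": {"Noun": 4, "Verb": 40, "Adjective": 9, "Adverb": 5},
}

grand_total = sum(sum(pos.values()) for pos in counts.values())  # 223

for category, pos in counts.items():
    subtotal = sum(pos.values())
    print(f"{category}: {subtotal} ({100 * subtotal / grand_total:.1f}% of listed tokens)")
```

This confirms the frequency subtotals (165 function-word and 58 notional-word anaphors) are internally consistent.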

Tory S, Thorkelson, et al. A Study of Verbal Error Correction’s Influence upon Freshmen Students of English Conversation Anxiety[J]. Korean Journal of Applied Linguistics, 2001, 17(2):147-155.

She M, Ouyang J. A study on the code-switching in daily conversation of bilingual children[J]. Journal of Liaoning Technical University (Social Science Edition), 2017.

Poesio M, Artstein R. The reliability of anaphoric annotation, reconsidered: taking ambiguity into account[C]//Proceedings of the Workshop on Frontiers in Corpus Annotations II: Pie in the Sky. Association for Computational Linguistics, 2005. DOI: 10.3115/1608829.1608840.

Schegloff E A, Jefferson G, Sacks H. The preference for self-correction in the organization of repair in conversation[J]. Language, 1977, 53(2):361-382. DOI: 10.1353/lan.1977.0041.

Savova G K, Chapman W W, Zheng J, et al. Anaphoric relations in the clinical narrative: corpus creation[J]. Journal of the American Medical Informatics Association, 2011, 18(4). DOI: 10.1136/amiajnl-2011-000108.

Poesio M, Artstein R. Anaphoric Annotation in the ARRAU Corpus[C]//Proceedings of the International Conference on Language Resources and Evaluation (LREC 2008), 26 May – 1 June 2008, Marrakech, Morocco. 2008.

Sharma R, Sharma N, Biswas K K. Machine Learning for Detecting Pronominal Anaphora Ambiguity in NL Requirements[C]//Applied Computing & Information Technology / Intl Conf on Computational Science/Intelligence & Applied Informatics / Intl Conf on Big Data, Cloud Computing, Data Science & Engineering. IEEE, 2017. DOI: 10.1109/ACIT-CSII-BCD.2016.043.

Jing S. Teaching Chinese Personal Anaphora to Second Language Learners[J]. Chinese Teaching in the World, 2013.

Gelbukh A. Editorial[J]. International Journal of Computational Linguistics and Applications, 2016, 7(2):7-10.

Xu T, Zhang D, Zhu L, et al. Topology Optimal Interior Permanent Magnet Machine to Improve the Utilization Ratio of Permanent Magnet[J]. Focus and Natural Language Processing; Volume 3: Discourse, 2015, 33(5):523-532.

Takeda T, Akamine S, Nakazawa S, et al. Information Providing System: US, US20100131534 A1[P]. 2010.

Atterer M, Schütze H. The effect of corpus size in combining supervised and unsupervised training for disambiguation[C]//Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006. DOI: 10.3115/1273073.1273077.

Wang J. A Primary Contrastive Study of the Forms of Chinese-English Indirect Anaphora[J]. Journal of Sichuan International Studies University, 2005.

Kaur J. Other-correction in next position: The case of lexical replacement in ELF interactions in an academic setting[J]. Journal of Pragmatics, 2020, 169(2):1-12. DOI: 10.1016/j.pragma.2020.06.013.

Raymond C W. Indexing a contrast: The do-construction in English conversation[J]. Journal of Pragmatics, 2017. DOI: 10.1016/j.pragma.2017.07.004.

Siegal. Longitudinal development of word search sequences in English as a lingua franca interaction. 2016.

Pillai S. Error-detection, self-monitoring and self-repair in speech production.

Nooteboom S G. Speaking and unspeaking: Detection and correction of phonological and lexical errors in spontaneous speech. 1980.

Nagano R L. Self-repair of Japanese speakers of English: a preliminary comparison with a study by W. J. M. Levelt[J]. Bulletin of Language Science & Humanities, 1997.

Yao Jianpeng. English Conversation Self-Repair: A Study from the Perspective of Metacognition (1st Edition). Shanghai: Shanghai Jiaotong University Press, 2007.

Yang Huizhong, Wei Naixing. Construction and Research of Chinese Learners’ Oral English Corpus (1st Edition). Shanghai: Shanghai Foreign Language Education Press, 2005.

Wei Naixing. The initial study of Chinese learners’ oral English corpus[J]. Modern Foreign Languages, 2004(2).

Hieke A E. Audio-lectal practice and fluency acquisition[J]. Foreign Language Annals, 1981, 14(3). DOI: 10.1111/j.1944-9720.1981.tb01634.x.

Shimannoff S B, Brunak J. Repairs in planned and unplanned discourse. In E. O. Keenan & T, 1977.

Heikkilä M, Mutanen M, Kekkonen M, et al. Morphology reinforces proposed molecular phylogenetic affinities: a revised classification for Gelechioidea (Lepidoptera)[J]. Cladistics, 2014, 30(6). DOI: 10.1111/cla.12064.

Joachims T. A Statistical Learning Model of Text Classification for Support Vector Machines. ACM, 2001. DOI: 10.1145/383952.383974.

Satoh S. High School Statistical Graph Classification Using Hierarchical Model for Intelligent Mathematics Problem Solving[C]//Image and Video Technology (Lecture Notes in Computer Science, Volume 10799). 2018:91-101. DOI: 10.1007/978-3-319-92753-4.

Gavrilov A, Lee Y K, Lee S. Hybrid neural network model based on multi-layer perceptron and adaptive resonance theory[J]. Springer-Verlag, 2006. DOI: 10.1007/11759966_104.

Wikipedia. Naive Bayes Classifier. 2016.

Galescu L, Allen J F. Bi-directional Conversion Between Graphemes and Phonemes Using a Joint N-gram Model. 2001.
