Open access

Research on Corpus-Based Linguistic Feature Analysis and Pattern Recognition in English Majors in Colleges and Universities

19 March 2025


In this paper, five corpus feature analysis indexes, including density, MI value and Z value, are proposed, and six corpus texts are characterized linguistically using WordSmith Tools 6.0. The convolutional neural network LeNet-5 is then introduced and applied to corpus recognition analysis, and a linguistic feature analysis and recognition model based on a shallow convolutional neural network is constructed to analyze and recognize the corpus texts. The results show that, in absolute co-occurrence frequency, the "to manage" marker code is the highest in single and objects, while the "to lack" and "to go very rapidly" marker codes are the highest in less than objects. The three identifiers had their highest relative co-occurrence frequencies (0.6667 to 0.9571) in single and objects. Among the morphological forms, their relative co-occurrence frequency was highest in the past participle, the simple present tense and the past tense, respectively, ranging from 0.2500 to 0.3770. The summed densities of the five lexemes in the texts of the four examined corpora were 60.89%, 61.45%, 60.88% and 58.75%, respectively, compared with 52.07% and 44.76% in the two reference corpora. A search of the four texts showed that the most frequent word was "the", with frequencies between 4,940 and 7,905 and corresponding criticality values between 67.03 and 341.05. Verb recognition showed that "is" and "are" together accounted for 61.65%, 31.70%, 43.08% and 54.43% of the sentences in the four texts, respectively.
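The abstract names density, MI value and Z value as corpus indexes but does not give their formulas. The sketch below uses common textbook formulations, which are assumptions and not necessarily the exact definitions used in the paper or computed by WordSmith Tools 6.0: pointwise mutual information (log base 2), a collocation z-score with expected frequency E = f(n)·f(c)·span / N, and lexical density as the percentage of tokens belonging to a chosen content-word set. The function names and the example counts are hypothetical and illustrative only.

```python
import math


def mi_score(f_nc: int, f_n: int, f_c: int, n: int) -> float:
    """Pointwise mutual information (log2) of a node word and a collocate.

    Assumed formulation: MI = log2(f_nc * n / (f_n * f_c)).
    f_nc: observed co-occurrences of node and collocate
    f_n, f_c: corpus frequencies of node and collocate
    n: corpus size in tokens
    """
    return math.log2((f_nc * n) / (f_n * f_c))


def z_score(f_nc: int, f_n: int, f_c: int, n: int, span: int = 1) -> float:
    """Collocation z-score under a Poisson-style approximation (assumed).

    Expected co-occurrences E = f_n * f_c * span / n; z = (O - E) / sqrt(E).
    """
    expected = f_n * f_c * span / n
    return (f_nc - expected) / math.sqrt(expected)


def lexical_density(tokens: list[str], content_words: set[str]) -> float:
    """Percentage of tokens (case-insensitive) found in the content-word set."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in content_words)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    # Hypothetical counts: 20 co-occurrences, node freq 100, collocate freq 200,
    # in a 100,000-token corpus.
    print(mi_score(20, 100, 200, 100_000))
    print(z_score(20, 100, 200, 100_000))
    print(lexical_density(["The", "cat", "sat"], {"cat", "sat"}))
```

MI rewards rare but exclusive pairings, while the z-score grows with the amount of evidence, which is why corpus studies often report both side by side.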
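The abstract does not say how corpus text is encoded as input to LeNet-5. The sketch below therefore assumes the classic LeNet-5 configuration (a single-channel 32×32 input, 5×5 convolutions, 2×2 subsampling with stride 2) and only traces feature-map sizes through the stack, to show how a shallow CNN reduces its input. The stage table, `conv_out`, `lenet5_shapes` and the `num_classes` parameter are hypothetical illustrations, not the paper's model.

```python
# (stage name, output channels, kernel size, stride) for the classic LeNet-5
# convolution and subsampling stages; assumed, not taken from the paper.
LENET5_STAGES = [
    ("C1 conv 5x5", 6, 5, 1),
    ("S2 pool 2x2", 6, 2, 2),
    ("C3 conv 5x5", 16, 5, 1),
    ("S4 pool 2x2", 16, 2, 2),
    ("C5 conv 5x5", 120, 5, 1),
]


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Side length after a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(input_size: int = 32, num_classes: int = 10) -> list[tuple[str, int, int]]:
    """Return (layer, channels, side length) for each LeNet-5 layer."""
    side = input_size
    shapes = [("input", 1, side)]
    for name, channels, kernel, stride in LENET5_STAGES:
        side = conv_out(side, kernel, stride)
        if side < 1:
            raise ValueError(f"input of size {input_size} is too small for {name}")
        shapes.append((name, channels, side))
    # F6 and the output layer are fully connected, so they have side length 1.
    shapes.append(("F6 fully connected", 84, 1))
    shapes.append(("output", num_classes, 1))
    return shapes


if __name__ == "__main__":
    for layer in lenet5_shapes():
        print(layer)
```

With the assumed 32×32 input, the spatial size shrinks 32 → 28 → 14 → 10 → 5 → 1, so C5 already behaves like a fully connected layer. This is part of what keeps the network shallow and cheap to train.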

Language:
English
Publication frequency:
Once a year
Journal subjects:
Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other