Research on Corpus-Based Linguistic Feature Analysis and Pattern Recognition in English Majors in Colleges and Universities

This paper proposes five corpus feature analysis indices, including density, MI value, and Z value, and uses WordSmith Tools 6.0 to characterize the linguistic features of six corpus texts. A convolutional neural network (LeNet-5) is then introduced and applied to corpus recognition, and a linguistic feature recognition model based on a shallow convolutional neural network is constructed to analyze and recognize the corpus texts. The results show that, in absolute co-occurrence frequency, the "to manage" marker code is the highest in single and objects, and the "to lack" and "to go very rapidly" marker codes are the highest in less than objects. The three identifiers had their highest relative co-occurrence frequency (between 0.6667 and 0.9571) in single and objects. Among the morphological forms, their relative co-occurrence frequency was highest in the past participle, simple present tense, and past tense, respectively, ranging from 0.2500 to 0.3770. The summed densities of the five lexemes in the four examined corpus texts were 60.89%, 61.45%, 60.88%, and 58.75%, respectively; in the two reference corpora, they were 52.07% and 44.76%, respectively. A search of the four texts showed that the most frequent word was "the", with frequencies ranging from 4940 to 7905 and corresponding criticality values ranging from 67.03 to 341.05. Verb recognition found that "is" and "are" together accounted for 61.65%, 31.70%, 43.08%, and 54.43% of the sentences in the four texts, respectively.
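The abstract names the MI value, the Z value, and relative co-occurrence frequency but does not give their formulas. As a rough illustration only, the sketch below uses common textbook formulations from corpus linguistics; the exact definitions used in the paper (and in WordSmith Tools) may differ. All function names, parameter names, and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def expected_cooccurrence(f_node, f_collocate, corpus_size, span=1):
    """Expected co-occurrence count if node and collocate were independent.

    Assumption: the simple estimate E = f(node) * f(collocate) * span / N.
    """
    return f_node * f_collocate * span / corpus_size


def mi_score(observed, f_node, f_collocate, corpus_size, span=1):
    """Mutual information: log2 of observed over expected co-occurrence."""
    if observed <= 0:
        raise ValueError("observed co-occurrence count must be positive")
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size, span)
    return math.log2(observed / expected)


def z_score(observed, f_node, f_collocate, corpus_size, span=1):
    """Z value: deviation of observed from expected, scaled by sqrt(expected).

    Assumption: the simplified form (O - E) / sqrt(E).
    """
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size, span)
    return (observed - expected) / math.sqrt(expected)


def relative_frequency(count, total):
    """Share of all co-occurrences that fall into one category."""
    return count / total


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
mi = mi_score(observed=8, f_node=100, f_collocate=50, corpus_size=10_000)
z = z_score(observed=8, f_node=100, f_collocate=50, corpus_size=10_000)
rel = relative_frequency(count=2, total=3)
print(f"MI = {mi:.2f}, Z = {z:.2f}, relative frequency = {rel:.4f}")
```

With these made-up counts the expected co-occurrence is 0.5, so an observed count of 8 is sixteen times the chance level. Both scores reward co-occurrence above chance, but MI tends to favor rare pairs while the Z value also reflects how much evidence supports the association.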

Language: English

Publication frequency: 1 issue per year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Yuemei Fu

Shichao Ying

Published: 19 Mar 2025

Received: 28 Oct 2024

Accepted: 31 Jan 2025

DOI: https://doi.org/10.2478/amns-2025-0474

Keywords: Corpus, Shallow convolutional neural network, Co-occurrence frequency, Linguistic features

© 2025 Yuemei Fu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
