Text Value and Linguistic Characterization in Chinese Language Literature Based on Text Mining Techniques

This study applies text mining techniques to analyze the textual value and linguistic features of Chinese language and literature. It combines textual disambiguation, vector space modeling, semantic networks, and the Labeled-LDA model. Using the novels of Yu Hua and Ge Fei as a case study, it reveals differences between the two writers in linguistic features such as punctuation use, average word length, and the degree of sentence dispersion. The study also assigns each novel a comprehensive heat score based on three dimensions: reader base, reading gain, and reading discussion. The results show that period use is dispersed across Yu Hua's works but more concentrated in Ge Fei's. Ge Fei's average word length is slightly higher, indicating a preference for multi-syllabic words. Novel popularity and heat scores follow a power-law distribution, consistent with the Pareto principle: 80% of the popularity is concentrated in the top 20% of novels. By applying text mining technology, this study offers a new perspective on Chinese language and literature, and its methods and tools can improve the effectiveness and efficiency of teaching.
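The stylistic comparisons and the 80/20 finding above can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the text has already been segmented into words by a Chinese word segmenter, measures the spread of period use as the coefficient of variation of per-1,000-character period rates across consecutive chunks, and checks the Pareto pattern by computing the share of total heat held by the top 20% of novels. The function names, the chunk size, and the choice of the coefficient of variation are all hypothetical; the abstract does not say which statistics the study used.

```python
import statistics
from typing import List, Sequence


def average_word_length(words: Sequence[str]) -> float:
    """Mean characters per segmented word.

    For Chinese, one character is normally one syllable, so this
    approximates syllables per word.
    """
    tokens = [w for w in words if w.strip()]
    if not tokens:
        return 0.0
    return sum(len(w) for w in tokens) / len(tokens)


def period_rate_per_chunk(text: str, chunk_size: int = 1000) -> List[float]:
    """Full-width periods ('。') per 1,000 characters in consecutive chunks."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [chunk.count("。") * 1000 / len(chunk) for chunk in chunks]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (0.0 if undefined).

    Hypothetical dispersion measure: higher means more scattered use.
    """
    if not values:
        return 0.0
    mean = statistics.fmean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean


def top_share(scores: Sequence[float], fraction: float = 0.2) -> float:
    """Share of the total score held by the highest-scoring `fraction` of items."""
    ranked = sorted(scores, reverse=True)
    if not ranked:
        return 0.0
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical pre-segmented tokens, not data from the study.
    print(average_word_length(["他", "走", "进", "了", "房间"]))  # 1.2
    rates = period_rate_per_chunk("他走了。她哭了。天黑了，雨停了。", chunk_size=8)
    print(coefficient_of_variation(rates))
    # Hypothetical heat scores for ten novels.
    print(top_share([50, 20, 10, 5, 5, 4, 3, 1, 1, 1]))  # 0.7
```

Under these assumptions, a higher coefficient of variation for one author's period rates would correspond to the more dispersed period use described for Yu Hua, and a `top_share` close to 0.8 would match the 80/20 concentration of popularity.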

Language: English

Publication frequency: once a year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Min Liu

Shuling Hu

Wenting Qing

Published online: 26 Feb 2024

Received: 10 Jan 2024

Accepted: 16 Jan 2024

DOI: https://doi.org/10.2478/amns-2024-0486

Keywords: Labeled-LDA, Vector space modeling, Textual disambiguation, Semantic network, Chinese language and literature

© 2024 Min Liu et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.