Chinese Natural Language Processing: From Text Categorization to Machine Translation

The level and volume of automatic computer processing of linguistic information have become one of the most important criteria for judging whether a country has entered the information society. This study starts from statistical linguistics and aims to process complex Chinese-language information. After building a Chinese word database, the paper smooths and compresses the language model, extracts Chinese character information and Chinese language information, and focuses on the processing of Chinese grammatical and semantic information. Grammar processing comprises Chinese word analysis and basic phrase analysis based on the maximum entropy model. Semantic processing comprises Bayesian word sense disambiguation, semantic role labeling based on the conditional random field model, and a thesaurus-based method for computing semantic similarity. In addition, SECTILE-based Chinese text categorization and machine translation methods based on statistical linguistics are explored and tested for their effectiveness in Chinese natural language processing. The results show that the overall average precision and recall of Chinese text categorization are 78.65% and 72.24%, respectively, and that the BLEU values of the translation methods improve by [1.62, 3.73] and [0.93, 5.01] over the Baseline method, indicating that the methods can process Chinese information accurately. The research contributes to information processing that builds on Chinese language processing.
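As an illustration of how the reported categorization figures could be computed, here is a minimal Python sketch of per-class precision and recall averaged over classes. The abstract does not say whether its averages are macro- or micro-averaged, so macro-averaging is an assumption here; the function name, the category labels, and the toy data are all hypothetical and do not come from the paper.

```python
from collections import Counter


def macro_precision_recall(gold, predicted):
    """Return (macro precision, macro recall) over the classes present in gold.

    Assumption: unweighted (macro) averaging over classes; the paper may
    have used a different averaging scheme.
    """
    gold_count = Counter(gold)        # documents that truly belong to each class
    pred_count = Counter(predicted)   # documents assigned to each class
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)

    labels = sorted(gold_count)
    # Precision: correct assignments / all assignments to the class.
    precision = sum(
        true_pos[c] / pred_count[c] if pred_count[c] else 0.0 for c in labels
    ) / len(labels)
    # Recall: correct assignments / all documents truly in the class.
    recall = sum(true_pos[c] / gold_count[c] for c in labels) / len(labels)
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = ["sports", "sports", "finance", "finance", "tech"]
predicted = ["sports", "finance", "finance", "finance", "tech"]
precision, recall = macro_precision_recall(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

A class that is never predicted contributes a precision of 0.0 in this sketch, which is one common convention but not the only one.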

eISSN: 2444-8656
Language: English

Publication frequency: Volume Open
Journal subjects: Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics

Published: 05 Aug 2024

Page range: -

Received: 12 Apr 2024

Accepted: 06 Jul 2024

DOI: https://doi.org/10.2478/amns-2024-1860

Keywords: Statistical linguistics, Chinese language processing, Maximum entropy model, Conditional random field model, Text categorization, Machine translation

© 2024 Haitao Peng, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
