Open Access

Parsing Korean Classical Literature by Integrating Text Mining and Semantic Analysis

Mar 19, 2025

This article takes Korean classical literature as its research object. After web-crawling classical Korean works, it applies methods such as the TF-IDF algorithm and LDA topic modeling for text mining and semantic analysis. High-frequency keywords are extracted from the texts and analyzed for co-occurrence. The TF-IDF and Lift values of the keywords are calculated to explore the associations between them, and the semantic network of Korean classical literature is visualized on that basis. The most frequent keyword is “heaven” (2909 occurrences), followed by “earth” (1965 occurrences); the two words co-occur 685 times within the same work. The keyword “legend” has the highest TF-IDF value (0.0359), and the work title “Tale of Chun-hyang” has a TF-IDF value of 0.0342. For both “Tale of Chun-hyang” and “Tale of Heung-bu”, the keyword with the highest Lift value is “legend” (0.4093 and 0.3246, respectively). For “Samguk Yusa”, the keyword with the highest Lift value is “heaven” (0.3458).
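
To make the three quantities named in the abstract concrete, the following is a minimal sketch of how co-occurrence counts, TF-IDF, and Lift could be computed on a tokenized corpus. It assumes the common textbook definitions (TF-IDF as term frequency times log inverse document frequency, Lift as P(A and B) / (P(A) P(B)) over documents); the article does not state its exact formulas or normalization, so its reported values may be computed differently. The toy corpus `docs` and the function names are hypothetical and not taken from the article.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is a list of already-tokenized keywords.
docs = [
    ["heaven", "earth", "legend"],
    ["heaven", "earth"],
    ["legend", "heaven"],
    ["earth", "river"],
]


def tf_idf(term, doc, corpus):
    """Textbook TF-IDF (assumed definition): relative term frequency in `doc`
    times the natural log of (number of documents / documents containing term)."""
    df = sum(1 for d in corpus if term in d)
    if df == 0 or not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    return tf * math.log(len(corpus) / df)


def cooccurrence(corpus):
    """Count, for each unordered keyword pair, the documents containing both."""
    pairs = Counter()
    for d in corpus:
        # sorted(set(...)) gives each pair once per document, in a stable order
        for a, b in combinations(sorted(set(d)), 2):
            pairs[(a, b)] += 1
    return pairs


def lift(a, b, corpus):
    """Association-rule Lift (assumed definition): P(a and b) / (P(a) * P(b)),
    with probabilities estimated as document-level relative frequencies."""
    n = len(corpus)
    p_a = sum(1 for d in corpus if a in d) / n
    p_b = sum(1 for d in corpus if b in d) / n
    p_ab = sum(1 for d in corpus if a in d and b in d) / n
    if p_a == 0 or p_b == 0:
        return 0.0
    return p_ab / (p_a * p_b)


if __name__ == "__main__":
    print(cooccurrence(docs)[("earth", "heaven")])
    print(lift("heaven", "earth", docs))
    print(tf_idf("river", docs[3], docs))
```

Under these definitions, a Lift above 1 indicates that two keywords appear together more often than independence would predict, and a Lift below 1 indicates the opposite. Pair keys in `cooccurrence` are stored in alphabetical order, so lookups should use `("earth", "heaven")` rather than `("heaven", "earth")`.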

Language:
English