Open Access

Chinese Language Word Embeddings Based on the Corpus Hanku

Journal of Linguistics/Jazykovedný časopis
Building Web corpora as sources for linguistic research and its applications

Vector models based on word embeddings are an indispensable part of advanced Natural Language Processing research and language analysis. We describe several word embeddings for the Chinese language (Pǔtōnghuà) and the ways in which they differ from models of “Western” languages because of specific orthographic and linguistic features of written Chinese. We also introduce a publicly available web interface for querying the vector models, aimed at linguistically or pedagogically oriented users.

eISSN: 1338-4287
Language: English
Publication timeframe: 2 times per year
Journal Subjects: Linguistics and Semiotics, Theoretical Frameworks and Disciplines, Linguistics, other