About this article
Published online: 13 Jun 2017
Pages: 61-65
DOI: https://doi.org/10.1515/acss-2017-0008
© Riga Technical University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
In both Chinese and Dzongkha, the greatest challenge is identifying word boundaries, because there are no word delimiters as there are in English and other Western languages. Preprocessing and word segmentation are therefore the first step in Dzongkha language processing tasks such as translation, spell-checking, and information retrieval. Research on Chinese word segmentation began long ago and is relatively mature, whereas Dzongkha word segmentation has received far less attention from researchers. In this paper, we investigate this major problem in Dzongkha language processing using a probabilistic approach that selects valid segments, with probabilities computed from a corpus.
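The abstract does not spell out the probabilistic model, so the following is only a minimal sketch of one common way to select segments by corpus probability. It assumes a unigram word model and a dynamic-programming search over syllable sequences. The function names (`train_unigram`, `segment`), the parameters `max_len` and `unk_logp`, and the toy Latin-letter "syllables" are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_unigram(corpus_words):
    """Estimate unigram word probabilities from an already segmented corpus."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def segment(syllables, probs, max_len=4, unk_logp=-20.0):
    """Return the segmentation with the highest total log-probability.

    syllables: list of syllable strings (assumed input unit).
    probs:     word -> probability, as produced by train_unigram.
    max_len:   assumed upper bound on syllables per word.
    unk_logp:  assumed penalty for a single unseen syllable, so that
               every input can still be segmented.
    """
    n = len(syllables)
    best = [-math.inf] * (n + 1)  # best[i]: best score for syllables[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = "".join(syllables[start:end])
            if word in probs:
                score = math.log(probs[word])
            elif end - start == 1:
                score = unk_logp      # unseen single syllable
            else:
                continue              # unseen multi-syllable candidate
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Follow the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append("".join(syllables[start:end]))
        end = start
    return words[::-1]


# Toy corpus of already segmented "words" (placeholder data, not Dzongkha).
probs = train_unigram(["ab", "cd", "ab", "c", "d", "ab", "cd"])
print(segment(["a", "b", "c", "d"], probs))
```

In this toy run, the candidate "ab" + "cd" scores higher than "ab" + "c" + "d" because the product of the corpus probabilities is larger, which is the sense in which a segment is "selected" by probability. A real system would need a Dzongkha corpus, a syllable splitter, and likely a richer model than unigrams.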