wchan757/Cantonese_Word_Segmentation - GitHub: This repository offers a Cantonese dictionary designed to improve Cantonese word segmentation. It integrates with the jieba segmentation library, so you can use it as a custom dictionary for Cantonese text processing.
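To show why a custom dictionary matters, here is a simplified sketch of dictionary-driven segmentation in the style jieba describes: build a directed acyclic graph (DAG) of every dictionary word in the sentence, then use dynamic programming to pick the path with the highest total log-frequency. It leaves out jieba's HMM step for unknown words. The function names `build_dag` and `cut_precise` are hypothetical and the toy frequencies are invented. In real use you would load a dictionary file into jieba with `jieba.load_userdict(path)` rather than reimplementing the segmenter.

```python
import math


def build_dag(sentence, freq):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for k in range(n):
        ends = [j for j in range(k, n) if sentence[k:j + 1] in freq]
        dag[k] = ends or [k]  # fall back to a single character
    return dag


def cut_precise(sentence, freq):
    """Choose the segmentation whose words have the highest total log-frequency."""
    total = sum(freq.values())
    dag = build_dag(sentence, freq)
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1) / total) + route[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the best route from the start to recover the words.
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


# Toy frequencies, invented for illustration only.
base = {"佢": 10, "哋": 2, "食": 30, "飯": 30}
with_cantonese = {**base, "佢哋": 50, "食飯": 40}
print(cut_precise("佢哋食飯", base))            # ['佢', '哋', '食', '飯']
print(cut_precise("佢哋食飯", with_cantonese))  # ['佢哋', '食飯']
```

With only the base entries, the Cantonese words 佢哋 ("they") and 食飯 ("eat a meal") fall apart into single characters; once they are dictionary words, the maximum-frequency path keeps them whole.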
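Handling Cantonese with jieba segmentation - ayaka.shn.hk: First, install PyCantonese, then write a program that exports the PyCantonese lexicon; running it produces dict_cantonese.txt. Download the latest jieba big dictionary and save it as dict.big.txt. Write a program that merges the two dictionaries; running it produces merged_dict.txt. Usage: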
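The snippet does not show the blog's export and merge programs, so the following is only a hypothetical stand-in for those two steps. It assumes the Cantonese lexicon arrives as a plain list of tokens (for example, words drawn from a PyCantonese corpus) and that each dictionary line follows jieba's `word freq [tag]` layout. The helper names (`export_lines`, `parse_dict`, `merge_dicts`, `merge_files`) and the rule of keeping the higher frequency when a word appears in both dictionaries are assumptions, not the blog's method.

```python
from collections import Counter


def export_lines(tokens):
    """Count tokens and emit jieba-style "word freq" lines, most frequent first."""
    return [f"{word} {count}" for word, count in Counter(tokens).most_common()]


def parse_dict(lines):
    """Parse "word freq [tag]" lines into {word: (freq, tag)}; freq defaults to 1."""
    entries = {}
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue  # skip blank lines
        freq = int(parts[1]) if len(parts) > 1 else 1
        tag = parts[2] if len(parts) > 2 else ""
        entries[parts[0]] = (freq, tag)
    return entries


def merge_dicts(big_lines, cantonese_lines):
    """Union two dictionaries; for shared words keep the higher frequency (assumed policy)."""
    merged = parse_dict(big_lines)
    for word, (freq, tag) in parse_dict(cantonese_lines).items():
        old = merged.get(word)
        if old is None or freq > old[0]:
            # Keep the existing tag if the winning Cantonese entry has none.
            merged[word] = (freq, tag or (old[1] if old else ""))
    return [f"{w} {f} {t}".rstrip() for w, (f, t) in merged.items()]


def merge_files(big_path, cantonese_path, out_path):
    """Read two UTF-8 dictionary files and write the merged dictionary."""
    with open(big_path, encoding="utf-8") as big, open(cantonese_path, encoding="utf-8") as canto:
        merged = merge_dicts(big, canto)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(merged) + "\n")
```

Because every output line carries a frequency, the merged file is in a shape jieba can take as its main dictionary via `jieba.set_dictionary(path)`.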
jieba · PyPI: “Jieba” (Chinese for “to stutter”) Chinese text segmentation: built to be the best Python Chinese word segmentation module. Full documentation is in README.md. GitHub: https://github.com/fxsjy/jieba. Features: supports three segmentation modes:
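In jieba, the three modes are precise mode (the default, `jieba.cut(text)`), full mode (`jieba.cut(text, cut_all=True)`), and search-engine mode (`jieba.cut_for_search(text)`). The pure-Python sketch below approximates the last two over a toy dictionary so the difference is visible without installing jieba. `cut_full` and `expand_for_search` are hypothetical names and the frequencies are invented. The sketch is simplified: real full mode also yields single characters not covered by a longer word, and real search-engine mode starts from jieba's own precise cut rather than a list passed in by hand.

```python
def cut_full(sentence, freq):
    """Full mode (simplified): every dictionary word found at every start position."""
    words = []
    for i in range(len(sentence)):
        for j in range(i + 1, len(sentence) + 1):
            if sentence[i:j] in freq:
                words.append(sentence[i:j])
    return words


def expand_for_search(precise_words, freq):
    """Search-engine mode: before each precise-mode word, emit its dictionary 2-grams
    (if the word is longer than 2 characters) and 3-grams (if longer than 3)."""
    out = []
    for w in precise_words:
        for n in (2, 3):
            if len(w) > n:
                out.extend(w[i:i + n] for i in range(len(w) - n + 1) if w[i:i + n] in freq)
        out.append(w)
    return out


# Toy frequencies, invented for illustration only.
freq = {"香港": 10, "中文": 10, "大學": 10, "中文大學": 5, "香港中文大學": 2}
print(cut_full("香港中文大學", freq))
# ['香港', '香港中文大學', '中文', '中文大學', '大學']
print(expand_for_search(["香港中文大學"], freq))
# ['香港', '中文', '大學', '香港中文大學']
```

Full mode favors recall by listing every overlapping match, while search-engine mode keeps the precise segmentation and adds shorter sub-words, which helps index long compounds for search.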