(Public beta version)

粵典 (words.hk) Data


Description:

Counts the number of occurrences of each character in the corpus, including non-CJK characters, and returns the result as a JSON dictionary.

Used to compute the usage frequency of individual characters (not words) in the corpus.
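
The character count described above could be sketched roughly as follows. This is only an illustration, not the script actually used to produce the data: the function name `count_characters`, the input shape (a list of article strings), and the choice to skip whitespace are all assumptions.

```python
import json
from collections import Counter


def count_characters(articles):
    """Count how often each character occurs across all articles.

    Non-CJK characters (Latin letters, digits, punctuation) are counted too.
    Skipping whitespace is an assumption made for this sketch.
    """
    counts = Counter()
    for text in articles:
        for ch in text:
            if not ch.isspace():
                counts[ch] += 1
    return dict(counts)


def to_json(counts):
    # ensure_ascii=False keeps Chinese characters readable in the output.
    return json.dumps(counts, ensure_ascii=False)


if __name__ == "__main__":
    sample = ["我哋a", "我 b"]  # hypothetical two-article corpus
    print(to_json(count_characters(sample)))
```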

Data license: public domain. Credit to words.hk is appreciated.

Data last updated: 2020-02-10 19:20:53

Description:

Queries the database for all word representations and counts the number of occurrences of each one, without attempting to segment the article content.

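Used to compute the usage frequency of the words currently in the database.
Note that this list only includes words that have appeared in 粵文庫; it does not include words that are recorded in 《粵典》 but never appear in 粵文庫.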
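
One way to count known words without segmentation is plain substring matching, sketched below. This is an assumption about the approach, not the actual implementation: the function name `count_known_words` and the inputs are hypothetical, and `str.count` counts non-overlapping matches only. Words that never occur are left out of the result, mirroring the note above.

```python
from collections import Counter


def count_known_words(words, articles):
    """Count occurrences of each known word by substring search.

    No segmentation is attempted, so a word is counted wherever its
    characters appear in sequence, even inside a longer word.
    """
    counts = Counter()
    for text in articles:
        for word in words:
            n = text.count(word)  # non-overlapping occurrences
            if n:
                counts[word] += n
    return dict(counts)


if __name__ == "__main__":
    words = ["粵語", "香港", "冇見過"]         # hypothetical dictionary entries
    articles = ["香港人講粵語", "粵語好玩"]    # hypothetical corpus
    print(count_known_words(words, articles))
```

Because matching ignores word boundaries, counts for short words can be inflated by longer words that contain them.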

Data license: public domain. Credit to words.hk is appreciated.

Data last updated: 2020-02-10 19:20:55

Description:

Queries the database for the Jyutping of every word and dumps it out character by character if it appears valid.

*** Usage: run the analysis on a single article (turn on the "only need any one article" option for this). Any article will do, since the article contents do not matter. ***

Can be used as {character => Jyutping} data.
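
A rough sketch of how a {character => Jyutping} mapping could be derived from word-level pronunciations follows. The validity check used here (one space-separated syllable per character, each syllable being lowercase letters followed by a tone digit 1 to 6) is an assumption; the source only says the data is dumped "if it seems valid". The function name and input shape are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a Jyutping syllable: lowercase letters plus a tone 1-6.
SYLLABLE = re.compile(r"^[a-z]+[1-6]$")


def char_to_jyutping(entries):
    """Map each character to the sorted list of syllables seen for it.

    `entries` is an iterable of (word, jyutping) pairs, where jyutping
    is a space-separated string of syllables.
    """
    mapping = defaultdict(set)
    for word, jyutping in entries:
        syllables = jyutping.split()
        # Skip entries that cannot be aligned one syllable per character.
        if len(syllables) != len(word):
            continue
        if not all(SYLLABLE.match(s) for s in syllables):
            continue
        for ch, syllable in zip(word, syllables):
            mapping[ch].add(syllable)
    return {ch: sorted(syls) for ch, syls in mapping.items()}


if __name__ == "__main__":
    sample = [("香港", "hoeng1 gong2"), ("行", "hang4"), ("行", "hong4")]
    print(char_to_jyutping(sample))
```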

Data license: public domain. Credit to words.hk is appreciated.

Data last updated: 2020-02-10 19:21:52

Description:

Contains the wordlist and pronunciations of all entries recorded in the dictionary. The result is a dictionary (pun not intended) with the written forms as keys and, for each key, the list of its possible pronunciations as the value.

The wordlist recorded in 粵典 and the pronunciations of its entries.
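
The structure described above (written form as key, list of pronunciations as value) might be built and queried as in this sketch. The exact JSON layout of the published file is not stated here, so the shape, the function names, and the sample entries are assumptions for illustration.

```python
import json


def build_wordlist(entries):
    """Group pronunciations by written form, keeping first-seen order.

    `entries` is an iterable of (written_form, pronunciation) pairs.
    """
    wordlist = {}
    for written, pronunciation in entries:
        pronunciations = wordlist.setdefault(written, [])
        if pronunciation not in pronunciations:
            pronunciations.append(pronunciation)
    return wordlist


def lookup(wordlist, written):
    # Return an empty list for forms that are not in the wordlist.
    return wordlist.get(written, [])


if __name__ == "__main__":
    sample = [("香港", "hoeng1 gong2"), ("行", "hang4"), ("行", "hong4")]
    wordlist = build_wordlist(sample)
    print(json.dumps(wordlist, ensure_ascii=False))
    print(lookup(wordlist, "行"))
```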

Data license: public domain. Credit to words.hk is appreciated.

Data last updated: 2020-02-10 19:20:52