英粵對照表 (English-Cantonese lookup table): English index
This dataset exposes the index used to search for English words in the dictionary. The index is built by mapping the words that appear in each entry's English explanation to that entry's Cantonese word. This may be useful for English-to-Cantonese translation.

The English terms are normalized to their US spelling variant using VarCon (https://github.com/en-wl/wordlist/blob/master/varcon/README). A term prefixed with "!" has been stemmed with the Porter stemmer (see http://www.tartarus.org/~martin/PorterStemmer for implementations).

The score in each entry estimates how important the English term is to the definition of the Cantonese word, using a form of tf–idf. The exact formula is ad hoc ("#magic") and will probably change. Scores range from 0 to 100, but the dataset only includes entries with a score above 40, to reduce noise.

Data license: public domain. Credit to words.hk is appreciated.
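As a rough illustration of how the index described above might be used for lookups, here is a minimal Python sketch. The file layout is not specified here, so the row shape `(english_term, cantonese_word, score)`, the function names `build_index` and `lookup`, and the sample rows and scores are all assumptions made up for the example. The stemmer is passed in as a function, and `toy_stem` is only a stand-in, not the Porter algorithm. The query is assumed to be in US spelling already.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row layout (not specified by the dataset description):
# (english_term, cantonese_word, score)
Row = Tuple[str, str, float]
Index = Dict[str, List[Tuple[str, float]]]


def build_index(rows: Iterable[Row], min_score: float = 40.0) -> Index:
    """Group rows by English term, keeping only scores above min_score."""
    index: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for term, word, score in rows:
        if score > min_score:
            index[term].append((word, score))
    return dict(index)


def lookup(index: Index, query: str,
           stem: Callable[[str], str]) -> List[Tuple[str, float]]:
    """Return Cantonese candidates for an English query, best score first.

    Checks both the plain term and the "!"-prefixed stemmed term, and keeps
    the highest score seen for each Cantonese word.
    """
    key = query.lower()
    candidates = index.get(key, []) + index.get("!" + stem(key), [])
    best: Dict[str, float] = {}
    for word, score in candidates:
        if score > best.get(word, float("-inf")):
            best[word] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def toy_stem(word: str) -> str:
    """Stand-in stemmer for this demo only; NOT the Porter algorithm."""
    return word[:-4] if word.endswith("ning") else word


# Invented sample rows and scores, for illustration only.
rows = [
    ("run", "跑", 85.0),
    ("!run", "跑步", 72.0),
    ("!run", "跑", 60.0),
    ("walk", "行", 90.0),
    ("walk", "散步", 35.0),  # dropped: score is not above 40
]

index = build_index(rows)
print(lookup(index, "running", toy_stem))
```

In practice a real Porter stemmer would replace `toy_stem`, for example `PorterStemmer().stem` from `nltk.stem`. Whether a given implementation produces exactly the same stems as the one used to build this index is an assumption worth checking against the data.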