I was testing the Japanese Analyzer I introduced in the last post and found that it tends to classify unknown words as interjections. The original documentation of MeCab, the analyzer engine that MECAPI is built on, says that 'MeCab guesses the part-of-speech when the word is not registered in the dictionary'. So if the sentence being analyzed contains an unknown word or a typo, the API may return inaccurate information. I think it would be better to simply report that a word is unknown when it's not in the dictionary, but there seems to be no way to change this setting. It's a bit of a shame.
Japanese Analyzer - kynd.info


