I've been interested in analyzing and processing text ever since I tested MECAPI, a Japanese text analyzer.
The term extraction service that Yahoo! provides is a handy tool for picking out keywords, or words that seem to characterize a given text (I've uploaded a sample for testing this API: http://www.kynd.info/library/termextraction/).
According to Tatsuwo-no change log, the formula below gives an index of how characteristic a word (i) is of a text (j). Calculating the index requires a large collection of sample documents, for example all the documents registered in Yahoo!'s database, along with the number of hits a search returns.


- tf_{i,j} is the number of occurrences of word i in text j
- df_i is the number of documents containing word i
- N is the total number of documents
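
These three quantities are the ingredients of the standard tf-idf weight, which is usually written as w_{i,j} = tf_{i,j} × log(N / df_i). I'm assuming that's the formula meant here, and this is not how Yahoo!'s service is actually implemented. As a rough sketch, here is how the index could be computed in Python over a tiny made-up set of documents. The function name `tf_idf` and the sample sentences are my own, and splitting on whitespace only works for languages like English (Japanese text would need a morphological analyzer first).

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)  # N: total number of documents
    # Naive whitespace tokenization (an assumption made for this sketch).
    tokenized = [doc.lower().split() for doc in documents]

    # df_i: number of documents containing word i (each document counts once).
    df = Counter()
    for words in tokenized:
        df.update(set(words))

    weights = []
    for words in tokenized:
        tf = Counter(words)  # tf_{i,j}: occurrences of word i in document j
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


# Hypothetical sample documents, just for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
scores = tf_idf(docs)

# Print the words of the first document, most characteristic first.
for word, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word}: {weight:.3f}")
```

A word like "the" appears in every document, so log(N / df_i) is log(1) = 0 and its weight drops to zero, while a word that appears in only one document gets the highest weight. With only three documents the numbers don't mean much, which is why a large sample collection is needed in practice.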


