(Web-service) A note about keyword extraction

I've been interested in analyzing and processing text ever since I tested MECAPI, a Japanese text analyzer.

The term extraction service that Yahoo! provides is a handy tool for picking out keywords, that is, words that seem to characterize a given text (I've uploaded a sample to test this API: http://www.kynd.info/library/termextraction/).

According to Tatsuo-no change log, the formula below gives an index of how characteristic a word (i) is of a text (j). Calculating the index requires a large sample of documents, for example all the documents registered in Yahoo!'s database, together with the number of search hits for the word.


$$ w_{i,j} = tf_{i,j} \times \log\left(\frac{N}{df_i}\right) $$

  1. $tf_{i,j}$ is the number of occurrences of i in j
  2. $df_i$ is the number of documents containing i
  3. $N$ is the total number of documents
In short, a word is 'characteristic' if it appears many times in the text but is rare across the whole sample. Tatsuo-no change log also provides sample code that tests this formula using Yahoo! search results.
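
To make the idea concrete, here is a minimal sketch in Python. It assumes the index is the standard tf-idf weight, tf × log(N / df), and it is not the sample code from Tatsuo-no change log. The names `tf_idf` and `characteristic_words` are my own, and the document frequencies are made-up numbers standing in for real search-hit counts.

```python
import math
from collections import Counter


def tf_idf(term_count, doc_freq, total_docs):
    """Assumed index: tf * log(N / df). Requires doc_freq > 0."""
    return term_count * math.log(total_docs / doc_freq)


def characteristic_words(text, doc_freqs, total_docs, top=3):
    """Rank the words of `text` by the assumed tf-idf index, highest first."""
    # Naive whitespace tokenization; Japanese text would need a real analyzer.
    counts = Counter(text.lower().split())
    scores = {
        word: tf_idf(tf, doc_freqs[word], total_docs)
        for word, tf in counts.items()
        if doc_freqs.get(word, 0) > 0  # skip words with no known frequency
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical document frequencies in place of search-hit counts.
doc_freqs = {"the": 900, "is": 800, "keyword": 20, "extraction": 10}
ranking = characteristic_words(
    "the keyword extraction is the keyword", doc_freqs, total_docs=1000
)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

With these numbers "the" and "keyword" both appear twice in the text, but "keyword" ranks first because it is rare in the sample, while "the" scores close to zero.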
