
Storm Real-time Processing Cookbook by Quinton Anderson


Calculating the relative importance of each term

This recipe demonstrates the true power of Trident by using many of its abstractions to calculate the TF-IDF value. Before presenting the recipe, it is important to understand the simple math behind TF-IDF. We need the following components to calculate it:

  • tf(t,d): This component specifies the term frequency, that is, the number of times a given term (t) appears in a given document (d)
  • df(t): This component specifies the document frequency, that is, the number of documents in which a given term (t) appears
  • D: This component specifies the document count, that is, the total number of documents
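
To make the three components concrete, here is a minimal plain-Java sketch of how they might be combined. It does not use Trident, and it rests on assumptions that are not confirmed by the text above: tf is taken as a raw count, and the components are combined with one common smoothed formulation, tf(t,d) × ln(D / (1 + df(t))). The class name TfIdfSketch and its methods are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Storm or Trident.
final class TfIdfSketch {

    // tf(t,d): raw count of the term in one tokenized document (assumed weighting).
    static long termFrequency(String term, List<String> documentTokens) {
        long count = 0;
        for (String token : documentTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // df(t): number of documents that contain the term at least once.
    static long documentFrequency(String term, List<List<String>> corpus) {
        long count = 0;
        for (List<String> documentTokens : corpus) {
            if (documentTokens.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Assumed formulation: tf * ln(D / (1 + df)). The +1 avoids division by zero
    // for unseen terms; other variants of the formula exist.
    static double tfIdf(long tf, long df, long d) {
        return tf * Math.log((double) d / (1.0 + df));
    }

    public static void main(String[] args) {
        // D = 4 toy documents, already split into terms.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("storm", "trident", "storm"),
                Arrays.asList("trident", "topology"),
                Arrays.asList("bolt", "spout"),
                Arrays.asList("stream", "tuple"));

        for (String term : Arrays.asList("storm", "trident")) {
            long tf = termFrequency(term, corpus.get(0));
            long df = documentFrequency(term, corpus);
            System.out.println(term + " tf=" + tf + " df=" + df
                    + " tf-idf=" + tfIdf(tf, df, corpus.size()));
        }
    }
}
```

In this toy corpus, "storm" appears twice in the first document and in no other document, so it scores 2 × ln(4 / 2), roughly 1.386. "trident" appears once in the first document but also in a second one, so it scores only ln(4 / 3), roughly 0.288. A term that is frequent in one document and rare elsewhere is therefore ranked as more important.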

There are many ways to calculate the term frequency; for this recipe, we will ...
