Performing context Ngram in Hive

N-grams are sequences of n adjacent words collected from a given text, usually together with how often each sequence occurs. They are generally used to find how often certain words occur in sequence, which is useful in tasks such as sentiment analysis. Hive provides built-in support for n-gram calculations through a function. In this recipe, we will take a look at how to use this function to analyze text data.
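To make the idea concrete, here is a minimal sketch in plain Python (not Hive code) of what a context n-gram calculation computes: given a context sequence of words, it counts which words follow that sequence most often. The function name `top_following_words`, the sample sentence, and the whitespace tokenization are all assumptions made for illustration. Hive's built-in function works on tokenized sentences and reports estimated frequencies, so its output may not match this exact count.

```python
from collections import Counter


def top_following_words(text, context, k=3):
    """Return the k most frequent words that directly follow `context`.

    Hypothetical helper for illustration only; it lowercases the text and
    splits it on whitespace, which is much cruder than Hive's tokenization.
    """
    words = text.lower().split()
    n = len(context)
    counts = Counter(
        words[i + n]                      # the word right after the context
        for i in range(len(words) - n)    # every position with a following word
        if words[i:i + n] == context      # keep only positions matching the context
    )
    return counts.most_common(k)


# Made-up sample text, not data from the recipe.
sample = "i love hive and i love hadoop but i love hive more"
print(top_following_words(sample, ["i", "love"]))
# prints [('hive', 2), ('hadoop', 1)]
```

The context is passed as a list of words so that the same helper works for contexts of any length.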

Getting ready

To perform this recipe, you should have a running Hadoop cluster with a recent version of Hive installed on it. Here, I am using Hive 1.2.1.

How to do it...

N-grams can be used to find the most frequently used word after a sequence of words in a given text dataset. To do this, ...

Hadoop: Data Processing and Modelling by Garry Turkington, Tanmay Deshpande, Sandeep Karanth