Creating n-grams from a list

An n-gram is a sequence of n items that occur adjacently. For example, in the sequence of numbers [1, 2, 5, 3, 2], one possible 3-gram is [5, 3, 2].

n-grams are useful for building probability tables that predict the next item from the items that precede it. In this recipe, we will create all possible n-grams from a list of items. A Markov chain can then be trained on the n-grams that this recipe computes.
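To make the idea of a probability table concrete, here is a minimal sketch of a first-order table built from adjacent pairs (2-grams). It is not part of the recipe: the names nextCounts and nextProbs are hypothetical helpers introduced for illustration, and the pairs are formed by zipping the list with its own tail. It assumes the containers package, which ships with GHC.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical helper: for each item, count which items follow it.
-- Adjacent pairs (2-grams) are formed by zipping the list with its tail.
nextCounts :: Ord a => [a] -> Map.Map a (Map.Map a Int)
nextCounts xs =
  Map.fromListWith (Map.unionWith (+))
    [ (a, Map.singleton b 1) | (a, b) <- zip xs (drop 1 xs) ]

-- Hypothetical helper: turn the follower counts of one item into
-- probabilities. An item that was never seen yields an empty map.
nextProbs :: Ord a => a -> Map.Map a (Map.Map a Int) -> Map.Map a Double
nextProbs x table =
  case Map.lookup x table of
    Nothing     -> Map.empty
    Just counts ->
      let total = fromIntegral (sum (Map.elems counts))
      in  Map.map (\c -> fromIntegral c / total) counts
```

For the string "hello", the letter 'l' is followed once by 'l' and once by 'o', so nextProbs 'l' (nextCounts "hello") assigns each of them a probability of 0.5.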

How to do it…

  1. Define the n-gram function as follows to produce all possible n-grams from a list:
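    -- Slide a window of size n across the list, one item at a time.
    ngram :: Int -> [a] -> [[a]]
    ngram n xs
      | n <= length xs = take n xs : ngram n (drop 1 xs)  -- emit window, advance by one
      | otherwise      = []                               -- fewer than n items remain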
  2. Test it out on a sample list as follows:
    main = print $ ngram 3 "hello world"
  3. The printed list of 3-grams begins as follows:
    ["hel","ell","llo","lo ...
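The recipe's ngram calls length on the remaining list at every step, which rescans the list each time and never finishes on an infinite list. As an alternative, the following sketch builds the same windows from Data.List's tails. The name ngram' is hypothetical, and returning an empty list for a non-positive n is a choice made here, not something the recipe specifies (the original definition produces an endless list of empty lists when n is 0).

```haskell
import Data.List (tails)

-- Hypothetical variant of the recipe's ngram.
-- Take the first n items of every suffix and stop at the first
-- window that comes up short.
ngram' :: Int -> [a] -> [[a]]
ngram' n xs
  | n <= 0    = []  -- assumption: no windows for a non-positive size
  | otherwise = takeWhile ((== n) . length) (map (take n) (tails xs))
```

Because each length check inspects only a window of at most n items, this version is lazy enough to work on infinite input: take 2 (ngram' 2 [1..]) evaluates to [[1,2],[2,3]].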
