Cassandra and Hadoop in action

Now that we have had more than enough (rather boring) theory, we are ready to do something exciting. In this section, we will count the words in a book, which is more interesting than the grep example.

In this example, we load Lewis Carroll's novel Alice in Wonderland (http://en.wikipedia.org/wiki/Alice%27s_Adventures_in_Wonderland) into Cassandra. To prepare the data, we read the text file line by line and store 500 lines in each row. The row keys are named row_1, row_2, and so on, and the columns within each row are named col_1, col_2, and so on. Each row therefore has close to 500 columns, and each column holds one line from the file. To reduce noise in the word count, we removed the punctuation from the lines during the load. We could ...
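The row and column layout described above can be sketched in a few lines of Python. This is only an illustration of the grouping step, not the loader used in the book: the function names strip_punctuation and rows_from_lines are hypothetical, punctuation is assumed to mean ASCII punctuation, and blank lines are assumed to be kept as empty columns. The write to Cassandra is deliberately left out.

```python
import string
from itertools import islice

LINES_PER_ROW = 500  # from the text: 500 lines are stored in one row

# Translation table that deletes ASCII punctuation (assumption: typographic
# quotes and other non-ASCII marks are left untouched).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(line):
    """Return the line without ASCII punctuation or surrounding whitespace."""
    return line.translate(_PUNCT_TABLE).strip()


def rows_from_lines(lines, lines_per_row=LINES_PER_ROW):
    """Yield (row_key, columns) pairs, grouping lines_per_row lines per row.

    Row keys are row_1, row_2, ...; column names restart at col_1 in each row.
    """
    line_iter = iter(lines)
    row_number = 1
    while True:
        chunk = list(islice(line_iter, lines_per_row))
        if not chunk:
            break
        columns = {
            f"col_{i}": strip_punctuation(line)
            for i, line in enumerate(chunk, start=1)
        }
        yield f"row_{row_number}", columns
        row_number += 1
```

Because rows_from_lines accepts any iterable of strings, an open file object can be passed to it directly, and each (row_key, columns) pair it yields can then be written to Cassandra with whichever client the loader uses.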
