Managing and serializing data

A filesystem alone is not enough: we also need mechanisms to represent data and store it on that filesystem. We will explore some of these mechanisms now.

The Writable interface

As developers, we find it useful to work with higher-level data types and let Hadoop handle serializing them into bytes when they are written to the filesystem and reconstructing them from a stream of bytes when they are read back.

The org.apache.hadoop.io package contains the Writable interface, which provides this mechanism and is specified as follows:

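   public interface Writable
   {
       // Serialize the fields of this object to the output.
       void write(DataOutput out) throws IOException;
       // Deserialize the fields of this object from the input.
       void readFields(DataInput in) throws IOException;
   }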
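To make the contract concrete, the following is a minimal sketch of a type that implements these two methods and is round-tripped through a byte array. It is an illustration under stated assumptions rather than code from Hadoop: the PageVisit class, its fields, and the toBytes and fromBytes helpers are hypothetical, and the Writable interface is redeclared locally so that the sketch compiles without Hadoop on the classpath. In a real job you would import org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in that mirrors org.apache.hadoop.io.Writable (assumption:
// used only so this sketch is self-contained).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical example type; not part of Hadoop.
class PageVisit implements Writable {
    String url = "";
    int count;

    // A no-argument constructor lets an empty instance be created first
    // and then populated by readFields.
    PageVisit() {}

    PageVisit(String url, int count) {
        this.url = url;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeUTF(url);
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        url = in.readUTF();
        count = in.readInt();
    }

    // Helper (hypothetical): serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Helper (hypothetical): populate an existing Writable from bytes.
    static void fromBytes(Writable w, byte[] bytes) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        PageVisit original = new PageVisit("/index.html", 3);
        PageVisit copy = new PageVisit();
        fromBytes(copy, toBytes(original));
        System.out.println(copy.url + " " + copy.count);
    }
}
```

The point to notice is that write and readFields must agree on field order and types, because the byte stream carries no field names or type tags of its own.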

The main purpose of ...
