Data data everywhere...
When discussing the integration of Hadoop with other systems, it is easy to think of it as a one-to-one pattern: data comes out of one system, is processed in Hadoop, and is then passed on to a third.
Things may be like that on day one, but the reality is more often a series of collaborating components with data flows passing back and forth between them. How we build this complex network in a maintainable fashion is the focus of this chapter.
Types of data
For the sake of this discussion, we will divide data into two broad categories:
- Network traffic, where data is generated by a system and sent across a network connection
- File data, where data is generated by a system and written to files on a filesystem somewhere ...