The bigger picture

It's important to realize that "simply" getting data from one point to another is rarely the full extent of your data considerations. Terms such as data lifecycle management have become widely used for good reason. Let's briefly look at some things to consider, ideally before data starts flooding across the system.

Data lifecycle

The main data lifecycle question is this: for how long will the value you gain from storing the data exceed the cost of storing it? Keeping data forever may seem attractive, but the cost of holding more and more data increases over time. These costs are not just financial; many systems see their performance degrade as data volumes increase.
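
As a rough illustration of the value-versus-cost question, the sketch below estimates how many months a dataset is worth keeping. It rests entirely on assumptions that are not from the text: the function name `retention_months` is hypothetical, the data's monthly value is assumed to shrink by a fixed factor each month, and the monthly storage cost is assumed to be constant. Real value and cost curves are rarely this tidy, so treat it as a way to frame the question rather than a method for answering it.

```python
def retention_months(initial_value, decay, monthly_cost, max_months=120):
    """Count the months in which the modeled value of a dataset exceeds
    its storage cost.

    Hypothetical model (an assumption, not from the book):
      - the data is worth `initial_value` in its first month
      - that value is multiplied by `decay` after every month
      - storing the data costs a flat `monthly_cost` per month
    `max_months` caps the loop for data whose value never drops below cost.
    """
    months = 0
    value = initial_value
    while months < max_months and value > monthly_cost:
        months += 1
        value *= decay  # the data is worth less next month
    return months


if __name__ == "__main__":
    # Data worth 100 units at first, halving monthly, costing 10 units to keep.
    print(retention_months(100.0, 0.5, 10.0))
```

A result of zero would suggest the data is not worth storing at all under the model, while hitting `max_months` would suggest the model sees no natural expiry point.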

This question isn't—or at least rarely ...

