Chapter 2. Careful Simplification

Make things as simple as possible, but not simpler.

Roger Sessions, simplifying Einstein’s quote

“Keep it simple” is becoming the mantra for successful work in the big data sphere, especially for Hadoop-based computing. Every step saved in an architectural design not only saves time (and therefore money), but it also prevents problems down the road. Extra steps leave more chances for operational errors to be introduced. In production, having fewer steps makes it easier to focus effort on steps that are essential, which helps keep big projects operating smoothly. Clean, streamlined architectural design, therefore, is a useful goal.

But choosing the right way to simplify isn’t all that simple—you need to be able to recognize when and how to simplify for best effect. A major skill in doing so is to be able to answer the question, “How good is good?” In other words, sometimes there is a trade-off between simple designs that produce effective results and designs with additional layers of complexity that may be more accurate on the same data. The added complexity may give a slight improvement, but in the end, is this improvement worth the extra cost? A nominally more accurate but considerably more complex system may fail so often that the net result is lower overall performance. A complex system may also be so difficult to implement that it distracts from other tasks with a higher payoff, and that is very expensive.
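One rough way to put numbers on that trade-off is to multiply a system's accuracy by the fraction of time it is actually working. The sketch below is only an illustration: the function name `net_quality`, the assumption that a request arriving during a failure yields no useful result, and all of the figures are hypothetical and do not come from any real system.

```python
def net_quality(accuracy: float, availability: float) -> float:
    """Fraction of all requests that receive a correct result.

    Assumes (for illustration only) that a request arriving while the
    system is down yields no useful result at all.
    """
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= availability <= 1.0):
        raise ValueError("accuracy and availability must lie in [0, 1]")
    return accuracy * availability


# Hypothetical numbers, not measurements from any real system.
simple_design = net_quality(accuracy=0.90, availability=0.999)
complex_design = net_quality(accuracy=0.92, availability=0.95)

print(f"simple:  {simple_design:.4f}")
print(f"complex: {complex_design:.4f}")
```

Under these assumed figures, the simpler design delivers more correct results overall even though the complex design is nominally more accurate, because the complex design's extra failures outweigh its accuracy gain.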

This is not to say that complexity is ...
