Chapter 8. Platform Validation

After your hardware has been racked and the software installed, it’s time to validate the installation. You want to ensure that the cluster works within a reasonable, expected range of performance and that all the components work well with each other. Validation may include a variety of practices, such as:

Smoke testing

To detect bad hardware and platform misconfigurations. Any setup can have disks that are dead on arrival, memory sticks that aren’t seated right, network adapters that work intermittently, and more. Storage disks, in particular, tend to fail according to the bathtub curve.1 You can use burn-in tests to “smoke” these components out and replace them before demand on the system comes into play.

Baseline testing

To give you the evidence you need to demonstrate or prove degraded performance; expectations alone are not enough. If you exercise your system at regular intervals while you configure the hardware, the operating system, and the Hadoop components, you can correlate recent changes to a change in system efficiency. You can identify regressions (or progressions!) caused by new hardware or software upgrades simply by running regular, repeatable tests for performance.

Stress testing

To ensure that your monitoring, alerting, day-to-day operations, and other triage operations work as you expect. Rehearsing your recovery procedures and playbooks before the system goes into service—without the pressure of production demand or the clamor of angry tenants—is ...
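To make the baseline-testing idea above concrete, here is a minimal sketch of how results from a regular, repeatable test might be compared against earlier runs. Everything in it is an assumption made for illustration: the `Run` record, the `compare_to_baseline` helper, the choice of the median as the baseline, the 10% tolerance, and the throughput figures are hypothetical and do not come from any particular benchmarking tool.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    """One execution of a repeatable performance test (hypothetical record)."""
    label: str              # the most recent change made before this run
    throughput_mb_s: float  # result of whatever repeatable test you use


def compare_to_baseline(history, latest, tolerance=0.10):
    """Compare the latest run with the median of earlier runs.

    Returns (verdict, baseline, relative_change), where verdict is
    'regression', 'progression', or 'unchanged'. The tolerance is an
    assumed noise threshold, not a recommended value.
    """
    if not history:
        raise ValueError("need at least one earlier run to form a baseline")
    baseline = median(r.throughput_mb_s for r in history)
    change = (latest.throughput_mb_s - baseline) / baseline
    if change < -tolerance:
        verdict = "regression"
    elif change > tolerance:
        verdict = "progression"
    else:
        verdict = "unchanged"
    return verdict, baseline, change


# Illustrative, made-up numbers: three earlier runs and one new run.
history = [
    Run("initial install", 100.0),
    Run("OS tuning", 104.0),
    Run("second rack added", 102.0),
]
latest = Run("kernel upgrade", 85.0)

verdict, baseline, change = compare_to_baseline(history, latest)
print(f"{latest.label}: {verdict} ({change:+.1%} vs. baseline {baseline} MB/s)")
```

Because each run carries the label of the change that preceded it, a flagged result points directly at the change to investigate. The median is used here only because it is less sensitive to a single unusual run than the mean; other baselines are equally defensible.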
