© Zubair Nabi 2016

Zubair Nabi, Pro Spark Streaming, 10.1007/978-1-4842-1479-4_5

5. Real-Time Route 66: Linking External Data Sources

Zubair Nabi

(1) Lahore, Pakistan

If you want to go somewhere, goto is the best way to get there.

—Ken Thompson

Most data in the wild is dynamic and has a firm lifecycle: it is created, ingested, analyzed, and then culled or put in cold storage. This lifecycle generally has a strict time budget, outside of which the data is useless. The time budget for streaming data can be on a millisecond scale. Regardless of latency requirements, the first step is invariably transporting the data to a processing platform, perhaps across the entire Internet. Any pipelined architecture can only be as fast as its slowest link. For ...
