In this section, we will extend our Chicago crime use case to design and code the different layers of the Lambda Architecture in Spark.
Let's extend our Chicago crime dataset and assume that the Chicago crime data is delivered in near real-time. Our custom consumers will consume this data and will need to find out the number of crimes for each crime category. In most cases, users will require this grouping only over the chunk of data received in near real-time, but in a few use cases the aggregation also needs to be computed over the historical data.
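Before we get to the Spark implementation, the core idea can be sketched in plain Python: a batch view aggregates the historical data, a speed view aggregates only the near-real-time chunk, and the serving layer merges the two to answer queries. This is a toy illustration only, with made-up category values, not the actual Spark code we will write:

```python
from collections import Counter

# Hypothetical crime-category records (illustrative values only)
historical_data = ["THEFT", "BATTERY", "THEFT", "NARCOTICS", "BATTERY"]
realtime_chunk = ["THEFT", "ASSAULT"]

# Batch layer: precomputed aggregation over all historical data
batch_view = Counter(historical_data)

# Speed layer: aggregation over just the newly arrived chunk
realtime_view = Counter(realtime_chunk)

# Serving layer: merge both views to get up-to-date counts per category
merged_view = batch_view + realtime_view
print(dict(merged_view))
```

The key property to notice is that the two views are computed independently and only combined at query time, which is exactly the separation the Lambda Architecture layers give us.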
Seems like a Lambda use case, doesn't it?
Let's first analyze the complete architecture with all of its components, and then we will describe, code, and execute each ...