Window-based calculations

Spark can run window-based operations, which aggregate the values of a feature over successive time intervals. In the following example, the average of the heat rate KPI is calculated for each one-week window. Window-based calculations can be applied to both batch and streaming data:

powerDF.groupBy(window(powerDF.col("Date"), "1 week")).agg(avg("heatrate").as("weekly_average"))

+-------------------+-------------------+------------------+
|start              |end                |weekly_average    |
+-------------------+-------------------+------------------+
|2017-04-13 17:00:00|2017-04-20 17:00:00|208.5694          |
|2017-04-20 17:00:00|2017-04-27 17:00:00|203.5780          |
|2017-04-27 17:00:00|2017-05-04 17:00:00|299.5316          |
|2017-05-04 ...
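To make the grouping concrete, the following is a minimal plain-Java sketch of what a one-week tumbling-window average computes. It does not use Spark and is not how Spark implements `window`; the `Reading` class, the `windowStart` and `weeklyAverage` helpers, and the sample values are hypothetical names and data introduced for illustration. It assumes windows are aligned to the Unix epoch in UTC, so the printed boundaries will not necessarily match the session-time-zone boundaries shown in the output above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a one-week tumbling-window average (no Spark involved).
class WeeklyWindowAverage {

    // Hypothetical record type: one heat-rate reading at a point in time.
    static final class Reading {
        final Instant timestamp;
        final double heatRate;

        Reading(Instant timestamp, double heatRate) {
            this.timestamp = timestamp;
            this.heatRate = heatRate;
        }
    }

    static final long WEEK_MILLIS = Duration.ofDays(7).toMillis();

    // Start of the one-week window that contains t.
    // Assumption: windows are aligned to 1970-01-01T00:00:00Z.
    static Instant windowStart(Instant t) {
        long ms = t.toEpochMilli();
        return Instant.ofEpochMilli(ms - Math.floorMod(ms, WEEK_MILLIS));
    }

    // Groups readings by window start and averages the heat rate in each window.
    static Map<Instant, Double> weeklyAverage(List<Reading> readings) {
        Map<Instant, double[]> sums = new TreeMap<>(); // value = {sum, count}
        for (Reading r : readings) {
            double[] acc = sums.computeIfAbsent(windowStart(r.timestamp), k -> new double[2]);
            acc[0] += r.heatRate;
            acc[1] += 1;
        }
        Map<Instant, Double> averages = new TreeMap<>();
        for (Map.Entry<Instant, double[]> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample readings, not data from the example above.
        List<Reading> readings = new ArrayList<>();
        readings.add(new Reading(Instant.parse("2017-04-14T08:00:00Z"), 200.0));
        readings.add(new Reading(Instant.parse("2017-04-18T08:00:00Z"), 210.0));
        readings.add(new Reading(Instant.parse("2017-04-21T08:00:00Z"), 190.0));

        // Print each window as: start -> end : average
        weeklyAverage(readings).forEach((start, avg) ->
            System.out.println(start + " -> " + start.plusMillis(WEEK_MILLIS) + " : " + avg));
    }
}
```

Each reading falls into exactly one window, which is what distinguishes a tumbling window from a sliding one, where consecutive windows overlap and a reading can contribute to several averages.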
