Spark can run window-based operations, which aggregate the values of a feature over successive time intervals. The following example calculates the average heat rate, a KPI, for each one-week window. Window-based calculations can be applied to both batch and streaming data:
powerDF.groupBy(window(powerDF.col("Date"), "1 week")).agg(avg("heatrate").as("weekly_average"))

+-------------------+-------------------+------------------+
|start              |end                |weekly_average    |
+-------------------+-------------------+------------------+
|2017-04-13 17:00:00|2017-04-20 17:00:00|208.5694          |
|2017-04-20 17:00:00|2017-04-27 17:00:00|203.5780          |
|2017-04-27 17:00:00|2017-05-04 17:00:00|299.5316          |
|2017-05-04 ...
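Grouping by window(...) produces a single struct column named window, so the separate start and end columns in the output were presumably selected out of that struct before display. The following is a minimal, self-contained sketch of one way to do that in Scala. The sample readings, the object name WeeklyHeatRate, and the local SparkSession settings are assumptions made for illustration; only the column names Date and heatrate and the one-week window come from the example above.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col, window}

object WeeklyHeatRate {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .appName("WeeklyHeatRate")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample readings; real data would come from a file or a stream.
    val powerDF = Seq(
      (Timestamp.valueOf("2017-04-14 08:00:00"), 210.0),
      (Timestamp.valueOf("2017-04-16 09:30:00"), 207.0),
      (Timestamp.valueOf("2017-04-22 11:00:00"), 203.5),
      (Timestamp.valueOf("2017-04-29 14:15:00"), 299.0)
    ).toDF("Date", "heatrate")

    val weekly = powerDF
      // Assign each row to a one-week tumbling window based on its timestamp.
      .groupBy(window(col("Date"), "1 week"))
      // Average the heat rate within each window.
      .agg(avg("heatrate").as("weekly_average"))
      // Flatten the window struct into separate start and end columns.
      .select(
        col("window.start").as("start"),
        col("window.end").as("end"),
        col("weekly_average"))
      .orderBy("start")

    weekly.show(truncate = false)
    spark.stop()
  }
}
```

The window function requires a timestamp column, which is why the sample rows use java.sql.Timestamp values. Passing a third argument, for example window(col("Date"), "1 week", "1 day"), would give overlapping sliding windows instead of the non-overlapping weekly ones used here.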