Other tools used for transformation

The following tools should also be considered when designing a Data Lake transformation solution on HDInsight.

Oozie

Oozie lets you create and schedule workflows that manage and orchestrate Apache Hadoop workloads such as Pig, MapReduce, and Hive jobs. Workflows are defined in XML and submitted to the Oozie orchestration engine, which runs on the HDInsight cluster. Running workflows can be monitored from the command line, the web interface, or PowerShell.
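To make the XML workflow definition concrete, the following is a minimal sketch that uses Python's standard `xml.etree.ElementTree` module to generate a one-action workflow that runs a Hive script. The application name `transform-wf`, the script name `transform.hql`, and the schema versions (`uri:oozie:workflow:0.5`, `uri:oozie:hive-action:0.5`) are illustrative assumptions; check which schema versions the Oozie release on your HDInsight cluster supports. The `${jobTracker}` and `${nameNode}` placeholders are assumed to be supplied from a `job.properties` file when the workflow is submitted.

```python
import xml.etree.ElementTree as ET

# Schema namespaces (assumed versions; adjust to your Oozie release).
WF_NS = "uri:oozie:workflow:0.5"
HIVE_NS = "uri:oozie:hive-action:0.5"


def build_hive_workflow(app_name: str, script: str) -> str:
    """Return the XML text of a one-step Oozie workflow that runs a Hive script."""
    app = ET.Element("workflow-app", {"xmlns": WF_NS, "name": app_name})
    # Control flow begins at the Hive action.
    ET.SubElement(app, "start", {"to": "hive-node"})

    action = ET.SubElement(app, "action", {"name": "hive-node"})
    hive = ET.SubElement(action, "hive", {"xmlns": HIVE_NS})
    # Placeholders resolved by Oozie from job.properties at submission time.
    ET.SubElement(hive, "job-tracker").text = "${jobTracker}"
    ET.SubElement(hive, "name-node").text = "${nameNode}"
    ET.SubElement(hive, "script").text = script
    # Go to "end" on success and to the "fail" kill node on error.
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}"
    )
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    # Write this output to workflow.xml in HDFS/Azure storage before submitting.
    print(build_hive_workflow("transform-wf", "transform.hql"))
```

Generating the XML programmatically is only one option; many teams simply hand-write `workflow.xml`. The point here is the shape of the definition: a `start` node, an action with `ok`/`error` transitions, a `kill` node for failures, and an `end` node.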

Spark

Spark is an open source processing engine for Hadoop data, designed for speed, ease of use, and sophisticated analytics. It claims to run programs up to 100 times faster than Hadoop MapReduce in memory and 10 times faster even when running ...
