Hadoop and Spark are the stars of the big data world. This course covers the basics of Spark and how to use Spark and Hadoop together for big data analytics. Designed for developers, architects, and data analysts with a fundamental understanding of Hadoop, it begins with an overview of how Hadoop and Spark are used in today's big data ecosystem before moving into hands-on labs that demonstrate Spark and Spark-Hadoop integration.
You'll learn about the Spark shell, RDDs, and DataFrames; how to query data in Hive tables from Spark; and how to develop Spark applications and run them on YARN.
- Discover how to integrate the Hadoop and Spark big data analytics platforms
- Get access to 11 hands-on labs demonstrating the core aspects of Hadoop-Spark integration
- Learn the basics of the Spark framework: Spark shell, RDDs, and DataFrames
- Explore methods for analyzing data in Hadoop HDFS and Hive using Spark
- Gain an understanding of how to write Spark applications and run them on YARN
Sujee Maniyam is the co-founder of Elephant Scale, a Big Data training company specializing in Hadoop, NoSQL, and data science. An open-source author/developer since 2000, Sujee ran the analytics company CoverCake for five years, founded the Santa Clara Big Data Guru Meet-Up, developed a Hadoop course for Intel, worked as a software engineer for IBM for six years, and is co-author of the O'Reilly title HBase Design Patterns. He earned a Bachelor of Science in Computer Engineering from the University of Melbourne and holds certifications in both Hadoop and Spark.