Live online training

Amazon Web Services (AWS) Data Analytics for Beginners

Configure, deploy, and visualize data using Amazon Redshift

Kim Schmidt

The vast majority of big data use cases deployed in the cloud today run on AWS. Analyzing extensive datasets requires compute capacity that normally exceeds available on-premises processing capabilities. With AWS you can build virtually any big data analytics application, support any workload regardless of the volume, velocity, and variety of data, and extract insights and actionable information from data within minutes rather than in months. The AWS cloud gives every customer the benefit of a data center, with network architecture built to meet the requirements of the most security-sensitive customers. And with AWS Marketplace’s pay-as-you-go cloud model, where applications can scale up and down (horizontally and vertically) based on demand, you can access as many resources as you need, almost instantly, and only pay for what you use.

Join AWS expert Kim Schmidt to learn everything you need to know to collect, store, process, analyze, and visualize big data in the cloud. Using Amazon Redshift as an AWS analytics service example, you’ll explore the configuration, deployment, and typical usage common to almost all of the AWS analytical services so that using any AWS service will feel familiar and intuitive.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • How AWS works and how it is designed for big data analytical processing
  • What types of analytical results are possible on the Amazon cloud
  • The primary configurations and setup requirements common to AWS analytical services
  • Best practices for using AWS Marketplace efficiently: how to make the most of solutions that are preconfigured for one-click deployment, decreasing the time it takes to plan, forecast, and make decisions on which vendor solutions to use

And you’ll be able to:

  • Set up and perform configurations for working with Redshift and other AWS analytics services
  • Utilize Amazon Redshift for analysis and visualization
  • Navigate through the AWS console wizards to set up an analytical service and connect to third-party IDE tools

This training course is for you because...

  • You are a developer, business analyst, or IT decision maker who understands the value that analytics provide and wants to learn how to quickly configure and set up AWS analytics services—particularly Amazon Redshift

Prerequisites

  • A working knowledge of databases, data types, querying languages, and business intelligence solutions
  • A basic understanding of big data analytic frameworks and cloud storage

Materials and downloads needed:

  • An AWS account (necessary to follow along; do not use a company account, because of permission issues and the use of root credentials)

Recommended Preparation:

  • Getting started with Amazon Redshift
  • Learning AWS

About your instructor

  • Kim Schmidt is the founder of cloud and AWS specialist company DataLeader, an end-to-end business consultancy specializing in AWS, data, business intelligence, and advanced analytics. Kim also does contract work with AWS Marketplace and top independent software vendors. Kim has worked in the technology field for over 10 years for companies including Microsoft, Dun & Bradstreet, and Amazon Web Services in roles ranging from software engineer to positions in marketing and management for an augmented reality company. She holds a number of industry certifications and awards.

    Kim has extensive experience as a volunteer in the technology arena: notably, she has helped developers remain current on state-of-the-art software and taught through the organization Teaching Kids Programming. Her session at AWS re:Invent 2015, presented with Lynn Langit, was voted the best big data session of the year. Kim blogs at AWSkimschmidt.com and Kimschmidtsbrain.com. She also creates AWS technical videos, which she hosts on her YouTube channel.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (4 minutes)

  • Welcome
  • What you should know before participating in this course
  • What we won’t be covering

Amazon Web Services (AWS): The preeminent data analytics cloud (25–27 minutes)

  • Amazon Web Services as a business
  • AWS data analytics services advantages overview and examples
  • AWS Marketplace data analytics services advantages overview and examples
  • AWS data stores for data analytics
  • AWS advantages in data analytics quiz
  • Q&A

Amazon Redshift commonalities with other AWS analytics services (13 minutes)

  • Why this is beneficial: Common configurations, settings, and setup
  • Connecting: Drivers and tools
  • Identity and access management (IAM)
  • The importance of AWS regions
  • Amazon EC2 virtual machines (compute)
  • Network configurations and settings
  • Database configurations, settings, and setup
  • Costs and cost optimizations
  • AWS best practices: Tagging and CloudWatch monitoring
  • Service interface familiarity
  • Starting, stopping, backing up, and terminating running instances

Amazon Redshift data warehouse overview (20 minutes)

  • Amazon Redshift explained
  • Amazon Redshift versus traditional data warehousing
  • Amazon Redshift’s benefits, features, and ideal usage patterns
  • Amazon Redshift overview quiz
  • Amazon Redshift overview Q&A

Launching an Amazon Redshift cluster (25–28 minutes)

  • Install SQL client drivers and tools
  • Overview of popular third-party tools used to connect
  • Using CLIs and APIs to connect
  • Create an IAM role and authentication keys
  • Determine firewall rules
  • Launching an Amazon Redshift cluster
  • Understanding the Redshift console interface
  • Launching a Redshift cluster quiz
  • Launching a Redshift cluster Q&A

Break (10 minutes)

Connecting to the cluster (16–18 minutes)

  • Configuring access to enable you to connect to the cluster
  • Finding the connection string
  • Find the Redshift driver on your machine
  • Connecting to the cluster
  • Connecting to the cluster quiz
  • Connecting to the cluster Q&A
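
As a rough illustration of the "Finding the connection string" step above, the sketch below assembles a JDBC-style URL from a cluster endpoint. The helper name, the endpoint, and the database name are hypothetical placeholders rather than values from the course; 5439 is the default Redshift port, and a real cluster may use a different one.

```python
def build_jdbc_url(endpoint: str, database: str, port: int = 5439) -> str:
    """Assemble a Redshift JDBC URL (illustrative helper, not part of any AWS SDK)."""
    if not endpoint or not database:
        raise ValueError("endpoint and database are required")
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical endpoint of the kind shown on a cluster's detail page in the console.
url = build_jdbc_url(
    "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    "dev",
)
print(url)
```

A SQL client would take a URL of this shape together with the database user name and password chosen when the cluster was launched.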

Loading and querying data in the cluster (23–25 minutes)

  • Amazon S3: The most common repository for preload Redshift data
  • Using a third-party tool to create Redshift tables
  • Load data from S3 to Redshift tables
  • Query the data
  • Reviewing executed queries in the AWS console
  • Loading and querying data quiz
  • Loading and querying data Q&A
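
The "Load data from S3 to Redshift tables" step above is typically done with the Redshift COPY command. The following is a minimal sketch that only builds the SQL text, assuming CSV files with one header row; the helper name, table name, bucket, and IAM role ARN are hypothetical, and the statement would still have to be run through a SQL client connected to the cluster.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str, region: str) -> str:
    """Return a Redshift COPY command for CSV files that have one header row.

    Illustrative helper: values are interpolated directly, so pass only trusted input.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV\n"
        f"IGNOREHEADER 1\n"
        f"REGION '{region}';"
    )


# All names below are placeholders for illustration only.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    region="us-west-2",
)
print(statement)
```

The IAM role named in the statement needs read access to the bucket, which is why the role is created before the cluster is launched.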

Amazon Redshift partners and their solutions (6 minutes)

  • Why use a third-party solution in AWS Marketplace
  • The BI and Analytics Marketplace page

Business intelligence and data visualization using Tableau (22 minutes)

  • Tableau’s analysis and visualization overview
  • Types of analytics Tableau supports
  • AWS Marketplace interface for launching preconfigured software solutions overview
  • Visualizing data in Redshift using Tableau
  • Visualizing data with Tableau quiz
  • Visualizing data with Tableau Q&A

Other BI, data visualizations, and advanced analytics possible on Amazon Redshift (5–6 minutes)

  • Looker BI and data visualization
  • MicroStrategy BI and data visualization

Matillion ETL/ELT for Amazon Redshift overview (6–8 minutes)

  • How Matillion takes data orchestrations and transformations to another level
  • Watch data flow in real time

Other AWS analytics services (5 minutes)

  • Overview of the other AWS analytics services and use cases

Conclusion: Where to go from here (3 minutes)