O'Reilly Live Online training

TensorFlow Extended: Data Validation and Transform

Build an end-to-end machine learning pipeline with TFX

Armen Donigian

Companies are looking for ways to incorporate machine learning into their business to lower costs and increase revenue. But machine learning models are only as good as their training data, which is often generated by ad hoc pipelines involving multiple products, systems, and usage logs. Code bugs, system failures, or human errors can occur at multiple points of this generation process. As a result, understanding the data and finding any anomalies early is critical for preventing data errors downstream. As a machine learning platform scales to larger data and runs continuously, there's a strong need for a reusable component that enables rigorous checks for data quality and promotes best practices for data management.

Join expert Armen Donigian to explore the crucial skills of data analysis, transformation, and validation. Over three hours, you'll gain hands-on practical experience designing and transforming features, experimenting, and analyzing, serving, and profiling machine learning models using the recently open-sourced TensorFlow Extended (TFX), which allows you to leverage the state-of-the-art technology that powers most of Google’s ML systems to solve your particular business or scientific problems.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • The problems TFX can help you solve
  • How to integrate various parts of the TensorFlow ecosystem
  • How to validate your data using TensorFlow Data Validation
  • How to transform & process features using TensorFlow Transform

And you’ll be able to:

  • Apply the same design and implementation principles from the technology that made Google successful to your specific project
  • Develop an end-to-end machine learning pipeline for supervised learning projects using TensorFlow Extended
  • Get hands-on experience with an end-to-end example integrating various parts of TensorFlow and make it part of your workflow

This training course is for you because...

  • You're a data scientist, business analyst, or machine learning engineer who needs to leverage machine learning to solve a specific business problem.
  • You want to learn how to build an end-to-end machine learning pipeline and release it into production.

Prerequisites

  • Experience with an object-oriented programming language, such as Python (All code demos during the training will be in Python.)
  • A working knowledge of TensorFlow or another machine learning framework, such as scikit-learn or PyTorch (useful but not required)

Recommended preparation

Recommended follow-up:

About your instructor

  • Armen Donigian has undergraduate and graduate degrees in Computer Science from UCLA and USC. He started his career building tracking & navigation algorithms at Orincon (later acquired by Lockheed Martin). Armen then joined the Global Differential GPS group at Jet Propulsion Laboratory (NASA), performing clock and orbit corrections using GPS/GLONASS satellites, which were also used for testing of the Mars Science Laboratory Curiosity rover.

    Bitten by the startup bug, Armen has helped several startups build data-driven products and scale infrastructure as a Senior Data & Machine Learning Engineer. He previously led the development of machine learning explainability methods & currently works as the Head of Personalization & Recommender Systems at Honey Science.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

An overview of the problems TFX can help you solve (30 minutes)

  • Lecture: Problem statements; common vocabulary; context for rest of the course
  • Hands-on exercise: Knowledge check

Case study: An end-to-end example integrating various parts of the TensorFlow ecosystem together (50 minutes)

  • Lecture: Exploring an end-to-end notebook demonstrating TFX data analysis, transformation, and validation
  • Hands-on exercise: Knowledge check

Break (10 minutes)

How to validate your data using TensorFlow Data Validation (45 minutes)

  • Lecture: Computing descriptive data statistics; inferring a schema; schema environments and why we need them; checking evaluation data for errors; checking for data drift and skew; data monitoring at scale
  • Hands-on exercise: Knowledge check
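
To make the ideas in this session concrete, here is a minimal sketch of the workflow in plain Python: compute statistics, infer a schema from training data, check evaluation data against it, and measure drift. It uses only the standard library and is not the TensorFlow Data Validation API; the function names (`compute_stats`, `infer_schema`, `validate`, `linf_distance`) and the sample records are hypothetical, chosen for illustration. The L-infinity distance is one possible drift measure for categorical features.

```python
from collections import Counter


def compute_stats(rows):
    """Collect simple per-feature statistics from a list of dict records."""
    stats = {}
    for row in rows:
        for name, value in row.items():
            feature = stats.setdefault(
                name, {"present": 0, "types": Counter(), "values": Counter()}
            )
            if value is None:
                continue  # None is treated as a missing value
            feature["present"] += 1
            feature["types"][type(value).__name__] += 1
            if isinstance(value, str):
                feature["values"][value] += 1  # track categorical values only
    for feature in stats.values():
        feature["total"] = len(rows)
    return stats


def infer_schema(stats):
    """Derive expected type, presence, and string domain for each feature."""
    schema = {}
    for name, feature in stats.items():
        dominant = feature["types"].most_common(1)
        dominant_type = dominant[0][0] if dominant else None
        schema[name] = {
            "type": dominant_type,
            "required": feature["present"] == feature["total"],
            "domain": set(feature["values"]) if dominant_type == "str" else None,
        }
    return schema


def validate(stats, schema):
    """Return a list of human-readable anomalies found in `stats`."""
    anomalies = []
    for name, expected in schema.items():
        feature = stats.get(name)
        if feature is None:
            anomalies.append(f"{name}: feature missing")
            continue
        if expected["required"] and feature["present"] < feature["total"]:
            anomalies.append(f"{name}: missing values in required feature")
        unexpected = set(feature["types"]) - {expected["type"]}
        if unexpected:
            anomalies.append(f"{name}: unexpected type {sorted(unexpected)}")
        if expected["domain"] is not None:
            unseen = set(feature["values"]) - expected["domain"]
            if unseen:
                anomalies.append(f"{name}: values outside domain {sorted(unseen)}")
    for name in sorted(stats.keys() - schema.keys()):
        anomalies.append(f"{name}: feature not in schema")
    return anomalies


def linf_distance(feature_a, feature_b):
    """Largest absolute gap between two normalized categorical distributions."""
    total_a = sum(feature_a["values"].values()) or 1
    total_b = sum(feature_b["values"].values()) or 1
    keys = set(feature_a["values"]) | set(feature_b["values"])
    return max(
        (abs(feature_a["values"][k] / total_a - feature_b["values"][k] / total_b)
         for k in keys),
        default=0.0,
    )


# Hypothetical sample data: the evaluation set has a new category and a gap.
train_rows = [{"payment": "cash", "fare": 12.5}, {"payment": "card", "fare": 7.0}]
eval_rows = [{"payment": "crypto", "fare": 9.0}, {"payment": "cash", "fare": None}]

train_stats = compute_stats(train_rows)
eval_stats = compute_stats(eval_rows)
schema = infer_schema(train_stats)
anomalies = validate(eval_stats, schema)
drift = linf_distance(train_stats["payment"], eval_stats["payment"])
print(anomalies, drift)
```

The point of the sketch is the separation of concerns: the schema is inferred once from trusted data, then reused to check every later dataset, so a new category or a missing value is flagged before it reaches training or serving.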

Break (5 minutes)

How to transform & process features using TensorFlow Transform (40 minutes)

  • Lecture: How to define a preprocessing function (a logical description of the pipeline that transforms the raw data into the data used to train a machine learning model); the Apache Beam implementation used to transform data by converting the preprocessing function into a Beam pipeline; how to define data formats and schema; integrating with TensorFlow Training
  • Hands-on exercise: Knowledge check
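
The idea of a preprocessing function can be sketched in plain Python as two phases: an analysis pass over the full training set that computes constants, and a per-record transform that applies them identically at training and serving time. This is an illustration only and not the TensorFlow Transform API or its Beam implementation; the names `analyze` and `preprocess`, the feature names, and the sample records are hypothetical. It is loosely in the spirit of analyzers such as `tft.scale_to_z_score` and `tft.compute_and_apply_vocabulary`.

```python
import math


def analyze(rows):
    """Full pass over the training data to compute transform constants."""
    fares = [row["fare"] for row in rows]
    mean = sum(fares) / len(fares)
    variance = sum((fare - mean) ** 2 for fare in fares) / len(fares)
    # Assign each distinct category a stable integer id (sorted for determinism).
    vocab = {value: index
             for index, value in enumerate(sorted({row["payment"] for row in rows}))}
    return {"fare_mean": mean, "fare_std": math.sqrt(variance), "payment_vocab": vocab}


def preprocess(row, constants):
    """Per-record transform; uses only precomputed constants, never the dataset."""
    std = constants["fare_std"] or 1.0  # avoid dividing by zero for constant features
    return {
        "fare_scaled": (row["fare"] - constants["fare_mean"]) / std,
        # Categories unseen during analysis map to -1 (an assumption of this sketch).
        "payment_id": constants["payment_vocab"].get(row["payment"], -1),
    }


# Hypothetical training data and a single serving-time record.
train_rows = [{"fare": 10.0, "payment": "cash"}, {"fare": 20.0, "payment": "card"}]
serving_row = {"fare": 15.0, "payment": "crypto"}

constants = analyze(train_rows)
train_features = [preprocess(row, constants) for row in train_rows]
serving_features = preprocess(serving_row, constants)
print(train_features, serving_features)
```

Because `preprocess` depends only on the stored constants, the same logic can run on a single serving request without access to the training data, which is what keeps training and serving features consistent.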