Cleaning data at scale
Boosting performance of industrial data science
The internet is full of examples of how to train models. In reality, though, most of the time on industrial projects is spent working with data. The largest improvements in performance can therefore often be found by improving the underlying data.
In this hands-on three-hour course, expert Philip Winder teaches you fundamental techniques to improve and make the best use of your data. You'll learn how to impute missing data, clean corrupted data, remove anomalies, and convert features into a suitable format. You'll also discover how and why you should be transforming features and how to generate new features to boost performance.
What you'll learn and how you can apply it
By the end of this live online course, you’ll understand:
- Why improving data quality improves results and performance
- The many ways in which data can become corrupt
- Why the type of data affects data cleaning
- Why derived data can be better than the original
And you’ll be able to:
- Determine when and how to clean data
- Spot different types of corruption
- Transform the data to produce better representations of the original
- Clean all types of data: categorical, continuous, time series, etc.
This training course is for you because...
- You're an engineer who has to clean and improve data to remove anomalies (such as for monitoring purposes).
- You're a data scientist who has to clean and improve data to make solutions more robust, more performant, and simpler.
Prerequisites
- Familiarity with Python
- A working knowledge of basic statistics
Recommended preparation and follow-up
- Watch Introduction to Python (video, 3h 28m)
- Watch Intermediate Python (video, 2h 56m)
- Watch Hands-On Machine Learning with Python (video, 2h 39m)
- Watch Machine Learning with Python (video, 5h 17m)
- Explore the course companion website
- Watch Deploying Spark ML Pipelines in Production on AWS (video, 23m)
- Watch An Introduction to Machine Learning Models in Production (video, 39m)
- Explore Building Machine Learning Pipelines Using Spark, Docker, and AWS (Learning Path, 2h 38m)
- Watch Apache Kafka Series: Kafka Streams for Data Processing (video, 4h 46m)
- Read An Introduction to Apache Flink (book)
- Watch Deploying Machine Learning Models as Microservices Using Docker (video, 24m)
About your instructor
Dr. Philip Winder is a multidisciplinary engineer who creates data-driven software products. His work combines data science, cloud native, and traditional software development, using a range of languages and tools.
Phil is the CEO of Winder, a data science consultancy in the UK, which operates throughout Europe delivering training, development, and consultancy services. He has a Ph.D. and a master's degree in electronics from the University of Hull, UK.
The timeframes are only estimates and may vary according to how the class is progressing.