Hands-on Machine Learning with Python: Clustering, Dimension Reduction, and Time Series Analysis
It's common knowledge that when undertaking a machine learning project, most of your time is spent preparing and tweaking your data so that the libraries and algorithms will work on it. But many don't know that you can take advantage of Python's optimized libraries to run your algorithms more quickly.
Join Matt Harrison for an overview of machine learning with Python using Jupyter and pandas—the same tools used throughout industry to prepare data to analyze. You'll review key Jupyter and pandas features, explore dimension reduction to visualize data and reduce datasets, and use clustering to group similar items together and see what features tie them together. Matt also demonstrates how to do time series forecasting with the Prophet library, helping you predict future performance from past observations.
What you'll learn, and how you can apply it
By the end of this live online course, you’ll understand:
- Basic machine learning tasks
- How to use Python and Jupyter to perform machine learning
And you’ll be able to:
- Use pandas to load and preprocess data
- Run dimension reduction, clustering, and time series analysis
This training course is for you because...
- You are a programmer and would like to see how to use Python for the machine learning tasks of clustering, dimension reduction, and time series analysis.
- You are a data scientist with experience in SAS or R and would like an introduction to the Python ecosystem.
Prerequisites:
- Programming experience in any language
- Familiarity with the Python programming language (useful but not required)
Materials or Downloads Needed in Advance:
- A machine with Anaconda and Jupyter installed and set up. (Please try them out to get comfortable before the course.)
Learning the Pandas Library (book)
About your instructor
Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.
The timeframes are only estimates and may vary according to how the class is progressing.
Introduction to Jupyter - 20 min
- Explore the functionality you will need to be successful with Jupyter
Common Data Cleaning Operations - 30 min
- Most machine learning algorithms require some data preparation before they will run. We will cover the most common preparation steps here.
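As a rough illustration of the kind of preparation this segment refers to, the sketch below cleans a tiny made-up table with pandas. The column names, the values, and the choice of steps (dropping duplicates, coercing text to numbers, filling gaps with the median, one-hot encoding) are assumptions for the example, not the course's actual material.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a duplicate row, missing values, numbers stored as text.
raw = pd.DataFrame({
    "age": [34, np.nan, 29, 29, 51],
    "city": ["Austin", "Boise", "Denver", "Denver", None],
    "income": ["52000", "61000", "48000", "48000", "n/a"],
})


def clean(df):
    """Return a numeric, gap-free copy of df that most estimators can consume."""
    out = df.drop_duplicates().copy()
    # Turn text into numbers; unparseable entries such as "n/a" become NaN.
    out["income"] = pd.to_numeric(out["income"], errors="coerce")
    # Fill numeric gaps with each column's median.
    num_cols = ["age", "income"]
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Give missing categories an explicit label, then one-hot encode.
    out["city"] = out["city"].fillna("unknown")
    return pd.get_dummies(out, columns=["city"], dtype=int)


tidy = clean(raw)
print(tidy)
```

Median filling and one-hot encoding are only one reasonable default; the right choice depends on the dataset and the algorithm that follows.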
Break - 5 min
Dimension Reduction - 35 min
- We will use scikit-learn and Yellowbrick to explore Principal Component Analysis (PCA). PCA is a powerful tool not only for dimensionality reduction but also for understanding and visualizing a dataset.
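A minimal sketch of PCA with scikit-learn, assuming the bundled iris dataset as a stand-in for course data. It uses scikit-learn alone and leaves out Yellowbrick's visualizers; scaling the features first is a common convention rather than something the outline specifies.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Four numeric features per flower; the labels are not needed for PCA.
X = load_iris().data

# Standardize so no feature dominates just because of its units.
X_scaled = StandardScaler().fit_transform(X)

# Project the four features down to two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Share of the total variance captured by each component.
explained = pca.explained_variance_ratio_
print(X_2d.shape)
print(explained.round(3))
```

The two columns of `X_2d` can be passed straight to a scatter plot, which is where the visualization use of PCA comes from.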
Break - 5 min
Clustering - 35 min
- We will look at two clustering techniques to divide data into similar segments. We will use visualization to help determine the appropriate number of divisions.
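The outline does not name the two clustering techniques, so the sketch below assumes k-means and scores each candidate cluster count with the silhouette coefficient. The synthetic blobs and the range of counts tried are illustrative choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (assumed for the example).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=1.0,
    random_state=0,
)

# Fit k-means for several cluster counts and score each partition.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k)
```

Plotting `scores` against the cluster count gives the visual counterpart of this selection; on real data the peak is usually less clear-cut than on synthetic blobs.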
Break - 5 min
Time series forecasting - 40 min
- Prophet, from Facebook, is a powerful library for extracting trends from time series data and forecasting into the future. We will introduce it and use it to predict future events.
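A hedged sketch of a basic Prophet workflow, assuming the `prophet` package is installed and using a made-up daily series. Prophet expects a frame with a `ds` date column and a `y` value column; the helper names `to_prophet_frame` and `forecast` are hypothetical, introduced only for this example.

```python
import numpy as np
import pandas as pd


def to_prophet_frame(series):
    """Convert a date-indexed Series into the ds/y frame Prophet expects."""
    return pd.DataFrame({"ds": series.index, "y": series.to_numpy()})


def forecast(series, periods=30):
    """Fit Prophet on the history and predict `periods` further days."""
    # Imported here so the data preparation above works without Prophet installed.
    from prophet import Prophet

    model = Prophet()
    model.fit(to_prophet_frame(series))
    future = model.make_future_dataframe(periods=periods)
    prediction = model.predict(future)
    # yhat is the point forecast; the other two columns bound its uncertainty.
    return prediction[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Made-up history: 120 days of a steadily rising value.
dates = pd.date_range("2023-01-01", periods=120, freq="D")
history = pd.Series(np.linspace(10, 40, 120), index=dates)
frame = to_prophet_frame(history)
print(frame.head())
```

Calling `forecast(history, periods=30)` would return the fitted values for the 120 historical days followed by 30 predicted days.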
Conclusion/Q&A - 10 min