Live online training

Machine Learning in Python and Jupyter for Beginners

Unlock the secrets of Machine Learning without losing your mind

Dave Valentine

The field of data science is growing explosively worldwide. McKinsey has estimated that in 2018 there will be a fifty percent shortage of data scientists. We all live in a world transformed by big data and the implications of data science. One of the most exciting applications of data science is machine learning.

While the promise of a general artificial intelligence that can be fed data and predict the future is still far off, there are present-day practical applications of machine learning that provide business value and real results.

Today’s business leaders and data scientists want to understand what these machine learning technologies are, and how and where to apply them. For many everyday people, even understanding what is possible is a challenge. People struggle to navigate both the jargon and the hyperbole of this emerging industry. As our applications become more data-driven, every developer should know the basics of data science and machine learning.

In this gentle introduction to machine learning for beginners, join data scientist Dave Valentine to learn the core concepts of machine learning and develop a simple machine learning application using real-world data and real-world techniques. Going from “ideas” to “implementation” will help anyone who has struggled to unlock these mysteries gain an understanding of the mechanisms behind the technology that is rapidly affecting our lives.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • What data science is and what makes a data scientist.
  • What machine learning is and how it fits into data science.
  • Types and examples of machine learning.
  • How to evaluate the success of a machine learning algorithm.
  • Common challenges with machine learning.
  • Basic Python programming.
  • Basics of common Python modules used in data science (pandas, NumPy, SciPy, and Scikit-learn).
  • The steps involved in creating a machine learning application.
  • Where to continue your machine learning journey.

And you’ll be able to:

  • Create and run simple Python programs interactively, in standalone Python files, and within Jupyter.
  • Develop Jupyter notebooks to experiment with Python and machine learning.
  • Import and explore data.
  • Pre-process data for use in a machine learning algorithm.
  • Select a machine learning algorithm.
  • Train a machine learning algorithm against a data set.
  • Apply the machine learning algorithm you’ve trained.
  • Evaluate the machine learning algorithm against new data.

This training course is for you because...

  • You're a beginner who has some technical experience, but not necessarily in data science or in Python
  • You're interested in seeing the hands-on “mechanics behind” a machine learning implementation

Prerequisites

  • Students will need a system that has at least a Pentium 4 processor (or equivalent)
  • 4 GB of system RAM
  • 10 GB of hard drive space
  • A solid internet connection
  • Students will be installing Anaconda 5.1, which includes Python 3.6 and associated libraries.
  • Students are encouraged to download the software in advance of the course. Installation instructions are provided for those who want to have it installed before the class, which is especially helpful for those with slower computers. Please follow the installation instructions, as there is an example that will not work with a default installation.

Recommended preparation:

This session is designed to be accessible to beginners who have some basic programming knowledge. Familiarity with concepts such as arrays, variables, basic data types, and operations is assumed. Students who need a refresher should begin with Learning Python, by Fabrizio Romano.

About your instructor

  • Dave Valentine has worked in the computer industry for 19+ years in some of the largest organizations in the world. He has been teaching courses since 2016, enthusiastically communicating complicated topics without jargon and making them accessible to everyday people. As the “Backyard Data Scientist”, Dave takes that sense of experimental learning into the fields of data science, artificial intelligence, and machine learning.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Part 1 Introduction and Setup – 10 mins

  • Begin at the beginning!
  • Introduction – Who is “The Backyard Data Scientist”?
  • Overview of what we will be covering in the session today and how the course will be delivered.
  • What to download (if you have not done so already)!
  • Introducing the Titanic survivability project that we will be building and where to obtain the data.
  • The platforms selected for the course, and why.

Part 2 Crash course in Python – 20 mins

  • During the installation, we will get things started by briefly exploring the foundations of data science, machine learning, and Python. In this section, we will cover the essential parts of the Python language that we will be using.

Part 3 Hands on Running Python – 5 mins

  • Python can be run in three different ways: interactively, in standalone scripts, and in Jupyter notebooks. Jupyter is a favorite tool of data scientists, and we will be using it during the remainder of the course.
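
As a small illustration of the three ways of running Python described above, the sketch below is a script that behaves the same in each of them. The file name `hello.py` and the function `greet` are hypothetical, chosen only for this example.

```python
# hello.py (hypothetical file name, used only for illustration)
#
# Interactive: start `python` in a terminal, then type or paste these lines.
# Standalone:  save this file and run `python hello.py` from a terminal.
# Jupyter:     paste these lines into a notebook cell and run the cell.


def greet(name):
    """Return a short greeting for the given name."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # True when the file is run as a script; it is also true in the
    # interactive interpreter and in a notebook cell.
    print(greet("Jupyter"))
```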

Part 4 Foundations of Machine Learning and Data Science – 10 mins

  • A quick introduction (or refresher) to the key concepts of data science and machine learning that we will be using during the rest of the course, including:
  • What is data science and what makes a data scientist?
  • What is machine learning and how does that fit into data science?
  • Types of learning, including supervised, unsupervised, and reinforcement learning.
  • Examples of machine learning algorithms.
  • How do you evaluate the success of a machine-learning algorithm?
  • Common challenges (overfitting and underfitting)
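
To make the ideas of evaluation, overfitting, and underfitting more concrete, here is a minimal sketch in plain Python. The data, the three toy "models", and all function names are invented for illustration and are not course material. The point it tries to show is that a model should be judged on data it has not seen during training.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Invented toy data: the label is 1 whenever x is 5 or more.
train_x = [1, 2, 3, 4, 6, 7, 8, 9]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [0, 5, 10, 11]  # values that never appear in the training data
test_y = [0, 1, 1, 1]

# Overfitting: memorize every training example, guess 0 for anything new.
memory = dict(zip(train_x, train_y))


def memorizer(x):
    return memory.get(x, 0)


# Underfitting: ignore the input completely.
def always_zero(x):
    return 0


# A simple rule that captures the real pattern.
def threshold_rule(x):
    return 1 if x >= 5 else 0


scores = {}
for model in (memorizer, always_zero, threshold_rule):
    train_acc = accuracy(train_y, [model(x) for x in train_x])
    test_acc = accuracy(test_y, [model(x) for x in test_x])
    scores[model.__name__] = (train_acc, test_acc)
    print(f"{model.__name__}: train={train_acc:.2f}, test={test_acc:.2f}")
```

In this toy setup the memorizer is perfect on the training data but does poorly on the new values, which is the typical signature of overfitting; the constant model does poorly on both, which is the signature of underfitting.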

Part 5 Python for Data Science and Machine Learning – 20 mins

  • In this part, we will introduce and apply several Python modules used in machine learning applications, including:
  • pandas – the Python Data Analysis Library
  • NumPy – the fundamental package for array-based scientific computing with Python
  • SciPy – a collection of scientific computing routines built on NumPy
  • Scikit-learn – simple and efficient tools for data mining, data analysis, and machine learning
  • For each module, we will explore what it does and how it’s useful, and provide examples of how to use it.
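
As a rough taste of what working with two of these modules can look like, the sketch below uses pandas and NumPy on a tiny table. The rows are made up for illustration and are not taken from the real Titanic data; the column names are an assumption modelled on that data set. SciPy and Scikit-learn are left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not real passenger records.
passengers = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 30.0],
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 0],
})

# pandas: replace the missing age with the mean of the known ages.
mean_age = passengers["Age"].mean()
passengers["Age"] = passengers["Age"].fillna(mean_age)

# pandas: average of the Survived column within each group.
survival_by_sex = passengers.groupby("Sex")["Survived"].mean()

# NumPy: a column is backed by an array that supports element-wise arithmetic.
ages = passengers["Age"].values
ages_in_months = ages * 12

print(passengers)
print(survival_by_sex)
print(ages_in_months)
```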

Part 6 Sink or Swim? Developing the Machine Learning Application using Scikit-learn – 30 mins

  • Finally, students will put their newfound knowledge to the test by implementing a machine learning application to predict survival on the Titanic. This is a famous data science competition hosted by Kaggle. We’ll introduce Kaggle to students, but students won’t need to join Kaggle for this course.
  • We’ll be using the popular Scikit-learn package in the development of our machine learning application. Scikit-learn provides simple and efficient tools for data mining, data analysis, and machine learning.
  • In developing their application, students will get a taste of the Machine Learning process, including:
  • Asking the right question
  • Identifying, obtaining, and preparing the right data
  • Identifying and applying a machine learning algorithm.
  • Evaluating the performance of the model and adjusting as needed.
  • Using and presenting the model.
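
The steps listed above could be sketched with Scikit-learn roughly as follows. This is not the course's actual application: the data is synthetic and generated in the code, the labelling rule is invented so that there is a pattern to learn, the column names are assumptions modelled on the Titanic data set, and the choice of a decision tree is an assumption made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Obtain data: synthetic passengers, NOT the real Titanic records.
rng = np.random.RandomState(0)
n = 200
data = pd.DataFrame({
    "Sex": rng.choice(["male", "female"], size=n),
    "Age": rng.randint(1, 80, size=n),
    "Pclass": rng.randint(1, 4, size=n),
})
# Invented labelling rule, used only so the model has something to learn.
data["Survived"] = ((data["Sex"] == "female") | (data["Age"] < 10)).astype(int)

# Prepare data: models need numbers, so encode the text column.
data["Sex_code"] = data["Sex"].map({"male": 0, "female": 1})
feature_columns = ["Sex_code", "Age", "Pclass"]
X = data[feature_columns]
y = data["Survived"]

# Hold back a quarter of the rows to evaluate on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Select and train an algorithm (a shallow decision tree is an assumption).
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-back rows.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy on unseen rows: {test_accuracy:.2f}")

# Use the model on a new, hypothetical passenger.
new_passenger = pd.DataFrame(
    {"Sex_code": [1], "Age": [30], "Pclass": [2]}
)[feature_columns]
prediction = int(model.predict(new_passenger)[0])
print(f"Predicted Survived value: {prediction}")
```

Keeping a separate test split is what makes the accuracy figure meaningful: it measures how the model handles rows it was not trained on.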

Part 7 Conclusion and Next Steps – 5 mins

  • You made it! Thank you!
  • Resources and next steps to continue your learning journey.

Part 8 Q&A Session – 20 mins

  • A brief period to address questions. Questions that are very detailed may be answered outside of the live session and posted on www.tbdatascientist.com