Reinforcement Learning with TensorFlow and Keras
Apply reinforcement learning in games and robotics
Reinforcement learning algorithms are behind some of the most impressive breakthroughs in artificial intelligence. In this course, we will cover the fundamentals of reinforcement learning, with an emphasis on applications in video games and robotics.
What you'll learn, and how you can apply it
 How to use Q-functions to obtain optimal policies for your agent, be it a game-playing bot or a simulated robot
 How to choose optimal behavior in complicated environments
 How to use actor-critic methods to fine-tune and efficiently train your policy
 How to use Monte Carlo Tree Search to speed up the learning process
And you'll be able to:
 Teach a bot to play video games
 Train a simulated robot to perform a task, and then transfer it to a real robot
 Understand state-of-the-art methods in artificial intelligence
This training course is for you because...
This session is targeted at anyone with basic software development skills (data scientists, software engineers, amateur programmers, and managers) looking to understand, at a high level, the main concepts of applied deep learning and artificial intelligence. You will gain valuable practical knowledge and a new perspective on your day-to-day challenges.
Prerequisites
Working knowledge of R and/or Python and familiarity with calculus and probability
Recommended preparation:
 Course requirements can be found here: http://www.datastart.eu/index.php/packttraining/
 R Deep Learning Projects is suggested as it contains an introduction to some of the methods we will use in this course. A GitHub repository with course material is also available: https://github.com/jpmaldonado
About your instructor

Pablo Maldonado is an applied mathematician and data scientist with a taste for software development dating back to his days of programming BASIC on a Tandy 1000. As an academic and business consultant, he spends a great deal of his time building applied artificial intelligence solutions for text analytics, sensor and transactional data, and reinforcement learning. Pablo earned his PhD in applied mathematics (with a focus on mathematical game theory) at the Université Pierre et Marie Curie in Paris, France. He is the founder of Maldonado Consulting, a technology-agnostic data analytics consultancy based in Prague, Czech Republic, that leverages the latest tools and research to develop custom solutions in data analytics, mathematical modelling, machine learning, and artificial intelligence. Pablo has been an adjunct professor, teaching AI (reinforcement learning) and machine learning at Czech Technical University in Prague, the oldest technical university in Central Europe. He co-authored the book "R Deep Learning Projects", published by Packt.
Schedule
The timeframes are estimates only and may vary according to how the class is progressing.
Day 1:
Introduction to Reinforcement Learning (20 min)
 In this lecture, we will cover examples and use cases of reinforcement learning, both in practice and in applied research, and provide the motivation and fundamentals for the lectures that follow.
 Where can we find RL in our daily lives?
 Problem definition
 History and motivation of the field. Outline of research frontiers and state of the art.
From Markov chains to Markov Decision Processes (40 min)
 This lecture provides the theoretical foundation for the rest of the course. It is important to at least grasp the main ideas, as doing so will help you make the most of the course.
 Markov chains: definition and examples
 Markov Decision Processes: definition and examples
 The dynamic programming principle
 Value and policy iteration
 A stochastic approximation perspective on learning
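To make value iteration concrete before the lecture, here is a minimal sketch on a hypothetical two-state MDP. The states, actions, rewards, and discount factor below are all invented for illustration; they are not part of the course material.

```python
# Value iteration on a tiny, made-up two-state MDP.
# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward.
P = {
    0: {"stay": [(0, 1.0)], "move": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "move": [(0, 1.0)]},
}
R = {
    0: {"stay": 0.0, "move": 1.0},
    1: {"stay": 2.0, "move": 0.0},
}
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0}
for _ in range(200):  # iterate the Bellman optimality operator to a fixed point
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in V
    }

# Extract the greedy policy with respect to the converged value function
policy = {
    s: max(P[s], key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
    for s in V
}
```

Value iteration repeatedly applies the Bellman optimality update until the value function stops changing; the greedy policy extracted at the end is optimal for this toy model (here: move from state 0, then stay in state 1 collecting the reward of 2 forever).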
RL as a black-box optimization problem (20 min)
 We consider reinforcement learning as a “black-box” optimization problem and apply different approaches to it: the cross-entropy method and natural evolution strategies. These approaches are conceptually simple yet powerful, and competitive with more sophisticated methods.
 Black-box algorithms: random search, genetic algorithms, and other heuristics
 The Cross-Entropy method
 Natural Evolution Strategies
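As a taste of the cross-entropy method, here is a minimal sketch on a toy one-dimensional black-box objective. The objective function and all hyperparameters are invented for illustration; in the course, the same loop is applied to policy parameters instead of a single number.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Black-box objective: we only see evaluations, never gradients.
    return -(x - 3.0) ** 2

mu, sigma = 0.0, 5.0           # initial sampling distribution
n_samples, n_elite = 100, 10   # population size and elite count

for _ in range(50):
    xs = rng.normal(mu, sigma, n_samples)        # sample candidate solutions
    elite = xs[np.argsort(f(xs))[-n_elite:]]     # keep the best performers
    mu, sigma = elite.mean(), elite.std() + 1e-6 # refit the distribution to the elite
```

The loop alternates sampling and refitting: the Gaussian shrinks around the best candidates, so `mu` drifts toward the maximizer x = 3 without ever differentiating `f`.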
Temporal Difference Methods: Q-Learning and Sarsa. Eligibility traces (40 min)
 In this session, we will go through the different methods for estimating value functions, which are later used to derive the optimal behavior.
 Q-function: definition and examples
 Estimating Q-functions via Q-Learning
 Sarsa: on-policy methods
 Beyond Q-Learning: double Q-Learning, Zap Q-Learning, and others
 Eligibility traces: the forward and backward perspectives
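The tabular Q-Learning update fits in a few lines. The five-state corridor below is a made-up toy environment (reward only at the right end); the hyperparameters are illustrative choices, not course prescriptions.

```python
import random

random.seed(0)

# Hypothetical 5-state corridor: start at 0, reward 1 on reaching state 4.
N_STATES, ACTIONS = 5, (-1, +1)   # actions: step left / step right
alpha, gamma, eps = 0.5, 0.95, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(500):                              # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy behaviour policy
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-Learning is off-policy: it bootstraps from the greedy action in s2,
        # regardless of what the behaviour policy actually does next
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

greedy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)}
```

Sarsa differs in exactly one line: the target uses the action the behaviour policy actually takes in `s2` rather than the greedy maximum, which is what makes it on-policy.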
Practice: Solving CartPole and MountainCar (60 min)
 We will put our hard-earned knowledge into practice with two fun and challenging tasks. We will compare different algorithms and see their advantages and disadvantages live.
 Solve CartPole and MountainCar using different methods
 Instructor support will be provided, and solutions will be shared at the end
Deep Reinforcement Learning (40-50 min)
 Building on the temporal difference methods lecture, we will show how to use state-of-the-art methods in artificial intelligence (deep learning) to compute value functions.
 Function approximation methods
 Linear function approximation methods
 Using a multilayer perceptron for function approximation
 Experience replay
 Improving baseline Deep Q-Learning
 Deep Cross-Entropy Method: a deep learning approach to black-box reinforcement learning
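Of the topics above, experience replay is the easiest to isolate in code. Here is a minimal sketch of a replay buffer; the class name, method names, and capacity are our own illustrative choices, not a fixed API from any library.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions evicted automatically

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between
        # consecutive steps, which stabilizes Q-network training
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a Deep Q-Learning loop, each environment step pushes one transition, and each training step samples a minibatch to fit the Q-network on, instead of learning from the latest transition alone.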
Wrap-up and lessons learned (10-20 min)
 We will summarize the lessons of the first day and suggest how you can apply your newly acquired knowledge in a number of settings.
 More detailed example applications
 Suggestions for projects: build your portfolio
Day 2:
Policy methods (DDPG and TRPO) and Actor-Critic methods (60 min)
 In this session, we will consider a new class of algorithms: policy methods. These methods are more suitable for complicated tasks such as robotic arm manipulation, as they do not rely on computing value functions in advance.
 Introduction to Policy Gradients: finite differences
 Using likelihood ratios to improve the quality of the policy
 REINFORCE: a Monte Carlo approach for policy approximation
 Improving policy methods with a baseline: actor-critic methods
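As a concrete illustration of the REINFORCE update with a baseline, here is a sketch on a hypothetical two-armed bandit. The arm payouts, learning rates, and the moving-average baseline are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-armed bandit: arm 1 pays more on average.
true_means = np.array([0.2, 0.8])

theta = np.zeros(2)  # policy parameters: one logit per arm

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

baseline = 0.0
for _ in range(3000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)            # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)    # observe a noisy reward
    grad_log = -probs                     # gradient of log pi(a | theta) ...
    grad_log[a] += 1.0                    # ... for a softmax policy
    theta += 0.1 * (r - baseline) * grad_log  # REINFORCE step with baseline
    baseline += 0.05 * (r - baseline)     # moving-average baseline
```

Subtracting the baseline does not bias the gradient but reduces its variance; replacing the moving average with a learned value function turns this into an actor-critic method.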
Practice: Solving Cliffwalk (30 min)
 We will apply the methods to solve a challenging environment. Although this is a toy problem, it exhibits a number of features characteristic of more complicated setups. Once you can debug your algorithms here, you can apply them to more complicated tasks.
 Implementation of several algorithms to solve a simple environment:
 REINFORCE
 REINFORCE with baseline
 Policy Gradients (different algorithms)
Practice: Solving Pong (30 min)
 In this session, we will tackle the problem of teaching a bot to play the classic game Pong. We will use this as an excuse to practice the policy methods we learned before.
 Video frame preprocessing pipeline: reading and combining video frames
 Implementation of a policy gradient algorithm using NumPy
 Policy gradients using Keras
 Testing different algorithms at once: using OpenAI baselines
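The frame-preprocessing step can be sketched as follows. This follows a widely used recipe for 210x160x3 Atari Pong frames; the crop offsets and the background pixel values 144 and 109 are Pong-specific assumptions, not general constants.

```python
import numpy as np

def preprocess(frame):
    """Crop, downsample, and binarize a 210x160x3 Pong frame
    into a flat 80*80 float vector."""
    img = frame[35:195]                  # crop out the score bar and bottom border
    img = img[::2, ::2, 0]               # downsample by 2, keep one color channel
    img = np.where(img == 144, 0, img)   # erase background (shade 1)
    img = np.where(img == 109, 0, img)   # erase background (shade 2)
    img = np.where(img != 0, 1, img)     # paddles and ball become 1
    return img.astype(np.float32).ravel()
```

A common trick is to feed the policy the difference of two consecutive preprocessed frames, so that the input also encodes the direction of motion of the ball.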
Introduction to RL for Robotics (30 min)
 In this session, we will provide an overview of how reinforcement learning is used in robotics. This covers roughly two scenarios: teaching a robot to imitate what a human demonstrator does (through virtual reality), and teaching a robot to discover the correct behavior through trial and error.
 The robotics control problem: definition and challenges
 Imitation learning
 Model-based methods
Efficient Algorithms for Robotics (30 min)
 Reinforcement learning can be quite data-hungry; in this session, we will explore methods that reduce the number of simulations needed to obtain high-quality policies.
 Imitation Learning algorithms
 Estimating the dynamics of the system via Gaussian Processes
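To illustrate the idea, here is a minimal sketch of Gaussian-process regression for a hypothetical one-dimensional dynamics function. The sine dynamics, kernel length scale, and noise level are all invented for illustration; real robot dynamics are multidimensional.

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Hypothetical 1-D "dynamics": next state = sin(state), observed with noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 40)                  # observed states
y = np.sin(X) + 0.05 * rng.normal(size=40)  # observed next states

# GP posterior mean at new states (noise variance 0.05**2 on the diagonal)
K = rbf(X, X) + 0.05**2 * np.eye(len(X))
Xs = np.linspace(-3, 3, 9)
mean = rbf(Xs, X) @ np.linalg.solve(K, y)
```

Once the dynamics model fits the data well, candidate policies can be evaluated against the model instead of the real robot, which is what makes these methods sample-efficient.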
Practice: Humanoid robot control through RL in a simulator (60 min)
 To conclude, we will apply our hard-earned knowledge to train a humanoid robot (NAO) from simulations. If you have access to a NAO, the optimal policy discovered can be deployed to the real robot.
 Set up your environment: the V-REP simulator and the NAOqi Python development kit
 Using V-REP as an OpenAI Gym environment
 Training NAO robot with reinforcement learning
 Where to go from here? Ideas for projects