Get full access to Deep Learning Quick Reference and 60K+ other titles, with a free 10-day trial of O'Reilly.

Training

As you train this agent, you'll notice that the first thing it learns to do is hover the lander and avoid landing. When the lander finally touches down, the agent receives a very strong reward: +100 for landing successfully or -100 for crashing. The -100 penalty is so severe that, at first, the agent would rather accumulate small penalties for hovering than risk a crash. It takes quite a few episodes for the agent to finally get the hint that good landings are better than no landings, because crash landings are so costly.
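To see why hovering looks attractive early on, it can help to compare rough returns. The sketch below is a back-of-the-envelope illustration, not part of the agent's code: the function names, the per-step hovering penalty of -0.3, and the 100-step horizon are all assumptions chosen for illustration, and discounting is ignored.

```python
def expected_landing_return(p_success, land_reward=100.0, crash_reward=-100.0):
    """Expected terminal reward of attempting a landing that succeeds with p_success."""
    return p_success * land_reward + (1.0 - p_success) * crash_reward


def hover_return(steps, step_penalty=-0.3):
    """Total penalty for hovering for `steps` steps (step_penalty is an assumed value)."""
    return steps * step_penalty


def break_even_probability(hover_total, land_reward=100.0, crash_reward=-100.0):
    """Success probability at which attempting a landing matches the hovering return."""
    return (hover_total - crash_reward) / (land_reward - crash_reward)


if __name__ == "__main__":
    hover_total = hover_return(steps=100)  # assumed horizon of 100 steps
    for p in (0.1, 0.3, 0.5, 0.7):
        attempt = expected_landing_return(p)
        better = "attempt landing" if attempt > hover_total else "keep hovering"
        print(f"p_success={p:.1f}  attempt={attempt:+.1f}  hover={hover_total:+.1f}  -> {better}")
    print(f"break-even p_success: {break_even_probability(hover_total):.2f}")
```

Under these assumed numbers, an agent that crashes most of the time does better by hovering, which matches the behavior described above. Only once its landing success rate climbs past the break-even point does attempting a landing pay off.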

It's possible to shape the reward signal to help the agent learn faster, but doing so is outside the scope of this book. For more information, look up reward shaping.
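For orientation only, here is a minimal sketch of one common form, potential-based reward shaping, in which a term gamma * phi(s') - phi(s) is added to the environment's reward. This is not the book's method. The potential function, the assumption that the first two state entries are the lander's x and y position relative to the pad, and the names `potential` and `shaped_reward` are all hypothetical.

```python
import math


def potential(state):
    """Hypothetical potential: higher (less negative) the closer the lander is to the pad.

    Assumes state[0] and state[1] are x and y position with the pad at (0, 0).
    """
    x, y = state[0], state[1]
    return -math.hypot(x, y)


def shaped_reward(reward, state, next_state, gamma=0.99, done=False):
    """Return the environment reward plus a potential-based shaping term.

    The potential of a terminal state is taken to be zero.
    """
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


if __name__ == "__main__":
    # Moving from height 1.0 to height 0.5 directly above the pad, with an
    # assumed environment reward of -0.3 for that step.
    print(shaped_reward(-0.3, state=(0.0, 1.0), next_state=(0.0, 0.5)))
```

With a shaping term like this, a step that moves the lander toward the pad earns a small bonus on top of the environment reward, which could give the agent a denser learning signal than the terminal +100 or -100 alone.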

Because of this extreme negative feedback for crash landings, it will ...
