Course Catalog: Sample-Based Learning Methods
Course Outline:

    Sample-Based Learning Methods

Welcome to the Course!
Welcome to the second course in the Reinforcement Learning Specialization:
Sample-Based Learning Methods, brought to you by the University of Alberta,
Onlea, and Coursera.
In this pre-course module, you'll be introduced to your instructors,
and get a flavour of what the course has in store for you.
Make sure to introduce yourself to your classmates in the "Meet and Greet" section!
Monte Carlo Methods for Prediction & Control
This week you will learn how to estimate value functions and optimal policies,
using only sampled experience from the environment.
This module represents our first step toward incremental learning methods
that learn from the agent’s own interaction with the world,
rather than a model of the world.
You will learn about on-policy and off-policy methods for prediction
and control, using Monte Carlo methods---methods that use sampled returns.
You will also be reintroduced to the exploration problem,
this time in the general RL setting rather than only in bandits.
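To make "methods that use sampled returns" concrete, here is a minimal first-visit Monte Carlo prediction sketch on a five-state random walk. The environment, the equiprobable policy, and every name in it are illustrative choices for this outline, not the course's own assignment code.

```python
import random
from collections import defaultdict

# First-visit Monte Carlo prediction on a five-state random walk
# (states 1..5, terminals 0 and 6, reward +1 only on reaching state 6).
GAMMA = 1.0

def generate_episode():
    """Roll out one episode under the equiprobable left/right policy."""
    state, episode = 3, []          # start in the middle; episode = [(state, reward), ...]
    while state not in (0, 6):
        next_state = state + random.choice((-1, 1))
        reward = 1.0 if next_state == 6 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode

def first_visit_mc_prediction(num_episodes=5000):
    returns, V = defaultdict(list), {}
    for _ in range(num_episodes):
        episode = generate_episode()
        first_visit = {}
        for i, (s, _) in enumerate(episode):
            first_visit.setdefault(s, i)
        G = 0.0
        for i in reversed(range(len(episode))):   # accumulate the return backwards
            s, r = episode[i]
            G = GAMMA * G + r
            if first_visit[s] == i:               # only the first visit of s counts
                returns[s].append(G)
                V[s] = sum(returns[s]) / len(returns[s])
    return V

if __name__ == "__main__":
    V = first_visit_mc_prediction()
    print({s: round(V[s], 2) for s in sorted(V)})  # true values are 1/6 .. 5/6
```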
Temporal Difference Learning Methods for Prediction
This week, you will learn about one of the most fundamental concepts in reinforcement learning:
temporal difference (TD) learning.
TD learning combines some of the features of both Monte Carlo and Dynamic Programming (DP) methods.
TD methods are similar to Monte Carlo methods in that they can learn from the agent’s interaction with the world,
and do not require knowledge of the model.
TD methods are similar to DP methods in that they bootstrap,
and thus can learn online---no waiting until the end of an episode.
You will see how TD can learn more efficiently than Monte Carlo, due to bootstrapping.
For this module, we first focus on TD for prediction, and discuss TD for control in the next module.
This week, you will implement TD to estimate the value function for a fixed policy, in a simulated domain.
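As a companion to the Monte Carlo sketch above, here is what a tabular TD(0) prediction loop might look like on the same random walk; the key difference is that the value estimate is updated after every step, bootstrapping off V[next_state] rather than waiting for the final return. The step size and episode count are assumed values, not the course's.

```python
import random
from collections import defaultdict

# Tabular TD(0) prediction on the same five-state random walk as above.
ALPHA, GAMMA = 0.1, 1.0

def td0_prediction(num_episodes=5000):
    V = defaultdict(float)                    # terminal states keep value 0
    for _ in range(num_episodes):
        state = 3
        while state not in (0, 6):
            next_state = state + random.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # TD(0) update: move V[state] toward the one-step bootstrapped target.
            target = reward + GAMMA * V[next_state]
            V[state] += ALPHA * (target - V[state])
            state = next_state
    return V

if __name__ == "__main__":
    V = td0_prediction()
    print({s: round(V[s], 2) for s in range(1, 6)})  # true values are 1/6 .. 5/6
```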
Temporal Difference Learning Methods for Control
This week,
you will learn about using temporal difference learning for control,
as a generalized policy iteration strategy.
You will see three different algorithms based on bootstrapping and Bellman equations for control: Sarsa,
Q-learning and Expected Sarsa. You will see some of the differences between
the methods for on-policy and off-policy control, and that Expected Sarsa is a unified algorithm for both.
You will implement Expected Sarsa and Q-learning, on Cliff World.
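To show where the three algorithms actually differ, their one-step targets could be written side by side as below. Q is a (num_states, num_actions) table, and the tabular update Q[s, a] += alpha * (target - Q[s, a]) is the same in every case; the function names and the epsilon-greedy parameterization are illustrative, not the course's assignment API.

```python
import numpy as np

def epsilon_greedy_probs(q_row, eps):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_row)
    probs = np.full(n, eps / n)
    probs[np.argmax(q_row)] += 1.0 - eps
    return probs

def sarsa_target(Q, r, s_next, a_next, gamma):
    # On-policy: bootstrap off the action the agent actually takes next.
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next, gamma):
    # Off-policy: bootstrap off the greedy (max-valued) action.
    return r + gamma * np.max(Q[s_next])

def expected_sarsa_target(Q, r, s_next, gamma, eps):
    # Expectation over the target policy's action probabilities;
    # with a greedy target policy this reduces to the Q-learning target.
    probs = epsilon_greedy_probs(Q[s_next], eps)
    return r + gamma * np.dot(probs, Q[s_next])
```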
Planning, Learning & Acting
Up until now,
you might think that learning with and without a model are two distinct,
and in some ways, competing strategies: planning with
Dynamic Programming versus sample-based learning via TD methods.
This week we unify these two strategies with the Dyna architecture.
You will learn how to estimate the model from data and then use this model
to generate hypothetical experience (a bit like dreaming)
to dramatically improve sample efficiency compared to sample-based methods like Q-learning.
In addition, you will learn how to design learning systems that are robust to inaccurate models.
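A rough sketch of the Dyna-Q loop just described might look like the following: act, update Q from the real transition, record the transition in a learned model, then perform several planning updates from transitions replayed out of that model. The deterministic corridor domain and all hyperparameters here are stand-ins for illustration, not the course's maze environment.

```python
import random
from collections import defaultdict

# Compact tabular Dyna-Q on a deterministic corridor
# (states 0..8, actions 0=left / 1=right, reward +1 on reaching state 8).
ALPHA, GAMMA, EPS, PLANNING_STEPS = 0.5, 0.95, 0.1, 10
GOAL = 8

def step(state, action):
    next_state = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return next_state, (1.0 if next_state == GOAL else 0.0)

def choose_action(Q, s):
    if random.random() < EPS:
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: Q[(s, a)])

def dyna_q(num_episodes=50):
    Q = defaultdict(float)
    model = {}                                  # (s, a) -> (s', r); deterministic model
    for _ in range(num_episodes):
        s = 0
        while s != GOAL:
            a = choose_action(Q, s)
            s_next, r = step(s, a)
            # Direct RL: one Q-learning update from real experience.
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s_next, 0)], Q[(s_next, 1)]) - Q[(s, a)])
            # Model learning: remember what this transition did.
            model[(s, a)] = (s_next, r)
            # Planning: replay hypothetical experience drawn from the model.
            for _ in range(PLANNING_STEPS):
                (ps, pa), (ps_next, pr) = random.choice(list(model.items()))
                Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps_next, 0)], Q[(ps_next, 1)]) - Q[(ps, pa)])
            s = s_next
    return Q

if __name__ == "__main__":
    Q = dyna_q()
    print([max((0, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)])  # mostly 1s: move right
```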
