Course Catalog: Prediction and Control with Function Approximation
4,401 followers
Course Outline:

    Prediction and Control with Function Approximation

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
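
To make these ideas concrete, here is a minimal sketch of semi-gradient TD(0) prediction with a linear value function, the kind of method this module covers. The environment interface (env.reset, env.step), the policy function, and the feature function are illustrative placeholders, not code provided by the course.

import numpy as np

def semi_gradient_td0(env, policy, features, num_features,
                      alpha=0.01, gamma=0.99, num_episodes=100):
    """Estimate v_pi with a linear value function v(s) = w . x(s).

    Assumed placeholder interfaces:
      env.reset() -> state; env.step(action) -> (next_state, reward, done)
      policy(state) -> action; features(state) -> array of length num_features
    """
    w = np.zeros(num_features)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            x = features(state)
            # Semi-gradient: the bootstrapped target is treated as a constant.
            target = reward if done else reward + gamma * np.dot(w, features(next_state))
            w += alpha * (target - np.dot(w, x)) * x  # gradient of w . x(s) is x(s)
            state = next_state
    return w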

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and TD learning.
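
As an example of the first strategy, here is a minimal sketch of state aggregation, a fixed basis that partitions a one-dimensional input into equal-width groups; the interval bounds and number of groups are illustrative assumptions.

import numpy as np

def state_aggregation_features(state, low=0.0, high=1.0, num_groups=10):
    """One-hot features: split [low, high) into num_groups equal-width bins
    and activate the single bin containing `state`, so the groups form an
    exhaustive partition of the input."""
    x = np.zeros(num_groups)
    idx = int((state - low) / (high - low) * num_groups)
    x[min(max(idx, 0), num_groups - 1)] = 1.0  # clamp boundary states into range
    return x

Plugging these features into a linear learner such as the TD(0) sketch above amounts to learning one value per group; replacing them with a small neural network trained by backpropagation gives the second, adaptive strategy used in this week’s assessment.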

Control with Approximation

This week, you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.
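
For instance, a minimal sketch of episodic semi-gradient Sarsa with linear action values is shown below; as before, the environment and feature interfaces are illustrative placeholders rather than the course's own code.

import numpy as np

def semi_gradient_sarsa(env, features, num_features, num_actions,
                        alpha=0.1, gamma=1.0, epsilon=0.1, num_episodes=500):
    """Control with linear action values q(s, a) = w[a] . x(s) and an
    epsilon-greedy policy (generalized policy iteration)."""
    w = np.zeros((num_actions, num_features))

    def q(s, a):
        return np.dot(w[a], features(s))

    def epsilon_greedy(s):
        if np.random.rand() < epsilon:
            return np.random.randint(num_actions)
        return int(np.argmax([q(s, a) for a in range(num_actions)]))

    for _ in range(num_episodes):
        state, done = env.reset(), False
        action = epsilon_greedy(state)
        while not done:
            next_state, reward, done = env.step(action)
            if done:
                target = reward
            else:
                next_action = epsilon_greedy(next_state)
                target = reward + gamma * q(next_state, next_action)
            # Semi-gradient update: the bootstrapped target is treated as a constant.
            w[action] += alpha * (target - q(state, action)) * features(state)
            if not done:
                state, action = next_state, next_action
    return w

Replacing the Sarsa target with a maximum over next actions gives the corresponding Q-learning variant.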

We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly

be used in many applications of RL in the future.

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods, and their advantages over value-function based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
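
One simple member of this family is REINFORCE, a Monte Carlo policy-gradient method; the sketch below uses a linear softmax policy over a discrete action set, with the same placeholder environment and feature interfaces as above. A Gaussian policy extends the same gradient-ascent idea to continuous actions.

import numpy as np

def reinforce(env, features, num_features, num_actions,
              alpha=0.001, gamma=1.0, num_episodes=1000):
    """Monte Carlo policy gradient with a linear softmax policy:
    pi(a|s) proportional to exp(theta[a] . x(s))."""
    theta = np.zeros((num_actions, num_features))

    def action_probs(s):
        prefs = theta @ features(s)
        prefs -= prefs.max()          # for numerical stability
        e = np.exp(prefs)
        return e / e.sum()

    for _ in range(num_episodes):
        # Generate one episode with the current policy.
        states, actions, rewards = [], [], []
        state, done = env.reset(), False
        while not done:
            probs = action_probs(state)
            action = int(np.random.choice(num_actions, p=probs))
            next_state, reward, done = env.step(action)
            states.append(state); actions.append(action); rewards.append(reward)
            state = next_state

        # Compute the return G_t for every time step.
        returns, G = [], 0.0
        for r in reversed(rewards):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()

        # Gradient ascent on the expected return.
        for t, (s, a, G_t) in enumerate(zip(states, actions, returns)):
            x = features(s)
            probs = action_probs(s)
            # Gradient of log pi(a|s) for a linear softmax policy.
            grad = -np.outer(probs, x)
            grad[a] += x
            theta += alpha * (gamma ** t) * G_t * grad
    return theta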
