Course Catalog: Prediction and Control with Function Approximation
Course Outline:

    Prediction and Control with Function Approximation

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization: Prediction and Control with Function Approximation, brought to you by the University of Alberta, Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors, and get a flavour of what the course has in store for you. Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy when the number of states is much larger than the memory available to the agent. You will learn how to specify a parametric form of the value function, how to specify an objective function, and how gradient descent can be used to estimate values from interaction with the world.
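For concreteness, here is a minimal sketch of what such a gradient-based prediction update can look like: semi-gradient TD(0) with a linear value function over aggregated state features, on a toy random-walk task. The environment, feature map, and step size below are illustrative assumptions, not material taken from the course.

```python
import numpy as np

# Minimal sketch of semi-gradient TD(0) prediction with a linear value
# function, v_hat(s, w) = w . x(s), on a toy random-walk task.
# Environment, feature map, and step size are illustrative assumptions.

np.random.seed(0)

NUM_STATES = 100      # hypothetical state space too large to store exactly
NUM_FEATURES = 10     # state aggregation: 10 coarse groups of 10 states each

def features(state):
    """One-hot feature vector from simple state aggregation."""
    x = np.zeros(NUM_FEATURES)
    x[state * NUM_FEATURES // NUM_STATES] = 1.0
    return x

def env_step(state):
    """Random walk: move left or right; reward only on termination."""
    next_state = state + np.random.choice([-1, 1])
    if next_state < 0:
        return None, -1.0     # fell off the left end
    if next_state >= NUM_STATES:
        return None, 1.0      # fell off the right end
    return next_state, 0.0

w = np.zeros(NUM_FEATURES)    # value-function weights
alpha, gamma = 0.05, 1.0

for episode in range(200):
    state = NUM_STATES // 2
    while state is not None:
        next_state, reward = env_step(state)
        x = features(state)
        v_next = 0.0 if next_state is None else w @ features(next_state)
        # Semi-gradient TD(0): w <- w + alpha * (R + gamma * v' - v) * grad_w v
        w += alpha * (reward + gamma * v_next - w @ x) * x
        state = next_state

print(np.round(w, 2))         # approximate value of each state group
```

Because nearby states share a feature, an update to one state generalizes to its whole group; choosing how much generalization to allow is exactly what the parametric form and objective function control.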

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system. In this module we discuss two basic strategies for constructing features: (1) fixed basis functions that form an exhaustive partition of the input, and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation. In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and TD learning.
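As a rough illustration of the first strategy, here is a minimal sketch of a fixed, exhaustive-partition feature constructor in the spirit of tile coding over a one-dimensional continuous state. The number of tilings, tiles per tiling, and offsets are hypothetical choices, not values from the course.

```python
import numpy as np

# Minimal sketch of a fixed-basis feature constructor: tile coding over a
# one-dimensional continuous state in [0, 1). All constants are illustrative.

NUM_TILINGS = 8
TILES_PER_TILING = 10

def tile_code(s):
    """Return a binary feature vector with exactly one active tile per tiling."""
    x = np.zeros(NUM_TILINGS * TILES_PER_TILING)
    for t in range(NUM_TILINGS):
        # each tiling is shifted by a different fraction of a tile width,
        # wrapping around at the boundary
        offset = t / (NUM_TILINGS * TILES_PER_TILING)
        idx = int((s + offset) * TILES_PER_TILING) % TILES_PER_TILING
        x[t * TILES_PER_TILING + idx] = 1.0
    return x

# With a linear value function v_hat(s, w) = w . x(s), nearby states share
# active tiles, so updates generalize locally.
print(tile_code(0.42).nonzero()[0])
```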

Control with Approximation

This week, you will see that the concepts and tools introduced in modules two and three allow a straightforward extension of classic TD control methods to the function approximation setting. In particular, you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa. We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly be used in many applications of RL in the future.
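As a rough sketch of how such a control method fits together, the example below implements episodic semi-gradient Sarsa with a linear action-value function and an epsilon-greedy policy on a toy corridor task. The environment, features, and hyperparameters are illustrative assumptions, not material from the course.

```python
import numpy as np

# Minimal sketch of episodic semi-gradient Sarsa with linear action values,
# q_hat(s, a, w) = w[a] . x(s), on a toy corridor task (illustrative only).

np.random.seed(0)
NUM_STATES, NUM_ACTIONS = 10, 2       # actions: 0 = left, 1 = right
GOAL = NUM_STATES - 1

def features(s):
    x = np.zeros(NUM_STATES)          # one-hot features for simplicity
    x[s] = 1.0
    return x

def step(s, a):
    s2 = max(0, s + (1 if a == 1 else -1))
    return (None, 0.0) if s2 == GOAL else (s2, -1.0)   # -1 per step until goal

def epsilon_greedy(w, x, eps=0.1):
    if np.random.rand() < eps:
        return np.random.randint(NUM_ACTIONS)
    return int(np.argmax(w @ x))

w = np.zeros((NUM_ACTIONS, NUM_STATES))
alpha, gamma = 0.1, 1.0

for episode in range(500):
    s = 0
    x = features(s)
    a = epsilon_greedy(w, x)
    while s is not None:
        s2, r = step(s, a)
        q = w[a] @ x
        if s2 is None:
            w[a] += alpha * (r - q) * x            # terminal: no bootstrap term
            break
        x2 = features(s2)
        a2 = epsilon_greedy(w, x2)
        # Semi-gradient Sarsa: only q_hat(s, a, w) is differentiated
        w[a] += alpha * (r + gamma * (w[a2] @ x2) - q) * x
        s, x, a = s2, x2, a2

print(np.argmax(w, axis=0))           # learned greedy action in each state
```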

Policy Gradient

Every algorithm you have learned about so far estimates a value function as an intermediate step towards the goal of finding an optimal policy. An alternative strategy is to directly learn the parameters of the policy. This week you will learn about these policy gradient methods, and their advantages over value-function based methods. You will also learn how policy gradient methods can be used to find the optimal policy in tasks with both continuous state and action spaces.
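To make the idea of learning the policy parameters directly concrete, here is a minimal sketch of one policy gradient method, Monte Carlo REINFORCE with a softmax policy over linear preferences, on a tiny contextual task. The task, parameterization, and step size are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

# Minimal sketch of a policy gradient method (Monte Carlo REINFORCE) with a
# softmax policy pi(a | s, theta) over linear action preferences.
# The toy task and step size are illustrative assumptions.

np.random.seed(0)
NUM_STATES, NUM_ACTIONS = 2, 2

def features(s):
    x = np.zeros(NUM_STATES)
    x[s] = 1.0
    return x

def policy(theta, x):
    prefs = theta @ x                 # one preference per action
    prefs -= prefs.max()              # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def episode(theta):
    """One-step episodes: reward 1 if the action matches the state, else 0."""
    s = np.random.randint(NUM_STATES)
    x = features(s)
    p = policy(theta, x)
    a = np.random.choice(NUM_ACTIONS, p=p)
    r = 1.0 if a == s else 0.0
    return x, a, p, r

theta = np.zeros((NUM_ACTIONS, NUM_STATES))
alpha = 0.1

for _ in range(5000):
    x, a, p, r = episode(theta)
    # grad log pi(a | s, theta) for a softmax policy: (1[a = b] - p_b) * x per row b
    grad_log_pi = -np.outer(p, x)
    grad_log_pi[a] += x
    theta += alpha * r * grad_log_pi  # REINFORCE: theta <- theta + alpha * G * grad log pi

print(np.round(policy(theta, features(0)), 2),
      np.round(policy(theta, features(1)), 2))
```

The same update rule carries over to continuous actions by swapping the softmax for a parameterized density, for example a Gaussian whose mean and standard deviation are functions of the state.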
