DDPG Project (Worth Bookmarking)

Time: 2023-06-13 09:12:10

Hello everyone, we meet again. I'm your friend 全栈君.

1. Remember the key difference between DQN and DDPG in learning the Q function: in DDPG, the action used for the target's next-state Q value is chosen by the (target) actor, not by taking a max over the critic's own outputs. In a continuous action space, the critic cannot find the maximum Q value without solving an optimization problem over actions, so the best choice is to let the actor directly output the (approximately) best action and have the target critic evaluate it.
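The two targets can be written side by side. This is a sketch in notation assumed here rather than taken from the original code: r is the reward, γ the discount factor, d the done flag, Q′ the target critic (critic_target) and μ′ the target actor (actor_target).

$$
\text{DQN:}\quad y = r + \gamma\,(1 - d)\,\max_{a'} Q'(s', a')
$$

$$
\text{DDPG:}\quad y = r + \gamma\,(1 - d)\,Q'\big(s', \mu'(s')\big)
$$

In DDPG the max over a′ is replaced by evaluating Q′ at the single action μ′(s′), which the actor is trained to make (approximately) greedy with respect to the critic.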

The code in the first picture is wrong (the numbers below are line numbers in that code):

Line 71: critic_target only needs to evaluate the action proposed by actor_target. That action already stands in for the maximizing action, so no additional max operation is needed. (In DQN the max operation is required, because there the next maximum Q value is estimated directly by critic_target itself, the Q network, which outputs one value per discrete action.)

Line 72: The critic (Q function) in DDPG takes the action as an input and directly outputs the Q value for that action, so there is no need to gather the Q value at the action index.
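A minimal PyTorch sketch of a critic loss that follows both fixes, assuming batched tensors with rewards and dones shaped [batch, 1]. The Critic class, the critic_loss helper, the network sizes and the toy stand-in actor are all hypothetical; only the names critic, critic_target and actor_target come from the notes above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Critic(nn.Module):
    """Q(s, a): the action is an input, so the output is already Q for that action."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))  # shape [batch, 1]


def critic_loss(critic, critic_target, actor_target, states, actions,
                rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():  # the TD target is treated as a constant
        next_actions = actor_target(next_states)           # the actor picks a'
        q_next = critic_target(next_states, next_actions)  # no .max() needed
        q_targets = rewards + gamma * (1 - dones) * q_next
    q_expected = critic(states, actions)                    # no .gather() needed
    return F.mse_loss(q_expected, q_targets)


# Usage with random data; the toy actor below is a hypothetical stand-in.
torch.manual_seed(0)
state_dim, action_dim, batch = 3, 2, 4
critic = Critic(state_dim, action_dim)
critic_target = Critic(state_dim, action_dim)
actor_target = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())

states = torch.randn(batch, state_dim)
actions = torch.rand(batch, action_dim) * 2 - 1
rewards = torch.randn(batch, 1)  # [batch, 1], not [batch], so it does not broadcast to [batch, batch]
next_states = torch.randn(batch, state_dim)
dones = torch.zeros(batch, 1)

loss = critic_loss(critic, critic_target, actor_target, states, actions,
                   rewards, next_states, dones)
```

Keeping rewards and dones as [batch, 1] matters: a [batch] reward tensor added to a [batch, 1] Q tensor would silently broadcast to [batch, batch].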

Line 74: PyTorch accumulates gradients in each parameter's .grad on every backward() call, so call optimizer.zero_grad() (instead of network.zero_grad()) before backward() to clear them.

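Line 75: After loss.backward() has computed the gradients, call optimizer.step() to apply the parameter update. step() does not backpropagate the error itself; backward() does that.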
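A small PyTorch check of both points on a hypothetical toy model (a single nn.Linear trained with SGD; the sizes and learning rate are arbitrary): calling backward() twice without clearing doubles the stored gradient, and one update follows the order zero_grad(), backward(), step().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(2, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
x = torch.randn(8, 2)
y = torch.randn(8, 1)


def loss_fn():
    return nn.functional.mse_loss(net(x), y)


# Without clearing, a second backward() ADDS to .grad instead of replacing it.
loss_fn().backward()
g1 = net.weight.grad.clone()
loss_fn().backward()
g2 = net.weight.grad.clone()  # equals 2 * g1, since the weights did not change

# The usual order for one update step:
w_before = net.weight.detach().clone()
optimizer.zero_grad()  # 1. clear the stale (accumulated) gradients
loss = loss_fn()
loss.backward()        # 2. backpropagate: fills .grad
optimizer.step()       # 3. apply the update using .grad
w_after = net.weight.detach().clone()
```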

Do not forget to handle terminal states: multiply the next-state Q value by (1 - dones) so that nothing is bootstrapped past the end of an episode.
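A tiny numeric illustration of the (1 - dones) mask with made-up values. Here q_next stands in for critic_target(next_states, actor_target(next_states)), and td_targets is a hypothetical helper.

```python
import torch


def td_targets(rewards, q_next, dones, gamma=0.99):
    # For a terminal transition (done = 1) the bootstrap term vanishes,
    # so the target is just the reward.
    return rewards + gamma * (1 - dones) * q_next


rewards = torch.tensor([[1.0], [1.0]])
q_next = torch.tensor([[10.0], [10.0]])  # made-up next-state Q values
dones = torch.tensor([[0.0], [1.0]])     # the second transition ends the episode

targets = td_targets(rewards, q_next, dones)
# first row: 1 + 0.99 * 10 = 10.9; second row: 1.0
```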

Line 79: In the actor update, the action fed into critic_local is not the sampled action from the replay buffer but the action the actor predicts for those states (be careful with that). The resulting Q values should also be averaged over the batch. Finally, we want to maximize the critic's estimate, but the optimizer minimizes its objective, so the loss needs a negative sign.
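A minimal PyTorch sketch of the actor update, assuming a toy linear critic (critic_net, the critic_local wrapper, the network sizes and the Adam learning rate are hypothetical). It also shows that the actor loss sends gradients into the critic's parameters even though only the actor is stepped, which is why the critic's optimizer must clear them before the critic's own update.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
state_dim, action_dim, batch = 3, 1, 16

actor = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic_net = nn.Linear(state_dim + action_dim, 1)  # hypothetical toy critic


def critic_local(states, actions):
    return critic_net(torch.cat([states, actions], dim=-1))


actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
states = torch.randn(batch, state_dim)

actor_before = actor[0].weight.detach().clone()
critic_before = critic_net.weight.detach().clone()

actions_pred = actor(states)  # actions from the actor, NOT the replayed actions
# Mean over the batch; the minus sign turns "maximize Q" into "minimize loss".
actor_loss = -critic_local(states, actions_pred).mean()

actor_optimizer.zero_grad()
actor_loss.backward()   # also fills critic_net's .grad
actor_optimizer.step()  # only the actor's parameters are updated
```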

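In soft_update, remember to copy through the parameters' .data attribute (for example target_param.data.copy_(...)) so that the target network's weights are updated in place.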
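A minimal sketch of soft_update, assuming the usual local/target naming and a mixing factor tau (the signature is illustrative, not taken from the original code). Writing through .data changes the target parameters in place without recording the operation in the autograd graph; rebinding the loop variable (target_param = ...) would leave the network unchanged.

```python
import torch
import torch.nn as nn


def soft_update(local_model: nn.Module, target_model: nn.Module, tau: float) -> None:
    """theta_target <- tau * theta_local + (1 - tau) * theta_target, copied via .data."""
    for target_param, local_param in zip(target_model.parameters(),
                                         local_model.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)


# Usage: local weights are all 1, target weights are all 0.
local = nn.Linear(2, 1)
target = nn.Linear(2, 1)
with torch.no_grad():
    local.weight.fill_(1.0)
    local.bias.fill_(1.0)
    target.weight.fill_(0.0)
    target.bias.fill_(0.0)

soft_update(local, target, tau=0.1)  # target parameters become 0.1 * 1 + 0.9 * 0 = 0.1
```

In a training loop this would typically be called after each learning step for both pairs of networks, e.g. soft_update(critic_local, critic_target, tau).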

Publisher: 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/148618.html Original link: https://javaforall.cn
