DDPG Project
1. The key difference between DQN and DDPG when learning the Q function is how the target's next-state Q value is obtained: in DDPG the next action is supplied by the actor, not found by the critic itself. (In a continuous action space the critic cannot find the maximum Q value without running an optimization over actions, so the practical choice is to let the actor supply the best action directly.)
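To make the comparison concrete, the two bootstrapped targets can be written side by side. The notation is an assumption of this sketch, not something taken from the original code: $r$ is the reward, $\gamma$ the discount factor, $d$ the done flag, $s'$ the next state, $Q'$ the target critic (the target Q network in DQN), and $\mu'$ the target actor.

```latex
\begin{aligned}
y_{\mathrm{DQN}}  &= r + \gamma\,(1 - d)\,\max_{a'} Q'(s', a') \\
y_{\mathrm{DDPG}} &= r + \gamma\,(1 - d)\,Q'\bigl(s', \mu'(s')\bigr)
\end{aligned}
```

The only change is that the explicit $\max_{a'}$ is replaced by evaluating the target critic at the action chosen by the target actor.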
The code in the first picture is wrong (the numbers below are line numbers in that code):
71: The critic_target network already outputs the Q value for the action proposed by the actor_target network, and that value serves as the estimate of the maximum Q value, so no extra max operation is needed. (DQN does need the max operation, because there the next-state maximum Q value is taken directly over the outputs of the target Q network itself.)
72: The critic (Q function) in DDPG takes the action as an input and directly outputs the Q value of that action, so there is no need to gather the Q value by action index.
74: Gradients accumulate across backward passes, so call optimizer.zero_grad() to clear them before computing new ones (instead of network.zero_grad()).
75: After loss.backward() has computed the gradients, the optimizer must call step() to apply the parameter update.
Also, do not forget the terminal-state check: multiply the next-state Q value by (1 - dones) so that nothing is bootstrapped from a final state.
79: In the actor update, the action fed to critic_local is not the sampled action but the action predicted by the actor for the sampled states (be careful with this). The resulting Q values should be averaged over the batch. Finally, we want to maximize this quantity, but the optimizer minimizes its objective, so the loss needs a negative sign.
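The corrections above can be put together in one short PyTorch sketch. This is an illustration under stated assumptions, not the original code from the picture: the tiny `Actor` and `Critic` networks, the name `actor_local`, the Adam optimizers, the MSE critic loss, and the random minibatch are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
STATE_DIM, ACTION_DIM, BATCH, GAMMA = 3, 2, 8, 0.99  # illustrative values


class Actor(nn.Module):
    """Hypothetical small actor: state -> action in [-1, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 16), nn.ReLU(),
                                 nn.Linear(16, ACTION_DIM), nn.Tanh())

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Hypothetical small critic: (state, action) -> one Q value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 16),
                                 nn.ReLU(), nn.Linear(16, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))


actor_local, actor_target = Actor(), Actor()
critic_local, critic_target = Critic(), Critic()
actor_target.load_state_dict(actor_local.state_dict())
critic_target.load_state_dict(critic_local.state_dict())
actor_optimizer = torch.optim.Adam(actor_local.parameters(), lr=1e-3)
critic_optimizer = torch.optim.Adam(critic_local.parameters(), lr=1e-3)

# Random minibatch standing in for samples drawn from a replay buffer.
states = torch.randn(BATCH, STATE_DIM)
actions = torch.rand(BATCH, ACTION_DIM) * 2 - 1
rewards = torch.randn(BATCH, 1)
next_states = torch.randn(BATCH, STATE_DIM)
dones = torch.zeros(BATCH, 1)
dones[0] = 1.0  # pretend the first transition ends an episode

# ---- Critic update ----
with torch.no_grad():
    next_actions = actor_target(next_states)
    # No max here: the target actor already proposes the best action.
    q_targets_next = critic_target(next_states, next_actions)
    # (1 - dones) removes the bootstrapped term for terminal transitions.
    q_targets = rewards + GAMMA * q_targets_next * (1 - dones)
# No gather here: the critic takes the action as an input.
q_expected = critic_local(states, actions)
critic_loss = F.mse_loss(q_expected, q_targets)
critic_optimizer.zero_grad()  # clear gradients left from earlier passes
critic_loss.backward()        # compute gradients
critic_optimizer.step()       # apply the update

# ---- Actor update ----
actions_pred = actor_local(states)  # actor's action, not the sampled one
actor_loss = -critic_local(states, actions_pred).mean()  # minus: maximize Q
actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
```

The actor's backward pass also leaves gradients on the critic_local parameters; in this sketch they are harmless because critic_optimizer.zero_grad() clears them before the next critic update.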
In soft_update, remember to copy the values through each parameter's data attribute (param.data).
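A minimal sketch of such a soft update is shown below. The signature `soft_update(local_model, target_model, tau)` and the blending rule with the hypothetical factor `tau` are assumptions of this example, since the original function body is not shown; the two small linear layers only serve as a demonstration.

```python
import torch
import torch.nn as nn


def soft_update(local_model, target_model, tau):
    """Blend weights in place: target = tau * local + (1 - tau) * target."""
    for target_param, local_param in zip(target_model.parameters(),
                                         local_model.parameters()):
        # Copy through .data so the target tensor's values are overwritten
        # in place without involving autograd.
        target_param.data.copy_(tau * local_param.data
                                + (1.0 - tau) * target_param.data)


# Demonstration with two tiny layers whose weights are set to known values.
local_net = nn.Linear(2, 1)
target_net = nn.Linear(2, 1)
with torch.no_grad():
    for p in local_net.parameters():
        p.fill_(1.0)
    for p in target_net.parameters():
        p.fill_(0.0)

soft_update(local_net, target_net, tau=0.1)
```

With a small `tau` the target network moves only a little toward the local network on each call, which is the point of a soft update as opposed to a full copy.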
Published by 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/148618.html Original link: https://javaforall.cn