Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
Offline reinforcement learning, that is, learning a policy from a batch of data, is known to be hard: without strong assumptions, it is easy to construct counterexamples on which existing algorithms fail. In this work, we instead consider a property of certain real-world problems for which offline reinforcement learning should be effective: problems where actions have only a limited impact on part of the state. We formalize and introduce this Action Impact Regularity (AIR) property. We further propose an algorithm that assumes and exploits the AIR property, and we bound the suboptimality of the output policy when the MDP satisfies AIR. Finally, we demonstrate that our algorithm outperforms existing offline reinforcement learning algorithms across different data-collection policies in two simulated environments where the regularity holds.
Original title: Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
Original abstract: Offline reinforcement learning (learning a policy from a batch of data) is known to be hard: without making strong assumptions, it is easy to construct counterexamples such that existing algorithms fail. In this work, we instead consider a property of certain real world problems where offline reinforcement learning should be effective: those where actions only have limited impact for a part of the state. We formalize and introduce this Action Impact Regularity (AIR) property. We further propose an algorithm that assumes and exploits the AIR property, and bound the suboptimality of the output policy when the MDP satisfies AIR. Finally, we demonstrate that our algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in two simulated environments where the regularity holds.
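To make the AIR intuition concrete, here is a minimal toy sketch (my own illustration, not the paper's algorithm or environments): a state split into an exogenous component that evolves independently of the action, and an action-dependent component whose per-step change from any action is bounded by a hypothetical constant `epsilon`. The function and variable names are all assumptions for illustration.

```python
import numpy as np

# Illustrative sketch of the AIR intuition: actions have only a
# limited impact on part of the state. Not the paper's algorithm.

rng = np.random.default_rng(0)
epsilon = 0.05  # assumed bound on an action's effect per step


def step(state, action):
    """One transition of a toy MDP with an AIR-like structure."""
    exo, endo = state
    # Exogenous component: drifts on its own and ignores the action.
    exo_next = exo + rng.normal(scale=0.1)
    # Action-dependent component: the action's influence is clipped,
    # so any two actions reach next states within 2 * epsilon of each other.
    endo_next = endo + float(np.clip(action, -epsilon, epsilon))
    reward = -abs(endo_next - exo_next)  # e.g., track the exogenous signal
    return (exo_next, endo_next), reward


state = (0.0, 0.0)
for a in [0.3, -0.2, 0.01]:
    state, r = step(state, a)
```

Under this kind of structure, even a very different action sequence cannot move the action-dependent part of the state far from what the logged data covers, which is the kind of regularity the paper's algorithm is designed to exploit.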