An Experience Replay Framework for Distributed Reinforcement Learning (Reverb: A Framework for Experience Replay)

Posted: 2023-09-11 14:19:19

Paper title:

Reverb: A Framework for Experience Replay

Paper URL:

https://arxiv.org/pdf/2102.04736.pdf

Framework source code:

https://github.com/deepmind/reverb

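Related article:

An Experience Replay Framework for Distributed Reinforcement Learning (usage example demo): Reverb: A Framework for Experience Replay

Installation via pip (this will very likely fail; if it does, see the detailed installation guide referenced at the bottom of this article):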

pip install dm-reverb

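Notes:

Because the framework was designed for TensorFlow, its input and output variables are TensorFlow tensors. To use this distributed replay buffer from another deep learning framework, you have to convert the inputs and outputs manually, going through numpy.array. For example, a PyTorch tensor must first be converted to a numpy.array and then to a TensorFlow tensor.

Both Reverb and TensorFlow are frameworks used internally at Google, so there are few usage examples and little tutorial code to consult, which is one reason Google's computing frameworks are hard for outsiders to adopt. Reverb in particular has no mature tutorial code, which makes it difficult to use.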

------------------------------------------------------------------

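I came across this experience replay framework by chance. It can be regarded as a publicly available, industry-grade experience replay framework for distributed settings. There has been relatively little work in this area, possibly because it leans toward engineering rather than academic research; even in industry, few people work on it. Still, this kind of work is quite necessary, since algorithms ultimately have to serve industry.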

-------------------------------------------------------------------------

Here is one Reverb rate limiter worth introducing:

reverb.rate_limiters.SampleToInsertRatio

Help documentation:

SampleToInsertRatio(samples_per_insert: float, min_size_to_sample: int, error_buffer: Union[float, Tuple[float, float]])
|
| Maintains a specified ratio between samples and inserts.
|
| The limiter works in two stages:
|
| Stage 1. Size of table is lt `min_size_to_sample`.
| Stage 2. Size of table is ge `min_size_to_sample`.
|
| During stage 1 the limiter works exactly like MinSize, i.e. it allows
| all insert calls and blocks all sample calls. Note that it is possible to
| transition into stage 1 from stage 2 when items are removed from the table.
|
| During stage 2 the limiter attempts to maintain the ratio
| `samples_per_inserts` between the samples and inserts. This is done by
| measuring the "error" in this ratio, calculated as:
|
| number_of_inserts * samples_per_insert - number_of_samples
|
| If `error_buffer` is a number and this quantity is larger than
| `min_size_to_sample * samples_per_insert + error_buffer` then insert calls
| will be blocked; sampling will be blocked for error less than
| `min_size_to_sample * samples_per_insert - error_buffer`.
|
| If `error_buffer` is a tuple of two numbers then insert calls will block if
| the error is larger than error_buffer[1], and sampling will block if the error
| is less than error_buffer[0].
|
| `error_buffer` exists to avoid unnecessary blocking for a system that is
| more or less in equilibrium.

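By setting samples_per_insert and error_buffer, this rate limiter balances sample and insert operations. The main idea is: if too few samples have been drawn, insert operations are blocked; if too few items have been inserted, sample operations are blocked.

The balance is judged from the value of number_of_inserts * samples_per_insert - number_of_samples. When error_buffer is a single number and this value is greater than min_size_to_sample * samples_per_insert + error_buffer, too much has been inserted, so insert operations are blocked while sample operations proceed normally. If the value is less than min_size_to_sample * samples_per_insert - error_buffer, too much has been sampled, so sample operations are blocked while insert operations proceed normally.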
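To make the thresholds concrete, here is a minimal pure-Python sketch of the rule as the help text states it. It is a toy model and not Reverb's implementation: the class name `SampleToInsertRatioModel`, its methods, and the assumption that no items are ever removed from the table (so table size equals the number of inserts) are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class SampleToInsertRatioModel:
    """Toy model of the documented rule (hypothetical, not Reverb code)."""
    samples_per_insert: float
    min_size_to_sample: int
    error_buffer: Union[float, Tuple[float, float]]
    inserts: int = 0
    samples: int = 0

    def bounds(self) -> Tuple[float, float]:
        # A tuple is used directly as (lower, upper) limits on the error.
        if isinstance(self.error_buffer, tuple):
            return float(self.error_buffer[0]), float(self.error_buffer[1])
        # A single number is centered on min_size_to_sample * samples_per_insert.
        center = self.min_size_to_sample * self.samples_per_insert
        return center - self.error_buffer, center + self.error_buffer

    def error(self) -> float:
        return self.inserts * self.samples_per_insert - self.samples

    def can_insert(self) -> bool:
        # Stage 1: table smaller than min_size_to_sample, inserts always allowed.
        # Assumption: nothing is removed, so table size == self.inserts.
        if self.inserts < self.min_size_to_sample:
            return True
        # Stage 2: inserts block once the error exceeds the upper limit.
        return self.error() <= self.bounds()[1]

    def can_sample(self) -> bool:
        # Stage 1: sampling is blocked until the table is large enough.
        if self.inserts < self.min_size_to_sample:
            return False
        # Stage 2: sampling blocks once the error drops below the lower limit.
        return self.error() >= self.bounds()[0]


limiter = SampleToInsertRatioModel(
    samples_per_insert=2.0, min_size_to_sample=10, error_buffer=4.0)
lower, upper = limiter.bounds()  # 20 - 4 and 20 + 4

# A writer with no reader: insert until the limiter blocks.
while limiter.can_insert():
    limiter.inserts += 1
inserts_before_block = limiter.inserts

# A reader with no writer: sample until the limiter blocks.
while limiter.can_sample():
    limiter.samples += 1
samples_before_block = limiter.samples

print(lower, upper, inserts_before_block, samples_before_block, limiter.error())
```

With these example values the error is allowed to drift between 16 and 24. Passing `error_buffer=(16.0, 24.0)` to the same model yields the same limits, which is how the tuple form in the help text relates to the single-number form.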

========================================================

How to install this framework (on Ubuntu):

Installing Reverb, the distributed experience replay framework for reinforcement learning

=====================================================
