UAV + Reinforcement Learning: A Roundup of Open-Source Projects and Toolkits

Algorithms · Projects · Open Source · Learning

2023-04-18 16:40:01

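Preface: I'm a lowly research newbie working on reinforcement learning for UAVs. Everything below is an open-source project or toolkit that my advisor found for me on GitHub, and it is for reference only. P.S. For now this is just a roundup; I have not installed and tried each one (if you know, you know). If you work in a similar research direction, please leave a comment so we can exchange notes and learn from each other.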

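1. Guided Policy Search

This code is a reimplementation of the guided policy search algorithm and LQG-based trajectory optimization, meant to help others understand, reuse, and build upon existing work. It includes a complete robot controller and sensor interface for the PR2 robot via ROS, as well as interfaces for simulated agents in Box2D and MuJoCo. The source code is available on GitHub.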

GitHub - cbfinn/gps: Guided Policy Search

Related papers:

Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. JMLR 2016. [pdf]
William Montgomery, Sergey Levine. Guided Policy Search as Approximate Mirror Descent. NIPS 2016. [pdf]
Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel. Learning Deep Neural Network Policies with Continuous Memory States. ICRA 2016. [pdf]
Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel. Deep Spatial Autoencoders for Visuomotor Learning. ICRA 2016. [pdf]
Sergey Levine, Nolan Wagener, Pieter Abbeel. Learning Contact-Rich Manipulation Skills with Guided Policy Search. ICRA 2015. [pdf]
Sergey Levine, Pieter Abbeel. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics. NIPS 2014. [pdf]

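2. Minimal Policy Search is a Matlab toolbox that provides implementations of RL algorithms. The repository originally focused on policy search (hence the name), especially REPS and policy gradient, but it now contains a variety of algorithms (PPO, TRPO, DQN, DPG, FQI, ...). It also includes multi-objective RL algorithms, benchmark MDPs and optimization problems, and common policy classes.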

GitHub - sparisi/mips: Minimal Policy Search Toolbox

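3. PILCO software package V0.9: this package implements the PILCO RL policy search framework. The learning framework can be applied to MDPs with continuous states and controls/actions, and it is based on probabilistic modeling of the dynamics and approximate Bayesian inference for policy evaluation and improvement.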

GitHub - ICL-SML/pilco-matlab: PILCO policy search framework (Matlab version)

References:

M. P. Deisenroth and C. E. Rasmussen. PILCO: A Data-Efficient and Model-based Approach to Policy Search. ICML 2011.
M. P. Deisenroth. Efficient Reinforcement Learning Using Gaussian Processes. KIT Scientific Publishing, 2010.

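4. Variational Inference by Policy Search

Variational Inference by Policy Search (VIPS) is a method for learning Gaussian mixture model approximations of intractable probability density functions, which can then be used for inference (e.g. sampling).
VIPS does not require knowledge of gradients or of the normalizing constant. The optimization draws on insights from policy search (hence the name): it uses information-geometric trust regions to improve the approximation in a controlled manner, for better stability and exploration.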

https://github.com/OlegArenz/VIPS

References:

Arenz, O.; Zhong, M.; Neumann, G. Efficient Gradient-Free Variational Inference using Policy Search. Proceedings of the 35th International Conference on Machine Learning. 2018.

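5. Deep MPC

Combining policy search with deep neural networks holds promise for automating a wide range of decision-making tasks. Model predictive control (MPC) provides robust solutions for robot control tasks by using a dynamic model of the system and solving an optimization problem online over a short planning horizon. Policy search and MPC are two different paradigms for robot control: policy search has the advantage of learning complex policies automatically from experience data, while MPC can deliver optimal control performance using models and trajectory optimization. An open research question is how to exploit and combine the strengths of both approaches.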
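
To make the receding-horizon idea concrete, here is a minimal sketch, not taken from the repository below. It assumes a scalar linear system `x_next = a*x + b*u` with quadratic cost weights `q` and `r`; all of these names and values are made up for illustration. With no constraints, the short-horizon problem can be solved exactly by a backward Riccati recursion, so no numerical solver is needed here; a practical MPC would typically handle constraints and call an optimizer instead.

```python
def first_mpc_control(x, a=1.0, b=1.0, q=1.0, r=1.0, horizon=5):
    """Solve the finite-horizon problem from state x; return only the first control."""
    p = q  # terminal cost weight (assumed equal to the stage weight)
    gain = 0.0
    for _ in range(horizon):
        # One backward Riccati step: the last gain computed belongs to time step 0.
        gain = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * gain
    return -gain * x


def run_closed_loop(x0, steps=20, a=1.0, b=1.0):
    """Receding horizon: re-solve at every step, apply the first control, repeat."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = first_mpc_control(x, a=a, b=b)
        x = a * x + b * u  # hypothetical plant, identical to the model
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    print(run_closed_loop(5.0)[-1])
```

Only the first control of each plan is ever applied; the rest of the plan is discarded and recomputed from the newly observed state, which is what makes the scheme a feedback controller.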

GitHub - AYUSH-ISHAN/Deep-RL-Policy-Search-for-MPC: This repo is related to UAV Confrontation using Heirarchial MultiAgent Reinforcement Learning

References:

[1] Y. Song and D. Scaramuzza, "Policy Search for Model Predictive Control with Application to Agile Drone Flight," IEEE Transactions on Robotics (T-RO), 2021.
[2] Y. Song and D. Scaramuzza, "Learning High-Level Policies for Model Predictive Control," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, 2020.
[3] Aravind Venugopal, Elizabeth Bondi, Harshavardhan Kamarthi, Keval Dholakia, Balaraman Ravindran, Milind Tambe, "Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty."
[4] Baolai Wang, Shengang Li, Xianzhong Gao, and Tao Xie, "UAV Swarm Confrontation Using Hierarchical Multiagent Reinforcement Learning."
[5] Brandon Amos, J. Zico Kolter, "OptNet: Differentiable Optimization as a Layer in Neural Networks."
[6] Jacopo Panerati, Hehui Zheng, SiQi Zhou, James Xu, Amanda Prorok, Angela P. Schoellig, "Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control."
[7] Brandon Amos, Ivan Dario Jimenez Rodriguez, Jacob Sacks, Byron Boots, J. Zico Kolter, "Differentiable MPC for End-to-end Planning and Control."
[8] Ahmad Taher Azar, Anis Koubaa, Nada Ali Mohamed, Habiba A. Ibrahim, Zahra Fathy Ibrahim, Muhammad Kazim, Adel Ammar, Bilel Benjdira, Alaa M. Khamis, Ibrahim A. Hameed, and Gabriella Casalino, "Drone Deep Reinforcement Learning: A Review."

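6. Deep reinforcement learning for autonomous UAV obstacle avoidance

This project is about deep reinforcement learning algorithms for autonomous UAV obstacle avoidance. It covers obstacle avoidance in both static and dynamic environments. In static environments, multi-agent reinforcement learning is combined with the artificial potential field algorithm. In dynamic environments, the project combines the disturbed flow field algorithm with single-agent reinforcement learning.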
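
For readers unfamiliar with the artificial potential field method mentioned above, the following is a minimal 2D sketch of the classic formulation: an attractive force toward the goal plus a repulsive force from each obstacle within an influence radius. It is not the repository's code and does not show how the project couples the field with reinforcement learning; the gains `k_att` and `k_rep` and the radius `influence` are assumed values chosen for illustration.

```python
import math


def attractive_force(pos, goal, k_att=1.0):
    """Force pulling the vehicle toward the goal, proportional to the offset."""
    return (k_att * (goal[0] - pos[0]), k_att * (goal[1] - pos[1]))


def repulsive_force(pos, obstacle, k_rep=1.0, influence=2.0):
    """Force pushing the vehicle away from an obstacle inside its influence radius."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    dist = math.hypot(dx, dy)
    if dist >= influence or dist == 0.0:
        return (0.0, 0.0)  # too far away to matter (or degenerate overlap)
    # Negative gradient of 0.5 * k_rep * (1/dist - 1/influence)**2
    magnitude = k_rep * (1.0 / dist - 1.0 / influence) / dist ** 2
    return (magnitude * dx / dist, magnitude * dy / dist)


def total_force(pos, goal, obstacles):
    """Sum of the attractive force and all repulsive forces at pos."""
    fx, fy = attractive_force(pos, goal)
    for obstacle in obstacles:
        rx, ry = repulsive_force(pos, obstacle)
        fx, fy = fx + rx, fy + ry
    return (fx, fy)


if __name__ == "__main__":
    print(total_force((1.0, 0.0), (5.0, 0.0), [(0.0, 0.0)]))
```

A pure potential field can get stuck where the attractive and repulsive forces cancel out, which is one common motivation for pairing it with a learned policy.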

GitHub - ZYunfeii/UAV_Obstacle_Avoiding_DRL: This is a project about deep reinforcement learning autonomous obstacle avoidance algorithm for UAV.

7. UAV obstacle avoidance using deep recurrent reinforcement learning with temporal attention

GitHub - abhiksingla/UAV_obstacle_avoidance_controller: UAV Obstacle Avoidance using Deep Recurrent Reinforcement Learning with Temporal Attention

That project acknowledges the following work on deep recurrent attention reinforcement learning in Atari:

GitHub - yilunc2020/Attention-DQN: Deep Recurrent Attention Reinforcement Learning in Atari

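8. UAV control with deep reinforcement learning

This is a UAV control system based on deep reinforcement learning, implemented in Python (TensorFlow/ROS) and C++ (ROS). The neural network model is end-to-end and is a non-asynchronous implementation of the A3C model (https://arxiv.org/pdf/1602.01783.pdf), because the Gazebo simulator cannot run multiple copies in parallel. Training starts from weights pretrained on a supervised learning task, because the simulator is very resource-intensive and training is very time-consuming.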

GitHub - tobiasfshr/deep-reinforcement-learning-drone-control: A drone control system based on deep reinforcement learning with Tensorflow and ROS

9. Layered reinforcement learning for complex behaviors (Pack of Drones)

GitHub - MickyDowns/deep-theano-rnn-lstm-car: Pack of Drones: Layered reinforcement learning (Q-learning w/ RNN) for complex "hunt" behaviors

10. Learning to Fly: Computational Controller Design for Hybrid UAVs with Reinforcement Learning

GitHub - eanswer/LearningToFly: [SIGGRAPH 2019] Learning to Fly: Computational Controller Design for Hybrid UAVs with Reinforcement Learning

References:

Jie Xu, Tao Du, Michael Foshey, Beichen Li, Bo Zhu, Adriana Schulz, Wojciech Matusik

ACM Transactions on Graphics, 38(4) 42:1-42:12 (SIGGRAPH), 2019

This repository implements the code for the paper Learning to Fly: Computational Controller Design for Hybrid UAVs with Reinforcement Learning (SIGGRAPH 2019).

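11. Long-Term Planning with Deep Reinforcement Learning on Autonomous Drones

In this paper, we study a long-term planning scenario based on drone racing competitions held in real life. We conducted the experiment on a framework created for "Game of Drones: Drone Racing Competition" at NeurIPS 2019. The racing environment was created using Microsoft's AirSim Drone Racing Lab. A reinforcement learning agent, in our case a simulated quadrotor, was trained with the Proximal Policy Optimization (PPO) algorithm and was able to compete successfully against another simulated quadrotor running a classical path planning algorithm. Agent observations consist of data from IMU sensors, the drone's GPS coordinates obtained through simulation, and the opponent drone's GPS information. Using the opponent drone's GPS information during training helps to deal with the complex state space and serves as expert guidance, which allows for an efficient and stable training process. All experiments performed in this paper can be found in our GitHub repository and reproduced with the code.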
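
As a reminder of what PPO optimizes, here is a minimal sketch of its clipped surrogate objective. It is not code from the repository below: the function name and the default `clip_eps=0.2` are assumptions for illustration, and a real training loop would compute this over batches with an autodiff framework and maximize it together with value and entropy terms.

```python
import math


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # The ratio is 2.0 with a positive advantage, so the term is capped at 1.2.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the property usually credited for PPO's stable training.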

GitHub - ugurkanates/NeurIRS2019DroneChallengeRL: Long-Term Planning with Deep Reinforcement Learning on Autonomous Drones

References:

https://arxiv.org/abs/2007.05694

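12. Robust model predictive control using tubes

This repository contains examples of tube model predictive control (tube-MPC) [1] as well as generic model predictive control (MPC), written in MATLAB.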
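
The core idea of tube-MPC can be sketched in a few lines: a nominal, disturbance-free trajectory is planned, and an ancillary feedback term keeps the real, disturbed state inside a bounded "tube" around it. The sketch below is in Python rather than the repository's MATLAB and is not derived from its code. The scalar system, the gain `k`, the disturbance bound `w_max`, and the simple proportional law standing in for the nominal MPC solution are all assumptions made for illustration.

```python
import random


def simulate_tube(steps=50, a=1.0, b=1.0, k=-0.5, w_max=0.1, seed=0):
    """Return the deviation of the disturbed state from the nominal state per step."""
    rng = random.Random(seed)
    z = 2.0  # nominal state (no disturbance)
    x = 2.0  # actual state (disturbed)
    errors = []
    for _ in range(steps):
        v = -0.5 * z  # nominal control: stand-in for the nominal MPC solution
        u = v + k * (x - z)  # ancillary feedback pulls x back toward z
        w = rng.uniform(-w_max, w_max)  # bounded disturbance
        z = a * z + b * v
        x = a * x + b * u + w
        errors.append(x - z)
    return errors


if __name__ == "__main__":
    errors = simulate_tube()
    print(max(abs(e) for e in errors))
```

With these values the error follows `e_next = (a + b*k) * e + w = 0.5 * e + w`, so starting from zero error it can never exceed `w_max / (1 - 0.5) = 2 * w_max`. That bound is the tube radius, and in a full tube-MPC design the nominal problem's constraints are tightened by it.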

GitHub - lucattycord/uav-tube-mpc

13. Design of a model predictive control for a twin-engine micro fixed-wing UAV

GitHub - Astik-2002/Design-of-a-model-predictive-control-for-a-twin-engine-micro-fixed-wing-UAV: The following project is an implementation of "Hakan Ülker, Cemal Baykara, Can Özsoy, "Design of MPCs for a fixed wing UAV", Aircraft Engineering and Aerospace Technology, https://doi.org/10.1108/AEAT-08-2015-0198".

References:

Hakan Ülker, Cemal Baykara, Can Özsoy, "Design of MPCs for a fixed wing UAV", Aircraft Engineering and Aerospace Technology, https://doi.org/10.1108/AEAT-08-2015-0198.

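14. MPC (model predictive control) + RL

MPC-GPS: Controlling a UAV with a Reinforcement Learning Algorithm - Zhihu

What is the connection between model predictive control (MPC) and model-based reinforcement learning (model-based RL)? - Zhihu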
