Learning to Explore in Motion and Interaction Tasks
张宁
Miroslav Bogdanovic and Ludovic Righetti
https://arxiv.org/abs/1908.03731
Model-free reinforcement learning suffers from the high sampling complexity inherent to robotic manipulation and locomotion tasks. Most successful approaches use random sampling strategies, which leads to slow policy convergence. In this paper we present a novel approach for efficient exploration that leverages previously learned tasks. We exploit the fact that the same system is used across many tasks and build a generative model for exploration, based on data from previously solved tasks, to improve the learning of new tasks. The approach also enables continuous learning of improved exploration strategies as novel tasks are learned. Extensive simulations on a robot manipulator performing a variety of motion and contact interaction tasks demonstrate the capabilities of the approach. In particular, our experiments suggest that the exploration strategy can more than double learning speed, especially when rewards are sparse. Moreover, the algorithm is robust to task variations and parameter tuning, making it beneficial for complex robotic problems.
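The abstract does not say which generative model the authors use, so the following is only a toy sketch of the general idea: fit a distribution to actions recorded on previously solved tasks, then draw exploration actions from it instead of sampling uniformly. The diagonal Gaussian, the function names, and the synthetic `prev_actions` data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fit_exploration_model(prev_actions):
    """Fit a diagonal Gaussian (an assumed stand-in for the paper's
    generative model) to actions gathered on earlier tasks."""
    mean = prev_actions.mean(axis=0)
    std = prev_actions.std(axis=0) + 1e-6  # avoid a zero-width distribution
    return mean, std


def sample_exploration_action(model, rng, low=-1.0, high=1.0):
    """Draw one exploration action from the fitted model, clipped to the
    (hypothetical) actuator limits."""
    mean, std = model
    return np.clip(rng.normal(mean, std), low, high)


rng = np.random.default_rng(0)

# Hypothetical data: actions from earlier tasks cluster in a useful region.
prev_actions = rng.normal(loc=[0.5, -0.2], scale=0.1, size=(1000, 2))
model = fit_exploration_model(prev_actions)

# Informed exploration versus the uniform random baseline.
informed = np.array([sample_exploration_action(model, rng) for _ in range(500)])
uniform = rng.uniform(-1.0, 1.0, size=(500, 2))

print("informed spread:", informed.std(axis=0))
print("uniform spread: ", uniform.std(axis=0))
```

In this sketch the informed samples concentrate around the region that earlier tasks found useful, while the uniform baseline spreads over the whole action range. Refitting the model as data from newly solved tasks arrives would be one simple way to mimic the continuous improvement the abstract mentions.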