Abstract

We introduce a novel deep reinforcement learning (RL) approach called Movement Primitive-based Planning Policy (MP3). By integrating movement primitives (MPs) into the deep RL framework, MP3 generates smooth trajectories throughout the whole learning process while learning effectively from sparse and non-Markovian rewards. MP3 also retains the ability to adapt to changes in the environment during execution. Although many early successes in robot RL were achieved by combining RL with MPs, these approaches are often limited to learning single stroke-based motions and cannot adapt to task variations or adjust motions during execution. Our previous work introduced an episode-based RL method for the non-linear adaptation of MP parameters to different task variations. This paper extends that approach to incorporate replanning strategies, which allow the MP parameters to be adapted throughout motion execution and address the lack of online motion adaptation in stochastic domains that require feedback. We compared our approach against state-of-the-art deep RL methods and MP-based RL methods. The results demonstrated improved performance in sophisticated, sparse-reward settings and in domains requiring replanning.

MP3-BB: Black-Box Reinforcement Learning

Instead of generating an atomic action for each state, we generate a whole trajectory of actions at once using movement primitives. This enables us to efficiently handle sparse and even non-Markovian rewards. In this setting, reinforcement learning is treated as a black-box optimization problem. We refer to this setting as MP3-BB.
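The black-box view can be illustrated with a minimal sketch. It is not the MP3 implementation: the function names, the normalized radial basis functions, the toy return, and the cross-entropy-style search are all assumptions chosen for brevity. A weight vector is mapped to a full trajectory in one shot, the trajectory is scored by a single episodic return that depends only on its peak (a non-Markovian signal), and the search distribution over weights is updated from those returns alone.

```python
import numpy as np

def rbf_basis(num_steps, num_basis, width=0.05):
    """Normalized radial basis functions over a phase variable in [0, 1]."""
    t = np.linspace(0.0, 1.0, num_steps)[:, None]
    centers = np.linspace(0.0, 1.0, num_basis)[None, :]
    phi = np.exp(-0.5 * (t - centers) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)

def trajectory_from_weights(weights, num_steps=100):
    """Map one MP weight vector to a whole 1-D trajectory at once."""
    return rbf_basis(num_steps, weights.shape[0]) @ weights

def episodic_return(trajectory, target_height=1.0):
    """Toy non-Markovian return: depends only on the peak of the trajectory."""
    return -(trajectory.max() - target_height) ** 2

def black_box_search(num_basis=5, iterations=30, population=32, seed=0):
    """Cross-entropy-style search over MP weights (a stand-in optimizer)."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(num_basis), np.ones(num_basis)
    for _ in range(iterations):
        # Sample one weight vector per episode; each yields a single return.
        samples = mean + std * rng.standard_normal((population, num_basis))
        returns = np.array(
            [episodic_return(trajectory_from_weights(w)) for w in samples]
        )
        # Refit the search distribution to the best quarter of the samples.
        elite = samples[np.argsort(returns)[-(population // 4):]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

best_weights = black_box_search()
best_return = episodic_return(trajectory_from_weights(best_weights))
```

Because the optimizer only ever sees one number per episode, nothing in the loop requires per-step rewards, which is why this formulation tolerates sparse and non-Markovian reward signals.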

Results are shown for PPO, MP3-PPO-BB, and MP3-BB on the following tasks:

Hopper Jump (Maximum Height)

Box-Pushing (Dense Reward)

Box-Pushing (Sparse Reward)

Beerpong

Table Tennis

MP3-Replan: Replanning with Movement Primitives

By incorporating dynamics-based movement primitives, such as DMPs (Dynamic Movement Primitives) and ProDMPs (Probabilistic Dynamic Movement Primitives), into our framework, we gain the ability to adapt the motion during online execution. This capability enables us to handle unforeseen changes in the environment, such as shifting target positions or external perturbations. In contrast to the black-box setting MP3-BB, we refer to this setting as MP3-Replan.
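As a rough illustration of why dynamics-based primitives suit replanning, the sketch below integrates the goal-attractor part of a DMP-style second-order system and switches the goal partway through. It is a simplification under stated assumptions: the learned forcing term is omitted, and the function name, gains, step size, and goal sequence are hypothetical. The point it shows is that each new segment starts from the current position and velocity, so the motion stays continuous when the target shifts.

```python
import numpy as np

def rollout_with_replanning(goals, steps_per_segment=200, dt=0.005, alpha=25.0):
    """Integrate y'' = alpha * (beta * (g - y) - y') with a goal switch per segment.

    Each entry in `goals` stands in for one replanning step. The state (y, yd)
    is carried across segments, so position and velocity remain continuous.
    """
    beta = alpha / 4.0  # critically damped choice commonly used for DMPs
    y, yd = 0.0, 0.0
    positions = []
    for goal in goals:
        for _ in range(steps_per_segment):
            ydd = alpha * (beta * (goal - y) - yd)
            yd += ydd * dt  # semi-implicit Euler: update velocity first
            y += yd * dt
            positions.append(y)
    return np.array(positions)

# The target shifts from 1.0 to 0.5 after the first segment.
positions = rollout_with_replanning([1.0, 0.5])
```

A step-based policy would need to rediscover a smooth correction from per-step actions, whereas here the primitive's dynamics supply the smooth transition and only the goal (or, in the full method, the MP parameters) has to be updated.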