1 Center for Robotics, MINES ParisTech, PSL University, Paris, France
2 Institute for Computer Science and Control, Hungarian Research Network, Budapest, Hungary
3 CoLocation Center for Academic and Industrial Cooperation, Eötvös Loránd University, Budapest, Hungary
* Corresponding author: Dániel Horváth (daniel.horvath@sztaki.hu)
Paper |
Results |
Presentation |
Code |
Citation
Even though reinforcement-learning-based algorithms have achieved superhuman performance in many domains, robotics remains challenging: the state and action spaces are continuous, and the reward function is predominantly sparse. In this work, we propose (1) HiER: highlight experience replay, which maintains a secondary replay buffer for the most relevant experiences; (2) E2H-ISE: an easy-to-hard data collection curriculum learning method that controls the entropy of the initial state-goal distribution; and (3) HiER+: the combination of HiER and E2H-ISE.
Fig. 1: Overview of HiER+.
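The core mechanism of HiER is a secondary replay buffer that stores the most relevant (highlight) experiences alongside the standard buffer, and mixes the two when sampling training batches. Below is a minimal, illustrative sketch of this idea; the class and parameter names are ours, not the paper's implementation.

```python
# Illustrative sketch of the HiER idea (not the authors' implementation):
# keep a secondary "highlight" buffer for episodes whose return exceeds a
# threshold lambda, and draw a fraction xi of every batch from it.
import random
from collections import deque

class HiERBuffers:
    def __init__(self, capacity=1_000_000, lam=0.0, xi=0.25):
        self.standard = deque(maxlen=capacity)   # regular experience replay
        self.highlight = deque(maxlen=capacity)  # highlight experience replay
        self.lam = lam   # episode-return threshold (lambda)
        self.xi = xi     # fraction of each batch drawn from highlights (xi)

    def store_episode(self, transitions, episode_return):
        self.standard.extend(transitions)
        # Relevant episodes (here: return above lambda) are additionally
        # stored in the highlight buffer.
        if episode_return > self.lam:
            self.highlight.extend(transitions)

    def sample(self, batch_size):
        # Mix the two buffers: xi * batch_size highlight transitions,
        # the rest from the standard buffer.
        n_hl = min(int(self.xi * batch_size), len(self.highlight))
        batch = random.sample(self.highlight, n_hl)
        batch += random.sample(self.standard,
                               min(batch_size - n_hl, len(self.standard)))
        return batch
```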
Our contributions were validated on 8 tasks from 3 robotic benchmarks: Panda-Gym, Gymnasium-Robotics Fetch, and Gymnasium-Robotics PointMaze.
For the experiments, SAC was used as the base RL algorithm. For further details, we refer the reader to the article.
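As a hedged illustration of such a baseline setup (not the authors' exact configuration or hyperparameters), SAC with hindsight experience replay can be trained on a Panda-Gym task using Stable-Baselines3:

```python
# Baseline sketch: SAC + HER on a Panda-Gym task with Stable-Baselines3.
# Hyperparameters are placeholders, not the values used in the article.
import gymnasium as gym
import panda_gym  # registers the Panda* environments on import
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaPush-v3")
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4,
                              goal_selection_strategy="future"),
    verbose=1,
)
model.learn(total_timesteps=300_000)
```

The same pattern works with DDPG or TD3 as the base off-policy algorithm by swapping the class.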
Our experimental results show that the HiER versions significantly outperform their corresponding baselines. Fig. 2 shows the results aggregated across the 8 tasks for all metrics.
Fig. 2: Aggregated results on all tasks.
The performance profiles depicted in Fig. 3 show that HiER and HiER[HER] are stochastically dominant over their baselines.
Fig. 3: Performance profiles.
Additionally, Fig. 4 shows the probability of improvement.
Fig. 4: Probability of improvement.
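These aggregated metrics (interquartile-mean aggregates, performance profiles, probability of improvement) follow the evaluation protocol of Agarwal et al. (2021). As an illustration, assuming the rliable library and using placeholder scores:

```python
# Sketch of computing an aggregated metric (IQM) with rliable;
# the score arrays below are random placeholders, not real results.
import numpy as np
from rliable import library as rly, metrics

# algorithm name -> (n_runs x n_tasks) array of final success rates
score_dict = {
    "SAC": np.random.rand(10, 8),       # placeholder data
    "SAC+HiER": np.random.rand(10, 8),  # placeholder data
}
iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, iqm, reps=2000  # bootstrap over runs
)
print(point_estimates)
```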
Our experimental results, presented in Figs. 5-8 and Tabs. 1-3, show that HiER significantly outperforms the baselines. Additionally, HiER+ further improves on HiER, while E2H-ISE alone slightly improves the baselines.
Fig. 5: The push, slide, and pick-and-place tasks of the Panda-Gym benchmark with the learning curves of selected algorithms.
Fig. 6: Aggregated results on the push, slide, and pick-and-place tasks of the Panda-Gym benchmark.
Fig. 7: Performance profiles on the push, slide, and pick-and-place tasks of the Panda-Gym benchmark.
Fig. 8: Probability of improvement on the push, slide, and pick-and-place tasks of the Panda-Gym benchmark.
Tab. 1: HiER and HiER+ compared to the state-of-the-art based on success rates on the Panda-Gym robotic benchmark.
Tab. 2: HiER and HiER+ compared to the state-of-the-art based on success rates on the Panda-Gym robotic benchmark.
Tab. 3: HiER and HiER+ compared to the state-of-the-art based on accumulated reward on the Panda-Gym robotic benchmark.
Our experimental results are depicted in Figs. 9-10. In all cases, a version of HiER yields the best score.
Fig. 9: The push, slide, and pick-and-place tasks of the Gymnasium-Robotics Fetch benchmark with the learning curves of selected algorithms.
Fig. 10: Aggregated results on the push, slide, and pick-and-place tasks of the Gymnasium-Robotics Fetch benchmark.
Our experimental setup and the results are depicted in Figs. 11-12.
Fig. 11: The tasks of the Gymnasium-Robotics PointMaze benchmark.
Fig. 12: The learning curves of the tasks of the Gymnasium-Robotics PointMaze benchmark.
Our results are depicted in Figs. 13-15.
Fig. 13: The analysis of HiER versions.
Fig. 14: The different HiER λ methods on the slide task of the Panda-Gym benchmark.
Fig. 15: The different HiER ξ methods on the slide task of the Panda-Gym benchmark.
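Figs. 14-15 ablate how λ (the highlight-inclusion threshold) and ξ (the highlight sampling ratio) are set; the exact update rules are given in the article. As one hypothetical illustration of an adaptive λ strategy, a moving-average threshold admits only above-average episodes into the highlight buffer:

```python
# Hypothetical adaptive lambda: the threshold tracks an exponential
# moving average of recent episode returns (our sketch, not the paper's
# exact rule). A fixed xi would simply keep the sampling ratio constant.
class MovingAverageLambda:
    def __init__(self, alpha=0.05, init=0.0):
        self.alpha = alpha  # smoothing factor of the moving average
        self.value = init   # current threshold lambda

    def update(self, episode_return):
        # Shift lambda toward the returns actually being observed, so
        # only episodes better than the recent average are highlighted.
        self.value = (1 - self.alpha) * self.value + self.alpha * episode_return
        return self.value
```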
Our results are depicted in Fig. 16.
Fig. 16: The analysis of E2H-ISE versions on the slide task of the Panda-Gym benchmark.
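As a hedged sketch of the easy-to-hard initial-state-entropy idea (our names and update rule, not the paper's exact method), a scalar ψ in [0, 1] can scale the spread of the initial state-goal distribution and grow as the agent improves:

```python
# Illustrative E2H-ISE-style schedule: psi = 0 means the narrowest
# (easiest) initial state-goal distribution; psi is raised once the
# agent masters the current difficulty. Names are hypothetical.
import numpy as np

class E2HISESchedule:
    def __init__(self, psi=0.0, step=0.05, target_success=0.8):
        self.psi = psi
        self.step = step
        self.target_success = target_success

    def update(self, recent_success_rate):
        # Widen the initial distribution when the success rate is high.
        if recent_success_rate >= self.target_success:
            self.psi = min(1.0, self.psi + self.step)
        return self.psi

    def sample_goal(self, center, max_radius, rng=np.random):
        # Sample goals uniformly within a radius scaled by psi.
        radius = self.psi * max_radius
        offset = rng.uniform(-radius, radius, size=np.shape(center))
        return np.asarray(center) + offset
```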
Our results are depicted in Fig. 17.
Fig. 17: HiER+ with DDPG and TD3 on the slide task of the Panda-Gym benchmark.
@article{horvath_hier_2024,
  title   = {{HiER}: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents},
  author  = {Horváth, Dániel and Bujalance Martín, Jesús and Gábor Erdős, Ferenc and Istenes, Zoltán and Moutarde, Fabien},
  journal = {IEEE Access},
  volume  = {12},
  pages   = {100102--100119},
  year    = {2024},
  issn    = {2169-3536},
  doi     = {10.1109/ACCESS.2024.3427012},
  url     = {https://ieeexplore.ieee.org/document/10595054},
}