Advancing Data-Driven Robotics
with Transfer and Curriculum Learning

written by

Dániel Horváth


Zoltán Istenes
Supervisor
ELTE

Ferenc G. Erdős
Industrial Supervisor
SZTAKI

Fabien Moutarde
Campus France Supervisor
MINES Paris


Documents

PhD Thesis 🇬🇧

PhD Thesis Booklet 🇬🇧

PhD Thesis Booklet (in Hungarian) 🇭🇺


Abstract

The deep learning revolution has fundamentally reshaped numerous fields, including robotics. However, as elsewhere, certain challenges must be overcome to exploit the power of deep learning algorithms and create truly adaptive, intelligent robots. The difficulty lies less in achieving adult-level intelligence than in mastering the skills of perception and mobility, an observation known as Moravec's paradox. In this context, the key issues are transferability and universality. This thesis addresses data-driven robotics, with a focus on transfer and curriculum learning. My main contributions are as follows.

Robots operating in unstructured environments need to sense and interpret their surroundings effectively. A major challenge for deep learning models in robotics is the lack of domain-specific labelled data for various industrial applications. To bridge the reality gap, I developed a sim2real transfer learning method based on domain randomization for object detection (S2R-ObjDet), enabling the automatic generation of labelled synthetic data. In addition, I propose the generalised confusion matrix (GCM), which addresses the limitations of classical precision-recall-based metrics. I also introduce a public, annotated real-world dataset of industrial objects (InO-10-190) for evaluating sim2real object detection methods.

In object manipulation, it is essential to estimate not only object positions but also their poses. Thus, I propose two vision-based, multi-object grasp pose estimation models – the real-time MOGPE-RT and the high-precision MOGPE-HP – as well as the extension of the S2R-ObjDet method to pose estimation (S2R-PosEst). This framework provides an industrial tool for rapid data generation and model training while requiring minimal data from the target distribution.

Reinforcement learning – inspired by human learning – aims to offer a universal solution to various problems. Nevertheless, the field of robotics poses significant challenges. To facilitate the exploration of reinforcement learning robot agents, I propose a data exploitation curriculum learning method called highlight experience replay (HiER). The experimental results demonstrate that HiER significantly improves upon state-of-the-art baselines, exhibiting stochastic dominance over them. To further enhance HiER, I introduce HiER+, which integrates an arbitrary data collection curriculum learning method, for which I propose the easy2hard initial state entropy method (E2H-ISE).

Although the results presented in this thesis are my own, I will henceforth use plural wording for stylistic purposes.


Introduction

Deep learning (DL) is often regarded as the flagship of the modern artificial intelligence (AI) revolution. It has significantly transformed numerous fields, including robotics. However, several challenges remain to be solved in order to fully harness the potential of DL algorithms and develop truly adaptive, intelligent robots. This thesis tackles some of the key challenges in data-driven robotics, with a particular emphasis on transfer and curriculum learning. Due to space constraints, readers are encouraged to refer to the thesis for a more detailed introduction.

Sim2Real Knowledge Transfer for Object Detection

Robots operating in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles to deep-learning-based models in the field of robotics is the lack of domain-specific labelled data for different industrial applications. Thus, our first research question is the following: How to transfer knowledge from simulation to the real world in the case of object detection? Our theses regarding the first research question are as follows:


Thesis I: The synthetic images generated by our sim2real domain randomization method (S2R-ObjDet) enable object detection models to learn general representations of the objects, thereby bridging the gap between simulation and real-world environments.


We propose S2R-ObjDet, a domain-randomization-based sim2real synthetic data generation method for object detection. The 3D models of the given objects are loaded in the simulator, each with a random texture or monochromatic colour. Both the number and the types of objects are randomised. Simulating gravity, the objects are dropped onto a plane, where they settle into one of their stable positions. The camera's extrinsic and intrinsic parameters are set randomly, with constraints ensuring that the objects remain in the field of view. After an image is rendered, a post-processing step is applied, involving multi-colour salt-and-pepper noise, Gaussian blur, and optionally rectangular, circular, and line cutouts. The ground truth annotation of each object is computed automatically based on all points of the object rather than the eight corner points of its axis-aligned bounding box. This process is repeated until the required number of training images is generated. S2R-ObjDet is capable of shrinking the reality gap between simulation and the real world to a satisfactory level, achieving mAP50 scores of 86.32% and 97.38% for zero-shot and one-shot transfer, respectively, on our publicly available, manually annotated InO-10-190 dataset, which contains 190 real images with 920 object instances from 10 classes. The classes were selected to include both dissimilar and similar objects, testing the model's robustness in detecting distinct classes as well as differentiating between similar ones. Our solution fits industrial needs, as data generation requires less than 0.5 s per image, enabling a fast training process. The training pipeline is presented in Fig. 1. This thesis is associated with [1].
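
As an illustration of the post-processing step, the following minimal sketch applies multi-colour salt-and-pepper noise and a Gaussian blur to an RGB image with NumPy; the function name and parameter values are illustrative, not the actual S2R-ObjDet implementation.

    import numpy as np

    def postprocess(img, noise_p=0.02, blur_sigma=1.0, seed=None):
        # img is assumed to be an H x W x 3 uint8 RGB image.
        rng = np.random.default_rng(seed)
        out = img.copy()
        # Multi-colour salt-and-pepper: a random subset of pixels is
        # overwritten with random colours, not only black and white.
        mask = rng.random(out.shape[:2]) < noise_p
        out[mask] = rng.integers(0, 256, size=(int(mask.sum()), 3), dtype=np.uint8)
        # Separable Gaussian blur: one 1D kernel convolved along both axes.
        radius = max(1, int(3 * blur_sigma))
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-x**2 / (2 * blur_sigma**2))
        kernel /= kernel.sum()
        blurred = out.astype(np.float32)
        for axis in (0, 1):
            blurred = np.apply_along_axis(
                lambda row: np.convolve(row, kernel, mode="same"), axis, blurred)
        return np.clip(blurred, 0, 255).astype(np.uint8)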



Figure 1. Top. Pipeline of knowledge transfer. Bottom. Flowchart diagram of our data generation, training, and evaluation process. The picture of the Boston bull is from ImageNet.


Thesis II: In object detection, misclassifications, false positives, and false negatives – factors not captured by traditional metrics – can be effectively quantified and evaluated using our generalised confusion matrix (GCM).


Our novel generalised confusion matrix (GCM) – depicted in Fig. 2 – is an adaptation of the classical confusion matrix to object detection. It addresses the limitations of the traditional precision-recall-based mAP and f1 scores. Using the GCM, errors arising from misclassifications, false positives, and false negatives can be effectively quantified and evaluated. Compared to the traditional confusion matrix \( \boldsymbol{D} \in \mathbb{N}^{C \times C} \), where \( C \in \mathbb{N} \) is the number of classes, our GCM \( \boldsymbol{D}^\text{gen} \in \mathbb{N}^{(C+1) \times (C+1)} \) adds one extra row and one extra column to account for the false positive and false negative cases. The correct detections lie on the diagonal, \( D^\text{gen}_{i,i} \), as in the standard confusion matrix, and \( D^\text{gen}_{C+1,C+1} \doteq 0 \) by definition. This thesis is associated with [1].
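
As a minimal sketch, the GCM can be assembled from matched and unmatched boxes as follows; the convention of rows as ground truth and columns as predictions is an assumption made here (see Fig. 2 for the exact layout).

    import numpy as np

    def build_gcm(matches, unmatched_gt, unmatched_det, num_classes):
        # matches:       (gt_class, pred_class) pairs of matched boxes
        # unmatched_gt:  classes of ground-truth boxes with no detection (FN)
        # unmatched_det: classes of detections with no ground-truth match (FP)
        # Convention assumed here: rows = ground truth, columns = predictions,
        # with index num_classes as the extra row/column.
        D = np.zeros((num_classes + 1, num_classes + 1), dtype=int)
        for gt, pred in matches:
            D[gt, pred] += 1           # diagonal: correct; off-diagonal: misclassified
        for gt in unmatched_gt:
            D[gt, num_classes] += 1    # false negatives fill the extra column
        for pred in unmatched_det:
            D[num_classes, pred] += 1  # false positives fill the extra row
        return D                       # D[num_classes, num_classes] stays 0 by definition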



Figure 2. Generalised confusion matrix (GCM).

Short video presentation at ICRA 2023:

For further details, we refer the reader to the project page: https://www.danielhorvath.eu/sim2real


Sim2Real Grasp Pose Estimation

In the previous section, our sim2real domain randomization method was presented, focusing on object detection. Nevertheless, in object manipulation, it is essential to estimate not only object positions but also their orientations. Thus, our second research question is the following: How to extend our S2R-ObjDet method to multi-object grasp pose estimation? Our theses regarding the second research question are as follows:


Thesis III: Our novel two-stage multi-object grasp pose estimation methods – the real-time MOGPE-RT and the high-precision MOGPE-HP – enable a modular training approach for multi-object grasp pose estimation by utilizing sequential phases of object detection and class-specific orientation estimation.


We propose two vision-based, multi-object grasp pose estimation models – the real-time MOGPE-RT and the high-precision MOGPE-HP – depicted in Fig. 3. Both models are built upon two core components: an object detection model and an orientation estimation model. The output of the object detection model is \( \boldsymbol{y} = \{(\boldsymbol{b}_i,c^\text{class}_i,p^\text{con}_i) \mid i = 1,2, \ldots, N \} \), where \( \boldsymbol{b}_i = [ x_i, y_i, w_i, h_i ] \in [0,1]^4 \) represents the axis-aligned bounding box of the \( i \)th detection, \( c^\text{class}_i \in \mathbb{N} \) is the class label of the \( i \)th detection, \( p^\text{con}_i \in [0,1] \) is the confidence score of the \( i \)th detection, and \( N \in \mathbb{N} \) is the number of detected objects. Detections with \( p^\text{con}_i < \tau_\text{con} \) are filtered out, where \( \tau_\text{con} \in [0,1] \) is the confidence threshold. The ROI cropping module extracts the detected objects from the image and resizes them to the appropriate dimensions and shape. The class-specific orientation estimation models compute \( \sin(\theta_i) \) and \( \cos(\theta_i) \) for all objects, where \( \theta_i \in [-\pi, \pi] \) is the orientation angle. The \( \theta_i \) angles are then recovered with the \( \operatorname{atan2} \) function; these constitute the output of the MOGPE-RT model. In the case of the MOGPE-HP model, an additional local pattern-matching algorithm is incorporated, allowing the estimation of a more precise \( \theta^* \in [-\pi, \pi] \) at the expense of extra computation. This thesis is associated with [4].
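
The two-stage inference can be sketched as follows; `detector`, `orientation_models`, and the cropping helper are hypothetical interfaces standing in for the actual components, and resizing the ROI to the model input is omitted.

    import math

    def crop_roi(image, bbox):
        # bbox = [x, y, w, h] in normalised coordinates; (x, y) is assumed
        # here to be the top-left corner.
        H, W = image.shape[:2]
        x, y, w, h = bbox
        r, c = int(y * H), int(x * W)
        return image[r:r + int(h * H), c:c + int(w * W)]

    def mogpe_rt(image, detector, orientation_models, tau_con=0.5):
        # detector(image) is assumed to yield (bbox, class_id, confidence);
        # orientation_models[class_id](roi) returns (sin_theta, cos_theta).
        poses = []
        for bbox, class_id, confidence in detector(image):
            if confidence < tau_con:        # filter low-confidence detections
                continue
            roi = crop_roi(image, bbox)     # ROI cropping module
            sin_t, cos_t = orientation_models[class_id](roi)
            theta = math.atan2(sin_t, cos_t)  # angle in [-pi, pi]
            poses.append((bbox, class_id, theta))
        return poses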



Figure 3. Top. Illustration of our S2R-ObjDet and S2R-PosEst methods. Bottom. Flowchart diagram of our multi-object grasp pose estimation (MOGPE) methods.


Thesis IV: Our novel S2R-PosEst method facilitates rapid synthetic data generation for single-class orientation estimation models, effectively bridging the reality gap.


We propose S2R-PosEst, a sim2real domain randomization method for pose estimation, based on our S2R-ObjDet method. The 3D model of the given object is placed in the simulator and rotated around the z-axis – perpendicular to the plane on which the object rests – while random textures are applied to both the plane and the object. Altogether, there are \( n_{\text{rot}} = \lfloor \frac{2 \pi}{\beta_{\text{res}}} \rfloor \) rotations, where \( n_{\text{rot}} \in \mathbb{N} \) is the number of rotations and \( \beta_{\text{res}} \in \mathbb{R} \) is the angular resolution in radians. For each rotation, an image is rendered and its label is generated automatically. The data generation requires 0.25–0.5 s per image, making it suitable for industrial applications. This thesis is associated with [4].
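
As a direct transcription of the formula above: with a resolution of \( \beta_{\text{res}} = 2° \), the sketch below yields \( n_{\text{rot}} = 180 \) rendering angles per object (the epsilon guard against floating-point rounding is an implementation detail added here, not part of the method).

    import math

    def rotation_angles(beta_res):
        # beta_res: angular resolution in radians, e.g. math.radians(2).
        n_rot = math.floor(2 * math.pi / beta_res + 1e-9)  # number of rotations
        return [i * beta_res for i in range(n_rot)]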


Qualitative evaluation:

Short video presentation at IFAC World Congress 2023:

For further details, we refer the reader to the project page: https://www.danielhorvath.eu/mogpe


Highlight Experience Replay

In the previous sections, the main focus was on transferring knowledge from simulation to the real world for supervised learning problems, namely object detection and pose estimation. Nonetheless, the endeavour for adaptive robots is coupled not only with transferability but with universality as well. It is important to note that universal solutions are – by definition – easily transferable. An important building block in this attempt might be reinforcement learning (RL). Similarly to humans, RL algorithms learn from trial and error through interactions with the environment. Compared to supervised learning, RL is especially beneficial for robotic tasks that require a high level of dexterity. Nevertheless, the field of robotics poses significant challenges, as the state and action spaces are continuous and the reward function is predominantly sparse. Furthermore, in many cases, the agent has no access to any form of demonstration. Thus, our third research question is the following: How to improve the training process of state-of-the-art reinforcement learning algorithms with curriculum learning? Our theses regarding the third research question are as follows:


Thesis V: Our novel highlight experience replay (HiER) method enhances the training of reinforcement learning agents by separately storing and replaying the most relevant experiences, leading to a significant improvement in state-of-the-art performance.


Inspired by human learning, we propose HiER, the highlight experience replay method. A secondary experience replay buffer is created to store the most relevant transitions. During training, transitions are sampled from both the standard experience replay buffer and the highlight experience replay buffer. HiER can be added to any off-policy RL agent and applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). HiER is depicted in Fig. 4 and detailed in Algorithm 1. If only positive experiences are stored in its buffer, HiER can also be viewed as a special, automatic demonstration generator. HiER is classified as a data exploitation or implicit curriculum learning method. It significantly improves the performance of RL baselines, exhibiting stochastic dominance over the state-of-the-art, as validated on eight tasks from three robotic benchmarks. This thesis is associated with [2].
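
The core two-buffer mechanism can be sketched as follows; the buffer interface, the fixed sampling ratio `hier_ratio`, and the highlight criterion are illustrative simplifications (HiER's actual sampling strategy is detailed in Algorithm 1).

    import random

    class HiERBuffers:
        # Illustrative storage: a `hier_ratio` share of each batch is drawn
        # from the highlight buffer, the rest from the standard buffer.
        def __init__(self, hier_ratio=0.25):
            self.standard = []    # regular experience replay buffer
            self.highlight = []   # secondary buffer for the most relevant transitions
            self.hier_ratio = hier_ratio

        def store(self, transition, is_highlight):
            self.standard.append(transition)
            if is_highlight:  # e.g. from an episode whose return exceeds a threshold
                self.highlight.append(transition)

        def sample(self, batch_size):
            n_hi = min(int(self.hier_ratio * batch_size), len(self.highlight))
            batch = random.sample(self.highlight, n_hi)
            batch += random.sample(self.standard,
                                   min(batch_size - n_hi, len(self.standard)))
            return batch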



Figure 4. Overview of HiER and HiER+.


Thesis VI: Our novel HiER+ approach enhances our highlight experience replay (HiER) method by increasing the availability of positive experiences – achieved through controlling task difficulty – particularly during the early stages of the training.


We propose HiER+, an enhancement of HiER with an arbitrary data collection (traditional) curriculum learning method. The overview of HiER+ is depicted in Fig. 4 and detailed in Algorithm 2. Furthermore, as an example of a data collection CL method, we propose E2H-ISE, a universal, easy-to-implement easy2hard method that requires minimal prior knowledge and controls the initial state-goal entropy (ISE) distribution \( \mathcal{H}(\mu_0) \), which in turn indirectly controls the task difficulty. Our experimental results show that HiER+ further improves HiER's performance. Moreover, HiER+ demonstrates stochastic dominance over HiER, based on the results from three robotic tasks of the Panda-Gym benchmark. This thesis is associated with [2].
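
A minimal sketch of the easy2hard idea, under an assumed uniform parametrisation; the concrete schedule and distribution used by E2H-ISE may differ (see Algorithm 2).

    import numpy as np

    def sample_initial_state(progress, centre, max_radius, rng=None):
        # progress in [0, 1]: widening the uniform support around `centre`
        # raises the entropy H(mu_0) of the initial state-goal distribution,
        # so early episodes start near easy configurations and later ones
        # cover the full task.
        rng = rng or np.random.default_rng()
        radius = progress * max_radius
        return centre + rng.uniform(-radius, radius, size=centre.shape)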


Qualitative evaluation:

Short video presentation:

For further details, we refer the reader to the project page: https://www.danielhorvath.eu/hier