User:Simonjung/Robot Behavior Programming by Demonstration

Robot Behavior Programming by Demonstration ...

Algorithm Overview
Their approach is based on the observation that each demonstrated robot action can be described as a linear combination of primitive relative trajectories. For instance, grasping an object can be described by the distance vectors between the robot's hands and the object. The problem is therefore posed as Blind Source Separation (BSS). The overall procedure is as follows. First, the latent space of the motion is determined by linearly projecting the data onto a lower-dimensional subspace using Principal Component Analysis (PCA). Second, the signals are temporally aligned using Dynamic Time Warping (DTW). Third, a probabilistic representation of the data in the latent space is obtained by estimating the optimal Gaussian Mixture Model (GMM) and Bernoulli Mixture Model (BMM) to encode the motion.
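The first two steps of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: PCA is done via SVD of the centered data, and DTW is the classic dynamic-programming alignment; the GMM/BMM fitting step that would follow is omitted.

```python
import numpy as np

def pca_project(X, n_components):
    """Project data X (T x D) onto its top principal components via SVD."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T          # D x k projection matrix
    return Xc @ W, W, mu

def dtw_path(a, b):
    """Classic dynamic-programming DTW between two signals a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.atleast_1d(a[i - 1]) - np.atleast_1d(b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], D[n, m]
```

After projecting each demonstration with `pca_project` and warping all episodes onto a common time base with `dtw_path`, a mixture model would be fit to the aligned latent trajectories.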

Experiments
The authors used three different scenarios for their experiments: first, moving a chess piece; second, grasping a bucket and moving it forward; third, grasping a sugar cube and bringing it to the robot's head. These scenarios clearly demonstrate that the robot can choose latent motion spaces appropriately. When moving the arms toward the bucket to grasp it, the latent space should mainly consist of the distances between the robot's hands and the bucket; when moving objects to fixed positions or along fixed directions, it should mainly consist of the joint angles of the robot's arms. However, the experiments do not clearly show how well the robot can generalize its learned motions, because each episode involves only small positional changes.

Discussion
The authors pose the problem as BSS because they assume that the different movement profiles carry redundant information when describing the same motion. However, our experience suggests otherwise. When an episode consists of a series of actions executed with respect to different objects in different spaces (Cartesian or joint), positional changes in each episode affect different actions to different degrees. Hence, we observed that only a small number of profiles describe each action appropriately when arbitrary configuration changes (e.g. different initial positions of the robot arms and objects) are introduced in each episode. It may therefore be a better idea to explicitly find the profiles that carry the dominant information.
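One hedged sketch of such an explicit selection: given time-aligned episodes of one action, score each candidate profile by how much it moves within an episode relative to how much it varies across episodes, and keep the top scorers. The scoring heuristic and the function name are my own, not from the paper.

```python
import numpy as np

def dominant_profiles(episodes, top_k=1):
    """Rank candidate profiles by cross-episode consistency.

    episodes: array of shape (E, P, T) -- E episodes, P candidate
    profiles (e.g. hand-object distances, joint angles), T timesteps,
    assumed already time-aligned (e.g. via DTW).

    A profile that genuinely describes the action should look similar
    across episodes even when initial configurations differ, so each
    profile is scored by the ratio of its average within-episode motion
    range to its across-episode variance.
    """
    episodes = np.asarray(episodes, dtype=float)
    motion = np.ptp(episodes, axis=2).mean(axis=0)   # (P,) mean range per profile
    spread = episodes.var(axis=0).mean(axis=1)       # (P,) across-episode variance
    score = motion / (spread + 1e-9)
    order = np.argsort(score)[::-1]
    return order[:top_k], score
```

A profile that repeats consistently across episodes (high motion, low spread) wins over one that merely reflects the episode's arbitrary initial configuration.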

The training is done with a single signal, built from the set of original signals by regression using GMMs. However, as mentioned earlier, configuration changes in each episode may affect different actions and different profiles to different degrees. Hence, there is a high chance that the generalized signal fails to encapsulate all the essential features of the original data over time. For instance, the first action of the demonstration may be well explained by a certain profile over multiple episodes, while the second action may be well explained by a totally different profile. Unless the regression result using GMMs can capture this, it may compromise the training results. In this sense, I believe it is wiser to use all of the original data, with proper segmentation, during training.
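For reference, the regression step in question is Gaussian Mixture Regression: the generalized signal is the expectation of the output conditioned on time under a GMM fit jointly over (time, output). A minimal 1-D version, with illustrative hand-set mixture parameters rather than parameters fit by EM, might look like this:

```python
import numpy as np

def gmr(t_query, priors, means, covs):
    """Gaussian Mixture Regression: condition a GMM over (t, x) on time t.

    priors: (K,) mixture weights; means: (K, 2) with [t, x] per component;
    covs: (K, 2, 2) full covariances.  Returns E[x | t] for each query time.
    """
    t_query = np.atleast_1d(np.asarray(t_query, dtype=float))
    out = np.empty_like(t_query)
    for i, t in enumerate(t_query):
        # Responsibility h_k(t) of each component for this time step.
        h = np.array([p * np.exp(-0.5 * (t - m[0]) ** 2 / c[0, 0])
                      / np.sqrt(2.0 * np.pi * c[0, 0])
                      for p, m, c in zip(priors, means, covs)])
        h /= h.sum()
        # Per-component conditional mean of x given t, blended by h.
        cond = np.array([m[1] + c[1, 0] / c[0, 0] * (t - m[0])
                         for m, c in zip(means, covs)])
        out[i] = h @ cond
    return out
```

Because the output at each time step is a single blended mean, features that only some episodes or profiles exhibit are averaged away, which is exactly the concern raised above.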

The training result is a weighting matrix that can be used to reproduce the trajectory; the relationship between the robot's motor resources and the objects is not explicitly represented. Because this relation remains implicit, the result leaves little room for further optimization or dynamic adaptation. I therefore believe it is wise to adopt a more explicit representation of the relation between the robot's motor resources and the objects.
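As one hypothetical shape such an explicit representation could take (the names and structure here are my own illustration, not part of the paper): each action segment is bound to a motor resource, an optional target object, and the frame in which the motion is defined, so a planner can inspect and re-target the binding later.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Frame(Enum):
    JOINT = auto()            # motion defined in joint-angle space
    OBJECT_RELATIVE = auto()  # motion defined as Cartesian offsets from an object

@dataclass
class ActionBinding:
    """Hypothetical explicit binding between a motor resource and an object.

    Making this relation a first-class object, rather than leaving it
    implicit in a learned weighting matrix, gives later optimization or
    online adaptation something concrete to operate on.
    """
    motor_resource: str               # e.g. "left_arm"
    target_object: Optional[str]      # None for object-independent motions
    frame: Frame

# Example: reach toward the bucket, then lift in joint space.
reach = ActionBinding("left_arm", "bucket", Frame.OBJECT_RELATIVE)
lift = ActionBinding("left_arm", None, Frame.JOINT)
```

With such bindings, swapping the target object or re-optimizing a single segment becomes a local change instead of retraining the whole weighting matrix.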