Using approximate models in robot learning
Trajectory following is a difficult control problem because its dynamics are nonlinear and stochastic and involve a large number of parameters. Two major difficulties are the large number of trials required for data collection and the heavy computation required to find a closed-loop controller for high-dimensional, stochastic domains. For problems of this type, given an appropriate reward function and dynamics model, an optimal control policy can be found using model-based reinforcement learning and optimal control algorithms. Because an accurate dynamics model cannot be defined for complicated problems, Pieter Abbeel and Andrew Ng recently presented an algorithm that requires only an approximate model and only a small number of real-life trials. This algorithm is widely applicable; however, there are some problems regarding its convergence. In this research, we present modifications that provide a stronger assurance of convergence to an optimal control policy. We also implement the updated algorithm and evaluate its efficiency by comparing the results obtained with human expert performance. We use DDP (Differential Dynamic Programming) as the local trajectory optimizer, and a 2D dynamics and kinematics simulator to evaluate the accuracy of the presented algorithm.
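The idea of learning from an approximate model plus a few real-life trials can be illustrated with a minimal sketch. It is not the algorithm evaluated in this work. The one-dimensional linear system, its constants, and all function names are hypothetical, and the model-based improvement step is a simple one-step solve rather than DDP. The sketch assumes that each real-life trial is used to add a time-indexed bias to the model so that the model reproduces the observed trajectory, and that candidate step sizes are then checked on the real system.

```python
import numpy as np

# Hypothetical 1-D "real" system; the learner only sees it through trials.
A_REAL, B_REAL = 0.9, 0.5
# Deliberately inaccurate model of the same system (assumed values).
A_MODEL, B_MODEL = 1.0, 0.4


def real_trial(x0, controls):
    """Run one trial on the real system and return the visited states."""
    states = [x0]
    for u in controls:
        states.append(A_REAL * states[-1] + B_REAL * u)
    return np.array(states)


def tracking_cost(states, target):
    """Sum of squared deviations from the target trajectory."""
    return float(np.sum((states - target) ** 2))


def improve_with_corrected_model(x0, controls, real_states, target):
    """Propose controls that track the target under the bias-corrected model."""
    # Time-indexed bias: with it, the model reproduces the observed trajectory.
    bias = real_states[1:] - (A_MODEL * real_states[:-1] + B_MODEL * controls)
    proposal = np.empty_like(controls)
    x_model = x0
    for t in range(len(controls)):
        # Solve the corrected model for the control that reaches target[t + 1].
        proposal[t] = (target[t + 1] - A_MODEL * x_model - bias[t]) / B_MODEL
        x_model = A_MODEL * x_model + B_MODEL * proposal[t] + bias[t]
    return proposal


def learn(x0, target, iterations=10, step_sizes=(1.0, 0.5, 0.25, 0.125)):
    """Alternate real-life trials with model-based improvement steps."""
    controls = np.zeros(len(target) - 1)
    states = real_trial(x0, controls)
    costs = [tracking_cost(states, target)]
    for _ in range(iterations):
        proposal = improve_with_corrected_model(x0, controls, states, target)
        # Line search: a step is accepted only if the real-life cost drops.
        for alpha in step_sizes:
            candidate = controls + alpha * (proposal - controls)
            cand_states = real_trial(x0, candidate)
            cand_cost = tracking_cost(cand_states, target)
            if cand_cost < costs[-1]:
                controls, states = candidate, cand_states
                costs.append(cand_cost)
                break
        else:
            break  # no step size improved the real-life cost
    return controls, costs


# Hypothetical target trajectory to follow, starting from x0 = 0.
target = np.sin(0.3 * np.arange(21))
controls, costs = learn(0.0, target)
```

Because a step is accepted only when it lowers the cost measured on the real system, the recorded costs decrease from one accepted iteration to the next, even though the model parameters never match the real ones.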