[Review] RL for coverage planning and path generation for robotic inspection

November 25, 2018 · 6 minute read

A Computational Framework for Automatic Online Path Generation of Robotic Inspection Tasks via Coverage Planning and Reinforcement Learning

W. Jing et al., IEEE Access, vol. 6, pp. 54854-54864, 2018. doi: 10.1109/ACCESS.2018.2872693

The Problem

Coverage Path Planning (CPP) is the task of finding an optimal path over a given surface or shape such that an agent following it covers the whole surface. Coverage can mean physical traversal or, as in this case, what a camera sees. The problem contains two subproblems: the View Planning Problem (VPP) and the Path Planning Problem. This paper deals with inspection by a single agent (an industrial arm) of a single known object, with uncertainty in pose, surface variation and measurement noise.

Aims of the paper

This paper presents a novel method for solving the View Planning Problem online, in particular for generating inspection policies for target workspaces of varying sizes and geometries. It does so by:

• Presenting a formal approach by modelling the problem as a Markov Decision Process (MDP)
• Explaining their developed framework in detail.
• Providing justification for their framework through experimentation
• Providing a comparison to the current best method

Paper Summary

The View Planning Problem is an NP-hard set covering problem, and path planning over the chosen viewpoints is usually modelled as a Travelling Salesman Problem; the two are optimized as one process to generate the best results. Often these are offline planning models that require accurate placement of the inspected object, which does not happen in practice. The best online planners, which can adapt on the fly, are the Next-Best-View family of methods. These select the next viewpoint greedily but may generate a less efficient plan.
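To make the greedy Next-Best-View idea concrete, here is a minimal sketch of greedy set cover over a visibility table. The function name, the viewpoint ids and the toy data are all hypothetical and not taken from the paper; travel time is ignored entirely, which is one reason a greedy plan can end up less efficient overall.

```python
from typing import Dict, List, Set


def greedy_next_best_view(visible: Dict[str, Set[int]], required: Set[int]) -> List[str]:
    """Pick viewpoints one at a time, always taking the one that adds the most new patches."""
    covered: Set[int] = set()
    chosen: List[str] = []
    while not required <= covered:
        # Gain of a viewpoint = number of still-uncovered required patches it sees.
        best = max(visible, key=lambda v: len((visible[v] & required) - covered))
        gain = (visible[best] & required) - covered
        if not gain:
            break  # the remaining patches are not visible from any viewpoint
        covered |= gain
        chosen.append(best)
    return chosen


# Hypothetical toy instance: 6 surface patches, 4 candidate viewpoints.
visible = {
    "v0": {0, 1, 2, 3},
    "v1": {0, 1, 4},
    "v2": {2, 3, 5},
    "v3": {4, 5},
}
plan = greedy_next_best_view(visible, required=set(range(6)))
print(plan)  # ['v0', 'v3']
```

Each step only looks at the immediate coverage gain, so nothing stops the planner from picking a viewpoint that is far away or that forces a poor visiting order later on.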

MDP formulation

The robotic surface shape inspection is modelled as a CPP whose solution is a motion plan consisting of:

• a set of viewpoints that cover the required areas
• paths to move the 3D scanner between viewpoints
• a visiting sequence (order)

The objective is to minimize the total cycle time, which is the sum of the inspection time and the travelling time. An MDP can model this as an episodic setting with:

• Actions: Choosing a viewpoint and moving the robot to the viewpoint
• State: All previously observed information appended to the current robot pose, i.e. the current Cartesian pose of the robot and a growing vector $M$ which contains all observed surface patches of the target object
• Reward: The negative of the total cost (since this is a minimization problem), i.e. of the sum of the travelling and inspection times.
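The formulation above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the pose is reduced to a viewpoint id, $M$ is stored as a set of patch indices, and the reward is paid per step so that the episode return equals the negative cycle time. The names `State`, `step` and `is_terminal` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class State:
    pose: str                 # current viewpoint id (stand-in for a Cartesian pose)
    observed: FrozenSet[int]  # surface patches seen so far (the vector M)


def step(
    state: State,
    action: str,
    visible: Dict[str, FrozenSet[int]],
    travel_time: Dict[Tuple[str, str], float],
    inspect_time: float,
) -> Tuple[State, float]:
    """Move to viewpoint `action`, scan there, and return (next_state, reward)."""
    cost = travel_time[(state.pose, action)] + inspect_time
    next_state = State(pose=action, observed=state.observed | visible[action])
    return next_state, -cost  # negative cost, because we want to minimize time


def is_terminal(state: State, required: Set[int], target_coverage: float = 1.0) -> bool:
    """The episode ends once enough of the required patches have been observed."""
    return len(state.observed & required) >= target_coverage * len(required)


# Hypothetical toy episode: two viewpoints, three patches.
visible = {"a": frozenset({0, 1}), "b": frozenset({2})}
travel_time = {("home", "a"): 2.0, ("a", "b"): 1.5}
s0 = State("home", frozenset())
s1, r1 = step(s0, "a", visible, travel_time, inspect_time=0.5)
s2, r2 = step(s1, "b", visible, travel_time, inspect_time=0.5)
print(r1 + r2, is_terminal(s2, {0, 1, 2}))  # -4.5 True
```

Summing the per-step rewards over an episode gives the negative of the travelling plus inspection time, which matches the cycle-time objective described above.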

Proposed RL Framework

The framework takes as inputs the robot model, the target object model and the sensor specifications. There are then four sub-modules, three of which run offline:

• Viewpoints are randomly sampled from a Gaussian distribution within an ellipsoidal hemisphere around the target objects, and the robot inverse kinematics are computed for those positions.
• Local planning computes the collision-free trajectories using Rapidly-exploring Random Trees-Connect (RRT-Connect) and MoveIt within ROS. A motion graph and a visibility matrix are computed. The transition matrix is generated by updating the state's visibility information with the visibility matrix and using the motion graph to update the robot pose.
• Visibility modelling and approximation evaluates each viewpoint against each surface patch.
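As a rough illustration of the offline sampling and visibility steps, the sketch below rejection-samples Gaussian points inside the upper half of an ellipsoid and builds a boolean visibility matrix. Everything here is an assumption of mine rather than the paper's procedure: the distribution is centred on the object, `sigma` and `radii` are made-up parameters, and "visible" is reduced to a simple range check that ignores occlusion, field of view and incidence angle. Inverse kinematics and collision checking are left out.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def sample_viewpoints(n: int, radii: Point, sigma: float = 0.5, seed: int = 0) -> List[Point]:
    """Rejection-sample n Gaussian points inside the upper half of an ellipsoid."""
    rng = random.Random(seed)
    points: List[Point] = []
    while len(points) < n:
        # One Gaussian draw per axis, scaled by that axis' radius (mean at the object centre).
        x, y, z = (rng.gauss(0.0, sigma * r) for r in radii)
        inside = sum((c / r) ** 2 for c, r in zip((x, y, z), radii)) <= 1.0
        if inside and z >= 0.0:  # keep only the hemisphere above the object
            points.append((x, y, z))
    return points


def visibility_matrix(
    viewpoints: List[Point], patches: List[Point], max_range: float
) -> List[List[bool]]:
    """Entry [i][j] is True when patch j is within sensor range of viewpoint i."""
    return [[math.dist(v, p) <= max_range for p in patches] for v in viewpoints]


radii = (1.0, 1.0, 0.8)
viewpoints = sample_viewpoints(5, radii)
# Two hypothetical patch centres; the second is deliberately far out of range.
patches = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
V = visibility_matrix(viewpoints, patches, max_range=1.5)
print(V[0])  # [True, False]
```

A matrix like `V` is what lets the online planner update the set of observed patches with a lookup instead of re-running visibility tests during inspection.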

Then an online RL planning algorithm is applied to the set of viewpoints to choose the next action until the inspection is complete. They use a variation of Monte-Carlo Tree Search (MCTS), which they call $\epsilon$-greedy Forward Tree Search. Like other MCTS algorithms, it consists of Selection, Expansion, Simulation and Backpropagation steps. The primary difference is the use of an $\epsilon$-greedy policy during the Simulation stage to balance exploration and exploitation when searching for the best moves.
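The role of the $\epsilon$-greedy policy in the Simulation stage can be illustrated with a much simpler planner than the one in the paper. The sketch below is a flat, one-step lookahead: every candidate next viewpoint is scored by averaging several $\epsilon$-greedy rollouts, and no search tree is kept, so there is no real Selection, Expansion or Backpropagation. The function names, the "new patches per unit distance" heuristic and the toy scene are my own assumptions, and viewpoint positions are assumed to be distinct.

```python
import math
import random
from typing import Dict, Optional, Set, Tuple

Pos = Tuple[float, float]


def rollout(pose: str, seen: Set[int], visible: Dict[str, Set[int]],
            pos: Dict[str, Pos], required: Set[int],
            epsilon: float, rng: random.Random) -> float:
    """Simulate until coverage is complete; return the negative travel cost."""
    seen, total = set(seen), 0.0
    while not required <= seen:
        useful = [v for v in sorted(visible) if (visible[v] & required) - seen]
        if not useful:
            break  # nothing left that adds coverage
        if rng.random() < epsilon:
            nxt = rng.choice(useful)  # explore: random useful viewpoint
        else:
            # exploit: most new patches per unit of travel distance
            nxt = max(useful, key=lambda v: len((visible[v] & required) - seen)
                      / math.dist(pos[pose], pos[v]))
        total += math.dist(pos[pose], pos[nxt])
        seen |= visible[nxt]
        pose = nxt
    return -total


def plan_next(pose: str, seen: Set[int], visible: Dict[str, Set[int]],
              pos: Dict[str, Pos], required: Set[int], epsilon: float = 0.2,
              n_rollouts: int = 50, seed: int = 0) -> Optional[str]:
    """Score each candidate next viewpoint by its mean rollout return."""
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for a in sorted(visible):
        if not (visible[a] & required) - set(seen):
            continue  # this viewpoint adds no new coverage
        first_cost = math.dist(pos[pose], pos[a])
        returns = [-first_cost + rollout(a, set(seen) | visible[a], visible,
                                         pos, required, epsilon, rng)
                   for _ in range(n_rollouts)]
        value = sum(returns) / n_rollouts
        if value > best_value:
            best_action, best_value = a, value
    return best_action


# Hypothetical scene on a line: "c" sees everything but is far away.
pos = {"home": (0.0, 0.0), "a": (1.0, 0.0), "b": (2.0, 0.0), "c": (5.0, 0.0)}
visible = {"a": {0, 1}, "b": {2}, "c": {0, 1, 2}}
print(plan_next("home", set(), visible, pos, required={0, 1, 2}))
```

A pure coverage-greedy planner would jump straight to `c` because it sees all three patches at once. Scoring actions by simulated total cost lets the cheaper route through `a` and `b` win instead.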

Experimentation

Only computational simulations were run, with an industrial manipulator inspecting several virtual objects on a conveyor belt in front of it. Qualitatively, the resulting trajectories and motions can be inspected. Statistics were collected on the cycle time as a function of the deviation of the inspected object. A comparison was also made with the Next Best View (NBV) algorithm on the cycle time required for a given level of coverage over the given set of objects, and their approach was better than NBV for high levels of required coverage ($>96\%$).

Paper Review

A very interesting paper to read that gives insights on how to apply reinforcement learning to the problem of viewpoint selection. The introduction did a clear job of setting out the problem at hand without needing further reading. The MDP modelling was intuitive and fairly well justified, and the framework itself was well explained and detailed, with no leaps in logic or inconsistencies. The use of many pictures, code snippets and flow charts was useful in understanding the process. I do like the fact that they explain the packages and give implementation details. These are left out of many papers, yet they are important for understanding the results that the authors arrived at.

The experimentation itself is fairly good at convincing me of the improvement of their RL method over the NBV algorithm. It does a good job of showing that, on the problem it is optimised for, the method performs better than previous methods under varying shape uncertainties, which was the aim of the paper. However, I do feel that several details were left unspoken that I would have liked to see addressed:

• Only computational experiments were done. At no point is the method tested on a real robot, nor do the authors mention wanting to do so. The technique seems fairly robust in simulation, but is it actually as robust in practice?
• The reason RL-based approaches are rare is the difficulty of training the system. No mention is made of how the system was trained, for how long, or whether it is actually practical. Computational requirements are also glossed over: the offline computation seems to be a lot more intensive than that of NBV, which may be an issue despite the performance gains.
• Only a small number of fairly uniform objects were used, all of fairly consistent sizes and shapes, and with fairly small deviations (up to 2 cm). This leaves a fairly limited space over which the experimental results hold true. Further experimentation on more uncertain cases would be useful.

Some of these points are touched upon in the discussion, but I feel more work needs to be done to validate the method. Much of the results section focuses on how they reduced cycle time, by an admittedly large amount, but it is important to know the other impacts of this method. The future work section is also fairly lacklustre, as it merely expands upon what they investigated in this paper rather than giving new directions and applications for the research.

It is an interesting paper, and I would perhaps recommend it as an example of reinforcement learning in robotics, but further work needs to be done before it could be applied in practice.