Paper Review 02-12-2018
Multi-UAV Collaborative Monocular SLAM
Patrik Schmuck and Margarita Chli, ETH Zurich, ICRA 2017
Aims of the Paper
To demonstrate their novel method for combining the local maps of multiple UAVs into a single encompassing map, and for distributing this aggregate map back to the UAVs so that each can localise itself within it.
- Describe the methodology and architecture of the system, with further detail on its novel parts.
- Describe their experiments and analyse the results to show that the system is capable of robust, real-time multi-UAV SLAM.
Collaborative/distributed SLAM across multiple robots has the potential to boost the capabilities of a mission, as it shares task workloads between agents while offloading heavy computations such as map optimisation to a central processor on the ground station. Furthermore, sending information from the ground station back to the drones can increase the accuracy of the output, as every agent can use the aggregated information. The main difficulties for such a system are maintaining reliable communications and merging the maps of multiple drones.
There are several critical parts to this system, but they can be broadly split into drone (agent) functionality and ground-server functionality. Each agent performs local visual odometry to generate keyframes (KFs) and (feature) map points (MPs). These are stored in a small, memory-limited local map of the N closest KFs, which is used for autonomous flight. A communication module periodically updates the ground server with new KFs/MPs. The ground server then runs, asynchronously and in parallel, the expensive processes that are not critical to the agents' flight but are required for an accurate aggregate map, such as place recognition, map fusion and bundle adjustment. The server then sends the adjusted KFs/MPs in each agent's vicinity back to it so that the agent can update its local map. The system can use any keyframe-based SLAM approach; ORB-SLAM2 is chosen. The major parts of the system are as follows:
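The agent-side bookkeeping described above (a memory-limited local map of the N closest KFs, an outbox of new KFs drained periodically by the communication module, and server corrections written back) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names (`BoundedLocalMap`, `drain_outbox`, `apply_server_update`) are hypothetical, KFs are reduced to an ID and a 3D position, and "closest" is assumed to mean Euclidean distance to the current camera position.

```python
import math
from dataclasses import dataclass


@dataclass
class KeyFrame:
    kf_id: int
    position: tuple  # (x, y, z) camera centre in the agent's local frame


class BoundedLocalMap:
    """Hypothetical memory-limited agent map: keeps only the N KFs
    closest to the current camera position."""

    def __init__(self, n_closest):
        self.n_closest = n_closest
        self.keyframes = {}  # KF id -> KeyFrame currently held onboard
        self.outbox = []     # new KFs not yet sent to the server

    def add_keyframe(self, kf, current_position):
        self.keyframes[kf.kf_id] = kf
        self.outbox.append(kf)
        self._prune(current_position)

    def apply_server_update(self, kfs, current_position):
        # Server-adjusted KFs (possibly created by other agents)
        # overwrite the onboard copies.
        for kf in kfs:
            self.keyframes[kf.kf_id] = kf
        self._prune(current_position)

    def drain_outbox(self):
        # Called periodically by the communication module: hand over every
        # KF created since the last sync, including ones already pruned.
        batch, self.outbox = self.outbox, []
        return batch

    def _prune(self, current_position):
        # Keep the N KFs nearest to the current position, drop the rest.
        ranked = sorted(
            self.keyframes.values(),
            key=lambda k: math.dist(k.position, current_position),
        )
        self.keyframes = {k.kf_id: k for k in ranked[: self.n_closest]}
```

Keeping pruned KFs in the outbox means a KF that leaves the onboard map before the next sync still reaches the server, so the global map keeps the full history even though the agent does not.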
- Visual Odometry: The local map keeps the N nearest generated KFs in a pose graph. New landmarks (MPs) are added based on distance or feature-point overlap. Local mapping runs onboard in parallel. Because the server merges maps of the same area, local maps can contain KFs/MPs, communicated back from the server, from every agent that has visited that area.
- Agent Handler: This interface to the server manages the agent-server communication channel and handles loop closures in the server-stored copy of the agent's local map. It also holds a transformation matrix from the server map to the local map.
- Data Structures: The three major data structures are (1) the Local Map, a KF pose graph; (2) the Server Maps and Global Map Stack, where the map stack contains all unconstrained maps from each agent, and maps from the stack can be merged to form a server map used by the participating agents' handlers; and (3) the KF Database for place recognition, built incrementally from the KFs acquired by all agents.
- Communication: Built on the ROS infrastructure and runs at a pre-defined frequency. In every iteration, all new and modified KFs/MPs are sent to the agents, and any messages sent by the agents are processed, modifying the server pose graph as necessary. Multiple checks, for example for packet drops and double insertions, are performed to keep the server and agent consistent. In map-merge conflicts, the server's map is favoured. Each agent can also operate independently without feedback from the server; time delays may cause information to arrive late but will not crash the system.
- Bundle Adjustment: Used on the server side to optimise the global pose graph. While BA runs, pose-graph updates are queued in the communication modules and applied once BA completes, so that no information is lost.
- Place Recognition: Detects whether an agent has previously visited a location: the current KF is used to query for similar KFs, which are then verified by appearance and geometry. Detected loop closures can be used in BA. The task is split into two types: intra-map, which searches within the same server map, and inter-map, which searches for KFs in the global map.
- Map Fusion: A transformation that accounts for scale is computed between two maps from their corresponding KFs. New links are created in the resulting pose graph for the merged KFs, and BA is run on the resulting pose graph.
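The Global Map Stack bullet describes agent maps being merged into server maps. One way to track which agents currently share a server map is a union-find (disjoint-set) structure. The review does not say how the authors implement this, so the sketch below is an assumption: the names (`MapStack`, `fuse`, `server_maps`) are hypothetical, and each map is identified by the ID of the agent that created it.

```python
class MapStack:
    """Hypothetical global map stack: every agent starts with its own
    map; fusing two maps unites them into one server map (union-find)."""

    def __init__(self, agent_ids):
        self.parent = {a: a for a in agent_ids}

    def server_map_of(self, agent_id):
        # Follow parent links to the representative map.
        root = agent_id
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while agent_id != root:
            nxt = self.parent[agent_id]
            self.parent[agent_id] = root
            agent_id = nxt
        return root

    def fuse(self, agent_a, agent_b):
        ra, rb = self.server_map_of(agent_a), self.server_map_of(agent_b)
        if ra != rb:
            self.parent[rb] = ra  # merge b's server map into a's
        return ra

    def server_maps(self):
        # Group agents by the server map they currently belong to.
        groups = {}
        for a in self.parent:
            groups.setdefault(self.server_map_of(a), []).append(a)
        return sorted(sorted(g) for g in groups.values())
```

For example, with four agents, fusing 0 with 1 and 2 with 3 gives two server maps, and a later fusion between 1 and 3 leaves a single map containing all four agents, mirroring how the outdoor experiment ends with one map holding every drone.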
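The Communication and Bundle Adjustment bullets together describe a small protocol: duplicate messages are rejected, dropped packets are detected, updates arriving during BA are queued and applied once it finishes, and the server's version wins on conflicts. The sketch below is a minimal single-agent, single-threaded illustration under stated assumptions, not the authors' ROS-based implementation: messages carry a hypothetical sequence number `seq`, poses are opaque placeholders, and "server favoured" is modelled by never overwriting an existing server entry with an agent-sent one.

```python
from dataclasses import dataclass, field


@dataclass
class ServerCommModule:
    """Hypothetical server-side communication module for one agent."""
    pose_graph: dict = field(default_factory=dict)  # KF id -> pose (placeholder)
    pending: list = field(default_factory=list)     # updates held back during BA
    seen_seqs: set = field(default_factory=set)
    missing_seqs: set = field(default_factory=set)
    next_seq: int = 0
    ba_running: bool = False

    def receive(self, seq, kf_id, pose):
        if seq in self.seen_seqs:
            return "duplicate"              # double insertion: drop it
        self.seen_seqs.add(seq)
        self.missing_seqs.discard(seq)      # a late packet fills its gap
        if seq > self.next_seq:
            # Gap in sequence numbers: record the packets that were dropped.
            self.missing_seqs.update(range(self.next_seq, seq))
        self.next_seq = max(self.next_seq, seq + 1)
        if self.ba_running:
            self.pending.append((kf_id, pose))
            return "queued"
        self._insert(kf_id, pose)
        return "applied"

    def _insert(self, kf_id, pose):
        # On conflict, the server's existing entry is favoured.
        self.pose_graph.setdefault(kf_id, pose)

    def start_ba(self):
        self.ba_running = True

    def finish_ba(self, optimised_poses):
        # Write BA results, then flush everything queued while BA was running.
        self.pose_graph.update(optimised_poses)
        for kf_id, pose in self.pending:
            self._insert(kf_id, pose)
        self.pending.clear()
        self.ba_running = False
```

Queuing rather than blocking lets the communication loop keep its fixed rate during BA; the cost is the burst of updates flushed when BA completes, which is consistent with the heavy post-BA communication the authors report.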
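The paper does not specify the type of KF database used for place recognition. A common choice in ORB-SLAM-style systems is a bag-of-visual-words inverted index (as in DBoW2), and the sketch below assumes that design. Each KF is reduced to a set of visual-word IDs plus the ID of the server map it belongs to, candidates are scored by the number of shared words, and the `inter_map` flag switches between intra-map and inter-map search. All names are hypothetical, and the geometric verification step is omitted.

```python
from collections import Counter, defaultdict


class KeyFrameDatabase:
    """Hypothetical bag-of-words KF database with an inverted index
    (visual word -> KFs containing it), built incrementally from all agents."""

    def __init__(self):
        self.inverted = defaultdict(set)  # word id -> set of KF ids
        self.kf_map = {}                  # KF id -> server map id

    def add(self, kf_id, map_id, words):
        self.kf_map[kf_id] = map_id
        for w in set(words):
            self.inverted[w].add(kf_id)

    def query(self, words, map_id, inter_map, min_shared=2):
        """Return candidate KF ids ranked by the number of shared words.
        inter_map=False: intra-map search (same server map only).
        inter_map=True: inter-map search (other server maps only)."""
        shared = Counter()
        for w in set(words):
            for kf in self.inverted.get(w, ()):
                shared[kf] += 1
        candidates = [
            (count, kf) for kf, count in shared.items()
            if count >= min_shared
            and (self.kf_map[kf] != map_id) == inter_map
        ]
        # Highest score first; ties broken by KF id for determinism.
        ranked = sorted(candidates, key=lambda c: (-c[0], c[1]))
        return [kf for _, kf in ranked]
```

The inverted index means a query only touches KFs that share at least one word with the current KF, which keeps the lookup cheap as the database grows with every agent's KFs; the returned candidates would still need geometric verification before being accepted as loop closures.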
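Because monocular SLAM recovers each map only up to an unknown scale, fusing two maps needs a similarity transform (rotation, translation and scale) rather than a rigid one. The paper does not state how this transform is computed; a standard closed-form option is the least-squares similarity of Umeyama (or Horn), estimated from corresponding KF positions. The sketch below is a simplified planar (2D) version that treats points as complex numbers, so rotation and scale collapse into a single complex factor. The real system works in 3D, and the function names are hypothetical.

```python
import cmath


def estimate_similarity_2d(src, dst):
    """Least-squares 2D similarity dst ~ s * R(theta) * src + t.

    src, dst: equal-length lists of (x, y) positions of corresponding KFs
    in map A and map B. With points as complex numbers, rotation and scale
    form one complex factor a = s * exp(i * theta).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two corresponding points")
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    pc = [z - mp for z in p]  # centre both point sets
    qc = [z - mq for z in q]
    denom = sum(abs(z) ** 2 for z in pc)
    if denom == 0:
        raise ValueError("source points are all identical")
    # Minimiser of sum |qc_i - a * pc_i|^2 over complex a.
    a = sum(z.conjugate() * w for z, w in zip(pc, qc)) / denom
    t = mq - a * mp
    return abs(a), cmath.phase(a), (t.real, t.imag)


def apply_similarity_2d(scale, theta, t, point):
    # Map a point from map A's frame into map B's frame.
    z = cmath.rect(scale, theta) * complex(*point) + complex(*t)
    return (z.real, z.imag)
```

In a fusion step, the recovered transform would be applied to every KF and MP of one map before the matched KFs are linked and BA is run on the merged pose graph, as the Map Fusion bullet describes.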
Experimental Results and Conclusions
The authors describe two experiments, both of which yield satisfactory results.
The first experiment uses two handheld cameras to explore an area, both with and without collaborative SLAM. This demonstrates inter-map loop closure, in which locations observed by one camera are recognised by the second. With this, the system performed much more accurately than single-agent SLAM.
The second, more extensive experiment used four AscTec Neo UAVs to map an outdoor garden patch, and demonstrated all parts of the system. Initially, each of the four drones operates on its own server map. Map fusion occurs when the drones observe shared features, and the drone trajectories can be seen being rescaled. Eventually all drones overlap with one of the stored maps, so that a single map contains the locations of all the drones. Monitoring the MPs communicated between drones shows that processed MPs have been used successfully by other drones for localisation.
During the experiment, timings were also recorded to assess the scalability of the system. The input stream rate for each drone is constant. The communication time increases slowly with the number of agents, and the authors note the heavy communication load following a BA step.
I am very interested in both SLAM and multi-agent systems, and so found this paper incredibly interesting. The client-server methodology seems robust and is a logical extension of some of the previous works mentioned in the related work section. Considerable care is taken in the methodology section, whose explanations are clear and concise and succinctly convey some of the complex ideas in the system. I felt that most of the required theoretical details were there, but the authors could have been more consistent in specifying the technologies and data structures used, so that the system could be reproduced. For instance, the type of KF database (ferns? heaps?) and the method of computing the transformation matrix in map fusion were not mentioned, whereas the SLAM and bundle adjustment methods were.
In terms of experimentation, I felt the experiments provided were mostly sufficient to convince me that the collaborative method works and is a useful addition over single-agent SLAM. However, they lacked a direct quantitative comparison between collaborative and single-agent SLAM, relying primarily on a single qualitative trial. In the outdoor scenario, it would have been interesting to see the performance in other locations, and potentially on 3D structures, to determine the effectiveness and limits of the system. The scaling analysis was useful, but I would have preferred a complete timing analysis to understand the full performance of the system. I also felt that the BA comparison was superfluous, as BA is the same regardless of single- or multi-agent SLAM, and the results echo those of other papers.
Overall, despite some shortcomings, I enjoyed reading the paper and found the results incredibly promising, with plenty of directions for future research. It was well structured and well written, and provided a decent explanation and analysis of the methodology and results. I would recommend this paper to anyone looking for the state of the art in what multi-agent systems are capable of.