# [Review] Overview of Multi-Agent Learning

November 3, 2018 · 7 minute read

# Multiagent Learning: Basics, Challenges and Prospects

K. Tuyls, G. Weiss, Maastricht University, AAAI AI Magazine, Fall 2012, 41-52

Article was presented in class here

## Aims of the paper

This article serves as an introduction to the field of multi-agent learning (MAL) for a reader who is familiar with basic machine learning techniques and game theory, i.e. somebody who is in a technical field but outside of MAL specifically. The article looks to do several things:

• Introduce basic concepts in MAL, including mathematics and foundational ideas
• Provide a historical summary of previous developments in MAL
• Identify challenges in the field of MAL
• Identify promising future research avenues

In doing so it seeks to ready the reader for reading and understanding literature in the field. It is worth noting that this is the perspective from 2012.

## Paper Summary

The starting point is the definition of a Multi-Agent System:

A Multi-Agent System (MAS) is a distributed system of independent agents which cooperate or compete to achieve a certain objective.

From this, the paper focuses specifically on Multi-Agent Learning, which integrates machine learning with the design of distributed algorithms to create a network of adaptive agents. This matters because it provides a link between machine learning and MAS.

### Reinforcement Learning: MDPs and Markov Games

Following the definition, a very concise but useful overview is given of the formal setting for learning in MAL. Most MAL systems follow a reinforcement learning and/or game theory approach.

An intuitive explanation is given with the analogy of a dog and learned behaviour. This is formalised for the single agent as a Markov Decision Process (MDP). An MDP models a sequence of decisions in a fully observable world that satisfies the Markov property: the next state $s_{t+1}$ depends only on the current state $s_t$ and the action taken in it, not on any earlier history. An MDP can be represented as a tuple $(S, A, T, R)$: the set of states $S$; the set of actions $A$; a transition function $T:S \times A \times S \rightarrow [0, 1]$, which gives the probability of ending in state $s'$ after taking action $a$ in state $s$; and a reward function $R$, which returns a reward $r$ for being in a state. The aim, or learning task, is to find a policy $\pi:S \rightarrow A$, i.e. an action to take in each state, such that we get the maximal expected future reward, which is known as the value of the policy $V^{\pi}$. This value can be found iteratively through the Bellman Equation

$$V^{\pi}(s, t+1) = R(s) + \sum_{s'\in S} T(s, \pi(s), s') V^{\pi}(s', t)$$
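As a rough illustration, the recursion above can be run directly on a tiny MDP. The two-state example, the action names and the `evaluate_policy` helper are all made up for this sketch and do not come from the paper; it also assumes $V^{\pi}(s, 0) = 0$ for every state.

```python
def evaluate_policy(states, T, R, policy, horizon):
    """Apply V(s, t+1) = R(s) + sum_s' T(s, pi(s), s') * V(s', t) `horizon` times.

    T[s][a] maps each reachable next state to its probability,
    R[s] is the reward for being in s, and policy[s] is the action taken in s.
    """
    V = {s: 0.0 for s in states}  # assumption: V(s, 0) = 0
    for _ in range(horizon):
        V = {
            s: R[s] + sum(T[s][policy[s]].get(s2, 0.0) * V[s2] for s2 in states)
            for s in states
        }
    return V


# Hypothetical two-state MDP: only state "B" gives a reward.
states = ["A", "B"]
R = {"A": 0.0, "B": 1.0}
T = {
    "A": {"go": {"B": 1.0}, "stay": {"A": 1.0}},
    "B": {"go": {"A": 1.0}, "stay": {"B": 1.0}},
}
policy = {"A": "go", "B": "stay"}

values = evaluate_policy(states, T, R, policy, horizon=3)
print(values)
```

Because the equation as written has no discount factor, the values keep growing with the horizon in this example; the sketch only evaluates a fixed number of steps.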

When the environment is unknown, we can instead use Q-learning, which aims to estimate the value of state-action pairs $Q^\pi(s,a)$ directly instead of the value of each state $V^\pi(s)$. Again the Bellman Equation can be applied for iterative learning updates.
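A minimal sketch of a single tabular Q-learning update is given below. It uses the standard textbook form of the update; the learning rate `alpha`, the discount factor `gamma`, and the state and action names are assumptions for illustration and are not taken from the summary above.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# Hypothetical experience: two transitions observed by the agent.
Q = defaultdict(float)  # unseen state-action pairs start at 0
actions = ["left", "right"]
first = q_update(Q, "s0", "right", 1.0, "s1", actions)
second = q_update(Q, "s1", "left", 0.0, "s0", actions)
```

The agent never needs the transition function here: each update only uses an observed transition `(s, a, r, s_next)`.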

This carries over to multi-agent settings through Markov Games, in which $P$ agents each have their own value function, take simultaneous actions, and receive an immediate payoff that depends on their joint action. The transition function is therefore defined over the entire joint action space. Markov Games act as the foundation for most multi-agent learning frameworks, and there exist many extensions to these ideas.
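To make the joint action space concrete, here is a small sketch of how a Markov Game might be laid out in code. The single-state coordination game, the action names and the `joint_actions` helper are hypothetical; the point is only that transitions and rewards are indexed by the joint action, and that rewards come back as one value per player.

```python
from itertools import product


def joint_actions(action_sets):
    """All joint actions: one action per player, chosen simultaneously."""
    return list(product(*action_sets))


# Hypothetical two-player game with a single state "s0".
action_sets = [["left", "right"], ["left", "right"]]
space = joint_actions(action_sets)

# Transition and reward are both keyed by (state, joint action).
# Players are rewarded for picking the same side (a coordination game).
T = {("s0", ja): {"s0": 1.0} for ja in space}
R = {("s0", ja): (1.0, 1.0) if ja[0] == ja[1] else (0.0, 0.0) for ja in space}

print(len(space), R[("s0", ("left", "right"))])
```

The joint action space is the product of the individual action sets, so it grows multiplicatively with every extra player.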

### Nature of the problem

The main goal of MAL is summed up in the following definition:

An agent in a stochastic game needs to learn to behave optimally in the presence of other (learning) agents.

In MAL, we can define optimality in terms of the game-theoretic idea of a Nash Equilibrium, which is a joint strategy in which no agent can obtain a better expected outcome by changing its own strategy while all other agents keep theirs fixed.
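The definition can be checked mechanically for pure strategies in a small matrix game. The sketch below uses the classic Prisoner's Dilemma with commonly quoted payoff values as an assumed example; `pure_nash_equilibria` is an illustrative helper, not something from the paper, and it ignores mixed strategies.

```python
from itertools import product


def pure_nash_equilibria(payoffs, actions_1, actions_2):
    """Return joint actions where neither player gains by deviating alone.

    payoffs[(a1, a2)] is a tuple (payoff to player 1, payoff to player 2).
    """
    equilibria = []
    for a1, a2 in product(actions_1, actions_2):
        u1, u2 = payoffs[(a1, a2)]
        # Player 1 cannot do better by switching while player 2 keeps a2.
        best_1 = all(payoffs[(b1, a2)][0] <= u1 for b1 in actions_1)
        # Player 2 cannot do better by switching while player 1 keeps a1.
        best_2 = all(payoffs[(a1, b2)][1] <= u2 for b2 in actions_2)
        if best_1 and best_2:
            equilibria.append((a1, a2))
    return equilibria


# Prisoner's Dilemma: C = cooperate, D = defect (payoff values are assumed).
pd = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
result = pure_nash_equilibria(pd, ["C", "D"], ["C", "D"])
print(result)
```

Mutual defection is the only equilibrium even though mutual cooperation pays both players more, which shows that a Nash Equilibrium is about stability against unilateral deviation rather than the best joint outcome.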

This is followed by a major discussion of the classification and characterisation of current techniques in MAL. Several widely used classifications are presented along with justifications. These include:

• Cooperative vs Competitive Learning - which distinguishes between swarms of agents that together have a joint task to optimise and swarms of individual, selfish agents.
• Agent Awareness Levels - which classifies methods by how aware agents are of other agents’ goals.
• Proposed Research Agendas - which classifies work according to the five possible goals of MAL research.

Following this discussion, a brief historical overview is given from the 1980s to 2012, describing the difference between the Startup Period and the Consolidation Period and the research that was conducted in each. From these periods, three milestones were chosen by the authors as major works:

• Joint Action Learning (Claus and Boutilier 1998) - Seeks to learn Q values for joint actions in a cooperative game.
• Nash Q-Learning (Hu and Wellman 2000) - A Nash equilibrium is computed at each update step, and convergence to a Nash equilibrium is guaranteed under certain restrictive conditions.
• Infinitesimal Gradient Ascent (Singh, Kearns and Mansour 2000) - The policy is updated by gradient ascent on expected payoff; in two-player, two-action games either the strategies converge to a Nash equilibrium or the average payoffs converge to those of one.
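To give a feel for the last of these milestones, below is a rough sketch of gradient ascent in a two-player, two-action game. It uses a small finite step size rather than the infinitesimal limit analysed in the original work, clips the probabilities to $[0, 1]$, and reuses assumed Prisoner's Dilemma payoffs, so it should be read as an approximation for illustration only. The function name and parameters are my own.

```python
def iga_2x2(r, c, alpha=0.5, beta=0.5, step=0.01, iterations=200):
    """Finite-step gradient ascent for a two-player, two-action game.

    r[i][j] and c[i][j] are the row and column player's payoffs when the
    row player picks action i and the column player picks action j.
    alpha and beta are the probabilities of each player picking action 0.
    """
    u_r = (r[0][0] + r[1][1]) - (r[1][0] + r[0][1])
    u_c = (c[0][0] + c[1][1]) - (c[1][0] + c[0][1])

    def clip(p):
        return max(0.0, min(1.0, p))

    for _ in range(iterations):
        # Partial derivatives of each player's expected payoff with
        # respect to its own probability of playing action 0.
        grad_alpha = beta * u_r - (r[1][1] - r[0][1])
        grad_beta = alpha * u_c - (c[1][1] - c[1][0])
        alpha, beta = clip(alpha + step * grad_alpha), clip(beta + step * grad_beta)
    return alpha, beta


# Prisoner's Dilemma with action 0 = cooperate, action 1 = defect.
row_payoffs = [[3, 0], [5, 1]]
col_payoffs = [[3, 5], [0, 1]]
final = iga_2x2(row_payoffs, col_payoffs)
print(final)
```

In this game both gradients are always negative, so both players' probability of cooperating shrinks to zero and the strategies settle on mutual defection, the game's Nash equilibrium.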

### Challenges in MAL and the future

The authors go on to discuss what they see as three challenges in the field of MAL:

• Classification limitations
  ◦ They regard much of the current research as being either multiplied or divided learning. In multiplied learning, agents learn independently of one another and so become Generalists; in divided learning, the task is divided, and each agent learns a specific action and becomes a Specialist.
  ◦ The issue is that the agents rarely react to and learn from each other, which is what they term interactive learning. They advocate more research in the field of knowledge-intensive interactions.
• Expanding the scope from RL and Game Theory
  ◦ Considering the use of different techniques such as agent-agent transfer learning, imitation learning and swarm intelligence.
• MAL in Complex Systems
  ◦ Many of the current (2012) methods are insufficient for real-world applications. Many works only model 2 or 3 agent interactions, and more scalable solutions need to be found.

## Paper Review

Overall I enjoyed reading this article. It was really well pitched for the target audience, which I myself fell into. It was easy to read and a well-structured walkthrough of the field of MAL. I found the foundation section especially useful, not only as a primer on MAL but also as a recap of the ideas of RL and MDPs. On this topic, the explanations of the technical content were concise and well thought through. There was always just enough detail to understand the material, but not so much irrelevant content that it became “waffley”. In addition, all references were clearly labelled and referred to throughout the text, so it was easy to reach back to the authors’ sources for clarification or more detail.

On the other hand, there are several caveats to watch out for. Firstly, this is not a survey paper - this article is not only a review of the field of MAL but a commentary, summary and discussion of the field of MAL. Many of the authors’ opinions are intertwined with the work, in terms of both the summarisation and the chosen highlighted content. Fortunately most of the major opinions come in the challenges and future section, but it is worth keeping in mind that there are other challenges and views that are equally valid. Analysing the authors’ chosen challenges, I feel that, given the background presented, they are fair statements to make, although further reading is needed to confirm this properly. On top of this, I believe the authors are well-known and well-regarded professors in the field of MAL.

I believe this paper has thoroughly met its aims for me, as I feel I now have a firmer grasp on RL and specifically on how single-agent RL has been extended into the Markov Game concept for Multi-Agent RL. I also now understand the development path and, through the milestone works, the direction that many MAL solutions come from. It has also been interesting to see which challenges were presented at the time, and which of them have been worked on or solved since the release of this article.

I believe that this article is an incredibly useful resource for those who want to get into the field of MAL but, as with most fields, don’t know where to start and lack the background to understand survey papers.