CMU Associate Professor: In the Current Era of Multi-Agent Systems, Don’t Ignore Single-Agent Systems
 
Recently, "multi-agent system" has become one of the hottest buzzwords in artificial intelligence, and it is the research focus of open-source frameworks such as MetaGPT and AutoGen.
But are multi-agent systems necessarily perfect?
Recently, Graham Neubig, an associate professor at Carnegie Mellon University, emphasized in his article "Don't Sleep on Single-agent Systems" that single-agent systems should not be ignored.
Neubig covers the following points:
- Elements of contemporary AI agent development, including large language models, prompts, and action spaces;
- Examples of multi-agent systems;
- Problems with multi-agent systems;
- How to transition from using multiple specialized agents to a single powerful agent, and some of the problems that need to be solved.
Tianqi Chen, assistant professor of machine learning and computer science at CMU, retweeted the post and commented: "This is a profound insight into how to make single-agent systems more powerful, and it also has great implications for machine learning systems. Prompt prefix caching will become a key technology that interacts with other general inference optimization techniques."

LLM-based Agent
Most intelligent agents are built on top of large language models, such as Anthropic’s Claude or OpenAI’s language models. But language models are not enough to build a great intelligent agent. Building an intelligent agent requires at least three components:
- A large language model (LLM);
- Prompts: These can be system prompts that specify the general behavior of the model, or information extracted from the agent’s environment;
- An action space: The tools that developers provide to the LLM so that the agent can take actions in the real world.
Generally speaking, when it comes to multi-agent systems, at least one of these three components has to change.
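The three components can be pictured as the fields of one small object. The following is a minimal sketch, not the design of any particular framework: the `Agent` class, the `LLM` and `Tool` type aliases, the `"<tool name> <argument>"` reply format, and the `fake_llm` stand-in are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical type aliases, not taken from any specific framework.
LLM = Callable[[str], str]   # full prompt in, model text out
Tool = Callable[[str], str]  # argument string in, observation out


@dataclass
class Agent:
    llm: LLM                                                      # component 1: the language model
    system_prompt: str                                            # component 2: the prompt
    action_space: Dict[str, Tool] = field(default_factory=dict)  # component 3: the tools

    def step(self, observation: str) -> str:
        """Ask the LLM for one action and execute it.

        Assumes the model replies in the form "<tool name> <argument>".
        """
        prompt = f"{self.system_prompt}\n\nObservation: {observation}\nAction:"
        reply = self.llm(prompt).strip()
        name, _, argument = reply.partition(" ")
        if name not in self.action_space:
            return f"unknown action: {name}"
        return self.action_space[name](argument)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; it always picks the "echo" tool.
    return "echo hello"


agent = Agent(
    llm=fake_llm,
    system_prompt="You are a helpful agent.",
    action_space={"echo": lambda arg: arg.upper()},
)
result = agent.step("start")
```

In this picture, moving to a multi-agent system means creating several `Agent` objects that differ in at least one of the three fields.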
Multi-agent Example
Suppose you are building an AI software development assistant. Here the author takes CodeR as an example, which is a multi-agent framework for AI software development. It includes multiple agents, all of which use the same underlying LM, but with different prompts and action spaces:
- Manager: This agent has a prompt telling it to write a plan for the other agents to execute, and an action space for outputting that plan;
- Reproducer: This agent has a prompt telling it to reproduce the problem, and an action space for writing code to a file called reproduce.py that reproduces the error;
- Fault Localizer: This agent has a prompt telling it to find the files that caused the error, and an action space for using software engineering tools to localize the fault and list the files for later use;
- Editor: This agent has a prompt for receiving the results of the Reproducer and Fault Localizer, and an action space that allows it to edit files;
- Verifier: This agent has a prompt for receiving the results of the other agents, and an action space for outputting whether the problem has been solved.
This is the kind of structure such a system requires, and building it comes with some difficulties.
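To make the shape of such a system concrete, here is a simplified sketch of five roles that share one LLM but differ in prompt and allowed actions. It is not CodeR's actual implementation: the prompt wording, the action names, the fixed linear order, and the `fake_llm` stand-in are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


@dataclass
class Role:
    name: str
    prompt: str
    actions: List[str]  # names of the actions this role may take


# Role names follow the list above; prompts and action names are made up.
ROLES = [
    Role("Manager", "Write a plan for the other agents.", ["output_plan"]),
    Role("Reproducer", "Reproduce the reported problem.", ["write_file"]),
    Role("Fault Localizer", "Find the files that cause the error.",
         ["run_localization_tool", "list_files"]),
    Role("Editor", "Fix the problem using the earlier results.", ["edit_file"]),
    Role("Verifier", "Decide whether the problem is solved.", ["output_verdict"]),
]


def run_pipeline(llm: LLM, issue: str) -> Dict[str, str]:
    """Run every role in a fixed order, all backed by the same LLM."""
    results: Dict[str, str] = {}
    for role in ROLES:
        # Each role sees only what earlier roles chose to output.
        earlier = "\n".join(f"{name}: {text}" for name, text in results.items())
        prompt = (
            f"{role.prompt}\n"
            f"Allowed actions: {', '.join(role.actions)}\n"
            f"Issue: {issue}\n"
            f"Earlier results:\n{earlier}"
        )
        results[role.name] = llm(prompt)
    return results


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: echoes the first line (the role prompt).
    return "done: " + prompt.splitlines()[0]


outputs = run_pipeline(fake_llm, "crash on empty input")
```

Even in this toy version, the order of the roles and what each one passes on are hard-coded decisions, and those are the decisions that become awkward when a task does not fit the structure.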
Some problems with multi-agent systems
When building a multi-agent system, you may encounter many problems, such as:
Getting the right structure: Multi-agent systems solve problems by adding structure. This works well when the problem the agents are facing matches the assigned structure exactly, but the question is what happens when it doesn’t?
Transfer of contextual information: Multi-agent systems often pass information between agents, and each handoff can lose information. For example, if the fault localizer passes only a summary to the other agents, important context that would have been useful to downstream agents is often lost.
Maintainability: Finally, these agents usually have their own independent code base, or at least independent prompts. Therefore, multi-agent systems may have larger and more complex code bases.
Interestingly, many of these challenges also apply to human organizations! We’ve all had experiences where a team was disorganized, communication was poor, or when a member left and the necessary skills were not maintained.
How to build great single-agent systems
Why do people build multi-agent systems? One important reason is that agents dedicated to specific tasks usually perform very well, and given the right structure and tools, they can do a good job of the corresponding tasks.
Is a single agent capable of competing?
It may be easier than we expected. The author said that there is already a good prototype here: https://github.com/All-Hands-AI/OpenHands/tree/main/agenthub/codeact_agent
Let’s take a look at what it takes to build an excellent agent with a single LLM, a single action space, and a single prompting technique.
Single LLM: This is the relatively easy part. There have been some excellent general-purpose LLMs recently, including closed-source models such as Claude and GPT-4o, and open-source models such as Llama and Qwen. Although these models are not omnipotent, they can complete a wide variety of tasks. Even if a model lacks a certain capability, it can be added through continued training without affecting other capabilities too much.
Single action space: This is not hard either. If we have multiple agents that use different tools, we can (1) provide the model with relatively general-purpose tools to help it solve problems, or (2) if different agents have different tool sets, simply combine them. For example, in OpenHands, agents are given tools to write code, run code, and browse the web. This general approach lets models use software tools developed for human developers, which extends their capabilities and lets them do what multi-agent systems can do.
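Combining tool sets can be as simple as taking the union of the per-agent tool tables. The sketch below assumes tools are plain functions keyed by name. `merge_action_spaces` and the three toy tools are hypothetical and do not reflect how OpenHands registers its tools.

```python
from typing import Callable, Dict

Tool = Callable[[str], str]  # hypothetical tool signature


def merge_action_spaces(*spaces: Dict[str, Tool]) -> Dict[str, Tool]:
    """Union several tool tables into one action space.

    Raises ValueError if two tables bind the same name to different tools.
    """
    merged: Dict[str, Tool] = {}
    for space in spaces:
        for name, tool in space.items():
            if name in merged and merged[name] is not tool:
                raise ValueError(f"conflicting tool name: {name}")
            merged[name] = tool
    return merged


# Toy tools standing in for real implementations.
def write_code(arg: str) -> str:
    return f"wrote {len(arg)} characters"


def run_code(arg: str) -> str:
    return f"ran: {arg}"


def browse(arg: str) -> str:
    return f"opened {arg}"


# Tool sets that two specialized agents might have had.
coder_tools = {"write_code": write_code, "run_code": run_code}
researcher_tools = {"browse": browse, "run_code": run_code}

single_action_space = merge_action_spaces(coder_tools, researcher_tools)
```

Raising on a name clash is one possible design choice; renaming or namespacing the clashing tools would work as well.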
Single prompting technique: This is where it gets hard! We need to make sure the agent gets the right instructions on how to solve the task, while also getting the right information from its environment.
Here are two options:
- Concatenate all the prompts: If we have a multi-agent system that uses 10 different prompts, why not join them into one? Recent long-context models can handle up to hundreds of thousands of tokens. For example, Claude can handle 200,000 tokens, and Llama 128,000. OpenHands also uses this method. It has some disadvantages, however. The first is cost: longer prompts take more money and time, although technologies such as Anthropic's prompt caching now reduce that cost. The second is that with too much prompt text the LLM may fail to focus on the key points, although as model capabilities improve, LLMs are getting better at identifying the important information in long contexts.
- Retrieval-augmented prompts: Another option is to use retrieval. As in a retrieval-augmented generation (RAG) system, long contexts can be pruned for efficiency or accuracy. Here is some research on selecting examples to provide to LLMs: https://arxiv.org/abs/2209.11755
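The two options can be contrasted in a few lines. This is a rough sketch under stated assumptions: the `SECTIONS` texts are invented, `concatenate` and `retrieve` are hypothetical helpers, and word overlap stands in for the embedding-based similarity a real retrieval system would more likely use.

```python
import re
from typing import Dict, List

# Hypothetical prompt sections, one per former specialized agent.
SECTIONS: Dict[str, str] = {
    "planning": "Write a short plan before acting.",
    "reproduction": "Reproduce the reported error with a script before fixing it.",
    "localization": "Find the files that cause the error.",
    "editing": "Edit files with minimal changes.",
}


def words(text: str) -> set:
    """Lowercased alphabetic words, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def concatenate(sections: Dict[str, str]) -> str:
    """Option 1: join every section into one prompt.

    Sorting by name keeps the text identical across requests, so a
    prefix cache (if the provider offers one) can reuse it.
    """
    return "\n\n".join(f"## {name}\n{sections[name]}" for name in sorted(sections))


def retrieve(sections: Dict[str, str], task: str, k: int = 2) -> List[str]:
    """Option 2: keep only the k sections sharing the most words with the task."""
    task_words = words(task)

    def overlap(name: str) -> int:
        return len(task_words & words(sections[name]))

    # Highest overlap first; ties broken alphabetically for determinism.
    ranked = sorted(sections, key=lambda name: (-overlap(name), name))
    return ranked[:k]


full_prompt = concatenate(SECTIONS)
selected = retrieve(SECTIONS, "find the files that cause this error")
```

Concatenation trades a longer, more expensive prompt for completeness, while retrieval trades completeness for a shorter prompt that may omit a section the task turns out to need.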
Summary
This is not to say that multi-agent systems have no use. For example, in situations where one agent has access to proprietary information and another agent is acting on behalf of another person, multi-agent systems certainly have a lot of uses!
The purpose of this article is to think critically about the tendency to make systems more complex. Sometimes simple is best - having powerful models, powerful tools, and a variety of prompts is enough.