Humans are not simple creatures. Although we like to present ourselves as coherent wholes to the outside world, on the inside, we’re divided. Marvin Minsky argues in The Society of Mind that minds are composed of smaller agents, each with specialized roles and drives. Inside Out similarly imagines the mind as made up of joyful, sad, fearful, disgusted, and angry sub-selves.
Multi-agent systems offer an analogous model for AI. They combine groups of agents to accomplish tasks that no single constituent agent could accomplish alone. We believe these systems may represent the best path towards agentic AI, becoming more intelligent through the diversity of their specialized sub-agents rather than the scale of a single centralized model.
Some multi-agent architectures are hierarchical, with a lead orchestrating agent acting as a central brain that coordinates the actions of specialized sub-agents. Microsoft Research recently published a paper outlining such a system, called Magentic-One. Its orchestrating agent works with a Coder, ComputerTerminal, WebSurfer, and FileSurfer to accomplish various computer tasks. Nunu developed an orchestrating agent that plays video games by working with specialized vision, gameplay, and action models.
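To make the hierarchical pattern concrete, here is a minimal sketch of an orchestrator that routes each step of a plan to a named sub-agent. It is an illustration under loud assumptions, not how Magentic-One or Nunu is implemented: the `Orchestrator` class, the stub agent functions, and the hard-coded plan are all hypothetical, and in a real system the plan would come from a model rather than a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical sub-agents: each one is a plain function from a request to a result.
# Real sub-agents would wrap models or tools; these stubs only echo the request.
def coder(request: str) -> str:
    return f"[Coder] wrote code for: {request}"


def web_surfer(request: str) -> str:
    return f"[WebSurfer] searched the web for: {request}"


def file_surfer(request: str) -> str:
    return f"[FileSurfer] read files for: {request}"


@dataclass
class Orchestrator:
    """Central brain: owns the plan and delegates each step to one sub-agent."""

    agents: Dict[str, Callable[[str], str]]
    log: List[str] = field(default_factory=list)

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        results = []
        for agent_name, request in plan:
            if agent_name not in self.agents:
                raise KeyError(f"no sub-agent named {agent_name!r}")
            # Record the delegation, then hand the step to the chosen sub-agent.
            self.log.append(f"{agent_name} <- {request}")
            results.append(self.agents[agent_name](request))
        return results


orchestrator = Orchestrator(
    agents={"Coder": coder, "WebSurfer": web_surfer, "FileSurfer": file_surfer}
)
# Assumption: a fixed plan stands in for whatever the orchestrating model would decide.
plan = [
    ("WebSurfer", "find the latest release notes"),
    ("Coder", "summarize them in a script"),
]
results = orchestrator.run(plan)
```

The sub-agents never talk to each other here. Every request and result passes through the orchestrator, which is what makes the architecture hierarchical.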
We’re already seeing the first signs of how these orchestrators can be usefully productized and deployed. ChatGPT’s brain uses its search tool to answer users’ queries when it deems it necessary. Anthropic recently demoed a Claude-orchestrated system that uses a vision model and computer controls for computer use. Although these central brains are just working with simple tools, the ChatGPT and Claude architectures could eventually integrate autonomous sub-agents like the ones Magentic-One uses to expand their capabilities.
In other contexts, the output of a multi-agent system might emerge from a many-to-many network of sub-agents that coordinate with each other instead of with one central planner. The Generative Agents paper details how a group of autonomous Sims-like agents came together to organize a Valentine’s Day party in a totally decentralized way. The authors imagine applying this technology to simulating human behavior, constructing immersive worlds, and emulating users to optimize product experiences.
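The decentralized case can be sketched just as simply. The toy below assumes a hypothetical social graph and a hypothetical `spread_invitation` function; it is not the Generative Agents implementation, which drives each agent with a language model. It only shows how news of a party can reach everyone through peer-to-peer sharing, with no central planner deciding who hears what.

```python
from typing import Dict, List, Set


def spread_invitation(neighbors: Dict[str, List[str]], host: str) -> List[Set[str]]:
    """Return the set of informed agents after each round of peer-to-peer sharing."""
    informed = {host}
    history = [set(informed)]
    while True:
        # Every informed agent tells the peers it knows; nobody coordinates globally.
        newly_informed = {
            peer
            for agent in informed
            for peer in neighbors.get(agent, [])
            if peer not in informed
        }
        if not newly_informed:
            return history
        informed |= newly_informed
        history.append(set(informed))


# Hypothetical agents and acquaintances, invented for illustration.
network = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben"],
}
history = spread_invitation(network, host="ana")
```

Here `dee` never hears from the host directly. The invitation arrives through `ben`, so the final outcome depends on the shape of the network rather than on any single agent’s plan.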
One advantage of multi-agent systems is that their component agents can use focused and easily specified objective functions. Giving a sub-system one clear goal to optimize makes it easier to apply techniques like reinforcement learning (RL) until it becomes exceptional at that function. While LLMs learn to play chess implicitly by becoming generally intelligent and seeing some chess game tokens, a chess sub-agent could learn the game more effectively and directly through RL. And while you could fine-tune a large model to learn a new specialty like chess, it might be expensive to propagate that knowledge through a huge model, and doing so could have unexpected side effects on its behavior in other domains.
Another advantage of multi-agent systems is that they could make interpretability easier. By replacing the activations of an opaque neural net with exchanges between agents in natural language, multi-agent systems enable observers to better understand why they behave as they do. Furthermore, if the agents are internally coordinating in natural language, users could potentially intervene in and debug them directly as needed.
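As a rough sketch of what this could look like, the snippet below routes every inter-agent message through a shared transcript that a human can read and optionally rewrite before delivery. The `Message` and `Transcript` classes and the `hold_deletes` rule are hypothetical names invented for this example, and the keyword check is a deliberately crude stand-in for whatever review policy a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    recipient: str
    text: str


class Transcript:
    """Shared channel: every message is logged in plain language and can be intercepted."""

    def __init__(self, intervene: Optional[Callable[[Message], Message]] = None):
        self.messages: List[Message] = []
        self.intervene = intervene

    def send(self, message: Message) -> Message:
        # Give the observer a chance to rewrite the message before it is delivered.
        if self.intervene is not None:
            message = self.intervene(message)
        self.messages.append(message)
        return message

    def explain(self) -> str:
        # The "explanation" of the system's behavior is just the readable message history.
        return "\n".join(f"{m.sender} -> {m.recipient}: {m.text}" for m in self.messages)


def hold_deletes(message: Message) -> Message:
    # Hypothetical review rule: replace any destructive request with a hold notice.
    if "delete" in message.text.lower():
        return Message(message.sender, message.recipient, "[held for human review]")
    return message


transcript = Transcript(intervene=hold_deletes)
transcript.send(Message("Orchestrator", "FileSurfer", "Find the quarterly report"))
transcript.send(Message("Orchestrator", "ComputerTerminal", "Delete the temp directory"))
report = transcript.explain()
```

Because the coordination happens in text rather than in activations, both inspection (`explain`) and intervention (`intervene`) need nothing more than string handling.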
It’s still an open question which flavor of multi-agent system will be best suited to which tasks. But if multi-agent systems hold any advantage over single large models, it will be because the gains from specialization outweigh the advantages of generalism and cross-learning. This dynamic would neutralize the frontier labs’ advantage in training very large models. If further progress starts to come from the breadth of specialized models an agent can access, there will be an opportunity for an upstart to build a more open ecosystem where developers compete to share the most useful small models for a wide variety of functions. If you’re building towards that future, we would love to meet you.