Benchmarking LLMs' Swarm intelligence

Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun

Published 2025 in arXiv.org

ABSTRACT

Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints (limited local perception and communication) remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs struggle significantly with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit (built on a customizable physical system) providing environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
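
The local-perception constraint in the abstract (each agent sees only a $k\times k$ window of the 2D grid) can be illustrated with a minimal sketch. It is not taken from the SwarmBench repository: the function name `local_view`, the string-based grid encoding, and the `#` padding symbol for out-of-bounds cells are all assumptions made for illustration.

```python
from typing import List


def local_view(grid: List[str], row: int, col: int, k: int, pad: str = "#") -> List[str]:
    """Return the k x k window centred on (row, col).

    Hypothetical helper: cells that fall outside the grid are filled with `pad`.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the agent sits at the centre of its view")
    radius = k // 2
    view = []
    for i in range(row - radius, row + radius + 1):
        cells = []
        for j in range(col - radius, col + radius + 1):
            inside = 0 <= i < len(grid) and 0 <= j < len(grid[0])
            cells.append(grid[i][j] if inside else pad)
        view.append("".join(cells))
    return view


# Toy 3 x 4 grid: agent "A" at (1, 1), another agent "B" at (1, 3).
grid = [
    "....",
    ".A.B",
    "....",
]

# With k = 3, agent A cannot see agent B, which sits two columns away.
view_a = local_view(grid, 1, 1, 3)

# An agent in the top-left corner sees padding where the grid ends.
view_corner = local_view(grid, 0, 0, 3)
```

In this toy setup the window around A contains no trace of B, which is the kind of incomplete spatial information the benchmark is designed to impose; anything beyond the window would have to arrive through local communication.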

PUBLICATION RECORD

Publication year
2025
Venue
arXiv.org
Publication date
2025-05-07
Fields of study
Computer Science
Identifiers
DOI 10.48550/arXiv.2505.04364 arXiv 2505.04364
Source metadata
Semantic Scholar

REFERENCES

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
2025, cited by this paper
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
2025, cited by this paper
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
2025, cited by this paper
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents
2025, cited by this paper
Towards an AI co-scientist
2025, cited by this paper
Large Language Models for Multi-Robot Systems: A Survey
2025, cited by this paper
EMOS: Embodiment-aware Heterogeneous Multi-robot Operating System with LLM Agents
2024, cited by this paper
LaMMA-P: Generalizable Multi-Agent Long-Horizon Task Allocation and Planning with LM-Driven PDDL Planner
2024, cited by this paper
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
2024, cited by this paper
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
2024, cited by this paper
VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft
2024, cited by this paper
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization
2024, cited by this paper
Challenges Faced by Large Language Models in Solving Multi-Agent Flocking
2024, cited by this paper
Self-organizing Nervous Systems for Robot Swarms
2024, cited by this paper
Embodied LLM Agents Learn to Cooperate in Organized Teams
2024, cited by this paper
Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
2024, cited by this paper
ARC Prize 2024: Technical Report
2024, influential reference
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
2024, cited by this paper
OASIS: Open Agent Social Interaction Simulations with One Million Agents
2024, cited by this paper
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
2024, cited by this paper
Project Sid: Many-agent simulations toward AI civilization
2024, cited by this paper
Large language models empowered agent-based modeling and simulation: a survey and perspectives
2023, cited by this paper
Low-distortion information propagation with noise suppression in swarm networks
2023, influential reference
A Survey of Large Language Models
2023, cited by this paper
Generative Agents: Interactive Simulacra of Human Behavior
2023, cited by this paper
Emergent autonomous scientific research capabilities of large language models
2023, cited by this paper
Individuality in Swarm Robots with the Case Study of Kilobots: Noise, Bug, or Feature?
2023, cited by this paper
Building Cooperative Embodied Agents Modularly with Large Language Models
2023, cited by this paper
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
2023, cited by this paper
Communicative Agents for Software Development
2023, cited by this paper
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
2023, cited by this paper
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
2023, cited by this paper
A survey on large language model based autonomous agents
2023, cited by this paper
The Rise and Potential of Large Language Model Based Agents: A Survey
2023, cited by this paper
SMART-LLM: Smart Multi-Agent Robot Task Planning using Large Language Models
2023, cited by this paper
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models
2023, cited by this paper
Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models
2023, cited by this paper
Theory of Mind for Multi-Agent Collaboration via Large Language Models
2023, cited by this paper
Spontaneous vortex formation by microswimmers with retarded attractions
2022, cited by this paper
Swarm intelligence begins now or never
2021, cited by this paper
Swarm Learning for decentralized and confidential clinical machine learning
2021, cited by this paper
Individual error correction drives responsive self-assembly of army ant scaffolds
2021, cited by this paper
Stewardship of global collective behavior
2021, cited by this paper
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
2019, cited by this paper
Simulating Kilobots Within ARGoS: Models and Experimental Validation
2018, cited by this paper
Kilombo: a Kilobot simulator to enable effective research in swarm robotics
2015, cited by this paper
Army ants dynamically adjust living bridges in response to a cost–benefit trade-off
2015, cited by this paper
Programmable self-assembly in a thousand-robot swarm
2014, cited by this paper
Swarm robotics: a review from the swarm engineering perspective
2013, cited by this paper
Steven Bird, Ewan Klein and Edward Loper: Natural Language Processing with Python, Analyzing Text with the Natural Language Toolkit
2010, cited by this paper
Inherent noise can facilitate coherence in collective swarm motion
2009, cited by this paper
From Disorder to Order in Marching Locusts
2006, cited by this paper
Swarm Intelligence: From Natural to Artificial Systems
2002, cited by this paper
Flocks, herds, and schools: a distributed behavioral model
1987, cited by this paper
Evidence for a Collective Intelligence Factor in the Performance of Human Groups
year unknown, cited by this paper

CITED BY

Tacit Coordination of Large Language Models
2026, cites this paper
Where Did It All Go Wrong? A Hierarchical Look into Multi-Agent Error Attribution
2025, cites this paper