Real-world autonomous planning requires coordinating tightly coupled constraints where a single decision dictates the feasibility of all subsequent actions. However, existing benchmarks predominantly feature loosely coupled constraints solvable through local greedy decisions and rely on idealized data, failing to capture the complexity of extracting parameters from dynamic web environments. We introduce WorldTravel, a benchmark comprising 150 real-world travel scenarios across 5 cities that demand navigating an average of 15+ interdependent temporal and logical constraints. To evaluate agents in realistic deployments, we develop WorldTravel-Webscape, a multi-modal environment featuring over 2,000 rendered webpages where agents must perceive constraint parameters directly from visual layouts to inform their planning. Our evaluation of 10 frontier models reveals a significant performance collapse: even the state-of-the-art GPT-5.2 achieves only 32.67% feasibility in text-only settings, which plummets to 19.33% in multi-modal environments. We identify a critical Perception-Action Gap and a Planning Horizon threshold at approximately 10 constraints where model reasoning consistently fails, suggesting that perception and reasoning remain independent bottlenecks. These findings underscore the need for next-generation agents that unify high-fidelity visual perception with long-horizon reasoning to handle brittle real-world logistics.
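The contrast the abstract draws between loosely coupled constraints (solvable by local greedy decisions) and tightly coupled ones can be illustrated with a toy example. The sketch below is not taken from the benchmark: the flights, museum slots, transfer times, budget, and function names are all hypothetical, chosen only to show how one early choice can make every later step infeasible.

```python
from itertools import product

# Hypothetical toy data (not from WorldTravel); times are minutes after midnight.
FLIGHTS = [            # (name, arrival_time, price)
    ("F1", 14 * 60, 120),   # cheapest option, arrives 14:00
    ("F2", 10 * 60, 180),   # pricier option, arrives 10:00
]
MUSEUM_SLOTS = [       # (name, start, end)
    ("M1", 11 * 60, 13 * 60),
    ("M2", 12 * 60, 14 * 60),
]
AIRPORT_TO_MUSEUM = 45          # assumed transfer time in minutes
MUSEUM_TO_DINNER = 30           # assumed transfer time in minutes
DINNER_START = 14 * 60 + 30     # fixed 14:30 reservation
BUDGET = 200


def feasible(flight, slot):
    """Check the coupled temporal and budget constraints for one plan."""
    _, arrival, price = flight
    _, start, end = slot
    return (
        arrival + AIRPORT_TO_MUSEUM <= start        # reach the museum in time
        and end + MUSEUM_TO_DINNER <= DINNER_START  # reach dinner in time
        and price <= BUDGET
    )


def greedy_plan():
    """Commit to the locally best (cheapest) flight, then try to fit the rest."""
    flight = min(FLIGHTS, key=lambda f: f[2])
    for slot in MUSEUM_SLOTS:
        if feasible(flight, slot):
            return flight[0], slot[0]
    return None  # the early choice ruled out every later option


def search_plan():
    """Consider decisions jointly instead of one at a time."""
    for flight, slot in product(FLIGHTS, MUSEUM_SLOTS):
        if feasible(flight, slot):
            return flight[0], slot[0]
    return None


print(greedy_plan())   # greedy commits to F1 and cannot recover
print(search_plan())   # joint search over both decisions
```

In this toy setup the cheapest flight arrives too late for any museum slot, so the greedy planner fails even though a feasible plan exists. Joint search is exponential in the number of decisions, which is one reason an average of 15+ interdependent constraints is demanding.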
WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints
Zexuan Wang, Chenghao Yang, Yingqi Que, Zhenzhu Yang, Huaqing Yuan, Yiwen Wang, Zhengxuan Jiang, Sheng Fang, Zhenhe Wu, Zhaohui Wang, Zhixin Yao, Jiashuo Liu, Jincheng Ren, Yuzhe Li, Yang Yang, Jiaheng Liu, Jian Yang, Zaiyuan Wang, Ge Zhang, Zhoufutu Wen, Wenhao Huang
Published 2026 in Unknown venue
PUBLICATION RECORD
- Publication year
2026
- Venue
Unknown venue
- Publication date
2026-02-09
- Fields of study
Computer Science
- Source metadata
Semantic Scholar