ART: Automatic multi-step reasoning and tool-use for large language models

Bhargavi Paranjape, Scott M. Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, Marco Tulio Ribeiro

Published 2023 in arXiv.org

ABSTRACT

Large language models (LLMs) can perform complex reasoning in few- and zero-shot settings by generating intermediate chain-of-thought (CoT) reasoning steps. Further, each reasoning step can rely on external tools to support computation beyond the core LLM capabilities (e.g., search or running code). Prior work on CoT prompting and tool use typically requires hand-crafting task-specific demonstrations and carefully scripted interleaving of model generations with tool use. We introduce Automatic Reasoning and Tool-use (ART), a framework that uses frozen LLMs to automatically generate intermediate reasoning steps as a program. Given a new task to solve, ART selects demonstrations of multi-step reasoning and tool use from a task library. At test time, ART seamlessly pauses generation whenever external tools are called, and integrates their output before resuming generation. ART achieves a substantial improvement over few-shot prompting and automatic CoT on unseen tasks in the BigBench and MMLU benchmarks, and matches the performance of hand-crafted CoT prompts on a majority of these tasks. ART is also extensible, and makes it easy for humans to improve performance by correcting errors in task-specific programs or incorporating new tools, which we demonstrate by drastically improving performance on select tasks with minimal human intervention.
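
The pause-and-resume behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `[tool] argument` line format, the `#output:` prefix, and the names `run_with_tools`, `stub_generate`, and `calc` are all hypothetical, and the "model" is a hard-coded stub standing in for a frozen LLM. It only shows the control flow of stopping at a tool call, running the tool, and appending its output before generation continues.

```python
import operator
import re
from typing import Callable, Dict

# Assumed (hypothetical) step format: "[tool_name] argument" on one line.
TOOL_CALL = re.compile(r"^\[(\w+)\]\s*(.*)$")


def calc(expr: str) -> str:
    """Toy calculator tool: evaluates 'a op b' for integer a, b."""
    a, op, b = expr.split()
    ops = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    return str(ops[op](int(a), int(b)))


def stub_generate(transcript: str) -> str:
    """Stand-in for a frozen LLM: returns the next program step, or ''."""
    last = transcript.strip().splitlines()[-1]
    if last.startswith("#output: "):
        # Resume after a tool call, using the tool output just appended.
        return "[answer] " + last[len("#output: "):]
    if last.startswith("[answer]"):
        return ""  # nothing left to generate
    return "[calc] 17 * 3"


def run_with_tools(
    generate: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 10,
) -> str:
    """Generate one step at a time, pausing to run any tool that is called."""
    transcript = prompt
    for _ in range(max_steps):
        step = generate(transcript)
        if not step:
            break
        transcript += step + "\n"
        match = TOOL_CALL.match(step)
        if match and match.group(1) in tools:
            # Pause generation, call the tool, and splice its output in.
            output = tools[match.group(1)](match.group(2))
            transcript += f"#output: {output}\n"
    return transcript


if __name__ == "__main__":
    print(run_with_tools(stub_generate, {"calc": calc}, "Q: What is 17 times 3?\n"))
```

In this sketch the tool registry is a plain dictionary, so adding a tool amounts to adding one entry. That loosely mirrors the extensibility the abstract mentions, although the real framework's interface may differ.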

PUBLICATION RECORD

Publication year: 2023
Venue: arXiv.org
Publication date: 2023-03-16
Fields of study: Computer Science
Identifiers: DOI 10.48550/arXiv.2303.09014; arXiv 2303.09014
Source metadata: Semantic Scholar

REFERENCES

Toolformer: Language Models Can Teach Themselves to Use Tools
2023, influential reference
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
2022, influential reference
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
2022, influential reference
TALM: Tool Augmented Language Models
2022, cited by this paper
Large Language Models are Zero-Shot Reasoners
2022, cited by this paper
Decomposed Prompting: A Modular Approach for Solving Complex Tasks
2022, influential reference
Ask Me Anything: A simple strategy for prompting language models
2022, influential reference
Measuring and Narrowing the Compositionality Gap in Language Models
2022, influential reference
Automatic Chain of Thought Prompting in Large Language Models
2022, influential reference
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
2022, influential reference
PAL: Program-aided Language Models
2022, cited by this paper
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
2022, influential reference
Successive Prompting for Decomposing Complex Questions
2022, cited by this paper
Prompting Is Programming: A Query Language for Large Language Models
2022, influential reference
LaMDA: Language Models for Dialog Applications
2022, influential reference
Chain of Thought Prompting Elicits Reasoning in Large Language Models
2022, cited by this paper
Training language models to follow instructions with human feedback
2022, influential reference
Self-Consistency Improves Chain of Thought Reasoning in Language Models
2022, cited by this paper
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion
2022, influential reference
PaLM: Scaling Language Modeling with Pathways
2022, cited by this paper
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
2022, cited by this paper
WebGPT: Browser-assisted question-answering with human feedback
2021, cited by this paper
Are NLP Models really able to Solve Simple Math Word Problems?
2021, cited by this paper
Evaluating Large Language Models Trained on Code
2021, influential reference
Internet-Augmented Dialogue Generation
2021, influential reference
Finetuned Language Models Are Zero-Shot Learners
2021, cited by this paper
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
2021, cited by this paper
Multitask Prompted Training Enables Zero-Shot Task Generalization
2021, cited by this paper
Training Verifiers to Solve Math Word Problems
2021, cited by this paper
An Explanation of In-context Learning as Implicit Bayesian Inference
2021, cited by this paper
Language Models are Few-Shot Learners
2020, cited by this paper
UnifiedQA: Crossing Format Boundaries With a Single QA System
2020, cited by this paper
Measuring Massive Multitask Language Understanding
2020, influential reference

CITED BY

Emerging from Ground: Addressing Intent Deviation in Tool-Using Agents via Deriving Real Calls into Virtual Trajectories
2026, cites this paper
AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction
2026, cites this paper
Decision-Making Large Language Model for Wireless Communication: A Comprehensive Survey on Key Techniques
2026, cites this paper
When control meets large language models: From words to dynamics
2026, cites this paper
Guided by Trajectories: Repairing and Rewarding Tool-Use Trajectories for Tool-Integrated Reasoning
2026, cites this paper
AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution
2026, cites this paper
Enhancing Transparency and Compliance in Automated Decision-Making: A Multi-Agent System Approach Using Language Models
2025, cites this paper
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
2025, cites this paper
xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning
2025, cites this paper
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
2025, cites this paper
Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation
2025, cites this paper
Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning
2025, cites this paper
OmniNova: A General Multimodal Agent Framework
2025, cites this paper
Agentic Reasoning for Social Event Extrapolation: Integrating Knowledge Graphs and Language Models
2025, cites this paper
Prompt-Driven and Kubernetes Error Report-Aware Container Orchestration
2025, cites this paper
Provable Benefits of In-Tool Learning for Large Language Models
2025, cites this paper
Context-aware chatbot for personal healthcare assistance using LLMs and LangChain
2025, cites this paper
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
2025, cites this paper
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management
2025, cites this paper
An Auditable Agent Platform For Automated Molecular Optimisation
2025, cites this paper
ILearnRobot: An Interactive Learning-Based Multi-modal Robot with Continuous Improvement
2025, cites this paper
Magentic-UI: Towards Human-in-the-loop Agentic Systems
2025, cites this paper
A comprehensive review of Intelligent Question-Answering Systems in Traditional Chinese Medicine Based on LLMs
2025, cites this paper
Model-Grounded Symbolic Artificial Intelligence Systems Learning and Reasoning with Model-Grounded Symbolic Artificial Intelligence Systems
2025, cites this paper
A Survey on Mathematical Reasoning and Optimization with Large Language Models
2025, cites this paper
AgentRxiv: Towards Collaborative Autonomous Research
2025, cites this paper
A Toolbox, Not a Hammer - Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation
2025, cites this paper
LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs
2025, cites this paper
Deciding the Path: Leveraging Multi-Agent Systems for Solving Complex Tasks
2025, cites this paper
UProp: Investigating the Uncertainty Propagation of LLMs in Multi-Step Agentic Decision-Making
2025, cites this paper
Embracing large language model (LLM) technologies in hydrology research
2025, cites this paper
A study on classification based concurrent API calls and optimal model combination for tool augmented LLMs for AI agent
2025, cites this paper
Learn to Think: Bootstrapping LLM Logic Through Graph Representation Learning
2025, cites this paper
WTU-EVAL: A Whether-or-Not Tool Usage Evaluation Benchmark for Large Language Models
2025, cites this paper
Coft: Making Large Language Models Better Zero-Shot Learners for Code Generation
2025, cites this paper
Querying Large Automotive Software Models: Agentic vs. Direct LLM Approaches
2025, cites this paper
A Second-Generation Agentic Framework for Generative Ai-Driven Augmented Reality Educational Games
2025, cites this paper
AidAI: Automated Incident Diagnosis for AI Workloads in the Cloud
2025, cites this paper
Computational Thinking Reasoning in Large Language Models
2025, cites this paper
MCP-Zero: Active Tool Discovery for Autonomous LLM Agents
2025, cites this paper
Semi-structured LLM Reasoners Can Be Rigorously Audited
2025, cites this paper
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
2025, cites this paper
ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
2025, cites this paper
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
2025, cites this paper
Learn to Think: Bootstrapping LLM Reasoning Capability Through Graph Representation Learning
2025, cites this paper
Enhancing Reasoning with Collaboration and Memory
2025, cites this paper
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
2025, cites this paper
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning
2025, cites this paper
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
2025, cites this paper
Orchestrating Agents and Data for Enterprise: A Blueprint Architecture for Compound AI
2025, cites this paper
Bridging Language Models and Financial Analysis
2025, cites this paper
OmniNova: A General Multimodal Multi-Agent Framework
2025, cites this paper
Lifelong Learning of Large Language Model based Agents: A Roadmap
2025, influential citation
Open-Source Large Language Models in Radiology: A Review and Tutorial for Practical Research and Clinical Deployment
2025, cites this paper
A guide to prompt design: foundations and applications for healthcare simulationists
2025, cites this paper
SEM-CTRL: Semantically Controlled Decoding
2025, cites this paper
A Taxonomy of Failures in Tool-Augmented LLMs
2025, cites this paper
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming
2025, cites this paper
Prompt Engineering in Large Language Models: A Systematic Survey of Optimization Techniques and Real-World Applications
2025, cites this paper
Tool learning with language models: a comprehensive survey of methods, pipelines, and benchmarks
2025, cites this paper
DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs
2025, cites this paper
LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation
2025, cites this paper
NFVAgent: A Retrieval-Augmented LLM Agent for Resilient NFV Failure Recovery
2025, cites this paper
Automated Prompt Generation for Code Intelligence: An Empirical study and Experience in WeChat
2025, cites this paper
CE-Prompt: enhance prompt expression stability by multiple understanding
2025, cites this paper
ReCode: Unify Plan and Action for Universal Granularity Control
2025, cites this paper
An Architecture for Integrating Large Language Models with Digital Twins and Automation Systems
2025, cites this paper
Empowering Real-World: A Survey on the Technology, Practice, and Evaluation of LLM-driven Industry Agents
2025, cites this paper
Learning to Plan for Language Modeling from Unlabeled Data
2024, cites this paper
Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation
2024, cites this paper
A Study on Training and Developing Large Language Models for Behavior Tree Generation
2024, cites this paper
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
2024, cites this paper
RE-GAINS & EnCHANT: Intelligent Tool Manipulation Systems For Enhanced Query Responses
2024, cites this paper
GuReT: Distinguishing Guilt and Regret related Text
2024, cites this paper
Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
2024, cites this paper
Large Language Models: A Survey
2024, cites this paper
A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications
2024, cites this paper
SwissNYF: Tool Grounded LLM Agents for Black Box Setting
2024, cites this paper
OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models
2024, cites this paper
AgentScope: A Flexible yet Robust Multi-Agent Platform
2024, cites this paper
MATHSENSEI: A Tool-Augmented Large Language Model for Mathematical Reasoning
2024, cites this paper
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models
2024, cites this paper
Data Interpreter: An LLM Agent For Data Science
2024, cites this paper
Navigating Hallucinations for Reasoning of Unintentional Activities
2024, cites this paper
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
2024, cites this paper
What Are Tools Anyway? A Survey from the Language Model Perspective
2024, influential citation
Leveraging large language model to generate a novel metaheuristic algorithm with CRISPE framework
2024, cites this paper
Octopus v2: On-device language model for super agent
2024, cites this paper
Tell Me Your Prompts and I Will Make Them True: The Alchemy of Prompt Engineering and Generative AI
2024, cites this paper
Exploring Autonomous Agents through the Lens of Large Language Models: A Review
2024, cites this paper
HAMMR: HierArchical MultiModal React agents for generic VQA
2024, cites this paper
Large Language Models for Networking: Workflow, Advances, and Challenges
2024, cites this paper
Don't Train, Just Prompt: Towards a Prompt Engineering Approach for a More Generative Container Orchestration Management
2024, cites this paper
Large Language Models for Education: A Survey
2024, cites this paper
Tool learning with large language models: a survey
2024, cites this paper
Adaptive In-conversation Team Building for Language Model Agents
2024, cites this paper
Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning
2024, cites this paper
Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering
2024, cites this paper
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
2024, cites this paper
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
2024, cites this paper