From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan

Published 2025 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy, which requires strategically withholding answers. To mitigate this, we propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors using simulated student-tutor interactions, emphasizing pedagogical quality and guided problem-solving over simply giving away answers. We use our method to train a 7B-parameter tutor model without human annotations that reaches performance similar to larger proprietary models like LearnLM. We introduce a controllable reward weighting to balance pedagogical support and student solving accuracy, allowing us to trace the Pareto frontier between these two objectives. Our models better preserve reasoning capabilities than single-turn SFT baselines and can optionally enhance interpretability through thinking tags that expose the model's instructional planning.
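
The reward weighting and Pareto frontier mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the reward formula, so the convex combination below is an assumption, and the names `combined_reward`, `pareto_frontier`, and `weight` are hypothetical. Both scores are assumed to lie in [0, 1], with higher being better.

```python
from typing import List, Tuple


def combined_reward(solve_accuracy: float, pedagogy_score: float, weight: float) -> float:
    """Scalarize two objectives with one weight (an assumed form, not the paper's formula).

    weight = 0 rewards only student solving accuracy;
    weight = 1 rewards only pedagogical quality.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * solve_accuracy + weight * pedagogy_score


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the (accuracy, pedagogy) points that no other point dominates."""
    frontier = []
    for i, (acc, ped) in enumerate(points):
        dominated = any(
            acc2 >= acc and ped2 >= ped and (acc2 > acc or ped2 > ped)
            for j, (acc2, ped2) in enumerate(points)
            if j != i
        )
        if not dominated:
            frontier.append((acc, ped))
    return sorted(frontier)
```

Under these assumptions, training one tutor per value of `weight` and passing each model's measured (accuracy, pedagogy) pair to `pareto_frontier` would give the set of non-dominated trade-offs between the two objectives.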

PUBLICATION RECORD

Publication year
2025
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2025-05-21
Fields of study
Computer Science, Education
Identifiers
DOI: 10.48550/arXiv.2505.15607
arXiv: 2505.15607
Source metadata
Semantic Scholar

REFERENCES

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
2025, influential reference
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
2025, cited by this paper
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
2025, cited by this paper
Beyond Final Answers: Evaluating Large Language Models for Math Tutoring
2025, cited by this paper
Training LLM-based Tutors to Improve Student Learning Outcomes in Dialogues
2025, cited by this paper
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
2024, cited by this paper
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
2024, influential reference
Pedagogical Alignment of Large Language Models
2024, cited by this paper
Book2Dial: Generating Teacher-Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots
2024, cited by this paper
Towards the Pedagogical Steering of Large Language Models for Tutoring: A Case Study with Modeling Productive Failure
2024, cited by this paper
Building Math Agents with Multi-Turn Iterative Preference Learning
2024, cited by this paper
Multi-turn Reinforcement Learning with Preference Human Feedback
2024, cited by this paper
Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors
2024, influential reference
Capabilities of Gemini Models in Medicine
2024, cited by this paper
Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
2024, cited by this paper
Efficient Memory Management for Large Language Model Serving with PagedAttention
2023, cited by this paper
Opportunities and Challenges in Neural Dialog Tutoring
2023, cited by this paper
CLASS: A Design Framework for Building Intelligent Tutoring Systems Based on Learning Science principles
2023, cited by this paper
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
2023, cited by this paper
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
2023, influential reference
Let's Verify Step by Step
2023, cited by this paper
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
2023, cited by this paper
Reward Model Ensembles Help Mitigate Overoptimization
2023, cited by this paper
Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes
2023, cited by this paper
Training language models to follow instructions with human feedback
2022, cited by this paper
Automatic Generation of Socratic Subquestions for Teaching Math Word Problems
2022, cited by this paper
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
2022, cited by this paper
Training Verifiers to Solve Math Word Problems
2021, cited by this paper
8-bit Optimizers via Block-wise Quantization
2021, cited by this paper
Measuring Massive Multitask Language Understanding
2020, influential reference
Proximal Policy Optimization Algorithms
2017, cited by this paper
RECURRENT NEURAL NETWORKS
2015, cited by this paper
Active learning increases student performance in science, engineering, and mathematics
2014, cited by this paper
The ICAP Framework: Linking Cognitive Engagement to Active Learning Outcomes
2014, cited by this paper
Efficient Reductions for Imitation Learning
2010, cited by this paper
Education
1964, cited by this paper

CITED BY

Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education
2026, influential citation
Letting Tutor Personas "Speak Up" for LLMs: Learning Steering Vectors from Dialogue via Preference Optimization
2026, cites this paper
Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning
2025, cites this paper
MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation
2025, cites this paper