Thompson Sampling-Based Learning and Control for Unknown Dynamic Systems

Published 2025 in arXiv.org

ABSTRACT

Thompson sampling (TS) is a Bayesian randomized exploration strategy. It samples an option (e.g., system parameters or a control law) from the current posterior and then applies the option that is optimal for the task under that sample, which balances exploration and exploitation. This makes TS effective for active learning-based controller design. However, TS relies on finite parametric representations, and this limits its applicability to the more general spaces commonly encountered in control system design. To address this issue, this work proposes a parameterization method for control law learning based on reproducing kernel Hilbert spaces (RKHS) and designs a data-driven active learning control approach. Specifically, the proposed method treats the control law as an element of a function space. This allows control laws to be designed without restricting the system structure or the form of the controller. The work further proposes a TS framework that reduces control costs through online exploration and exploitation, and it provides convergence guarantees for the learning process. Theoretical analysis shows that the proposed method learns the relationship between control laws and closed-loop performance metrics at an exponential rate, and an upper bound on the control regret is derived. Furthermore, the closed-loop stability of the proposed learning framework is analyzed. Numerical experiments on controlling unknown nonlinear systems validate the effectiveness of the proposed method.
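The TS loop described above can be illustrated with a minimal sketch. It assumes a Gaussian-process posterior with a squared-exponential kernel over the map from a controller to its closed-loop cost. This is a simplification and not the paper's method. Here the "control law" is a single hypothetical state-feedback gain chosen from a grid, not a general RKHS element. The plant in `run_episode`, the cost weights, the kernel lengthscale, and the gain grid are all illustrative assumptions. In each episode the sketch draws one cost function from the posterior, applies the gain that minimizes that sample, observes the resulting cost, and updates the posterior.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    # Squared-exponential kernel; its RKHS is the (assumed) hypothesis space for the gain-to-cost map
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def run_episode(gain, rng, horizon=30, noise_std=0.01):
    # Hypothetical "unknown" plant: x_{k+1} = 1.1 x_k + 0.2 sin(x_k) + u_k + w_k, with u_k = -gain * x_k
    x, cost = 1.0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x**2 + 0.1 * u**2  # quadratic stage cost (illustrative weights)
        x = 1.1 * x + 0.2 * np.sin(x) + u + noise_std * rng.standard_normal()
    return cost

def gp_posterior(X, y, grid, noise_var=1e-4):
    # Standard GP regression: posterior mean and covariance of the cost model on the grid
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(grid, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, rbf_kernel(grid, grid) - v.T @ v

def sample_posterior(mean, cov, rng):
    # One function draw on the grid; clip tiny negative eigenvalues caused by round-off
    w, V = np.linalg.eigh(0.5 * (cov + cov.T))
    return mean + V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(mean)))

def thompson_sampling_control(n_episodes=25, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 2.5, 101)  # hypothetical candidate gains
    gains, metrics = [], []
    for _ in range(n_episodes):
        if gains:
            y = np.array(metrics)
            mu, sd = y.mean(), y.std() + 1e-9  # standardize observed metrics
            mean, cov = gp_posterior(np.array(gains), (y - mu) / sd, grid)
            k = grid[np.argmin(sample_posterior(mean, cov, rng))]  # act greedily on the sample
        else:
            k = rng.choice(grid)  # no data yet: pick a gain at random
        gains.append(k)
        metrics.append(np.log1p(run_episode(k, rng)))  # log-cost tames diverging episodes
    return np.array(gains), np.array(metrics)

if __name__ == "__main__":
    gains, metrics = thompson_sampling_control()
    print("last gains:", np.round(gains[-5:], 3))
    print("best log-cost:", round(float(metrics.min()), 3))
```

Calling `thompson_sampling_control()` returns the sequence of applied gains and their log-costs. Early episodes tend to spread across the grid (exploration), and later ones concentrate where the posterior predicts low cost (exploitation). Fitting `log1p(cost)` instead of the raw cost is a design choice that keeps unstable gains, whose costs grow very large, from dominating the GP fit.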

PUBLICATION RECORD

Publication year
2025
Venue
arXiv.org
Publication date
2025-06-27
Fields of study
Computer Science, Engineering
Identifiers
DOI 10.48550/arXiv.2506.22186
arXiv 2506.22186
Source metadata
Semantic Scholar
