Stochastic Structured Prediction under Bandit Feedback

Artem Sokolov, Julia Kreutzer, S. Riezler, Christopher Lo

Published 2016 in Neural Information Processing Systems

ABSTRACT

Stochastic structured prediction under bandit feedback follows a learning protocol where, on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
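
The protocol in the abstract (receive an input, predict a structure, observe only the task loss of that prediction) can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation: it assumes a softmax (log-linear) model over a tiny enumerable output set, a simulated 0/1 task loss, and a score-function gradient step on an expected-loss objective. The function name `bandit_expected_loss_sgd`, the toy data, and the step size are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - scores.max()
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()


def bandit_expected_loss_sgd(features, true_labels, num_outputs,
                             steps=2000, lr=0.5, seed=0):
    """Stochastic descent on an expected-loss objective under bandit feedback.

    features:    (n, d) array of inputs.
    true_labels: (n,) integer array, used ONLY to simulate the loss signal;
                 the learner never sees the correct output itself.
    Returns the learned weight matrix of shape (num_outputs, d).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w = np.zeros((num_outputs, d))
    for _ in range(steps):
        i = rng.integers(n)                  # receive an input
        x = features[i]
        p = softmax(w @ x)                   # model distribution over outputs
        y = rng.choice(num_outputs, p=p)     # predict by sampling an output
        loss = float(y != true_labels[i])    # feedback for the sampled output only
        # Score-function estimate: loss * gradient of log p(y | x) w.r.t. w.
        grad_log_p = -np.outer(p, x)
        grad_log_p[y] += x
        w -= lr * loss * grad_log_p          # stochastic first-order update
    return w


if __name__ == "__main__":
    # Hypothetical toy task: three one-hot inputs, each with one correct output.
    toy_features = np.eye(3)
    toy_labels = np.array([2, 0, 1])
    weights = bandit_expected_loss_sgd(toy_features, toy_labels, num_outputs=3)
    print((toy_features @ weights.T).argmax(axis=1))
```

The update uses only the loss of the single sampled output, which is what distinguishes bandit feedback from full-information supervision. With real structured outputs the output set cannot be enumerated as it is here, so sampling and the expected feature computation would need a task-specific procedure.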

PUBLICATION RECORD

Publication year: 2016
Venue: Neural Information Processing Systems
Publication date: 2016-06-02
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1606.00739
Source metadata: Semantic Scholar

REFERENCES

Bandit structured prediction for learning from partial feedback in statistical machine translation
2016, cited by this paper
Learning Structured Predictors from Bandit Feedback for Interactive NLP
2016, influential reference
Learning to Search Better than Your Teacher
2015, cited by this paper
RECURRENT NEURAL NETWORKS
2015, cited by this paper
Human Effort and Machine Learnability in Computer Aided Translation
2014, cited by this paper
A Survey of Preference-Based Online Learning with Bandit Algorithms
2014, cited by this paper
Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations
2013, cited by this paper
Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming
2013, influential reference
Probabilistic models of vision and max-margin methods
2012, cited by this paper
Hope and Fear for Discriminative Training of Statistical Translation Models
2012, cited by this paper
Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT
2012, cited by this paper
Learning with stochastic inputs and adversarial outputs
2012, cited by this paper
Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback
2010, cited by this paper
cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
2010, cited by this paper
A contextual-bandit approach to personalized news article recommendation
2010, cited by this paper
Interactively optimizing information retrieval systems as a dueling bandits problem
2009, cited by this paper
First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests
2009, cited by this paper
Truncated Importance Sampling
2008, cited by this paper
The Epoch-Greedy algorithm for contextual multi-armed bandits
2007, cited by this paper
Stochastic Learning
2003, cited by this paper
Shallow Parsing with Conditional Random Fields
2003, cited by this paper
Bleu: a Method for Automatic Evaluation of Machine Translation
2002, influential reference
Optimizing search engines using clickthrough data
2002, cited by this paper
The Nonstochastic Multiarmed Bandit Problem
2002, cited by this paper
Finite-time Analysis of the Multiarmed Bandit Problem
2002, cited by this paper
An Efficient Boosting Algorithm for Combining Preferences
2001, cited by this paper
Large Margin Rank Boundaries for Ordinal Regression
2000, cited by this paper
Policy Gradient Methods for Reinforcement Learning with Function Approximation
1999, cited by this paper
An Efficient Boosting Algorithm for Combining Preferences
1998, cited by this paper
Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero
1998, cited by this paper
A law of comparative judgment
1994, cited by this paper
Introduction to optimization
1987, cited by this paper

CITED BY

RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
2025, cites this paper
LLMR: Knowledge Distillation with a Large Language Model-Induced Reward
2024, cites this paper
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
2024, cites this paper
Single Loop Gaussian Homotopy Method for Non-convex Optimization
2022, cites this paper
Teacher Forcing Recovers Reward Functions for Text Generation
2022, cites this paper
Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits
2021, cites this paper
Revisiting the Weaknesses of Reinforcement Learning for Neural Machine Translation
2021, cites this paper
Scalable Projection-Free Optimization
2021, cites this paper
An Efficient Algorithm for Deep Stochastic Contextual Bandits
2021, cites this paper
Reinforcement Learning for Machine Translation: from Simulations to Real-World Applications
2020, cites this paper
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
2020, cites this paper
Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP
2020, cites this paper
Interactive Text Ranking with Bayesian Optimization: A Case Study on Community QA and Summarization
2019, cites this paper
Black Box Submodular Maximization: Discrete and Continuous Settings
2019, cites this paper
Preference-based interactive multi-document summarisation
2019, cites this paper
Response-Based and Counterfactual Learning for Sequence-to-Sequence Tasks in NLP
2019, cites this paper
A Reinforcement Learning Approach to Interactive-Predictive Neural Machine Translation
2018, cites this paper
APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning
2018, cites this paper
A Generic Approach for Accelerating Stochastic Zeroth-Order Convex Optimization
2018, cites this paper
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
2018, cites this paper
Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction
2018, influential citation
Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
2018, cites this paper
Can Neural Machine Translation be Improved with User Feedback?
2018, cites this paper
A Shared Task on Bandit Learning for Machine Translation
2017, cites this paper
Bandit Structured Prediction for Neural Sequence-to-Sequence Learning
2017, influential citation
Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation
2017, cites this paper
Counterfactual Learning for Machine Translation: Degeneracies and Solutions
2017, cites this paper
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
2017, cites this paper
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
2017, cites this paper
Learning Structured Predictors from Bandit Feedback for Interactive NLP
2016, cites this paper