Differentiable Dynamic Programming for Structured Prediction and Attention

Published 2018 in International Conference on Machine Learning

ABSTRACT

Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, DP algorithms are usually non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
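
As a rough illustration of the idea of smoothing the max operator inside a DP recursion, the sketch below compares a standard Viterbi value recursion with a smoothed one. It assumes negative-entropy regularization, under which the smoothed max becomes a scaled log-sum-exp; the abstract does not fix a particular regularizer, so this is only one possible choice. The function names, the `gamma` parameter, and the `emissions`/`transitions` layout are hypothetical and chosen for this example, not taken from the paper.

```python
import math


def smoothed_max(x, gamma=1.0):
    """Entropy-regularized max: gamma * log(sum_i exp(x_i / gamma)).

    Assumption: negative entropy as the strongly convex regularizer.
    """
    m = max(x)  # shift by the max for numerical stability
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in x))


def _chain_value(emissions, transitions, reduce_fn):
    """Run a Viterbi-style value recursion with a pluggable max operator.

    emissions[t][j]: score of state j at step t (hypothetical layout).
    transitions[i][j]: score of moving from state i to state j.
    """
    n_states = len(emissions[0])
    v = list(emissions[0])
    for t in range(1, len(emissions)):
        v = [
            emissions[t][j]
            + reduce_fn([v[i] + transitions[i][j] for i in range(n_states)])
            for j in range(n_states)
        ]
    return reduce_fn(v)


def hard_viterbi_value(emissions, transitions):
    """Score of the best path (ordinary, non-smooth max)."""
    return _chain_value(emissions, transitions, max)


def smoothed_viterbi_value(emissions, transitions, gamma=1.0):
    """Smoothed optimal value: every max is replaced by smoothed_max."""
    return _chain_value(
        emissions, transitions, lambda x: smoothed_max(x, gamma)
    )


if __name__ == "__main__":
    emissions = [[1.0, 0.0], [0.0, 2.0], [1.5, 0.5]]
    transitions = [[0.5, -0.5], [-0.5, 0.5]]
    print("hard value:    ", hard_viterbi_value(emissions, transitions))
    for gamma in (1.0, 0.1, 0.001):
        value = smoothed_viterbi_value(emissions, transitions, gamma)
        print(f"gamma={gamma}: ", value)
```

With this choice of regularizer the smoothed value upper-bounds the hard one and approaches it as `gamma` shrinks, while remaining differentiable in the scores. Other regularizers would give a different smoothed max; only the log-sum-exp case is sketched here.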

PUBLICATION RECORD

Publication year
2018
Venue
International Conference on Machine Learning
Publication date
2018-02-11
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1802.03676
