Hard Non-Monotonic Attention for Character-Level Transduction

Published in 2018 at the Conference on Empirical Methods in Natural Language Processing (EMNLP)

ABSTRACT

Character-level string-to-string transduction is an important component of various NLP tasks. The goal is to map an input string to an output string, where the strings may be of different lengths and have characters taken from different alphabets. Recent approaches have used sequence-to-sequence models with an attention mechanism to learn which parts of the input string the model should focus on during the generation of the output string. Both soft attention and hard monotonic attention have been used, but hard non-monotonic attention has only been used in other sequence modeling tasks and has required a stochastic approximation to compute the gradient. In this work, we introduce an exact, polynomial-time algorithm for marginalizing over the exponential number of non-monotonic alignments between two strings, showing that hard attention models can be viewed as neural reparameterizations of the classical IBM Model 1. We compare soft and hard non-monotonic attention experimentally and find that the exact algorithm significantly improves performance over the stochastic approximation and outperforms soft attention.
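The abstract's central technical point is that the sum over exponentially many non-monotonic alignments can be computed exactly in polynomial time. Here is a minimal sketch. It assumes that each output character's alignment is chosen independently of earlier alignments, given the output prefix and the input. That is the IBM Model 1 style independence the abstract alludes to. The function names and probability tables are hypothetical stand-ins for a neural decoder's outputs, not the paper's implementation.

```python
import itertools
import math


def exact_marginal(align_probs, emit_probs):
    """Exact p(y | x), summing over every non-monotonic alignment.

    Assumes (hypothetically) the IBM-Model-1-style factorization
        p(y | x) = prod_i sum_j p(a_i = j | y_<i, x) * p(y_i | a_i = j, y_<i, x)
    align_probs[i][j] stands for p(a_i = j | y_<i, x) and
    emit_probs[i][j] for p(y_i | a_i = j, y_<i, x); in a neural model
    both rows would come from the decoder state at output step i.
    Cost is O(|y| * |x|) instead of O(|x| ** |y|).
    """
    total = 1.0
    for a_row, e_row in zip(align_probs, emit_probs):
        # Alignments of different output steps are independent given y_<i,
        # so each one can be summed out on its own.
        total *= sum(a * e for a, e in zip(a_row, e_row))
    return total


def brute_force_marginal(align_probs, emit_probs):
    """Reference check: enumerate all |x| ** |y| alignments explicitly."""
    n_in = len(align_probs[0])
    total = 0.0
    for alignment in itertools.product(range(n_in), repeat=len(align_probs)):
        p = 1.0
        for i, j in enumerate(alignment):
            p *= align_probs[i][j] * emit_probs[i][j]
        total += p
    return total


if __name__ == "__main__":
    # Toy tables: 2 output characters, 3 input characters (made-up numbers).
    align = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
    emit = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]]
    print(round(exact_marginal(align, emit), 5))  # 0.22325
    print(math.isclose(exact_marginal(align, emit),
                       brute_force_marginal(align, emit)))  # True
```

The sketch works in probability space to stay readable. A trained model would normally accumulate log-probabilities, with one log-sum-exp per output step, to avoid underflow on long strings. The brute-force function only confirms that the factorized sum matches explicit enumeration on a tiny input.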

PUBLICATION RECORD

Publication year: 2018
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2018-08-29
Fields of study: Computer Science
Identifiers: DOI 10.18653/v1/D18-1473; arXiv 1808.10024
Source metadata: Semantic Scholar

REFERENCES

On Mathematics (2020, influential reference)
Speeding Up Neural Machine Translation Decoding by Shrinking Run-time Vocabulary (2017, cited by this paper)
Automatic differentiation in PyTorch (2017, cited by this paper)
CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages (2017, cited by this paper)
Morphological Inflection Generation with Hard Monotonic Attention (2016, influential reference)
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model (2016, cited by this paper)
Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection (2016, cited by this paper)
Efficient softmax approximation for GPUs (2016, cited by this paper)
Weighting Finite-State Transductions With Neural Context (2016, cited by this paper)
Sequence-to-sequence neural network models for transliteration (2016, cited by this paper)
Strategies for Training Large Vocabulary Neural Language Models (2015, cited by this paper)
Whitepaper of NEWS 2015 Shared Task on Machine Transliteration (2015, influential reference)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015, influential reference)
Effective Approaches to Attention-based Neural Machine Translation (2015, influential reference)
Sequence-to-sequence neural net models for grapheme-to-phoneme conversion (2015, cited by this paper)
Neural Machine Translation by Jointly Learning to Align and Translate (2014, influential reference)
Statistical Machine Translation (2014, cited by this paper)
Sequence to Sequence Learning with Neural Networks (2014, cited by this paper)
Adam: A Method for Stochastic Optimization (2014, cited by this paper)
Noise-contrastive estimation: A new estimation principle for unnormalized statistical models (2010, cited by this paper)
Moses: Open Source Toolkit for Statistical Machine Translation (2007, cited by this paper)
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (2004, cited by this paper)
A Neural Probabilistic Language Model (2003, influential reference)
Classes for fast maximum entropy training (2001, cited by this paper)
Long Short-Term Memory (1997, influential reference)
HMM-Based Word Alignment in Statistical Translation (1996, cited by this paper)
The Mathematics of Statistical Machine Translation: Parameter Estimation (1993, cited by this paper)
Finding Structure in Time (1990, influential reference)
A tutorial on hidden Markov models and selected applications in speech recognition (1989, cited by this paper)
Parallel Networks that Learn to Pronounce English Text (1987, cited by this paper)
Ponapean Reference Grammar (1981, cited by this paper)

CITED BY

PRSA: Prompt Reverse Stealing Attacks against Large Language Models (2024, cites this paper)
Tü-CL at SIGMORPHON 2023: Straight-Through Gradient Estimation for Hard Attention (2023, cites this paper)
On the Learning Dynamics of Attention Networks (2023, cites this paper)
Cross-lingual Inflection as a Data Augmentation Method for Parsing (2022, cites this paper)
Inducing and Using Alignments for Transition-based AMR Parsing (2022, cites this paper)
SSR-Net: A Spatial Structural Relation Network for Vehicle Re-identification (2022, cites this paper)
Equivariant Transduction through Invariant Alignment (2022, cites this paper)
On Biasing Transformer Attention Towards Monotonicity (2021, influential citation)
A study of latent monotonic attention variants (2021, cites this paper)
Differentiable Generative Phonology (2021, cites this paper)
Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models (2021, cites this paper)
Computational Morphology with Neural Network Approaches (2021, cites this paper)
Searching for More Efficient Dynamic Programs (2021, cites this paper)
Sequence-to-Sequence Learning with Latent Neural Grammars (2021, cites this paper)
Comparative Error Analysis in Neural and Finite-state Models for Unsupervised Character-level Transduction (2021, influential citation)
Analogy Models for Neural Word Inflection (2020, cites this paper)
Exploring A Zero-Order Direct HMM Based on Latent Attention for Automatic Speech Recognition (2020, cites this paper)
On Sparsifying Encoder Outputs in Sequence-to-Sequence Models (2020, cites this paper)
Neural Data-to-Text Generation via Jointly Learning the Segmentation and Correspondence (2020, influential citation)
Applying the Transformer to Character-level Transduction (2020, cites this paper)
Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation (2020, cites this paper)
Frugal Paradigm Completion (2020, influential citation)
One-Size-Fits-All Multilingual Models (2020, cites this paper)
Leveraging Principal Parts for Morphological Inflection (2020, cites this paper)
University of Illinois Submission to the SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection (2020, cites this paper)
Investigation of Transformer-based Latent Attention Models for Neural Machine Translation (2020, influential citation)
Autoregressive Modeling is Misspecified for Some Sequence Distributions (2020, cites this paper)
Sequence-level Mixed Sample Data Augmentation (2020, cites this paper)
Limitations of Autoregressive Models and Their Alternatives (2020, cites this paper)
Sparse Sequence-to-Sequence Models (2019, cites this paper)
AX Semantics’ Submission to the SIGMORPHON 2019 Shared Task (2019, cites this paper)
A Simple Joint Model for Improved Contextual Neural Lemmatization (2019, cites this paper)
Modeling text embedded information cascades (2019, cites this paper)
IT–IST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection (2019, cites this paper)
The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection (2019, cites this paper)
Multi-Team: A Multi-attention, Multi-decoder Approach to Morphological Analysis (2019, cites this paper)
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology (2019, cites this paper)
Neural Finite-State Transducers: Beyond Rational Relations (2019, cites this paper)
Exact Hard Monotonic Attention for Character-Level Transduction (2019, influential citation)
Latent Alignment and Variational Attention (2018, cites this paper)
A Tutorial on Deep Latent Variable Models of Natural Language (2018, cites this paper)
Background: Latent Alignment and Neural Attention (2018, cites this paper)
Deep Latent Variable Models of Natural Language (2018, cites this paper)