Sparse and Constrained Attention for Neural Machine Translation
Chaitanya Malaviya, Pedro Ferreira, André F. T. Martins
Published 2018 in Annual Meeting of the Association for Computational Linguistics
ABSTRACT
In neural machine translation, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address this coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, which bound the total attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be both differentiable and sparse. Empirical evaluation is provided on three language pairs.
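The constrained sparsemax transformation named in the abstract is the Euclidean projection of the attention scores onto the probability simplex, with each coordinate additionally capped by that source word's remaining fertility budget. Below is a minimal NumPy sketch of that projection; the function name, the bisection-on-threshold method, and the example budgets are illustrative assumptions, not the authors' exact algorithm (the paper computes the projection exactly and derives its Jacobian for backpropagation).

import numpy as np

def constrained_sparsemax(z, u, n_iter=60):
    """Project scores z onto {p : sum(p) = 1, 0 <= p <= u}.

    The solution has the form p_i = clip(z_i - tau, 0, u_i) for a scalar
    threshold tau. The total mass is non-increasing in tau, so tau can be
    found by bisection. Feasibility requires sum(u) >= 1.
    """
    z = np.asarray(z, dtype=float)
    u = np.asarray(u, dtype=float)
    assert u.sum() >= 1.0, "upper bounds must admit a distribution"
    lo, hi = z.min() - 1.0, z.max()  # mass(lo) >= 1 and mass(hi) == 0
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        if np.clip(z - tau, 0.0, u).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(z - 0.5 * (lo + hi), 0.0, u)

# Hypothetical usage: 4 source words, the first nearly exhausted
# (budget = fertility minus attention already spent on that word).
scores = np.array([2.0, 1.2, 0.3, -0.5])
budget = np.array([0.4, 1.0, 1.0, 1.0])
p = constrained_sparsemax(scores, budget)  # -> [0.4, 0.6, 0.0, 0.0]

The output is sparse (two words receive exactly zero attention) and respects the cap on the first word, which is the behavior the abstract describes: fertilities bound how much attention each source word can accumulate.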
PUBLICATION RECORD
- Publication year
2018
- Venue
Annual Meeting of the Association for Computational Linguistics
- Publication date
2018-05-21
- Fields of study
Linguistics, Computer Science
- Source metadata
Semantic Scholar
REFERENCES
- 24 references
CITED BY
- 63 citing papers