Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

Published 2014 in International Conference on Learning Representations

ABSTRACT

Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

PUBLICATION RECORD

Publication year: 2014
Venue: International Conference on Learning Representations
Publication date: 2014-09-01
Fields of study: Mathematics, Computer Science
Identifiers: arXiv 1409.0473
External record: Semantic Scholar
Source metadata: Semantic Scholar

LINKED PAPERS

On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
2014 · 2 semantic links · 2 concept links · 0 claim links
- rnn encoder–decoder is a · An RNN Encoder–Decoder is a recurrent neural machine translation model that follows the encoder-decoder architecture of encoding a source sentence and decoding a target translation.
- neural machine translation related to · Both neural machine translation concepts describe the same neural-network-based translation approach that maps a source sentence to a target sentence.

CLAIMS

Qualitative analysis suggests that the learned soft alignments agree well with intuition.
Confidence 0.90

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
On English-to-French translation, the proposed model achieves performance comparable to the existing state-of-the-art phrase-based machine translation system.
Confidence 0.96

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
A neural machine translation model is proposed that jointly learns to align and translate by replacing the fixed-length vector bottleneck in the basic encoder-decoder architecture with a soft alignment mechanism.
Confidence 0.98

Unknown

CONCEPTS

encoder-decoder architecture
architecture

A neural model structure that encodes a source sentence and then decodes a target translation from the encoded representation.

Aliases: encoder-decoder

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
english-to-french translation
task

The translation task that maps English source sentences to French target sentences.

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
fixed-length vector
representation

A single vector used to summarize the entire source sentence in the basic encoder-decoder model.

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
neural machine translation
method

A machine translation approach that uses a single neural network to model translation from source to target.

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
phrase-based machine translation system
baseline, system

A phrase-based statistical machine translation baseline used as the comparison system.

Aliases: phrase-based system

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
soft alignment
mechanism

A learned mechanism that assigns soft relevance across source-sentence positions while predicting each target word.

Aliases: soft-search, soft-alignments

뀨 (7c402c1b98) extraction · Anonymous (12632b8b5f) review
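
The soft alignment concept above can be illustrated with a minimal NumPy sketch of one decoding step. This is a sketch under stated assumptions, not the authors' implementation: the record does not give the scoring function, so an additive tanh score is assumed, and the names soft_align, W_s, W_h and v, along with all dimensions, are hypothetical. The parameters are random rather than learned, so only the shapes and the normalization are meaningful.

```python
import numpy as np

def soft_align(prev_state, annotations, W_s, W_h, v):
    """Return (weights, context) for one decoding step.

    prev_state:  (n,)    previous decoder hidden state
    annotations: (T, m)  one encoder annotation per source position
    W_s (d, n), W_h (d, m), v (d,): hypothetical scoring parameters
    """
    # Assumed additive score per source position: v . tanh(W_s s + W_h h_j)
    scores = np.tanh(prev_state @ W_s.T + annotations @ W_h.T) @ v  # (T,)
    # Softmax over source positions, shifted for numerical stability
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    # Context vector: relevance-weighted average of the annotations
    context = weights @ annotations  # (m,)
    return weights, context

# Toy dimensions and random parameters, for illustration only
rng = np.random.default_rng(0)
T, m, n, d = 5, 4, 3, 6
annotations = rng.normal(size=(T, m))
prev_state = rng.normal(size=n)
W_s = rng.normal(size=(d, n))
W_h = rng.normal(size=(d, m))
v = rng.normal(size=d)

weights, context = soft_align(prev_state, annotations, W_s, W_h, v)
print(weights.round(3), context.round(3))
```

Because the weights are non-negative and sum to one over the source positions, the decoder receives a different weighted summary of the source at each step instead of a single fixed-length vector for the whole sentence.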

REFERENCES

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation
2014 · cited by this paper
On the Properties of Neural Machine Translation: Encoder–Decoder Approaches
2014 · influential reference
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014 · influential reference
Fast and Robust Neural Network Joint Models for Statistical Machine Translation
2014 · cited by this paper
Statistical Machine Translation
2014 · cited by this paper
Sequence to Sequence Learning with Neural Networks
2014 · influential reference
How to Construct Deep Recurrent Neural Networks
2013 · cited by this paper
Generating Sequences With Recurrent Neural Networks
2013 · cited by this paper
Multilingual Distributed Representations without Word Alignment
2013 · influential reference
Maxout Networks
2013 · cited by this paper
Recurrent Continuous Translation Models
2013 · influential reference
Audio Chord Recognition with Recurrent Neural Networks
2013 · influential reference
Hybrid speech recognition with Deep Bidirectional LSTM
2013 · cited by this paper
On the difficulty of training recurrent neural networks
2012 · cited by this paper
Sequence Transduction with Recurrent Neural Networks
2012 · cited by this paper
ADADELTA: An Adaptive Learning Rate Method
2012 · cited by this paper
Theano: new features and speed improvements
2012 · cited by this paper
Continuous Space Translation Models for Phrase-Based Statistical Machine Translation
2012 · cited by this paper
Domain Adaptation via Pseudo In-Domain Data Selection
2011 · cited by this paper
The conference paper
2011 · cited by this paper
Continuous Space Language Models for Statistical Machine Translation
2006 · cited by this paper
A Neural Probabilistic Language Model
2003 · cited by this paper
Statistical Phrase-Based Translation
2003 · cited by this paper
Bidirectional recurrent neural networks
1997 · cited by this paper
Biological and Artificial Computation: From Neuroscience to Technology
1997 · cited by this paper
Recursive Hetero-associative Memories for Translation
1997 · cited by this paper
Long Short-Term Memory
1997 · cited by this paper
Learning long-term dependencies with gradient descent is difficult
1994 · cited by this paper
Untersuchungen zu dynamischen neuronalen Netzen
1991 · cited by this paper

CITED BY

Pursuit-evasion with a focus – attention-augmented reinforcement learning for swarm-to-swarm orbital game
2026 · cites this paper
PalmRachis-BiLSTM-Attn: An Anatomically Guided Explainable Deep Learning Framework for Spatial–Temporal Progression Modeling of Date-Palm Leaf Diseases
2026 · cites this paper
Physics of generative AI’s atom: Repetition, bias, and beyond
2026 · cites this paper
An efficient YOLOv12n model is used for multi-scale brain tumor detection
2026 · cites this paper
Multilingual Neural Machine Translation for Asian Language Treebank
2026 · cites this paper
Exploring the Role of Large Language Models in Translation Education: A Systematic Review
2026 · cites this paper
Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding
2026 · cites this paper
Hindsight Quality Prediction Experiments in Multi-Candidate Human-Post-Edited Machine Translation
2026 · cites this paper
A Deep Learning Framework for Predicting Business Process Violations
2026 · cites this paper
Attention–based HAPS–to–ground nodes optimization for differential privacy towards secure semantic communications
2026 · cites this paper
TiledAttention: a CUDA Tile SDPA Kernel for PyTorch
2026 · cites this paper
Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning
2026 · influential citation
An Improved Yolo Algorithm Based on Concise Decoupled Head for Real-Time Object Detection in Night Scenarios
2026 · cites this paper
Effective document summarization: a hybrid clustering approach using transformer model
2026 · cites this paper
An uncertainty and conviction-aware attention model for automatically estimating reviewer confidence from peer review texts
2026 · cites this paper
Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention
2026 · cites this paper
Integrating deep learning into future hydrological modeling under climate change scenarios in an arid region
2026 · cites this paper
A maximum discrepancy adaptive graph neural network for semi-supervised fault diagnosis of electromechanical equipment
2026 · cites this paper
A Lightweight Hybrid Encoder-Decoder Framework for Multiple Degree of Freedom Muscle Force Estimation
2026 · cites this paper
Nonparametric Teaching of Attention Learners
2026 · cites this paper
Innovative Drug Recommendation Systems: Decision Making Through Patient Reviews and Attention Mechanism
2026 · cites this paper
A review of attention-enhanced deep-learning models for state-of-charge estimation in lithium-ion batteries: Current progress and future directions
2026 · cites this paper
Enhanced Adaptive LSTM Framework for Robust Early-Life Prediction of Lithium-Ion Battery Degradation
2026 · cites this paper
ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
2026 · cites this paper
A temporal convolutional network model with attention mechanisms and quantile regression for state of health estimation of lithium batteries
2026 · cites this paper
Forecasting Saudi Weekly Equity Returns Using Bilingual News Sentiment and Machine Learning
2026 · cites this paper
Perbandingan Akurasi dan Keberterimaan Terjemahan al-Rahīq al-Makhtūm antara Terjemahan Ahli dan Terjemahan Microsoft
2026 · cites this paper
A low-complexity and lightweight detection network for surface defects on liquid crystal display
2026 · cites this paper
An optimized deep learning model with error correction for forecasting particulate matter 2.5 concentrations near tailings ponds
2026 · cites this paper
Engine remaining useful life prediction method based on deep residual network and attention mechanism
2026 · cites this paper
Fault Diagnosis of a Ship’s Permanent Magnet Propulsion Motor Based on CBAM and Multi-Input CNN
2026 · cites this paper
Corrosion Quantification and Prediction of Steel Structures Using Electromechanical Impedance based Sensors and Deep Neural Networks
2026 · cites this paper
Signbuddy: from sign language research to scalable co-created solutions
2026 · cites this paper
2Mamba2Furious: Linear in Complexity, Competitive in Accuracy
2026 · influential citation
Application of Explainable AI in Neuroscience: Enhancing Autism Screening
2026 · cites this paper
GFASNet: Gait feature attention-driven deep sequential network for dementia-related gait pattern analysis
2026 · cites this paper
Investigating a Multi-Modal Attention-Based Deep-Learning Framework for Long-Term IMF Bz
2026 · cites this paper
When to Think Fast and Slow? AMOR: Entropy-Based Metacognitive Gate for Dynamic SSM-Attention Switching
2026 · cites this paper
Using Deep Learning to Generate Semantically Correct Hindi Captions
2026 · influential citation
Probing Human Articulatory Constraints in End-to-End TTS with Reverse and Mismatched Speech-Text Directions
2026 · cites this paper
AllMem: A Memory-centric Recipe for Efficient Long-context Modeling
2026 · cites this paper
The Sufficiency-Conciseness Trade-off in LLM Self-Explanation from an Information Bottleneck Perspective
2026 · cites this paper
Deep Learning to Rank in Industrial Search Engines, Recommender Systems and Online Advertising: An Overview and New Perspectives
2026 · cites this paper
Automated Classification of Kidney Tumours Using Deep Convolutional Neural Networks
2026 · cites this paper
RadonFAN: Intelligent Real-Time Radon Mitigation Through IoT, Rule-Based Logic, and AI Forecasting
2026 · cites this paper
SEGS-MobileNetV2: An Enhanced Lightweight CNN for Efficient Feature Representation in Industrial Defect Classification
2025 · cites this paper
A Survey of Large Language Models (LLMs) for Cybersecurity: Opportunities and Directions
2025 · cites this paper
A Novel Attention Fusion Approach for Joint Object Detection and Classification in Facial Expression Recognition
2025 · cites this paper
Unsupervised Sentence Representation Learning via Rank and Self Distillation with LLM-Augmented Negative Sampling
2025 · cites this paper
Accelerating Industrial Geological Stratification from Well Logs with Deep Learning and Statistical Constraints
2025 · cites this paper
TP-Bundle: Interactive Hierarchical Edge Bundling for Large Graphs with Transformer-Based Prefetching
2025 · cites this paper
A Discourse-Aware of Entailment Reasoning for Spoken Language Understanding Model
2025 · cites this paper
Chinese NER for UAV Fault Texts via Local–Global Joint Modeling and Diffusion-Based Semantic Denoising
2025 · cites this paper
3D-FireRecon: Single-Image 3D Firearm Reconstruction Using Video-Derived Voxel Supervision
2025 · cites this paper
Transformer for Heterogeneous Graphs
2025 · cites this paper
Explainable Artificial Intelligence for Ransomware Detection
2025 · cites this paper
Detecting Emotions from Text Using Natural Language Processing
2025 · cites this paper
Dual Attention Network for Multimodal Physiological Signal-Based Fatigue Detection
2025 · cites this paper
VISOR: An AI-Powered Guiding Shield for Vision
2025 · cites this paper
TSAD: Architecture Design for Time Series Forecasting
2025 · cites this paper
Detecting Emotions from Hindi Text Using Natural Language Processing
2025 · cites this paper
Innovative Approaches to Sentence Rephrasing in Natural Language Processing
2025 · cites this paper
Conversations From Make-Believe: An Attentive Encoder–Decoder Chatbot Trained on Scripted Dialogue
2025 · cites this paper
Automated Multilingual Content Delivery for the Visually Impaired via AI-Driven Document Parsing
2025 · cites this paper
LiTANet: Lightweight Semantic Segmentation Network with Triple Attention and Depthwise Separable Convolution for Remote Sensing Imagery
2025 · cites this paper
Comparative Analysis of Loss Functions in Deep Learning Models for Water Level Forecasting
2025 · cites this paper
MacBERT-based Multi-granularity Fusion Text Semantic Matching
2025 · cites this paper
LVM-OCR: A Transformer-Based Architecture for Context-Aware Document Understanding
2025 · cites this paper
Efficient Incomplete Utterance Rewriting with Modern Convolutional Neural Networks
2025 · cites this paper
Emotion and Cognitive Stress Radar (ECSR): An Integrated Multimodal Architecture with Adaptive Fusion for Robust Affective Computing
2025 · cites this paper
Cybersecurity Data Extraction from Common Crawl
2025 · cites this paper
STREAM: Hierarchical Dynamic Traffic Pattern Inference for Sparse Trajectory Recovery
2025 · cites this paper
A Comparative Analytical Framework of Decoding Strategies for Offline Handwriting Recognition in Cursive Scripts
2025 · cites this paper
DelayNetODE: Delay-Aware System Modelling Using Graph Attention and Continuous-Time Neural Dynamics
2025 · cites this paper
Application of Remote Sensing and AI in Precision Agriculture: Monitoring Plant Health and Growth
2025 · cites this paper
A Multi-Modal AI Framework for Real-Time American Sign Language Translation
2025 · cites this paper
Trustworthy Equipment Monitoring via Cascaded Anomaly Detection and Saliency-Guided Inspection
2025 · cites this paper
AI in specialised translation
2025 · cites this paper
Deep Learning-Based Sign Language Communication System with Multi-Language Support
2025 · cites this paper
Fed-CoID: An Event-Triggered Federated Learning Intrusion Detection Algorithm for Smart Grids
2025 · cites this paper
Evaluation of GMM Clustering Augmentation and Attention Mechanism Integration in LSTM for Electricity Consumption Forecasting
2025 · cites this paper
Vision-to-Voice: Enhanced CNN-LSTM-Based Image Captioning with Assistive Text-to-Speech
2025 · cites this paper
Transformer-Based Bidirectional Attention Network for Segmentation-Free Word-Level Text Recognition with Overlapping Characters
2025 · cites this paper
Low-Resource Language Models: Leveraging Transfer and Zero-Shot Learning for Underrepresented Languages
2025 · cites this paper
TrustGuardAI: A Human-Centered Explainable Real-Time Anomaly Detection Framework for Time-Series Sensor Data
2025 · cites this paper
An Advanced AI-Driven Complaint Management System for RailMadad
2025 · cites this paper
Disentangling LLM Predictions: A Framework for Transparent Decision-Making in NLP
2025 · cites this paper
Multi-head Attention-Based Audio Classification for Communications System
2025 · cites this paper
A Region-Specific Nutritional Model Using LSTM Encoder and Attention-Enhanced Decoder
2025 · cites this paper
A Review on Vision Transformer and Explainable AI Approaches for ECG-based Heart Disease Detection
2025 · cites this paper
GRU-OptiCom: Revolutionizing Computation Offloading in Edge Computing Through Meta-Reinforcement Learning with GRU
2025 · cites this paper
Natural Language to SQL Queries Using LLM Models
2025 · cites this paper
Lightweight Transformer for Image Interpolation Via Unrolling of Multiple Learned Graph Laplacian Regularizers
2025 · cites this paper
Attention-based CNN-BiLSTM for sleep state classification of spatiotemporal wide-field calcium imaging data
2024 · cites this paper
Evaluating Natural Language Generation via Unbalanced Optimal Transport
2020 · cites this paper
Expert Systems With Applications
year unknown · cites this paper
Retrieval-Augmented Generation and Knowledge-Grounded Reasoning for Faithful Patient Discharge Instructions
year unknown · cites this paper
UvA-DARE (
year unknown · cites this paper
Generating Research Highlights from Scientific Literature: Findings from the FIRE 2025 SciHigh Track
year unknown · cites this paper
Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
year unknown · cites this paper