Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Kyunghyun Cho, B. V. Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio

Published 2014 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

In this paper, we propose a novel neural network model called RNN Encoder–Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder–Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
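
The encode-then-decode scoring described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes plain tanh recurrent units, randomly initialized (untrained) weights, and a zero vector as the start-symbol embedding, and every name in it (`init_params`, `encode`, `log_prob`, the parameter keys) is hypothetical. It only shows how a variable-length source is compressed into one fixed-length vector and how the conditional log-probability of a target sequence is accumulated token by token.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def init_params(vocab_src, vocab_tgt, hidden, seed=0):
    # Random, untrained weights; all key names are illustrative assumptions.
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "E_src": rng.normal(0, s, (vocab_src, hidden)),  # source embeddings
        "W_enc": rng.normal(0, s, (hidden, hidden)),     # encoder recurrence
        "E_tgt": rng.normal(0, s, (vocab_tgt, hidden)),  # target embeddings
        "W_dec": rng.normal(0, s, (hidden, hidden)),     # decoder recurrence
        "C_dec": rng.normal(0, s, (hidden, hidden)),     # injects the summary
        "W_out": rng.normal(0, s, (hidden, vocab_tgt)),  # hidden -> vocabulary
    }

def encode(params, src_ids):
    # Read the source one symbol at a time; the final hidden state is the
    # fixed-length summary, whatever the length of src_ids.
    h = np.zeros(params["W_enc"].shape[0])
    for i in src_ids:
        h = np.tanh(params["E_src"][i] + params["W_enc"] @ h)
    return h

def log_prob(params, src_ids, tgt_ids):
    # log p(target | source) as a sum of per-token log-probabilities.
    c = encode(params, src_ids)
    h = np.tanh(params["C_dec"] @ c)     # initial decoder state from summary
    prev = np.zeros_like(c)              # start symbol (assumed all zeros)
    total = 0.0
    for y in tgt_ids:
        h = np.tanh(prev + params["W_dec"] @ h + params["C_dec"] @ c)
        p = softmax(h @ params["W_out"])
        total += np.log(p[y])
        prev = params["E_tgt"][y]        # feed the true previous token
    return total

params = init_params(vocab_src=5, vocab_tgt=7, hidden=8)
score = log_prob(params, src_ids=[0, 1, 2], tgt_ids=[4, 5])
print(score)  # log p(target | source); the value depends on the random weights
```

Joint training would adjust all of these weights to maximize `log_prob` over observed phrase pairs. In the setting the abstract describes, the resulting score for each phrase pair would then be supplied as one additional feature to the existing log-linear translation model.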

PUBLICATION RECORD

Publication year
2014
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2014-06-03
Fields of study
Mathematics, Computer Science
Identifiers
DOI: 10.3115/v1/D14-1179; arXiv: 1406.1078
Source metadata
Semantic Scholar

REFERENCES

An Autoencoder Approach to Learning Bilingual Word Representations
2014, cited by this paper
Fast and Robust Neural Network Joint Models for Statistical Machine Translation
2014, cited by this paper
How to Construct Deep Recurrent Neural Networks
2013, cited by this paper
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
2013, cited by this paper
Bilingual Word Embeddings for Phrase-Based Machine Translation
2013, influential reference
Distributed Representations of Words and Phrases and their Compositionality
2013, cited by this paper
Decoding with Large-Scale Neural Language Models Improves Translation
2013, cited by this paper
Joint Language and Translation Modeling with Recurrent Neural Networks
2013, cited by this paper
Recurrent Continuous Translation Models
2013, cited by this paper
Maxout Networks
2013, cited by this paper
Barnes-Hut-SNE
2013, cited by this paper
Learning Semantic Representations for the Phrase Translation Model
2013, cited by this paper
Theano: new features and speed improvements
2012, cited by this paper
Supervised Sequence Labelling with Recurrent Neural Networks
2012, cited by this paper
ADADELTA: An Adaptive Learning Rate Method
2012, cited by this paper
Continuous Space Translation Models for Phrase-Based Statistical Machine Translation
2012, influential reference
Advances in optimizing recurrent networks
2012, cited by this paper
Continuous Space Translation Models with Neural Networks
2012, cited by this paper
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
2012, cited by this paper
ImageNet classification with deep convolutional neural networks
2012, cited by this paper
Domain Adaptation via Pseudo In-Domain Data Selection
2011, cited by this paper
Deep Sparse Rectifier Neural Networks
2011, cited by this paper
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
2011, cited by this paper
Intelligent Selection of Language Model Training Data
2010, cited by this paper
Continuous space language models
2007, influential reference
Continuous space language models for the IWSLT 2006 task
2006, cited by this paper
Europarl: A Parallel Corpus for Statistical Machine Translation
2005, cited by this paper
A Neural Probabilistic Language Model
2003, influential reference
Statistical Phrase-Based Translation
2003, cited by this paper
A Phrase-Based, Joint Probability Model for Statistical Machine Translation
2002, cited by this paper
Long Short-Term Memory
1997, cited by this paper

CITED BY

Context-Aware adaptive normalization LSTM (CAAN-LSTM) for immunotherapy decision support in cancer clinical data analysis.
2026, cites this paper
Improving snow water equivalent modelling: a comparative study of hybrid machine learning techniques
2026, cites this paper
Using time-frequency transformation approach to mitigate impacts of total nitrogen fluctuation on prediction results obtained by deep learning models
2026, cites this paper
Energy scheduling optimization of a renewable-powered microgrid with load and generation forecasting enabled by a novel deep learning method
2026, cites this paper
Source term inversion of nuclear accidents based on the Optuna–GRU model
2026, cites this paper
Decorrelating the Future: Joint Frequency Domain Learning for Spatio-temporal Forecasting
2026, cites this paper
When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift
2026, influential citation
SCoUT: Scalable Communication via Utility-Guided Temporal Grouping in Multi-Agent Reinforcement Learning
2026, influential citation
Kraus Constrained Sequence Learning For Quantum Trajectories from Continuous Measurement
2026, cites this paper
SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation
2026, cites this paper
Uncertainty-aware Blood Glucose Prediction from Continuous Glucose Monitoring Data
2026, cites this paper
KGLMQA: enhancing medical visual question answering with knowledge graphs and LLMs
2026, cites this paper
Unsupervised LSTM-Autoencoder Approach for Early Detection of Sensor Failures in High-Dimensional Monitoring Environments
2026, cites this paper
A Rapid Prediction Method for Rotary Kiln Head Temperature Based on Residual TCN-GRU-Self Attention Integration
2026, cites this paper
Inhibitory Cross-Talk Enables Functional Lateralization in Attention-Coupled Latent Memory
2026, cites this paper
Dual-Interaction-Aware Cooperative Control Strategy for Alleviating Mixed Traffic Congestion
2026, cites this paper
Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights
2026, cites this paper
Efficient Sparse Selective-Update RNNs for Long-Range Sequence Modeling
2026, cites this paper
From Simulation to Reality: Practical Deep Reinforcement Learning-based Link Adaptation for Cellular Networks
2026, cites this paper
Dynamic Spatio-Temporal Graph Neural Network for Early Detection of Pornography Addiction in Adolescents Based on Electroencephalogram Signals
2026, influential citation
A comparative study of transformer models and recurrent neural networks for path-dependent composite materials
2026, cites this paper
PromptStereo: Zero-Shot Stereo Matching via Structure and Motion Prompts
2026, cites this paper
Pulse-Driven Neural Architecture: Learnable Oscillatory Dynamics for Robust Continuous-Time Sequence Processing
2026, cites this paper
APMVS: Learning Multi-View Stereo Based on Adjacent Stage and Pair-Wise Stage Uncertainty Estimation
2026, cites this paper
Phishing Fraud Identity Inference Based on Graph Gated Recurrent Neural Network
2026, cites this paper
Machine-learning acceleration of granular and solid-fluid flow simulations: A review
2026, cites this paper
Longitudinal modality prediction learns gene regulatory patterns: insights from a single-cell competition
2026, cites this paper
Curriculum Reinforcement Learning for Quadrotor Racing with Random Obstacles
2026, cites this paper
A comprehensive review of machine learning and deep learning approaches for rainfall forecasting: current progress, challenges, and future directions
2026, cites this paper
Integrating LSTM and Transformer for Improved Daily Runoff Prediction: A Parallel Computing Approach
2026, influential citation
Open-vocabulary models for object detection and segmentation in visual art: survey and comparative study
2026, cites this paper
Hypergraph-based multi-scale spatio-temporal graph convolution network for traffic forecasting
2026, cites this paper
Detecting Fraud in Moroccan E-Commerce Platforms Using an Explainable Deep Learning Model for Arabic Fake Reviews Classification
2026, cites this paper
Global River Forecasting with a Topology-Informed AI Foundation Model
2026, influential citation
Distilling Privileged Knowledge From Transformers to Lightweight CNNs for On-Device Time Series Forecasting
2026, cites this paper
A Survey on Deep Learning Models for Anomaly Trajectory Detection
2026, cites this paper
Force Policy: Learning Hybrid Force-Position Control Policy under Interaction Frame for Contact-Rich Manipulation
2026, influential citation
Graph-based radical structure tree representation for zero-shot Chinese character recognition
2026, cites this paper
Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting
2026, cites this paper
Enhanced Adaptive LSTM Framework for Robust Early-Life Prediction of Lithium-Ion Battery Degradation
2026, cites this paper
Convolutional Long Short-Term Memory Neural Network for Spatiotemporal Forecasting of Surface Currents from HF-Radar
2026, cites this paper
Correlated bivariate time series forecasting using long short-term memory network: an AutoML approach
2026, cites this paper
Denoising Particle Filters: Learning State Estimation with Single-Step Objectives
2026, cites this paper
DECAF: Dynamic Envelope Context-Aware Fusion for Speech-Envelope Reconstruction from EEG
2026, cites this paper
A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations
2026, cites this paper
Non-Interfering Weight Fields: Treating Model Parameters as a Continuously Extensible Function
2026, cites this paper
Practical Challenges in Applying QAT to RNN Models
2025, cites this paper
BiGRU: Bi-Directional GRU-Based Approach for Audio Source Separation
2025, cites this paper
Comprehensive Evaluation of Transformer Models for Multilingual PII Detection
2025, cites this paper
LORSTransformerDRL: A Novel Deep Reinforcement Learning Framework for Intelligent Stock Trading with Chaotic Oscillators and Attention Mechanisms
2025, cites this paper
AI-Driven Log Analysis: Advances and Challenges
2025, cites this paper
Graph Neural Network (GNN) and its Application: A State-of-the-Art Survey
2025, cites this paper
Chinese NER for UAV Fault Texts via Local–Global Joint Modeling and Diffusion-Based Semantic Denoising
2025, cites this paper
FedEnergy: Federated Learning for Energy Consumption Forecasting on Smart Meters using Hybrid TCN-Transformer Model
2025, cites this paper
Phase-Aware Spectrogram Fusion with Dual-Stream Residual Networks for Underwater Acoustic Recognition
2025, cites this paper
Dynamic Graph Convolutional Recurrent Network for Typhoon-Induced Electricity Consumption Loss Prediction
2025, cites this paper
Bringing Shape to Spatio-Temporal Graph Contrastive Learning
2025, cites this paper
End-to-End Autonomous Driving Based on Dual-Branch Strategies and Attention Mechanisms
2025, cites this paper
CGFNet: Frequency-Domain Causal Discovery and Dual-Path Spectral Filtering for Wildfire Prediction
2025, cites this paper
EmotionChat: Emotional Chain of Thought based MLLM for Dialogue Generation
2025, cites this paper
Forecasting Option-Implied Dynamics with Google TimesFM: From Model to Market
2025, cites this paper
Time-Aware Ordinal Modelling of Sequential Text Data: A Two-Stage Architecture Combining LLM Classification and Lightweight Temporal Models
2025, cites this paper
Emotion Recognition based on Text Summarization and Word Features
2025, cites this paper
CSAT-Former: A Cross-Scale Aligned Transformer for Hierarchical Wind Power Forecasting with Temporal Consistency
2025, cites this paper
Dual Attention Network for Multimodal Physiological Signal-Based Fatigue Detection
2025, cites this paper
Predictive PID Control for Nonlinear Systems Using a Neural Network Model: A Case Study in Fluid Simulation
2025, cites this paper
Deep Learning Approaches for Alzheimer's Disease Diagnosis: a Comprehensive Review
2025, cites this paper
Designing a Framework for Deepfake Text Detection Using Multi Model Ensembling Techniques
2025, cites this paper
Expert-Agnostic AI for Intelligent Tutoring Systems: Leveraging Self-Supervised Knowledge Mining
2025, cites this paper
Memory-Encoded Deep-ONet (M-Deep-ONet) for Transient Circuit Behavioral Modeling
2025, cites this paper
A GRU-Based Approach for Fault Diagnosis in Shipboard MVDC Microgrids
2025, cites this paper
An Intelligent Proofreading Framework for Structured Official Documents Based on Large Language Models
2025, cites this paper
Comparative Analysis of Loss Functions in Deep Learning Models for Water Level Forecasting
2025, cites this paper
A Review of Data-Driven Frameworks for State of Health (SOH) Estimation and Remaining Useful Life (RUL) Prediction of Supercapacitors
2025, cites this paper
EEG Emotion Recognition Based on Gated Recurrent Unit and Adaptive Loss Regulation Mechanism
2025, cites this paper
Compressed Spatio Temporal Graph Neural Networks for Multivariate Time-Series Forecasting
2025, cites this paper
Federated LSTM-GAN Approach to Predicting Malaria and Dengue in Indonesia
2025, cites this paper
SupConWI-RL: Wafer Inspection with Reinforcement Learning Enhanced by Supervised Contrastive Learning
2025, cites this paper
Comparative Analysis of Deep Learning Approaches for Stock Chart Pattern Detection in Technical Analysis
2025, cites this paper
IMU-Based Activity Recognition and Payload Estimation for Augmentative Exoskeleton
2025, cites this paper
Informer-Based Long-Horizon Power Load Forecasting: An Empirical Study on the SG-HL Dataset
2025, cites this paper
Hybrid Linear–Nonlinear Hyperspectral Unmixing of Homogeneous Solutions
2025, cites this paper
Hybrid GRU-PINN Model for Pedestrian Trajectory Prediction at Unsignalized Intersections
2025, cites this paper
Enhancing Scholarship Allocation Fairness Using GRU and Synthetic Minority Oversampling Technique
2025, cites this paper
A Stock Return Prediction Model Based on Dynamic-Weighted Multi-Scale Embedding Mamba
2025, cites this paper
Aligning Multimodal Data for Fine-Grained Video Understanding via Cross-Attentive Recurrent Fusion
2025, cites this paper
LLM-Driven Trajectory Prediction at Intersections with Chain-of-Thought Reasoning
2025, cites this paper
WACU: Multi-Modal Wristband Assistant for Contextual Understanding
2025, cites this paper
Migration learning based fast prediction algorithm for traveling wave tube lifetime
2025, cites this paper
VOC-DL: Deep learning prediction model for COVID-19 based on VOC virus variants (Computer Methods and Programs in Biomedicine)
2022, cites this paper
Artificial Intelligence Applications and Innovations: 16th IFIP WG 12.5 International Conference, AIAI 2020, Neos Marmaras, Greece, June 5–7, 2020, Proceedings, Part II
2020, cites this paper
Neural Information Processing: 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, December 12–15, 2019, Proceedings, Part I
2019, influential citation
Dronaquatics: Real-time Swimming Analytics Using Drone Captured Imagery
year unknown, cites this paper
ACuRE: Accurate Continuity-Regularized SpO2 Estimation Using Liquid Time-Constant Networks: Supplementary
year unknown, cites this paper
Retrieval-Augmented Generation and Knowledge-Grounded Reasoning for Faithful Patient Discharge Instructions
year unknown, cites this paper
MSA2: Multi-task Framework with Structure-aware and Style-adaptive Character Representation for Open-set Chinese Text Recognition
year unknown, cites this paper
Future Generation Computer Systems
year unknown, cites this paper
Tidal: Tackling Concept Drift in Provenance-based Advanced Persistent Threats Detection
year unknown, cites this paper