Massive Exploration of Neural Machine Translation Architectures
Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc V. Le
Published 2017 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT
Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end users. As the field is moving rapidly, it has become unclear which elements of NMT architectures have a significant impact on translation quality. In this work, we present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours, on a WMT English-to-German translation task. Our experiments provide practical insights into the relative importance of factors such as embedding size, network depth, RNN cell type, residual connections, attention mechanism, and decoding heuristics. As part of this contribution, we also release an open-source NMT framework in TensorFlow to make it easy for others to reproduce our results and perform their own experiments.
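The hyperparameters named in the abstract all correspond to concrete choices in an attentional encoder-decoder. As an illustration only, the sketch below shows where embedding size, network depth, cell type, residual connections, and attention enter such a model. It is written in PyTorch for brevity; the authors' released framework is TensorFlow-based, and every class name and default value here is an assumption, not their code.

    # Hypothetical attentional encoder-decoder exposing the hyperparameters
    # studied in the paper. All names and defaults are illustrative.
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab,
                     embedding_size=512,   # embedding size
                     hidden_size=512,
                     num_layers=2,         # network depth
                     cell_type="lstm",     # RNN cell type: "lstm" or "gru"
                     residual=True):       # residual connections between layers
            super().__init__()
            rnn = {"lstm": nn.LSTM, "gru": nn.GRU}[cell_type]
            self.residual = residual
            self.src_embed = nn.Embedding(src_vocab, embedding_size)
            self.tgt_embed = nn.Embedding(tgt_vocab, embedding_size)
            # One single-layer RNN per depth level, so a residual skip can be
            # added around each layer individually.
            self.enc_layers = nn.ModuleList(
                [rnn(embedding_size if i == 0 else hidden_size,
                     hidden_size, batch_first=True) for i in range(num_layers)])
            self.dec_layers = nn.ModuleList(
                [rnn(embedding_size if i == 0 else hidden_size,
                     hidden_size, batch_first=True) for i in range(num_layers)])
            self.attn = nn.Linear(hidden_size, hidden_size)   # multiplicative attention
            self.out = nn.Linear(2 * hidden_size, tgt_vocab)

        def _run_stack(self, layers, x):
            for i, layer in enumerate(layers):
                y, _ = layer(x)
                # Residual connection, skipped for the first layer where the
                # input width (embedding_size) may differ from hidden_size.
                x = x + y if (self.residual and i > 0) else y
            return x

        def forward(self, src_ids, tgt_ids):
            enc = self._run_stack(self.enc_layers, self.src_embed(src_ids))
            dec = self._run_stack(self.dec_layers, self.tgt_embed(tgt_ids))
            # Attention: each decoder state attends over all encoder states.
            scores = self.attn(dec) @ enc.transpose(1, 2)      # (B, T_tgt, T_src)
            context = scores.softmax(dim=-1) @ enc             # (B, T_tgt, H)
            return self.out(torch.cat([dec, context], dim=-1)) # logits over target vocab

    # Example with illustrative shapes: a batch of 8 sentence pairs.
    model = Seq2Seq(src_vocab=32000, tgt_vocab=32000)
    logits = model(torch.randint(0, 32000, (8, 20)),
                   torch.randint(0, 32000, (8, 22)))           # (8, 22, 32000)

Decoding heuristics such as beam width and length penalty act only at inference time, so they do not appear in the model definition above.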
PUBLICATION RECORD
- Publication year: 2017
- Venue: Conference on Empirical Methods in Natural Language Processing
- Publication date: 2017-03-11
- Fields of study: Linguistics, Computer Science
- Source metadata: Semantic Scholar