Convolutional Sequence to Sequence Learning

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann Dauphin

Published 2017 in International Conference on Machine Learning

ABSTRACT

The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
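The gated linear unit mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation (split the channel vector into two halves A and B, then return A multiplied elementwise by the sigmoid of B); the function names `sigmoid` and `glu` are illustrative and are not taken from the authors' code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(values: list[float]) -> list[float]:
    """Gated linear unit over one position's channel vector.

    Assumed formulation: the first half A carries the content, the
    second half B gates it, and the output is A * sigmoid(B).
    The output has half as many channels as the input.
    """
    if len(values) % 2 != 0:
        raise ValueError("GLU input needs an even number of channels")
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    return [a * sigmoid(b) for a, b in zip(content, gate)]


if __name__ == "__main__":
    # A gate value of 0 lets exactly half of the content through.
    print(glu([1.0, 3.0, 0.0, 0.0]))
```

Because the content half passes through a linear path scaled only by the gate, gradients do not have to cross a saturating non-linearity on that path, which is one common explanation for why gating of this kind eases gradient propagation.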

PUBLICATION RECORD

Publication year: 2017
Venue: International Conference on Machine Learning
Publication date: 2017-05-08
Fields of study: Computer Science
Identifiers: arXiv 1705.03122
Source metadata: Semantic Scholar

REFERENCES

Convolutional Sequence to Sequence Learning
2017, cited by this paper
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
2017, cited by this paper
Quasi-Recurrent Neural Networks
2016, cited by this paper
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
2016, cited by this paper
Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization
2016, cited by this paper
Vocabulary Manipulation for Neural Machine Translation
2016, cited by this paper
Key-Value Memory Networks for Directly Reading Documents
2016, cited by this paper
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2016, influential reference
A Convolutional Encoder Model for Neural Machine Translation
2016, cited by this paper
Pixel Recurrent Neural Networks
2016, cited by this paper
Layer Normalization
2016, cited by this paper
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
2016, influential reference
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
2016, cited by this paper
Neural Machine Translation in Linear Time
2016, influential reference
Findings of the 2016 Conference on Machine Translation
2016, cited by this paper
Neural Machine Translation with Recurrent Attention Modeling
2016, cited by this paper
Edinburgh Neural Machine Translation Systems for WMT 16
2016, influential reference
Conditional Image Generation with PixelCNN Decoders
2016, cited by this paper
Language Modeling with Gated Convolutional Networks
2016, influential reference
Neural Headline Generation with Sentence-wise Optimization
2016, influential reference
Vocabulary Selection Strategies for Neural Machine Translation
2016, cited by this paper
HyperNetworks
2016, cited by this paper
Neural Machine Translation of Rare Words with Subword Units
2015, influential reference
Encoding Source Language with Convolutional Neural Network for Machine Translation
2015, cited by this paper
Deep Residual Learning for Image Recognition
2015, influential reference
Effective Approaches to Attention-based Neural Machine Translation
2015, influential reference
Montreal Neural Machine Translation Systems for WMT’15
2015, cited by this paper
Attention-Based Models for Speech Recognition
2015, cited by this paper
A Neural Attention Model for Abstractive Sentence Summarization
2015, influential reference
End-To-End Memory Networks
2015, cited by this paper
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
2015, cited by this paper
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
2015, influential reference
Sequence to Sequence Learning with Neural Networks
2014, influential reference
Neural Machine Translation by Jointly Learning to Align and Translate
2014, influential reference
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
2014, cited by this paper
Dropout: a simple way to prevent neural networks from overfitting
2014, cited by this paper
On the importance of initialization and momentum in deep learning
2013, cited by this paper
A Simple, Fast, and Effective Reparameterization of IBM Model 2
2013, cited by this paper
On the difficulty of training recurrent neural networks
2012, cited by this paper
Japanese and Korean voice search
2012, cited by this paper
Torch7: A Matlab-like Environment for Machine Learning
2011, cited by this paper
Understanding the difficulty of training deep feedforward neural networks
2010, cited by this paper
DUC in context
2007, cited by this paper
ROUGE: A Package for Automatic Evaluation of Summaries
2004, cited by this paper
The handbook of brain theory and neural networks
2001, cited by this paper
Convolutional networks for images, speech, and time series
1998, cited by this paper
Long Short-Term Memory
1997, cited by this paper
Finding Structure in Time
1990, cited by this paper
Phoneme recognition using time-delay neural networks
1989, cited by this paper

CITED BY

PECC: Position Encoding Coordinate Classification System Design for Human Pose Estimation
2026, cites this paper
Positional-aware Spatio-Temporal Network for Large-Scale Traffic Prediction
2026, cites this paper
SENDE: extractive summarization of legal documents by sentence noising-reconstruction and dilated-gated convolutional networks
2026, cites this paper
Distracted Driving Behavior Recognition System Based on Deep Learning Approach and Multi-View Imaging
2026, cites this paper
BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series Forecasting
2026, cites this paper
Long-term Prediction of Saltwater Intrusion Based on Sequence Learning Framework
2026, cites this paper
Hierarchical Shift Mixing - Beyond Dense Attention in Transformers
2026, cites this paper
Reward-free Alignment for Conflicting Objectives
2026, cites this paper
MUFFIN: A Meta-Knowledge Decoupling-Based Approach to Few-Shot IoT Traffic Classification
2026, cites this paper
Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics
2026, cites this paper
A comprehensive review of convolutional neural networks: foundations, enhancements and applications
2026, cites this paper
Temporal and modal contributions to smartphone-based multimodal driving behavior classification: a comparative study of classical, deep learning, and patch-based time series transformer models
2026, cites this paper
PPG-based continuous arterial blood pressure estimation via multi-scale cross attention fusion
2026, cites this paper
A dynamic hybrid attention-based autoencoder model with adaptive contextual attention for grammatical error correction
2026, cites this paper
LCD-YOLO: an automatic steel surface defect detection model based on YOLOv11
2026, cites this paper
Multi-scale EEG feature decoding with Swin Transformers for subject independent motor imagery BCIs
2026, cites this paper
Transformer encoder and data augmentation for real-time speech emotion recognition
2026, cites this paper
Dual-path modeling of global swell and local wave features for short-term significant wave height spatio-temporal forecasting using hybrid convolutional networks
2026, cites this paper
Research challenges and future directions in transformer-based neural machine translation
2026, cites this paper
A multi-sensor fusion network with multi-cognitive visual adaptation and adaptive dynamic convolution
2026, cites this paper
Toward General Industrial Intelligence: A Survey of Large Models as a Service in Industrial IoT
2026, cites this paper
Attention-Based Anomaly Detection in Dynamic Network
2026, cites this paper
Multi-branch heterogeneous spatial-temporal graph convolutional network for traffic flow forecasting
2026, cites this paper
CẢI TIẾN MÔ HÌNH DỊCH MÁY MẠNG NƠ-RON ANH-VIỆT SỬ DỤNG ĐỒ THỊ TRI THỨC
2025, cites this paper
Unsupervised Fault Detection Method via Time-Series Segmentation and Contrastive Masking Learning
2025, cites this paper
Collaborative local-global context modeling for session-based recommendation
2025, cites this paper
Modeling enteric methane emission from dairy cows using deep learning approach
2025, cites this paper
Enhancing Domain-Specific English-Chinese Neural Machine Translation with Data Augmentation and Term Adaptation Techniques
2025, cites this paper
Clarifying orthography: Orthographic transparency as compressibility
2025, cites this paper
Chinese Morph Resolution in E-commerce Live Streaming Scenarios
2025, cites this paper
A deep learning based framework for enhanced reference evapotranspiration estimation: evaluating accuracy and forecasting strategies
2025, cites this paper
A survey of IPv6 address scanning techniques
2025, cites this paper
Advancements in Machine Translation and Cross-Language Computational Applications: Techniques, Challenges, and Future Directions
2025, cites this paper
Predicting multi-port vessel traffic flow: An improved spatial-temporal graph neural network with uncertainty quantification
2025, cites this paper
Toward Specialized Learning-based Approaches for Visual Odometry: A Comprehensive Survey
2025, cites this paper
Beamformed Fingerprint-Based Transformer Network for Trajectory Estimation and Path Determination in Outdoor mmWave MIMO Systems
2025, cites this paper
Rethinking Time Encoding via Learnable Transformation Functions
2025, cites this paper
ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models
2025, cites this paper
State Fourier Diffusion Language Model (SFDLM): A Scalable, Novel Iterative Approach to Language Modeling
2025, cites this paper
Improving Neural Machine Translation Through Code‐Mixed Data Augmentation
2025, cites this paper
Prediction-Based Tip-Over Prevention for Planetary Exploration Rovers
2025, cites this paper
Preface: Advancing deep learning for remote sensing time series data analysis
2025, cites this paper
Data-Driven Decision-Making for SCUC: An Improved Deep Learning Approach Based on Sample Coding and Seq2Seq Technique
2025, cites this paper
KIKE: Linguistic Steganalysis Based on Knowledge Infusion and Knowledge Encoding
2025, cites this paper
Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
2025, cites this paper
State of Health Estimation for Lithium-Ion Batteries Using Separable LogSparse Self-Attention Transformer
2025, cites this paper
Multi-Token Attention
2025, cites this paper
Flexible Transformer: A Simple Novel Transformer-based Network for Image Classification in Variant Input Image Sizes
2025, cites this paper
Exploring Various Sequential Learning Methods for Deformation History Modeling
2025, cites this paper
Virtual-Real Spatial-Temporal Dual Layer Transformer for virtual sensor state perception
2025, cites this paper
SE-Enhancer: Low-Resource Machine Translation Based on Enhanced SimCSE and Layer Fusion
2025, influential citation
Coformer for session-based recommendation with dual positional information
2025, cites this paper
Fostering non-intrusive load monitoring for smart energy management in industrial applications: an active machine learning approach
2025, cites this paper
Formation permeability estimation using mud loss data by deep learning
2025, cites this paper
ComPO: Preference Alignment via Comparison Oracles
2025, cites this paper
A generative framework for detection and classification of plant leaf disease using diffusion network
2025, cites this paper
Speech-Based Phonetic Transcript Metrics
2025, cites this paper
A 2D Semantic-Aware Position Encoding for Vision Transformers
2025, cites this paper
Towards Cultural Bridge by Bahnaric-Vietnamese Translation Using Transfer Learning of Sequence-To-Sequence Pre-training Language Model
2025, cites this paper
LiTformer: Efficient Signal Integrity Analysis for High-Speed Link Transmitters Using Non-Autoregressive Transformer
2025, cites this paper
Systematic Generalization in Language Models Scales with Information Entropy
2025, cites this paper
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications
2025, cites this paper
Modality Imbalance? Dynamic Multi-Modal Knowledge Distillation in Automatic Alzheimer's Disease Recognition
2025, cites this paper
Estimating track geometry irregularities from in-service train accelerations using deep learning
2025, cites this paper
The Role of Sparsity for Length Generalization in Transformers
2025, cites this paper
Representation Learning for Place Recognition Using MIMO Radar
2025, cites this paper
RTF: Recursive TransFusion for Multi-Modal Image Synthesis
2025, cites this paper
Context-aware Biases for Length Extrapolation
2025, cites this paper
Review of Deep Learning and Bioinformatics in Breast Cancer
2025, cites this paper
SocialTrans: Transformer Based Social Intentions Interaction for Pedestrian Trajectory Prediction
2025, cites this paper
An Introspective Study on Attention-Based Transfer Learning in CNNs for Alzheimer's Disease Detection
2025, cites this paper
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
2025, cites this paper
Enhancing Speaker Recognition with CRET Model: a fusion of CONV2D, RESNET and ECAPA-TDNN
2025, cites this paper
FuXi-α: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
2025, cites this paper
UTR-Insight: integrating deep learning for efficient 5′ UTR discovery and design
2025, cites this paper
DAGCAN: Decoupled Adaptive Graph Convolution Attention Network for Traffic Forecasting
2025, cites this paper
Discovering Physics Laws of Dynamical Systems via Invariant Function Learning
2025, cites this paper
Omni-scale spatio-temporal attention network for impact localization of sandwich composite panels
2025, cites this paper
Prediction of Clinical Complication Onset using Neural Point Processes
2025, cites this paper
Natural resources dependence and climate vulnerability: Do women's political empowerment and political ideology make the difference?
2025, cites this paper
Character-Level Encoding based Neural Machine Translation for Hindi language
2025, cites this paper
A Survey of Graph Transformers: Architectures, Theories and Applications
2025, cites this paper
A HEART for the environment: Transformer-Based Spatiotemporal Modeling for Air Quality Prediction
2025, cites this paper
Recognizing the Traffic State of Urban Road Networks: A Resilience-Based Data-Driven Approach
2025, cites this paper
Multi-Viewpoint and Multi-Evaluation With Felicitous Inductive Bias Boost Machine Abstract Reasoning Ability
2025, cites this paper
Deep Causal Behavioral Policy Learning: Applications to Healthcare
2025, cites this paper
HT-AggNet: Hierarchical temporal aggregation network with near-zero-cost layer stacking for human activity recognition
2025, cites this paper
Valve Token Masked Autoencoder for Missing Recordings on Cardiac Abnormality Classification
2025, cites this paper
Heterogeneous Packet Translation for Cross-Technology Communication
2025, cites this paper
Dual Decoder for Fast Inference in Natural Language Generation
2025, cites this paper
Time Series Analysis Neural Networks for Detecting False Data Injection Attacks of Different Rates on Power Grid State Estimation
2025, cites this paper
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
2025, influential citation
Depth-Aware Range Image-Based Model for Point Cloud Segmentation
2025, cites this paper
Ankle Sensor-Based Detection of Freezing of Gait in Parkinson’s Disease in Semi-Free Living Environments
2025, cites this paper
Intelligent Bilingual Reading Translation System Based on Natural Language Processing
2025, cites this paper
A Clause-Based Data Augmentation Method for Low-Resource Neural Machine Translation
2025, cites this paper
Image Captioning Using Deep Learning: Bridging the Gap between Vision and Natural Language Processing
2025, cites this paper
Coarse-to-Fine Learning for Multi-Pipette Localisation in Robot-Assisted In Vivo Patch-Clamp
2025, cites this paper
Attention-based BiLSTM with positional embeddings for fake review detection
2025, cites this paper
Pre-Training a Graph Recurrent Network for Text Understanding
2025, cites this paper