Quasi-Recurrent Neural Networks

James Bradbury, Stephen Merity, Caiming Xiong, R. Socher

Published 2016 in International Conference on Learning Representations

ABSTRACT

Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels. Despite lacking trainable recurrent layers, stacked QRNNs have better predictive accuracy than stacked LSTMs of the same hidden size. Due to their increased parallelism, they are up to 16 times faster at train and test time. Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic building block for a variety of sequence tasks.
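
The abstract's two building blocks can be sketched in plain Python. This is a minimal, non-authoritative sketch of a single QRNN layer. It assumes the masked (causal) convolution of filter width k and the "fo-pooling" variant described in the full paper rather than in this abstract: three convolutions produce a candidate vector z_t and gates f_t and o_t, and pooling then computes c_t = f_t * c_{t-1} + (1 - f_t) * z_t and h_t = o_t * c_t element-wise. The function names (`masked_conv`, `fo_pool`, `qrnn_layer`), the `params` layout, and the toy weights are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape (n_in, hidden)


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def masked_conv(xs: Sequence[Vector], weights: Sequence[Matrix], bias: Vector) -> List[Vector]:
    """Causal 1D convolution over time with filter width k = len(weights).

    Output t depends only on inputs t-k+1 .. t (zero-padded on the left), so
    every timestep can be computed independently: parallel across time.
    """
    k = len(weights)
    out: List[Vector] = []
    for t in range(len(xs)):
        acc = list(bias)
        for j in range(k):
            src = t - k + 1 + j  # weights[j] multiplies x_{t-k+1+j}
            if src < 0:
                continue  # implicit zero padding before the sequence start
            for i, xi in enumerate(xs[src]):
                for h in range(len(acc)):
                    acc[h] += xi * weights[j][i][h]
        out.append(acc)
    return out


def fo_pool(z: Sequence[Vector], f: Sequence[Vector], o: Sequence[Vector],
            c0: Optional[Vector] = None) -> Tuple[List[Vector], Vector]:
    """fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t, h_t = o_t * c_t.

    It has no trainable weights, and the sequential update is element-wise,
    so each channel is independent: parallel across channels.
    """
    c = list(c0) if c0 is not None else [0.0] * len(z[0])
    hs: List[Vector] = []
    for zt, ft, ot in zip(z, f, o):
        c = [fv * cv + (1.0 - fv) * zv for zv, fv, cv in zip(zt, ft, c)]
        hs.append([ov * cv for ov, cv in zip(ot, c)])
    return hs, c


def qrnn_layer(xs: Sequence[Vector], params: Dict[str, list]) -> Tuple[List[Vector], Vector]:
    """One QRNN layer: three masked convolutions (candidate + gates), then fo-pooling."""
    z = [[math.tanh(v) for v in row] for row in masked_conv(xs, params["W_z"], params["b_z"])]
    f = [[sigmoid(v) for v in row] for row in masked_conv(xs, params["W_f"], params["b_f"])]
    o = [[sigmoid(v) for v in row] for row in masked_conv(xs, params["W_o"], params["b_o"])]
    return fo_pool(z, f, o)


# Toy, hypothetical parameters: filter width k = 2, one input feature, two channels.
params = {
    "W_z": [[[0.5, -0.5]], [[1.0, 0.2]]], "b_z": [0.0, 0.0],
    "W_f": [[[0.1, 0.3]], [[-0.2, 0.4]]], "b_f": [0.0, 0.0],
    "W_o": [[[0.0, 0.0]], [[0.3, -0.1]]], "b_o": [0.0, 0.0],
}
hs, c_last = qrnn_layer([[1.0], [2.0], [3.0]], params)
print(len(hs), "timesteps x", len(hs[0]), "channels")
```

Each output of `masked_conv` depends only on a fixed window of inputs, so all timesteps can be computed at once; `fo_pool` keeps a loop over time, but its update is element-wise, which is why the recurrence itself carries no trainable weights. A practical implementation would vectorize both steps (for example on a GPU) instead of using Python loops.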

PUBLICATION RECORD

Publication year: 2016
Venue: International Conference on Learning Representations
Publication date: 2016-11-05
Fields of study: Computer Science
Identifiers: arXiv 1611.01576
Source metadata: Semantic Scholar

REFERENCES

Under review as a conference paper at ICLR 2020 many domain adaptation methods
2019 · cited by this paper
A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs
2016 · cited by this paper
Fully Character-Level Neural Machine Translation without Explicit Segmentation
2016 · cited by this paper
Query-Reduction Networks for Question Answering
2016 · cited by this paper
Strongly-Typed Recurrent Neural Networks
2016 · influential reference
Sequence-to-Sequence Learning as Beam-Search Optimization
2016 · cited by this paper
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2016 · cited by this paper
Pixel Recurrent Neural Networks
2016 · cited by this paper
Virtual Adversarial Training for Semi-Supervised Text Classification
2016 · cited by this paper
Densely Connected Convolutional Networks
2016 · cited by this paper
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
2016 · cited by this paper
Neural Machine Translation in Linear Time
2016 · cited by this paper
Dynamic Memory Networks for Visual and Textual Question Answering
2016 · cited by this paper
MetaMind Neural Machine Translation System for WMT 2016
2016 · cited by this paper
Conditional Image Generation with PixelCNN Decoders
2016 · cited by this paper
Pointer Sentinel Mixture Models
2016 · cited by this paper
Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers
2016 · cited by this paper
A C-LSTM Neural Network for Text Classification
2015 · cited by this paper
Predicting Polarities of Tweets by Composing Word Embeddings with Long Short-Term Memory
2015 · cited by this paper
RECURRENT NEURAL NETWORKS
2015 · cited by this paper
Effective Approaches to Attention-based Neural Machine Translation
2015 · cited by this paper
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
2015 · cited by this paper
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
2015 · cited by this paper
Character-Aware Neural Language Models
2015 · cited by this paper
Character-level Convolutional Networks for Text Classification
2015 · cited by this paper
Chainer: a Next-Generation Open Source Framework for Deep Learning
2015 · cited by this paper
Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews
2014 · cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014 · influential reference
Recurrent Neural Network Regularization
2014 · influential reference
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
2014 · cited by this paper
GloVe: Global Vectors for Word Representation
2014 · cited by this paper
Adam: A Method for Stochastic Optimization
2014 · cited by this paper
Multi-Dimensional Sentiment Analysis with Learned Representations
2012 · cited by this paper
Baselines and Bigrams: Simple, Good Sentiment and Topic Classification
2012 · cited by this paper
ImageNet classification with deep convolutional neural networks
2012 · cited by this paper
Recurrent neural network based language model
2010 · cited by this paper
Long Short-Term Memory
1997 · cited by this paper

CITED BY

Parallelizable Neural Turing Machines
2026 · cites this paper
Why Are Linear RNNs More Parallelizable?
2026 · cites this paper
AI-Enhanced Digital Twin Modeling of Cell-Level Lithium-Ion Batteries via Cross-Task Attention-Based Multitask Learning
2026 · influential citation
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
2025 · cites this paper
TCN-BiSRU-V2 fall detection model with performance evaluation and comparative analysis
2025 · cites this paper
Low-Resource Neural Machine Translation Using Recurrent Neural Networks and Transfer Learning: A Case Study on English-to-Igbo
2025 · cites this paper
Deep prediction enhancement in TCN-based language modeling using arithmetic meta-heuristic optimization
2025 · cites this paper
Vision-Based Collision Warning Systems with Deep Learning: A Systematic Review
2025 · cites this paper
DSTC: Dual-Side Sparse Tensor Core for DNNs Acceleration on Modern GPU Architectures
2025 · cites this paper
Fast weight programming and linear transformers: from machine learning to neurobiology
2025 · cites this paper
A Spatial-Aware Temporal Modeling Network for Imitation Learning-Based Drone Navigation
2025 · cites this paper
mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling
2025 · cites this paper
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
2025 · cites this paper
Development of a hip osteoarthritis index for gait quality assessment: a data-driven comparative study
2025 · cites this paper
RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling
2025 · cites this paper
Test-time regression: a unifying framework for designing sequence models with associative memory
2025 · cites this paper
TCN-QRNN model for short term energy consumption forecasting with increased accuracy and optimized computational efficiency
2025 · cites this paper
A Group Activity Based Method for Early Recognition of Surgical Processes Using the Camera Observing Surgeries in an Operating Room and Spatio-Temporal Graph Based Deep Learning Model
2025 · cites this paper
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
2025 · cites this paper
Physics-inspired Energy Transition Neural Network for Sequence Learning
2025 · cites this paper
Inferring Speaking Styles for Conversational Speech Synthesis by Learning Contextual Dependencies
2025 · cites this paper
A Domain-Specific Turkish QA System for University Internship Regulations Using a Synthetic Context Approach
2025 · cites this paper
Chronological pufferfish optimization algorithm for task scheduling in cloud computing
2025 · cites this paper
STAGNet: A Spatio-Temporal Graph and LSTM Framework for Accident Anticipation
2025 · cites this paper
ME3-BEV: Mamba-Enhanced Deep Reinforcement Learning for End-to-End Autonomous Driving with BEV-Perception
2025 · cites this paper
Lithium-Ion Battery State-of-Charge and State-of-Energy Simultaneous Estimation via Sparse-Quasi Recurrent Neural Networks (S-QRNN)
2025 · cites this paper
Sequential-Parallel Duality in Prefix Scannable Models
2025 · cites this paper
Artificial Intelligence-Assisted Design of Nanomedicines for Breast Cancer Diagnosis and Therapy: Advances, Challenges, and Future Directions
2025 · cites this paper
CIPred: A Convolutional Neural Network and Global Pedestrianscene Interaction Based Model for Trajectory Prediction
2025 · cites this paper
Vertically Recurrent Neural Networks for Sub‐Grid Parameterization
2025 · cites this paper
Hyperspectral Image Denoising via Quasi-Recursive Spectral Attention and Cross-Layer Feature Fusion
2025 · cites this paper
FranSys—A Fast Non-Autoregressive Recurrent Neural Network for Multi-Step Ahead Prediction
2024 · cites this paper
Explainable Artificial Intelligence Techniques for Irregular Temporal Classification of Multidrug Resistance Acquisition in Intensive Care Unit Patients
2024 · cites this paper
Improving Armed People Detection on Video Surveillance Through Heuristics and Machine Learning Models
2024 · cites this paper
Orthogonal Constrained Minimization with Tensor ℓ2,p Regularization for HSI Denoising and Destriping
2024 · cites this paper
Dilated-RNNs: A Deep Approach for Continuous Volcano-Seismic Events Recognition
2024 · cites this paper
Birdie: Advancing State Space Models with Reward-Driven Objectives and Curricula
2024 · cites this paper
Instantaneous 2D extreme wind speed prediction using the novel Wind Gust Prediction Net based on purely convolutional neural mechanism
2024 · cites this paper
An Efficient Brain-Switch for Asynchronous Brain-Computer Interfaces
2024 · cites this paper
The Expressive Capacity of State Space Models: A Formal Language Perspective
2024 · cites this paper
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
2024 · cites this paper
Enhancing Peer Review with AI-Powered Suggestion Generation Assistance: Investigating the Design Dynamics
2024 · cites this paper
Spatial-temporal Offshore Current Field Forecasting Using Residual-learning Based Purely CNN Methodology with Attention Mechanism
2024 · cites this paper
Graph(Graph): A Nested Graph-Based Framework for Early Accident Anticipation
2024 · cites this paper
Time Series Clustering with General State Space Models via Stochastic Variational Inference
2024 · cites this paper
Repeat After Me: Transformers are Better than State Space Models at Copying
2024 · influential citation
A densely connected causal convolutional network separating past and future data for filling missing PM2.5 time series data
2024 · cites this paper
Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints
2024 · cites this paper
QRNN-Transformer: Recognizing Textual Entailment
2024 · cites this paper
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
2024 · cites this paper
Streaming Detection of Queried Event Start
2024 · cites this paper
Regional inflation analysis using social network data
2024 · cites this paper
Performance Analysis of Deepfake Text Detection Techniques on Social-media
2024 · cites this paper
Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph
2024 · cites this paper
Language Models for Multi-Lingual Tasks - A Survey
2024 · cites this paper
A Fuzzy Multigranularity Convolutional Neural Network With Double Attention Mechanisms for Measuring Semantic Textual Similarity
2024 · cites this paper
Combination of GRU and CNN Deep Learning Models for Sentiment Analysis on French Customer Reviews Using XLNet Model
2023 · cites this paper
Fine-Tuning MultiFit for Enhanced Legal Sentence Basis Classification
2023 · cites this paper
Heterogeneous Encoders Scaling in the Transformer for Neural Machine Translation
2023 · cites this paper
Consonant is all you need: a compact representation of English text for efficient NLP
2023 · cites this paper
Tsunami tide prediction in shallow water using recurrent neural networks: model implementation in the Indonesia Tsunami Early Warning System
2023 · cites this paper
Accident Prediction Model Using Divergence Between Visual Attention and Focus of Expansion in Vehicle-Mounted Camera Images
2023 · cites this paper
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
2023 · cites this paper
Improving Question Intent Identification by Exploiting Its Synergy With User Age
2023 · cites this paper
DIEU: A Dynamic Interaction Emotion Unit for Emotion Recognition in Conversation
2023 · cites this paper
Advancing State of the Art in Language Modeling
2023 · influential citation
Hybrid Spectral Denoising Transformer with Guided Attention
2023 · cites this paper
Enhanced Exploration of Neural Network Models for Indoor Human Monitoring
2023 · cites this paper
Document-Level Chemical-Induced Disease Semantic Relation Extraction Using Bidirectional Long Short-Term Memory on Dependency Graph
2023 · cites this paper
Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments
2023 · influential citation
Multi-loop graph convolutional network for multimodal conversational emotion recognition
2023 · cites this paper
Detection of Privacy-Harming Social Media Posts in Italian
2023 · cites this paper
Hybrid Spectral Denoising Transformer with Learnable Query
2023 · cites this paper
A New Approach to Traffic Accident Anticipation With Geometric Features for Better Generalizability
2023 · cites this paper
Exploring the Promise and Limits of Real-Time Recurrent Learning
2023 · cites this paper
A Quantitative Review on Language Model Efficiency Research
2023 · cites this paper
Neural Decoding for Intracortical Brain–Computer Interfaces
2023 · cites this paper
Neural Abstractive Summarization: A Brief Survey
2023 · cites this paper
RWKV: Reinventing RNNs for the Transformer Era
2023 · influential citation
Text Sentiment Classification Based on BERT Embedding and Sliced Multi-Head Self-Attention Bi-GRU
2023 · cites this paper
A survey and study impact of tweet sentiment analysis via transfer learning in low resource scenarios
2023 · cites this paper
Koopman Invertible Autoencoder: Leveraging Forward and Backward Dynamics for Temporal Modeling
2023 · cites this paper
Hyperspectral image denoising via spectral noise distribution bootstrap
2023 · cites this paper
Gradient Sparsification For Masked Fine-Tuning of Transformers
2023 · cites this paper
Unsupervised Deep Learning for IoT Time Series
2023 · cites this paper
Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions
2023 · influential citation
Data-driven Communicative Behaviour Generation: A Survey
2023 · cites this paper
Streaming Intended Query Detection using E2E Modeling for Continued Conversation
2022 · cites this paper
Machine Learning Approaches to Classify Anatomical Regions in Rodent Brain from High Density Recordings
2022 · influential citation
Anomaly Detection in Time Series with Robust Variational Quasi-Recurrent Autoencoders
2022 · cites this paper
Using BERT and Knowledge Graph for detecting triples in Vietnamese text
2022 · cites this paper
Simplified State Space Layers for Sequence Modeling
2022 · cites this paper
Exploring the sequence length bottleneck in the Transformer for Image Captioning
2022 · cites this paper
Harmless Transfer Learning for Item Embeddings
2022 · cites this paper
FedorAS: Federated Architecture Search under system heterogeneity
2022 · cites this paper
Illuminati: Towards Explaining Graph Neural Networks for Cybersecurity Analysis
2022 · cites this paper
Intrusion Detection Method Based on Stacked Sparse Autoencoder and Sliced GRU for Connected Healthcare Systems
2022 · cites this paper
Arabic Speech Analysis for Classification and Prediction of Mental Illness due to Depression Using Deep Learning
2022 · cites this paper
Simple Recurrence Improves Masked Language Models
2022 · influential citation
Water Quality Prediction for Smart Aquaculture Using Hybrid Deep Learning Models
2022 · cites this paper