QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le

Published 2018 in International Conference on Learning Representations

ABSTRACT

Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow in both training and inference because of the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks: its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions. On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving accuracy equivalent to that of recurrent models. The speed-up allows us to train the model with much more data. We hence combine our model with data generated by backtranslation from a neural machine translation model. On the SQuAD dataset, our single model, trained with augmented data, achieves an F1 score of 84.6 on the test set, which is significantly better than the best published F1 score of 81.8.
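
The division of labor described in the abstract, convolution for local interactions and self-attention for global ones, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no layer-level details, so the residual connections, the layer normalization, the single attention head, the depthwise separable form of the convolution, the kernel size of 7, and every function and parameter name below are assumptions made for illustration only.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def depthwise_separable_conv(x, depthwise, pointwise):
    # x: (length, dim); depthwise: (kernel, dim); pointwise: (dim, dim).
    kernel = depthwise.shape[0]  # assumed odd, so the output keeps the input length
    pad = kernel // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))
    # Depthwise step: each channel is filtered independently over a local window.
    local = np.stack([
        (padded[i:i + kernel] * depthwise).sum(axis=0)
        for i in range(x.shape[0])
    ])
    # Pointwise step: a 1x1 convolution mixes channels at each position.
    return local @ pointwise

def self_attention(x, w_q, w_k, w_v):
    # Single-head scaled dot-product attention: every position sees every other.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def encoder_block(x, params):
    # Hypothetical block: local convolution, then global self-attention,
    # each wrapped in a residual connection with layer norm applied first.
    x = x + depthwise_separable_conv(layer_norm(x), params["dw"], params["pw"])
    x = x + self_attention(layer_norm(x), params["wq"], params["wk"], params["wv"])
    return x

rng = np.random.default_rng(0)
length, dim, kernel = 12, 8, 7  # illustrative sizes, not taken from the paper
params = {
    "dw": rng.normal(scale=0.1, size=(kernel, dim)),
    "pw": rng.normal(scale=0.1, size=(dim, dim)),
    "wq": rng.normal(scale=0.1, size=(dim, dim)),
    "wk": rng.normal(scale=0.1, size=(dim, dim)),
    "wv": rng.normal(scale=0.1, size=(dim, dim)),
}
x = rng.normal(size=(length, dim))
y = encoder_block(x, params)
print(y.shape)
```

Neither sublayer carries a recurrent state from one position to the next, so all positions can be computed in parallel; the Python loop in the convolution is only for readability. In this sketch, perturbing the last position leaves the convolution output at the first position unchanged, because it lies outside the 7-wide window, while the self-attention output at the first position does change.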

PUBLICATION RECORD

Publication year: 2018
Venue: International Conference on Learning Representations
Publication date: 2018-02-15
Fields of study: Computer Science
Identifiers: arXiv 1804.09541
Source metadata: Semantic Scholar

REFERENCES

Machine Comprehension using match-LSTM and Answer-Pointer (2017, cited by this paper)
Learning to Paraphrase for Question Answering (2017, cited by this paper)
Learning to Skim Text (2017, cited by this paper)
Structural Embedding of Syntactic Trees for Machine Comprehension (2017, influential reference)
Making Neural QA as Simple as Possible but not Simpler (2017, influential reference)
Depthwise Separable Convolutions for Neural Machine Translation (2017, cited by this paper)
Reading Wikipedia to Answer Open-Domain Questions (2017, influential reference)
Convolutional Sequence to Sequence Learning (2017, cited by this paper)
Globally Normalized Reader (2017, influential reference)
Adversarial Examples for Evaluating Reading Comprehension Systems (2017, cited by this paper)
Neural Question Generation from Text: A Preliminary Study (2017, cited by this paper)
Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering (2017, influential reference)
Ruminating Reader: Reasoning with Gated Multi-hop Attention (2017, influential reference)
Gated Self-Matching Networks for Reading Comprehension and Question Answering (2017, cited by this paper)
Attention is All you Need (2017, influential reference)
Paraphrasing Revisited with Neural Machine Translation (2017, cited by this paper)
DiSAN: Directional Self-Attention Network for RNN/CNN-free Language Understanding (2017, cited by this paper)
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension (2017, cited by this paper)
Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext (2017, cited by this paper)
Stochastic Answer Networks for Machine Reading Comprehension (2017, cited by this paper)
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension (2017, influential reference)
Reinforced Mnemonic Reader for Machine Comprehension (2017, influential reference)
Simple and Effective Multi-Paragraph Reading Comprehension (2017, cited by this paper)
Multi-Perspective Context Matching for Machine Comprehension (2016, cited by this paper)
End-to-End Reading Comprehension with Dynamic Answer Chunk Ranking (2016, influential reference)
Layer Normalization (2016, cited by this paper)
Bidirectional Attention Flow for Machine Comprehension (2016, influential reference)
Attention-over-Attention Neural Networks for Reading Comprehension (2016, cited by this paper)
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia (2016, cited by this paper)
Xception: Deep Learning with Depthwise Separable Convolutions (2016, cited by this paper)
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016, cited by this paper)
SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016, influential reference)
Dynamic Coattention Networks For Question Answering (2016, influential reference)
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (2016, cited by this paper)
ReasoNet: Learning to Stop Reading in Machine Comprehension (2016, cited by this paper)
Learning Recurrent Span Representations for Extractive Question Answering (2016, influential reference)
Deep Networks with Stochastic Depth (2016, influential reference)
Character-level Convolutional Networks for Text Classification (2015, cited by this paper)
Effective Approaches to Attention-based Neural Machine Translation (2015, cited by this paper)
Teaching Machines to Read and Comprehend (2015, cited by this paper)
Improving Neural Machine Translation Models with Monolingual Data (2015, cited by this paper)
Highway Networks (2015, cited by this paper)
The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations (2015, cited by this paper)
GloVe: Global Vectors for Word Representation (2014, influential reference)
Neural Machine Translation by Jointly Learning to Align and Translate (2014, cited by this paper)
Adam: A Method for Stochastic Optimization (2014, cited by this paper)
Convolutional Neural Networks for Sentence Classification (2014, cited by this paper)
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (2014, cited by this paper)
Long Short-Term Memory (1997, cited by this paper)

CITED BY

TAG: Triple Alignment With Rationale Generation for Knowledge-Based Visual Question Answering (2026, cites this paper)
UQuAD+: Benchmark Dataset for Urdu Machine Reading Comprehension (2025, cites this paper)
Diversified Augmentation with Domain Adaptation for Debiased Video Temporal Grounding (2025, cites this paper)
Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval (2025, cites this paper)
Dual-guided multi-modal bias removal strategy for temporal sentence grounding in video (2025, cites this paper)
Secret Point Recognition Algorithm via Test-Time Augmentation Based on Large Language Models (2025, cites this paper)
Optimizing a model for library intelligent question-answering system through constructivist theory lens (2025, cites this paper)
Batch Aggregation: An Approach to Enhance Text Classification with Correlated Augmented Data (2025, cites this paper)
Abstractive Summarization for Urdu Video Description Generation (2025, cites this paper)
EveMRC: Two-Stage Bidirectional Evidence Modeling for Multi-Choice Machine Reading Comprehension (2025, cites this paper)
Enhancing Online Grooming Detection via Backtranslation Augmentation (2025, cites this paper)
Resolving passage ambiguity in machine reading comprehension using lightweight transformer architectures (2025, cites this paper)
Learnable Counterfactual Attention for Music Classification (2025, cites this paper)
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction (2025, cites this paper)
Multiscale transformers and multi-attention mechanism networks for pathological nuclei segmentation (2025, cites this paper)
Binary mask tuning on gradient: Towards multi-data question answering (2025, cites this paper)
CP-FN: A Collaborative Perception Fusion Network for Infrared and Visible Image Fusion (2025, cites this paper)
Overcoming Data Shortage in Critical Domains With Data Augmentation for Natural Language Software Requirements (2025, cites this paper)
FREE-Net: A dual-modality emotion recognition network for fusing raw and enhanced data (2025, cites this paper)
Evaluating rule-based and generative data augmentation techniques for legal document classification (2025, influential citation)
Fake News Detection After LLM Laundering: Measurement and Explanation (2025, cites this paper)
Few-shot machine reading comprehension for bridge inspection via domain-specific and task-aware pre-tuning approach (2025, cites this paper)
AKER: Arabic Knowledge-enriched Reader for Machine Reading Comprehension (2025, cites this paper)
An augmented multi-label neural network-based approach for text classification in small and unbalanced datasets: the case of digital innovation in the EIP-AGRI Operational Groups (2025, cites this paper)
An Improved Convolutional Networks Model for Carbon Emission Prediction in Power Systems (2024, cites this paper)
A study on extraction QA-model based on disturbing word embedding and adversarial self-attention mechanism (2024, cites this paper)
Multimodal Information Enhancing for Reasoning Question and Answering (2024, cites this paper)
XMQAs: Constructing Complex-Modified Question-Answering Dataset for Robust Question Understanding (2024, cites this paper)
SGC: Similarity-Guided Gradient Compression for Distributed Deep Learning (2024, cites this paper)
No Query Left Behind: Query Refinement via Backtranslation (2024, cites this paper)
Graph-based Dense Event Grounding with relative positional encoding (2024, cites this paper)
An Empirical Study on Sentiment Intensity Analysis via Reading Comprehension Models (2024, cites this paper)
CareCorpus+: Expanding and Augmenting Caregiver Strategy Data to Support Pediatric Rehabilitation (2024, cites this paper)
Spatio-temporal progressive optimization network for video bit depth enhancement (2024, cites this paper)
Automated recognition of innovative sentences in academic articles: semi-automatic annotation for cost reduction and SAO reconstruction for enhanced data (2024, cites this paper)
Improving Quality and Domain-Relevancy of Paraphrase Generation with Graph-Based Retrieval Augmented Generation (2024, cites this paper)
CALM: Context Augmentation with Large Language Model for Named Entity Recognition (2024, cites this paper)
Multi-Paragraph Machine Reading Comprehension with Hybrid Reader over Tables and Text (2024, cites this paper)
Efficient Learning-based Top-k Representative Similar Subtrajectory Query (2024, cites this paper)
KRA: K-Nearest Neighbor Retrieval Augmented Model for Text Classification (2024, cites this paper)
Automatic Question Answering From Large ESG Reports (2024, cites this paper)
Exploring Language Model Generalization in Low-Resource Extractive QA (2024, influential citation)
Optimized Biomedical Question-Answering Services with LLM and Multi-BERT Integration (2024, cites this paper)
A Persuasion-Based Prompt Learning Approach to Improve Smishing Detection through Data Augmentation (2024, cites this paper)
Automatical sampling with heterogeneous corpora for grammatical error correction (2024, cites this paper)
Large Language Model Data Augmentation for Text-Pair Classification Tasks (2024, cites this paper)
An Aspect Sentiment Triplet Extraction Method based on Syntax-Guided Muti-Turn Machine Reading Comprehension (2024, cites this paper)
ConcVAE: Conceptual Representation Learning (2024, cites this paper)
Data Augmentation Techniques for Process Extraction from Scientific Publications (2024, cites this paper)
Numerical reasoning reading comprehension on Vietnamese COVID-19 news: task, corpus, and challenges (2024, cites this paper)
Attention Mechanisms in Deep Learning : Towards Explainable Artificial Intelligence (2024, cites this paper)
TinyML-Enabled Intelligent Question-Answer Services in IoT Edge Consumer Devices (2024, cites this paper)
Retrieval-Augmented Generation Approach: Document Question Answering using Large Language Model (2024, influential citation)
Task-Oriented Paraphrase Analytics (2024, cites this paper)
Bridging Actions: Generate 3D Poses and Shapes In-Between Photos (2024, cites this paper)
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement (2024, cites this paper)
Hierarchical and Multiple-Perspective Interaction Network for Long Text Matching (2024, cites this paper)
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval (2024, cites this paper)
On Evaluation Protocols for Data Augmentation in a Limited Data Scenario (2024, cites this paper)
Data augmentation and adversary attack on limit resources text classification (2024, cites this paper)
Reason Generation for Point of Interest Recommendation Via a Hierarchical Attention-Based Transformer Model (2024, cites this paper)
Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method (2024, cites this paper)
A Comprehensive Survey of Text Encoders for Text-to-Image Diffusion Models (2024, cites this paper)
Finding a Needle in the Adversarial Haystack: A Targeted Paraphrasing Approach For Uncovering Edge Cases with Minimal Distribution Distortion (2024, cites this paper)
Improving Black-box Robustness with In-Context Rewriting (2024, cites this paper)
Event-aware Video Corpus Moment Retrieval (2024, cites this paper)
Combining permuted language model and adversarial training for Chinese machine reading comprehension (2024, cites this paper)
The Effectiveness of Merdeka Mengajar Platform towards the Learning of English Reading Comprehension as the Implementation of Independent Curriculum at UPTD SMPN 19 Barru (2024, cites this paper)
A novel ensemble deep network framework for scene text recognition (2024, cites this paper)
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP (2024, influential citation)
A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News (2024, cites this paper)
Advancing Chatbot Conversations: A Review of Knowledge Update Approaches (2024, cites this paper)
Data Augmentation for Conversational AI (2024, cites this paper)
Self-training improves few-shot learning in legal artificial intelligence tasks (2024, cites this paper)
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions (2024, influential citation)
Prediction of Cement Slurry Formulation Based on RMAPCN (2024, influential citation)
Multi-Granularity Relational Attention Network for Audio-Visual Question Answering (2024, cites this paper)
Sparse Mobile Crowdsensing for Gas Monitoring in Coal Mine Working Face (2024, cites this paper)
Temporal Sentence Grounding with Relevance Feedback in Videos (2024, cites this paper)
A Study of DistilBERT-Based Answer Extraction Machine Reading Comprehension Algorithm (2024, cites this paper)
Data Augmentation for Speech-Based Diacritic Restoration (2024, cites this paper)
HCSAM-Net: multistage network with a hybrid of convolution and self-attention mechanism for low-light image enhancement (2024, cites this paper)
CollabAS2: Enhancing Arabic Answer Sentence Selection Using Transformer-Based Collaborative Models (2024, cites this paper)
A Geometric Approach to Textual Augmented Data Filtering (2024, cites this paper)
GEM-RAG: Graphical Eigen Memories for Retrieval Augmented Generation (2024, cites this paper)
Leveraging Domain Adaptation and Data Augmentation to Improve Qur’anic IR in English and Arabic (2023, cites this paper)
Heterogeneous Encoders Scaling in the Transformer for Neural Machine Translation (2023, cites this paper)
DeepInsight: a CNN-based approach for machine reading comprehension in query answering systems and its applications (2023, cites this paper)
APTM: Structurally Informative Network Representation Learning (2023, cites this paper)
Multi-head attention based candidate segment selection in QA over hybrid data (2023, cites this paper)
CASSI: Contextual and Semantic Structure-based Interpolation Augmentation for Low-Resource NER (2023, cites this paper)
Novel data augmentation for named entity recognition (2023, cites this paper)
Data-Augmented and Retrieval-Augmented Context Enrichment in Chinese Media Bias Detection (2023, cites this paper)
Perturbation-based Active Learning for Question Answering (2023, cites this paper)
Using GPT-4 to Augment Unbalanced Data for Automatic Scoring (2023, cites this paper)
CrowNER at ROCLING 2023 MultiNER-Health Task: Enhancing NER Task with GPT Paraphrase Augmentation on Sparsely Labeled Data (2023, cites this paper)
RoBERTa-CoA: RoBERTa-Based Effective Finetuning Method Using Co-Attention (2023, cites this paper)
RankAug: Augmented data ranking for text classification (2023, cites this paper)
Is ChatGPT the ultimate Data Augmentation Algorithm? (2023, cites this paper)
Intégration du raisonnement numérique dans les modèles de langue : État de l’art et direction de recherche (2023, cites this paper)