A Decomposable Attention Model for Natural Language Inference

Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit

Published 2016 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters than previous work and without relying on any word-order information. Adding intra-sentence attention that takes a minimum amount of order into account yields further improvements.
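The abstract does not spell out the computation, so the following is only a minimal sketch of the attend, compare, and aggregate structure the paper is known for, not the authors' implementation. It assumes single-layer stand-ins with random, untrained weights in place of the paper's feed-forward networks. The names `make_layer`, `F`, `G`, `H`, and the toy dimensions are hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a sequence of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def make_layer(n_in, n_out, rng, relu=True):
    """Hypothetical stand-in for a trained feed-forward layer (random weights)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

    def layer(x):
        out = [dot(row, x) for row in w]
        return [max(0.0, o) for o in out] if relu else out

    return layer


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def decomposable_attention(a, b, F, G, H):
    """a, b: lists of token embedding vectors for the two sentences."""
    # Attend: soft-align every token of one sentence with the other sentence.
    fa = [F(x) for x in a]
    fb = [F(y) for y in b]
    e = [[dot(u, v) for v in fb] for u in fa]              # e[i][j]
    beta = [weighted_sum(softmax(row), b) for row in e]     # part of b aligned to a[i]
    alpha = [weighted_sum(softmax(col), a) for col in zip(*e)]  # part of a aligned to b[j]
    # Compare: each token is compared with its aligned counterpart independently,
    # so these per-token subproblems could run in parallel.
    v1 = [G(x + bx) for x, bx in zip(a, beta)]              # "+" concatenates lists
    v2 = [G(y + ay) for y, ay in zip(b, alpha)]
    # Aggregate: sum the comparison vectors (order-insensitive) and classify.
    s1 = [sum(col) for col in zip(*v1)]
    s2 = [sum(col) for col in zip(*v2)]
    return softmax(H(s1 + s2))


rng = random.Random(0)
d, h = 4, 5                                 # toy embedding and hidden sizes
F = make_layer(d, h, rng)
G = make_layer(2 * d, h, rng)
H = make_layer(2 * h, 3, rng, relu=False)   # three output classes
premise = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(3)]
hypothesis = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(2)]
probs = decomposable_attention(premise, hypothesis, F, G, H)
```

Because the aggregation step is a plain sum, reordering the tokens of either sentence leaves `probs` unchanged up to floating-point rounding. This is one way to read the abstract's claim that the base model uses no word-order information.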

PUBLICATION RECORD

Publication year: 2016
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2016-06-06
Fields of study: Linguistics, Computer Science
Identifiers: DOI 10.18653/v1/D16-1244; arXiv 1606.01933
Source metadata: Semantic Scholar

REFERENCES

A Fast Unified Model for Parsing and Sentence Understanding (2016, influential reference)
Long Short-Term Memory-Networks for Machine Reading (2016, influential reference)
Reasoning about Entailment with Neural Attention (2015, influential reference)
Order-Embeddings of Images and Language (2015, cited by this paper)
Natural Language Inference by Tree-Based Convolution and Heuristic Matching (2015, cited by this paper)
A large annotated corpus for learning natural language inference (2015, influential reference)
Learning Natural Language Inference with LSTM (2015, cited by this paper)
ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs (2015, influential reference)
Neural Machine Translation by Jointly Learning to Align and Translate (2014, influential reference)
Statistical Machine Translation (2014, cited by this paper)
Convolutional Neural Network Architectures for Matching Natural Language Sentences (2014, cited by this paper)
Dropout: a simple way to prevent neural networks from overfitting (2014, influential reference)
GloVe: Global Vectors for Word Representation (2014, cited by this paper)
Paraphrase-Driven Learning for Open Question Answering (2013, cited by this paper)
Semantic Parsing as Machine Translation (2013, cited by this paper)
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (2011, cited by this paper)
Deep Sparse Rectifier Neural Networks (2011, cited by this paper)
Discriminative Learning over Constrained Latent Representations (2010, cited by this paper)
An extended model of natural logic (2009, cited by this paper)
Paraphrase Identification as Probabilistic Quasi-Synchronous Recognition (2009, cited by this paper)
A brief history of natural logic (2008, cited by this paper)
A Phrase-Based Alignment Model for Natural Language Inference (2008, cited by this paper)
A Discourse Commitment-Based Framework for Recognizing Textual Entailment (2007, cited by this paper)
Learning to recognize features of valid textual entailments (2006, cited by this paper)
Recognising Textual Entailment with Logical Inference (2005, cited by this paper)
Robust Textual Inference via Graph Matching (2005, cited by this paper)
Classification of Semantic Relations by Humans and Machines (2005, cited by this paper)
Long Short-Term Memory (1997, cited by this paper)
Handwritten Digit Recognition with a Back-Propagation Network (1989, cited by this paper)

CITED BY

Incremental Expansion Analysis and State-of-Health Estimation for Lithium-Ion Batteries (2026, cites this paper)
EGAM: Extended Graph Attention Model for Solving Routing Problems (2026, cites this paper)
Entropic nonlocal TV with learnable self-similarity in multi-scale deep feature space for image denoising (2026, cites this paper)
CMedMi: Text Similarity Detection of Chinese Medical Question Based on Mutual Information (2025, cites this paper)
General, personalized, and artistic image aesthetic assessment: a survey (2025, cites this paper)
Why Softmax Attention Outperforms Linear Attention (2025, cites this paper)
How to Talk to Language Models: Serialization Strategies for Structured Entity Matching (2025, cites this paper)
The Origin of Self-Attention: Pairwise Affinity Matrices in Feature Selection and the Emergence of Self-Attention (2025, cites this paper)
Adversarial Attacks Against Automated Fact-Checking: A Survey (2025, cites this paper)
Stain Normalization of Histopathological Images Based on Deep Learning: A Review (2025, cites this paper)
Misspellings in Natural Language Processing: A survey (2025, cites this paper)
Biophysically interpretable basal cell carcinoma classification using Raman spectroscopy transformer model (2025, cites this paper)
Transformer Based Deep Learning Model For Object Detection Using Meta Learning (2025, cites this paper)
Constrained Sequential Inference in Machine Learning Using Constraint Programming (2025, cites this paper)
Deep Reinforcement Learning for CT-Based Non-Invasive Prediction of SOX9 Expression in Hepatocellular Carcinoma (2025, cites this paper)
Contradiction Detection in RAG Systems: Evaluating LLMs as Context Validators for Improved Information Consistency (2025, cites this paper)
BoostViT: Booth-Serial Skipping and Tunable Scaling for Vision Transformers (2025, cites this paper)
Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding (2025, cites this paper)
A Review on Vision Transformer and Explainable AI Approaches for ECG-based Heart Disease Detection (2025, cites this paper)
Explainable Semantic Text Relations: A Question-Answering Framework for Comparing Document Content (2025, cites this paper)
Advancing code completion through rotary position embedding (2025, cites this paper)
Multimodal Emotion Recognition: A Tri-modal Approach Using Speech, Text, and Visual Cues for Enhanced Interaction Analysis (2025, cites this paper)
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation (2025, cites this paper)
Transformer Meets Twicing: Harnessing Unattended Residual Information (2025, cites this paper)
Revisiting Kernel Attention with Correlated Gaussian Process Representation (2025, cites this paper)
DANet: A Dual-Branch Framework With Diffusion-Integrated Autoencoder for Infrared–Visible Image Fusion (2025, cites this paper)
Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions (2025, cites this paper)
Seismic Data Reconstruction Using Generative Adversarial Networks and Global Grouping Mechanism (2025, cites this paper)
Hybrid Attention-Based Residual Network for Image Classification (2025, cites this paper)
A Contrastive Learning Approach to Paraphrase Identification* (2025, cites this paper)
Automatic Paraphrase Generation at Phrasal, and Sentence Level for Urdu Language: Data and Methods (2025, cites this paper)
DMCM: Dwo-branch multilevel feature fusion with cross-attention mechanism for infrared and visible image fusion (2025, cites this paper)
When automated fact-checking meets argumentation: Unveiling fake news through argumentative evidence (2025, cites this paper)
An overview of transformers for video anomaly detection (2025, cites this paper)
Enhancing Alzheimer's Diagnosis with Pre-trained Embeddings and Attention Mechanisms (2025, cites this paper)
Fast weight programming and linear transformers: from machine learning to neurobiology (2025, cites this paper)
Deep Lookup Network (2025, cites this paper)
Unsupervised Insider Threat Detection Using Multi-Head Self-Attention Mechanisms (2024, cites this paper)
A Self-Attention Synthesizing Model with Privacy-Preserving(ACCT-GAN) for Medical Tabular Data (2024, cites this paper)
Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph (2024, cites this paper)
Multi-View Sentence Matching Model Based on Equal-Length Interactive Attention (2024, cites this paper)
Effects of Common Sense and Supporting Texts for the Important Words in Solving Text Entailment Tasks - A Study on the e-SNLI Dataset (2024, cites this paper)
SAITI-DCGAN: Self-Attention Based Deep Convolutional Generative Adversarial Networks for Data Augmentation of Infrared Thermal Images (2024, cites this paper)
Dense Paraphrasing for multimodal dialogue interpretation (2024, cites this paper)
A Study of the State of the Art Approaches and Datasets for Multilingual Natural Language Inference (2024, cites this paper)
ExGAT: Context extended graph attention neural network (2024, cites this paper)
Efficient Machine Translation with a BiLSTM-Attention Approach (2024, cites this paper)
Robust and resource-efficient table-based fact verification through multi-aspect adversarial contrastive learning (2024, cites this paper)
Enhancing Arabic Cyberbullying Detection with End-to-End Transformer Model (2024, cites this paper)
A Geometric Approach to Textual Augmented Data Filtering (2024, cites this paper)
QRNN-Transformer: Recognizing Textual Entailment (2024, cites this paper)
TCME: Thin Cloud Removal Network for Optical Remote Sensing Images Based on Multidimensional Features Enhancement (2024, cites this paper)
Bridge to better understanding: Syntax extension with virtual linking-phrase for natural language inference (2024, cites this paper)
A Deep Learning-Based Question-Answering System for Grid Customer Service (2024, cites this paper)
Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank (2024, cites this paper)
CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search (2024, cites this paper)
First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI (2024, cites this paper)
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning (2024, cites this paper)
KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogue Argument Mining (2024, cites this paper)
Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting (2024, cites this paper)
A Multimodal Fusion Framework for Fake News Detection via Multi-Attention Mechanism (2024, cites this paper)
Reason Generation for Point of Interest Recommendation Via a Hierarchical Attention-Based Transformer Model (2024, cites this paper)
Research on Document Image Binarization: A Survey (2024, cites this paper)
Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning (2024, cites this paper)
Are Transformers a Useful Tool for Tiny devices in Human Activity Recognition? (2024, cites this paper)
Top-philic machine learning (2024, cites this paper)
Towards energy efficiency: A comprehensive review of deep learning-based photovoltaic power forecasting strategies (2024, cites this paper)
Transformers meets neoantigen detection: a systematic literature review (2024, cites this paper)
Hybrid mutation driven testing for natural language inference (2024, cites this paper)
A comprehensive review on transformer network for natural and medical image analysis (2024, cites this paper)
Elliptical Attention (2024, cites this paper)
User Engagement Triggers in Social Media Discourse on Biodiversity Conservation (2024, cites this paper)
DRA: dynamic routing attention for neural machine translation with low-resource languages (2024, cites this paper)
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers (2024, cites this paper)
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks (2024, cites this paper)
A deep neural network model for Chinese toponym matching with geographic pre-training model (2024, cites this paper)
Unveiling the potential of progressive training diffusion model for defect image generation and recognition in industrial processes (2024, cites this paper)
A Transformer with Stack Attention (2024, cites this paper)
Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models (2024, influential citation)
Multi-Evidence Based Fact Verification via A Confidential Graph Neural Network (2024, cites this paper)
An Al-BERT-Bi-GRU-LDA algorithm for negative sentiment analysis on Bilibili comments (2024, cites this paper)
Recognizing Value Resonance with Resonance-Tuned RoBERTa Task Definition, Experimental Validation, and Robust Modeling (2024, cites this paper)
Attention to quantum complexity (2024, cites this paper)
Faithful Reasoning over Scientific Claims (2024, cites this paper)
CARL: A Framework for Equivariant Image Registration (2024, cites this paper)
A syntactic evidence network model for fact verification (2024, cites this paper)
MESCM: A Multi-stage Explainable Similar Case Matching Framework (2024, cites this paper)
Unlocking the language barrier: A Journey through Arabic machine translation (2024, cites this paper)
A Primal-Dual Framework for Transformers and Neural Networks (2024, cites this paper)
A Comprehensive Survey of Foundation Models in Medicine (2024, cites this paper)
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis (2024, cites this paper)
On Large Language Models’ Resilience to Coercive Interrogation (2024, cites this paper)
Medical image registration in the era of Transformers: a recent review (2024, cites this paper)
Transformer Driven Matching Selection Mechanism for Multi-Label Image Classification (2024, cites this paper)
A systematic survey of natural language processing for the Greek language (2024, cites this paper)
HLC: A Hardware-friendly Quantization and Cache-based Accelerator for Transformer (2024, cites this paper)
An explainable vision transformer with transfer learning based efficient drought stress identification (2024, cites this paper)
Evaluating Intelligence and Knowledge in Large Language Models (2024, cites this paper)
A Turing Test for Beatmap-Generation (2024, cites this paper)
A Survey Of Automatic Fact Verification Research (2024, cites this paper)