A Structured Self-attentive Sentence Embedding

Zhouhan Lin, Minwei Feng, C. D. Santos, Mo Yu, Bing Xiang, Bowen Zhou, Yoshua Bengio

Published 2017 in International Conference on Learning Representations

ABSTRACT

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
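
The abstract does not spell out the equations, so the following is only a minimal sketch of the mechanism as it is usually stated for this paper: given hidden states H (n tokens by d dimensions), the attention matrix is A = softmax(W_s2 tanh(W_s1 H^T)), the matrix embedding is M = A H, and the regularization term is the squared Frobenius norm of A A^T - I. The function names, the pure-Python matrix helpers, and the toy values are assumptions made for illustration; they are not taken from the authors' code.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_embedding(H, W_s1, W_s2):
    """Compute the attention matrix A (r x n) and matrix embedding M (r x d).

    H    : n x d hidden states, one row per token
    W_s1 : d_a x d projection (hypothetical toy weights)
    W_s2 : r x d_a projection, one row per attention hop
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]  # d_a x n
    scores = matmul(W_s2, hidden)                                                 # r x n
    A = [softmax(row) for row in scores]   # each row is a distribution over tokens
    M = matmul(A, H)                       # each row summarizes one part of the sentence
    return A, M


def redundancy_penalty(A):
    """Squared Frobenius norm of A A^T - I; larger when rows of A overlap."""
    gram = matmul(A, transpose(A))
    size = len(gram)
    return sum(
        (gram[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(size)
        for j in range(size)
    )


# Toy example: 4 tokens, 3 hidden dimensions, d_a = 2, r = 2 attention rows.
H = [[0.1, 0.3, -0.2], [0.5, -0.1, 0.4], [-0.3, 0.2, 0.1], [0.0, 0.6, -0.5]]
W_s1 = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.6]]
W_s2 = [[1.0, -1.0], [-0.5, 0.8]]

A, M = self_attentive_embedding(H, W_s1, W_s2)
penalty = redundancy_penalty(A)
```

Each row of A can be read as a distribution over the tokens, which is what makes the embedding easy to visualize. The penalty is zero when the rows are one-hot on different tokens and grows when several rows attend to the same tokens.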

PUBLICATION RECORD

Publication year
2017
Venue
International Conference on Learning Representations
Publication date
2017-03-09
Fields of study
Computer Science
Identifiers
arXiv 1703.03130
Source metadata
Semantic Scholar

CITED BY

Effective document summarization: a hybrid clustering approach using transformer model
2026 (cites this paper)
Root-associated protein prediction using a protein large language model and hypergraph convolutional networks
2026 (cites this paper)
Selecting Language Models for Social Science: Start Small, Start Open, and Validate
2026 (cites this paper)
MICE: Minimal Interaction Cross-Encoders for efficient Re-ranking
2026 (cites this paper)
Multi-stage representation learning for blind Room-Acoustic parameter estimation with uncertainty quantification
2026 (cites this paper)
A multimodal spatiotemporal convolutional network with attention mechanism for athlete anxiety behavior recognition
2026 (cites this paper)
Macro-Equi-Diff (MED): Scaffold-based Macrocycles Generation Using Equivariant Diffusion
2026 (cites this paper)
LOCUS: Low-Dimensional Model Embeddings for Efficient Model Exploration, Comparison, and Selection
2026 (cites this paper)
AC2Next: A Novel Model That Can Predict the Next Animation API by Fusing the Animation API Context and the UI Animation Task
2026 (cites this paper)
Mamba-integrated spatio-temporal attention graph convolutional network for session-based recommendation
2026 (cites this paper)
Transformer-based intelligent detection model for early dental caries in panoramic radiographs
2026 (cites this paper)
Cross-Language Speaker Attribute Prediction Using MIL and RL
2026 (cites this paper)
Implementation of Near-Real-Time Satellite Data Retrieval for CO₂ Concentration Using an Enhanced Transformer Network
2026 (cites this paper)
GCT: A Granger-Causal Transformer for Multivariate Traffic Analysis in Smart Villages
2026 (cites this paper)
Symphonym: Universal Phonetic Embeddings for Cross-Script Toponym Matching via Teacher-Student Distillation
2026 (cites this paper)
HCT-Net: hybrid CNN-transformer network with multi-scale feature aggregation and progressive decode for medical image segmentation
2026 (cites this paper)
Prompt-Aware Adapter: Learning Adaptive Visual Tokens for Multimodal Large Language Models
2026 (cites this paper)
Research on plug-and-play correlation enhancement modules in deep multi-label learning
2026 (cites this paper)
Arabic speech command recognition using an enhanced CNN-LSTM model with attention and data augmentation
2026 (cites this paper)
HIM2A: Hierarchical interactive multi-modal entity alignment with semantic augmentation
2026 (cites this paper)
Investigators at CheckThat! 2025: Using LLMs to Improve Fact-Checking
2025 (cites this paper)
Visual attention-guided fractional-order system for collision perception in complex dynamic scenes
2025 (cites this paper)
Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets Training
2025 (cites this paper)
Tactics and Techniques Text Classification Based on Adversarial Contrastive Learning and Meta-Path
2025 (cites this paper)
A Lightweight Framework for Trigger-Guided LoRA-Based Self-Adaptation in LLMs
2025 (influential citation)
A Zenith Tropospheric Delay Prediction Model Based on VMD-LSTM Neural Network
2025 (cites this paper)
Optimal Multi-agent Reinforcement Learning for Efficient Partially Observable Multi-robot Collaboration in Warehousing
2025 (influential citation)
Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity
2025 (cites this paper)
DPCformer: An Interpretable Deep Learning Model for Genomic Prediction in Crops
2025 (cites this paper)
Crossrecsmart: a cross-network anchor-based representation learning for the recommendation of smart services
2025 (cites this paper)
Leveraging Entropy-Driven Attention to Adapt Semantic Segmentation of Aerial Images for Autonomous Driving
2025 (cites this paper)
Enhanced Detection of Multiple Sclerosis Using Recurrent Slice-wise Attention Network (RSANet) on Brain MRI Scans
2025 (cites this paper)
HAMGFusion: A Joint Entity and Relation Extraction Model Based on Hierarchical Attention and Feature Fusion
2025 (cites this paper)
PLMCCL-TP: The protein language model and clustering method based on contrastive learning applied to the multifunctional therapeutic peptide identification model
2025 (cites this paper)
A Large Scale Study of AI-based Binary Function Similarity Detection Techniques for Security Researchers and Practitioners
2025 (cites this paper)
Focus Your Attention: Towards Data-Intuitive Lightweight Vision Transformers
2025 (cites this paper)
SENA: Leveraging set-level consistency adversarial learning for robust pre-trained language model adaptation
2025 (cites this paper)
Multibehavior Intent Disentangled Learning for Fine-Grained Interest Discovery in Recommendation
2025 (cites this paper)
ChatPD: An LLM-driven Paper-Dataset Networking System
2025 (cites this paper)
Query as Supervision: Toward Low-Cost and Robust Video Moment and Highlight Retrieval
2025 (cites this paper)
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
2025 (cites this paper)
Towards Efficient Partially Relevant Video Retrieval With Active Moment Discovering
2025 (cites this paper)
SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking
2025 (cites this paper)
Ensemble-Based Survival Models with the Self-Attended Beran Estimator Predictions
2025 (cites this paper)
Ubiquitous memory augmentation via mobile multimodal embedding system
2025 (cites this paper)
Task-driven attention attack: An enhanced adversarial framework for long text
2025 (cites this paper)
Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation
2025 (cites this paper)
Fast weight programming and linear transformers: from machine learning to neurobiology
2025 (cites this paper)
Enhancing Partially Relevant Video Retrieval with Robust Alignment Learning
2025 (cites this paper)
L3Cube-IndicHeadline-ID: A Dataset for Headline Identification and Semantic Evaluation in Low-Resource Indian Languages
2025 (cites this paper)
Decentralized Next Point-of-Interest Recommendation Guided by Willingness to Share
2025 (cites this paper)
DistillRecDial: A Knowledge-Distilled Dataset Capturing User Diversity in Conversational Recommendation
2025 (cites this paper)
Causal Multi-fidelity Surrogate Forward and Inverse Models for ICF Implosions
2025 (cites this paper)
Self-Attention Enhanced Dual BiGRU for Arabic Fake News Detection
2025 (cites this paper)
Attention-Guided Multimodal Fusion using Hybrid CNN-LSTM Networks for Enhanced Medical Diagnosis
2025 (cites this paper)
Comparative Study on Linguistic Representation Models for Enhancing Multimedia Security
2025 (cites this paper)
Attention Mechanisms on Hybrid Models: Examining the Impact of Sentence Length in Sentiment Analysis
2025 (cites this paper)
Understanding Cognitive States from Head & Hand Motion Data
2025 (cites this paper)
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
2025 (cites this paper)
A Multifaceted Multi-Agent Framework for Zero-Shot Emotion Analysis and Recognition of Symbolic Music
2025 (cites this paper)
BiMA-DTI: a bidirectional Mamba-Attention hybrid framework for enhanced drug-target interaction prediction
2025 (cites this paper)
MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding
2025 (cites this paper)
Hierarchical Attention Network for Interpretable ECG-based Heart Disease Classification
2025 (cites this paper)
Legal Text Analytics for Reasonable Notice Period Prediction
2025 (influential citation)
Cognitive Large Language Model in Social Media with Local Memory
2025 (cites this paper)
Denoising Multi-Interest-Aware Logical Reasoning for Long-Sequence Recommendation
2025 (cites this paper)
MaskSDM with Shapley values to improve flexibility, robustness and explainability in species distribution modelling
2025 (cites this paper)
Domain Adaptation for Japanese Sentence Embeddings with Contrastive Learning based on Synthetic Sentence Generation
2025 (cites this paper)
Dual-Prompt-Enhanced Multiorgan Segmentation Model for Total-Body PET Images
2025 (cites this paper)
Global Ionospheric F-Layer Electron Density Prediction Based on Multiple Radio Occultation Data Using Attention-Based Deep Learning Model
2025 (cites this paper)
Transformer Meets Twicing: Harnessing Unattended Residual Information
2025 (cites this paper)
EDCM-EA: event prediction based on event development context mining considering event arguments
2025 (cites this paper)
MK-SMOTE and M-SMOTE: enhanced techniques for handling class imbalance problem
2025 (cites this paper)
Character-Level Encoding based Neural Machine Translation for Hindi language
2025 (cites this paper)
SECNN: Squeeze-and-Excitation Convolutional Neural Network for Sentence Classification
2025 (cites this paper)
QSA-QConvLSTM: A Quantum Computing-Based Approach for Spatiotemporal Sequence Prediction
2025 (cites this paper)
Improving Grasp Pose Detection by Implicitly Utilizing Geometric Information and Spatial Relations of Objects in Clutter
2025 (cites this paper)
Convolutional Rectangular Attention Module
2025 (cites this paper)
Dual-view cross attention enhanced semi-supervised learning method for discourse cognitive engagement classification in online course discussions
2025 (cites this paper)
Path Complex Neural Networks for Sequential Process Activities Classification
2025 (cites this paper)
A Comparative Study of Attention Mechanisms in Deep Learning Models for Aspect-based Sentiment Analysis of Customer Reviews
2025 (cites this paper)
TAPE_selection: Organelle Proteins Classification With TAPE Feature Selection
2025 (cites this paper)
MTSegNet: Manifold Transformer for 3D shape segmentation
2025 (cites this paper)
Causality-driven attribute editing with bidirectional generative network
2025 (cites this paper)
Graph decision transformer for offline reinforcement learning
2025 (cites this paper)
Cross-view self-supervised heterogeneous graph representation learning
2025 (cites this paper)
Developing a Transformer-based Autoencoder Model for Sentence Embedding
2025 (cites this paper)
A Knowledge Graph Framework for Interpretable Video-Based Activity Recognition
2025 (cites this paper)
Domain Lexical Knowledge-based Word Embedding Learning for Text Classification under Small Data
2025 (influential citation)
Optimized Reinforcement Learning Model via Contrastive Learning for Intention Classification of Chinese Questions on Respiratory Diseases
2025 (cites this paper)
Boundary-making practices: LLMs and an artifactual production of objectivity
2025 (cites this paper)
Neural Networks based Identification of Psychological Health Status of College Students
2025 (cites this paper)
Towards Interpretable User Intent Analysis with Deficient Evidence Fusion for Pseudo-Modalities
2025 (cites this paper)
Development and validation of a transformer model-based early warning score for real-time prediction of adverse outcomes in the emergency department
2025 (cites this paper)
Medcongtm: Interpretable multi-label clinical code prediction with dual-view graph contrastive topic modeling
2025 (cites this paper)
The Origin of Self-Attention: Pairwise Affinity Matrices in Feature Selection and the Emergence of Self-Attention
2025 (cites this paper)
Text Classification Based on Ngram2Vec Model and Gating Mechanism
2025 (cites this paper)
Learning Robust Satellite Attitude Dynamics with Physics-Informed Normalising Flow
2025 (cites this paper)
YModPred: an interpretable prediction method for multi-type RNA modification sites in S. cerevisiae based on deep learning
2025 (cites this paper)
Revisiting Kernel Attention with Correlated Gaussian Process Representation
2025 (cites this paper)