Structured Attention Networks

Yoon Kim, Carl Denton, Luong Hoang, Alexander M. Rush

Published 2017 in International Conference on Learning Representations

ABSTRACT

Attention networks have proven to be an effective approach for embedding categorical inference within a deep neural network. However, for many tasks we may want to model richer structural dependencies without abandoning end-to-end training. In this work, we experiment with incorporating richer structural distributions, encoded using graphical models, within deep networks. We show that these structured attention networks are simple extensions of the basic attention procedure, and that they allow for extending attention beyond the standard soft-selection approach, such as attending to partial segmentations or to subtrees. We experiment with two different classes of structured attention networks: a linear-chain conditional random field and a graph-based parsing model, and describe how these models can be practically implemented as neural network layers. Experiments show that this approach is effective for incorporating structural biases, and structured attention networks outperform baseline attention models on a variety of synthetic and real tasks: tree transduction, neural machine translation, question answering, and natural language inference. We further find that models trained in this way learn interesting unsupervised hidden representations that generalize simple attention.
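The abstract contrasts standard soft-selection, a softmax that picks one position, with attention defined by a linear-chain conditional random field, which can attend to a partial segmentation. Below is a minimal sketch of that idea in plain Python. It assumes a binary latent variable z_i ∈ {0, 1} per source position, a given log-potential score for z_i = 1 (z_i = 0 scores 0), and a single shared 2×2 transition-score matrix. These are illustrative assumptions, not the paper's exact parameterization. The forward-backward algorithm computes the marginals p(z_i = 1), which act as attention weights. Unlike softmax weights, they do not have to sum to one, so several adjacent positions can be selected together. The function names (`softmax_attention`, `crf_attention`, `context`) are hypothetical. The paper's second class, a graph-based parsing model, would need inside-outside-style marginals over trees and is not covered here.

```python
import math
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax_attention(scores: Sequence[float]) -> List[float]:
    """Standard soft-selection: a categorical distribution over positions."""
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


def crf_attention(scores: Sequence[float],
                  trans: Sequence[Sequence[float]]) -> List[float]:
    """Marginals p(z_i = 1) of a binary linear-chain CRF (hypothetical setup).

    scores[i]   : log-potential for z_i = 1 (z_i = 0 has log-potential 0)
    trans[a][b] : log-potential for the transition z_i = a -> z_{i+1} = b
    """
    n = len(scores)
    unary = [[0.0, s] for s in scores]
    # Forward pass: alpha[i][b] sums over z_1..z_i with z_i = b,
    # including the unary potential at position i.
    alpha = [unary[0][:]]
    for i in range(1, n):
        alpha.append([
            unary[i][b] + logsumexp([alpha[i - 1][a] + trans[a][b] for a in (0, 1)])
            for b in (0, 1)
        ])
    # Backward pass: beta[i][a] sums over z_{i+1}..z_n given z_i = a,
    # excluding the unary potential at position i.
    beta = [[0.0, 0.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [
            logsumexp([trans[a][b] + unary[i + 1][b] + beta[i + 1][b] for b in (0, 1)])
            for a in (0, 1)
        ]
    log_z = logsumexp(alpha[n - 1])  # log-partition function
    return [math.exp(alpha[i][1] + beta[i][1] - log_z) for i in range(n)]


def context(weights: Sequence[float],
            values: Sequence[Sequence[float]]) -> List[float]:
    """Attention-weighted sum of value vectors: c = sum_i w_i * x_i."""
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


if __name__ == "__main__":
    scores = [2.0, 1.5, -1.0, 1.8]
    trans = [[0.5, -0.5], [-0.5, 1.0]]  # hypothetical: favors staying in a state
    print([round(p, 3) for p in softmax_attention(scores)])  # sums to 1 (up to rounding)
    print([round(p, 3) for p in crf_attention(scores, trans)])  # each in (0, 1), no sum-to-one constraint
```

With an all-zero transition matrix the positions become independent, and each marginal reduces to a sigmoid of its score. The transition scores are what let the model prefer contiguous selected spans. Every step is differentiable, so in a deep network the same computation can be written with tensor operations and trained end to end by backpropagation.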

PUBLICATION RECORD

Publication year
2017
Venue
International Conference on Learning Representations
Publication date
2017-02-03
Fields of study
Computer Science
Identifiers
arXiv 1702.00887
Source metadata
Semantic Scholar

REFERENCES

Simulating Action Dynamics with Neural Process Networks
2018, cited by this paper
Hybrid computing using a neural network with dynamic external memory
2016, cited by this paper
A Fast Unified Model for Parsing and Sentence Understanding
2016, cited by this paper
Textual Entailment with Structured Attentions and Composition
2016, cited by this paper
Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper)
2016, cited by this paper
Modelling Sentence Pairs with Tree-structured Attentive Encoder
2016, influential reference
Online Segment to Segment Neural Transduction
2016, cited by this paper
Neural Tree Indexers for Text Understanding
2016, cited by this paper
Neural Architectures for Named Entity Recognition
2016, cited by this paper
Enhanced LSTM for Natural Language Inference
2016, cited by this paper
Proximal Deep Structured Models
2016, cited by this paper
Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016
2016, cited by this paper
ASPEC: Asian Scientific Paper Excerpt Corpus
2016, cited by this paper
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
2016, cited by this paper
Delving into Transferable Adversarial Examples and Black-box Attacks
2016, cited by this paper
A Decomposable Attention Model for Natural Language Inference
2016, influential reference
Segmental Recurrent Neural Networks for End-to-End Speech Recognition
2016, cited by this paper
The Neural Noisy Channel
2016, cited by this paper
Globally Normalized Transition-Based Neural Networks
2016, cited by this paper
Enhancing and Combining Sequential and Tree LSTM for Natural Language Inference
2016, influential reference
Tree-Structured Composition in Neural Networks without Tree-Structured Architectures
2015, cited by this paper
Neural CRF Parsing
2015, cited by this paper
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
2015, cited by this paper
Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks
2015, cited by this paper
End-To-End Memory Networks
2015, influential reference
Attention-Based Models for Speech Recognition
2015, cited by this paper
Effective Approaches to Attention-based Neural Machine Translation
2015, cited by this paper
Reasoning about Entailment with Neural Attention
2015, cited by this paper
Approximation-Aware Dependency Parsing by Belief Propagation
2015, cited by this paper
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
2015, cited by this paper
Segmental Recurrent Neural Networks
2015, cited by this paper
Pointer Networks
2015, cited by this paper
Teaching Machines to Read and Comprehend
2015, cited by this paper
Structured Prediction Energy Networks
2015, cited by this paper
Gradient Estimation Using Stochastic Computation Graphs
2015, cited by this paper
Natural Language Inference by Tree-Based Convolution and Heuristic Matching
2015, cited by this paper
Learning to Transduce with Unbounded Memory
2015, cited by this paper
Gradient-based Hyperparameter Optimization through Reversible Learning
2015, cited by this paper
Learning Natural Language Inference with LSTM
2015, cited by this paper
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
2015, cited by this paper
Neural Turing Machines
2014, cited by this paper
Memory Networks
2014, cited by this paper
Learning Deep Structured Models
2014, cited by this paper
Neural Machine Translation by Jointly Learning to Align and Translate
2014, influential reference
Adam: A Method for Stochastic Optimization
2014, cited by this paper
GloVe: Global Vectors for Word Representation
2014, cited by this paper
Deep Structured Output Learning for Unconstrained Text Recognition
2014, cited by this paper
Generic Methods for Optimization-Based Modeling
2012, cited by this paper
Minimum-Risk Training of Approximate CRF-Based NLP Systems
2012, cited by this paper
Natural Language Processing (Almost) from Scratch
2011, influential reference
Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure
2011, cited by this paper
Parameter learning with truncated message-passing
2011, cited by this paper
Pointwise Prediction for Robust, Adaptable Japanese Morphological Analysis
2011, cited by this paper
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
2011, cited by this paper
Neural conditional random fields
2010, cited by this paper
Conditional Neural Fields
2009, cited by this paper
First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests
2009, cited by this paper
Dependency Parsing by Belief Propagation
2008, cited by this paper
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
2001, cited by this paper
Gradient-based learning applied to document recognition
1998, cited by this paper
Three New Probabilistic Models for Dependency Parsing: An Exploration
1996, influential reference
Trainable grammars for speech recognition
1979, cited by this paper

CITED BY

Fast and General Automatic Differentiation for Finite-State Methods
2026, cites this paper
On Sequence-to-Sequence Models for Automated Log Parsing
2026, cites this paper
Effective document summarization: a hybrid clustering approach using transformer model
2026, cites this paper
On the Emergence of Position Bias in Transformers
2025, cites this paper
Near-real time monitoring of burned area at global scale based on deep learning
2025, cites this paper
Cognitive Large Language Model in Social Media with Local Memory
2025, cites this paper
FedABC: Attention-Based Client Selection for Federated Learning with Long-Term View
2025, cites this paper
Quantum-centric machine learning for molecular dynamics
2025, cites this paper
StaBle-MambaNet: structure-aware and blur-guided lane detection with Mamba
2025, cites this paper
Ksurf-Drone: Attention Kalman Filter for Contextual Bandit Optimization in Cloud Resource Allocation
2025, cites this paper
Gaze-Enhanced Multimodal Turn-Taking Prediction in Triadic Conversations
2025, cites this paper
Event Causality Identification Based on ConceptNet and Graph Network
2025, cites this paper
Machine learning approaches for image classification in developmental biology and clinical embryology
2025, cites this paper
A Systematic Study of Compositional Syntactic Transformer Language Models
2025, cites this paper
Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs
2025, cites this paper
Commit Messages Generation Based on Core Changes
2025, cites this paper
Quantum Graph Attention Networks: Trainable Quantum Encoders for Inductive Graph Learning
2025, cites this paper
Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
2025, cites this paper
Language Modelling Techniques for Analysing the Impact of Human Genetic Variation
2025, cites this paper
Scaling Graph-Based Dependency Parsing with Arc Vectorization and Attention-Based Refinement
2025, cites this paper
Revisiting Kernel Attention with Correlated Gaussian Process Representation
2025, cites this paper
A genotype-phenotype transformer to assess and explain polygenic risk
2025, cites this paper
Automating the Analysis of Parsing Algorithms (and other Dynamic Programs)
2025, cites this paper
Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale
2024, cites this paper
Low Complexity CSI Feedback Method Using Reformer
2024, cites this paper
Automatic detection of methane emissions in multispectral satellite imagery using a vision transformer
2024, cites this paper
Entropy– and Distance-Regularized Attention Improves Low-Resource Neural Machine Translation
2024, cites this paper
Advanced Techniques in Training and Applying Large Language Models
2024, cites this paper
Relational Prompt-Based Pre-Trained Language Models for Social Event Detection
2024, cites this paper
A Category-Scalable Framework Using Millimeter-Wave Radar for Spectrogram Generation and Gesture Recognition
2024, cites this paper
Exploring the impact of zero-cost proxies for Hybrid Vision Transformers
2024, cites this paper
Few-Shot Learning Method for Continuous Prediction of Rock Mechanical Parameters Based on Logging Data
2024, cites this paper
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
2024, cites this paper
Explaining Probabilistic Models with Distributional Values
2024, cites this paper
A Primal-Dual Framework for Transformers and Neural Networks
2024, cites this paper
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
2024, cites this paper
Multilingual Crowd-Based Requirements Engineering Using Large Language Models
2024, cites this paper
To be Continuous, or to be Discrete, Those are Bits of Questions
2024, cites this paper
Gearbox Fault Detection Using Continuous Wavelet Transform and Vision Transformer (ViT)
2024, cites this paper
Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models
2024, cites this paper
Transformer-based Stagewise Decomposition for Large-Scale Multistage Stochastic Optimization
2024, cites this paper
A Novel Approach: Enhancing Data Extraction from Student Handwritten Notes Using Multi-Task U-net and GPT-4
2024, cites this paper
Cross-timestep Fault Prediction with Imbalanced Data for Optical Modules in Internet Data Centers
2024, cites this paper
Transformers meets neoantigen detection: a systematic literature review
2024, cites this paper
Ksurf: Attention Kalman Filter and Principal Component Analysis for Prediction under Highly Variable Cloud Workloads
2024, cites this paper
On Entropic Learning from Noisy Time Series in the Small Data Regime
2024, cites this paper
MultiMax: Sparse and Multi-Modal Attention Learning
2024, cites this paper
An attention-based bidirectional LSTM-CNN architecture for the early prediction of sepsis
2024, cites this paper
Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers
2024, cites this paper
Fault Diagnosis Method for Railway Signal Equipment Based on Data Enhancement and an Improved Attention Mechanism
2024, cites this paper
Revisiting clustering for efficient unsupervised dialogue structure induction
2024, cites this paper
A Historical Survey of Advances in Transformer Architectures
2024, cites this paper
Redefining DDoS Attack Detection Using A Dual-Space Prototypical Network-Based Approach
2024, cites this paper
Surface soil moisture retrieval based on transfer learning using SAR data on a local scale
2024, cites this paper
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
2024, cites this paper
A Survey on Automatic Generation of Figurative Language: From Rule-based Systems to Large Language Models
2024, cites this paper
Transformer Based Ensemble Framework For Sequential User Behavior Prediction
2024, cites this paper
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
2024, cites this paper
Towards Releasing ViT from Pre-training
2024, cites this paper
Unsupervised Insider Threat Detection Using Multi-Head Self-Attention Mechanisms
2024, cites this paper
An innovative network intrusion detection system (NIDS): Hierarchical deep learning model based on Unsw-Nb15 dataset
2024, cites this paper
Interpretable Reinforcement Learning: Bridging the Gap between Performance and Transparency
2024, cites this paper
Deep Learning for Satellite Image Time-Series Analysis: A review
2024, cites this paper
Review on the Application of the Attention Mechanism in Sensing Information Processing for Dynamic Welding Processes
2024, cites this paper
EXplainable Artificial Intelligence (XAI)—From Theory to Methods and Applications
2024, cites this paper
An acoustic emission onset time determination method based on Transformer
2024, cites this paper
Multiple Sources are Better Than One: Incorporating External Knowledge in Low-Resource Glossing
2024, cites this paper
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
2023, cites this paper
A Representative Study on Human Detection of Artificially Generated Media Across Countries
2023, cites this paper
Compositional Generalization for Data-to-Text Generation
2023, cites this paper
Dual Contrastive Learning Framework for Incremental Text Classification
2023, cites this paper
Channel attention for quantum convolutional neural networks
2023, cites this paper
MCTNet: Multiscale Cross-Attention-Based Transformer Network for Semantic Segmentation of Large-Scale Point Cloud
2023, cites this paper
AttFL
2023, cites this paper
Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns
2023, cites this paper
ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads
2023, cites this paper
Deep Residual Weight-Sharing Attention Network With Low-Rank Attention for Visual Question Answering
2023, cites this paper
Building Segmentation from Remote Sensing Image via DWT Attention Networks
2023, cites this paper
Exploiting Code Symmetries for Learning Program Semantics
2023, cites this paper
PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer
2023, cites this paper
Data driven representation and synthesis of 3D human motion. (Modèles statistiques pour représenter et synthétiser des mouvement humains en 3D)
2023, cites this paper
Recent advances in deep learning for retrosynthesis
2023, cites this paper
Structured Prediction with Stronger Consistency Guarantees
2023, cites this paper
Neural machine translation from text to sign language
2023, cites this paper
Tü-CL at SIGMORPHON 2023: Straight-Through Gradient Estimation for Hard Attention
2023, cites this paper
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
2023, cites this paper
HITSZQ at SemEval-2023 Task 10: Category-aware Sexism Detection Model with Self-training Strategy
2023, influential citation
Reconstruct Before Summarize: An Efficient Two-Step Framework for Condensing and Summarizing Meeting Transcripts
2023, cites this paper
End-to-end clinical temporal information extraction with multi-head attention
2023, cites this paper
Structure Graph Refined Information Propagate Network for Aspect-Based Sentiment Analysis
2023, cites this paper
Attention: Marginal Probability is All You Need?
2023, cites this paper
Learning behavior feature fused deep learning network model for MOOC dropout prediction
2023, cites this paper
Structured Mean-Field Variational Inference for Higher-Order Span-Based Semantic Role
2023, cites this paper
Efficient Beam Tree Recursion
2023, cites this paper
Mapping of attention mechanisms to a generalized Potts model
2023, cites this paper
A non‐linear non‐intrusive reduced order model of fluid flow by auto‐encoder and self‐attention deep learning methods
2023, cites this paper
Self-attention in vision transformers performs perceptual grouping, not attention
2023, cites this paper
RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling
2023, cites this paper
Putting the Personalized Metabolic Avatar into Production: A Comparison between Deep-Learning and Statistical Models for Weight Prediction
2023, cites this paper
Learning Second-Order Attentive Context for Efficient Correspondence Pruning
2023, cites this paper