Self-Attention with Structural Position Representations

Xing Wang, Zhaopeng Tu, Longyue Wang, Shuming Shi

Published 2019 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

Although self-attention networks (SANs) have advanced the state of the art on various NLP tasks, one criticism of SANs is their limited ability to encode the positions of input words (Shaw et al., 2018). In this work, we propose to augment SANs with structural position representations that model the latent structure of the input sentence, complementary to the standard sequential position representations. Specifically, we use dependency trees to represent the grammatical structure of a sentence, and we propose two strategies to encode the positional relationships among words in the dependency tree. Experimental results on NIST Chinese-to-English and WMT14 English-to-German translation tasks show that the proposed approach consistently improves performance over both absolute and relative sequential position representations.
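To make the idea concrete, the sketch below derives tree-based positions from a dependency parse. The record above does not reproduce the paper's exact formulas, so this is a minimal sketch under stated assumptions: absolute structural position is taken to be a word's arc distance from the root, and relative structural position is taken to be the signed difference of those depths. The `heads` array representation and both function names are illustrative, not the authors' implementation.

```python
# Sketch of structural position encoding on a dependency tree.
# ASSUMPTIONS (not confirmed by this record):
#   - absolute structural position = distance (in arcs) from the root;
#   - relative structural position = depth difference, signed by word order.
# heads[i] is the index of word i's dependency head; the root points to itself.

from typing import List


def absolute_structural_positions(heads: List[int]) -> List[int]:
    """Depth of each word in the dependency tree (root has depth 0)."""
    def depth(i: int) -> int:
        d = 0
        while heads[i] != i:  # walk up head links until the root
            i = heads[i]
            d += 1
        return d
    return [depth(i) for i in range(len(heads))]


def relative_structural_position(abs_pos: List[int], i: int, j: int) -> int:
    """Signed tree-depth offset between words i and j (assumed encoding)."""
    sign = 1 if i <= j else -1
    return sign * abs(abs_pos[i] - abs_pos[j])


if __name__ == "__main__":
    # Toy parse: "She reads books", with "reads" as the root.
    heads = [1, 1, 1]  # She -> reads, reads -> reads (root), books -> reads
    abs_pos = absolute_structural_positions(heads)
    print(abs_pos)                                      # [1, 0, 1]
    print(relative_structural_position(abs_pos, 0, 2))  # 0: equal depth
```

In a Transformer, such positions would replace or complement the sequential indices fed to the position embeddings, which is how the structural signal stays orthogonal to the standard sequential one.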
PUBLICATION RECORD
- Publication year
2019
- Venue
Conference on Empirical Methods in Natural Language Processing
- Publication date
2019-09-01
- Fields of study
Computer Science
- Source metadata
Semantic Scholar
REFERENCES
42 references
CITED BY
75 citing papers