StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure

Mattia Opper, Victor Prokhorov, N. Siddharth

Published 2023 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

This work presents StrAE: a Structured Autoencoder framework that, through strict adherence to explicit structure and the use of a novel contrastive objective over tree-structured representations, enables effective learning of multi-level representations. Through comparison over different forms of structure, we verify that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models. We then extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm. This variant, called Self-StrAE, outperforms baselines that do not involve explicit hierarchical compositions, and is comparable to models given informative structure (e.g., constituency parses). Our experiments are conducted in a data-constrained setting (circa 10M tokens) to help tease apart the contribution of the inductive bias to effective learning. However, we find that this framework can be robust to scale: when extended to a much larger dataset (circa 100M tokens), our 430-parameter model performs comparably to a 6-layer RoBERTa many orders of magnitude larger in size. Our findings support the utility of incorporating explicit composition as an inductive bias for effective representation learning.
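
The abstract mentions a "simple localised-merge algorithm" but does not spell it out. As a rough illustration only, the sketch below shows one way a greedy, local merge over adjacent embeddings could build a binary tree. The merge criterion (cosine similarity between neighbours), the composition function (element-wise mean), and the names `cosine`, `compose`, and `greedy_merge` are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose(u, v):
    """Hypothetical composition function: element-wise mean of the two children."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def greedy_merge(tokens, embeddings):
    """Repeatedly merge the most similar adjacent pair until one root remains.

    Returns (tree, root_embedding), where tree is a nested tuple of tokens.
    Assumption: adjacency plus cosine similarity decides which pair merges next.
    """
    if not tokens or len(tokens) != len(embeddings):
        raise ValueError("need a non-empty, equally sized token and embedding list")
    nodes = list(zip(tokens, embeddings))
    while len(nodes) > 1:
        # Score every adjacent pair; only neighbours are candidates (the "local" part).
        scores = [cosine(nodes[i][1], nodes[i + 1][1]) for i in range(len(nodes) - 1)]
        i = max(range(len(scores)), key=scores.__getitem__)
        (left_tree, left_vec), (right_tree, right_vec) = nodes[i], nodes[i + 1]
        # Replace the pair with their parent node.
        nodes[i:i + 2] = [((left_tree, right_tree), compose(left_vec, right_vec))]
    return nodes[0]


tree, root = greedy_merge(["the", "cat", "sat"], [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(tree)
```

With these toy vectors, "the" and "cat" are the most similar neighbours, so they are merged first and the result is then combined with "sat". A learned composition function would replace the mean in any realistic setting.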

PUBLICATION RECORD

Publication year
2023
Venue
Conference on Empirical Methods in Natural Language Processing
Publication date
2023-05-09
Fields of study
Computer Science
Identifiers
DOI: 10.48550/arXiv.2305.05588; arXiv: 2305.05588

CITED BY

TRA: Better Length Generalisation with Threshold Relative Attention
2025 (cites this paper)
Hierarchical Indexing for Retrieval-Augmented Opinion Summarization
2024 (cites this paper)