Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg

Published 2016 in International Conference on Learning Representations

ABSTRACT

There is a lot of research interest in encoding variable length sentences into fixed length vectors, in a way that preserves the sentence meanings. Two common methods include representations based on averaging word vectors, and representations based on the hidden states of recurrent neural networks such as LSTMs. The sentence vectors are used as features for subsequent machine learning tasks or for pre-training in the context of deep learning. However, not much is known about the properties that are encoded in these sentence representations and about the language information they capture. We propose a framework that facilitates better understanding of the encoded representations. We define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input. We demonstrate the potential contribution of the approach by analyzing different sentence representation mechanisms. The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.
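The scoring idea in the abstract (train a classifier on top of a fixed sentence representation and read its accuracy as a measure of what the representation encodes) can be illustrated with a small sketch of a word-content probe. This is only an illustration under stated assumptions, not the authors' setup: the word vectors are random stand-ins for trained embeddings, the sentences are random word-ID sequences, the probe is a plain logistic regression on the elementwise product of the sentence and word vectors, and all names and sizes (VOCAB, DIM, make_examples, train_probe) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, DIM = 500, 100
# Assumption: random vectors stand in for trained word embeddings.
word_vecs = rng.standard_normal((VOCAB, DIM))


def embed(sentence):
    """Sentence representation as the average of its word vectors."""
    return word_vecs[sentence].mean(axis=0)


def make_examples(n_sentences):
    """Build word-content examples: is a given word in the sentence?"""
    feats, labels = [], []
    all_ids = np.arange(VOCAB)
    for _ in range(n_sentences):
        length = int(rng.integers(5, 11))            # 5 to 10 words
        sent = rng.choice(VOCAB, size=length, replace=False)
        s = embed(sent)
        present = rng.choice(sent)                   # a word in the sentence
        absent = rng.choice(np.setdiff1d(all_ids, sent))  # a word not in it
        for word, label in ((present, 1.0), (absent, 0.0)):
            # Assumption: elementwise product as the probe's input features.
            feats.append(s * word_vecs[word])
            labels.append(label)
    return np.array(feats), np.array(labels)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def train_probe(x, y, lr=0.5, steps=300):
    """Logistic regression fitted with full-batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def accuracy(x, y, w, b):
    return float((((x @ w + b) > 0) == (y > 0.5)).mean())


x_train, y_train = make_examples(2000)
x_test, y_test = make_examples(500)
w, b = train_probe(x_train, y_train)
test_acc = accuracy(x_test, y_test, w, b)
print(f"word-content probe accuracy: {test_acc:.3f} (chance is 0.5)")
```

Held-out accuracy well above the 0.5 chance level would suggest that the averaged representation retains information about which words it contains. The same pattern extends to the other tasks by swapping the labels, for example binned sentence length, and changing the probe's input accordingly.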

PUBLICATION RECORD

Publication year
2016
Venue
International Conference on Learning Representations
Publication date
2016-08-15
Fields of study
Computer Science
Identifiers
arXiv 1608.04207
Source metadata
Semantic Scholar

CITED BY

Faithful explanation of semantic role labelling with dependency and constituency feature importance
2026, cites this paper
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
2025, cites this paper
Domain Pre-training Impact on Representations
2025, cites this paper
How do autoregressive transformers solve full addition?
2025, cites this paper
Probing Network Decisions: Capturing Uncertainties and Unveiling Vulnerabilities Without Label Information
2025, cites this paper
MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling
2025, cites this paper
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
2025, cites this paper
An Automated Length-Aware Quality Metric for Summarization
2025, cites this paper
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms
2025, cites this paper
Reference-Free Rating of LLM Responses via Latent Information
2025, cites this paper
Probing Neural Combinatorial Optimization Models
2025, cites this paper
Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples
2025, cites this paper
Mechanistic Interpretability of Emotion Inference in Large Language Models
2025, cites this paper
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
2025, cites this paper
Spontaneous Speech Variables for Evaluating LLMs Cognitive Plausibility
2025, cites this paper
Constructions are Revealed in Word Distributions
2025, cites this paper
Disentangling Linguistic Features with Dimension-Wise Analysis of Vector Embeddings
2025, cites this paper
Linearly Decoding Refused Knowledge in Aligned Language Models
2025, cites this paper
Toward Immersive Computational Storytelling: Card-Framework for Enhanced Persona-Driven Dialogues
2025, cites this paper
Steering Prepositional Phrases in Language Models: A Case of with-headed Adjectival and Adverbial Complements in Gemma-2
2025, cites this paper
Words That Win: A Multi-Model NLP and Clustering Framework for Analyzing Founder Language and Business Success
2025, cites this paper
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
2025, cites this paper
Do Prompts Reshape Representations? An Empirical Study of Prompting Effects on Embeddings
2025, cites this paper
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
2025, cites this paper
Deciphering Nonlinear Hydrological Process by a Coupled Deep Learning and Physical Based Model in Southern Tibetan Plateau
2025, cites this paper
Type and Complexity Signals in Multilingual Question Representations
2025, cites this paper
The Blessing and Curse of Dimensionality in Safety Alignment
2025, cites this paper
Less Mature is More Adaptable for Sentence-level Language Modeling
2025, cites this paper
How Much Do Encoder Models Know About Word Senses?
2025, cites this paper
Echoes of BERT: Do Modern Language Models Rediscover the Classical NLP Pipeline?
2025, cites this paper
STARE at the Structure: Steering ICL Exemplar Selection with Structural Alignment
2025, cites this paper
Towards an Explainable Comparison and Alignment of Feature Embeddings
2025, cites this paper
Geological Inference from Textual Data using Word Embeddings
2025, cites this paper
What Has LeBenchmark Learnt about French Syntax?
2024, cites this paper
IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?
2024, cites this paper
Improving Contrastive Learning in Emotion Recognition in Conversation via Data Augmentation and Decoupled Neutral Emotion
2024, cites this paper
Exploring Syntactic Information in Sentence Embeddings through Multilingual Subject-verb Agreement
2024, cites this paper
Does the Order of Fine-tuning Matter and Why?
2024, cites this paper
Probing Omissions and Distortions in Transformer-based RDF-to-Text Models
2024, cites this paper
Language-Independent Representations Improve Zero-Shot Summarization
2024, cites this paper
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data
2024, cites this paper
Tracking linguistic information in transformer-based sentence embeddings through targeted sparsification
2024, cites this paper
Optimal synthesis embeddings
2024, cites this paper
Unveiling Semantic Information in Sentence Embeddings
2024, cites this paper
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
2024, cites this paper
Mechanistic?
2024, cites this paper
Position: Do Not Explain Vision Models Without Context
2024, cites this paper
DEAN: Deactivating the Coupled Neurons to Mitigate Fairness-Privacy Conflicts in Large Language Models
2024, cites this paper
What Should Embeddings Embed? Autoregressive Models Represent Latent Generating Distributions
2024, cites this paper
When is an Embedding Model More Promising than Another?
2024, cites this paper
Probing the Category of Verbal Aspect in Transformer Language Models
2024, cites this paper
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL
2024, cites this paper
In Tree Structure Should Sentence Be Generated
2024, cites this paper
LLMs’ morphological analyses of complex FST-generated Finnish words
2024, cites this paper
Limits of Theory of Mind Modelling in Dialogue-Based Collaborative Plan Acquisition
2024, cites this paper
Embedded Named Entity Recognition using Probing Classifiers
2024, cites this paper
Topic Aware Probing: From Sentence Length Prediction to Idiom Identification how reliant are Neural Language Models on Topic?
2024, cites this paper
By Tying Embeddings You Are Assuming the Distributional Hypothesis
2024, cites this paper
The Linguistic Feature Relation Analysis of Premise and Hypothesis for Interpreting Nature Language Inference
2024, cites this paper
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
2024, cites this paper
Investigating OCR-Sensitive Neurons to Improve Entity Recognition in Historical Documents
2024, cites this paper
Code-Mixed Probes Show How Pre-Trained Models Generalise on Code-Switched Text
2024, cites this paper
CausalGym: Benchmarking causal interpretability methods on linguistic tasks
2024, cites this paper
Extending Token Computation for LLM Reasoning
2024, cites this paper
Détection de la nasalité en parole à partir de wav2vec 2.0 [Detecting Nasality in Speech Using Neural Models]
2024, cites this paper
Holmes: A Benchmark to Assess the Linguistic Competence of Language Models
2024, cites this paper
Counterfactual Generation from Language Models
2024, cites this paper
Sparse Sounds: Exploring Low-Dimensionality in Music Generation Model
2024, cites this paper
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
2024, cites this paper
Gumbel Counterfactual Generation From Language Models
2024, cites this paper
StructFormer: Document Structure-based Masked Attention and its Impact on Language Model Pre-Training
2024, cites this paper
The Tug of War Within: Mitigating the Fairness-Privacy Conflicts in Large Language Models
2024, cites this paper
Which to select?: Analysis of speaker representation with graph attention networks.
2024, cites this paper
Empirical Evaluation of Concept Probing for Game-Playing Agents
2024, cites this paper
Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5
2024, cites this paper
What Do Transformers Know about Government?
2024, cites this paper
Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models
2024, cites this paper
On Memorization of Large Language Models in Logical Reasoning
2024, cites this paper
Estimating Knowledge in Large Language Models Without Generating a Single Token
2024, cites this paper
Utilization of pre-trained language models for adapter-based knowledge transfer in software engineering
2023, cites this paper
Analyzing Gender Bias in Multilingual Machine Translation
2023, cites this paper
Towards Extracting and Understanding the Implicit Rubrics of Transformer Based Automatic Essay Scoring Models
2023, cites this paper
Enhancing use of BERT information in neural machine translation with masking-BERT attention
2023, cites this paper
Probing Numeracy and Logic of Language Models of Code
2023, cites this paper
Probing Frozen NL Models for Alignment with Human Reasoning
2023, cites this paper
A WordNet View on Crosslingual Transformers
2023, cites this paper
Semantic Accuracy in Natural Language Generation: A Thesis Proposal
2023, cites this paper
Exploring the Relationship between Analogy Identification and Sentence Structure Encoding in Large Language Models
2023, cites this paper
Operationalising Representation in Natural Language Processing
2023, cites this paper
ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing
2023, cites this paper
Social world knowledge: Modeling and applications
2023, cites this paper
A Causality Inspired Framework for Model Interpretation
2023, cites this paper
Symbols and grounding in large language models
2023, cites this paper
Towards a Robust Deep Neural Network Against Adversarial Texts: A Survey
2023, cites this paper
Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer in Prompt Tuning
2023, cites this paper
Morphosyntactic probing of multilingual BERT models
2023, influential citation
Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
2023, cites this paper