All-but-the-Top: Simple and Effective Postprocessing for Word Representations

Published 2017 in International Conference on Learning Representations

ABSTRACT

Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a very simple, and yet counter-intuitive, postprocessing technique (eliminate the common mean vector and a few top dominating directions from the word vectors) that renders off-the-shelf representations even stronger. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets and with a variety of representation methods and hyperparameter choices in multiple languages; in each case, the processed representations are consistently better than the original ones.
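
The postprocessing described in the abstract (subtract the common mean vector, then remove a few top dominating directions) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name `all_but_the_top`, the parameter `num_directions`, and the random toy data are hypothetical, and the abstract does not say how many directions to remove, so the default of 2 is only a placeholder. The dominating directions are taken here to be the top principal components of the centered vectors.

```python
import numpy as np

def all_but_the_top(vectors, num_directions=2):
    """Remove the common mean and the top principal directions.

    vectors: array of shape (vocab_size, dim), one word vector per row.
    num_directions: how many dominating directions to remove
        (hypothetical default; the abstract only says "a few").
    """
    vectors = np.asarray(vectors, dtype=float)

    # Step 1: subtract the mean vector shared by all words.
    centered = vectors - vectors.mean(axis=0, keepdims=True)

    # Step 2: find the dominating directions as the top right-singular
    # vectors (principal components) of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:num_directions]          # shape (num_directions, dim)

    # Step 3: subtract each vector's projection onto those directions.
    return centered - centered @ top.T @ top

# Toy data standing in for off-the-shelf embeddings (assumption:
# 50 "words" in 8 dimensions with a shared offset of 3.0).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 8)) + 3.0
processed = all_but_the_top(embeddings, num_directions=2)
```

The output keeps the original shape, so the processed vectors can stand in for the originals in a downstream similarity or classification pipeline. The number of removed directions is a tunable choice that would need to be validated per embedding set.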

PUBLICATION RECORD

Publication year
2017
Venue
International Conference on Learning Representations
Publication date
2017-02-05
Fields of study
Mathematics, Computer Science
Identifiers
arXiv 1702.01417
Source metadata
Semantic Scholar

REFERENCES

IEEE Transactions on Neural Networks and Learning Systems
2019influential reference
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
2017cited by this paper
Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
2016cited by this paper
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
2016cited by this paper
Evaluation methods for unsupervised word embeddings
2015cited by this paper
When and why are log-linear models self-normalizing?
2015cited by this paper
LSTM: A Search Space Odyssey
2015influential reference
A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets
2015cited by this paper
Gated Feedback Recurrent Neural Networks
2015influential reference
Model-based Word Embeddings from Decompositions of Count Matrices
2015cited by this paper
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
2015cited by this paper
A Latent Variable Model Approach to PMI-based Word Embeddings
2015influential reference
From Paraphrase Database to Compositional Paraphrase Model and Back
2015cited by this paper
SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation
2014cited by this paper
Sequence to Sequence Learning with Neural Networks
2014cited by this paper
Convolutional Neural Networks for Sentence Classification
2014influential reference
Neural Machine Translation by Jointly Learning to Align and Translate
2014cited by this paper
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
2014cited by this paper
Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
2014cited by this paper
A SICK cure for the evaluation of compositional distributional semantic models
2014cited by this paper
Neural Word Embedding as Implicit Matrix Factorization
2014cited by this paper
GloVe: Global Vectors for Word Representation
2014influential reference
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
2013influential reference
*SEM 2013 shared task: Semantic Textual Similarity
2013cited by this paper
Better Word Representations with Recursive Neural Networks for Morphology
2013cited by this paper
Polyglot: Distributed Word Representations for Multilingual NLP
2013cited by this paper
Multimodal Distributional Semantics
2013cited by this paper
Efficient Estimation of Word Representations in Vector Space
2013influential reference
Reasoning With Neural Tensor Networks for Knowledge Base Completion
2013influential reference
Translating Embeddings for Modeling Multi-relational Data
2013cited by this paper
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
2012cited by this paper
Two Step CCA: A new spectral method for estimating vector models of words
2012influential reference
Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD
2012cited by this paper
Improving Word Representations via Global Context and Multiple Word Prototypes
2012cited by this paper
Natural Language Processing (Almost) from Scratch
2011cited by this paper
Learning Word Vectors for Sentiment Analysis
2011influential reference
A word at a time: computing word relatedness using temporal semantic analysis
2011cited by this paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
2010cited by this paper
Distributional Memory: A General Framework for Corpus-Based Semantics
2010cited by this paper
Recurrent neural network based language model
2010cited by this paper
Three new graphical models for statistical language modelling
2007cited by this paper
Attributes in lexical acquisition
2006cited by this paper
Automatically Creating Datasets for Measures of Semantic Relatedness
2006cited by this paper
Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
2005cited by this paper
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
2004influential reference
A Neural Probabilistic Language Model
2003cited by this paper
Placing search in context: the concept revisited
2002cited by this paper
Learning Question Classifiers
2002cited by this paper
Placing search in context: the concept revisited
2001cited by this paper
ADAPTIVE ESTIMATION OF A QUADRATIC FUNCTIONAL BY MODEL SELECTION
2001cited by this paper
The maximum‐likelihood solution in inter‐battery factor analysis
1979cited by this paper
Contextual correlates of synonymy
1965cited by this paper
Relations Between Two Sets of Variates
1936cited by this paper

CITED BY

From Prerequisites to Predictions: Validating a Geometric Hallucination Taxonomy Through Controlled Induction
2026cites this paper
When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning
2026influential citation
Spectra: Rethinking Optimizers for LLMs Under Spectral Anisotropy
2026cites this paper
Language Model Representations for Efficient Few-Shot Tabular Classification
2026cites this paper
Measuring Affinity between Attention-Head Weight Subspaces via the Projection Kernel
2026cites this paper
SemPA: Improving Sentence Embeddings of Large Language Models through Semantic Preference Alignment
2026cites this paper
Stop Jostling: Adaptive Negative Sampling Reduces the Marginalization of Low-Resource Language Tokens by Cross-Entropy Loss
2026cites this paper
Global Geometry Is Not Enough for Vision Representations
2026cites this paper
PEARL: Prototype-Enhanced Alignment for Label-Efficient Representation Learning with Deployment-Driven Insights from Digital Governance Communication Systems
2026cites this paper
Universal Conceptual Structure in Neural Translation: Probing NLLB-200's Multilingual Geometry
2026cites this paper
LANGSAE EDITING: Improving Multilingual Information Retrieval via Post-hoc Language Identity Removal
2026influential citation
On the Spectral Flattening of Quantized Embeddings
2026cites this paper
Differential syntactic and semantic encoding in LLMs
2026cites this paper
Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings
2025cites this paper
CASE - Condition-Aware Sentence Embeddings for Conditional Semantic Textual Similarity Measurement
2025influential citation
Rethinking Word Similarity: Semantic Similarity through Classification Confusion
2025cites this paper
When can isotropy help adapt LLMs' next word prediction to numerical domains?
2025influential citation
Semantic-Aligned Code Summarization: Bridging the Gap Between Code and Natural Language Through Data Flow Analysis
2025cites this paper
Better Embeddings with Coupled Adam
2025cites this paper
Scalable Universal T-Cell Receptor Embeddings from Adaptive Immune Repertoires
2025cites this paper
Federated Learning Based on Kernel Local Differential Privacy and Low Gradient Sampling
2025cites this paper
Factor Augmented Supervised Learning with Text Embeddings
2025cites this paper
Multi‐Objective Manifold Representation for Opinion Mining
2025cites this paper
Dimensionality Reduction of Mathematical Problem-Solution Embeddings From Large Language Models
2025cites this paper
Explicitly unbiased large language models still form biased associations
2025cites this paper
Compression Hacking: A Supplementary Perspective on Informatics Metric of Language Models from Geometric Distortion
2025influential citation
On the Predictive Power of Representation Dispersion in Language Models
2025cites this paper
Sensory sharpening and semantic prediction errors unify competing models of predictive processing in human speech comprehension
2025cites this paper
The empirical structure of psychopathology is represented in large language models
2025cites this paper
The cell as a token: high-dimensional geometry in language models and cell embeddings
2025cites this paper
Explainable AI for Binary Black Hole Light Curve Classification via Feature-Weighted Embeddings
2025cites this paper
TermGPT: Multi-Level Contrastive Fine-Tuning for Terminology Adaptation in Legal and Financial Domain
2025cites this paper
IDEAlign: Comparing Large Language Models to Human Experts in Open-ended Interpretive Annotations
2025influential citation
A short survey on almost orthogonal vectors in a few specific large dimensions
2025cites this paper
Know Yourself and Know Your Neighbour : A Syntactically Informed Self-Supervised Compositional Sentence Representation Learning Framework using a Recursive Hypernetwork
2025cites this paper
Adversarial Defense without Adversarial Defense: Enhancing Language Model Robustness via Instance-level Principal Component Removal
2025cites this paper
Metis: Training LLMs with FP4 Quantization
2025cites this paper
Is isotropy a good proxy for generalization in time series forecasting with transformers?
2025influential citation
Linear Dimensionality Reduction for Word Embeddings in Tabular Data Classification
2025influential citation
Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal
2025cites this paper
One Swallow Does Not Make a Summer: Understanding Semantic Structures in Embedding Spaces
2025cites this paper
Exact Robustness Certification of k-Nearest Neighbors
2025cites this paper
Do We Really Need All Those Dimensions? An Intrinsic Evaluation Framework for Compressed Embeddings
2025cites this paper
Distribution-Aware Exploration for Adaptive HNSW Search
2025cites this paper
Discrete Speech Unit Extraction via Independent Component Analysis
2025cites this paper
Static Word Embeddings for Sentence Semantic Representation
2025influential citation
A multimodal-multitask framework with cross-modal relation and hierarchical interactive attention for semantic comprehension
2025cites this paper
BOLT: Block-Orthonormal Lanczos for Trace estimation of matrix functions
2025cites this paper
From Anisotropy to Isotropy: The Role of Contrastive Learning in Sentence Representation Learning
2024influential citation
Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis
2024cites this paper
Navigating the Effect of Parametrization for Dimensionality Reduction
2024cites this paper
Isotropy, Clusters, and Classifiers
2024cites this paper
Revisiting Query Variation Robustness of Transformer Models
2024influential citation
Gradual Syntactic Label Replacement for Language Model Pre-Training
2024cites this paper
Structure-Aware Dialogue Modeling Methods for Conversational Semantic Role Labeling
2024cites this paper
Efficient Feature Selection for Word Embedding Dimension Reduction
2024cites this paper
CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion
2024cites this paper
Bias Vector: Mitigating Biases in Language Models with Task Arithmetic Approach
2024cites this paper
Dimension Reduction with Locally Adjusted Graphs
2024cites this paper
JuniperLiu at CoMeDi Shared Task: Models as Annotators in Lexical Semantics Disagreements
2024cites this paper
CROWD: Certified Robustness via Weight Distribution for Smoothed Classifiers against Backdoor Attack
2024cites this paper
Lost in Disambiguation: How Instruction-Tuned LLMs Master Lexical Ambiguity
2024cites this paper
Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models
2024influential citation
Short-text topic modeling with dual reinforcement from internal and external semantics
2024cites this paper
Latent Structures of Intertextuality in French Fiction
2024cites this paper
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
2024influential citation
Reconsidering Degeneration of Token Embeddings with Definitions for Encoder-based Pre-trained Language Models
2024influential citation
Zipfian Whitening
2024influential citation
ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science
2024cites this paper
How and where does CLIP process negation?
2024cites this paper
Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection
2024cites this paper
Leveraging natural language processing models to automate speech-intelligibility scoring
2024influential citation
Investigating Distributions of Telecom Adapted Sentence Embeddings for Document Retrieval
2024cites this paper
Exploring the Impact of a Transformer's Latent Space Geometry on Downstream Task Performance
2024cites this paper
Voices in a Crowd: Searching for clusters of unique perspectives
2024cites this paper
Contextualized dynamic meta embeddings based on Gated CNNs and self-attention for Arabic machine translation
2024cites this paper
Understanding Token Probability Encoding in Output Embeddings
2024cites this paper
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
2024cites this paper
Combining Discrete Wavelet and Cosine Transforms for Efficient Sentence Embedding
2024cites this paper
Predicting drug–gene relations via analogy tasks with word embeddings
2024cites this paper
Anisotropy is Not Inherent to Transformers
2024cites this paper
ABLE: Agency-BeLiefs Embedding to Address Stereotypical Bias through Awareness Instead of Obliviousness
2024cites this paper
Semanformer: Semantics-aware Embedding Dimensionality Reduction Using Transformer-Based Models
2024cites this paper
Representation Degeneration Problem in Prompt-based Models for Natural Language Understanding
2024cites this paper
HIL: Hybrid Isotropy Learning for Zero-shot Performance in Dense retrieval
2024cites this paper
Mind the Gap Between Prototypes and Images in Cross-domain Finetuning
2024cites this paper
A Tool Kit for Relation Induction in Text Analysis
2024cites this paper
Subword Attention and Post-Processing for Rare and Unknown Contextualized Embeddings
2024cites this paper
Investigating the Performance Impact of Dimensionality Reduction on Word Vectors
2023cites this paper
Investigating the Effectiveness of Whitening Post-processing Methods on Modifying LLMs Representations
2023influential citation
Why "classic" Transformers are shallow and how to make them go deep
2023cites this paper
Isotropic Representation Can Improve Zero-Shot Cross-Lingual Transfer on Multilingual Language Models
2023cites this paper
Improving Activation Steering in Language Models with Mean-Centring
2023cites this paper
A Comparative Study of Different Dimensionality Reduction Techniques for Arabic Machine Translation
2023influential citation
Cluster Validity for Fuzzy Text Segmentation
2023cites this paper
Emotion-Prior Awareness Network for Emotional Video Captioning
2023influential citation
Outlier Dimensions Encode Task-Specific Knowledge
2023cites this paper
Text Rendering Strategies for Pixel Language Models
2023cites this paper
Private Web Search with Tiptoe
2023cites this paper
Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling
2023cites this paper