SBERT-WK: A Sentence Embedding Method by Dissecting BERT-Based Word Models

Published 2020 in IEEE/ACM Transactions on Audio, Speech, and Language Processing

ABSTRACT

Sentence embedding is an important research topic in natural language processing (NLP) since it can transfer knowledge to downstream tasks. Meanwhile, BERT, a contextualized word representation, achieves state-of-the-art performance on many NLP tasks. Yet generating a high-quality sentence representation from BERT-based word models remains an open problem. Previous studies have shown that different layers of BERT capture different linguistic properties, which allows information to be fused across layers to find better sentence representations. In this work, we study the layer-wise pattern of the word representations of deep contextualized models. We then propose a new sentence embedding method, called SBERT-WK, which dissects BERT-based word models through geometric analysis of the space spanned by the word representations. SBERT-WK requires no further training. We evaluate SBERT-WK on semantic textual similarity and downstream supervised tasks. Furthermore, ten sentence-level probing tasks are presented for detailed linguistic analysis. Experiments show that SBERT-WK achieves state-of-the-art performance. Our code is publicly available.
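The abstract describes fusing information across BERT layers through geometric analysis of the space spanned by each word's layer-wise representations, with no further training. The sketch below illustrates that general idea only; it is not the published SBERT-WK algorithm. It assumes the hidden states are already available as a NumPy array of shape (num_layers, num_tokens, dim), for example collected from a Hugging Face `transformers` model called with `output_hidden_states=True`. The weighting rule is a hypothetical, simplified "novelty" score: the fraction of a layer's vector that lies outside the subspace spanned by its neighbouring layers, computed with a QR decomposition. Fused token vectors are then mean-pooled, which is also an assumption of this sketch. All function names are illustrative.

```python
import numpy as np


def novelty_scores(layers: np.ndarray, window: int = 2) -> np.ndarray:
    """Hypothetical per-layer novelty score for a single token.

    `layers` has shape (num_layers, dim): the token's vector at every layer
    (num_layers >= 2). For layer l, the vectors of the neighbouring layers
    within `window` span a context subspace; the score is the fraction of
    layer l's norm that lies outside that subspace.
    """
    num_layers = layers.shape[0]
    scores = np.zeros(num_layers)
    for l in range(num_layers):
        lo, hi = max(0, l - window), min(num_layers, l + window + 1)
        neighbours = [k for k in range(lo, hi) if k != l]
        context = layers[neighbours].T  # (dim, num_neighbours)
        q, _ = np.linalg.qr(context)    # orthonormal basis covering the context span
        h = layers[l]
        residual = h - q @ (q.T @ h)    # component of h orthogonal to that basis
        scores[l] = np.linalg.norm(residual) / (np.linalg.norm(h) + 1e-12)
    return scores


def fuse_token(layers: np.ndarray, window: int = 2) -> np.ndarray:
    """Combine one token's layer vectors with novelty-proportional weights."""
    scores = novelty_scores(layers, window)
    total = scores.sum()
    if total < 1e-12:  # no layer adds anything new: fall back to uniform weights
        weights = np.full(len(scores), 1.0 / len(scores))
    else:
        weights = scores / total  # convex combination over layers
    return weights @ layers  # (dim,)


def sentence_embedding(hidden_states: np.ndarray, window: int = 2) -> np.ndarray:
    """`hidden_states` has shape (num_layers, num_tokens, dim)."""
    fused = [fuse_token(hidden_states[:, i, :], window)
             for i in range(hidden_states.shape[1])]
    return np.mean(fused, axis=0)  # plain mean pooling over tokens (assumption)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity, a common score for comparing sentence embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two sentences: 13 layers (embedding layer plus
    # 12 Transformer layers, as in a BERT-base-sized model), a few tokens
    # each, 32-dimensional vectors.
    a = sentence_embedding(rng.normal(size=(13, 5, 32)))
    b = sentence_embedding(rng.normal(size=(13, 7, 32)))
    print(a.shape, round(cosine(a, b), 3))
```

Because the layer weights are normalised to sum to one, each fused token vector is a convex combination of its layer vectors. A different layer-weighting rule would change only `fuse_token`, and a weighted token pooling would change only the last step of `sentence_embedding`.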

PUBLICATION RECORD

Publication year: 2020
Venue: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication date: 2020-02-16
Fields of study: Computer Science
Identifiers: DOI 10.1109/taslp.2020.3008390; arXiv 2002.06652
Source metadata: Semantic Scholar

REFERENCES

Pitfalls in the Evaluation of Sentence Embeddings
2019 · cited by this paper
Are Sixteen Heads Really Better than One?
2019 · cited by this paper
Language Models are Unsupervised Multitask Learners
2019 · cited by this paper
Linguistic Knowledge and Transferability of Contextual Representations
2019 · cited by this paper
Evaluating word embedding models: methods and experimental results
2019 · cited by this paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019 · cited by this paper
Revealing the Dark Secrets of BERT
2019 · cited by this paper
Visualizing and Understanding the Effectiveness of BERT
2019 · cited by this paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach
2019 · cited by this paper
EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition
2019 · cited by this paper
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
2019 · cited by this paper
Efficient Sentence Embedding using Discrete Cosine Transform
2019 · cited by this paper
Language Models as Knowledge Bases?
2019 · cited by this paper
What Does BERT Learn about the Structure of Language?
2019 · cited by this paper
XLNet: Generalized Autoregressive Pretraining for Language Understanding
2019 · cited by this paper
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
2019 · cited by this paper
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
2019 · influential reference
Interpretable Convolutional Neural Networks via Feedforward Design
2018 · cited by this paper
Deep Contextualized Word Representations
2018 · cited by this paper
Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations
2018 · cited by this paper
SentEval: An Evaluation Toolkit for Universal Sentence Representations
2018 · cited by this paper
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
2018 · cited by this paper
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
2018 · cited by this paper
What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties
2018 · influential reference
Know What You Don’t Know: Unanswerable Questions for SQuAD
2018 · cited by this paper
Evaluation of sentence embeddings in downstream and linguistic probing tasks
2018 · cited by this paper
Improving Language Understanding by Generative Pre-Training
2018 · cited by this paper
Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline
2018 · cited by this paper
Post-Processing of Word Representations via Variance Normalization and Dynamic Embedding
2018 · cited by this paper
Towards Understanding Linear Word Analogies
2018 · cited by this paper
Universal Sentence Encoder for English
2018 · influential reference
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
2018 · cited by this paper
Parameter-free Sentence Embedding via Orthogonal Basis
2018 · cited by this paper
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
2017 · influential reference
Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
2017 · cited by this paper
A Simple but Tough-to-Beat Baseline for Sentence Embeddings
2017 · influential reference
The Robustness of Deep Networks: A Geometrical Perspective
2017 · cited by this paper
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
2017 · cited by this paper
All-but-the-Top: Simple and Effective Postprocessing for Word Representations
2017 · cited by this paper
Attention is All you Need
2017 · cited by this paper
Learning Distributed Representations of Sentences from Unlabelled Data
2016 · cited by this paper
SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations
2016 · cited by this paper
Enriching Word Vectors with Subword Information
2016 · cited by this paper
SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
2016 · cited by this paper
Understanding convolutional neural networks with a mathematical model
2016 · cited by this paper
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
2015 · cited by this paper
Skip-Thought Vectors
2015 · cited by this paper
Evaluation of Word Vector Representations by Subspace Alignment
2015 · cited by this paper
Neural Word Embedding as Implicit Matrix Factorization
2014 · cited by this paper
GloVe: Global Vectors for Word Representation
2014 · cited by this paper
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
2014 · cited by this paper
A SICK cure for the evaluation of compositional distributional semantic models
2014 · cited by this paper
Distributed Representations of Words and Phrases and their Compositionality
2013 · cited by this paper
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
2013 · cited by this paper
*SEM 2013 shared task: Semantic Textual Similarity
2013 · cited by this paper
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
2012 · cited by this paper
Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
2005 · cited by this paper
Annotating Expressions of Opinions and Emotions in Language
2005 · cited by this paper
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
2004 · cited by this paper
Mining and summarizing customer reviews
2004 · cited by this paper
Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
2004 · cited by this paper
Learning Question Classifiers
2002 · cited by this paper
An introduction to latent semantic analysis
1998 · cited by this paper

CITED BY

FaTRQ: Tiered Residual Quantization for LLM Vector Search in Far-Memory-Aware ANNS Systems
2026 · cites this paper
Hierarchical Attention-Based Multi-Agent DRL for Semantic-Aware Spectrum Efficiency in 6G V2X
2026 · cites this paper
Sentence representations for semantic textual similarity: A systematic review
2026 · cites this paper
Sparse Self Attention Network Model for Short Text Classification
2025 · cites this paper
Know Yourself and Know Your Neighbour: A Syntactically Informed Self-Supervised Compositional Sentence Representation Learning Framework using a Recursive Hypernetwork
2025 · cites this paper
HiFi-HARP: A High-Fidelity 7th-Order Ambisonic Room Impulse Response Dataset
2025 · cites this paper
From Speech Semantics to Brain Activity—Timescales Are Key in Their Information Transfer
2025 · cites this paper
Knowledge Graphs and Fine-Grained Visual Features: A Potent Duo Against Cheapfakes
2025 · influential citation
CRC: Knowledge-Enhanced Caption Reconstruction and Comparison for Detecting Out-of-Context Misinformation
2025 · cites this paper
Semantic-Aware Spectrum Efficiency for 6G V2x URLLC with Multi-Agent Hierarchical DRL
2025 · cites this paper
Truth be told: a multimodal ensemble approach for enhanced fake news detection in textual and visual media
2025 · cites this paper
Revolutionising English language education: empowering teachers with BERT-LSTM-driven pedagogical tools
2025 · cites this paper
Digital transformation and environmentally sustainable innovation: Based on machine learning and text analysis methods
2025 · cites this paper
Community-Oriented Sentence Simplification: Towards Accessible Language Processing
2025 · cites this paper
Multimodal Large Language Model for Out-of-Context Problems in Fake News Detection
2025 · cites this paper
Detecting Dataset Reuse and Modification in Data Spaces via Structure-Aware Similarity Analysis
2025 · cites this paper
Cropping outperforms dropout as an augmentation strategy for training self-supervised text embeddings
2025 · cites this paper
Semantic Compression for Word and Sentence Embeddings using Discrete Wavelet Transform
2025 · cites this paper
Research on a Binary Code Similarity Detection Method Based on Jump-ModernBERT
2025 · cites this paper
Emergent musical properties of a transformer under contrastive self-supervised learning
2025 · cites this paper
CLEAR: Cross-Document Link-Enhanced Attention for Relation Extraction with Relation-Aware Context Filtering
2025 · cites this paper
Contextualized Cross‐Domain Aspect Sentiment Transformer: A Fine‐Grained Aspect‐Centric Approach for Enhanced Context‐Aware Sentiment Analysis
2025 · cites this paper
Comparative analysis of text mining and clustering techniques for assessing functional dependency between manual test cases
2025 · cites this paper
Digital innovation, human capital allocation, and labour share: Empirical evidence from listed companies in China
2025 · cites this paper
Knowledge Graph-Augmented ERNIE-CNN Method for Risk Assessment in Secondary Power System Operations
2025 · cites this paper
A Comparative Analysis of Python Text Matching Libraries: A Multilingual Evaluation of Capabilities, Performance and Resource Utilization
2025 · cites this paper
DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation
2025 · cites this paper
MCCI: A multi-channel collaborative interaction framework for multimodal knowledge graph completion
2025 · cites this paper
ProST: spotting propaganda span and technique classification in news articles
2025 · cites this paper
Deep Learning-Based Scientific Document Summarization Considering Citation
2025 · cites this paper
Management of psychological emergency cases on social media: A hybrid approach combining knowledge graphs and graph neural networks
2025 · cites this paper
An NLP-Enabled Approach to Semantic Grouping for Improved Requirements Modularity and Traceability
2025 · cites this paper
Using AI and NLP for Tacit Knowledge Conversion in Knowledge Management Systems: A Comparative Analysis
2025 · cites this paper
Interpretable Text Embeddings and Text Similarity Explanation: A Survey
2025 · cites this paper
Combining referenced publication year spectroscopy and topic clustering to identify key knowledge foundations in scientometrics: an analysis of recipients of the Price Award
2025 · cites this paper
Comprehensive Out-of-context Misinformation Detection via Global Information Enhancement
2024 · cites this paper
A relation-aware representation approach for the question matching system
2024 · cites this paper
SMGC-SBERT: A Multi-Feature Fusion Chinese Short Text Similarity Computation Model Based on Optimised SBERT
2024 · cites this paper
Harmonized system code classification using supervised contrastive learning with sentence BERT and multiple negative ranking loss
2024 · cites this paper
Popularity Estimation and New Bundle Generation using Content and Context based Embeddings
2024 · cites this paper
SlideSpawn: An Automatic Slides Generation System for Research Publications
2024 · cites this paper
Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning
2024 · cites this paper
From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models
2024 · cites this paper
Multimodal Misinformation Detection by Learning from Synthetic Data with Multimodal LLMs
2024 · cites this paper
Enhanced Resume Screening for Smart Hiring Using Sentence-Bidirectional Encoder Representations from Transformers (S-BERT)
2024 · cites this paper
Contrastive Learning with Transformer Initialization and Clustering Prior for Text Representation
2024 · cites this paper
ClassifAI: Automating Issue Reports Classification using Pre-Trained BERT (Bidirectional Encoder Representations from Transformers) Language Models
2024 · cites this paper
A Quantum-Like Tensor Compression Sentence Representation Based on Constraint Functions for Semantics Analysis
2024 · cites this paper
Enhancing query relevance: leveraging SBERT and cosine similarity for optimal information retrieval
2024 · influential citation
Soil Organic Carbon Estimation via Remote Sensing and Machine Learning Techniques: Global Topic Modeling and Research Trend Exploration
2024 · cites this paper
Extracting Sentence Embeddings from Pretrained Transformer Models
2024 · cites this paper
Extractive Question Answering Over Ancient Scriptures Texts Using Generative AI and Natural Language Processing Techniques
2024 · cites this paper
Similarity Over Factuality: Are we Making Progress on Multimodal Out-of-Context Misinformation Detection?
2024 · cites this paper
Multi-schema prompting powered token-feature woven attention network for short text classification
2024 · cites this paper
User identification across online social networks based on gated multi-feature extraction
2024 · cites this paper
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
2024 · cites this paper
A Generative AI-Based Assistant to Evaluate Short and Long Answer Questions
2024 · cites this paper
Enhancing Cheapfake Detection: An Approach Using Prompt Engineering and Interleaved Text-Image Model
2024 · cites this paper
Experimental study on short-text clustering using transformer-based semantic similarity measure
2024 · cites this paper
Automated formation of university R&D teams based on the competence selection algorithm
2024 · cites this paper
SetCSE: Set Operations using Contrastive Learning of Sentence Embeddings
2024 · cites this paper
A Heterogeneous Directed Graph Attention Network for inductive text classification using multilevel semantic embeddings
2024 · cites this paper
ACP-DRL: an anticancer peptides recognition method based on deep representation learning
2024 · cites this paper
A survey of text summarization: Techniques, evaluation and challenges
2024 · cites this paper
Short text classification using semantically enriched topic model
2024 · cites this paper
Few-Shot Learning for Misinformation Detection Based on Contrastive Models
2024 · cites this paper
Refined SBERT: Representing sentence BERT in manifold space
2023 · cites this paper
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
2023 · cites this paper
Sentence embedding and fine-tuning to automatically identify duplicate bugs
2023 · influential citation
Fine-grained semantic textual similarity measurement via a feature separation network
2023 · cites this paper
Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning
2023 · cites this paper
Impact of word embedding models on text analytics in deep learning environment: a review
2023 · cites this paper
NoPPA: Non-Parametric Pairwise Attention Random Walk Model for Sentence Representation
2023 · cites this paper
An Overview on Language Models: Recent Developments and Outlook
2023 · cites this paper
IK-DDI: a novel framework based on instance position embedding and key external text for DDI extraction
2023 · cites this paper
A quantum-like text representation based on syntax tree for fuzzy semantic analysis
2023 · cites this paper
On the class separability of contextual embeddings representations - or "The classifier does not matter when the (text) representation is so good!"
2023 · cites this paper
Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation
2023 · cites this paper
Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation
2023 · cites this paper
Leveraging Language Identification to Enhance Code-Mixed Text Classification
2023 · cites this paper
COSMOS: Catching Out-of-Context Image Misuse Using Self-Supervised Learning
2023 · influential citation
Leveraging Cross-Modals for Cheapfakes Detection
2023 · cites this paper
Learning to Perturb for Contrastive Learning of Unsupervised Sentence Representations
2023 · cites this paper
Nugget: Neural Agglomerative Embeddings of Text
2023 · cites this paper
Leveraging Knowledge Graphs for CheapFakes Detection: Beyond Dataset Evaluation
2023 · cites this paper
Multi-Models from Computer Vision to Natural Language Processing for Cheapfakes Detection
2023 · cites this paper
Detecting Out-of-Context Image-Caption Pair in News: A Counter-Intuitive Method
2023 · cites this paper
Reliability and Performance of the Online Literature Database CAMbase after Changing from a Semantic Search to a Score Ranking Algorithm
2023 · cites this paper
Automatically Assembling a Custom-Built Training Corpus for Improving the Learning of In-Domain Word/Document Embeddings
2023 · cites this paper
DEPRESSION DETECTION MODEL USING WORD AND SENTENCE EMBEDDING WITH DIFFERENT CLASSIFIERS
2023 · cites this paper
BlendCSE: Blend contrastive learnings for sentence embeddings with rich semantics and transferability
2023 · cites this paper
An effective negative sampling approach for contrastive learning of sentence embedding
2023 · cites this paper
Language augmentation approach for code-mixed text classification
2023 · cites this paper
BERT Has More to Offer: BERT Layers Combination Yields Better Sentence Embeddings
2023 · cites this paper
Sentiment analysis in Tourism: Fine-tuning BERT or sentence embeddings concatenation?
2023 · cites this paper
RarKGQA: Multi-hop Question and Answering Method Based on Knowledge Graph Embedding
2023 · influential citation
Leveraging word embeddings and transformers to extract semantics from building regulations text
2023 · cites this paper
An adversarial defense algorithm based on image feature hashing
2023 · cites this paper
Detection of Malicious URL Based on BERT-CNN
2023 · cites this paper
Vision and Language for Digital Forensics
2022 · cites this paper