Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Published 2019 in Conference on Empirical Methods in Natural Language Processing

ABSTRACT

BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, these models require that both sentences be fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where they outperform other state-of-the-art sentence embedding methods.
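The "about 50 million" figure in the abstract is the number of unordered pairs among 10,000 sentences, n(n-1)/2. The sketch below checks that arithmetic and illustrates the alternative the abstract describes: encode each sentence once, then compare the resulting vectors with cosine similarity. It is a minimal illustration under stated assumptions, not the authors' implementation. The function names and the toy vectors are hypothetical stand-ins; no SBERT model is run here.

```python
import math
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_pair(embeddings):
    """Return (i, j, score) for the pair with the highest cosine similarity."""
    best = None
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if best is None or score > best[2]:
            best = (i, j, score)
    return best


# A pairwise model needs one forward pass per sentence pair.
print(pair_count(10_000))  # 49995000, i.e. about 50 million

# Toy stand-ins for precomputed sentence embeddings
# (hypothetical values, not output of any real model).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
i, j, score = most_similar_pair(toy_embeddings)
print(i, j)  # indices of the closest pair
```

The embedding approach still compares every pair, but each comparison is a cheap vector operation rather than a full network forward pass, and only n encoder passes are needed instead of n(n-1)/2.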

PUBLICATION RECORD

Publication year: 2019
Venue: Conference on Empirical Methods in Natural Language Processing
Publication date: 2019-08-14
Fields of study: Computer Science
Identifiers: DOI 10.18653/v1/D19-1410; arXiv 1908.10084
Source metadata: Semantic Scholar

REFERENCES

Classification and Clustering of Arguments with Contextualized Word Embeddings
2019cited by this paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach
2019cited by this paper
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2019cited by this paper
On Measuring Social Biases in Sentence Encoders
2019cited by this paper
Understanding the Behaviors of BERT in Ranking
2019cited by this paper
BERTScore: Evaluating Text Generation with BERT
2019cited by this paper
Real-time Inference in Multi-sentence Tasks with Deep Pretrained Transformers
2019cited by this paper
XLNet: Generalized Autoregressive Pretraining for Language Understanding
2019cited by this paper
Universal Sentence Encoder
2018influential reference
Learning Semantic Textual Similarity from Conversations
2018cited by this paper
SentEval: An Evaluation Toolkit for Universal Sentence Representations
2018cited by this paper
Learning Thematic Similarity Metric from Article Sections Using Triplet Networks
2018cited by this paper
Why Comparing Single Performance Scores Does Not Allow to Draw Conclusions About Machine Learning Approaches
2018cited by this paper
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
2017influential reference
Attention is All you Need
2017cited by this paper
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
2017influential reference
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
2017cited by this paper
Billion-Scale Similarity Search with GPUs
2017cited by this paper
Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity
2016cited by this paper
Measuring the Similarity of Sentential Arguments in Dialogue
2016influential reference
SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
2016cited by this paper
Learning Distributed Representations of Sentences from Unlabelled Data
2016cited by this paper
SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability
2015cited by this paper
Skip-Thought Vectors
2015cited by this paper
FaceNet: A unified embedding for face recognition and clustering
2015cited by this paper
A large annotated corpus for learning natural language inference
2015influential reference
GloVe: Global Vectors for Word Representation
2014cited by this paper
SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
2014cited by this paper
A SICK cure for the evaluation of compositional distributional semantic models
2014cited by this paper
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
2013cited by this paper
*SEM 2013 shared task: Semantic Textual Similarity
2013cited by this paper
SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity
2012cited by this paper
Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales
2005cited by this paper
Annotating Expressions of Opinions and Emotions in Language
2005cited by this paper
A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
2004cited by this paper
Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
2004cited by this paper
Mining and summarizing customer reviews
2004cited by this paper
Learning Question Classifiers
2002influential reference

CITED BY

Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory
2026cites this paper
Triple-Phase Precision Control for Fire Safety Regulations Retrieval Combating LLM Hallucination Risks
2026cites this paper
Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces
2026cites this paper
FC-CONAN: An Exhaustively Paired Dataset for Robust Evaluation of Retrieval Systems
2026cites this paper
Holistic AI in medicine; improved performance and explainability
2026influential citation
Nexus scissor: enhance open-access language model safety by connection pruning
2026cites this paper
LETTER: Self-Harmonized Representation Learning for Multimodal Recommendation
2026cites this paper
Automated knowledge extraction from marine accident reports using large language models: Graph construction and evaluation
2026cites this paper
Enhancing LLM-based building data query with chain-of-thought, retrieval-augmented generation, and fine-tuning
2026cites this paper
Improving Foundation Model Group Robustness with Auxiliary Sentence Embeddings
2026influential citation
PerspectiveCoach: Exploring LLMs for Developer Reflection
2026cites this paper
Clustering student argumentation types by implementing a multi-document clustering model: a combination of Pytorch and BERT
2026cites this paper
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
2026cites this paper
Orion-RAG: Path-Aligned Hybrid Retrieval for Graphless Data
2026cites this paper
Enhancing IoT Service Discovery Through Semantic Name-Based Forwarding
2026influential citation
Distributed Semantic Trajectory Similarity Search
2026cites this paper
Efficient low-rank index routing for high-dimensional approximate nearest neighbor search
2026cites this paper
User personas, ideation and large language models: A post-hoc study
2026cites this paper
Margin-based angular losses for lightweight text classification: Lessons from face recognition
2026cites this paper
Causal-ESC : Capture the Dynamics in Cause-and-Effect Detection for Emotional Support Conversation
2026cites this paper
DSN-STC: Leveraging Siamese networks for optimized short text clustering
2026cites this paper
Understanding Emotion in Discourse: Recognition Insights and Linguistic Patterns for Generation
2026cites this paper
Adversarial Contrastive Learning for LLM Quantization Attacks
2026cites this paper
A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance
2026cites this paper
RAL2M: Retrieval Augmented Learning-To-Match Against Hallucination in Compliance-Guaranteed Service Systems
2026cites this paper
Fast Diversified Top-k Rule Discovery via User-Guided Embeddings
2026cites this paper
SpeakerSleuth: Evaluating Large Audio-Language Models as Judges for Multi-turn Speaker Consistency
2026cites this paper
Simulated Students in Tutoring Dialogues: Substance or Illusion?
2026cites this paper
T-Retriever: Tree-based Hierarchical Retrieval Augmented Generation for Textual Graphs
2026cites this paper
ResMAS: Resilience Optimization in LLM-based Multi-agent Systems
2026cites this paper
COSMIC: A Novel Contextualized Orientation Similarity Metric Incorporating Consistency for NLG Assessment
2026cites this paper
In-depth Analysis of LLM-based Schema Linking
2026cites this paper
Computing Patient Similarity Based on Unstructured Clinical Notes
2026cites this paper
GADE+: A Graph-Based Anchor-Enhanced Framework for Targeted Document Detection
2026cites this paper
Temporal diffuser: Timing scale-aware modulation for sign language production
2026cites this paper
R2D-EQ: a two-stage workflow for risk reasoning and decision-making in earthquake emergency scenarios
2026cites this paper
LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
2026cites this paper
Mind AI's Mind: A Clinically Aligned Explainable AI Pipeline for Depression Diagnosis via Large Language Models
2026cites this paper
VLMAR: Maritime scene anomaly detection via retrieval-augmented vision-language models
2026cites this paper
Multi-model pseudo-document generation and reconstruction for hybrid query expansion
2026cites this paper
Exacerbating Differences in Polarity: Bias Adversarial Attack on Generative Large Language Models
2026cites this paper
Optimization in Information Retrieval: A Systematic Review of Techniques for Performance and Scalability
2026cites this paper
Human-LLM Collaboration Framework for Translating Health Instruments
2026cites this paper
Empowering Large Language Models to Set Up Knowledge Retrieval Indexing via Self-Learning
2026cites this paper
BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics
2026cites this paper
Trajectory Guard - A Lightweight, Sequence-Aware Model for Real-Time Anomaly Detection in Agentic AI
2026cites this paper
The Alchemy of Thought: Understanding In-Context Learning Through Supervised Classification
2026influential citation
Mental health in digital microsystems across three Asian Reddit communities
2026cites this paper
Transparent Semantic Change Detection with Dependency-Based Profiles
2026cites this paper
Enhancing Multilingual RAG Systems with Debiased Language Preference-Guided Query Fusion
2026cites this paper
LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation
2026cites this paper
Limited Linguistic Diversity in Embodied AI Datasets
2026cites this paper
Large Language Models for Creation, Enrichment and Evaluation of Taxonomic Graphs
2026cites this paper
A Multi-Modal Knowledge-Driven Approach for Generalized Zero-shot Video Classification
2026influential citation
Prompt Tuning without Labeled Samples for Zero-Shot Node Classification in Text-Attributed Graphs
2026cites this paper
Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models
2026cites this paper
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life
2026cites this paper
The Critical Role of Aspects in Measuring Document Similarity
2026cites this paper
SemPA: Improving Sentence Embeddings of Large Language Models through Semantic Preference Alignment
2026cites this paper
LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
2026cites this paper
RIGOURATE: Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation
2026cites this paper
Higher-Order Knowledge Representations for Agentic Scientific Reasoning
2026cites this paper
Scientific knowledge graph and ontology generation using open large language models.
2026cites this paper
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
2026cites this paper
Topic Enhanced Semantic Communication System for Reliable Semantic Recovery
2026cites this paper
Collaborative Scoping: Self-Supervised Linkability Assessment for Schema Matching
2026cites this paper
Visual Question Explainable Reasoning on Hypothesis Agent Interaction with Scene
2026cites this paper
Correlation-Guided Information Deep Fusion for Multimodal Recommendation
2026cites this paper
Training robots with natural and lightweight human feedback
2026cites this paper
Questions Beyond Pixels: Integrating Commonsense Knowledge in Visual Question Generation for Remote Sensing
2026cites this paper
KANM$^{2}$L: Enhancing Multi-Modal Recommendation With KAN and Dilated Attention
2026cites this paper
An Efficient Long-Context Ranking Architecture With Calibrated LLM Distillation: Application to Person-Job Fit
2026cites this paper
300 years of British patents
2026cites this paper
Building a multi-class Short Message Service dataset for smishing detection using agglomerative clustering and dataset fusion
2026cites this paper
Literate Programming With LLMs? — A Study on Rosetta Code and CodeNet
2026cites this paper
RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering
2026cites this paper
Affection-Guided Bottleneck Diffusion for Missing Modality Issue in Multimodal Affective Computing
2026cites this paper
From Voice to Shell: A SLM-Based Assistant for IoT Maintenance Tasks on the Edge
2026cites this paper
Analysis of image aesthetics assessment as a positive-unlabelled problem
2026cites this paper
Intra-modal consistency for image-text retrieval through soft-label distillation
2026cites this paper
CTI-ANN: Self-Training-Based Annotation With Tailored Augmentation for Cyber Threat Intelligence Posts
2026cites this paper
A hierarchical enhanced text matching model for semantic alignment and importance determination of bridge defect records
2026cites this paper
Snip-Cache: A code snippet caching system for LLM-based command-driven IoT systems
2026cites this paper
Integrated multilayer reinforcement model: Explaining the dynamics of online radicalization
2026cites this paper
Research on the classification and optimization strategies of civil aviation customer service based on BERTopic and the Kano model
2026cites this paper
Enhancing Low-Resource Indian Language Machine Translation Using Large Language Models With Preference Optimization and Hypergeometric-Gamma Reward
2026cites this paper
Transforming urban planning through machine learning: A study on planning application classification using natural language processing
2026cites this paper
Sifting Truth From Spectacle! A Multimodal Hindi Dataset for Misinformation Detection With Emotional Cues and Sentiments
2026cites this paper
Analysis of emergency decision-making patterns in civil aviation risk events based on the observe-decide-act model
2026cites this paper
Intelligent Semantic Communication Scheme Integrating ISAC for Low-Altitude Intelligent Networks
2026cites this paper
Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback
2026cites this paper
GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval
2026cites this paper
Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings
2026cites this paper
Can Semantic Methods Enhance Team Sports Tactics? A Methodology for Football with Broader Applications
2026cites this paper
LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
2026cites this paper
SeRe: A Security-Related Code Review Dataset Aligned with Real-World Review Activities
2026cites this paper
Listen, Attend, Understand: a Regularization Technique for Stable E2E Speech Translation Training on High Variance labels
2026cites this paper
Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models
2026influential citation
LittiChoQA: Literary Texts in Indic Languages Chosen for Question Answering
2026cites this paper
Fused modality-enhanced graph convolutional network for multimodal recommendation
2026cites this paper