Unsupervised Sentence Representation Learning via Rank and Self Distillation with LLM-Augmented Negative Sampling
Published in the 2025 IEEE 8th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)
ABSTRACT
In recent years, unsupervised sentence representation learning has made significant progress through distillation-based contrastive learning and data augmentation methods. However, mainstream methods treat every negative sample equally, so semantically similar sentences are pushed apart as negatives, preventing the model from capturing fine-grained distinctions between sentences. In this paper, we integrate rank distillation and self-distillation into a unified framework, leveraging sentence similarity ranking to help the student model capture fine-grained knowledge from the data, and harnessing the complementary strengths of the bi-encoder and cross-encoder to guide mutual knowledge extraction in an unsupervised setting. Additionally, we adopt two methods for negative sample augmentation: challenging negative samples generated by Large Language Models (LLMs) and dynamic buffer-based negative sample selection. Experimental results show that our method achieves a Spearman correlation of 81.82% with a BERT-base backbone, surpassing existing STS benchmark scores.
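The abstract does not spell out the training objective, but rank distillation for sentence embeddings is commonly implemented as a listwise loss that pulls the bi-encoder student's in-batch similarity distribution toward the cross-encoder teacher's ranking, alongside a standard InfoNCE contrastive loss. The sketch below illustrates that common recipe under those assumptions; the temperatures, the KL formulation, and the InfoNCE pairing are illustrative, not details confirmed by the paper.

```python
# Minimal sketch of listwise rank distillation for sentence embeddings,
# ASSUMING a RankCSE-style setup: the student's in-batch similarity
# distribution is matched to the cross-encoder teacher's ranking via KL.
# Temperatures (tau_s, tau_t) are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def rank_distill_loss(student_emb, teacher_scores, tau_s=0.05, tau_t=0.05):
    """KL divergence between teacher and student similarity rankings.

    student_emb:    (N, d) sentence embeddings from the bi-encoder student.
    teacher_scores: (N, N) pairwise similarity scores from the cross-encoder.
    """
    student_emb = F.normalize(student_emb, dim=-1)
    student_sim = student_emb @ student_emb.T              # (N, N) cosine sims
    mask = torch.eye(student_sim.size(0), dtype=torch.bool)
    # Exclude self-similarities, then compare the row-wise distributions.
    s = student_sim.masked_fill(mask, float("-inf")) / tau_s
    t = teacher_scores.masked_fill(mask, float("-inf")) / tau_t
    return F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
                    reduction="batchmean")

def info_nce_loss(anchor_emb, positive_emb, tau=0.05):
    """Standard in-batch InfoNCE: each anchor's positive sits at its own index."""
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    logits = anchor_emb @ positive_emb.T / tau             # (N, N)
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)
```

In setups like this, the total objective is typically a weighted sum, e.g. `info_nce_loss(a, p) + alpha * rank_distill_loss(a, teacher_scores)`, where the weight `alpha` is a tuning assumption rather than a value reported by the paper.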
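Likewise, "dynamic buffer-based negative sample selection" is not detailed in the abstract; one plausible reading is a MoCo-style FIFO queue of recent embeddings from which the hardest negatives are drawn per anchor. The class below sketches that interpretation only; the buffer capacity, the hardest-k rule, and all identifiers are hypothetical.

```python
# Sketch of a dynamic negative buffer, ASSUMING a MoCo-style FIFO queue:
# embeddings from recent batches are cached (detached) and the most
# anchor-similar entries are reused as extra hard negatives.
import torch
import torch.nn.functional as F

class NegativeBuffer:
    def __init__(self, dim, capacity=4096):
        self.capacity = capacity
        self.buffer = torch.empty(0, dim)

    @torch.no_grad()
    def enqueue(self, emb):
        """Append detached embeddings; evict the oldest beyond capacity."""
        self.buffer = torch.cat([self.buffer, emb.detach()], dim=0)[-self.capacity:]

    def hardest(self, anchors, k=64):
        """Return the k buffered negatives most similar to each anchor."""
        if self.buffer.size(0) == 0:
            return None
        sims = F.normalize(anchors, dim=-1) @ F.normalize(self.buffer, dim=-1).T
        idx = sims.topk(min(k, self.buffer.size(0)), dim=-1).indices  # (N, k)
        return self.buffer[idx]                                       # (N, k, dim)
```

The selected hard negatives would then be appended to the in-batch negatives of the contrastive loss; how the paper actually combines them with the LLM-generated negatives is not stated in the abstract.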
PUBLICATION RECORD
- Publication year
2025
- Venue
2025 IEEE 8th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)
- Publication date
2025-08-15
- Source metadata
Semantic Scholar