Unsupervised Sentence Representation Learning via Rank and Self Distillation with LLM-Augmented Negative Sampling
Published in the 2025 IEEE 8th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)
ABSTRACT
In recent years, unsupervised sentence representation learning has made significant progress through distillation-based contrastive learning and data augmentation methods. However, mainstream methods treat every negative sample equally, so semantically similar sentences are pushed apart as negatives, preventing the model from capturing fine-grained distinctions between sentences. In this paper, we integrate rank distillation and self-distillation into a unified framework, leveraging sentence similarity ranking to help the student model capture fine-grained knowledge from the data, and harnessing the complementary strengths of the bi-encoder and cross-encoder to guide mutual knowledge extraction in an unsupervised setting. Additionally, we adopt two methods for negative sample augmentation: challenging negative samples generated by Large Language Models (LLMs) and dynamic buffer-based negative sample selection. Experimental results show that our method achieves a Spearman correlation of 81.82% with a BERT-base backbone, surpassing existing STS benchmark scores.
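The abstract does not spell out the training objective, but rank distillation for sentence embeddings is commonly implemented as a listwise loss that pulls the bi-encoder student's in-batch similarity distribution toward the cross-encoder teacher's ranking, alongside a standard InfoNCE contrastive loss. The sketch below illustrates that common recipe under those assumptions; the temperatures, the KL formulation, and the InfoNCE pairing are illustrative, not details confirmed by the paper.

```python
# Minimal sketch of listwise rank distillation for sentence embeddings,
# ASSUMING a RankCSE-style setup: the student's in-batch similarity
# distribution is matched to the cross-encoder teacher's ranking via KL.
# Temperatures (tau_s, tau_t) are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def rank_distill_loss(student_emb, teacher_scores, tau_s=0.05, tau_t=0.05):
    """KL divergence between teacher and student similarity rankings.

    student_emb:    (N, d) sentence embeddings from the bi-encoder student.
    teacher_scores: (N, N) pairwise similarity scores from the cross-encoder.
    """
    student_emb = F.normalize(student_emb, dim=-1)
    student_sim = student_emb @ student_emb.T              # (N, N) cosine sims
    mask = torch.eye(student_sim.size(0), dtype=torch.bool)
    # Exclude self-similarities, then compare the row-wise distributions.
    s = student_sim.masked_fill(mask, float("-inf")) / tau_s
    t = teacher_scores.masked_fill(mask, float("-inf")) / tau_t
    return F.kl_div(F.log_softmax(s, dim=-1), F.softmax(t, dim=-1),
                    reduction="batchmean")

def info_nce_loss(anchor_emb, positive_emb, tau=0.05):
    """Standard in-batch InfoNCE: each anchor's positive sits at its own index."""
    anchor_emb = F.normalize(anchor_emb, dim=-1)
    positive_emb = F.normalize(positive_emb, dim=-1)
    logits = anchor_emb @ positive_emb.T / tau             # (N, N)
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)
```

In setups like this, the total objective is typically a weighted sum, e.g. `info_nce_loss(a, p) + alpha * rank_distill_loss(a, teacher_scores)`, where the weight `alpha` is a tuning assumption rather than a value reported by the paper.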
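Likewise, "dynamic buffer-based negative sample selection" is not detailed in the abstract; one plausible reading is a MoCo-style FIFO queue of recent embeddings from which the hardest negatives are drawn per anchor. The class below sketches that interpretation only; the buffer capacity, the hardest-k rule, and all identifiers are hypothetical.

```python
# Sketch of a dynamic negative buffer, ASSUMING a MoCo-style FIFO queue:
# embeddings from recent batches are cached (detached) and the most
# anchor-similar entries are reused as extra hard negatives.
import torch
import torch.nn.functional as F

class NegativeBuffer:
    def __init__(self, dim, capacity=4096):
        self.capacity = capacity
        self.buffer = torch.empty(0, dim)

    @torch.no_grad()
    def enqueue(self, emb):
        """Append detached embeddings; evict the oldest beyond capacity."""
        self.buffer = torch.cat([self.buffer, emb.detach()], dim=0)[-self.capacity:]

    def hardest(self, anchors, k=64):
        """Return the k buffered negatives most similar to each anchor."""
        if self.buffer.size(0) == 0:
            return None
        sims = F.normalize(anchors, dim=-1) @ F.normalize(self.buffer, dim=-1).T
        idx = sims.topk(min(k, self.buffer.size(0)), dim=-1).indices  # (N, k)
        return self.buffer[idx]                                       # (N, k, dim)
```

The selected hard negatives would then be appended to the in-batch negatives of the contrastive loss; how the paper actually combines them with the LLM-generated negatives is not stated in the abstract.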
PUBLICATION RECORD
- Publication year
2025
- Venue
2025 IEEE 8th International Conference on Pattern Recognition and Artificial Intelligence (PRAI)
- Publication date
2025-08-15
- Source metadata
Semantic Scholar