Update dependency sentence-transformers to v5 (#3)
Open
mend-for-github-com[bot] wants to merge 1 commit into main
This PR contains the following updates:
`sentence-transformers`: `==2.2.2` → `==5.3.0`

Release Notes
huggingface/sentence-transformers (sentence-transformers)
v5.3.0 - Improved Contrastive Learning, New Losses, and Transformers v5 Compatibility
This minor version brings several improvements to contrastive learning:
`MultipleNegativesRankingLoss` now supports alternative InfoNCE formulations (symmetric, GTE-style) and optional hardness weighting for harder negatives. Two new losses are introduced: `GlobalOrthogonalRegularizationLoss` for embedding space regularization and `CachedSpladeLoss` for memory-efficient SPLADE training. The release also adds a faster hashed batch sampler, fixes `GroupByLabelBatchSampler` for triplet losses, and ensures full compatibility with the latest Transformers v5 versions. Install this version with `pip install sentence-transformers==5.3.0`.
Updated MultipleNegativesRankingLoss (a.k.a. InfoNCE)
`MultipleNegativesRankingLoss` received two major upgrades: support for alternative InfoNCE formulations from the literature, and optional hardness weighting to up-weight harder negatives.

Support other InfoNCE variants (#3607)
`MultipleNegativesRankingLoss` now supports several well-known contrastive loss variants from the literature through the new `directions` and `partition_mode` parameters. Previously, this loss only supported the standard forward direction (query → doc). You can now configure which similarity interactions are included in the loss:
- `"query_to_doc"` (default): For each query, its matched document should score higher than all other documents.
- `"doc_to_query"`: The symmetric reverse: for each document, its matched query should score higher than all other queries.
- `"query_to_query"`: For each query, all other queries should score lower than its matched document.
- `"doc_to_doc"`: For each document, all other documents should score lower than its matched query.

The `partition_mode` parameter controls how scores are normalized: `"joint"` computes a single softmax over all directions, while `"per_direction"` computes a separate softmax per direction and averages the losses. These combine to reproduce several loss formulations from the literature:
- Standard InfoNCE (default, unchanged behavior)
- Symmetric InfoNCE (Günther et al. 2024): adds the reverse direction so both queries and documents are trained to find their match
- GTE improved contrastive loss (Li et al. 2023): adds same-type negatives (query <-> query, doc <-> doc) for a stronger training signal, especially useful with pairs-only data
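The variants above can be sketched framework-free; the helper below is illustrative only (hypothetical names, plain Python instead of the library's PyTorch code, and the library's exact candidate bookkeeping may differ). The variants reduce to which candidate scores enter the softmax denominator and whether the directions share one softmax:

```python
import math

def candidates(sim, i, direction):
    # Candidate scores for anchor i under one direction; sim[i][j] is the
    # query-i / document-j similarity and (i, i) is the matched pair.
    n = len(sim)
    if direction == "query_to_doc":
        return [sim[i][j] for j in range(n)]
    if direction == "doc_to_query":
        return [sim[j][i] for j in range(n)]
    raise ValueError(direction)

def info_nce(sim, directions=("query_to_doc",), partition_mode="joint"):
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = math.exp(sim[i][i])
        if partition_mode == "joint":
            # Single softmax over the candidates of every active direction.
            denom = sum(math.exp(s) for d in directions for s in candidates(sim, i, d))
            total += -math.log(pos / denom)
        else:  # "per_direction": one softmax per direction, losses averaged
            per_dir = [
                -math.log(pos / sum(math.exp(s) for s in candidates(sim, i, d)))
                for d in directions
            ]
            total += sum(per_dir) / len(per_dir)
    return total / n
```

On a symmetric similarity matrix, `"per_direction"` with both directions matches the single-direction loss, while `"joint"` is larger because its denominator pools candidates from both directions.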
Hardness-weighted contrastive learning (#3667)
Adds optional hardness weighting to `MultipleNegativesRankingLoss` and `CachedMultipleNegativesRankingLoss`, inspired by Lan et al. 2025 (LLaVE). This up-weights harder negatives in the softmax by adding `hardness_strength * stop_grad(cos_sim)` to selected negative logits. The feature is off by default (`hardness_mode=None`), so existing behavior is unchanged.
The `hardness_mode` parameter controls which negatives receive the penalty:
- `"in_batch_negatives"`: Penalizes in-batch negatives only (positives and hard negatives from other samples). Works with all data formats, including pairs-only.
- `"hard_negatives"`: Penalizes explicit hard negatives only (columns beyond the first two). Only active when hard negatives are provided.
- `"all_negatives"`: Penalizes both in-batch and hard negatives, leaving only the positive unpenalized.
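The weighting idea can be illustrated with a minimal, framework-free sketch (hypothetical function, not the library API): each selected negative logit is shifted by `hardness_strength` times its own similarity, and in the real loss that added term is wrapped in a stop-gradient so it acts as a constant during backpropagation:

```python
import math

def hardness_weighted_nll(pos_sim, neg_sims, hardness_strength=0.0):
    # One anchor's InfoNCE-style negative log-likelihood where each selected
    # negative logit s becomes s + hardness_strength * s; higher-similarity
    # (harder) negatives therefore get a bigger bump before the softmax.
    shifted = [s + hardness_strength * s for s in neg_sims]
    denom = math.exp(pos_sim) + sum(math.exp(s) for s in shifted)
    return -math.log(math.exp(pos_sim) / denom)
```

With `hardness_strength=0.0` this reduces to the unweighted loss, matching the "off by default" behavior.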
New loss: GlobalOrthogonalRegularizationLoss (#3654)

Introduces `GlobalOrthogonalRegularizationLoss` (Zhang et al. 2017), a regularization loss that encourages embeddings to be well-distributed in the embedding space. It penalizes two things: (1) a high mean pairwise similarity across unrelated embeddings, and (2) a high second moment of similarities (which indicates clustering). This loss is meant to be combined with a primary contrastive loss like `MultipleNegativesRankingLoss`. By wrapping both losses in a single module, you can share embeddings and only require one forward pass.
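A minimal sketch of the two penalty terms, assuming unit-normalized embeddings with dot-product similarity (illustrative only; the library's exact formulation may differ). Random unit vectors in `d` dimensions have an expected similarity of 0 and an expected squared similarity of 1/d, so only the excess above 1/d is penalized:

```python
def gor_penalty(embeddings):
    # Global Orthogonal Regularization idea (Zhang et al. 2017): over all
    # pairs of (assumed unrelated) embeddings, penalize a high mean
    # similarity and a high second moment of similarities.
    d = len(embeddings[0])
    sims = [
        sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
        for i in range(len(embeddings))
        for j in range(i + 1, len(embeddings))
    ]
    m1 = sum(sims) / len(sims)                 # first moment of similarities
    m2 = sum(s * s for s in sims) / len(sims)  # second moment
    return m1 ** 2 + max(0.0, m2 - 1.0 / d)
```

For perfectly orthogonal embeddings the penalty is zero; clustered embeddings raise both moments and therefore the penalty.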
New loss: CachedSpladeLoss for memory-efficient SPLADE training (#3670)

Introduces `CachedSpladeLoss`, a gradient-cached version of `SpladeLoss` that enables training SPLADE models with larger batch sizes without additional GPU memory. It applies the GradCache technique at the `SpladeLoss` wrapper level, so both the base loss and the regularizers receive pre-computed embeddings; no changes to existing base losses or regularizers are needed.

Faster NoDuplicatesBatchSampler with hashing (#3611)
Adds a `NO_DUPLICATES_HASHED` batch sampler option, which uses the existing `NoDuplicatesBatchSampler` with `precompute_hashes=True`. This pre-computes xxhash 64-bit values for each sample, providing significant speedups for large batch sizes at a small memory cost. Requires the `xxhash` library.
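A simplified sketch of the precompute-then-deduplicate idea (illustrative, not the library's sampler code; stdlib `blake2b` stands in for the xxhash 64-bit hashes so the sketch has no extra dependency):

```python
import hashlib

def sample_hash(sample):
    # 64-bit content hash per sample, computed once up front: comparing
    # integers per batch is much cheaper than comparing raw sample values.
    payload = "\x1f".join(str(v) for v in sample).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "big")

def no_duplicate_batches(samples, batch_size):
    # Greedy batching that defers any sample whose hash is already present
    # in the current batch, so no batch contains duplicate content.
    hashes = [sample_hash(s) for s in samples]  # the precompute step
    remaining = list(range(len(samples)))
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for pos, idx in enumerate(remaining):
            if len(batch) == batch_size:
                deferred.extend(remaining[pos:])
                break
            if hashes[idx] in seen:
                deferred.append(idx)  # duplicate within this batch: retry later
            else:
                batch.append(idx)
                seen.add(hashes[idx])
        batches.append(batch)
        remaining = deferred
    return batches
```

Duplicates may still land in different batches; the guarantee is only within each batch, matching the sampler's purpose.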
GroupByLabelBatchSampler improvements for triplet losses (#3668)

Fixes a critical issue where `GroupByLabelBatchSampler` produced ~99% single-class batches, causing zero gradients with triplet losses. The sampler now uses round-robin interleaving: each label emits 2 samples per round, and the label visit order is reshuffled every round. This guarantees that every batch contains multiple distinct labels, each with at least 2 samples.
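The fixed strategy can be sketched as follows (illustrative only, not the library code):

```python
import random
from collections import defaultdict

def label_round_robin_batches(labels, batch_size, seed=0):
    # Every round, visit the labels in a freshly shuffled order and let each
    # label emit up to 2 samples, so each batch mixes several labels with at
    # least 2 samples apiece: the condition triplet losses need for
    # non-zero gradients.
    rng = random.Random(seed)
    queues = defaultdict(list)
    for idx, label in enumerate(labels):
        queues[label].append(idx)
    order = []
    while any(queues.values()):
        round_labels = [label for label in queues if queues[label]]
        rng.shuffle(round_labels)  # reshuffle the label visit order per round
        for label in round_labels:
            emitted, queues[label] = queues[label][:2], queues[label][2:]
            order.extend(emitted)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```

With four samples each of two labels and `batch_size=4`, every batch ends up with exactly two samples per label regardless of the shuffle.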
Transformers v5 compatibility

This release includes full compatibility updates for Transformers v5:
- `_nested_gather` method (#3664)
- Allow both `warmup_steps` and `warmup_ratio` until Transformers v4 support is dropped (#3645)

Minor Features
- Replace `requests` dependency with optional `httpx` dependency by @tomaarsen in #3618

Bug Fixes
- `MultipleNegativesRankingLoss` when `num_negatives=None` by @fuutot in #3636
- `MultipleNegativesRankingLoss` by @fuutot in #3641

Performance Improvements
Training Script Migrations (v2 to v3)
Documentation
All Changes
- [feat] Support excluding prompt tokens with pooling with left-padding tokenizer by @tomaarsen in #3598
- [tests] Relax the CI branches by @tomaarsen in #3610
- [compat] Expand test suite to full transformers v5 by @tomaarsen in #3615
- [deps] Replace requests dependency with optional httpx dependency by @tomaarsen in #3618
- [feat] Add triplets/n-tuple support to AnglE by @tomaarsen in #3609
- `http_get` with `load_dataset` - `wiki1m_for_simcse` and `STSbenchmark` by @omkar-334 in #3635
- [tests] Use 120s HF Hub timeout for tests by @tomaarsen in #3637
- `MultipleNegativesRankingLoss` when `num_negatives=None` by @fuutot in #3636
- Migrate `2_programming_train_bi-encoder.py` from v2 to v3 by @omkar-334 in #3629
- Migrate `train_simcse_from_file.py` from v2 to v3 by @omkar-334 in #3631
- `ContrastiveTensionLoss` and `ContrastiveTensionLossInBatchNegatives` by @omkar-334 in #3639
- `http_get` with `load_dataset` - `askubuntu` and `all-nli` by @omkar-334 in #3638
- `batch_size` args to CE evaluators by @omkar-334 in #3643
- `trec` dataset and migrate `training_batch_hard_trec.py` from v2 to v3 by @omkar-334 in #3624
- Migrate `train_stsb_ct.py` from v2 to v3 by @omkar-334 in #3626
- Migrate `train_ct_from_file.py` from v2 to v3 by @omkar-334 in #3625
- Migrate `train_askubuntu_ct-improved.py` from v2 to v3 by @omkar-334 in #3628
- Migrate `train_stsb_ct_improved` from v2 to v3 by @omkar-334 in #3627
- Migrate `train_askubuntu_simcse.py` from v2 to v3 by @omkar-334 in #3630
- Migrate `train_stsb_simcse.py` from v2 to v3 by @omkar-334 in #3648
- Migrate `train_askubuntu_ct.py` from v2 to v3 by @omkar-334 in #3647
- Migrate `train_ct-improved_from_file.py` from v2 to v3 by @omkar-334 in #3646
- `DenoisingAutoEncoderLoss.py` by @omkar-334 in #3652
- `model.fit` in test files by @omkar-334 in #3653
- [feat] Add support for T5Gemma and T5Gemma2 models by @tomaarsen in #3644
- [compat] Allow for both warmup_steps and warmup_ratio until transformers v4 support is dropped by @tomaarsen in #3645
- [feat] Introduce GlobalOrthogonalRegularizationLoss by @tomaarsen in #3654
- [compat] Introduce Transformers v5.2 compatibility: trainer _nested_gather moved by @tomaarsen in #3664
- [perf] Speed up NoDuplicatesBatchSampler iteration (NO_DUPLICATES and NO_DUPLICATES_HASHED) by @hotchpotch in #3658
- [fix] `GroupByLabelBatchSampler` to guarantee multi-class batches for triplet losses by @MrLoh in #3668
- [feat] Introduce `CachedSpladeLoss` for memory-efficient SPLADE training by @yjoonjang in #3670
- [docs] Add tips for adjusting batch size to improve processing speed by @tomaarsen in #3672
- [docs] CE trainer: Removed IterableDataset from train and eval dataset type hints by @tomaarsen in #3676
- [loss] Disallow query_to_query/doc_to_doc with partition_mode="per_direction" due to negative loss by @tomaarsen in #3677
- [feat] Add hardness-weighted contrastive learning to losses by @yjoonjang in #3667
- [fix] Fix model card generation with set_transform with new column names by @tomaarsen in #3680
- [tests] Add slow reproduction tests for most common models by @tomaarsen in #3681

New Contributors
A big thanks to my repeat contributors, a lot of this release originated from your contributions. Much appreciated!
Full Changelog: huggingface/sentence-transformers@v5.2.3...v5.3.0
v5.2.3 - Compatibility with Transformers v5.2 training
This patch release introduces compatibility with Transformers v5.2.
Install this version with `pip install sentence-transformers==5.2.3`.
Transformers v5.2 Support
Transformers v5.2 has just been released, and it updated its `Trainer` in such a way that training with Sentence Transformers would start failing on the logging step. Pull request #3664 resolved this issue. If you're not training with Sentence Transformers, then older versions of Sentence Transformers are also compatible with Transformers v5.2.
All Changes
- [compat] Introduce Transformers v5.2 compatibility: trainer _nested_gather moved by @tomaarsen (#3664)

Full Changelog: huggingface/sentence-transformers@v5.2.2...v5.2.3
v5.2.2 - Replace mandatory `requests` dependency with optional `httpx` dependency
This patch release replaces the mandatory `requests` dependency with an optional `httpx` dependency. Install this version with `pip install sentence-transformers==5.2.2`.
Transformers v5 Support
Transformers v5.0 and the `huggingface_hub` versions it requires have dropped support for `requests` in favor of `httpx`. The former was also used in `sentence-transformers`, but not listed explicitly as a dependency. This patch removes the use of `requests` in favor of `httpx`, although the latter is now optional and not automatically imported. This should also save some import time. Importing Sentence Transformers no longer crashes if `requests` is not installed.

All Changes
- [deps] Replace requests dependency with optional httpx dependency by @tomaarsen (#3618)

Full Changelog: huggingface/sentence-transformers@v5.2.1...v5.2.2
v5.2.1 - Joint Transformers v4 and v5 compatibility
This patch release adds support for the full Transformers v5 release.
Install this version with `pip install sentence-transformers==5.2.1`.
Transformers v5 Support
Sentence Transformers v5.2.0 already introduced support for the Transformers v5.0 release candidates; this release adds support for the full release. The intention is to maintain backward compatibility with v4.x. The library includes dual CI testing for both versions for now, allowing users to upgrade to the newest Transformers features when ready. In future versions, Sentence Transformers may start requiring Transformers v5.0 or higher.
All Changes
Full Changelog: huggingface/sentence-transformers@v5.2.0...v5.2.1
v5.2.0 - CrossEncoder multi-processing, multilingual NanoBEIR evaluators, similarity scores in `mine_hard_negatives`, Transformers v5 support
This minor release introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in `mine_hard_negatives`, Transformers v5 support, the Python 3.9 deprecation, and more. Install this version with `pip install sentence-transformers==5.2.0`.
CrossEncoder Multi-processing
The `CrossEncoder` class now supports multiprocessing for faster inference on CPU and multi-GPU setups. This brings CrossEncoder functionality in line with the existing multiprocessing capabilities of `SentenceTransformer` models, allowing you to use multiple CPU cores or GPUs to speed up both the `predict` and `rank` methods when processing large batches of sentence pairs.

The implementation introduces these new methods, mirroring the SentenceTransformer approach:
- `start_multi_process_pool()`: Initialize a pool of worker processes.
- `stop_multi_process_pool()`: Clean up the worker pool.

Usage is straightforward with the new `pool` parameter. Alternatively, pass a list of devices to `device` to have `predict` and `rank` automatically create a pool behind the scenes.

This enhancement is particularly beneficial for CPU-based deployments and enables multi-GPU reranking in the `mine_hard_negatives` function, making hard negative mining faster for large datasets.

Multilingual NanoBEIR Support
The NanoBEIR evaluators now support custom dataset IDs, allowing for evaluation on non-English NanoBEIR collections. All three NanoBEIR evaluators (dense, sparse, and cross-encoder) support this functionality with a simple `dataset_id` parameter.
There are already supported translations for French, Arabic, German, Spanish, Italian, Portuguese, Norwegian, Swedish, Serbian, Korean, Japanese, and 22 Bharat languages in the NanoBEIR collection. Contact me (@tomaarsen) if you have found or created another translation and would like to get it added to the collection!
Similarity Scores in Hard Negatives Mining
The `mine_hard_negatives` function now includes an `output_scores` parameter that lets you export similarity scores alongside the mined negatives, with a corresponding output layout for each of the `output_format` options. For context, `labels` are binary values denoting whether the relevant pair was labeled as a positive or not, whereas `scores` are similarity scores from the `SentenceTransformer` or `CrossEncoder` model.

Additionally, the `n-tuple-scores` format has been replaced with the cleaner `output_format="n-tuple"` combined with `output_scores=True`.
Transformers v5 Support
Sentence Transformers now supports the latest Transformers v5.0 release while maintaining backward compatibility with v4.x. The library includes dual CI testing for both versions for now, allowing users to upgrade to the newest Transformers features when ready. In future versions, Sentence Transformers may start requiring Transformers v5.0 or higher.
Pillow now Optional
The Pillow library is now an optional dependency rather than a required one, reducing installation size for users who don't work with image-based models. Users who need image functionality can install it via `pip install sentence-transformers[image]` or directly with `pip install pillow`.

Python 3.9 Deprecation
Following Python's deprecation schedule, Sentence Transformers v5.2.0 has deprecated support for Python 3.9. Users are encouraged to upgrade to Python 3.10 or newer to continue receiving updates and new features.
Minor Changes
- `labels` argument in the loss that's used to train (#3506).
- The `sentence-transformers[onnx]` and `sentence-transformers[onnx-gpu]` extras now rely on the new `optimum-onnx` package with `optimum >= 2.0.0`.

All Changes
- [tests] Loosen safetensors test rtol/atol by @tomaarsen in #3572
- [deprecation] Deprecate Python 3.9, upgrade ruff by @tomaarsen in #3573
- [fix] Correct condition for restoring layer embeddings in TransformerDecorator/AdaptiveLayerLoss by @emapco in #3560
- [chore] Rename master to main, update outdated URLs by @tomaarsen in #3579
- [tests] Increase atol/rtol from 1e-6 to 1e-5 for higher test consistency by @tomaarsen in #3578
- [feat] Allow transformers v5.0, add CI for transformers >= v5 by @tomaarsen in #3586
- [deps] Use optimum-onnx now that both optimum-onnx and optimum-intel can use optimum==2.0.0 by @tomaarsen in #3587

New Contributors
An extra thanks to @Samoed, @NohTow, and @raphaelsty for engaging in valuable discussions in the pull requests, @omkar-334 for finding all kinds of open issues where possible, and @marquesafonso for working on a solid PR for multilingual NanoBEIR that we didn't end up going for.
Additionally, a big thanks to @milistu from Serbian-AI-Society, @NohTow & @raphaelsty from LightOn, @mlabonne and Fernando Fernandes Neto from LiquidAI, @lbourdois from CATIE-AQ and Arun Arumugam for creating the NanoBEIR translations that are supported out of the gate.
Full Changelog: huggingface/sentence-transformers@v5.1.2...v5.2.0
v5.1.2 - Sentence Transformers joins Hugging Face; model saving/loading improvements and loss compatibility
This patch celebrates the transition of Sentence Transformers to Hugging Face, and improves model saving, loading defaults, and loss compatibilities.
Install this version with `pip install sentence-transformers==5.1.2`.