Releases: lightonai/fast-plaid
1.4.1
Fast-Plaid 1.4.1 Release Notes
Overview
Fast-Plaid 1.4.1 introduces incremental index updates with dynamic centroid expansion and a new low memory mode that significantly reduces GPU VRAM usage. This release focuses on making Fast-Plaid more efficient for production workloads with evolving document collections.
Key Features
Incremental Updates with Dynamic Centroid Expansion
The .update() method now supports intelligent centroid management:
- Buffered Updates: New documents are accumulated in a buffer. When the buffer reaches the threshold (default: 100 documents), the system triggers a centroid expansion check.
- Automatic Centroid Expansion: Embeddings far from existing centroids (outliers) are automatically identified and clustered to create new centroids, ensuring the index adapts to new data distributions over time.
- Efficient Small Updates: Small batches below the buffer size are processed immediately without centroid expansion for fast incremental updates.
This replaces the previous behavior where centroids remained fixed after initial index creation, which could lead to accuracy degradation as data distributions shifted.
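A usage sketch of the new behavior (assuming `update()` accepts a `documents_embeddings` list of per-document token-embedding tensors; the exact signature may differ):
```python
import torch
from fast_plaid import search

fast_plaid = search.FastPlaid(index="index")  # load an existing index

# 150 hypothetical new documents, each a (tokens, dim) tensor of embeddings.
new_documents = [torch.randn(300, 128) for _ in range(150)]

# Batches smaller than buffer_size (default 100) are applied immediately;
# once the buffer fills, outliers are clustered into new centroids.
fast_plaid.update(documents_embeddings=new_documents)
```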
Low Memory Mode
New low_memory parameter (default: True) reduces GPU VRAM usage:
```python
fast_plaid = search.FastPlaid(index="index", device="cuda", low_memory=True)
```
- Document tensors are kept on CPU and moved to GPU only when needed during search
- Significantly reduces VRAM footprint for large indexes
- Trade-off: Slightly slower search performance
- No effect when `device="cpu"`
Memory-Optimized K-means
- Eliminates unnecessary numpy conversions during centroid computation
Embedding Reconstruction
New Rust function to reconstruct original embeddings from compressed index data, useful for debugging and analysis.
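Conceptually, reconstruction inverts the residual compression used by PLAID-style indexes: each token embedding is stored as a centroid ID plus a quantized residual, so the original vector is approximated as centroid + dequantized residual. A schematic Python sketch (all names and the 8-bit scheme are illustrative, not the actual Rust implementation):
```python
import torch

def reconstruct_embeddings(
    centroids: torch.Tensor,  # (n_centroids, dim) float32 centroids
    codes: torch.Tensor,      # (n_tokens,) centroid ID assigned to each token
    residuals: torch.Tensor,  # (n_tokens, dim) quantized residuals (uint8 here)
    scale: float,             # dequantization scale (illustrative)
) -> torch.Tensor:
    # Approximate embedding = assigned centroid + dequantized residual.
    dequantized = (residuals.float() - 128.0) * scale
    return centroids[codes.long()] + dequantized
```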
Thread-Safe Operations
- File locking mechanism ensures safe concurrent access to indexes
- Prevents corruption during simultaneous read/write operations
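Fast-Plaid handles this locking internally via the new `filelock` dependency; the sketch below only illustrates the underlying pattern (the lock-file path is hypothetical):
```python
from filelock import FileLock

lock = FileLock("index/.fast_plaid.lock")  # hypothetical lock-file path

# All processes acquire the same lock, so a concurrent update() cannot
# interleave with a search() that is reading the index files.
with lock:
    pass  # read or write index files safely here
```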
Configuration
New Parameters
| Parameter | Default | Description |
|---|---|---|
| `low_memory` | `True` | Keep tensors on CPU, move to GPU only when needed |
| `buffer_size` | `100` | Documents to accumulate before centroid expansion |
| `start_from_scratch` | `999` | Rebuild index if fewer documents exist |
| `max_points_per_centroid` | `256` | Maximum points per centroid during expansion |
Breaking Changes
- Default `batch_size` for `.create()` increased from 25,000 to 50,000
- `fastkmeans` dependency pinned to version `0.5.0`
New Dependencies
- `filelock>=3.20.0` - For thread-safe index operations
- `usearch>=2.21.0` - For efficient similarity search during updates on CPU; used to spot outliers
Installation
```bash
pip install fast-plaid==1.4.1.290  # PyTorch 2.9.0
pip install fast-plaid==1.4.1.280  # PyTorch 2.8.0
pip install fast-plaid==1.4.1.271  # PyTorch 2.7.1
pip install fast-plaid==1.4.1.270  # PyTorch 2.7.0
```
Upgrade Notes
Existing indexes created with v1.3.x are compatible with v1.4.1. The new centroid expansion features will activate automatically when using .update() on existing indexes.
For users who were experiencing accuracy degradation with frequent updates, this release should significantly improve long-term index quality without requiring full re-indexing.
Contributors: @raphaelsty
1.3.1
Small release that reduces the memory usage of Fast-Plaid index creation. Getting better one step at a time.
1.3.0
v1.3.0: Memory Optimizations & Architecture Improvements
This release introduces significant reductions in memory usage and improves index management.
Performance & Memory
- Memory-Mapped Loading: Implemented a new loading system with incremental updates and zero-copy validation to prevent loading entire indices into RAM with the `update` method.
- Optimized Tensors: Shifted to smaller integer types (`Uint8`, `Int32`) where appropriate and replaced `torch.quantile` with a custom implementation to bypass Torch limitations.
- Object-Based Management: Replaced the global index cache with direct object passing, allowing Python to fully manage the index lifecycle.
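For context, `torch.quantile` raises an error on inputs larger than roughly 16 million elements, which index-scale tensors easily exceed. One way around the limit is `kthvalue`, sketched here with nearest-rank interpolation (not necessarily the implementation Fast-Plaid ships):
```python
import torch

def large_quantile(x: torch.Tensor, q: float) -> torch.Tensor:
    # torch.kthvalue has no input-size ceiling, unlike torch.quantile.
    flat = x.flatten()
    k = int(round(q * (flat.numel() - 1))) + 1  # nearest rank, 1-indexed
    return flat.kthvalue(k).values
```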
API & Behavior Changes
- Automatic Parallelism: Simplified the API by abstracting multi-device logic. If no device is provided, search now automatically distributes across available GPUs. When `device="cpu"` is specified, the index initializes faster.
- Default Settings: Changed the default search `batch_size` from 25,000 to 2,000.
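In practice (both calls use the constructor shown elsewhere in these notes):
```python
from fast_plaid import search

# No device argument: search distributes across all available GPUs.
fast_plaid = search.FastPlaid(index="index")

# Explicit CPU: single-device execution with faster index start-up.
fast_plaid_cpu = search.FastPlaid(index="index", device="cpu")
```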
1.2.5
FastPlaid 1.2.5: Leaner & Faster
We're excited to release FastPlaid 1.2.5! This version focuses on significant optimizations for indexing, giving you faster search speeds and much more efficient GPU VRAM management.
✨ Highlights
- Drastically Reduced GPU VRAM Usage: The indexing process has been refactored to handle document embeddings in batches (see the sketch after this list), massively reducing GPU VRAM consumption during index creation with no impact on CPU RAM usage or indexing speed.
- Blazing-Fast Search for APIs: Centroids are now pre-loaded into memory by default during indexing and when creating the FastPlaid object, accelerating search on large-scale indexes deployed in API environments. Disable it with `preload_index=False` if you run many replicas of Fast-Plaid indexes; otherwise keep it on.
- Improved Memory Control: The new `batch_size` parameter gives finer control over memory usage during indexing.
- Indexing Progress Bar: Track the status of your index creation.
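The batching idea behind the VRAM reduction is generic: move one chunk of embeddings to the GPU at a time instead of the whole corpus. A conceptual sketch of nearest-centroid assignment done this way (illustrative, not Fast-Plaid's internal code):
```python
import torch

def assign_to_centroids(
    embeddings: list[torch.Tensor],  # per-document (tokens, dim) tensors on CPU
    centroids: torch.Tensor,         # (n_centroids, dim) tensor on GPU
    batch_size: int = 2000,
) -> list[torch.Tensor]:
    # Peak VRAM stays proportional to batch_size, not corpus size.
    codes = []
    for start in range(0, len(embeddings), batch_size):
        chunk = embeddings[start : start + batch_size]
        batch = torch.cat(chunk).to(centroids.device)
        # Nearest centroid by maximum inner product for every token.
        codes.append((batch @ centroids.T).argmax(dim=-1).cpu())
        del batch  # free VRAM before loading the next chunk
    return codes
```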
Housekeeping
- Code Clarity: Several variables have been renamed to improve the overall clarity and readability of the code.
- PyTorch 2.9.0 Support: This release is fully compatible with PyTorch 2.9.0.
- Dependency Note: Support for PyTorch 2.6.0 has been temporarily dropped due to compatibility issues.
Contributors: @raphaelsty @fschlatt
1.2.4
Version 1.2.4 of fast-plaid now supports Python 3.13 and uploads dedicated wheels to PyPI.
1.2.3
The 1.2.3 version of Fast-Plaid enhances index mutability by adding support for deleting specific embeddings.
It also includes a built-in SQLite filtering pipeline.
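One way to combine the two: resolve a metadata filter to document IDs with SQLite, then pass them as the `subset` argument of `search()` (introduced in 1.2.0 below). The metadata table here is hypothetical:
```python
import sqlite3
import torch
from fast_plaid import search

# Hypothetical metadata table mapping document IDs to attributes.
connection = sqlite3.connect("metadata.db")
rows = connection.execute(
    "SELECT doc_id FROM documents WHERE language = ?", ("en",)
).fetchall()
subset = [doc_id for (doc_id,) in rows]

fast_plaid = search.FastPlaid(index="index")
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, 128),
    top_k=5,
    subset=subset,  # score only the documents matching the SQL filter
)
```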
1.2.1
This new release allows feeding Fast-Plaid with un-padded queries. It also normalizes decompressed embeddings to further improve results, and fixes an issue on small datasets where fast-kmeans would be initialized with more clusters than there are training data points. This version will be integrated into PyLate as the backend for search.
1.2.0
This new release introduces filtering for Fast-Plaid, allowing any system to interoperate with it by providing subset IDs to score.
```python
import torch
from fast_plaid import search

fast_plaid = search.FastPlaid(index="index")  # Load an existing index

# Apply a single filter to all queries
# Search for the top 5 results only within documents [2, 5, 10, 15, 18]
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, 128),  # 2 queries
    top_k=5,
    subset=[2, 5, 10, 15, 18],
)
print(scores)

# Apply a different filter for each query
# Query 1: search within documents [0, 1, 2, 3, 4]
# Query 2: search within documents [10, 11, 12, 13, 14]
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, 128),  # 2 queries
    top_k=5,
    subset=[
        [0, 1, 2, 3, 4],
        [10, 11, 12, 13, 14],
    ],
)
print(scores)
```
1.1.0
Introducing mutable indexes with the `update` method.
Also adds a new parameter, `n_samples_kmeans`, which lets you modulate the number of samples used to compute the centroids and reduce memory usage on demand.
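A sketch of both features together (the placement of `n_samples_kmeans` on `create()` is assumed, since these notes do not show the call site):
```python
import torch
from fast_plaid import search

fast_plaid = search.FastPlaid(index="index")

# Cap the number of embeddings sampled for k-means to lower memory usage.
fast_plaid.create(
    documents_embeddings=[torch.randn(300, 128) for _ in range(1000)],
    n_samples_kmeans=100_000,  # assumed parameter placement
)

# Indexes are now mutable: append new documents in place.
fast_plaid.update(documents_embeddings=[torch.randn(300, 128)])
```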
1.0.3
Eased the Torch dependency constraint.