Releases: lightonai/fast-plaid

1.4.1

05 Jan 23:34

Fast-Plaid 1.4.1 Release Notes

Overview

Fast-Plaid 1.4.1 introduces incremental index updates with dynamic centroid expansion and a new low memory mode that significantly reduces GPU VRAM usage. This release focuses on making Fast-Plaid more efficient for production workloads with evolving document collections.

Key Features

Incremental Updates with Dynamic Centroid Expansion

The .update() method now supports intelligent centroid management:

  • Buffered Updates: New documents are accumulated in a buffer. When the buffer reaches the threshold (default: 100 documents), the system triggers a centroid expansion check.
  • Automatic Centroid Expansion: Embeddings far from existing centroids (outliers) are automatically identified and clustered to create new centroids, ensuring the index adapts to new data distributions over time.
  • Efficient Small Updates: Small batches below the buffer size are processed immediately without centroid expansion for fast incremental updates.

This replaces the previous behavior where centroids remained fixed after initial index creation, which could lead to accuracy degradation as data distributions shifted.
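The buffering policy described above can be sketched in plain Python. This is a hypothetical illustration of the decision logic, not fast-plaid's internals; the helper name plan_update and its return values are invented for this sketch:

```python
def plan_update(buffered_docs: int, new_docs: int, buffer_size: int = 100):
    """Sketch of the buffered-update policy (illustrative, not fast-plaid code).

    Returns ("expand", 0) when the buffer threshold is reached, meaning the
    centroid expansion check runs and the buffer is flushed; otherwise
    ("immediate", n): the small batch is indexed right away and the buffer
    grows to n documents.
    """
    total = buffered_docs + new_docs
    if total >= buffer_size:
        return ("expand", 0)
    return ("immediate", total)
```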

Low Memory Mode

New low_memory parameter (default: True) reduces GPU VRAM usage:

from fast_plaid import search

fast_plaid = search.FastPlaid(index="index", device="cuda", low_memory=True)

  • Document tensors are kept on CPU and moved to GPU only when needed during search
  • Significantly reduces VRAM footprint for large indexes
  • Trade-off: Slightly slower search performance
  • No effect when device="cpu"

Memory-Optimized K-means

  • Eliminates unnecessary numpy conversions during centroid computation

Embedding Reconstruction

New Rust function to reconstruct original embeddings from compressed index data, useful for debugging and analysis.

Thread-Safe Operations

  • File locking mechanism ensures safe concurrent access to indexes
  • Prevents corruption during simultaneous read/write operations
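Conceptually, the protection is a lock file held around index reads and writes. fast-plaid itself relies on the filelock package; the IndexLock class below is a purely illustrative stdlib sketch of the same idea:

```python
import os
import time


class IndexLock:
    """Minimal lock-file guard (illustrative, not fast-plaid's actual code)."""

    def __init__(self, index_path: str):
        self.path = index_path + ".lock"

    def __enter__(self):
        # Atomically create the lock file; if another process holds it,
        # O_EXCL makes os.open fail and we retry until it is released.
        while True:
            try:
                self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
                return self
            except FileExistsError:
                time.sleep(0.01)

    def __exit__(self, *exc):
        # Release the lock so other readers/writers can proceed.
        os.close(self.fd)
        os.unlink(self.path)
```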

Configuration

New Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| low_memory | True | Keep tensors on CPU, move to GPU only when needed |
| buffer_size | 100 | Documents to accumulate before centroid expansion |
| start_from_scratch | 999 | Rebuild the index if fewer documents exist |
| max_points_per_centroid | 256 | Maximum points per centroid during expansion |
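As an illustration of how start_from_scratch gates incremental updates, a minimal sketch (the helper name should_rebuild is hypothetical, not part of the fast-plaid API):

```python
def should_rebuild(num_existing_docs: int, start_from_scratch: int = 999) -> bool:
    # Below the threshold, rebuilding the whole index is cheap enough that
    # it beats incremental updates; past it, updates are applied in place.
    return num_existing_docs < start_from_scratch
```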

Breaking Changes

  • Default batch_size for .create() increased from 25,000 to 50,000
  • fastkmeans dependency pinned to version 0.5.0

New Dependencies

  • filelock>=3.20.0 - For thread-safe index operations
  • usearch>=2.21.0 - For efficient CPU-side similarity search during updates; used to spot outlier embeddings.

Installation

pip install fast-plaid==1.4.1.290  # PyTorch 2.9.0
pip install fast-plaid==1.4.1.280  # PyTorch 2.8.0
pip install fast-plaid==1.4.1.271  # PyTorch 2.7.1
pip install fast-plaid==1.4.1.270  # PyTorch 2.7.0

Upgrade Notes

Existing indexes created with v1.3.x are compatible with v1.4.1. The new centroid expansion features will activate automatically when using .update() on existing indexes.

For users who were experiencing accuracy degradation with frequent updates, this release should significantly improve long-term index quality without requiring full re-indexing.

Contributors @raphaelsty

1.3.1

17 Dec 22:50
d8da345

Small release that reduces the memory usage of Fast-Plaid index creation. Getting better one step at a time 😊

1.3.0

04 Dec 15:53
6ac767f

v1.3.0: Memory Optimizations & Architecture Improvements

This release introduces significant reductions in memory usage and improves index management.

πŸš€ Performance & Memory

  • Memory-Mapped Loading: Implemented a new loading system with incremental updates and zero-copy validation, preventing entire indices from being loaded into RAM when using the update method.
  • Optimized Tensors: Shifted to smaller integer types (Uint8, Int32) where appropriate and replaced torch.quantile with a custom implementation to bypass Torch limitations.
  • Object-Based Management: Replaced the global index cache with direct object passing, allowing Python to fully manage the index lifecycle.

βš™οΈ API & Behavior Changes

  • Automatic Parallelism: Simplified the API by abstracting multi-device logic. If no device is provided, search now automatically distributes across available GPUs; if device="cpu" is provided, the index should spawn faster.
  • Default Settings: Changed the default search batch_size from 25,000 to 2,000.
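The device-selection behavior described above can be sketched as follows (resolve_devices is a hypothetical helper written for this illustration, not the library's actual API):

```python
def resolve_devices(device=None, available_gpus: int = 0):
    # Sketch of the auto-parallelism policy: with no device given, spread
    # the search across every available GPU, falling back to CPU when
    # none are present; an explicit device is used as-is.
    if device is not None:
        return [device]
    if available_gpus > 0:
        return [f"cuda:{i}" for i in range(available_gpus)]
    return ["cpu"]
```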

1.2.5

29 Oct 21:42
fdba8a8

FastPlaid 1.2.5: Leaner & Faster

We're excited to release FastPlaid 1.2.5! This version focuses on significant optimizations for indexing, giving you faster search speeds and much more efficient GPU VRAM management.

✨ Highlights

  • Drastically Reduced GPU VRAM Usage: Indexing now processes document embeddings in batches, massively reducing GPU VRAM consumption during index creation with no impact on CPU RAM usage or indexing speed.

  • Blazing-Fast Search for APIs: Centroids are now pre-loaded into memory by default during indexing and when creating a FastPlaid object. This accelerates search performance for large-scale indexes deployed in API environments. Pre-loading can be disabled with preload_index=False, which may be useful in environments running many replicas of Fast-Plaid indexes; otherwise, keep it on. ✅

  • Improved batch_size parameter: the new batch_size parameter gives better control over memory usage during indexing.

  • Indexing Progress Bar: Track the status of your index creation. βš™οΈ
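The batching idea behind the VRAM reduction is simple: only one slice of the document embeddings needs to live on the GPU at a time. A schematic sketch (the batched generator is illustrative, not the library's internal function):

```python
def batched(embeddings, batch_size: int):
    # Yield fixed-size slices so only one batch at a time is moved to
    # the GPU (and freed) during index creation.
    for start in range(0, len(embeddings), batch_size):
        yield embeddings[start:start + batch_size]
```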

Housekeeping

Code Clarity: Several variables have been renamed to improve the overall clarity and readability of the code.

PyTorch 2.9.0 Support: This release is fully compatible with PyTorch 2.9.0.

Dependency Note: Support for PyTorch 2.6.0 has been temporarily dropped due to compatibility issues.

Contributors: @raphaelsty @fschlatt

1.2.4

23 Sep 15:10
4348cdd

Version 1.2.4 of fast-plaid now supports Python 3.13 and uploads dedicated wheels to PyPI. 🚀

1.2.3

22 Sep 19:06
38e3fbf

Version 1.2.3 of Fast-Plaid enhances the mutability of the index by adding deletion of specific embeddings.
It also includes a built-in SQLite filtering pipeline.
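A sketch of how such a pipeline can feed the subset filter: document ids are selected from a SQLite metadata table, then passed as the subset argument shown in the 1.2.0 release notes. The docs table and its schema here are hypothetical:

```python
import sqlite3

# Hypothetical metadata table mapping document ids to attributes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER, lang TEXT)")
con.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(0, "en"), (1, "fr"), (2, "en")],
)

# Filter down to the document ids we want to score, then hand the list
# to fast_plaid.search(..., subset=subset).
subset = [row[0] for row in con.execute("SELECT id FROM docs WHERE lang = 'en' ORDER BY id")]
```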

1.2.1

10 Sep 13:31
d8bfa55

This new release allows feeding Fast-Plaid with un-padded queries. It also normalizes decompressed embeddings to further enhance results, and solves an issue on small datasets where fast-kmeans would be initialized with more clusters than training data points. This version will be integrated into PyLate as the backend for search.

1.2.0

08 Sep 11:55
6147e6a

This new release introduces filtering for Fast-Plaid, allowing any system to interoperate with it by providing subset IDs to score. πŸš€

import torch
from fast_plaid import search


fast_plaid = search.FastPlaid(index="index") # Load an existing index

# Apply a single filter to all queries
# Search for the top 5 results only within documents [2, 5, 10, 15, 18]
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, 128), # 2 queries
    top_k=5,
    subset=[2, 5, 10, 15, 18]
)

print(scores)

# Apply a different filter for each query
# Query 1: search within documents [0, 1, 2, 3, 4]
# Query 2: search within documents [10, 11, 12, 13, 14]
scores = fast_plaid.search(
    queries_embeddings=torch.randn(2, 50, 128), # 2 queries
    top_k=5,
    subset=[
        [0, 1, 2, 3, 4],
        [10, 11, 12, 13, 14]
    ]
)

print(scores)

1.1.0

13 Aug 15:48

Introducing mutable indexes with the update method. 🚀

Added a new parameter, n_samples_kmeans, which allows modulating the number of samples used to compute the centroids, reducing memory usage on demand.
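The effect of n_samples_kmeans can be sketched as a sub-sampling step before clustering (sample_for_kmeans is an illustrative helper, not the library's code):

```python
import random


def sample_for_kmeans(embeddings, n_samples_kmeans: int):
    # Cap the number of training points handed to k-means so centroid
    # computation fits in memory; small collections are used in full.
    if len(embeddings) <= n_samples_kmeans:
        return list(embeddings)
    return random.sample(list(embeddings), n_samples_kmeans)
```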

1.0.3

06 Jun 15:13

Eases the Torch dependency.