WeavScope 🔭

A clean, multi-tenant wrapper for Weaviate — batteries-included, no boilerplate.

WeavScope lets you interact with Weaviate using a simple, Pythonic API. It handles the full lifecycle: connecting, creating collections, managing tenants, inserting vectors, and searching — all with one consistent interface. Stop writing boilerplate; start building.

Why WeavScope?

Working with Weaviate directly involves a lot of ceremony: creating clients, managing connections, building multi-tenancy configs, handling batch contexts, deserializing responses, and cleaning up after yourself. WeavScope abstracts all of that away.

| Without WeavScope | With WeavScope |
| --- | --- |
| Manual `connect_to_custom(...)` calls | Auto-connects from `WeaviateConfig` |
| Manually build `Configure.multi_tenancy(...)` | `ensure_collection()` handles it |
| Manually create/delete tenants | Auto-created & deleted by context manager |
| Manage `batch.dynamic()` context | A single `scope.batch.add_objects(...)` call |
| Deserialize raw Weaviate objects | Results are plain Python dicts |

How It Works

WeavScope is built around a two-step pattern, because collection creation (schema setup) is a one-time operation, while tenant-scoped data operations happen repeatedly:

┌─────────────────────────────────────────────────────────────────────┐
│ STEP 1 (once): WeavScope.ensure_collection()                        │
│   → Creates the Weaviate collection with multi-tenancy enabled.     │
│   → Idempotent: safe to call again — skips if already exists.       │
└────────────────────────────┬────────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────────┐
│ STEP 2 (per tenant): with WeavScope(config, tenant_id="...") as ws: │
│   → Creates the tenant on __enter__                                 │
│   → Exposes ws.batch  → insert objects                              │
│   → Exposes ws.query  → search objects                              │
│   → Deletes the tenant + all its data on __exit__                   │
└─────────────────────────────────────────────────────────────────────┘

This separation ensures the collection schema exists before any tenant operations happen, and the context manager keeps each "scope" of work clean and isolated.
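The two-step lifecycle can be sketched without a running Weaviate instance. The snippet below is only an illustration of the pattern, not WeavScope's implementation: `FakeStore` and `tenant_scope` are hypothetical names, and an in-memory dict stands in for the database.

```python
from contextlib import contextmanager


class FakeStore:
    """In-memory stand-in for a multi-tenant collection store (hypothetical)."""

    def __init__(self):
        self.collections = {}  # collection name -> {tenant_id: [objects]}

    def ensure_collection(self, name):
        """Create the collection if missing; return True only when created."""
        if name in self.collections:
            return False
        self.collections[name] = {}
        return True

    @contextmanager
    def tenant_scope(self, name, tenant_id):
        """Create the tenant on entry and delete it, with its data, on exit."""
        tenants = self.collections[name]  # KeyError if step 1 was skipped
        tenants.setdefault(tenant_id, [])
        try:
            yield tenants[tenant_id]
        finally:
            tenants.pop(tenant_id, None)


store = FakeStore()
created_first = store.ensure_collection("Articles")  # step 1
created_again = store.ensure_collection("Articles")  # idempotent repeat

with store.tenant_scope("Articles", "project-A") as objects:  # step 2
    objects.append({"title": "Intro to AI"})
    count_inside = len(objects)

tenants_after = list(store.collections["Articles"])
```

The `finally` clause is what makes the cleanup reliable: the tenant is removed even if the body of the `with` block raises.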

Installation

pip install weavscope

Requires Python 3.11+ and a running Weaviate instance (v1.24+ recommended for multi-tenancy support).

Core Concepts

WeaviateConfig

WeaviateConfig is a plain Python dataclass that holds all your connection and embedding settings. There is no hidden env-var magic — you control how values are supplied (hardcoded, from os.environ, from a secrets manager, etc.).

from weavscope import WeaviateConfig

config = WeaviateConfig(
    WEAVIATE_HOST="localhost",           # Weaviate instance hostname or IP
    WEAVIATE_PORT=8080,                  # HTTP port (default: 8080)
    WEAVIATE_GRPC_PORT=50051,            # gRPC port (default: 50051)
    WEAVIATE_CLASS_NAME="MyCollection",  # Collection (class) name in Weaviate
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="openai",          # Embedding provider
    WEAVIATE_EMBEDDING_MODEL_NAME="text-embedding-3-small",  # Model name
    WEAVIATE_API_KEY="",                 # Weaviate API key (empty = no auth)
    WEAVIATE_EMBEDDING_MODEL_API_KEY="", # Embedding provider API key
)

Key fields:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `WEAVIATE_HOST` | `str` | (required) | Hostname or IP of your Weaviate instance |
| `WEAVIATE_PORT` | `int` | `8080` | HTTP API port |
| `WEAVIATE_GRPC_PORT` | `int` | `50051` | gRPC port (used for batch imports when `WEAVIATE_USE_GRPC` is `True`) |
| `WEAVIATE_USE_GRPC` | `bool` | `True` | Use gRPC for batch inserts (faster). Set `False` for environments without gRPC |
| `WEAVIATE_CLASS_NAME` | `str` | (required) | Name of the Weaviate collection (PascalCase recommended) |
| `WEAVIATE_API_KEY` | `str` | `""` | Weaviate auth key. Leave empty for open/anonymous instances |
| `WEAVIATE_EMBEDDING_MODEL_PROVIDER` | `str` | (required) | Embedding provider (see Supported Providers) |
| `WEAVIATE_EMBEDDING_MODEL_NAME` | `str` | (required) | Model name for the selected provider |
| `WEAVIATE_EMBEDDING_MODEL_API_KEY` | `str` | `""` | API key for the embedding provider |
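To make the defaults concrete, here is a rough sketch of a dataclass with the same fields. `ConfigSketch` is a hypothetical stand-in written for this README, not the library's source, and it assumes that the fields with no listed default must be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Illustrative stand-in for WeaviateConfig; not the library's source."""

    # Fields without a listed default (assumed to be required).
    WEAVIATE_HOST: str
    WEAVIATE_CLASS_NAME: str
    WEAVIATE_EMBEDDING_MODEL_PROVIDER: str
    WEAVIATE_EMBEDDING_MODEL_NAME: str
    # Fields with the defaults listed in the table.
    WEAVIATE_PORT: int = 8080
    WEAVIATE_GRPC_PORT: int = 50051
    WEAVIATE_USE_GRPC: bool = True
    WEAVIATE_API_KEY: str = ""
    WEAVIATE_EMBEDDING_MODEL_API_KEY: str = ""


# Only the four fields without defaults need to be passed.
cfg = ConfigSketch(
    WEAVIATE_HOST="localhost",
    WEAVIATE_CLASS_NAME="Articles",
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="openai",
    WEAVIATE_EMBEDDING_MODEL_NAME="text-embedding-3-small",
)
```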

WeavScope

WeavScope is the main entry point. It connects to Weaviate on instantiation and exposes two sub-interfaces:

scope.batch — for inserting data (WeavScopeBatch)
scope.query — for searching data (WeavScopeQuery)

It can be used as a context manager (recommended) or manually with try/finally.

# Context manager (recommended)
with WeavScope(config, tenant_id="my-tenant") as scope:
    # tenant is created here automatically
    scope.batch.add_objects(objects=[...], id_field="title")
    results = scope.query.hybrid("my search query")
# tenant is deleted and connection is closed here automatically

# Manual usage (when you need more control)
scope = WeavScope(config)
try:
    scope.ensure_tenant("my-tenant")
    scope.batch.add_objects(objects=[...], tenant_id="my-tenant")
    results = scope.query.hybrid("my query", tenant_id="my-tenant")
    scope.delete_tenant("my-tenant")
finally:
    scope.close()

Tenants

WeavScope is built around Weaviate's multi-tenancy feature, which provides data isolation at the tenant level. Each tenant is a logically separate storage space within the same collection.

Tenants are identified by a string ID (e.g., "project-A", "user-123", "event-42").
When you use WeavScope(config, tenant_id="..."), the tenant is auto-created on enter and auto-deleted (with all its data) on exit.
If you want tenants to persist after the scope exits, manage them manually (don't pass tenant_id to the constructor).

Why tenants? They let multiple isolated workloads share a single Weaviate collection without interfering with each other. Ideal for multi-user applications, per-project vector stores, or ephemeral session data.

Batch Insertions

scope.batch.add_objects(...) handles inserting a list of dictionaries into a tenant. It supports:

gRPC batching (fast, default if WEAVIATE_USE_GRPC=True)
REST fallback (sequential inserts if gRPC is disabled)
Deterministic UUIDs — pass id_field="title" to generate a UUID from (object_value, tenant_id), ensuring idempotent inserts (inserting the same object twice won't duplicate it)

Querying

scope.query exposes four search methods and two fetch methods. The search methods and .fetch_all() return a list of plain Python dicts; .fetch_by_id() returns a single dict.

| Method | Description |
| --- | --- |
| `.hybrid(query)` | BM25 keyword + vector similarity (recommended default) |
| `.near_text(query)` | Pure semantic (vector) search by text |
| `.near_vector(vector)` | Vector search using a pre-computed embedding |
| `.bm25(query)` | Pure keyword (BM25) search |
| `.fetch_all()` | Fetch all objects from a tenant |
| `.fetch_by_id(uuid)` | Fetch a single object by UUID |

Each result dict has the shape:

{
    "uuid": "...",
    "properties": { "title": "...", "content": "...", ... },
    "score": 0.87,       # hybrid/BM25 score
    "distance": 0.12,    # vector distance
    "certainty": 0.88,   # semantic certainty
}
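Because results are plain dicts, they can be post-processed with ordinary Python. The sketch below uses made-up sample hits in the shape shown above; `titles_by_score` is a hypothetical helper, and it assumes a field such as `score` may be missing or `None` when the query method does not produce it.

```python
# Sample hits in the documented shape; the values are made up for illustration.
results = [
    {"uuid": "a", "properties": {"title": "Vector DBs"}, "score": 0.42},
    {"uuid": "b", "properties": {"title": "Intro to AI"}, "score": 0.87},
    {"uuid": "c", "properties": {"title": "RAG Systems"}, "score": None},
]


def titles_by_score(hits, min_score=0.0):
    """Return titles ordered by descending score, skipping hits without one."""
    scored = [
        h for h in hits
        if h.get("score") is not None and h["score"] >= min_score
    ]
    scored.sort(key=lambda h: h["score"], reverse=True)
    return [h["properties"]["title"] for h in scored]


all_ranked = titles_by_score(results)
strong_only = titles_by_score(results, min_score=0.5)
```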

Quick Start (Two-Step Pattern)

Here's the minimal, complete example to get running with a local Weaviate instance:

from weavscope import WeavScope, WeaviateConfig

# Configure your connection (no credentials needed for a local open instance)
config = WeaviateConfig(
    WEAVIATE_HOST="localhost",
    WEAVIATE_PORT=8080,
    WEAVIATE_GRPC_PORT=50051,
    WEAVIATE_CLASS_NAME="Articles",
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="gemini",
    WEAVIATE_EMBEDDING_MODEL_NAME="gemini-embedding-001",
    WEAVIATE_EMBEDDING_MODEL_API_KEY="your-gemini-api-key",
)

# STEP 1: Create the collection (run once — idempotent, safe to repeat)
setup = WeavScope(config)
try:
    setup.ensure_collection(
        provider="gemini",
        model="gemini-embedding-001"
    )
finally:
    setup.close()

# STEP 2: Operate within a tenant scope
# The tenant "project-A" is auto-created on entry and auto-deleted on exit.
with WeavScope(config, tenant_id="project-A") as scope:

    # Insert documents — UUIDs are derived deterministically from the title field
    scope.batch.add_objects(
        objects=[
            {"title": "Intro to AI", "content": "AI is changing the world..."},
            {"title": "Vector DBs", "content": "Vector databases are cool."},
        ],
        id_field="title"
    )

    # Search using hybrid (BM25 + vector) search
    results = scope.query.hybrid("machine learning")

    for hit in results:
        print(f"Found: {hit['properties']['title']} (score: {hit['score']})")

# Connection is closed and tenant "project-A" (with all its data) is deleted.

Why two steps? Weaviate requires the collection (schema) to exist before tenants can be added to it. ensure_collection() is idempotent — safe to call every time, but typically run once during app startup or deployment.

Detailed Usage Guide

Step 1: Define Your Configuration

All settings live in one WeaviateConfig object. Use os.environ to pull secrets from environment variables:

import os
from weavscope import WeaviateConfig

config = WeaviateConfig(
    WEAVIATE_HOST=os.environ.get("WEAVIATE_HOST", "localhost"),
    WEAVIATE_PORT=int(os.environ.get("WEAVIATE_PORT", 8080)),
    WEAVIATE_GRPC_PORT=int(os.environ.get("WEAVIATE_GRPC_PORT", 50051)),
    WEAVIATE_CLASS_NAME="Articles",
    WEAVIATE_API_KEY=os.environ.get("WEAVIATE_API_KEY", ""),
    WEAVIATE_EMBEDDING_MODEL_PROVIDER="gemini",
    WEAVIATE_EMBEDDING_MODEL_NAME="gemini-embedding-001",
    WEAVIATE_EMBEDDING_MODEL_API_KEY=os.environ["GEMINI_API_KEY"],
)

For open/anonymous local Weaviate instances (no auth), leave WEAVIATE_API_KEY empty (it defaults to ""). The embedding model key is only required if you're using a hosted model (OpenAI, Gemini, Cohere, etc.) for server-side vectorization. If you're supplying your own pre-computed vectors, use provider="custom" and omit the embedding key.

Step 2: Create the Collection

The collection is the Weaviate "class" (schema) that holds all your data. Multi-tenancy is enabled automatically.

from weavscope import WeavScope

setup = WeavScope(config)
try:
    setup.ensure_collection(
        provider="gemini",           # Which embedding provider powers this collection
        model="gemini-embedding-001" # The specific model to use for vectorization
    )
finally:
    setup.close()

ensure_collection() is idempotent — if the collection already exists, it does nothing and logs a debug message. Run it at startup without worry.

You can also add extra properties to the schema:

from weaviate.classes.config import Property, DataType

setup.ensure_collection(
    provider="openai",
    model="text-embedding-3-small",
    extra_properties=[
        Property(name="author", data_type=DataType.TEXT),
        Property(name="published_at", data_type=DataType.DATE),
        Property(name="word_count", data_type=DataType.INT),
    ]
)

Note: tenant_id and object_id properties are always added automatically as base properties by WeavScope.

Step 3: Insert Data in a Tenant Scope

with WeavScope(config, tenant_id="project-A") as scope:
    scope.batch.add_objects(
        objects=[
            {"title": "Intro to AI", "content": "Artificial Intelligence is..."},
            {"title": "Deep Learning", "content": "Neural networks learn by..."},
            {"title": "RAG Systems", "content": "Retrieval Augmented Generation..."},
        ],
        id_field="title"   # Use "title" to generate deterministic UUIDs
    )

How deterministic UUIDs work: When you specify id_field="title", WeavScope generates a UUID from the combination of the field value and the tenant ID using a UUID5 hash. This means:

Inserting the same object into the same tenant produces the same UUID every time.
Re-running your ingestion pipeline won't create duplicate records.
Objects with the same title in different tenants get different UUIDs.
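The three properties above follow directly from how UUID5 works. A minimal sketch using the standard library is shown here; the `tenant:value` name layout and the DNS namespace are assumptions made for illustration, and WeavScope's actual inputs may differ.

```python
import uuid


def deterministic_uuid(value, tenant_id, namespace=uuid.NAMESPACE_DNS):
    """Derive a stable UUID5 from an identifying value and a tenant ID."""
    # The "tenant:value" layout and the default namespace are assumptions.
    return uuid.uuid5(namespace, f"{tenant_id}:{value}")


first = deterministic_uuid("Intro to AI", "project-A")
repeat = deterministic_uuid("Intro to AI", "project-A")
other_tenant = deterministic_uuid("Intro to AI", "project-B")
```

Here `first` and `repeat` are equal, so a re-run would overwrite rather than duplicate, while `other_tenant` differs because the tenant ID is part of the hashed name.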

Inserting a single object:

scope.batch.add_object(
    properties={"title": "One Document", "content": "..."},
    id_field="title"
)

Inserting with pre-computed vectors (custom provider):

my_vector = [0.1, 0.3, 0.5, ...]  # Your own embedding

scope.batch.add_object(
    properties={"title": "Doc", "content": "..."},
    vector=my_vector
)

Deleting objects by filter:

scope.batch.delete_objects_where(
    filter_property="title",
    filter_value="Intro to AI"
)

Step 4: Query Within the Scope

with WeavScope(config, tenant_id="project-A") as scope:
    # ... (insert objects) ...

    results = scope.query.hybrid(
        query_text="neural networks",
        limit=5,         # Return up to 5 results (default: 10)
        alpha=0.75,      # 0.0 = pure BM25, 1.0 = pure vector (default: 0.75)
    )

    for hit in results:
        print(f"[{hit['score']:.3f}]  {hit['properties']['title']}")

All Query Methods

`scope.query.hybrid(query_text, ...)`

Combines BM25 (keyword) and vector (semantic) search. The alpha parameter controls the blend.

results = scope.query.hybrid(
    query_text="machine learning tutorial",
    limit=10,
    alpha=0.75,                         # 75% vector, 25% BM25
    exclude_property="title",           # Optional: filter out objects where...
    exclude_value="Intro to AI",        # ...title == "Intro to AI"
    return_properties=["title"],        # Optional: only return specific properties
)
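As a rough intuition for alpha, the blend can be pictured as a weighted average of a vector score and a BM25 score. This is a simplified sketch only: Weaviate's real fusion algorithms (ranked fusion and relative score fusion) are more involved, and `blend` is a hypothetical helper that assumes both inputs are already normalized to the 0..1 range.

```python
def blend(vector_score, bm25_score, alpha=0.75):
    """Weighted blend: alpha weights the vector score, (1 - alpha) the BM25 score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * bm25_score


# Both inputs are assumed to be normalized to the 0..1 range already.
mostly_vector = blend(vector_score=0.8, bm25_score=0.4, alpha=0.75)
pure_bm25 = blend(0.8, 0.4, alpha=0.0)    # only the keyword score counts
pure_vector = blend(0.8, 0.4, alpha=1.0)  # only the vector score counts
```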

`scope.query.near_text(query_text, ...)`

Pure semantic search — finds objects whose vectors are closest to the query text's embedding.

results = scope.query.near_text(
    query_text="deep neural architectures",
    limit=5,
    certainty=0.8,   # Minimum similarity threshold (0.0–1.0)
    # distance=0.2,  # Maximum vector distance; pass this instead of certainty, not alongside it
)
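For collections that use cosine distance, the two thresholds are different units for the same cutoff: Weaviate defines certainty as 1 - distance / 2. The helpers below are a small sketch of that conversion; the function names are hypothetical, and the cosine metric is an assumption about how the collection is configured.

```python
def certainty_from_distance(distance):
    """Convert a cosine distance (0..2) to a certainty (0..1)."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be between 0.0 and 2.0")
    return 1.0 - distance / 2.0


def distance_from_certainty(certainty):
    """Convert a certainty (0..1) back to a cosine distance (0..2)."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0.0 and 1.0")
    return 2.0 * (1.0 - certainty)


# A certainty threshold of 0.8 corresponds to a distance of roughly 0.4.
equivalent_distance = distance_from_certainty(0.8)
equivalent_certainty = certainty_from_distance(0.2)
```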

`scope.query.near_vector(vector, ...)`

Search using a pre-computed embedding vector. Useful when you already have an embedding from your own pipeline.

my_embedding = [0.12, 0.45, ...]  # 768-dim or however many dims your model uses

results = scope.query.near_vector(
    vector=my_embedding,
    limit=5,
    certainty=0.7,
)

`scope.query.bm25(query_text, ...)`

Pure keyword search (no vectors). Fast and effective for exact or near-exact term matching.

results = scope.query.bm25(
    query_text="vector database performance",
    limit=10,
    properties=["title", "content"],  # Only search within these fields
)

`scope.query.fetch_all(limit=100, ...)`

Retrieve all objects in a tenant up to a limit.

all_docs = scope.query.fetch_all(limit=50, return_properties=["title"])

`scope.query.fetch_by_id(uuid, ...)`

Retrieve a single object by its Weaviate UUID.

doc = scope.query.fetch_by_id("3fa85f64-5717-4562-b3fc-2c963f66afa6")
if doc:
    print(doc["properties"]["title"])

Supported Embedding Providers

Pass the provider name as a string — WeavScope maps it to the correct Weaviate vectorizer config internally.

| Provider Alias | Weaviate Vectorizer | Notes |
| --- | --- | --- |
| `"openai"` | `text2vec_openai` | OpenAI embedding models |
| `"gemini"` | `text2vec_google_gemini` | Gemini Embedding API |
| `"cohere"` | `text2vec_cohere` | Cohere embedding models |
| `"google"` / `"vertexai"` | `text2vec_palm` | Legacy Vertex AI / PaLM |
| `"huggingface"` | `text2vec_huggingface` | HuggingFace Inference API |
| `"voyageai"` | `text2vec_voyageai` | VoyageAI embedding models |
| `"mistral"` | `text2vec_mistral` | Mistral embedding models |
| `"jinaai"` | `text2vec_jinaai` | Jina AI embedding models |
| `"azure"` | `text2vec_azure_openai` | Azure OpenAI; pass deployment name as model |
| `"custom"` | None | You supply vectors manually via `vector=` |

The embedding model API key is passed to Weaviate via the appropriate provider-specific HTTP header (e.g., X-OpenAI-Api-Key, X-Goog-Api-Key) — all handled automatically by WeavScope.
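Conceptually, this is a lookup from provider alias to header name. The sketch below covers only the two headers named above; pairing `"gemini"` with `X-Goog-Api-Key` is an assumption, and `PROVIDER_HEADERS` and `build_headers` are hypothetical names rather than WeavScope internals.

```python
# Header names for the two providers named above; other providers are omitted.
PROVIDER_HEADERS = {
    "openai": "X-OpenAI-Api-Key",
    "gemini": "X-Goog-Api-Key",  # assumed pairing for illustration
}


def build_headers(provider, api_key):
    """Return the extra HTTP headers for a provider, or {} if none are needed."""
    if not api_key or provider == "custom":
        return {}  # nothing to send for keyless or self-vectorized setups
    if provider not in PROVIDER_HEADERS:
        raise ValueError(f"no header mapping for provider {provider!r}")
    return {PROVIDER_HEADERS[provider]: api_key}


openai_headers = build_headers("openai", "sk-example")
custom_headers = build_headers("custom", "")
```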

Error Handling

All WeavScope exceptions inherit from WeavscopeError, so you can catch them broadly or specifically:

from weavscope import (
    WeavScope,                 # Main entry point, used in the try block below
    WeavscopeError,            # Base: catch all WeavScope errors
    WeavscopeConnectionError,  # Failed to connect to Weaviate
    WeavscopeCollectionError,  # Collection create/delete failed
    WeavscopeTenantError,      # Tenant create/delete/list failed
    WeavscopeBatchError,       # Batch insert/delete failed
    WeavscopeQueryError,       # Query execution failed
)

try:
    with WeavScope(config, tenant_id="project-A") as scope:
        scope.batch.add_objects(objects=[...], id_field="title")
        results = scope.query.hybrid("neural networks")

except WeavscopeConnectionError as e:
    print(f"Could not reach Weaviate: {e}")

except WeavscopeBatchError as e:
    print(f"Insertion failed: {e}")

except WeavscopeQueryError as e:
    print(f"Search failed: {e}")

except WeavscopeError as e:
    print(f"Unexpected WeavScope error: {e}")
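The order of the except clauses matters: Python uses the first handler that matches, so specific exceptions must come before the base class. The sketch below shows this with local stand-in classes that reuse the documented names; they are defined here only so the snippet runs on its own, and `classify` is a hypothetical helper.

```python
# Local stand-ins mirroring the documented hierarchy (not imported from weavscope).
class WeavscopeError(Exception):
    """Base class for the stand-in hierarchy."""


class WeavscopeConnectionError(WeavscopeError):
    """Stand-in for a connection failure."""


class WeavscopeBatchError(WeavscopeError):
    """Stand-in for a batch failure."""


def classify(exc):
    """Report which handler catches the given exception."""
    try:
        raise exc
    except WeavscopeConnectionError:
        return "connection"      # specific handler listed first
    except WeavscopeError:
        return "other weavscope"  # base class catches every remaining subclass


connection_result = classify(WeavscopeConnectionError("unreachable"))
batch_result = classify(WeavscopeBatchError("insert failed"))
```

If the `WeavscopeError` handler were listed first, it would also catch the connection error and the specific branch would never run.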

Architecture Overview

weavscope/
├── __init__.py               # Public API exports
├── config/
│   └── settings.py           # WeaviateConfig dataclass
├── core/
│   ├── connection.py         # Weaviate client factory (connect_to_custom)
│   ├── providers.py          # Maps provider names → Weaviate VectorConfig
│   ├── store.py              # WeavScope: collection & tenant lifecycle
│   ├── batch.py              # WeavScopeBatch: object insertion
│   └── query.py              # WeavScopeQuery: all search methods
└── utils/
    ├── exceptions.py         # Custom exception hierarchy
    ├── logging.py            # Structured logger setup
    └── uuid.py               # Deterministic UUID5 generation

Data flow for a batch insert:

User → scope.batch.add_objects(objects, id_field)
  → WeavScopeBatch._store.ensure_tenant(tenant_id)
  → Generate UUID5(object[id_field] + tenant_id)   [if id_field set]
  → collection.with_tenant(tenant_id).batch.dynamic()
  → batch.add_object(properties=obj, uuid=uuid)
  → Weaviate vectorizes server-side using configured provider
  → Stores (properties + vector) in tenant's shard

AI/LLM Documentation

For AI coding assistants and LLMs looking for an in-depth technical overview of WeavScope's architecture and API, see LLM.txt.
