MongoDB Atlas Vector Search + .NET 10 Semantic Movie Search

This project demonstrates semantic search over movie plots using MongoDB Atlas Vector Search in a minimal .NET 10 API. Text queries are converted to embeddings with a local Ollama model (mxbai-embed-large) and matched against precomputed plot embeddings stored in MongoDB.

Features

Minimal API (Program.cs) targeting .NET 10
MongoDB Atlas collection embedded_movies, based on the sample_mflix dataset
Vector field plot_embedding_1024 (1024-dim, cosine similarity)
Semantic search endpoint: GET /api/movies?term=...&limit=...
Embedding generation via Microsoft.Extensions.AI + Ollama
Optional background service (MovieEmbeddingsGenerator) to bulk generate missing embeddings
Server-side candidate filtering (e.g. Year >= 2010)

Project Structure

Movies.Search.Api/Program.cs – service registration & HTTP endpoints
Movies.Search.Api/Models/Movie.cs – MongoDB document model including embedding field
Movies.Search.Api/Services/MovieService.cs – vector search logic
Movies.Search.Api/Services/MovieEmbeddingsGenerator.cs – optional background batch embedding generation
Movies.Search.Api/Services/IMovieService.cs – abstraction for querying movies

Prerequisites

.NET 10 SDK (use a preview build if GA is not yet available); verify with dotnet --version
MongoDB Atlas cluster (M10 or above recommended for performance)
sample_mflix dataset loaded (Atlas UI: Load Sample Dataset)
Ollama installed locally (Linux/macOS/Windows WSL) https://ollama.com
Model: mxbai-embed-large pulled via ollama pull mxbai-embed-large

1. Prepare Atlas Collection

Load the Atlas Sample Data (includes sample_mflix.movies).
Create a new collection embedded_movies and copy (or transform) documents from sample_mflix.movies, keeping only documents that have a plot field. Starting with a subset makes experimentation faster.
Add a new field plot_embedding_1024 (array of 1024 floats) – initially empty/null.

2. Create Vector Search Index

In Atlas UI (Data Explorer -> Indexes -> Create Search Index):

Index Name: vector_index
Type: Atlas Vector Search (configured with the JSON Editor)
Definition:

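{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding_1024",
      "numDimensions": 1024,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
}

The filter entry is required because the search pre-filters on year (Year >= 2010): $vectorSearch can only filter on fields indexed with type filter. This assumes the Movie.Year property maps to the year field from sample_mflix.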
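To make "similarity": "cosine" concrete, here is a plain C# sketch of the metric. It is illustrative only: Atlas computes scores server-side, and its documentation describes the cosine score returned by $vectorSearch as normalized to [0, 1] via (1 + cosine) / 2, which AtlasCosineScore mirrors. Both helper names are hypothetical and not part of this project.

```csharp
using System;

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
static double Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same number of dimensions.");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical mirror of the Atlas normalization: maps [-1, 1] onto [0, 1].
static double AtlasCosineScore(float[] a, float[] b) => (1 + Cosine(a, b)) / 2;

// Toy 3-dimensional vectors (real plot embeddings have 1024 dimensions).
float[] query = { 1f, 0f, 0f };
float[] same = { 2f, 0f, 0f };        // same direction, different length
float[] orthogonal = { 0f, 1f, 0f };
float[] opposite = { -1f, 0f, 0f };

Console.WriteLine(Cosine(query, same));               // 1
Console.WriteLine(Cosine(query, orthogonal));         // 0
Console.WriteLine(AtlasCosineScore(query, opposite)); // 0
```

Because cosine ignores vector length, an embedding and a scaled copy of it score identically; only direction matters.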

3. Configure Application Settings

Create (or edit) appsettings.Development.json:

{
  "ConnectionStrings": {
    "MongoDb": "mongodb+srv://<user>:<password>@<cluster>/?retryWrites=true&w=majority"
  },
  "Ollama": {
    "Url": "http://localhost:11434"
  }
}

Replace <user>, <password>, and <cluster> with your Atlas credentials and cluster host.

4. Run Ollama Embedding Model

ollama pull mxbai-embed-large
ollama run mxbai-embed-large  # (Optional test)

Ensure the Ollama daemon is reachable at http://localhost:11434.
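The same check can be scripted in C#. The sketch below targets Ollama's REST endpoint POST /api/embed (request body with model and input, response with an embeddings array of vectors). BuildEmbedRequest, ParseFirstEmbedding, and EmbedAsync are hypothetical helpers; the project itself goes through Microsoft.Extensions.AI rather than raw HTTP.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Builds the JSON body for Ollama's POST /api/embed endpoint: {"model": ..., "input": ...}.
static string BuildEmbedRequest(string model, string text) =>
    JsonSerializer.Serialize(new { model, input = text });

// Reads the first vector from an /api/embed response: {"embeddings": [[...]], ...}.
static float[] ParseFirstEmbedding(string json)
{
    using var doc = JsonDocument.Parse(json);
    var first = doc.RootElement.GetProperty("embeddings")[0];
    var vector = new float[first.GetArrayLength()];
    var i = 0;
    foreach (var value in first.EnumerateArray())
        vector[i++] = value.GetSingle();
    return vector;
}

// Sends one text to the local Ollama daemon and returns its embedding.
static async Task<float[]> EmbedAsync(HttpClient http, string model, string text)
{
    using var content = new StringContent(BuildEmbedRequest(model, text), Encoding.UTF8, "application/json");
    using var response = await http.PostAsync("/api/embed", content);
    response.EnsureSuccessStatusCode();
    return ParseFirstEmbedding(await response.Content.ReadAsStringAsync());
}

Console.WriteLine(BuildEmbedRequest("mxbai-embed-large", "heartwarming story of friendship"));
// prints: {"model":"mxbai-embed-large","input":"heartwarming story of friendship"}
```

With the daemon running, call EmbedAsync with an HttpClient whose BaseAddress is http://localhost:11434 and the model name mxbai-embed-large, then check that the result has 1024 elements; any other length means the model does not match numDimensions in the index.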

5. Generate Plot Embeddings

You have two options:

A. Automatic Background Generation

Uncomment the hosted service line in Program.cs:

// builder.Services.AddHostedService<MovieEmbeddingsGenerator>();

Change to:

builder.Services.AddHostedService<MovieEmbeddingsGenerator>();

On startup, the service processes documents in batches (the example uses a limit of 10,000) and writes plot_embedding_1024 for every document that is missing an embedding.

B. Manual / Scripted Bulk Update

Write a one-off console utility that:

Fetches documents whose plot_embedding_1024 is null or empty
Sends each plot (or a batch of plots) to the embedding generator
Writes the returned vector back to plot_embedding_1024

(Option A is usually sufficient for demos.)
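Both options reduce to the same loop: find documents whose embedding is missing, embed their plots in batches, and write each vector back. The sketch below runs that loop over in-memory dictionaries with a fake embedding function, so it works without Atlas or Ollama. BackfillEmbeddings, FakeEmbed, and the dictionary layout are hypothetical stand-ins; in the real project the select and update steps would be MongoDB operations and the generator would be the Ollama-backed IEmbeddingGenerator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fills in missing embeddings, sending batchSize plots per generator call.
// Returns the number of documents updated.
static int BackfillEmbeddings(
    IDictionary<string, string> plots,
    IDictionary<string, float[]?> embeddings,
    Func<IReadOnlyList<string>, IReadOnlyList<float[]>> embedBatch,
    int batchSize)
{
    // Step 1: select documents whose embedding is absent, null, or empty.
    var missing = plots.Keys
        .Where(id => !embeddings.TryGetValue(id, out var e) || e is null || e.Length == 0)
        .ToList();

    var written = 0;
    // Step 2: embed the plots batch by batch.
    foreach (var batch in missing.Chunk(batchSize))
    {
        var vectors = embedBatch(batch.Select(id => plots[id]).ToList());
        // Step 3: write each vector back to its document.
        for (var i = 0; i < batch.Length; i++)
            embeddings[batch[i]] = vectors[i];
        written += batch.Length;
    }
    return written;
}

// Toy data: m1 already has an embedding, m2 has null, m3 has no field at all.
var plots = new Dictionary<string, string>
{
    ["m1"] = "Two friends open a bakery.",
    ["m2"] = "A detective hunts a killer.",
    ["m3"] = "Astronauts are stranded on Mars.",
};
var embeddings = new Dictionary<string, float[]?>
{
    ["m1"] = new[] { 0.1f, 0.2f },
    ["m2"] = null,
};

var calls = 0;
// Fake generator standing in for Ollama: returns a 2-dimensional vector per plot.
IReadOnlyList<float[]> FakeEmbed(IReadOnlyList<string> texts)
{
    calls++;
    return texts.Select(t => new float[] { t.Length, 1f }).ToList();
}

var updated = BackfillEmbeddings(plots, embeddings, FakeEmbed, batchSize: 1);
Console.WriteLine($"updated={updated}, generator calls={calls}"); // updated=2, generator calls=2
```

Larger batch sizes cut the number of round-trips to the embedding model; m1 is skipped because it already has a vector.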

6. Build & Run API

dotnet restore
dotnet run --project Movies.Search.Api

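The API listens on the localhost URLs Kestrel is configured with (typically from Properties/launchSettings.json); dotnet run prints them at startup, so use the HTTPS URL shown there.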

7. Query Semantic Search Endpoint

Example:

GET https://localhost:<port>/api/movies?term=heartwarming%20story%20of%20friendship&limit=5

Response (trimmed example shape):

[
  {
    "id": "...",
    "title": "...",
    "plot": "...",
    "year": 2015,
    "plot_embedding_1024": [0.0123, -0.0345, ...]
  }
]

If term is omitted, the service skips vector search and returns the first documents in the collection, up to limit, with no semantic ranking (a non-semantic fallback).
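A small C# client makes the request shape explicit. The sketch below builds the query string (percent-encoding term, or omitting it for the fallback) and reads title and year from the response. BuildSearchUrl, ParseTitles, and SearchAsync are hypothetical helpers, and the camelCase property names are taken from the example response above.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Builds the relative request URL; the term is percent-encoded (spaces become %20).
static string BuildSearchUrl(string? term, int limit) =>
    string.IsNullOrWhiteSpace(term)
        ? $"/api/movies?limit={limit}"
        : $"/api/movies?term={Uri.EscapeDataString(term)}&limit={limit}";

// Extracts (title, year) pairs from the JSON array returned by GET /api/movies.
static List<(string Title, int Year)> ParseTitles(string json)
{
    using var doc = JsonDocument.Parse(json);
    var results = new List<(string Title, int Year)>();
    foreach (var movie in doc.RootElement.EnumerateArray())
        results.Add((movie.GetProperty("title").GetString() ?? "", movie.GetProperty("year").GetInt32()));
    return results;
}

// Calls the running API; http.BaseAddress must be the HTTPS URL printed by dotnet run.
static async Task<List<(string Title, int Year)>> SearchAsync(HttpClient http, string term, int limit) =>
    ParseTitles(await http.GetStringAsync(BuildSearchUrl(term, limit)));

Console.WriteLine(BuildSearchUrl("heartwarming story of friendship", 5));
// prints: /api/movies?term=heartwarming%20story%20of%20friendship&limit=5
```

In development the HTTPS endpoint uses the ASP.NET Core development certificate, which may need to be trusted first with dotnet dev-certs https --trust.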

How Semantic Search Works

Input query (term) is converted to a 1024-d embedding via IEmbeddingGenerator backed by Ollama.
A vector search aggregation is executed:
- Field: plot_embedding_1024
- Index: vector_index
- Similarity metric: cosine
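- Candidate filtering: Year >= 2010 (Filter in VectorSearchOptions)
- Candidate pool: 200 (NumberOfCandidates), i.e. an approximate nearest-neighbor search
The top N (limit) nearest documents are returned, most similar first.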
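Conceptually, the stage returns what the exact computation below returns: drop documents that fail the filter, score the rest by cosine similarity to the query vector, and keep the top N. Atlas instead runs an approximate search over the index, with NumberOfCandidates bounding how many candidates it examines, so its results can differ slightly from this brute-force version. The tuple layout, titles, and 2-dimensional vectors are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity between two vectors of equal length.
static double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Exact (brute-force) version of the query: pre-filter by year, rank by cosine, keep the top limit.
static List<string> TopN(
    IEnumerable<(string Title, int Year, float[] Embedding)> movies,
    float[] query, int minYear, int limit) =>
    movies
        .Where(m => m.Year >= minYear)                      // Year >= 2010 filter
        .OrderByDescending(m => Cosine(query, m.Embedding)) // most similar first
        .Take(limit)                                        // top N
        .Select(m => m.Title)
        .ToList();

// Made-up movies with 2-dimensional "embeddings" (the real ones have 1024 dimensions).
var catalog = new List<(string Title, int Year, float[] Embedding)>
{
    ("Buddy Road Trip", 2014, new[] { 0.9f, 0.1f }),
    ("Space Siege", 2018, new[] { 0.1f, 0.9f }),
    ("Old Friends", 1995, new[] { 1.0f, 0.0f }), // best match, but excluded by the year filter
    ("Summer Pals", 2011, new[] { 0.7f, 0.3f }),
};
float[] queryVector = { 1f, 0f }; // stand-in for the embedding of the search term

Console.WriteLine(string.Join(", ", TopN(catalog, queryVector, minYear: 2010, limit: 2)));
// prints: Buddy Road Trip, Summer Pals
```

Filtering before ranking is why "Old Friends" never appears even though its vector matches the query exactly.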

Relevant code (MovieService):

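// Embed the search term with the same model used for plot_embedding_1024 (1024 dimensions)
var vectorEmbeddings = await GenerateEmbeddings(term);
var vectorSearchOptions = new VectorSearchOptions<Movie>
{
    IndexName = "vector_index",  // must match the Atlas Vector Search index name
    NumberOfCandidates = 200,    // ANN candidate pool; Atlas requires it to be >= limit
    Filter = Builders<Movie>.Filter.Gte(m => m.Year, 2010) // pre-filter; year must be indexed as a filter field
};
// Runs a $vectorSearch stage and returns the limit closest movies, most similar first
return await _moviesCollection
    .Aggregate()
    .VectorSearch(m => m.PlotEmbedding1024, vectorEmbeddings, limit, vectorSearchOptions)
    .ToListAsync();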
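NumberOfCandidates is fixed at 200 above, but Atlas requires numCandidates to be at least limit (and at most 10,000), so a request with limit above 200 would be rejected unless the API caps limit elsewhere. The sketch below derives the value from limit instead; CandidatesFor is a hypothetical helper, and the 20x multiplier is a rule of thumb from the Atlas documentation, not something this project specifies.

```csharp
using System;

// Hypothetical helper: derive NumberOfCandidates from the requested limit.
// Atlas requires limit <= numCandidates <= 10000; about 20x limit is a common starting point,
// and 200 (the value used in MovieService) is kept as a floor.
static int CandidatesFor(int limit, int multiplier = 20, int max = 10_000)
{
    if (limit < 1 || limit > max)
        throw new ArgumentOutOfRangeException(nameof(limit), $"limit must be between 1 and {max}.");
    var candidates = Math.Max(limit * multiplier, Math.Max(200, limit));
    return Math.Min(max, candidates);
}

Console.WriteLine(CandidatesFor(5));    // 200
Console.WriteLine(CandidatesFor(50));   // 1000
Console.WriteLine(CandidatesFor(1000)); // 10000
```

In MovieService this would replace the constant with NumberOfCandidates = CandidatesFor(limit); a larger pool generally improves recall at the cost of latency.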

References

MongoDB Atlas Vector Search Docs
Microsoft.Extensions.AI (experimental) embeddings
Ollama model registry

This README provides a concise guide to stand up and demonstrate semantic search with MongoDB Atlas and .NET.

Repository contents

Movies.Search.Api
Movies.Search.Client
.gitignore
Movies.Search.sln
README.md

Repository: iamhitya/mongodb-vector-semantic-search-dotnet
