This project demonstrates semantic search over movie plots using MongoDB Atlas Vector Search from a minimal .NET 10 API. Text queries are converted to embeddings with a local Ollama model (`mxbai-embed-large`) and matched against precomputed plot embeddings stored in MongoDB.
- Minimal API (`Program.cs`) targeting .NET 10
- MongoDB Atlas collection `embedded_movies` based on the `sample_mflix` dataset
- Vector field `plot_embedding_1024` (1024 dimensions, cosine similarity)
- Semantic search endpoint: `GET /api/movies?term=...&limit=...`
- Embedding generation via `Microsoft.Extensions.AI` + Ollama
- Optional background service (`MovieEmbeddingsGenerator`) to bulk-generate missing embeddings
- Server-side candidate filtering (e.g. `Year >= 2010`)
- `Movies.Search.Api/Program.cs` – service registration & HTTP endpoints
- `Movies.Search.Api/Models/Movie.cs` – MongoDB document model, including the embedding field
- `Movies.Search.Api/Services/MovieService.cs` – vector search logic
- `Movies.Search.Api/Services/MovieEmbeddingsGenerator.cs` – optional background batch embedding generation
- `Movies.Search.Api/Services/IMovieService.cs` – abstraction for querying movies
- .NET 10 SDK (preview if not yet GA); verify with `dotnet --version`
- MongoDB Atlas cluster (M10 or above recommended for performance)
- `sample_mflix` dataset loaded (Atlas Sample Data, "Load Sample Dataset")
- Ollama installed locally (Linux/macOS/Windows WSL): https://ollama.com
  - Model: `mxbai-embed-large`, pulled via `ollama pull mxbai-embed-large`
- Load the Atlas Sample Data (includes `sample_mflix.movies`).
- Create a new collection `embedded_movies` and copy (or transform) documents from `sample_mflix.movies`, ensuring a `plot` field exists. You can start by inserting a subset for faster experimentation.
- Add a new field `plot_embedding_1024` (array of 1024 floats), initially empty/null.
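In the Atlas UI (Data Explorer -> Indexes -> Create Search Index):

- Index Name: `vector_index`
- Type: JSON Editor
- Definition:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding_1024",
      "numDimensions": 1024,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
}
```

The `filter` entry is needed because the search code pre-filters candidates with `Year >= 2010`, and Atlas only accepts `$vectorSearch` filters on fields indexed with type `filter` (this assumes `Movie.Year` maps to the `year` field from `sample_mflix`).

Create (or edit) `appsettings.Development.json`: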
```json
{
  "ConnectionStrings": {
    "MongoDb": "mongodb+srv://<user>:<password>@<cluster>/?retryWrites=true&w=majority"
  },
  "Ollama": {
    "Url": "http://localhost:11434"
  }
}
```

Replace the credentials accordingly.
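The API presumably reads these values through `IConfiguration`, where `GetConnectionString("MongoDb")` is shorthand for the `ConnectionStrings:MongoDb` key and the Ollama endpoint lives under `Ollama:Url`. As a quick sanity check outside the app, a minimal sketch using only `System.Text.Json` can confirm that both keys exist and that the Ollama URL is absolute. `ReadSettings` is a hypothetical helper, not part of the project.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: extract the two settings this README relies on from appsettings JSON.
static (string MongoConnection, Uri OllamaUrl) ReadSettings(string json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;

    // Same values the app would see as "ConnectionStrings:MongoDb" and "Ollama:Url".
    string mongo = root.GetProperty("ConnectionStrings").GetProperty("MongoDb").GetString()
        ?? throw new InvalidOperationException("ConnectionStrings:MongoDb is null");
    string ollama = root.GetProperty("Ollama").GetProperty("Url").GetString()
        ?? throw new InvalidOperationException("Ollama:Url is null");

    // Reject relative or malformed URLs early instead of failing on the first embedding call.
    if (!Uri.TryCreate(ollama, UriKind.Absolute, out var ollamaUri))
        throw new InvalidOperationException($"Ollama:Url is not an absolute URL: {ollama}");

    return (mongo, ollamaUri);
}

var sample = """
{
  "ConnectionStrings": { "MongoDb": "mongodb+srv://user:pass@cluster/?retryWrites=true&w=majority" },
  "Ollama": { "Url": "http://localhost:11434" }
}
""";

var (mongoConnection, ollamaUrl) = ReadSettings(sample);
Console.WriteLine($"Ollama endpoint: {ollamaUrl}");
```

In practice you would pass the contents of `appsettings.Development.json` (e.g. via `File.ReadAllText`) instead of the inline sample.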
```shell
ollama pull mxbai-embed-large
ollama run mxbai-embed-large # (Optional test)
```

Ensure the Ollama daemon is reachable at http://localhost:11434.
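To confirm the daemon returns vectors of the size the index expects, you can also call Ollama's REST endpoint `POST /api/embed` directly. The sketch below assumes that endpoint's request shape (`model`, `input`) and response shape (an `embeddings` array of arrays). `ParseFirstEmbedding` and `EmbedAsync` are hypothetical helpers, and the live call only runs when the hypothetical environment variable `OLLAMA_LIVE_CHECK=1` is set, so the parsing part works without a running daemon.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

const int ExpectedDimensions = 1024; // must match numDimensions in vector_index

// Extract the first vector from an /api/embed response body: {"embeddings": [[...], ...], ...}
static float[] ParseFirstEmbedding(string responseJson)
{
    using var doc = JsonDocument.Parse(responseJson);
    var first = doc.RootElement.GetProperty("embeddings")[0];
    return first.EnumerateArray().Select(v => v.GetSingle()).ToArray();
}

// Hypothetical helper: request one embedding from a running Ollama daemon.
static async Task<float[]> EmbedAsync(HttpClient http, string text)
{
    var response = await http.PostAsJsonAsync("/api/embed", new { model = "mxbai-embed-large", input = text });
    response.EnsureSuccessStatusCode();
    return ParseFirstEmbedding(await response.Content.ReadAsStringAsync());
}

// Offline check with a tiny hand-written body (real responses carry 1024 values).
var fake = ParseFirstEmbedding("""{"model":"mxbai-embed-large","embeddings":[[0.5,-0.25,0.125]]}""");
Console.WriteLine($"parsed {fake.Length} values");

// Live check, only when explicitly requested.
if (Environment.GetEnvironmentVariable("OLLAMA_LIVE_CHECK") == "1")
{
    using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
    var vector = await EmbedAsync(http, "heartwarming story of friendship");
    Console.WriteLine(vector.Length == ExpectedDimensions
        ? "OK: 1024-dimensional embedding"
        : $"Unexpected dimension {vector.Length}");
}
```

A dimension other than 1024 here usually means a different model is being queried than the one the index was built for.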
To populate `plot_embedding_1024`, you have two options:

Option A – Uncomment the hosted service line in `Program.cs`:

```csharp
// builder.Services.AddHostedService<MovieEmbeddingsGenerator>();
```

Change to:

```csharp
builder.Services.AddHostedService<MovieEmbeddingsGenerator>();
```

On startup it processes documents in batches (the example uses a limit of 10,000) and updates `plot_embedding_1024` for documents that are missing embeddings.
Option B – Write a one-off console utility that:

- Fetches documents with a null/empty `plot_embedding_1024`
- Sends each `plot` (or a batch of plots) to the embedding generator
- Updates the array field
(Option A is usually sufficient for demos.)
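The Option B steps can be sketched without a database. In the minimal sketch below, two in-memory dictionaries stand in for `embedded_movies`, the titles and plots are made up, and `FakeEmbed` is a hypothetical placeholder for the real embedding generator. It selects documents whose embedding is null or empty, embeds their plots in fixed-size batches, and writes each vector back; a real utility would use a MongoDB filter for the selection and per-document updates (or a bulk write) for the write-back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const int Dimensions = 1024;
const int BatchSize = 2; // tiny for the demo; real runs would use larger batches

// Stand-in for embedded_movies: id -> plot, and id -> plot_embedding_1024 (null/empty = missing).
var plots = new Dictionary<string, string>
{
    ["m1"] = "Two friends road-trip across the country.",
    ["m2"] = "A detective hunts a serial thief.",
    ["m3"] = "A robot learns to paint.",
    ["m4"] = "An astronaut is stranded on Mars.",
};
var embeddings = new Dictionary<string, float[]?>
{
    ["m1"] = null,
    ["m2"] = new float[Dimensions], // already embedded, so it is skipped
    ["m3"] = Array.Empty<float>(),
    ["m4"] = null,
};

// Hypothetical embedder: one deterministic vector per input text (replace with the real generator).
float[][] FakeEmbed(IReadOnlyList<string> texts) =>
    texts.Select(t => Enumerable.Repeat((float)t.Length, Dimensions).ToArray()).ToArray();

// Step 1: find documents with a null or empty embedding.
var missing = embeddings
    .Where(kv => kv.Value is null || kv.Value.Length == 0)
    .Select(kv => kv.Key)
    .ToList();

// Steps 2-3: embed plots batch by batch and write each vector back to its document.
int updated = 0;
foreach (var batch in missing.Chunk(BatchSize))
{
    var vectors = FakeEmbed(batch.Select(id => plots[id]).ToList());
    for (int i = 0; i < batch.Length; i++)
    {
        embeddings[batch[i]] = vectors[i];
        updated++;
    }
}

Console.WriteLine($"Updated {updated} of {embeddings.Count} documents");
```

Batching keeps the number of round trips to Ollama and MongoDB proportional to the batch count rather than the document count.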
```shell
dotnet restore
dotnet run --project Movies.Search.Api
```

By default, the API listens on an HTTPS localhost port assigned by Kestrel; the console output shows the exact URL.
Example:

```http
GET https://localhost:<port>/api/movies?term=heartwarming%20story%20of%20friendship&limit=5
```
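A client has to percent-encode the search term; `Uri.EscapeDataString` produces the `%20` form shown in the example. This minimal sketch builds the request URL; `BuildSearchUri` is a hypothetical helper and the `https://localhost:5001` base address is only an assumption, so substitute the port Kestrel reports.

```csharp
using System;

// Build /api/movies?term=...&limit=... with the term percent-encoded.
static Uri BuildSearchUri(Uri baseAddress, string term, int limit)
{
    if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit), "limit must be positive");
    var query = $"term={Uri.EscapeDataString(term)}&limit={limit}";
    return new Uri(baseAddress, $"/api/movies?{query}");
}

var uri = BuildSearchUri(new Uri("https://localhost:5001"), "heartwarming story of friendship", 5);

// AbsoluteUri keeps the escaping; Uri.ToString() would show the unescaped form.
Console.WriteLine(uri.AbsoluteUri);
```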
Response (trimmed example shape):

```json
[
  {
    "id": "...",
    "title": "...",
    "plot": "...",
    "year": 2015,
    "plot_embedding_1024": [0.0123, -0.0345, ...]
  }
]
```

If `term` is omitted, the service returns the first `limit` documents (non-semantic fallback).
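The branching between the semantic path and the plain fallback can be sketched as follows. `SearchOrFallbackAsync` is hypothetical, the two delegates stand in for the vector search and for a plain limited query in `MovieService`, and treating a whitespace-only term like a missing one is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical dispatcher mirroring the documented behavior: no term -> first `limit` documents.
static Task<List<T>> SearchOrFallbackAsync<T>(
    string? term,
    int limit,
    Func<string, int, Task<List<T>>> semanticSearch,
    Func<int, Task<List<T>>> firstN)
{
    return string.IsNullOrWhiteSpace(term)
        ? firstN(limit)                 // non-semantic fallback
        : semanticSearch(term, limit);  // embed the term, then run the vector search
}

// Stand-ins for the two query paths.
var titles = new List<string> { "Movie A", "Movie B", "Movie C" };
Task<List<string>> Semantic(string term, int limit) => Task.FromResult(new List<string> { $"match for '{term}'" });
Task<List<string>> FirstN(int limit) => Task.FromResult(titles.Take(limit).ToList());

var withTerm = await SearchOrFallbackAsync<string>("friendship", 2, Semantic, FirstN);
var withoutTerm = await SearchOrFallbackAsync<string>(null, 2, Semantic, FirstN);
Console.WriteLine(string.Join(", ", withTerm));
Console.WriteLine(string.Join(", ", withoutTerm));
```

Note that fallback results are not ranked by relevance; they are simply the first documents the query returns.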
- The input query (`term`) is converted to a 1024-dimensional embedding via `IEmbeddingGenerator`, backed by Ollama.
- A vector search aggregation is executed:
  - Field: `plot_embedding_1024`
  - Index: `vector_index`
  - Similarity metric: cosine
  - Candidate filtering: `Year >= 2010` (`Filter` in `VectorSearchOptions`)
- The top N (`limit`) nearest vectors are returned.
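Conceptually, the aggregation answers one question: among movies that pass the filter, which plot vectors have the highest cosine similarity to the query vector? Atlas answers it approximately with an ANN index by default; the brute-force sketch below computes the exact answer over an in-memory list. The 3-dimensional vectors and titles are made up for illustration in place of real 1024-dimensional data, and `Cosine` and `TopN` are hypothetical helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|); 1 = same direction, 0 = orthogonal.
static double Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same dimension");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Exact top-N search with a pre-filter, i.e. what the ANN vector search approximates.
static List<(string Title, double Score)> TopN(
    IEnumerable<(string Title, int Year, float[] Embedding)> movies,
    float[] query, int limit, int minYear) =>
    movies
        .Where(m => m.Year >= minYear)                          // candidate filter (Year >= minYear)
        .Select(m => (m.Title, Score: Cosine(m.Embedding, query)))
        .OrderByDescending(r => r.Score)                        // most similar first
        .Take(limit)
        .ToList();

var movies = new List<(string Title, int Year, float[] Embedding)>
{
    ("Friends Forever", 2015, new[] { 0.9f, 0.1f, 0.0f }),
    ("Space Heist",     2018, new[] { 0.0f, 0.2f, 0.9f }),
    ("Old Pals",        1999, new[] { 1.0f, 0.0f, 0.0f }), // best match, but removed by the year filter
    ("Buddy Trip",      2012, new[] { 0.7f, 0.7f, 0.0f }),
};
var queryVector = new[] { 1.0f, 0.0f, 0.0f };

foreach (var (title, score) in TopN(movies, queryVector, limit: 2, minYear: 2010))
    Console.WriteLine($"{title}: {score:F3}");
```

Exact search scans every filtered document, which is fine for a few thousand vectors; the index exists so that larger collections do not need a full scan.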
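Relevant code (`MovieService`):

```csharp
// Embed the query text with the same model used for the stored plot embeddings
var vectorEmbeddings = await GenerateEmbeddings(term);

var vectorSearchOptions = new VectorSearchOptions<Movie>
{
    IndexName = "vector_index",     // Atlas Vector Search index name
    NumberOfCandidates = 200,       // ANN candidates to consider; must be >= limit
    Filter = Builders<Movie>.Filter.Gte(m => m.Year, 2010) // pre-filter: Year >= 2010
};

// $vectorSearch stage: top `limit` nearest plot embeddings to the query vector
return await _moviesCollection
    .Aggregate()
    .VectorSearch(m => m.PlotEmbedding1024, vectorEmbeddings, limit, vectorSearchOptions)
    .ToListAsync();
```

- MongoDB Atlas Vector Search Docs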
- Microsoft.Extensions.AI (experimental) embeddings
- Ollama model registry
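The `MovieService` snippet hard-codes `NumberOfCandidates = 200`, while the endpoint accepts any `limit`. Atlas rejects a `$vectorSearch` whose `numCandidates` is smaller than `limit` or larger than 10,000, and its documentation suggests roughly 20 times `limit` for better recall. A minimal sketch of deriving the value from `limit` under those assumptions; `CandidatesFor` is hypothetical and the factor of 20 is a tuning knob, not a requirement.

```csharp
using System;

const int MaxCandidates = 10_000; // Atlas upper bound for numCandidates
const int CandidateFactor = 20;   // over-fetch ratio suggested for recall; tune as needed

// Derive numCandidates from limit, keeping limit <= numCandidates <= 10,000.
int CandidatesFor(int limit)
{
    if (limit <= 0 || limit > MaxCandidates)
        throw new ArgumentOutOfRangeException(nameof(limit), $"limit must be between 1 and {MaxCandidates}");
    long wanted = (long)limit * CandidateFactor;
    return (int)Math.Min(wanted, MaxCandidates);
}

foreach (var requested in new[] { 5, 200, 1_000 })
    Console.WriteLine($"limit={requested} -> NumberOfCandidates={CandidatesFor(requested)}");
```

In `MovieService` this would replace the constant with `NumberOfCandidates = CandidatesFor(limit)`.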
This README provides a concise guide to stand up and demonstrate semantic search with MongoDB Atlas and .NET.