A .NET-native full-text search engine with segment-centric indexing, memory-mapped reads, and atomic commit semantics. Targets net10.0 and net11.0. The only external dependency for the core library is NativeCompressions (LZ4 + Zstandard); everything else uses BCL types.
| Project | Description |
|---|---|
| `Rowles.LeanLucene` | Core library |
| `Rowles.LeanLucene.Tests` | xUnit test suite |
| `Rowles.LeanLucene.Benchmarks` | BenchmarkDotNet suites, compared against Lucene.NET |
| `Rowles.LeanLucene.Example.JsonApi` | ASP.NET Minimal API example |
```shell
dotnet build
dotnet test
```
```csharp
var dir = new MMapDirectory("path/to/index");
var config = new IndexWriterConfig();
using var writer = new IndexWriter(dir, config);

var doc = new LeanDocument();
doc.Add(new TextField("title", "hello world", stored: true));
doc.Add(new StringField("id", "1", stored: true));
writer.AddDocument(doc);
writer.Commit();

using var searcher = new IndexSearcher(dir);
var results = searcher.Search("hello", "title", topN: 10);
```

For near-real-time search, use `SearcherManager`, which polls for new commits and swaps the searcher with reference-counted acquire/release:
```csharp
using var mgr = new SearcherManager(dir);
var searcher = mgr.Acquire();
try { var results = searcher.Search("hello", "title", 10); }
finally { mgr.Release(searcher); }
```

The `IndexWriter` buffers documents in memory and flushes immutable segments to disk. It flushes automatically when `RamBufferSizeMB` (default 256 MB) or `MaxBufferedDocs` (default 10,000) is reached. Background segment merges run after each commit.
```csharp
var config = new IndexWriterConfig
{
    RamBufferSizeMB = 128,
    MaxBufferedDocs = 5_000,
    MaxQueuedDocs = 10_000, // backpressure; blocks AddDocument when exceeded
    CompressionPolicy = FieldCompressionPolicy.Lz4,
    StoredFieldBlockSize = 16,
    MergeThreshold = 10,
    PostingsSkipInterval = 128,
    StoreTermVectors = false,
    UseCompoundFile = false,
    IndexSort = new IndexSort("date", SortFieldType.Long, reverse: true),
    Schema = mySchema, // optional; validates fields on AddDocument
    DeletionPolicy = new KeepLastNCommitsPolicy(3),
    Metrics = new DefaultMetricsCollector(),
};
```

```csharp
// Atomic delete-then-add
writer.UpdateDocument("id", "42", replacement);

// Soft delete
writer.DeleteDocuments(new TermQuery("id", "42"));
writer.Commit();
```

Index parent/child document blocks for nested queries:

```csharp
writer.AddDocumentBlock(new[] { child1, child2, parentDoc });
```

| Type | Description |
|---|---|
| `TextField` | Tokenised text; supports analysis pipeline |
| `StringField` | Exact-match keyword; not tokenised |
| `NumericField` | `double` values; indexed in a BKD tree for range queries |
| `GeoPointField` | Lat/lon encoded as a 64-bit integer |
| `VectorField` | `float[]` for vector/KNN queries |
The default `StandardAnalyser` lowercases, removes punctuation, applies stop-word filtering, and interns tokens. Per-field analyser overrides are set on `IndexWriterConfig.FieldAnalysers`.

Built-in analysers:

- `StandardAnalyser` - configurable stop words and intern cache size
- `StemmedAnalyser` - wraps any stemmer
- `LanguageAnalyser` - language-specific pipelines

Built-in stemmers: English, French, German, Russian.

Built-in tokenisers: standard, N-gram, edge N-gram, CJK bigram.

Character filters can be added to `IndexWriterConfig.CharFilters` and run before tokenisation. Token budget enforcement is configured via `MaxTokensPerDocument` and `TokenBudgetPolicy` (`Truncate` or `Throw`).
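The analysis options above can be combined on one `IndexWriterConfig`. A minimal sketch, assuming `FieldAnalysers` is a string-keyed dictionary, the English stemmer type is named `EnglishStemmer`, and `CharFilters` is a mutable list (all assumptions; only the option names come from the docs above):

```csharp
// Sketch only: the dictionary/list shapes and the EnglishStemmer type name
// are assumptions; the option names are taken from the documentation above.
var config = new IndexWriterConfig
{
    MaxTokensPerDocument = 10_000,
    TokenBudgetPolicy = TokenBudgetPolicy.Truncate, // or TokenBudgetPolicy.Throw
};

// Per-field override: stem "body" text; other fields keep the default analyser.
config.FieldAnalysers["body"] = new StemmedAnalyser(new EnglishStemmer());

// Character filters run before tokenisation.
config.CharFilters.Add(myHtmlStripFilter); // hypothetical filter instance
```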
| Query | Notes |
|---|---|
| `TermQuery` | Single exact term |
| `BooleanQuery` | Combines clauses with `Must` / `Should` / `MustNot` |
| `PhraseQuery` | Ordered term sequence; slop supported |
| `PrefixQuery` | `term*` |
| `WildcardQuery` | `te?m*` |
| `FuzzyQuery` | Edit-distance matching |
| `RangeQuery` / `TermRangeQuery` | Numeric and string ranges |
| `RegexpQuery` | FST-backed regexp matching |
| `VectorQuery` | KNN by cosine similarity |
| `MoreLikeThisQuery` | Document similarity |
| `FunctionScoreQuery` | Custom per-doc score function |
| `DisjunctionMaxQuery` | Best-scoring clause wins |
| `ConstantScoreQuery` | Wraps any query with a fixed score |
| `RrfQuery` | Reciprocal rank fusion |
| `SpanNearQuery` / `SpanOrQuery` / `SpanNotQuery` | Span-level proximity |
| `BlockJoinQuery` | Parent/child nested document queries |
| `GeoBoundingBoxQuery` / `GeoDistanceQuery` | Geographic filtering |
Parses Lucene-style query strings:

```csharp
var parser = new QueryParser("content", new StandardAnalyser());
var query = parser.Parse("+title:lean -status:deleted \"full text\"~2 fuzzy~1 prefix* field:value^2.0");
```

Supported syntax: `field:term`, `"phrase"`, `"slop phrase"~N`, `+required`, `-excluded`, `(grouping)`, `prefix*`, `wild?card`, `fuzzy~N`, `term^boost`.
```csharp
var query = new BooleanQueryBuilder()
    .Must(new TermQuery("status", "active"))
    .Should(new TermQuery("category", "tech"))
    .MustNot(new TermQuery("deleted", "true"))
    .Build();
```

The default similarity is BM25 (`Bm25Similarity.Instance`); TF-IDF is also available. The scoring model is set on both `IndexWriterConfig.Similarity` and `IndexSearcherConfig.Similarity`. Multi-segment searches use BlockMaxWAND for early termination, and `IndexSort` enables additional early termination for sort-aligned queries.
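As a sketch, pinning the scoring model explicitly on both sides might look like this (`Bm25Similarity.Instance` is the only similarity instance named above; the object-initializer shape is an assumption):

```csharp
// Writer and searcher should agree on the scoring model. BM25 is already the
// default; it is set explicitly here only for clarity.
var writerConfig = new IndexWriterConfig { Similarity = Bm25Similarity.Instance };
var searcherConfig = new IndexSearcherConfig { Similarity = Bm25Similarity.Instance };
```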
Score explanations:

```csharp
var explanation = searcher.Explain(new TermQuery("title", "lean"), docId);
```

```csharp
var agg = new AggregationRequest("price", AggregationType.Histogram, interval: 10.0);
var result = searcher.Aggregate(query, agg);
var facets = searcher.GetFacets(query, "category", topN: 10);
```

```csharp
var suggestions = DidYouMeanSuggester.Suggest(searcher, "title", "worl", maxEdits: 2, topN: 5);
```

```csharp
var highlighter = new Highlighter(searcher, query);
string snippet = highlighter.GetBestFragment("content", storedText);
```

Deduplicate results by a field value:

```csharp
var opts = new SearchOptions { CollapseField = "thread_id", CollapseMode = CollapseMode.Max };
var results = searcher.Search(query, topN: 10, opts);
```

```csharp
var searcherConfig = new IndexSearcherConfig
{
    Metrics = new DefaultMetricsCollector(),
    SlowQueryLog = new SlowQueryLog(threshold: TimeSpan.FromMilliseconds(50)),
    SearchAnalytics = new SearchAnalytics(capacity: 1000),
};
var writerConfig = new IndexWriterConfig
{
    Metrics = new DefaultMetricsCollector(),
};
var snapshot = ((DefaultMetricsCollector)searcherConfig.Metrics).GetSnapshot();
IndexSizeReport size = searcher.GetIndexSize();
```

Point-in-time read-only views of the index, safe to hold while the writer continues indexing:
```csharp
IndexSnapshot snap = writer.AcquireSnapshot();
// ... use snap ...
writer.ReleaseSnapshot(snap);
```

```csharp
var schema = new IndexSchema();
schema.AddField("id", FieldType.String, required: true);
schema.AddField("title", FieldType.Text, required: true);
schema.AddField("price", FieldType.Numeric, required: false);
var config = new IndexWriterConfig { Schema = schema };
```

`SchemaValidationException` is thrown from `AddDocument` on violation.
| Policy | Description |
|---|---|
| `KeepLatestCommitPolicy` | Keeps only the most recent commit (default) |
| `KeepLastNCommitsPolicy` | Keeps the last N commit generations |
On construction, `IndexWriter` reads the latest `segments_N` file and loads any existing commit state. Partial or corrupt commits are skipped.
Benchmark suites compare LeanLucene against Lucene.NET across indexing, search, analysis, and more.
```powershell
# All suites, full run
.\scripts\benchmark.ps1

# Single suite, smoke test
.\scripts\benchmark.ps1 -Suite query -Strat fast

# Intense run, specific doc count
.\scripts\benchmark.ps1 -Strat intense -DocCount 20000

# List available suites
.\scripts\benchmark.ps1 -List
```

Available suites: `index`, `query`, `analysis`, `boolean`, `phrase`, `prefix`, `fuzzy`, `wildcard`, `deletion`, `smallindex`, `tokenbudget`, `diagnostics`, `suggester`, `schemajson`, `compound`, `indexsort`, `blockjoin`.
Output is written to `bench/data/<type>/<runId>/<suite>/` with JSON, Markdown, and HTML reports.
`Rowles.LeanLucene.Example.JsonApi` is an ASP.NET Minimal API that exposes collections over HTTP. Configure the data directory:

```
LEANLUCENE_DATA_PATH=/path/to/data
```
Endpoints:

- `GET /collections`
- `DELETE /collections/{name}`
- `POST /collections/{name}/documents` (body: JSON object or array)
- `DELETE /collections/{name}/documents?field=id&term=42`
- `GET /collections/{name}/search?q=hello&field=content&topN=10`

Search responses include `totalHits`, `hits` (score plus stored fields), and `suggestions` (did-you-mean per token).