2 changes: 1 addition & 1 deletion .github/workflows/unit-tests-matrix.yaml
@@ -37,7 +37,7 @@ jobs:
- name: Setup .NET
uses: actions/setup-dotnet@v3
with:
dotnet-version: 9.0.x
dotnet-version: 10.0.x

- name: Install dependencies
run: dotnet restore ${{ env.PROJECT }}
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests-ubuntu.yaml
@@ -12,5 +12,5 @@ jobs:
uses: dusrdev/actions/.github/workflows/reusable-dotnet-test-mtp.yaml@main
with:
platform: ubuntu-latest
dotnet-version: 9.0.x
dotnet-version: 10.0.x
test-project-path: ${{ matrix.project }}
6 changes: 6 additions & 0 deletions .gitignore
@@ -3,6 +3,12 @@
##
## Get latest from `dotnet new gitignore`

# Benchmark results
**/BenchmarkDotNet.Artifacts/results/*
**/BenchmarkDotNet.Artifacts/*.log
**/BenchmarkDotNet.Artifacts/*.csv
**/BenchmarkDotNet.Artifacts/*.html

# dotenv files
.env

92 changes: 92 additions & 0 deletions AGENTS.md
@@ -0,0 +1,92 @@
# AGENTS.md

This file contains repo-specific instructions for AI coding agents working in this repository.

## Critical points before answering any question or performing any task

- Never assume based on data read earlier in the conversation or on logical guesses. Always read the latest version of each file that is referenced in the tasks or conversation, unless a different version is explicitly asked for.
- Never change the source project/code to make something in the unit tests easier when it costs perf or otherwise. The source code is much more important than testing convenience. If you have suggestions on how to refactor the source to allow easier testing, always prompt the user asking if he would like them implemented; never assume that without asking.
- When answering about the capabilities, source code and otherwise of 3rd party libraries ensure to always give the most correct and up-to-date answer. If you are not sure, search the web to find what happens in the exact version used.
- When discussing APIs, make sure all of your logic aligns with whether the APIs are publicly used / public but only used internally / or internal.
- If you require source that you don't have access to, ask the user if he can provide it.
- Always start answering questions about the codebase, or tasks by reading AGENTS.md, and make sure to keep it up-to-date.
- If public APIs or usage semantics change, prompt the user and ask if he would like to update the README.md / Changelog files (if they exist).

## Repo overview (ArrowDbCore)

- Product: ArrowDb (NuGet package id: `ArrowDb`) - fast, lightweight, type-safe key-value database for .NET.
- Runtime target: `net9.0` (repo relies on .NET 9 APIs such as `ConcurrentDictionary<,>.AlternateLookup<ReadOnlySpan<char>>`).
- Goals: tiny footprint, minimal allocations, thread-safe concurrency, AOT + trimming compatibility, System.Text.Json source-gen (no reflection).

## How ArrowDb works (implementation notes)

- Data model: an in-memory `ConcurrentDictionary<string, byte[]>` (`ArrowDb.Source`).
- Keys are `string` in storage; many APIs accept `ReadOnlySpan<char>` to avoid allocations when looking up/removing keys derived from slices.
- Values are UTF-8 JSON bytes produced by `System.Text.Json` using caller-provided `JsonTypeInfo<T>` (source-generated metadata).
- Type safety: `TryGetValue<T>`/`Upsert<T>` require a `JsonTypeInfo<T>`; passing the wrong `JsonTypeInfo` for stored bytes can throw `JsonException`.
- Change tracking:
  - Any successful mutation calls `OnChangeInternal(...)`, which increments `_pendingChanges` and then invokes the `OnChange` event.
  - `PendingChanges` is a `long` counter; `SerializeAsync()` resets it to `0` only if no new changes happened during the serialization window (conditional reset to avoid losing the “needs another serialize” signal).
- Concurrency model:
  - Normal reads/writes are lock-free at the dictionary level (`ConcurrentDictionary`).
  - A per-instance `SemaphoreSlim` guards `SerializeAsync()`/`RollbackAsync()`. Writers do not take the semaphore, but they do call `WaitIfSerializing()` to avoid mutating while a serialize is actively in progress.
  - `RollbackAsync()` increments a monotonic in-memory epoch (`StateEpoch`). Mutating operations (`Upsert`, `TryRemove`, `TryClear`) detect an epoch change during the operation and return `false` to signal the mutation was not reliable relative to the rollback.
  - Multi-process safety for file serializers is implemented via a system-wide named `Mutex` in `BaseFileSerializer` (per DB path).
- Transactions:
  - `BeginTransaction()` returns `ArrowDbTransactionScope`.
  - `ArrowDbTransactionScope` increments `ArrowDb.TransactionDepth` on creation and decrements on dispose.
  - Only when the outermost scope is disposed (`TransactionDepth` reaches `0`) does it call `SerializeAsync()`; nested scopes do not serialize.
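
The conditional reset described under change tracking can be pictured with the sketch below. It is an illustration only, not ArrowDb's code: `PendingChangeCounter`, `OnChange`, and `Serialize` are hypothetical names, and the sketch assumes the reset is a compare-and-swap against the count observed before persisting.

```csharp
using System;
using System.Threading;

// Hypothetical illustration (not ArrowDb's implementation) of a counter that is
// reset to 0 only when no change arrived while the state was being persisted.
public sealed class PendingChangeCounter {
    private long _pendingChanges;

    public long PendingChanges => Interlocked.Read(ref _pendingChanges);

    // Called once per successful mutation.
    public void OnChange() => Interlocked.Increment(ref _pendingChanges);

    // Returns true when the counter was reset, false when a newer change
    // arrived during persist() and another serialize is still needed.
    public bool Serialize(Action persist) {
        long observed = Interlocked.Read(ref _pendingChanges);
        persist();
        // Swap in 0 only if the counter still equals the value seen before persisting.
        return Interlocked.CompareExchange(ref _pendingChanges, 0, observed) == observed;
    }
}
```

If a mutation lands while `persist` runs, the compare-and-swap fails and the counter keeps its non-zero value, which is the "needs another serialize" signal mentioned above.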

## Source code map (what lives where)

- `src/ArrowDbCore/ArrowDb.cs`: core state (`Source`, `Lookup`, `Serializer`, `Semaphore`), counters (`RunningInstances`, `PendingChanges`), `OnChange` event, and `BeginTransaction()`.
- `src/ArrowDbCore/ArrowDb.Factory.cs`: factory initializers (`CreateFromFile`, `CreateFromFileWithAes`, `CreateInMemory`, `CreateCustom`) + `GenerateTypedKey<T>(...)`.
- `src/ArrowDbCore/ArrowDbJsonContext.cs`: internal `JsonSerializerContext` used by file serializers to (de)serialize `ConcurrentDictionary<string, byte[]>` without reflection.
- `src/ArrowDbCore/ArrowDb.Read.cs`: read-only API (`Count`, `Keys`, `ContainsKey`, `TryGetValue<T>`).
  - Note: `TryGetValue<T>` returns `true` for value types even when the value is `default(T)`. For reference/nullable types it returns `false` when the deserialized value is `null`, preserving the “no null-check after `TryGetValue == true`” guarantee.
- `src/ArrowDbCore/ArrowDb.Upsert.cs`: `Upsert` overloads + optimistic concurrency via `updateCondition`.
  - Span-vs-string keys: `Upsert(ReadOnlySpan<char> ...)` uses `Lookup[...]`; this avoids allocating a new string when updating an existing key, but inserting a non-existing key may still allocate a new string key internally. Prefer the `string` overload when the key is already a `string`.
  - Null policy: `UpsertCore` returns `false` for `null` reference values (no-`null` design).
- `src/ArrowDbCore/ArrowDb.GetOrAdd.cs`: `GetOrAddAsync` helpers (string keys only); note the check-then-upsert is not atomic across threads (duplicate factory calls are possible under races).
- `src/ArrowDbCore/ArrowDb.Remove.cs`: `TryRemove(ReadOnlySpan<char>)`, `TryClear()`, and `Clear()` (obsolete; use `TryClear()`).
- `src/ArrowDbCore/ArrowDb.Serialization.cs`: `SerializeAsync()` and `RollbackAsync()` + the `WaitIfSerializing()` gate, conditional `PendingChanges` reset, and rollback epoch bump (`StateEpoch`).
- `src/ArrowDbCore/ArrowDbTransactionScope.cs`: transaction scope that defers serialization until disposed (supports both `IDisposable` and `IAsyncDisposable`).
- `src/ArrowDbCore/ArrowDb.IDictionaryAccessor.cs`: internal indirection used by `UpsertCore` to write via either `Source` (string keys) or `Lookup` (span keys).
- `src/ArrowDbCore/IDbSerializer.cs`: public serializer abstraction for persisting/loading the dictionary.
- `src/ArrowDbCore/Serializers/BaseFileSerializer.cs`: shared file serializer base (atomic write via `*.tmp` + `File.Move`, cross-process lock via named mutex).
- `src/ArrowDbCore/Serializers/FileSerializer.cs`: JSON file serializer (writes plain JSON).
- `src/ArrowDbCore/Serializers/AesFileSerializer.cs`: AES-encrypted JSON file serializer (wraps stream with `CryptoStream`).
- `src/ArrowDbCore/Serializers/InMemorySerializer.cs`: no-op serializer for purely in-memory databases.
- `src/ArrowDbCore/ChangeEventArgs.cs`: `ArrowDbChangeEventArgs` + `ArrowDbChangeType` used by `OnChange`.
- `src/ArrowDbCore/Extensions.cs`: internal helpers (currently used for SHA-256 hashing to derive mutex names).

## Repository layout

- `src/ArrowDbCore/`: main library (public API lives here).
- `tests/`:
  - `ArrowDbCore.Tests.Unit/`: unit tests (Microsoft Testing Platform + xUnit v3).
  - `ArrowDbCore.Tests.Unit.Isolated/`: unit tests intended to be runnable in isolation (Microsoft Testing Platform + xUnit v3).
  - `ArrowDbCore.Tests.Integrity/`: integrity tests (Microsoft Testing Platform + xUnit v3; may do heavier scenarios).
  - `ArrowDbCore.Tests.Analyzers/`: builds the library with trimming/AOT settings to catch issues early (not a test runner).
  - `ArrowDbCore.Tests.Common/`: shared test utilities.
- `benchmarks/`:
  - `ArrowDbCore.Benchmarks/`: main benchmarks (BenchmarkDotNet).
  - `ArrowDbCore.Benchmarks.VersionComparison/`: compares current code vs a referenced released package.

## Common commands (local + CI parity)

- Build: `dotnet build ArrowDbCore.slnx -c Release`
- Unit tests (CI matrix): `dotnet test tests/ArrowDbCore.Tests.Unit/ArrowDbCore.Tests.Unit.csproj -c Release` and `dotnet test tests/ArrowDbCore.Tests.Unit.Isolated/ArrowDbCore.Tests.Unit.Isolated.csproj -c Release`
- Integrity tests (CI): `dotnet test tests/ArrowDbCore.Tests.Integrity/ArrowDbCore.Tests.Integrity.csproj -c Release`
- AOT/trimming sanity build (CI): `dotnet build tests/ArrowDbCore.Tests.Analyzers/ArrowDbCore.Tests.Analyzers.csproj -c Release`
- Benchmarks: `dotnet run -c Release --project benchmarks/ArrowDbCore.Benchmarks/ArrowDbCore.Benchmarks.csproj`

## Code conventions and constraints

- Follow `.editorconfig` (notably: file-scoped namespaces; explicit types over `var`; and the repo prefers CRLF line endings).
- Avoid adding new NuGet dependencies to `src/ArrowDbCore/ArrowDbCore.csproj` unless explicitly requested (the library is intentionally dependency-free).
- Performance-first: avoid avoidable allocations; prefer `ReadOnlySpan<char>` APIs and the `Lookup` alternate lookup path; avoid introducing LINQ or other allocation-heavy patterns into hot paths.
- Preserve documented semantics (README):
  - Reference-type `null` values are rejected on `Upsert` (no-`null` policy).
  - Type safety is enforced via `JsonTypeInfo<T>`/`JsonSerializerContext`; do not introduce reflection-based serialization.
- If a change affects public API/behavior/versioning, confirm intent and then update `README.md`, `CHANGELOG.md`, and `src/ArrowDbCore/Readme.Nuget.md` as appropriate.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# Changelog (Sorted by Date in Descending Order)

## 1.6.0.0

- Improve correctness of internal change counting to ensure that changes that happened during serialization are still tracked.
- `TryGetValue` now returns `true` for value types whose stored value is `default(T)`, since that is a valid value for them.
- `Upsert` can return `false` if a `RollbackAsync` occurred concurrently, indicating the write was not reliable relative to the rollback (retry after rollback completes if needed).
- `TryRemove` and `TryClear` can return `false` if a `RollbackAsync` occurred concurrently, indicating the operation was not reliable relative to the rollback.
- `Clear` is now obsolete; use `TryClear` to detect rollback races.

## 1.5.0.0

- File-based serializers `FileSerializer` and `AesFileSerializer` now use a new base class implementation and have gained the ability to `journal` (maintain durability through crashes and other `IOException`s, and ensure either a successful atomic write or a complete rejection of changes), as well as cross-process isolation, which prevents race conditions that could occur when multiple processes try to access the same `ArrowDb` file.
30 changes: 24 additions & 6 deletions README.md
@@ -13,8 +13,8 @@
ArrowDb is a fast, lightweight, and type-safe key-value database designed for .NET.

* Super-Lightweight (dll size is ~19KB - approximately 9X smaller than [UltraLiteDb](https://github.com/rejemy/UltraLiteDB))
* Ultra-Fast (1,000,000 random operations / ~100ms on M2 MacBook Pro)
* Minimal-Allocation (~2KB for serialization of 1,000,000 items)
* Ultra-Fast (1,000,000 random operations / ~98ms on M2 MacBook Pro)
* Minimal-Allocation (constant ~520 bytes for serialization of any db size)
* Thread-Safe and Concurrent
* ACID compliant on transaction level
* Type-Safe (no reflection - compile-time enforced via source-generated `JsonSerializerContext`)
@@ -59,7 +59,7 @@ public class Person {
public partial class MyJsonContext : JsonSerializerContext {}
```

Now we can upsert (insert or update) a `Person` into the db:
Now we can upsert (insert or update, similar to "put") a `Person` into the db:

```csharp
var john = new Person { Id = 1, Name = "John", Surname = "Doe", Age = 42 };
@@ -103,7 +103,7 @@ bool db.TryGetValue<TValue>(ReadOnlySpan<char> key, JsonTypeInfo<TValue> jsonTyp

Notice that all APIs accept keys as `ReadOnlySpan<char>` to avoid unnecessary allocations. This means that if you check for a key by some slice of a string, there is no need to allocate a string just for the lookup.

Upserting (adding or updating) is done via 6 overloads:
Upserting (adding or updating, similar to "put") is done via 6 overloads:

```csharp
bool db.Upsert<TValue>(string key, TValue value, JsonTypeInfo<TValue> jsonTypeInfo);
@@ -126,7 +126,8 @@ And removal:

```csharp
bool db.TryRemove(ReadOnlySpan<char> key); // removes the entry with the specified key
void Clear(); // removes all entries from the ArrowDb instance
bool db.TryClear(); // clears all entries; returns false if a concurrent RollbackAsync occurred
void db.Clear(); // obsolete: use TryClear()
```

## Optimistic Concurrency Control
@@ -247,6 +248,10 @@ async ValueTask<TValue> GetOrAddAsync<TValue, TArg>(string key, JsonTypeInfo<TVa

If the value exists, the asynchronous factory method is not called, and the value is returned synchronously. Otherwise the factory will produce the value, `Upsert` it, then return it.

### Concurrency Note

`GetOrAddAsync` is intentionally **not atomic**. Under concurrency, `valueFactory` may be invoked multiple times for the same key, and the final stored value is last-writer-wins (because the value is persisted via `Upsert`). If you need single-invocation semantics for the factory (e.g. side-effects/expensive work), guard the call site with a keyed lock.
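
A keyed lock along those lines could look like the sketch below. `KeyedLock` and `RunAsync` are hypothetical names and not part of ArrowDb; the sketch assumes every caller for a given key goes through the same `KeyedLock` instance, and it never evicts semaphores, so it suits a bounded set of keys.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of ArrowDb): serializes callers per key so that
// a check-then-create sequence runs for one caller at a time for any given key.
public sealed class KeyedLock {
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates = new();

    public async Task<T> RunAsync<T>(string key, Func<ValueTask<T>> action) {
        // One semaphore per key; entries are never removed in this sketch.
        SemaphoreSlim gate = _gates.GetOrAdd(key, static _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync().ConfigureAwait(false);
        try {
            return await action().ConfigureAwait(false);
        } finally {
            gate.Release();
        }
    }
}
```

A call site would pass its existing `GetOrAddAsync` call as the `action` delegate. The first caller for a key runs the factory and upserts the value; later callers find the stored value once they acquire the gate, so the factory is not invoked again.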

Since `ArrowDb` was not designed specifically as a cache, it doesn't store time metadata for values. Because of this, there are no plans for a method that accepts "cache expiration" or similar options in the foreseeable future. Such scenarios need to be implemented client-side, ideally with a pattern that splits the read and the write: call `TryGetValue` and check a time reference stored inside the value; if the lookup fails or the value is out of date, generate a new value and `Upsert` it.

Similarly to `Upsert`, `GetOrAddAsync` also has an overload that accepts `TArg` and enables closure-free execution for optimal performance.
@@ -303,13 +308,26 @@ In case you want to rollback the changes, you can call the following method:
await db.RollbackAsync();
```

`RollbackAsync` will block all writing threads, until the following is complete:
`RollbackAsync` restores the last persisted state (as returned by your current serializer) by:

1. The persisted version of the db is deserialized using the `DeserializeAsync` method of the current serializer.
2. The db is cleared.
3. The db source reference is atomically replaced with the persisted version.
4. Pending changes counter is reset to 0.

### Concurrency note: `RollbackAsync` and writers

`RollbackAsync` is intended to be a rare operation. For best results, avoid running it concurrently with writers.

To keep the write path fast, ArrowDb does not take a global lock on every write. Instead, `Upsert` detects a concurrent rollback and will return `false` if a rollback happened during the operation, indicating the update was not reliable relative to the rollback.

If `Upsert` returns `false` due to a concurrent rollback, the in-memory state may or may not contain the attempted update (depending on timing). If you need the update to be applied reliably, retry the upsert after rollback completes.

The same “not reliable relative to rollback” behavior applies to other mutating operations:

- `TryRemove` returns `false` if a rollback occurred concurrently.
- `TryClear` returns `false` if a rollback occurred concurrently.
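
One way to apply the retry advice is sketched below. `RollbackRetry` and `TryMutateAsync` are hypothetical names, not ArrowDb APIs, and the short delay is only a stand-in for "wait until the rollback completes", which this sketch cannot observe. Note that `false` is not exclusive to rollbacks (for example, `Upsert` also rejects `null` reference values), so this only fits mutations that are safe to repeat.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of ArrowDb): repeats a mutation that reported
// false, pausing briefly between attempts so a concurrent rollback can finish.
public static class RollbackRetry {
    // tryMutate wraps a single mutation, e.g. a lambda calling Upsert, TryRemove or TryClear.
    public static async Task<bool> TryMutateAsync(Func<bool> tryMutate, int maxAttempts = 3, int delayMs = 10) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryMutate()) {
                return true;
            }
            if (attempt < maxAttempts) {
                // Assumed back-off; tune or replace with awaiting the rollback task if the caller owns it.
                await Task.Delay(delayMs).ConfigureAwait(false);
            }
        }
        return false; // still failing after maxAttempts; the caller decides what to do next
    }
}
```

When the code that triggers `RollbackAsync` and the writer are coordinated by the same component, awaiting the rollback before retrying is more direct than a timed back-off.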

### Transaction Scope

While the above definition explains how users can manually control the transaction by explicitly calling `SerializeAsync`, `ArrowDb` also provides a transaction scope that defers an implicit call to `SerializeAsync` until the scope is disposed. This was inspired by the way that [ZigLang](https://ziglang.org/) uses `defer` immediately after allocating memory to [ensure the memory is deallocated at the end of the scope](https://ziglang.org/documentation/master/#Choosing-an-Allocator). This helps prevent issues caused by forgetting to deallocate memory (in Zig) or, in this case, forgetting to call `SerializeAsync`.
2 changes: 1 addition & 1 deletion benchmarks/ArrowDbCore.Benchmarks.Common/JContext.cs
@@ -4,4 +4,4 @@ namespace ArrowDbCore.Benchmarks.Common;

[JsonSourceGenerationOptions(WriteIndented = false, NumberHandling = JsonNumberHandling.AllowReadingFromString, UseStringEnumConverter = true)]
[JsonSerializable(typeof(Person))]
public partial class JContext : JsonSerializerContext {}
public partial class JContext : JsonSerializerContext { }
20 changes: 10 additions & 10 deletions benchmarks/ArrowDbCore.Benchmarks.Common/Person.cs
@@ -8,14 +8,14 @@ public sealed class Person {
public string Surname { get; set; } = string.Empty;
public int Age { get; set; }

public static IEnumerable<Person> GeneratePeople(int count, Faker faker) {
for (var i = 0; i < count; i++) {
yield return new Person {
Id = i,
Name = faker.Name.FirstName(),
Surname = faker.Name.LastName(),
Age = faker.Random.Int(0, 100)
};
}
}
public static IEnumerable<Person> GeneratePeople(int count, Faker faker) {
for (var i = 0; i < count; i++) {
yield return new Person {
Id = i,
Name = faker.Name.FirstName(),
Surname = faker.Name.LastName(),
Age = faker.Random.Int(0, 100)
};
}
}
}
@@ -2,20 +2,20 @@

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net9.0</TargetFramework>
<TargetFramework>net10.0</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
</PropertyGroup>

<ItemGroup>
<PackageReference Include="BenchmarkDotNet" Version="0.15.2" />
<PackageReference Include="Bogus" Version="35.6.3" />
<PackageReference Include="BenchmarkDotNet" Version="0.15.8" />
<PackageReference Include="Bogus" Version="35.6.5" />

<PackageReference Include="NuGet.Protocol" Version="6.14.0" />
<PackageReference Include="NuGet.Versioning" Version="6.14.0" />
<PackageReference Include="NuGet.Common" Version="6.14.0" />
<PackageReference Include="NuGet.Protocol" Version="7.0.1" />
<PackageReference Include="NuGet.Versioning" Version="7.0.1" />
<PackageReference Include="NuGet.Common" Version="7.0.1" />

<PackageReference Include="ArrowDb" Version="1.4.0" />
<PackageReference Include="ArrowDb" Version="1.5.0" />
</ItemGroup>

<ItemGroup>
@@ -1,7 +1,11 @@
using System.Diagnostics;

using ArrowDbCore.Benchmarks.Common;

using BenchmarkDotNet.Attributes;

using Bogus;
using ArrowDbCore.Benchmarks.Common;

using Person = ArrowDbCore.Benchmarks.Common.Person;

namespace ArrowDbCore.Benchmarks.VersionComparison;
@@ -22,7 +26,7 @@ public void Setup() {
Random = new Randomizer(1337)
};

_items = Person.GeneratePeople(Count, faker).ToArray();
_items = Person.GeneratePeople(Count, faker).ToArray();

Trace.Assert(_items.Length == Count);

@@ -50,4 +54,4 @@ public void RandomOperations() {
}
});
}
}
}