Dagr is a high-performance, type-safe binary serialization framework for Swift. It uses a schema-first, code-generation approach to automatically create all the boilerplate code needed to build, read, and manage complex data graphs. This provides the raw performance of a custom binary format with the safety and developer-friendly ergonomics of a native Swift framework.
Swift's standard Codable protocol is fantastic for working with formats like JSON, but it can be too slow or produce overly verbose output for performance-critical applications. At the other end of the spectrum, frameworks like Protocol Buffers or FlatBuffers offer high performance but can feel foreign in a Swift project, often requiring external tooling and lacking support for complex object graphs.
Dagr was built to fill this gap, offering a "best of both worlds" solution with the following key advantages:
- Compact Binary Format: Serialized data is significantly smaller than its JSON or XML equivalent, saving disk space and network bandwidth.
- Fast Serialization: Bypasses the overhead of text-based parsing for maximum speed.
- Memory Optimization: The DSL allows for fine-grained performance tuning via frozen and sparse attributes on nodes, directly impacting their binary footprint (see the sketch after this list):
  - VTable Overhead: Regular nodes use a Variable Table (VTable) to store offsets to their fields. This adds a small overhead (1-8 bytes per field) but provides flexibility.
  - sparse nodes optimize this by only including entries for present fields, significantly reducing overhead for data with many optional/absent fields.
  - Frozen Nodes: For stable, dense data structures, frozen: true eliminates VTable overhead entirely. Fields are laid out in a fixed, sequential order, and optional fields are tracked via a compact bitset, resulting in the smallest possible binary footprint and fastest access.
- Zero Boilerplate: The code generator writes all the serialization logic for you, eliminating manual, error-prone work.
- Compile-Time Safety: Define your schema in Swift and get full autocompletion and compile-time checks. Typos and type mismatches are caught by the compiler, not at runtime.
- Seamless Integration: A native Swift Package Manager plugin automates code generation as part of your normal build process.
- Full Graph Support: Dagr is explicitly designed to handle complex object graphs, including cyclical references, which cause many other frameworks to fail.
- Rich Type System: Provides first-class support not just for structs (Node), but also for enums (Enum) and tagged unions (UnionType), including arrays of unions.
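To make the frozen/sparse distinction concrete, here is a minimal DSL sketch. The frozen: true spelling matches the examples later in this document; the sparse: true parameter name is an assumption made purely for illustration.

```swift
// Minimal sketch of the two layout attributes in the DSL.
// `frozen: true` matches the syntax used later in this document;
// `sparse: true` is an assumed spelling, shown only for illustration.
Node("Timestamp", frozen: true) {       // stable, dense layout: no VTable at all
    "seconds" ++ .u64 ++ .required
    "nanos"   ++ .u32 ++ .required
}

Node("UserProfile", sparse: true) {     // many optional fields: compact sparse VTable
    "nickname"  ++ .utf8
    "bio"       ++ .utf8
    "avatarUrl" ++ .utf8
}
```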
Dagr supports defining default values for fields and creating pre-configured instances of nodes, known as "prefabs". These features enhance the flexibility and reusability of your data models.
The defaults feature allows you to specify a default value for a field within a Node definition. If a new instance of the node is created without a value for that field, the default value is automatically used. This is useful for making fields optional from a developer's perspective, even if they are required in the schema, and for providing sensible defaults for your data structures.
Here is an example:
Node("Person") {
"name" ++ .utf8 ++ .required ++ .string("John Doe")
"active" ++ .bool ++ .bool(true)
"gender" ++ .ref("Gender") ++ .required ++ .ref("diverse")
}

In this example:

- The name field is a required string, but it has a default value of "John Doe".
- The active field is a boolean with a default value of true.
- The gender field is a reference to a Gender enum and defaults to the diverse case.
The prefabs feature allows you to define named, pre-configured, immutable instances of a Node. Think of them as templates or constant examples of your data structures. For each prefab you define, a static let constant is generated on the corresponding node type, making it easy to access and reuse.
This is useful for creating standard, reusable instances of your data structures. For example, you could have a User node with a guest prefab for an anonymous user, or a Color node with prefabs for red, green, and blue.
Here is an example:
Node("MyDate", frozen: true) {
"day" ++ .u8 ++ .required
"month" ++ .u8 ++ .required
"year" ++ .u16 ++ .required
} ++ [
"millennium": [
"day": .int(1),
"month": .int(1),
"year": .int(2000)
]
]

In this example:

- A prefab named millennium is defined for the MyDate node.
- This prefab represents the date January 1, 2000.
- The generated code will include a static property MyDate.millennium that you can use in your code.
Defaults and prefabs can be used together. You can use a prefab as the default value for a field that is a reference to another node.
For example, in a Person node, the date field's default value could be a reference to the millennium prefab from the MyDate node:
Node("Person") {
// ... other fields
"date" ++ .ref("MyDate") ++ .ref("millennium")
}

This means that if you create a Person object without specifying a date, its date property will automatically be set to the MyDate.millennium instance.
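As a rough sketch of what this looks like at the call site (the exact shape of the generated initializer is an assumption; only the MyDate.millennium constant is guaranteed by the prefab feature described above):

```swift
// Hedged sketch combining the two Person examples above.
// The generated initializer's exact parameters are assumed for illustration.
let person = Person()                         // no field values supplied
print(person.name)                            // "John Doe" (default from the schema)
print(person.date === MyDate.millennium)      // true: falls back to the prefab instance
```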
- Automated Compatibility Validation: Dagr automatically saves a fingerprint of your schema and, on subsequent builds, validates that any changes are backward-compatible. This prevents you from accidentally shipping a breaking change.
- Forward & Backward Compatibility: The framework is designed to allow new code to read old data and old code to safely ignore fields from new data.
Dagr automatically derives the byteWidth (number of bytes reserved in the binary format) for Enum and UnionType based on their current number of cases. However, users can also explicitly set a capacity when defining these types in the DSL. This capacity directly influences the byteWidth and is crucial for future expansion:
- Breaking Change: Adding new cases that cause the byteWidth to increase (e.g., going from 255 cases to 256, which requires moving from 1 byte to 2 bytes) is a breaking change. This also applies if you explicitly change the capacity value. Older code expecting a smaller byteWidth will not be able to correctly read the data.
- Workaround for Future Compatibility: To reserve space for future cases and avoid breaking changes, you can initially define a capacity value in your DSL that is larger than currently needed. This ensures that subsequent additions of real cases (up to the reserved capacity limit) will not alter the binary format.
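To make the rule above concrete, here is a purely illustrative helper (not part of Dagr's API) that mirrors the described relationship between the number of cases and the resulting byteWidth:

```swift
// Illustrative only, not Dagr API: mirrors the byteWidth rule described above.
// Once the case count (or the explicitly reserved capacity) outgrows the current
// width, the stored value gets wider and older readers can no longer decode it.
func illustrativeByteWidth(forCases count: Int) -> Int {
    switch count {
    case ...0xFF:        return 1   // up to 255 cases fit in a single byte
    case ...0xFFFF:      return 2
    case ...0xFFFF_FFFF: return 4
    default:             return 8
    }
}

print(illustrativeByteWidth(forCases: 255))  // 1
print(illustrativeByteWidth(forCases: 256))  // 2: breaking for readers expecting 1 byte
```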
Dagr is particularly well-suited for applications where the performance, type safety, and efficient handling of complex data models are paramount, and where standard serialization solutions (like JSON/Codable) might fall short.
- Games: Saving and loading game state, managing complex in-game entities (characters, inventory, world objects) and their relationships, or bundling game assets.
- Mobile Applications with Rich Offline Data: Apps that cache large amounts of structured data locally (e.g., content management apps, productivity tools, health trackers, e-commerce catalogs).
- High-Performance Server-Side Swift Applications: Building APIs, microservices, or backend systems in Swift that require high throughput and low latency for data exchange between services or with clients.
- Embedded Systems and IoT Devices (if using Swift): Data logging, configuration storage, or inter-device communication on resource-constrained hardware.
- Applications with Complex, Interconnected Data Models: Any application where data naturally forms a graph (e.g., social networks, knowledge bases, document structures with cross-references, CAD/design software data).
Here is a complete, step-by-step guide showing how to use Dagr to create a simple, high-performance persistence layer for a todo application.
First, you need to add Dagr to your project as a Swift Package Manager dependency. In your Package.swift file, add Dagr to your app's dependencies and create a new executable target for your code generation tool that depends on DagrCodeGen.
// swift-tools-version: 5.9
import PackageDescription
let package = Package(
name: "MyTodoApp",
platforms: [.macOS(.v10_15)],
products: [
.executable(name: "MyTodoApp", targets: ["MyTodoApp"]),
.executable(name: "CodeGenerator", targets: ["CodeGenerator"])
],
dependencies: [
.package(url: "https://github.com/mzaks/dagr.git", branch: "main")
],
targets: [
.executableTarget(
name: "MyTodoApp",
dependencies: [
.product(name: "Dagr", package: "dagr")
]
),
.executableTarget(
name: "CodeGenerator",
dependencies: [
.product(name: "DagrCodeGen", package: "dagr")
]
),
.testTarget(
name: "MyTodoAppTests",
dependencies: ["MyTodoApp"]
),
]
)

Next, create a main.swift file for your CodeGenerator target. This code defines the data model for our todo app and contains the logic to generate the Swift source file based on a command-line argument.
// In CodeGenerator/main.swift
import Foundation
import DagrCodeGen
func main() throws {
guard CommandLine.arguments.count > 1 else {
print("Usage: CodeGenerator <output-path>")
print("Example: swift run CodeGenerator Sources/MyTodoApp/Generated/TodoApp.swift")
return
}
let outputPath = CommandLine.arguments[1]
let outputUrl = URL(fileURLWithPath: outputPath)
print("Generating TodoApp schema at: \(outputUrl.path)")
try generate(
graph: DataGraph("TodoApp", rootType: .ref("TodoList")) {
Enum("Priority", ["low", "medium", "high"])
// A dedicated, frozen node for UUIDs for maximum efficiency.
Node("UUID", frozen: true) {
"p1" ++ .u64 ++ .required
"p2" ++ .u64 ++ .required
}
// A flexible node for our todo items.
// Not frozen, so we can add fields later.
Node("TodoItem") {
"id" ++ .ref("UUID") ++ .required
"title" ++ .utf8 ++ .required
"isCompleted" ++ .bool ++ .required
"priority" ++ .ref("Priority") ++ .required
}
Node("TodoList") {
"title" ++ .utf8 ++ .required
"items" ++ .ref("TodoItem").array
}
},
path: outputUrl
)
print("✅ Generation complete.")
}
try main()

Now, run the CodeGenerator from your terminal to create the TodoApp.swift file. Make sure the output directory exists first.
# From the root of your project
mkdir -p Sources/MyTodoApp/Generated
swift run CodeGenerator Sources/MyTodoApp/Generated/

This will create TodoApp.swift in the specified directory, containing all the necessary types and the high-level encode and decode API.
Finally, you can use the generated code in your main application. Add the generated TodoApp.swift file to your MyTodoApp target and use the clean, type-safe API.
To make handling UUIDs easier, we can add a small extension to the generated TodoApp.UUID class.
// In your main application code (e.g., MyTodoApp/main.swift)
import Foundation
// Make sure the generated `TodoApp.swift` file is part of your app's target.
// MARK: - UUID Conversion Helper
// This extension makes it easy to work with our custom UUID type.
extension TodoApp.UUID {
/// Initializes our custom UUID from a standard Foundation.UUID.
convenience init(from foundationUUID: Foundation.UUID) {
let (p1, p2) = withUnsafeBytes(of: foundationUUID.uuid) { ptr in
(
ptr.load(fromByteOffset: 0, as: UInt64.self),
ptr.load(fromByteOffset: 8, as: UInt64.self)
)
}
self.init(p1: p1, p2: p2)
}
/// Converts our custom UUID back to a standard Foundation.UUID.
func toFoundationUUID() -> Foundation.UUID {
var uuid: uuid_t = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
withUnsafeMutableBytes(of: &uuid) { ptr in
ptr.storeBytes(of: self.p1, toByteOffset: 0, as: UInt64.self)
ptr.storeBytes(of: self.p2, toByteOffset: 8, as: UInt64.self)
}
return Foundation.UUID(uuid: uuid)
}
}
// MARK: - Todo App Example
func runTodoAppExample() throws {
// 1. Create the `TodoItem` objects using our UUID helper.
let item1 = TodoApp.TodoItem(
id: .init(from: Foundation.UUID()), // Use the convenient initializer
title: "Implement Dagr serialization",
isCompleted: true,
priority: .high
)
let item2Id = Foundation.UUID()
let item2 = TodoApp.TodoItem(
id: .init(from: item2Id),
title: "Write documentation",
isCompleted: false,
priority: .medium
)
// 2. Create the main `TodoList`.
let myTodoList = TodoApp.TodoList(
title: "My Project Tasks",
items: [item1, item2]
)
// 3. SERIALIZE the list using the `encode` convenience method.
let serializedData = try TodoApp.encode(root: myTodoList)
// 4. DESERIALIZE the data back into a `TodoList` object.
let deserializedList = try TodoApp.decode(data: serializedData)
// 5. Verify the data, converting the custom UUID back for comparison.
assert(deserializedList.title == "My Project Tasks")
assert(deserializedList.items.count == 2)
assert(deserializedList.items[1].title == "Write documentation")
let deserializedId = deserializedList.items[1].id.toFoundationUUID()
assert(deserializedId == item2Id)
print("✅ Todo app data serialized and deserialized successfully!")
}
try runTodoAppExample()

To make the process even smoother, you can create an SPM Command Plugin to run the generator. This allows you to simply run swift package generate-code instead of typing the full command.
1. Update Package.swift
First, define the plugin product and target in your Package.swift.
// swift-tools-version: 5.9
import PackageDescription
let package = Package(
name: "MyTodoApp",
platforms: [.macOS(.v10_15)],
products: [
.executable(name: "MyTodoApp", targets: ["MyTodoApp"]),
.executable(name: "CodeGenerator", targets: ["CodeGenerator"]),
// Add the plugin product
.plugin(name: "CodeGenPlugin", targets: ["CodeGenPlugin"]),
],
dependencies: [
.package(url: "https://github.com/mzaks/dagr.git", branch: "main")
],
targets: [
.executableTarget(
name: "MyTodoApp",
dependencies: [
.product(name: "Dagr", package: "dagr")
]
),
.executableTarget(
name: "CodeGenerator",
dependencies: [
.product(name: "DagrCodeGen", package: "dagr")
]
),
// Add the plugin target
.plugin(
name: "CodeGenPlugin",
capability: .command(
intent: .custom(
verb: "generate-code",
description: "Generates the Dagr models for the TodoApp"
),
permissions: [
.writeToPackageDirectory(reason: "To generate TodoApp.swift")
]
),
dependencies: [
.target(name: "CodeGenerator")
]
),
.testTarget(
name: "MyTodoAppTests",
dependencies: ["MyTodoApp"]
),
]
)

2. Write the Plugin Code
Create a Plugins/CodeGenPlugin directory and add the following Plugin.swift file. This plugin will find the CodeGenerator executable and run it with the correct output path.
// In Plugins/CodeGenPlugin/Plugin.swift
import PackagePlugin
import Foundation
@main
struct CodeGenPlugin: CommandPlugin {
func performCommand(context: PluginContext, arguments: [String]) async throws {
// 1. Get the CodeGenerator tool
let tool = try context.tool(named: "CodeGenerator")
// 2. Define the output path for the generated file
let outputUrl = context.package.directoryURL
.appending(components: "Sources", "MyTodoApp", "Generated")
// 3. Create the directory if it doesn't exist
try FileManager.default.createDirectory(
at: outputUrl,
withIntermediateDirectories: true
)
// 4. Run the generator process
let proc = Process()
proc.executableURL = tool.url
proc.arguments = [outputUrl.path(percentEncoded: false)] + arguments
try proc.run()
proc.waitUntilExit()
// 5. Check for errors
if proc.terminationReason != .exit || proc.terminationStatus != 0 {
let problem = "\(proc.terminationReason):\(proc.terminationStatus)"
Diagnostics.error("Code generation failed: \(problem)")
} else {
print("✅ Plugin finished generating code.")
}
}
}

3. Run the Plugin
Now you can generate your code with a simple, memorable command:
swift package generate-code

This command will automatically find and run your CodeGenerator, which in turn creates or updates the TodoApp.swift file, fully automating your workflow.
The Dagr binary format is a custom, compact, and efficient serialization format designed for Swift object graphs. It's built from the "end backwards" by the DataBuilder and read from the "beginning forwards" by the DataReader.
The format is essentially a contiguous block of bytes. Objects and data are laid out sequentially. References between objects are handled using offsets, which can be absolute or relative.
Dagr uses a custom LEB128 (Little Endian Base 128) encoding for lengths, counts, and offsets.
- Encoding: Each byte uses 7 bits for data and the most significant bit (MSB) as a continuation flag. If the MSB is 1, more bytes follow. If it is 0, it's the last byte.
- Signed LEB (ZigZag): For signed integers (like relative offsets), a ZigZag encoding is applied before LEB128 to map negative and positive numbers efficiently to unsigned values.
- Functions: storeAsLEB, readAndSeekLEB, readAndSeekSignedLEB.
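As a self-contained illustration of the two schemes described above (these helpers are not Dagr's internal storeAsLEB/readAndSeekLEB implementations, just a sketch of the same idea):

```swift
// Unsigned LEB128: 7 data bits per byte, MSB set means "more bytes follow".
func lebEncode(_ value: UInt64) -> [UInt8] {
    var v = value
    var bytes: [UInt8] = []
    repeat {
        var byte = UInt8(v & 0x7F)   // low 7 bits
        v >>= 7
        if v != 0 { byte |= 0x80 }   // continuation flag
        bytes.append(byte)
    } while v != 0
    return bytes
}

func lebDecode(_ bytes: [UInt8]) -> UInt64 {
    var result: UInt64 = 0
    var shift: UInt64 = 0
    for byte in bytes {
        result |= UInt64(byte & 0x7F) << shift
        if byte & 0x80 == 0 { break } // MSB clear: last byte
        shift += 7
    }
    return result
}

// ZigZag maps signed values to unsigned ones so small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
func zigZag(_ value: Int64) -> UInt64 {
    UInt64(bitPattern: (value << 1) ^ (value >> 63))
}

print(lebEncode(300))           // [172, 2]
print(lebDecode([0xAC, 0x02]))  // 300
print(zigZag(-3))               // 5
```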
- Numeric Types (UInt8, Int32, Float64, etc.):
  - Stored directly as their raw byte representation.
  - Encoding: store(number: T)
  - Decoding: readNumeric<T>()
- Booleans (Bool):
  - Stored as a single UInt8 (0 for false, 1 for true).
  - Encoding: store(number: UInt8)
  - Decoding: readBool()
- Strings (String):
  - Stored as UTF8 bytes.
  - Format: [LEB128_Length] [UTF8_Bytes]
  - Encoding: store(string: String)
  - Decoding: readAndSeekSting()
- Data (Data):
  - Stored as raw bytes.
  - Format: [LEB128_Length] [Raw_Bytes]
  - Encoding: store(data: Data)
  - Decoding: readAndSeekData()
Arrays are stored with their elements laid out sequentially, often preceded by a length. Arrays with optional elements include a bitmask.
- General Array Format: [LEB128_Count] [Element_1] [Element_2] ... [Element_N]
- Arrays with Optionals: [LEB128_Count] [Element_1] ... [Element_N] [Bitmask]
  - The Bitmask is a bit-packed array of booleans indicating whether each element is present (true) or nil (false).
- Bit-Packed Arrays (Booleans, 1-bit, 2-bit, 4-bit) (see the sketch after this list):
  - Booleans are packed 8 per byte.
  - 1-bit, 2-bit, and 4-bit numbers are packed into bytes to save space.
  - Encoding: store(bools:), store(oneBitArray:), store(twoBitArray:), store(fourBitArray:)
  - Decoding: readAndSeekBoolArray(), readAndSeekSingleBitArray(), readAndSeekTwoBitArray(), readAndSeekFourBitArray()
- Arrays of References (Strings, Data, Nodes, Unions):
  - Instead of storing the actual data inline, these arrays store relative offsets to where the actual data is located elsewhere in the buffer.
  - Format: [LEB128_EncodedLengthAndWidthCode] [Relative_Offset_1] ... [Relative_Offset_N]
  - The EncodedLengthAndWidthCode combines the array count and a "width code" (0-3) indicating the byte width of each relative offset (1, 2, 4, or 8 bytes).
  - Encoding: store(strings:), store(datas:), store(structNodes:), store(unionTypes:)
  - Decoding: readAndSeekStringArray(), readAndSeekDataArray(), readAndSeekStructArray(), readAndSeekUnionTypeArray()
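A small sketch of the boolean bit-packing idea (8 flags per byte); the exact bit ordering within a byte is an assumption, and this is not Dagr's internal code:

```swift
// Illustrative bit-packing of booleans, 8 per byte, as used for bit-packed
// arrays and optionals bitmasks. Bit order within a byte is assumed.
func packBools(_ flags: [Bool]) -> [UInt8] {
    var packed = [UInt8](repeating: 0, count: (flags.count + 7) / 8)
    for (i, flag) in flags.enumerated() where flag {
        packed[i / 8] |= UInt8(1 << (i % 8))
    }
    return packed
}

func unpackBools(_ packed: [UInt8], count: Int) -> [Bool] {
    (0..<count).map { i in packed[i / 8] & UInt8(1 << (i % 8)) != 0 }
}

print(packBools([true, false, true, true]))   // [13] = 0b00001101
print(unpackBools([13], count: 4))            // [true, false, true, true]
```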
Nodes are complex objects with fields. Their layout depends on whether they are frozen or sparse.
- General Node Format: [VTable/SparseVTable] [Field_Data_1] [Field_Data_2] ...
- VTable (Virtual Table): An array of offsets to the actual field data within the node's block. This allows for flexible field ordering and optional fields.
- frozen Nodes: Optimized for fixed layouts. Their fields are laid out contiguously, potentially without a full vTable, or with a very compact one.
- sparse Nodes: Optimized for nodes with many optional/absent fields. Their vTable only stores entries for fields that are actually present, along with their index.
- References within Nodes: Fields that are references to other Node instances are stored as offsets.
  - Forward References/Cycles: Dagr uses a "late binding" mechanism (nodesForLateBinding) during serialization to back-patch pointers for forward references or cycles. During deserialization, a cache (structCache) is used to reconstruct cycles.
- Encoding: store(structNode: Node) (which calls node.apply(builder:))
- Decoding: getStructNode<T: Node>(from offset: UInt64) (which calls T.with(reader:offset:))
Enums are stored as their underlying numeric value.
- Format: [Numeric_Value]
- The size of the numeric value (1, 2, 4, or 8 bytes) depends on the enum's byteWidth (determined by its number of cases).
- Encoding: store(enum: T)
- Decoding: readAndSeekEnum<T: EnumNode>()
Unions are stored as "tagged unions," meaning they include a tag (type ID) to indicate which variant they represent.
- Format: [LEB128_TypeID] [Payload_Data]
- Type ID: A compact integer that identifies the specific case of the union.
- Payload Data: The serialized data of the associated value for that union case.
- Encoding: store(unionType: AppliedUnionType)
- Decoding: readAndSeekUnionTypeArray<T: UnionNode>() (for arrays of unions; individual unions are handled by T.from(typeId:value:reader:offset:))
- Endianness: Dagr's numeric serialization (store(number: T)) directly copies the raw bytes of Swift's native numeric types. This means the format uses the native endianness of the system on which the data was serialized (typically little-endian on modern Intel and ARM architectures).
  - Implication: For cross-platform compatibility (e.g., reading data serialized on a big-endian system on a little-endian system), explicit endianness conversion would be required (see the sketch after this list), or the format would need to standardize on a fixed endianness. Currently, it relies on the host system's endianness.
- Alignment: The UnsafeMutableRawPointer.allocate call specifies an alignment of 1, meaning no specific byte alignment is enforced for the overall buffer. Individual numeric types are copied byte-by-byte, so their internal alignment is handled by Swift's MemoryLayout.
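If data ever needs to cross architectures with different byte orders, Swift's fixed-width integers already expose explicit conversions that could be used to normalize values to a fixed byte order; the snippet below only illustrates that language feature and is not something Dagr does today:

```swift
// Illustration only: normalizing a value to little-endian wire order and back.
// Dagr itself currently relies on the host system's endianness.
let value: UInt32 = 0x1234_5678
let wireValue = value.littleEndian              // no-op on little-endian hosts
let restored  = UInt32(littleEndian: wireValue) // back to host order when reading
assert(restored == value)
```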
The DataReader is designed to be robust against malformed or truncated data, throwing specific errors rather than crashing.
- ReaderError.badOffset: Thrown when an attempt is made to read or seek beyond the bounds of the provided Data buffer. This prevents out-of-bounds reads and crashes.
- ReaderError.unfinishedLEB: Thrown if an LEB128 sequence is encountered that doesn't terminate (i.e., the MSB is always 1 but the data ends).
- ReaderError.unfittingString, unfittingData, unfittingNumericType, etc.: Thrown when a length or offset indicates data that would extend beyond the buffer, or if the data cannot be interpreted as the expected type.
- BuilderError.wentOverMaxSize: Thrown by the DataBuilder if the serialized data exceeds the predefined maxSize (defaulting to Int32.max), preventing excessive memory allocation.
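In practice this means decoding untrusted bytes can be wrapped in a do/catch. The sketch below reuses the TodoApp.decode API generated in the earlier example and assumes ReaderError is visible to client code:

```swift
import Foundation

// Hedged sketch: decoding untrusted data defensively with the generated API.
func loadTodoList(from data: Data) -> TodoApp.TodoList? {
    do {
        return try TodoApp.decode(data: data)
    } catch let error as ReaderError {
        // Covers badOffset, unfinishedLEB and the unfitting* cases described above.
        print("Malformed Dagr data: \(error)")
        return nil
    } catch {
        print("Unexpected decoding failure: \(error)")
        return nil
    }
}
```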
The Dagr binary format is built upon several core principles:
- Compactness: Achieved through variable-length encoding (LEB128), bit-packing for booleans and small integers, and storing references/offsets instead of duplicating data.
- Efficiency: Designed for fast serialization and deserialization by minimizing parsing overhead and directly manipulating memory.
- Type Safety: The format is strictly typed, and the generated Swift code ensures that data is read and written according to the defined schema, preventing type mismatches.
- Graph Support: Explicitly handles complex object graphs, including cyclical references, by tracking object identities and using a combination of forward and backward pointers.
- Schema Evolution: The format supports backward and forward compatibility through its design (e.g., allowing new fields to be added and old fields to be skipped) and is enforced by the code generator's compatibility validation.
- Swift-Native: The design aligns with Swift's memory model and type system, making it feel natural for Swift developers.
The format's design, particularly its use of LEB128 for lengths and offsets, and its vTable/sparse vTable mechanisms for nodes, provides inherent extensibility.
- Adding New Fields: New fields can be added to existing nodes without breaking compatibility with older readers (as older readers will simply ignore unknown fields).
- Adding New Types: New Node, Enum, or UnionType definitions can be introduced.
- Version Information: While not explicitly visible in the binary format itself, the GenerationProtocol JSON files (which store the schema and a timestamp) provide a mechanism for tracking schema versions outside the binary data, which is crucial for managing complex evolution scenarios.
This project is licensed under the MIT License - see the LICENSE file for details.
