Add first version of write-ahead log #376
Merged
Pull request overview
This PR implements a write-ahead log (WAL) for ModelarDB that ensures durability and crash recovery for ingested time series data. The WAL logs uncompressed data on a per-table basis using segmented IPC streaming files before data enters the storage engine. On startup, unpersisted batches are replayed from the WAL to prevent data loss. Batch IDs are tracked through the entire pipeline from ingestion to compression to persistence, and are recorded in Delta Lake commit metadata for checkpointing.
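The batch ID tracking described above can be illustrated with a minimal sketch. It is not the PR's code: `PendingData`, `encode_batch_ids`, and `decode_batch_ids` are hypothetical names, and the hand-rolled array encoding is an assumption chosen so the example needs only the standard library. How the PR actually serializes batch IDs into Delta Lake commit metadata is not shown in this overview.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for data moving through the pipeline. It carries
/// the IDs of every WAL batch that contributed to it.
#[derive(Debug, Default)]
pub struct PendingData {
    batch_ids: BTreeSet<u64>,
    point_count: usize,
}

impl PendingData {
    /// Record that `points` data points from WAL batch `batch_id` were added.
    pub fn ingest(&mut self, batch_id: u64, points: usize) {
        self.batch_ids.insert(batch_id);
        self.point_count += points;
    }

    /// Merge another buffer into this one, as a later stage might when it
    /// combines buffers. The batch ID sets are unioned so no ID is lost.
    pub fn merge(&mut self, other: PendingData) {
        self.batch_ids.extend(other.batch_ids);
        self.point_count += other.point_count;
    }

    /// Consume the buffer and return the point count together with the
    /// value that would be stored in the commit metadata (assumed format).
    pub fn commit(self) -> (usize, String) {
        (self.point_count, encode_batch_ids(&self.batch_ids))
    }
}

/// Render the IDs as a bracketed, comma-separated list, e.g. "[7,8,9]".
pub fn encode_batch_ids(ids: &BTreeSet<u64>) -> String {
    let parts: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
    format!("[{}]", parts.join(","))
}

/// Parse a value produced by `encode_batch_ids`. Returns `None` if the
/// value is not a bracketed list of unsigned integers.
pub fn decode_batch_ids(value: &str) -> Option<BTreeSet<u64>> {
    let inner = value.trim().strip_prefix('[')?.strip_suffix(']')?;
    if inner.trim().is_empty() {
        return Some(BTreeSet::new());
    }
    inner
        .split(',')
        .map(|part| part.trim().parse::<u64>().ok())
        .collect()
}
```

On startup, a recovery routine could decode the IDs from the latest commits and skip those batches during replay. Carrying a set rather than a single ID matters because one persisted commit may contain data from several logged batches.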
Changes:
- Added `WriteAheadLog` and `WriteAheadLogFile` types implementing a per-table segmented WAL with append, rotate, persist-tracking, and replay capabilities
- Threaded batch IDs through the entire data pipeline (`IngestedDataBuffer` → `UncompressedDataBuffer` → `CompressedSegmentBatch` → `CompressedDataBuffer` → Delta Lake commit metadata) and replaced the previous spilled-buffer recovery with WAL-based replay
- Extended `DeltaTableWriter` to store batch IDs in Delta Lake commit metadata, and added an `InvalidState` error variant for WAL-specific error conditions
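As a rough illustration of the append, rotate, persist-tracking, and replay capabilities listed above, here is a minimal in-memory sketch. It is not the `WriteAheadLog` implementation from this PR: `TableLog`, `LoggedBatch`, and all method names are hypothetical, segments are plain vectors instead of IPC streaming files, and dropping fully persisted closed segments is an assumption about what persist-tracking makes possible.

```rust
use std::collections::BTreeSet;

/// One logged batch: an assigned ID plus an opaque payload standing in
/// for an uncompressed record batch.
#[derive(Debug, Clone, PartialEq)]
pub struct LoggedBatch {
    pub batch_id: u64,
    pub payload: Vec<u8>,
}

/// Hypothetical in-memory stand-in for one table's segmented log.
pub struct TableLog {
    /// Closed segments followed by the active segment (always the last).
    segments: Vec<Vec<LoggedBatch>>,
    /// Rotate once the active segment holds this many batches.
    batches_per_segment: usize,
    /// IDs of batches reported as durably stored downstream.
    persisted: BTreeSet<u64>,
    next_batch_id: u64,
}

impl TableLog {
    pub fn new(batches_per_segment: usize) -> Self {
        assert!(batches_per_segment > 0, "a segment must hold at least one batch");
        Self {
            segments: vec![Vec::new()],
            batches_per_segment,
            persisted: BTreeSet::new(),
            next_batch_id: 0,
        }
    }

    /// Append a batch and return its ID, rotating to a new segment first
    /// if the active segment is already full.
    pub fn append(&mut self, payload: Vec<u8>) -> u64 {
        let batch_id = self.next_batch_id;
        self.next_batch_id += 1;
        let active_len = self.segments.last().expect("at least one segment").len();
        if active_len >= self.batches_per_segment {
            self.segments.push(Vec::new());
        }
        self.segments
            .last_mut()
            .expect("at least one segment")
            .push(LoggedBatch { batch_id, payload });
        batch_id
    }

    /// Record persisted batches and drop closed segments that no longer
    /// contain any unpersisted batch. The active segment is always kept.
    pub fn mark_persisted(&mut self, batch_ids: &[u64]) {
        self.persisted.extend(batch_ids.iter().copied());
        let active = self.segments.pop().expect("at least one segment");
        let persisted = &self.persisted;
        self.segments
            .retain(|segment| segment.iter().any(|b| !persisted.contains(&b.batch_id)));
        self.segments.push(active);
    }

    /// Batches that would need to be replayed after a crash, in log order.
    pub fn unpersisted(&self) -> Vec<&LoggedBatch> {
        self.segments
            .iter()
            .flatten()
            .filter(|b| !self.persisted.contains(&b.batch_id))
            .collect()
    }

    pub fn segment_count(&self) -> usize {
        self.segments.len()
    }
}
```

With two batches per segment, appending five batches yields three segments. Marking batches 0, 1, and 2 as persisted lets the first segment be dropped, while the second is kept because batch 3 is still unpersisted.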
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `crates/modelardb_storage/src/write_ahead_log.rs` | New WAL implementation with segmented IPC files, batch tracking, and comprehensive tests |
| `crates/modelardb_storage/src/lib.rs` | Added WAL module and `WRITE_AHEAD_LOG_FOLDER` constant |
| `crates/modelardb_storage/src/error.rs` | Added `InvalidState` error variant |
| `crates/modelardb_storage/src/data_folder/mod.rs` | Added `location()` accessor, `batch_ids` parameter to write method, commit metadata support |
| `crates/modelardb_server/src/storage/mod.rs` | Integrated WAL into `StorageEngine`, added `insert_data_points_with_batch_id` |
| `crates/modelardb_server/src/context.rs` | WAL initialization, replay on startup, table create/drop integration |
| `crates/modelardb_server/src/main.rs` | Replaced spilled buffer init with WAL replay |
| `crates/modelardb_server/src/storage/compressed_data_manager.rs` | Marks batches as persisted in WAL after saving to disk |
| `crates/modelardb_server/src/storage/compressed_data_buffer.rs` | Added batch ID tracking to compressed data buffers |
| `crates/modelardb_server/src/storage/uncompressed_data_manager.rs` | Replaced spilled buffer recovery with deletion, batch ID propagation |
| `crates/modelardb_server/src/storage/uncompressed_data_buffer.rs` | Added batch ID tracking to in-memory and on-disk buffers |
| `crates/modelardb_server/src/storage/data_transfer.rs` | Updated call sites with empty batch IDs for remote transfers |
| `crates/modelardb_embedded/src/operations/data_folder.rs` | Updated call sites with empty batch IDs |
| `crates/modelardb_server/src/configuration.rs` | Updated test setup for WAL |
| `Cargo.toml` / `Cargo.lock` | Added `serde_json` and `tracing` dependencies |
skejserjensen
requested changes
Mar 23, 2026
skejserjensen
approved these changes
Mar 24, 2026
chrthomsen
reviewed
Mar 27, 2026
chrthomsen
approved these changes
Mar 27, 2026
This PR implements a write-ahead log (WAL) in modelardb_storage that ensures durability and crash recovery for ingested time series data. The WAL logs uncompressed data on a per-table basis before it enters the storage engine, and supports replaying unpersisted batches on startup to prevent data loss. The per-table WAL files are currently segmented based on batch count. Note that this PR includes the first version of the WAL, meaning that several future features have not been implemented yet.
The features that will be implemented in future PRs include controlling the threshold for segmentation and segmenting based on batch size, making it possible to disable the WAL, handling spilled buffers, explicitly handling data transfer and truncate, integration testing with fail-rs, and general optimizations.