New C1Z Format + Storage Backends + Pebble #440
Draft: morgabra wants to merge 1 commit into `main` from `morgabra/pebble`
# Requirements Document

## Introduction

This document outlines the requirements for implementing a Pebble-based storage engine as an alternative to the current SQLite-based storage backend in the Baton SDK. The new storage engine will provide a Go-native, high-performance key-value store that eliminates cgo dependencies while maintaining full compatibility with the existing `dotc1z/engine.StorageBackend` interface.

## Requirements
### Requirement 1

**User Story:** As a Baton SDK developer, I want a Pebble-based storage engine implementation, so that I can leverage a Go-native, high-performance storage backend without cgo dependencies.

#### Acceptance Criteria

1. WHEN the Pebble storage engine is implemented THEN it SHALL implement the complete `dotc1z/engine.StorageBackend` interface
2. WHEN using the Pebble storage engine THEN it SHALL maintain full API compatibility with the existing SQLite implementation
3. WHEN the Pebble storage engine is used THEN it SHALL eliminate all cgo dependencies from the storage layer
4. WHEN operations are performed THEN the Pebble engine SHALL provide equivalent or better performance compared to SQLite
### Requirement 2

**User Story:** As a connector developer, I want seamless data model compatibility, so that existing connectors work without modification when switching storage backends.

#### Acceptance Criteria

1. WHEN storing resource types THEN the system SHALL maintain the same external_id-based identification
2. WHEN storing resources THEN the system SHALL preserve parent-child relationships and resource_type associations
3. WHEN storing entitlements THEN the system SHALL maintain resource associations and external_id uniqueness
4. WHEN storing grants THEN the system SHALL preserve all relationship mappings (resource, principal, entitlement)
5. WHEN storing assets THEN the system SHALL maintain content_type metadata and binary data integrity
6. WHEN managing sync runs THEN the system SHALL preserve all metadata (started_at, ended_at, token, type, parent)
### Requirement 3

**User Story:** As a system operator, I want efficient query performance, so that list operations and filtering work at scale with large datasets.

#### Acceptance Criteria

1. WHEN listing resources by type THEN the system SHALL use optimized key prefixes for efficient range scans
2. WHEN filtering entitlements by resource THEN the system SHALL use secondary indexes for fast lookups
3. WHEN querying grants by principal, resource, or entitlement THEN the system SHALL use appropriate secondary indexes
4. WHEN performing pagination THEN the system SHALL use key-based tokens instead of integer offsets
5. WHEN executing conditional upserts THEN the system SHALL compare discovered_at timestamps efficiently
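Criteria 1 and 4 above can be sketched together: prefix-bounded range scans, with the last returned key serving as the continuation token. This is a minimal illustration over a string slice; the key scheme (`v1|rs|{sync_id}|{resource_type}|{id}`) and helper names are assumptions for the example, not the PR's actual binary encoding.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resourcePrefix builds the shared prefix for all resources of one type in one
// sync, so a single range scan visits exactly that group of keys.
func resourcePrefix(syncID, resourceType string) string {
	return fmt.Sprintf("v1|rs|%s|%s|", syncID, resourceType)
}

// listByPrefix scans keys in sorted order, skipping keys at or before the
// continuation token, and returns up to limit keys plus the next token
// (empty when the scan is exhausted).
func listByPrefix(keys []string, prefix, after string, limit int) ([]string, string) {
	sort.Strings(keys)
	var out []string
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) || k <= after {
			continue
		}
		out = append(out, k)
		if len(out) == limit {
			return out, k // next page resumes strictly after this key
		}
	}
	return out, ""
}

func main() {
	keys := []string{
		resourcePrefix("s1", "group") + "g1",
		resourcePrefix("s1", "user") + "u1",
		resourcePrefix("s1", "user") + "u2",
		resourcePrefix("s1", "user") + "u3",
	}
	page1, tok := listByPrefix(keys, resourcePrefix("s1", "user"), "", 2)
	fmt.Println(len(page1), tok != "")
	page2, tok2 := listByPrefix(keys, resourcePrefix("s1", "user"), tok, 2)
	fmt.Println(len(page2), tok2 == "")
}
```

Because the token is itself a key, pagination stays stable even if rows are inserted or deleted between pages, which integer offsets cannot guarantee.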
### Requirement 4

**User Story:** As a sync process, I want proper sync lifecycle management, so that I can start, checkpoint, and complete syncs with proper cleanup.

#### Acceptance Criteria

1. WHEN starting a new sync THEN the system SHALL create sync metadata and assign a unique sync_id
2. WHEN checkpointing a sync THEN the system SHALL update the sync token atomically
3. WHEN ending a sync THEN the system SHALL update the ended_at timestamp and create completion indexes
4. WHEN cleaning up old syncs THEN the system SHALL preserve the latest N full syncs and remove older ones
5. WHEN performing cleanup THEN the system SHALL use efficient range deletions and trigger compaction
### Requirement 5

**User Story:** As a data consumer, I want diff and clone operations, so that I can generate incremental changes and create portable sync snapshots.

#### Acceptance Criteria

1. WHEN generating a sync diff THEN the system SHALL identify records present in the applied sync but not in the base sync
2. WHEN cloning a sync THEN the system SHALL create a consistent snapshot of all sync data
3. WHEN viewing a specific sync THEN the system SHALL isolate reads to that sync's data only
4. WHEN performing diff operations THEN the system SHALL maintain referential integrity across all entity types
### Requirement 6

**User Story:** As a system administrator, I want proper asset handling, so that binary assets are stored and retrieved efficiently with appropriate size limits.

#### Acceptance Criteria

1. WHEN storing assets THEN the system SHALL preserve content_type metadata
2. WHEN retrieving assets THEN the system SHALL return data as an io.Reader interface
3. WHEN handling large assets THEN the system SHALL define and enforce reasonable size limits
4. WHEN storing asset data THEN the system SHALL maintain data integrity and support efficient retrieval
### Requirement 7

**User Story:** As a developer, I want comprehensive testing and compatibility verification, so that I can trust the new storage engine works correctly.

#### Acceptance Criteria

1. WHEN testing compatibility THEN the system SHALL provide a "tee" mode that writes to both engines and compares results
2. WHEN running tests THEN the system SHALL include property-based tests for key encoding/decoding
3. WHEN validating functionality THEN the system SHALL include cross-engine equivalence tests for all Reader APIs
4. WHEN performing stress testing THEN the system SHALL include fuzzing and metamorphic tests for random sync sequences
### Requirement 8

**User Story:** As a system operator, I want observability and maintenance tools, so that I can monitor performance and maintain data integrity.

#### Acceptance Criteria

1. WHEN monitoring performance THEN the system SHALL expose metrics for write/read operations, compaction stats, and cache hit rates
2. WHEN debugging issues THEN the system SHALL provide logging for slow operations with query details
3. WHEN maintaining data integrity THEN the system SHALL provide tools to verify indexes against primary data
4. WHEN managing storage THEN the system SHALL provide manual compaction and vacuum capabilities
# Implementation Plan

- [x] 1. Set up core Pebble engine structure and key encoding
  - Create `pkg/dotc1z/engine/pebble/engine.go` with a PebbleEngine struct implementing the StorageEngine interface
  - Implement `pkg/dotc1z/engine/pebble/keys.go` with binary key encoding/decoding functions
  - Create comprehensive unit tests for key encoding with proper sort-order verification
  - _Requirements: 1.1, 1.2_
- [x] 2. Implement value serialization with metadata envelope
  - Create `pkg/dotc1z/engine/pebble/values.go` with a ValueEnvelope protobuf and codec functions
  - Implement serialization/deserialization for discovered_at timestamps and content_type
  - Write unit tests for value encoding roundtrips and metadata preservation
  - _Requirements: 1.2, 2.1, 2.6_
- [x] 3. Implement basic database lifecycle operations
  - Code the `NewPebbleEngine` constructor with proper Pebble database initialization
  - Implement the `Close()`, `Dirty()`, and `OutputFilepath()` methods
  - Add database validation and error handling for connection issues
  - Write unit tests for engine lifecycle and configuration
  - _Requirements: 1.1, 1.3_

- [x] 4. Implement sync lifecycle management
  - Create `pkg/dotc1z/engine/pebble/sync.go` with sync run management
  - Implement `StartSync`, `StartNewSync`, `StartNewSyncV2`, `SetCurrentSync`, `CheckpointSync`, and `EndSync`
  - Add sync metadata storage using the `v1|sr|{sync_id}` key pattern
  - Write unit tests for sync state transitions and metadata persistence
  - _Requirements: 4.1, 4.2, 4.3_
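The sync state transitions in task 4 reduce to a small state machine keyed by the `v1|sr|{sync_id}` pattern. This sketch stands an in-memory map in for Pebble; the method names mirror the interface, but the struct fields and error behavior are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// syncRun holds the sync metadata from Requirement 2.6 (a subset, for brevity).
type syncRun struct {
	ID        string
	StartedAt time.Time
	EndedAt   time.Time
	Token     string
}

// store stands in for the Pebble keyspace.
type store map[string]*syncRun

func (s store) StartSync(id string) {
	s["v1|sr|"+id] = &syncRun{ID: id, StartedAt: time.Now()}
}

// CheckpointSync updates the token; in the real engine this single-key write
// is atomic by construction.
func (s store) CheckpointSync(id, token string) error {
	run, ok := s["v1|sr|"+id]
	if !ok || !run.EndedAt.IsZero() {
		return errors.New("no active sync: " + id)
	}
	run.Token = token
	return nil
}

func (s store) EndSync(id string) error {
	run, ok := s["v1|sr|"+id]
	if !ok {
		return errors.New("unknown sync: " + id)
	}
	run.EndedAt = time.Now()
	return nil
}

func main() {
	s := store{}
	s.StartSync("abc")
	fmt.Println(s.CheckpointSync("abc", "page-2") == nil)
	fmt.Println(s.EndSync("abc") == nil)
	fmt.Println(s.CheckpointSync("abc", "page-3") != nil) // ended syncs reject checkpoints
}
```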
- [x] 5. Implement resource type storage and retrieval
  - Code `PutResourceTypes` and `PutResourceTypesIfNewer` with batch operations
  - Implement `ListResourceTypes` with pagination using key-based tokens
  - Add `GetResourceType` for point lookups using primary keys
  - Write unit tests for resource type CRUD operations and pagination
  - _Requirements: 2.1, 3.1, 3.4_

- [x] 6. Implement resource storage with parent-child relationships
  - Code `PutResources` and `PutResourcesIfNewer` with proper key structure
  - Implement `ListResources` with optional resource type filtering
  - Add `GetResource` for point lookups with composite keys
  - Write unit tests for resource operations and relationship preservation
  - _Requirements: 2.2, 3.1, 3.4_

- [x] 7. Implement entitlement storage with resource associations
  - Code `PutEntitlements` and `PutEntitlementsIfNewer` operations
  - Implement `ListEntitlements` with resource filtering support
  - Add `GetEntitlement` for point lookups
  - Write unit tests for entitlement operations and resource associations
  - _Requirements: 2.3, 3.1, 3.4_

- [x] 8. Create secondary index management system
  - Create `pkg/dotc1z/engine/pebble/indexes.go` with index key generation
  - Implement index maintenance for entitlements-by-resource relationships
  - Add index creation and cleanup during entity operations
  - Write unit tests for index consistency and lookup performance
  - _Requirements: 3.2, 3.3_
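The index key generation in task 8 can be sketched as follows. The layout (`v1|ix|er|{resource_type}|{resource_id}|{entitlement_id}` pointing at the primary key) is a hypothetical scheme for illustration; the point is that a prefix scan over the index yields one resource's entitlements without touching the full entitlement range.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entitlementsByResourceKey builds a secondary-index key whose prefix groups
// all entitlements of a single resource together in key order.
func entitlementsByResourceKey(resourceType, resourceID, entitlementID string) string {
	return fmt.Sprintf("v1|ix|er|%s|%s|%s", resourceType, resourceID, entitlementID)
}

// lookupEntitlements simulates the prefix scan: it collects the primary keys
// stored as index values for a single resource.
func lookupEntitlements(index map[string]string, resourceType, resourceID string) []string {
	prefix := fmt.Sprintf("v1|ix|er|%s|%s|", resourceType, resourceID)
	var primaries []string
	for k, primary := range index {
		if strings.HasPrefix(k, prefix) {
			primaries = append(primaries, primary)
		}
	}
	sort.Strings(primaries)
	return primaries
}

func main() {
	index := map[string]string{
		entitlementsByResourceKey("group", "g1", "e1"): "v1|en|e1",
		entitlementsByResourceKey("group", "g1", "e2"): "v1|en|e2",
		entitlementsByResourceKey("group", "g2", "e3"): "v1|en|e3",
	}
	fmt.Println(lookupEntitlements(index, "group", "g1")) // only g1's entitlements
}
```

Index maintenance means writing (and on delete, removing) these entries in the same batch as the primary record, so the index can never reference a missing row.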
- [x] 9. Implement grant storage with multiple relationship indexes
  - Code `PutGrants`, `PutGrantsIfNewer`, and `DeleteGrant` operations
  - Implement secondary indexes for grants-by-resource, grants-by-principal, and grants-by-entitlement
  - Add `ListGrants` with filtering by resource, principal, or entitlement
  - Write unit tests for grant operations and all index types
  - _Requirements: 2.4, 3.2, 3.3, 3.4_

- [x] 10. Implement asset storage with binary data handling
  - Code `PutAsset` with content type metadata preservation
  - Implement `GetAsset` returning an io.Reader interface for binary data
  - Add proper handling of large asset sizes and memory management
  - Write unit tests for asset storage, retrieval, and content type handling
  - _Requirements: 2.5, 2.6, 6.1, 6.2, 6.3_

- [x] 11. Implement conditional upsert logic for IfNewer operations
  - Add discovered_at timestamp comparison logic in all IfNewer methods
  - Implement atomic read-modify-write operations using Pebble batches
  - Ensure proper error handling and idempotency for concurrent operations
  - Write unit tests for conditional upsert behavior and timestamp comparisons
  - _Requirements: 3.5, 4.1_
- [x] 12. Implement pagination system with key-based tokens
  - Create `pkg/dotc1z/engine/pebble/pagination.go` with token encoding/decoding
  - Replace integer-based pagination with key-based continuation tokens
  - Implement stable pagination across all List operations
  - Write unit tests for pagination consistency and boundary conditions
  - Keep the pagination system extremely simple and concise (think MVP)
  - _Requirements: 3.4_
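In the MVP spirit of task 12, the token codec can be as small as base64 over the last returned key. This is an assumed scheme, not the PR's actual one: the token is the last key, encoded so it is safe to pass through JSON or gRPC string fields, and the next page starts at the first key strictly greater than it.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeToken turns the last returned key into an opaque continuation token.
func encodeToken(lastKey []byte) string {
	return base64.RawURLEncoding.EncodeToString(lastKey)
}

// decodeToken recovers the key; an empty token means start from the beginning.
func decodeToken(tok string) ([]byte, error) {
	if tok == "" {
		return nil, nil
	}
	return base64.RawURLEncoding.DecodeString(tok)
}

func main() {
	tok := encodeToken([]byte("v1|rs|s1|user|u2"))
	key, err := decodeToken(tok)
	fmt.Println(err == nil, string(key))
}
```

One design note: because the decoded token is compared as raw bytes against live keys, malformed tokens fail in `decodeToken` rather than silently skipping rows, which keeps the boundary-condition tests simple.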
- [x] 13. Implement sync cleanup and maintenance operations
  - Code the `Cleanup()` method with efficient range deletions for old syncs
  - Implement preservation logic for the latest N full syncs
  - Add manual compaction triggers after large deletions
  - Write unit tests for cleanup operations and space reclamation
  - _Requirements: 4.4, 4.5_
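The retention rule in task 13 can be sketched in isolation: keep the latest N completed full syncs and hand the rest to range deletion. Sync IDs here are sortable strings standing in for started_at ordering, which is an assumption of the example.

```go
package main

import (
	"fmt"
	"sort"
)

// syncsToDelete returns the syncs older than the newest `keep` full syncs;
// each returned ID would become one range-delete over that sync's key prefix.
// Note: it sorts the input slice in place.
func syncsToDelete(fullSyncs []string, keep int) []string {
	sort.Sort(sort.Reverse(sort.StringSlice(fullSyncs))) // newest first
	if len(fullSyncs) <= keep {
		return nil
	}
	return fullSyncs[keep:]
}

func main() {
	old := syncsToDelete([]string{"s1", "s2", "s3", "s4"}, 2)
	fmt.Println(old) // the two oldest syncs are eligible for deletion
}
```

Deleting a whole sync as one key-range (rather than row by row) is what makes the cleanup cheap, and triggering compaction afterwards is what actually reclaims the space on disk.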
- [ ] 14. Implement diff and clone operations
  - Code `GenerateSyncDiff` with set-difference logic between sync ranges
  - Implement `CloneSync` with consistent snapshot creation
  - Add `ViewSync` for isolating reads to specific sync data
  - Write unit tests for diff generation accuracy and clone consistency
  - _Requirements: 5.1, 5.2, 5.3, 5.4_

- [ ] 15. Implement remaining StorageEngine interface methods
  - Code the `Stats()` method for resource counting across entity types
  - Implement `ListSyncRuns` with proper ordering and pagination
  - Add `ListGrantsForPrincipal` with principal-based filtering
  - Write unit tests for all remaining interface methods
  - _Requirements: 1.1, 1.2_

- [ ] 17. Implement performance optimizations and monitoring
  - Add configurable batch sizes and sync policies for write operations
  - Implement iterator pooling and proper resource cleanup
  - Add performance metrics collection for operations and storage
  - Write benchmark tests comparing performance against the SQLite implementation
  - _Requirements: 1.4, 8.1, 8.2_

- [ ] 18. Create comprehensive integration test suite
  - Write cross-engine compatibility tests using identical test data
  - Implement property-based tests for key ordering and pagination consistency
  - Add stress tests for large datasets and concurrent operations
  - Create fuzzing tests for random sync sequences and edge cases
  - _Requirements: 7.1, 7.2, 7.3, 7.4_

- [ ] 19. Add observability and maintenance tooling
  - Implement metrics collection for write/read operations and compaction stats
  - Add slow-operation logging with configurable thresholds
  - Create integrity-checking tools for index validation
  - Write unit tests for monitoring and maintenance functionality
  - _Requirements: 8.1, 8.2, 8.3, 8.4_
**Review comment:** Pebble dependency looks good; please verify the latest patch and vulnerability status.

Nice addition. Before landing, sanity-check latest patch availability and known advisories for Pebble and its CockroachDB companions.
**Action Required: Confirm Dependency and Go Runtime Updates**

The Pebble module (v1.1.5) is already at its latest patch, but its CockroachDB companions have newer patch releases, and there are several high-severity CVEs in the Go standard library affecting this codebase. Those CVEs are fixed in Go 1.24.4 and 1.24.6 respectively; please upgrade the Go toolchain in CI and local environments and re-run `govulncheck`.

Next steps:

- Update the `go.mod` entries for `github.com/cockroachdb/errors` and `github.com/cockroachdb/redact`.
- Run `go mod tidy` and `go run golang.org/x/vuln/cmd/govulncheck@latest ./...` to confirm no remaining advisories.