New C1Z Format + Storage Backends + Pebble #440

coderabbitai · 2025-08-22T21:28:18Z

💡 Verification agent

🧩 Analysis chain

Pebble dependency looks good; please verify latest patch and vuln status

Nice addition. Before landing, sanity-check latest patch availability and known advisories for Pebble and its CockroachDB companions.

Run:

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail

    # Show available updates for key modules
    go list -m -u github.com/cockroachdb/pebble
    go list -m -u github.com/cockroachdb/errors
    go list -m -u github.com/cockroachdb/redact

    # Tidy to confirm module graph is consistent
    go mod tidy -v

    # Vulnerability scan (includes tests)
    go run golang.org/x/vuln/cmd/govulncheck@latest ./...

Length of output: 175

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail

    # Show available updates for key CockroachDB modules, bypassing the vendor directory
    go list -m -u -mod=mod github.com/cockroachdb/pebble
    go list -m -u -mod=mod github.com/cockroachdb/errors
    go list -m -u -mod=mod github.com/cockroachdb/redact

    # Tidy module graph to confirm consistency
    go mod tidy -mod=mod -v

    # Run vulnerability scan
    go run -mod=mod golang.org/x/vuln/cmd/govulncheck@latest ./...

Length of output: 487

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail

    # Ensure modules are enabled and bypass the vendor directory
    export GO111MODULE=on

    # Check for available updates for CockroachDB modules
    go list -m -u github.com/cockroachdb/pebble
    go list -m -u github.com/cockroachdb/errors
    go list -m -u github.com/cockroachdb/redact

    # Tidy the module graph to confirm consistency
    go mod tidy -v

    # Run vulnerability scan across the repo
    go run golang.org/x/vuln/cmd/govulncheck@latest ./...

Length of output: 216

🏁 Script executed:

    #!/bin/bash
    set -euo pipefail

    # Enable modules and bypass vendor directory for all commands
    export GO111MODULE=on
    export GOFLAGS="-mod=mod"

    echo "=== Checking for module updates ==="
    go list -m -u github.com/cockroachdb/pebble
    go list -m -u github.com/cockroachdb/errors
    go list -m -u github.com/cockroachdb/redact

    echo "=== Tidying module graph ==="
    go mod tidy -v

    echo "=== Running vulnerability scan ==="
    go run golang.org/x/vuln/cmd/govulncheck@latest ./...

Length of output: 15186

Action Required: Confirm Dependency and Go Runtime Updates

The Pebble module (v1.1.5) is already at its latest patch, but its CockroachDB companions have newer patch releases and there are several high-severity CVEs in the Go standard library affecting this codebase. Please address the following before landing:

Pebble remains at v1.1.5 (no upgrade available)

CockroachDB/errors: v1.11.3 → v1.12.0 (upgrade recommended)

CockroachDB/redact: v1.1.5 → v1.1.6 (upgrade recommended)

Standard Library CVEs detected by govulncheck (all present in go1.24.1):

GO-2025-3849 (database/sql)

GO-2025-3751 (net/http)

GO-2025-3750 (os, syscall)

GO-2025-3749 (crypto/x509)

GO-2025-3563 (net/http/internal)

All of these are fixed by Go 1.24.6 (several were already fixed in earlier 1.24.x patch releases); please upgrade your Go toolchain in CI and local environments and re-run govulncheck.

Next steps:

Bump the go.mod entries for github.com/cockroachdb/errors and github.com/cockroachdb/redact.

Ensure CI and developer setups use Go ≥ 1.24.6.

Re-run go mod tidy and go run golang.org/x/vuln/cmd/govulncheck@latest ./... to confirm no remaining advisories.

🤖 Prompt for AI Agents

In go.mod around line 16, the Pebble entry is fine but the CockroachDB companion modules need version bumps and the Go runtime must be updated: change github.com/cockroachdb/errors to v1.12.0 and github.com/cockroachdb/redact to v1.1.6 in go.mod, update CI and developer toolchain to use Go >= 1.24.6, then run go mod tidy and re-run govulncheck (golang.org/x/vuln/cmd/govulncheck@latest ./...) to verify the standard-library CVEs are resolved; if any indirect modules require adjustments, run go get to pin versions before tidying and commit the updated go.mod and go.sum.
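The toolchain requirement above (Go ≥ 1.24.6) could also be asserted by a small pre-flight check. The following is a minimal sketch and not part of this PR: `parseGoVersion` and `atLeast` are hypothetical helper names, and only plain release strings such as `go1.24.6` are handled (development or pre-release builds are reported as errors).

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// parseGoVersion turns a release string such as "go1.24.6" into
// [major, minor, patch]. A missing patch component counts as 0.
// Hypothetical helper: anything that is not a plain release is rejected.
func parseGoVersion(s string) ([3]int, error) {
	var out [3]int
	if !strings.HasPrefix(s, "go") {
		return out, fmt.Errorf("not a release version: %q", s)
	}
	parts := strings.Split(strings.TrimPrefix(s, "go"), ".")
	if len(parts) < 2 || len(parts) > 3 {
		return out, fmt.Errorf("unexpected version shape: %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad component %q in %q", p, s)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v >= minimum, comparing major, minor, patch in order.
func atLeast(v, minimum [3]int) bool {
	for i := 0; i < 3; i++ {
		if v[i] != minimum[i] {
			return v[i] > minimum[i]
		}
	}
	return true
}

func main() {
	minimum := [3]int{1, 24, 6} // minimum named in the review comment
	v, err := parseGoVersion(runtime.Version())
	if err != nil {
		fmt.Println("cannot check toolchain:", err)
		return
	}
	fmt.Printf("toolchain %v meets minimum %v: %t\n", v, minimum, atLeast(v, minimum))
}
```

On Go 1.22 and later, the standard library's `go/version` package provides `Compare` for the same comparison; the hand-rolled parser here only keeps the sketch free of version assumptions.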

@@ -0,0 +1,99 @@
+    # Requirements Document
+    ## Introduction
+    This document outlines the requirements for implementing a Pebble-based storage engine as an alternative to the current SQLite-based storage backend in the Baton SDK. The new storage engine will provide a Go-native, high-performance key-value store that eliminates cgo dependencies while maintaining full compatibility with the existing dotc1z/engine.StorageBackend interface.
+    ## Requirements
+    ### Requirement 1
+    **User Story:** As a Baton SDK developer, I want a Pebble-based storage engine implementation, so that I can leverage a Go-native, high-performance storage backend without cgo dependencies.
+    #### Acceptance Criteria
+    1. WHEN the Pebble storage engine is implemented THEN it SHALL implement the complete dotc1z/engine.StorageBackend interface
+    2. WHEN using the Pebble storage engine THEN it SHALL maintain full API compatibility with the existing SQLite implementation
+    3. WHEN the Pebble storage engine is used THEN it SHALL eliminate all cgo dependencies from the storage layer
+    4. WHEN operations are performed THEN the Pebble engine SHALL provide equivalent or better performance compared to SQLite
+    ### Requirement 2
+    **User Story:** As a connector developer, I want seamless data model compatibility, so that existing connectors work without modification when switching storage backends.
+    #### Acceptance Criteria
+    1. WHEN storing resource types THEN the system SHALL maintain the same external_id-based identification
+    2. WHEN storing resources THEN the system SHALL preserve parent-child relationships and resource_type associations
+    3. WHEN storing entitlements THEN the system SHALL maintain resource associations and external_id uniqueness
+    4. WHEN storing grants THEN the system SHALL preserve all relationship mappings (resource, principal, entitlement)
+    5. WHEN storing assets THEN the system SHALL maintain content_type metadata and binary data integrity
+    6. WHEN managing sync runs THEN the system SHALL preserve all metadata (started_at, ended_at, token, type, parent)
+    ### Requirement 3
+    **User Story:** As a system operator, I want efficient query performance, so that list operations and filtering work at scale with large datasets.
+    #### Acceptance Criteria
+    1. WHEN listing resources by type THEN the system SHALL use optimized key prefixes for efficient range scans
+    2. WHEN filtering entitlements by resource THEN the system SHALL use secondary indexes for fast lookups
+    3. WHEN querying grants by principal, resource, or entitlement THEN the system SHALL use appropriate secondary indexes
+    4. WHEN performing pagination THEN the system SHALL use key-based tokens instead of integer offsets
+    5. WHEN executing conditional upserts THEN the system SHALL compare discovered_at timestamps efficiently
+    ### Requirement 4
+    **User Story:** As a sync process, I want proper sync lifecycle management, so that I can start, checkpoint, and complete syncs with proper cleanup.
+    #### Acceptance Criteria
+    1. WHEN starting a new sync THEN the system SHALL create sync metadata and assign a unique sync_id
+    2. WHEN checkpointing a sync THEN the system SHALL update the sync token atomically
+    3. WHEN ending a sync THEN the system SHALL update ended_at timestamp and create completion indexes
+    4. WHEN cleaning up old syncs THEN the system SHALL preserve the latest N full syncs and remove older ones
+    5. WHEN performing cleanup THEN the system SHALL use efficient range deletions and trigger compaction
+    ### Requirement 5
+    **User Story:** As a data consumer, I want diff and clone operations, so that I can generate incremental changes and create portable sync snapshots.
+    #### Acceptance Criteria
+    1. WHEN generating a sync diff THEN the system SHALL identify records present in applied sync but not in base sync
+    2. WHEN cloning a sync THEN the system SHALL create a consistent snapshot of all sync data
+    3. WHEN viewing a specific sync THEN the system SHALL isolate reads to that sync's data only
+    4. WHEN performing diff operations THEN the system SHALL maintain referential integrity across all entity types
+    ### Requirement 6
+    **User Story:** As a system administrator, I want proper asset handling, so that binary assets are stored and retrieved efficiently with appropriate size limits.
+    #### Acceptance Criteria
+    1. WHEN storing assets THEN the system SHALL preserve content_type metadata
+    2. WHEN retrieving assets THEN the system SHALL return data as an io.Reader interface
+    3. WHEN handling large assets THEN the system SHALL define and enforce reasonable size limits
+    4. WHEN storing asset data THEN the system SHALL maintain data integrity and support efficient retrieval
+    ### Requirement 7
+    **User Story:** As a developer, I want comprehensive testing and compatibility verification, so that I can trust the new storage engine works correctly.
+    #### Acceptance Criteria
+    1. WHEN testing compatibility THEN the system SHALL provide a "tee" mode that writes to both engines and compares results
+    2. WHEN running tests THEN the system SHALL include property-based tests for key encoding/decoding
+    3. WHEN validating functionality THEN the system SHALL include cross-engine equivalence tests for all Reader APIs
+    4. WHEN performing stress testing THEN the system SHALL include fuzzing and metamorphic tests for random sync sequences
+    ### Requirement 8
+    **User Story:** As a system operator, I want observability and maintenance tools, so that I can monitor performance and maintain data integrity.
+    #### Acceptance Criteria
+    1. WHEN monitoring performance THEN the system SHALL expose metrics for write/read operations, compaction stats, and cache hit rates
+    2. WHEN debugging issues THEN the system SHALL provide logging for slow operations with query details
+    3. WHEN maintaining data integrity THEN the system SHALL provide tools to verify indexes against primary data
+    4. WHEN managing storage THEN the system SHALL provide manual compaction and vacuum capabilities
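
Requirement 3 asks for prefix-bounded range scans and key-based pagination tokens. The sketch below illustrates both ideas over a sorted in-memory key slice rather than Pebble itself. The key layout (`v1|rs|{sync}|{type}|{id}`) and the names `prefixUpperBound` and `listPage` are assumptions made for illustration and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixUpperBound returns the smallest key that sorts after every key
// starting with prefix, or "" when no such bound exists (empty or all-0xff prefix).
func prefixUpperBound(prefix string) string {
	b := []byte(prefix)
	for i := len(b) - 1; i >= 0; i-- {
		if b[i] != 0xff {
			b[i]++
			return string(b[:i+1])
		}
	}
	return ""
}

// listPage returns up to limit keys under prefix from a sorted key slice.
// token is the last key of the previous page ("" for the first page); the
// returned token is "" once the prefix range is exhausted.
func listPage(sorted []string, prefix, token string, limit int) ([]string, string) {
	start := sort.SearchStrings(sorted, prefix)
	if token != "" {
		// Resume strictly after the last key already returned, but never
		// before the start of the prefix range.
		after := sort.Search(len(sorted), func(i int) bool { return sorted[i] > token })
		if after > start {
			start = after
		}
	}
	end := len(sorted)
	if ub := prefixUpperBound(prefix); ub != "" {
		end = sort.SearchStrings(sorted, ub)
	}
	var page []string
	for i := start; i < end && len(page) < limit; i++ {
		page = append(page, sorted[i])
	}
	next := ""
	if len(page) > 0 && start+len(page) < end {
		next = page[len(page)-1]
	}
	return page, next
}

func main() {
	keys := []string{
		"v1|rs|sync1|group|eng",
		"v1|rs|sync1|user|alice",
		"v1|rs|sync1|user|bob",
		"v1|rs|sync1|user|carol",
		"v1|rs|sync2|user|dave",
	}
	sort.Strings(keys)

	// Walk all "user" resources of sync1, two keys per page.
	prefix, token := "v1|rs|sync1|user|", ""
	for {
		page, next := listPage(keys, prefix, token, 2)
		fmt.Println(strings.Join(page, ", "))
		if next == "" {
			break
		}
		token = next
	}
}
```

Because the token is a key rather than an offset, a page boundary stays stable even if earlier keys are inserted or deleted between calls, which is the usual motivation for replacing integer offsets.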

@@ -0,0 +1,126 @@
+    # Implementation Plan
+    - [x] 1. Set up core Pebble engine structure and key encoding
+      - Create `pkg/dotc1z/engine/pebble/engine.go` with PebbleEngine struct implementing StorageEngine interface
+      - Implement `pkg/dotc1z/engine/pebble/keys.go` with binary key encoding/decoding functions
+      - Create comprehensive unit tests for key encoding with proper sort order verification
+      - _Requirements: 1.1, 1.2_
+    - [x] 2. Implement value serialization with metadata envelope
+      - Create `pkg/dotc1z/engine/pebble/values.go` with ValueEnvelope protobuf and codec functions
+      - Implement serialization/deserialization for discovered_at timestamps and content_type
+      - Write unit tests for value encoding roundtrips and metadata preservation
+      - _Requirements: 1.2, 2.1, 2.6_
+    - [x] 3. Implement basic database lifecycle operations
+      - Code `NewPebbleEngine` constructor with proper Pebble database initialization
+      - Implement `Close()`, `Dirty()`, and `OutputFilepath()` methods
+      - Add database validation and error handling for connection issues
+      - Write unit tests for engine lifecycle and configuration
+      - _Requirements: 1.1, 1.3_
+    - [x] 4. Implement sync lifecycle management
+      - Create `pkg/dotc1z/engine/pebble/sync.go` with sync run management
+      - Implement `StartSync`, `StartNewSync`, `StartNewSyncV2`, `SetCurrentSync`, `CheckpointSync`, `EndSync`
+      - Add sync metadata storage using `v1|sr|{sync_id}` key pattern
+      - Write unit tests for sync state transitions and metadata persistence
+      - _Requirements: 4.1, 4.2, 4.3_
+    - [x] 5. Implement resource type storage and retrieval
+      - Code `PutResourceTypes` and `PutResourceTypesIfNewer` with batch operations
+      - Implement `ListResourceTypes` with pagination using key-based tokens
+      - Add `GetResourceType` for point lookups using primary keys
+      - Write unit tests for resource type CRUD operations and pagination
+      - _Requirements: 2.1, 3.1, 3.4_
+    - [x] 6. Implement resource storage with parent-child relationships
+      - Code `PutResources` and `PutResourcesIfNewer` with proper key structure
+      - Implement `ListResources` with optional resource type filtering
+      - Add `GetResource` for point lookups with composite keys
+      - Write unit tests for resource operations and relationship preservation
+      - _Requirements: 2.2, 3.1, 3.4_
+    - [x] 7. Implement entitlement storage with resource associations
+      - Code `PutEntitlements` and `PutEntitlementsIfNewer` operations
+      - Implement `ListEntitlements` with resource filtering support
+      - Add `GetEntitlement` for point lookups
+      - Write unit tests for entitlement operations and resource associations
+      - _Requirements: 2.3, 3.1, 3.4_
+    - [x] 8. Create secondary index management system
+      - Create `pkg/dotc1z/engine/pebble/indexes.go` with index key generation
+      - Implement index maintenance for entitlements-by-resource relationships
+      - Add index creation and cleanup during entity operations
+      - Write unit tests for index consistency and lookup performance
+      - _Requirements: 3.2, 3.3_
+    - [x] 9. Implement grant storage with multiple relationship indexes
+      - Code `PutGrants`, `PutGrantsIfNewer`, and `DeleteGrant` operations
+      - Implement secondary indexes for grants-by-resource, grants-by-principal, grants-by-entitlement
+      - Add `ListGrants` with filtering by resource, principal, or entitlement
+      - Write unit tests for grant operations and all index types
+      - _Requirements: 2.4, 3.2, 3.3, 3.4_
+    - [x] 10. Implement asset storage with binary data handling
+      - Code `PutAsset` with content type metadata preservation
+      - Implement `GetAsset` returning io.Reader interface for binary data
+      - Add proper handling of large asset sizes and memory management
+      - Write unit tests for asset storage, retrieval, and content type handling
+      - _Requirements: 2.5, 2.6, 6.1, 6.2, 6.3_
+    - [x] 11. Implement conditional upsert logic for IfNewer operations
+      - Add discovered_at timestamp comparison logic in all IfNewer methods
+      - Implement atomic read-modify-write operations using Pebble batches
+      - Ensure proper error handling and idempotency for concurrent operations
+      - Write unit tests for conditional upsert behavior and timestamp comparisons
+      - _Requirements: 3.5, 4.1_
+    - [x] 12. Implement pagination system with key-based tokens
+      - Create `pkg/dotc1z/engine/pebble/pagination.go` with token encoding/decoding
+      - Replace integer-based pagination with key-based continuation tokens
+      - Implement stable pagination across all List operations
+      - Write unit tests for pagination consistency and boundary conditions
+      - Keep the pagination system extremely simple and concise, think MVP
+      - _Requirements: 3.4_
+    - [x] 13. Implement sync cleanup and maintenance operations
+      - Code `Cleanup()` method with efficient range deletions for old syncs
+      - Implement preservation logic for latest N full syncs
+      - Add manual compaction triggers after large deletions
+      - Write unit tests for cleanup operations and space reclamation
+      - _Requirements: 4.4, 4.5_
+    - [ ] 14. Implement diff and clone operations
+      - Code `GenerateSyncDiff` with set-difference logic between sync ranges
+      - Implement `CloneSync` with consistent snapshot creation
+      - Add `ViewSync` for isolating reads to specific sync data
+      - Write unit tests for diff generation accuracy and clone consistency
+      - _Requirements: 5.1, 5.2, 5.3, 5.4_
+    - [ ] 15. Implement remaining StorageEngine interface methods
+      - Code `Stats()` method for resource counting across entity types
+      - Implement `ListSyncRuns` with proper ordering and pagination
+      - Add `ListGrantsForPrincipal` with principal-based filtering
+      - Write unit tests for all remaining interface methods
+      - _Requirements: 1.1, 1.2_
+    - [ ] 17. Implement performance optimizations and monitoring
+      - Add configurable batch sizes and sync policies for write operations
+      - Implement iterator pooling and proper resource cleanup
+      - Add performance metrics collection for operations and storage
+      - Write benchmark tests comparing performance against SQLite implementation
+      - _Requirements: 1.4, 8.1, 8.2_
+    - [ ] 18. Create comprehensive integration test suite
+      - Write cross-engine compatibility tests using identical test data
+      - Implement property-based tests for key ordering and pagination consistency
+      - Add stress tests for large datasets and concurrent operations
+      - Create fuzzing tests for random sync sequences and edge cases
+      - _Requirements: 7.1, 7.2, 7.3, 7.4_
+    - [ ] 19. Add observability and maintenance tooling
+      - Implement metrics collection for write/read operations and compaction stats
+      - Add slow operation logging with configurable thresholds
+      - Create integrity checking tools for index validation
+      - Write unit tests for monitoring and maintenance functionality
+      - _Requirements: 8.1, 8.2, 8.3, 8.4_

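Task 14 describes `GenerateSyncDiff` as set-difference logic between sync ranges. One way to picture that, assuming each sync's keys can be read in sorted order with the sync prefix stripped, is a two-pointer merge. The function name `diffSorted` and the sample keys are hypothetical, and the sketch assumes keys are unique within a sync.

```go
package main

import "fmt"

// diffSorted returns the keys present in applied but absent from base.
// Both inputs must be sorted ascending; the result is sorted as well.
func diffSorted(base, applied []string) []string {
	var out []string
	i := 0
	for _, k := range applied {
		// Advance the base cursor past keys that sort before k.
		for i < len(base) && base[i] < k {
			i++
		}
		if i < len(base) && base[i] == k {
			continue // present in both syncs, not part of the diff
		}
		out = append(out, k)
	}
	return out
}

func main() {
	base := []string{"grant|1", "grant|3", "grant|4"}
	applied := []string{"grant|1", "grant|2", "grant|4", "grant|5"}
	fmt.Println(diffSorted(base, applied))
}
```

The merge touches each key once and needs no extra set in memory, which matches how two iterators over adjacent key ranges would be consumed.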
@@ -13,6 +13,7 @@ require (
     	github.com/aws/aws-sdk-go-v2/service/s3 v1.75.0
     	github.com/aws/aws-sdk-go-v2/service/sts v1.33.10
     	github.com/aws/smithy-go v1.22.2
+    	github.com/cockroachdb/pebble v1.1.5
     	github.com/conductorone/dpop v0.2.3
     	github.com/conductorone/dpop/integrations/dpop_grpc v0.2.3
     	github.com/conductorone/dpop/integrations/dpop_oauth2 v0.2.3
@@ -63,6 +64,7 @@ require (
     require (
     	filippo.io/edwards25519 v1.1.0 // indirect
+    	github.com/DataDog/zstd v1.4.5 // indirect
     	github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.10 // indirect
     	github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.25 // indirect
     	github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.34 // indirect
@@ -76,30 +78,49 @@ require (
     	github.com/aws/aws-sdk-go-v2/service/sso v1.24.12 // indirect
     	github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.11 // indirect
     	github.com/benbjohnson/clock v1.3.5 // indirect
+    	github.com/beorn7/perks v1.0.1 // indirect
     	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
+    	github.com/cespare/xxhash/v2 v2.3.0 // indirect
+    	github.com/cockroachdb/errors v1.11.3 // indirect
+    	github.com/cockroachdb/fifo v0.0.0-20240606204812-0bbfbd93a7ce // indirect
+    	github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b // indirect
+    	github.com/cockroachdb/redact v1.1.5 // indirect
+    	github.com/cockroachdb/tokenbucket v0.0.0-20230807174530-cc333fc44b06 // indirect
     	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
     	github.com/dolthub/maphash v0.1.0 // indirect
     	github.com/dustin/go-humanize v1.0.1 // indirect
     	github.com/fsnotify/fsnotify v1.8.0 // indirect
     	github.com/gammazero/deque v1.0.0 // indirect
+    	github.com/getsentry/sentry-go v0.27.0 // indirect
     	github.com/go-logr/logr v1.4.2 // indirect
     	github.com/go-logr/stdr v1.2.2 // indirect
     	github.com/go-ole/go-ole v1.3.0 // indirect
+    	github.com/gogo/protobuf v1.3.2 // indirect
     	github.com/golang/protobuf v1.5.4 // indirect
+    	github.com/golang/snappy v0.0.4 // indirect
     	github.com/grpc-ecosystem/grpc-gateway/v2 v2.26.1 // indirect
     	github.com/hashicorp/hcl v1.0.0 // indirect
     	github.com/inconshreveable/mousetrap v1.1.0 // indirect
     	github.com/jellydator/ttlcache/v3 v3.3.0 // indirect
+    	github.com/kr/pretty v0.3.1 // indirect
+    	github.com/kr/text v0.2.0 // indirect
     	github.com/lufia/plan9stats v0.0.0-20240909124753-873cd0166683 // indirect
     	github.com/magiconair/properties v1.8.9 // indirect
     	github.com/mattn/go-isatty v0.0.20 // indirect
     	github.com/mattn/go-sqlite3 v1.14.22 // indirect
+    	github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
     	github.com/ncruces/go-strftime v0.1.9 // indirect
     	github.com/pelletier/go-toml/v2 v2.2.3 // indirect
+    	github.com/pkg/errors v0.9.1 // indirect
     	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
     	github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect
     	github.com/pquerna/cachecontrol v0.2.0 // indirect
+    	github.com/prometheus/client_golang v1.15.0 // indirect
+    	github.com/prometheus/client_model v0.3.0 // indirect
+    	github.com/prometheus/common v0.42.0 // indirect
+    	github.com/prometheus/procfs v0.9.0 // indirect
     	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
+    	github.com/rogpeppe/go-internal v1.13.1 // indirect
     	github.com/sagikazarmark/locafero v0.7.0 // indirect
     	github.com/sagikazarmark/slog-shim v0.1.0 // indirect
     	github.com/shoenig/go-m1cpu v0.1.6 // indirect