Poppy speedup by cvkramer · Pull Request #36 · ConductorOne/baton-file

cvkramer · 2026-01-13T22:40:08Z

Description

Connector-Level Caching (Bump github.com/conductorone/baton-sdk from 0.2.92 to 0.2.93 #1)

Eliminated redundant file loading and cache rebuilding on every pagination call
File now loaded once per connector instance instead of 9+ times per sync

Child Type Index (Bump github.com/conductorone/baton-sdk from 0.2.94 to 0.2.96 #2)

Pre-built parent→children index during cache creation
Replaced O(n) scan (50,000 comparisons/page) with O(1) map lookup

Pre-Sorted Indexes (Bump github.com/conductorone/baton-sdk from 0.2.94 to 0.2.98 #3)

Built sorted resource and entitlement indexes once during cache building
Eliminated O(n log n) sorting on every page request (20+ times per sync)

Increased Page Size

Raised from 50 to 200 items per page (centralized as defaultPageSize constant)
Impact: 4x fewer API round trips (5 calls vs 20 for 1,000 items)

Useful links:

Summary by CodeRabbit

Performance
- Faster, more efficient resource, entitlement and grant syncs via centralized in-memory caching and reduced disk reads.
- Standardized pagination sizes across list, entitlement, and grant operations for consistent performance.
Behavior
- Results returned in a stable, pre-sorted order for more predictable listings.
- Parent/child relationships and entitlement queries resolved more quickly, improving list and detail responsiveness.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

a thread-safe caching mechanism at the connector level. Changes: - Add cache fields to FileConnector for loaded data and built caches - Implement getCachedData() method with double-check locking pattern - Update fileSyncer to reference connector instead of file path - Modify List(), Entitlements(), and Grants() to use cached data Performance Impact: - File loaded once per connector instance instead of per-method call - Eliminates ~9x redundant file reads during typical sync operations - Expected 8-18x improvement for large files (10k+ rows)

1. Added child types index field in pkg/connector/models.go: - Added cachedChildTypes map[string]map[string]struct{} to FileConnector - Maps parent resource ID → set of child resource type IDs 2. Created buildChildTypesIndex() function in pkg/connector/cache.go: - Builds the index once during cache creation - Iterates through all resources and maps children to their parents - Returns a map for O(1) lookups 3. Updated getCachedData() in pkg/connector/connector.go: - Calls buildChildTypesIndex() and stores result in cache - Returns the child types index along with other caches - Signature updated to return 6 values instead of 5 4. Optimized List() method in pkg/connector/syncers.go: - Replaced O(n) loop scanning all resources with O(1) index lookup - Changed from iterating loadedData.Resources to simple map lookup: childTypesIndex[resource.Id.Resource] - Removed unused strings import Performance Impact: Before: - For each paginated resource (up to 50 per page), scanned ALL resources in the file - 50 resources × 1,000 total resources = 50,000 comparisons per page - O(n*m) complexity where n = resources per page, m = total resources After: - Direct O(1) map lookup for each resource - 50 resources × 1 lookup = 50 lookups per page - ~1000x improvement for large datasets

1. Added sorted index fields in pkg/connector/models.go: - cachedSortedResourcesByType map[string][]*v2.Resource - Pre-sorted resources grouped by type - cachedSortedEntitlementsByRes map[string][]*v2.Entitlement - Pre-sorted entitlements grouped by resource 2. Created sorting functions in pkg/connector/cache.go: - buildSortedResourcesByType() - Groups resources by type and sorts each group once - buildSortedEntitlementsByResource() - Groups entitlements by resource and sorts each group once 3. Updated getCachedData() in pkg/connector/connector.go: - Calls the sorting functions during cache building - Stores sorted indexes in the connector cache 4. Optimized List() method in pkg/connector/syncers.go: - Retrieves pre-sorted resources from cached index - Only filters by parent if needed (no type filtering or sorting) - Eliminated O(n log n) sort on every page request 5. Optimized Entitlements() method in pkg/connector/syncers.go: - Retrieves pre-sorted entitlements from cached index - No filtering or sorting needed - direct lookup - Eliminated O(n log n) sort on every page request Note: Grants() method still sorts on each request because grants are filtered dynamically by context (principal or target). Pre-building a sorted index for grants would be complex and may not provide significant benefit given their filtering requirements. Performance Impact: Before: - List(): Filter all resources + sort all matching resources on every page (20× for 1000 resources with page size 50) - Entitlements(): Filter all entitlements + sort all matching entitlements on every page - Total: ~20 × O(n log n) operations per sync After: - List(): O(1) map lookup + O(m) parent filter where m = resources of this type (no sorting) - Entitlements(): O(1) map lookup (no filtering or sorting) - Sorting happens once during cache building: O(n log n) total Estimated speedup: 10-20x faster for pagination scenarios with multiple pages.

- defaultPageSize = 200 - Centralized constant replacing hardcoded values - Documented reasoning: all data is in-memory and already pre-sorted/filtered 2. Updated all three syncer methods: - List(): Changed pageSize := 50 to pageSize := defaultPageSize - Entitlements(): Changed pageSize := 50 to pageSize := defaultPageSize - Grants(): Changed pageSize := 50 to pageSize := defaultPageSize Performance Impact: For 1,000 resources: - Before (page size 50): 20 pagination calls needed - After (page size 200): 5 pagination calls needed - Reduction: 4x fewer API round trips Combined with our previous optimizations: - Each pagination call is now 10-20x faster (no redundant file reads, no sorting) - 4x fewer pagination calls (larger page size) - Total estimated speedup: 40-80x for pagination scenarios Why Page Size 200? - All data is in-memory (no I/O concerns) - Pre-sorted and pre-filtered (minimal processing per request) - No external API rate limits - Balances memory usage with performance - Easy to adjust via the constant if needed

coderabbitai · 2026-01-13T22:40:19Z

Walkthrough

Adds a thread-safe, lazy-initialized in-memory cache to FileConnector (getCachedData, clearCache, cache fields and helper builders); refactors fileSyncer and syncer handlers to use prebuilt, sorted caches and standardized pagination instead of per-call file reads and sorts.

Changes

Cohort / File(s)	Summary
Caching Infrastructure `pkg/connector/models.go`, `pkg/connector/cache.go`, `pkg/connector/connector.go`	Added `cacheMutex` and cached fields to `FileConnector` (`cachedData`, `cachedResourceTypes`, `cachedResources`, `cachedEntitlements`, `cachedChildTypes`, `cachedSortedResourcesByType`, `cachedSortedEntitlementsByRes`). Implemented `getCachedData` with double-checked locking and helper builders: `buildChildTypesIndex`, `buildSortedResourcesByType`, `buildSortedEntitlementsByResource`.
Syncer Refactor & Pagination `pkg/connector/syncers.go`	`fileSyncer` now references `FileConnector` (not file path). `List`, `Entitlements`, and `Grants` use connector caches and pre-sorted lists; per-call sorting and disk reads removed. Introduced `defaultPageSize` (200) and `resourceTypeHasTrait` helper; grants/entitlements use cached maps.
API Surface `pkg/connector/connector.go`	New method added on `FileConnector`: `getCachedData(ctx context.Context) (...)` returning cached structures and indexes.
Other `pkg/connector/*.go` (general)	Debug logging added around cache building and totals. Duplicate helper function instances appear in the patch; no exported type signature changes besides the new method.

Sequence Diagram(s)

sequenceDiagram
    participant Sync as Sync Operation (List/Entitlements/Grants)
    participant FC as FileConnector
    participant Cache as In-Memory Cache
    participant Loader as File Loader / Builders

    Sync->>FC: getCachedData(ctx)
    alt Cache hit
        FC->>Cache: read cachedData & indexes (read lock)
        FC-->>Sync: return cached structures
    else Cache miss
        FC->>FC: acquire write lock
        FC->>Loader: LoadFileData()
        Loader-->>FC: LoadedData
        FC->>Loader: buildChildTypesIndex(LoadedData)
        Loader-->>FC: childTypesIndex
        FC->>Loader: buildSortedResourcesByType(LoadedData)
        Loader-->>FC: sortedResources
        FC->>Loader: buildSortedEntitlementsByResource(LoadedData)
        Loader-->>FC: sortedEntitlements
        FC->>Cache: store caches & indexes
        FC->>FC: release write lock
        FC-->>Sync: return cached structures
    end
    Sync->>Cache: filter & paginate using cached indexes
    Sync-->>Client: return results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through bytes and built a snug den,

Indexed children, sorted every den,
Locks held kindly, reads now swift,
Pages glide like a carrot gift,
Syncs hop faster — hooray, again! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title "Poppy speedup" is vague and generic, using non-descriptive terminology that doesn't convey meaningful information about the changeset.	Use a more descriptive title that clearly indicates the main change, such as "Add connector-level caching and pre-sorted indexes to improve pagination performance".

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧹 Recent nitpick comments

pkg/connector/connector.go (2)
40-40: Consider returning a struct instead of 5+ values.

The return signature with 5 maps plus error is complex and fragile—callers must carefully match positional returns. A dedicated cache result struct would improve readability and make future extensions easier.
♻️ Optional refactor to use a result struct
// CachedData holds all cached data for the connector
type CachedData struct {
    LoadedData     *LoadedData
    ResourceTypes  map[string]*v2.ResourceType
    Resources      map[string]*v2.Resource
    Entitlements   map[string]*v2.Entitlement
    ChildTypes     map[string]map[string]struct{}
}

func (fc *FileConnector) getCachedData(ctx context.Context) (*CachedData, error) {
    // ... implementation returns &CachedData{...}, nil
}
105-109: Good: Informative logging for cache population.

Logging the counts of cached entities provides useful visibility into what was loaded. Consider adding a zap.Int("resource_types", len(resourceTypesCache)) to also track the number of distinct resource types for completeness.
♻️ Optional: Include resource types count in log
 l.Info("Successfully cached file data",
+    zap.Int("resource_types", len(resourceTypesCache)),
     zap.Int("users", len(loadedData.Users)),
     zap.Int("resources", len(loadedData.Resources)),
     zap.Int("entitlements", len(loadedData.Entitlements)),
     zap.Int("grants", len(loadedData.Grants)))

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76a8945 and 5f97268.

📒 Files selected for processing (1)

pkg/connector/connector.go

🧰 Additional context used

🧬 Code graph analysis (1)

pkg/connector/connector.go (2)

pkg/connector/models.go (2)

FileConnector (18-31)

LoadedData (35-40)

pkg/connector/file_loader.go (1)

LoadFileData (41-53)

🔇 Additional comments (4)

pkg/connector/connector.go (4)

37-54: Double-checked locking pattern is correctly implemented.

The fast path with read lock and slow path with write lock is a well-known optimization pattern. However, there's a subtle consideration: after releasing RUnlock() on line 54 and before acquiring Lock() on line 57, another goroutine could acquire the write lock and populate the cache. The double-check on line 61 correctly handles this race.

65-71: Good: File loading with proper error context.

The file is loaded once via LoadFileData and errors are wrapped with descriptive context. This aligns with the PR objective of eliminating redundant file loading per pagination call.

117-130: LGTM: ResourceSyncers correctly delegates to cached data.

The method now uses getCachedData to obtain the resource types cache and passes the FileConnector reference to syncers. This allows syncers to access both the returned caches and the stored sorted indexes through the connector, supporting the caching optimization.

89-103: Correct the mutex protection claim—syncers use mutex locks when accessing sorted indexes.

The method correctly builds and stores sortedResourcesByType and sortedEntitlementsByResource, and syncers do call getCachedData() first to ensure cache initialization. However, the review's claim that direct field access is "safe without mutex protection" is inaccurate. Both syncer methods protect access to these fields with explicit mutex locks:

List() at syncers.go:60 uses cacheMutex.RLock() before accessing cachedSortedResourcesByType

Entitlements() at syncers.go:147 uses cacheMutex.RLock() before accessing cachedSortedEntitlementsByRes

The current implementation is correct, but the review comment's characterization of the synchronization strategy is misleading.

Likely an incorrect or invalid review comment.

_{✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

btipling · 2026-01-14T00:30:11Z

pkg/connector/connector.go

 }

+// clearCache invalidates all cached data, forcing a fresh load on the next getCachedData call.
+// This is called at the start of each sync operation to ensure fresh data between syncs.


the new session cache does this automatically, see

session.SetManyJSON(ctx, ss, newProjects, projectsNamespace)

in this PR. Each sync is a session.

https://github.com/ConductorOne/baton-jira/pull/60/changes#diff-24729dc2f69fe4b9cd5110c6c93d6242efc677e17f6055ecbf7ff11c6dc4b6d9

Ohhhhh nice

I don't see you having actually made the changes required to use the session cache, it looks like you're still storing things in memory. I'm not sure if it is available for on prem. I would hope so.

c1-charleskramer added 4 commits January 12, 2026 22:42

cvkramer requested a review from a team January 13, 2026 22:40

Add cache invalidation between syncs

76a8945

btipling reviewed Jan 14, 2026

View reviewed changes

cache invalidation unnecessary, handled by baton-sdk

5f97268

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poppy speedup#36

Poppy speedup#36
cvkramer wants to merge 6 commits intomainfrom
poppy_speedup

cvkramer commented Jan 13, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 13, 2026 •

edited

Loading

Uh oh!

btipling Jan 14, 2026 •

edited

Loading

Uh oh!

cvkramer Jan 14, 2026

Uh oh!

btipling Jan 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

cvkramer commented Jan 13, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Useful links:

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jan 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

btipling Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cvkramer Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

btipling Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cvkramer commented Jan 13, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 13, 2026 •

edited

Loading

btipling Jan 14, 2026 •

edited

Loading