Skip to content

Poppy speedup#36

Open
cvkramer wants to merge 6 commits intomainfrom
poppy_speedup
Open

Poppy speedup#36
cvkramer wants to merge 6 commits intomainfrom
poppy_speedup

Conversation

@cvkramer
Copy link

@cvkramer cvkramer commented Jan 13, 2026

Description

  1. Connector-Level Caching (Bump github.com/conductorone/baton-sdk from 0.2.92 to 0.2.93 #1)
  • Eliminated redundant file loading and cache rebuilding on every pagination call
  • File now loaded once per connector instance instead of 9+ times per sync
  1. Child Type Index (Bump github.com/conductorone/baton-sdk from 0.2.94 to 0.2.96 #2)
  • Pre-built parent→children index during cache creation
  • Replaced O(n) scan (50,000 comparisons/page) with O(1) map lookup
  1. Pre-Sorted Indexes (Bump github.com/conductorone/baton-sdk from 0.2.94 to 0.2.98 #3)
  • Built sorted resource and entitlement indexes once during cache building
  • Eliminated O(n log n) sorting on every page request (20+ times per sync)
  1. Increased Page Size
  • Raised from 50 to 200 items per page (centralized as defaultPageSize constant)
  • Impact: 4x fewer API round trips (5 calls vs 20 for 1,000 items)

Useful links:

Summary by CodeRabbit

  • Performance

    • Faster, more efficient resource, entitlement and grant syncs via centralized in-memory caching and reduced disk reads.
    • Standardized pagination sizes across list, entitlement, and grant operations for consistent performance.
  • Behavior

    • Results returned in a stable, pre-sorted order for more predictable listings.
    • Parent/child relationships and entitlement queries resolved more quickly, improving list and detail responsiveness.

✏️ Tip: You can customize this high-level summary in your review settings.

   a thread-safe caching mechanism at the connector level.

   Changes:
   - Add cache fields to FileConnector for loaded data and built caches
   - Implement getCachedData() method with double-check locking pattern
   - Update fileSyncer to reference connector instead of file path
   - Modify List(), Entitlements(), and Grants() to use cached data

   Performance Impact:
   - File loaded once per connector instance instead of per-method call
   - Eliminates ~9x redundant file reads during typical sync operations
   - Expected 8-18x improvement for large files (10k+ rows)
  1. Added child types index field in pkg/connector/models.go:
  - Added cachedChildTypes map[string]map[string]struct{} to FileConnector
  - Maps parent resource ID → set of child resource type IDs

  2. Created buildChildTypesIndex() function in pkg/connector/cache.go:
  - Builds the index once during cache creation
  - Iterates through all resources and maps children to their parents
  - Returns a map for O(1) lookups

  3. Updated getCachedData() in pkg/connector/connector.go:
  - Calls buildChildTypesIndex() and stores result in cache
  - Returns the child types index along with other caches
  - Signature updated to return 6 values instead of 5

  4. Optimized List() method in pkg/connector/syncers.go:
  - Replaced O(n) loop scanning all resources with O(1) index lookup
  - Changed from iterating loadedData.Resources to simple map lookup: childTypesIndex[resource.Id.Resource]
  - Removed unused strings import

  Performance Impact:

  Before:
  - For each paginated resource (up to 50 per page), scanned ALL resources in the file
  - 50 resources × 1,000 total resources = 50,000 comparisons per page
  - O(n*m) complexity where n = resources per page, m = total resources

  After:
  - Direct O(1) map lookup for each resource
  - 50 resources × 1 lookup = 50 lookups per page
  - ~1000x improvement for large datasets
  1. Added sorted index fields in pkg/connector/models.go:
  - cachedSortedResourcesByType map[string][]*v2.Resource - Pre-sorted resources grouped by type
  - cachedSortedEntitlementsByRes map[string][]*v2.Entitlement - Pre-sorted entitlements grouped by resource

  2. Created sorting functions in pkg/connector/cache.go:
  - buildSortedResourcesByType() - Groups resources by type and sorts each group once
  - buildSortedEntitlementsByResource() - Groups entitlements by resource and sorts each group once

  3. Updated getCachedData() in pkg/connector/connector.go:
  - Calls the sorting functions during cache building
  - Stores sorted indexes in the connector cache

  4. Optimized List() method in pkg/connector/syncers.go:
  - Retrieves pre-sorted resources from cached index
  - Only filters by parent if needed (no type filtering or sorting)
  - Eliminated O(n log n) sort on every page request

  5. Optimized Entitlements() method in pkg/connector/syncers.go:
  - Retrieves pre-sorted entitlements from cached index
  - No filtering or sorting needed - direct lookup
  - Eliminated O(n log n) sort on every page request

  Note: Grants() method still sorts on each request because grants are filtered dynamically by context (principal or target). Pre-building a sorted index for grants would be complex and may not provide significant benefit given their filtering requirements.

  Performance Impact:

  Before:
  - List(): Filter all resources + sort all matching resources on every page (20× for 1000 resources with page size 50)
  - Entitlements(): Filter all entitlements + sort all matching entitlements on every page
  - Total: ~20 × O(n log n) operations per sync

  After:
  - List(): O(1) map lookup + O(m) parent filter where m = resources of this type (no sorting)
  - Entitlements(): O(1) map lookup (no filtering or sorting)
  - Sorting happens once during cache building: O(n log n) total

  Estimated speedup: 10-20x faster for pagination scenarios with multiple pages.
  - defaultPageSize = 200 - Centralized constant replacing hardcoded values
  - Documented reasoning: all data is in-memory and already pre-sorted/filtered

  2. Updated all three syncer methods:
  - List(): Changed pageSize := 50 to pageSize := defaultPageSize
  - Entitlements(): Changed pageSize := 50 to pageSize := defaultPageSize
  - Grants(): Changed pageSize := 50 to pageSize := defaultPageSize

  Performance Impact:

  For 1,000 resources:
  - Before (page size 50): 20 pagination calls needed
  - After (page size 200): 5 pagination calls needed
  - Reduction: 4x fewer API round trips

  Combined with our previous optimizations:
  - Each pagination call is now 10-20x faster (no redundant file reads, no sorting)
  - 4x fewer pagination calls (larger page size)
  - Total estimated speedup: 40-80x for pagination scenarios

  Why Page Size 200?

  - All data is in-memory (no I/O concerns)
  - Pre-sorted and pre-filtered (minimal processing per request)
  - No external API rate limits
  - Balances memory usage with performance
  - Easy to adjust via the constant if needed
@cvkramer cvkramer requested a review from a team January 13, 2026 22:40
@coderabbitai
Copy link

coderabbitai bot commented Jan 13, 2026

Walkthrough

Adds a thread-safe, lazy-initialized in-memory cache to FileConnector (getCachedData, clearCache, cache fields and helper builders); refactors fileSyncer and syncer handlers to use prebuilt, sorted caches and standardized pagination instead of per-call file reads and sorts.

Changes

Cohort / File(s) Summary
Caching Infrastructure
pkg/connector/models.go, pkg/connector/cache.go, pkg/connector/connector.go
Added cacheMutex and cached fields to FileConnector (cachedData, cachedResourceTypes, cachedResources, cachedEntitlements, cachedChildTypes, cachedSortedResourcesByType, cachedSortedEntitlementsByRes). Implemented getCachedData with double-checked locking and helper builders: buildChildTypesIndex, buildSortedResourcesByType, buildSortedEntitlementsByResource.
Syncer Refactor & Pagination
pkg/connector/syncers.go
fileSyncer now references FileConnector (not file path). List, Entitlements, and Grants use connector caches and pre-sorted lists; per-call sorting and disk reads removed. Introduced defaultPageSize (200) and resourceTypeHasTrait helper; grants/entitlements use cached maps.
API Surface
pkg/connector/connector.go
New method added on FileConnector: getCachedData(ctx context.Context) (...) returning cached structures and indexes.
Other
pkg/connector/*.go (general)
Debug logging added around cache building and totals. Duplicate helper function instances appear in the patch; no exported type signature changes besides the new method.

Sequence Diagram(s)

sequenceDiagram
    participant Sync as Sync Operation (List/Entitlements/Grants)
    participant FC as FileConnector
    participant Cache as In-Memory Cache
    participant Loader as File Loader / Builders

    Sync->>FC: getCachedData(ctx)
    alt Cache hit
        FC->>Cache: read cachedData & indexes (read lock)
        FC-->>Sync: return cached structures
    else Cache miss
        FC->>FC: acquire write lock
        FC->>Loader: LoadFileData()
        Loader-->>FC: LoadedData
        FC->>Loader: buildChildTypesIndex(LoadedData)
        Loader-->>FC: childTypesIndex
        FC->>Loader: buildSortedResourcesByType(LoadedData)
        Loader-->>FC: sortedResources
        FC->>Loader: buildSortedEntitlementsByResource(LoadedData)
        Loader-->>FC: sortedEntitlements
        FC->>Cache: store caches & indexes
        FC->>FC: release write lock
        FC-->>Sync: return cached structures
    end
    Sync->>Cache: filter & paginate using cached indexes
    Sync-->>Client: return results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through bytes and built a snug den,

Indexed children, sorted every den,
Locks held kindly, reads now swift,
Pages glide like a carrot gift,
Syncs hop faster — hooray, again! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title "Poppy speedup" is vague and generic, using non-descriptive terminology that doesn't convey meaningful information about the changeset. Use a more descriptive title that clearly indicates the main change, such as "Add connector-level caching and pre-sorted indexes to improve pagination performance".
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧹 Recent nitpick comments
pkg/connector/connector.go (2)

40-40: Consider returning a struct instead of 5+ values.

The return signature with 5 maps plus error is complex and fragile—callers must carefully match positional returns. A dedicated cache result struct would improve readability and make future extensions easier.

♻️ Optional refactor to use a result struct
// CachedData holds all cached data for the connector
type CachedData struct {
    LoadedData     *LoadedData
    ResourceTypes  map[string]*v2.ResourceType
    Resources      map[string]*v2.Resource
    Entitlements   map[string]*v2.Entitlement
    ChildTypes     map[string]map[string]struct{}
}

func (fc *FileConnector) getCachedData(ctx context.Context) (*CachedData, error) {
    // ... implementation returns &CachedData{...}, nil
}

105-109: Good: Informative logging for cache population.

Logging the counts of cached entities provides useful visibility into what was loaded. Consider adding a zap.Int("resource_types", len(resourceTypesCache)) to also track the number of distinct resource types for completeness.

♻️ Optional: Include resource types count in log
 l.Info("Successfully cached file data",
+    zap.Int("resource_types", len(resourceTypesCache)),
     zap.Int("users", len(loadedData.Users)),
     zap.Int("resources", len(loadedData.Resources)),
     zap.Int("entitlements", len(loadedData.Entitlements)),
     zap.Int("grants", len(loadedData.Grants)))

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76a8945 and 5f97268.

📒 Files selected for processing (1)
  • pkg/connector/connector.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/connector/connector.go (2)
pkg/connector/models.go (2)
  • FileConnector (18-31)
  • LoadedData (35-40)
pkg/connector/file_loader.go (1)
  • LoadFileData (41-53)
🔇 Additional comments (4)
pkg/connector/connector.go (4)

37-54: Double-checked locking pattern is correctly implemented.

The fast path with read lock and slow path with write lock is a well-known optimization pattern. However, there's a subtle consideration: after releasing RUnlock() on line 54 and before acquiring Lock() on line 57, another goroutine could acquire the write lock and populate the cache. The double-check on line 61 correctly handles this race.


65-71: Good: File loading with proper error context.

The file is loaded once via LoadFileData and errors are wrapped with descriptive context. This aligns with the PR objective of eliminating redundant file loading per pagination call.


117-130: LGTM: ResourceSyncers correctly delegates to cached data.

The method now uses getCachedData to obtain the resource types cache and passes the FileConnector reference to syncers. This allows syncers to access both the returned caches and the stored sorted indexes through the connector, supporting the caching optimization.


89-103: Correct the mutex protection claim—syncers use mutex locks when accessing sorted indexes.

The method correctly builds and stores sortedResourcesByType and sortedEntitlementsByResource, and syncers do call getCachedData() first to ensure cache initialization. However, the review's claim that direct field access is "safe without mutex protection" is inaccurate. Both syncer methods protect access to these fields with explicit mutex locks:

  • List() at syncers.go:60 uses cacheMutex.RLock() before accessing cachedSortedResourcesByType
  • Entitlements() at syncers.go:147 uses cacheMutex.RLock() before accessing cachedSortedEntitlementsByRes

The current implementation is correct, but the review comment's characterization of the synchronization strategy is misleading.

Likely an incorrect or invalid review comment.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

}

// clearCache invalidates all cached data, forcing a fresh load on the next getCachedData call.
// This is called at the start of each sync operation to ensure fresh data between syncs.
Copy link

@btipling btipling Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new session cache does this automatically, see

session.SetManyJSON(ctx, ss, newProjects, projectsNamespace)

in this PR. Each sync is a session.

https://github.com/ConductorOne/baton-jira/pull/60/changes#diff-24729dc2f69fe4b9cd5110c6c93d6242efc677e17f6055ecbf7ff11c6dc4b6d9

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhhh nice

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see you having actually made the changes required to use the session cache, it looks like you're still storing things in memory. I'm not sure if it is available for on prem. I would hope so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants