feat: Add maintenance RPC endpoints for coordinated cache flushing#574
feat: Add maintenance RPC endpoints for coordinated cache flushing#574AnkushinDaniil wants to merge 4 commits intomainfrom
Conversation
61e0d72 to
60c8892
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds maintenance RPC endpoints for coordinated cache flushing in Arbitrum nodes. The implementation introduces a time-based tracking mechanism to monitor accumulated block processing time and provides RPC endpoints for external coordination of trie cache flush operations to prevent simultaneous flushes across multiple nodes.
Changes:
- Added three new RPC endpoints:
TriggerMaintenance(),ShouldTriggerMaintenance(), andMaintenanceStatus() - Implemented
ProcessingTimeTrackerto track accumulated block processing time with randomized offset support - Added configuration options for controlling maintenance timing thresholds and intervals
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Nethermind.Arbitrum/Modules/ArbitrumRpcModule.cs | Implements the three new maintenance RPC endpoints with background task execution |
| src/Nethermind.Arbitrum/Modules/IArbitrumRpcModule.cs | Defines the RPC method signatures for maintenance endpoints |
| src/Nethermind.Arbitrum/Execution/ProcessingTimeTracker.cs | Tracks accumulated processing time and calculates time before flush with random offset |
| src/Nethermind.Arbitrum/Execution/ArbitrumInitializeProcessingTimeTracker.cs | Subscribes to blockchain processor events to track processing time |
| src/Nethermind.Arbitrum/Data/MaintenanceStatus.cs | Simple DTO for maintenance status response |
| src/Nethermind.Arbitrum/Config/IArbitrumConfig.cs | Adds configuration properties for maintenance thresholds and intervals |
| src/Nethermind.Arbitrum/Config/ArbitrumConfig.cs | Implements default configuration values |
| src/Nethermind.Arbitrum/ArbitrumPlugin.cs | Registers ProcessingTimeTracker as singleton and adds initialization step |
| src/Nethermind.Arbitrum/ArbitrumRpcModuleFactory.cs | Updates factory to inject new dependencies |
| src/Nethermind.Arbitrum/Modules/ArbitrumRpcModuleWithComparison.cs | Updates constructor signature for new dependencies |
| src/Nethermind.Arbitrum/Properties/scripts/generate-system-test-config.sh | Adds Arbitrum configuration section to test config |
| src/Nethermind.Arbitrum.Test/Rpc/ArbitrumRpcModuleMaintenanceTests.cs | Comprehensive test coverage for maintenance endpoints including edge cases |
| src/Nethermind.Arbitrum.Test/Execution/ProcessingTimeTrackerTests.cs | Unit tests for time tracking logic |
| src/Nethermind.Arbitrum.Test/Execution/ArbitrumInitializeProcessingTimeTrackerTests.cs | Tests for event subscription and time accumulation |
| src/Nethermind.Arbitrum.Test/Infrastructure/ThrowingWorldStateManagerDecorator.cs | Test decorator for simulating FlushCache failures |
| src/Nethermind.Arbitrum.Test/Infrastructure/ArbitrumRpcTestBlockchain.cs | Updates test infrastructure to support new dependencies |
| src/Nethermind.Arbitrum.Test/Rpc/ArbitrumRpcModuleTests.cs | Updates existing tests with new constructor parameters |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/Nethermind.Arbitrum/Properties/scripts/generate-system-test-config.sh
Show resolved
Hide resolved
src/Nethermind.Arbitrum/Execution/ArbitrumInitializeProcessingTimeTracker.cs
Show resolved
Hide resolved
src/Nethermind.Arbitrum.Test/Execution/ArbitrumInitializeProcessingTimeTrackerTests.cs
Show resolved
Hide resolved
src/Nethermind.Arbitrum.Test/Execution/ArbitrumInitializeProcessingTimeTrackerTests.cs
Show resolved
Hide resolved
| if (Logger.IsInfo) | ||
| Logger.Info("Starting trie flush maintenance"); | ||
|
|
||
| worldStateManager.FlushCache(CancellationToken.None); |
There was a problem hiding this comment.
Hmm, I really don't understand why do we need to flush caches... I'd ask some knowledgeable person (Damian, Ashraf, Ben...) to verify we need any maintenance at all.
There was a problem hiding this comment.
To me it looks like manually overriding pruning behaviour. From the looks of what is done, I would say it's the correct thing, although maybe I do not understand the full picture. Is the overall goal here to ensure that we don't get any mismatch between DB state on execution and consensus side? Wondering if this can cause any issues with pruning? cc: @asdacap
05b571c to
2013718
Compare
Fixes #565
Implements Maintenance - coordinated state flush: Nitro and Nethermind coordinate trie persistence without disrupting block production.
sequenceDiagram participant Nitro as Nitro (Consensus) participant RPC as ArbitrumRpcModule participant Tracker as ProcessingTimeTracker participant Semaphore as CreateBlocksSemaphore participant WSM as WorldStateManager Note over Nitro,WSM: Normal Block Production loop Every Block Nitro->>RPC: DigestMessage(block) RPC->>Semaphore: WaitAsync(0) Semaphore-->>RPC: acquired RPC->>RPC: Process block RPC->>Semaphore: Release() RPC-->>Nitro: MessageResult Note over Tracker: AccumulatedTime += ProcessingMs end Note over Nitro,WSM: Maintenance Check (periodic) Nitro->>RPC: ShouldTriggerMaintenance() RPC->>Tracker: TimeBeforeFlush Tracker-->>RPC: TrieTimeLimitMs - (Accumulated + RandomOffset) alt TimeBeforeFlush ≤ Threshold RPC-->>Nitro: true Note over Nitro,WSM: Maintenance Triggered Nitro->>RPC: TriggerMaintenance() RPC->>RPC: _runningMaintenance = true RPC-->>Nitro: "OK" (immediate return) Note over RPC: Background Task Started RPC->>Semaphore: Wait() [blocking] Note over Semaphore: Blocks DigestMessage RPC->>WSM: FlushCache() WSM-->>RPC: success RPC->>Tracker: Reset() Note over Tracker: AccumulatedTime = 0<br/>RandomOffset = new random RPC->>Semaphore: Release() RPC->>RPC: _runningMaintenance = false else TimeBeforeFlush > Threshold RPC-->>Nitro: false Note over Nitro: Continue normal operation end Note over Nitro,WSM: Error Handling rect rgb(80, 40, 40) Note over RPC,WSM: If FlushCache throws: Note over RPC: _runningMaintenance = false<br/>Tracker NOT reset<br/>Can retry maintenance end Note over Nitro,WSM: Follower Mode Collision rect rgb(80, 70, 30) Note over Nitro,Semaphore: If DigestMessage during maintenance: Nitro->>RPC: DigestMessage(block) RPC->>Semaphore: WaitAsync(0) Semaphore-->>RPC: not acquired RPC-->>Nitro: "CreateBlock mutex held" end