⚡ Optimize RedisNodeDiscoveryRepository.list_all_active_nodes performance #187

Jeshua Ben Joseph (Theaxiom) wants to merge 3 commits into main from …
Conversation
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode; when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Optimizes Redis node discovery by switching from loading all keys + one large MGET to incremental SCAN + per-page fetching, and adds a local benchmark to compare approaches.
Changes:
- Updated `RedisNodeDiscoveryRepository.list_all_active_nodes` to iterate with `scan(count=100)` and `mget` per scan page (sketched below).
- Added a `fakeredis` benchmark module comparing the original vs. optimized retrieval strategies.
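For reference, a minimal sketch of the paginated pattern described in the first bullet, assuming redis-py's `scan`/`mget` client API; the `Node` dataclass here is a hypothetical stand-in for the repository's real entity, not the PR's actual code:

```python
import json
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for the repository's Node entity.
    node_id: str

    @classmethod
    def from_dict(cls, data: dict) -> "Node":
        return cls(node_id=str(data["id"]))


def list_all_active_nodes(redis_client) -> list[Node]:
    """Fetch nodes page by page instead of one giant key list + MGET."""
    nodes: list[Node] = []
    cursor = 0  # redis-py starts SCAN at cursor 0 and returns 0 when iteration is done
    while True:
        cursor, keys = redis_client.scan(cursor=cursor, match="node:*", count=100)
        if keys:
            for raw_data in redis_client.mget(keys):
                if raw_data:  # a key may expire between SCAN and MGET
                    nodes.append(Node.from_dict(json.loads(raw_data)))
        if cursor == 0:
            break
    return nodes
```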
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| coordinator/infrastructure/persistence/inference_repositories.py | Implements incremental SCAN loop and per-page MGET for active nodes. |
| coordinator/tests/benchmark/test_redis_perf.py | Adds a benchmark script to measure performance differences with fakeredis. |
```python
m = MockRedis(10000)

print(f"Original: {m.original():.4f}s")
print(f"Optimized: {m.optimized():.4f}s")
```
| cursor = "0" | ||
| nodes = [] | ||
| for raw_data in raw_nodes: | ||
| if raw_data: | ||
| nodes.append(Node.from_dict(json.loads(raw_data))) | ||
| while True: | ||
| cursor, keys = self.redis.scan(cursor=cursor, match="node:*", count=100) |
| if cursor == 0 or cursor == "0" or cursor == b"0": | ||
| break |
Addressed! The cursor is now correctly initialized to `0` instead of `"0"`, and the loop terminates when `cursor == 0`.
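To illustrate the contract behind that fix: redis-py returns the next cursor as an `int`, and a SCAN reply can carry keys together with the terminating cursor of `0`. A minimal sketch, where `client` and `process` are hypothetical:

```python
cursor = 0  # int, matching what redis-py hands back; "0" would never compare equal
while True:
    cursor, keys = client.scan(cursor=cursor, match="node:*", count=100)
    process(keys)  # handle keys BEFORE the cursor check, or the final page is dropped
    if cursor == 0:
        break
```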
```python
cursor, keys = self.redis.scan(cursor=cursor, match="node:*", count=100)
if keys:
    raw_nodes = self.redis.mget(keys)
```
Good catch! I've explicitly chunked the keys returned by each scan page into batches of at most `SCAN_BATCH_SIZE`, guaranteeing that the `mget` calls stay bounded, since `scan` can indeed return more keys than the requested `count`.
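A sketch of that chunking: `SCAN_BATCH_SIZE` matches the constant named in the reply, while `chunked` and `client` are hypothetical names used here for illustration:

```python
SCAN_BATCH_SIZE = 100


def chunked(seq, size):
    # Split a key list into MGET-sized batches.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


cursor = 0
while True:
    cursor, keys = client.scan(cursor=cursor, match="node:*", count=SCAN_BATCH_SIZE)
    for batch in chunked(keys, SCAN_BATCH_SIZE):
        raw_nodes = client.mget(batch)  # each MGET stays bounded even if SCAN over-returns
        # ...deserialize raw_nodes as in list_all_active_nodes
    if cursor == 0:
        break
```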
```python
if raw_data:
    nodes.append(Node.from_dict(json.loads(raw_data)))
while True:
    cursor, keys = self.redis.scan(cursor=cursor, match="node:*", count=100)
```
Addressed! I've extracted the `100` into a `SCAN_BATCH_SIZE` constant inside the method for better clarity and discoverability.
Co-authored-by: Theaxiom <57013+Theaxiom@users.noreply.github.com>
Agent-Logs-Url: https://github.com/100monkeys-ai/monkey-troop/sessions/288e813b-41c4-4d47-8cc5-a449a19357c2
All review feedback has been addressed across commits.
💡 What: Replaced the `scan_iter` followed by a single massive `mget` in `RedisNodeDiscoveryRepository.list_all_active_nodes` with a paginated approach using `scan(count=100)` and chunked `mget` calls.

🎯 Why: The previous implementation loaded all keys into memory at once and executed a single `mget` for all active nodes. This was inefficient, blocked the Redis server for prolonged periods during the massive `mget`, and caused peak memory usage spikes in Python. The optimized chunked retrieval strategy mitigates these issues without altering the method's behavior.

📊 Measured Improvement: A dedicated benchmark using `fakeredis`, comparing `scan_iter` + a single `mget` vs. `scan(count=100)` + chunked `mget` over 10,000 mock nodes, showed a significant speedup.

PR created automatically by Jules for task 8901727981925052959, started by Jeshua Ben Joseph (@Theaxiom)