
Hyperscaled backup rest api call #696

Open
brokespace wants to merge 115 commits into feat/hyperscaled from feat/hyperscaled_backup_call

Conversation

@brokespace
Collaborator

Taoshi Pull Request

Description

This adds a secondary service that runs alongside the primary Hyperliquid tracker and periodically calls the REST API to fetch recent trades on all accounts, so that if there's a hiccup in tracking an account, the missed activity is still measured.

Hitting the rate limit on the REST API endpoint does not affect the WebSocket connection, so the two can share the same IPs.

Related Issues (JIRA)

[Reference any related issues or tasks that this pull request addresses or closes.]

Checklist

  • I have tested my changes on testnet.
  • I have updated any necessary documentation.
  • I have added unit tests for my changes (if applicable).
  • If there are breaking changes for validators, I have (or will) notify the community in Discord of the release.

Reviewer Instructions

[Provide any specific instructions or areas you would like the reviewer to focus on.]

Definition of Done

  • Code has been reviewed.
  • All checks and tests pass.
  • Documentation is up to date.
  • Approved by at least one reviewer.

Checklist (for the reviewer)

  • Code follows project conventions.
  • Code is well-documented.
  • Changes are necessary and align with the project's goals.
  • No breaking changes introduced.

Optional: Deploy Notes

[Any instructions or notes related to deployment, if applicable.]

/cc @mention_reviewer

ward-taoshi and others added 30 commits March 3, 2026 14:08
* Simplify order sending for miners

* Display deprecated message on run receive signals server.

* Fix miner integration test

* Update docstring
@github-actions

github-actions bot commented Mar 12, 2026

🤖 Claude AI Code Review

Last reviewed on: 14:23:47

Summary

This PR adds a backup REST API polling mechanism to complement the primary WebSocket-based Hyperliquid tracker. The backup system periodically polls for missed fills using the REST API with exponential backoff for unhealthy proxy ports. It includes comprehensive test coverage and enhanced port health monitoring.


✅ Strengths

  1. Excellent Test Coverage: The PR includes 788 new lines of comprehensive unit tests covering edge cases, failure scenarios, and async behavior.

  2. Well-Structured Health Tracking: The _PortHealthRecord class uses __slots__ for memory efficiency and provides clean separation of concerns for port health state.

  3. Graceful Degradation: Falls back to direct connection when all proxy ports are unhealthy, maintaining service availability.

  4. Idempotent Watermarking: The backup poll system correctly prevents duplicate processing and only advances watermarks on successful polls.

  5. Proper Resource Cleanup: The orchestrator correctly cancels the backup task on shutdown and closes sessions after use.

  6. Backward Compatibility: The _unhealthy_ports property maintains API compatibility while transitioning to the new health record system.


⚠️ Concerns

CRITICAL

  1. Race Condition in Orchestrator Cleanup (lines 597-602)

    for _ in range(int(self.ADDRESS_REFRESH_INTERVAL_S)):
        if self._stop_event.is_set():
            break
        await asyncio.sleep(1.0)

    The loop breaks but doesn't execute cleanup in the finally block immediately. If stop is signaled during this sleep loop, cleanup is delayed up to ADDRESS_REFRESH_INTERVAL_S seconds.

  2. Missing Rate Limiting Enforcement (lines 1019-1041)

    min_delay_s = 60.0 / ValiConfig.HL_BACKUP_POLL_RATE_BUDGET

    While min_delay_s is calculated, there's no enforcement if processing a fill takes longer than this delay. Multiple addresses polled sequentially could exceed the 600 req/min budget.

  3. Thread Safety Concerns

    • _last_poll_time_ms dict is modified in async context without locks (lines 663, 1063)
    • _backup_fills_caught and _backup_polls_total counters are not atomic
    • While Python's GIL provides some protection, async tasks can interleave, potentially causing inconsistent state
  4. Unhandled Session Leaks on Exception (line 897)

    except Exception as e:
        bt.logging.error(f"[HL_TRACKER] REST error fetching account state for {hl_address}: {e}")
        self._report_rest_proxy_failure(proxy_port)
        return None
    finally:
        session.close()

    If an exception occurs during session creation in _make_proxied_session(), the session might not be assigned, and session.close() could fail.
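One way to address Critical #1 is to replace the per-second polling loop with a single interruptible wait on the stop event, so shutdown proceeds immediately rather than up to a second (or, if the check were missed, the full refresh interval) later. A minimal sketch, assuming an `asyncio.Event` stop flag; the names `wait_or_stop`, `stop_event`, and `refresh_interval_s` are illustrative, not taken from the PR:

```python
import asyncio

async def wait_or_stop(stop_event: asyncio.Event, refresh_interval_s: float) -> bool:
    """Wait up to refresh_interval_s; return True if stop was signaled first."""
    try:
        # Wakes as soon as stop_event is set, instead of polling once a second.
        await asyncio.wait_for(stop_event.wait(), timeout=refresh_interval_s)
        return True  # stop was signaled before the timeout elapsed
    except asyncio.TimeoutError:
        return False  # full interval elapsed with no stop signal
```

The caller's loop then becomes `if await wait_or_stop(...): break`, letting the `finally` cleanup run immediately on shutdown.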

MAJOR

  1. Infinite Cooldown Possible (line 439)

    ref_time = rec.last_probe_time or rec.unhealthy_since or 0

    If both last_probe_time and unhealthy_since are somehow None, using 0 could cause immediate probe retries in a tight loop.

  2. Memory Leak in Hash Tracking (line 1053)

    self._record_hash(fill_hash)

    _processed_hashes set grows unbounded. While fills have dedup, there's no TTL or size limit. Over time, this could consume significant memory.

  3. Blocking I/O in Async Context (lines 989-996)

    def _do_request():
        try:
            resp = session.post(api_url, json=payload, timeout=10)

    Using run_in_executor(None, ...) with the default thread pool executor is acceptable, but could be problematic under high load. Consider using an async HTTP client like aiohttp.

  4. Missing Validation on Backup Fill Processing (lines 1048-1059)
    The backup poll doesn't validate fill timestamps or check if fills are too old. Could process ancient fills if watermark is corrupted.
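For Major #2 (unbounded `_processed_hashes`), a bounded insertion-ordered set caps memory by evicting the oldest hashes once a size limit is reached. A sketch under the assumption that approximate LRU semantics are acceptable for dedup; the class name and `max_size` default are illustrative:

```python
from collections import OrderedDict

class BoundedHashSet:
    """Dedup set that evicts its oldest entries once max_size is exceeded."""

    def __init__(self, max_size: int = 100_000):
        self._max_size = max_size
        self._items: OrderedDict = OrderedDict()

    def add(self, h: str) -> None:
        self._items[h] = None
        self._items.move_to_end(h)  # treat re-adds as fresh
        if len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict the oldest hash

    def __contains__(self, h: str) -> bool:
        return h in self._items
```

A TTL-based variant (storing insertion timestamps and pruning on access) would work equally well if fills older than the watermark window can never recur.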


💡 Suggestions

Code Quality

  1. Extract Magic Numbers to Constants (line 1008)

    await asyncio.sleep(10.0)  # Let WS connections establish first

    Define HL_BACKUP_POLL_STARTUP_DELAY_S = 10.0 in ValiConfig.

  2. Simplify Proxy Port Collection (lines 919-926)

    all_ports = list(self._available_ports)
    for shard in self._shards.values():
        if shard.healthy and shard.port is not None:
            all_ports.append(shard.port)
    all_ports = sorted(set(all_ports))

    Could be:

    all_ports = set(self._available_ports)
    all_ports.update(s.port for s in self._shards.values() 
                     if s.healthy and s.port is not None)
  3. Add Type Hints (line 381)

    self._port_health: Dict[int, _PortHealthRecord] = {}

    Good! But _PortHealthRecord class itself should have return type hints on methods.

  4. Consolidate Logging Levels (lines 1049-1055)
    Mix of bt.logging.info and bt.logging.debug. Establish consistent criteria for when to use each level.

  5. Extract Fill Hash Logic (line 1042)

    fill_hash = fill.get("hash") or fill.get("tid")

    This pattern appears twice (lines 829, 1042). Extract to helper method _get_fill_hash(fill: dict) -> Optional[str].

Performance

  1. Optimize Port Health Status Building (lines 440-453)
    Creates a new list on every get_status() call. Consider caching with TTL if status is polled frequently.

  2. Batch REST Calls (lines 1019-1041)
    Currently polls addresses sequentially. If HL API supports batch requests, could significantly reduce latency.

  3. Lazy Initialization of Backup Poll (line 580)
    The backup task starts immediately in orchestrator. Consider delaying until first address is tracked to avoid unnecessary polling.
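The rate-budget enforcement missing in Critical #2 can be combined with the sequential polling above: measure how long each poll actually took and sleep only for the remainder of `min_delay_s`, so the loop never exceeds the configured request budget. A hedged sketch; `poll_fn` and `rate_budget_per_min` are illustrative stand-ins for the PR's poll routine and `ValiConfig.HL_BACKUP_POLL_RATE_BUDGET`:

```python
import asyncio
import time

async def poll_with_budget(addresses, poll_fn, rate_budget_per_min: float) -> None:
    """Poll each address sequentially, pacing to stay under the rate budget."""
    min_delay_s = 60.0 / rate_budget_per_min
    for addr in addresses:
        started = time.monotonic()
        await poll_fn(addr)
        elapsed = time.monotonic() - started
        if elapsed < min_delay_s:
            # Slow polls consume their own budget; fast polls absorb the gap.
            await asyncio.sleep(min_delay_s - elapsed)
```

This guarantees at most `rate_budget_per_min` requests per minute regardless of how long individual fills take to process.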

Maintainability

  1. Add Circuit Breaker Pattern
    If backup polls consistently fail for an address, consider temporarily blacklisting it to avoid wasting resources.

  2. Metrics Enhancement
    Track additional metrics:

    • Average backup poll latency per address
    • Miss rate (fills caught by backup vs total fills)
    • REST API error types breakdown

🔒 Security Notes

  1. Proxy Credential Exposure (lines 936, 776)

    proxy_url = self._make_shard_proxy_url(port)

    Ensure proxy URLs containing credentials are never logged. Review all logging statements that might output proxy_url.

  2. Input Validation Missing

    • _fetch_fills_by_time doesn't validate the structure of returned fills before processing (line 996)
    • Malformed API responses could cause exceptions or unexpected behavior
    • Add schema validation for critical fields: hash, coin, side, sz, px
  3. Timeout Configuration (line 992)

    resp = session.post(api_url, json=payload, timeout=10)

    10-second timeout seems reasonable, but should be configurable via ValiConfig for operational flexibility.

  4. DOS Vector via Watermark Reset
    If an attacker can manipulate _last_poll_time_ms, they could force re-processing of old fills. Ensure this state is protected and persisted securely.

  5. Rate Limit Bypass Risk
The PR claims "Hitting the rate limit on the REST API endpoint does not affect the WebSocket connection." However, there's no handling for 429 responses. Add explicit rate limit detection and backoff.
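Explicit 429 handling can be as small as a backoff-delay helper: honor `Retry-After` when present, otherwise fall back to jittered exponential backoff. A sketch assuming a `requests.Response`-style status code and header value; `retry_delay_s` is a hypothetical helper name:

```python
import random

def retry_delay_s(status_code: int, retry_after_header, attempt: int) -> float:
    """Return seconds to back off before retrying, or 0.0 for non-429 responses."""
    if status_code != 429:
        return 0.0
    if retry_after_header is not None:
        try:
            # Retry-After may carry a delay in seconds; trust it when parseable.
            return max(0.0, float(retry_after_header))
        except (TypeError, ValueError):
            pass  # malformed header: fall through to exponential backoff
    base = min(60.0, 2.0 ** attempt)  # cap the backoff at one minute
    return base + random.uniform(0.0, base / 4.0)  # jitter to avoid thundering herd
```

The caller would sleep for the returned delay and report the port as degraded rather than failed, since 429 is throttling, not an outage.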


📋 Additional Observations

Testing

  1. Missing Integration Tests: All tests are unit tests with mocked dependencies. Need at least one integration test hitting a real (or containerized mock) HL API.

  2. Race Condition Tests Missing: No tests verify thread safety of shared counters and dictionaries under concurrent access.

  3. Test Naming: Excellent descriptive test names. Consider grouping related tests into nested test classes for better organization.

Documentation

  1. Missing Docstring (line 56): _PortHealthRecord class needs a module-level docstring explaining the exponential backoff strategy.

  2. Config Documentation (lines 545-554): Add comments explaining why specific thresholds were chosen (e.g., why 600 req/min instead of closer to 1200 limit).

  3. Architecture Decision: Consider adding an ADR (Architecture Decision Record) documenting why REST backup was chosen over WebSocket redundancy or other alternatives.

Dependency Management

  1. Duplicate SOCKS Libraries (requirements.txt)
    +python-socks>=2.7.1
    +PySocks>=1.7.1
    Both python-socks and PySocks are added. The code uses requests.Session().proxies which works with PySocks. Verify if both are needed or if one can be removed.

🎯 Recommendation

APPROVE WITH REQUIRED CHANGES

This is solid engineering work with excellent test coverage and thoughtful design. However, the critical issues around race conditions, rate limiting, and memory leaks must be addressed before merge.

Required Before Merge:

  1. Fix race condition in orchestrator cleanup (break → return)
  2. Add rate limiting enforcement for backup polls
  3. Add bounded size to _processed_hashes (LRU cache or TTL)
  4. Add explicit session leak protection with context managers
  5. Validate fill structure before processing
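Required change #4 (session leak protection) can be sketched with `contextlib.closing`, which guarantees `close()` on every exit path once the session object exists; `make_proxied_session` and `do_request` below are illustrative stand-ins for the PR's `_make_proxied_session` and request logic:

```python
import contextlib

def fetch_with_session(make_proxied_session, do_request):
    """Run do_request with a freshly built session, closing it on any exit path."""
    with contextlib.closing(make_proxied_session()) as session:
        # If do_request raises, closing() still calls session.close();
        # if make_proxied_session itself raises, no half-built session leaks.
        return do_request(session)
```

This removes the failure mode where an exception during session construction leaves `session` unassigned and the bare `finally: session.close()` raises.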

Strongly Recommended:

  1. Add integration test with real API
  2. Document proxy credential handling in security guidelines
  3. Add 429 rate limit handling
  4. Review and consolidate SOCKS dependencies

Nice to Have:

  1. Convert to async HTTP client (aiohttp)
  2. Add circuit breaker for problematic addresses
  3. Extract common fill hash logic
  4. Add more observability metrics

Great work overall! The backup mechanism significantly improves reliability. 🚀

@brokespace brokespace force-pushed the feat/hyperscaled branch 2 times, most recently from 48461d5 to 3b5630d on March 13, 2026 19:41
@sli-tao sli-tao force-pushed the feat/hyperscaled branch 2 times, most recently from c2d55af to 595b86c on April 2, 2026 11:07
@sli-tao sli-tao force-pushed the feat/hyperscaled branch 2 times, most recently from b077a4b to be3cb98 on April 3, 2026 08:15