Skip to content

WIP: Add Hyperliquid WebSocket fill integration for HL miners#674

Open
trdougherty wants to merge 1 commit intomainfrom
feat/hl_integration
Open

WIP: Add Hyperliquid WebSocket fill integration for HL miners#674
trdougherty wants to merge 1 commit intomainfrom
feat/hl_integration

Conversation

@trdougherty
Copy link
Copy Markdown
Collaborator

DO NOT MERGE YET (looking at you gordon and sam) this is just a template of changes.

Validators subscribe to Hyperliquid userFills for registered HL miners, receive fills in real-time, and inject them as orders using HL execution prices (skipping Polygon/Tiingo price fetching). HL miners are blocked from submitting signals via REST/axon. Includes WebSocket miner connection pool infrastructure.

Taoshi Pull Request

Description

[Provide a brief description of the changes introduced by this pull request.]

Related Issues (JIRA)

[Reference any related issues or tasks that this pull request addresses or closes.]

Checklist

  • I have tested my changes on testnet.
  • I have updated any necessary documentation.
  • I have added unit tests for my changes (if applicable).
  • If there are breaking changes for validators, I have (or will) notify the community in Discord of the release.

Reviewer Instructions

[Provide any specific instructions or areas you would like the reviewer to focus on.]

Definition of Done

  • Code has been reviewed.
  • All checks and tests pass.
  • Documentation is up to date.
  • Approved by at least one reviewer.

Checklist (for the reviewer)

  • Code follows project conventions.
  • Code is well-documented.
  • Changes are necessary and align with the project's goals.
  • No breaking changes introduced.

Optional: Deploy Notes

[Any instructions or notes related to deployment, if applicable.]

/cc @mention_reviewer

Validators subscribe to Hyperliquid userFills for registered HL miners,
receive fills in real-time, and inject them as orders using HL execution
prices (skipping Polygon/Tiingo price fetching). HL miners are blocked
from submitting signals via REST/axon. Includes WebSocket miner
connection pool infrastructure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🤖 Claude AI Code Review

Last reviewed on: 13:42:17

Summary

This PR introduces a Hyperliquid WebSocket integration that allows validators to receive real-time trade fills from registered HL miners, bypassing traditional REST signal submission. The implementation includes a WebSocket connection pool infrastructure, fill translation layer, and integration with the existing order processing pipeline.


✅ Strengths

  • Good separation of concerns: The architecture splits responsibilities cleanly (websocket client, fill translator, service orchestrator)
  • Reconnection logic: Built-in exponential backoff with jitter in WebSocketConnectionTemplate
  • Deduplication handling: Uses _seen_fills set to prevent duplicate fill processing
  • Thread-safe operations: Proper use of locks in AssetSelectionManager for HL miner registry
  • Immutability by design: HL wallet registration is immutable once set, preventing gaming

⚠️ Concerns

CRITICAL ISSUES

  1. Multiple Inheritance Anti-Pattern (vali_objects/position_management/position_manager_client.py:67-74)

    class PositionManagerClient(RPCClientBase, WebSocketMinerConnectionPool):
    • This creates a confusing class hierarchy where PositionManagerClient IS-A WebSocketMinerConnectionPool, which doesn't make semantic sense
    • The WebSocket pool is never initialized properly (no connection_builder passed)
    • Methods like get_positions_for_hotkeys try to call self._websocket_manager.get_positions_for_hotkeys() but there's no _websocket_manager attribute
    • FIX: Use composition instead: self._websocket_manager = WebSocketMinerConnectionPool(...)
  2. Unbounded Memory Growth (vali_objects/utils/hyperliquid/hl_fill_service.py:43-44)

    self._seen_fills: set = set()
    • No cleanup or size limit on _seen_fills set
    • Will grow indefinitely with each unique fill, causing memory leak over time
    • FIX: Use an LRU cache or time-based eviction (e.g., remove entries older than 24h)
  3. Race Condition in WebSocket Startup (mining/run_receive_signals_server.py:323-332)

    if __name__ == "__main__":
        waitress.serve(app, host="0.0.0.0", port=8088, connection_limit=1000)
        print('Successfully started run_receive_signals_server.')
        
        # We may also want to set up the websockets ecosystem here
        websocket_server = WebSocketServer(...)
    • Code after waitress.serve() is unreachable because serve() is blocking
    • WebSocket server will never be instantiated
    • FIX: Start WebSocket server in a separate thread before calling serve(), or use async ASGI server
  4. Missing Error Handling in Critical Path (vali_objects/utils/hyperliquid/hl_fill_service.py:162-179)

    err_msg, position, order = self._mom._process_market_order(...)
    • Direct access to private method _process_market_order from external class
    • No handling of potential exceptions that could crash the executor thread
    • FIX: Use public API and wrap in comprehensive try-except
  5. Incomplete WebSocket Pool Integration (shared_objects/websockets/miner_management.py:150-158)

    async def combined_on_message(ws, msg: str | bytes) -> None:
        await on_message(ws, msg)
        if original_on_message is not None:
            await original_on_message(ws, msg)
    • Monkey-patching internal attributes (conn._on_message) is fragile
    • Will break if WebSocketConnectionTemplate internal API changes
    • FIX: Add proper callback chaining support to WebSocketConnectionTemplate

MAJOR ISSUES

  1. Security: No Authentication/Validation (vali_objects/utils/hyperliquid/hl_websocket_client.py:44-50)

    async with websockets.connect(uri, ping_interval=20, ping_timeout=10) as ws:
        await ws.send(subscribe_msg)
    • No verification that fills actually belong to the claimed wallet address
    • Hyperliquid WS is public; need to validate signatures or use API keys
    • RISK: Malicious actors could potentially inject fake fills
  2. Missing Transaction Validation (vali_objects/utils/hyperliquid/hl_fill_translator.py:115)

    • No validation that HL fills match expected trade sizes, prices within reasonable bounds
    • No checks for dust trades or manipulation attempts
    • FIX: Add validation rules (min trade size, price deviation limits, etc.)
  3. Resource Leak in ThreadPoolExecutor (vali_objects/utils/hyperliquid/hl_fill_service.py:199)

    self._executor.shutdown(wait=False)
    • wait=False may leave threads running after shutdown
    • Could cause issues on validator restart
    • FIX: Use wait=True with timeout, or use concurrent.futures.as_completed()
  4. No Rate Limiting (vali_objects/utils/hyperliquid/hl_fill_service.py:154)

    • No protection against fill spam from a compromised HL wallet
    • Could overwhelm the order processing pipeline
    • FIX: Add per-miner rate limiting (e.g., max N fills per minute)
  5. Configuration Hardcoding (vali_objects/vali_config.py:243)

    HL_MAX_UNIQUE_USERS = 10
    • Hard limit with no runtime override or explanation
    • What happens when limit is exceeded? (no clear handling in reconcile logic)
    • FIX: Document the rationale and add graceful handling

MODERATE ISSUES

  1. Incomplete Type Annotations (shared_objects/websockets/base.py:82-90)

    on_message: Optional[Callable[[WebSocketClientProtocol, str | bytes], Awaitable[None]]] = None,
    • Using str | bytes union syntax requires Python 3.10+
    • Should use Union[str, bytes] for compatibility or set minimum Python version
    • Missing return type annotations in several places
  2. Missing Test Coverage Indicators

    • No unit tests included in this PR
    • Checklist item unchecked: "I have added unit tests for my changes"
    • Complex async code needs thorough testing
    • FIX: Add tests for fill parsing, deduplication, connection pool, error scenarios
  3. Logging Inconsistency

    • Mix of bt.logging and standard logger
    • Some debug logs use .info(), others use .debug()
    • FIX: Standardize on one logging approach with consistent levels
  4. No Metrics/Observability (vali_objects/utils/hyperliquid/hl_fill_service.py)

    • No counters for fills processed, errors, reconnections
    • No latency tracking from HL fill timestamp to processing
    • FIX: Add prometheus metrics or structured logging for monitoring
  5. Missing Graceful Degradation (neurons/validator.py:429-443)

    if self.asset_selection_client.is_hl_miner(sender_hotkey):
        # Block immediately
    • If HL WebSocket is down, miners are completely blocked
    • No fallback mechanism
    • FIX: Add health check and fallback to REST if WS fails

💡 Suggestions

Architecture Improvements

  1. Extract Fill Processing Interface

    class FillProcessor(ABC):
        @abstractmethod
        async def process_fill(self, miner_hotkey: str, fill: Fill) -> ProcessResult:
            pass
    • Makes testing easier and allows for different fill sources in future
  2. Add Circuit Breaker Pattern

    • Implement circuit breaker for HL WebSocket connections
    • If connection fails N times in M minutes, temporarily disable and alert
  3. Use Message Queue for Fill Processing

    • Instead of direct thread pool submission, use a proper message queue (e.g., asyncio.Queue)
    • Allows for backpressure handling and better error recovery
  4. Configuration Validation

    def __post_init__(self):
        if self.HL_MAX_UNIQUE_USERS < 1:
            raise ValueError("HL_MAX_UNIQUE_USERS must be >= 1")

Code Quality Improvements

  1. Add Type Aliases for Clarity (shared_objects/websockets/base.py)

    MessageHandler = Callable[[WebSocketClientProtocol, Union[str, bytes]], Awaitable[None]]
  2. Extract Magic Numbers (vali_objects/utils/hyperliquid/hl_fill_service.py:75)

    RECONCILE_INTERVAL_SECONDS = 60
    RECONNECT_DELAY_SECONDS = 5
  3. Add Docstring Examples

    """
    Examples:
        >>> fill = HyperliquidFill.from_ws_data({
        ...     "coin": "BTC", "px": "50000.0", ...
        ... })
        >>> fill.trade_pair
        TradePair.BTCUSD
    """
  4. Use Structured Logging (vali_objects/utils/hyperliquid/hl_fill_service.py:164)

    bt.logging.info(
        "[HL_FILL_SVC] Processing fill",
        extra={
            "miner": miner_hotkey[:10],
            "direction": fill.dir,
            "size": fill.sz,
            "coin": fill.coin,
            "price": fill.price
        }
    )

Performance Optimizations

  1. Batch Fill Processing

    • Instead of processing fills one-by-one, batch multiple fills per miner
    • Reduces lock contention and database writes
  2. Lazy Connection Creation (vali_objects/utils/hyperliquid/hl_fill_service.py:90)

    • Don't create connections for inactive miners
    • Add activity tracking and close connections after N minutes of inactivity
  3. Use asyncio.gather for Parallel Operations

    await asyncio.gather(
        *[self._teardown_connection(hk) for hk in to_remove],
        return_exceptions=True
    )

Testing Recommendations

  1. Add Integration Tests

    • Mock Hyperliquid WebSocket responses
    • Test fill parsing with various edge cases
    • Test reconnection logic
    • Test deduplication across restarts
  2. Add Chaos Engineering Tests

    • Random connection drops
    • Malformed messages
    • Out-of-order fills
    • Duplicate fills with different timestamps

🔒 Security Notes

High Priority

  1. Signature Verification Missing

    • HL fills should include cryptographic signatures from the wallet
    • Validator must verify signatures before processing
    • Otherwise, anyone can subscribe to public WS and replay fills
  2. Input Validation Required

    • Validate all numeric fields (price, size) for reasonable bounds
    • Check for NaN, infinity, negative values
    • Prevent integer overflow in quantity calculations
  3. DoS Protection Needed

    • No rate limiting on fill processing
    • Malicious miner could flood with fills
    • Implement: Token bucket rate limiter per miner
  4. Secrets Management

    • If API keys are added later, ensure they're not logged
    • Use environment variables or secure key management

Medium Priority

  1. Error Message Information Disclosure (neurons/validator.py:437-438)

    msg = (f"Hyperliquid miner {sender_hotkey} cannot submit signals via REST. "
    • Reveals internal architecture details
    • Consider generic error message: "Signal submission not allowed for this miner type"
  2. Unbounded Queue Sizes

    • self._inbox: asyncio.Queue[MinerWebSocketMessage] = asyncio.Queue(maxsize=inbox_maxsize)
    • Default inbox_maxsize=0 means unbounded
    • Could cause OOM if fills arrive faster than processing
    • Set reasonable default: inbox_maxsize=10000
  3. Data Persistence Security (vali_objects/utils/asset_selection/asset_selection_manager.py:159)

    • hl_miners.json contains wallet addresses (PII)
    • Ensure file permissions are restrictive (600)
    • Consider encrypting at rest

📋 Additional Checklist Items

  • Add migration guide for existing validators
  • Document HL miner registration process for miners
  • Add monitoring dashboard recommendations
  • Create runbook for common issues (WS disconnects, fill delays, etc.)
  • Add feature flag to enable/disable HL integration
  • Version compatibility check for Python 3.10+ features
  • Load testing results with 10 concurrent HL miners
  • Fallback mechanism if HL WebSocket API is deprecated/changed

🎯 Blocking Issues (Must Fix Before Merge)

  1. ❌ Fix unreachable WebSocket server initialization in run_receive_signals_server.py
  2. ❌ Remove multiple inheritance anti-pattern in PositionManagerClient
  3. ❌ Add bounded memory for _seen_fills deduplication set
  4. ❌ Implement fill signature verification or alternative authentication
  5. ❌ Add unit tests for critical paths (fill parsing, deduplication, error handling)
  6. ❌ Document the breaking change and migration steps for validators

📝 Final Thoughts

This is a solid architectural foundation for HL integration, but needs refinement before production deployment. The WebSocket infrastructure is well-designed with good reconnection logic, but the integration points have several critical issues that could cause runtime failures or security vulnerabilities.

Primary concerns: Memory leaks, unreachable code, security validation gaps, and insufficient error handling. These must be addressed before moving forward.

Estimated effort to production-ready: 2-3 days of focused work on the blocking issues above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant