…mpotency Protocol hardening: version tolerance + deterministic idempotency
…mpotency Protocol hardening: Phases B+C+D (version tolerance, idempotency, reliable delivery)
…on-idempotency Hardening: Phase B/C protocol versioning, idempotency & bug fixes
…on-idempotency Comprehensive hardening: P0/P1 bug fixes, thread safety, security
…on-idempotency fix: P2/P3 hardening across 13 modules
contribution_ratio was never synced from the ledger to hive_members, last_seen was only updated on connect/disconnect events, and addresses were never captured at join time. This fixes all three root causes and initializes presence tracking at join so uptime_pct accumulates.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The hive-status RPC only returned tier/joined_at/pubkey for our membership, so cl-revenue-ops revenue-hive-status showed null for these fields (Issue #36).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mat, determinism, dedup
- Bug 1 (Critical): calculate_our_balance now uses identical MemberContribution conversion as compute_settlement_plan (proper uptime normalization, int casting, rebalance_costs inclusion)
- Bug 2 (Critical): Period format standardized to YYYY-WW across routing_pool.py and rpc_commands.py (was YYYY-WNN, mismatched settlement format)
- Bug 3: settle_period atomicity check changed from `if ok is False` to `if not ok` to catch None/0 returns from record_pool_distribution
- Bug 4: generate_payments sort now includes peer_id tie-breaker for deterministic payment ordering, matching generate_payment_plan
- Bug 5: capital_score now reflects weighted_capacity instead of uptime_pct
- Bug 6: asyncio event loop in settlement_loop wrapped in try/finally to ensure loop.close() on exceptions
- Bug 8: Revenue deduplication by payment_hash (application-level check + UNIQUE constraint + index on pool_revenue table)
- Bug 9: Removed snapshot_contributions() side-effects from read-only paths (get_pool_status, calculate_distribution)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
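Bugs 4 and 8 can be illustrated with a minimal sketch. The table columns, the helper names `open_pool_db`, `record_revenue` and `order_payments`, and the sort key (amount descending, then peer_id) are assumptions made for illustration; only the `pool_revenue` table name, the `payment_hash` UNIQUE constraint and the peer_id tie-breaker come from the commit message.

```python
import sqlite3


def open_pool_db():
    """Create an in-memory database with a hypothetical pool_revenue schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pool_revenue ("
        " id INTEGER PRIMARY KEY,"
        " payment_hash TEXT UNIQUE,"  # UNIQUE also creates an implicit index
        " amount_sats INTEGER NOT NULL)"
    )
    return conn


def record_revenue(conn, payment_hash, amount_sats):
    """Insert a revenue row once; return False if the hash was already recorded."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO pool_revenue (payment_hash, amount_sats)"
        " VALUES (?, ?)",
        (payment_hash, amount_sats),
    )
    conn.commit()
    return cur.rowcount == 1


def order_payments(payments):
    """Sort by amount (largest first), breaking ties by peer_id so that
    every node derives the same ordering from the same inputs."""
    return sorted(payments, key=lambda p: (-p["amount_sats"], p["peer_id"]))
```

The constraint acts as a backstop for the application-level check: even if two code paths race past that check, the database stores at most one row per payment_hash.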
…ification, signed ACKs
CRITICAL:
- Add ban check to handle_hello/handle_attest (prevents ban evasion via rejoin)
- Add timestamp freshness checks to 23 message handlers with per-type age limits (GOSSIP 1hr, INTENT 10min, SETTLEMENT 24hr, INTELLIGENCE 2hr)
- 5-minute future clock skew tolerance
HIGH:
- Add cryptographic signature verification to 13 previously unsigned handlers (health_report, liquidity_need/snapshot, route_probe/batch, peer_reputation_snapshot, task_request/response, splice_init_request/response, splice_update/signed/abort)
- MSG_ACK now signed: create_msg_ack accepts rpc for signing, handle_msg_ack verifies signature (backward-compatible)
MODERATE:
- Increase relay dedup window from 300s to 3600s (covers freshness windows)
- Increase MAX_SEEN_MESSAGES from 10000 to 50000
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
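The freshness rule described in this commit can be sketched as follows. The age limits and the 5-minute skew come from the commit message; the function name `is_fresh`, the dict layout, and the choice to reject unknown message types are assumptions, not the plugin's actual code.

```python
import time

# Per-type maximum message age in seconds (values from the commit message).
MAX_AGE_SECONDS = {
    "GOSSIP": 3600,         # 1 hour
    "INTENT": 600,          # 10 minutes
    "SETTLEMENT": 86400,    # 24 hours
    "INTELLIGENCE": 7200,   # 2 hours
}
MAX_FUTURE_SKEW_SECONDS = 300  # tolerate clocks up to 5 minutes ahead


def is_fresh(msg_type, timestamp, now=None):
    """Return True if a message timestamp is inside its freshness window."""
    if now is None:
        now = time.time()
    max_age = MAX_AGE_SECONDS.get(msg_type)
    if max_age is None:
        return False  # assumption: fail closed on unknown message types
    if isinstance(timestamp, bool) or not isinstance(timestamp, (int, float)):
        return False  # malformed timestamp
    if timestamp > now + MAX_FUTURE_SKEW_SECONDS:
        return False  # too far in the future
    return now - timestamp <= max_age
```

Under this scheme the relay dedup window has to be at least as long as the freshness window of any message it covers, otherwise a replayed message could still pass the freshness check after its dedup entry has expired.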
CRITICAL: Replace 9 unsafe plugin.rpc calls with safe_plugin.rpc
- handle_expansion_nominate/elect/decline: checkmessage() and getinfo()
- hive_calculate_size: listchannels() and listfunds()
- hive_test_intent: getinfo()
- hive_test_pending_action: listchannels() and getinfo()
These bypassed the RPC_LOCK thread serialization, risking race conditions
when background threads make concurrent RPC calls to lightningd.
CRITICAL: Fix direct dict access on RPC results
- init(): getinfo()['id'] → getinfo().get('id', '') — could crash startup
- hive_test_intent: getinfo()['id'] → .get('id', '')
- hive_test_pending_action: getinfo()['id'] → .get('id', '')
- member_ids set comprehension: m['peer_id'] → m.get('peer_id', '')
HIGH: Wrap unprotected signmessage vote signing in try-except
- _propose_settlement_gaming_ban: vote signing had no error handling
- hive_propose_ban: vote signing had no error handling
Both could crash if signmessage RPC fails after proposal creation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
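A minimal sketch of the two patterns in this commit: routing every RPC call through one lock, and reading results with `.get()` instead of direct indexing. `LockedRpc`, `FakeRpc` and `node_id` are hypothetical names used for illustration; the plugin's real safe_plugin.rpc and RPC_LOCK are not shown in this log.

```python
import threading


class LockedRpc:
    """Hypothetical wrapper that serializes every RPC call through one lock."""

    def __init__(self, rpc, lock=None):
        self._rpc = rpc
        self._lock = lock or threading.Lock()

    def call(self, method, *args, **kwargs):
        # Only one thread at a time may talk to the underlying client.
        with self._lock:
            return getattr(self._rpc, method)(*args, **kwargs)


class FakeRpc:
    """Stand-in for the real RPC client, used only to exercise the sketch."""

    def __init__(self, result):
        self._result = result

    def getinfo(self):
        return self._result


def node_id(rpc):
    """Return our node id, or '' if the result is missing or malformed."""
    info = rpc.call("getinfo") or {}
    return info.get("id", "")
```

With direct indexing, `getinfo()['id']` raises KeyError on an unexpected response; the `.get('id', '')` form lets the caller decide how to handle an empty id.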
…safety
- strategic_positioning: fix AttributeError crashes (fleet_coverage, target_capacity_sats, value_score → correct attribute names)
- cooperative_expansion: fix TOCTOU in join_remote_round (atomic check-and-set), negative liquidity score (clamp to 0), deterministic election tie-breaker (peer_id), use-after-free in handle_decline (capture decline_count in local), state validation in handle_elect, prune unbounded _recent_opens/_target_cooldowns
- governance: add threading.Lock for failsafe budget TOCTOU race (atomic check-execute-update)
- settlement: cap remainder allocation to len(frac_order) preventing cyclic wrapping
- bridge: fix double record_failure() on timeout (subprocess.TimeoutExpired → TimeoutError chain)
- liquidity_coordinator: fix MCF assignment ID collision (include channel suffixes)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
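The settlement fix can be illustrated with a largest-remainder split. This is a sketch under assumptions: `split_sats` and its integer-weight input are hypothetical, and only the name `frac_order` and the idea of capping the remainder loop at its length come from the commit message.

```python
def split_sats(total_sats, weights):
    """Split total_sats across peers in proportion to non-negative integer
    weights, handing out leftover sats by largest fractional part."""
    weight_sum = sum(weights.values())
    if total_sats <= 0 or weight_sum <= 0:
        return {peer: 0 for peer in weights}

    # Integer floor of each peer's exact share.
    shares = {p: total_sats * w // weight_sum for p, w in weights.items()}

    # Peers ordered by fractional part (largest first), then peer id for determinism.
    frac_order = sorted(
        weights,
        key=lambda p: (-(total_sats * weights[p] % weight_sum), p),
    )

    remainder = total_sats - sum(shares.values())
    # Cap at len(frac_order): each peer receives at most one extra sat,
    # so the loop can never wrap around the list.
    for peer in frac_order[: min(remainder, len(frac_order))]:
        shares[peer] += 1
    return shares
```

With exact integer floors the remainder is always smaller than the number of peers, so the cap is a safeguard rather than a change in behavior.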
- Remove trustedcoin plugin (explorer-only Bitcoin backend)
- Add vitality plugin v0.4.5 for plugin health monitoring
- Update Docker image version to 2.2.7
- vitality auto-restarts failed plugins, improving production uptime
Ref: lightning-goats/cl-hive
…orrectness
P0 crashes fixed:
- channel_rationalization: _get_topology_snapshot() → get_topology_snapshot()
- network_metrics: same AttributeError crash on nonexistent private method
- fee_coordination: TypeError when TemporalPattern.hour_of_day/day_of_week is None
- task_manager: crash on None target/amount_sats in _execute_expand_task
P1 logic errors fixed:
- channel_rationalization: self.analyzer → self.rationalizer.redundancy_analyzer
- channel_rationalization: r.owner_id → r.owner_member, r.freed_capacity_sats → r.freed_capital_sats
- channel_rationalization: self.our_pubkey → self._our_pubkey
- fee_coordination: day_of_week == -1 → is None for pattern matching
- planner: listpeerchannels(target) → listpeerchannels(id=target)
- planner: guard for None return from create_intent before accessing .intent_id
- yield_metrics: net_revenue now subtracts total_cost (including open_cost) not just rebalance_cost
- routing_intelligence: int() wrap on float avg_capacity_sats to match type annotation
- mcf_solver: reverse edges now properly filtered via is_reverse flag instead of cost_ppm < 0
P2 edge cases fixed:
- mcf_solver: solution_valid false when no solution exists (was reporting true)
- peer_reputation: force_close_count uses max() not sum() across reporters
- peer_reputation: filter None from unique_reporters set
- network_metrics: use hive_connections not external topology for "not connected to"
- yield_metrics: clamp depletion_risk and saturation_risk to [0, 1.0]
- yield_metrics: init _remote_yield_metrics in __init__ instead of hasattr
- channel_rationalization: init _remote_coverage/_remote_close_proposals in __init__
- channel_rationalization: guard ZeroDivisionError on empty topology
- health_aggregator: round() instead of int() for health score truncation
- planner: clamp negative ratio in channel size calculation
- fee_coordination: min strength floor (0.1) for route markers preserving failure signal
- fee_intelligence: filter None from reporters list
- quality_scorer: Tuple[bool, str] type hint for Python 3.8 compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add threading.Lock to AdaptiveFeeController, StigmergicCoordinator, MyceliumDefenseSystem, TimeBasedFeeAdjuster, FeeCoordinationManager to protect shared state from concurrent modification
- Add threading.Lock to VPNTransportManager with snapshot-swap pattern for atomic reconfiguration and protected stats/peer state
- Route task_manager._execute_expand_task through governance engine instead of directly calling rpc.fundchannel (security: fail closed)
- Fix outbox retry: parse/serialize errors now fail permanently instead of retrying indefinitely with backoff
- Add cache bounds: cap _remote_pheromones (500 peers), _markers (1000 routes), _peer_stats (500 peers), _remote_yield_metrics (200 peers), _flow_history (500 channels)
- Add stale key eviction to rate limiters in peer_reputation, routing_intelligence, liquidity_coordinator, task_manager
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
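The outbox retry fix separates errors that a retry can cure from errors that it cannot. A minimal sketch, assuming a hypothetical `attempt_send` helper, a JSON payload stored under a `payload` key, and transport failures surfacing as OSError; none of these names are taken from the plugin itself.

```python
import json


def attempt_send(entry, send):
    """Try to deliver one outbox entry.

    Returns 'sent', 'retry' (transient failure, back off and try again)
    or 'failed' (permanent failure, never retry).
    """
    try:
        payload = json.loads(entry["payload"])
    except (KeyError, TypeError, ValueError):
        # A malformed entry will be malformed on every attempt, so
        # retrying with backoff would only loop forever.
        return "failed"
    try:
        send(payload)
    except OSError:
        # Assumption: transport problems (peer offline, socket errors)
        # are transient and worth retrying.
        return "retry"
    return "sent"
```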
…-revenue-ops
5 bugs fixed in the cooperative fee coordination flow:
- Non-salient fee changes now correctly revert to current_fee (was returning the modified fee even when salience filter said "not worth changing")
- pheromone_levels RPC now returns list under "pheromone_levels" key with field names matching cl-revenue-ops expectations (level, above_threshold)
- New hive-record-routing-outcome RPC for pheromone updates when source/destination are unavailable (fallback was calling read-only hive-pheromone-levels with invalid write params)
- Health multiplier comments corrected to match actual math ranges
These bugs combined meant the pheromone-based adaptive fee learning signal was completely non-functional: routing outcomes were never recorded as pheromone updates, and pheromone levels were unreadable by cl-revenue-ops.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ting, MCF
Critical fixes:
- CircularFlow.cycle → CircularFlow.members: AttributeError crash in get_shareable_circular_flows and get_all_circular_flow_alerts
- BFS fleet path finding used shared external peers as connectivity proxy instead of checking actual direct channels between members (phantom routes)
- LiquidityCoordinator._lock defined but never acquired, leaving all shared mutable state unprotected from concurrent access
Medium fixes:
- MCFCircuitBreaker not thread-safe (added threading.Lock)
- MCF get_total_demand only counted inbound needs, so fleets with only outbound needs never triggered optimization
- receive_mcf_assignment could exceed MAX_MCF_ASSIGNMENTS if cleanup didn't free space (now rejects)
- Empty string peers from failed channel lookups polluted circular flow detection graph
- to_us_msat not converted to int before comparison (Msat type safety)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… encapsulation
- create_mcf_ack_message() called with 4 extra args (TypeError on every ACK)
- create_mcf_completion_message() called with 7 extra args (TypeError on every completion)
- ctx.state_manager AttributeError in rebalance_hubs/rebalance_path (safe getattr)
- execute_hive_circular_rebalance missing permission check for fund movements
- get_mcf_optimized_path ignoring to_channel parameter (wrong assignment match)
- _check_stuck_mcf_assignments reaching into private dict (encapsulated with lock)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…defensive copies
State Manager:
- _validate_state_entry() no longer silently mutates input dict (available > capacity now rejected)
- update_peer_state() makes defensive copies of fee_policy, topology, capabilities
- Caps available_sats at capacity_sats in update_peer_state()
- load_from_database() and _load_state_from_db() now use from_dict() for consistent field handling
Planner:
- Added missing feerate gate to _propose_expansion() (documented but never implemented)
- Fixed cfg.market_share_cap_pct crash → getattr(cfg, 'market_share_cap_pct', 0.20)
- Fixed cfg.governance_mode crash → getattr(cfg, 'governance_mode', 'advisor')
Gossip:
- Added timestamp freshness check: rejects messages >1hr old or >5min in future
23 new tests, 1225 total passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…te_entry()
Prevents unbounded arrays, non-string entries, and oversized capability strings from being accepted via gossip or FULL_SYNC messages.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gnments()
Added get_all_assignments() method to LiquidityCoordinator and updated the mcf_assignments RPC to use it instead of reaching into private dict.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add vitality-amboss=true to docker-entrypoint.sh config generation
- Add vitality-watch-channels=true for channel health monitoring
- Add vitality-expiring-htlcs=50 for HTLC expiry warnings
- Update Dockerfile comment to document Amboss integration
…AttributeError
Critical fixes across 5 modules:
- mcf_solver: MCFCircuitBreaker.get_status() race condition: can_execute() called outside lock returned stale value; refactored to _can_execute_unlocked() called atomically within lock
- liquidity_coordinator: 8 thread safety fixes: missing locks on get_status(), get_pending_mcf_assignments(), get_mcf_assignment(), update_mcf_assignment_status(), create_mcf_ack_message(), create_mcf_completion_message(), get_mcf_status(); deadlock fix (non-reentrant lock + nested call); new claim_pending_assignment() atomic method to prevent TOCTOU double-claim race
- cl-hive.py: _send_mcf_ack() TypeError (create_mcf_ack_message() takes no params but was called with 4 positional args); sendcustommsg keyword args fix; broadcast_intent_abort NameError (plugin → safe_plugin); missing coordinator check in handle_mcf_completion_report; TOCTOU claim race replaced with atomic claim_pending_assignment()
- cost_reduction: CircularFlow AttributeError (cf.members_count → cf.cycle_count); hub scoring division-by-zero guard; record_mcf_ack() thread safety with dedicated lock and proper __init__ initialization
- intent_manager: get_intent_stats() race (_remote_intents read without lock)
25 new tests covering all fixes including concurrent access verification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
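The double-claim race arises when one thread checks that an assignment is pending and another thread claims it before the first one updates the status. A minimal sketch of an atomic claim, assuming a hypothetical `AssignmentStore` with string statuses; only the method name claim_pending_assignment() comes from the commit message.

```python
import threading


class AssignmentStore:
    """Hypothetical store showing an atomic check-and-claim."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def add(self, assignment_id):
        with self._lock:
            self._status[assignment_id] = "pending"

    def claim_pending_assignment(self, assignment_id):
        """Move an assignment from 'pending' to 'executing'.

        The check and the update happen under the same lock, so only
        one caller can ever see True for a given assignment.
        """
        with self._lock:
            if self._status.get(assignment_id) != "pending":
                return False
            self._status[assignment_id] = "executing"
            return True
```

Because threading.Lock is not reentrant, a method that already holds `_lock` must not call another method that acquires it again; that nested-call pattern is the kind of deadlock this commit describes fixing.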
- H-1: Move listfunds() RPC call outside _channel_peer_cache_lock
- H-2: Add _rate_lock to protect _probe_rate/_batch_rate dicts in routing_intelligence
- M-6: Log full traceback in message dispatcher exception handler
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
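H-1 follows a common pattern: perform the slow call without holding the lock, then take the lock only to swap in the result. The sketch below assumes a hypothetical `ChannelPeerCache` class whose `fetch_funds` callable stands in for the listfunds RPC; the real cache structure is not shown in this log.

```python
import threading


class ChannelPeerCache:
    """Hypothetical cache mapping short_channel_id to peer_id."""

    def __init__(self, fetch_funds):
        self._fetch_funds = fetch_funds  # stand-in for the listfunds RPC
        self._lock = threading.Lock()
        self._peers = {}

    def refresh(self):
        # Slow RPC happens first, with no lock held, so readers are
        # never blocked while waiting on lightningd.
        funds = self._fetch_funds() or {}
        fresh = {
            ch["short_channel_id"]: ch["peer_id"]
            for ch in funds.get("channels", [])
            if ch.get("short_channel_id") and ch.get("peer_id")
        }
        # The lock is held only for the cheap reference swap.
        with self._lock:
            self._peers = fresh

    def peer_for(self, short_channel_id):
        with self._lock:
            return self._peers.get(short_channel_id)
```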
D2: cleanup_expired_intents() performs UPDATE + DELETE without a transaction, risking orphaned records on crash. Wrap in BEGIN IMMEDIATE.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
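A minimal sketch of the transactional cleanup using Python's sqlite3 module. The table schema, the status values, the `RETENTION_SECONDS` constant and the `open_db` helper are assumptions for illustration; only the function name and the UPDATE + DELETE inside BEGIN IMMEDIATE come from the commit message.

```python
import sqlite3

RETENTION_SECONDS = 3600  # hypothetical: how long expired rows are kept


def open_db(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN IMMEDIATE / COMMIT below are the only transaction control.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS intents ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " expires_at INTEGER NOT NULL)"
    )
    return conn


def cleanup_expired_intents(conn, now):
    """Expire overdue intents and purge old expired ones atomically."""
    # BEGIN IMMEDIATE takes the write lock up front, so both statements
    # either commit together or not at all.
    conn.execute("BEGIN IMMEDIATE")
    try:
        updated = conn.execute(
            "UPDATE intents SET status = 'expired'"
            " WHERE status = 'pending' AND expires_at < ?",
            (now,),
        ).rowcount
        deleted = conn.execute(
            "DELETE FROM intents WHERE status = 'expired' AND expires_at < ?",
            (now - RETENTION_SECONDS,),
        ).rowcount
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return updated, deleted
```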
The eviction used min-by-timestamp which is unpredictable under clock skew or same-second updates. Use dict insertion order (FIFO) which correctly evicts the oldest tracked peer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
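Python dicts preserve insertion order (guaranteed since Python 3.7), which is what makes FIFO eviction possible without consulting timestamps. A minimal sketch, assuming a hypothetical `BoundedPeerTracker` class; the real tracker and its size limit are not shown in this log.

```python
class BoundedPeerTracker:
    """Hypothetical tracker that evicts the first-inserted peer when full."""

    def __init__(self, max_peers):
        if max_peers < 1:
            raise ValueError("max_peers must be at least 1")
        self._max_peers = max_peers
        self._peers = {}

    def track(self, peer_id, value):
        if peer_id not in self._peers and len(self._peers) >= self._max_peers:
            # The first key yielded by iteration is the oldest insertion.
            oldest = next(iter(self._peers))
            del self._peers[oldest]
        self._peers[peer_id] = value

    def peers(self):
        return list(self._peers)
```

Updating an existing key does not move it to the end, so this is strict FIFO by first insertion rather than least-recently-updated.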
… more
HIGH: Fee broadcast tracking no longer resets on non-broadcast path, 2-member hive quorum no longer requires impossible 3 votes, handle_welcome no longer trusts remote tier or adds peer as full member, channel existence check fails closed on RPC error, hive-close-channel uses correct param name.
MEDIUM: Safe amount_msat parsing, max_channel_sats enforcement, market_share lower bound, contribution ledger TOCTOU, shutdown ordering, dead config annotation, signed bootstrap promotion vouch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>