From bdd3abd55ab8ab8926c4a086ac632a16e571f4d4 Mon Sep 17 00:00:00 2001 From: Dmitry Perchanov Date: Tue, 5 May 2026 13:26:51 +0200 Subject: [PATCH] add kip-0023: DNS-resolved peer addresses Adds hostname support to kaspad's `--addpeer` / `--connect` CLI flags and the `AddPeer` RPC, mirroring Bitcoin Core's tolerant `addnode` semantics. Hostnames are resolved at dial time, tolerated when DNS fails, and re-resolved periodically so a running node tracks DNS changes for the lifetime of the configured peer. Reference implementation under review at kaspanet/rusty-kaspa#988. Signed-off-by: Dmitry Perchanov --- README.md | 3 +- kip-0023.md | 292 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 294 insertions(+), 1 deletion(-) create mode 100644 kip-0023.md diff --git a/README.md b/README.md index bf6de03..3fae149 100644 --- a/README.md +++ b/README.md @@ -14,4 +14,5 @@ Kaspa Improvement Proposals (KIPs) describe standard proposals for the Kaspa net | [10](kip-0010.md) | Consensus, Script Engine | New Transaction Opcodes for Enhanced Script Functionality | Maxim Biryukov, Ori Newman | Active | | [13](kip-0013.md) | Consensus | Transient Storage Handling | Michael Sutton, coderofstuff | Active | | [14](kip-0014.md) | Consensus | The Crescendo Hardfork | Michael Sutton | Active | -| [15](kip-0015.md) | Consensus | Canonical Transaction Ordering and Sequencing Commitments | Mike Zak, Ro Ma | Active | \ No newline at end of file +| [15](kip-0015.md) | Consensus | Canonical Transaction Ordering and Sequencing Commitments | Mike Zak, Ro Ma | Active | +| [23](kip-0023.md) | Node, API/RPC | DNS-resolved peer addresses | Dmitry Perchanov | Proposed | \ No newline at end of file diff --git a/kip-0023.md b/kip-0023.md new file mode 100644 index 0000000..7be3080 --- /dev/null +++ b/kip-0023.md @@ -0,0 +1,292 @@ +``` +KIP: 23 +Layer: Node, API/RPC +Title: DNS-resolved peer addresses +Type: Node behavior, wire format extension (backward-compatible) +Author: Dmitry 
Perchanov +Comments-URI: https://github.com/kaspanet/rusty-kaspa/issues/947 +created: 2026-05-03 +updated: 2026-05-06 +Status: Proposed +``` + +## Abstract + +This KIP extends `kaspad`'s `--addpeer` / `--connect` flags, the kaspa-cli interactive `addpeer` command, and the RPC `AddPeer` method to accept hostnames in addition to numeric IP addresses. Hostnames are resolved at dial time (not parse time), re-resolved on dial failure, and re-resolved periodically on a configurable interval, mirroring Bitcoin Core's `-addnode` semantics. Three new metrics track resolution outcomes and active hostname state via `kaspa-cli getmetrics`. Wire compatibility for the wRPC `AddPeerRequest` borsh frame is preserved via a `u16` version dispatch (v1 IP-only ↔ v2 string-form). The behavioral contract is **tolerant**: an unresolvable hostname does not abort `kaspad` startup and does not return an error to RPC `AddPeer` callers; the endpoint is registered and retried periodically until it resolves. + +## Motivation + +The pre-KIP behavior of `kaspad --addpeer=...` accepts only numeric IP addresses, which is impractical in the increasingly common case where the peer is named by DNS — Kubernetes service DNS (`*.svc.cluster.local`), Docker Compose service names, dynamic DNS for home operators, or simple operator hostnames where the IP rotates on container reschedule (motivating GitHub issue: kaspanet/rusty-kaspa#947). + +Bitcoin Core has supported hostname `-addnode` since at least 0.10, with mature tolerance semantics: `bitcoind` does not abort on an unresolvable hostname at startup and does not return an error to the `addnode` RPC caller; the dial loop retries every 60 seconds indefinitely. 
This KIP brings `kaspad` to the same operator UX, taking Bitcoin Core's `addnode` behavior as the parity reference where it is correct (and it is: operator visibility for unresolvable peers is preserved via metrics + `getconnectedpeerinfo` / `getaddednodeinfo`-equivalent surfaces, without coupling startup health to current DNS state). + +The change also unblocks operational patterns that today require a second-tier proxy (DNS resolution outside the node, IP injected via init script): k8s `StatefulSet` pod-to-pod peering, Compose-network peering, and ephemeral testnet topologies. + +## Specification + +### S1. Public type — `kaspa_utils::networking::PeerEndpoint` + +A new enum representing either a parsed IP/socket address or a hostname endpoint: + +```rust +#[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize, BorshSerialize, BorshDeserialize)] +pub enum PeerEndpoint { + Address(ContextualNetAddress), + Hostname { host: String, port: Option<u16> }, +} +``` + +`PeerEndpoint::parse(s: &str)` performs textual validation only and never performs DNS lookups. Hostname validation is RFC 1123 strict: ASCII letter-digit-hyphen, label ≤ 63 chars, total ≤ 253 chars, no leading/trailing hyphens, no consecutive dots, no underscores. Garbage input returns `PeerEndpointParseError` synchronously; this is the only failure path that is **not** tolerant (typo at parse time vs. typo at DNS time should be disambiguable to the operator). + +`PeerEndpoint::resolve(default_port: u16) -> Result<Vec<NetAddress>, PeerEndpointResolveError>` is async and uses `tokio::net::lookup_host` wrapped in a 5-second `tokio::time::timeout`. The `Address` variant resolves trivially to its single inner `NetAddress`; the `Hostname` variant returns one `NetAddress` per resolved A/AAAA record. + +`Display`, `FromStr`, `TryFrom<&str>`, and `TryFrom<String>` are implemented to mirror `ContextualNetAddress`. `BorshSerialize`/`BorshDeserialize` are derived for wire use. + +### S2. 
Connection-manager state — `kaspa_connectionmanager::ConnectionManager` + +New state types and fields: + +```rust +struct HostnameRequest { + host: Arc<str>, // hostname (canonical key in the registry) + port: u16, // p2p port + is_permanent: bool, + last_resolved: HashSet<SocketAddr>, + last_refresh: Option<Instant>, // None == "refresh ASAP" sentinel set by mark_stale + stale_reason: Option<StaleReason>, // companion to last_refresh: which trigger flagged the entry + refresh_failures: u32, +} + +#[non_exhaustive] +pub enum StaleReason { + InitialRetry, // entry registered with empty / errored initial resolve + DialFailure, // dial against a previously-resolved socket failed + PeriodicEmpty, // periodic re-resolve returned Ok(empty); fast-retry path +} + +pub struct ConnectionManager { + // existing fields… + hostname_state: TokioMutex<HostnameRegistry>, // hostname_requests + companion bookkeeping + hostname_refresh_interval: Duration, // 0 disables periodic refresh + resolver: Arc<dyn HostnameResolver>, // test seam +} +``` + +The existing `connection_requests: HashMap<SocketAddr, ConnectionRequest>` is augmented with a `hostname_origin: Option<Arc<str>>` field on each `ConnectionRequest` so that the dial-failure path can mark the corresponding `HostnameRequest` stale and force a re-resolve on the next refresh tick. Stale entries are flagged via `HostnameRegistry::mark_stale(host, reason)`, which atomically resets `last_refresh` to `None` and stamps `stale_reason = Some(reason)` (the atomic-pair-write invariant is verified by the registry-level `apply_refresh_results` race-detection equality check; see Acceptance Tests §`kaspa-connectionmanager` for the 4-quadrant `(snapshot, current)` matrix and the `Ok(empty)` outcome-axis extension). 
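
The atomic pair write and the equality-based race check described above can be illustrated with a minimal, standard-library-only sketch. It is not the reference implementation: `Registry`, `Entry`, `Pair`, `snapshot`, and `apply_refresh` are hypothetical names introduced here for illustration, and the real `HostnameRegistry` sits behind an async mutex and tracks more fields. The sketch assumes a refresh pass snapshots the `(last_refresh, stale_reason)` pair before calling the resolver and only advances the entry if the pair is unchanged afterwards.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Why an entry was flagged for an early re-resolve (variant names taken from the KIP).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StaleReason {
    InitialRetry,
    DialFailure,
    PeriodicEmpty,
}

/// The two fields that must always be written together.
pub type Pair = (Option<Instant>, Option<StaleReason>);

/// Hypothetical, trimmed-down registry entry (illustration only).
#[derive(Debug)]
pub struct Entry {
    pub last_refresh: Option<Instant>,
    pub stale_reason: Option<StaleReason>,
}

/// Hypothetical registry keyed by hostname (illustration only).
#[derive(Debug, Default)]
pub struct Registry {
    entries: HashMap<String, Entry>,
}

impl Registry {
    /// Registers a hostname whose initial resolve succeeded at `now`.
    pub fn upsert(&mut self, host: &str, now: Instant) {
        self.entries
            .insert(host.to_string(), Entry { last_refresh: Some(now), stale_reason: None });
    }

    /// Writes both halves of the pair in one step; returns false for unknown hosts.
    pub fn mark_stale(&mut self, host: &str, reason: StaleReason) -> bool {
        match self.entries.get_mut(host) {
            Some(entry) => {
                entry.last_refresh = None;
                entry.stale_reason = Some(reason);
                true
            }
            None => false,
        }
    }

    /// Copy of the pair, taken before the (slow) resolver call.
    pub fn snapshot(&self, host: &str) -> Option<Pair> {
        self.entries.get(host).map(|e| (e.last_refresh, e.stale_reason))
    }

    /// Applies a successful refresh only if the pair still equals the snapshot.
    /// A concurrent `mark_stale` changes the pair, so its stamp is preserved.
    pub fn apply_refresh(&mut self, host: &str, snapshot: Pair, now: Instant) -> bool {
        match self.entries.get_mut(host) {
            Some(entry) if (entry.last_refresh, entry.stale_reason) == snapshot => {
                entry.last_refresh = Some(now);
                entry.stale_reason = None;
                true
            }
            _ => false,
        }
    }
}
```

In this sketch a refresh whose snapshot was invalidated by an interleaved `mark_stale` is simply dropped, so the entry stays eligible for the next tick. Whether the reference implementation drops or partially applies such a result is not stated in this section.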
+ +A new public API `add_endpoint_request` replaces / wraps the existing IP-only `add_connection_request` for callers that may pass a hostname: + +```rust +impl ConnectionManager { + pub async fn add_endpoint_request( + &self, + endpoint: PeerEndpoint, + is_permanent: bool, + default_port: u16, + ); +} +``` + +The function is **infallible** by design — its return type is the unit value. The well-formed-but-unresolvable Hostname path registers the entry in `hostname_requests`, emits a `warn!` log, and increments the `peer_hostname_resolutions_total{status="failed",trigger="initial"}` counter. The resolver `Err` and `Ok(empty)` outcomes are treated identically — both mean "no IPs to dial right now"; both register and queue for retry. There is no error return: the infallibility of `add_endpoint_request` is what makes Bitcoin-parity tolerance load-bearing at the RPC and CLI surfaces (an RPC `AddPeer` caller cannot receive a resolution-failure error because the underlying registration cannot fail). Address-variant payloads short-circuit through the existing `add_connection_request` path with no semantic change. + +A new `start_hostname_refresh_loop` spawns a background tokio task that ticks every `hostname_refresh_interval` (default 600 s; `0` disables periodic refresh, leaving only dial-failure-triggered re-resolution). Each tick walks `hostname_requests`, calls the resolver, and reconciles each hostname's resolved-IP set into `connection_requests`: +- New IP appears → insert with `hostname_origin = Some(host)`; reset attempts. +- IP disappears → remove from `connection_requests`; existing connection (if any) is **not** terminated (LEAVE-UP policy; matches Bitcoin Core). +- IP unchanged → no mutation (idempotent reconciliation). + +### S3. Wire format — wRPC `AddPeerRequest` borsh dispatch + +The borsh payload of `AddPeerRequest` carries a leading `u16` version tag. v1 (existing) encodes `RpcContextualPeerAddress` (IP-only). 
v2 encodes the canonical `Display` string of `RpcPeerEndpoint` (a type alias for `PeerEndpoint`): + +```text +v1 (existing — server MUST keep deserializing): + u16 version=1 + RpcContextualPeerAddress peer_address # ContextualNetAddress wire form + bool is_permanent + +v2 (new): + u16 version=2 + String endpoint_str # PeerEndpoint::Display form + bool is_permanent +``` + +**Client-side emission asymmetry (load-bearing for rolling upgrades).** A client built from the post-KIP branch emits v1 byte-identically for `Address`-variant payloads (numeric IP literals) so that a v1-only server still in a mixed-version cluster decodes them unchanged; the same client emits v2 only for `Hostname`-variant payloads (a hostname-using client requires a post-KIP server anyway because the server owns the resolution loop, so the asymmetry is the documented contract). + +Server deserializer dispatches on the version tag: +- v1 → reconstruct `RpcPeerEndpoint::Address(ContextualNetAddress)` (no semantic loss for IP-only callers). +- v2 → `RpcPeerEndpoint::from_str(endpoint_str)`. + +The gRPC wire (`AddPeerRequestMessage.address` is already `string`) needs no schema change — inbound conversion calls `RpcPeerEndpoint::from_str` instead of the prior `RpcContextualPeerAddress::from_str`. + +### S4. RPC behavior — `AddPeer` + +Pre-KIP: returns `RpcError::PeerHostResolutionFailed` if the hostname does not resolve at the moment of the call. Post-KIP (Bitcoin Core parity): returns `Ok(AddPeerResponse {})` for any well-formed `peer_address`, including hostnames that currently do not resolve. The endpoint is registered in `hostname_requests` and retried periodically. + +Parse-failure remains synchronous-fast: a malformed hostname (per RFC 1123 strict validation) returns `RpcError::InvalidPeerEndpoint` to the caller. This disambiguates "typo I want to know about now" (parse failure) from "DNS not currently working" (resolution failure that is naturally transient). 
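
The synchronous parse-failure path can be sketched as a pure validator that never touches DNS. This is an illustrative approximation and not the reference validator: the function name `validate_hostname` and the `HostnameError` enum are hypothetical. The rules come from S1 (ASCII letter-digit-hyphen, label ≤ 63, total ≤ 253, no leading or trailing hyphens, no consecutive dots, no underscores). The handling of a single trailing dot, lowercasing, and an all-numeric rightmost label is inferred from the acceptance-test names and may differ in detail from the branch.

```rust
/// Hypothetical error type for the sketch; the KIP's real type is `PeerEndpointParseError`.
#[derive(Debug, PartialEq, Eq)]
pub enum HostnameError {
    Empty,
    TooLong,
    EmptyLabel,
    LabelTooLong,
    InvalidChar(char),
    HyphenAtLabelEdge,
    NumericRightmostLabel,
}

/// Validates a hostname textually and returns its lowercased canonical form.
/// No DNS lookup happens here, so a failure is always a "typo" class error.
pub fn validate_hostname(input: &str) -> Result<String, HostnameError> {
    // Assumption: a single trailing dot (fully qualified form) is accepted and stripped.
    let host = input.strip_suffix('.').unwrap_or(input);
    if host.is_empty() {
        return Err(HostnameError::Empty);
    }
    if host.len() > 253 {
        return Err(HostnameError::TooLong);
    }

    let mut last_label = "";
    for label in host.split('.') {
        // An empty label means a leading dot or two consecutive dots.
        if label.is_empty() {
            return Err(HostnameError::EmptyLabel);
        }
        if label.len() > 63 {
            return Err(HostnameError::LabelTooLong);
        }
        // Only ASCII letters, digits, and hyphens; this also rejects underscores.
        if let Some(c) = label.chars().find(|c| !(c.is_ascii_alphanumeric() || *c == '-')) {
            return Err(HostnameError::InvalidChar(c));
        }
        if label.starts_with('-') || label.ends_with('-') {
            return Err(HostnameError::HyphenAtLabelEdge);
        }
        last_label = label;
    }

    // Assumption: an all-numeric rightmost label is rejected so that malformed
    // IPv4 literals are not silently treated as hostnames.
    if last_label.bytes().all(|b| b.is_ascii_digit()) {
        return Err(HostnameError::NumericRightmostLabel);
    }

    Ok(host.to_ascii_lowercase())
}
```

For example, `validate_hostname("my_host.example.com")` fails immediately with `InvalidChar('_')`, while a well-formed name that simply has no DNS record passes validation and is left to the tolerant resolve-and-retry path.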
+ +This is a behavioral change for callers that previously branched on `Err(PeerHostResolutionFailed)` — see Backward Compatibility §B3. + +### S5. CLI flags + +`kaspad --addpeer=` and `kaspad --connect=` accept any of: +- `1.2.3.4:16111` (IPv4 socket) +- `1.2.3.4` (IPv4, default port) +- `[::1]:16111` (IPv6 socket) +- `node.example.com:16111` (hostname socket) +- `pod-1.svc.cluster.local` (hostname, default port) + +A new `--hostname-refresh-interval=` flag (env `KASPAD_HOSTNAME_REFRESH_INTERVAL_SEC`, default `600`) controls the background re-resolve cadence. The flag carries a **disjoint accept set** `{0} ∪ [MIN_HOSTNAME_REFRESH_INTERVAL_SEC, ∞)` with `MIN_HOSTNAME_REFRESH_INTERVAL_SEC = 30`: `0` disables periodic refresh (dial-failure-triggered re-resolve still runs); `30` and above are accepted; the rejection band `1..30` is rejected at BOTH the CLI parse layer AND the TOML config-decode arm of `Args::parse` (defense-in-depth via the shared pure function `validate_hostname_refresh_interval_sec`; both gates surface `clap::Error::ValueValidation` with byte-identical message wording). Rationale: local-DNS-resolver thrashing protection — sub-30s cadences multiplied by many registered hostnames duplicate-query the resolver below typical A-record TTLs with no operational benefit. kaspa-cli interactive `addpeer` and the RPC `AddPeer` method accept the same set of endpoint forms. + +`kaspad` startup is **tolerant**: if any `--addpeer` / `--connect` host fails to resolve at startup, kaspad logs `warn!`, registers the endpoint, and continues startup. The previous "abort with non-zero exit on first unresolvable host" behavior is removed. + +### S6. 
Metrics — `kaspa-metrics-core` + +Three new metrics are emitted by `ConnectionManager` and exported via `RpcApi::get_metrics` → `GetMetricsResponse.peer_hostname_metrics` → `kaspa-cli getmetrics`: + +| Metric | Type | Labels (semantically) | Description | +|---|---|---|---| +| `peer_hostname_resolutions_total` | counter (8 buckets) | `status ∈ {ok, failed}`, `trigger ∈ {initial, initial_retry, dial_failure, periodic}` | One increment per resolution attempt, by outcome × trigger | +| `peer_hostname_active` | gauge | — | Distinct hostname entries currently registered | +| `peer_hostname_resolved_addrs` | gauge | — | Unique resolved socket addresses currently in `connection_requests` | + +Trigger semantics: + +- `initial` — the first resolve attempt at the moment a hostname is registered (CLI `--addpeer` / `--connect` parse, RPC `AddPeer`). +- `initial_retry` — the first refresh tick following a registration whose `initial` resolve failed (or returned no records). Distinct from `dial_failure` so operators can tell a never-resolved hostname apart from one that previously resolved and then lost connectivity. +- `dial_failure` — a dial against a previously-resolved socket failed, triggering re-resolution of the originating hostname. +- `periodic` — the background refresh loop's per-tick re-resolve, including both cadence-elapsed entries (`last_refresh = Some(t)` with `now - t >= cadence`) and `PeriodicEmpty` fast-retry entries (`last_refresh = None`, `stale_reason = Some(PeriodicEmpty)` set when a prior periodic resolve returned `Ok(empty)`; `PeriodicEmpty` is internal-only and maps to `ResolveTrigger::Periodic` for the wire metric label so the 8-bucket counter shape is unchanged). Cadence configurable via `--hostname-refresh-interval` (default `600` s; `0` disables periodic refresh). + +The wire encoding flattens the two-label counter into 8 distinct fields. 
The protobuf message type name `PeerHostnameMetrics` carries the group, so wire-flat field names drop the redundant `peerHostname` prefix — consistent with the existing kaspa-codebase convention used by `BandwidthMetrics`, `ConnectionMetrics`, `ConsensusMetrics`, `ProcessMetrics`, and `StorageMetrics`. Wire-flat fields are: `resolutionsTotalInitialOk`, `resolutionsTotalInitialFailed`, `resolutionsTotalInitialRetryOk`, `resolutionsTotalInitialRetryFailed`, `resolutionsTotalDialFailureOk`, `resolutionsTotalDialFailureFailed`, `resolutionsTotalPeriodicOk`, `resolutionsTotalPeriodicFailed`, `active`, and `resolvedAddrs`. + +The metric NAMES (Prometheus-canonical for any future Prometheus exporter) remain `peer_hostname_resolutions_total{status, trigger}`, `peer_hostname_active`, and `peer_hostname_resolved_addrs`. The exporter layer prepends the `peer_hostname_` group prefix at presentation time; the wire layer stays prefix-free per the kaspa codebase convention. Operators querying the metric still get per-(status, trigger) values; label-aware tooling sees the labels embedded in wire field names. + +## Backward Compatibility + +### B1. wRPC borsh `AddPeerRequest` v1 clients + +Existing v1 clients (built before this KIP) continue to send the v1 borsh payload. The server dispatches on the `u16` version tag and decodes v1 into `RpcPeerEndpoint::Address(...)` with no semantic loss. **Server-side MUST retain v1 decoding indefinitely** — there is no mechanism to migrate every existing client. + +### B2. gRPC `AddPeerRequestMessage` + +The protobuf field `string address = 1` is unchanged. gRPC clients that send IP-only strings continue to work identically. The inbound conversion now parses through `RpcPeerEndpoint::from_str`, which accepts the pre-KIP IP forms unchanged. + +### B3. RPC `AddPeer` callers branching on `Err(PeerHostResolutionFailed)` + +This is the **operator-facing breaking change** in this KIP. 
Pre-KIP, an RPC client calling `AddPeer { peer_address: "<unresolvable-hostname>" }` received `Err(RpcError::PeerHostResolutionFailed)` synchronously. Post-KIP, the same call returns `Ok(AddPeerResponse {})`; the unresolved endpoint is registered for periodic retry. Operator visibility for the unresolved state is via `peer_hostname_resolutions_total{status="failed",trigger="initial"}` (incrementing as long as the hostname stays unresolved) and via `getconnectedpeerinfo` (the peer never appears as connected until DNS recovers). + +Callers that distinguish "this peer is unreachable right now" from "this peer was added successfully" should migrate from RPC error inspection to either (a) the metrics counter, or (b) a follow-up `getconnectedpeerinfo` query to confirm the peer connected. This mirrors the operational pattern Bitcoin Core operators have used since 0.10. + +### B4. wRPC borsh `GetMetricsResponse` v1 clients + +The `GetMetricsResponse` borsh wire bumps from `version=1` to `version=2`. v1 clients (built before this KIP) reading a v2 server response see the existing fields decode normally and the new `peer_hostname_metrics: Option<PeerHostnameMetrics>` field is silently skipped (the workflow_serializer reader does not consume trailing bytes after the v1 deserializer returns). v1 clients reading a v1 server response (against an old server) see `peer_hostname_metrics = None`. v2 clients reading a v1 server response see `peer_hostname_metrics = None` (the version-2 dispatch guards on `version >= 2`). All pairs are forward/backward compatible without action. + +### B5. Protobuf forward compatibility for `GetMetricsResponseMessage.peerHostnameMetrics` + +The new sub-message `PeerHostnameMetrics peerHostnameMetrics = 16;` is added with a previously-unused field number. Protobuf forward-compat (silently skip unknown fields) means old gRPC clients reading a new server response see only the existing fields; no error, no decode failure. + +### B6. 
CLI `--hostname-refresh-interval` flag + +New flag with default `600`; env var `KASPAD_HOSTNAME_REFRESH_INTERVAL_SEC` overrides. Operators who do not set it see the default (10-minute periodic re-resolve). Operators who set `0` disable periodic refresh and fall back to dial-failure-only re-resolution. The accept set is **disjoint**: `{0} ∪ [30, ∞)` (with `MIN_HOSTNAME_REFRESH_INTERVAL_SEC = 30`); values in the rejection band `1..30` are rejected at BOTH the CLI parse layer (clap `value_parser` closure) AND the TOML config-decode arm of `Args::parse` (both gates invoke the shared pure function `validate_hostname_refresh_interval_sec` and surface the rejection through `clap::Error::ValueValidation` with byte-identical message wording). The dual-gate construction is defense-in-depth: a TOML config with `hostname-refresh-interval-sec = 1` is rejected with the same error shape as `kaspad --hostname-refresh-interval=1`. No existing flag is repurposed or removed. + +### B7. `--addpeer=` / `--connect=` startup behavior + +Operators who pass only IPs see no behavioral change. The Bitcoin Core parity tolerance applies symmetrically to hostnames and IPs: if `--addpeer=1.2.3.4` is unreachable, kaspad starts (the IP-unreachable path was always tolerant); if `--addpeer=foo.example.com` is unresolvable, kaspad now also starts — matching the IP-unreachable behavior and Bitcoin Core's `bitcoind -addnode` semantics. + +## Reference Implementation + +The reference implementation lives in the rusty-kaspa branch: + +- **Repository:** https://github.com/kaspanet/rusty-kaspa +- **Branch:** `kas-947-addpeer-hostnames` +- **Branch base:** `master` +- **Pull request:** [`kaspanet/rusty-kaspa#988`](https://github.com/kaspanet/rusty-kaspa/pull/988) +- **HEAD:** `9b84f01f6de74584cd24affa7029ae607ac2c64f` + +**Implementation phases (each maps to a commit family on the branch):** + +1. 
**kaspa-utils** — add `PeerEndpoint`, RFC 1123 strict validator, async `resolve` with the 5-second `tokio::time::timeout` wrapper. ~679 lines added; 25 unit tests covering parse / borsh round-trip / serde wrapping / display canonicalization / OS-resolver smoke / timeout-wrapper / RFC 1123 validator coverage. +2. **kaspa-rpc-core** — `AddPeerRequest` borsh v1↔v2 dispatch (Address emits v1, Hostname emits v2) and `GetMetricsResponse` v1↔v2 dispatch carrying the new `peer_hostname_metrics` field; 7 `AddPeerRequest` borsh fixture tests + 3 `GetMetricsResponse` v1↔v2 fixture tests. +3. **kaspa-connectionmanager** — `HostnameRegistry` (atomic-pair `last_refresh` / `stale_reason` race-detection contract), `HostnameResolver` trait + `TokioHostnameResolver` production impl + `FakeHostnameResolver` test seam, `add_endpoint_request` infallible API, `start_hostname_refresh_loop` driver. ~1752 lines added across `hostname.rs` (new), `lib.rs`, and `test_support.rs` (new); 22 `HostnameRegistry` tests + 2 `HostnameMetrics` tests + 1 helper-function test. +4. **kaspad** — wire hostname endpoints + `--hostname-refresh-interval` flag with the disjoint accept set + 3 new metric definitions plumbed through `RpcApi::get_metrics` to `kaspa-cli getmetrics`. +5. **testing/integration** — 5 RPC integration tests covering hostname / IP / wire-version paths + 8 kaspad-startup integration tests covering hostname / IP / IPv6 / periodic-refresh / dial-failure-re-resolve / unresolvable-keeps-running / resolve-timeout-metric paths. + +## Acceptance Tests + +Test names are the canonical identifiers the reference implementation lands; verifiers should grep for these symbols in `utils/src/networking.rs`, `rpc/core/src/model/message.rs`, `components/connectionmanager/src/hostname.rs`, and `testing/integration/src/{rpc_tests,daemon_integration_tests}.rs` of the reference branch. 
+ +### `kaspa-utils` (unit, 25 tests in `utils/src/networking.rs`) + +Parse — `peer_endpoint_parses_ipv4`, `peer_endpoint_parses_ipv6_bracketed`, `peer_endpoint_parses_hostname_no_port`, `peer_endpoint_parses_hostname_with_port`, `peer_endpoint_parses_hostname_subdomain`, `peer_endpoint_parses_hostname_lowercases_input`, `peer_endpoint_parses_trailing_dot_canonical`. + +Reject — `peer_endpoint_rejects_underscore`, `peer_endpoint_rejects_double_dot`, `peer_endpoint_rejects_label_too_long`, `peer_endpoint_rejects_total_too_long`, `peer_endpoint_rejects_all_numeric_rightmost_label`, `peer_endpoint_rejects_garbage`. + +Borsh / serde / display — `peer_endpoint_borsh_roundtrip_address`, `peer_endpoint_borsh_roundtrip_hostname`, `peer_endpoint_display_canonical`, `peer_endpoint_serde_deserialize_error_wraps_to_canonical_shape`, `peer_endpoint_serde_deserialize_error_wraps_inside_struct_field`. + +Resolve — `peer_endpoint_resolve_address_variant`, `peer_endpoint_resolve_localhost` (the only test that depends on the host OS resolver; covers the `tokio::net::lookup_host` feature gate end-to-end), `peer_endpoint_resolve_timeout_wrapper_fires`, `peer_endpoint_resolve_timeout_wrapper_returns_payload`, `peer_endpoint_resolve_timeout_wrapper_propagates_io_error` (the three timeout-wrapper tests use `#[tokio::test(start_paused = true)]` to drive the 5-second timeout under hermetic time without a real DNS round-trip). + +Validator — `hostname_validator_accepts_rfc1123_examples`, `hostname_validator_rejects_only_dot_and_double_trailing_dots`. 
+ +### `kaspa-rpc-core` borsh fixtures (10 tests in `rpc/core/src/model/message.rs`) + +`AddPeerRequest` (7) — `add_peer_request_v1_byte_buffer_decodes_to_address`, `add_peer_request_address_emits_v1_byte_identical` (Address-variant from a post-KIP client serializes byte-identically to the v1 fixture so a v1-only server still decodes it), `add_peer_request_address_v1_roundtrip_for_ipv6_and_no_port`, `add_peer_request_v2_roundtrip_hostname`, `add_peer_request_v2_string_payload_format`, `add_peer_request_v2_macro_framed_roundtrip`, `add_peer_request_v2_invalid_endpoint_yields_typed_rpc_error`. + +`GetMetricsResponse` v1↔v2 (3) — `get_metrics_response_v1_byte_buffer_decodes_with_no_hostname_metrics`, `get_metrics_response_v2_emit_roundtrip_with_hostname_metrics`, `get_metrics_response_v2_emit_with_none_hostname_metrics_preserves_v2_tag` (pin the version-tag-by-constant contract — v2 emit does NOT downgrade to v1 when `peer_hostname_metrics: None`). + +### `kaspa-connectionmanager` (unit + integration, 25 tests in `components/connectionmanager/src/hostname.rs` with `FakeHostnameResolver`) + +`HostnameRegistry` (22) — `registry_upsert_inserts_initial_set`, `registry_refresh_observes_added_ip`, `registry_refresh_observes_removed_ip`, `registry_refresh_total_churn`, `registry_refresh_idempotent_when_unchanged`, `registry_refresh_failure_does_not_drop_entry`, `registry_refresh_failure_independent_per_host`, `registry_refresh_clears_failures_on_success`, `registry_permanent_hostname_survives_consecutive_failures`, `registry_multi_record_resolution_reports_all_added`, `registry_mark_stale_clears_last_refresh`, `registry_host_for_socket_round_trips`, `registry_pending_refreshes_skips_when_cadence_unmet`, `registry_pending_refreshes_labels_per_entry_trigger`, `registry_pending_refreshes_periodic_empty_eligible_as_periodic`, `registry_apply_preserves_concurrent_mark_stale`, `registry_apply_advances_anchor_when_no_concurrent_mark_stale`, 
`registry_apply_benign_double_stale_interleaving`, `registry_apply_preserves_rival_stamp_under_none_snapshot`, `registry_apply_periodic_empty_marks_stale_when_no_concurrent_mark_stale`, `registry_apply_periodic_empty_preserves_concurrent_mark_stale`, `registry_refresh_all_empty_marks_periodic_empty_and_records_failed` — covering upsert / refresh diff / failure independence / `mark_stale` atomic-pair / per-entry `(stale_reason, last_refresh)` race detection / `Ok(empty)` outcome-axis extension / cadence eligibility. + +`HostnameMetrics` (2) — `metrics_record_initial_ok_and_failed_separately`, `metrics_record_refresh_periodic_then_dial_failure_buckets`. + +Helper (1) — `refresh_enabled_zero_means_disabled` (`Duration::ZERO` disables periodic refresh). + +### `kaspa-testing-integration` RPC (5 tests in `testing/integration/src/rpc_tests.rs`) + +`rpc_add_peer_hostname_localhost_success`, `rpc_add_peer_hostname_unresolvable_accepts` (asserts `Ok(AddPeerResponse{})` + endpoint-registered + `initial_failed` counter incremented + node-still-up — Bitcoin Core parity), `rpc_add_peer_ipv4_unchanged`, `rpc_add_peer_v1_wire_compat`, `rpc_add_peer_v2_emit_default`. 
+ +### `kaspa-testing-integration` kaspad startup (8 tests in `testing/integration/src/daemon_integration_tests.rs`) + +`kaspad_addpeer_hostname_localhost_starts`, `kaspad_addpeer_hostname_unresolvable_keeps_running` (asserts daemon stays alive on the unresolvable-hostname path via RPC liveness probe + `initial_failed` counter ≥ 1 after `2 × refresh_interval`; orthogonal to the §S5 / KAS-947 spec §15.5.c rejection-band gate, which DOES emit `clap::Error::ValueValidation` for `--hostname-refresh-interval` values in `1..30`), `kaspad_addpeer_hostname_resolve_timeout_metric` (companion that drives the 5-second resolve-timeout path through `FakeHostnameResolver::set_timeout` and asserts the `initial_failed` counter increments under the timeout arm), `kaspad_addpeer_ipv4_unchanged`, `kaspad_addpeer_ipv6_unchanged`, `kaspad_periodic_refresh_observed`, `kaspad_periodic_refresh_disabled_with_zero_interval`, `kaspad_dial_failure_re_resolves`. + +### Bitcoin Core parity smoke (manual, on testnet-10) + +A live-network smoke procedure against `foundation` (or any operator-controlled tn-10 peer) exercises hostname `--addpeer`, dial-failure-triggered re-resolve, periodic refresh observability via `kaspa-cli getmetrics`, multi-record reconciliation, and disappeared-IP handling. Procedure detailed in the rusty-kaspa branch's `KAS-947` validation report. + +## Bitcoin Core parity rationale + +This KIP takes Bitcoin Core as the parity reference: study Bitcoin's approach for analogous features, match where Bitcoin is correct, improve where Bitcoin is suboptimal. 
Each behavior in the Specification is verified against Bitcoin Core source-of-truth at HEAD `8f4a3ba` (Bitcoin Core master as of 2026-05-03): + +| KIP behavior | Bitcoin Core equivalent | Source citation | +|---|---|---| +| `--addpeer` accepts hostnames | `bitcoind -addnode` accepts hostnames | `bitcoin/bitcoin/src/init.cpp:void SetupServerArgs` (~line 464; `-addnode=<ip>` registered as `ALLOW_ANY` at ~line 544); `bitcoin/bitcoin/src/init.cpp:bool AppInitMain` (~line 1426) wires `connOptions.m_added_nodes = args.GetArgs("-addnode")` (~line 2112); no DNS at startup | +| Hostname stored raw at startup; no DNS at registration | `CConnman::AddNode` pushes raw string into `m_added_node_params`; only error path is "duplicate" | `bitcoin/bitcoin/src/net.cpp:CConnman::AddNode` (~line 3740) | +| RPC `AddPeer` returns `Ok` even if unresolvable | `addnode "add"` calls `connman.AddNode(...)`; only error path is `RPC_CLIENT_NODE_ALREADY_ADDED`; resolution failure is never an error here | `bitcoin/bitcoin/src/rpc/net.cpp:static RPCMethod addnode` (~line 319); `if (command == "add")` at ~line 365 | +| Background dial loop retries periodically | `CConnman::ThreadOpenAddedConnections` runs `while(true)` with `m_interrupt_net->sleep_for(tried ? 
60s : 2s)` | `bitcoin/bitcoin/src/net.cpp:CConnman::ThreadOpenAddedConnections` (~line 2974); sleep at ~line 2998 | +| DNS at dial time, empty result on failure (no exception) | `CConnman::ConnectNode` calls `Lookup(pszDest, default_port, fNameLookup && !HaveNameProxy(), 256)`; empty vector means resolution failure → `connect_to` push of empty `addrConnect` → `IsValid()` skip | `bitcoin/bitcoin/src/net.cpp:CConnman::ConnectNode` (~line 377; `Lookup` call at ~line 411) | +| Underlying DNS function returns empty on invalid input | `LookupHost` returns `std::vector<CNetAddr>{}` for invalid input: no exception, no abort | `bitcoin/bitcoin/src/netbase.cpp:std::vector<CNetAddr> LookupHost` (~line 173) | + +**Where kaspa improves on Bitcoin:** +- **Strict RFC 1123 parse-time validation.** Bitcoin Core's `LookupHost` silently rejects malformed input by returning an empty vector, which is operationally equivalent to a typo never resolving. Kaspa's `PeerEndpoint::parse` rejects malformed input at parse time with a structured error, giving operators immediate feedback on typos vs. patient retry on legitimately-unresolvable-right-now hostnames. +- **First-class metrics surface.** Bitcoin operators inspect `getaddednodeinfo` for resolution state; kaspa adds `peer_hostname_resolutions_total{status, trigger}` to `kaspa-cli getmetrics`, which is more amenable to Prometheus/alerting integrations. +- **Configurable refresh interval with explicit-disable knob.** Bitcoin's 60-second retry is hard-coded; kaspa exposes `--hostname-refresh-interval` (default 600 s, conservative for k8s-scale fleets, with `0` to disable periodic refresh entirely). + +## References + +1. Bitcoin Core source-of-truth at HEAD `8f4a3ba` (2026-05-03): https://github.com/bitcoin/bitcoin +2. rusty-kaspa Issue #947 — original motivating use case: https://github.com/kaspanet/rusty-kaspa/issues/947 +3. RFC 1123 §2.1 — DNS hostname syntax requirements: https://datatracker.ietf.org/doc/html/rfc1123 +4. 
Bitcoin Core 0.10.0 release notes — initial `-addnode` hostname support: https://bitcoin.org/en/release/v0.10.0 +5. `tokio::net::lookup_host` (async DNS resolution): https://docs.rs/tokio/latest/tokio/net/fn.lookup_host.html +6. The reference rusty-kaspa branch and full design spec are in the rusty-kaspa repository: see Reference Implementation above. + +## Authors + +Dmitry Perchanov + +## License + +This KIP is licensed under [CC0-1.0 (Public Domain)](https://creativecommons.org/publicdomain/zero/1.0/), matching the kaspanet/kips repository convention. + +## Status + +Proposed.