4 changes: 2 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
# HyperCache

Check failure on line 1 in README.md (Trunk.io / Trunk Check, prettier): Incorrect formatting; autoformat by running 'trunk fmt'.

[![Go](https://github.com/hyp3rd/hypercache/actions/workflows/go.yml/badge.svg)][build-link] [![CodeQL](https://github.com/hyp3rd/hypercache/actions/workflows/codeql.yml/badge.svg)][codeql-link] [![golangci-lint](https://github.com/hyp3rd/hypercache/actions/workflows/golangci-lint.yml/badge.svg)][golangci-lint-link]

@@ -278,7 +278,7 @@

#### Hinted Handoff

-When a replica is unreachable during a write, a hint (deferred write) is enqueued locally keyed by the target node ID. Hints have a TTL (`WithDistHintTTL`) and are replayed on an interval (`WithDistHintReplayInterval`). Limits can be applied per node (`WithDistHintMaxPerNode`). Expired hints are dropped; delivered hints increment replay counters. Metrics exposed via the management endpoint allow monitoring queued, replayed, expired, and dropped hints.
+When a replica is unreachable during a write, a hint (deferred write) is enqueued locally, keyed by the target node ID. Hints have a TTL (`WithDistHintTTL`) and are replayed on an interval (`WithDistHintReplayInterval`). Limits can be applied per node (`WithDistHintMaxPerNode`) and globally across all nodes (`WithDistHintMaxTotal` caps the total number of entries, `WithDistHintMaxBytes` caps their approximate size in bytes). Expired hints are dropped; delivered hints increment replay counters; hints dropped because of a global cap increment a separate metric. Metrics exposed via the management endpoint report queued, replayed, expired, dropped (transport errors), and globally dropped hints, along with the current approximate queued bytes.

Test helper methods for forcing a replay cycle (`StartHintReplayForTest`, `ReplayHintsForTest`, `HintedQueueSize`) are compiled only under the `test` build tag to keep production binaries clean.

@@ -290,7 +290,7 @@

#### Build Tags

-The repository uses a `//go:build test` tag to include auxiliary instrumentation and helpers exclusively in test builds (e.g. hinted handoff queue inspection). Production builds omit these symbols automatically.
+The repository uses a `//go:build test` tag to include auxiliary instrumentation and helpers exclusively in test builds (e.g. hinted handoff queue inspection). Production builds omit these symbols automatically. Heartbeat peer sampling (`WithDistHeartbeatSample`) and membership state metrics (suspect/dead counters) are part of the experimental failure detection added in Phase 2.

#### Metrics Snapshot

12 changes: 6 additions & 6 deletions ROADMAP.md
@@ -1,4 +1,4 @@
# Distributed Backend Roadmap

Check failure on line 1 in ROADMAP.md (Trunk.io / Trunk Check, prettier): Incorrect formatting; autoformat by running 'trunk fmt'.

This document tracks the evolution of the experimental `DistMemory` backend into a production‑grade multi‑node cluster in incremental, reviewable phases.

@@ -58,18 +58,18 @@

Deliverables:

-- Gossip/heartbeat loop (k random peers, interval configurable).
-- Node state transitions: alive → suspect → dead (timeouts & confirmations).
-- Ring rebuild on state change (exclude dead nodes, retain for hint replay until TTL expiry).
-- Global hint queue caps (count + bytes) with drop metrics.
+- Heartbeat loop with optional random peer sampling (`WithDistHeartbeatSample`) and configurable interval. (Implemented)
+- Node state transitions: alive → suspect → dead (timeouts & probe-driven escalation) with metrics for suspect/dead transitions. (Implemented)
+- Ring rebuild on state change (exclude dead nodes). (Implemented)
+- Global hint queue caps (count + bytes) with drop metrics (`WithDistHintMaxTotal`, `WithDistHintMaxBytes`). (Implemented)

Metrics:

-- Heartbeat successes/failures, suspect/dead counters, membership version.
+- Heartbeat successes/failures, suspect/dead counters, membership version, global hint drops, approximate queued hint bytes. (Partially implemented; membership version exposed via snapshot API.)

Success Criteria:

-- Simulated node failure triggers quorum degradation & hinting; recovery drains hints.
+- Simulated node failure triggers quorum degradation & hinting; recovery drains hints. (Covered by failure recovery & hint cap tests.)

### Phase 3: Rebalancing & Key Transfer (Weeks 5–6)

1 change: 1 addition & 0 deletions cspell.config.yaml
@@ -22,6 +22,7 @@ words:
- daixiang
- Decr
- depguard
+- distconfig
- errcheck
- ewrap
- excludeonly
22 changes: 18 additions & 4 deletions internal/cluster/membership.go
@@ -12,6 +12,7 @@ type Membership struct {
	mu    sync.RWMutex
	nodes map[NodeID]*Node
	ring  *Ring
+	ver   MembershipVersion
}

// NewMembership creates a new membership container bound to a ring.
@@ -25,11 +26,15 @@ func (m *Membership) Upsert(n *Node) {
	m.nodes[n.ID] = n

	nodes := make([]*Node, 0, len(m.nodes))
-	for _, v := range m.nodes {
-		nodes = append(nodes, v)
+	for _, v := range m.nodes { // exclude dead nodes from ring ownership
+		if v.State != NodeDead {
+			nodes = append(nodes, v)
+		}
	}

+	m.ver.Next()
	m.mu.Unlock()

	m.ring.Build(nodes)
}
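The following self-contained sketch makes the ownership rule in the `Upsert` hunk concrete. It uses stand-in types that are simplified assumptions, not the package's real definitions: `Node`, `NodeState`, and a `ring` that merely records sorted owner IDs. As in the diff, the map is mutated, filtered, and versioned under the write lock, and the ring is rebuilt after unlocking.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

type NodeState int

const (
	NodeAlive NodeState = iota
	NodeSuspect
	NodeDead
)

type Node struct {
	ID    string
	State NodeState
}

// ring is a stand-in for the real consistent-hash ring; it only records owners.
type ring struct {
	mu     sync.Mutex
	owners []string
}

func (r *ring) Build(nodes []*Node) {
	ids := make([]string, 0, len(nodes))
	for _, n := range nodes {
		ids = append(ids, n.ID)
	}
	sort.Strings(ids)
	r.mu.Lock()
	r.owners = ids
	r.mu.Unlock()
}

type membership struct {
	mu    sync.RWMutex
	nodes map[string]*Node
	ring  *ring
	ver   atomic.Uint64
}

// upsert mutates and snapshots under the lock, then builds the ring after unlocking.
func (m *membership) upsert(n *Node) {
	m.mu.Lock()
	m.nodes[n.ID] = n
	live := make([]*Node, 0, len(m.nodes))
	for _, v := range m.nodes {
		if v.State != NodeDead { // dead nodes own no ring positions
			live = append(live, v)
		}
	}
	m.ver.Add(1)
	m.mu.Unlock()
	m.ring.Build(live)
}

func main() {
	m := &membership{nodes: map[string]*Node{}, ring: &ring{}}
	m.upsert(&Node{ID: "a"})
	m.upsert(&Node{ID: "b"})
	m.upsert(&Node{ID: "c", State: NodeDead})
	fmt.Println(m.ring.owners, m.ver.Load()) // [a b] 3
}
```

Building outside the lock keeps a slow `Build` from blocking membership readers. The trade-off is ordering: two concurrent updates can call `Build` in a different order than they changed the map, so an older snapshot may briefly win. Whether that matters depends on how the real backend serializes membership changes.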

@@ -63,11 +68,15 @@ func (m *Membership) Remove(id NodeID) bool { //nolint:ireturn
	delete(m.nodes, id)

	nodes := make([]*Node, 0, len(m.nodes))
-	for _, v := range m.nodes { // collect snapshot
-		nodes = append(nodes, v)
+	for _, v := range m.nodes { // exclude dead nodes
+		if v.State != NodeDead {
+			nodes = append(nodes, v)
+		}
	}

+	m.ver.Next()
	m.mu.Unlock()

	m.ring.Build(nodes)

	return true
@@ -83,9 +92,14 @@ func (m *Membership) Mark(id NodeID, state NodeState) bool { //nolint:ireturn
		n.Incarnation++

		n.LastSeen = time.Now()
+
+		m.ver.Next()
	}

	m.mu.Unlock()

	return ok
}
+
+// Version returns current membership version.
+func (m *Membership) Version() uint64 { return m.ver.Get() }
15 changes: 15 additions & 0 deletions internal/cluster/version.go
@@ -0,0 +1,15 @@
+package cluster
+
+import "sync/atomic"
+
+// MembershipVersion tracks a monotonically increasing version for membership changes.
+// Used to expose a cheap cluster epoch for clients/metrics.
+type MembershipVersion struct { // holds membership epoch
+	v atomic.Uint64
+}
+
+// Next increments and returns the next version.
+func (mv *MembershipVersion) Next() uint64 { return mv.v.Add(1) }
+
+// Get returns current version.
+func (mv *MembershipVersion) Get() uint64 { return mv.v.Load() }
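`MembershipVersion` wraps `atomic.Uint64`, so concurrent `Next` calls never lose an increment. A consumer can compare versions to decide when to recompute derived state. The sketch below copies the type from the diff. The `viewCache` consumer is hypothetical and not part of the package. As far as this diff shows, the counter is local to one process, so versions reported by different nodes are not directly comparable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// MembershipVersion mirrors the type added in internal/cluster/version.go.
type MembershipVersion struct{ v atomic.Uint64 }

func (mv *MembershipVersion) Next() uint64 { return mv.v.Add(1) }
func (mv *MembershipVersion) Get() uint64  { return mv.v.Load() }

// viewCache is a hypothetical consumer that rebuilds derived state only when the epoch moves.
type viewCache struct {
	seen     uint64
	rebuilds int
}

// refresh reports whether membership changed since the last call.
func (c *viewCache) refresh(mv *MembershipVersion) bool {
	cur := mv.Get()
	if cur == c.seen {
		return false // membership unchanged since last look
	}
	c.seen = cur
	c.rebuilds++
	return true
}

func main() {
	var mv MembershipVersion
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mv.Next() // atomic add: concurrent bumps are never lost
		}()
	}
	wg.Wait()
	fmt.Println(mv.Get()) // 100

	var c viewCache
	fmt.Println(c.refresh(&mv), c.refresh(&mv)) // true false
}
```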