Evaluate hashicorp/raft upstream migration #59
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,279 @@ | ||||||
| # Raft Upstream Migration Evaluation | ||||||
|
|
||||||
| **Date:** 2026-03-24 | ||||||
| **Issue:** #56 | ||||||
| **Status:** Complete | ||||||
|
|
||||||
| ## Executive Summary | ||||||
|
|
||||||
| Orchestrator depends on `openark/raft`, a 2017-era fork of `hashicorp/raft` pinned at commit `fba9f909f7fe` (September 2017). The fork diverges significantly from upstream `hashicorp/raft` v1.7+, which has undergone nine years of active development including security fixes, performance improvements, and major API changes. | ||||||
|
|
||||||
| **Recommendation:** Migrate to upstream `hashicorp/raft` v1.7.x. The migration is moderate effort (estimated 3-5 days of focused work) and eliminates ongoing security and maintenance risk from running unmaintained consensus code. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## 1. Current State | ||||||
|
|
||||||
| ### Replace Directive (go.mod) | ||||||
|
|
||||||
| ``` | ||||||
| replace github.com/hashicorp/raft => github.com/openark/raft v0.0.0-20170918052300-fba9f909f7fe | ||||||
| ``` | ||||||
|
|
||||||
| This redirects all `github.com/hashicorp/raft` imports to a fork from September 18, 2017. The `go.mod` declares `github.com/hashicorp/raft v1.7.3` as the desired version, but the replace directive overrides it entirely. | ||||||
|
|
||||||
| ### Files Using the Raft API | ||||||
|
|
||||||
| | File | Purpose | | ||||||
| |------|---------| | ||||||
| | `go/raft/store.go` | Raft initialization, peer management, command application | | ||||||
| | `go/raft/raft.go` | High-level orchestrator raft operations (leader checks, snapshots, yield) | | ||||||
| | `go/raft/fsm.go` | Finite state machine (Apply, Snapshot, Restore) | | ||||||
| | `go/raft/fsm_snapshot.go` | Snapshot persistence | | ||||||
| | `go/raft/rel_store.go` | SQLite-backed LogStore and StableStore | | ||||||
| | `go/raft/file_snapshot.go` | Custom file-based SnapshotStore | | ||||||
| | `go/raft/http_client.go` | HTTP client for leader communication (not raft API) | | ||||||
| | `go/raft/applier.go` | CommandApplier interface (no raft API) | | ||||||
| | `go/raft/snapshot.go` | SnapshotCreatorApplier interface (no raft API) | | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## 2. Divergent API Catalog | ||||||
|
|
||||||
| ### 2.1 Removed APIs (do not exist in upstream v1.7+) | ||||||
|
|
||||||
| #### `raft.PeerStore` interface and `raft.StaticPeers` | ||||||
|
|
||||||
| - **Location:** `store.go:22` (field type), `store.go:74` (instantiation), `store.go:75` (`SetPeers`) | ||||||
| - **What it does:** The old API used a `PeerStore` interface to track cluster membership. `StaticPeers` was a simple in-memory implementation. | ||||||
| - **Upstream replacement:** Upstream replaced `PeerStore` with the `Configuration` / `ConfigurationStore` system. Peer management is now handled through `raft.Configuration` containing `Server` entries. For bootstrap, use `raft.BootstrapCluster()`. At runtime, query the running instance with `raft.Raft.GetConfiguration()`. | ||||||
| - **Migration effort:** **Moderate.** Must replace the `PeerStore` field with configuration-based peer tracking. The `GetPeers()` function (called from `raft.go:299`) must be reimplemented using `raft.Raft.GetConfiguration()`. | ||||||
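A minimal sketch of how `GetPeers()` might be rebuilt on the configuration model. The `Server` and `Configuration` types are local stand-ins that mirror only the `Servers`, `ID`, and `Address` fields of the upstream types, so the example compiles without the dependency. In real code the value would come from the future returned by `GetConfiguration()`. The name `peerAddresses` is hypothetical.

```go
package main

// Server and Configuration are local stand-ins for the upstream
// raft.Server and raft.Configuration types (assumption: only the
// fields used here are mirrored, and plain strings replace the
// typed ServerID / ServerAddress).
type Server struct {
	ID      string
	Address string
}

type Configuration struct {
	Servers []Server
}

// peerAddresses flattens a configuration into a plain address list,
// the shape a PeerStore-based GetPeers() would have handed to callers.
func peerAddresses(cfg Configuration) []string {
	peers := make([]string, 0, len(cfg.Servers))
	for _, s := range cfg.Servers {
		peers = append(peers, s.Address)
	}
	return peers
}
```

Keeping the return type a plain `[]string` means callers of `GetPeers()` would not need to change, at the cost of discarding the server ID.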
|
|
||||||
| #### `raft.AddUniquePeer()` | ||||||
|
|
||||||
| - **Location:** `store.go:69` | ||||||
| - **What it does:** Helper to add a peer to a string slice if not already present. | ||||||
| - **Upstream replacement:** No direct replacement needed. This is a trivial utility; replace with a local helper or inline dedup logic. | ||||||
| - **Migration effort:** **Trivial.** | ||||||
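The local helper could look roughly like the following. The name `addUniquePeer` and its signature are assumptions chosen to match the description above, not code taken from the fork.

```go
package main

// addUniquePeer appends peer to peers unless it is already present.
// Hypothetical local replacement for the fork's AddUniquePeer helper.
func addUniquePeer(peers []string, peer string) []string {
	for _, p := range peers {
		if p == peer {
			// Already known: return the slice unchanged.
			return peers
		}
	}
	return append(peers, peer)
}
```

A linear scan is adequate here because raft peer lists are typically a handful of entries.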
|
|
||||||
| #### `config.EnableSingleNode` | ||||||
|
|
||||||
| - **Location:** `store.go:83` | ||||||
| - **What it does:** Allows a single node to self-elect as leader without any peers. | ||||||
| - **Upstream replacement:** Use `raft.BootstrapCluster()` to initialize a single-node cluster. This is a one-time operation checked against existing state. | ||||||
| - **Migration effort:** **Moderate.** Need to add bootstrap logic that runs conditionally on first startup. | ||||||
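One way to structure the conditional bootstrap is sketched below. The two callbacks are injected so the example stays dependency-free; the assumption is that in `store.go` they would wrap upstream `raft.HasExistingState` and `raft.BootstrapCluster`. The name `bootstrapIfFresh` is hypothetical.

```go
package main

// bootstrapIfFresh runs bootstrap only when no prior raft state exists.
// It reports whether bootstrap was actually performed.
//
// hasState and bootstrap are placeholders: in real code they would wrap
// the upstream state check and bootstrap call (assumption about wiring).
func bootstrapIfFresh(hasState func() (bool, error), bootstrap func() error) (bool, error) {
	exists, err := hasState()
	if err != nil {
		return false, err
	}
	if exists {
		// State from an earlier run is present, so skip bootstrapping.
		return false, nil
	}
	if err := bootstrap(); err != nil {
		return false, err
	}
	return true, nil
}
```

Checking for existing state first keeps restarts of an already-initialized node from attempting a second bootstrap.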
|
|
||||||
| #### `config.DisableBootstrapAfterElect` | ||||||
|
|
||||||
| - **Location:** `store.go:84` | ||||||
| - **What it does:** Controls whether bootstrap info is retained after initial election. | ||||||
| - **Upstream replacement:** Not needed. Upstream `BootstrapCluster()` is inherently a one-time operation. | ||||||
| - **Migration effort:** **Trivial.** Simply remove. | ||||||
|
|
||||||
| #### `raft.NewRaft()` 7-argument constructor | ||||||
|
|
||||||
| - **Location:** `store.go:111` | ||||||
| - **Signature used:** `NewRaft(config, fsm, logStore, stableStore, snapshotStore, peerStore, transport)` | ||||||
| - **Upstream signature:** `NewRaft(config, fsm, logStore, stableStore, snapshotStore, transport)` (6 arguments, no peerStore) | ||||||
| - **What changed:** The `peerStore` argument was removed. Peer/membership information is now managed through `Configuration`. | ||||||
| - **Migration effort:** **Trivial** once PeerStore is removed. Drop the `peerStore` argument. | ||||||
|
|
||||||
| #### `raft.Raft.AddPeer()` / `raft.Raft.RemovePeer()` | ||||||
|
|
||||||
| - **Location:** `store.go:125` (AddPeer), `store.go:137` (RemovePeer) | ||||||
| - **What they do:** Add or remove a peer from the cluster using the old string-based peer identity. | ||||||
| - **Upstream replacement:** `raft.Raft.AddVoter(id, address, prevIndex, timeout)` and `raft.Raft.RemoveServer(id, prevIndex, timeout)`. The new API uses `ServerID` + `ServerAddress` instead of a single string, and requires an index parameter for consistency. | ||||||
| - **Migration effort:** **Moderate.** Must decide on a ServerID scheme (e.g., use address as ID, or introduce a separate ID). Update both `AddPeer` and `RemovePeer` in `store.go`, plus all callers. | ||||||
|
|
||||||
| #### `raft.Raft.Yield()` | ||||||
|
|
||||||
| - **Location:** `raft.go:284` | ||||||
| - **What it does:** Causes the leader to voluntarily step down in favor of a specific peer. This is a custom addition in the openark fork, not present in standard hashicorp/raft. | ||||||
|
The description for |
||||||
| - **Upstream replacement:** `raft.Raft.LeadershipTransfer()` (added in upstream) transfers leadership but does not target a specific peer. Upstream also has `LeadershipTransferToServer(id, address)` for targeted transfer. | ||||||
| - **Migration effort:** **Moderate.** `Yield()` is called from `raft.go:284` and from `fsm.go:65` (via the `yield` and `yieldByHint` FSM commands). Must map the yield/yield-hint logic to `LeadershipTransfer()` or `LeadershipTransferToServer()`. The semantics are slightly different: `Yield()` was a local step-down, while `LeadershipTransfer` is a coordinated handoff. | ||||||
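One possible mapping of the yield and yield-hint paths onto the two upstream transfer calls is sketched here. The interface, the `errFuture` type, and the fake are hypothetical stand-ins that use plain strings instead of `raft.ServerID` and `raft.ServerAddress`. How the hint is resolved to an ID and address is left out and would depend on the ServerID scheme chosen.

```go
package main

// errFuture mirrors the Error() method of upstream futures (stand-in).
type errFuture interface {
	Error() error
}

// leadershipTransferrer is a hypothetical narrow view of the two
// upstream transfer methods named above.
type leadershipTransferrer interface {
	LeadershipTransfer() errFuture
	LeadershipTransferToServer(id, address string) errFuture
}

// yieldTo hands leadership to the hinted server when a hint is given,
// and otherwise lets the library choose the target.
func yieldTo(r leadershipTransferrer, hintID, hintAddr string) error {
	if hintID == "" {
		return r.LeadershipTransfer().Error()
	}
	return r.LeadershipTransferToServer(hintID, hintAddr).Error()
}

// done and fakeRaft exist only to exercise yieldTo.
type done struct{ err error }

func (d done) Error() error { return d.err }

type fakeRaft struct{ calls []string }

func (f *fakeRaft) LeadershipTransfer() errFuture {
	f.calls = append(f.calls, "any")
	return done{}
}

func (f *fakeRaft) LeadershipTransferToServer(id, address string) errFuture {
	f.calls = append(f.calls, "to:"+id)
	return done{}
}
```

Because the upstream calls are coordinated handoffs, callers that previously treated the yield as fire-and-forget would need to handle the returned error.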
|
|
||||||
| #### `raft.Raft.StepDown()` | ||||||
|
|
||||||
| - **Location:** `raft.go:277` | ||||||
| - **What it does:** Forces this node to step down from leadership. | ||||||
| - **Upstream replacement:** Not present as a public method in upstream. `LeadershipTransfer()` achieves a similar result, and because `StepDown()` is called in only one place, it may be a direct substitute. | ||||||
| - **Migration effort:** **Moderate.** Need to evaluate whether `LeadershipTransfer()` is an acceptable substitute or if a different approach is needed. | ||||||
|
|
||||||
| #### `raft.Raft.Leader()` returning `string` | ||||||
|
|
||||||
| - **Location:** `raft.go:235` | ||||||
| - **What it does:** Returns the address of the current leader as a plain string. | ||||||
| - **Upstream replacement:** `raft.Raft.Leader()` in upstream returns `raft.ServerAddress` (a typed string). Additionally, upstream provides `LeaderWithID()` returning both `ServerAddress` and `ServerID`. | ||||||
| - **Migration effort:** **Trivial.** `ServerAddress` is a `string` typedef; a simple type conversion suffices. | ||||||
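The conversion amounts to a single cast. In this sketch `ServerAddress` is a local stand-in with the same underlying type as the upstream one, and `leaderString` is a hypothetical wrapper name.

```go
package main

// ServerAddress is a local stand-in with the same underlying type as
// upstream raft.ServerAddress.
type ServerAddress string

// leaderString converts the typed address back to the plain string
// that existing callers expect.
func leaderString(addr ServerAddress) string {
	return string(addr)
}
```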
|
|
||||||
| #### `raft.Raft.LeaderCh()` | ||||||
|
|
||||||
| - **Location:** `raft.go:154` | ||||||
| - **What it does:** Returns a channel that signals leadership changes. | ||||||
| - **Upstream status:** `LeaderCh()` still exists in upstream v1.7.x, so despite appearing under this heading it is not a removed API (see also 2.2). | ||||||
| - **Migration effort:** **None.** API is compatible. | ||||||
|
Comment on lines +109 to +114
||||||
|
|
||||||
| ### 2.2 Compatible APIs (exist in both fork and upstream) | ||||||
|
|
||||||
| | API Call | Location | Status | | ||||||
| |----------|----------|--------| | ||||||
| | `raft.DefaultConfig()` | `store.go:48` | Compatible | | ||||||
| | `raft.NewTCPTransport()` | `store.go:60` | Compatible | | ||||||
| | `raft.Raft.Apply()` | `store.go:157` | Compatible | | ||||||
| | `raft.Raft.State()` | `raft.go:248`, `store.go:148` | Compatible | | ||||||
| | `raft.Raft.Snapshot()` | `raft.go:264` | Compatible | | ||||||
| | `raft.Raft.LeaderCh()` | `raft.go:154` | Compatible | | ||||||
| | `raft.Leader` / `raft.Follower` / `raft.Candidate` (state constants) | `raft.go:222,227,249,260` | Compatible | | ||||||
| | `raft.RaftState` type | `raft.go:247` | Compatible | | ||||||
| | `raft.Log` struct | `fsm.go:33`, `rel_store.go:179,196,201` | Compatible | | ||||||
| | `raft.FSMSnapshot` interface | `fsm.go:84` | Compatible | | ||||||
| | `raft.SnapshotSink` interface | `fsm_snapshot.go:35` | Compatible | | ||||||
| | `raft.SnapshotMeta` struct | `file_snapshot.go:55,137,154,191,199,273` | Compatible (but `Peers` field changed to `Configuration`) | | ||||||
|
||||||
- | `raft.SnapshotMeta` struct | `file_snapshot.go:55,137,154,191,199,273` | Compatible (but `Peers` field changed to `Configuration`) |
+ | `raft.SnapshotMeta` struct | `file_snapshot.go:55,137,154,191,199,273` | Divergent – `Peers` field replaced by `Configuration`; migration requires code changes in `FileSnapshotStore`. |
The document, dated 2026, repeatedly refers to the 2017 fork as being 'nine years' old. While this is consistent with the document's date, it may be confusing for present-day readers. To improve clarity and make the document more timeless, consider phrasing the age relative to the fork date, for example: '...undergone many years of active development since 2017...'. This phrasing is also more robust against the passage of time. This point applies to lines 9, 172, 176, 229, and 262.