feat: add basic support for multi-cluster replication #494
Open
arlyon wants to merge 5 commits into alexandrevilain:main from arlyon:feat/cluster-replication
0f3ce93 feat: add basic support for multi-cluster replication (arlyon)
b22f6aa Update multi-cluster-replication.md (arlyon)
5b620a8 Apply suggestions from code review (arlyon)
6c35eff run `make manifests` (arlyon)
4892e87 run `make generate` (arlyon)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Multi-cluster replication | ||
|
|
||
| Temporal supports multi-cluster replication, which lets you replicate specific Temporal namespaces to a different Temporal cluster. This is useful for disaster recovery, for running a Temporal cluster in another region to reduce latency, or for increasing the history shard count of a cluster. | ||
|
|
||
| ## How it works | ||
|
|
||
| To set up multi-cluster replication with the Temporal operator, first enable global namespaces on every cluster that will take part, and give each cluster a unique initial failover version. | ||
| Both settings live under `spec.replication` of the `TemporalCluster` resource. The operator configures the remaining replication fields automatically and currently hard-codes the failover | ||
| increment to 10, meaning you can have at most one leader and 9 followers. If a cluster fails, the remaining clusters elect a new primary cluster based on the lowest failover version. If the original cluster comes back online, it | ||
| is assigned a new failover version: the smallest value of the form `initialFailoverVersion + n × 10` (its initial failover version plus a multiple of the failover increment) that is greater than the leader cluster's failover version. | ||
|
|
||
| ```yaml | ||
| apiVersion: temporal.io/v1beta1 | ||
| kind: TemporalCluster | ||
| metadata: | ||
|   name: prod | ||
| spec: | ||
|   replication: | ||
|     enableGlobalNamespace: true | ||
|     initialFailoverVersion: 1 | ||
| ``` | ||
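Because every failover version a cluster can take has the form `initialFailoverVersion + n × increment`, two clusters can only ever collide if their initial values leave the same remainder when divided by the increment. The sketch below checks a set of initial versions for that condition. The function name `failover_versions_are_distinct` is hypothetical and not part of the operator or the Temporal CLI; it only illustrates the arithmetic, assuming non-negative integer inputs.

```bash
# Hypothetical helper (not part of the operator): succeeds only if no two
# initial failover versions share a remainder modulo the increment.
# Usage: failover_versions_are_distinct INCREMENT VERSION [VERSION...]
failover_versions_are_distinct() (
  increment=$1
  shift
  if [ "$increment" -le 0 ]; then
    echo "increment must be positive" >&2
    exit 1
  fi
  seen=" "
  for initial in "$@"; do
    residue=$((initial % increment))
    case "$seen" in
      *" $residue "*)
        echo "initial failover version $initial collides with another cluster" >&2
        exit 1
        ;;
    esac
    seen="$seen$residue "
  done
)

# With the increment of 10 described above, 1 and 2 can coexist.
failover_versions_are_distinct 10 1 2 && echo "1 and 2 are compatible"
```

With an increment of 10 there are only ten possible remainders, which matches the limit of one leader and 9 followers.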
|
|
||
| For example, take a leader with `initialFailoverVersion` 1 and a follower with `initialFailoverVersion` 2. Because the increment is 10, a failure of the leader hands control to the follower and raises the old leader's failover version to 11. | ||
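The rule behind this example can be sketched as a small shell function. The name `next_failover_version` and its argument order are assumptions made for illustration only; this is not how the operator or Temporal exposes the calculation, and it assumes integer inputs.

```bash
# Hypothetical helper: compute the failover version a returning cluster gets.
# Usage: next_failover_version INITIAL INCREMENT CURRENT_LEADER_VERSION
next_failover_version() (
  initial=$1
  increment=$2
  current=$3
  if [ "$increment" -le 0 ]; then
    echo "increment must be positive" >&2
    exit 1
  fi
  # Step through initial, initial + increment, initial + 2 * increment, ...
  # until the value is strictly greater than the leader's version.
  version=$initial
  while [ "$version" -le "$current" ]; do
    version=$((version + increment))
  done
  echo "$version"
)

# Old leader (initial version 1) returns while the follower (version 2) leads.
next_failover_version 1 10 2   # prints 11
```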
|
|
||
| ## Starting replication | ||
|
|
||
| Once the two clusters are configured, connect them to each other using the Temporal CLI: each cluster must register the frontend address of the other. In the future, there may be a way to do this via the operator. | ||
|
|
||
| ```bash | ||
| # port forward to the frontend of the primary cluster (this command blocks, so run it in a separate terminal) | ||
| kubectl port-forward primary-frontend 7233:7233 | ||
|
|
||
| temporal operator cluster upsert --frontend-address secondary-cluster.namespace.svc.cluster.local:7233 --enable-connection true | ||
|
|
||
| # port forward to the frontend of the secondary cluster (stop the previous port-forward first, since both use local port 7233) | ||
| kubectl port-forward secondary-frontend 7233:7233 | ||
|
|
||
| temporal operator cluster upsert --frontend-address primary-cluster.namespace.svc.cluster.local:7233 --enable-connection true | ||
| ``` | ||
|
|
||
| ## Replicating namespaces | ||
|
|
||
| Once the clusters are connected, you can replicate namespaces between them, either with the Temporal CLI or with the operator. With the operator, create a single `TemporalNamespace` resource whose `clusterRef` points at the primary cluster, and list both the primary and the secondary cluster under `clusters`. The namespace is then created on the secondary cluster automatically, with the same settings, and starts receiving updates. | ||
|
|
||
| ```yaml | ||
| apiVersion: temporal.io/v1beta1 | ||
| kind: TemporalNamespace | ||
| metadata: | ||
|   name: primary-namespace | ||
| spec: | ||
|   clusterRef: | ||
|     name: primary | ||
|   clusters: | ||
|     - primary | ||
|     - secondary | ||
|   activeClusterName: secondary | ||
|   isGlobalNamespace: true | ||
| ``` | ||
|
|
||
| | **🚨 Note**: Enabling replication will not automatically replicate old workflows. It only replicates workflows as they are interacted with. For cases like trying to increase the shard count, this is important as you need to make sure each workflow has been evaluated at least once after replication has been set up. | ||
|
|
||
| ## A mechanism for increasing the history shard count | ||
|
|
||
| Since Temporal 1.20, replicated clusters do not need the same number of history shards. This makes replication a viable way to migrate a cluster that has outgrown its shard count: give the secondary cluster a higher shard count than the primary cluster. The only requirement is that the shard count on the secondary cluster is an integer multiple of the shard count on the primary. So if you have 512 shards on the primary cluster, you can have 1024 shards (or any other multiple of 512) on the secondary cluster, but not 1023. | ||
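A quick way to sanity-check a planned shard count before creating the secondary cluster is sketched below. The function name `is_valid_target_shard_count` is hypothetical and only encodes the multiple-of rule stated above; it does not query either cluster.

```bash
# Hypothetical helper: succeeds if SECONDARY is a whole multiple of PRIMARY.
# Usage: is_valid_target_shard_count PRIMARY SECONDARY
is_valid_target_shard_count() (
  primary=$1
  secondary=$2
  [ "$primary" -gt 0 ] &&
    [ "$secondary" -ge "$primary" ] &&
    [ $((secondary % primary)) -eq 0 ]
)

is_valid_target_shard_count 512 1024 && echo "1024 shards: ok"
is_valid_target_shard_count 512 1023 || echo "1023 shards: rejected"
```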
|
|
||
| When replication is complete, switch your clients over to the new cluster and then take down the old one. This can all be achieved with very little downtime. |