Create public preview docs for new Kubernetes operator #19942
Conversation
✅ Deploy Preview for cockroachdb-interactivetutorials-docs canceled.
✅ Deploy Preview for cockroachdb-api-docs canceled.
This is quite a lot of work -- apologies for taking a while on the review.
I have a lot of suggested edits here, but am blocking for the following:
- (Obviously, as you mentioned) Still needs the operator overview doc, and the Preview callout.
- Markdown formatting doesn't match our docs set, as we discussed already.
- Similarly, some style-related notes on capitalization, use of "we" in tutorials, etc.
- Related: Code and output snippets should be presented cleanly and consistently with how we use them in the docs (see my notes).
- In the guidance, some example YAML is missing where it would greatly help clarity. In particular, the instructions in deploy-cockroachdb-with-kubernetes-operator on configuring locality are confusing to me. I think these parts can be tightened up.
- The two migration docs should more clearly emphasize that they aren't for prod deployments.
- Ensure the "informal" product terminology for the new vs. old operator is consistent throughout.
Otherwise, I took the time to make suggested rewrites, but some of the copy feels more conversational than is helpful (especially in kubernetes-operator-performance). This isn't a blocking comment per se, and a conversational tone can be nice with such a complicated topic. But at times (in the performance doc, and with the locality stuff I mentioned above), I think it obfuscates the guidance itself. Let me know if you want to discuss. I also replaced passive with active language where I saw it.
src/current/_includes/v25.2/sidebar-data/self-hosted-deployments.json (outdated; resolved)
Force-pushed 6e2d630 to 5fec404
Force-pushed 938d9a8 to 152ac55
Annotating whose reviews are needed where
src/current/_includes/v25.2/orchestration/kubernetes-limitations.md (outdated; resolved)
PM question: do we need to start an equivalent release notes page for the CockroachDB operator now? Or should we re-evaluate that for GA once things stabilize?
cc @pritesh-lahoti I think we were hoping to align our operator/helm releases to crdb releases and so could perhaps have them in the same release notes. Can you please confirm?
We had plans to align the release versions, but not necessarily the release timelines themselves.
For example, a new feature getting introduced within CRDB for a particular release version could also be exposed via Helm/Operator in the same release version [best-effort], so while the release versions would be the same, the timeline of these two events could differ.
ACK. Something to revisit when we make user-facing changes to the operator while it's in public preview.
The Operator updates all nodes and triggers a rolling restart of the pods with the new storage capacity.

To verify that the storage capacity has been updated, run `kubectl get pvc` to view the persistent volume claims (PVCs). It will take a few minutes before the PVCs are completely updated.
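For reference, a quick check might look like the following; the namespace is a hypothetical example reusing the release names quoted elsewhere in this PR:

~~~ shell
# List the PVCs and watch the CAPACITY column pick up the new size.
# The cockroach-ns namespace is an assumption, not from the doc excerpt.
kubectl get pvc --namespace cockroach-ns
~~~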
PM question - see above
src/current/v25.2/deploy-cockroachdb-with-kubernetes-operator.md (outdated; resolved)
Needs PM and eng review (this is the introduction page for the new operator):
- For PM: Any feedback on this intro? Would you like to introduce the new operator to readers any differently?
- For eng: The "Kubernetes terminology" section below was taken from the existing Kubernetes intro. Are there any changes or new fields you would suggest adding to explain concepts to readers? Consider that this is a very high level introduction.
Looks fine to me. We could optionally also introduce `service`.
TBD on `service`; I'd want to learn more about the other use cases for that field.
Force-pushed 38eb43e to d708151
Ryan is OOO - addressed all of his comments last week
Submitting review for now, but I am not done going through. Just have some meetings and wanted to get you the comments thus far so you can take action. Thanks!
src/current/_includes/v25.2/cockroachdb-operator-recommendation.md (outdated; resolved)
The {{ site.data.products.cockroachdb-operator }} is a fully-featured [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) that allows you to deploy and manage CockroachDB self-hosted clusters.

{{site.data.alerts.callout_info}}
The {{ site.data.products.cockroachdb-operator }} is in [Preview]({% link {{ page.version.version }}/cockroachdb-feature-availability.md %}).
I think we want to also have the recommendation here re something like "It is the recommended choice for new Kubernetes deployments due to its better support for CockroachDB’s operational best practices."
People on Revenue might be sad but I think we'd all be sadder if they chose the old operator
I think that's tough to pitch in the docs as long as we have the "preview" tag. Either way though, I did my best attempt at that pitch in this boilerplate that I spread throughout the existing kubernetes docs:
Cockroach Labs recommends that new deployments of CockroachDB on Kubernetes use the CockroachDB operator. To migrate an existing deployment to use the CockroachDB operator, read the Helm and Public operator migration guides.
That boilerplate just isn't in this specific document.
src/current/_includes/v25.2/sidebar-data/self-hosted-deployments.json (outdated; resolved)
src/current/v25.2/deploy-cockroachdb-with-kubernetes-operator.md (outdated; resolved)
The subject alternative names are based on a release called `my-release` in the `cockroach-ns` namespace. Make sure they match the services created with the release during Helm install.
{{site.data.alerts.end}}

If you wish to supply certificates with [cert-manager](https://cert-manager.io/), set `cockroachdb.tls.certManager.enabled` to `true`, and `cockroachdb.tls.certManager.issuer` to an IssuerRef (as they appear in certificate resources) pointing to a ClusterIssuer or Issuer that you have set up in the cluster. The following Kubernetes application describes an example issuer:
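The example issuer itself wasn't captured in this excerpt; a minimal sketch, assuming a self-signed cert-manager Issuer in the `cockroach-ns` namespace used above, might look like:

~~~ yaml
# Hypothetical example: a self-signed cert-manager Issuer in the
# cockroach-ns namespace. The metadata.name is an assumption.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cockroachdb-selfsigned
  namespace: cockroach-ns
spec:
  selfSigned: {}
~~~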
Should `cert-manager` be its own section? It is in the values.yaml, so we probably want to outline those steps separately from external certs? I'm a bit out of my depth here tbh and could use eng help @pritesh-lahoti
We may want to move TLS config to a totally separate section or page; it's a lot of info to try and digest in the middle of a numbered list of instructions. We can tackle that later based on customer feedback if they find it confusing.
Right, makes sense.
src/current/v25.2/deploy-cockroachdb-with-kubernetes-operator.md (outdated; resolved)
~~~ shell
cockroach start --locality region=us-central1,zone=us-central1-c,dc=dc2
~~~

Optionally, review the `cockroachdb.crdbCluster.topologySpreadConstraints` configuration and set `topologyKey` to a locality variable that will have distinct values for each node. By default the lowest locality level is `zone`, so the following configuration sets that value as the `topologyKey`:
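The referenced configuration wasn't captured in this excerpt; a minimal sketch, assuming the standard `topology.kubernetes.io/zone` node label carries the zone-level locality, might look like:

~~~ yaml
# Hypothetical values.yaml excerpt: spread CockroachDB pods evenly
# across zones. Using topology.kubernetes.io/zone as the topologyKey
# is an assumption about which node label carries the zone locality.
cockroachdb:
  crdbCluster:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
~~~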
What does "locality variable" mean in this case? Is it the `nodeLabel` or the `localityLabel`? I think it's `nodeLabel`, right? This is currently ambiguous.
Also, we could explain here, too, why it's important to spread across zones for maximum resilience from CockroachDB.
Clarified to "...set `topologyKey` to the `nodeLabel` value of a locality level that has..." which is a mouthful but technically accurate.
Thinking about whether there's a good landing page to send readers to for locality best practices. Adding a link to https://www.cockroachlabs.com/docs/stable/topology-patterns.html but not sure if that's helpful.
Some changes requested. Thanks again for your help!
The Helm chart creates a public Service that exposes both SQL and gRPC connections over a single port. However, the operator uses a different port for gRPC communication. To ensure compatibility, update the public Service to reflect the correct gRPC port used by the operator.

Apply the updated Service manifest:
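The manifest itself wasn't captured in this excerpt; a minimal sketch, reusing the `my-release`/`cockroach-ns` names from earlier and the 26257/26258 port split discussed below, might look like:

~~~ yaml
# Hypothetical public Service: SQL stays on 26257 while gRPC moves to
# 26258. The name, namespace, and selector labels are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: my-release-cockroachdb-public
  namespace: cockroach-ns
spec:
  selector:
    app.kubernetes.io/name: cockroachdb
    app.kubernetes.io/instance: my-release
  ports:
    - name: sql
      port: 26257
      targetPort: 26257
    - name: grpc
      port: 26258
      targetPort: 26258
~~~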
@pritesh-lahoti what happens in the meantime? If someone tries to, e.g., call a CLI command that hits RPCs, will it fail?
Hi @mwang1026. Yes, gRPC will have downtime until the public service is applied with the new config (only if somebody is using the public service to call the gRPC endpoint).
This got me thinking, and I decided to explore it.
Originally, StatefulSets run gRPC and SQL on port 26257. Now, with the operator, gRPC will be moved to port 26258. Yes, there will be downtime after all the nodes are migrated and until the service is patched.
After migrating half of the replicas to the CockroachDB operator, we can patch the service to use the new gRPC port. Since there are now replicas serving on the new port, the gRPC commands work fine. With this approach there will not be any downtime for gRPC either.
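A minimal sketch of that patch, assuming the hypothetical `my-release-cockroachdb-public` Service above and that the gRPC entry is the second item in `spec.ports`:

~~~ shell
# Hypothetical patch: repoint the public Service's gRPC port at 26258.
# The Service name, namespace, and ports index are assumptions.
kubectl patch service my-release-cockroachdb-public \
  --namespace cockroach-ns \
  --type=json \
  -p='[{"op": "replace", "path": "/spec/ports/1/port", "value": 26258},
       {"op": "replace", "path": "/spec/ports/1/targetPort", "value": 26258}]'
~~~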
It will still depend on what pod the command is routed to though, right? Regardless, I think we should call this out as a temporary limitation. The CLI commands I know off the top of my head that call RPCs are `cockroach node drain` and `cockroach node decommission`.
Added a warning to the helm migration guide about there being a limited-scope downtime specifically for commands that hit RPCs - https://github.com/cockroachdb/docs/pull/19942/files#diff-b3d91dedb7197a1f4d3b45de3839ee0b45f8146a3837981d1e1c3700ec318cbfR28
For more context on how these rules work, see the [Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/). The [custom resource definition](https://github.com/cockroachdb/helm-charts/blob/master/cockroachdb-parent/charts/cockroachdb/values.yaml) details the fields supported by the operator.

### Add a pod affinity or anti-affinity
@pritesh-lahoti re: the node anti-affinity that we have hardcoded to allow only one pod per k8s node, is that the only anti-affinity that can't be altered via the Helm chart? I'm having a hard time keeping up with these changes 😓 (good problem to have). WDYT about mentioning something about that here if it is the case? And then pointing to `podTemplate` as needed?
The operator hard-codes the pod anti-affinity (and not the node anti-affinity). Right, it makes sense to call it out here.
All of it can be overridden by the corresponding affinity policies specified in the `podTemplate`.
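A minimal sketch of such an override, assuming the chart exposes the pod template under `cockroachdb.crdbCluster.podTemplate` (a hypothetical field path):

~~~ yaml
# Hypothetical values.yaml excerpt: soften the default pod anti-affinity
# to a scheduling preference rather than a hard requirement. The field
# path and labels are assumptions.
cockroachdb:
  crdbCluster:
    podTemplate:
      spec:
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  labelSelector:
                    matchLabels:
                      app.kubernetes.io/name: cockroachdb
                  topologyKey: kubernetes.io/hostname
~~~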
You should provision an appropriate amount of disk storage for your workload. For recommendations on this, see the [Production Checklist]({% link {{ page.version.version }}/recommended-production-settings.md %}#storage).

### Expand disk size
@pritesh-lahoti this only works with disks that can dynamically scale up, right? So, like EBS and such? I think it'd be good to make that clear here, along with what the behavior would be if you didn't have EBS-like disks (and that we don't recommend doing that, and that they should horizontally scale...?)
Right, makes sense.
You can use the operator to configure the CockroachDB logging system. This allows you to output logs to [configurable log sinks]({% link {{ page.version.version }}/configure-logs.md %}#configure-log-sinks) such as file or network logging destinations.

The logging configuration is defined in a [ConfigMap](https://kubernetes.io/docs/concepts/configuration/configmap/) object, using a key named `logs.yaml`. For example:
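The example itself wasn't captured in this excerpt; a minimal sketch, assuming a simple file-sink config in CockroachDB's log-config format, might look like:

~~~ yaml
# Hypothetical ConfigMap carrying a CockroachDB logging config under the
# logs.yaml key. The name, namespace, and sink settings are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cockroachdb-log-config
  namespace: cockroach-ns
data:
  logs.yaml: |
    sinks:
      file-groups:
        default:
          channels: [DEV, OPS, HEALTH]
~~~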
Is logs.yaml the same format as the standard logs.yaml file that cockroach grabs for logging configs? If so, maybe we link to docs about the logs configuration via YAML?
Yep, that link in line 238 goes to https://www.cockroachlabs.com/docs/stable/configure-logs.html#configure-log-sinks
I'm on board with adding that link to the values.yaml as a comment.
Force-pushed e0559fc to e05da7b
lgtm, pending fix to 1 minor typo
I reviewed:
- cockroachdb-operator-overview: https://deploy-preview-19942--cockroachdb-docs.netlify.app/docs/v25.2/cockroachdb-operator-overview.html
- cockroachdb-operator-recommendation: https://deploy-preview-19942--cockroachdb-docs.netlify.app/docs/releases/kubernetes-operator.html and https://deploy-preview-19942--cockroachdb-docs.netlify.app/docs/v25.2/kubernetes-performance.html
- configure-cockroachdb-operator#ingress: https://deploy-preview-19942--cockroachdb-docs.netlify.app/docs/v25.2/configure-cockroachdb-operator.html#ingress
- override-templates-cockroachdb-operator: https://deploy-preview-19942--cockroachdb-docs.netlify.app/docs/v25.2/override-templates-cockroachdb-operator.html
Applying batch suggestions from Ryan's review in GitHub.
Co-authored-by: Ryan Kuo <8740013+taroface@users.noreply.github.com>
Co-authored-by: Florence Morris <58752716+florence-crl@users.noreply.github.com>
Publicly document the new CockroachDB operator (prev. "enterprise operator") as a preview.
TODO: