diff --git a/.github/prompts/review-docs.prompt.md b/.github/prompts/review-docs.prompt.md
index 2a4569e9..c39a478e 100644
--- a/.github/prompts/review-docs.prompt.md
+++ b/.github/prompts/review-docs.prompt.md
@@ -13,6 +13,7 @@ Review the documentation for clarity, completeness, and accuracy.
 - H1 titles under `docs/how-to` should start with "How to".
 - Section headers and index entries should all use sentence case (not title case).
 - Known product names should be capitalized consistently throughout the documentation.
+- Spelling should follow US English conventions.
 
 ## Context
 
diff --git a/docs/how-to/configure-and-tune/evaluate-telemetry-volume.md b/docs/how-to/configure-and-tune/evaluate-telemetry-volume.md
index cb657f97..38ffe4e0 100644
--- a/docs/how-to/configure-and-tune/evaluate-telemetry-volume.md
+++ b/docs/how-to/configure-and-tune/evaluate-telemetry-volume.md
@@ -11,6 +11,21 @@ In order to correctly size the VM(s) needed for COS, you need to know how much t
 
 ## Metrics rate
 
+
+### Manual evaluation
+Look up the list of metrics that each observed workload exposes. If it is not documented,
+manually count the non-comment lines served on the workload's metrics endpoint,
+for example:
+
+```bash
+curl -sf localhost:8080/metrics | grep -v "^# " | wc -l
+```
+
+This gives the number of time series the workload creates, per unit.
+
+Alternatively, deploy a temporary pilot Prometheus charm.
+
+### With charmed Prometheus
 Have your deployment sending all metrics to Prometheus (or Mimir) and inspect the 48hr plot for `count({__name__=~".+"})`.
 The raw data can also be obtained by querying the Prometheus `query` endpoint directly:
 
@@ -36,8 +51,13 @@ load[load generator] ---|db| postgresql
 postgresql ---|metrics-endpoint| prometheus
 ```
 
-
 ## Logs rate
 
+### Manual evaluation
+The most reliable way to evaluate the logging rate of a workload is to run load tests against it.
+
+Alternatively, deploy temporary pilot Loki and Prometheus charms.
+
+### With charmed Loki and Prometheus
 Have your deployment sending all logs to Loki, and inspect the 48hr plot for `loki_distributor_*_received_total`:
 ```
diff --git a/docs/how-to/deploy-and-manage/index.md b/docs/how-to/deploy-and-manage/index.md
index 339dbbda..0777d304 100644
--- a/docs/how-to/deploy-and-manage/index.md
+++ b/docs/how-to/deploy-and-manage/index.md
@@ -14,14 +14,10 @@ These guides cover deploying, upgrading, managing, and securing access to COS.
 
 See our [tutorials](/tutorial/index) for guidance on deploying COS.
 
-## Upgrades
-
-Move between COS revisions with confidence.
-
 ```{toctree}
 :maxdepth: 1
 
-Cross-track upgrade instructions
+Install
 ```
 
 ## Secure access
@@ -34,3 +30,14 @@ Protect and expose COS endpoints for production traffic.
 Configure TLS encryption
 Configure ingress
 ```
+
+## Upgrades
+
+Move between COS revisions with confidence.
+
+```{toctree}
+:maxdepth: 1
+
+Cross-track upgrade instructions
+```
+
diff --git a/docs/how-to/deploy-and-manage/install.md b/docs/how-to/deploy-and-manage/install.md
new file mode 100644
index 00000000..a14a7176
--- /dev/null
+++ b/docs/how-to/deploy-and-manage/install.md
@@ -0,0 +1,83 @@
+---
+myst:
+  html_meta:
+    description: "Install the Canonical Observability Stack: preparation checklist covering sizing, networking, storage, and deployment options."
+---
+
+# How to install COS
+
+## Preparation
+
+Before deploying COS or COS Lite, work through the items below.
+
+### COS flavor
+
+The [flavor of COS](/explanation/overview/what-is-cos) to install depends on your use case.
+If you want to install on edge devices, COS Lite is likely the right choice; otherwise,
+you should probably choose "full" COS.
+ +```{mermaid} +graph LR + +subgraph env["Monitored environment"] +opentelemetry-collector +end + +subgraph k8s["K8s cluster"] +COS +end + +subgraph pc["Public cloud"] +cos-alerter["COS Alerter"] +end + +subgraph storage["Storage cluster"] +S3 +end + +opentelemetry-collector ---|telemetry| COS +COS --- S3 +COS --- cos-alerter +``` + +### Kubernetes cluster + +Deploy COS on a high-availability Kubernetes cluster with at least 3 control plane nodes. + +### Sizing + +Use the [sizing guide](/reference/system-requirements) to determine the minimum hardware for your deployment. +If you don't yet know how much telemetry your workloads generate, start with [How to evaluate telemetry volume](/how-to/configure-and-tune/evaluate-telemetry-volume). + +Follow the [storage best practices](/reference/storage) to set up a distributed storage backend with a replication factor of 3. +Do **not** use `hostPath` storage in production. + +### Configure networking + +Review the [networking best practices](/reference/networking) and ensure: + +- A load balancer (for example, MetalLB) is available to give Traefik a stable IP. +- Egress is open for Charmhub, the Juju OCI registry, and Snapcraft. + +### Plan for TLS + +Production deployments should use TLS. +See [How to configure TLS encryption](/how-to/deploy-and-manage/configure-tls-encryption) for the available modes and what you need to prepare (for example, an external certificates provider). + +### Authentication and authorization +Only the Grafana and Traefik charms support auth. +For exposing Grafana publicly, use two Traefik charms, one for internal connections, and another for external access, which will provide ingress to Grafana. + +### Dedicated Juju controller and model + +You should bootstrap a dedicated Juju controller and model just for COS. + +## Create Terraform plan + +```hcl +module ... { ... } +``` + +## Deploy COS Alerter + +COS Alerter is a watchdog service for COS you should deploy on a physically different cloud. 
\ No newline at end of file