diff --git a/docs/conf.py b/docs/conf.py index a1412d98..044d97cb 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -217,6 +217,7 @@ "https://github.com/canonical/ACME/*", "troubleshooting/", "https://github.com/canonical/observability-stack//terraform/cos-lite", + r"https://matrix\.to/.*", ] @@ -275,6 +276,7 @@ exclude_patterns = [ "doc-cheat-sheet*", + ".venv", ] # Adds custom CSS files, located under 'html_static_path' diff --git a/docs/explanation/design-goals.md b/docs/explanation/design-goals.md index d4e02fb5..082ce2c1 100644 --- a/docs/explanation/design-goals.md +++ b/docs/explanation/design-goals.md @@ -12,7 +12,7 @@ There are several design goals we want to accomplish with COS: * Provide a set of high-quality observability charmed operators that are designed to work well on their own, and better together. -* Make COS run on Kubernetes, with specific focus on [MicroK8s](https://microk8s.io/), to achieve a very "appliance-like" user experience. +* Make COS run on Kubernetes, with specific focus on [MicroK8s](https://canonical.com/microk8s), to achieve a very "appliance-like" user experience. * Ensure a consistent, cohesive experience: all alerts go through Alertmanager, Grafana can plot all telemetry, etc. diff --git a/docs/explanation/telemetry-labels.md b/docs/explanation/telemetry-labels.md index 7e852eab..07b21aef 100644 --- a/docs/explanation/telemetry-labels.md +++ b/docs/explanation/telemetry-labels.md @@ -150,5 +150,5 @@ This is useful for: See also: - [reference/juju-topology-labels](Juju topology labels) -- [How relabeling in Prometheus works](https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works) +- [How relabeling in Prometheus works](https://grafana.com/blog/how-relabeling-in-prometheus-works/) - [PromLens Relabeler](https://relabeler.promlabs.com/). 
diff --git a/docs/how-to/exposing-a-metrics-endpoint.md b/docs/how-to/exposing-a-metrics-endpoint.md index 7e121a3d..86e872f0 100644 --- a/docs/how-to/exposing-a-metrics-endpoint.md +++ b/docs/how-to/exposing-a-metrics-endpoint.md @@ -52,6 +52,32 @@ class ScrapableCharm: }]) ``` +The `*` wildcard in the target address is the most common pattern. When the +`prometheus_scrape` charm library generates the scrape configuration, it +expands the wildcard into one scrape target per unit and enriches each one +with the corresponding `juju_unit` topology label. + +If your workload requires explicit hostnames or IPs instead of wildcards (for +example, for TLS with strict `SNI` validation), you can use fully-qualified +addresses as targets: + +```python +class ScrapableCharm: + # ... + def __init__(self, *args): + # ... + self.metrics_endpoint_provider = MetricsEndpointProvider( + self, + jobs=[{ + "static_configs": [{ + "targets": ["myapp-0.myapp-endpoints.mymodel.svc.cluster.local:8080"] + }], + }]) +``` + +Non-wildcard targets whose host matches a known unit address or FQDN are +also enriched with the `juju_unit` label, just like wildcard targets. + ## Declaring the relation As a last step, you need to declare the relation in your charms `metadata.yaml` file. @@ -62,5 +88,5 @@ provides: interface: prometheus_scrape ``` -Congratulations! You will now be able to add an integration between your charm -and a scraper! \ No newline at end of file +Congratulations! You will now be able to add an integration between your charm +and a scraper! 
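Review note on the `exposing-a-metrics-endpoint.md` change above: the wildcard expansion and host matching it describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the `prometheus_scrape` library's actual code. The function name `build_scrape_jobs`, the `unit_hosts` mapping, and the job dictionary shape are all hypothetical. The handling of unmatched targets follows the description added to `juju-topology-labels.md` in this same diff.

```python
from typing import Dict, List


def build_scrape_jobs(targets: List[str], unit_hosts: Dict[str, List[str]]) -> List[dict]:
    """Hypothetical sketch, not the prometheus_scrape library's real code.

    unit_hosts maps a unit name to its known hosts (IP addresses or FQDNs).
    """
    jobs: List[dict] = []
    unmatched: List[str] = []
    for target in targets:
        # Split "host:port"; IPv6 literals are deliberately out of scope here.
        host, sep, port = target.rpartition(":")
        if not sep:
            host, port = target, ""
        suffix = f":{port}" if port else ""
        if host == "*":
            # Wildcard: one job per unit, pointing at the unit's first known host.
            for unit, hosts in unit_hosts.items():
                jobs.append({"targets": [hosts[0] + suffix], "labels": {"juju_unit": unit}})
            continue
        # Explicit host: look for the unit that owns this IP address or FQDN.
        owner = next((u for u, hosts in unit_hosts.items() if host in hosts), None)
        if owner is None:
            unmatched.append(target)
        else:
            jobs.append({"targets": [target], "labels": {"juju_unit": owner}})
    if unmatched:
        # Unmatched targets share one job that carries no juju_unit label.
        jobs.append({"targets": unmatched, "labels": {}})
    return jobs


# Example input: two units, addresses are made up for illustration.
example_units = {
    "myapp/0": ["10.1.14.39", "myapp-0.myapp-endpoints.mymodel.svc.cluster.local"],
    "myapp/1": ["10.1.14.40"],
}
wildcard_jobs = build_scrape_jobs(["*:8080"], example_units)
explicit_jobs = build_scrape_jobs(
    ["myapp-0.myapp-endpoints.mymodel.svc.cluster.local:8080", "203.0.113.7:9100"],
    example_units,
)
```

Keeping unmatched targets in a single trailing job mirrors the documented behavior that such targets keep the other topology labels but cannot be attributed to a unit.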
diff --git a/docs/how-to/integrating-cos-lite-with-uncharmed-applications.md b/docs/how-to/integrating-cos-lite-with-uncharmed-applications.md index 73db6bbc..de21347d 100644 --- a/docs/how-to/integrating-cos-lite-with-uncharmed-applications.md +++ b/docs/how-to/integrating-cos-lite-with-uncharmed-applications.md @@ -101,7 +101,7 @@ See [this guide](https://github.com/canonical/cos-configuration-k8s-operator#dep To enable secure communications with (and within) COS Lite, deploy COS Lite with the [TLS overlay](https://github.com/canonical/cos-lite-bundle/pull/80). -You can follow [this guide](https://charmhub.io/traefik-k8s/docs/tls-termination) to enable TLS in Traefik and COS Lite. +You can follow [this guide](https://documentation.ubuntu.com/observability/track-2/how-to/configure-tls-encryption/) to enable TLS in Traefik and COS Lite. ### Grafana Agent snap as a client As a client (e.g. scraping `/metrics` endpoint), Grafana Agent must trust the CA that signed the COS charms (or the COS @@ -114,7 +114,7 @@ juju run ssc/0 get-ca-certificate --format=yaml \ | yq '.ssc/0.results.ca-certificate' ``` -Next, you need to [add the certificate to the root store](https://documentation.ubuntu.com/server/how-to/security/install-a-root-ca-certificate-in-the-trust-store/index.html). +Next, you need to [add the certificate to the root store](https://ubuntu.com/server/docs/how-to/security/install-a-root-ca-certificate-in-the-trust-store/). 
> Note: After running `update-ca-certificates` and restarting the `grafana-agent` snap service, check the Grafana Agent > logs to confirm there are no log lines such as: diff --git a/docs/how-to/selectively-drop-telemetry.md b/docs/how-to/selectively-drop-telemetry.md index b159aaa4..0af283d3 100644 --- a/docs/how-to/selectively-drop-telemetry.md +++ b/docs/how-to/selectively-drop-telemetry.md @@ -179,7 +179,7 @@ processors: - The [OTLP data model](https://betterstack.com/community/guides/observability/otlp/#the-otlp-data-model) - Official docs: [`<relabel_config>`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) - [Dropping metrics at scrape time with Prometheus](https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus/) (robustperception, 2015) -- [How relabeling in Prometheus works](https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/) (grafana.com, 2022) +- [How relabeling in Prometheus works](https://grafana.com/blog/how-relabeling-in-prometheus-works/) (grafana.com, 2022) - [How to drop and delete metrics in Prometheus](https://tanmay-bhat.github.io/posts/how-to-drop-and-delete-metrics-in-prometheus/) (gh:tanmay-bhat, 2022) - Playgrounds: - https://demo.promlens.com/ diff --git a/docs/how-to/troubleshooting/troubleshoot-gateway-address-unavailable.md b/docs/how-to/troubleshooting/troubleshoot-gateway-address-unavailable.md index 5eaa8758..eaf80030 100644 --- a/docs/how-to/troubleshooting/troubleshoot-gateway-address-unavailable.md +++ b/docs/how-to/troubleshooting/troubleshoot-gateway-address-unavailable.md @@ -13,7 +13,7 @@ apply, although you will need to tailor the exact steps and commands to your set ## Checklist - You have run `juju trust traefik --scope=cluster` -- The [MetalLB MicroK8s add-on](https://microk8s.io/docs/addon-metallb) is enabled. +- The [MetalLB MicroK8s add-on](https://canonical.com/microk8s/docs/addon-metallb) is enabled. - Traefik's service type is ``LoadBalancer``. 
- An external IP address is assigned to Traefik. @@ -54,7 +54,7 @@ This can happen when: - MetalLB has only one IP in its range but you deployed two instances of Traefik, or when Traefik is forcefully removed (`--force --no-wait`) and a new Traefik app is deployed immediately after. -- The [ingress](https://microk8s.io/docs/ingress) add-on is enabled. It's possible +- The [ingress](https://canonical.com/microk8s/docs/ingress) add-on is enabled. It's possible that Nginx from the ingress add-on has claimed the `ExternalIP`. Disable Nginx and re-enable MetalLB. diff --git a/docs/how-to/troubleshooting/troubleshoot-no-data-in-grafana-panels.md b/docs/how-to/troubleshooting/troubleshoot-no-data-in-grafana-panels.md index 7f3e6656..6691ddeb 100644 --- a/docs/how-to/troubleshooting/troubleshoot-no-data-in-grafana-panels.md +++ b/docs/how-to/troubleshooting/troubleshoot-no-data-in-grafana-panels.md @@ -11,17 +11,17 @@ Perhaps you had "no data" all along or it started happening only recently. ## Inspect variable values -Drop-down [variables](https://grafana.com/docs/grafana/latest/dashboards/variables/) +Drop-down [variables](https://grafana.com/docs/grafana/latest/visualizations/dashboards/variables/) could be filtering out data incorrectly. Under dashboard settings, inspect the current values of the variables. - If you can find a combination of dropdown selections that results in data being shown, then - perhaps the offered variable options should be [narrowed down](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#add-a-query-variable) with a more accurate query. + perhaps the offered variable options should be [narrowed down](https://grafana.com/docs/grafana/latest/visualizations/dashboards/variables/add-template-variables/) with a more accurate query. 
- If the options listed in the dropdown are missing items you expect to be there, then the datasource might be missing some telemetry, or perhaps we refer to a metric that does not exist, or apply a combination of labels that does not produce a result. ## Confirm the query is valid -[Edit the panel](https://grafana.com/docs/grafana/latest/panels-visualizations/panel-editor-overview/) +[Edit the panel](https://grafana.com/docs/grafana/latest/visualizations/panels-visualizations/panel-editor-overview/) and incrementally simplify the faulty query, until data shows up. For example, - drop label matchers diff --git a/docs/index.rst b/docs/index.rst index 0893c9b5..df3bb722 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -54,6 +54,6 @@ and constructive feedback. * `Join the Matrix community chat `_ * `Contribute on GitHub `_ -* `Code of conduct `_ +* `Code of conduct `_ * `Canonical contributor license agreement `_ diff --git a/docs/reference/best-practices/storage.md b/docs/reference/best-practices/storage.md index 94d75b70..43d60550 100644 --- a/docs/reference/best-practices/storage.md +++ b/docs/reference/best-practices/storage.md @@ -13,7 +13,7 @@ has a growth rate of about 50GB per day under normal operations. So, if you want a retention interval of about two months, you'll need 3TB of storage only for the telemetry. ## Set up distributed storage -In production, **do not** use hostPath storage ([`hostpath-storage`](https://microk8s.io/docs/addon-hostpath-storage) in MicroK8s; `local-storage` in Canonical K8s): +In production, **do not** use hostPath storage ([`hostpath-storage`](https://canonical.com/microk8s/docs/addon-hostpath-storage) in MicroK8s; `local-storage` in Canonical K8s): - `PersistentVolumeClaims` created by the host path storage provisioner are bound to the local node, so it is *impossible to move them to a different node*. - A `hostpath` volume can *grow beyond the capacity set in the volume claim manifest*. @@ -22,5 +22,5 @@ Use Ceph CSI. 
Refer to Canonical Kubernetes [snap](https://documentation.ubuntu. and [charm](https://documentation.ubuntu.com/canonical-kubernetes/latest/charm/howto/ceph-csi/) docs. ### MicroK8s -Use the [`rook-ceph`](https://microk8s.io/docs/addon-rook-ceph) add-on together with Microceph. -See the [Microceph tutorial](https://microk8s.io/docs/how-to-ceph). +Use the [`rook-ceph`](https://canonical.com/microk8s/docs/addon-rook-ceph) add-on together with Microceph. +See the [Microceph tutorial](https://canonical.com/microk8s/docs/how-to-ceph). diff --git a/docs/reference/best-practices/topology.md b/docs/reference/best-practices/topology.md index ff91afe7..2fe20a28 100644 --- a/docs/reference/best-practices/topology.md +++ b/docs/reference/best-practices/topology.md @@ -120,5 +120,5 @@ end ## References - High availability: [Canonical K8s](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/high-availability/), - [MicroK8s](https://microk8s.io/docs/high-availability). + [MicroK8s](https://canonical.com/microk8s/docs/high-availability). diff --git a/docs/reference/juju-topology-labels.md b/docs/reference/juju-topology-labels.md index beb75891..bbed8587 100644 --- a/docs/reference/juju-topology-labels.md +++ b/docs/reference/juju-topology-labels.md @@ -29,10 +29,21 @@ Incidental dashboards coming in from a git repository via the `cos-configuration When dashboards are forwarded through a `grafana-agent` intermediary, the juju topology labels of the charm of origin are injected (and not `grafana-agent`'s). Any subsequent chaining to additional grafana agent charms would leave the labels intact. ### Charms relating through `cos-proxy` -`cos-proxy` will apply its own topology to the labels, as old LMA-provider units don't implement the more modern interfaces that we would need to add topology to the telemetry. 
+`cos-proxy` will apply its own topology to the labels, as old LMA-provider units don't implement the more modern interfaces that we would need to add topology to the telemetry. -## Metrics -Metrics are workload-specific and vary from charm to charm. +## Metrics +Metrics are workload-specific and vary from charm to charm. + +### Charms relating through `metrics-endpoint` + +When a charm relates to `prometheus-k8s`, `opentelemetry-collector-k8s` or `opentelemetry-collector` via the `metrics-endpoint` interface, the `prometheus_scrape` library generates per-unit scrape jobs enriched with all Juju topology labels, including `juju_unit`. + +Scrape targets can be specified in two ways: + +- **Wildcard targets** (e.g. `*:8080`): The wildcard is expanded into one scrape job per unit, each targeting the unit's address and labeled with the corresponding `juju_unit`. +- **Non-wildcard targets** (e.g. `alertmanager-0.alertmanager-endpoints.svc.cluster.local:9093` or `10.1.14.39:8080`): The library matches each target's host (IP address or FQDN) against known unit addresses. Matched targets produce a per-unit scrape job with `juju_unit`, just like wildcard targets. Targets that cannot be matched to any known unit are grouped in a single job with all other topology labels but without `juju_unit`. + +This ensures that metrics from any charmed workload — regardless of how its targets are defined — can be filtered by unit in Grafana dashboards and alert expressions. ### Charms relating through `grafana-agent` (`-k8s` or not) For `grafana-agent`: any metrics coming from the principal charm will be tagged with the topology of the principal unit. The generic Linux metrics coming from the node exporter will be tagged with the grafana-agent unit topology. @@ -71,7 +82,7 @@ In `grafana-agent`, logs scraped from files, such as `/var/log`, will be tagged In `grafana-agent-k8s`, the charm will not modify the topology. 
### Charms relating through `cos-proxy` -`cos-proxy` will apply its own topology to the logs. +`cos-proxy` will apply its own topology to the logs. ## Traces Any charm can stream traces to Tempo using the `tracing` charm lib. Usually this is done by sending the traces to a `grafana-agent` (soon to be replaced by the OTEL collector), which forwards them to the COS stack. The agent will be responsible to attach to any trace going through it the juju topology of the unit generating them, if known, or else its own (for uncharmed workloads). diff --git a/docs/reference/security-hardening-guide.md b/docs/reference/security-hardening-guide.md index fd7fd09d..a9bcd356 100644 --- a/docs/reference/security-hardening-guide.md +++ b/docs/reference/security-hardening-guide.md @@ -22,7 +22,7 @@ By default, applications of Charmed Grafana are deployed with a single administr * change this password as described [in the Grafana charm docs](https://github.com/canonical/grafana-k8s-operator?tab=readme-ov-file#web-interface) * consider adding less-privileged accounts as needed (see the [official Grafana Docs](https://grafana.com/docs/grafana/latest/) for how to do this manually inside Grafana) -If you're using the Canonical Identity Platform to manage authentication, this could be used to manage Grafana user accounts directly. See [the Hydra docs](https://charmhub.io/hydra/docs/how-to/integrate-oidc-compatible-charms) for more details. +If you're using the Canonical Identity Platform to manage authentication, this could be used to manage Grafana user accounts directly. See [the Hydra docs](https://canonical-identity.readthedocs-hosted.com/reference/charms/hydra/) for more details. 
### Be judicious about what is exposed via an ingress diff --git a/docs/tutorial/installation/cos-lite-microk8s-sandbox.md b/docs/tutorial/installation/cos-lite-microk8s-sandbox.md index b3d2ab2a..0efcf05b 100644 --- a/docs/tutorial/installation/cos-lite-microk8s-sandbox.md +++ b/docs/tutorial/installation/cos-lite-microk8s-sandbox.md @@ -23,7 +23,7 @@ Let's go and deploy that bundle! ## Configure MicroK8s -For the COS Lite bundle deployment to go smoothly, make sure the following MicroK8s [addons](https://microk8s.io/docs/addons) are enabled: `dns`, `hostpath-storage` and `metallb`. +For the COS Lite bundle deployment to go smoothly, make sure the following MicroK8s [addons](https://canonical.com/microk8s/docs/addons) are enabled: `dns`, `hostpath-storage` and `metallb`. You can check this with `microk8s status`, and if any are missing, enable them with @@ -33,7 +33,7 @@ $ microk8s enable dns ```{note} While the following setup is sufficient for non-production environments, if you're looking for a more resilient storage option, -consider deploying MicroCeph on MicroK8s using this [guide](https://microk8s.io/docs/how-to-ceph). +consider deploying MicroCeph on MicroK8s using this [guide](https://canonical.com/microk8s/docs/how-to-ceph). ``` ```bash @@ -57,11 +57,11 @@ $ microk8s kubectl rollout status daemonset.apps/speaker -n metallb-system -w ``` ```{note} -If you have an HTTP proxy configured, you will need to give this information to MicroK8s. See [the proxy documentation](https://microk8s.io/docs/install-proxy) for details. +If you have an HTTP proxy configured, you will need to give this information to MicroK8s. See [the proxy documentation](https://canonical.com/microk8s/docs/install-proxy) for details. ``` ```{note} -By default, MicroK8s will use `8.8.8.8` and `8.8.4.4` as DNS servers, which can be adjusted. See [the DNS documentation](https://microk8s.io/docs/addon-dns) for details. 
+By default, MicroK8s will use `8.8.8.8` and `8.8.4.4` as DNS servers, which can be adjusted. See [the DNS documentation](https://canonical.com/microk8s/docs/addon-dns) for details. ``` ## Deploy the COS Lite bundle diff --git a/docs/tutorial/instrument-machine-charms.md b/docs/tutorial/instrument-machine-charms.md index a35df1b3..6ba9b237 100644 --- a/docs/tutorial/instrument-machine-charms.md +++ b/docs/tutorial/instrument-machine-charms.md @@ -9,7 +9,7 @@ This tutorial will teach you how to integrate a charm deployed on a machine substrate with the Canonical Observability Stack running on Kubernetes. -The Grafana Agent machine charm handles installation, configuration, and Day 2 operations specific to the [Grafana Agent](https://grafana.com/oss/agent/), using [Juju](https://juju.is). The charm is designed to run in virtual machines as a [subordinate](https://discourse.charmhub.io/t/subordinate-applications/1053). +The Grafana Agent machine charm handles installation, configuration, and Day 2 operations specific to the [Grafana Agent](https://grafana.com/oss/agent/), using [Juju](https://canonical.com/juju). The charm is designed to run in virtual machines as a [subordinate](https://discourse.charmhub.io/t/subordinate-applications/1053). ```{note} Application units are typically run in an isolated container on a machine with no knowledge or access to other applications deployed onto the same machine.