2 changes: 2 additions & 0 deletions docs/conf.py
Original file line number Diff line number Diff line change
@@ -217,6 +217,7 @@
    "https://github.com/canonical/ACME/*",
    "troubleshooting/",
    "https://github.com/canonical/observability-stack//terraform/cos-lite",
    r"https://matrix\.to/.*",
]
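
Sphinx treats each `linkcheck_ignore` entry as a regular expression, which is why the new entry is a raw string with an escaped dot. The sketch below shows what that pattern accepts. It assumes matching from the start of the URI, as `re.match` does; `is_ignored` and the lookalike URL are hypothetical names used only for illustration.

```python
import re

# Same pattern as the new linkcheck_ignore entry in conf.py.
MATRIX_PATTERN = re.compile(r"https://matrix\.to/.*")


def is_ignored(uri: str) -> bool:
    """Return True if the URI matches the pattern from its first character."""
    return MATRIX_PATTERN.match(uri) is not None


# The Matrix chat link from docs/index.rst is skipped by the link checker.
matrix_link_ignored = is_ignored("https://matrix.to/#/#cos:ubuntu.com")
# A lookalike host is still checked, because the dot is escaped.
lookalike_ignored = is_ignored("https://matrix-to.example.com/")
```

Without the backslash, the dot would match any character, so the entry would ignore more hosts than intended.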


@@ -275,6 +276,7 @@

exclude_patterns = [
    "doc-cheat-sheet*",
    ".venv",
]

# Adds custom CSS files, located under 'html_static_path'
2 changes: 1 addition & 1 deletion docs/explanation/design-goals.md
@@ -12,7 +12,7 @@ There are several design goals we want to accomplish with COS:

* Provide a set of high-quality observability charmed operators that are designed to work well on their own, and better together.

* Make COS run on Kubernetes, with specific focus on [MicroK8s](https://microk8s.io/), to achieve a very "appliance-like" user experience.
* Make COS run on Kubernetes, with specific focus on [MicroK8s](https://canonical.com/microk8s), to achieve a very "appliance-like" user experience.

* Ensure a consistent, cohesive experience: all alerts go through Alertmanager, Grafana can plot all telemetry, etc.

2 changes: 1 addition & 1 deletion docs/explanation/telemetry-labels.md
@@ -150,5 +150,5 @@ This is useful for:

See also:
- [Juju topology labels](reference/juju-topology-labels)
- [How relabeling in Prometheus works](https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works)
- [How relabeling in Prometheus works](https://grafana.com/blog/how-relabeling-in-prometheus-works/)
- [PromLens Relabeler](https://relabeler.promlabs.com/).
30 changes: 28 additions & 2 deletions docs/how-to/exposing-a-metrics-endpoint.md
@@ -52,6 +52,32 @@ class ScrapableCharm:
}])
```

The `*` wildcard in the target address is the most common pattern. When the
`prometheus_scrape` charm library generates the scrape configuration, it
expands the wildcard into one scrape target per unit and enriches each one
with the corresponding `juju_unit` topology label.

If your workload requires explicit hostnames or IP addresses instead of
wildcards (for example, for TLS with strict SNI validation), you can use
fully qualified addresses as targets:

```python
class ScrapableCharm:
    # ...
    def __init__(self, *args):
        # ...
        self.metrics_endpoint_provider = MetricsEndpointProvider(
            self,
            jobs=[{
                "static_configs": [{
                    "targets": ["myapp-0.myapp-endpoints.mymodel.svc.cluster.local:8080"]
                }],
            }])
```

Non-wildcard targets whose host matches a known unit address or FQDN are
also enriched with the `juju_unit` label, just like wildcard targets.
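
To make the wildcard expansion described above concrete, here is a minimal sketch. It is not the `prometheus_scrape` library's code: the helper `expand_wildcard_target`, the `unit_addresses` mapping, and the shape of the returned dictionaries are assumptions made for this illustration.

```python
from typing import Dict, List


def expand_wildcard_target(target: str, unit_addresses: Dict[str, str]) -> List[dict]:
    """Expand a "*:<port>" target into one static config per unit.

    Hypothetical helper for illustration only; not part of prometheus_scrape.
    """
    host, _, port = target.rpartition(":")
    if host != "*":
        raise ValueError(f"not a wildcard target: {target!r}")
    # One entry per unit, each labeled with the unit it belongs to.
    return [
        {"targets": [f"{address}:{port}"], "labels": {"juju_unit": unit}}
        for unit, address in sorted(unit_addresses.items())
    ]


# Example unit names and addresses are made up.
units = {"myapp/0": "10.1.14.39", "myapp/1": "10.1.14.40"}
configs = expand_wildcard_target("*:8080", units)
```

Each resulting entry targets a single unit, which is what lets dashboards and alert rules filter on `juju_unit`.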

## Declaring the relation

As a last step, you need to declare the relation in your charm's `metadata.yaml` file.
@@ -62,5 +88,5 @@ provides:
interface: prometheus_scrape
```

Congratulations! You will now be able to add an integration between your charm
and a scraper!
Congratulations! You will now be able to add an integration between your charm
and a scraper!
Original file line number Diff line number Diff line change
@@ -101,7 +101,7 @@ See [this guide](https://github.com/canonical/cos-configuration-k8s-operator#dep

To enable secure communications with (and within) COS Lite, deploy COS Lite with the
[TLS overlay](https://github.com/canonical/cos-lite-bundle/pull/80).
You can follow [this guide](https://charmhub.io/traefik-k8s/docs/tls-termination) to enable TLS in Traefik and COS Lite.
You can follow [this guide](https://documentation.ubuntu.com/observability/track-2/how-to/configure-tls-encryption/) to enable TLS in Traefik and COS Lite.

### Grafana Agent snap as a client
As a client (e.g. scraping `/metrics` endpoint), Grafana Agent must trust the CA that signed the COS charms (or the COS
@@ -114,7 +114,7 @@ juju run ssc/0 get-ca-certificate --format=yaml \
| yq '.ssc/0.results.ca-certificate'
```

Next, you need to [add the certificate to the root store](https://documentation.ubuntu.com/server/how-to/security/install-a-root-ca-certificate-in-the-trust-store/index.html).
Next, you need to [add the certificate to the root store](https://ubuntu.com/server/docs/how-to/security/install-a-root-ca-certificate-in-the-trust-store/).

> Note: After running `update-ca-certificates` and restarting the `grafana-agent` snap service, check the Grafana Agent
> logs to confirm there are no log lines such as:
2 changes: 1 addition & 1 deletion docs/how-to/selectively-drop-telemetry.md
@@ -179,7 +179,7 @@ processors:
- The [OTLP data model](https://betterstack.com/community/guides/observability/otlp/#the-otlp-data-model)
- Official docs: [`<relabel_config>`](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config)
- [Dropping metrics at scrape time with Prometheus](https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus/) (robustperception, 2015)
- [How relabeling in Prometheus works](https://grafana.com/blog/2022/03/21/how-relabeling-in-prometheus-works/) (grafana.com, 2022)
- [How relabeling in Prometheus works](https://grafana.com/blog/how-relabeling-in-prometheus-works/) (grafana.com, 2022)
- [How to drop and delete metrics in Prometheus](https://tanmay-bhat.github.io/posts/how-to-drop-and-delete-metrics-in-prometheus/) (gh:tanmay-bhat, 2022)
- Playgrounds:
- https://demo.promlens.com/
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ apply, although you will need to tailor the exact steps and commands to your set
## Checklist

- You have run `juju trust traefik --scope=cluster`
- The [MetalLB MicroK8s add-on](https://microk8s.io/docs/addon-metallb) is enabled.
- The [MetalLB MicroK8s add-on](https://canonical.com/microk8s/docs/addon-metallb) is enabled.
- Traefik's service type is ``LoadBalancer``.
- An external IP address is assigned to Traefik.

@@ -54,7 +54,7 @@ This can happen when:
- MetalLB has only one IP in its range but you deployed two instances of Traefik,
or when Traefik is forcefully removed (`--force --no-wait`) and a new Traefik
app is deployed immediately after.
- The [ingress](https://microk8s.io/docs/ingress) add-on is enabled. It's possible
- The [ingress](https://canonical.com/microk8s/docs/ingress) add-on is enabled. It's possible
that Nginx from the ingress add-on has claimed the `ExternalIP`. Disable Nginx and
re-enable MetalLB.

Original file line number Diff line number Diff line change
@@ -11,17 +11,17 @@ Perhaps you had "no data" all along or it started happening only recently.


## Inspect variable values
Drop-down [variables](https://grafana.com/docs/grafana/latest/dashboards/variables/)
Drop-down [variables](https://grafana.com/docs/grafana/latest/visualizations/dashboards/variables/)
could be filtering out data incorrectly.
Under dashboard settings, inspect the current values of the variables.
- If you can find a combination of dropdown selections that results in data being shown, then
perhaps the offered variable options should be [narrowed down](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#add-a-query-variable) with a more accurate query.
perhaps the offered variable options should be [narrowed down](https://grafana.com/docs/grafana/latest/visualizations/dashboards/variables/add-template-variables/) with a more accurate query.
- If the options listed in the dropdown are missing items you expect to be there, then the datasource might be
missing some telemetry, or perhaps we refer to a metric that does not exist, or apply a combination of labels that does not produce a result.


## Confirm the query is valid
[Edit the panel](https://grafana.com/docs/grafana/latest/panels-visualizations/panel-editor-overview/)
[Edit the panel](https://grafana.com/docs/grafana/latest/visualizations/panels-visualizations/panel-editor-overview/)
and incrementally simplify the faulty query, until data shows up.
For example,
- drop label matchers
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -54,6 +54,6 @@ and constructive feedback.
* `Join the Matrix community chat <https://matrix.to/#/#cos:ubuntu.com>`_
* `Contribute on GitHub <https://github.com/canonical/observability>`_

* `Code of conduct <https://ubuntu.com/community/ethos/code-of-conduct>`_
* `Code of conduct <https://ubuntu.com/community/docs/ethos/code-of-conduct>`_
* `Canonical contributor license agreement
<https://canonical.com/legal/contributors>`_
6 changes: 3 additions & 3 deletions docs/reference/best-practices/storage.md
@@ -13,7 +13,7 @@ has a growth rate of about 50GB per day under normal operations.
So, if you want a retention interval of about two months, you'll need 3TB of storage only for the telemetry.
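
The 3TB figure follows from simple arithmetic. The sketch below reproduces it, assuming "about two months" means 60 days and using decimal terabytes; the constant names are illustrative.

```python
GROWTH_GB_PER_DAY = 50  # approximate growth rate quoted above
RETENTION_DAYS = 60     # "about two months", assumed here to be 60 days

required_gb = GROWTH_GB_PER_DAY * RETENTION_DAYS
required_tb = required_gb / 1000  # decimal terabytes

print(f"{required_gb} GB, about {required_tb:g} TB")
```

Adjust both constants to your own growth rate and retention interval before sizing volumes.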

## Set up distributed storage
In production, **do not** use hostPath storage ([`hostpath-storage`](https://microk8s.io/docs/addon-hostpath-storage) in MicroK8s; `local-storage` in Canonical K8s):
In production, **do not** use hostPath storage ([`hostpath-storage`](https://canonical.com/microk8s/docs/addon-hostpath-storage) in MicroK8s; `local-storage` in Canonical K8s):
- `PersistentVolumeClaims` created by the host path storage provisioner are bound to the local node, so it is *impossible to move them to a different node*.
- A `hostpath` volume can *grow beyond the capacity set in the volume claim manifest*.

@@ -22,5 +22,5 @@ Use Ceph CSI. Refer to Canonical Kubernetes [snap](https://documentation.ubuntu.
and [charm](https://documentation.ubuntu.com/canonical-kubernetes/latest/charm/howto/ceph-csi/) docs.

### MicroK8s
Use the [`rook-ceph`](https://microk8s.io/docs/addon-rook-ceph) add-on together with Microceph.
See the [Microceph tutorial](https://microk8s.io/docs/how-to-ceph).
Use the [`rook-ceph`](https://canonical.com/microk8s/docs/addon-rook-ceph) add-on together with Microceph.
See the [Microceph tutorial](https://canonical.com/microk8s/docs/how-to-ceph).
2 changes: 1 addition & 1 deletion docs/reference/best-practices/topology.md
@@ -120,5 +120,5 @@ end

## References
- High availability: [Canonical K8s](https://documentation.ubuntu.com/canonical-kubernetes/latest/snap/explanation/high-availability/),
[MicroK8s](https://microk8s.io/docs/high-availability).
[MicroK8s](https://canonical.com/microk8s/docs/high-availability).

19 changes: 15 additions & 4 deletions docs/reference/juju-topology-labels.md
@@ -29,10 +29,21 @@ Incidental dashboards coming in from a git repository via the `cos-configuration
When dashboards are forwarded through a `grafana-agent` intermediary, the juju topology labels of the charm of origin are injected (and not `grafana-agent`'s). Any subsequent chaining to additional grafana agent charms would leave the labels intact.

### Charms relating through `cos-proxy`
`cos-proxy` will apply its own topology to the labels, as old LMA-provider units don't implement the more modern interfaces that we would need to add topology to the telemetry.
`cos-proxy` will apply its own topology to the labels, as old LMA-provider units don't implement the more modern interfaces that we would need to add topology to the telemetry.

## Metrics
Metrics are workload-specific and vary from charm to charm.
## Metrics
Metrics are workload-specific and vary from charm to charm.

### Charms relating through `metrics-endpoint`

When a charm relates to `prometheus-k8s`, `opentelemetry-collector-k8s` or `opentelemetry-collector` via the `metrics-endpoint` interface, the `prometheus_scrape` library generates per-unit scrape jobs enriched with all Juju topology labels, including `juju_unit`.

Scrape targets can be specified in two ways:

- **Wildcard targets** (e.g. `*:8080`): The wildcard is expanded into one scrape job per unit, each targeting the unit's address and labeled with the corresponding `juju_unit`.
- **Non-wildcard targets** (e.g. `alertmanager-0.alertmanager-endpoints.svc.cluster.local:9093` or `10.1.14.39:8080`): The library matches each target's host (IP address or FQDN) against known unit addresses. Matched targets produce a per-unit scrape job with `juju_unit`, just like wildcard targets. Targets that cannot be matched to any known unit are grouped in a single job with all other topology labels but without `juju_unit`.

This ensures that metrics from any charmed workload — regardless of how its targets are defined — can be filtered by unit in Grafana dashboards and alert expressions.
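
The matching rule for non-wildcard targets can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `group_targets_by_unit` and the `unit_hosts` mapping are hypothetical, the other topology labels are omitted for brevity, and IPv6 literals are not handled.

```python
from typing import Dict, List


def group_targets_by_unit(targets: List[str], unit_hosts: Dict[str, List[str]]) -> List[dict]:
    """Build per-unit jobs for matched targets and one shared job for the rest.

    `unit_hosts` maps a unit name to its known hosts (IP addresses or FQDNs).
    Hypothetical helper; the real library's data structures may differ.
    """
    host_to_unit = {host: unit for unit, hosts in unit_hosts.items() for host in hosts}
    jobs: List[dict] = []
    unmatched: List[str] = []
    for target in targets:
        host = target.rpartition(":")[0] or target  # strip the port, if any
        unit = host_to_unit.get(host)
        if unit is None:
            unmatched.append(target)
        else:
            jobs.append({"targets": [target], "labels": {"juju_unit": unit}})
    if unmatched:
        # Unmatched targets share a single job that carries no juju_unit label.
        jobs.append({"targets": unmatched, "labels": {}})
    return jobs


# Example unit name and hosts are made up.
known = {"am/0": ["10.1.14.39", "am-0.am-endpoints.svc.cluster.local"]}
jobs = group_targets_by_unit(["10.1.14.39:8080", "example.com:9100"], known)
```

Here the first target matches a known unit address and gets its own labeled job, while `example.com:9100` falls into the shared job without `juju_unit`.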

### Charms relating through `grafana-agent` (`-k8s` or not)
For `grafana-agent`: any metrics coming from the principal charm will be tagged with the topology of the principal unit. The generic Linux metrics coming from the node exporter will be tagged with the grafana-agent unit topology.
@@ -71,7 +82,7 @@ In `grafana-agent`, logs scraped from files, such as `/var/log`, will be tagged
In `grafana-agent-k8s`, the charm will not modify the topology.

### Charms relating through `cos-proxy`
`cos-proxy` will apply its own topology to the logs.
`cos-proxy` will apply its own topology to the logs.

## Traces
Any charm can stream traces to Tempo using the `tracing` charm lib. Usually this is done by sending the traces to a `grafana-agent` (soon to be replaced by the OTEL collector), which forwards them to the COS stack. The agent is responsible for attaching, to every trace that passes through it, the Juju topology of the unit that generated it, if known, or else its own topology (for uncharmed workloads).
2 changes: 1 addition & 1 deletion docs/reference/security-hardening-guide.md
@@ -22,7 +22,7 @@ By default, applications of Charmed Grafana are deployed with a single administr
* change this password as described [in the Grafana charm docs](https://github.com/canonical/grafana-k8s-operator?tab=readme-ov-file#web-interface)
* consider adding less-privileged accounts as needed (see the [official Grafana Docs](https://grafana.com/docs/grafana/latest/) for how to do this manually inside Grafana)

If you're using the Canonical Identity Platform to manage authentication, this could be used to manage Grafana user accounts directly. See [the Hydra docs](https://charmhub.io/hydra/docs/how-to/integrate-oidc-compatible-charms) for more details.
If you're using the Canonical Identity Platform to manage authentication, this could be used to manage Grafana user accounts directly. See [the Hydra docs](https://canonical-identity.readthedocs-hosted.com/reference/charms/hydra/) for more details.

### Be judicious about what is exposed via an ingress

8 changes: 4 additions & 4 deletions docs/tutorial/installation/cos-lite-microk8s-sandbox.md
@@ -23,7 +23,7 @@ Let's go and deploy that bundle!

## Configure MicroK8s

For the COS Lite bundle deployment to go smoothly, make sure the following MicroK8s [addons](https://microk8s.io/docs/addons) are enabled: `dns`, `hostpath-storage` and `metallb`.
For the COS Lite bundle deployment to go smoothly, make sure the following MicroK8s [addons](https://canonical.com/microk8s/docs/addons) are enabled: `dns`, `hostpath-storage` and `metallb`.

You can check this with `microk8s status`, and if any are missing, enable them with

@@ -33,7 +33,7 @@ $ microk8s enable dns

```{note}
While the following setup is sufficient for non-production environments, if you're looking for a more resilient storage option,
consider deploying MicroCeph on MicroK8s using this [guide](https://microk8s.io/docs/how-to-ceph).
consider deploying MicroCeph on MicroK8s using this [guide](https://canonical.com/microk8s/docs/how-to-ceph).
```

```bash
@@ -57,11 +57,11 @@ $ microk8s kubectl rollout status daemonset.apps/speaker -n metallb-system -w
```

```{note}
If you have an HTTP proxy configured, you will need to give this information to MicroK8s. See [the proxy documentation](https://microk8s.io/docs/install-proxy) for details.
If you have an HTTP proxy configured, you will need to give this information to MicroK8s. See [the proxy documentation](https://canonical.com/microk8s/docs/install-proxy) for details.
```

```{note}
By default, MicroK8s will use `8.8.8.8` and `8.8.4.4` as DNS servers, which can be adjusted. See [the DNS documentation](https://microk8s.io/docs/addon-dns) for details.
By default, MicroK8s will use `8.8.8.8` and `8.8.4.4` as DNS servers, which can be adjusted. See [the DNS documentation](https://canonical.com/microk8s/docs/addon-dns) for details.
```

## Deploy the COS Lite bundle
2 changes: 1 addition & 1 deletion docs/tutorial/instrument-machine-charms.md
@@ -9,7 +9,7 @@

This tutorial will teach you how to integrate a charm deployed on a machine substrate with the Canonical Observability Stack running on Kubernetes.

The Grafana Agent machine charm handles installation, configuration, and Day 2 operations specific to the [Grafana Agent](https://grafana.com/oss/agent/), using [Juju](https://juju.is). The charm is designed to run in virtual machines as a [subordinate](https://discourse.charmhub.io/t/subordinate-applications/1053).
The Grafana Agent machine charm handles installation, configuration, and Day 2 operations specific to the [Grafana Agent](https://grafana.com/oss/agent/), using [Juju](https://canonical.com/juju). The charm is designed to run in virtual machines as a [subordinate](https://discourse.charmhub.io/t/subordinate-applications/1053).

```{note}
Application units are typically run in an isolated container on a machine with no knowledge or access to other applications deployed onto the same machine.