Draft
43 commits
aa244a2
feat: Charmhub module for upgrades
MichaelThamm Jan 15, 2026
6a5e192
chore: README
MichaelThamm Jan 15, 2026
d8b0fbd
feat: add an upgrade doc
MichaelThamm Jan 15, 2026
0634ec8
chore: doc improvements
MichaelThamm Jan 16, 2026
2c634d1
feat: charmhub inside COS Lite
MichaelThamm Jan 19, 2026
424447a
chore: update tutorial
MichaelThamm Jan 19, 2026
b3b6492
chore: revert apps to remote source
MichaelThamm Jan 19, 2026
91dcf5f
chore
MichaelThamm Jan 21, 2026
188e4f5
chore: dump ideas
MichaelThamm Feb 9, 2026
97f585a
chore: remove charmhub module
MichaelThamm Mar 20, 2026
440dbee
chore: implement juju_data
MichaelThamm Mar 22, 2026
f4ffc38
fix: remove ubuntu/litestream from list of oci images (#177)
sinapah Feb 4, 2026
5c7bf3a
fix: Outdated docs (#178)
MichaelThamm Feb 4, 2026
055a627
chore: add blackbox exporter machine to list of charms in docs (#176)
sinapah Feb 4, 2026
42999c2
docs: fix extra double quote in cloud-init script (#181)
Copilot Feb 5, 2026
f2b3517
Separates the storage directives for different worker roles (#182)
Gmerold Feb 12, 2026
3f6cffc
Update source URL for cos-lite module (#190)
Abuelodelanada Feb 17, 2026
7078c6e
fix: READMEs (#191)
MichaelThamm Feb 17, 2026
2170e02
feat(doc): troubleshoot grafana admin password (#194)
sed-i Feb 23, 2026
61f716f
feat(docs): update otelcol docs (#199)
sed-i Mar 5, 2026
1d27584
feat: Tests for product module `channel` input (#202)
MichaelThamm Mar 10, 2026
dba3cef
docs: point to correct file for cos-lite variables.tf (#205)
sinapah Mar 11, 2026
51dd4b7
docs: deprecation notice in tutorial for migrating from GA to otelcol…
sinapah Mar 11, 2026
db994ce
fix: Tiering HowTo doc with OTLP endpoints (#211)
MichaelThamm Mar 16, 2026
c7cfe08
add cookie banner and Google Analytics tags (#213)
YanisaHS Mar 18, 2026
b76242e
docs: Troubleshooting compressed alerts and OTLP topology labels (#215)
MichaelThamm Mar 19, 2026
911c30b
docs(explanation): add dashboard upgrade guidance for deduplication a…
Copilot Mar 20, 2026
42ac9a5
chore: cleanup
MichaelThamm Mar 23, 2026
0187773
Merge remote-tracking branch 'origin/main' into feat/charmhub-module
MichaelThamm Mar 23, 2026
8903f5b
chore: cleanup
MichaelThamm Mar 23, 2026
fa6561d
chore: cleanup
MichaelThamm Mar 23, 2026
b4c12b6
chore
MichaelThamm Apr 4, 2026
13e860b
Merge branch 'main' into feat/charmhub-module
MichaelThamm Apr 17, 2026
7d933b3
chore
MichaelThamm Apr 17, 2026
7fe0965
chore
MichaelThamm Apr 17, 2026
0355daa
Merge branch 'main' into feat/charmhub-module
MichaelThamm Apr 20, 2026
33d26fc
chore
MichaelThamm Apr 20, 2026
adf4b51
Merge branch 'main' into feat/charmhub-module
MichaelThamm Apr 22, 2026
2488f6a
chore
MichaelThamm Apr 22, 2026
634411e
chore
MichaelThamm Apr 22, 2026
c83a64c
chore
MichaelThamm Apr 23, 2026
568db40
chore
MichaelThamm Apr 24, 2026
8314d25
chore
MichaelThamm Apr 24, 2026
111 changes: 111 additions & 0 deletions docs/how-to/deploy-and-manage/refresh-product-module.md
@@ -0,0 +1,111 @@
# Refresh COS to a new channel

In this example, you will learn how to deploy COS Lite and refresh from channel `2/stable` to `2/edge`. To do this, we can deploy COS Lite via Terraform in the same way as [in the tutorial](https://documentation.ubuntu.com/observability/track-2/tutorial/installation/cos-lite-microk8s-sandbox).
Review comment (Contributor):

Assuming this gets refactored into a tutorial, the language/framing should change so it's not like "In this example, you'll learn...". How-to guides aren't really "learning" experiences, it's more just like "here's the steps you need to do XYZ".

Example: Charmed Kafka: How to upgrade


## Prerequisites

This tutorial assumes that you already:

- Know how to deploy {ref}`COS Lite with Terraform <deploy-cos-ref>`

## Introduction
Review comment (Contributor):

Similar to a comment I made above, this section can be less narrative if being refactored into a how-to guide

(btw I'm happy to meet and discuss my comments with you if you want!)


Imagine you have COS Lite (or COS) deployed on a specific channel like `2/stable` and want to
refresh to a different channel (or track) e.g., `2/edge`. To do so, an admin would have to manually
`juju refresh` each COS charm and address any refresh errors. Alternatively, they can determine the
correct charm `channel` and `revision`(s), update the Terraform module, and apply.

This is simplified within COS (and COS Lite) by mimicking the `juju refresh` behavior on a product
level, allowing the juju admin to specify a list of charms to refresh within the specified
`track/channel`. The rest is handled by Terraform.

## Update the COS Lite Terraform module

Once deployed, we can select which charms to refresh with the `charms_to_refresh` input variable, detailed in the [README](https://github.com/canonical/observability-stack/tree/main/terraform/cos-lite). This defaults to all charms owned by the `observability-team`.

```{note}
This tutorial assumes you have deployed COS Lite from a root module located at `./main.tf`.
```

Then, replace `2/stable` with `2/edge` in your `cos-lite` module within the existing `./main.tf` file:

```{literalinclude} /tutorial/installation/cos-lite-microk8s-sandbox.tf
---
language: hcl
start-after: "# before-cos"
---
```

```{note}
The `base` input variable for the `cos-lite` module is important if the `track/channel` deploys charms to a different base than the default, detailed in the [README](https://github.com/canonical/observability-stack/tree/main/terraform/cos-lite).
```
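
As a rough illustration of the edits described above, the module block might end up looking like the sketch below. It is not the content of the included tutorial file. The `source` URL is assumed from the README path, `juju_model.cos` refers to the model resource from the tutorial, and the `channel` input name is inferred from the prose (the module's `variables.tf` also defines a `risk` input, so check the README for the inputs that apply to your revision). The `charms_to_refresh` value is hypothetical and left commented out.

```hcl
# Sketch only: input names and values are assumptions, see the README for the real interface.
module "cos" {
  # Assumed remote source, derived from the README location of the cos-lite module.
  source     = "git::https://github.com/canonical/observability-stack//terraform/cos-lite"
  model_uuid = juju_model.cos.uuid

  # Previously "2/stable"; changing this value is what drives the refresh.
  channel = "2/edge"

  # Only needed when the target channel deploys charms to a non-default base.
  base = "ubuntu@24.04"

  # Hypothetical value: restrict the refresh to a subset of charms.
  # charms_to_refresh = ["alertmanager-k8s"]
}
```

Leaving `charms_to_refresh` unset keeps the documented default, so every charm owned by the `observability-team` is considered for the refresh.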

Finally, add the provider definitions into the same `./main.tf` file:

```hcl
terraform {
  required_providers {
    juju = {
      source  = "juju/juju"
      version = "~> 1.0"
    }
    http = {
      source  = "hashicorp/http"
      version = "~> 3.0"
    }
  }
}
```

At this point, you will have one `main.tf` file ready for deployment. Now you can plan these changes with:

```shell
terraform plan
```

and Terraform plans to update each charm to the latest revision in the `2/edge` channel:

```shell
Terraform used the selected providers to generate the following
execution plan. Resource actions are indicated with the following
symbols:
  + create
  ~ update in-place

Terraform will perform the following actions:

  # module.cos.module.alertmanager.juju_application.alertmanager will be updated in-place
  ~ resource "juju_application" "alertmanager" {

      # snip ...

      ~ charm {
          ~ channel  = "2/stable" -> "2/edge"
            name     = "alertmanager-k8s"
          ~ revision = 191 -> 192
            # (1 unchanged attribute hidden)
        }

      # snip ...

Plan: 0 to add, 5 to change, 0 to destroy.
```

and finally apply the changes with:

```shell
terraform apply
```

At this point, you will have successfully upgraded COS Lite from `2/stable` to `2/edge`!
Review comment (Contributor):

The steps overall seem fine and pretty straightforward. I can provide some better feedback once it gets refactored into a how-to


## Refresh information

This tutorial only considers upgrading COS Lite. However, the `charmhub` module is product-agnostic
and can be used to refresh charms and other products, e.g., COS.

You can consult the following release documentation for refresh compatibility:

- [how-to cross-track upgrade](/how-to/upgrade/)
- [release policy](/reference/release-policy/)
- [release notes](/reference/release-notes/)
2 changes: 2 additions & 0 deletions docs/tutorial/cos-lite-microk8s-sandbox.md
@@ -180,6 +180,8 @@ $ juju deploy cos-lite \
--overlay ./storage-small-overlay.yaml
```

(deploy-cos-ref)=
Review comment (Contributor):

Iirc we haven't started using these in our docs yet.
Remind me - it's an anchor or cross-sphinx ref?
I think juju had this and it turned out to be brittle and difficult to maintain, but maybe I'm missing something.

Review comment (Contributor):

It's an anchor, but it can also be used as a cross-sphinx/project ref for projects that have that enabled.

You haven't started using them in the COS docs yet, but it is the standard approach to cross-linking for sphinx docs projects, so at some point, you should change your links to use these refs. Juju uses them in their docs, but I don't know what conversations happened around it

Using ref targets/anchors is nicer because otherwise anytime you change the filename/path, it'll break any of those links in your docs.

IMO, it could go either way in this PR (add it or don't add it), but the future goal should be that all docs have a ref target, and you use those for linking instead of file path. (Copilot should be able to handle it well when you do make this initiative)


## Deploy COS Lite using Terraform

Create a `cos-lite-microk8s-sandbox.tf` file with the following Terraform module, or include it in your Terraform plan:
2 changes: 2 additions & 0 deletions docs/tutorial/cos-lite-microk8s-sandbox.tf
@@ -8,6 +8,8 @@ terraform {
}
}

# before-cos

resource "juju_model" "cos" {
name = "cos"
config = { logging-config = "<root>=WARNING; unit=DEBUG" }
1 change: 1 addition & 0 deletions terraform/cos-lite/README.md
@@ -8,6 +8,7 @@ This is a Terraform module facilitating the deployment of the COS Lite solution,
| Name | Version |
|------|---------|
| <a name="provider_juju"></a> [juju](#provider\_juju) | ~> 1.0 |
| <a name="provider_terraform"></a> [terraform](#provider\_terraform) | n/a |

## Modules

2 changes: 1 addition & 1 deletion terraform/cos-lite/applications.tf
@@ -23,7 +23,7 @@ module "catalogue" {
}

module "grafana" {
source = "git::https://github.com/canonical/grafana-k8s-operator//terraform"
source = "git::https://github.com/canonical/grafana-k8s-operator//terraform?ref=feat/resource-lifecycle"
app_name = var.grafana.app_name
channel = local.channels.grafana
config = var.grafana.config
23 changes: 19 additions & 4 deletions terraform/cos-lite/integrations.tf
@@ -217,10 +217,6 @@ resource "juju_integration" "ingress" {
app_name = module.catalogue.app_name
endpoint = module.catalogue.requires.ingress
}
grafana = {
app_name = module.grafana.app_name
endpoint = module.grafana.requires.ingress
}
} : k => v if var.ingress[k]
}

@@ -237,6 +233,25 @@ resource "juju_integration" "ingress" {
}
}

//TODO: Feature this in COS
resource "juju_integration" "grafana_ingress" {
count = var.ingress["grafana"] ? 1 : 0

model_uuid = var.model_uuid

application {
name = module.grafana.app_name
endpoint = module.grafana.requires.ingress
}

application {
name = module.traefik.app_name
endpoint = module.traefik.endpoints.ingress
}

lifecycle { replace_triggered_by = [terraform_data.grafana_ingress_interface] }
}

resource "juju_integration" "ingress_per_unit" {
for_each = {
for k, v in {
4 changes: 4 additions & 0 deletions terraform/cos-lite/offers.tf
@@ -9,7 +9,11 @@ resource "juju_offer" "grafana_dashboards" {
name = "grafana-dashboards"
model_uuid = var.model_uuid
application_name = module.grafana.app_name
# TODO: Replace these with module.grafana.requires.grafana_dashboard in a tandem PR?
# TODO: and module.grafana.endpoints.grafana_dashboard in track/2?
endpoints = ["grafana-dashboard"]

lifecycle { replace_triggered_by = [terraform_data.grafana_litestream_resource] }
}

resource "juju_offer" "loki_logging" {
20 changes: 20 additions & 0 deletions terraform/cos-lite/upgrades.tf
@@ -1,3 +1,23 @@
# -------------- Upgrade logic --------------

## -------- Grafana ingress interface changed ----------
# The ingress endpoint interface changes from traefik_route to ingress_per_app so we need a
# lifecycle to trigger integration replacement, otherwise the upgrade will fail
# https://github.com/canonical/observability-stack/issues/165
resource "terraform_data" "grafana_ingress_interface" {
triggers_replace = data.juju_charm.grafana_info.requires["ingress"]
}

## -------- Removed the litestream-image resource ----------
# The litestream-image resource was removed and, because of a Juju bug, we need a lifecycle to
# trigger replacement of the dependent offer, otherwise the upgrade will fail
# https://github.com/juju/juju/issues/21648
# https://github.com/juju/juju/issues/22071
resource "terraform_data" "grafana_litestream_resource" {
triggers_replace = contains(keys(data.juju_charm.grafana_info.resources), "litestream-image")
}
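
The pattern used in this file can be illustrated in isolation. The sketch below is not part of the PR: it relies only on the built-in `terraform_data` resource (assuming Terraform 1.4 or later), and the variable and resource names are hypothetical. A change to `interface_name` replaces `interface_marker`, which in turn forces replacement of `dependent` through `replace_triggered_by`.

```hcl
terraform {
  required_version = ">= 1.4"
}

# Hypothetical stand-in for a value read from Charmhub, such as an endpoint's interface name.
variable "interface_name" {
  description = "Interface currently advertised by the charm endpoint"
  type        = string
  default     = "ingress_per_app"
}

# Replaced whenever the tracked value changes.
resource "terraform_data" "interface_marker" {
  triggers_replace = var.interface_name
}

# Stand-in for the integration or offer that must be recreated on such a change.
resource "terraform_data" "dependent" {
  input = "stand-in for a juju_integration"

  lifecycle {
    replace_triggered_by = [terraform_data.interface_marker]
  }
}
```

After an initial apply, running `terraform plan -var interface_name=traefik_route` should show both resources marked for replacement, which mirrors how the integration and offer in this module are recreated during an upgrade.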


# -------------- # CharmHub API -------------- #

data "juju_charm" "alertmanager_info" {
2 changes: 1 addition & 1 deletion terraform/cos-lite/variables.tf
@@ -12,7 +12,7 @@ variable "risk" {
}

variable "base" {
description = "The operating system on which to deploy. E.g. ubuntu@22.04. Changing this value for machine charms will trigger a replace by terraform. Check Charmhub for per-charm base support."
description = "The operating system on which to deploy. E.g. ubuntu@24.04. Check Charmhub for per-charm base support."
default = "ubuntu@24.04"
type = string
}
1 change: 1 addition & 0 deletions terraform/cos/integrations.tf
@@ -294,6 +294,7 @@ resource "juju_integration" "ingress" {
}
grafana = {
app_name = module.grafana.app_name
# TODO: move this out so I can add a lifecycle
endpoint = module.grafana.requires.ingress
}
} : k => v if var.ingress[k]
2 changes: 1 addition & 1 deletion terraform/cos/variables.tf
@@ -12,7 +12,7 @@ variable "risk" {
}

variable "base" {
description = "The operating system on which to deploy. E.g. ubuntu@22.04. Changing this value for machine charms will trigger a replace by terraform. Check Charmhub for per-charm base support."
description = "The operating system on which to deploy. E.g. ubuntu@24.04. Check Charmhub for per-charm base support."
default = "ubuntu@24.04"
type = string
}
@@ -10,7 +10,6 @@
from helpers import (
catalogue_apps_are_reachable,
get_tls_context,
refresh_o11y_apps,
wait_for_active_idle_without_error,
)

@@ -31,25 +30,21 @@ def test_envvars():
)


def test_deploy_from_track(
def test_deploy_from_track_2(
tmp_path, tf_manager, ca_model: jubilant.Juju, cos_model: jubilant.Juju
):
# GIVEN a module deployed from track n
# GIVEN a module deployed from track 2
tf_manager.init(TRACK_2_TF_FILE)
tf_manager.apply(ca_model=ca_model.model, cos_model=cos_model.model, **S3_ENDPOINT)
wait_for_active_idle_without_error([cos_model], timeout=5400)
tls_ctx = get_tls_context(tmp_path, ca_model, "self-signed-certificates")
catalogue_apps_are_reachable(cos_model, tls_ctx)


def test_deploy_to_track(
def test_deploy_to_track_dev(
tmp_path, tf_manager, ca_model: jubilant.Juju, cos_model: jubilant.Juju
):
# WHEN upgraded to track n
cos_model.remove_relation("traefik:traefik-route", "grafana:ingress")
wait_for_active_idle_without_error([cos_model])
# FIXME: https://github.com/juju/terraform-provider-juju/issues/967
refresh_o11y_apps(cos_model, channel="dev/edge", base="ubuntu@24.04")
# WHEN upgraded to track dev
tf_manager.init(TRACK_DEV_TF_FILE)
tf_manager.apply(ca_model=ca_model.model, cos_model=cos_model.model, **S3_ENDPOINT)

2 changes: 1 addition & 1 deletion tests/integration/cos/tls_external/track-dev.tf
@@ -46,7 +46,7 @@ module "ssc" {
module "cos" {
source = "git::https://github.com/canonical/observability-stack//terraform/cos"
model_uuid = data.juju_model.cos-model.uuid
channel = "dev/edge"
risk = "edge"
internal_tls = false
external_certificates_offer_url = "admin/${var.ca_model}.certificates"
external_ca_cert_offer_url = "admin/${var.ca_model}.send-ca-cert"
13 changes: 4 additions & 9 deletions tests/integration/cos/tls_full/test_upgrade_cos_tls_full.py
@@ -10,7 +10,6 @@
from helpers import (
catalogue_apps_are_reachable,
get_tls_context,
refresh_o11y_apps,
wait_for_active_idle_without_error,
)

@@ -31,25 +30,21 @@ def test_envvars():
)


def test_deploy_from_track(
def test_deploy_from_track_2(
tmp_path, tf_manager, ca_model: jubilant.Juju, cos_model: jubilant.Juju
):
# GIVEN a module deployed from track n
# GIVEN a module deployed from track 2
tf_manager.init(TRACK_2_TF_FILE)
tf_manager.apply(ca_model=ca_model.model, cos_model=cos_model.model, **S3_ENDPOINT)
wait_for_active_idle_without_error([cos_model], timeout=5400)
tls_ctx = get_tls_context(tmp_path, ca_model, "self-signed-certificates")
catalogue_apps_are_reachable(cos_model, tls_ctx)


def test_deploy_to_track(
def test_deploy_to_track_dev(
tmp_path, tf_manager, ca_model: jubilant.Juju, cos_model: jubilant.Juju
):
# WHEN upgraded to track n
cos_model.remove_relation("traefik:traefik-route", "grafana:ingress")
wait_for_active_idle_without_error([cos_model])
# FIXME: https://github.com/juju/terraform-provider-juju/issues/967
refresh_o11y_apps(cos_model, channel="dev/edge", base="ubuntu@24.04")
# WHEN upgraded to track dev
tf_manager.init(TRACK_DEV_TF_FILE)
tf_manager.apply(ca_model=ca_model.model, cos_model=cos_model.model, **S3_ENDPOINT)

2 changes: 1 addition & 1 deletion tests/integration/cos/tls_full/track-dev.tf
@@ -46,7 +46,7 @@ module "ssc" {
module "cos" {
source = "git::https://github.com/canonical/observability-stack//terraform/cos"
model_uuid = data.juju_model.cos-model.uuid
channel = "dev/edge"
risk = "edge"
internal_tls = true
external_certificates_offer_url = "admin/${var.ca_model}.certificates"
external_ca_cert_offer_url = "admin/${var.ca_model}.send-ca-cert"
@@ -7,11 +7,7 @@
import os
from pathlib import Path

from helpers import (
catalogue_apps_are_reachable,
refresh_o11y_apps,
wait_for_active_idle_without_error,
)
from helpers import catalogue_apps_are_reachable, wait_for_active_idle_without_error

import jubilant

@@ -30,20 +26,16 @@ def test_envvars():
)


def test_deploy_from_track(tmp_path, tf_manager, cos_model: jubilant.Juju):
# GIVEN a module deployed from track n
def test_deploy_from_track_2(tf_manager, cos_model: jubilant.Juju):
# GIVEN a module deployed from track 2
tf_manager.init(TRACK_2_TF_FILE)
tf_manager.apply(model=cos_model.model, **S3_ENDPOINT)
wait_for_active_idle_without_error([cos_model], timeout=5400)
catalogue_apps_are_reachable(cos_model)


def test_deploy_to_track(tmp_path, tf_manager, cos_model: jubilant.Juju):
# WHEN upgraded to track n
cos_model.remove_relation("traefik:traefik-route", "grafana:ingress")
wait_for_active_idle_without_error([cos_model])
# FIXME: https://github.com/juju/terraform-provider-juju/issues/967
refresh_o11y_apps(cos_model, channel="dev/edge", base="ubuntu@24.04")
def test_deploy_to_track_dev(tf_manager, cos_model: jubilant.Juju):
# WHEN upgraded to track dev
tf_manager.init(TRACK_DEV_TF_FILE)
tf_manager.apply(model=cos_model.model, **S3_ENDPOINT)

2 changes: 1 addition & 1 deletion tests/integration/cos/tls_internal/track-dev.tf
@@ -32,7 +32,7 @@ variable "s3_access_key" {
module "cos" {
source = "git::https://github.com/canonical/observability-stack//terraform/cos"
model_uuid = data.juju_model.model.uuid
channel = "dev/edge"
risk = "edge"
internal_tls = true

s3_endpoint = var.s3_endpoint