feat(iot-ops): bump AIO component versions and harden schema-registry RBAC #471
- add trust issuer settings and deployment script parameters
- bump version for cert-manager and secret sync controller
- enhance MQTT broker configurations with new application URI
- update README and variable files for consistency
Signed-off-by: Marcel Bindseil <marcelbindseil@gmail.com>
…butor role
- add role assignment for blob data contributor on schemas container
- update README and variables to reflect new role and its purpose
- create upgrade guide for Azure IoT Operations
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Marcel Bindseil <marcelbindseil@gmail.com>
… README files for consistency
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Marcel Bindseil <marcelbindseil@gmail.com>
📚 Documentation Health Report
Generated on: 2026-05-05 15:24:51 UTC
📈 Documentation Statistics
🏗️ Three-Tree Architecture Status
🔍 Quality Metrics
This report is automatically generated by the Documentation Automation workflow.
katriendg
left a comment
PR Review — AIO 2604 Release Support
Thanks for the thorough work here — the version bumps, schema-registry RBAC hardening, securityPki.applicationUri, az login --username migration, and the new upgrade doc are all well-implemented and consistent between Terraform and Bicep. Appreciate the additional fixes for the 403 and the azure-cli 2.67+ breakage. 🙌
Below are suggestions to make the change consistent throughout before merge.
1. PR Title — Suggest referencing AIO 2604
The branch is feature/aio-2604 and the description references az iot ops 2.4.0, but the title doesn't convey the AIO 2604 release or the matching version 1.3.70. Suggest something like:
feat(iot-ops): upgrade AIO 2604 release (1.3.70), harden schema-registry RBAC
2. docs/getting-started/upgrade-aio.md — Add version matrix and official supported-versions link
The upgrade guide doesn't correlate the az iot ops CLI version with the AIO release or extension versions. Users need this to know which CLI extension maps to which component versions.
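As a quick way to see which `azure-iot-ops` extension version is installed locally before consulting such a matrix, the following is a minimal sketch. It assumes the `az` CLI is on the PATH; the `AZ` and `EXPECTED_EXT_VERSION` variables and the function name are hypothetical names introduced here for illustration, and 2.4.0 is the version this PR references.

```shell
#!/usr/bin/env sh
# Sketch only: compare the installed azure-iot-ops CLI extension with an
# expected version. AZ and EXPECTED_EXT_VERSION are illustrative names.
AZ="${AZ:-az}"
EXPECTED_EXT_VERSION="${EXPECTED_EXT_VERSION:-2.4.0}"

check_iot_ops_extension() {
  # `az extension show` returns the extension metadata; --query selects the
  # version field. A failure (extension not installed) yields an empty string.
  installed="$("$AZ" extension show --name azure-iot-ops --query version --output tsv 2>/dev/null)" || installed=""
  if [ -z "$installed" ]; then
    echo "missing"
  elif [ "$installed" = "$EXPECTED_EXT_VERSION" ]; then
    echo "match"
  else
    echo "mismatch:$installed"
  fi
}
```

Calling `check_iot_ops_extension` prints `match`, `mismatch:<version>`, or `missing`, which can gate the rest of an upgrade script.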
Suggested addition (after "The reconciliation steps differ…"):
## Version matrix
This repository currently targets the **AIO 2604** release. The table below maps `az iot ops` CLI versions to the component versions pinned in edge-ai:
| CLI extension (`azure-iot-ops`) | AIO release | cert-manager | secret-sync-controller | iotOperations |
|---------------------------------|-------------|--------------|------------------------|---------------|
| 2.4.0 | 2604 | 0.11.0 | 1.4.0 | 1.3.70 |
For the full upstream compatibility matrix, see [Supported versions — Azure IoT Operations](https://learn.microsoft.com/en-us/azure/iot-operations/deploy-iot-ops/howto-upgrade?tabs=portal#supported-versions).
Also add a References section at the end:
## References
- [Supported versions — Azure IoT Operations](https://learn.microsoft.com/en-us/azure/iot-operations/deploy-iot-ops/howto-upgrade?tabs=portal#supported-versions)
- [Upgrade Azure IoT Operations — Official guide](https://learn.microsoft.com/en-us/azure/iot-operations/deploy-iot-ops/howto-upgrade)
3. docs/getting-started/upgrade-aio.md — Document behavior for pinned edge-ai releases
The doc assumes users always pull the latest main. In practice, teams pin to a tagged release. The reconciliation steps should call out what happens when the user is on a pinned release with older version defaults than what az iot ops upgrade installed.
Suggested callout after Terraform step 3:
Pinned releases: If your team pins to a specific edge-ai release tag rather than `main`, the version defaults in that release may be older than what `az iot ops upgrade` installed. In that case, after `-refresh-only`, `terraform plan` will show no diff for the AIO extensions (state matches Azure). However, if you later move to a newer edge-ai release with higher version pins, the next `apply` will attempt to upgrade again. To stay aligned, either upgrade edge-ai to the release that matches the AIO versions you upgraded to, or override the version variables in your `terraform.tfvars`.
And similarly after Bicep step 2:
Pinned releases: If your team pins to a specific edge-ai release tag, ensure the version parameters passed to the blueprint match or exceed what `az iot ops upgrade` installed. If they are lower, the next deployment will attempt to downgrade the extensions. Override the version parameters explicitly or upgrade to an edge-ai release that includes the newer defaults.
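To make the Terraform callout above concrete, here is a minimal sketch of a refresh-then-plan check. It assumes `terraform` is on the PATH and is run from an initialized blueprint directory; the `TF` variable and the function name are hypothetical. `-refresh-only` and `-detailed-exitcode` are standard Terraform flags, and `-auto-approve` is used here only to keep the sketch non-interactive.

```shell
#!/usr/bin/env sh
# Sketch only: reconcile state after `az iot ops upgrade`, then report
# whether the pinned version defaults still differ from what is deployed.
TF="${TF:-terraform}"

reconcile_after_upgrade() {
  # Step 1: pull the upgraded extension versions from Azure into state only.
  # Terraform output goes to stderr so stdout carries just the verdict.
  "$TF" apply -refresh-only -auto-approve >&2 || { echo "refresh-failed"; return 1; }

  # Step 2: -detailed-exitcode returns 0 (no changes), 1 (error), 2 (changes pending).
  rc=0
  "$TF" plan -detailed-exitcode >&2 || rc=$?
  case "$rc" in
    0) echo "in-sync" ;;
    2) echo "pending-changes" ;;
    *) echo "plan-failed"; return 1 ;;
  esac
}
```

A `pending-changes` result on a pinned release is the signal to either move to a matching edge-ai release or override the version variables in `terraform.tfvars`.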
4. blueprints/full-single-node-cluster/bicep/main.bicep — Stale comments (Lines 157, 170, 176, 180)
Four comments reading // Currently disable setting shouldDeployAioDeploymentScripts, remove when DeploymentScripts supports AZ CLI 2.71+ (post May 4) remain but the code below them is now un-disabled (they are real params). These should be removed — they contradict the current implementation.
5. blueprints/only-edge-iot-ops/bicep/main.bicep + blueprints/full-multi-node-cluster/bicep/main.bicep — Incomplete param restoration
The full-single-node-cluster blueprint correctly converts the temporary var workarounds back to first-class params. However, the same workarounds still exist in these two blueprints:
| Blueprint | Still uses `var` workaround |
|---|---|
| `only-edge-iot-ops/bicep/main.bicep` | `trustIssuerSettings`, `shouldDeployAioDeploymentScripts`, `shouldEnableOtelCollector`, `shouldEnableOpcUaSimulator` |
| `full-multi-node-cluster/bicep/main.bicep` | `trustIssuerSettings`, `shouldDeployAioDeploymentScripts`, `shouldEnableOtelCollector`, `shouldEnableOpcUaSimulator` |
Since the "post May 4" condition (DeploymentScripts supports AZ CLI 2.71+) is now met, these should be restored consistently. Without this, only full-single-node-cluster users can override these params while other blueprint consumers remain locked to hard-coded values.
Apply the same pattern: remove the var lines, uncomment/restore as params with matching defaults, and add the iotOpsTypes import where needed.
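One way to confirm that no `var` workarounds remain after the restoration is a simple grep over each blueprint file. The sketch below assumes the workarounds are declared as top-level `var <name> = ...` lines, which is an assumption about the file layout; the function and variable names are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch only: list remaining `var` workarounds for the four parameters
# named in the table above. Assumes top-level `var <name> = ...` lines.
WORKAROUND_NAMES='trustIssuerSettings|shouldDeployAioDeploymentScripts|shouldEnableOtelCollector|shouldEnableOpcUaSimulator'

list_var_workarounds() {
  # Prints "line:text" for each matching declaration in the given file;
  # prints nothing (and still succeeds) when none are left.
  grep -nE "^var[[:space:]]+(${WORKAROUND_NAMES})([[:space:]]|=)" "$1" || true
}
```

For example, `list_var_workarounds blueprints/only-edge-iot-ops/bicep/main.bicep` should print nothing once the parameters are restored.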
✅ What Looks Good
- Version bumps consistent across TF/Bicep: cert-manager 0.11.0, secret-sync 1.4.0, iotoperations 1.3.70
- `securityPki.applicationUri`: both produce `urn:microsoft.com:aio:opc:ua:broker:<5-char-hash>`
- `blob_data_contributor_principal_id`: optional with `coalesce()` fallback, container-scoped (least-privilege)
- `az login --identity --username`: complete across all scripts, no `--client-id` references remain
- `upgrade-aio.md`: comprehensive with TF/Bicep reconciliation and troubleshooting
- No old version references (0.10.2, 1.3.0, 1.3.38) found anywhere in the codebase
⚠️ Minor Awareness Items (non-blocking)
- Behavioral default change: `shouldEnableOtelCollector` now defaults to `true`. Existing pipelines relying on OTel being off will need `shouldEnableOtelCollector: false`. PR description documents this.
- Hash algorithm difference: Bicep `uniqueString()` vs Terraform `sha256()` will produce different 5-char `applicationUri` suffixes for the same cluster ID. Acceptable for single-framework deployments but worth noting for cross-framework scenarios.
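To illustrate the Terraform side of the hash difference, the sketch below mirrors `substr(sha256(id), 0, 5)` in shell. It assumes GNU `sha256sum` is available; the function names are hypothetical. Bicep's `uniqueString()` uses a different hash and is not reproduced here, which is exactly why the two suffixes differ.

```shell
#!/usr/bin/env sh
# Sketch only: first five hex characters of the SHA-256 of the cluster ID,
# mirroring the Terraform expression substr(sha256(id), 0, 5).
terraform_style_suffix() {
  # printf avoids a trailing newline, so only the ID bytes are hashed.
  printf '%s' "$1" | sha256sum | cut -c1-5
}

# Compose the application URI in the form quoted in this review.
application_uri() {
  echo "urn:microsoft.com:aio:opc:ua:broker:$(terraform_style_suffix "$1")"
}
```

Passing the Arc connected cluster resource ID to `application_uri` gives the value a Terraform deployment would be expected to emit, which can be compared with what a Bicep deployment of the same cluster produced.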
feat(iot-ops): bump AIO component versions and harden schema-registry RBAC
Description
Brings the Azure IoT Operations stack to the versions supported by `az iot ops` 2.4.0, restores first-class blueprint parameters that had been temporarily downgraded to `var` workarounds, fixes a 403 `AuthorizationPermissionMismatch` during schema upload, and unblocks the K3s VM bootstrap on azure-cli 2.67+.
Specifically:
- Bumps `cert-manager` 0.10.2 → 0.11.0, `secret-sync-controller` 1.3.0 → 1.4.0, and the `iotoperations` extension 1.3.38 → 1.3.70 in both Bicep and Terraform defaults.
- Restores `trustIssuerSettings`, `shouldDeployAioDeploymentScripts`, `shouldEnableOtelCollector`, and `shouldEnableOpcUaSimulator` as real `param`s in `blueprints/full-single-node-cluster/bicep/main.bicep` (DeploymentScripts now supports az CLI 2.71+).
- Sets `securityPki.applicationUri` (per-cluster URN) on the AIO instance configuration in both Bicep and Terraform.
- Adds a `Storage Blob Data Contributor` role assignment scoped to the schemas container in the schema-registry Terraform module, with a configurable `blob_data_contributor_principal_id` and a 30s RBAC propagation wait. Removes the implicit dependency on the data-lake module's broader role grant.
- Switches from `az login --identity --client-id` to `az login --identity --username` in the K3s bootstrap scripts (the `--client-id` flag was removed in azure-cli 2.67).
- Adds `docs/getting-started/upgrade-aio.md` and cross-links it from the general-user getting-started guide.
Related Issue
Fixes #473
Type of Change
Implementation Details
Component version bumps
| Bicep | Terraform |
|---|---|
| `src/100-edge/109-arc-extensions/bicep/types.bicep` | `src/100-edge/109-arc-extensions/terraform/variables.tf` |
| `src/100-edge/110-iot-ops/bicep/types.bicep` | `src/100-edge/110-iot-ops/terraform/variables.init.tf` |
| `src/100-edge/110-iot-ops/bicep/types.bicep` | `src/100-edge/110-iot-ops/terraform/variables.instance.tf` |
Blueprint parameter restoration
`blueprints/full-single-node-cluster/bicep/main.bicep` reverts the temporary `var` workarounds that were introduced when DeploymentScripts was pinned to az CLI < 2.71. The four affected parameters are now first-class `param`s with default values, and `iotOpsTypes` is imported to provide the `TrustIssuerConfig` type.
Schema-registry RBAC
`src/000-cloud/030-data/terraform/modules/schema-registry/main.tf` adds:
- `data "azurerm_client_config" "current"` to resolve the deploying principal.
- `azurerm_role_assignment.schema_container_blob_data_contributor` granting Storage Blob Data Contributor on the schemas container, with `principal_id = coalesce(var.blob_data_contributor_principal_id, data.azurerm_client_config.current.object_id)`.
- `time_sleep.wait_for_rbac_propagation` (30s) extended to depend on the new role assignment.
`variables.core.tf` adds the optional `blob_data_contributor_principal_id` (string, default `null`).
This removes the implicit reliance on the data-lake module granting Storage Blob Data Owner at the storage-account scope and keeps the schema-registry module self-contained.
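For readers who want to see the fallback and the container-level scope outside Terraform, the following shell sketch mirrors the two ideas: pick the override principal if set, otherwise the deploying principal, and build the container's resource ID. The function names and argument order are hypothetical; the scope string follows the standard ARM resource ID layout for blob containers.

```shell
#!/usr/bin/env sh
# Sketch only: shell analogue of the coalesce() fallback and the
# container-scoped role assignment described above.

# coalesce: print the first non-empty argument, like Terraform's coalesce().
coalesce() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# container_scope: ARM resource ID of a blob container.
# $1 subscription id, $2 resource group, $3 storage account, $4 container name
container_scope() {
  echo "/subscriptions/$1/resourceGroups/$2/providers/Microsoft.Storage/storageAccounts/$3/blobServices/default/containers/$4"
}

# Example use with the Azure CLI (not executed here):
#   principal="$(coalesce "$OVERRIDE_PRINCIPAL_ID" "$DEPLOYER_OBJECT_ID")"
#   az role assignment create --assignee-object-id "$principal" \
#     --role "Storage Blob Data Contributor" \
#     --scope "$(container_scope "$SUB" "$RG" "$ACCOUNT" schemas)"
```

Scoping the assignment to the container rather than the storage account keeps the grant as narrow as the schema upload requires.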
MQTT broker `securityPki.applicationUri`
Both `iot-ops-instance` modules now emit a per-cluster `securityPki.applicationUri` of the form `urn:microsoft.com:aio:opc:ua:broker:<5-char-cluster-hash>`, ensuring the OPC UA broker advertises a unique application URI:
- Bicep: `take(uniqueString(arcConnectedCluster.id), 5)`
- Terraform: `substr(sha256(var.arc_connected_cluster_id), 0, 5)`
K3s VM bootstrap (`az login` syntax)
`src/100-edge/100-cncf-cluster/scripts/deploy-script-secrets.sh` and `k3s-device-setup.sh` switch from `az login --identity --client-id "$CLIENT_ID"` to `az login --identity --username "$CLIENT_ID"` (with `--allow-no-subscriptions` on the latter). The `--client-id` flag was removed in azure-cli 2.67 and broke the `linux-cluster-server-setup` Arc extension on fresh runners.
Documentation
`docs/getting-started/upgrade-aio.md` is new and walks through `az iot ops upgrade`, Terraform refresh-only reconciliation (`terraform apply -refresh-only` plus a plan grep), and Bicep stateless reapply, with a troubleshooting section. `docs/getting-started/general-user.md` cross-links the new guide from "Next Steps" and "Additional Resources". Docusaurus picks the new file up automatically via the autogenerated sidebar.
Testing Performed
`az iot ops upgrade` verified on the live cluster with the new component versions; K3s VM bootstrap script confirmed working with azure-cli 2.67+. `npm run tf-validate` and `terraform-docs` regenerated cleanly.
Validation Steps
1. Run `npm run tf-validate` and confirm the `030-data` schema-registry module validates.
2. Confirm `terraform-docs --config .terraform-docs.yml src/000-cloud/030-data/terraform/modules/schema-registry` shows no diff.
3. Deploy `blueprints/full-single-node-cluster/terraform` against a clean subscription as the deploying principal; schema upload should succeed without manual role grants. Re-run with `blob_data_contributor_principal_id` set to a service principal id to confirm the override path.
4. Run `az bicep build -f blueprints/full-single-node-cluster/bicep/main.bicep` and confirm the four restored parameters appear and that `trustIssuerSettings`, `shouldDeployAioDeploymentScripts`, `shouldEnableOtelCollector`, and `shouldEnableOpcUaSimulator` can be overridden.
5. Confirm `linux-cluster-server-setup` succeeds (azure-cli 2.67+ runner).
6. Follow `docs/getting-started/upgrade-aio.md` end-to-end against an existing AIO instance to verify the upgrade path.
Checklist
- `terraform fmt` on all Terraform code
- `terraform validate` on all Terraform code
- `az bicep format` on all Bicep code
- `az bicep build` to validate all Bicep code
Security Review
`az iot ops` 2.4.0 supported matrix.
Additional Notes
- `shouldEnableOtelCollector` defaults to `true` (matching upstream defaults) where it was hard-coded `false` while DeploymentScripts was pinned. Existing pipelines that don't want OTel must pass `shouldEnableOtelCollector: false`.
- Follow `docs/getting-started/upgrade-aio.md` (`terraform apply -refresh-only` after `az iot ops upgrade`) to reconcile state without recreating resources.
- The `blob_data_contributor_principal_id` variable is optional; CI deployments running as the deploying user/SP need no changes.
Screenshots (if applicable)