feat(application): add video capture query blueprint, 520-video-query-api, and ONVIF camera tooling #468
Open
…ources

## Summary

Add diagnostic settings across blueprint resources, per CRISP security review findings LT-4 (Medium). Supports Threat #24: Insufficient logging and monitoring.

Defender for Cloud (LT-1) is intentionally **not** managed here: it's subscription-scoped and should be enforced via Azure Policy by platform teams.

### Changes

**Diagnostic Settings (LT-4)**: `azurerm_monitor_diagnostic_setting` in each component:

- **Key Vault**: AuditEvent + AllMetrics
- **ACR**: ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents + AllMetrics
- **Event Grid**: allLogs + AllMetrics
- **Event Hubs**: allLogs + AllMetrics

### Scope

- Components: `010-security-identity`, `060-acr`, `040-messaging`
- Blueprints: full-single-node, full-multi-node, azure-local, only-cloud, robotics
- 19 files changed, 227 insertions

### Design Decisions

- Diagnostics gated by `should_enable_diagnostic_settings` (bool) + `log_analytics_workspace_id`: enabled automatically when blueprints wire observability
- Component-level ownership: each module manages its own diagnostic settings
- Defender left to Azure Policy to avoid subscription-scoped side effects on `terraform destroy`

### Deploy Validation (2026-04-08)

Rebased on `dev` and deployed 3 affected blueprints in parallel:

| Blueprint | Region | Diagnostic Settings | Result |
|---|---|---|---|
| full-single-node-cluster | eastus2 | ✅ KV, ACR, EG, EH | All diagnostic resources created. IoT Ops proxy timeout (pre-existing) |
| only-cloud-single-node-cluster | westus2 | ✅ ACR, EG, EH | All diagnostic resources created. KV contacts timeout (pre-existing transient) |
| robotics | westus3 | ✅ ACR, EG, EH, KV | All diagnostic resources created. Grafana SSL EOF (pre-existing transient) |

All diagnostic settings deployed successfully. All failures are pre-existing environmental issues unrelated to this change.

Skipped: `full-multi-node-cluster` (pre-existing count issue), `azure-local` (requires HCI hardware).

Fixes AB#1984

----

#### AI description (iteration 5)

#### PR Classification

Feature enhancement to add diagnostic settings for Azure blueprint resources (ACR, Key Vault, Event Grid, Event Hubs) to address CRISP security findings LT-4 regarding insufficient logging and monitoring.

#### PR Summary

This PR implements diagnostic settings across Key Vault, ACR, Event Grid, and Event Hubs modules to enable audit logging and metrics collection to Log Analytics workspaces, addressing security compliance gaps. All changes are gated by optional variables and wire the Log Analytics workspace ID from observability modules through blueprint configurations.

- Added `azurerm_monitor_diagnostic_setting` resources in `main.tf` files for Key Vault (AuditEvent), ACR (ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents), Event Grid (allLogs), and Event Hubs (allLogs) with AllMetrics enabled
- Introduced `log_analytics_workspace_id` and `should_enable_diagnostic_settings` variables across all affected modules ...
… RBAC for connectedk8s proxy

## Summary

Adds Entra ID group-based cluster admin support and Azure Arc RBAC role assignments to the CNCF K3s cluster component, enabling `az connectedk8s proxy` for team members.

## Problem

- `az connectedk8s proxy` failed for non-deploying users: no Azure RBAC roles on the Arc resource
- Only the deploying user received cluster-admin via individual OID/UPN
- No support for Entra ID groups

## Changes

### New variable: `cluster_admin_group_oid`

- Accepts an Entra ID group Object ID
- Creates Kubernetes `ClusterRoleBinding` with `--group` flag (k3s-device-setup.sh)
- Assigns Azure Arc RBAC roles on the Arc connected cluster resource

### Azure Arc RBAC role assignments (new)

- `Azure Arc Kubernetes Viewer`: assigned to both user OID and group OID
- `Azure Arc Enabled Kubernetes Cluster User Role`: assigned to both user OID and group OID
- Scoped to the Arc connected cluster resource
- Only created after the cluster exists (static count guard with `has_arc_cluster`)

### Cleanup

- Removed `deploy-cluster-admin-oid.sh`: superseded by Terraform automation

### Files changed

- `src/100-edge/100-cncf-cluster/terraform/`: variables, main.tf, ubuntu-k3s module, role-assignments module
- `src/100-edge/100-cncf-cluster/scripts/k3s-device-setup.sh`: group ClusterRoleBinding
- `src/100-edge/110-iot-ops/scripts/deploy-cluster-admin-oid.sh`: deleted
- All 5 blueprints: expose `cluster_admin_group_oid`

## Usage

```hcl
cluster_admin_group_oid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

## Deployment Testing

All affected blueprints deployed and verified (Arc clusters Connected):

| Blueprint | Region | Result |
|---|---|---|
| full-single-node-cluster | eastus2 | ✅ Arc Connected |
| minimum-single-node-cluster | australiaeast | ✅ Arc Connected |
| partial-single-node-cluster | swedencentral | ✅ Arc Connected |
| dual-peered-single-node-cluster | westus3 | ✅ Both clusters Arc Connected |

Only pre-existing failures observed:

- IoT Ops sync rules (`LinkedAuthorizationFailed`): known issue, unrelated to this PR
- Grafana dashboard import script: transient 412 errors

----

#### AI description (iteration 12)

#### PR Classification

This PR adds new functionality to enable Entra ID group-based cluster admin access and Azure Arc RBAC support for connectedk8s proxy operations.

#### PR Summary

Adds support for granting cluster-admin permissions to an entire Entra ID group and assigns required Azure Arc RBAC roles to enable `az connectedk8s proxy` access for group members.

- `k3s-device-setup.sh`: Added `CLUSTER_ADMIN_GROUP_OID` environment variable and logic to create ClusterRoleBinding for Entra ID groups with cluster-admin permissions
- `terraform/main.tf`: Added Arc RBAC role assignments (`Azure Arc Kubernetes Viewer` and `Azure Arc Enabled Kubernetes Cluster User Role`) for both individual users and Entra ID groups on the Arc connected cluster resource
- All Terraform variable files: Added `cluster_admin_group_oid` variable to specify the Entra ...
## Summary

Fixes deployment issues discovered during parallel blueprint testing. Each of the 9 deployable blueprints was tested 4 times across different Azure regions (36 total deployments, all successful).

## Fixes

### 1. Grafana dashboard import fails on re-apply and fresh deploy (`020-observability`)

**Problem:** `az grafana dashboard import` fails in two ways: (a) 412 version-mismatch when dashboards already exist on retry, and (b) SSL EOF errors when Grafana's SSL cert isn't ready on fresh deploys.

**Fix:** Add `--overwrite` flag to all import calls, plus retry logic (10 attempts, 30s delay) to wait for SSL cert provisioning.

### 2. AzureML compute cluster requires public IPs without private endpoints (`080-azureml`)

**Problem:** `compute_cluster_node_public_ip_enabled` defaults to `false`, but Azure ML requires workspace private endpoints when node public IPs are disabled.

### 3. dpkg lock race condition on VM first boot (`100-cncf-cluster`)

**Problem:** `k3s-device-setup.sh` used `curl | bash` to install Azure CLI, which runs internal `apt-get` calls without lock timeout handling. Ubuntu's `unattended-upgrades` holds the dpkg lock on first boot, causing exit code 100.

**Fix:** Replace `curl | bash` with inline `apt-get` calls, each using `DPkg::Lock::Timeout=300`. Eliminates the TOCTOU race entirely: no external installer with uncontrolled apt calls.

### 4. Default VM SKUs unavailable in most regions

**Problem:** `Standard_D8ds_v5` (AKS node default) available in only 2/9 IoT Operations regions. `Standard_D8s_v3` (VM host default) same. `Standard_D4_v4` (minimum blueprint) same.

**Fix:** Update all defaults to v6 series, available in 8/9 or 9/9 regions.

## Deployment Test Results: 36/36 Successful

Each blueprint deployed 4 times in different regions:

| Blueprint | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| only-output-cncf-cluster-script | ✅ westus | ✅ eastus2 | ✅ westeurope | ✅ southcentralus |
| only-cloud-single-node-cluster | ✅ germanywestcentral | ✅ westus3 | ✅ westus3 | ✅ southcentralus |
| minimum-single-node-cluster | ✅ westus | ✅ eastus2 | ✅ eastus2 | ✅ northeurope |
| full-single-node-cluster | ✅ germanywestcentral | ✅ westus | ✅ southcentralus | ✅ westus2 |
| partial-single-node-cluster | ✅ westus3 | ✅ germanywestcentral | ✅ westus | ✅ eastus2 |
| dual-peered-single-node-cluster | ✅ eastus2 | ✅ southcentralus | ✅ southcentralus | ✅ southcentralus |
| full-multi-node-cluster | ✅ eastus2 | ✅ westus3 | ✅ westeurope | ✅ westus |
| azureml | ✅ southcentralus | ✅ westus | ✅ germanywestcentral | ✅ westus2 |
| robotics | ✅ germanywestcentral | ✅ westus | ✅ westus2 | ✅ westeurope |

**Skipped:** fabric (requires Fabric capacity), azure-local (requires Azure Local hardware)

## Changed Files

- `src/000-cloud/020-observability/scripts/import-grafana-dashboards.sh`: overwrite + retry logic
- `src/000-cloud/080-azureml/terraform/variables.tf`: keep secure default
- `blueprints/modules/robotics/terraform/main.tf`: der...
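The Grafana fix above describes a fixed-delay retry (10 attempts, 30s delay) around the dashboard import. The real change lives in the `import-grafana-dashboards.sh` shell script; the Python sketch below only illustrates the same pattern. The `retry` helper and its parameter names are hypothetical and not taken from the repository.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation` until it succeeds or `attempts` is exhausted.

    Hypothetical helper: waits a fixed `delay_seconds` between attempts,
    mirroring the "10 attempts, 30s delay" described in the fix.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # e.g. a transient SSL EOF while the cert provisions
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # no sleep after the final failed attempt
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would wrap whatever performs the import in a zero-argument function and hand it to `retry`.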
Resolves conflicts:
- import-grafana-dashboards.sh: kept dev version (retry wrapper, --overwrite)
- k3s-device-setup.sh: kept dev version (install_azure_cli helper, Entra group OID, apt lock timeouts)
- deploy-cluster-admin-oid.sh: kept deletion (refactored into k3s-device-setup.sh in PR 624)
- Add 520-video-query-api Azure Function for time-based video queries
- Upgrade 503-media-capture-service with multi-trigger, ACSA writer, continuous recorder
- Add video-capture-query blueprint with architecture diagrams
- Add ONVIF camera quickstart, deployment scripts, and testing guides
- Add ADR 006: dual-component video architecture
- Add continuous video capture ACSA sync ADR
- Update 030-data storage-account with primary_blob_endpoint output
- Fix secretlint: remove embedded basic auth credentials from RTSP URL example
- Fix cspell: add ffprobe, guestconfiguration, ultrafast, vidcap, videocapture to dictionary
- Fix shellcheck: SC2162 (read -r), SC2155 (local declare/assign), SC2181 (direct exit check), SC2002 (useless cat)
- Fix shfmt: apply consistent formatting across all 111-assets shell scripts
- Fix markdownlint MD036: convert bold emphasis to proper headings
- Fix markdownlint MD040: add language specifiers to fenced code blocks
- Fix markdownlint MD033: wrap angle-bracket placeholders in backticks
- Fix markdownlint MD025: convert duplicate H1 to H2
- Fix markdownlint MD024: rename duplicate heading
- Fix markdownlint MD029: correct ordered list numbering
- Fix cspell: add 15 domain words (Amcrest, Reolink, dtmi, anyauth, etc.)
- Fix ruff: auto-fix 48 errors + noqa S324 (non-crypto md5), S603 (trusted ffmpeg)
- Fix ruff: add S108/S603/S607 to test file ignores in .ruff.toml
- Fix shfmt: format install-ffmpeg.sh
- Fix markdown-table-formatter: 4 files reformatted
- Fix tflint: eventhubs default null → {} in full-single-node-cluster
All 8 megalinter linters validated locally before push.
- Move ONVIF quickstart to docs/getting-started/onvif-camera-quickstart.md
- Remove 4 redundant markdown files from 111-assets (1,956 lines):
  - ONVIF-CAMERA-QUICKSTART.md (1,015 lines, duplicated content)
  - terraform/ONVIF-CAMERA-DEPLOYMENT.md (465 lines, deployment log)
  - terraform/ONVIF-CAMERA-QUICK-REFERENCE.md (230 lines, overlap)
  - scripts/TESTING-ONVIF-CAMERAS.md (246 lines, overlap)
- Genericize scripts/README.md: replace vendor-specific names with placeholders
- Add ONVIF camera quickstart link to docs/getting-started/README.md
- New consolidated guide: vendor-neutral, covers Bicep/Terraform/scripts
- Fix cspell: add ultrafast to general-technical dictionary
- Fix ruff: auto-fix import sorting + line-length in data-scientist-workflow.py
- Fix yamllint: reformat media-capture-mqtt-triggered.yaml indentation and add document start

All 7 megalinter linters validated locally before push.
…-cluster

Add should_deploy_video_capture toggle, function runtime version pass-through variables, video_capture_computed_settings local, and Storage Blob Data Contributor role assignment to the full-single-node-cluster blueprint. Create video-capture-query.tfvars.example demonstrating Python 3.11 runtime configuration with mutual-exclusion documentation for Node.js-based notification functions.
- replace datetime.utcnow() with datetime.now(UTC) across all instances
- use ManagedIdentityCredential(client_id=client_id) consistently with AZURE_CLIENT_ID fallback

🔧 - Generated by Copilot
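A minimal sketch of the two changes named in this commit, under stated assumptions. `datetime.utcnow()` returns a naive datetime, while `datetime.now(UTC)` returns a timezone-aware one. The helpers `utc_now`, `format_utc`, and `resolve_client_id` are hypothetical names, and reading "AZURE_CLIENT_ID fallback" as "use the environment variable when no explicit client ID is given" is an assumption. The actual `ManagedIdentityCredential` call is left out so the block runs without the Azure SDK.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

# datetime.UTC (Python 3.11+) is an alias for timezone.utc; the alias is
# defined here so the sketch also runs on older interpreters.
UTC = timezone.utc


def utc_now() -> datetime:
    """Timezone-aware replacement for the naive datetime.utcnow()."""
    return datetime.now(UTC)


def format_utc(moment: datetime) -> str:
    """Hypothetical helper: render an aware datetime as an ISO-8601 UTC string."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(UTC).strftime("%Y-%m-%dT%H:%M:%SZ")


def resolve_client_id(
    explicit: Optional[str], env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Assumed fallback order: explicit value first, then AZURE_CLIENT_ID."""
    return explicit or env.get("AZURE_CLIENT_ID")
```

The value returned by `resolve_client_id` would then be passed as `client_id` to `ManagedIdentityCredential`, as the commit message describes.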
Dependency Review
The following issues were found:
Vulnerabilities
src/500-application/520-video-query-api/requirements.txt
Only included vulnerabilities with severity high or higher.
License Issues
src/500-application/520-video-query-api/requirements.txt
📚 Documentation Health Report
Generated on: 2026-05-02 00:36:09 UTC
This report is automatically generated by the Documentation Automation workflow.
Description
Closes #467
Adds a complete video capture and time-based query pipeline to the edge-ai accelerator. This introduces continuous 24/7 video recording from ONVIF/RTSP cameras at the edge with cloud storage, lifecycle management, and a REST API for data scientists to retrieve historical video segments by camera, location, and timeframe.
Mirrors ADO PR #616.
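The time-based retrieval described above depends on narrowing the blob listing to the requested window. The component list in this description gives two tiers for the query API: prefix-based filtering for windows under 1 hour and blob index tags for 1 to 24 hours. The sketch below shows how such a choice could be made. It is not the function's actual code: the function names, the `camera/YYYY/MM/DD/HH/` blob layout, and the `None` result for windows longer than 24 hours (which the description does not cover) are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def choose_filter_strategy(start: datetime, end: datetime) -> Optional[str]:
    """Pick a filtering tier from the query window length.

    Returns "prefix" for windows under 1 hour, "index_tags" for 1 to 24 hours,
    and None for longer windows, which the PR description does not specify.
    """
    if end <= start:
        raise ValueError("end must be after start")
    span = end - start
    if span < timedelta(hours=1):
        return "prefix"
    if span <= timedelta(hours=24):
        return "index_tags"
    return None


def hour_prefixes(camera_id: str, start: datetime, end: datetime) -> List[str]:
    """List the hourly blob prefixes a window touches.

    Assumes a hypothetical layout of <camera>/<YYYY>/<MM>/<DD>/<HH>/.
    """
    cursor = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while cursor < end:
        prefixes.append(f"{camera_id}/{cursor:%Y/%m/%d/%H}/")
        cursor += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    begin = datetime(2026, 4, 8, 10, 45, tzinfo=timezone.utc)
    finish = datetime(2026, 4, 8, 11, 15, tzinfo=timezone.utc)
    print(choose_filter_strategy(begin, finish))
    print(hour_prefixes("cam-01", begin, finish))
```

A 30-minute window that crosses an hour boundary still selects the prefix tier but needs two prefix listings, which is why the prefix helper enumerates hours instead of returning a single path.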
Architecture
Changes
New Components
- `src/500-application/520-video-query-api/`: Azure Function with REST endpoints for time-based video queries, SAS URL generation, optimized blob filtering (<1h: prefix-based, 1-24h: blob index tags), MQTT-triggered capture, and health monitoring
- `blueprints/video-capture-query/`: Complete Terraform blueprint with architecture diagrams (6 drawio/mermaid), example tfvars, data scientist workflow example, and deployment documentation

Enhanced Components
- `src/500-application/503-media-capture-service/`: Continuous recording mode, ACSA (Azure Container Storage enabled by Azure Arc) writer, multi-trigger support (binary + MQTT), video processor improvements
- `src/500-application/508-media-connector/`: Updated docker-compose, mock RTSP camera Kubernetes manifests
- `src/100-edge/111-assets/scripts/`: ONVIF camera deployment scripts (Bicep + Terraform), PTZ profile discovery, quick PTZ test utilities

Documentation
- `docs/getting-started/onvif-camera-quickstart.md`: Step-by-step ONVIF camera deployment guide
- `project-adrs/Accepted/006-dual-component-video-architecture.md`: ADR for dual-component video architecture (503 recording + 508 streaming)
- `docs/solution-adr-library/`: Continuous video capture with ACSA sync, edge video streaming and image capture, updated ONVIF connector camera integration

Blueprint Integration
- `blueprints/full-single-node-cluster/`: Video capture query variables, outputs, and `video-capture-query.tfvars.example`

Build & Code Quality
Files Changed
109 files (13,221 additions, 786 deletions) across the following areas:
- `src/500-application/520-video-query-api/`
- `blueprints/video-capture-query/`
- `src/500-application/503-media-capture-service/`
- `src/100-edge/111-assets/scripts/`
- `docs/getting-started/`
- `docs/solution-adr-library/`
- `project-adrs/Accepted/`
- `blueprints/full-single-node-cluster/terraform/`

Testing
test_video_query_api.py, 377 lines)