feat(application): add video capture query blueprint, 520-video-query-api, and ONVIF camera tooling #468

Open
auyidi1 wants to merge 17 commits into main from pr/video-capture-query

Conversation

@auyidi1 auyidi1 commented May 2, 2026

Description

Closes #467

Adds a complete video capture and time-based query pipeline to the edge-ai accelerator. This introduces continuous 24/7 video recording from ONVIF/RTSP cameras at the edge with cloud storage, lifecycle management, and a REST API for data scientists to retrieve historical video segments by camera, location, and timeframe.

Mirrors ADO PR #616.

Architecture

📹 ONVIF/RTSP Cameras → Media Capture Service (503) → 5-min segments → Azure Blob Storage
                              (continuous recording)                        │
                                                                   ┌───────┴────────┐
                                                                   │ Lifecycle Mgmt │
                                                                   │ Hot → Cool →   │
                                                                   │ Archive        │
                                                                   └───────┬────────┘
                                                                           │
👨‍🔬 Data Scientist → REST API (520-video-query-api) → Query by timestamp → │
                      GET /api/video?camera=xxx&start=...&end=...          │
                              ↓                                            │
                      FFmpeg segment stitching ← ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
                              ↓
                      SAS URL (24h expiry) → Download & Analyze
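The diagram implies that a time-range query has to be mapped onto the 5-minute segments that overlap it before any stitching happens. A minimal sketch of that mapping follows. It assumes segments are aligned to 5-minute UTC boundaries, which this description does not state, and the function name `overlapping_segment_starts` is hypothetical rather than taken from `function_app.py`.

```python
from datetime import datetime, timedelta, timezone

# From the PR description: the capture service writes 5-minute segments.
SEGMENT_LENGTH = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def overlapping_segment_starts(start: datetime, end: datetime) -> list[datetime]:
    """Return the start times of the segments that overlap [start, end).

    Assumption (not confirmed by the PR): segments begin on 5-minute
    UTC boundaries such as 10:00, 10:05, 10:10.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be after start")

    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)

    # Round the query start down to the boundary of the segment containing it.
    first = start - ((start - EPOCH) % SEGMENT_LENGTH)

    starts = []
    current = first
    while current < end:
        starts.append(current)
        current += SEGMENT_LENGTH
    return starts
```

Under these assumptions, a query from 10:02:30 to 10:11:00 touches the segments starting at 10:00, 10:05, and 10:10, so the first and last segments would need trimming during stitching.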

Changes

New Components

  • src/500-application/520-video-query-api/ — Azure Function with REST endpoints for time-based video queries, SAS URL generation, optimized blob filtering (<1h: prefix-based, 1-24h: blob index tags), MQTT-triggered capture, and health monitoring
  • blueprints/video-capture-query/ — Complete Terraform blueprint with architecture diagrams (6 drawio/mermaid), example tfvars, data scientist workflow example, and deployment documentation
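The blob filtering rule mentioned for 520-video-query-api (prefix-based below one hour, blob index tags for 1 to 24 hours) can be pictured as a small dispatch on the length of the query window. This is an illustrative sketch only: `choose_filter_strategy` and its return values are hypothetical names, and because the description does not say what happens beyond 24 hours, the sketch raises an error for that case instead of guessing.

```python
from datetime import datetime, timedelta


def choose_filter_strategy(start: datetime, end: datetime) -> str:
    """Pick a blob filtering strategy from the query window length.

    Thresholds come from the PR description; the strategy names
    returned here are illustrative labels, not identifiers from the code.
    """
    span = end - start
    if span <= timedelta(0):
        raise ValueError("end must be after start")
    if span < timedelta(hours=1):
        # Short windows: list blobs under a time-based name prefix.
        return "prefix"
    if span <= timedelta(hours=24):
        # Medium windows: query by blob index tags.
        return "blob_index_tags"
    # Behaviour above 24 hours is not described in the PR summary.
    raise ValueError("query windows longer than 24 hours are not covered by this sketch")
```

Keeping the decision in one pure function makes the thresholds easy to unit test without touching storage.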

Enhanced Components

  • src/500-application/503-media-capture-service/ — Continuous recording mode, ACSA (Azure Container Storage enabled by Azure Arc) writer, multi-trigger support (binary + MQTT), video processor improvements
  • src/500-application/508-media-connector/ — Updated docker-compose, mock RTSP camera Kubernetes manifests
  • src/100-edge/111-assets/scripts/ — ONVIF camera deployment scripts (Bicep + Terraform), PTZ profile discovery, quick PTZ test utilities

Documentation

  • docs/getting-started/onvif-camera-quickstart.md — Step-by-step ONVIF camera deployment guide
  • project-adrs/Accepted/006-dual-component-video-architecture.md — ADR for dual-component video architecture (503 recording + 508 streaming)
  • docs/solution-adr-library/ — Continuous video capture with ACSA sync, edge video streaming and image capture, updated ONVIF connector camera integration

Blueprint Integration

  • blueprints/full-single-node-cluster/ — Video capture query variables, outputs, and video-capture-query.tfvars.example

Build & Code Quality

  • cspell dictionary updates, ruff lint fixes, yamllint compliance
  • Megalinter and CI gate fixes
  • Release workflow and GitVersion configuration updates

Files Changed

109 files (13,221 additions, 786 deletions) across the following areas:

| Area | Key Changes |
|---|---|
| src/500-application/520-video-query-api/ | New Azure Function app (function_app.py, tests, requirements, host.json) |
| blueprints/video-capture-query/ | New blueprint (main.tf, variables, outputs, diagrams, README, examples) |
| src/500-application/503-media-capture-service/ | Continuous recording, ACSA writer, multi-trigger, video processor |
| src/100-edge/111-assets/scripts/ | ONVIF camera deployment and PTZ tooling |
| docs/getting-started/ | ONVIF camera quickstart guide |
| docs/solution-adr-library/ | Video architecture ADRs |
| project-adrs/Accepted/ | ADR-006 dual-component video architecture |
| blueprints/full-single-node-cluster/terraform/ | Video capture query integration |

Testing

  • Video Query API unit tests (test_video_query_api.py, 377 lines)
  • Terraform validation for video-capture-query blueprint and full-single-node-cluster
  • CI lint gates (megalinter, cspell, ruff, yamllint) resolved

Azure Pipelines and others added 17 commits April 9, 2026 01:19
…ources

## Summary

Add diagnostic settings across blueprint resources, per CRISP security review findings LT-4 (Medium). Supports Threat #24: Insufficient logging and monitoring.

Defender for Cloud (LT-1) is intentionally **not** managed here — it's subscription-scoped and should be enforced via Azure Policy by platform teams.

### Changes

**Diagnostic Settings (LT-4)** — `azurerm_monitor_diagnostic_setting` in each component:
- **Key Vault**: AuditEvent + AllMetrics
- **ACR**: ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents + AllMetrics
- **Event Grid**: allLogs + AllMetrics
- **Event Hubs**: allLogs + AllMetrics

### Scope

- Components: `010-security-identity`, `060-acr`, `040-messaging`
- Blueprints: full-single-node, full-multi-node, azure-local, only-cloud, robotics
- 19 files changed, 227 insertions

### Design Decisions

- Diagnostics gated by `should_enable_diagnostic_settings` (bool) + `log_analytics_workspace_id` — enabled automatically when blueprints wire observability
- Component-level ownership: each module manages its own diagnostic settings
- Defender left to Azure Policy to avoid subscription-scoped side effects on `terraform destroy`

### Deploy Validation (2026-04-08)

Rebased on `dev` and deployed 3 affected blueprints in parallel:

| Blueprint | Region | Diagnostic Settings | Result |
|---|---|---|---|
| full-single-node-cluster | eastus2 | ✅ KV, ACR, EG, EH | All diagnostic resources created. IoT Ops proxy timeout (pre-existing) |
| only-cloud-single-node-cluster | westus2 | ✅ ACR, EG, EH | All diagnostic resources created. KV contacts timeout (pre-existing transient) |
| robotics | westus3 | ✅ ACR, EG, EH, KV | All diagnostic resources created. Grafana SSL EOF (pre-existing transient) |

All diagnostic settings deployed successfully. All failures are pre-existing environmental issues unrelated to this change.

Skipped: `full-multi-node-cluster` (pre-existing count issue), `azure-local` (requires HCI hardware).

Fixes AB#1984

----
#### AI description  (iteration 5)
#### PR Classification
Feature enhancement to add diagnostic settings for Azure blueprint resources (ACR, Key Vault, Event Grid, Event Hubs) to address CRISP security findings LT-4 regarding insufficient logging and monitoring.

#### PR Summary
This PR implements diagnostic settings across Key Vault, ACR, Event Grid, and Event Hubs modules to enable audit logging and metrics collection to Log Analytics workspaces, addressing security compliance gaps. All changes are gated by optional variables and wire the Log Analytics workspace ID from observability modules through blueprint configurations.

- Added `azurerm_monitor_diagnostic_setting` resources in `main.tf` files for Key Vault (AuditEvent), ACR (ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents), Event Grid (allLogs), and Event Hubs (allLogs) with AllMetrics enabled
- Introduced `log_analytics_workspace_id` and `should_enable_diagnostic_settings` variables across all affected modules ...
… RBAC for connectedk8s proxy

## Summary

Adds Entra ID group-based cluster admin support and Azure Arc RBAC role assignments to the CNCF K3s cluster component, enabling `az connectedk8s proxy` for team members.

## Problem

- `az connectedk8s proxy` failed for non-deploying users — no Azure RBAC roles on the Arc resource
- Only the deploying user received cluster-admin via individual OID/UPN
- No support for Entra ID groups

## Changes

### New variable: `cluster_admin_group_oid`
- Accepts an Entra ID group Object ID
- Creates Kubernetes `ClusterRoleBinding` with `--group` flag (k3s-device-setup.sh)
- Assigns Azure Arc RBAC roles on the Arc connected cluster resource

### Azure Arc RBAC role assignments (new)
- `Azure Arc Kubernetes Viewer` — assigned to both user OID and group OID
- `Azure Arc Enabled Kubernetes Cluster User Role` — assigned to both user OID and group OID
- Scoped to the Arc connected cluster resource
- Only created after the cluster exists (static count guard with `has_arc_cluster`)

### Cleanup
- Removed `deploy-cluster-admin-oid.sh` — superseded by Terraform automation

### Files changed
- `src/100-edge/100-cncf-cluster/terraform/` — variables, main.tf, ubuntu-k3s module, role-assignments module
- `src/100-edge/100-cncf-cluster/scripts/k3s-device-setup.sh` — group ClusterRoleBinding
- `src/100-edge/110-iot-ops/scripts/deploy-cluster-admin-oid.sh` — deleted
- All 5 blueprints — expose `cluster_admin_group_oid`

## Usage

```hcl
cluster_admin_group_oid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

## Deployment Testing

All affected blueprints deployed and verified (Arc clusters Connected):

| Blueprint | Region | Result |
|---|---|---|
| full-single-node-cluster | eastus2 | ✅ Arc Connected |
| minimum-single-node-cluster | australiaeast | ✅ Arc Connected |
| partial-single-node-cluster | swedencentral | ✅ Arc Connected |
| dual-peered-single-node-cluster | westus3 | ✅ Both clusters Arc Connected |

Only pre-existing failures observed:
- IoT Ops sync rules (`LinkedAuthorizationFailed`) — known issue, unrelated to this PR
- Grafana dashboard import script — transient 412 errors

----
#### AI description  (iteration 12)
#### PR Classification
This PR adds new functionality to enable Entra ID group-based cluster admin access and Azure Arc RBAC support for connectedk8s proxy operations.

#### PR Summary
Adds support for granting cluster-admin permissions to an entire Entra ID group and assigns required Azure Arc RBAC roles to enable `az connectedk8s proxy` access for group members.

- `k3s-device-setup.sh`: Added `CLUSTER_ADMIN_GROUP_OID` environment variable and logic to create ClusterRoleBinding for Entra ID groups with cluster-admin permissions
- `terraform/main.tf`: Added Arc RBAC role assignments (`Azure Arc Kubernetes Viewer` and `Azure Arc Enabled Kubernetes Cluster User Role`) for both individual users and Entra ID groups on the Arc connected cluster resource
- All Terraform variable files: Added `cluster_admin_group_oid` variable to specify the Entra ...
## Summary

Fixes deployment issues discovered during parallel blueprint testing. Each of the 9 deployable blueprints was tested 4 times across different Azure regions (36 total deployments, all successful).

## Fixes

### 1. Grafana dashboard import fails on re-apply and fresh deploy (`020-observability`)

**Problem:** `az grafana dashboard import` fails in two ways: (a) 412 version-mismatch when dashboards already exist on retry, and (b) SSL EOF errors when Grafana's SSL cert isn't ready on fresh deploys.

**Fix:** Add `--overwrite` flag to all import calls, plus retry logic (10 attempts, 30s delay) to wait for SSL cert provisioning.
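The retry policy described in this fix (up to 10 attempts with a fixed 30-second delay) is implemented in a shell script in the PR. The Python sketch below only illustrates the same fixed-delay pattern. The `retry` helper and its parameters are hypothetical, and the `sleep` argument is injectable so the behaviour can be checked without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 10,          # from the PR: 10 attempts
    delay_seconds: float = 30.0,  # from the PR: 30s delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or the attempts are exhausted.

    Waits a fixed delay between attempts and re-raises the last
    exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

A fixed delay suits this case because the wait is for a single external event (certificate provisioning), not for load to subside, so exponential backoff would add little.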

### 2. AzureML compute cluster requires public IPs without private endpoints (`080-azureml`)

**Problem:** `compute_cluster_node_public_ip_enabled` defaults to `false`, but Azure ML requires workspace private endpoints when node public IPs are disabled.

### 3. dpkg lock race condition on VM first boot (`100-cncf-cluster`)

**Problem:** `k3s-device-setup.sh` used `curl | bash` to install Azure CLI, which runs internal `apt-get` calls without lock timeout handling. Ubuntu's `unattended-upgrades` holds the dpkg lock on first boot, causing exit code 100.

**Fix:** Replace `curl | bash` with inline `apt-get` calls, each using `DPkg::Lock::Timeout=300`. Eliminates the TOCTOU race entirely — no external installer with uncontrolled apt calls.

### 4. Default VM SKUs unavailable in most regions

**Problem:** `Standard_D8ds_v5` (AKS node default) is available in only 2 of the 9 IoT Operations regions. The same is true of `Standard_D8s_v3` (VM host default) and `Standard_D4_v4` (minimum blueprint default).

**Fix:** Update all defaults to v6 series — available in 8/9 or 9/9 regions.

## Deployment Test Results — 36/36 Successful

Each blueprint deployed 4 times in different regions:

| Blueprint | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| only-output-cncf-cluster-script | ✅ westus | ✅ eastus2 | ✅ westeurope | ✅ southcentralus |
| only-cloud-single-node-cluster | ✅ germanywestcentral | ✅ westus3 | ✅ westus3 | ✅ southcentralus |
| minimum-single-node-cluster | ✅ westus | ✅ eastus2 | ✅ eastus2 | ✅ northeurope |
| full-single-node-cluster | ✅ germanywestcentral | ✅ westus | ✅ southcentralus | ✅ westus2 |
| partial-single-node-cluster | ✅ westus3 | ✅ germanywestcentral | ✅ westus | ✅ eastus2 |
| dual-peered-single-node-cluster | ✅ eastus2 | ✅ southcentralus | ✅ southcentralus | ✅ southcentralus |
| full-multi-node-cluster | ✅ eastus2 | ✅ westus3 | ✅ westeurope | ✅ westus |
| azureml | ✅ southcentralus | ✅ westus | ✅ germanywestcentral | ✅ westus2 |
| robotics | ✅ germanywestcentral | ✅ westus | ✅ westus2 | ✅ westeurope |

**Skipped:** fabric (requires Fabric capacity), azure-local (requires Azure Local hardware)

## Changed Files

- `src/000-cloud/020-observability/scripts/import-grafana-dashboards.sh` — overwrite + retry logic
- `src/000-cloud/080-azureml/terraform/variables.tf` — keep secure default
- `blueprints/modules/robotics/terraform/main.tf` — der...
Resolves conflicts:
- import-grafana-dashboards.sh: kept dev version (retry wrapper, --overwrite)
- k3s-device-setup.sh: kept dev version (install_azure_cli helper, Entra group OID, apt lock timeouts)
- deploy-cluster-admin-oid.sh: kept deletion (refactored into k3s-device-setup.sh in PR 624)
- Add 520-video-query-api Azure Function for time-based video queries
- Upgrade 503-media-capture-service with multi-trigger, ACSA writer, continuous recorder
- Add video-capture-query blueprint with architecture diagrams
- Add ONVIF camera quickstart, deployment scripts, and testing guides
- Add ADR 006: dual-component video architecture
- Add continuous video capture ACSA sync ADR
- Update 030-data storage-account with primary_blob_endpoint output
- Fix secretlint: remove embedded basic auth credentials from RTSP URL example
- Fix cspell: add ffprobe, guestconfiguration, ultrafast, vidcap, videocapture to dictionary
- Fix shellcheck: SC2162 (read -r), SC2155 (local declare/assign), SC2181 (direct exit check), SC2002 (useless cat)
- Fix shfmt: apply consistent formatting across all 111-assets shell scripts
- Fix markdownlint MD036: convert bold emphasis to proper headings
- Fix markdownlint MD040: add language specifiers to fenced code blocks
- Fix markdownlint MD033: wrap angle-bracket placeholders in backticks
- Fix markdownlint MD025: convert duplicate H1 to H2
- Fix markdownlint MD024: rename duplicate heading
- Fix markdownlint MD029: correct ordered list numbering
- Fix cspell: add 15 domain words (Amcrest, Reolink, dtmi, anyauth, etc.)
- Fix ruff: auto-fix 48 errors + noqa S324 (non-crypto md5), S603 (trusted ffmpeg)
- Fix ruff: add S108/S603/S607 to test file ignores in .ruff.toml
- Fix shfmt: format install-ffmpeg.sh
- Fix markdown-table-formatter: 4 files reformatted
- Fix tflint: eventhubs default null → {} in full-single-node-cluster

All 8 megalinter linters validated locally before push.
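The `noqa S324 (non-crypto md5)` entry above refers to MD5 being used for something other than security. How the PR's code uses the digest is not shown here, so the sketch below is a hypothetical illustration of non-cryptographic use: a content fingerprint, with the intent made explicit through the standard library's `usedforsecurity=False` argument (Python 3.9+). The function name `content_fingerprint` is an assumption.

```python
import hashlib


def content_fingerprint(data: bytes) -> str:
    """Return a short, stable identifier for a byte string.

    MD5 is used here only as a fast checksum, never for authentication
    or integrity against an attacker. usedforsecurity=False (Python 3.9+)
    documents that intent to readers and tooling.
    """
    return hashlib.md5(data, usedforsecurity=False).hexdigest()
```

If collision resistance ever matters for the identifier, switching to `hashlib.sha256` would be the safer choice.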
- Move ONVIF quickstart to docs/getting-started/onvif-camera-quickstart.md
- Remove 4 redundant markdown files from 111-assets (1,956 lines):
  - ONVIF-CAMERA-QUICKSTART.md (1,015 lines — duplicated content)
  - terraform/ONVIF-CAMERA-DEPLOYMENT.md (465 lines — deployment log)
  - terraform/ONVIF-CAMERA-QUICK-REFERENCE.md (230 lines — overlap)
  - scripts/TESTING-ONVIF-CAMERAS.md (246 lines — overlap)
- Genericize scripts/README.md: replace vendor-specific names with placeholders
- Add ONVIF camera quickstart link to docs/getting-started/README.md
- New consolidated guide: vendor-neutral, covers Bicep/Terraform/scripts
- Fix cspell: add ultrafast to general-technical dictionary
- Fix ruff: auto-fix import sorting + line-length in data-scientist-workflow.py
- Fix yamllint: reformat media-capture-mqtt-triggered.yaml indentation and add document start

All 7 megalinter linters validated locally before push.
…-cluster

Add should_deploy_video_capture toggle, function runtime version
pass-through variables, video_capture_computed_settings local, and
Storage Blob Data Contributor role assignment to the
full-single-node-cluster blueprint.

Create video-capture-query.tfvars.example demonstrating Python 3.11
runtime configuration with mutual-exclusion documentation for
Node.js-based notification functions.
- replace datetime.utcnow() with datetime.now(UTC) across all instances
- use ManagedIdentityCredential(client_id=client_id) consistently with AZURE_CLIENT_ID fallback
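The two changes above can be sketched as follows. `datetime.utcnow()` returns a naive datetime, while `datetime.now(UTC)` returns a timezone-aware one. The sketch uses `timezone.utc`, for which `UTC` is an alias in Python 3.11+. The helper names `utc_now` and `resolve_client_id` are hypothetical, and the fallback order (explicit value first, then the `AZURE_CLIENT_ID` environment variable) is an assumption about what the commit means by "fallback".

```python
import os
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Unlike datetime.utcnow(), the result carries tzinfo, so it compares
    and serializes unambiguously.
    """
    return datetime.now(timezone.utc)


def resolve_client_id(explicit: Optional[str] = None) -> Optional[str]:
    """Pick the managed identity client ID.

    Assumed order: an explicitly supplied value wins, otherwise fall
    back to the AZURE_CLIENT_ID environment variable. The result would
    then be passed as ManagedIdentityCredential(client_id=...).
    """
    return explicit or os.environ.get("AZURE_CLIENT_ID")
```

Resolving the client ID in one place means every credential is constructed the same way, which is the consistency the commit message describes.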

🔧 - Generated by Copilot
@auyidi1 auyidi1 requested a review from a team as a code owner May 2, 2026 00:25

github-actions Bot commented May 2, 2026

Dependency Review

The following issues were found:
  • ❌ 1 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 4 package(s) with unknown licenses.
  • ⚠️ 1 packages with OpenSSF Scorecard issues.
See the Details below.

Vulnerabilities

src/500-application/520-video-query-api/requirements.txt

| Name | Version | Vulnerability | Severity |
|---|---|---|---|
| cryptography | 43.0.3 | cryptography Vulnerable to a Subgroup Attack Due to Missing Subgroup Validation for SECT Curves | high |

Only included vulnerabilities with severity high or higher.

License Issues

src/500-application/520-video-query-api/requirements.txt

| Package | Version | License | Issue Type |
|---|---|---|---|
| azure-core | >= 1.29.0 | Null | Unknown License |
| azure-functions | >= 1.18.0 | Null | Unknown License |
| azure-identity | >= 1.15.0 | Null | Unknown License |
| azure-storage-blob | >= 12.19.0 | Null | Unknown License |

OpenSSF Scorecard

| Package | Version | Score | Details |
|---|---|---|---|
| cargo/md5 | 0.7.0 | ⚠️ 2.7 | Details (below) |
| pip/cryptography | 43.0.3 | Unknown | Unknown |
| pip/azure-core | >= 1.29.0 | Unknown | Unknown |
| pip/azure-functions | >= 1.18.0 | Unknown | Unknown |
| pip/azure-identity | >= 1.15.0 | Unknown | Unknown |
| pip/azure-storage-blob | >= 12.19.0 | Unknown | Unknown |

Details for cargo/md5 0.7.0:

| Check | Score | Reason |
|---|---|---|
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Maintained | ⚠️ 0 | 0 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 0 |
| Code-Review | ⚠️ 0 | Found 1/28 approved changesets -- score normalized to 0 |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Signed-Releases | ⚠️ -1 | no releases found |
| License | 🟢 9 | license file detected |
| Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |

Scanned Files

  • src/500-application/503-media-capture-service/services/media-capture-service/Cargo.lock
  • src/500-application/520-video-query-api/requirements.txt


github-actions Bot commented May 2, 2026

📚 Documentation Health Report

Generated on: 2026-05-02 00:36:09 UTC

📈 Documentation Statistics

| Category | File Count |
|---|---|
| Main Documentation | 219 |
| Infrastructure Components | 198 |
| Blueprints | 47 |
| GitHub Resources | 43 |
| AI Assistant Guides (Copilot) | 17 |
| Total | 524 |

🏗️ Three-Tree Architecture Status

  • ✅ Bicep Documentation Tree: Auto-generated navigation
  • ✅ Terraform Documentation Tree: Auto-generated navigation
  • ✅ README Documentation Tree: Manual README organization

🔍 Quality Metrics

  • Frontmatter Validation: success
  • Link Validation: success

This report is automatically generated by the Documentation Automation workflow.

Development

Successfully merging this pull request may close these issues.

feat: Add video capture query blueprint, 520-video-query-api, and ONVIF camera tooling

2 participants