microsoft · auyidi1 · Apr 20, 2026 · Apr 21, 2026 · Apr 21, 2026 · Apr 21, 2026
@@ -1595,3 +1595,4 @@ LASTEXITCODE
 scriptblock
 DSSE
 intoto
+SSML
@@ -0,0 +1,184 @@
+---
+name: tts-voiceover
+description: 'Text-to-speech voice-over generation from YAML speaker notes using Azure Speech SDK with SSML pronunciation control - Brought to you by microsoft/hve-core'
+metadata:
+  authors: "microsoft/hve-core"
+  spec_version: "1.0"
+---
+
+# TTS Voice Over Skill
+
+Generates per-slide WAV voice-over files from YAML `speaker_notes` using Azure Speech SDK with SSML pronunciation control.
+
+## Overview
+
+This skill reads `content.yaml` files from a PowerPoint skill content directory, extracts `speaker_notes` fields, applies SSML acronym aliases for correct pronunciation of technical terms, and produces one WAV file per slide. A dry-run mode verifies SSML templates without Azure credentials.
+
+## Prerequisites
+
+* **Azure Speech resource** — Free tier provides 500K characters per month.
+* **Authentication** — Key-based (`SPEECH_KEY`) or Microsoft Entra ID (`SPEECH_RESOURCE_ID`).
+* **Python 3.11+** with `uv` for virtual environment management.
+
+### Key-Based Auth
+
+```bash
+export SPEECH_KEY="your-speech-key"
+export SPEECH_REGION="eastus"
+```
+
+### Microsoft Entra ID Auth
+
+Requires a custom domain on the Speech resource and the `Cognitive Services Speech User` role.
+
+```bash
+export SPEECH_RESOURCE_ID="/subscriptions/.../Microsoft.CognitiveServices/accounts/your-resource"
+export SPEECH_REGION="eastus"
+```
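A minimal sketch of how a script could choose between the two auth modes from the environment variables above. The function name `pick_auth_mode`, the precedence of key auth when both variables are set, and the requirement that `SPEECH_REGION` is always present are assumptions for illustration, not confirmed behavior of `generate_voiceover.py`.

```python
import os
from collections.abc import Mapping


def pick_auth_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "key" or "entra" depending on which variables are set.

    Hypothetical helper: the real script may resolve credentials differently.
    """
    if not env.get("SPEECH_REGION"):
        # Assumption: both auth modes need a region, as in the exports above.
        raise RuntimeError("SPEECH_REGION is not set")
    if env.get("SPEECH_KEY"):
        # Assumption: key-based auth wins when both variables are present.
        return "key"
    if env.get("SPEECH_RESOURCE_ID"):
        return "entra"
    raise RuntimeError("Set SPEECH_KEY or SPEECH_RESOURCE_ID")
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise in tests without touching the real environment.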
+
+Install dependencies (required for either auth method):
+
+```bash
+# run from this skill folder
+uv sync
+```
+
+## Quick Start
+
+Verify SSML templates without generating audio:
+
+```bash
+uv run scripts/generate_voiceover.py --dry-run --content-dir path/to/content
+```
+
+Generate voice-over WAV files:
+
+```bash
+uv run scripts/generate_voiceover.py --content-dir path/to/content --output-dir voice-over
+```
+
+Embed audio into a PPTX deck:
+
+```bash
+uv run scripts/embed_audio.py --input deck.pptx --audio-dir voice-over --output deck-narrated.pptx
+```
+
+## Parameters Reference
+
+### generate_voiceover.py
+
+| Parameter          | Type   | Default                             | Description                                   |
+|:-------------------|:-------|:------------------------------------|:----------------------------------------------|
+| `--dry-run`        | flag   | `false`                             | Print SSML templates without generating audio |
+| `--voice`          | string | `en-US-Andrew:DragonHDLatestNeural` | Azure TTS voice name                          |
+| `--rate`           | string | `+10%`                              | Speech prosody rate                           |
+| `--content-dir`    | path   | `content`                           | Path to slide content directory               |
+| `--output-dir`     | path   | `voice-over`                        | Path to WAV output directory                  |
+| `--lexicon`        | path   | *(auto-detect)*                     | Custom acronyms.yaml path                     |
+| `--verbose` / `-v` | flag   | `false`                             | Enable verbose (DEBUG) logging output         |
+
+### embed_audio.py
+
+Embeds WAV files into corresponding PPTX slides and adds narration timing
+XML so PowerPoint recognizes the audio for video export via
+**File > Export > Create a Video > Use Recorded Timings and Narrations**.
+
+| Parameter          | Type | Default           | Description                           |
+|:-------------------|:-----|:------------------|:--------------------------------------|
+| `--input`          | path | *(required)*      | Source PPTX file path                 |
+| `--audio-dir`      | path | `voice-over`      | Directory with slide-NNN.wav          |
+| `--output`         | path | `*-narrated.pptx` | Output PPTX file path                 |
+| `--verbose` / `-v` | flag | `false`           | Enable verbose (DEBUG) logging output |
+
+## Script Reference
+
+Generate with custom voice and rate:
+
+```bash
+uv run scripts/generate_voiceover.py \
+  --content-dir content \
+  --output-dir voice-over \
+  --voice "en-US-Jenny:DragonHDLatestNeural" \
+  --rate "+5%"
+```
+
+Use a custom lexicon:
+
+```bash
+uv run scripts/generate_voiceover.py \
+  --content-dir content \
+  --lexicon custom-acronyms.yaml
+```
+
+Embed generated audio:
+
+```bash
+uv run scripts/embed_audio.py \
+  --input slide-deck/presentation.pptx \
+  --audio-dir voice-over \
+  --output slide-deck/presentation-narrated.pptx
+```
+
+## Acronym Lexicon
+
+The lexicon controls SSML `<sub alias>` replacements for acronyms and technical terms. Create an `acronyms.yaml` file:
+
+```yaml
+acronyms:
+  HVE-Core: "H V E Core"
+  OWASP: "Oh wasp"
+  SBOM: "S Bomb"
+  SLSA: "Salsa"
+  CI/CD: "C I C D"
+```
+
+Lexicon resolution order:
+
+1. Path specified via `--lexicon` argument.
+2. `acronyms.yaml` in the content directory.
+3. Built-in defaults covering common technical acronyms.
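
The resolution order above can be sketched as a small lookup. The helper name `resolve_lexicon` and the convention of returning `None` to mean "fall back to built-in defaults" are hypothetical; the actual script may signal the fallback differently.

```python
from pathlib import Path
from typing import Optional


def resolve_lexicon(cli_path: Optional[str], content_dir: str) -> Optional[Path]:
    """Pick the lexicon file following the documented three-step order.

    Hypothetical helper. Returns None when the caller should use the
    built-in defaults.
    """
    # 1. An explicit --lexicon argument always wins.
    if cli_path:
        return Path(cli_path)
    # 2. Otherwise look for acronyms.yaml in the content directory.
    candidate = Path(content_dir) / "acronyms.yaml"
    if candidate.is_file():
        return candidate
    # 3. Nothing found: signal that built-in defaults apply.
    return None
```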
+
+## SSML Template
+
+Each slide produces an SSML document:
+
+```xml
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">
+  <voice name="en-US-Andrew:DragonHDLatestNeural">
+    <prosody rate="+10%">
+      Text with <sub alias="Oh wasp">OWASP</sub> aliases applied.
+    </prosody>
+  </voice>
+</speak>
+```
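
A minimal sketch of how the alias substitution and template wrapping might be done with the standard library only. The function names `apply_aliases` and `build_ssml`, the longest-match-first ordering, and the whole-word matching rule are assumptions for illustration; the skill's own implementation is not shown in this change.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text: str, lexicon: dict) -> str:
    """Wrap known terms in <sub alias="..."> and XML-escape everything else."""
    if not lexicon:
        return escape(text)
    # Longest terms first so "HVE-Core" is tried before any shorter overlap.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    parts, last = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))
        term = match.group(0)
        parts.append(f"<sub alias={quoteattr(lexicon[term])}>{escape(term)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


def build_ssml(text: str, lexicon: dict,
               voice: str = "en-US-Andrew:DragonHDLatestNeural",
               rate: str = "+10%") -> str:
    """Wrap aliased text in the speak/voice/prosody template shown above."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"\n'
        ' xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">\n'
        f"  <voice name={quoteattr(voice)}>\n"
        f"    <prosody rate={quoteattr(rate)}>\n"
        f"      {apply_aliases(text, lexicon)}\n"
        "    </prosody>\n"
        "  </voice>\n"
        "</speak>"
    )
```

Escaping the unmatched segments matters because speaker notes can contain `&` or `<`, which would otherwise produce malformed SSML.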
+
+## Integration with PowerPoint Skill
+
+This skill reads from the PowerPoint skill's content directory structure:
+
+```text
+content/
+├── slide-001/
+│   └── content.yaml    # Must include speaker_notes: field
+├── slide-002/
+│   └── content.yaml
+└── ...
+```
+
+Each `content.yaml` should contain a `speaker_notes:` field with the narration text. The generated WAV files are named `slide-NNN.wav`, matching the directory names.
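
The directory-to-file naming can be sketched as follows. The helper name `plan_outputs` and the assumption that `NNN` is exactly three digits are illustrative; this only maps paths and does not read `speaker_notes` or call the Speech SDK.

```python
import re
from pathlib import Path

# Assumption: slide directories are named slide-NNN with three digits.
SLIDE_DIR = re.compile(r"^slide-(\d{3})$")


def plan_outputs(content_dir: Path, output_dir: Path) -> list:
    """Pair each slide-NNN/content.yaml with its slide-NNN.wav target."""
    plan = []
    for child in sorted(Path(content_dir).iterdir()):
        source = child / "content.yaml"
        if child.is_dir() and SLIDE_DIR.match(child.name) and source.is_file():
            plan.append((source, Path(output_dir) / f"{child.name}.wav"))
    return plan
```

Sorting the directory listing keeps the plan in slide order, since `iterdir()` makes no ordering guarantee.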
+
+## Troubleshooting
+
+| Issue                                                | Solution                                                                                                                       |
+|:-----------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------|
+| `Set SPEECH_KEY ... or SPEECH_RESOURCE_ID`           | Export `SPEECH_KEY` (key auth) or `SPEECH_RESOURCE_ID` (Entra ID) with `SPEECH_REGION`.                                        |
+| 401 with Entra ID auth                               | Verify custom domain on the Speech resource and `Cognitive Services Speech User` role. RBAC propagation takes up to 5 minutes. |
+| Empty WAV files or skipped slides                    | Verify `speaker_notes:` is present and non-empty in `content.yaml`.                                                            |
+| Mispronounced acronyms                               | Add entries to `acronyms.yaml` with phonetic aliases.                                                                          |
+| `azure-cognitiveservices-speech package is required` | Run `uv sync` in the skill directory.                                                                                          |
+| Audio icon visible in PPTX                           | Reposition or resize the audio object in PowerPoint after embedding.                                                           |
+| Authored slide animations missing after embedding    | `embed_audio.py` replaces existing `p:timing` with narration timing; re-apply animations in PowerPoint after embedding audio.  |
+| Video export shows "No timings recorded"             | Re-embed audio with the updated `embed_audio.py` which adds narration timing XML automatically.                                |
+
+> Brought to you by microsoft/hve-core
@@ -0,0 +1,34 @@
+[project]
+name = "tts-voiceover-skill"
+version = "0.0.0"
+requires-python = ">=3.11"
+dependencies = [
+    "azure-cognitiveservices-speech>=1.41",
+    "azure-identity>=1.19",
+    "lxml>=6.1.0",  # direct dep (embed_audio.py) and transitive via python-pptx; explicit lower bound ensures CVE patches
+    "python-pptx>=1.0",
+    "pyyaml>=6.0",
+]
+
+[dependency-groups]
+dev = [
+    "pytest>=9.0",
+    "pytest-cov>=5.0",
+    "pytest-mock>=3.14",
+    "ruff>=0.15",
+]
+fuzz = [
+    "atheris>=3.0",
+]
+
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+pythonpath = ["scripts"]
+python_files = ["test_*.py", "fuzz_harness.py"]
+
+[tool.ruff]
+line-length = 88
+target-version = "py311"
+
+[tool.ruff.lint]
+select = ["E", "F", "I", "W"]
@@ -0,0 +1,94 @@
+#!/usr/bin/env pwsh
+# Copyright (c) Microsoft Corporation.
+# SPDX-License-Identifier: MIT
+#Requires -Version 7.0
+#
+# Invoke-EmbedAudio.ps1
+#
+# Purpose: Wrapper that manages uv venv setup and delegates to embed_audio.py
+
+<#
+.SYNOPSIS
+    Embeds per-slide WAV voice-over files into a PowerPoint deck.
+
+.DESCRIPTION
+    Manages the Python virtual environment and invokes embed_audio.py to add
+    WAV files as embedded media objects in the corresponding slides of a PPTX file.
+
+.PARAMETER InputPath
+    Source PPTX file path. Required.
+
+.PARAMETER AudioDir
+    Directory containing slide-NNN.wav files. Defaults to voice-over.
+
+.PARAMETER OutputPath
+    Output PPTX file path. Defaults to input stem + '-narrated.pptx'.
+
+.PARAMETER SkipVenvSetup
+    Skip virtual environment creation and dependency installation.
+
+.EXAMPLE
+    ./Invoke-EmbedAudio.ps1 -InputPath deck.pptx -AudioDir voice-over
+
+.EXAMPLE
+    ./Invoke-EmbedAudio.ps1 -InputPath deck.pptx -AudioDir voice-over -OutputPath deck-narrated.pptx
+
+.NOTES
+    Part of the tts-voiceover skill. Manages uv virtual environment setup
+    and delegates to embed_audio.py for WAV embedding into PPTX slides.
+#>
+
+[CmdletBinding()]
+param(
+    [Parameter(Mandatory = $true)]
+    [ValidateNotNullOrEmpty()]
+    [string]$InputPath,
+
+    [Parameter(Mandatory = $false)]
+    [string]$AudioDir,
+
+    [Parameter(Mandatory = $false)]
+    [string]$OutputPath,
+
+    [Parameter(Mandatory = $false)]
+    [switch]$SkipVenvSetup
+)
+
+$ErrorActionPreference = 'Stop'
+
+$ScriptDir = $PSScriptRoot
+$SkillRoot = Split-Path $ScriptDir
+$VenvDir = Join-Path $SkillRoot '.venv'
+
+Import-Module (Join-Path $ScriptDir 'Modules/TtsVoiceoverHelpers.psm1') -Force
+
+#region Main
+
+if ($MyInvocation.InvocationName -ne '.') {
+
+    $null = Test-UvAvailability
+
+    if (-not $SkipVenvSetup) {
+        Initialize-PythonEnvironment -SkillRoot $SkillRoot
+    }
+
+    $python = Get-VenvPythonPath -VenvDir $VenvDir
+    if (-not (Test-Path $python)) {
+        throw "Python not found at $python. Run without -SkipVenvSetup to initialize."
+    }
+
+    $script = Join-Path $ScriptDir 'embed_audio.py'
+    $PythonArgs = @('--input', $InputPath)
+
+    if ($AudioDir) { $PythonArgs += '--audio-dir', $AudioDir }
+    if ($OutputPath) { $PythonArgs += '--output', $OutputPath }
+    if ($VerbosePreference -ne 'SilentlyContinue') { $PythonArgs += '--verbose' }
+
+    & $python $script @PythonArgs
+    if ($LASTEXITCODE -ne 0) {
+        throw "embed_audio.py exited with code $LASTEXITCODE"
+    }
+
+}
+
+#endregion Main