feat(skills): add tts-voiceover skill for Azure Speech SDK voice-over generation #1415
Open
auyidi1 wants to merge 71 commits into main from users/auyidi/tts-voiceover-skill
Commits
8d0f709
feat(skills): add tts-voiceover skill for Azure Speech SDK voice-over…
auyidi1 24fe74b
fix(skills): address PR review findings for tts-voiceover skill
auyidi1 91a5233
fix(skills): resolve second PR review for tts-voiceover skill
auyidi1 456eaf3
style(skills): format long lines in tts-voiceover Python scripts
auyidi1 84f00eb
fix(skills): revert external formatter and apply ruff format
auyidi1 6df83d1
fix(skills): add pytest-cov, fuzz corpus, and fix plugin freshness
auyidi1 711f37e
fix(skills): add Copilot footer to tts-voiceover SKILL.md
auyidi1 22dcc91
fix(skills): pin dependency versions and fix lxml CVE-2026-41066
auyidi1 74838c8
fix(skills): address all PR review findings for tts-voiceover skill
auyidi1 af868c3
fix(ci): allow certifi and charset-normalizer licenses in dependency …
auyidi1 c01dbbf
fix(skills): restore required Copilot footer in tts-voiceover SKILL.md
auyidi1 3cc7b1e
fix(skills): address final PR review comments for tts-voiceover
auyidi1 482565b
feat(skills): add docs and Pester tests for tts-voiceover skill
auyidi1 bf3ab3d
fix(skills): address PR review comments for tts-voiceover skill
auyidi1 c0b4e4f
fix(skills): add script name and purpose headers to PS1 wrappers
auyidi1 4412f6c
fix(skills): fix markdown lint and ruff line-length violations
auyidi1 d432388
fix(skills): apply ruff format and table formatting for CI
auyidi1 2e0455b
fix(skills): add narration timing to embed_audio for PowerPoint video…
auyidi1 1eecef2
fix(skills): address PR review findings for tts-voiceover
4e7ed66
fix(collections): add missing maturity: experimental for vscode-playw…
2973bcd
fix(docs): use pathname:// protocol for out-of-scope SKILL.md link
d4c2bc6
chore(plugins): regenerate plugins after collection maturity updates
4479aff
fix(skills): add missing Copilot footer to tts-voiceover SKILL.md
b0b136e
style(docs): fix markdown table formatting in tts-voiceover guide
7bd1052
chore(plugins): regenerate plugin READMEs after SKILL.md footer update
cc54806
Merge branch 'main' into users/auyidi/tts-voiceover-skill
auyidi1 022c725
fix(skills): address latest PR review for tts-voiceover
171cb0f
test(skills): add test_embed_audio.py for tts-voiceover skill
7c28ef5
fix(skills): address final review items for tts-voiceover
d7c1b83
style(skills): rename test methods to BDD format per python-test conv…
e987592
fix(skills): correct buffer comment and tighten assertion in test_sho…
e56b4a2
fix(skills): add pytest-mock and migrate to mocker fixture
259a19f
style(skills): move #Requires after copyright headers per PS conventions
d167b6a
test(skills): add test_generate_voiceover.py for tts-voiceover skill
1bc89da
fix(skills): log exception type in embed_slide_audio catch block
c89fb96
fix(skills): remove non-standard metadata fields from SKILL.md frontm…
05502c5
fix(skills): return EXIT_FAILURE when audio synthesis fails for any s…
3761c67
fix(skills): address final review items for tts-voiceover
b934e49
docs(skills): add input contract and lexicon constraint to apply_acro…
ec423ea
fix(skills): return False when audio shape not found, add type-safe l…
ebdb8dd
fix(skills): move Copilot footer above attribution so attribution is …
db59e6d
fix(skills): XML-escape fuzz inputs, use Slide type hint, remove unus…
0c84590
fix(skills): add ValidateNotNullOrEmpty, fix Pester skip, cache regex
79b35f1
style(skills): remove non-standard module-level synopsis block from T…
204fdd6
fix(skills): fix sidebar_position collision and add AAA test structure
19f5d0c
fix(skills): wrap token refresh in try/except, fix OutputType convention
6086ee8
refactor(skills): co-locate Pester test inside tts-voiceover skill pa…
617550a
docs(skills): clarify lxml is a direct and transitive dependency
52899f2
fix(skills): align embed_audio exit code with generate_voiceover on p…
aa42509
Merge branch 'main' into users/auyidi/tts-voiceover-skill
auyidi1 ea52f30
fix(skills): address PR review feedback for tts-voiceover
528baf2
fix(skills): address additional tts-voiceover review feedback
d3a05ea
fix(skills): clean up orphaned audio shape and reorder _run before main
27b54da
fix(skills): address CodeQL finding and review feedback
35a4721
refactor(skills): capture add_movie() return value, remove _find_audi…
95a3b0a
fix(skills): address review feedback — license, types, tests, guard
92aedea
refactor(skills): extract configure_logging in embed_audio.py
f524234
fix(skills): wire verbose flag, replace assert, explicit Mandatory
0f819d5
refactor(skills): extract configure_logging and add --verbose to gene…
9fb9d1a
fix(skills): add diagnostic log when no audio files are embedded
6e31263
Merge branch 'main' into users/auyidi/tts-voiceover-skill
auyidi1 1343bc9
fix(skills): address tts-voiceover review feedback round 11
f94e8ab
fix(skills): address tts-voiceover review feedback round 12
fbb38bf
fix(skills): address tts-voiceover review feedback round 13
1363016
fix(skills): drop unused xmlns:a and set advClick=0 for audio-driven …
701d1c9
fix(skills): add word boundaries to acronym regex to prevent partial …
c892868
fix(skills): move configure_logging before _run and document --verbos…
d4a5857
fix(skills): clean up partial WAV on failure and use static timing XM…
5986338
fix(skills): add defensive warnings for missing timing template elements
9adda51
fix(skills): move timing template to module level and tighten shape_i…
f8c4ccc
fix(skills): guard credential None at token refresh and use WAV file …
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1595,3 +1595,4 @@ LASTEXITCODE | |
| scriptblock | ||
| DSSE | ||
| intoto | ||
| SSML | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,184 @@ | ||
| --- | ||
| name: tts-voiceover | ||
| description: 'Text-to-speech voice-over generation from YAML speaker notes using Azure Speech SDK with SSML pronunciation control - Brought to you by microsoft/hve-core' | ||
| metadata: | ||
| authors: "microsoft/hve-core" | ||
| spec_version: "1.0" | ||
| --- | ||
|
|
||
| # TTS Voice Over Skill | ||
|
|
||
| Generates per-slide WAV voice-over files from YAML `speaker_notes` using Azure Speech SDK with SSML pronunciation control. | ||
|
|
||
| ## Overview | ||
|
|
||
| This skill reads `content.yaml` files from a PowerPoint skill content directory, extracts `speaker_notes` fields, applies SSML acronym aliases for correct pronunciation of technical terms, and produces one WAV file per slide. It also supports a dry-run mode that prints the SSML templates for verification without Azure credentials. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| * **Azure Speech resource** — Free tier provides 500K characters per month. | ||
| * **Authentication** — Key-based (`SPEECH_KEY`) or Microsoft Entra ID (`SPEECH_RESOURCE_ID`). | ||
| * **Python 3.11+** with `uv` for virtual environment management. | ||
|
|
||
| ### Key-Based Auth | ||
|
|
||
| ```bash | ||
| export SPEECH_KEY="your-speech-key" | ||
| export SPEECH_REGION="eastus" | ||
| ``` | ||
|
|
||
| ### Microsoft Entra ID Auth | ||
|
|
||
| Requires a custom domain on the Speech resource and the `Cognitive Services Speech User` role. | ||
|
|
||
| ```bash | ||
| export SPEECH_RESOURCE_ID="/subscriptions/.../Microsoft.CognitiveServices/accounts/your-resource" | ||
| export SPEECH_REGION="eastus" | ||
| ``` | ||
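The two authentication modes above could be selected from the environment roughly as follows. This is a minimal sketch, not the code in `generate_voiceover.py`: the helper name `resolve_auth_mode`, the error messages, and the choice to prefer key-based auth when both variables are set are all assumptions made for illustration.

```python
from typing import Mapping


def resolve_auth_mode(env: Mapping[str, str]) -> str:
    """Return "key" or "entra" depending on which variables are set.

    Hypothetical helper: the precedence (key first) is an assumption,
    not something the skill documentation specifies.
    """
    # Both documented setups export SPEECH_REGION, so treat it as required.
    if not env.get("SPEECH_REGION", "").strip():
        raise ValueError("SPEECH_REGION is not set")
    if env.get("SPEECH_KEY", "").strip():
        return "key"
    if env.get("SPEECH_RESOURCE_ID", "").strip():
        return "entra"
    raise ValueError("neither SPEECH_KEY nor SPEECH_RESOURCE_ID is set")
```

Passing `os.environ` as `env` checks the current shell; passing a plain dictionary keeps the function easy to test.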
|
|
||
| Install dependencies: | ||
|
|
||
| ```bash | ||
| # run from this skill folder | ||
| uv sync | ||
| ``` | ||
|
|
||
| ## Quick Start | ||
|
|
||
| Verify SSML templates without generating audio: | ||
|
|
||
| ```bash | ||
| uv run scripts/generate_voiceover.py --dry-run --content-dir path/to/content | ||
| ``` | ||
|
|
||
| Generate voice-over WAV files: | ||
|
|
||
| ```bash | ||
| uv run scripts/generate_voiceover.py --content-dir path/to/content --output-dir voice-over | ||
| ``` | ||
|
|
||
| Embed audio into a PPTX deck: | ||
|
|
||
| ```bash | ||
| uv run scripts/embed_audio.py --input deck.pptx --audio-dir voice-over --output deck-narrated.pptx | ||
| ``` | ||
|
|
||
| ## Parameters Reference | ||
|
|
||
| ### generate_voiceover.py | ||
|
|
||
| | Parameter | Type | Default | Description | | ||
| |:----------------|:-------|:------------------------------------|:----------------------------------------------| | ||
| | `--dry-run` | flag | `false` | Print SSML templates without generating audio | | ||
| | `--voice` | string | `en-US-Andrew:DragonHDLatestNeural` | Azure TTS voice name | | ||
| | `--rate` | string | `+10%` | Speech prosody rate | | ||
| | `--content-dir` | path | `content` | Path to slide content directory | | ||
| | `--output-dir` | path | `voice-over` | Path to WAV output directory | | ||
| | `--lexicon` | path | *(auto-detect)* | Custom acronyms.yaml path | | ||
| | `--verbose` / `-v` | flag | `false` | Enable verbose (DEBUG) logging output | | ||
|
|
||
| ### embed_audio.py | ||
|
|
||
| Embeds WAV files into corresponding PPTX slides and adds narration timing | ||
| XML so PowerPoint recognizes the audio for video export via | ||
| **File > Export > Create a Video > Use Recorded Timings and Narrations**. | ||
|
|
||
| | Parameter | Type | Default | Description | | ||
| |:--------------|:-----|:------------------|:-----------------------------| | ||
| | `--input` | path | *(required)* | Source PPTX file path | | ||
| | `--audio-dir` | path | `voice-over` | Directory with slide-NNN.wav | | ||
| | `--output` | path | `*-narrated.pptx` | Output PPTX file path | | ||
| | `--verbose` / `-v` | flag | `false` | Enable verbose (DEBUG) logging output | | ||
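The pairing of `slide-NNN.wav` files with slides might look like the sketch below. It assumes that `NNN` is a 1-based slide number and that files outside the deck's slide range are ignored; the function name `map_audio_to_slides` is hypothetical and the real `embed_audio.py` may behave differently.

```python
import re
from pathlib import Path

# Matches names such as slide-001.wav and captures the number.
_SLIDE_WAV = re.compile(r"^slide-(\d+)\.wav$")


def map_audio_to_slides(audio_dir: Path, slide_count: int) -> dict[int, Path]:
    """Map 1-based slide numbers to WAV paths found in audio_dir."""
    mapping: dict[int, Path] = {}
    for wav in sorted(audio_dir.glob("slide-*.wav")):
        match = _SLIDE_WAV.match(wav.name)
        if match is None:
            continue  # e.g. slide-intro.wav: not a numbered slide file
        number = int(match.group(1))
        if 1 <= number <= slide_count:  # assumption: out-of-range files are skipped
            mapping[number] = wav
    return mapping
```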
|
|
||
| ## Script Reference | ||
|
|
||
| Generate with custom voice and rate: | ||
|
|
||
| ```bash | ||
| uv run scripts/generate_voiceover.py \ | ||
| --content-dir content \ | ||
| --output-dir voice-over \ | ||
| --voice "en-US-Jenny:DragonHDLatestNeural" \ | ||
| --rate "+5%" | ||
| ``` | ||
|
|
||
| Use a custom lexicon: | ||
|
|
||
| ```bash | ||
| uv run scripts/generate_voiceover.py \ | ||
| --content-dir content \ | ||
| --lexicon custom-acronyms.yaml | ||
| ``` | ||
|
|
||
| Embed generated audio: | ||
|
|
||
| ```bash | ||
| uv run scripts/embed_audio.py \ | ||
| --input slide-deck/presentation.pptx \ | ||
| --audio-dir voice-over \ | ||
| --output slide-deck/presentation-narrated.pptx | ||
| ``` | ||
|
|
||
| ## Acronym Lexicon | ||
|
|
||
| The lexicon controls SSML `<sub alias>` replacements for acronyms and technical terms. Create an `acronyms.yaml` file: | ||
|
|
||
| ```yaml | ||
| acronyms: | ||
| HVE-Core: "H V E Core" | ||
| OWASP: "Oh wasp" | ||
| SBOM: "S Bomb" | ||
| SLSA: "Salsa" | ||
| CI/CD: "C I C D" | ||
| ``` | ||
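One way to turn such a lexicon into `<sub alias>` markup is sketched below. It is an illustration under stated assumptions, not the skill's implementation: the function name `apply_aliases` is hypothetical, matching is assumed to be case-sensitive, and word boundaries are approximated with lookarounds so that a term like `SBOM` is not replaced inside `SBOMs`.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_aliases(text: str, acronyms: dict[str, str]) -> str:
    """Wrap known terms in <sub alias="..."> and XML-escape everything else."""
    # Longest terms first so a longer entry wins over a shorter prefix.
    terms = sorted((t for t in acronyms if t), key=len, reverse=True)
    if not terms:
        return escape(text)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)"
    )
    parts: list[str] = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[last:match.start()]))  # plain text before the term
        term = match.group(1)
        parts.append(f"<sub alias={quoteattr(acronyms[term])}>{escape(term)}</sub>")
        last = match.end()
    parts.append(escape(text[last:]))  # trailing plain text
    return "".join(parts)
```

Escaping the surrounding text while building the output keeps characters such as `&` in speaker notes from producing invalid SSML.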
|
|
||
| Lexicon resolution order: | ||
|
|
||
| 1. Path specified via `--lexicon` argument. | ||
| 2. `acronyms.yaml` in the content directory. | ||
| 3. Built-in defaults covering common technical acronyms. | ||
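The resolution order above can be sketched as a small lookup. The helper name `resolve_lexicon_path` is hypothetical, and returning `None` to signal "use the built-in defaults" is an assumption about how a caller might handle the third step.

```python
from pathlib import Path
from typing import Optional


def resolve_lexicon_path(lexicon_arg: Optional[str], content_dir: Path) -> Optional[Path]:
    """Pick the lexicon file following the documented precedence."""
    # 1. An explicit --lexicon argument always wins.
    if lexicon_arg:
        return Path(lexicon_arg)
    # 2. Otherwise look for acronyms.yaml in the content directory.
    candidate = content_dir / "acronyms.yaml"
    if candidate.is_file():
        return candidate
    # 3. No file found: the caller falls back to built-in defaults.
    return None
```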
|
|
||
| ## SSML Template | ||
|
|
||
| Each slide produces an SSML document: | ||
|
|
||
| ```xml | ||
| <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" | ||
| xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US"> | ||
| <voice name="en-US-Andrew:DragonHDLatestNeural"> | ||
| <prosody rate="+10%"> | ||
| Text with <sub alias="Oh wasp">OWASP</sub> aliases applied. | ||
| </prosody> | ||
| </voice> | ||
| </speak> | ||
| ``` | ||
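Assembling a document of this shape from a voice, a rate, and an already-escaped body could look like the following sketch. The function name `build_ssml` is hypothetical; the namespaces and default values are copied from the template above, and the body is assumed to be XML-escaped text that may already contain `<sub>` elements.

```python
from xml.sax.saxutils import quoteattr


def build_ssml(
    body: str,
    voice: str = "en-US-Andrew:DragonHDLatestNeural",
    rate: str = "+10%",
) -> str:
    """Wrap pre-escaped narration markup in the speak/voice/prosody envelope."""
    # quoteattr escapes and quotes attribute values, so unusual voice names stay valid XML.
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"\n'
        '       xmlns:mstts="http://www.w3.org/2001/mstts" xml:lang="en-US">\n'
        f"  <voice name={quoteattr(voice)}>\n"
        f"    <prosody rate={quoteattr(rate)}>\n"
        f"      {body}\n"
        "    </prosody>\n"
        "  </voice>\n"
        "</speak>"
    )
```

Parsing the result with `xml.etree.ElementTree.fromstring` is a cheap well-formedness check that needs no Azure credentials, in the same spirit as `--dry-run`.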
|
|
||
| ## Integration with PowerPoint Skill | ||
|
|
||
| This skill reads from the PowerPoint skill's content directory structure: | ||
|
|
||
| ```text | ||
| content/ | ||
| ├── slide-001/ | ||
| │ └── content.yaml # Must include speaker_notes: field | ||
| ├── slide-002/ | ||
| │ └── content.yaml | ||
| └── ... | ||
| ``` | ||
|
|
||
| Each `content.yaml` must contain a non-empty `speaker_notes:` field with the narration text. The generated WAV files are named `slide-NNN.wav`, matching the directory names. | ||
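The directory-to-file naming can be illustrated with a short planning helper. This is a sketch under assumptions: `plan_outputs` is a hypothetical name, slide directories are assumed to match `slide-*`, and directories without a `content.yaml` are simply skipped. It does not parse the YAML, so it says nothing about whether `speaker_notes` is present.

```python
from pathlib import Path


def plan_outputs(content_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each slide's content.yaml with the WAV path it would produce."""
    plan: list[tuple[Path, Path]] = []
    for slide_dir in sorted(content_dir.glob("slide-*")):
        content_file = slide_dir / "content.yaml"
        if content_file.is_file():
            # slide-001/content.yaml -> <output_dir>/slide-001.wav
            plan.append((content_file, output_dir / f"{slide_dir.name}.wav"))
    return plan
```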
|
|
||
| ## Troubleshooting | ||
|
|
||
| | Issue | Solution | | ||
| |:-----------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------| | ||
| | `Set SPEECH_KEY ... or SPEECH_RESOURCE_ID` | Export `SPEECH_KEY` (key auth) or `SPEECH_RESOURCE_ID` (Entra ID) with `SPEECH_REGION`. | | ||
| | 401 with Entra ID auth | Verify custom domain on the Speech resource and `Cognitive Services Speech User` role. RBAC propagation takes up to 5 minutes. | | ||
| | Empty WAV files or skipped slides | Verify `speaker_notes:` is present and non-empty in `content.yaml`. | | ||
| | Mispronounced acronyms | Add entries to `acronyms.yaml` with phonetic aliases. | | ||
| | `azure-cognitiveservices-speech package is required` | Run `uv sync` in the skill directory. | | ||
| | Audio icon visible in PPTX | Reposition or resize the audio object in PowerPoint after embedding. | | ||
| | Authored slide animations missing after embedding | `embed_audio.py` replaces existing `p:timing` with narration timing; re-apply animations in PowerPoint after embedding audio. | | ||
| | Video export shows "No timings recorded"             | Re-embed audio with the updated `embed_audio.py`, which adds narration timing XML automatically.                               | | ||
|
|
||
| > Brought to you by microsoft/hve-core | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| [project] | ||
| name = "tts-voiceover-skill" | ||
| version = "0.0.0" | ||
| requires-python = ">=3.11" | ||
| dependencies = [ | ||
| "azure-cognitiveservices-speech>=1.41", | ||
| "azure-identity>=1.19", | ||
| "lxml>=6.1.0", # direct dep (embed_audio.py) and transitive via python-pptx; explicit pin ensures CVE patches | ||
| "python-pptx>=1.0", | ||
| "pyyaml>=6.0", | ||
| ] | ||
|
|
||
| [dependency-groups] | ||
| dev = [ | ||
| "pytest>=9.0", | ||
| "pytest-cov>=5.0", | ||
| "pytest-mock>=3.14", | ||
| "ruff>=0.15", | ||
| ] | ||
| fuzz = [ | ||
| "atheris>=3.0", | ||
| ] | ||
|
|
||
| [tool.pytest.ini_options] | ||
| testpaths = ["tests"] | ||
| pythonpath = ["scripts"] | ||
| python_files = ["test_*.py", "fuzz_harness.py"] | ||
|
|
||
| [tool.ruff] | ||
| line-length = 88 | ||
| target-version = "py311" | ||
|
|
||
| [tool.ruff.lint] | ||
| select = ["E", "F", "I", "W"] | ||
94 changes: 94 additions & 0 deletions
.github/skills/experimental/tts-voiceover/scripts/Invoke-EmbedAudio.ps1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,94 @@ | ||
| #!/usr/bin/env pwsh | ||
| # Copyright (c) Microsoft Corporation. | ||
| # SPDX-License-Identifier: MIT | ||
| #Requires -Version 7.0 | ||
| # | ||
| # Invoke-EmbedAudio.ps1 | ||
| # | ||
| # Purpose: Wrapper that manages uv venv setup and delegates to embed_audio.py | ||
|
|
||
| <# | ||
| .SYNOPSIS | ||
| Embeds per-slide WAV voice-over files into a PowerPoint deck. | ||
|
|
||
| .DESCRIPTION | ||
| Manages the Python virtual environment and invokes embed_audio.py to add | ||
| WAV files as embedded media objects in the corresponding slides of a PPTX file. | ||
|
|
||
| .PARAMETER InputPath | ||
| Source PPTX file path. Required. | ||
|
|
||
| .PARAMETER AudioDir | ||
| Directory containing slide-NNN.wav files. Defaults to voice-over. | ||
|
|
||
| .PARAMETER OutputPath | ||
| Output PPTX file path. Defaults to input stem + '-narrated.pptx'. | ||
|
|
||
| .PARAMETER SkipVenvSetup | ||
| Skip virtual environment creation and dependency installation. | ||
|
|
||
| .EXAMPLE | ||
| ./Invoke-EmbedAudio.ps1 -InputPath deck.pptx -AudioDir voice-over | ||
|
|
||
| .EXAMPLE | ||
| ./Invoke-EmbedAudio.ps1 -InputPath deck.pptx -AudioDir voice-over -OutputPath deck-narrated.pptx | ||
|
|
||
| .NOTES | ||
| Part of the tts-voiceover skill. Manages uv virtual environment setup | ||
| and delegates to embed_audio.py for WAV embedding into PPTX slides. | ||
| #> | ||
|
|
||
| [CmdletBinding()] | ||
| param( | ||
| [Parameter(Mandatory = $true)] | ||
| [ValidateNotNullOrEmpty()] | ||
| [string]$InputPath, | ||
|
|
||
| [Parameter(Mandatory = $false)] | ||
| [string]$AudioDir, | ||
|
|
||
| [Parameter(Mandatory = $false)] | ||
| [string]$OutputPath, | ||
|
|
||
| [Parameter(Mandatory = $false)] | ||
| [switch]$SkipVenvSetup | ||
| ) | ||
|
|
||
| $ErrorActionPreference = 'Stop' | ||
|
|
||
| $ScriptDir = $PSScriptRoot | ||
| $SkillRoot = Split-Path $ScriptDir | ||
| $VenvDir = Join-Path $SkillRoot '.venv' | ||
|
|
||
| Import-Module (Join-Path $ScriptDir 'Modules/TtsVoiceoverHelpers.psm1') -Force | ||
|
|
||
| #region Main | ||
|
|
||
| if ($MyInvocation.InvocationName -ne '.') { | ||
|
|
||
| $null = Test-UvAvailability | ||
|
|
||
| if (-not $SkipVenvSetup) { | ||
| Initialize-PythonEnvironment -SkillRoot $SkillRoot | ||
| } | ||
|
|
||
| $python = Get-VenvPythonPath -VenvDir $VenvDir | ||
| if (-not (Test-Path $python)) { | ||
| throw "Python not found at $python. Run without -SkipVenvSetup to initialize." | ||
| } | ||
|
|
||
| $script = Join-Path $ScriptDir 'embed_audio.py' | ||
| $PythonArgs = @('--input', $InputPath) | ||
|
|
||
| if ($AudioDir) { $PythonArgs += '--audio-dir', $AudioDir } | ||
| if ($OutputPath) { $PythonArgs += '--output', $OutputPath } | ||
| if ($VerbosePreference -ne 'SilentlyContinue') { $PythonArgs += '--verbose' } | ||
|
|
||
| & $python $script @PythonArgs | ||
| if ($LASTEXITCODE -ne 0) { | ||
| throw "embed_audio.py exited with code $LASTEXITCODE" | ||
| } | ||
|
|
||
| } | ||
|
|
||
| #endregion Main | ||