From bb9b3f65f0dc8a545a19f3eadfa453cb8e86aa3d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 14 Mar 2026 19:41:12 +0000
Subject: [PATCH 1/2] Initial plan


From 47f5006ec87753d80d26038805c0e5a924eec2fb Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 14 Mar 2026 19:43:53 +0000
Subject: [PATCH 2/2] Fix audio output: change WAV references to MP3 throughout
 README

Update the tagline, the usage examples (`output_path`, `understand`),
the inputs and outputs tables, and the generation-mode steps so the
README consistently describes MP3 output instead of WAV.

Co-authored-by: lmangani <1423657+lmangani@users.noreply.github.com>
---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 6f56a87..e62e9ac 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Ace-Step Audio Generation Action
 
 A GitHub Action that generates music from text prompts using [Ace-Step 1.5](https://github.com/ACE-Step/ACE-Step-1.5) via the native [acestep.cpp](https://github.com/audiohacking/acestep.cpp) engine.  
-Text + optional lyrics in, stereo 48 kHz WAV out.
+Text + optional lyrics in, stereo 48 kHz MP3 out.
 
 **No Python. No PyTorch. No waiting.**  
 The pre-built Docker image ships with compiled `ace-qwen3`/`dit-vae` binaries **and** all ~7.7 GB of pre-quantized GGUF models baked in — action execution starts immediately.
@@ -54,7 +54,7 @@ jobs:
       Melodies that go around
     duration: '30'
     seed: '42'
-    output_path: 'generated_music.wav'
+    output_path: 'generated_music.mp3'
 ```
 
 ### Upload the result as an artifact
@@ -87,7 +87,7 @@ Audio generation is **skipped** when `understand` is set.
   id: analyze
   uses: audiohacking/acestep-action@main
   with:
-    understand: '/github/workspace/output.wav'
+    understand: '/github/workspace/output.mp3'
 
 - name: Show analysis
   run: echo '${{ steps.analyze.outputs.understand_result }}'
@@ -104,14 +104,14 @@ Audio generation is **skipped** when `understand` is set.
 | `inference_steps` | Number of DiT inference steps | No | `8` |
 | `shift` | Flow-matching shift parameter | No | `3` |
 | `vocal_language` | Vocal language code (`en`, `fr`, …) | No | `en` |
-| `output_path` | Output path for the generated WAV file | No | `output.wav` |
+| `output_path` | Output path for the generated MP3 file | No | `output.mp3` |
 | `understand` | Local file path or URL (http/https) to an MP3 or WAV file to analyze (activates understand mode — skips generation) | No | _(empty)_ |
 
 ## Outputs
 
 | Output | Description |
 |--------|-------------|
-| `audio_file` | Path to the generated WAV audio file |
+| `audio_file` | Path to the generated MP3 audio file |
 | `generation_time` | Time taken to generate the audio in seconds |
 | `understand_result` | JSON from `ace-understand`: caption, lyrics, BPM, key, duration, language |
 
@@ -134,8 +134,8 @@ At runtime the entrypoint (`src/entrypoint.sh`):
 **Generation mode** (default — when `understand` is not set):
 1. Builds a request JSON from inputs
 2. Runs `ace-qwen3` (LLM stage: caption → enriched JSON with lyrics + audio codes)
-3. Runs `dit-vae` (DiT + VAE stage: JSON → stereo 48 kHz WAV)
-4. Moves the output WAV to the requested path in `$GITHUB_WORKSPACE`
+3. Runs `dit-vae` (DiT + VAE stage: JSON → stereo 48 kHz MP3)
+4. Moves the output MP3 to the requested path in `$GITHUB_WORKSPACE`
 
 **Understand mode** (when `understand` is provided):
 1. If a URL (http/https/ftp/file) is given, downloads the audio file; if a local path is given, uses it directly