Listen to the overview demo, explore the interface, and get a better feel for how Audiobook Studio sounds in practice.
The live showcase also includes the fuller feature and cost comparison with ElevenLabs.
Audiobook Studio is a local-first production app for turning manuscripts into polished audiobooks with AI voices you control.
It is built for real long-form work, not just one-click text-to-speech. You can assign voices to characters, repair individual segments, requeue partial chapters, build portable voice profiles, and assemble finished books without sending your manuscript or cloned voices to a paid cloud service.
XTTS remains the private local-default engine. Voxtral is available as an optional cloud voice engine after you add your own Mistral API key in Settings.
Important
This is the current recommended release line for new users. It keeps XTTS as the private local-default path, adds optional Voxtral support behind Settings, and includes the latest patch-line fixes for persisted voice defaults, clearer XTTS startup visibility, and post-1.8.0 workflow stability.
## What's New In The Current Release

- Project and chapter voice choices now persist correctly: Voice selections now save through the API, survive refreshes, and properly clear back to inherited defaults when you choose the default option again.
- Default voice labels are clearer: Default options now show the actual effective fallback voice in parentheses using the same display-name logic as the rest of the picker.
- XTTS first-run setup is easier to understand: Worker output now surfaces more Hugging Face and model-download progress so long first-run preparation looks like active work instead of a silent stall.
- Patch-line stability work continues: This release keeps the local-first XTTS path and optional Voxtral support while tightening the voice workflow and speeding up PR validation.
If you want the earlier patch-release details for chunking, playback, and Windows startup fixes, see the changelog.
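
The inheritance behavior described in the release notes can be pictured with a small sketch. This is not the app's actual code: the function names, the chapter -> project -> app default order, and the `Default (...)` label format are all assumptions made for illustration.

```python
from typing import Optional


def resolve_voice(chapter_voice: Optional[str],
                  project_voice: Optional[str],
                  app_default: str) -> str:
    """Return the first explicitly set voice (hypothetical lookup order).

    A value of None means "use the inherited default" at that level.
    """
    for candidate in (chapter_voice, project_voice):
        if candidate:
            return candidate
    return app_default


def default_option_label(effective_fallback: str) -> str:
    """Label for the picker's default option, showing the effective fallback."""
    return f"Default ({effective_fallback})"


# Clearing the chapter choice (None) falls back to the project voice.
fallback = resolve_voice(None, "Narrator A", "Base Voice")
label = default_option_label(fallback)
```

Under these assumptions, choosing the default option again at chapter level simply stores no value, so the chapter picks up whatever the project currently uses.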
- Produce full audiobooks locally with XTTS-based voice cloning and chapter assembly.
- Fix only what changed instead of regenerating an entire book every time one sentence sounds wrong.
- Assign different voices to dialogue and narration inside the same chapter.
- Keep your data private because manuscripts, voice samples, and outputs stay on your machine.
- Avoid recurring usage costs while still getting a workflow closer to professional narration tools.
Most TTS tools stop at "paste text and generate audio."
Audiobook Studio is built around the messy reality of audiobook production:
- pronunciations need hand-tuning
- dialogue needs different voices
- chapters need partial rebuilds
- queueing and progress need to stay understandable
- finished books need real assembly and export
This project gives you a production surface for that work, not just a synthesis endpoint.
ElevenLabs is a strong product. It has polished voices, a fast cloud workflow, and useful Studio tools for multi-voice generation and paragraph-level regeneration.
Audiobook Studio is strongest in different places:
- No recurring generation subscription
- Private, local-first workflow
- You own the manuscript, voices, and output
- Corrections do not keep charging you
If you are building a real audiobook instead of a few test clips, those differences matter a lot.
| Category | Audiobook Studio | ElevenLabs Studio |
|---|---|---|
| Cost over time | Free to run after setup, local hardware cost only | Subscription and credit based |
| Privacy | Local-first, files stay on your machine | Cloud workflow |
| Ownership | Local project files and local voice assets | Platform account workflow |
| Voice assignment | Character and segment based editing inside your project | Section, paragraph, and character assignment in Studio |
| Repair workflow | Local segment repair, partial chapter requeue, and production review | Paragraph or word regeneration in the cloud |
| Setup | More involved | Easier to start |
| Baseline polish | Good with careful samples and tuning | Usually stronger out of the box |
Hosted voice generation can get expensive fast for full-length books, especially when you factor in corrections and custom voices.
Using ElevenLabs public pricing and credit rules as of March 24, 2026:
- Starter: `$5/month` for `30k` credits
- Creator: `$22/month` for `100k` credits
- Pro: `$99/month` for `500k` credits
- Scale: `$330/month` for `2M` credits
- Flash/Turbo models: `1 text character = 0.5 credits`
- Other models: `1 text character = 1 credit`
For a fair comparison, this table uses a 600,000 character book as a full-length example.
| Production type | Minimum realistic plan | Credit rule | Effective cost per 1,000 chars | Clean 600k-char pass | 600k chars with moderate corrections (1.5x) |
|---|---|---|---|---|---|
| Standard single voice | Starter | 0.5 credits/char | about $0.08 | about $50 in effective usage | about $75 in effective usage |
| Custom cloned voice | Creator | 0.5 credits/char | about $0.11 | about $66 in effective usage | about $99 in effective usage |
| Higher-cost models | Creator | 1 credit/char | about $0.22 | about $132 in effective usage | about $198 in effective usage |
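
The effective-usage figures follow from the plan prices and credit rules listed above. The sketch below reproduces that arithmetic; the function names are illustrative, and the plan data is copied from the pricing list rather than fetched from any ElevenLabs API.

```python
# Plan price (USD/month) and monthly credits, taken from the list above.
PLANS = {
    "Starter": (5, 30_000),
    "Creator": (22, 100_000),
}


def cost_per_1000_chars(plan: str, credits_per_char: float) -> float:
    """Effective USD cost of synthesizing 1,000 characters on a plan."""
    price, credits = PLANS[plan]
    return price / credits * credits_per_char * 1000


def effective_cost(plan: str, credits_per_char: float, chars: int,
                   correction_factor: float = 1.0) -> float:
    """Effective USD usage for a book, scaled by a correction factor."""
    per_thousand = cost_per_1000_chars(plan, credits_per_char)
    return per_thousand * chars / 1000 * correction_factor


clean_pass = effective_cost("Starter", 0.5, 600_000)
with_corrections = effective_cost("Starter", 0.5, 600_000, correction_factor=1.5)
```

"Effective usage" here means the share of the plan price consumed by the credits used, which is why a $5 plan can show about $50 of usage for a full book.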
And this is what the real-world monthly spend often looks like when you actually need enough credits to finish the book in a normal production cycle:
| Scenario | Credits needed | Likely plan needed in practice | Monthly spend |
|---|---|---|---|
| 600k chars, Flash/Turbo clean pass | 300k | Pro | $99 |
| 600k chars, Flash/Turbo with moderate corrections | 450k | Pro | $99 |
| 600k chars, Flash/Turbo with heavy iteration | 600k | Scale or multiple months | $330 or multiple months |
| 600k chars, higher-cost model clean pass | 600k | Scale or multiple months | $330 or multiple months |
| 600k chars, higher-cost model with corrections | 900k | Scale | $330 |
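
The plan column can be derived the same way: compute the credits a scenario needs, then pick the smallest plan whose monthly allowance covers them in one month. This is a sketch of that reasoning with illustrative function names. It treats "heavy iteration" as a 2x factor, which is inferred from the 600k figure in the table rather than stated by ElevenLabs.

```python
import math
from typing import Optional, Tuple

# (name, USD/month, credits/month), from the pricing list above, ascending.
PLANS = [
    ("Starter", 5, 30_000),
    ("Creator", 22, 100_000),
    ("Pro", 99, 500_000),
    ("Scale", 330, 2_000_000),
]


def credits_needed(chars: int, credits_per_char: float,
                   iteration_factor: float = 1.0) -> int:
    """Total credits for a book, including re-generation overhead."""
    return math.ceil(chars * credits_per_char * iteration_factor)


def smallest_single_month_plan(credits: int) -> Optional[Tuple[str, int]]:
    """Cheapest plan that covers the credits in one billing month."""
    for name, price, monthly_credits in PLANS:
        if monthly_credits >= credits:
            return name, price
    return None  # would need multiple months or a custom plan


moderate = credits_needed(600_000, 0.5, 1.5)
plan = smallest_single_month_plan(moderate)
```

The alternative to stepping up a plan is spreading production across several months on a smaller one, which is the "multiple months" option in the table.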
That is where Audiobook Studio becomes especially compelling:
- you do not hesitate to fix a pronunciation
- you do not pay extra to test another take
- you can iterate freely without watching credits
If you want the longer written breakdown, see the wiki page: Comparison and Cost. If you want the more visual version, open the Live Showcase.
| Feature | What it enables |
|---|---|
| Multi-voice production | Assign speaker voices to characters, narration, or paragraph groups. |
| Segment-level repair | Regenerate only the lines that changed instead of redoing a whole chapter. |
| Portable voice profiles | Keep previews, latent cache, and voice profile assets together. |
| Voice variants | Build multiple styles of the same voice, such as Default, Angry, or Calm. |
| Production queue | Queue chapters, watch progress live, and recover cleanly from interruptions. |
| Audiobook assembly | Export finished chapter audio into long-form outputs with ffmpeg-based tooling. |
| Optional cloud voices | Keep XTTS fully local, or unlock Voxtral with your own Mistral API key when you want hosted TTS. |
| Local-first privacy | XTTS stays private by default; Voxtral remains explicit and opt-in. |
- Import or create a project.
- Split the manuscript into chapters.
- Build or import voice profiles.
- Assign voices to narration and characters.
- Generate chapters, inspect the performance view, and repair only the lines that need work.
- Assemble the finished project into audiobook outputs.
If you want a fully local workflow, keep your voices on XTTS (Local). If you want to try Voxtral, add a Mistral API key in Settings first and switch only the voices you want to Voxtral (Cloud).
Screenshots: Project Workflow, Voice Lab, Chapter Production, and Queue and Progress.
- macOS, Linux, or Windows
- Python `3.10+`
- Node.js `18+`
- `ffmpeg`
- NVIDIA GPU recommended for faster local synthesis
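
If you want to confirm the prerequisites before running the launcher, a quick check along these lines may help. It is a sketch, not part of the project: `missing_prerequisites` is a hypothetical helper, it assumes the executables are named `node` and `ffmpeg` on your `PATH`, and it does not verify the Node.js version.

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which,
                          python_version=sys.version_info[:2]):
    """Return a list of prerequisites that appear to be missing.

    `which` and `python_version` are parameters only so the check can be
    exercised without touching the real environment.
    """
    missing = []
    if tuple(python_version) < (3, 10):
        missing.append("Python 3.10+")
    for tool in ("node", "ffmpeg"):
        if which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All set" if not problems else f"Missing: {', '.join(problems)}")
```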
This is the recommended path for new users.
On macOS or Linux, the easiest way to start is:

    git clone https://github.com/senigami/audiobook-studio.git
    cd audiobook-studio
    ./run.sh

On Windows PowerShell, use:

    git clone https://github.com/senigami/audiobook-studio.git
    cd audiobook-studio
    powershell -ExecutionPolicy Bypass -File .\run.ps1

The startup scripts will:
- create or update the main `venv`
- create or update the XTTS environment at `~/xtts-env`
- automatically recreate the XTTS environment if it detects stale legacy Coqui packages that would break voice builds
- install frontend dependencies if needed
- build the frontend if needed
- start the app on `http://127.0.0.1:8123`
If you are evaluating Audiobook Studio for the first time, this is the path to use. The separate manual backend/frontend flow is still documented below, but the launcher scripts are now the intended onboarding experience.
Useful options:

    ./run.sh --setup-only
    ./run.sh --no-reload
    ./run.sh --port 9000

On Windows PowerShell:

    powershell -ExecutionPolicy Bypass -File .\run.ps1 -SetupOnly
    powershell -ExecutionPolicy Bypass -File .\run.ps1 -NoReload
    powershell -ExecutionPolicy Bypass -File .\run.ps1 -Port 9000

### Manual Install
- Clone and Backend Setup

  Create the primary environment for the web server and project management.

      git clone https://github.com/senigami/audiobook-studio.git
      cd audiobook-studio
      python3 -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt

- XTTS Inference Setup

  XTTS requires a separate environment to avoid dependency conflicts. The app expects this at `~/xtts-env` by default (configurable in `app/config.py`).

      python3 -m venv ~/xtts-env
      source ~/xtts-env/bin/activate
      pip install -r requirements-xtts.txt

- Frontend Build

  The UI must be built before it can be served by the backend.

      cd frontend
      npm install
      npm run build
      cd ..
You only need to activate the primary venv to start the server. The XTTS environment is managed automatically as a subprocess.

    source venv/bin/activate
    uvicorn run:app --port 8123

Then open http://127.0.0.1:8123.
Note
On first run, the application creates the current app roots it needs immediately, including projects/ and voices/. Other folders are created only when those features are used. On fresh installs, loose chapter text now defaults to chapters/. Older workspaces that already use chapters_out/, xtts_audio/, or audiobooks/ still continue to work.
Audiobook Studio supports reusable voice profiles across more than one engine.
- Add raw `.wav` samples to a voice profile
- Build a preview voice
- Create variants for different delivery styles
- Rebuild only when samples change
- Keep the voice profile and latent cache together for portability
Voice profiles now carry their own engine assignment:
- `XTTS (Local)` keeps generation on your machine
- `Voxtral (Cloud)` appears only after you add a Mistral API key in Settings
- mixed-engine chapters can use XTTS and Voxtral voices together when segments call for different profiles
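
As a rough mental model of the opt-in rule, the engine list can be thought of as a function of whether a key is configured. The sketch below is illustrative only: `available_engines` and `engines_used` are hypothetical names, and how the app actually stores the key or the per-voice engine assignment is not shown in this README.

```python
from typing import Dict, List, Optional, Set


def available_engines(mistral_api_key: Optional[str]) -> List[str]:
    """Engines offered in the picker; Voxtral needs a non-empty key."""
    engines = ["XTTS (Local)"]
    if mistral_api_key and mistral_api_key.strip():
        engines.append("Voxtral (Cloud)")
    return engines


def engines_used(segment_voices: List[str],
                 voice_engine: Dict[str, str]) -> Set[str]:
    """Set of engines a chapter touches, given each segment's voice."""
    return {voice_engine[voice] for voice in segment_voices}


# A mixed-engine chapter: narration on XTTS, one character on Voxtral.
assignments = {"Narrator": "XTTS (Local)", "Hero": "Voxtral (Cloud)"}
chapter_engines = engines_used(["Narrator", "Hero", "Narrator"], assignments)
```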
Voice profiles can also now work as lightweight starter assets. A reusable profile can ship with:
- `profile.json`
- `latent.pth`
- an optional preview like `sample.mp3`
That means starter voices do not have to bundle every original training wav just to be usable.
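
To check whether a folder looks like a usable starter profile, you could inspect it for the files named above. This is a sketch under assumptions: it treats `profile.json` and `latent.pth` as required and `sample.mp3` as the optional preview, and it assumes all three sit directly in the profile folder. The real layout may differ, and `inspect_starter_profile` is a hypothetical helper.

```python
from pathlib import Path

REQUIRED_FILES = ("profile.json", "latent.pth")  # assumed to be required
OPTIONAL_PREVIEW = "sample.mp3"                   # assumed preview filename


def inspect_starter_profile(profile_dir):
    """Report which expected files exist in a voice profile folder."""
    root = Path(profile_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return {
        "usable": not missing,
        "missing": missing,
        "has_preview": (root / OPTIONAL_PREVIEW).is_file(),
    }
```

Calling `inspect_starter_profile("voices/MyNarrator")` (the path is only an example) returns a small dictionary you can print or assert on before copying a profile to another machine.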
The app now treats voice profiles as production assets, not throwaway cache entries.
This is where the app really shines.
- Edit chapter text and invalidate only the audio that should change
- Watch progress in the chapter view instead of getting thrown into a separate queue screen
- Resume partially-rendered work instead of starting from zero
- Rebuild completed chapters intentionally with confirmation
- Review paragraph groupings, playback, and character assignments inside the production workflow
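
One common way to invalidate only the audio that should change is to key each rendered segment by a hash of its text and voice, then re-render the segments whose key no longer matches. The sketch below illustrates that general technique; it is not taken from the Audiobook Studio codebase, and `segment_key` and `segments_to_render` are hypothetical names.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

Segment = Tuple[str, str]  # (text, voice name)


def segment_key(text: str, voice: str) -> str:
    """Stable fingerprint of everything that affects a segment's audio."""
    return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()


def segments_to_render(segments: Iterable[Segment],
                       rendered_keys: Set[str]) -> List[int]:
    """Indexes of segments with no matching rendered audio."""
    return [
        index
        for index, (text, voice) in enumerate(segments)
        if segment_key(text, voice) not in rendered_keys
    ]


original = [("It was a quiet night.", "Narrator"), ("Run!", "Hero")]
rendered = {segment_key(text, voice) for text, voice in original}

# Editing one line leaves the other segment's audio valid.
edited = [original[0], ("Run, now!", "Hero")]
stale = segments_to_render(edited, rendered)
```

The same comparison also covers resuming partially rendered work: anything already keyed in `rendered` is skipped, and only the remainder is queued.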
Audiobook Studio is designed for local-first production.
Your manuscript, chapter text, voice samples, latent files, and rendered audio stay under your control on your own machine.
If you enable Voxtral (Cloud), preview text, render text, and any selected reference audio for Voxtral requests are sent to Mistral for synthesis. That mode is optional and stays hidden unless you explicitly add your own API key.
If you want the deeper walkthroughs, they are here:
- Getting Started
- Library and Projects
- Voices and Voice Profiles
- Queue and Jobs
- Recording Guide
- Comparison and Cost
- Troubleshooting and FAQ
- Full Wiki
If you want a quick visual walkthrough before installing, open the Live Showcase.

To run the backend tests, the frontend lint, and the frontend build:

    ./venv/bin/python -m pytest -q
    npm -C frontend run lint -- --ext .tsx,.ts
    npm -C frontend run build

Distributed under the MIT License. See LICENSE for more information.
Build audiobooks locally. Repair them like a studio. Keep your voices and manuscript under your control.




