[Feat] Create client for TTS and evaluation for audio requests #175
Conversation
📝 Walkthrough

Introduces comprehensive Text-to-Speech (TTS) client support to the benchmarking framework, including a new streaming TTS client supporting multiple providers, configuration management with provider-specific validation, performance evaluation with audio-metrics tracking, and artifact persistence.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User as Benchmark Harness
    participant TTS as TTSClient
    participant Provider as TTS Provider<br/>(API)
    participant Evaluator as AudioPerformanceEvaluator
    participant Storage as File System
    User->>TTS: send_request(text_request)
    activate TTS
    TTS->>TTS: _build_payload(text)
    TTS->>Provider: POST /synthesize (streaming)
    activate Provider
    Provider-->>TTS: audio chunk 1
    TTS->>TTS: collect chunk, measure ttfa
    Provider-->>TTS: audio chunk 2
    TTS->>TTS: collect chunk
    Provider-->>TTS: audio chunk N
    deactivate Provider
    TTS->>TTS: aggregate bytes, calculate metrics<br/>(duration, rtf, tokens)
    TTS-->>User: RequestResult(audio_channel, metrics)
    deactivate TTS
    User->>Evaluator: record_request_completed(response)
    activate Evaluator
    Evaluator->>Evaluator: extract AUDIO metrics<br/>from response
    Evaluator->>Evaluator: update CDF sketches<br/>(ttfa, latency, duration, rtf)
    Evaluator->>Storage: cache audio buffer<br/>(if save_audio_files)
    deactivate Evaluator
    User->>Evaluator: record_session_completed()
    activate Evaluator
    Evaluator->>Evaluator: update session metrics
    deactivate Evaluator
    User->>Evaluator: finalize()
    activate Evaluator
    Evaluator->>Storage: save JSONL metrics
    Evaluator->>Storage: save CSV per metric
    Evaluator->>Storage: generate CDF plots
    Evaluator->>Storage: save WAV files<br/>(if enabled)
    Evaluator-->>User: EvaluationResult
    deactivate Evaluator
```
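The client-side flow in the diagram can be sketched as a small metrics helper. This is a hedged sketch, not the PR's implementation: the function names (`compute_audio_metrics`, `consume_stream`) and the raw 16-bit mono PCM assumption for computing audio duration are illustrative; the real client would decode whatever container the provider streams.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class AudioMetrics:
    ttfa_s: float      # time to first audio chunk
    latency_s: float   # total synthesis latency (last chunk received)
    duration_s: float  # decoded audio duration
    rtf: float         # real-time factor = latency / audio duration

def compute_audio_metrics(start: float,
                          chunk_arrival_times: List[float],
                          num_bytes: int,
                          sample_rate: int = 24_000,
                          bytes_per_sample: int = 2) -> AudioMetrics:
    """Aggregate streaming metrics, assuming raw 16-bit mono PCM audio."""
    ttfa = chunk_arrival_times[0] - start
    latency = chunk_arrival_times[-1] - start
    duration = num_bytes / (sample_rate * bytes_per_sample)
    rtf = latency / duration if duration > 0 else float("inf")
    return AudioMetrics(ttfa, latency, duration, rtf)

def consume_stream(chunks: Iterable[bytes],
                   clock: Callable[[], float] = time.monotonic) -> AudioMetrics:
    """Collect audio chunks from a provider stream, timestamping each arrival."""
    start = clock()
    arrivals, buf = [], bytearray()
    for chunk in chunks:  # chunks: bytes yielded by the streaming response
        arrivals.append(clock())
        buf.extend(chunk)
    return compute_audio_metrics(start, arrivals, len(buf))
```

An RTF below 1.0 means synthesis is faster than real time; the evaluator in the diagram would feed these per-request values into its CDF sketches.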
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
```python
        self.additional_sampling_params
    )

def build_tokenizer_provider(self):
```
We are trying to stay away from including logic in configuration classes. Can you explain why this PR creates these methods inside the various config classes?
Sure, will move it out of the `__post_init__`.
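The refactor discussed above could look roughly like the sketch below: the config stays a pure data holder, and the tokenizer-provider construction moves into a free function. All names here (`ClientConfig`, `build_tokenizer_provider`'s fallback behavior) are hypothetical, chosen only to illustrate the separation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ClientConfig:
    # Pure data: no construction logic in __post_init__.
    model: str
    tokenizer_name: Optional[str] = None

def build_tokenizer_provider(config: ClientConfig) -> str:
    """Factory kept outside the config class (hypothetical helper).

    Falls back to the model name when no explicit tokenizer is set.
    """
    return config.tokenizer_name or config.model
```

Keeping the factory outside the dataclass means the config remains trivially serializable and comparable, and the construction logic can be tested and swapped independently.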
```python
@frozen_dataclass
class TTSClientConfig(BaseClientConfig):
```
Would it be possible to not include modality-specific clients? Is there any fundamental reason for why we can't have a multimodal client that handles audio + other modalities, just like the skeleton for the OpenAI chat suggests?
An important design decision was that, within the limits of what's possible, requests can contain and request arbitrary payload channels, which breaks if we only support modality-specific clients. That would make it a lot harder to evaluate true multimodal systems.
I think we can have a single MultimodalClient that supports Text/Audio input channels and Text/Audio output channels.
The only issue I see is that there is no strong OpenAI-style standard for text-to-speech; it is split between ElevenLabs and Deepgram, each with its own way of doing things, so there might be a lot of conditionals in the MultimodalClient.
I see. How about following the steps of the OpenAI Router client for that? It's per-request, and yes, there will be conditionals, but at least we don't break the Veeksha UX.
FILL IN THE PR DESCRIPTION HERE
FIX #xxxx (link existing issues this PR will resolve)
BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
PR Checklist
Thank you for your contribution to Veeksha! Before submitting the pull request, please ensure the PR meets the following criteria. This helps Veeksha maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- `[Bugfix]` for bug fixes.
- `[Feat]` for new features.
- `[Core]` for changes in the core benchmarking logic.
- `[CI/Build]` for build or continuous integration improvements.
- `[Docs]` for documentation fixes and improvements.
- `[Tests]` for changes in the test suite.
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:

- Use `make format` to format your code.
- Add documentation to `docs/source/` if the PR modifies the user-facing behaviors of Veeksha. It helps users understand and utilize the new features or changes.

Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with `rfc-required` and might not go through the PR.

Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to Veeksha. Your contributions make Veeksha a great tool for everyone!