[feat] create client for TTS and evaluation for audio requests#175

Open
ksukrit wants to merge 1 commit into main from users/ksukrit/tts_client

Conversation

@ksukrit ksukrit commented Mar 5, 2026

FILL IN THE PR DESCRIPTION HERE

FIX #xxxx (link existing issues this PR will resolve)

BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE


PR Checklist (Click to Expand)

Thank you for your contribution to Veeksha! Before submitting the pull request, please ensure the PR meets the following criteria. This helps Veeksha maintain code quality and improves the efficiency of the review process.

PR Title and Classification

Only specific types of PRs will be reviewed. The PR title should be prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [Feat] for new features.
  • [Core] for changes in the core benchmarking logic.
  • [CI/Build] for build or continuous integration improvements.
  • [Docs] for documentation fixes and improvements.
  • [Tests] for changes in the test suite.
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use make format to format your code.
  • The code needs to be well documented so that future contributors can easily understand it.
  • Please add documentation to docs/source/ if the PR modifies the user-facing behavior of Veeksha. This helps users understand and utilize the new features or changes.

Notes for Large Changes

Please keep the changes as concise as possible. For major architectural changes (>500 LOC), we expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag the PR with rfc-required and may not review it.

Thank You

Finally, thank you for taking the time to read these guidelines and for your interest in contributing to Veeksha. Your contributions make Veeksha a great tool for everyone!

Summary by CodeRabbit

Release Notes

  • New Features

    • Added streaming Text-to-Speech (TTS) support with multiple provider integration (Deepgram, ElevenLabs, Vajra, VoxServe, vLLM Omni)
    • Implemented comprehensive audio performance evaluation metrics including time-to-first-audio, audio duration, and real-time factor
    • Added optional audio file saving capability for generated outputs
    • Extended SLO metrics to support audio-specific performance tracking
  • Chores

    • Improved benchmark tokenizer provider configuration architecture

@ksukrit ksukrit requested a review from chus-chus March 5, 2026 00:59

coderabbitai Bot commented Mar 5, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Free

Run ID: 6565b032-a7cf-42bb-9de4-edbee69d83e4

📥 Commits

Reviewing files that changed from the base of the PR and between 345ed97 and 94d710e.

📒 Files selected for processing (10)
  • veeksha/benchmark.py
  • veeksha/client/__init__.py
  • veeksha/client/registry.py
  • veeksha/client/tts.py
  • veeksha/config/client.py
  • veeksha/config/evaluator.py
  • veeksha/config/slo.py
  • veeksha/evaluator/performance/audio.py
  • veeksha/evaluator/performance/base.py
  • veeksha/types/__init__.py

📝 Walkthrough

Walkthrough

Introduces comprehensive Text-to-Speech (TTS) client support to the benchmarking framework, including a new streaming TTS client supporting multiple providers, configuration management with provider-specific validation, performance evaluation with audio metrics tracking, and artifact persistence.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| TTS Client Implementation | `veeksha/client/tts.py` | New streaming TTS client supporting multiple providers (deepgram, elevenlabs, vajra, voxserve, vllm_omni) with per-provider payload construction, streaming audio collection, error handling, and metrics calculation (TTFA, latency, audio duration, RTF, token counts). |
| TTS Client Registration & Export | `veeksha/client/__init__.py`, `veeksha/client/registry.py`, `veeksha/types/__init__.py` | Registers the TTS client in `ClientRegistry` via a lazy loader, exports `TTSClient` in the public API, and adds a TTS enum variant to `ClientType`. |
| TTS Configuration | `veeksha/config/client.py` | Adds `TTSClientConfig` with provider validation and API endpoint/key management; introduces a `build_tokenizer_provider()` method on `BaseClientConfig` for abstracted tokenizer instantiation. |
| Audio Performance Evaluation | `veeksha/evaluator/performance/audio.py`, `veeksha/evaluator/performance/base.py` | Complete `AudioPerformanceEvaluator` implementation with thread-safe CDF-based metrics aggregation (TTFA, latency, audio duration, RTF, chunk counts), per-request tracking, session lifecycle events, JSONL/CSV export, WAV file persistence, and streaming output support; enhances base summary aggregation to include channel-level metrics. |
| Metrics & Configuration Updates | `veeksha/config/slo.py`, `veeksha/config/evaluator.py` | Expands supported SLO metrics to include `ttfa`, `generated_audio_duration`, and `rtf`; adds a `save_audio_files` flag to the audio channel config and makes `audio_channel` always present by default. |
| Tokenizer Provider Refactoring | `veeksha/benchmark.py` | Replaces explicit `TokenizerProvider` construction with a factory method call via `benchmark_config.client.build_tokenizer_provider()`. |
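Since each provider in the walkthrough above has its own request format, the per-provider payload construction can be sketched as a dispatch table. This is an illustrative sketch only; the builder functions, payload fields, and `build_payload` name are assumptions, not the actual `veeksha/client/tts.py` code.

```python
# Hypothetical sketch of per-provider TTS payload dispatch.
# Field names are illustrative, not taken from the veeksha codebase.
from typing import Any, Callable, Dict


def _deepgram_payload(text: str) -> Dict[str, Any]:
    # Deepgram-style body: text in the request payload.
    return {"text": text}


def _elevenlabs_payload(text: str) -> Dict[str, Any]:
    # ElevenLabs-style body: text plus a model identifier.
    return {"text": text, "model_id": "eleven_multilingual_v2"}


def _openai_style_payload(text: str) -> Dict[str, Any]:
    # OpenAI-compatible body, e.g. for a vLLM Omni endpoint.
    return {"input": text, "model": "tts-1"}


_PAYLOAD_BUILDERS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "deepgram": _deepgram_payload,
    "elevenlabs": _elevenlabs_payload,
    "vllm_omni": _openai_style_payload,
}


def build_payload(provider: str, text: str) -> Dict[str, Any]:
    """Build a provider-specific request body, or fail fast."""
    try:
        return _PAYLOAD_BUILDERS[provider](text)
    except KeyError:
        raise ValueError(f"Unsupported TTS provider: {provider}")
```

A dispatch table like this keeps the provider conditionals in one place, which is relevant to the reviewer discussion below about avoiding conditional sprawl in a shared client.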

Sequence Diagram

```mermaid
sequenceDiagram
    participant User as Benchmark Harness
    participant TTS as TTSClient
    participant Provider as TTS Provider<br/>(API)
    participant Evaluator as AudioPerformanceEvaluator
    participant Storage as File System

    User->>TTS: send_request(text_request)
    activate TTS
    TTS->>TTS: _build_payload(text)
    TTS->>Provider: POST /synthesize (streaming)
    activate Provider
    Provider-->>TTS: audio chunk 1
    TTS->>TTS: collect chunk, measure ttfa
    Provider-->>TTS: audio chunk 2
    TTS->>TTS: collect chunk
    Provider-->>TTS: audio chunk N
    deactivate Provider
    TTS->>TTS: aggregate bytes, calculate metrics<br/>(duration, rtf, tokens)
    TTS-->>User: RequestResult(audio_channel, metrics)
    deactivate TTS

    User->>Evaluator: record_request_completed(response)
    activate Evaluator
    Evaluator->>Evaluator: extract AUDIO metrics<br/>from response
    Evaluator->>Evaluator: update CDF sketches<br/>(ttfa, latency, duration, rtf)
    Evaluator->>Storage: cache audio buffer<br/>(if save_audio_files)
    deactivate Evaluator

    User->>Evaluator: record_session_completed()
    activate Evaluator
    Evaluator->>Evaluator: update session metrics
    deactivate Evaluator

    User->>Evaluator: finalize()
    activate Evaluator
    Evaluator->>Storage: save JSONL metrics
    Evaluator->>Storage: save CSV per metric
    Evaluator->>Storage: generate CDF plots
    Evaluator->>Storage: save WAV files<br/>(if enabled)
    Evaluator-->>User: EvaluationResult
    deactivate Evaluator
```
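The "collect chunk, measure ttfa" and "calculate metrics" steps in the diagram can be sketched as below. This is not the actual `TTSClient` code: the function name, the assumed PCM format (16-bit mono at 24 kHz), and the RTF convention (wall-clock generation time divided by generated audio duration) are all assumptions for illustration.

```python
import time


def stream_and_measure(chunks):
    """Collect streamed audio chunks and compute TTFA, latency, and RTF.

    `chunks` is any iterable yielding audio byte chunks. ttfa is the
    time to the first audio chunk; rtf here is latency / audio duration,
    so rtf < 1 means faster than real time. Illustrative sketch only.
    """
    start = time.perf_counter()
    ttfa = None
    audio = bytearray()
    for chunk in chunks:
        if ttfa is None:
            # First byte of audio arrived: record time-to-first-audio.
            ttfa = time.perf_counter() - start
        audio.extend(chunk)
    latency = time.perf_counter() - start
    # Assumption: 16-bit (2-byte) mono PCM at 24 kHz.
    sample_rate, bytes_per_sample = 24_000, 2
    audio_duration = len(audio) / (sample_rate * bytes_per_sample)
    rtf = latency / audio_duration if audio_duration > 0 else float("inf")
    return {
        "ttfa": ttfa,
        "latency": latency,
        "generated_audio_duration": audio_duration,
        "rtf": rtf,
    }
```

The per-request dict mirrors the metric names the walkthrough lists for the SLO config (`ttfa`, `generated_audio_duration`, `rtf`), which is what the evaluator's CDF sketches would aggregate over.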

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hark! The voice now flows in streams,
Audio metrics dance in dreams,
Multiple providers, metrics deep,
Performance data we now reap!
TTS woven in with grace,
Benchmarking at lightning pace! 🎙️


Comment thread veeksha/config/client.py
self.additional_sampling_params
)

def build_tokenizer_provider(self):
Collaborator


We are trying to stay away from including logic in configuration classes. Can you explain why this PR creates these methods inside the various config classes?

Author


Sure, I will move it out of the post-init.

Comment thread veeksha/config/client.py


@frozen_dataclass
class TTSClientConfig(BaseClientConfig):
Collaborator

@chus-chus chus-chus Mar 5, 2026


Would it be possible to not include modality-specific clients? Is there any fundamental reason for why we can't have a multimodal client that handles audio + other modalities, just like the skeleton for the OpenAI chat suggests?

An important design decision was that, within the limits of what's possible, requests can contain and request arbitrary payload channels, which breaks if we only support modality-specific clients. That would make it a lot harder to evaluate true multimodal systems.

Author


I think we can have a single MultimodalClient that supports Text/Audio input channels and Text/Audio output channels.

The only issue I see is that there is no strong OpenAI-style standard for text-to-speech; it's between ElevenLabs and Deepgram, both of whom have their own ways of doing things, so there might be a lot of conditionals in the MultimodalClient.

Collaborator


I see. How about following the approach of the OpenAI Router client for that? It routes per request, and yes, there will be conditionals, but at least we don't break the Veeksha UX.
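The per-request routing suggested above could look roughly like the sketch below: a single client inspects the requested output channels of each request and dispatches internally, so arbitrary payload channels keep working. All names here (`Request`, `MultimodalClient`, the channel keys) are hypothetical and not taken from the veeksha codebase.

```python
# Hypothetical sketch of per-request routing in a single multimodal
# client, in the spirit of the reviewer's OpenAI Router suggestion.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    input_channels: Dict[str, object]            # e.g. {"text": "hello"}
    output_channels: List[str] = field(default_factory=lambda: ["text"])


class MultimodalClient:
    def send(self, request: Request) -> str:
        # Route on the requested output channels of this request,
        # rather than baking a single modality into the client type.
        if "audio" in request.output_channels:
            return self._synthesize_audio(request)
        return self._generate_text(request)

    def _synthesize_audio(self, request: Request) -> str:
        # Provider-specific conditionals (Deepgram vs. ElevenLabs
        # payload shapes) would be confined to this path.
        return "audio-response"

    def _generate_text(self, request: Request) -> str:
        return "text-response"
```

The design trade-off discussed in the thread is visible here: the provider conditionals do not disappear, but they are contained inside one client, so the request schema, and therefore the benchmarking UX, stays uniform across modalities.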
