linear spectrogram encoder/decoder + image pipeline by sankdoy · Pull Request #4 · Bebra777228/Audio-Steganography

sankdoy · 2026-01-15T14:15:28Z

switch to linear STFT for encoding/decoding, auto-fit text height, add image normalization/inversion + preview, add noise/flutter controls, fix UI scroll and prevent text clipping, improve spectrogram color rendering

Summary by CodeRabbit

New Features
- Added customizable flutter and noise effects for text-to-audio conversion
- Introduced image-to-audio encoding capability
- Added audio-to-spectrogram decoding with preview visualization
- Enhanced spectrogram rendering with improved visual quality
- Expanded UI with new parameter controls for text rendering customization

coderabbitai · 2026-01-15T14:15:42Z

Walkthrough

The txt2spec.py module gains font discovery, text measurement utilities, and image transformation helpers. The text-to-spectrogram-image pipeline gains configurable flutter and noise effects. Audio processing is extended with a hop_length parameter and linear spectrogram generation. New end-to-end functions orchestrate image generation and audio synthesis, and the Gradio UI is expanded to expose the rendering parameters.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Font handling and text measurement `txt2spec.py` | Introduces font discovery via FONT_PATHS and find_font_path, replaces single DEFAULT_FONT_PATH. Adds load_font with fallback support, measure_text for text dimensions, and fit_font_size to adapt font size to target height. |
| Image transformation utilities `txt2spec.py` | Adds apply_flutter for flutter effects, add_text_noise for noise injection, spectrogram_to_rgb for colormap conversion, auto_invert_image for contrast handling, normalize_image for scaling, and prepare_image_spectrogram for dimension adjustment with minimum width enforcement. |
| Spectrogram rendering pipeline `txt2spec.py` | Reworks text_to_spectrogram_image with safe margins, per-character random flutter, and proper centering. Extends spectrogram_image_to_audio with hop_length parameter and dual return (audio, n_fft). Adds save_linear_spectrogram for PNG preview generation. |
| End-to-end encoding/decoding `txt2spec.py` | Introduces create_audio_with_spectrogram for orchestrated rendering with flutter/noise options, create_audio_from_image for image-to-audio encoding, and gradio interface functions (gradio_interface_fn, gradio_image_to_audio_fn, gradio_decode_fn) exposing new parameters. |
| Gradio UI expansion `txt2spec.py` | Updates UI to expose flutter_px, noise_amount, and noise_strength sliders in Text→Audio tab. Adapts Image→Audio and Audio→Spectrogram flows to new backend utilities with CSS/theme refinements. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Gradio UI
    participant Text Pipeline
    participant Image Proc
    participant Audio Synth
    
    User->>Gradio UI: Input text + flutter/noise params
    Gradio UI->>Text Pipeline: gradio_interface_fn()
    Text Pipeline->>Text Pipeline: text_to_spectrogram_image()
    Text Pipeline->>Text Pipeline: fit_font_size() → measure_text()
    Text Pipeline->>Image Proc: load_font(), apply_flutter(), add_text_noise()
    Image Proc->>Image Proc: normalize_image(), prepare_image_spectrogram()
    Image Proc->>Text Pipeline: spectrogram image
    Text Pipeline->>Audio Synth: spectrogram_image_to_audio()
    Audio Synth->>Audio Synth: librosa conversion
    Audio Synth->>Gradio UI: audio + spectrogram preview
    Gradio UI->>User: Display audio & visualization

sequenceDiagram
    actor User
    participant Gradio UI
    participant Image Pipeline
    participant Audio Synth
    
    User->>Gradio UI: Upload image
    Gradio UI->>Image Pipeline: gradio_image_to_audio_fn()
    Image Pipeline->>Image Pipeline: prepare_image_spectrogram()
    Image Pipeline->>Image Pipeline: normalize_image() → spectrogram prep
    Image Pipeline->>Audio Synth: spectrogram_image_to_audio()
    Audio Synth->>Audio Synth: Convert to audio
    Audio Synth->>Gradio UI: audio output
    Gradio UI->>User: Play audio

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Text flutters and dances with noise so fine,
Spectrograms bloom in colors divine,
From images born, new audio flows,
A rabbit's code where creativity grows! 🎵

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: introducing a linear spectrogram encoder/decoder and an image processing pipeline, which are the primary focuses of the changeset. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@txt2spec.py`:
- Around line 153-154: Replace the deprecated call cm.get_cmap(cmap_name) with
the supported API; using the existing import change it to
cm.colormaps.get_cmap(cmap_name) so the code becomes cmap =
cm.colormaps.get_cmap(cmap_name) before computing rgb from norm, leaving the
subsequent rgb = (cmap(norm)[:, :, :3] * 255).astype(np.uint8) unchanged.

🧹 Nitpick comments (8)

txt2spec.py (8)
11-27: Cross-platform font discovery is limited.

The font path list is heavily macOS-centric. Consider adding common Windows and additional Linux paths for broader compatibility:
 FONT_PATHS = [
     os.path.expanduser("~/Library/Fonts/Druk Wide.otf"),
     # ... existing macOS paths ...
     "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
+    "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
+    "/usr/share/fonts/TTF/DejaVuSans-Bold.ttf",
+    # Windows paths
+    os.path.expandvars(r"%WINDIR%\Fonts\arial.ttf"),
+    os.path.expandvars(r"%WINDIR%\Fonts\arialbd.ttf"),
 ]
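The suggestion above can be sketched without any third-party imports. This is a minimal sketch, not the PR's code: `candidate_font_paths` is a hypothetical helper, and the signature given to `find_font_path` here is an assumption, since the review only names the function. One detail worth knowing is that `os.path.expandvars` only expands the `%WINDIR%` form on Windows, so on other platforms the string stays literal and simply fails the existence check. Reading `WINDIR` from the environment avoids probing such literal paths.

```python
import os
import sys


def candidate_font_paths():
    """Return an ordered list of font paths to probe (illustrative, not exhaustive)."""
    paths = [
        "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
        "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
        "/usr/share/fonts/TTF/DejaVuSans-Bold.ttf",
    ]
    if sys.platform == "darwin":
        # Per-user macOS font directory, tried first on macOS only.
        paths.insert(0, os.path.expanduser("~/Library/Fonts/Druk Wide.otf"))
    windir = os.environ.get("WINDIR")
    if windir:
        # Only added when the Windows directory variable is actually set.
        paths.append(os.path.join(windir, "Fonts", "arialbd.ttf"))
        paths.append(os.path.join(windir, "Fonts", "arial.ttf"))
    return paths


def find_font_path(paths=None):
    """Return the first candidate that exists as a file, or None if none do."""
    for path in (candidate_font_paths() if paths is None else paths):
        if os.path.isfile(path):
            return path
    return None
```

Returning `None` rather than raising leaves the fallback decision to the caller, which matches the "fallback support" the walkthrough attributes to load_font.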
53-59: Broad exception catch is acceptable but could be narrowed.

The catch-all Exception handles font loading failures gracefully. For better diagnostics, consider catching a more specific exception such as OSError, which truetype() typically raises (IOError is an alias of OSError in Python 3).

183-183: Unnecessary int() call.

In Python 3, round() with a single argument already returns an int, making the outer int() call redundant.
-    new_width = max(1, int(round(width * scale)))
+    new_width = max(1, round(width * scale))
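A small sketch of the behaviour this nitpick relies on, assuming `width` and `scale` are plain Python numbers (`scaled_width` is a hypothetical helper name). If either value is a NumPy scalar, the result type is decided by NumPy's own rounding and is worth checking separately.

```python
def scaled_width(width, scale):
    """Scale a pixel width and clamp it to at least 1; round() already yields an int."""
    return max(1, round(width * scale))


# With a single argument, round() on a built-in float returns an int.
assert isinstance(round(213.12), int)

# round() uses round-half-to-even, so exact halves go to the nearest even integer.
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

print(scaled_width(640, 0.333))  # 213
```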
231-238: Consider adding strict=True to zip().

Since text_widths is derived from text in fit_font_size, they should always have matching lengths. Adding strict=True (Python 3.10+) would catch any future bugs if this invariant breaks:
-    for char, char_width in zip(text, text_widths):
+    for char, char_width in zip(text, text_widths, strict=True):
245-250: Unused sr parameter is misleading.

The sr parameter is declared but never used in the function body. librosa.griffinlim doesn't require sample rate. This makes the API misleading since callers might expect sr to affect the reconstruction.

Consider either:

- Removing sr from the signature and documenting that sample rate is only needed when writing the audio file
- Adding a docstring clarifying that sr is unused but kept for API consistency
-def spectrogram_image_to_audio(image, sr=DEFAULT_SAMPLE_RATE, hop_length=DEFAULT_HOP_LENGTH):
+def spectrogram_image_to_audio(image, sr=DEFAULT_SAMPLE_RATE, hop_length=DEFAULT_HOP_LENGTH):  # noqa: ARG001
+    """Convert spectrogram image to audio. Note: sr is unused by Griffin-Lim but kept for API consistency."""
252-260: Unused sr parameter and potential temp file accumulation.

The sr parameter is declared but unused (same issue as spectrogram_image_to_audio).

Temporary files created with delete=False will accumulate over multiple runs. Consider implementing cleanup or documenting that callers are responsible for deletion.

286-292: Temp files accumulate across multiple pipeline runs.

Both create_audio_with_spectrogram and create_audio_from_image (and save_linear_spectrogram) create temp files with delete=False. In a long-running Gradio session, this could fill up temp storage.

Consider either:

- Implementing periodic cleanup
- Using a session-scoped temp directory that gets cleaned on app restart
- Documenting that users should periodically clear temp files
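The session-scoped option could look roughly like the sketch below. It uses only the standard library; `SESSION_TMP_DIR`, `new_temp_path`, and the `txt2spec_` prefix are hypothetical names, and cleanup here happens at interpreter exit rather than on the next start.

```python
import atexit
import os
import shutil
import tempfile

# One directory per process; everything written into it is removed at exit.
SESSION_TMP_DIR = tempfile.mkdtemp(prefix="txt2spec_")
atexit.register(shutil.rmtree, SESSION_TMP_DIR, ignore_errors=True)


def new_temp_path(suffix):
    """Reserve a unique file inside the session directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=SESSION_TMP_DIR)
    os.close(fd)  # the caller reopens the path to write audio or image data
    return path
```

A call such as `new_temp_path(".wav")` would then stand in for each `NamedTemporaryFile(delete=False)` site. Files still accumulate during a long session, so this bounds the leak to one process lifetime rather than removing it.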

360-361: Hidden sliders could be simplified.

These sliders are hidden (visible=False) but still passed through the interface. Consider removing them from the UI and passing the defaults directly in gradio_interface_fn if they're not meant to be user-configurable:
def gradio_interface_fn(text, max_font_size, margin, letter_spacing, flutter_px, noise_amount, noise_strength):
    return create_audio_with_spectrogram(
        text, DEFAULT_BASE_WIDTH, DEFAULT_SPEC_HEIGHT, max_font_size, ...
    )
This simplifies the UI component graph.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac8cd32 and cd151ac.

📒 Files selected for processing (1)

txt2spec.py

🧰 Additional context used

🪛 Ruff (0.14.11)

txt2spec.py

57-57: Do not catch blind exception: Exception

(BLE001)

180-180: Avoid specifying long messages outside the exception class

(TRY003)

183-183: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)

231-231: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

245-245: Unused function argument: sr

(ARG001)

252-252: Unused function argument: sr

(ARG001)

300-300: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)

txt2spec.py (7)
61-74: LGTM!

The text measurement approach correctly handles custom letter spacing by measuring characters individually. The empty text fallback to a space is a reasonable edge-case handler.
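As an illustration of per-character measurement with custom letter spacing, here is a minimal sketch. The signature is an assumption: the real measure_text presumably takes a Pillow font, whereas this version takes a hypothetical `char_width` callable so it stays dependency-free.

```python
def measure_text(text, char_width, letter_spacing=0):
    """Return (total_width, per_char_widths) for text laid out one glyph at a time."""
    text = text or " "  # measure a single space when the text is empty
    widths = [char_width(ch) for ch in text]
    # Spacing is applied between glyphs, so there is one gap fewer than glyphs.
    total = sum(widths) + letter_spacing * (len(text) - 1)
    return total, widths


print(measure_text("abc", lambda ch: 10, letter_spacing=2))  # (34, [10, 10, 10])
```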

76-101: LGTM!

The binary search approach for font size fitting is efficient and handles edge cases well (no font path, nothing fits).
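The idea can be sketched as a search for the largest size that still fits. This is not the PR's fit_font_size: `largest_fitting_size` and `measure_height` are hypothetical names, and the sketch assumes rendered height never decreases as the font size grows.

```python
def largest_fitting_size(measure_height, target_height, min_size=1, max_size=512):
    """Return the largest size in [min_size, max_size] that fits, or min_size if none do."""
    best = min_size
    lo, hi = min_size, max_size
    while lo <= hi:
        mid = (lo + hi) // 2
        if measure_height(mid) <= target_height:
            best = mid      # mid fits, so try something larger
            lo = mid + 1
        else:
            hi = mid - 1    # too tall, so try something smaller
    return best


# Toy height model: rendered height is twice the font size.
print(largest_fitting_size(lambda size: 2 * size, 101))  # 50
```

With a range of 512 sizes this needs at most ten measurements, compared with up to 512 for a linear scan.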

103-124: LGTM!

The flutter implementation with smoothed random offsets creates a natural wavy effect. The bounds checking correctly handles edge cases where shifts exceed image height.
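A dependency-free sketch of the two ingredients mentioned here: smoothed random offsets and a bounds-checked vertical shift. The PR's apply_flutter most likely works on NumPy arrays. The helper names, the moving-average smoothing, and the zero fill below are all assumptions made for illustration.

```python
import random


def smoothed_offsets(width, max_shift, window=5, seed=None):
    """Random per-column shifts, smoothed with a moving average so neighbours vary gently."""
    rng = random.Random(seed)
    raw = [rng.uniform(-max_shift, max_shift) for _ in range(width)]
    half = window // 2
    offsets = []
    for i in range(width):
        chunk = raw[max(0, i - half): i + half + 1]
        offsets.append(round(sum(chunk) / len(chunk)))
    return offsets


def shift_column(column, shift):
    """Shift a column down (positive) or up (negative), filling vacated cells with 0."""
    height = len(column)
    if abs(shift) >= height:
        return [0] * height  # shift exceeds the image height: nothing remains
    if shift > 0:
        return [0] * shift + column[:height - shift]
    if shift < 0:
        return column[-shift:] + [0] * (-shift)
    return list(column)
```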

126-137: LGTM!

The noise application correctly targets only text pixels (non-zero) and properly combines Gaussian noise with random dropout. The early return condition correctly requires both parameters to be disabled.

157-172: LGTM!

The auto-inversion heuristic and percentile-based normalization are robust approaches for handling varied input images. The 1st-99th percentile clipping effectively handles outliers.
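To make the percentile idea concrete, here is a plain-Python sketch over a flat list of pixel values. The PR itself presumably uses NumPy. The helper names, the linear-interpolation percentile, and the mean-brightness threshold used for inversion are assumptions, since the review does not spell out the actual heuristic.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile of an already sorted, non-empty list."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def normalize_pixels(pixels, low_pct=1.0, high_pct=99.0):
    """Clip to the [low_pct, high_pct] percentile range, then rescale to [0, 1]."""
    ordered = sorted(pixels)
    lo, hi = percentile(ordered, low_pct), percentile(ordered, high_pct)
    if hi <= lo:
        return [0.0 for _ in pixels]  # flat image: nothing to stretch
    return [min(1.0, max(0.0, (p - lo) / (hi - lo))) for p in pixels]


def auto_invert(pixels):
    """Invert a [0, 1] image that is mostly bright, so the background ends up dark."""
    mean = sum(pixels) / len(pixels)
    return [1.0 - p for p in pixels] if mean > 0.5 else list(pixels)
```

Clipping at the 1st and 99th percentiles means a handful of extreme pixels cannot compress the rest of the image into a narrow band, which is the outlier handling the comment refers to.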

313-341: LGTM!

Clean separation between Gradio interface wrappers and core logic. The thin wrapper pattern makes the core functions testable independently.

418-418: share=True creates a public URL.

Using share=True creates a publicly accessible URL through Gradio's sharing service. Ensure this is intentional, especially if deployed in production. For local-only use:
-txt2spec.launch(share=True)
+txt2spec.launch(share=False)  # or make configurable via environment variable
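One way to make this configurable is an environment flag, sketched below. `env_flag` and the variable name `TXT2SPEC_SHARE` are hypothetical; only the `share` argument to `launch()` comes from the diff above.

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean flag."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Sharing stays off unless explicitly enabled, e.g. TXT2SPEC_SHARE=1.
SHARE = env_flag("TXT2SPEC_SHARE", default=False)
# txt2spec.launch(share=SHARE)
```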

sankdoy added 2 commits January 15, 2026 14:11

feat/encoder-ui-fixes

4c8a387

switch to linear STFT for encoding/decoding, auto-fit text height, add image normalisation/inversion + preview, add noise/flutter controls, fix UI scroll and prevent text clipping, improve spectrogram colour rendering

Merge pull request #1 from sankdoy/linear-spectrogram-encoder/decoder…

cd151ac

…-+-image-pipeline feat/encoder-ui-fixes

coderabbitai bot reviewed Jan 15, 2026

sankdoy wants to merge 2 commits into Bebra777228:main from sankdoy:main
