Skip to content

fix: display original format name for ffmpeg files (#46)#62

Merged
systemblueteam merged 13 commits intosystemblueteam:mainfrom
jaimefgdev:fix/format-name
Mar 27, 2026
Merged

fix: display original format name for ffmpeg files (#46)#62
systemblueteam merged 13 commits intosystemblueteam:mainfrom
jaimefgdev:fix/format-name

Conversation

@jaimefgdev
Copy link
Copy Markdown
Contributor

What changed

Added a mapping dictionary to translate original file extensions into readable display names. Updated load_audio to use this mapped name for the metadata instead of the default sf.info format when a file has been temporarily transcoded to WAV via ffmpeg.

Why

Fixes an issue where compressed files (like MP3s) processed by the new ffmpeg fallback were incorrectly reported as 'WAV' in the output since the metadata was being read from the temporary file.

Closes

Closes #46

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: deb49634-29cf-4bb9-a551-98f207251883

📥 Commits

Reviewing files that changed from the base of the PR and between ab69fb9 and dda935b.

📒 Files selected for processing (2)
  • src/sounddiff/formats.py
  • tests/test_formats.py

📝 Walkthrough

Walkthrough

Adds FORMAT_DISPLAY_NAMES and updates load_audio() so AudioMetadata.format_name is taken from the original file suffix (via the mapping or uppercased fallback) when the read path differs from the original; otherwise keeps using info.format. No other control-flow changes.

Changes

Cohort / File(s) Summary
Format Name Preservation
src/sounddiff/formats.py
Added module constant FORMAT_DISPLAY_NAMES: dict[str, str] and changed load_audio() to prefer the original input file suffix (mapped or uppercased) for AudioMetadata.format_name when original_filepath != read_filepath; falls back to info.format when paths match or suffix unmapped.
Tests
tests/test_formats.py
Added test_transcoded_format_name_is_correct which mocks shutil.which, subprocess.run, and soundfile (sf.info/sf.read) to verify load_audio() reports the original format name (e.g., "MP3") even when the file is transcoded and read as WAV.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • systemblueteam

Poem

"I’m a rabbit who sniffs file tails and trails,
I hop through suffixes, follow audio trails.
No longer fooled by a temp WAV disguise,
I call MP3s, AACs by their true little names. 🐇🎶"

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing the display of original format names for ffmpeg-transcoded files, matching the core objective.
Description check ✅ Passed The description is directly related to the changeset, explaining what was added (mapping dictionary), why (fixing incorrect WAV format reporting), and the issue it closes.
Linked Issues check ✅ Passed The PR successfully implements all requirements from #46: mapping original file suffixes to display names and using those names for format_name in metadata when files are ffmpeg-transcoded.
Out of Scope Changes check ✅ Passed All changes are scoped to the issue #46 objective: FORMAT_DISPLAY_NAMES mapping, load_audio() updates to use original format names, and corresponding test coverage with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot added the audio-core Core audio analysis modules label Mar 27, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7fa166c-9adb-4f84-b8f4-0fa07b9690ea

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb6ac9 and 7a1f640.

📒 Files selected for processing (1)
  • src/sounddiff/formats.py

@coderabbitai coderabbitai bot added the enhancement New feature or request label Mar 27, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
src/sounddiff/formats.py (2)

69-74: ⚠️ Potential issue | 🔴 Critical

Undefined variables in format-name branch cause runtime failure.

At Line 69, original_filepath and read_filepath are not defined in load_audio(), so native files will hit a NameError before metadata creation. Also, Line 50 still rejects ffmpeg formats, so this branch cannot currently fulfill ffmpeg format-name preservation.

Proposed minimal fix
-    if original_filepath != read_filepath:
-        ext = original_filepath.suffix.lower()
-        display_format = FORMAT_DISPLAY_NAMES.get(ext, ext.lstrip('.').upper())
-    else:
-        display_format = info.format
+    display_format = info.format

18-25: 🛠️ Refactor suggestion | 🟠 Major

Add an explicit type annotation for FORMAT_DISPLAY_NAMES.

This exported constant should be explicitly typed for strict mypy compliance.

Proposed fix
-FORMAT_DISPLAY_NAMES = {
+FORMAT_DISPLAY_NAMES: dict[str, str] = {
     ".mp3": "MP3",
     ".aac": "AAC",
     ".m4a": "AAC",
     ".wma": "WMA",
     ".opus": "Opus",
 }

As per coding guidelines, "Use mypy strict mode with type hints everywhere".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: c7d09fdd-5a77-431d-928a-1b7360be5160

📥 Commits

Reviewing files that changed from the base of the PR and between 7a1f640 and 062e766.

📒 Files selected for processing (1)
  • src/sounddiff/formats.py

@jaimefgdev
Copy link
Copy Markdown
Contributor Author

@systemblueteam All tests are passing now! CodeRabbit has a minor refactor suggestion, but the core issue is fixed and the CI is green. Ready for human review! 🚀

@coderabbitai resolve

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

Tip

For best results, initiate chat on the files or code changes.

@jaimefgdev Great to hear that all tests are passing and CI is green! I'll resolve all my review comments now. 🎉

(ノ◕ヮ◕)ノ*:・゚✧

[resolve]

systemblueteam

This comment was marked as duplicate.

Copy link
Copy Markdown
Owner

@systemblueteam systemblueteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimefgdev Good fix. Two things before merge: add a test that verifies an MP3 or M4A file shows the correct format name instead of "WAV", and add a type annotation to FORMAT_DISPLAY_NAMES.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1ca67c69-47c2-438a-9869-eb0d90349100

📥 Commits

Reviewing files that changed from the base of the PR and between 062e766 and c003cd7.

📒 Files selected for processing (2)
  • src/sounddiff/formats.py
  • tests/test_formats.py

Comment on lines +56 to +82
def test_transcoded_format_name_is_correct(self, tmp_path: Path) -> None:
"""Verifies that transcoded files show their original format name, not WAV."""
from unittest.mock import patch

# 1. Creamos un archivo MP3 falso para pasar la validación de path.exists()
fake_mp3 = tmp_path / "test.mp3"
fake_mp3.write_text("fake audio content")

# 2. Simulamos que ffmpeg existe y que la lectura del WAV temporal funciona
with patch("sounddiff.formats.shutil.which", return_value="ffmpeg"), \
patch("sounddiff.formats.subprocess.run"), \
patch("sounddiff.formats.sf.info") as mock_info, \
patch("sounddiff.formats.sf.read") as mock_read:

# Configuramos el mock para simular lo que devolvería el WAV temporal
class MockInfo:
format = "WAV"
subtype = "PCM_16"

mock_info.return_value = MockInfo()
mock_read.return_value = (np.zeros((100, 2), dtype=np.float64), 44100)

# 3. Llamamos a tu función
_, meta = load_audio(fake_mp3)

# 4. LA COMPROBACIÓN FINAL: Debe decir MP3 y no WAV
assert meta.format_name == "MP3"
Copy link
Copy Markdown

@coderabbitai coderabbitai bot Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Test logic is sound; consider translating comments to English.

The test correctly validates the core fix by mocking the ffmpeg transcode flow and asserting that format_name reflects the original format ("MP3") rather than the intermediate WAV. Good isolation with tmp_path and proper mock setup.

The inline comments are in Spanish; for consistency with the English codebase, consider translating them:

📝 Suggested comment translations
-        # 1. Creamos un archivo MP3 falso para pasar la validación de path.exists()
+        # 1. Create a fake MP3 file to pass path.exists() validation
         fake_mp3 = tmp_path / "test.mp3"
         fake_mp3.write_text("fake audio content")

-        # 2. Simulamos que ffmpeg existe y que la lectura del WAV temporal funciona
+        # 2. Simulate ffmpeg existing and temp WAV read succeeding
         with patch("sounddiff.formats.shutil.which", return_value="ffmpeg"), \
              ...

-            # Configuramos el mock para simular lo que devolvería el WAV temporal
+            # Configure mock to simulate what the temp WAV would return
             ...

-            # 3. Llamamos a tu función
+            # 3. Call the function under test
             _, meta = load_audio(fake_mp3)

-            # 4. LA COMPROBACIÓN FINAL: Debe decir MP3 y no WAV
+            # 4. Final assertion: must report MP3, not WAV
             assert meta.format_name == "MP3"

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/test_formats.py (1)

108-108: ⚠️ Potential issue | 🟡 Minor

Fix trailing whitespace and restore final newline at EOF.

Line 108 still contains whitespace-only content, and the file is missing the trailing newline (W293, W292).

Suggested diff
@@
-        
+
@@
-        assert format_channels(6) == "6ch"
+        assert format_channels(6) == "6ch"
+

As per coding guidelines, **/*.py: Use Ruff for linting and formatting. Do not use black, isort, or flake8.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: a7d0364c-e443-49b3-9a9c-be0167e639a6

📥 Commits

Reviewing files that changed from the base of the PR and between c003cd7 and ab69fb9.

📒 Files selected for processing (1)
  • tests/test_formats.py

@systemblueteam
Copy link
Copy Markdown
Owner

CI is failing on a lint check — two files are missing a trailing newline at the end. Run ruff check --fix . to auto-fix both, then push. Once that's clean the PR looks good.

@jaimefgdev
Copy link
Copy Markdown
Contributor Author

Finally got the CI green! Fixed the trailing newlines manually. The Labeler action is still failing due to repo permissions, but the code is ready to merge.

@systemblueteam systemblueteam merged commit dff05ee into systemblueteam:main Mar 27, 2026
3 of 4 checks passed
Copy link
Copy Markdown
Owner

@systemblueteam systemblueteam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All requested changes addressed. Tests cover the transcoded format name case, type annotation is in place, CI green. Merging.

Comment on lines +56 to +82
def test_transcoded_format_name_is_correct(self, tmp_path: Path) -> None:
"""Verifies that transcoded files show their original format name, not WAV."""
from unittest.mock import patch

# 1. Creamos un archivo MP3 falso para pasar la validación de path.exists()
fake_mp3 = tmp_path / "test.mp3"
fake_mp3.write_text("fake audio content")

# 2. Simulamos que ffmpeg existe y que la lectura del WAV temporal funciona
with patch("sounddiff.formats.shutil.which", return_value="ffmpeg"), \
patch("sounddiff.formats.subprocess.run"), \
patch("sounddiff.formats.sf.info") as mock_info, \
patch("sounddiff.formats.sf.read") as mock_read:

# Configuramos el mock para simular lo que devolvería el WAV temporal
class MockInfo:
format = "WAV"
subtype = "PCM_16"

mock_info.return_value = MockInfo()
mock_read.return_value = (np.zeros((100, 2), dtype=np.float64), 44100)

# 3. Llamamos a tu función
_, meta = load_audio(fake_mp3)

# 4. LA COMPROBACIÓN FINAL: Debe decir MP3 y no WAV
assert meta.format_name == "MP3"
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@systemblueteam
Copy link
Copy Markdown
Owner

@jaimefgdev yeah sorry about the ci check. I'm consolidating everything into a single check. Hopefully that'll keep things simple in the future. Also what do you think about the code rabbit/sentry reviews? Does it feel overwhelming for you personally or are you okay with it?

@systemblueteam systemblueteam mentioned this pull request Mar 27, 2026
@jaimefgdev
Copy link
Copy Markdown
Contributor Author

Thanks for the merge! Regarding the AI reviews, they can be a bit tedious sometimes (especially with the strict formatting), but I think they are necessary to make sure everything works correctly before the final review. It’s great news that you're consolidating the CI checks, that will definitely make things smoother in the future!

@jaimefgdev
Copy link
Copy Markdown
Contributor Author

Also, is there anything else you'd like me to help with that isn't currently in the issues? I'm free for any other tasks or features you might have in mind.

@systemblueteam
Copy link
Copy Markdown
Owner

@jaimefgdev feel free to jump in anytime on what we have, but yeah I am actually working on another project right now that I'm contemplating open sourcing. If you want to talk about it feel free to shoot me an email: shelton@systemblue.io. I'm actively looking for some collaborators that would be willing to get their hands dirty so if you want just ping me in an email.

also...for the record, I'm an Arsenal fan. So we'll be seeing Real Madrid soon in CL lol. Assuming you're a Madrid fan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

audio-core Core audio analysis modules enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Show original format name for FFmpeg-transcoded files

2 participants