Refactor: Consolidate duplicate functions in Python codebase

## Summary

Code review identified several duplicate function implementations across the Python codebase that should be consolidated into shared modules.

## True Duplicates (should be consolidated)

### 1. `wav_to_array` - Near-exact duplicate
- **Locations:** `whisper_server/server.py:49` and `chunking.py:74`
- **Description:** Near-identical implementations for converting WAV audio into a NumPy array
- **Recommendation:** Extract to `lib/audio.py`
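A minimal sketch of what the shared `lib/audio.py` version might look like. The issue does not show the existing code, so the signature (raw WAV bytes in, float32 array out), the 16-bit PCM requirement and the averaging of channels down to mono are all assumptions to check against the two current copies before merging.

```python
import io
import wave

import numpy as np


def wav_to_array(wav_bytes: bytes) -> np.ndarray:
    """Decode 16-bit PCM WAV bytes into a mono float32 array in [-1.0, 1.0).

    Hypothetical signature: input is a complete in-memory WAV file with
    16-bit samples; multi-channel audio is averaged down to mono.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit")
        channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())

    # WAV PCM is little-endian; '<i2' makes that explicit on any host.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Interleaved frames -> (n_frames, channels), then average to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples
```

If one of the existing copies keeps stereo or returns int16, that difference should be settled (or turned into a parameter) before both call sites are switched over.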

### 2. `read_codec` - Near-exact duplicate
- **Locations:** `whisper_server/server.py:67` and `chunking.py:119`
- **Description:** Near-identical code for reading the audio codec via ffmpeg
- **Recommendation:** Extract to `lib/audio.py`
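The issue does not say exactly what `read_codec` returns. Assuming it reports the codec name of the first audio stream (for example, to detect Opus chunks), a shared version could wrap `ffprobe`, which ships with ffmpeg. The `build_probe_cmd` helper and the injectable `run` parameter are hypothetical; they exist so the logic can be unit-tested without ffmpeg installed.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def build_probe_cmd(path: str) -> list[str]:
    # Ask ffprobe for the codec name of the first audio stream only,
    # printed as a bare value (no section wrappers, no key names).
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def read_codec(
    path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the codec name (e.g. 'opus') of the first audio stream in `path`."""
    result = run(build_probe_cmd(path), capture_output=True, text=True, check=True)
    codec = result.stdout.strip()
    if not codec:
        raise ValueError(f"no audio stream found in {path!r}")
    return codec
```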

### 3. `combine_chunks_to_wav` - Similar implementation
- **Locations:** `diarization_worker.py:196` and `play.py:103`
- **Description:** Both combine Opus chunks into a single WAV file, inserting silence for gaps between chunks
- **Recommendation:** Extract the common logic into a shared module (a candidate for `lib/audio.py`)
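A sketch of the gap-filling core that both versions appear to share, assuming the Opus chunks have already been decoded to mono float32 arrays with a start time in seconds (decoding is left out). `combine_pcm_chunks` and `array_to_wav_bytes` are hypothetical names, and trimming overlapping audio from the later chunk is an assumed policy; the two existing implementations may handle overlaps differently.

```python
from __future__ import annotations

import io
import wave

import numpy as np


def combine_pcm_chunks(
    chunks: list[tuple[float, np.ndarray]], sample_rate: int
) -> np.ndarray:
    """Concatenate (start_seconds, mono_float32_samples) chunks in time order.

    Gaps between chunks are filled with zeros (silence); audio that overlaps
    what has already been written is trimmed from the start of the later chunk.
    """
    pieces: list[np.ndarray] = []
    cursor = 0  # index of the next sample to be written
    for start_s, samples in sorted(chunks, key=lambda c: c[0]):
        start = int(round(start_s * sample_rate))
        if start > cursor:
            pieces.append(np.zeros(start - cursor, dtype=np.float32))  # gap -> silence
            cursor = start
        skip = cursor - start  # > 0 only when this chunk overlaps earlier audio
        tail = samples[skip:].astype(np.float32)
        pieces.append(tail)
        cursor += len(tail)
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)


def array_to_wav_bytes(samples: np.ndarray, sample_rate: int) -> bytes:
    """Encode mono float samples as a 16-bit PCM WAV file in memory."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()
```

Keeping the timeline logic separate from decoding and WAV encoding would let `diarization_worker.py` and `play.py` share it even if they obtain decoded audio in different ways.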

### 4. `mongo_cursor` - Duplicate in playground
- **Locations:** `lib/worker.py:52` (shared utility) and `playground.local.py:16`
- **Recommendation:** The playground version should import from `lib/worker`

## Similar but different (may be intentional)

### 5. `format_eta` / `_format_eta`
- `stt.py:167` returns zero-padded hours, e.g. `"02:15:30"`
- `diarization_worker.py:127` returns the `str(timedelta)` form, e.g. `"0:15:30"` (hours not zero-padded)
- **Note:** The output formats differ; this may be intentional
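If a single helper is wanted, one option is to keep both styles behind a flag. This is a hypothetical consolidated signature, not either existing function; it truncates fractional seconds and clamps negative input to zero, both of which are assumptions.

```python
def format_eta(seconds: float, pad_hours: bool = True) -> str:
    """Format a duration as HH:MM:SS (padded) or H:MM:SS (unpadded).

    pad_hours=True  -> "02:15:30" (the stt.py style)
    pad_hours=False -> "0:15:30"  (matches str(timedelta) for durations under one day)
    """
    total = max(0, int(seconds))  # truncate fractions, clamp negatives to 0
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    hour_field = f"{hours:02d}" if pad_hours else str(hours)
    return f"{hour_field}:{minutes:02d}:{secs:02d}"
```

One behavioural difference to be aware of: for durations of a day or more, `str(timedelta(seconds=90000))` gives `"1 day, 1:00:00"`, whereas this sketch gives `"25:00:00"`.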

### 6. `get_worker_id` / `_get_worker_id`
- `lib/worker.py:47` returns `"hostname_pid"`
- `processors/vad.py:35` returns `"py-vad:hostname:pid"` (different prefix)
- **Note:** Different formats to distinguish worker types - likely intentional
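If these are ever merged, a shared helper can still reproduce both formats exactly by taking the prefix and separator as parameters. `make_worker_id` is a hypothetical name used only for illustration.

```python
import os
import socket
from typing import Optional


def make_worker_id(prefix: Optional[str] = None, sep: str = "_") -> str:
    """Build a worker id from hostname and PID, optionally prefixed by worker type.

    make_worker_id()                   -> "hostname_pid"        (lib/worker.py style)
    make_worker_id("py-vad", sep=":")  -> "py-vad:hostname:pid" (processors/vad.py style)
    """
    parts = [prefix] if prefix else []
    parts += [socket.gethostname(), str(os.getpid())]
    return sep.join(parts)
```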

## Proposed Solution

1. Create `lib/audio.py` with shared audio functions (`wav_to_array`, `read_codec`, potentially `combine_chunks_to_wav`)
2. Update `whisper_server/server.py` and `chunking.py` to import from `lib/audio.py`
3. Fix `playground.local.py` to import `mongo_cursor` from `lib/worker`
4. Consider consolidating the `format_eta` functions if a single output format is acceptable to both callers

## Benefits
- Reduced code duplication
- Single source of truth for audio processing utilities
- Easier maintenance and bug fixes