feat: show total time in the playlist#442
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a feature to calculate and display the total duration of all videos in a YouTube playlist that are successfully added to the ZIM file. The implementation parses ISO 8601 duration strings from the YouTube API and logs the formatted total duration to help users understand the content volume.
Changes:
- Added
parse_iso_durationfunction to parse YouTube's ISO 8601 duration format (e.g., 'PT2H3M4S') into seconds - Added
log_total_durationmethod to calculate and log the total duration of all videos in a human-readable format - Integrated the duration logging into the
make_json_filesmethod after video filtering
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
scraper/src/youtube2zim/scraper.py
Outdated
| skip_non_public_videos, | ||
| skip_outofrange_videos, | ||
| ) | ||
| import re |
There was a problem hiding this comment.
The re module is imported twice - once at line 14 and again at line 89. Remove this duplicate import statement.
| import re |
scraper/src/youtube2zim/scraper.py
Outdated
| def parse_iso_duration(duration_str): | ||
| """ | ||
| Parses a Youtube duration string (e.g 'PT2H3M4S') into seconds. | ||
| Returns 0 if the format is invalid. | ||
| """ | ||
| if not duration_str: | ||
| return 0 | ||
|
|
||
| match = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?', duration_str) | ||
| if not match: | ||
| return 0 | ||
| h = int(match.group(1) or 0) | ||
| m = int(match.group(2) or 0) | ||
| s = int(match.group(3) or 0) | ||
| return (h * 3600) + (m * 60) + s |
There was a problem hiding this comment.
The parse_iso_duration function should be moved to the utils.py module instead of being defined at module level in scraper.py. This would improve code organization by grouping similar utility functions together and make the function reusable across different modules if needed in the future.
scraper/src/youtube2zim/scraper.py
Outdated
| duration_str = f"{int(s)}s" | ||
| if m > 0 or h > 0: | ||
| duration_str = f"{int(m)}m {duration_str}" | ||
| if h > 0: | ||
| duration_str = f"{int(h)}h {duration_str}" |
There was a problem hiding this comment.
The duration formatting logic has an issue. When the total duration is exactly 1 minute or more but has 0 seconds, the output will be "1m 0s" instead of just "1m". The condition if m > 0 or h > 0 should check if there are remaining seconds before appending seconds to the output. Consider restructuring: build from hours down, and only append seconds if they're non-zero or if it's the only unit.
| duration_str = f"{int(s)}s" | |
| if m > 0 or h > 0: | |
| duration_str = f"{int(m)}m {duration_str}" | |
| if h > 0: | |
| duration_str = f"{int(h)}h {duration_str}" | |
| parts = [] | |
| if h > 0: | |
| parts.append(f"{int(h)}h") | |
| if m > 0: | |
| parts.append(f"{int(m)}m") | |
| # Only include seconds if they are non-zero or if no other unit was added | |
| if s > 0 or not parts: | |
| parts.append(f"{int(s)}s") | |
| duration_str = " ".join(parts) |
|
Closing in favor of #435 which is mostly ready and way closer to what we wanna achieve |
Calculates and logs the total duration of all videos added to the ZIM file. Fixes #401
I implemented the calculation inside the
make_json_filesmethod. The calculation is performed withinmake_json_files(after the video list is filtered).Reasoning: This is the only point in the pipeline where we have both the final filtered list of videos (after failed downloads are removed) and the video metadata (duration from
videos_channels). Calculating it earlier would risk including videos that failed to download.sample input:
youtube2zim --api-key "*********" --id "PLt1GTeualZPRblJV3jr0RYY0rokf65oif" --name "daa-chuu-test5" --output ~/kiwix/zims --skip-reencodingsample output: