Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##            main    #287     +/-   ##
========================================
- Coverage   1.54%   1.50%   -0.05%
========================================
  Files         11      11
  Lines       1102    1132     +30
  Branches     162     170      +8
========================================
  Hits          17      17
- Misses      1085    1115     +30
benoit74
left a comment
I'm a bit puzzled about this issue/PR now.
The original goal I expressed in the issue was to avoid being blocked by a yt-dlp ban when subtitles have not changed. That goal is not reached in this PR, since we still need to get the list of subtitles with yt-dlp.
At the same time, caching the list of subtitles as well is probably not desirable: we usually do not cache responses to API calls in S3, but rather resources that take time or compute to regenerate (re-encoded videos, images). The idea of caching subtitles in S3 was probably already a deviation from this usual behavior.
To help sort this out, can you please share some data about how this change makes the scraper run faster or not?
The reason I used yt-dlp to get the list of subtitles is that the scraper doesn't know which language subtitles need to be downloaded. Without the
To avoid calling yt-dlp entirely in this scenario, we could save two zipped files in the S3 cache:
WDYT?
This seems like a potential improvement, but does it really help the scraper run faster? It has the drawback that we do not know when we should invalidate these cached files to update them.
Let's pause this issue/PR so I can reflect on it a bit.
This PR modifies the `download_subtitles` method to cache subtitles in S3. The modified method now works as follows:

1. Get the list of subtitle languages using `yt-dlp` (e.g., `en`, `fr`, `de`) and store them in `requested_subtitle_keys`.
2. Iterate over `requested_subtitle_keys` and attempt to download each subtitle file from the S3 cache. If a file is successfully downloaded from the S3 cache, remove the corresponding key from `requested_subtitle_keys`.
3. If any keys remain in `requested_subtitle_keys`, download the subtitles using `yt-dlp`.

Close #277
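
For readers following the discussion, the flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the callables `list_languages`, `cache_get`, `cache_put` and `fetch_remote`, as well as the `subtitle_cache_key` layout, are hypothetical stand-ins for the yt-dlp and S3 calls. Writing freshly downloaded subtitles back to the cache is also an assumption, since the description does not mention it.

```python
from typing import Callable, Dict, Iterable, List, Optional


def subtitle_cache_key(video_id: str, lang: str) -> str:
    # Hypothetical key layout; the PR description does not say how keys are named.
    return f"subtitles/{video_id}/{lang}.vtt"


def download_subtitles(
    video_id: str,
    list_languages: Callable[[str], Iterable[str]],   # stand-in for the yt-dlp listing call
    cache_get: Callable[[str], Optional[bytes]],      # stand-in for an S3 download
    cache_put: Callable[[str, bytes], None],          # stand-in for an S3 upload (assumed)
    fetch_remote: Callable[[str, List[str]], Dict[str, bytes]],  # stand-in for yt-dlp download
) -> Dict[str, bytes]:
    # Step 1: the list of languages still comes from the remote side.
    requested_subtitle_keys = list(list_languages(video_id))
    subtitles: Dict[str, bytes] = {}

    # Step 2: try the cache first; drop every language that was found there.
    for lang in list(requested_subtitle_keys):  # iterate over a copy while removing
        data = cache_get(subtitle_cache_key(video_id, lang))
        if data is not None:
            subtitles[lang] = data
            requested_subtitle_keys.remove(lang)

    # Step 3: only the languages missing from the cache are fetched remotely.
    if requested_subtitle_keys:
        fetched = fetch_remote(video_id, requested_subtitle_keys)
        for lang, data in fetched.items():
            subtitles[lang] = data
            cache_put(subtitle_cache_key(video_id, lang), data)

    return subtitles
```

Note that step 1 still needs one remote call per video even when every subtitle file is cached, which is the limitation raised in the review comment above.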