Modernize codebase: auto-detect GPU/CPU, replace print with logging, update deps #78
Conversation
Code Review
This pull request modernizes the repository by updating to Python 3, replacing print statements with a structured logging configuration, and transitioning from requirements.txt to Conda environment files. Key improvements include updating ffmpeg subprocess calls for better error handling, removing deprecated torch.autograd.Variable usage, and refactoring the scenedetect integration to support newer API versions. Review feedback highlights a potential TypeError when logging PyTorch tensors without converting them to scalars and identifies compatibility issues with weights_only=True in torch.load when handling full model pickles.
SyncNetInstance.py
Outdated
```python
logger.info('AV offset: \t%d', offset)
logger.info('Min dist: \t%.3f', minval)
logger.info('Confidence: \t%.3f', conf)
```
The logger.info calls will fail with a TypeError because offset, minval, and conf are PyTorch tensors. The %d and %.3f format specifiers require real numbers (int/float), and PyTorch tensors do not automatically convert to these types when passed as arguments to the logging methods (which use the % operator internally). Use .item() to convert these 0-dimensional tensors to Python scalars.
Suggested change:
```diff
-logger.info('AV offset: \t%d', offset)
-logger.info('Min dist: \t%.3f', minval)
-logger.info('Confidence: \t%.3f', conf)
+logger.info('AV offset: \t%d', offset.item())
+logger.info('Min dist: \t%.3f', minval.item())
+logger.info('Confidence: \t%.3f', conf.item())
```
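The mechanism can be demonstrated without PyTorch: `logging` defers string interpolation to the `%` operator, and `%.3f` rejects any object that is not a real number. A minimal sketch using a hypothetical `FakeTensor` stand-in (not part of the PR):

```python
class FakeTensor:
    """Hypothetical stand-in for a 0-dim tensor: it holds a value but is
    not itself a real number, so %-style formatting rejects it."""
    def __init__(self, value):
        self._value = value

    def item(self):
        # Mirrors torch.Tensor.item(): return the wrapped Python scalar.
        return self._value

conf = FakeTensor(3.14159)

# This interpolation is what the logging module performs internally when
# the record is emitted:
try:
    'Confidence: %.3f' % conf
    raised = False
except TypeError:
    raised = True

# Converting to a plain scalar first makes the format specifier valid.
message = 'Confidence: %.3f' % conf.item()
print(raised, message)  # True Confidence: 3.142
```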
SyncNetModel.py
Outdated
```diff
-net = torch.load(filename)
-return net;
+net = torch.load(filename, weights_only=True)
```
Using weights_only=True here is inconsistent with the save function on line 9, which uses torch.save(model, f) to save the entire model object. weights_only=True is intended for loading state dictionaries and will fail to unpickle custom classes like S. To maintain compatibility with the save function in this same file, weights_only should be set to False, or the save function should be updated to save only the state_dict.
Suggested change:
```diff
-net = torch.load(filename, weights_only=True)
+net = torch.load(filename, weights_only=False)
```
```diff
 def loadParameters(self, path):
-    loaded_state = torch.load(path, map_location=lambda storage, loc: storage);
+    loaded_state = torch.load(path, map_location=lambda storage, loc: storage, weights_only=True)
```
Setting weights_only=True may break loading for existing pretrained models if they were saved as full model objects (pickles) rather than state dictionaries. Given that SyncNetModel.save defaults to saving the full model, it is likely that many .model files in this ecosystem are full pickles. If you want to ensure backward compatibility with such files, set weights_only=False.
Suggested change:
```diff
-loaded_state = torch.load(path, map_location=lambda storage, loc: storage, weights_only=True)
+loaded_state = torch.load(path, map_location=lambda storage, loc: storage, weights_only=False)
```
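Why `weights_only=True` rejects full-model files can be illustrated with the plain `pickle` module (PyTorch checkpoints are pickles underneath). `TinyModel` below is a hypothetical stand-in for the `S` class; this is an analogy, not the PR's code:

```python
import pickle

class TinyModel:
    """Hypothetical stand-in for the model class S."""
    def __init__(self):
        self.weights = {'fc': [0.1, 0.2]}

    def state_dict(self):
        # Return plain containers only, like torch's state_dict().
        return dict(self.weights)

model = TinyModel()

# A full-object pickle embeds a reference to the TinyModel class itself;
# this is the kind of payload a weights_only load refuses to unpickle.
full = pickle.dumps(model)
print(b'TinyModel' in full)   # True

# A state-dict pickle contains only plain containers and numbers -- the
# shape of file that a weights_only load is designed to accept.
state = pickle.dumps(model.state_dict())
print(b'TinyModel' in state)  # False

restored = pickle.loads(state)
print(restored)               # {'fc': [0.1, 0.2]}
```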
/gemini review |
Code Review
This pull request modernizes the repository by migrating to Python 3, replacing print statements with the logging module, and introducing conda environment files for dependency management. Significant changes include updating PySceneDetect API usage, improving FFmpeg command execution via subprocess.run, and enhancing security by using weights_only=True in torch.load. Review feedback identifies a missed logger conversion in SyncNetModel.py and a redundant file-opening operation in run_pipeline.py.
SyncNetModel.py
Outdated
```diff
-torch.save(model, f);
-print("%s saved."%filename);
+torch.save(model, f)
+print(f"{filename} saved.")
```
The pull request description states that all print() calls were replaced with the logging module across every script. However, this print() call remains. For consistency and to adhere to the modernization goal, please replace it with a logger call. Note that you will also need to import logging and initialize a logger in this file.
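A minimal sketch of the module-level logger the reviewer asks for; the handler wiring and format string below are illustrative choices, not taken from the PR (a `StringIO` stream is used so the example is self-checking):

```python
import io
import logging

# Module-level logger, as each converted script would declare.
logger = logging.getLogger('SyncNetModel')
logger.setLevel(logging.INFO)

# Illustrative handler: a real script would typically use basicConfig or a
# StreamHandler on stderr; StringIO lets us inspect the output here.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))
logger.addHandler(handler)

# The reviewer's requested replacement for the leftover print() call:
filename = 'example.model'
logger.info('%s saved.', filename)

print(stream.getvalue().strip())  # INFO example.model saved.
```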
run_pipeline.py
Outdated
```python
video = open_video(video_path)
scene_list = [(video.base_timecode, video.base_timecode + video.duration)]
```
This call to open_video(video_path) is redundant. The video object was already opened on line 223 and remains valid and in scope here. Metadata properties like base_timecode and duration can be accessed directly from the existing object.
Suggested change:
```diff
-video = open_video(video_path)
 scene_list = [(video.base_timecode, video.base_timecode + video.duration)]
```
/gemini review |
Code Review
This pull request modernizes the codebase by upgrading to Python 3, introducing structured logging, and improving environment management via Conda. Key technical improvements include the removal of deprecated PyTorch features like Variable, the use of weights_only=True for secure model loading, and the transition to subprocess.run for more robust external command execution. The review feedback highlights two important reliability issues: the need for math.isclose when comparing floating-point durations to avoid precision errors, and the requirement to explicitly release video resources in run_pipeline.py to prevent potential file handle leaks.
```python
# Check audio and video input length
# ========== ==========

if (float(len(audio))/16000) != (float(len(images))/25) :
```
Comparing floating-point values for exact equality is generally unreliable due to precision issues. It is safer to use math.isclose with an appropriate tolerance, especially when comparing durations derived from different sources (audio samples vs. video frames).
Suggested change:
```diff
-if (float(len(audio))/16000) != (float(len(images))/25) :
+if not math.isclose(len(audio) / 16000.0, len(images) / 25.0, abs_tol=1e-4):
```
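A self-contained check of the suggested comparison; the sample counts below are made up to show a sub-millisecond mismatch that exact equality rejects but `math.isclose` tolerates:

```python
import math

# Hypothetical lengths: 16 kHz audio and 25 fps video differing by 62.5 µs.
audio_samples = 160001   # -> 10.0000625 s
video_frames = 250       # -> 10.0 s

audio_sec = audio_samples / 16000.0
video_sec = video_frames / 25.0

exact_match = (audio_sec == video_sec)
close_match = math.isclose(audio_sec, video_sec, abs_tol=1e-4)
print(exact_match, close_match)  # False True
```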
```diff
 video = open_video(video_path)

-scene_manager.detect_scenes(frame_source=video_manager)
+scene_manager = SceneManager()
+scene_manager.add_detector(ContentDetector())
+scene_manager.detect_scenes(video)

-scene_list = scene_manager.get_scene_list(base_timecode)
+scene_list = scene_manager.get_scene_list()

 savepath = os.path.join(opt.work_dir,opt.reference,'scene.pckl')

-if scene_list == []:
-    scene_list = [(video_manager.get_base_timecode(),video_manager.get_current_timecode())]
+if not scene_list:
+    scene_list = [(video.base_timecode, video.base_timecode + video.duration)]

 with open(savepath, 'wb') as fil:
     pickle.dump(scene_list, fil)

-print('%s - scenes detected %d'%(os.path.join(opt.avi_dir,opt.reference,'video.avi'),len(scene_list)))
+logger.info('%s - scenes detected %d', video_path, len(scene_list))

 return scene_list
```
The VideoSource object returned by open_video should be explicitly released to free system resources (like file handles or hardware decoders). Using a try...finally block ensures the video is released even if an error occurs during scene detection.
Suggested change:
```python
video_path = os.path.join(opt.avi_dir, opt.reference, 'video.avi')
video = open_video(video_path)
try:
    scene_manager = SceneManager()
    scene_manager.add_detector(ContentDetector())
    scene_manager.detect_scenes(video)
    scene_list = scene_manager.get_scene_list()
    if not scene_list:
        scene_list = [(video.base_timecode, video.base_timecode + video.duration)]
finally:
    video.release()
```
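The try/finally pattern generalizes to a small context manager so release is implicit at every call site. `DummyVideo` and `managed_video` are illustrative names, not part of PySceneDetect:

```python
from contextlib import contextmanager

class DummyVideo:
    """Stand-in for the object open_video returns; only release() matters."""
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True

@contextmanager
def managed_video(video):
    # Yield the video, guaranteeing release() even if the body raises.
    try:
        yield video
    finally:
        video.release()

video = DummyVideo()
with managed_video(video):
    pass  # scene detection would run here

print(video.released)  # True
```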
Summary
- `SyncNetInstance` and `S3FD` now auto-detect CUDA availability and fall back to CPU. All `.cuda()` calls replaced with `.to(device)`.
- Replaced `print()` calls with the Python `logging` module across every script.
- Replaced `subprocess.call(str, shell=True)` with `subprocess.run(list)` to prevent command injection. Replaced `pdb.set_trace()` error handling with proper exceptions.
- Removed `torch.autograd.Variable`, added `weights_only=True` to `torch.load()`, fixed `np.int` → `np.intp`, and modernized NMS tensor operations in S3FD to remove the deprecated `out=` resize pattern.
- Upgraded `scenedetect==0.5.1` to `0.6.7.1` and rewrote `scene_detect()` to use the modern `open_video`/`SceneManager` API.
- Miscellaneous cleanup: `super()` without args, removed trailing semicolons, `#!/usr/bin/env python3` shebangs, `mkdir -p` in the download script, removed unused imports, `os.makedirs(exist_ok=True)`.

Test plan
- `sh download_model.sh` — verify the model and weights download without errors
- `python demo_syncnet.py --videofile data/example.avi --tmp_dir data/work/pytmp` — verify AV offset ~3, confidence ~10
- Run the full pipeline (`run_pipeline.py` → `run_syncnet.py` → `run_visualise.py`) on a test video
- Run CPU-only (`CUDA_VISIBLE_DEVICES=""`) to verify the fallback works
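The `subprocess.call(str, shell=True)` → `subprocess.run(list)` change in the summary can be sketched as follows; a Python child process stands in for ffmpeg so the demo runs anywhere:

```python
import subprocess
import sys

# With a list argument and no shell, the filename reaches the child as a
# single argv entry, so shell metacharacters in it are inert.
unsafe_name = 'clip.avi; rm -rf /'   # hostile filename, harmless here

result = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.argv[1])', unsafe_name],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # clip.avi; rm -rf /
```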