Bug Description
The Classifier class has a process_video method that iterates over a video frame by frame (at a given sample rate), extracts each image, computes its feature vectors, and runs classification. We found that the method is very slow, even on a GPU. I did some manual profiling, and I believe we should work on improving the video seeking code.
Reproduction steps
With
$ clams source video:/some.mp4 | curl -d@- "localhost:23231?stopAt=120000" # use first 12 secs only
logging says:
2024-02-08 12:22:25 http://apps.clams.ai/swt-detection/unresolvable INFO 139987732137536 Processing video d1
2024-02-08 12:22:38 http://apps.clams.ai/swt-detection/unresolvable INFO 139987732137536 Processing took 12.783082723617554 seconds
Hmm, that's real-time. This was measured on a GPU machine, and GPU utilization was confirmed.
So I added some quick and simple profiling to process_video:
def process_video(self, mp4_file: str) -> list:
    """Loops over the frames in a video and for each frame extracts the features
    and applies the classifier. Returns a list of predictions, where each prediction
    is an instance of numpy.ndarray."""
    featurizing_time = 0
    classifier_time = 0
    extract_time = 0
    seek_time = 0
    if self.debug:
        print(f'Processing {mp4_file}...')
        print(f'Labels: {self.prebin_labels}')
    logging.info(f'processing {mp4_file}...')
    predictions = []
    vidcap = cv2.VideoCapture(mp4_file)
    if not vidcap.isOpened():
        raise IOError(f'Could not open {mp4_file}')
    fps = round(vidcap.get(cv2.CAP_PROP_FPS), 2)
    fc = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
    dur = round(fc / fps, 3) * 1000
    for ms in range(0, sys.maxsize, self.sample_rate):
        if ms < self.start_at:
            continue
        if ms > self.stop_at:
            break
        t = time.time()
        vidcap.set(cv2.CAP_PROP_POS_MSEC, ms)
        seek_time += time.time() - t
        t = time.time()
        success, image = vidcap.read()
        extract_time += time.time() - t
        if not success:
            break
        img = Image.fromarray(image[:,:,::-1])
        t = time.time()
        features = self.featurizer.get_full_feature_vectors(img, ms, dur)
        featurizing_time += time.time() - t
        t = time.time()
        prediction = self.classifier(features).detach()
        prediction = Prediction(ms, self.prebin_labels, prediction)
        classifier_time += time.time() - t
        if self.debug:
            print(prediction)
        predictions.append(prediction)
    sys.stderr.write(f'Featurizing time: {featurizing_time:.2f} seconds\n')
    sys.stderr.write(f'Classifier time: {classifier_time:.2f} seconds\n')
    sys.stderr.write(f'Extract time: {extract_time:.2f} seconds\n')
    sys.stderr.write(f'Seeking time: {seek_time:.2f} seconds\n')
    return predictions
And with the same curl call, I got
2024-02-10 02:38:13 http://apps.clams.ai/swt-detection/unresolvable INFO 139892458784320 Processing video d1
Featurizing time: 6.11 seconds
Classifier time: 0.05 seconds
Extract time: 0.07 seconds
Seeking time: 5.98 seconds
2024-02-10 02:38:26 http://apps.clams.ai/swt-detection/unresolvable INFO 139892458784320 Processing took 12.812922716140747 seconds
So it turns out that "moving the cursor" in the video takes as much time as the complex CNN feature extraction.
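The per-stage bookkeeping used for these numbers can also be written with a small context manager, which keeps the timing code out of the loop body. This is only a sketch: `StageTimer` and `stage_shares` are hypothetical names and are not part of the swt-detection code base.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        # perf_counter is monotonic, so it is safer than time.time() for intervals
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start


def stage_shares(totals):
    """Return each stage's fraction of the summed stage times."""
    total = sum(totals.values())
    if total == 0:
        return {name: 0.0 for name in totals}
    return {name: seconds / total for name, seconds in totals.items()}


if __name__ == '__main__':
    # Stage totals copied from the log output reported in this issue.
    reported = {'featurizing': 6.11, 'classifier': 0.05,
                'extract': 0.07, 'seek': 5.98}
    for name, share in stage_shares(reported).items():
        print(f'{name}: {share:.1%}')
```

Inside the loop, each stage would then be wrapped as `with timer.stage('seek'): vidcap.set(...)`. Applied to the reported totals, which sum to 12.21 seconds, seeking and featurizing each account for roughly half of the measured time.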
The cv2 build I was using was compiled with ffmpeg, and to my knowledge, seeking to a keyframe is a constant-time operation (please correct me if I'm wrong).
I can only guess that seeking by CAP_PROP_POS_MSEC instead of by frame number adds a big overhead. Further experiments are probably needed.
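A possible shape for that experiment is sketched below. It compares per-sample seeking by an arbitrary position property with a single seek followed by forward-only reads. The function names (`sampled_frame_indices`, `time_seeks`, `read_sampled_sequentially`) are hypothetical, and the sketch assumes OpenCV is installed and that the frame rate reported by the container is reliable. Whether the sequential variant is actually faster for this file is exactly what would need measuring; with H.264, a seek to a non-keyframe position generally has to decode forward from the preceding keyframe, which is one plausible source of the overhead.

```python
import time


def sampled_frame_indices(start_ms, stop_ms, sample_rate_ms, fps):
    """Map the sampled timestamps to frame indices, mirroring the loop in
    process_video (multiples of sample_rate_ms within [start_ms, stop_ms])."""
    return [int(ms * fps / 1000)
            for ms in range(0, stop_ms + 1, sample_rate_ms)
            if ms >= start_ms]


def time_seeks(path, prop, positions):
    """Seconds spent on set(prop, pos) + read() for each position."""
    import cv2  # lazy import so the pure helper above works without OpenCV
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f'Could not open {path}')
    elapsed = 0.0
    try:
        for pos in positions:
            t = time.perf_counter()
            cap.set(prop, pos)
            ok, _ = cap.read()
            elapsed += time.perf_counter() - t
            if not ok:
                break
    finally:
        cap.release()
    return elapsed


def read_sampled_sequentially(path, start_ms, stop_ms, sample_rate_ms):
    """Seek once, then move forward with grab() and read only target frames."""
    import cv2
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f'Could not open {path}')
    frames = []
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        # Drop repeated indices (possible when the sample rate is below one frame).
        targets = list(dict.fromkeys(
            sampled_frame_indices(start_ms, stop_ms, sample_rate_ms, fps)))
        if not targets:
            return frames
        cap.set(cv2.CAP_PROP_POS_FRAMES, targets[0])
        current = targets[0]
        for target in targets:
            # grab() advances one frame without returning the image
            while current < target:
                if not cap.grab():
                    return frames
                current += 1
            ok, image = cap.read()
            if not ok:
                break
            current += 1
            frames.append((target, image))
    finally:
        cap.release()
    return frames
```

For a comparison, `time_seeks(path, cv2.CAP_PROP_POS_MSEC, ms_positions)` and `time_seeks(path, cv2.CAP_PROP_POS_FRAMES, frame_positions)` can be run on the same sample points, and the sequential reader can be timed as a whole with `time.perf_counter()`.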
Expected behavior
No response
Screenshots
No response
Additional context
https://forum.opencv.org/t/cap-prop-pos-frames-is-abnormal-slow/11651/3 (this is possibly related to #67 too)
video file info
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/llc_data/clams/wgbh/Peabody/cpb-aacip-526-4b2x34nn45.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42isom
creation_time : 2020-01-22T10:27:50.000000Z
Duration: 00:33:09.37, start: 0.000000, bitrate: 795 kb/s
Stream #0:0[0x1](und): Video: h264 (Main) (avc1 / 0x31637661), none(progressive), 480x360, 657 kb/s, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 30k tbn (default)
Metadata:
creation_time : 2020-01-22T10:27:50.000000Z
vendor_id : TELE
encoder : AVC
Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
creation_time : 2020-01-22T10:27:50.000000Z
vendor_id : [0][0][0][0]