classifier is very slow

### Bug Description

The `Classifier` class has a `process_video` method that iterates over a video frame by frame (at a given sample rate), extracts each image and its feature vectors, and runs classification. We found that the method is very slow, even on GPU. I did some manual profiling, and I believe we should work on improving the video seeking code.

### Reproduction steps

With 
``` bash
$ clams source video:/some.mp4 | curl -d@- "localhost:23231?stopAt=120000"  # use first 12 secs only
```
the log says:
``` bash 
2024-02-08 12:22:25 http://apps.clams.ai/swt-detection/unresolvable INFO     139987732137536 Processing video d1                                                               
2024-02-08 12:22:38 http://apps.clams.ai/swt-detection/unresolvable INFO     139987732137536 Processing took 12.783082723617554 seconds
``` 
Hmm, that's real-time. This was measured on a GPU machine, and GPU utilization was confirmed. 

So I added some quick and simple profiling code:
``` python 
    def process_video(self, mp4_file: str) -> list:
        """Loops over the frames in a video and for each frame extracts the features
        and applies the classifier. Returns a list of predictions, where each prediction
        is an instance of numpy.ndarray."""
        featurizing_time = 0
        classifier_time = 0
        extract_time = 0
        seek_time = 0
        if self.debug:
            print(f'Processing {mp4_file}...')
            print(f'Labels: {self.prebin_labels}')
        logging.info(f'processing {mp4_file}...')
        predictions = []
        vidcap = cv2.VideoCapture(mp4_file)
        if not vidcap.isOpened():
            raise IOError(f'Could not open {mp4_file}')
        fps = round(vidcap.get(cv2.CAP_PROP_FPS), 2)
        fc = vidcap.get(cv2.CAP_PROP_FRAME_COUNT)
        dur = round(fc / fps, 3) * 1000
        for ms in range(0, sys.maxsize, self.sample_rate):
            if ms < self.start_at:
                continue
            if ms > self.stop_at:
                break
            t = time.time()
            vidcap.set(cv2.CAP_PROP_POS_MSEC, ms)
            seek_time += time.time() - t
            t = time.time()
            success, image = vidcap.read()
            extract_time += time.time() - t
            if not success:
                break
            img = Image.fromarray(image[:,:,::-1])
            t = time.time()
            features = self.featurizer.get_full_feature_vectors(img, ms, dur)
            featurizing_time += time.time() - t
            t = time.time()
            prediction = self.classifier(features).detach()
            prediction = Prediction(ms, self.prebin_labels, prediction)
            classifier_time += time.time() - t
            if self.debug:
                print(prediction)
            predictions.append(prediction)
        sys.stderr.write(f'Featurizing time: {featurizing_time:.2f} seconds\n')
        sys.stderr.write(f'Classifier time: {classifier_time:.2f} seconds\n')
        sys.stderr.write(f'Extract time: {extract_time:.2f} seconds\n')
        sys.stderr.write(f'Seeking time: {seek_time:.2f} seconds\n')
        return predictions
```

And with the same `curl` call, I got 
``` bash 
2024-02-10 02:38:13 http://apps.clams.ai/swt-detection/unresolvable INFO     139892458784320 Processing video d1
Featurizing time: 6.11 seconds
Classifier time: 0.05 seconds
Extract time: 0.07 seconds
Seeking time: 5.98 seconds
2024-02-10 02:38:26 http://apps.clams.ai/swt-detection/unresolvable INFO     139892458784320 Processing took 12.812922716140747 seconds
```
So it turns out that "moving the cursor" in the video takes about as much time as the complex CNN feature extraction (5.98 vs. 6.11 seconds).

The cv2 instance I was using [was compiled with ffmpeg](https://github.com/clamsproject/clams-python/blob/develop/container/opencv4.containerfile#L37), and to my knowledge, seeking to a keyframe is a constant-time operation (correct me if I'm wrong, please). 

I can only guess that using `POS_MSEC` instead of a frame number adds a big overhead. Further experiments are probably needed.
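A minimal sketch of what that experiment could look like, timing `set()` and `read()` separately for `CAP_PROP_POS_MSEC` and `CAP_PROP_POS_FRAMES`. The helper names `time_seeks` and `compare_seek_modes`, and the default `sample_rate_ms` and `stop_ms` values, are made up for illustration and are not part of the app. I have not run this against the video above, so it says nothing yet about which mode is faster.

``` python
import time


def time_seeks(vidcap, prop_id, positions):
    """Return (seek_seconds, read_seconds) accumulated over all positions.

    `vidcap` only needs `set(prop_id, value)` and `read()` methods, so it can
    be a cv2.VideoCapture or a stand-in object.
    """
    seek_time = 0.0
    read_time = 0.0
    for pos in positions:
        t = time.perf_counter()
        vidcap.set(prop_id, pos)
        seek_time += time.perf_counter() - t
        t = time.perf_counter()
        success, _ = vidcap.read()
        read_time += time.perf_counter() - t
        if not success:
            break
    return seek_time, read_time


def compare_seek_modes(mp4_file, sample_rate_ms=1000, stop_ms=12000):
    """Time the same sampled positions with both seek properties.

    The defaults are hypothetical; use whatever the app is configured with.
    """
    import cv2  # imported here so time_seeks() stays dependency-free

    timestamps = list(range(0, stop_ms + 1, sample_rate_ms))
    modes = [
        ('POS_MSEC', cv2.CAP_PROP_POS_MSEC, lambda ms, fps: ms),
        ('POS_FRAMES', cv2.CAP_PROP_POS_FRAMES,
         lambda ms, fps: round(ms * fps / 1000)),
    ]
    results = {}
    for name, prop_id, to_position in modes:
        # Open a fresh capture per mode so one run does not warm up the other.
        vidcap = cv2.VideoCapture(mp4_file)
        if not vidcap.isOpened():
            raise IOError(f'Could not open {mp4_file}')
        fps = vidcap.get(cv2.CAP_PROP_FPS)
        positions = [to_position(ms, fps) for ms in timestamps]
        results[name] = time_seeks(vidcap, prop_id, positions)
        vidcap.release()
    return results
```

Calling `compare_seek_modes('/some.mp4')` would return a dict mapping each mode name to its `(seek_seconds, read_seconds)` pair, which can be compared directly with the numbers in the log above.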

### Expected behavior

_No response_

### Screenshots

_No response_

### Additional context

https://forum.opencv.org/t/cap-prop-pos-frames-is-abnormal-slow/11651/3 (this is possibly related to #67 too) 
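If seeking turns out to be slow with either property, one alternative worth trying is to avoid `set()` entirely and read forward only, using `grab()` to step past unwanted frames and `retrieve()` to fetch the image at each sampled position. The sketch below is untested on real video and makes several assumptions: the function names are hypothetical, the frame rate is treated as constant, and `sample_rate_ms` is assumed to be at least one frame long. Whether this beats seeking will likely depend on the sample rate and keyframe spacing, since every skipped frame still has to be grabbed.

``` python
def sampled_timestamps(sample_rate_ms, start_ms, stop_ms):
    """Multiples of sample_rate_ms within [start_ms, stop_ms], in ms."""
    first = -(-start_ms // sample_rate_ms) * sample_rate_ms  # round up
    return range(first, stop_ms + 1, sample_rate_ms)


def read_sampled_frames(vidcap, fps, sample_rate_ms, start_ms, stop_ms):
    """Yield (ms, image) pairs by reading forward only, never calling set().

    `vidcap` needs `grab()` and `retrieve()` methods (e.g. cv2.VideoCapture).
    Assumes a constant frame rate and that consecutive timestamps map to
    different frame indices.
    """
    position = 0  # index of the frame the next grab() call will fetch
    for ms in sampled_timestamps(sample_rate_ms, start_ms, stop_ms):
        target = round(ms * fps / 1000)
        # Step past frames before the target without retrieving their images.
        while position < target:
            if not vidcap.grab():
                return
            position += 1
        # Grab the target frame itself, then retrieve its image.
        if not vidcap.grab():
            return
        position += 1
        success, image = vidcap.retrieve()
        if not success:
            return
        yield ms, image
```

In `process_video`, the loop body could then iterate over `read_sampled_frames(vidcap, fps, self.sample_rate, self.start_at, self.stop_at)` and keep the featurizer and classifier calls unchanged.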

video file info 
``` bash 
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/llc_data/clams/wgbh/Peabody/cpb-aacip-526-4b2x34nn45.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42isom
    creation_time   : 2020-01-22T10:27:50.000000Z
  Duration: 00:33:09.37, start: 0.000000, bitrate: 795 kb/s
  Stream #0:0[0x1](und): Video: h264 (Main) (avc1 / 0x31637661), none(progressive), 480x360, 657 kb/s, SAR 1:1 DAR 4:3, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time   : 2020-01-22T10:27:50.000000Z
      vendor_id       : TELE
      encoder         : AVC
  Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      creation_time   : 2020-01-22T10:27:50.000000Z
      vendor_id       : [0][0][0][0]
```
