Conversation
Working on processing DiDeMo segments
Thinking out loud: we want to support the single-segment case. Not sure we should put effort into multi-segment multi-caption: I haven't seen any examples of it, and it seems like it's just bad organization of your dataset, so we don't really care about that.
Looked over the PR and have some thoughts, but I'll put the overarching idea here: I think we shouldn't have people write dataset-specific process_segment functions. Instead, they should organize the "segments" and "caption" columns of their csv or parquet so the information works with our general process_segment function. For example, if video 1 has segments 0s-25s, 25s-50s, 50s-75s with cap1, cap2, cap3, the csv should look like: segments: "(0,25),(25,50),(50,75)". Then, when our general process_segment function sees these columns, it takes the embedding, chops it up according to the segments, and chops the captions up as well.
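A minimal sketch of what that general function could look like. The helper names, the `fps` parameter, and the caption separator (";") are illustrative assumptions, not the PR's actual code:

```python
import numpy as np

def parse_segments(spec):
    # "(0,25),(25,50),(50,75)" -> [(0.0, 25.0), (25.0, 50.0), (50.0, 75.0)]
    return [tuple(map(float, pair.split(","))) for pair in spec.strip("()").split("),(")]

def process_segments(embeddings, segment_spec, caption_spec, fps=1):
    # Chop per-frame embeddings by segment boundaries and pair each chunk
    # with its caption (captions separated by ";").
    segments = parse_segments(segment_spec)
    captions = caption_spec.split(";")
    assert len(segments) == len(captions), "one caption per segment"
    return [
        (embeddings[int(s * fps):int(e * fps)], cap)
        for (s, e), cap in zip(segments, captions)
    ]
```

The point is that any dataset expressible in this column format flows through the same code path, with no per-dataset function.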
src/evaluation/retrieval.py (outdated)

    if segment:
        samp += 1
    if segment:
        segments = batch["meta"]["times"]  # change to ...['segment']
src/evaluation/retrieval.py (outdated)

    for c in cap.split(";"):  # multiple captions separated by ;
        toks.append(open_clip.tokenize(c))
        ground_truth.append(samp)
    if segment:
src/evaluation/retrieval.py (outdated)

    - def retrieval_evaluation(model_video, model_text, data, multicaption=False):
    + def retrieval_evaluation(model_video, model_text, data, multicaption=False, segment=False, process_segments=None):
Remove the process_segments and segment args; instead, just check for a "segments" key in batch["meta"].
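For illustration, the dispatch could look something like this. The batch layout, key names, and mean-pooling are assumptions, not the PR's code:

```python
import torch

def collect_retrieval_features(batches):
    # Branch on batch["meta"] instead of passing segment/process_segments args.
    video_feats, text_feats = [], []
    for batch in batches:
        emb = batch["embeddings"]   # (num_frames, dim) per-frame embeddings
        meta = batch["meta"]
        if "segments" in meta:
            # segment-aware path: one pooled feature + one caption per segment
            for start, end in meta["segments"]:
                video_feats.append(emb[start:end].mean(dim=0))
            text_feats.extend(meta["captions"])
        else:
            # whole-video path
            video_feats.append(emb.mean(dim=0))
            text_feats.append(meta["caption"])
    return torch.stack(video_feats), text_feats
```

With this shape, the caller never has to know which kind of dataset it is iterating over.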
src/evaluation/retrieval.py (outdated)

        return out

    def process_didemo_segments(embeddings, segments, seq_len=200):
No new function and no dataset-specific code; just move these operations into the `if "segments" in batch["meta"]` branch of the retrieval eval.
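Inlined, a generic version of that operation might look like the sketch below. The fractional-boundary convention and the padding semantics of `seq_len` are assumptions; this is not the DiDeMo-specific code the comment asks to remove:

```python
import torch

def pool_segment_features(embeddings, segments, seq_len=200):
    # embeddings: (seq_len, dim) per-frame features, padded to a fixed length
    # segments: (start, end) boundaries as fractions of the clip, so the same
    # code works regardless of a dataset's native time units
    feats = []
    for start, end in segments:
        lo = int(start * seq_len)
        hi = max(lo + 1, int(end * seq_len))  # guarantee a non-empty slice
        feats.append(embeddings[lo:hi].mean(dim=0))
    return torch.stack(feats)
```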
    # Pyre type checker
    .pyre/
    CLIP-DiDeMo/

    import open_clip
    import torch
    sys.path.insert(1, '/Users/daniel/Desktop/LAION_Videoclip/clip-video-encode')
    with torch.no_grad():
        for i, batch in enumerate(dataloader):
            if i == 3:
    toks.append(open_clip.tokenize(cap))
    ground_truth.append(samp)
    samp += 1
    all_video_features.append(video_embeddings.cpu())
    all_text_features.append(text_embeddings.cpu())