steeedd/EgocentricVision

EgocentricVision

Egocentric videos are long and unstructured, making information retrieval challenging. This project extends temporal localization by generating textual answers from relevant video segments, enabling efficient query processing and improving interpretability in video understanding.

Features

Two Model Architectures

  • VSLBase (simplified) and VSLNet (with Query-Guided Highlighting)
  • Supports both Omnivore and EgoVLP features
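The idea behind VSLNet's Query-Guided Highlighting is to weigh each video feature by a query-conditioned relevance score before span prediction. Below is a minimal pure-Python sketch of that idea only; the real module is a learned neural layer, and the dot-product scoring and all names here are illustrative, not this repository's API:

```python
import math

def sigmoid(x):
    """Logistic function, squashing a raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def query_guided_highlight(video_feats, query_feat):
    """Scale each video feature vector by a relevance score w.r.t. the query.

    Relevance here is a raw dot product passed through a sigmoid;
    VSLNet learns this scoring with trainable parameters.
    """
    scores = [sigmoid(sum(v * q for v, q in zip(feat, query_feat)))
              for feat in video_feats]
    highlighted = [[s * v for v in feat]
                   for s, feat in zip(scores, video_feats)]
    return scores, highlighted
```

Features aligned with the query get scores above 0.5 and are amplified relative to irrelevant ones, which is how highlighting steers the downstream span predictor.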

End-to-End Pipeline

  1. Localizes relevant video segments
  2. Generates textual answers
  3. Evaluates with ROUGE/METEOR metrics
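The three stages above can be sketched as follows. Every function name is a hypothetical stand-in for illustration (the project's actual localizer is VSLNet/VSLBase, and real evaluation uses standard ROUGE/METEOR implementations, not this toy scorer):

```python
def localize(query, video_frames):
    """Stand-in localizer: returns a (start, end) frame span for the query.
    A real model scores candidate spans against query features."""
    end = min(len(query.split()) * 4, len(video_frames))
    return 0, end

def generate_answer(segment_frames):
    """Stand-in generator: a real model decodes text from segment features."""
    return " ".join(segment_frames)

def rouge1_f(candidate, reference):
    """Toy unigram-overlap ROUGE-1 F1 score between two strings."""
    cand, ref = candidate.split(), reference.split()
    overlap = sum(min(cand.count(t), ref.count(t)) for t in set(cand))
    if not overlap:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Chaining them mirrors the pipeline: `start, end = localize(query, frames)`, then `answer = generate_answer(frames[start:end])`, then `rouge1_f(answer, reference_answer)` for the metric step.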

Optimized for Egocentric Videos

  • Handles long, unstructured first-person recordings
  • Focuses computational resources on key moments
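Focusing compute on key moments depends on accurate temporal localization, which is conventionally scored with temporal IoU between predicted and ground-truth spans (e.g. Recall@1 at IoU 0.5). The helper below is a generic sketch of that standard metric, not code from this repository:

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) time spans in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def hit_at_iou(pred, gt, threshold=0.5):
    """A prediction counts as correct if it overlaps ground truth by >= threshold."""
    return temporal_iou(pred, gt) >= threshold
```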
