This project contains code for training models on the tasks below. Given egocentric video inputs, the goal of these tasks is to understand the camera wearer's activity in terms of hand and object interaction.[^1]
Inspired by the "state change object detection" task, this task extends object detection to the level of semantic segmentation: the objective is to determine, at the pixel level, which object is undergoing a state change.
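As a rough sketch of what pixel-level state-change detection involves, the snippet below shows a toy PyTorch segmentation head trained with a per-pixel binary cross-entropy. The class name, channel count, and loss choice are illustrative assumptions, not this repository's actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateChangeSegHead(nn.Module):
    """Toy 1x1-conv head: per-pixel logit for "this pixel belongs to
    the object undergoing a state change" vs. background.
    (Hypothetical sketch, not the model used in this repository.)"""

    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.classifier = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W) feature map from some backbone
        return self.classifier(features)  # (B, 1, H, W) mask logits

head = StateChangeSegHead()
logits = head(torch.randn(2, 256, 64, 64))              # dummy features
masks = torch.randint(0, 2, (2, 1, 64, 64)).float()     # dummy GT masks
loss = F.binary_cross_entropy_with_logits(logits, masks)
```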
This task combines the "object state change classification" and "PNR temporal localization" tasks, which handle keyframe detection given a video clip. The objective is to detect the keyframe at which the object passes the point of no return (PNR), i.e., the point at which the object's state change can no longer be reversed.
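As a hedged illustration (not this repository's implementation): given per-frame PNR scores from some temporal model, keyframe detection reduces to picking the top-scoring frame index.

```python
import torch

def localize_pnr(frame_scores: torch.Tensor) -> int:
    """Return the index of the frame with the highest predicted
    PNR score, i.e. the detected keyframe. (Minimal sketch.)"""
    return int(torch.argmax(frame_scores).item())

# Per-frame scores from some (hypothetical) model over a 16-frame clip:
scores = torch.tensor([.05, .10, .15, .20, .90, .30, .10, .05,
                       .05, .05, .05, .05, .05, .05, .05, .05])
print(localize_pnr(scores))  # -> 4
```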
- To use the CUDA build of the PyTorch libraries, run the command below. It replaces the CPU-only PyTorch registered in Poetry with the CUDA 11.8 build (a quick way to verify the swap follows this list).
```sh
$ make gpu_setting
```

- Download the Ego4D CLI.

```sh
$ pip install ego4d
```

- Run the command below (taken from the CLI's basic usage).

```sh
$ ego4d --output_directory="~/ego4d_data" --datasets full_scale annotations --metadata
```

- Clone the EgoHOS repository.

```sh
$ git clone https://github.com/owenzlz/EgoHOS
```

- Download the datasets using the provided script.

```sh
$ bash download_datasets.sh
```

- Download the dataset from the Development Kit section.
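Once setup is done, a generic way to confirm that Poetry now resolves the CUDA build of PyTorch is the standard check below (e.g. run it with `poetry run python`; this is stock PyTorch, not a script from this repository).

```python
import torch

print(torch.__version__)          # should end in "+cu118" after the swap
print(torch.cuda.is_available())  # True once the CUDA build is active
```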
[^1]: K. Grauman, et al. Ego4D: Around the world in 3,000 hours of egocentric video. arXiv preprint arXiv:2110.07058, 2021.