This is the released code and models for CUMD.
- 2022-07-11: The project is about to be released; please wait.
- 2022-07-12: The depth estimation from RGB videos is uploaded.
- 2022-07-13: The source code for inference and training is uploaded.
- 2022-07-xx: To be continued.
The hardware and software requirements are given below.
CPU: Intel(R) Core(TM) i7-6900K CPU @ 3.20GHz
GPU: GeForce GTX 1080 Ti
CUDA Version: 10.2
OS: Ubuntu 16.04.6 LTS
Configure the virtual environment on Ubuntu.
- Create a virtual environment with Python 3.6
conda create -n asvp python=3.6
conda activate asvp
- Install the requirements (note that we use tensorflow-gpu==1.10.0)
pip install -r requirements.txt
- Additionally install ffmpeg
conda install x264 ffmpeg -c conda-forge
- In addition, the requirements for the pre-trained depth estimation models must be installed.
Please refer to the installation instructions for Pre-trained Depth Estimation.
Note that the virtual environment above is created on Ubuntu.
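As a quick check that the environment matches the versions above, you can run the short snippet below inside the activated asvp environment; it only uses standard TensorFlow 1.x calls.

# Check the TensorFlow build and GPU visibility (TF 1.x API).
import tensorflow as tf

print("TensorFlow version:", tf.__version__)         # expected: 1.10.0
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPU available:", tf.test.is_gpu_available())  # should be True on the GTX 1080 Ti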
The datasets are the KTH human action dataset and the BAIR action-free robot pushing dataset. To reproduce the experiments, the processed datasets should be downloaded and prepared as follows:
- For KTH, the raw data and the subsequence file should be downloaded first. For this submission, please temporarily download them from:
raw data and subsequence file. After downloading, move all .zip and .tar.gz files into the ./data directory, and run
bash data/preprocess_kth.sh
All preprocessed frames, split into subsequences, are then placed in ./data/kth/processed.
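As an optional sanity check after preprocessing, the snippet below counts the extracted sequences and frames; the layout under ./data/kth/processed and the .png extension are assumptions here, so adjust the glob pattern to whatever preprocess_kth.sh actually writes.

# Optional check: count sequences and frames under ./data/kth/processed
# (directory layout and image extension are assumptions; adjust the pattern if needed).
import glob
import os

root = "./data/kth/processed"
frames = glob.glob(os.path.join(root, "**", "*.png"), recursive=True)
sequences = {os.path.dirname(f) for f in frames}
print("sequences:", len(sequences))
print("frames:", len(frames))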
- Depth data are obtained by estimation with the pre-trained parameters:
cd LeRes
python ./tools/test_depth_kth.py --load_ckpt res101.pth --backbone resnext101
cd ..
The details of LeRes are available here.
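Before converting the data, it can help to look at one of the estimated depth maps. The snippet below loads a single map and prints its value range; the output path and the PNG format are assumptions about what test_depth_kth.py writes, so point depth_path at one of the files it actually produced.

# Inspect one estimated depth map (path and format are assumptions about the LeRes output).
import numpy as np
from PIL import Image

depth_path = "path/to/one_depth_map.png"   # hypothetical path; use a real output file
depth = np.array(Image.open(depth_path)).astype(np.float32)
print("shape:", depth.shape, "min:", depth.min(), "max:", depth.max())

# Normalize to 8 bits for a quick visual check.
vis = (depth - depth.min()) / np.maximum(depth.max() - depth.min(), 1e-6) * 255.0
Image.fromarray(vis.astype(np.uint8)).save("depth_vis.png")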
- Run the command below to convert the images into tfrecords; for details, please refer to ASVP (a small check of the produced tfrecords is sketched after this step).
bash data/kth2tfrecords.sh
...
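To verify the produced tfrecords, the TF 1.x snippet below counts the records in one file and lists the feature keys of the first example; the output file pattern and feature names depend on the ASVP-style conversion script, so both are assumptions here.

# Count records and list feature keys in one tfrecord file (TF 1.x API).
# The glob pattern is an assumption; point it at the files kth2tfrecords.sh produced.
import glob
import tensorflow as tf

files = glob.glob("data/kth/*.tfrecord*")
assert files, "no tfrecord files found; check the output path of kth2tfrecords.sh"
count = 0
for i, record in enumerate(tf.python_io.tf_record_iterator(files[0])):
    if i == 0:
        example = tf.train.Example.FromString(record)
        print("feature keys:", sorted(example.features.feature.keys()))
    count += 1
print(files[0], "contains", count, "records")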
The released models should be downloaded and placed as follows:
- ./pretrained/pretrained_models/kth/ours_cumd
- ./pretrained/pretrained_models/bair_action_free/ours_cumd
For the pre-trained baseline models, please refer to ASVP.
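As a quick check that a downloaded checkpoint is intact, the TF 1.x snippet below lists a few of its variables; the checkpoint prefix used here is an assumption, so point it at the files you actually placed.

# List a few variables from a downloaded checkpoint (TF 1.x API).
import tensorflow as tf

ckpt_prefix = "pretrained/pretrained_models/kth/ours_cumd/model-kth"  # assumed prefix; adjust to your files
reader = tf.train.NewCheckpointReader(ckpt_prefix)
shapes = reader.get_variable_to_shape_map()
for name in sorted(shapes)[:10]:
    print(name, shapes[name])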
- To run our released model on KTH, please run the command below (a small check of the saved predictions is sketched after this list):
CUDA_VISIBLE_DEVICES=1 python scripts/evaluate.py --input_dir data/kth --dataset_hparams sequence_length=30 --checkpoint logs/kth/ours_cumd/model-kth --mode test --results_dir results_test_samples/kth --batch_size 3
- For running the baseline, please refer to ASVP.
- To run our released model on BAIR action-free, please run
CUDA_VISIBLE_DEVICES=0 python scripts/evaluate.py --input_dir data/bair --dataset_hparams sequence_length=22 --checkpoint logs/bair_action_free/ours_cumd/model-bair --mode test --results_dir results_test_samples/bair_action_free --batch_size 8
- For running the baseline, please refer to ASVP.
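After evaluation, the predicted frames under results_test_samples can be compared with the ground truth. The sketch below computes PSNR for a single frame pair; the file names are placeholders, since the exact layout written by evaluate.py is not spelled out here, and the metrics reported by evaluate.py itself remain the reference.

# Minimal PSNR check between one predicted frame and its ground-truth frame.
# The file paths are placeholders; adjust them to what evaluate.py actually saves.
import numpy as np
from PIL import Image

pred = np.array(Image.open("results_test_samples/kth/example_pred_frame.png"), dtype=np.float32)
gt = np.array(Image.open("results_test_samples/kth/example_gt_frame.png"), dtype=np.float32)

mse = np.mean((pred - gt) ** 2)
psnr = 10.0 * np.log10((255.0 ** 2) / max(mse, 1e-10))
print("PSNR: %.2f dB" % psnr)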
Active pattern mining is necessary only for training; there is no need to do it if you only run inference with the released models.
- To separate active patterns and non-active patterns from videos, please refer to the details (a rough illustrative sketch is also given at the end of this list).
- After all active patterns and non-active patterns are mined, the images are converted to tfrecords for training:
bash data/kth2tfrecords_ap.sh
The final data can be downloaded from drive.
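The actual mining procedure is described in the linked details. Purely to illustrate the idea of splitting a frame into active and non-active content, the sketch below uses simple frame differencing with a made-up threshold; this is not the method used in this project.

# Illustration only: split a frame into "active" (moving) and "non-active" (static) parts
# by frame differencing. This is NOT the mining procedure used in this project.
import numpy as np
from PIL import Image

prev = np.array(Image.open("frame_000.png").convert("L"), dtype=np.float32)
curr = np.array(Image.open("frame_001.png").convert("L"), dtype=np.float32)

motion = np.abs(curr - prev) > 15.0                   # hypothetical threshold
active = np.where(motion, curr, 0).astype(np.uint8)
static = np.where(motion, 0, curr).astype(np.uint8)

Image.fromarray(active).save("active_001.png")
Image.fromarray(static).save("non_active_001.png")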
- To train SVP in RGB space with MMDN, run
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_vp2.py --input_dir data/kth --dataset kth --model cumd --model_hparams_dict hparams/kth/ours_cumd/model_hparams.json --output_dir logs/kth/save_rgb
Note that, in this stage of training, all training data should be prepared from RGB videos as in 2.3, which is the same setting as in ASVP.
- For training our model with active patterns and non-active patterns on BAIR action-free, please run
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_vp2.py --input_dir data/bair --dataset bair --model cumd --model_hparams_dict hparams/bair_action_free/ours_cumd/model_hparams.json --output_dir logs/bair_action_free/save_rgb
- To train SVP in depth space with MMDN, run
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_vp2.py --input_dir data/kth --dataset kth --model cumd --model_hparams_dict hparams/kth/ours_cumd/model_hparams.json --output_dir logs/kth/save_depth
Note that, in this stage of training, all training data should be prepared from depth videos as in 2.3.
- For training our model with active patterns and non-active patterns on BAIR action-free, please run
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_vp2.py --input_dir data/bair --dataset bair --model cumd --model_hparams_dict hparams/bair_action_free/ours_cumd/model_hparams.json --output_dir logs/bair_action_free/save_depth
- For training MCN and UCN during prediction, run
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_vp2_fusion.py --input_dir data/kth --dataset kth --model cumd --model_hparams_dict hparams/kth/ours_cumd/model_hparams.json --output_dir logs/kth/ours_cumd --checkpoint logs/kth/save_rgb/model-rgb --checkpoint_d logs/kth/save_depth/model-depth
Please rename the pre-trained models from 4.1 and 4.2 to model-rgb and model-depth before training (a renaming sketch is given after this list).
- For training on BAIR action-free, please run
CUDA_VISIBLE_DEVICES=0,1 python scripts/train_vp2_fusion.py --input_dir data/bair --dataset bair --model cumd --model_hparams_dict hparams/bair_action_free/ours_cumd/model_hparams.json --output_dir logs/bair_action_free/ours_cumd --checkpoint logs/bair_action_free/save_rgb/model-rgb --checkpoint_d logs/bair_action_free/save_depth/model-depth
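A TensorFlow checkpoint consists of .index, .data-* and .meta files plus a "checkpoint" bookkeeping file, so renaming the models from 4.1 and 4.2 means copying those files under the new prefix. Below is a minimal sketch assuming that standard layout; the step number model-300000 is a made-up example, so replace it with the checkpoint you actually want to reuse.

# Copy a TF checkpoint under a new prefix and rewrite the "checkpoint" bookkeeping file.
# The old prefix (step number) below is a made-up example.
import glob
import os
import shutil

def rename_checkpoint(log_dir, old_prefix, new_prefix):
    for path in glob.glob(os.path.join(log_dir, old_prefix + ".*")):
        suffix = os.path.basename(path)[len(old_prefix):]   # ".index", ".meta", ".data-00000-of-00001"
        shutil.copy(path, os.path.join(log_dir, new_prefix + suffix))
    with open(os.path.join(log_dir, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % new_prefix)
        f.write('all_model_checkpoint_paths: "%s"\n' % new_prefix)

rename_checkpoint("logs/kth/save_rgb", "model-300000", "model-rgb")
rename_checkpoint("logs/kth/save_depth", "model-300000", "model-depth")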
Please note that the source code here is not yet well organized; a cleaned-up version of the training code will be released in the future.
This project is licensed under the MIT License - see the LICENSE.md file for details
