This project explores zero-shot human motion transfer using a combination of:
- Image Diffusion Models
- ControlNet
- IP-Adapter
- The method described in the paper RAVE (Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models).
The goal is to achieve realistic and temporally consistent human motion transfer without additional fine-tuning or training. By building on recent advances in diffusion-based video editing, the project preserves both the motion and semantic structure of the input content in an efficient and scalable manner. The key components are:
- ControlNet: Guides the generation process by conditioning on structural information (e.g., poses, edges, depth maps).
- IP-Adapter: Provides flexible, plug-and-play control over appearance features in diffusion models (see the loading sketch after this list).
- RAVE (Noise Shuffling Strategy):
  - A zero-shot video editing approach built on pre-trained text-to-image diffusion models.
  - Preserves spatio-temporal consistency by shuffling noise across frames, leading to coherent outputs.
  - Faster and more memory-efficient than traditional frame-by-frame editing methods.
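As a concrete illustration of how the first two conditioning mechanisms combine, here is a minimal sketch using the `diffusers` library with a Stable Diffusion 1.5 backbone and public OpenPose ControlNet and IP-Adapter weights. The checkpoints and scale value are assumptions for illustration, not necessarily this repo's exact setup:

```python
# Minimal sketch: ControlNet (pose structure) + IP-Adapter (appearance) in one
# diffusers pipeline. Checkpoints below are public defaults, assumed for
# illustration; the repo's actual models may differ.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# IP-Adapter injects reference-image features into the UNet's cross-attention.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.8)  # how strongly the reference appearance is enforced
```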
The pipeline works as follows:
- Input: A reference image (source appearance) and a target video (motion).
- Feature Extraction:
  - Pose or motion control maps are extracted from the video frames.
- Conditioning (sketched after this list):
  - ControlNet is used to condition the diffusion process on the extracted pose maps.
  - IP-Adapter injects the appearance information from the reference image.
- Diffusion-based Generation:
  - A pre-trained text-to-image diffusion model is leveraged.
  - RAVE's noise shuffling strategy ensures temporal coherence across video frames without requiring retraining.
- Output: A new video in which the human subject adopts the motion from the target video while preserving the appearance of the reference image.
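The extraction and conditioning steps can be sketched as follows, assuming `controlnet_aux` for OpenPose detection and the `pipe` from the previous sketch; `frames` (decoded video frames) and `reference` (the reference image) are assumed to be PIL images, and RAVE's grid/shuffle handling is omitted for brevity:

```python
# Sketch of per-frame feature extraction and conditioned generation.
# Assumes `pipe` from the previous sketch; `frames` and `reference` are PIL
# images the caller has already loaded. Note: generating frames independently
# like this lacks temporal coherence — RAVE's noise shuffling (illustrated
# further below) is what ties the frames together in the full pipeline.
from controlnet_aux import OpenposeDetector

pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_maps = [pose_detector(frame) for frame in frames]  # feature extraction

edited_frames = [
    pipe(
        prompt="a person, photorealistic",  # illustrative prompt
        image=pose_map,                     # ControlNet: pose conditioning
        ip_adapter_image=reference,         # IP-Adapter: appearance conditioning
        num_inference_steps=30,
    ).images[0]
    for pose_map in pose_maps
]
```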
Key features:
- Zero-Shot: No retraining or fine-tuning needed on custom datasets.
- High Quality: Realistic transfer of appearance and motion with minimal artifacts.
- Temporal Consistency: Thanks to the noise shuffling technique from RAVE (illustrated after this list).
- Memory Efficiency: Suitable for processing longer video sequences without high resource demands.
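To make the noise shuffling idea concrete, here is a conceptual sketch (not this repo's actual code): RAVE tiles the latents of several frames into grids so the diffusion model denoises them jointly as one image, and randomizes which frames share a grid at every denoising step, so appearance information propagates across the whole clip:

```python
# Conceptual sketch of RAVE-style noise shuffling over frame latents.
# `latents`: [F, C, H, W]; groups of grid_size**2 frames are tiled into one
# latent "image", denoised jointly, untiled, and restored to original order.
# `denoise_grid` stands in for one denoising step of the diffusion model.
import torch

def shuffled_denoise_step(latents: torch.Tensor, denoise_grid, grid_size: int = 2):
    f, c, h, w = latents.shape
    assert f % grid_size**2 == 0, "frame count must fill whole grids"
    perm = torch.randperm(f)              # fresh random frame order each step
    shuffled = latents[perm]
    # Tile: [F, C, H, W] -> [F/gs^2, C, gs*H, gs*W]
    grids = shuffled.reshape(-1, grid_size, grid_size, c, h, w)
    grids = grids.permute(0, 3, 1, 4, 2, 5).reshape(-1, c, grid_size * h, grid_size * w)
    grids = denoise_grid(grids)           # frames in a grid are denoised jointly
    # Untile back to per-frame latents and undo the shuffle.
    frames = grids.reshape(-1, c, grid_size, h, grid_size, w)
    frames = frames.permute(0, 2, 4, 1, 3, 5).reshape(f, c, h, w)
    return frames[perm.argsort()]
```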
To get started, clone the repository and install the dependencies:

```bash
git clone https://github.com/your-username/human-motion-transfer-diffusion.git
cd human-motion-transfer-diffusion
pip install -r requirements.txt
```
Prepare the inputs: a reference image (source appearance) and a target video (motion input). Then update the config file in the `configs` directory to point to them.
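The exact config schema is defined by this repo; an illustrative example (all keys hypothetical) might look like:

```yaml
# Illustrative only — key names are hypothetical; see configs/IP-controlnet.yaml
# in this repo for the actual schema.
reference_image: inputs/reference.png   # source appearance
target_video: inputs/motion.mp4         # motion input
controlnet_type: openpose
ip_adapter_scale: 0.8
num_inference_steps: 30
output_dir: results/
```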
Run the pipeline:

```bash
python run_experiment.py configs/IP-controlnet.yaml
```
References:
- RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
- ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Future work:
- Explore background and lighting transfer alongside motion.
- Fine-grained control over transfer strength (partial vs full motion adaptation).
Thanks to the developers and researchers behind ControlNet, IP-Adapter, and RAVE for making their models and methods publicly available.