Make the video-clip models easier to use, and maybe set up video-clip-guided Stable Diffusion to showcase them