This is amazing work. Are there any training scripts specifically for audio-image generation tasks?