(Pending)
Accepted to EMNLP Findings, 2024
This repository contains the files and automated scripts needed to produce the Emosical dataset.
Supported movies in the current version (download links): Aladdin (1992), Aladdin (2019), Beauty and the Beast (2017), Cats (1998), Chicago (2002), Frozen (2014), Frozen 2 (2019), Frozen Fever (2015), Jesus Christ Superstar (2012), Kinky Boots (2019), La La Land (2016), Moana (2016), Mulan (1998), Peter Pan (1953), Tangled (2011), The Little Mermaid (1989), The Nightmare Before Christmas (1993), The Phantom of the Opera (2011), Tick Tick Boom (2021), Trevor (2021)
- Clone our repository recursively so that Demucs and SGMSE, which are used for audio processing, are fetched as well.

      git clone --recursive https://github.com/gillosae/emosical.git

- Download the movie files from the links provided above.
- Place your movie files under data/raw/theatre/. The name of each movie file should match the name of the corresponding srt file in data/raw/srt/.
- Then run the following command to produce the data automatically.

      python run.py
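
Since the file names in data/raw/theatre/ have to match those in data/raw/srt/, it can help to check the pairing before starting a long run. The following is a minimal sketch and not part of the repository: the helper name `find_unmatched` is hypothetical, and it assumes that "matching" means the same file name without its extension (for example aladdin.mov and aladdin.srt), compared case-sensitively. How run.py actually pairs the files may differ.

```python
from pathlib import Path


def find_unmatched(theatre_dir, srt_dir):
    """Return (movies_without_srt, srts_without_movie) as sorted lists of stems.

    Hypothetical helper: assumes files are paired by stem, i.e. the file
    name without its extension.
    """
    theatre, srt = Path(theatre_dir), Path(srt_dir)
    movie_stems = set()
    if theatre.is_dir():
        # Skip hidden files such as .DS_Store.
        movie_stems = {
            p.stem for p in theatre.iterdir()
            if p.is_file() and not p.name.startswith(".")
        }
    srt_stems = {p.stem for p in srt.glob("*.srt")} if srt.is_dir() else set()
    return sorted(movie_stems - srt_stems), sorted(srt_stems - movie_stems)


if __name__ == "__main__":
    missing_srt, missing_movie = find_unmatched("data/raw/theatre", "data/raw/srt")
    for stem in missing_srt:
        print(f"no subtitle file for movie: {stem}")
    for stem in missing_movie:
        print(f"no movie file for subtitle: {stem}")
```

Run from the repository root, the script prints nothing when every movie has a subtitle file and vice versa. Subtitle files without a movie are expected if you only downloaded some of the supported movies.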
Before:

    ├── data/
    │   └── raw/
    │       ├── theatre/
    │       │   ├── aladdin.mov
    │       │   └── ...
    │       └── srt/
    │           ├── aladdin.srt
    │           └── ...
    └── metadata/
        ├── number_info.csv
        ├── global_persona/
        │   ├── aladdin.yaml
        │   └── ...
        └── scene_summarization/
            ├── aladdin.yaml
            └── ...

After:

    ├── data/
    │   ├── raw/
    │   │   ├── theatre/
    │   │   │   ├── aladdin.mov
    │   │   │   └── ...
    │   │   └── srt/
    │   │       ├── aladdin.srt
    │   │       └── ...
    │   ├── audio/
    │   │   ├── aladdin/
    │   │   │   ├── 1.wav
    │   │   │   └── ...
    │   │   └── ...
    │   ├── video/
    │   └── text/
    └── metadata/
        ├── number_info.csv
        ├── global_persona/
        │   ├── aladdin.yaml
        │   └── ...
        └── scene_summarization/
            ├── aladdin.yaml
            └── ...
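
To confirm that a run produced output, you can count the audio clips generated for each movie. This is a rough sketch, not a script shipped with the repository: the function name `count_clips` is hypothetical, and it assumes only what the tree above shows, namely one directory per movie under data/audio/ containing .wav files. The contents of data/video/ and data/text/ are not shown above, so they are not checked here.

```python
from pathlib import Path


def count_clips(audio_dir):
    """Map each movie directory under audio_dir to its number of .wav files."""
    root = Path(audio_dir)
    if not root.is_dir():
        # Nothing has been generated yet (or the path is wrong).
        return {}
    return {
        movie.name: sum(1 for _ in movie.glob("*.wav"))
        for movie in sorted(root.iterdir())
        if movie.is_dir()
    }


if __name__ == "__main__":
    for name, n_clips in count_clips("data/audio").items():
        print(f"{name}: {n_clips} clips")
```

A movie that is missing from the output, or that reports zero clips, is a hint that its raw file or subtitle file was not picked up.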