Emosical: An Emotion-Annotated Musical Theatre Dataset

(Pending)

Accepted to EMNLP Findings, 2024

Info

This repository contains the files and automated scripts needed to produce the Emosical dataset.

Supported movies in the current version (download links): Aladdin (1992), Aladdin (2019), Beauty and the Beast (2017), Cats (1998), Chicago (2002), Frozen (2014), Frozen 2 (2019), Frozen Fever (2015), Jesus Christ Superstar (2012), Kinky Boots (2019), La La Land (2016), Moana (2016), Mulan (1998), Peter Pan (1953), Tangled (2011), The Little Mermaid (1989), The Nightmare Before Christmas (1993), The Phantom of the Opera (2011), Tick Tick Boom (2021), Trevor (2021)

How to

Clone the repository recursively so that the Demucs and SGMSE submodules, which are used for audio processing, are fetched as well.
```
 git clone --recursive https://github.com/gillosae/emosical.git
```
Download the movie files from the links provided above.
Place each movie file under data/raw/theatre/. Each movie's file name must match the name of its corresponding srt file in data/raw/srt/ (for example, aladdin.mov pairs with aladdin.srt).
Then run the following script to produce the dataset automatically.
```
 python run.py
```
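Because the movie and srt file names have to match, it can help to check the pairing before starting a long run. The following is a minimal sketch and not part of the repository: the helper name `find_unmatched` is hypothetical, and it assumes that "matching" means identical file stems (the name without its extension), as in aladdin.mov and aladdin.srt.

```python
from pathlib import Path


def find_unmatched(theatre_dir, srt_dir):
    """Compare file stems in the two folders.

    Hypothetical helper (not from run.py). Returns a tuple
    (movies_without_srt, srts_without_movie), both sorted lists of stems.
    Assumption: a movie and its subtitles share the same stem.
    """
    theatre_dir, srt_dir = Path(theatre_dir), Path(srt_dir)
    # Any regular, non-hidden file in theatre/ counts as a movie file.
    movie_stems = {
        p.stem
        for p in theatre_dir.iterdir()
        if p.is_file() and not p.name.startswith(".")
    }
    # Only *.srt files are considered on the subtitle side.
    srt_stems = {p.stem for p in srt_dir.glob("*.srt")}
    return sorted(movie_stems - srt_stems), sorted(srt_stems - movie_stems)


# Example call from the repository root:
# no_srt, no_movie = find_unmatched("data/raw/theatre", "data/raw/srt")
```

If both returned lists are empty, every movie has a subtitle file with the same stem and vice versa.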

Dataset Structure

Before:

├── data/
│   └── raw/
│       ├── theatre/
│       │   ├── aladdin.mov
│       │   └── ...
│       └── srt/
│           ├── aladdin.srt
│           └── ...
└── metadata/
    ├── number_info.csv
    ├── global_persona/
    │   ├── aladdin.yaml
    │   └── ...
    └── scene_summarization/
        ├── aladdin.yaml
        └── ...

After:

├── data/
│   ├── raw/
│   │   ├── theatre/
│   │   │   ├── aladdin.mov
│   │   │   └── ...
│   │   └── srt/
│   │       ├── aladdin.srt
│   │       └── ...
│   ├── audio/ 
│   │   ├── aladdin/
│   │   │   ├── 1.wav
│   │   │   └── ...
│   │   └── ...
│   ├── video/
│   └── text/
└── metadata/
    ├── number_info.csv
    ├── global_persona/
    │   ├── aladdin.yaml
    │   └── ...
    └── scene_summarization/
        ├── aladdin.yaml
        └── ...
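
After processing, a quick way to confirm that audio clips were produced for each movie is to count the wav files per folder. This is a minimal sketch, not part of the repository: the helper name `count_clips` is hypothetical, and it assumes only the data/audio/<movie>/<n>.wav layout shown in the tree above.

```python
from pathlib import Path


def count_clips(audio_root):
    """Return {movie_folder_name: number_of_wav_files} for each subfolder.

    Hypothetical helper (not from run.py); it assumes one subfolder per
    movie under audio_root, each holding the extracted .wav clips.
    """
    audio_root = Path(audio_root)
    counts = {}
    for movie_dir in sorted(p for p in audio_root.iterdir() if p.is_dir()):
        counts[movie_dir.name] = len(list(movie_dir.glob("*.wav")))
    return counts


# Example call from the repository root:
# for movie, n in count_clips("data/audio").items():
#     print(movie, n)
```

A movie folder that reports zero clips, or that is missing from the result, is a hint that its source file or srt file was not picked up.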
