NOTE: This repo is deprecated. Please refer to https://github.com/laac-LSCP/analysis-service instead.

A Python-based daemon that periodically checks and schedules tasks from Echolalia.
Create a virtual environment somewhere, or use your Conda environment, and install the project:

```bash
pip install git+ssh://git@github.com/LAAC-LSCP/analysis-daemon.git
```

Note that you need an SSH key associated with LAAC. To run the daemon:

```bash
echolalia --config [path to configuration toml file] run-daemon
```
To update the database:

```bash
echolalia --config [path to configuration toml file] run-migrations
```

To manage tasks directly from the CLI, say for bug-fixing purposes, see:

```bash
echolalia --config [path to configuration toml file] task-manager --help
```

An example configuration.toml is given below. You'll need to create a file like this on your own system. Note that file paths should be absolute, not relative.
```toml
log_directory = "/Users/me/Desktop/echolalia_log"
conda_executable = "/Users/me/miniconda3/bin/activate"
output_folder = "/Users/me/echolalia"
script_wrapper = "/Users/me/Desktop/script_wrapper.sh"

[database]
url = "sqlite:///database.db"

[http]
base_url = "ECHOLALIA_REMOTE_SERVER_URL"
client_id = "MY_ID"
client_secret = "SECRET"

[jobs]
handler = "slurm"
partition = "echolalia"
use_slurm = true # true by default

[[scripts]]
name = "run vtc"
python_script_path = "/Users/me/Desktop/scripts/run_vtc.py"
bash_script_path = "/Users/me/Desktop/scripts/apply_vtc.sh"
env_name = "pyannote"
model_name = "vtc"

[[scripts]]
name = "run alice"
python_script_path = "/Users/me/Desktop/scripts/run_alice.py"
bash_script_path = "/Users/me/Desktop/scripts/apply_alice.sh"
env_name = "alice"
model_name = "alice"
```
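For reference, here is a minimal sketch of how such a file can be parsed and sanity-checked; `load_config` is a hypothetical helper (not part of the daemon's API) and assumes Python 3.11+ for the stdlib `tomllib`:

```python
# Minimal sketch: parse configuration.toml and check that paths are absolute.
# load_config is a hypothetical helper, not part of the daemon's API.
import tomllib  # stdlib since Python 3.11
from pathlib import Path


def load_config(path: str) -> dict:
    with open(path, "rb") as f:  # tomllib requires a binary file handle
        config = tomllib.load(f)
    # The README requires absolute paths, so fail fast on relative ones.
    for key in ("log_directory", "conda_executable", "output_folder", "script_wrapper"):
        if not Path(config[key]).is_absolute():
            raise ValueError(f"{key} must be an absolute path, got {config[key]!r}")
    return config


config = load_config("/Users/me/configuration.toml")
print([s["name"] for s in config["scripts"]])  # ['run vtc', 'run alice']
```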
It is recommended that you use the scripts from the `scripts` folder in the repo.
While working on this system, we realised we couldn't run a per-file SLURM job, for example running VTC by launching a new job for each file. This was due to memory requirements: while the smaller models could use this pattern, W2V2 presented a problem because it is 1) very large and 2) designed to be run over countless tiny files; as a result, the cost of bootstrapping the model for each job would have been too high.
We have opted for a compromise that involves a few anti-patterns and requires careful reading if you want to add new scripts. Instead of asking SLURM for status updates, the daemon continuously checks a log file created by the running script. Running scripts must therefore take in a log directory. The daemon assumes scripts adhere to a strict interface that looks something like:

```bash
python3 vtc.py --task-id [task id] --bash-script [the .sh script used by the model] --input-folder [input_dir] --dataset [dataset name] --echolalia-folder [folder as in config] -i [file 1] -i [file 2] ...
```

Scripts create status logs (`status.log`) that record each file's failure or success in a specific format:
```
SUCCESS - [some descriptive string] - [absolute file path]
ERROR - [some descriptive string] - [absolute file path] - [stack trace]
```

This output format must be adhered to for the service to work; the service periodically checks these outputs for the running tasks.
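To make the contract concrete, here is a hedged sketch of a wrapper script honouring this interface and log format. The argument handling and the call into the bash script are illustrative only (the real scripts live in the repo's `scripts` folder), and the `status.log` location is inferred from the output examples further down:

```python
# Illustrative wrapper sketch, not one of the repo's actual scripts.
import subprocess
import traceback
from pathlib import Path

import click  # the extra dependency mentioned below


@click.command()
@click.option("--task-id", required=True)
@click.option("--bash-script", required=True)
@click.option("--input-folder", required=True)
@click.option("--dataset", required=True)
@click.option("--echolalia-folder", required=True)
@click.option("-i", "inputs", multiple=True, required=True)
def main(task_id, bash_script, input_folder, dataset, echolalia_folder, inputs):
    # status.log sits next to the outputs, matching the examples below.
    out_dir = Path(echolalia_folder) / "outputs" / dataset / task_id
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "status.log", "a") as log:
        for wav in inputs:
            try:
                # How arguments are passed to the model's bash script is an
                # assumption; a real script would also mirror input_folder's
                # structure inside out_dir (see the examples below).
                subprocess.run(["bash", bash_script, wav, str(out_dir)], check=True)
                log.write(f"SUCCESS - processed - {Path(wav).resolve()}\n")
            except Exception:
                trace = " | ".join(traceback.format_exc().splitlines())
                log.write(f"ERROR - failed - {Path(wav).resolve()} - {trace}\n")
            log.flush()  # the daemon polls this file, so flush after each entry


if __name__ == "__main__":
    main()
```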
Finally, to run any of the models, you must install the associated Conda environments; each model has its own. More information on getting the models to work:

- https://github.com/MarvinLvn/voice-type-classifier/ for VTC
- https://github.com/orasanen/ALICE for ALICE
Note that the Python wrapper scripts rely on some libraries of their own, so some dependencies may be missing from the model environments (typically only `click`). You just need to `pip install` them into your Conda environments, or change the Conda environment files to include them.
Another unfortunate pattern is the need for several nested scripts. The original bash scripts for the models are often clunky to work with. We created wrappers in bash itself, but these are hard to test, so we created Python wrappers for the bash scripts. And because the environment changes according to the model, each Python script must in turn be wrapped in a bash script that prepares the environment, which we call the "script wrapper" (see config).
Scripts write their outputs under the echolalia folder. If `output_folder` (see config) was set to `/echolalia`, then scripts push their outputs to `/echolalia/outputs/dataset_name/task_id/`, as in the examples below. The script API shown above also takes an input folder; this allows the scripts to faithfully reproduce the input folder's structure in their outputs.
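This mapping can be expressed as a small path computation; `output_path_for` below is a hypothetical helper written to match the examples that follow, not part of the daemon:

```python
# Hypothetical helper illustrating how outputs mirror the input structure.
from pathlib import Path


def output_path_for(input_file, input_folder, echolalia_folder, dataset, task_id, suffix):
    relative = Path(input_file).relative_to(input_folder)  # e.g. folder_1/folder_2/recording_3.wav
    return Path(echolalia_folder) / "outputs" / dataset / task_id / relative.with_suffix(suffix)


print(output_path_for(
    "/input_folder/folder_1/folder_2/recording_3.wav",
    "/input_folder", "/echolalia", "loann_2025",
    "601cb879-8f86-4153-8e1a-9a3a3f5c812e", ".rttm",
))
# /echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/folder_1/folder_2/recording_3.rttm
```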
For VTC:
```toml
input_folder = "/input_folder"
echolalia_folder = "/echolalia"
dataset = "loann_2025"
task_id = "601cb879-8f86-4153-8e1a-9a3a3f5c812e"

input_1 = "/input_folder/recording_1.wav"
input_2 = "/input_folder/recording_2.wav"
input_3 = "/input_folder/folder_1/folder_2/recording_3.wav"

# This means:
outputs = [
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/recording_1.rttm",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/recording_2.rttm",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/folder_1/folder_2/recording_3.rttm",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/status.log"
]
```

These outputs are meant to be placed in the `/annotations/vtc/raw` folder in the ChildProject dataset, and then an importation is meant to be run (see the ChildProject docs).
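Copying a finished task into the dataset could look like the following sketch; the dataset path is a placeholder, and nothing in this README says the daemon does this step for you:

```python
# Sketch: copy a finished VTC task's outputs into annotations/vtc/raw,
# preserving subfolders and skipping status.log. Paths are placeholders.
import shutil
from pathlib import Path

task_dir = Path("/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e")
raw_dir = Path("/path/to/dataset/annotations/vtc/raw")  # hypothetical dataset location

for rttm in task_dir.rglob("*.rttm"):
    dest = raw_dir / rttm.relative_to(task_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(rttm, dest)
```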
For ALICE:
```toml
input_folder = "/input_folder"
echolalia_folder = "/echolalia"
dataset = "loann_2025"
task_id = "601cb879-8f86-4153-8e1a-9a3a3f5c812e"

input_1 = "/input_folder/recording_1.wav"
input_2 = "/input_folder/recording_2.wav"
input_3 = "/input_folder/folder_1/folder_2/recording_3.wav"

# This means:
outputs = [
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/recording_1.txt",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/recording_1_sum.txt",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/recording_2.txt",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/recording_2_sum.txt",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/folder_1/folder_2/recording_3.txt",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/folder_1/folder_2/recording_3_sum.txt",
    "/echolalia/outputs/loann_2025/601cb879-8f86-4153-8e1a-9a3a3f5c812e/status.log"
]
```

These outputs are meant to be placed in the `/annotations/alice/output/raw` folder in the ChildProject dataset, and then an importation is meant to be run (see the ChildProject docs).
The "sum" files must likewise be thrown into /annotations/alice/output/extra.