Requires an installation of Python 3.9+.
To authenticate with Zenodo, configure the following environment variables in the `set_env.sh` file:

```sh
export INVENIO_RDM_ACCESS_TOKEN=<access_token>
export INVENIO_RDM_BASE_URL=<base url>
```

Create a Python virtual environment and install the project dependencies:
```sh
python3 -m venv prefect-env
source prefect-env/bin/activate
pip install -r requirements.txt
```

Set the environment variables:

```sh
source set_env.sh
```

To configure concurrency for the API calls, create a global concurrency limit named `rate-limit:invenio-rdm-api`:
```sh
prefect gcl create rate-limit:invenio-rdm-api --limit 5 --slot-decay-per-second 1.0
```

For debugging, set the Prefect logging level to DEBUG:
```sh
prefect config set PREFECT_LOGGING_LEVEL="DEBUG"
```

Start the Prefect server:
```sh
prefect server start
```

Open the Prefect dashboard in your browser at http://localhost:4200.

* Note: keep the terminal open while the Prefect server is running. When needed, the server can be stopped with Ctrl+C.
The project consists of two main scripts:
* `uploads.py`: Handles uploading datasets to Zenodo.
* `records.py`: Downloads metadata for already uploaded Zenodo records.
Both scripts are configured through a `config.json` file, which controls file paths and behavior. Here's an example structure of the configuration file:
```json
{
    "uploads": {
        "total": {
            "dataset_dir": "/home/joel/Desktop/zenodo/test/total",
            "collectors_csv": "/home/joel/Desktop/zenodo/2024_total_info_updated.csv"
        },
        "annular": {
            "dataset_dir": "/home/joel/Desktop/zenodo/test/annular",
            "collectors_csv": "/home/joel/Desktop/zenodo/2023_annular_info.csv"
        },
        "successful_results_file": "/home/joel/Desktop/zenodo/results/successul_results.csv",
        "failure_results_file": "/home/joel/Desktop/zenodo/results/failed_results.csv",
        "delete_failures": false,
        "auto_publish": false
    },
    "downloads": {
        "results_dir": "/home/joel/Desktop/zenodo/results/records/"
    }
}
```

The `uploads` section controls the upload process handled by `uploads.py`.
* `total.dataset_dir`: Path to the directory containing the total eclipse dataset.
* `total.collectors_csv`: CSV file containing metadata for the total eclipse collectors.
* `annular.dataset_dir`: Path to the directory containing the annular eclipse dataset.
* `annular.collectors_csv`: CSV file containing metadata for the annular eclipse collectors.
* `successful_results_file`: File path where successfully uploaded records will be logged.
* `failure_results_file`: File path where failed uploads will be logged.
* `delete_failures`: If `true`, files that failed to upload will be deleted.
* `auto_publish`: If `true`, records will be published automatically after upload.
The `downloads` section controls the download process handled by `records.py`.

* `results_dir`: Directory where the downloaded metadata records will be saved.
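A sketch of how a script might load and read this configuration; the `load_config` helper is illustrative, not necessarily how the project's scripts are written:

```python
import json
from pathlib import Path


def load_config(path: str = "config.json") -> dict:
    """Read the JSON configuration used by uploads.py and records.py."""
    return json.loads(Path(path).read_text())


# Example usage (assumes config.json exists in the working directory):
# config = load_config()
# total_dir = config["uploads"]["total"]["dataset_dir"]
# auto_publish = config["uploads"].get("auto_publish", False)
```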
In a separate terminal, with `prefect-env` activated, create a deployment:

```sh
python uploads.py
```

This starts a long-running process that monitors for work from the Prefect server.
To run the deployment, open the Prefect dashboard, go to Deployments in the left side panel, select `upload-datasets-deployment` from the list, click Run, and choose Quick run from the dropdown.
Once the run has started, each dataset will be uploaded sequentially and can be tracked in the 'Runs' section on the left side panel.
Note that once a dataset has been uploaded, it is tracked internally and skipped in subsequent runs. Each dataset is tracked by its file path.
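One way such path-based tracking can work is to check each dataset against the successful-results log before uploading. This sketch assumes that mechanism and a hypothetical CSV layout with the dataset path in the first column:

```python
import csv
from pathlib import Path


def already_uploaded(dataset_path: str, results_csv: str) -> bool:
    """Return True if dataset_path appears in the successful-results log."""
    log = Path(results_csv)
    if not log.exists():
        return False
    with log.open(newline="") as f:
        return any(row and row[0] == dataset_path for row in csv.reader(f))
```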
Once the records have been published, the results can be retrieved and saved locally as a CSV file named in the format `records_{timestamp}`.
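The timestamped file name can be produced along these lines; the exact timestamp format and the `.csv` extension are assumptions, not confirmed by the project:

```python
from datetime import datetime
from typing import Optional


def records_filename(now: Optional[datetime] = None) -> str:
    """Build a records_{timestamp} name, e.g. records_20240408T120000.csv."""
    ts = (now or datetime.now()).strftime("%Y%m%dT%H%M%S")
    return f"records_{ts}.csv"
```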
In a separate terminal, with `prefect-env` activated, create a deployment:

```sh
python records.py
```

This starts a long-running process that monitors for work from the Prefect server.
To run the deployment, open the Prefect dashboard, go to Deployments in the left side panel, select `get-published-records-deployment` from the list, click Run, and choose Quick run from the dropdown.
A `README.md` file based on the dataset description can be added to each dataset ZIP file prior to upload. If a `README.md` already exists, it is replaced; however, given the size of the datasets, this can take a while, since a new version of the dataset must first be created without the previous `README.md`.
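Replacing a file inside an existing ZIP requires rewriting the whole archive, which is why this step is slow for large datasets. A minimal sketch of such a rewrite — an illustration of the technique, not the project's actual implementation:

```python
import os
import zipfile


def replace_readme(zip_path: str, readme_text: str) -> None:
    """Rewrite zip_path so it contains exactly one README.md with readme_text."""
    tmp_path = zip_path + ".tmp"
    with zipfile.ZipFile(zip_path) as src, zipfile.ZipFile(
        tmp_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            # Copy every member except any previous README.md.
            if item.filename != "README.md":
                dst.writestr(item, src.read(item.filename))
        dst.writestr("README.md", readme_text)
    # Atomically swap the rewritten archive into place.
    os.replace(tmp_path, zip_path)
```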
In a separate terminal, with `prefect-env` activated, create a deployment:

```sh
python add_readme.py
```

This starts a long-running process that monitors for work from the Prefect server.
To run the deployment, open the Prefect dashboard, go to Deployments in the left side panel, select `create-dataset-readme-deployment` from the list, click Run, and choose Quick run from the dropdown.