A Python package for easily downloading datasets from DaRUS (the Data Repository of the University of Stuttgart). The DaRUS web interface currently limits downloads to 2 GB, which makes it hard to retrieve large datasets such as the FEM dataset used in the examples below. This package downloads a whole dataset (or specific files from it) and handles authentication and directory management.
This repository can be installed using pip or uv (recommended).
pip install git+https://github.com/BaumSebastian/DaRUS-Dataset-Interaction.git
After installation, you can use the command line interface:
Download all files from a dataset to ./data directory:
darus-download --url "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801"
Choose where to save the downloaded files:
darus-download --url "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801" --path "./downloads"
Note: Every file has a directory value in its metadata (see Add a File to Dataset). The program creates that directory below the download path and stores the file at path/directory.
Download only selected files instead of the entire dataset:
darus-download --url "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801" --files metadata.tab
Note: DaRUS converts tabular data such as .csv files into the .tab format on upload. This package downloads the original file format (e.g. .csv) when it is available. Because DaRUS lists the file as metadata.tab, pass metadata.tab (not metadata.csv) to --files.
Access restricted datasets using your DaRUS API token:
darus-download --url "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801" --token "your-api-token"
Store settings in a YAML file for repeated use:
darus-download --config config.yaml
Available Arguments:
- --url, -u: Dataset URL
- --path, -p: Download directory path [optional] (default: ./data)
- --token, -t: API token for authentication [optional]
- --files, -f: Specific files to download [optional] (space-separated)
- --config, -c: Config file path [optional]
- --help: Show help message
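The argument list above can be mirrored with a small argparse parser. This is only a sketch of how such flags are commonly declared, not the package's actual cli.py; the function name build_parser and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the documented darus-download flags."""
    parser = argparse.ArgumentParser(prog="darus-download")
    parser.add_argument("--url", "-u", help="Dataset URL")
    parser.add_argument("--path", "-p", default="./data", help="Download directory path")
    parser.add_argument("--token", "-t", default=None, help="API token for authentication")
    # nargs="+" collects space-separated file names into a list
    parser.add_argument("--files", "-f", nargs="+", default=None, help="Specific files to download")
    parser.add_argument("--config", "-c", default=None, help="Config file path")
    # argparse adds --help automatically
    return parser


args = build_parser().parse_args(
    [
        "--url",
        "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801",
        "--files",
        "metadata.tab",
    ]
)
print(args.path, args.files)  # ./data ['metadata.tab']
```

With nargs="+", several file names can follow a single --files flag, which matches the space-separated form described above.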
Download an entire dataset:
from darus import Dataset
url = "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801"
path = "./data"
# Download the complete dataset
ds = Dataset(url)
ds.download(path)
Note: Every file has a directory value in its metadata (see Add a File to Dataset). Dataset creates that directory below path and stores the downloaded file at path/directory.
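To make the path/directory rule concrete, the following sketch composes a target location with pathlib. The helper name target_path is hypothetical and not part of the package; the directory value h5 is taken from the example output for DARUS-4801.

```python
from pathlib import Path


def target_path(path: str, directory: str, filename: str) -> Path:
    """Return the location a file would be saved to: path/directory/filename."""
    # `directory` comes from the file's metadata and may be empty
    return Path(path) / directory / filename


print(target_path("./data", "h5", "16039_19338.zip").as_posix())  # data/h5/16039_19338.zip
print(target_path("./data", "", "metadata.tab").as_posix())       # data/metadata.tab
```

An empty directory value simply leaves the file in the download path itself, because pathlib ignores empty path segments.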
Download only selected files (["metadata.tab"]) from a dataset:
from darus import Dataset
url = "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801"
path = "./data"
files = ["metadata.tab"]
ds = Dataset(url)
ds.download(path, files=files)
Note: DaRUS converts tabular data such as .csv files into the .tab format on upload. This package downloads the original file format (e.g. .csv) when it is available. Because DaRUS lists the file as metadata.tab, pass metadata.tab (not metadata.csv) in files.
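For background on the note above: the Dataverse Data Access API serves the originally uploaded version of a tabular file when format=original is appended to the file URL. The sketch below only builds such a URL. Whether this package uses exactly this endpoint is an assumption, and both the helper name datafile_url and the file id 12345 are placeholders.

```python
from urllib.parse import urlencode


def datafile_url(base_url: str, file_id: int, original: bool = False) -> str:
    """Build a Dataverse Data Access API URL for a single file."""
    url = f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"
    if original:
        # Ask for the uploaded file (e.g. .csv) instead of the ingested .tab version
        url += "?" + urlencode({"format": "original"})
    return url


# 12345 is a placeholder id, not a real file in DARUS-4801
print(datafile_url("https://darus.uni-stuttgart.de", 12345, original=True))
# https://darus.uni-stuttgart.de/api/access/datafile/12345?format=original
```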
For datasets that require authentication, pass the API token of your DaRUS account as api_token.
from darus import Dataset
url = "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801"
path = "./data"
api_token = 'xxxx-xxxx-xxxx-xxxx'
ds = Dataset(url, api_token=api_token)
ds.download(path)
The download method of Dataset accepts two optional arguments:
- post_process: Zip archives are automatically extracted after the download has completed. Default: True.
- remove_after_pp: The zip archives are deleted after extraction. Default: True.
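The behaviour of these two options can be pictured with the standard zipfile module. This is a minimal sketch, not the package's implementation: the standalone function post_process and the choice to extract next to the archive are assumptions.

```python
import zipfile
from pathlib import Path


def post_process(archive: Path, remove_after_pp: bool = True) -> Path:
    """Extract a zip archive into its own folder, then optionally delete it."""
    target_dir = archive.parent  # assumption: extract next to the archive
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
    if remove_after_pp:
        # Free the disk space taken by the archive once its content is extracted
        archive.unlink()
    return target_dir
```

For archives of roughly 60 GB each, as in the example dataset, keeping remove_after_pp enabled avoids storing both the archive and its extracted content. Passing remove_after_pp=False to ds.download keeps the archives.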
Running the following script produces the output below.
from darus import Dataset
url = "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801"
path = "./data"
ds = Dataset(url)
ds.summary()
ds.download(path)
Note: The Dataset Summary and Files in Dataset tables are only printed when ds.summary() is called.
The output looks like this:
Dataset Summary
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Property ┃ Value ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ URL │ https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801 │
│ Persistent ID │ doi:10.18419/DARUS-4801 │
│ Last Update │ 2025-03-12 12:32:17 │
│ License │ CC BY 4.0 │
└───────────────┴───────────────────────────────────────────────────────────────────────────────────┘
Files in Dataset
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ Size ┃ Original Available ┃ Description ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 113525_116825.zip │ 59.2 GB │ │ Contains all simulations with ID between 113525 and 116825. │
│ 116826_211007.zip │ 59.2 GB │ │ Contains all simulations with ID between 116826 and 211007. │
│ 16039_19338.zip │ 59.2 GB │ │ Contains all simulations with ID between 16039 and 19338. │
│ 19339_113524.zip │ 59.1 GB │ │ Contains all simulations with ID between 19339 and 113524. │
│ 257076_260375.zip │ 59.7 GB │ │ Contains all simulations with ID between 257076 and 260375. │
│ 260376_306443.zip │ 59.8 GB │ │ Contains all simulations with ID between 260376 and 306443. │
│ 306444_309743.zip │ 59.7 GB │ │ Contains all simulations with ID between 306444 and 309743. │
│ 309744_403925.zip │ 59.6 GB │ │ Contains all simulations with ID between 309744 and 403925. │
│ 403926_406296.zip │ 42.6 GB │ │ Contains all simulations with ID between 403926 and 406296. │
│ metadata.tab │ 2.5 MB │ ✓(metadata.csv) │ Metadata of the simulations. │
└───────────────────┴─────────┴────────────────────┴─────────────────────────────────────────────────────────────┘
Downloading...
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ Size ┃ Directory ┃ Download Original ┃ Description ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 113525_116825.zip │ 59.2 GB │ .\data\h5\ │ │ Contains all simulations with ID between 113525 and 116825. │
│ 116826_211007.zip │ 59.2 GB │ .\data\h5\ │ │ Contains all simulations with ID between 116826 and 211007. │
│ 16039_19338.zip │ 59.2 GB │ .\data\h5\ │ │ Contains all simulations with ID between 16039 and 19338. │
│ 19339_113524.zip │ 59.1 GB │ .\data\h5\ │ │ Contains all simulations with ID between 19339 and 113524. │
│ 257076_260375.zip │ 59.7 GB │ .\data\h5\ │ │ Contains all simulations with ID between 257076 and 260375. │
│ 260376_306443.zip │ 59.8 GB │ .\data\h5\ │ │ Contains all simulations with ID between 260376 and 306443. │
│ 306444_309743.zip │ 59.7 GB │ .\data\h5\ │ │ Contains all simulations with ID between 306444 and 309743. │
│ 309744_403925.zip │ 59.6 GB │ .\data\h5\ │ │ Contains all simulations with ID between 309744 and 403925. │
│ 403926_406296.zip │ 42.6 GB │ .\data\h5\ │ │ Contains all simulations with ID between 403926 and 406296. │
│ metadata.tab │ 2.5 MB │ .\data\ │ ✓(metadata.csv) │ Metadata of the simulations. │
└───────────────────┴─────────┴────────────┴───────────────────┴─────────────────────────────────────────────────────────────┘
Downloading 113525_116825.zip ━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.9% • 5.2/59.2 GB • 0:08:07 • 1:22:43 • 10.9 MB/s
....
darus/
├── darus/ # Main package
│ ├── __init__.py # Package initialization
│ ├── cli.py # Command line interface
│ ├── Dataset.py # Main Dataset class
│ ├── DatasetFile.py # File download and processing
│ └── utils.py # Utility functions and logging
├── tests/ # Test suite
│ ├── fixtures/ # Test data and fixtures
│ ├── test_dataset.py # Dataset class tests
│ └── test_dataset_file.py # DatasetFile tests
├── config.yaml # Example configuration
└── setup.py # Package configuration
Clone the repository and install in development mode:
git clone https://github.com/BaumSebastian/DaRUS-Dataset-Interaction.git
cd DaRUS-Dataset-Interaction
pip install -e ".[dev]" # Includes testing tools
Test the CLI directly from source:
python -m darus.cli --url "https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/DARUS-4801"
Running the tests locally:
# Run the full test suite:
pytest -v
# Run tests with coverage:
pytest --cov=darus --cov-report=html
# Run specific test file:
pytest tests/test_dataset.py -v
Check code quality with the following tools:
# Format code with Black:
black darus/ tests/
# Type checking (if mypy is installed)
mypy darus/
# Linting
flake8 darus/ tests/
For VS Code users, you can create .vscode/launch.json with debug configurations:
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Debug CLI - Demo Dataset",
            "type": "debugpy",
            "request": "launch",
            "module": "darus.cli",
            "args": [
                "--url", "https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/NIVKU0",
                "--path", "./debug_downloads"
            ],
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}"
        },
        {
            "name": "Debug Tests - All",
            "type": "debugpy",
            "request": "launch",
            "module": "pytest",
            "args": ["-v", "tests/"],
            "console": "integratedTerminal",
            "cwd": "${workspaceFolder}"
        }
    ]
}
Set breakpoints and press F5 to start debugging.
- Fork the repository and create a feature branch
- Write tests for new functionality
- Ensure tests pass: pytest -v
- Format code: black darus/ tests/
- Update documentation if needed
- Submit a pull request with a clear description
- Dataverse API Guide - Detailed explanation of the underlying API
- Alternative Implementation - Another repository for downloading DaRUS data