🧩 Feature Request: Add Section on Parallel Tools
Description
Add a new section to the documentation titled Parallel Tools, focusing on lightweight command-line utilities that allow users to parallelize workloads efficiently without needing MPI or complex workflow managers.
This section should explain how to use GNU Parallel and TaskSpooler (ts) — both available as LMOD modules on the Lane cluster — and introduce other similar tools that help automate batch processing and job scheduling.
🧭 Suggested Content
- Introduction
  - Overview of lightweight parallelization tools.
  - When to use these tools instead of MPI, Nextflow, or other workflow engines.
  - Benefits for users running multiple independent or embarrassingly parallel jobs.
- GNU Parallel
  - Module Loading

    ```bash
    module avail parallel
    module load parallel
    ```
  - Example Usage

    Checksum archives in parallel, or parallelize a Python script:

    ```bash
    parallel sha256sum ::: *.tar.gz
    parallel python process.py ::: input/*.csv
    ```
  - Key Features
    - Automatically detects available CPU cores.
    - Supports job logging (`--joblog`), retries, progress tracking, and SLURM integration.
    - Works seamlessly with environment modules and shared filesystems.
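  - Suggested Extended Example

    A minimal sketch combining the logging and retry features above; `process.py` and the `input/` files are placeholders, not real cluster paths:

    ```bash
    # Run up to 8 tasks at once, record each task in run.log,
    # retry failed tasks twice, and resume from the log after an interruption.
    parallel --jobs 8 --joblog run.log --retries 2 --resume \
        python process.py ::: input/*.csv
    ```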
- TaskSpooler (ts)
  - Module Loading

    ```bash
    module avail taskspooler
    module load taskspooler
    ```
  - Example Usage

    ```bash
    ts sleep 10   # Queue a job
    ts -l         # List queued jobs
    ts -t 1       # Check output of job 1
    ts -C         # Clear completed jobs
    ```
  - Benefits
    - Simple command queue system for serial or limited parallel execution.
    - Keeps jobs running in the background even after logout (e.g., via tmux).
    - Ideal for students or researchers needing to queue lightweight, short jobs.
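  - Suggested Extended Example

    A minimal sketch of limited parallel execution using slots (`ts -S` is a standard TaskSpooler option; `process.py` and the input files are placeholders):

    ```bash
    ts -S 4                         # Let up to 4 queued jobs run at once
    for f in input/*.csv; do
        ts python process.py "$f"   # Each submission returns a job ID immediately
    done
    ts -l                           # Monitor the queue
    ```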
- Other Recommended Tools
  - xargs — Basic parallel execution with the `-P` flag for concurrent jobs.
  - GNU Make (`-j`) — Useful for managing and parallelizing repetitive build or analysis tasks.
  - Makeflow — Workflow system for distributed or cluster-wide execution.
  - Dask — Python-based parallel computing for data workflows.
  - Parsl — Python workflow engine designed for HPC.
  - parallel-ssh (pssh) — Run commands simultaneously on multiple remote nodes.
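  - Suggested xargs Example

    A minimal sketch equivalent to the GNU Parallel checksum example above, using standard findutils flags (no module required):

    ```bash
    # -P 4 runs up to four sha256sum processes concurrently; -n 1 passes one
    # file per invocation; -print0/-0 handle file names with spaces safely.
    find . -maxdepth 1 -name '*.tar.gz' -print0 | xargs -0 -n 1 -P 4 sha256sum
    ```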
- Best Practices
  - Use `module load` to ensure consistent environments for GNU Parallel and TaskSpooler.
  - Avoid oversubscribing CPUs; set `--jobs` (GNU Parallel) or `TS_SLOTS` (TaskSpooler) appropriately.
  - Direct logs to per-task files (`--results` or `--joblog`).
  - Use these tools within SLURM batch scripts for larger jobs (see the sketch after this list).
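A minimal sketch of the SLURM integration suggested above; the resource values, `process.py`, and file paths are placeholders, not actual Lane cluster settings:

```bash
#!/bin/bash
#SBATCH --job-name=parallel-demo
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00

module load parallel

# Match GNU Parallel's concurrency to the CPUs SLURM allocated, log each
# task to parallel.log, and store per-task stdout/stderr under results/.
parallel --jobs "$SLURM_CPUS_PER_TASK" --joblog parallel.log \
    --results results/ python process.py ::: input/*.csv
```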
🧪 Expected Outcome
A new subsection under Advanced Topics → Parallel Tools in the documentation.
Users should be able to:
- Load and use the `parallel` and `taskspooler` modules via LMOD.
- Queue, run, and monitor multiple tasks efficiently.
- Choose the right tool for their workload size and complexity.
🧰 References
- GNU Parallel Official Site
- TaskSpooler Manual
- Makeflow Documentation
- Dask Documentation
- Parsl Documentation
✅ Tasks
- Create `docs/advanced_topics/parallel_tools.md`
- Add a Parallel Tools entry to `index.rst`
- Include examples using `module load parallel` and `module load taskspooler`
- Add sections for other recommended parallel utilities
- Build and verify documentation locally