# CAPIO: Cross Application Programmable IO

CAPIO is a middleware aimed at injecting streaming capabilities into workflow steps
without changing the application codebase. It has been proven to work with C/C++ binaries, Fortran, Java, Python, and
Bash.

[![codecov](https://codecov.io/gh/High-Performance-IO/capio/graph/badge.svg?token=6ATRB5VJO3)](https://codecov.io/gh/High-Performance-IO/capio)
![CI-Tests](https://github.com/High-Performance-IO/capio/actions/workflows/ci-tests.yaml/badge.svg)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://raw.githubusercontent.com/High-Performance-IO/capio/master/LICENSE)
[![codecov](https://codecov.io/gh/High-Performance-IO/capio/graph/badge.svg?token=6ATRB5VJO3)](https://codecov.io/gh/High-Performance-IO/capio) ![CI-Tests](https://github.com/High-Performance-IO/capio/actions/workflows/ci-tests.yaml/badge.svg) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://raw.githubusercontent.com/High-Performance-IO/capio/master/LICENSE)

> [!TIP]
> CAPIO is now multi-backend and dynamic by nature: you no longer need MPI to benefit from the in-memory I/O
> improvements! Just use an MTCL-provided backend if you want in-memory I/O, or fall back to the file system backend
> (the default) if you just want to coordinate I/O operations between workflow steps!


---

## 🔧 Build and Run Tests

### Dependencies

**Required manually:**

- `cmake >= 3.15`
- a `C++17` (or newer) compiler
- `pthreads`

**Fetched/compiled during configuration:**

- [syscall_intercept](https://github.com/pmem/syscall_intercept) - Intercepts and handles Linux system calls
- [Taywee/args](https://github.com/Taywee/args) - Parses server command-line arguments
- [simdjson/simdjson](https://github.com/simdjson/simdjson) - Fast parsing of JSON configuration files
- [MTCL](https://github.com/ParaGroup/MTCL) - Provides abstractions over multiple communication backends

### Compile CAPIO

```bash
git clone https://github.com/High-Performance-IO/capio.git capio && cd capio
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo cmake --install .
```

To enable logging support, pass `-DCAPIO_LOG=TRUE` during the CMake configuration phase.

---

## 🧑‍💻 Using CAPIO in Your Code

Good news! You **don’t need to modify your application code**. Just follow these steps:

### 1. Create a Configuration File *(optional but recommended)*

Write a CAPIO-CL configuration file to inject streaming into your workflow. Refer to
the [CAPIO-CL Docs](https://capio.hpc4ai.it/docs/coord-language/) for details.
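
As a purely hypothetical sketch, a minimal configuration might look like the following. Only the top-level `"name"` field is documented in this README; the other keys (`IO_Graph`, `input_stream`, `output_stream`) and their semantics are assumptions to be checked against the CAPIO-CL docs linked above:

```json
{
  "name": "wfname",
  "IO_Graph": [
    {
      "name": "writer_app",
      "output_stream": ["result.dat"]
    },
    {
      "name": "reader_app",
      "input_stream": ["result.dat"]
    }
  ]
}
```

The top-level `"name"` value must match the `CAPIO_WORKFLOW_NAME` environment variable described below, and each step's `name` must match that step's `CAPIO_APP_NAME`.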

### 2. Launch the Workflow with CAPIO

To launch your workflow with CAPIO you can follow two routes:

#### A) Use `capiorun` for simplified operations


You can simplify the execution of workflow steps with CAPIO using the `capiorun` utility. See the
[`capiorun` documentation](capiorun/readme.md) for usage and examples. `capiorun` provides an easier way to manage
daemon startup and environment preparation, so that users do not need to set up the environment manually.

#### B) Manually launch CAPIO

Launch the CAPIO daemons: start one daemon per node. Optionally set `CAPIO_DIR` to define the CAPIO mount point:

```bash
[CAPIO_DIR=your_capiodir] capio_server -c conf.json
```

> [!CAUTION]
> If `CAPIO_DIR` is not set, it defaults to the current working directory.

You can now start your application. Just set the right environment variables and remember to set `LD_PRELOAD` to the
`libcapio_posix.so` intercepting library:

```bash
CAPIO_DIR=your_capiodir \
CAPIO_WORKFLOW_NAME=wfname \
CAPIO_APP_NAME=appname \
LD_PRELOAD=libcapio_posix.so \
./your_app <args>
```

> [!CAUTION]
> If `CAPIO_APP_NAME` and `CAPIO_WORKFLOW_NAME` are not set (or are set but do not match the values in the
> CAPIO-CL configuration file), CAPIO will not be able to operate correctly!
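Because an unset or mismatched variable makes CAPIO misbehave silently, a small launcher can fail fast instead. The sketch below is not part of CAPIO: the variable names come from this README, everything else is illustrative:

```shell
# capio_launch: refuse to start an application unless the mandatory
# CAPIO variables are set, then preload the interception library.
capio_launch() {
    for var in CAPIO_DIR CAPIO_WORKFLOW_NAME CAPIO_APP_NAME; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "error: $var is not set" >&2
            return 1
        fi
    done
    LD_PRELOAD=libcapio_posix.so "$@"
}
```

Invoked as `capio_launch ./your_app <args>`, it mirrors the manual `LD_PRELOAD` invocation above but aborts with a clear message when a variable is missing.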

---

## ⚙️ Environment Variables

### 🔄 Global

| Variable | Description |
|-------------------------|----------------------------------------------------|
| `CAPIO_DIR` | Shared mount point for server and application |
| `CAPIO_LOG_LEVEL` | Logging level (requires `-DCAPIO_LOG=TRUE`) |
| `CAPIO_LOG_PREFIX` | Log file name prefix (default: `posix_thread_`) |
| `CAPIO_LOG_DIR` | Directory for log files (default: `capio_logs`) |
| `CAPIO_CACHE_LINE_SIZE` | Size of a single CAPIO cache line (default: 256KB) |
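For example (the paths and values below are placeholders for illustration, not CAPIO defaults except where noted), the global variables can be exported once per shell before starting the server and the applications:

```shell
# Placeholder deployment values; adjust paths for your cluster.
export CAPIO_DIR=/mnt/capio            # shared mount point for server and apps
export CAPIO_LOG_DIR=capio_logs        # the documented default, shown explicitly
export CAPIO_CACHE_LINE_SIZE=262144    # 256 KB, the documented default
echo "CAPIO mount point: $CAPIO_DIR"
```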

### 🖥️ Server-Only

| Variable             | Description                                                                                     |
|----------------------|-------------------------------------------------------------------------------------------------|
| `CAPIO_METADATA_DIR` | Directory for metadata files. Defaults to `CAPIO_DIR`; must be accessible by all CAPIO servers. |

### 📁 POSIX-Only (Mandatory)

> ⚠️ These are required by CAPIO-POSIX. Without them, your app will not behave as configured in the JSON file.

| Variable | Description |
|-----------------------|-------------------------------------------------|
| `CAPIO_WORKFLOW_NAME` | Must match `"name"` field in your configuration |
| `CAPIO_APP_NAME` | Name of the step within your workflow |

---

## 📖 Extended documentation

Documentation and examples are available on the official site:

🌐 [https://capio.hpc4ai.it/docs](https://capio.hpc4ai.it/docs)

---

## 🐞 Report Bugs & Get Help

- [Create an issue](https://github.com/High-Performance-IO/capio/issues/new)
- [Official Documentation](https://capio.hpc4ai.it/docs)

---

## 👥 CAPIO Team

Made with ❤️ by:

- Marco Edoardo Santimaria <marcoedoardo.santimaria@unito.it> (Designer & Maintainer)
- Iacopo Colonnelli <iacopo.colonnelli@unito.it> (Workflow Support & Maintainer)
- Massimo Torquati <massimo.torquati@unipi.it> (Designer)
- Marco Aldinucci <marco.aldinucci@unito.it> (Designer)

**Former Members:**

- Alberto Riccardo Martinelli <albertoriccardo.martinelli@unito.it> (Designer & Maintainer)

---

## 📚 Publications

[![CAPIO](https://img.shields.io/badge/CAPIO-10.1109/HiPC58850.2023.00031-red)](https://dx.doi.org/10.1109/HiPC58850.2023.00031)

[![](https://img.shields.io/badge/CAPIO--CL-10.1007%2Fs10766--025--00789--0-green?style=flat&logo=readthedocs)](https://doi.org/10.1007/s10766-025-00789-0)