Description
Before submitting
- I've searched open issues and found no similar request
- I'm willing to start a discussion or contribute code
Problem / motivation
Currently, most of the Stuff data is stored in memory. This is the case for all objects except a few classes, such as ImageContent, which can hold either bytes or a URL.
This is a problem for several reasons, but the main one is that Pipelex workflows should be usable in a lean, orchestrated way: the logic and tracking of the workflow (concepts passed from pipe to pipe) should be as lightweight as possible. Pipelex must be able to work with potentially very large data, such as videos, and passing such payloads in and out of pipes is not appropriate.
Proposed solution
We have already added a StorageProvider to Pipelex's main singleton. It can be customized using dependency injection by providing a class that implements the StorageProviderAbstract interface, which has only two methods:
```python
from abc import ABC, abstractmethod

class StorageProviderAbstract(ABC):
    @abstractmethod
    def load(self, uri: str) -> bytes:
        pass

    @abstractmethod
    def store(self, data: bytes) -> str:
        pass
```
The planned feature consists of using the active storage provider, available from pipelex.hub via get_storage_provider(), to systematically load/store the inputs/outputs of pipes. This means every StuffContent's data should be substitutable with a URI.
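To make the intent concrete, here is a minimal sketch of what that wrapping could look like. Only `get_storage_provider()` (from `pipelex.hub`) comes from the description above; the `data`/`uri` attributes and the helper names are hypothetical, for illustration only:

```python
from pipelex.hub import get_storage_provider

def offload(content) -> None:
    """Hypothetical helper: swap a StuffContent's in-memory bytes
    for a URI right after a pipe produces it."""
    storage = get_storage_provider()
    if content.data is not None:  # hypothetical `data` attribute
        content.uri = storage.store(content.data)
        content.data = None  # the heavy payload leaves memory

def resolve(content) -> bytes:
    """Hypothetical helper: trade the URI back for bytes right
    before a pipe consumes the content as input."""
    storage = get_storage_provider()
    if content.data is None:
        content.data = storage.load(content.uri)  # hypothetical `uri` attribute
    return content.data
```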
The concrete implementation of StorageProviderAbstract will be responsible for defining the URI and trading it for data. For instance, it could use local storage with a file path as the URI, or online storage such as Amazon S3 or Google Cloud Storage, using the bucket/blob_id as the URI.
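As a sketch of one possible local implementation (using file paths as URIs; the content-addressed naming is my assumption, not a spec):

```python
import hashlib
from pathlib import Path

class LocalStorageProvider(StorageProviderAbstract):  # interface defined above
    """Stores blobs as files under a base directory and uses the
    file path as the URI."""

    def __init__(self, base_dir: str) -> None:
        self._base_dir = Path(base_dir)
        self._base_dir.mkdir(parents=True, exist_ok=True)

    def store(self, data: bytes) -> str:
        # Content-address the blob so identical data maps to the same URI
        blob_path = self._base_dir / hashlib.sha256(data).hexdigest()
        blob_path.write_bytes(data)
        return str(blob_path)

    def load(self, uri: str) -> bytes:
        return Path(uri).read_bytes()
```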
Note: for structured objects (BaseModels), serialization/deserialization will be the responsibility of our other open-source library, Kajson, which is already a dependency of Pipelex.
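As an illustration (not the final API), a structured content could round-trip through the storage provider via Kajson's json-style dumps/loads, which preserve class information. The TextContent model and the helper names are made up for this example:

```python
import kajson
from pydantic import BaseModel
from pipelex.hub import get_storage_provider

class TextContent(BaseModel):  # illustrative structured content
    text: str

def store_model(model: BaseModel) -> str:
    # Kajson serializes the BaseModel (including class info) to JSON text
    return get_storage_provider().store(kajson.dumps(model).encode("utf-8"))

def load_model(uri: str) -> BaseModel:
    # ...and deserializes it back to the original class on the way in
    return kajson.loads(get_storage_provider().load(uri).decode("utf-8"))
```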
Obviously, this will add load/store overhead when entering and leaving every call to run_pipe(). But in many cases this overhead will be small compared to inference, and in any case it is the path to durable, resilient workflows. That said, the first version of this feature should be based on an in-memory StorageProviderAbstract implementation, to avoid the overhead and enable testing the load/store logic with minimum dependencies. The second step should be local storage, which will make it easy and natural to use or dispose of any generated stuff at the end of the pipeline.
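For that first version, the in-memory provider could be as simple as the following sketch (the memory:// URI scheme is an assumption):

```python
import uuid

class InMemoryStorageProvider(StorageProviderAbstract):  # interface defined above
    """Keeps blobs in a dict; useful for testing the load/store
    logic with minimum dependencies."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def store(self, data: bytes) -> str:
        uri = f"memory://{uuid.uuid4().hex}"  # assumed URI scheme
        self._blobs[uri] = data
        return uri

    def load(self, uri: str) -> bytes:
        return self._blobs[uri]
```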
Alternatives considered
No response
Would you like to help implement this feature?
None