This demo walks through the following scenario, combining the DMF Data Catalog (WP2) and SkyStore (WP4) for full data lake behavior:
- Starting a DMF data catalog consumer in the `wp24.ipynb` Jupyter notebook, which subscribes to notifications (via the MQTT broker) and waits.
- Starting a DMF data catalog producer using the `produce.sh` script. The producer stores inference data on a SkyStore S3-proxy connected to AWS region eu-central-1. The producer uses Nuvla both to write the data and to register it in the catalog, which sends a notification.
- The consumer receives the notification and then uses the SkyStore S3-proxy connected to AWS region eu-west-1 to load the data, which causes the data to be propagated to that endpoint.
- After loading the data, the consumer uses the inference data to perform inference on a model hosted as a K8s service in OVH using KServe.
- The inference result is the final output.
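The consumer's reaction to a catalog notification can be sketched in Python. The JSON payload shape and field names (`bucket`, `key`) below are assumptions for illustration, not the actual DMF notification schema:

```python
import json


def parse_dmf_notification(payload: bytes) -> tuple[str, str]:
    """Extract the S3 location of newly registered data from a catalog
    notification. The field names are assumptions for this sketch;
    check the actual DMF notification schema."""
    msg = json.loads(payload)
    return msg["bucket"], msg["key"]


# The consumer would then fetch bucket/key through its own S3-proxy
# (eu-west-1), triggering SkyStore to propagate the object to that
# endpoint, and finally send the loaded tensor to the KServe service.
```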
- At each setup step below, be sure to collect and record all resulting credentials and environment variables
- Deploy the sample PER custom-model Inference Service. See the instructions at: https://github.com/revit13/per-demo-sept . Make sure to set up port-forwarding for inference access if needed (i.e., if Istio does not already expose a public ingress IP)
- Deploy SkyStore on Kubernetes following the instructions in https://github.com/gilv/skystore/blob/headbucket/CONTAINER.md . Note that you need to configure 2 S3-proxies with public access (e.g., `LoadBalancer`), each connected to a different S3 storage backend: either on-premises (e.g., MinIO/Ceph) or a cloud region (AWS, GCP, Azure, etc.)
- Deploy an MQTT broker on K8s with public access
- Set up a Nuvla account at https://nuvla.io and register the MQTT broker for data notifications on S3 access connected to one of the public S3-proxies you set up at step 2. That S3-proxy will be used for the producer side. The consumer side will use the other S3-proxy. The screenshot below shows an example of registering one S3-proxy in Nuvla:

- Organize your collected credentials and environment variables in a single `.env` file, structured like the example picture below (secret values are obfuscated). Note that this file is not provided in the demo repository because it contains secrets.
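For reference, the expected `.env` format is plain `KEY=VALUE` lines. The demo code likely loads it with a library such as `python-dotenv`, but a minimal stdlib sketch of the parsing is:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are
    skipped, and surrounding double quotes on values are stripped."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env
```

In this demo, entries such as the two S3-proxy endpoints, the Nuvla credentials, and the MQTT broker address would live in this file (exact variable names depend on your setup).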
- Clone the repository for this demo from: https://github.com/erezh16/extract-wp24-demo
- Create and activate a fresh Python virtual environment with Python 3.12 using classic `venv`, `anaconda`, `pyenv`, etc.
- All further instructions below are to be carried out from the root folder of the cloned repository
- Install the dependencies: `pip install -r requirements.txt`
- Place the `.env` file you created in the prerequisites in the root folder of the cloned repo.
- Open a new terminal in the root folder
- Activate the venv you created during setup
- Start the consumer notebook. One way is using Jupyter: `jupyter notebook notebook/wp24.ipynb`. You can also open the folder with `vscode` and click on `notebook/wp24.ipynb`.
- Run the notebook cells in order. The notebook should block at cell 6, waiting for the producer to store data and generate a notification.
- Open a new terminal in the root folder
- Activate the venv you created during setup
- Run the script `./produce.sh`. It should store the file `reduced_tronchetto_array.pt` as an object on the S3 storage connected to the S3-proxy that you registered on Nuvla.
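The producer writes through the producer-side S3-proxy rather than to AWS directly; in boto3 terms this means overriding `endpoint_url`. The environment variable names and bucket below are placeholders for illustration, not the script's actual internals:

```python
import os


def s3_proxy_client_kwargs(endpoint_url: str, access_key: str,
                           secret_key: str) -> dict:
    """Arguments for boto3.client('s3', ...) that route requests
    through a SkyStore S3-proxy instead of the default AWS endpoint."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


if __name__ == "__main__":
    import boto3  # third-party dependency

    # Placeholder variable and bucket names; take the real ones
    # from the .env file you assembled in the prerequisites.
    s3 = boto3.client("s3", **s3_proxy_client_kwargs(
        os.environ["PRODUCER_S3_PROXY_URL"],
        os.environ["S3_ACCESS_KEY"],
        os.environ["S3_SECRET_KEY"],
    ))
    s3.upload_file("reduced_tronchetto_array.pt", "demo-bucket",
                   "reduced_tronchetto_array.pt")
```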
- Go back to the open notebook. You will see that cell 6 has finished, as it received the DMF notification
- Execute remaining cells in order
- The inference result should be printed as a list of values at the end of the notebook execution.
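The printed values come from the KServe InferenceService deployed in the prerequisites. If you want to query it outside the notebook (for example, to verify the port-forward), KServe's v1 REST protocol exposes a predict endpoint; the model name, host, and payload shape below are placeholders, not this demo's actual values:

```python
def kserve_predict_url(host: str, model_name: str) -> str:
    """Build the KServe data-plane v1 predict URL for a model."""
    return f"http://{host}/v1/models/{model_name}:predict"


if __name__ == "__main__":
    import requests  # third-party dependency

    # Example call through a local port-forward; 'per-model' and the
    # Host header are placeholders for your InferenceService.
    resp = requests.post(
        kserve_predict_url("localhost:8080", "per-model"),
        json={"instances": [[0.1, 0.2, 0.3]]},  # v1 protocol payload shape
        headers={"Host": "per-model.default.example.com"},
    )
    print(resp.json())
```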