This demo walks through the following scenario, combining the DMF Data Catalog (WP2) and SkyStore (WP4) for full data lake behavior:
- Starting a DMF data catalog consumer in the `wp24.ipynb` Jupyter notebook, which subscribes to notifications (via the MQTT broker) and waits.
- Starting a DMF data catalog producer using the `produce.sh` script. The producer stores inference data on a SkyStore S3-proxy connected to AWS region eu-central-1. The producer uses Nuvla both to write the data and to register it in the catalog, which sends a notification.
- The consumer receives the notification and then uses the SkyStore S3-proxy connected to AWS region eu-west-1 to load the data, which causes the data to be propagated to that endpoint.
- After loading the data, the consumer uses the inference data to perform inference on a model hosted as a K8s service in OVH using KServe.
- The inference result is the final output.
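The consumer's reaction to a catalog notification can be sketched in Python. The JSON payload shape and field names (`bucket`, `key`) below are assumptions for illustration, not the actual DMF notification schema:

```python
import json


def parse_dmf_notification(payload: bytes) -> tuple[str, str]:
    """Extract the S3 location of newly registered data from a catalog
    notification. The field names are assumptions for this sketch;
    check the actual DMF notification schema."""
    msg = json.loads(payload)
    return msg["bucket"], msg["key"]


# The consumer would then fetch bucket/key through its own S3-proxy
# (eu-west-1), triggering SkyStore to propagate the object to that
# endpoint, and finally send the loaded tensor to the KServe service.
```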
- At each setup step below, be sure to collect and record all resulting credentials and environment variables
- Deploy the sample PER custom-model Inference Service. See the instructions at: https://github.com/revit13/per-demo-sept . Make sure to set up port-forwarding for inference access if needed (i.e., if Istio does not already expose a public ingress IP)
- Deploy SkyStore on Kubernetes following the instructions in https://github.com/gilv/skystore/blob/headbucket/CONTAINER.md . Note that you need to configure 2 S3-proxies with public access (e.g., `LoadBalancer`), each connected to a different S3 storage backend: either on-premises (e.g., MinIO/Ceph) or a cloud region (AWS, GCP, Azure, etc.)
- Deploy an MQTT broker on K8s with public access
- Set up a Nuvla account at https://nuvla.io and register the MQTT broker for data notifications on S3 access connected to one of the public S3-proxies you set up at step 2. That S3-proxy will be used for the producer side. The consumer side will use the other S3-proxy. The screenshot below shows an example of registering one S3-proxy in Nuvla:

- Organize your collected credentials and environment variables in a single `.env` file, structured like the example picture below (secret values are obfuscated). Note that this file is not provided in the demo repository because it contains secrets.
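For reference, the expected `.env` format is plain `KEY=VALUE` lines. The demo code likely loads it with a library such as `python-dotenv`, but a minimal stdlib sketch of the parsing is:

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are
    skipped, and surrounding double quotes on values are stripped."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env
```

In this demo, entries such as the two S3-proxy endpoints, the Nuvla credentials, and the MQTT broker address would live in this file (exact variable names depend on your setup).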
- Clone the repository for this demo from: https://github.com/erezh16/extract-wp24-demo
- Create and activate a fresh Python virtual environment with Python 3.12 using classic `venv`, `anaconda`, `pyenv`, etc.
- All further instructions below are to be carried out from the root folder of the cloned repository
- Install the dependencies: `pip install -r requirements.txt`
- Place the `.env` file you created in the prerequisites in the root folder of the cloned repo.
- Open a new terminal in the root folder
- Activate the venv you created during setup
- Start the consumer notebook. One way is using Jupyter: `jupyter notebook notebook/wp24.ipynb`. You can also open the folder with `vscode` and click on `notebook/wp24.ipynb`.
- Run the notebook cells in order. The notebook should block at cell 6, waiting for the producer to store data and generate a notification.
- Open a new terminal in the root folder
- Activate the venv you created during setup
- Run the script `./produce.sh`. It should store the file `reduced_tronchetto_array.pt` as an object on the S3 storage connected to the S3-proxy that you registered on Nuvla.
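The producer writes through the producer-side S3-proxy rather than to AWS directly; in boto3 terms this means overriding `endpoint_url`. The environment variable names and bucket below are placeholders for illustration, not the script's actual internals:

```python
import os


def s3_proxy_client_kwargs(endpoint_url: str, access_key: str,
                           secret_key: str) -> dict:
    """Arguments for boto3.client('s3', ...) that route requests
    through a SkyStore S3-proxy instead of the default AWS endpoint."""
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


if __name__ == "__main__":
    import boto3  # third-party dependency

    # Placeholder variable and bucket names; take the real ones
    # from the .env file you assembled in the prerequisites.
    s3 = boto3.client("s3", **s3_proxy_client_kwargs(
        os.environ["PRODUCER_S3_PROXY_URL"],
        os.environ["S3_ACCESS_KEY"],
        os.environ["S3_SECRET_KEY"],
    ))
    s3.upload_file("reduced_tronchetto_array.pt", "demo-bucket",
                   "reduced_tronchetto_array.pt")
```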
- Go back to the open notebook. You will see that cell 6 has finished, as it received the DMF notification
- Execute remaining cells in order
- The inference result should be printed as a list of values at the end of the notebook execution.
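The printed values come from the KServe InferenceService deployed in the prerequisites. If you want to query it outside the notebook (for example, to verify the port-forward), KServe's v1 REST protocol exposes a predict endpoint; the model name, host, and payload shape below are placeholders, not this demo's actual values:

```python
def kserve_predict_url(host: str, model_name: str) -> str:
    """Build the KServe data-plane v1 predict URL for a model."""
    return f"http://{host}/v1/models/{model_name}:predict"


if __name__ == "__main__":
    import requests  # third-party dependency

    # Example call through a local port-forward; 'per-model' and the
    # Host header are placeholders for your InferenceService.
    resp = requests.post(
        kserve_predict_url("localhost:8080", "per-model"),
        json={"instances": [[0.1, 0.2, 0.3]]},  # v1 protocol payload shape
        headers={"Host": "per-model.default.example.com"},
    )
    print(resp.json())
```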