ubenabdelkrim/PyRun-SciPy2025

☁️ CloudLab-SciPy2025

This repository contains hands-on examples for processing large-scale scientific data in the cloud using:

  • Dataplug: A lightweight, client-side Python framework for efficient partitioning of unstructured scientific data stored in object storage (like Amazon S3), enabling elastic cloud processing.
  • Lithops: A serverless framework for scalable parallel processing.

🚀 Quick Start (Recommended): Use pyrun.cloud

This tutorial is designed to run seamlessly on pyrun.cloud, a cloud-based JupyterLab platform with:

✅ Pre-installed dependencies
✅ Auto-configured Lithops backend
✅ Direct support for Dataplug and serverless workflows

🟢 No setup required: just launch the notebooks and start experimenting!


🧪 Running the Examples

Example 1 – Using Dataplug Locally

Notebook: dataplug_example.ipynb

This notebook shows how to:

  1. Load a FASTA file from an S3 bucket using CloudObject.from_s3
  2. Explore metadata (e.g., number of sequences)
  3. Preprocess and split the file into chunks
  4. Partition the data for analysis
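
To make steps 3 and 4 concrete, here is a minimal pure-Python sketch of record-aligned partitioning on an in-memory FASTA string. It is an illustration only, not Dataplug's implementation: the helper names `split_fasta_records` and `partition_records` and the sample data are hypothetical. In the notebook, Dataplug performs the equivalent work on the object stored in S3.

```python
from typing import List


def split_fasta_records(text: str) -> List[str]:
    """Split FASTA text into records, each starting at a '>' header line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A new header closes the record collected so far.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():  # skip blank lines
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def partition_records(records: List[str], num_chunks: int) -> List[List[str]]:
    """Group records into contiguous chunks of near-equal record count."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    # Never create more chunks than there are records.
    num_chunks = min(num_chunks, len(records)) or 1
    base, extra = divmod(len(records), num_chunks)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


# Hypothetical sample data standing in for the FASTA file in the bucket.
SAMPLE = """>seq1
ACGT
ACGT
>seq2
GGCC
>seq3
TTAA
"""

records = split_fasta_records(SAMPLE)
chunks = partition_records(records, num_chunks=2)
print(len(records), [len(c) for c in chunks])
```

Splitting on header lines keeps every sequence whole inside a single chunk, which is the property that lets each chunk be analysed independently.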

Run it on pyrun or locally with:

jupyter notebook dataplug_example.ipynb

☁️ Example 2 – Scalable Processing with Dataplug + Lithops

Notebook: dataplug_lithops.ipynb

This notebook demonstrates how to scale the same processing logic to the cloud using Lithops:

  • Partition the FASTA file with co.partition(...)
  • Apply process_fasta_partition to each slice
  • Launch parallel processing with lithops.FunctionExecutor
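
The map pattern behind these steps can be sketched locally as follows. This is a stand-in under stated assumptions, not the notebook's code: it uses `concurrent.futures.ThreadPoolExecutor` from the standard library instead of Lithops so that it runs without cloud credentials, the body of `process_fasta_partition` is hypothetical (it counts sequences), and the slices are plain strings rather than Dataplug data slices.

```python
from concurrent.futures import ThreadPoolExecutor


def process_fasta_partition(chunk_text: str) -> int:
    """Hypothetical worker: count the sequences in one FASTA chunk."""
    return sum(1 for line in chunk_text.splitlines() if line.startswith(">"))


# Hypothetical slices; in the notebook these come from co.partition(...).
slices = [
    ">seq1\nACGT\n>seq2\nGGCC",
    ">seq3\nTTAA",
]

# Local stand-in for the serverless map: one task per slice.
with ThreadPoolExecutor(max_workers=2) as pool:
    counts = list(pool.map(process_fasta_partition, slices))

total_sequences = sum(counts)
print(counts, total_sequences)

# With Lithops, the same shape would be roughly (not executed here):
#   fexec = lithops.FunctionExecutor()
#   fexec.map(process_fasta_partition, data_slices)
#   counts = fexec.get_result()
```

Because each slice is processed independently and the per-slice results are combined only at the end, the same worker function can be handed to a serverless executor without restructuring.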

Run it on pyrun or locally with:

jupyter notebook dataplug_lithops.ipynb

✅ The integration between Dataplug and Lithops is native: no code changes are needed to go from local to serverless!


💻 Running Locally (Optional)

If you prefer to run the notebooks locally instead of on pyrun, follow these steps:

📦 Install required libraries

pip install git+https://github.com/CLOUDLAB-URV/dataplug
pip install lithops

⚙️ Configure Lithops

To execute functions in the cloud (AWS, IBM Cloud, Azure, etc.), you'll need to configure your Lithops backend manually.

You can follow the official guide here:
👉 https://github.com/lithops-cloud/lithops#configuration

Create a .lithops_config file with your credentials and backend options.


📚 Requirements

  • Python 3.10 or higher
  • Access to S3-compatible storage (e.g., AWS S3, MinIO)
  • Internet connection
  • Cloud credentials (automatically set in pyrun, or configured manually for local runs)

📣 About

This code is part of the PyRun-SciPy2025 tutorial series for scientific computing in the cloud.

