Curated list of data pipelines used to showcase datasets across domains

nathanbaleeta/data-engineering-notebooks

data-engineering-notebooks

Curated list of experimental data pipelines used to build datasets across domains

Pre-requisites

  • Python 3.x
  • Jupyter Lab

Skills needed

  • Web scraping
  • Consuming RESTful APIs
  • Pandas
  • Polars

Quick Setup

The project uses pip to manage its dependencies. To install pip, follow the instructions here.

Once pip is installed, run the following commands to set up the project on your local machine:

git clone git@github.com:nathanbaleeta/data-engineering-notebooks.git

cd data-engineering-notebooks

python3 -m venv venv

source venv/bin/activate

pip install --quiet pandas requests jupyterlab

To freeze the libraries, use:

pip freeze > requirements.txt

To install packages from the frozen file, use:

pip install -r requirements.txt

Launch notebook

jupyter lab 
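
Once JupyterLab is running, a first notebook cell can confirm that the packages from Quick Setup are visible to the active environment. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes the import names match the pip names used above.

```python
import importlib.util

# Import names of the packages installed in Quick Setup (assumed to match
# the pip package names: pandas, requests, jupyterlab).
REQUIRED = ["pandas", "requests", "jupyterlab"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, the virtual environment is probably not activated, or the `pip install` step was run outside it.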
