My cool template for student Research/kaggle project

gardiens/research-project-template

Research Project Template

Fast training with Hydra and Lightning
Explore the docs »

Repository structure

The repository is structured as follows. Each point is detailed below.

├── README.md        <- The top-level README for developers using this project
├── configs         <- Configuration files for Hydra. The subtree is detailed below
├── src             <- Source code for use in this project
├── data            <- Data folder, ignored by git
├── logs           <- Logs folder, ignored by git (tensorboard?, wandb, CSVs, ...)
├── venv           <- Virtual environment folder, ignored by git
├── requirements.txt  <- The requirements file for reproducing the analysis environment
├── LICENSE        <- License file
├── train.py         <- Main script to run the code
└── personal_files <- Personal files, ignored by git (e.g. notes, debugging test scripts, ...)

This layout reflects the fact that any research project needs a configuration, possibly decomposed into several sub-configurations.

Setup

Virtual environment

For the sake of reproducibility, and to avoid conflicts with other projects, it is recommended to use a virtual environment.

There are several ways to create a virtual environment. Two good options are venv and conda.

The following commands create a virtual environment in ./venv/ and install the requirements.

python3 -m venv venv
source venv/bin/activate  # for linux
venv\Scripts\activate.bat  # for windows
pip install -r requirements.txt
# pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
# uncomment if you are on DGX
# pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu118
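To confirm that the activation step worked before installing anything, a quick check from Python can help. The sketch below is not part of the template, and the helper name `in_virtualenv` is only illustrative. It relies on the standard behaviour that, inside an environment created with `python3 -m venv`, `sys.prefix` differs from `sys.base_prefix`. A conda environment is not expected to be detected this way.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints `False` after running the activation command, `pip install` would go to the base Python installation rather than to ./venv/.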

Setup script

You only need to run the install script:

bash first_install.sh
#python3 -m venv venv
#source venv/bin/activate  # for linux
#venv\Scripts\activate.bat  # for windows
#pip install -r requirements.txt
#pre-commit install

Configuration system

Configuration is handled with Hydra; see this doc for more details.
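The idea of one configuration assembled from several sub-configurations, then adjusted from the command line, can be sketched in plain Python. This is only an illustration of the concept, not Hydra's API or implementation. The helpers `deep_merge` and `apply_override`, and the keys in `model_cfg` and `data_cfg`, are hypothetical and do not come from this repository.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which `update` is merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_override(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Apply one `dotted.key=value` override; the value is kept as a string."""
    dotted_key, _, value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    result = deepcopy(config)
    node = result
    for name in parents:
        # Walk down the nested dicts, creating missing levels on the way.
        node = node.setdefault(name, {})
    node[leaf] = value
    return result


# Hypothetical sub-configurations, standing in for files under configs/.
model_cfg = {"model": {"name": "resnet18", "lr": 0.001}}
data_cfg = {"data": {"batch_size": 32, "path": "data/"}}

config = deep_merge(model_cfg, data_cfg)
config = apply_override(config, "data.batch_size=64")
print(config)
```

The override is stored as the string "64" to keep the sketch short; a real configuration tool would also convert the value to a proper type.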

clearml

I like ClearML because I have used it before and you can plug it on top of all the usual loggers; in any case, you can fall back to TensorBoard if need be. Before the first run, execute the command below and copy-paste what it asks for:

clearml-init

Other tips

DGX

I love DGX. The password is the same as your usual CentraleSupélec one.

clearml-init

Use Jupyter On a Slurm Cluster

To run Jupyter on a compute node (usually one of the nodes with GPUs), submit the batch script:

sbatch script/jupyter.batch

Then open this notebook and follow the instructions.

Macros

Command line macros are extremely useful to avoid typing the same commands over and over again. This is just a small tip that I like to do, but it can save a lot of time.

Personal useful files

I advise storing personal files (notes, debugging scripts, etc.) in gitignored files; there is a personal_* entry in the .gitignore file. This keeps the repository clean and organized.

Disclaimer

This template is heavily inspired by this awesome repo.

Autotyper

Automatic typing is something I have been working on for a long time. I found several options:

  • Pytype
  • MonkeyType: seems fine if your script is not too slow
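MonkeyType collects the runtime types of function arguments and return values, which is why it needs your script to actually run (hence the remark about slow scripts). The toy decorator below illustrates that idea using only the standard library. It is not MonkeyType's API, and `record_types`, `observed_types`, and `scale` are hypothetical names used only for this example.

```python
import functools
import inspect
from collections import defaultdict

# Maps "function.argument" to the set of type names observed at runtime.
observed_types: dict[str, set[str]] = defaultdict(set)


def record_types(func):
    """Record the type of every argument and of the return value on each call."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind the call to parameter names so positional and keyword
        # arguments are recorded under the same key.
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            observed_types[f"{func.__name__}.{name}"].add(type(value).__name__)
        result = func(*args, **kwargs)
        observed_types[f"{func.__name__}.return"].add(type(result).__name__)
        return result

    return wrapper


@record_types
def scale(values, factor):
    return [v * factor for v in values]


scale([1, 2, 3], 2)
scale([1.0, 2.0], factor=0.5)
for key in sorted(observed_types):
    print(key, sorted(observed_types[key]))
```

The recorded sets are the raw material from which annotations could be suggested; here `factor` was seen as both `int` and `float`, so a single concrete type would not fit.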
