The repository is structured as follows. Each point is detailed below.
├── README.md <- The top-level README for developers using this project
├── configs <- Configuration files for Hydra. The subtree is detailed below
├── src <- Source code for use in this project
├── data <- Data folder, ignored by git
├── logs <- Logs folder, ignored by git (tensorboard?, wandb, CSVs, ...)
├── venv <- Virtual environment folder, ignored by git
├── requirements.txt <- The requirements file for reproducing the analysis environment
├── LICENSE <- License file
├── train.py <- Main script to run the code
└── personal_files <- Personal files, ignored by git (e.g. notes, debugging test scripts, ...)
This architecture is based on the fact that any research project requires a configuration, possibly decomposed into several sub-configurations.
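To make the idea of sub-configurations concrete, here is a minimal sketch in plain Python of how several small configurations can be composed into one. Hydra does this from the files in `configs`; the sketch only imitates the principle with dictionaries. The group names (`model`, `optimizer`) and all values are hypothetical and are not taken from this repository.

```python
from copy import deepcopy


def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; `override` wins on conflicts."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def compose(*parts: dict) -> dict:
    """Compose sub-configurations from left to right; later parts take precedence."""
    config: dict = {}
    for part in parts:
        config = merge(config, part)
    return config


# Hypothetical sub-configurations (assumed names and values, for illustration only).
model_cfg = {"model": {"name": "resnet18", "pretrained": False}}
optim_cfg = {"optimizer": {"name": "adam", "lr": 1e-3}}
experiment_override = {"optimizer": {"lr": 1e-4}}

config = compose(model_cfg, optim_cfg, experiment_override)
print(config)
```

Each sub-configuration stays small and reusable, and an experiment only has to state what it changes.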
For the sake of reproducibility, and to avoid conflicts with other projects, it is recommended to use a virtual environment.
There are several ways to create a virtual environment. Two good options are virtualenv and conda.
The following commands create a virtual environment in ./venv/ and install the requirements.
python3 -m venv venv
source venv/bin/activate # for Linux
venv\Scripts\activate.bat # for Windows
pip install -r requirements.txt
#pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
# uncomment if you are on DGX
# pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu118
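A quick way to confirm that the environment is active before running anything is to ask the interpreter itself. The sketch below assumes the environment was created with `python3 -m venv` as above; a conda environment would not be detected by this check. The helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment.

    Inside such an environment, sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    state = "active" if in_virtualenv() else "not active"
    print(f"Virtual environment is {state} (sys.prefix = {sys.prefix})")
```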
You just have to run the install script:
bash first_install.sh
#python3 -m venv venv
#source venv/bin/activate # for linux
#venv\Scripts\activate.bat # for windows
#pip install -r requirements.txt
#pre-commit install
Use Hydra; see this doc for more details.
I like ClearML because I have used it before and you can plug it on top of all the usual loggers; in any case, you can fall back to TensorBoard if need be. For a first run, you have to run the following command and copy-paste what it asks you to:
clearml-init
I love DGX; the password is the same as the usual CentraleSupélec one.
clearml-init
If you want to run Jupyter on a compute node (the kind that usually has a GPU), run:
sbatch script/jupyter.batch
Then go to this notebook and follow the instructions.
Command-line macros are extremely useful to avoid typing the same commands over and over again. This is just a small tip that I like to use, but it can save a lot of time.
I advise using gitignored files (there is a personal_* pattern in the .gitignore file) to store personal files, such as notes, debugging scripts, etc. It is good practice to keep the repository clean and organized.
I am heavily inspired by this awesome repo.
It's something I've been working on for a long time. I found several options:
- Pytype
- MonkeyType: seems fine if your script is not too slow
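MonkeyType works by recording the argument types it sees while your script runs, which is why the speed of the script matters. The sketch below illustrates that general idea with the standard `sys.setprofile` hook. It is not MonkeyType's actual implementation, and the names `observed`, `trace_types` and `scale` are hypothetical.

```python
import sys
from collections import defaultdict

# Maps a function name to the set of argument-type signatures observed at runtime.
observed = defaultdict(set)


def _profiler(frame, event, arg):
    """Profile hook: on every Python-level call, record the types of the positional arguments."""
    if event == "call":
        code = frame.f_code
        arg_names = code.co_varnames[:code.co_argcount]
        signature = tuple((name, type(frame.f_locals[name]).__name__) for name in arg_names)
        observed[code.co_name].add(signature)


def trace_types(func, *args, **kwargs):
    """Run `func` with the profile hook installed, then remove the hook."""
    sys.setprofile(_profiler)
    try:
        return func(*args, **kwargs)
    finally:
        sys.setprofile(None)


def scale(x, factor):
    return x * factor


trace_types(scale, 2, 0.5)
trace_types(scale, [1], 3)
print(dict(observed))
```

Every call goes through the hook, so tracing adds overhead that grows with the number of function calls in the script.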