sri8661/datascience8661

kaggle-base

  • Template directory for data science competitions.
  • Data is stored in PostgreSQL running in a Docker 🐳 container, so it is reproducible and reusable 😄🎉

Usage

Step0. Clone the repository

git clone https://github.com/kiccho1101/kaggle-base.git
cd kaggle-base

Step1. Pull/Build Docker image

Recommended:

make pull

or

make build

Step2. Start up jupyter notebook

make jupyter
  • Copy the token and open localhost:${JUPYTER_PORT} (default: 9000) in your browser.

Step3. Start up DB

make start-db
  • You can then open localhost:${PGWEB_PORT} (default: 9002) to view the database.

Step4. Split train data into K-fold

make kfold CONFIG_NAME
  • CONFIG_NAME defaults to lightgbm_0.
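
For readers new to K-fold splitting, the idea behind this step can be sketched in a few lines of standard-library Python. This is only an illustration, not the code that `make kfold` runs: the function name `assign_folds`, its parameters, and the round-robin assignment are assumptions made for the example.

```python
import random
from typing import Dict, List


def assign_folds(row_ids: List[int], n_splits: int = 5, seed: int = 42) -> Dict[int, int]:
    """Map each row id to a fold index in [0, n_splits). Hypothetical helper."""
    if n_splits < 2 or n_splits > len(row_ids):
        raise ValueError("n_splits must be between 2 and the number of rows")
    shuffled = list(row_ids)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    # Dealing rows out round-robin keeps fold sizes within one row of each other.
    return {row_id: position % n_splits for position, row_id in enumerate(shuffled)}


folds = assign_folds(list(range(10)), n_splits=5)

# Fold 0 is held out for validation; the remaining folds are used for training.
validation_ids = sorted(rid for rid, fold in folds.items() if fold == 0)
training_ids = sorted(rid for rid, fold in folds.items() if fold != 0)
print(len(validation_ids), len(training_ids))  # prints: 2 8
```

Storing the fold index next to each row (for example as a column in a table) is one way to let every later step reuse exactly the same split.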

Step5. Create Features

  • Create all features.
make feature
  • Create only the specified feature.
make feature FEATURE_NAME
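
The "all features or one named feature" behaviour above could be organised around a small registry, sketched below. Everything here is hypothetical and only meant to show the pattern: the `FEATURES` registry, the `feature` decorator, `create_features`, and the two example features are not taken from this repository.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
FeatureFn = Callable[[List[Row]], List[float]]

# Hypothetical registry: feature name -> function that computes one column.
FEATURES: Dict[str, FeatureFn] = {}


def feature(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Register a feature function under the given name."""
    def register(fn: FeatureFn) -> FeatureFn:
        FEATURES[name] = fn
        return fn
    return register


@feature("price_per_unit")
def price_per_unit(rows: List[Row]) -> List[float]:
    return [row["price"] / row["quantity"] for row in rows]


@feature("is_bulk")
def is_bulk(rows: List[Row]) -> List[float]:
    return [1.0 if row["quantity"] >= 10 else 0.0 for row in rows]


def create_features(rows: List[Row], feature_name: Optional[str] = None) -> Dict[str, List[float]]:
    """Compute every registered feature, or only feature_name when it is given."""
    if feature_name is None:
        names = list(FEATURES)
    elif feature_name in FEATURES:
        names = [feature_name]
    else:
        raise KeyError(f"unknown feature: {feature_name}")
    return {name: FEATURES[name](rows) for name in names}


rows = [{"price": 10.0, "quantity": 2.0}, {"price": 30.0, "quantity": 15.0}]
all_features = create_features(rows)            # every registered feature
one_feature = create_features(rows, "is_bulk")  # a single named feature
```

Keeping each feature behind its own name means a single feature can be rebuilt after a change without recomputing the rest.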

Step6. Cross Validation

make cv CONFIG_NAME
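
As a rough picture of what a cross-validation run computes, the sketch below scores a placeholder model once per fold. It is not the repository's implementation: the names `rmse` and `cross_validate` are assumptions, RMSE is an arbitrary choice of metric, and the "model" simply predicts the training mean where a real configuration would train an actual model.

```python
import math
from statistics import mean
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def cross_validate(targets: Sequence[float], folds: Sequence[int]) -> List[float]:
    """Return one validation score per fold. Hypothetical helper."""
    scores = []
    for fold in sorted(set(folds)):
        train = [y for y, f in zip(targets, folds) if f != fold]
        valid = [y for y, f in zip(targets, folds) if f == fold]
        # Placeholder model: always predict the mean of the training targets.
        prediction = mean(train)
        scores.append(rmse(valid, [prediction] * len(valid)))
    return scores


targets = [1.0, 2.0, 3.0, 4.0]
fold_ids = [0, 0, 1, 1]  # fold index per row, as produced by the K-fold step
scores = cross_validate(targets, fold_ids)
print(f"per-fold RMSE: {scores}, mean: {mean(scores):.4f}")
```

Reporting the per-fold scores alongside their mean makes it easier to spot a configuration whose performance varies a lot between folds.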

Step7. Create Stats of each table

make stats

Step8. Train and Predict

make train-and-predict CONFIG_NAME

Step9. Submit

  • Then submit your output file!🙆
./output/submission_xxx.csv

Commands

isort, black

make format

flake8, mypy

make check

Reset DB

make reset-db

Execute scripts

Recommended:

make shell
python xxx.py

or

make run python xxx.py

References

Slides I found just when I was worn out by feature management. They are very easy to follow.

The way the classes are written is a useful reference.
