kaggle-base

Template directory for data science competitions.
Data is stored in PostgreSQL running in a Docker🐳 container, so it is reproducible and reusable 😄🎉

Usage

Step0. Clone the repository

git clone https://github.com/kiccho1101/kaggle-base.git
cd kaggle-base

Step1. Pull/Build Docker image

Recommended:

make pull

or

make build

Step2. Start Jupyter Notebook

make jupyter

Copy the token, then open localhost:${JUPYTER_PORT} (default: 9000) in your browser.

Step3. Start up DB

make start-db

Then open localhost:${PGWEB_PORT} (default: 9002) to browse the database.
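The containers are configured through environment variables such as `${JUPYTER_PORT}` and `${PGWEB_PORT}`. The following is a minimal sketch of building a PostgreSQL connection URL from variables of that kind. The names `POSTGRES_USER`, `POSTGRES_PASSWORD`, `POSTGRES_HOST`, `POSTGRES_PORT` and `POSTGRES_DB`, and their defaults, are assumptions and are not taken from this repository. Check `.env` and `docker-compose.yml` for the real ones.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def postgres_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a PostgreSQL URL from environment variables.

    The variable names and defaults are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' are safe.
    user = quote(env.get("POSTGRES_USER", "postgres"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", ""), safe="")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "postgres")
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{db}"


print(postgres_url({"POSTGRES_USER": "kaggle", "POSTGRES_PASSWORD": "p@ss", "POSTGRES_DB": "comp"}))
# postgresql://kaggle:p%40ss@localhost:5432/comp
```

A URL like this can be passed to a client such as SQLAlchemy's `create_engine`. From inside another container on the same Compose network, the host would be the database service's name rather than `localhost`.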

Step4. Split train data into K-fold

make kfold CONFIG_NAME (default: lightgbm_0)
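Splitting once and saving the fold assignment lets every later step reuse the same split. The following is a minimal stdlib sketch of a reproducible, shuffled K-fold assignment. The function name, `n_splits=5` and `seed=42` are hypothetical. The repository itself may use a library splitter such as scikit-learn's `KFold` instead.

```python
import random
from typing import List


def assign_folds(n_rows: int, n_splits: int = 5, seed: int = 42) -> List[int]:
    """Return a fold id (0..n_splits-1) for each of n_rows rows.

    Rows are shuffled with a fixed seed so the split is reproducible,
    then dealt out round-robin so fold sizes differ by at most one.
    """
    if not 2 <= n_splits <= n_rows:
        raise ValueError("n_splits must be between 2 and n_rows")
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    for position, row in enumerate(order):
        folds[row] = position % n_splits
    return folds


folds = assign_folds(10, n_splits=3)
print(folds)
```

The resulting list could be stored as a `fold` column next to each row's id. Every config would then read back the same split.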

Step5. Create Features

Create all features.

make feature

Create only the specified feature.

make feature FEATURE_NAME
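Reference 2 below describes managing features as classes. The following is a minimal sketch of that idea applied to `make feature` and `make feature FEATURE_NAME`. Each feature is a class registered by name, and the runner builds either all of them or just one. The registry, the `create` method, the example features (`sqft`, `rooms` columns) and the in-memory result dict are all hypothetical. The repository presumably saves features to PostgreSQL instead.

```python
from typing import Dict, List, Optional, Type

Row = Dict[str, float]

# Registry mapping feature name -> feature class (hypothetical design).
FEATURES: Dict[str, Type["Feature"]] = {}


def register(cls: Type["Feature"]) -> Type["Feature"]:
    """Class decorator that adds a feature class to the registry."""
    FEATURES[cls.name] = cls
    return cls


class Feature:
    name: str = ""

    def create(self, rows: List[Row]) -> List[float]:
        raise NotImplementedError


@register
class SqftPerRoom(Feature):
    name = "sqft_per_room"

    def create(self, rows: List[Row]) -> List[float]:
        return [r["sqft"] / r["rooms"] for r in rows]


@register
class RoomsSquared(Feature):
    name = "rooms_squared"

    def create(self, rows: List[Row]) -> List[float]:
        return [r["rooms"] ** 2 for r in rows]


def make_features(rows: List[Row], feature_name: Optional[str] = None) -> Dict[str, List[float]]:
    """Build every registered feature, or only `feature_name` if given."""
    if feature_name is not None and feature_name not in FEATURES:
        raise KeyError(f"unknown feature: {feature_name}")
    names = [feature_name] if feature_name is not None else list(FEATURES)
    # A real pipeline would persist each column; here it is just returned.
    return {name: FEATURES[name]().create(rows) for name in names}
```

Registering by name lets a single command rebuild one feature without recomputing the rest.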

Step6. Cross Validation

make cv CONFIG_NAME
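`make cv CONFIG_NAME` evaluates a configuration on the folds from Step4. The following is a minimal sketch of the out-of-fold loop, given a list of fold ids per row. The "model" here just predicts the mean of the training targets, and the metric is RMSE. Both are placeholders for whatever model and metric the config actually specifies; the default config name `lightgbm_0` suggests LightGBM.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate(y: Sequence[float], folds: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (out-of-fold predictions, per-fold RMSE).

    For each fold k, the placeholder model is "fit" on rows outside k
    (it just takes their mean) and predicts every row inside k.
    """
    fold_ids = sorted(set(folds))
    if len(fold_ids) < 2:
        raise ValueError("need at least two folds")
    oof = [0.0] * len(y)
    scores: List[float] = []
    for k in fold_ids:
        train_idx = [i for i, f in enumerate(folds) if f != k]
        valid_idx = [i for i, f in enumerate(folds) if f == k]
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        for i in valid_idx:
            oof[i] = prediction
        scores.append(rmse([y[i] for i in valid_idx], [oof[i] for i in valid_idx]))
    return oof, scores


oof, scores = cross_validate([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
print(oof, scores)
```

Keeping the out-of-fold predictions, not just the scores, makes them reusable later, for example for ensembling.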

Step7. Create stats for each table

make stats
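The README does not say which statistics `make stats` computes. As an illustration only, the following sketch computes per-column summary statistics for a table given as a list of row dicts. The chosen statistics (non-null count, null count, and mean for all-numeric columns) are assumptions. In the repository they are presumably computed from the PostgreSQL tables.

```python
from typing import Any, Dict, List, Optional


def column_stats(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-column non-null count, null count and mean (numeric columns only)."""
    columns = {key for row in rows for key in row}
    stats: Dict[str, Dict[str, Optional[float]]] = {}
    for col in sorted(columns):
        # A key missing from a row is treated the same as an explicit None.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        all_numeric = bool(numeric) and len(numeric) == len(present)
        stats[col] = {
            "count": len(present),
            "nulls": len(values) - len(present),
            "mean": sum(numeric) / len(numeric) if all_numeric else None,
        }
    return stats


print(column_stats([{"a": 1, "b": "x"}, {"a": 3, "b": None}, {"a": None, "b": "y"}]))
```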

Step8. Train and Predict

make train-and-predict CONFIG_NAME

Step9. Submit

Then submit your output file!🙆

./output/submission_xxx.csv
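The following is a sketch of writing predictions to `./output/submission_<suffix>.csv`. The caller supplies the suffix that stands in for `xxx`, such as a config name. The `id`/`target` header is a hypothetical placeholder, since each competition defines its own submission columns.

```python
import csv
from pathlib import Path
from typing import Sequence, Union


def write_submission(
    ids: Sequence[Union[int, str]],
    predictions: Sequence[float],
    suffix: str,
    output_dir: Union[str, Path] = "./output",
) -> Path:
    """Write an id/prediction CSV to <output_dir>/submission_<suffix>.csv."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    path = Path(output_dir) / f"submission_{suffix}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "target"])  # hypothetical column names
        writer.writerows(zip(ids, predictions))
    return path
```

For example, `write_submission([1, 2], [0.5, 0.25], "lightgbm_0")` would create `./output/submission_lightgbm_0.csv`.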

Commands

Format code with isort and black

make format

Lint and type-check with flake8 and mypy

make check

Reset DB

make reset-db

Execute scripts

Recommended:

make shell
python xxx.py

or

make run python xxx.py

References

1. データ分析コンペにおいて特徴量管理に疲弊している全人類に伝えたい想い (What I want to tell everyone worn out by feature management in data analysis competitions)

I found these slides right when I was worn out by managing features. They are very easy to follow.

2. Kaggleで使えるFeather形式を利用した特徴量管理法 (Feature management for Kaggle using the Feather format)

The way its classes are written is a useful reference.

3. flowlight0's directory

4. upura's directory

