An open cxflow-tensorflow kernel for the Cdiscount's Image Classification Challenge Kaggle competition.
Start training on multiple GPUs with TensorFlow right away!
Works on Linux with Python 3.5+.
Features:
- CLI data download
- Data validation with SHA256 hash
- Simple data visualization
- Train-Valid splitting
- Low memory footprint data streams (see the sketch below this list)
- GPU-CPU parallelism
- Baseline VGG-like convnet
- Multi-GPU training with a single argument!
- TensorBoard training tracking
- Model prediction and submission
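The low memory footprint comes from streaming the data instead of loading it all at once. For illustration only, here is a minimal sketch of that idea using pymongo's bson module and the Cdiscount BSON layout (`category_id`, `imgs[].picture`); it is not the kernel's actual data stream:

```python
import io

import bson  # ships with pymongo
from PIL import Image


def stream_images(bson_path):
    """Yield (PIL image, category_id) pairs without loading the whole file.

    A sketch of the low-memory streaming idea, not the kernel's implementation.
    The record fields follow the Cdiscount competition's BSON layout.
    """
    with open(bson_path, "rb") as f:
        for record in bson.decode_file_iter(f):
            category = record["category_id"]
            for img in record["imgs"]:
                yield Image.open(io.BytesIO(img["picture"])), category


# usage (the path is a placeholder):
# for image, category in stream_images("train.bson"):
#     ...
```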
Install TensorFlow and 7z.
Clone the repo and install the requirements:
git clone https://github.com/Cognexa/cdiscount-kernel && cd cdiscount-kernel
pip3 install -r requirements.txt --user
Download the dataset with kaggle-cli (this may take a while; 3 hours in my case):
# requires >57 GB of free space
KG_USER="<YOUR KAGGLE USERNAME>" KG_PASS="<YOUR KAGGLE PASSWORD>" cxflow dataset download cdc
Validate your download and see the example data:
# in the root directory (cdiscount-kernel)
cxflow dataset validate cdc
cxflow dataset show cdc
# now see the newly created visual directory
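Validation boils down to comparing SHA256 digests of the downloaded files against known values (the feature list above mentions SHA256 hashing). In plain Python the same check looks roughly like this; the file name and expected digest are placeholders:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so even the >57 GB archive fits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# usage (file name and expected digest are placeholders):
# assert sha256_of("train.bson") == "<EXPECTED SHA256 DIGEST>"
```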
Create a random validation split with 10% of the data and start training:
cxflow dataset split cdc
cxflow train cdc model.n_gpus=<NUMBER OF GPUS TO USE>
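For context, multi-GPU training of this kind is typically plain data parallelism: the model is replicated on each GPU, each replica processes a slice of the batch, and the gradients are combined. Below is a generic sketch of that idea in current TensorFlow with a toy model and random data; it is not cxflow-tensorflow's implementation, which you get for free by setting model.n_gpus.

```python
import numpy as np
import tensorflow as tf

# Generic data-parallel sketch: MirroredStrategy replicates the model on every
# visible GPU and averages the gradients. Toy model and random data only.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(5270, activation="softmax"),  # 5270 Cdiscount categories
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(64, 128, 128, 3).astype("float32")
y = np.random.randint(5270, size=64)
model.fit(x, y, epochs=1)  # the global batch is split across the replicas
```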
Observe the training with TensorBoard (note: summaries are written only after each epoch):
tensorboard --logdir=log
Obtain predictions for submission:
cxflow predict log/<DIR> log/<DIR> model.restore_model_name=<CHECKPOINT NAME>
With only one checkpoint in the output directory, this simplifies to:
cxflow predict log/<DIR>
Important: update cxflow and cxflow-tensorflow with pip3 install cxflow cxflow-tensorflow --user --upgrade
Main features:
- Xception net (https://arxiv.org/abs/1610.02357), see the sketch below
- Fast random data access
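The core of Xception is the depthwise-separable convolution arranged in residual blocks. Here is a minimal tf.keras sketch of one such block and a tiny model built from it (batch normalization omitted for brevity; the kernel's actual architecture may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers


def xception_block(x, filters):
    """One Xception-style block: two separable convolutions, a strided pooling
    step and a 1x1-convolution shortcut (a sketch, not the kernel's exact model)."""
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(filters, 3, padding="same")(x)
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(filters, 3, padding="same")(x)
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
    return layers.add([x, shortcut])


inputs = tf.keras.Input(shape=(128, 128, 3))
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
for filters in (128, 256, 728):
    x = xception_block(x, filters)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5270, activation="softmax")(x)  # 5270 Cdiscount categories
model = tf.keras.Model(inputs, outputs)
```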
Resize the data to dataset.size with the following commands (this may take a few hours):
cxflow dataset resize cdc/xception.yaml
cxflow dataset split cdc/xception.yaml
Run the training with:
cxflow train cdc/xception.yaml
Training procedure that reached a score of 0.65:
- Train with the original image size, LR 0.0001 and 4 middle-flow repeats until the score stalled
- Fine-tune with 128x128 images, LR 0.0001, 0.5 dropout and 0.00001 weight decay until stalled
- Fine-tune as above but with LR 0.00001 (10x smaller)
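A toy sketch of this three-stage schedule, with a tiny model and random data standing in for the real pipeline (the kernel drives all of this through cxflow configs and checkpoints; the 180x180 image size and 5270-class count follow the Cdiscount data):

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 5270  # Cdiscount category count


def make_model(dropout=0.0, weight_decay=0.0):
    reg = tf.keras.regularizers.l2(weight_decay) if weight_decay else None
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])


def run_stage(model, image_size, lr, epochs=1):
    """One training stage on random data; in reality each stage runs until the score stalls."""
    x = np.random.rand(8, image_size, image_size, 3).astype("float32")
    y = np.random.randint(NUM_CLASSES, size=8)
    model.compile(tf.keras.optimizers.Adam(lr), "sparse_categorical_crossentropy")
    model.fit(x, y, epochs=epochs, verbose=0)


# 1) original image size (180x180 in the Cdiscount data), LR 1e-4
model = make_model()
run_stage(model, 180, 1e-4)

# 2) fine-tune at 128x128 with 0.5 dropout and 1e-5 weight decay, same LR
finetune = make_model(dropout=0.5, weight_decay=1e-5)
finetune.build((None, 128, 128, 3))
finetune.set_weights(model.get_weights())  # conv and dense weights transfer across image sizes
run_stage(finetune, 128, 1e-4)

# 3) same setup, LR lowered 10x
run_stage(finetune, 128, 1e-5)
```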
Tips:
- Use small images right away
- The final GlobalAveragePooling may be a bottleneck (see the sketch after this list)
- The net does not overfit so far, so no augmentations are needed
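For context on the pooling tip: in the standard Xception head, GlobalAveragePooling collapses each of the 2048 final feature maps to a single number, so all evidence for the 5270 categories must pass through that one vector. A sketch of such a head with assumed shapes (not the kernel's code):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Every class decision flows through the pooled 2048-dimensional vector,
# which is what the tip above suspects may be limiting. Widening the last
# conv block or trying a different head are possible experiments.
features = tf.keras.Input(shape=(4, 4, 2048))   # output of the convolutional trunk (assumed shape)
x = layers.GlobalAveragePooling2D()(features)   # -> (batch, 2048)
x = layers.Dropout(0.5)(x)
logits = layers.Dense(5270)(x)                  # 5270 Cdiscount categories
head = tf.keras.Model(features, logits)
head.summary()
```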
This kernel is written with cxflow-tensorflow, a TensorFlow plugin for the cxflow framework. Make sure you check it out!
A simple submission script will be added soon; stay tuned!