The main purpose of this project is to improve ML-in-production skills and to research how image classification models work.
The data are taken from the Kaggle competition Planet: Understanding the Amazon from Space. More information can be found on the competition website. In a nutshell, we want to label satellite image chips with atmospheric conditions and various classes of land cover/land use. It is a multi-label classification problem with 17 different classes. In the competition, algorithms were scored using the mean F2 score.
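To make the metric concrete, here is a minimal sketch of the mean F2 score as we understand the competition's definition: an F-beta score with beta = 2 (recall weighted more than precision) is computed for each image's set of tags, then averaged over all images. The function names are our own, not part of this repository.

```python
from typing import Sequence, Set


def f2_score(true_tags: Set[str], pred_tags: Set[str], beta: float = 2.0) -> float:
    """F-beta score for a single image; beta=2 weights recall above precision."""
    tp = len(true_tags & pred_tags)  # tags that are both predicted and correct
    if tp == 0:
        return 0.0
    precision = tp / len(pred_tags)
    recall = tp / len(true_tags)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_f2(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]]) -> float:
    """Average the per-image F2 scores over the whole set of images."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(f2_score(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, predicting {clear, primary, water} for an image tagged {clear, primary} gives precision 2/3 and recall 1, so F2 = 10/11. On multi-hot arrays, scikit-learn's `fbeta_score(y_true, y_pred, beta=2, average="samples")` should compute the same per-image average.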
Here we only use the JPG images. Note that the zip file should be unzipped. The data can still be downloaded here
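If `unzip` is not available, the archive can also be extracted with Python's standard `zipfile` module. This is a minimal sketch; the function name `unzip_dataset` and the example paths are hypothetical and should be adjusted to where the archive was downloaded.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_dataset(zip_path: str, dest_dir: str) -> List[str]:
    """Extract zip_path into dest_dir and return the names of the extracted members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Usage, with hypothetical paths: `unzip_dataset("data/train-jpg.zip", "data")` should leave a `data/train-jpg` folder, assuming the archive stores the images under that folder name.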
Creating and activating the environment
python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
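To confirm the environment is actually active, one can compare `sys.prefix` with `sys.base_prefix`: inside an environment created by `python3 -m venv`, the first points at the environment directory while the second still points at the base Python installation. The helper name below is our own.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-created environment."""
    # In a venv, sys.prefix is the environment folder; sys.base_prefix is the base install.
    return sys.prefix != sys.base_prefix
```

The same check works as a one-liner after activation: `python -c "import sys; print(sys.prefix != sys.base_prefix)"` should print `True`.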
Installing packages
In the activated environment:
pip install -r requirements.txt
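As an optional sanity check after installation, the sketch below lists which packages named in `requirements.txt` are not installed in the current environment, using `importlib.metadata` from the standard library. It is a simplified, hypothetical helper: it only understands plain lines such as `name==1.0` or `name>=2`, and skips comments and pip options.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import List


def requirement_names(requirements_file: str) -> List[str]:
    """Return the bare package names from a simple requirements file."""
    names = []
    for line in Path(requirements_file).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options like -e / -r
            continue
        # The name ends at the first version specifier, extra, marker or space.
        names.append(re.split(r"[\s\[;=<>!~@]", line, maxsplit=1)[0])
    return names


def missing_requirements(requirements_file: str) -> List[str]:
    """Names listed in requirements_file that are not installed in this environment."""
    missing = []
    for name in requirement_names(requirements_file):
        try:
            version(name)  # raises if no distribution with this name is installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `print(missing_requirements("requirements.txt"))` from the repository root should print an empty list once installation succeeded.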
Customise config.yaml to suit your needs. Pay attention to two settings:
in config.train_classes_path, specify where the dataset file train_classes.csv was downloaded to, and in data_dir, specify where the train-jpg folder was downloaded to.
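Wrong paths are an easy way for training to fail late, so it can help to check them up front. This is a sketch under an assumption: it treats `train_classes_path` and `data_dir` as top-level keys of the loaded config, which may not match how `config.yaml` is actually nested. Both function names are hypothetical; `load_config` needs PyYAML, which is imported lazily so the path check itself has no dependencies.

```python
from pathlib import Path
from typing import Any, Dict, List


def load_config(path: str) -> Dict[str, Any]:
    """Load a YAML config file into a dict (requires PyYAML)."""
    import yaml  # lazy import: check_dataset_paths works without PyYAML

    with open(path) as f:
        return yaml.safe_load(f)


def check_dataset_paths(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems with the dataset paths; empty means all good."""
    problems = []
    csv_path = cfg.get("train_classes_path")  # assumed key for train_classes.csv
    if csv_path is None or not Path(csv_path).is_file():
        problems.append(f"train_classes_path is not a file: {csv_path!r}")
    data_dir = cfg.get("data_dir")  # assumed key for the train-jpg folder
    if data_dir is None or not Path(data_dir).is_dir():
        problems.append(f"data_dir is not a directory: {data_dir!r}")
    elif not any(Path(data_dir).glob("*.jpg")):
        problems.append(f"data_dir contains no .jpg files: {data_dir!r}")
    return problems
```

Usage: `print(check_dataset_paths(load_config("configs/config.yaml")))` should print `[]` before you start training.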
Start training:
PYTHONPATH=. python src/train.py configs/config.yaml
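Because each image can carry several tags, training targets for a multi-label model are usually multi-hot vectors with one 0/1 entry per class. The sketch below shows that encoding; it is not taken from `src/train.py`, and it assumes `train_classes.csv` has `image_name` and `tags` columns with space-separated tags, as in the Kaggle CSV.

```python
import csv
from typing import Dict, List, Tuple


def load_multi_hot(csv_path: str) -> Tuple[List[str], List[str], List[List[int]]]:
    """Read (image_name, tags) rows and return names, sorted classes, multi-hot targets."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # Build the class vocabulary from every tag that appears in the file.
    classes = sorted({tag for row in rows for tag in row["tags"].split()})
    index: Dict[str, int] = {c: i for i, c in enumerate(classes)}
    names, targets = [], []
    for row in rows:
        vec = [0] * len(classes)
        for tag in row["tags"].split():
            vec[index[tag]] = 1  # mark every tag present on this image
        names.append(row["image_name"])
        targets.append(vec)
    return names, classes, targets
```

On the full training CSV this should yield 17 classes. Whatever order is used during training must be reused at prediction time, so in practice it is safer to save the class list alongside the weights.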
The command
dvc pull weights/vgg16_feature_extractor.pth.dvc
downloads vgg16_feature_extractor.pth into the weights directory; this file is used as the base model for prediction.
The command
PYTHONPATH=. python src/predict.py --image_path <path_to_image>
runs inference on the image at <path_to_image>. A test image, test_image, is provided in the root of the repository.
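In a multi-label setup the model's raw outputs (logits) are typically turned into tags independently per class: a sigmoid, not a softmax, maps each logit to a probability, and every class above a threshold is kept. The sketch below illustrates that decoding step; whether `src/predict.py` uses a fixed 0.5 threshold is not stated here, and the function names are hypothetical. Since F2 rewards recall, thresholds below 0.5 tuned on validation data are a common choice.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_predictions(logits: Sequence[float], classes: Sequence[str],
                       threshold: float = 0.5) -> List[str]:
    """Return every class whose sigmoid probability reaches the threshold."""
    if len(logits) != len(classes):
        raise ValueError("need exactly one logit per class")
    return [name for logit, name in zip(logits, classes) if sigmoid(logit) >= threshold]
```

For example, `decode_predictions([2.0, -3.0, 0.0], ["clear", "cloudy", "primary"])` keeps `clear` (probability about 0.88) and `primary` (exactly 0.5) but drops `cloudy`.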