As part of this task I:
- created a Python 3.9.13 venv for this task; it was used both for the analysis notebooks and for testing `train.py`/`predict.py`.
- analysed the provided dataset `train.csv` (the analysis is in `1.EDA.ipynb` and `2 Model selection.ipynb`).
- found that the target column is generated as target = abs(var6)**2 + var7, where var6 and var7 are columns '6' and '7' respectively.
- prepared `train.py` to reproduce the linear regression model training and `predict.py` for model inference.
- generated predictions for the `hidden_test.csv` dataset: `predictions.csv`.
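
The relationship target = abs(var6)**2 + var7 can be checked with a small sketch on synthetic data. The values below are randomly generated stand-ins, not rows from `train.csv`, and NumPy's least-squares solver is used here in place of whatever regression library `train.py` relies on. Because abs(x)**2 equals x**2 for real numbers, a linear model on the features var6**2 and var7 should recover coefficients close to 1 and 1 with an intercept close to 0.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-ins for columns '6' and '7' (assumption: not the real data).
var6 = rng.uniform(-10, 10, size=200)
var7 = rng.uniform(-10, 10, size=200)
target = np.abs(var6) ** 2 + var7

# Design matrix: squared column '6', raw column '7', and a constant for the intercept.
X = np.column_stack([var6 ** 2, var7, np.ones_like(var6)])

# Ordinary least squares: solves for [coef_var6_squared, coef_var7, intercept].
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print(np.round(coef, 6))
```

If the fitted coefficients differ noticeably from 1, 1 and 0 on the real data, the assumed formula does not hold exactly.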
- `1.EDA.ipynb` - general analysis of `train.csv`
- `2 Model selection.ipynb` - additional research to find model that is the most effective in describing relationship target ~ data and see what info about this relationship I can extract from it
- `train.py` - script for model training and saving
- `predict.py` - script for generating predictions from saved model
- `.gitignore` - git exceptions
- `README.md` - this README
- `requirements.txt` - requirements for venv recreation
- `predictions.csv` - predictions for `hidden_test.csv`
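
The split between a training script that saves a model and a prediction script that loads it can be illustrated with a minimal round trip. This sketch assumes the model is serialised with `pickle`, which the `.pkl` default file name suggests but this README does not state. The parameter dictionary and the `predict_one` helper are hypothetical and stand in for the real fitted model object.

```python
import os
import pickle
import tempfile

# Hypothetical fitted parameters; the real train.py may store a full model object.
model = {"coef_var6_squared": 1.0, "coef_var7": 1.0, "intercept": 0.0}


def predict_one(params, var6, var7):
    """Apply the stored linear model to a single row."""
    return (
        params["coef_var6_squared"] * var6 ** 2
        + params["coef_var7"] * var7
        + params["intercept"]
    )


# Save and reload inside a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)  # the step a training script would perform
    with open(model_path, "rb") as fh:
        restored = pickle.load(fh)  # the step a prediction script would perform

prediction = predict_one(restored, var6=3.0, var7=2.0)
print(prediction)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.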
All elements (both notebooks and scripts) were created and tested in a Python 3.9.13 venv with the requirements listed in `requirements.txt`.
To train the model, run in the terminal:

    $ python train.py

The script accepts two optional arguments:
- `--train-file`: Path to the CSV file containing the training data. Default is `train.csv`.
- `--model-file`: Path to save the trained model. Default is `model.pkl`.

To use a dataset other than `train.csv` and/or a model file name other than `model.pkl`, use this command:

    $ python train.py --train-file custom_train_data.csv --model-file custom_model.pkl

To generate predictions from a saved model, run in the terminal:

    $ python predict.py

The script accepts three optional arguments:
- `--model-file`: Path to the pre-trained model file. Default is `model.pkl`.
- `--test-file`: Path to the CSV file containing the test data. Default is `hidden_test.csv`.
- `--output-file`: Path to save the prediction results as a CSV file. Default is `predictions.csv`.

To use a dataset other than `hidden_test.csv`, a model file name other than `model.pkl`, and/or a predictions file name other than `predictions.csv`, use this command:

    $ python predict.py --model-file custom_model.pkl --test-file custom_test_data.csv --output-file custom_predictions.csv
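
The optional arguments described above could be declared with `argparse` roughly as follows. This is a sketch based only on the option names and defaults listed in this README. The function names `build_train_parser` and `build_predict_parser` are hypothetical, and the actual scripts may define their arguments differently.

```python
import argparse


def build_train_parser():
    """Options for train.py as documented above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train the model and save it.")
    parser.add_argument("--train-file", default="train.csv",
                        help="Path to the CSV file containing the training data.")
    parser.add_argument("--model-file", default="model.pkl",
                        help="Path to save the trained model.")
    return parser


def build_predict_parser():
    """Options for predict.py as documented above (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Generate predictions from a saved model.")
    parser.add_argument("--model-file", default="model.pkl",
                        help="Path to the pre-trained model file.")
    parser.add_argument("--test-file", default="hidden_test.csv",
                        help="Path to the CSV file containing the test data.")
    parser.add_argument("--output-file", default="predictions.csv",
                        help="Path to save the prediction results as a CSV file.")
    return parser


# No arguments: every option falls back to its default.
train_args = build_train_parser().parse_args([])

# Overriding one option leaves the others at their defaults.
predict_args = build_predict_parser().parse_args(["--test-file", "custom_test_data.csv"])

print(train_args.train_file, train_args.model_file)
print(predict_args.model_file, predict_args.test_file, predict_args.output_file)
```

Note that `argparse` exposes an option such as `--train-file` as the attribute `train_file`, replacing the hyphen with an underscore.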