- `inputs` > `processed_dfs` - datasets for the actual training of the models; they are more lightweight than the original dataset, which is hidden by `.gitignore`
- `src` - python and `.ipynb` files
  - `data_processing/` - work with the direct GCJ data (time-consuming)
  - `models/` - main directory with the models, with the inheritance hierarchy described below
  - `training/` - utilities used by the models: `GridSearch` methods and the `Training` callback
  - `main.py` - starting point
- `outputs` - images, models, etc.
- Install the requirements from `requirements.txt`: `pip install -r requirements.txt`
- Change `src/main.py` as appropriate (uncomment the following lines, if needed):
2.1 To train the embedding-based model:

```python
embedding = Embedding(make_initial_preprocess=False)
embedding.train(batch_size=128, epochs=1)
```

These commands will create the model and train it for one epoch with a batch size of 128.
The dataset will be taken from `inputs/preprocessed_jsons/embedding_train.json`
if `make_initial_preprocess` is set to `False`; otherwise, access to the raw data is required.
2.2 To train the conv2d-based model:

```python
conv2d = Conv2D(make_initial_preprocess=True)
conv2d.train(batch_size=128, epochs=1)
```

Warning: to train this model, the raw dataset (`py_df.csv`) is required.
2.3 To generate the images which represent the focus of the models:

```python
Visualizer("conv2d").run()
Visualizer("embedding").run()
```

Example of embedding-based visualization

Example of conv2d visualization
2.4 To show all the layers of the models:

```python
KeractVisualizer("conv2d").run()
KeractVisualizer("embedding").run()
```

- Run `python3 src/main.py` and fix import errors, if there are any.
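The `make_initial_preprocess` flag used in steps 2.1 and 2.2 can be pictured as a cache check over the preprocessed JSONs; a minimal sketch, assuming the cached layout described above (`load_train_data` and its error handling are hypothetical illustrations, not the project's actual code):

```python
import json
import os


def load_train_data(name, make_initial_preprocess=False):
    """Return the training records for the model with the given name.

    If make_initial_preprocess is False, the cached file under
    inputs/preprocessed_jsons/ is reused; otherwise the raw dataset
    would have to be re-processed first (only sketched here).
    """
    cache = os.path.join("inputs", "preprocessed_jsons", f"{name}_train.json")
    if not make_initial_preprocess and os.path.exists(cache):
        with open(cache) as fh:
            return json.load(fh)
    # In the real project this branch would rebuild the cache from raw data.
    raise FileNotFoundError(
        f"{cache} is missing; rerun with make_initial_preprocess=True "
        "and the raw dataset available"
    )
```
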
- `Model` - root class (the interface for all models)
- `Triplet(Model)` - triplet-loss-specific methods (batch generation, the fit process, full model creation, etc.)
- `Embedding(Triplet)` - actual realization of the target architecture
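`Triplet(Model)` is built around the triplet loss; a minimal NumPy sketch of that loss (the margin value and the squared-L2 distance are illustrative assumptions, not necessarily the project's exact formulation):

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Batched triplet loss: pull each anchor towards its positive and
    push it away from its negative by at least `margin`.

    Each argument is an (n, d) array of embeddings.
    """
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # squared L2
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))
```

When the anchor already sits closer to the positive than to the negative by more than the margin, the per-example term clamps to zero, so only "hard" triplets contribute to the gradient.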
- `Visualizer` - visualization based on `tf-keras-vis`, which performs per-pixel modifications of the image and can therefore lead to errors
- `KeractVisualizer` - visualization based on the `keract` library.

WARNING: in case of an error, substitute the `._layers` call with the `.layers` call in the `keract.py` file inside the installed library (tensorflow version 2.5.0, keract version 4.4.0).
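If editing the installed `keract.py` by hand is inconvenient, the same substitution can be scripted; a hedged sketch (locating the file via `keract.__file__` assumes keract is importable, and the patch is only needed for the version pair above):

```python
import re


def patch_layers_attr(source: str) -> str:
    """Replace the private `._layers` attribute access with the public
    `.layers` one, as needed for keract 4.4.0 under tensorflow 2.5.0.

    The word boundary keeps other names such as `_layers_by_depth` intact.
    """
    return re.sub(r"\._layers\b", ".layers", source)


# Applying it to the installed file would look like this (not run here):
# import keract
# path = keract.__file__
# with open(path) as fh:
#     patched = patch_layers_attr(fh.read())
# with open(path, "w") as fh:
#     fh.write(patched)
```
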
>> tensorboard --logdir=outputs/tensor_board
>> source ./venv/bin/activate
>> source /opt/anaconda/bin/activate root
>> docker run --gpus all --device /dev/nvidia0 --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools --device /dev/nvidiactl -v /home/alina/SourceCodeAuthorshipAttribution/:/usr/app/ 748cf8b681db python /usr/app/src/main.py
>> docker build -t scaa .





