Yet Another Security Toolbox
YAST is a Python-based toolbox designed to help users test their machine learning models against adversarial samples in the image classification domain. YAST is built on a distributed computing architecture, using Celery as the task queue and Redis as the message broker so that many models and adversarial attacks can be processed efficiently.
YAST is intended for any user who wants to evaluate the robustness of their machine learning models against various adversarial attacks.
YAST offers a capable and robust batch pipeline system that allows extensive adversarial robustness tests to be performed with minimal code. Its use of Celery distributes the workload across workers, making it possible to test a wide variety of models against multiple adversarial attacks in a reasonable time frame.
With YAST, users can easily test the robustness of their models and gain insight into which models perform better under different types of adversarial attacks. Moreover, YAST's automated functionality makes it easy to use, even for users with limited experience in adversarial testing.
In this section, we describe the setup needed to run YAST in a master-slave architecture. The following commands target Ubuntu machines; on other operating systems the commands may vary.
Before starting, note that Git LFS must be installed in order to clone the repository properly, since it handles the binary files. After cloning the repository, you can use the following command to check the binary files:
git lfs ls-files

It's recommended to use a virtual environment. To create and use one, run the following commands:
# Create a new virtual environment
python -m venv "venv_name"
# Activate virtual environment
source "venv_name"/bin/activate
# Deactivate virtual environment
deactivate

- Start by installing Celery on all machines.
# [ALL MACHINES]
sudo apt-get install celery -y

- On the master machine, install the Redis server.
# [MASTER MACHINE]
sudo apt-get install redis-server -y

- Depending on the network, the Redis configuration file may need to be changed.
# [MASTER MACHINE]
# edit the config file
sudo nano /etc/redis/redis.conf
# in the NETWORK section change the bind to the master IP or 0.0.0.0 to listen to all interfaces
bind 0.0.0.0

- After that, make sure the redis-server service is running.
# get status of redis-server
sudo systemctl status redis-server
# restart redis-server
sudo systemctl restart redis-server

- Now navigate to the project directory and install all the requirements.
# [ALL MACHINES]
pip install -r requirements.txt

Note: The Python version used for development and testing is 3.8.10.
- Start the master machine.
python main.py

- Start the workers.
python worker.py {masterip} {workername}

Note: Workers must have different names.
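For instance, assuming the master machine's IP is 192.168.1.10 (an illustrative address), the master and two workers could be started as follows:
# [MASTER MACHINE]
python main.py

# [WORKER MACHINE 1] - 192.168.1.10 is an illustrative master IP
python worker.py 192.168.1.10 worker1

# [WORKER MACHINE 2]
python worker.py 192.168.1.10 worker2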
List of available commands:
- list_workers -l | -all -> list live | all workers
- sincronize_repo -> synchronize the repository with the workers
- sincronize_models -> synchronize the models with the workers
- list_jobs -> list jobs in progress
- add_pipeline -> add a pipeline to the queue
- h, help -> shows the list of available commands
To exit the program, press Ctrl+C and wait until the program exits.
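As a rough illustration, a typical session on the master might use the commands above in the following order (the exact prompts and output depend on the program):
list_workers -l      # check that the expected workers are alive
sincronize_repo      # push the current repository to the workers
sincronize_models    # push the models to the workers
add_pipeline         # queue a pipeline built from the configuration files
list_jobs            # follow the jobs in progress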
Attacks available:
- Bandits and Prior (Black-box)
- CornerSearch (Black-box)
- DDN (White-box)
- DeepFool (White-box)
- GeoDA (Black-box)
- LogBarrier (White-box)
- SimBA (Black-box)
- Square Attack (Black-box)
- UAP (White-box)
Note: In the future, others will be added.
Datasets available:
- CIFAR10
Note: In the future, others will be added.
Models available:
- densenet
- googlenet
- mobilenet
- resnet
- senet
- vgg
Note: Others may be added.
To test a model, two configuration files are needed. By default, the program reads both configuration files from the 'configs' folder. Their paths can be specified explicitly, and the configuration files edited as needed.
The first is the models and datasets configuration file, by default 'config.yaml'. This file contains the model and dataset information necessary to run the program.
Structure:
DATASET:
  dataset_name:
    run: [true|false] whether to run the dataset or not
    dataset_loader_path: path_to_dataset_loader

MODEL:
  model_name:
    run: [true|false] whether to run the model or not
    model_path: path_to_model
    method_name: method_to_load_inside_the_file
    checkpoint_path: checkpoint_to_load [optional]
    args: arguments to load the method [optional]

Note: Multiple datasets and models can be defined; those whose run flag is true will be tested.
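For illustration, a minimal config.yaml enabling the bundled CIFAR10 loader and one model might look like this (the model path, class name, and checkpoint are illustrative placeholders, not the repository's actual values):
DATASET:
  cifar10:
    run: true
    dataset_loader_path: datasets/cifar-10/datasetLoader.py

MODEL:
  resnet:
    run: true
    model_path: models/resnet/resnet.py        # illustrative path
    method_name: ResNet18                      # illustrative class name
    checkpoint_path: models/resnet/resnet.pth  # optional, illustrative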
The second is the attacks configuration file, by default 'attacks_config.yaml'. This file contains the parameters of the attacks to execute on the model and dataset.
Structure:
ATTACK:
  attack_name:
    run: [true|false] whether to run the attack or not
    path: path_to_attack_file
    arg1: attack_argument_value
    arg2: attack_argument_value2
    ...

Note: Multiple attacks can be defined in the file. Use the default 'attacks_config.yaml' file as a guideline for the attack arguments.
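As an illustration only, entries enabling two of the listed attacks could look like the following (the paths and argument names are assumptions; check the default 'attacks_config.yaml' for the real values):
ATTACK:
  deepfool:
    run: true
    path: attacks/deepfool/deepfool.py     # illustrative path
    max_iter: 50                           # illustrative argument
  square_attack:
    run: false
    path: attacks/square_attack/square_attack.py   # illustrative path
    eps: 0.05                                      # illustrative argument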
To add a different dataset, you must do the following steps:
Create a file where you define the following global variables:
- DATASET_PATH (path of the current file)
- DATASET_NAME (name of the dataset)
- IMAGE_SIZE (size of the images in the dataset)
- MEAN (mean of the dataset by channel)
- STD (standard deviation of the dataset by channel)
- TRANSFORM (transforms compose for test dataset)
- TRAIN_TRANSFORM (transforms compose for train dataset)
- NUM_CLASSES (number of classes of the dataset)
There should also be a 'dataLoader()' function that returns the train and test loaders (of type torch.utils.data.DataLoader).
You may use the datasetLoader.py file as a guide, available in 'datasets/cifar-10/datasetLoader.py'.
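A minimal sketch of such a file for a CIFAR10-style dataset, assuming torchvision is available (the normalization values and batch size are illustrative):
import os
import torch
import torchvision
import torchvision.transforms as transforms

# Global variables expected by YAST (values are illustrative)
DATASET_PATH = os.path.dirname(os.path.abspath(__file__))
DATASET_NAME = "cifar10"
IMAGE_SIZE = 32
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2470, 0.2435, 0.2616)
NUM_CLASSES = 10
TRANSFORM = transforms.Compose([transforms.ToTensor(), transforms.Normalize(MEAN, STD)])
TRAIN_TRANSFORM = transforms.Compose([
    transforms.RandomCrop(IMAGE_SIZE, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(MEAN, STD),
])

def dataLoader():
    # Return the train and test torch.utils.data.DataLoader objects
    train_set = torchvision.datasets.CIFAR10(root=DATASET_PATH, train=True, download=True, transform=TRAIN_TRANSFORM)
    test_set = torchvision.datasets.CIFAR10(root=DATASET_PATH, train=False, download=True, transform=TRANSFORM)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
    return train_loader, test_loader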
Then add the dataset information to the configuration file; by default the file is at the root of the project and is called 'config.yaml'.
Use the following structure:
  dataset_name:
    run: [true|false]
    dataset_loader_path: path to the previously created file

To add a new model to be tested, you need to include some information in the 'config.yaml' file.
Add the information relative to your model using the following structure:
  model_name:
    run: [true|false]
    model_path: path to the file containing the model
    method_name: name of the class or method of the model
    checkpoint_path: [optional] path of the model checkpoint
    args: [optional] arguments to be passed to the method/class
      - arg_1
      - arg_2

You can add an attack from any directory, but it's recommended to create a folder with the name of the attack inside the 'attacks/' folder, visible from the root of the project.
Then, inside the created folder, create the file attack_name.py, where attack_name is the name of your attack.
The file should be executable as an independent script that receives the following parameters:
- model : path for a binary file that contains the model
- dataset : path for datasetLoader file
- total-images : number of images to be tested
- log : [true|false] flag for the logging messages
- results-path : path to store the results
Note: add all the parameters necessary to run your attack as well; the parameters above are just the defaults.
File behavior:
The file should load the dataset from the datasetLoader and extract the model and the classified labels of the images from the binary file.
It should then execute the attack.
Finally, it should store the results following the correct structure (described in the results section below); all results should go inside the attack_name folder.
The attack should also have a logger that writes to the attack_name.log file.
Note: There are functions available in the 'utils/func_utils.py' file that can be used to store the results with the proper structure.
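A rough skeleton of such a script, assuming the parameters arrive as command-line flags and the binary model file is a pickle containing the model and its classified labels (both of these are assumptions about the interface, not the toolbox's definitive format):
import argparse
import importlib.util
import logging
import pickle

def main():
    parser = argparse.ArgumentParser(description="attack_name adversarial attack")
    parser.add_argument("--model", required=True, help="path to the binary file containing the model")
    parser.add_argument("--dataset", required=True, help="path to the datasetLoader file")
    parser.add_argument("--total-images", type=int, default=100, help="number of images to test")
    parser.add_argument("--log", type=lambda s: s.lower() == "true", default=True, help="enable logging messages")
    parser.add_argument("--results-path", required=True, help="path to store the results")
    # add any attack-specific parameters here
    args = parser.parse_args()

    # Logger that writes to attack_name.log
    logging.basicConfig(filename="attack_name.log", level=logging.INFO if args.log else logging.WARNING)
    logger = logging.getLogger("attack_name")

    # Load the datasetLoader module from its file path and get the test loader
    spec = importlib.util.spec_from_file_location("datasetLoader", args.dataset)
    dataset_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(dataset_module)
    _, test_loader = dataset_module.dataLoader()

    # Load the model and the labels it assigned to the original images (the pickle format is an assumption)
    with open(args.model, "rb") as f:
        model, classified_labels = pickle.load(f)

    logger.info("Running attack on %d images", args.total_images)
    # ... run the attack on test_loader, then write the perturbed images,
    # labels.csv and sums.csv under args.results_path/attack_name/ ...

if __name__ == "__main__":
    main()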
To add the attack, you should modify the 'attacks_config.yaml' file and add the necessary parameters to run the attack. Use the following structure:
  attack_name:
    run: [true|false]
    path: path to the attack file
    param1: ...
    param2: ...
    ...

This section presents the structure of the zip generated after executing the attacks, which contains all the results:
zip_results
├── 1_model_dataset
│ ├── attacks
│ │ ├── attack_name
│ │ │ ├── perturbed_images
│ │ │ │ ├── l2_attackName_imageId.jpeg
│ │ │ │ └── ...
│ │ │ ├── attack_name.log
│ │ │ ├── labels.csv
│ │ │ └── sums.csv
│ │ └── ...
│ ├── original
│ │ ├── original_id.jpeg
│ │ └── ...
│ ├── plots
│ │ ├── plot_n.png
│ │ └── ...
│ ├── pipeline.log
│ ├── config.yaml
│ ├── results_by_attack.csv
│ └── robustness_score.csv
├── 2_...
├── config.yaml
├── attacks_config.yaml
└── robustness_score.csv
Here, attack_name and imageId should be replaced by the name of the attack and the id of the image in the dataset, respectively.
Clarification on the purpose of some folders and files:
- zip_results [folder]: folder where all results will be stored.
- 1_model_dataset [folder]: folder where the results of one pipeline are stored. Depending on the configuration files, multiple such folders can be generated; the initial number is a counter, and model and dataset are the names of the model and dataset used.
- attacks [folder]: folder where the results of individual attacks will be stored.
- perturbed_images [folder]: the folder contains all the perturbed images of the attack.
- config.yaml [file]: configuration file used to prepare the pipeline; contains the parameters to load the models and datasets.
- attacks_config.yaml [file]: configuration file used to execute the pipeline; contains the parameters of the attacks used.
- l2_attackName_imageId.jpeg [file]: perturbed image, name follows the structure (l2_attackName_imageId).
- attack_name.log [file]: log file of the attack.
- labels.csv [file]: Stores values for each image tested (Columns: Ground truth label, Original classified label, Perturbed image classified label, L2, Queries).
- sums.csv [file]: Stores the total sum of some values (Columns: Number of images, Original images correctly classified by the model, Adversarial images correctly classified by the model).
- original [folder]: folder where the original images tested are stored.
- original_id [file]: original image, follows the structure (original_imageId).
- plots [folder]: folder where the plots generated are stored.
- plot_n.png [file]: plot generated after executing all the attacks in a pipeline.
- pipeline.log [file]: log file of the pipeline.
- results_by_attack.csv [file]: Summarizes the information of all the attacks executed (Columns: Attack name, Adversarial images correctly classified by the model).
- robustness_score.csv [file]: File containing the robustness score given to the model.