
deep-learning-applications

This repository contains the code developed to solve the exercises of the Deep Learning Applications course (2024/25).

Laboratory 1

Exercises 1.1 and 1.2

The goal of these exercises is to demonstrate the importance of residual connections. We do this by evaluating simple MLPs and showing that deeper networks with residual connections are easier to train than equally deep networks without them. We compare MLPs with a width of 16 and varying depths on the MNIST dataset.

Note: for this first experiment, each residual block contains one layer.

| Depth | Accuracy (MLP) | Accuracy (ResidualMLP) |
|-------|----------------|------------------------|
| 2     | 0.93           | 0.94                   |
| 8     | 0.88           | 0.95                   |
| 32    | 0.11           | 0.95                   |
| 64    | 0.11           | 0.35                   |

Notice the poor performance of the last Residual MLP (depth 64). This was improved by increasing the number of layers per residual block from 1 to 2:

| Depth | Accuracy (MLP) | Accuracy (ResidualMLP, 2 layers/block) |
|-------|----------------|----------------------------------------|
| 2     | 0.93           | 0.93                                   |
| 8     | 0.88           | 0.95                                   |
| 32    | 0.11           | 0.96                                   |
| 64    | 0.11           | 0.96                                   |
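The residual variant simply groups the hidden layers into blocks and adds an identity skip connection around each block. A minimal PyTorch sketch of the idea (module and argument names are mine and may differ from the repository's code):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """`layers_per_block` Linear+ReLU layers wrapped by an identity skip connection."""
    def __init__(self, width: int, layers_per_block: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            *[nn.Sequential(nn.Linear(width, width), nn.ReLU())
              for _ in range(layers_per_block)]
        )

    def forward(self, x):
        return x + self.body(x)  # output = input + F(input)

class ResidualMLP(nn.Module):
    def __init__(self, depth: int = 32, width: int = 16,
                 layers_per_block: int = 2, n_classes: int = 10):
        super().__init__()
        self.input_proj = nn.Linear(28 * 28, width)  # flattened MNIST images
        self.blocks = nn.Sequential(
            *[ResidualBlock(width, layers_per_block)
              for _ in range(depth // layers_per_block)]
        )
        self.head = nn.Linear(width, n_classes)

    def forward(self, x):
        return self.head(self.blocks(self.input_proj(x.flatten(1))))
```

The plain MLP used for comparison is the same stack without the `x +` term in `forward`.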

The figure below shows the magnitude of gradients as they propagate through the network, comparing a standard MLP and an MLP with residual blocks. The zig-zag pattern in the graph occurs because gradients for biases and weights were not separated in this visualization. Despite this, it is clear that residual connections help prevent vanishing gradients.

gradients
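The visualization can be reproduced by logging, after a backward pass, the mean absolute gradient of every parameter tensor, ordered from input to output (a sketch; the repository's exact logging code may differ). Plotting weight and bias gradients in a single sequence is what produces the zig-zag pattern mentioned above.

```python
import torch

def gradient_magnitudes(model: torch.nn.Module, loss: torch.Tensor) -> dict[str, float]:
    """Mean absolute gradient of each parameter tensor after backpropagating `loss`."""
    model.zero_grad()
    loss.backward()
    return {name: param.grad.abs().mean().item()
            for name, param in model.named_parameters()
            if param.grad is not None}
```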

Exercise 1.3

The goal of this exercise is to repeat the analysis from Exercise 1.2, this time using Convolutional Neural Networks trained on CIFAR-10. As expected, we observe improvements when residual connections are used.

cnn

Accuracy curves during training. The numbers next to each model name indicate the number of residual blocks (with or without skip connections enabled). Models with residual connections achieve consistently higher accuracy than the plain versions.
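The CNN follows the same recipe: identical stacks of convolutional blocks in which the skip connection can be switched on or off. A hedged sketch of such a block (channel sizes and normalization choices are illustrative, not necessarily the repository's exact architecture):

```python
import torch.nn as nn

class ConvResidualBlock(nn.Module):
    """Two 3x3 convolutions with an optional identity skip connection."""
    def __init__(self, channels: int, use_skip: bool = True):
        super().__init__()
        self.use_skip = use_skip
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.body(x)
        if self.use_skip:
            out = out + x  # disabling this gives the "plain" variant in the plot
        return self.relu(out)
```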

Exercise 2.3

The goal of this exercise is to explain the predictions of a CNN by visualizing Class Activation Maps (CAMs). We use the CNN trained in Exercise 1.3 and extend it with CAM to highlight which regions of the input image contribute most to the model’s classification decision.

cifar10_cam_6images

CAMs showing which image regions the trained CNN focuses on (CIFAR-10).
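As a reminder of how a CAM is obtained: the feature maps of the last convolutional layer are weighted by the classifier weights of the chosen class and summed. A minimal sketch, assuming a CNN that ends in global average pooling followed by a single linear head (the attribute names `features` and `classifier` are assumptions):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_cam(model, image, target_class=None):
    """Class Activation Map for a CNN ending in global average pooling + linear classifier."""
    model.eval()
    feats = model.features(image.unsqueeze(0))          # (1, C, H, W) last conv feature maps
    logits = model.classifier(feats.mean(dim=(2, 3)))   # global average pooling + linear head
    if target_class is None:
        target_class = logits.argmax(dim=1).item()
    weights = model.classifier.weight[target_class]     # (C,) weights of the chosen class
    cam = torch.einsum("c,chw->hw", weights, feats[0])  # weighted sum of feature maps
    cam = F.relu(cam)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8), target_class
```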

We also apply CAM to a pre-trained ResNet-18 on images from the Imagenette dataset:

imagenette_cam_6images

Laboratory 3

Exercise 1

The focus of this exercise was to build a stable baseline for the next exercises. The task is sentiment analysis on the Rotten Tomatoes dataset.

  • A pretrained DistilBERT model was used as a feature extractor.
  • An SVM classifier was trained on top of the extracted features (see the sketch below).

Accuracy: 0.82 (on validation data)
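A minimal sketch of this baseline, assuming the [CLS] token embedding of distilbert-base-uncased is used as the sentence feature and a linear SVM on top (the pooling strategy and SVM settings in the repository may differ):

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.svm import LinearSVC

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased").eval()

@torch.no_grad()
def extract_features(texts, batch_size=32):
    """Encode texts with DistilBERT and return the [CLS] hidden state as a feature vector."""
    feats = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state  # (B, T, 768)
        feats.append(hidden[:, 0, :])                # [CLS] token embedding
    return torch.cat(feats).numpy()

# train_texts / train_labels come from the Rotten Tomatoes dataset
clf = LinearSVC()
clf.fit(extract_features(train_texts), train_labels)
print("val accuracy:", clf.score(extract_features(val_texts), val_labels))
```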

Exercise 2

In this exercise the pretrained DistilBERT model was fine-tuned in order to achieve higher accuracy than the baseline introduced above. With some pre-processing and the HuggingFace Trainer, the model achieved:

Accuracy: 0.84 (+2%)

(Best model across 30 fine-tuning epochs)
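A hedged sketch of the fine-tuning setup with the Trainer API (hyperparameters are illustrative; only the 30-epoch budget and the best-model selection follow the description above):

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("rotten_tomatoes")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(
    output_dir="distilbert-rotten-tomatoes",
    num_train_epochs=30,
    per_device_train_batch_size=32,
    eval_strategy="epoch",          # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,    # keep the best checkpoint across the 30 epochs
    metric_for_best_model="accuracy",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"],
                  compute_metrics=compute_metrics)
trainer.train()
```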

Exercise 3.2

In this exercise, we first use a small CLIP model, openai/clip-vit-base-patch16, to evaluate its zero-shot performance on the tiny-imagenet dataset:

Zero-shot accuracy: 0.63

Interestingly, just by prepending "A photo of a {label}" to the text prompts, we achieve:

Zero-shot accuracy: 0.70 (+7%)
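A minimal sketch of the zero-shot evaluation with the prompt template, using the Hugging Face CLIP classes (batching and the exact formatting of the class names are assumptions):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

@torch.no_grad()
def zero_shot_predict(images, class_names):
    """Predict a class index for each PIL image by matching it against templated prompts."""
    prompts = [f"A photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    # logits_per_image: (n_images, n_classes) image-text similarity scores
    return outputs.logits_per_image.argmax(dim=-1)
```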

Using Low-Rank Adaptation (LoRA) to fine-tune the attention layers of both the text and image encoders, we achieve:

Accuracy: 0.76

(Best model across 5 fine-tuning epochs)
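A hedged sketch of attaching LoRA adapters to the attention projections of both CLIP encoders with the peft library (the target module names follow the Hugging Face CLIP implementation; the rank and other hyperparameters are illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")

lora_config = LoraConfig(
    r=8,                    # low-rank dimension (illustrative)
    lora_alpha=16,
    lora_dropout=0.1,
    # q/k/v/out projections exist in both the vision and text transformer layers,
    # so this targets the attention of both encoders
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```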

Fine-tuning only one of the encoders leads to:

| Accuracy (fine-tune image encoder only) | Accuracy (fine-tune text encoder only) |
|-----------------------------------------|----------------------------------------|
| 0.72                                    | 0.73                                   |

I also experimented with a similar methodology on an art dataset (Art Style Classification).

The goal is to classify paintings into one of the following five categories:

  • Portrait
  • Landscape
  • Abstract
  • Religious Painting
  • Cityscape

The zero-shot performance of CLIP is:

Zero-shot accuracy: 0.87
output

(Example of misclassification)


The performance after fine-tuning with LoRA increases up to:

Accuracy: 0.92 (+5%)

(Fine-tuning performed on both encoders)

Laboratory 4

Exercise 1

In this exercise, we build a simple Out-of-Distribution (OOD) detection pipeline. The dataset used for in-distribution (ID) examples is CIFAR-10, while the OOD datasets are a subset of CIFAR-100 (with classes not present in CIFAR-10) and randomly generated FakeData. For brevity, only the results on CIFAR-100 are discussed.

The maximum softmax probability (MSP) is used as the score of how OOD a test sample is. This score is produced by a custom small CNN and by a pretrained ResNet-20, which I compare in the following table:

[Figures: MSP histogram, ROC curve (AUC), and PR curve (AP), side by side for the custom CNN and the ResNet]

As shown in the plots, the ResNet performs better. This is expected since it also achieves higher classification accuracy on CIFAR-10 (81%) compared to the smaller custom CNN (64%).
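The MSP score and the ROC/PR evaluation can be summarized in a few lines (a sketch; `model` and the data loaders are assumed to be defined elsewhere):

```python
import torch
import torch.nn.functional as F
from sklearn.metrics import average_precision_score, roc_auc_score

@torch.no_grad()
def msp_scores(model, loader):
    """Maximum softmax probability per sample; higher means 'more in-distribution'."""
    model.eval()
    scores = []
    for x, _ in loader:
        probs = F.softmax(model(x), dim=1)
        scores.append(probs.max(dim=1).values)
    return torch.cat(scores)

id_scores = msp_scores(model, cifar10_test_loader)      # ID: CIFAR-10
ood_scores = msp_scores(model, cifar100_subset_loader)  # OOD: CIFAR-100 subset

labels = torch.cat([torch.ones_like(id_scores), torch.zeros_like(ood_scores)])
scores = torch.cat([id_scores, ood_scores])
print("AUROC:", roc_auc_score(labels, scores))
print("AP:   ", average_precision_score(labels, scores))
```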

Exercise 2.1

In this exercise the FGSM method is used to generate adversarial examples. The model under attack is the custom CNN introduced in the previous exercise. Some adversarial attacks generated with epsilon = 1/255 are shown below:

output1
output2
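A minimal sketch of the FGSM perturbation (single step; the iterative attacks evaluated below simply repeat this update, up to max_n_iterations times, until the prediction changes):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=1 / 255):
    """Fast Gradient Sign Method: perturb x in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()  # one signed-gradient step
    return x_adv.clamp(0, 1).detach()    # keep pixels in the valid range
```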

I used 3 metrics to evaluate how the generated adversarial images depend on epsilon:

  • Attack success rate
  • Average iterations to success
  • Average confidence drop

(All the attacks have a fixed max_n_iterations = 10)

quantitative_eval

As expected, bigger epsilons produce more powerful (but also more noticeable) attacks.

Exercise 2.2

In this exercise, FGSM adversarial samples are used to augment the training dataset of the OOD detector model. I implemented this augmented training with a weighted loss function in the training loop: for each batch, I compute the loss on both the original (clean) inputs and the adversarially perturbed inputs, then combine the two into a single loss. The weights of the two components are hyperparameters. A sketch of one training step is shown below.
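The sketch below reuses the `fgsm_attack` helper from Exercise 2.1; the weight values, `optimizer`, and `train_loader` are illustrative assumptions:

```python
import torch.nn.functional as F

clean_weight, adv_weight = 0.5, 0.5  # hyperparameters of the combined loss

for x, y in train_loader:
    x_adv = fgsm_attack(model, x, y, epsilon=1 / 255)  # perturb the current batch

    optimizer.zero_grad()
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x_adv), y)
    loss = clean_weight * clean_loss + adv_weight * adv_loss
    loss.backward()
    optimizer.step()
```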

For an equally weighted loss, loss = 0.5 * clean_loss + 0.5 * adv_loss, there is a slight improvement of about 2% in both the ROC and PR curves.

augmented_1 augmented_2

65% and 87%, vs. 63% and 85% (non-augmented training) from Exercise 1

For an unbalanced loss, loss = 0.2 * clean_loss + 0.8 * adv_loss, the performance slightly degrades. This suggests that the weights of the loss components might be tricky to tune.

output_4 output_5

Exercise 3.3

The goal of this exercise was to generate targeted attacks by creating adversarial samples that imitate samples from a specific class. Here is a qualitative evaluation of two of them, where the target class was "dog":

output_1
output_2
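Conceptually, the targeted variant differs from the untargeted one only in the label and the sign of the gradient step: instead of increasing the loss of the true class, we decrease the loss of the target class. A hedged sketch of one iteration:

```python
import torch
import torch.nn.functional as F

def targeted_fgsm_step(model, x, target_class, epsilon=1 / 255):
    """One targeted FGSM step: move x toward the decision region of target_class."""
    x = x.clone().detach().requires_grad_(True)
    target = torch.full((x.shape[0],), target_class, dtype=torch.long)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    x_adv = x - epsilon * x.grad.sign()  # minus sign: descend the target-class loss
    return x_adv.clamp(0, 1).detach()
```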

For a quantitative evaluation, I compared the targeted and untargeted attacks using the 3 metrics introduced in Exercise 2.1. The model used to generate the images is the same (the custom CNN trained in Exercise 2.2). For both types of attack, epsilon = 1/255 and max_n_iterations = 10.

comparison

As expected, the average confidence drop is larger and the average number of iterations to success is lower for untargeted attacks compared to targeted ones. This behavior arises because untargeted attacks only need to push the sample outside the decision region of the true class, which is a simpler optimization problem. What is surprising is that the success rate is slightly higher for targeted attacks. This might be the effect of the augmented training done during exercise 2.2, where the augmentation was based on untargeted attacks.


Note: Some parts of the code in this project were generated with the assistance of generative AI tools.
For example, almost all of the plotting code was AI-generated.
All AI-generated code was carefully reviewed and checked to make sure it executed as intended.
I also often used AI for debugging, especially issues related to tensor shapes or out-of-range indexing.
